Implement homograph numbers and improve entry search/sort behavior by myieye · Pull Request #2220 · sillsdev/languageforge-lexbox

myieye · 2026-03-24T16:03:38Z

Summary

This PR adds homograph numbers to MiniLcm.

Homograph numbers are now respected when sorting.

Regarding homograph number assignment
The FwData implementation currently relies on LibLCM to calculate and assign homograph numbers.
The CRDT implementation has a rudimentary implementation that only handles newly created entries as well as promoting potential pre-existing duplicates. However, an added sync test demonstrates that "broken" homograph numbers in CRDT-land are "repaired" after two syncs which round-trips them through LibLcm's auto-assignment.

coderabbitai · 2026-03-24T16:03:48Z

📝 Walkthrough

Walkthrough

This PR implements homograph number support to track and distinguish multiple entries sharing the same headword. It adds the HomographNumber property to Entry models, implements auto-assignment logic in CRDT, updates database schema, modifies sorting/search ordering, and includes comprehensive tests across both CRDT and FwData layers.

Changes

Cohort / File(s)	Summary
API Property Mapping `backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs`, `backend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateEntryProxy.cs`, `backend/FwLite/MiniLcm/Models/Entry.cs`, `frontend/viewer/src/lib/dotnet-types/generated-types/MiniLcm/Models/IEntry.ts`	Added `HomographNumber` property to Entry models and proxies; mapped from LibLCM entries and exposed in frontend types; conditionally assigns when importing from FwData.
Homograph Auto-Assignment Logic `backend/FwLite/LcmCrdt/CrdtMiniLcmApi.cs`	Added `AssignHomographNumber` helper to auto-assign homograph numbers during entry creation; derives secondary ordering from morph type, queries existing entries, patches first zero entry to 1, assigns incrementing numbers to new entries.
Sorting & Full-Text Search `backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs`, `backend/FwLite/LcmCrdt/Data/Sorting.cs`, `backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs`	Enabled `HomographNumber` as deterministic tie-breaker in headword ordering and best-match ranking across all three sorting implementations; re-enabled previously commented-out homograph tie-breaks.
Change Type & Sync `backend/FwLite/LcmCrdt/Changes/CreateEntryChange.cs`, `backend/FwLite/MiniLcm/SyncHelpers/EntrySync.cs`	Added `HomographNumber` property to `CreateEntryChange` for persistence; added JSON patch operation generation when homograph numbers differ during sync.
Database Schema & Migrations `backend/FwLite/LcmCrdt/Migrations/20260409130907_AddHomographNumbers.*`, `backend/FwLite/LcmCrdt/Migrations/LcmCrdtDbContextModelSnapshot.cs`	Added EF Core migration to create `HomographNumber` INTEGER column with default 0 on Entry table; updated model snapshot with new property configuration.
Test Data Verification `backend/FwLite/LcmCrdt.Tests/Changes/ChangeDeserializationRegressionData.`, `backend/FwLite/LcmCrdt.Tests/Data/MigrationTests_FromScriptedDb..verified.`, `backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationRegressionData.`, `backend/FwLite/LcmCrdt.Tests/Data/VerifyRegeneratedSnapshotsAfterMigrationFromScriptedDb.*`, `backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.VerifyDbModel.verified.txt`	Updated all verified snapshot and test data files to include `HomographNumber` field (typically 0 for existing entries); added new entry fixture with homograph number 4.
Unit & Integration Tests `backend/FwLite/MiniLcm.Tests/CreateEntryTestsBase.cs`, `backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs`, `backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs`	Added five new homograph auto-assignment tests validating incrementation, explicit values, morph-type grouping, citation-form grouping, and multi-entry scenarios; added integration test verifying homograph correction through two-sync cycles; updated sorting test fixtures with explicit homograph variants.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Make headword matches trump fts rank #2112: Modifies EntrySearchService.FilterAndRank ordering logic to incorporate homograph-number tie-breaking.
Sort prefix matches above other contain matches #2163: Adds homograph-number tie-breakers to Sorting.ApplyHeadwordOrder and ApplyRoughBestMatchOrder methods.
Add morph types to MiniLcm #1857: Updates API mapping in FwDataMiniLcmApi.FromLexEntry and CreateEntry to handle additional Entry field mappings.

Suggested labels

📦 Lexbox

Suggested reviewers

rmunn

Poem

🐰 A homograph's number now takes its place,
Distinguishing entries with matching face,
Through sorting and sync they journey along,
Each one assigned where it belongs!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.21% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main changes: implementing homograph numbers and improving entry search/sort behavior, which align with the core objectives.
Description check	✅ Passed	The PR description clearly explains the addition of homograph numbers to MiniLcm, how they are respected in sorting, and implementation details for both FwData and CRDT.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/add-homograph-numbers-V1a05

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Implement CRUD for MorphTypeData, add DB migration Make LeadingToken and TrailingToken nullable Register JsonPatchChange for morphTypeData Reject MorphTypeData updates that change MorphType Do not allow ANY JsonPatch changes to MorphType Rename MorphType enum to MorphTypeKind Rename MorphTypeData to MorphType LeadingToken -> Prefix, TrailingToken -> Postfix Matches the names from liblcm, which may help reduce confusion.

* Add filtering on token aware headwords * Respect SecondaryOrder when sorting * Make FTS Headword column contain morph-tokens and all vernacular WS's

github-actions · 2026-04-09T12:37:23Z

UI unit Tests

1 files ±0 59 suites ±0 37s ⏱️ +5s
176 tests ±0 176 ✅ ±0 0 💤 ±0 0 ❌ ±0
245 runs ±0 245 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 08c9430. ± Comparison against base commit 4028077.

♻️ This comment has been updated with latest results.

argos-ci · 2026-04-09T12:39:05Z

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build	Status	Details	Updated (UTC)
default (Inspect)	⚠️ Changes detected (Review)	6 changed	May 5, 2026, 11:43 AM

* Initial work on morph types in UI Morph types now show leading/trailing tokens in headword, but do not yet have a dropdown for editing them in the entry UI. * Citation forms should not be decorated Lexeme forms should be decorated with prefix/postfix tokens according to the morph type, but citation forms are meant as "overrides" and should be reproduced exactly as-is, without morph type tokens. This is the rule used by FLEx for how it displays words, so FW Lite should do the same. As a bonus, there is now only one `headword` function in the writing system service, instead of two functions with the same name that did two slightly different things. * Fix tests --------- Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>

myieye · 2026-04-09T15:46:30Z

@coderabbitai full review

coderabbitai · 2026-04-09T15:46:37Z

✅ Actions performed

Full review triggered.

* Seed canonical morph-types into CRDT projects - Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM) - Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes - Seed morph-types for existing projects in MigrateDb (before FTS refresh) - Add EF migration to clear FTS table so headwords are rebuilt with morph tokens - Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates * Stop creating morph-types in tests. They're now prepopulated * Stop printing verify diff content. It's too much. * Seed morph types before API testing * Add descriptions to canonical morph types * Sync morph-types when importing, because they already exist in CRDT * Verify our canonical morph-types match new fwdata projects * Fix non-FTS relevance order with morph-tokens in query

Add HomographNumber (int, 0 = unset) to the Entry model with full round-trip support through CRDT, FwData bridge, and sync. Key changes: - Entry model: add HomographNumber property with Copy() support - CreateEntryChange: persist HomographNumber in CRDT changes - CrdtMiniLcmApi: auto-assign homograph numbers on entry creation when HomographNumber is 0, respecting SecondaryOrder scoping. Updates existing lone entries from 0→1 when a second homograph appears. - FwDataMiniLcmApi: read HomographNumber from ILexEntry, set on create - UpdateEntryProxy: bidirectional HomographNumber sync to LibLCM - EntrySync: include HomographNumber in diff/patch operations - Sorting: uncomment HomographNumber in CRDT sort and search queries - Tests: uncomment sorting tests with HomographNumber, add auto- assignment tests, add sync test verifying LibLCM corrects numbers after entry deletion via two sync cycles https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2

jasonleenaylor · 2026-05-05T16:50:05Z

+            ?? stemOrder.FirstOrDefault();
+
+        var matchingEntries = await (
+            from e in repo.Entries


Does this exclude the deleted entries?

Yes. We mostly just read from Harmony's "projected tables" i.e. the clean relational ef-core database that Harmony projects its latest un-deleted snapshots to.

myieye · 2026-05-07T06:08:29Z

+                    // For now, we ALWAYS defer to LibLCM's auto-handling (triggered by setting LexemeForm/CitationForm)
+                    // This ensures that FwData/LibLCM will correct broken homograph numbering caused by the incomplete CRDT implementation.
+                    // if (entry.HomographNumber != 0)
+                    //     lexEntry.HomographNumber = entry.HomographNumber;


I'm now thinking that instead of commenting this out, we should set the homograph number and then explicitly ask LibLCM to cleanup the homograph numbers.
I assume that's possible. Then if it's valid it will work as requested and if it's not it'll get fixed.

github-actions Bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 24, 2026

myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 4246a62 to 2e82fa6 Compare March 27, 2026 16:00

Base automatically changed from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types March 28, 2026 09:14

myieye force-pushed the feat/sync-morph-types branch from e9f7255 to 6af9827 Compare April 7, 2026 15:30

rmunn and others added 2 commits April 9, 2026 11:53

Respect entry morph-type when filtering and sorting (#2202)

7076b9e

* Add filtering on token aware headwords * Respect SecondaryOrder when sorting * Make FTS Headword column contain morph-tokens and all vernacular WS's

myieye force-pushed the feat/sync-morph-types branch 2 times, most recently from e3ec8c7 to 87c5dad Compare April 9, 2026 12:24

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 8d2c6c6 to b95da37 Compare April 9, 2026 12:35

myieye force-pushed the feat/sync-morph-types branch from 87c5dad to 8ed192f Compare April 9, 2026 15:24

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from aaf00bb to 0db4516 Compare April 9, 2026 15:42

myieye marked this pull request as ready for review April 9, 2026 15:46

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 4be17e9 to 77a861f Compare April 10, 2026 10:34

myieye force-pushed the feat/sync-morph-types branch from 8ed192f to 374b1b2 Compare April 10, 2026 11:32

myieye added 2 commits April 10, 2026 13:41

Remove CreateMorphType and DeleteMorphType from MiniLcm API (#2221)

269be98

Add morph-type tests (#1850)

4028077

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch 4 times, most recently from 8f0178b to 6b63086 Compare May 4, 2026 14:31

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 6b63086 to 654a29b Compare May 4, 2026 15:24

Adapt tests to fwdata ignoring passed homograph numbers

08c9430

myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 441aad1 to 08c9430 Compare May 5, 2026 11:40

jasonleenaylor reviewed May 5, 2026

View reviewed changes

myieye commented May 7, 2026

View reviewed changes

rmunn force-pushed the feat/sync-morph-types branch from 4028077 to 9683cf5 Compare May 8, 2026 08:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement homograph numbers and improve entry search/sort behavior#2220

Implement homograph numbers and improve entry search/sort behavior#2220
myieye wants to merge 8 commits intofeat/sync-morph-typesfrom
claude/add-homograph-numbers-V1a05

myieye commented Mar 24, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

argos-ci Bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

myieye commented Apr 9, 2026

Uh oh!

coderabbitai Bot commented Apr 9, 2026

Uh oh!

jasonleenaylor May 5, 2026

Uh oh!

myieye May 6, 2026

Uh oh!

myieye May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

myieye commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

coderabbitai Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

UI unit Tests

Uh oh!

argos-ci Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

myieye commented Apr 9, 2026

Uh oh!

coderabbitai Bot commented Apr 9, 2026

Uh oh!

jasonleenaylor May 5, 2026

Choose a reason for hiding this comment

Uh oh!

myieye May 6, 2026

Choose a reason for hiding this comment

Uh oh!

myieye May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

myieye commented Mar 24, 2026 •

edited

Loading

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading

github-actions Bot commented Apr 9, 2026 •

edited

Loading

argos-ci Bot commented Apr 9, 2026 •

edited

Loading