Implement homograph numbers and improve entry search/sort behavior#2220
Implement homograph numbers and improve entry search/sort behavior#2220myieye wants to merge 8 commits intofeat/sync-morph-typesfrom
Conversation
📝 WalkthroughWalkthroughThis PR implements homograph number support to track and distinguish multiple entries sharing the same headword. It adds the Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4246a62 to
2e82fa6
Compare
e9f7255 to
6af9827
Compare
Implement CRUD for MorphTypeData, add DB migration Make LeadingToken and TrailingToken nullable Register JsonPatchChange for morphTypeData Reject MorphTypeData updates that change MorphType Do not allow ANY JsonPatch changes to MorphType Rename MorphType enum to MorphTypeKind Rename MorphTypeData to MorphType LeadingToken -> Prefix, TrailingToken -> Postfix Matches the names from liblcm, which may help reduce confusion.
* Add filtering on token aware headwords * Respect SecondaryOrder when sorting * Make FTS Headword column contain morph-tokens and all vernacular WS's
e3ec8c7 to
87c5dad
Compare
8d2c6c6 to
b95da37
Compare
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
* Initial work on morph types in UI Morph types now show leading/trailing tokens in headword, but do not yet have a dropdown for editing them in the entry UI. * Citation forms should not be decorated Lexeme forms should be decorated with prefix/postfix tokens according to the morph type, but citation forms are meant as "overrides" and should be reproduced exactly as-is, without morph type tokens. This is the rule used by FLEx for how it displays words, so FW Lite should do the same. As a bonus, there is now only one `headword` function in the writing system service, instead of two functions with the same name that did two slightly different things. * Fix tests --------- Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
87c5dad to
8ed192f
Compare
aaf00bb to
0db4516
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
4be17e9 to
77a861f
Compare
* Seed canonical morph-types into CRDT projects - Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM) - Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes - Seed morph-types for existing projects in MigrateDb (before FTS refresh) - Add EF migration to clear FTS table so headwords are rebuilt with morph tokens - Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates * Stop creating morph-types in tests. They're now prepopulated * Stop printing verify diff content. It's too much. * Seed morph types before API testing * Add descriptions to canonical morph types * Sync morph-types when importing, because they already exist in CRDT * Verify our canonical morph-types match new fwdata projects * Fix non-FTS relevance order with morph-tokens in query
8ed192f to
374b1b2
Compare
8f0178b to
6b63086
Compare
Add HomographNumber (int, 0 = unset) to the Entry model with full round-trip support through CRDT, FwData bridge, and sync. Key changes: - Entry model: add HomographNumber property with Copy() support - CreateEntryChange: persist HomographNumber in CRDT changes - CrdtMiniLcmApi: auto-assign homograph numbers on entry creation when HomographNumber is 0, respecting SecondaryOrder scoping. Updates existing lone entries from 0→1 when a second homograph appears. - FwDataMiniLcmApi: read HomographNumber from ILexEntry, set on create - UpdateEntryProxy: bidirectional HomographNumber sync to LibLCM - EntrySync: include HomographNumber in diff/patch operations - Sorting: uncomment HomographNumber in CRDT sort and search queries - Tests: uncomment sorting tests with HomographNumber, add auto- assignment tests, add sync test verifying LibLCM corrects numbers after entry deletion via two sync cycles https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
6b63086 to
654a29b
Compare
441aad1 to
08c9430
Compare
| ?? stemOrder.FirstOrDefault(); | ||
|
|
||
| var matchingEntries = await ( | ||
| from e in repo.Entries |
There was a problem hiding this comment.
Does this exclude the deleted entries?
There was a problem hiding this comment.
Yes. We mostly just read from Harmony's "projected tables" i.e. the clean relational ef-core database that Harmony projects its latest un-deleted snapshots to.
| // For now, we ALWAYS defer to LibLCM's auto-handling (triggered by setting LexemeForm/CitationForm) | ||
| // This ensures that FwData/LibLCM will correct broken homograph numbering caused by the incomplete CRDT implementation. | ||
| // if (entry.HomographNumber != 0) | ||
| // lexEntry.HomographNumber = entry.HomographNumber; |
There was a problem hiding this comment.
I'm now thinking that instead of commenting this out, we should set the homograph number and then explicitly ask LibLCM to cleanup the homograph numbers.
I assume that's possible. Then if it's valid it will work as requested and if it's not it'll get fixed.
4028077 to
9683cf5
Compare
Summary
This PR adds homograph numbers to MiniLcm.
Homograph numbers are now respected when sorting.
Regarding homograph number assignment
The FwData implementation currently relies on LibLCM to calculate and assign homograph numbers.
The CRDT implementation has a rudimentary implementation that only handles newly created entries as well as promoting potential pre-existing duplicates. However, an added sync test demonstrates that "broken" homograph numbers in CRDT-land are "repaired" after two syncs which round-trips them through LibLcm's auto-assignment.