Skip to content

Implement homograph numbers and improve entry search/sort behavior#2220

Open
myieye wants to merge 8 commits intofeat/sync-morph-typesfrom
claude/add-homograph-numbers-V1a05
Open

Implement homograph numbers and improve entry search/sort behavior#2220
myieye wants to merge 8 commits intofeat/sync-morph-typesfrom
claude/add-homograph-numbers-V1a05

Conversation

@myieye
Copy link
Copy Markdown
Collaborator

@myieye myieye commented Mar 24, 2026

Summary

This PR adds homograph numbers to MiniLcm.

Homograph numbers are now respected when sorting.

Regarding homograph number assignment
The FwData implementation currently relies on LibLCM to calculate and assign homograph numbers.
The CRDT implementation has a rudimentary implementation that only handles newly created entries as well as promoting potential pre-existing duplicates. However, an added sync test demonstrates that "broken" homograph numbers in CRDT-land are "repaired" after two syncs which round-trips them through LibLcm's auto-assignment.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

This PR implements homograph number support to track and distinguish multiple entries sharing the same headword. It adds the HomographNumber property to Entry models, implements auto-assignment logic in CRDT, updates database schema, modifies sorting/search ordering, and includes comprehensive tests across both CRDT and FwData layers.

Changes

Cohort / File(s) Summary
API Property Mapping
backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs, backend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateEntryProxy.cs, backend/FwLite/MiniLcm/Models/Entry.cs, frontend/viewer/src/lib/dotnet-types/generated-types/MiniLcm/Models/IEntry.ts
Added HomographNumber property to Entry models and proxies; mapped from LibLCM entries and exposed in frontend types; conditionally assigns when importing from FwData.
Homograph Auto-Assignment Logic
backend/FwLite/LcmCrdt/CrdtMiniLcmApi.cs
Added AssignHomographNumber helper to auto-assign homograph numbers during entry creation; derives secondary ordering from morph type, queries existing entries, patches first zero entry to 1, assigns incrementing numbers to new entries.
Sorting & Full-Text Search
backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs, backend/FwLite/LcmCrdt/Data/Sorting.cs, backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
Enabled HomographNumber as deterministic tie-breaker in headword ordering and best-match ranking across all three sorting implementations; re-enabled previously commented-out homograph tie-breaks.
Change Type & Sync
backend/FwLite/LcmCrdt/Changes/CreateEntryChange.cs, backend/FwLite/MiniLcm/SyncHelpers/EntrySync.cs
Added HomographNumber property to CreateEntryChange for persistence; added JSON patch operation generation when homograph numbers differ during sync.
Database Schema & Migrations
backend/FwLite/LcmCrdt/Migrations/20260409130907_AddHomographNumbers.*, backend/FwLite/LcmCrdt/Migrations/LcmCrdtDbContextModelSnapshot.cs
Added EF Core migration to create HomographNumber INTEGER column with default 0 on Entry table; updated model snapshot with new property configuration.
Test Data Verification
backend/FwLite/LcmCrdt.Tests/Changes/ChangeDeserializationRegressionData.*, backend/FwLite/LcmCrdt.Tests/Data/MigrationTests_FromScriptedDb.*.verified.*, backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationRegressionData.*, backend/FwLite/LcmCrdt.Tests/Data/VerifyRegeneratedSnapshotsAfterMigrationFromScriptedDb.*, backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.VerifyDbModel.verified.txt
Updated all verified snapshot and test data files to include HomographNumber field (typically 0 for existing entries); added new entry fixture with homograph number 4.
Unit & Integration Tests
backend/FwLite/MiniLcm.Tests/CreateEntryTestsBase.cs, backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs, backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs
Added five new homograph auto-assignment tests validating incrementation, explicit values, morph-type grouping, citation-form grouping, and multi-entry scenarios; added integration test verifying homograph correction through two-sync cycles; updated sorting test fixtures with explicit homograph variants.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

📦 Lexbox

Suggested reviewers

  • rmunn

Poem

🐰 A homograph's number now takes its place,
Distinguishing entries with matching face,
Through sorting and sync they journey along,
Each one assigned where it belongs!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.21% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately describes the main changes: implementing homograph numbers and improving entry search/sort behavior, which align with the core objectives.
Description check ✅ Passed The PR description clearly explains the addition of homograph numbers to MiniLcm, how they are respected in sorting, and implementation details for both FwData and CRDT.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/add-homograph-numbers-V1a05

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 24, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 4246a62 to 2e82fa6 Compare March 27, 2026 16:00
Base automatically changed from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types March 28, 2026 09:14
@myieye myieye force-pushed the feat/sync-morph-types branch from e9f7255 to 6af9827 Compare April 7, 2026 15:30
rmunn and others added 2 commits April 9, 2026 11:53
Implement CRUD for MorphTypeData, add DB migration

Make LeadingToken and TrailingToken nullable

Register JsonPatchChange for morphTypeData

Reject MorphTypeData updates that change MorphType

Do not allow ANY JsonPatch changes to MorphType

Rename MorphType enum to MorphTypeKind

Rename MorphTypeData to MorphType

LeadingToken -> Prefix, TrailingToken -> Postfix
Matches the names from liblcm, which may help reduce confusion.
* Add filtering on token aware headwords
* Respect SecondaryOrder when sorting
* Make FTS Headword column contain morph-tokens and all vernacular WS's
@myieye myieye force-pushed the feat/sync-morph-types branch 2 times, most recently from e3ec8c7 to 87c5dad Compare April 9, 2026 12:24
@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 8d2c6c6 to b95da37 Compare April 9, 2026 12:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

UI unit Tests

  1 files  ±0   59 suites  ±0   37s ⏱️ +5s
176 tests ±0  176 ✅ ±0  0 💤 ±0  0 ❌ ±0 
245 runs  ±0  245 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 08c9430. ± Comparison against base commit 4028077.

♻️ This comment has been updated with latest results.

@argos-ci
Copy link
Copy Markdown

argos-ci Bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ⚠️ Changes detected (Review) 6 changed May 5, 2026, 11:43 AM

* Initial work on morph types in UI

Morph types now show leading/trailing tokens in headword, but do not yet
have a dropdown for editing them in the entry UI.

* Citation forms should not be decorated

Lexeme forms should be decorated with prefix/postfix tokens according to
the morph type, but citation forms are meant as "overrides" and should
be reproduced exactly as-is, without morph type tokens. This is the rule
used by FLEx for how it displays words, so FW Lite should do the same.

As a bonus, there is now only one `headword` function in the writing
system service, instead of two functions with the same name that did two
slightly different things.

* Fix tests

---------

Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
@myieye myieye force-pushed the feat/sync-morph-types branch from 87c5dad to 8ed192f Compare April 9, 2026 15:24
@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from aaf00bb to 0db4516 Compare April 9, 2026 15:42
@myieye myieye marked this pull request as ready for review April 9, 2026 15:46
@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented Apr 9, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

✅ Actions performed

Full review triggered.

@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 4be17e9 to 77a861f Compare April 10, 2026 10:34
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
@myieye myieye force-pushed the feat/sync-morph-types branch from 8ed192f to 374b1b2 Compare April 10, 2026 11:32
@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch 4 times, most recently from 8f0178b to 6b63086 Compare May 4, 2026 14:31
Add HomographNumber (int, 0 = unset) to the Entry model with full
round-trip support through CRDT, FwData bridge, and sync.

Key changes:
- Entry model: add HomographNumber property with Copy() support
- CreateEntryChange: persist HomographNumber in CRDT changes
- CrdtMiniLcmApi: auto-assign homograph numbers on entry creation
  when HomographNumber is 0, respecting SecondaryOrder scoping.
  Updates existing lone entries from 0→1 when a second homograph appears.
- FwDataMiniLcmApi: read HomographNumber from ILexEntry, set on create
- UpdateEntryProxy: bidirectional HomographNumber sync to LibLCM
- EntrySync: include HomographNumber in diff/patch operations
- Sorting: uncomment HomographNumber in CRDT sort and search queries
- Tests: uncomment sorting tests with HomographNumber, add auto-
  assignment tests, add sync test verifying LibLCM corrects numbers
  after entry deletion via two sync cycles

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 6b63086 to 654a29b Compare May 4, 2026 15:24
@myieye myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 441aad1 to 08c9430 Compare May 5, 2026 11:40
?? stemOrder.FirstOrDefault();

var matchingEntries = await (
from e in repo.Entries
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this exclude the deleted entries?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We mostly just read from Harmony's "projected tables" i.e. the clean relational ef-core database that Harmony projects its latest un-deleted snapshots to.

// For now, we ALWAYS defer to LibLCM's auto-handling (triggered by setting LexemeForm/CitationForm)
// This ensures that FwData/LibLCM will correct broken homograph numbering caused by the incomplete CRDT implementation.
// if (entry.HomographNumber != 0)
// lexEntry.HomographNumber = entry.HomographNumber;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now thinking that instead of commenting this out, we should set the homograph number and then explicitly ask LibLCM to cleanup the homograph numbers.
I assume that's possible. Then if it's valid it will work as requested and if it's not it'll get fixed.

@rmunn rmunn force-pushed the feat/sync-morph-types branch from 4028077 to 9683cf5 Compare May 8, 2026 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants