Skip to content

Update entry search records if morph type tokens change#2246

Draft
rmunn wants to merge 14 commits into
developfrom
feat/update-search-entry-on-morph-type-change
Draft

Update entry search records if morph type tokens change#2246
rmunn wants to merge 14 commits into
developfrom
feat/update-search-entry-on-morph-type-change

Conversation

@rmunn
Copy link
Copy Markdown
Contributor

@rmunn rmunn commented Apr 22, 2026

Fixes #2213.

I chose to hook into the existing interceptor rather than creating a new one. That way if entries and morph types change in the same "transaction" (or the EF equivalent of a transaction), we can avoid duplicating the search record updates for entries that changed (and we can use the changed entries, rather than their original values, when calculating the updated headwords with the new morph types — if we used a separate interceptor then we wouldn't be able to guarantee that these two steps ran in the desired order).

rmunn and others added 7 commits April 9, 2026 11:53
Implement CRUD for MorphTypeData, add DB migration

Make LeadingToken and TrailingToken nullable

Register JsonPatchChange for morphTypeData

Reject MorphTypeData updates that change MorphType

Do not allow ANY JsonPatch changes to MorphType

Rename MorphType enum to MorphTypeKind

Rename MorphTypeData to MorphType

LeadingToken -> Prefix, TrailingToken -> Postfix
Matches the names from liblcm, which may help reduce confusion.
* Add filtering on token aware headwords
* Respect SecondaryOrder when sorting
* Make FTS Headword column contain morph-tokens and all vernacular WS's
* Initial work on morph types in UI

Morph types now show leading/trailing tokens in headword, but do not yet
have a dropdown for editing them in the entry UI.

* Citation forms should not be decorated

Lexeme forms should be decorated with prefix/postfix tokens according to
the morph type, but citation forms are meant as "overrides" and should
be reproduced exactly as-is, without morph type tokens. This is the rule
used by FLEx for how it displays words, so FW Lite should do the same.

As a bonus, there is now only one `headword` function in the writing
system service, instead of two functions with the same name that did two
slightly different things.

* Fix tests

---------

Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
@rmunn rmunn self-assigned this Apr 22, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

Refactors regeneration by extracting logic into a new static RegenerateEntrySearchTable(LcmCrdtDbContext) that truncates and bulk-copies using dbContext.Set<MorphType>(). The EF save interceptor now detects MorphType changes (prefix/postfix) and triggers either incremental updates or full regeneration accordingly.

Changes

Cohort / File(s) Summary
Entry Search Service Refactor
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
Replaced instance RegenerateEntrySearchTable() async implementation with a delegating wrapper; added public static async Task RegenerateEntrySearchTable(LcmCrdtDbContext) that begins a transaction, truncates local EntrySearchRecord table, computes morphTypeDataLookup from dbContext.Set<MorphType>(), performs bulk copy, and commits.
Save-change Interceptor Update
backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs
Pattern-matches DbContext? to LcmCrdtDbContext; uses typed ChangeTracker.Entries<WritingSystem>(); detects MorphType modifications of Prefix/Postfix; if such morph-type changes exist, calls full regeneration (RegenerateEntrySearchTable(dbContext)), otherwise performs incremental update via EntrySearchService.UpdateEntrySearchTable(...).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • myieye
  • imnasnainaec

Poem

🐰 I nibbled through rows and hopped on keys,
Found morph-type tokens in the tall green weaves,
Static paws dug a fresh, clean table,
Interceptor ears heard changes—abled,
Headwords bloom where new tails breeze. 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Update entry search records if morph type tokens change' clearly summarizes the main change, which is updating search records when morph-type token properties change.
Description check ✅ Passed The description is directly related to the changeset, explaining why the existing interceptor was used and the transaction handling benefits.
Linked Issues check ✅ Passed The changes fully address issue #2213 requirements: morph-type token changes trigger headword recalculation via full table regeneration, and entry morph-type assignment changes are handled through interceptor detection.
Out of Scope Changes check ✅ Passed All changes are scoped to updating the entry search interceptor and supporting static regeneration method to handle morph-type changes, with no extraneous modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/update-search-entry-on-morph-type-change

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Apr 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

UI unit Tests

  1 files  ±0   59 suites  ±0   24s ⏱️ -8s
176 tests ±0  176 ✅ ±0  0 💤 ±0  0 ❌ ±0 
245 runs  ±0  245 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 9ac3909. ± Comparison against base commit 4028077.

♻️ This comment has been updated with latest results.

@argos-ci
Copy link
Copy Markdown

argos-ci Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ⚠️ Changes detected (Review) 6 changed Apr 29, 2026, 5:35 AM

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 76-78: The query that appends entries into toUpdate misses loading
Entry.Senses, so later ToEntrySearchRecord will rebuild Gloss/Definition from a
null/empty Senses and erase FTS content; modify the additional fetch (the call
using dbContext.Set<Entry>() with changedMorphTypeKinds, updatePending and
removePending) to eagerly load the senses (e.g., Include(e => e.Senses) and any
nested properties used when building gloss/definition) so that
ToEntrySearchRecord has the full Entry.Senses available when called.
- Around line 56-73: The MorphType lookup should be done with AsNoTracking() to
avoid comparing the same tracked instance (fix the creation of
morphTypeDataLookup by calling
dbContext.Set<MorphType>().AsNoTracking().ToDictionaryAsync(m => m.Kind)), and
when building the Entry query include Senses (add .Include(e => e.Senses) to the
query that projects entries before calling ToEntrySearchRecord) so that
Definition() and Gloss() have Senses available; update references to
changedMorphTypes, morphTypeDataLookup, changedMorphTypeKinds, and the Entry
query used before ToEntrySearchRecord accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f36ebc2e-470d-4fbe-ad4f-48335645dc4f

📥 Commits

Reviewing files that changed from the base of the PR and between 4028077 and c6630f1.

📒 Files selected for processing (2)
  • backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
  • backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs

Comment thread backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs Outdated
Comment thread backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs Outdated
We want to look up the *old* values from the DB context before comparing
them to the changed values, because we specifically want to find out
which morph types have changed prefix/postfix tokens in order to do as
little work as required when we update entry search records.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs (2)

78-82: Prefer async materialization for the affected-entries query.

AddRange will enumerate the IQueryable synchronously here, executing a blocking DB query on the save path despite UpdateSearchTableOnSave being async. Materialize with ToListAsync to stay consistent with the rest of the interceptor (which uses FirstAsync/ToDictionaryAsync).

♻️ Proposed refactor
-                toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where(
-                    e => changedMorphTypeKinds.Contains(e.MorphType)
-                    && !updatePending.Contains(e.Id)
-                    && !removePending.Contains(e.Id)
-                ));
+                toUpdate.AddRange(await dbContext.Set<Entry>()
+                    .Include(e => e.Senses)
+                    .Where(e => changedMorphTypeKinds.Contains(e.MorphType)
+                        && !updatePending.Contains(e.Id)
+                        && !removePending.Contains(e.Id))
+                    .ToListAsync());

As per coding guidelines: "Use async/await for asynchronous operations in C# code; avoid using .Result or .Wait()".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`
around lines 78 - 82, The synchronous enumeration occurs when calling
toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where(...)),
which blocks the async interceptor UpdateSearchTableOnSave; fix it by
materializing the query asynchronously: replace the synchronous AddRange of the
IQueryable with an awaited ToListAsync call on the query (preserving the same
predicates: changedMorphTypeKinds.Contains(e.MorphType) &&
!updatePending.Contains(e.Id) && !removePending.Contains(e.Id)), then AddRange
the resulting List<Entry>; ensure proper using of cancellation token if
available and that the method remains async.

56-63: Optional: scope the MorphType change detection to Prefix/Postfix modifications.

changedMorphTypes currently includes every Modified MorphType entity regardless of which property changed, which triggers an extra MorphType table round-trip (line 63) even when nothing relevant to headwords was touched. You can filter at change-tracker level to skip the lookup entirely in that case:

♻️ Proposed refactor
-        var changedMorphTypes = dbContext.ChangeTracker.Entries()
-            .Where(e => e.Entity is MorphType && e.State == EntityState.Modified)
-            .Select(e => (MorphType)e.Entity).ToList();
+        var changedMorphTypes = dbContext.ChangeTracker.Entries<MorphType>()
+            .Where(e => e.State == EntityState.Modified
+                && (e.Property(m => m.Prefix).IsModified || e.Property(m => m.Postfix).IsModified))
+            .Select(e => e.Entity).ToList();

The subsequent original-vs-current comparison at line 68 still correctly guards against no-op token updates, so this is purely an efficiency tweak.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`
around lines 56 - 63, The change detection currently collects any Modified
MorphType, causing unnecessary round-trips; narrow the dbContext.ChangeTracker
filter in UpdateEntrySearchTableInterceptor to only include MorphType entries
where the Prefix or Postfix property was modified (e.g., replace the .Where(e =>
e.Entity is MorphType && e.State == EntityState.Modified) with a check on
e.Properties.Any(p => p.Metadata.Name == "Prefix" || p.Metadata.Name ==
"Postfix" && p.IsModified) so changedMorphTypes only contains MorphTypes with
relevant Prefix/Postfix edits, leaving the later original-vs-current comparison
and the morphTypeDataLookup fetch unchanged except that it will now be skipped
when no relevant morph types were modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 78-82: The synchronous enumeration occurs when calling
toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where(...)),
which blocks the async interceptor UpdateSearchTableOnSave; fix it by
materializing the query asynchronously: replace the synchronous AddRange of the
IQueryable with an awaited ToListAsync call on the query (preserving the same
predicates: changedMorphTypeKinds.Contains(e.MorphType) &&
!updatePending.Contains(e.Id) && !removePending.Contains(e.Id)), then AddRange
the resulting List<Entry>; ensure proper using of cancellation token if
available and that the method remains async.
- Around line 56-63: The change detection currently collects any Modified
MorphType, causing unnecessary round-trips; narrow the dbContext.ChangeTracker
filter in UpdateEntrySearchTableInterceptor to only include MorphType entries
where the Prefix or Postfix property was modified (e.g., replace the .Where(e =>
e.Entity is MorphType && e.State == EntityState.Modified) with a check on
e.Properties.Any(p => p.Metadata.Name == "Prefix" || p.Metadata.Name ==
"Postfix" && p.IsModified) so changedMorphTypes only contains MorphTypes with
relevant Prefix/Postfix edits, leaving the later original-vs-current comparison
and the morphTypeDataLookup fetch unchanged except that it will now be skipped
when no relevant morph types were modified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 72bad84f-1088-47fa-b314-cbde88861ce8

📥 Commits

Reviewing files that changed from the base of the PR and between c6630f1 and 0f9f5c1.

📒 Files selected for processing (1)
  • backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs

@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 24, 2026

@myieye - Please take a look at #2246 (review) and in particular the "Nitpick comment" about lines 78-82. I think that's more than a nitpick. My intention was to avoid pulling in all the entry and sense data from the database, and I hoped that by using .AddRange() on an IQueryable it would remain lazy (not loading all the data into the C# process) and put the work off until the FTS update, so all the work could be done SQL-side. But CodeRabbit is telling me that .AddRange() is synchronous, so it will block the async work at that point and cause a slowdown.

Could you let me know your opinion? I don't want to merge this until I've had a chance to talk it over with you.

@myieye
Copy link
Copy Markdown
Collaborator

myieye commented Apr 27, 2026

@rmunn I did my best (with the help of AI) to do a rough benchmark of how expensive it is to update a specific set of entries (i.e. using the interceptor) vs. simply regenerating the entire table.

Here's what I got:

modified entries interceptor save per-entry vs regen-50K
500 15.8 s 31.5 ms ≈ regen
2,000 57.5 s 28.8 ms 3.6× slower
5,000 142.2 s 28.4 ms 8.9× slower
10,000 271.1 s 27.1 ms 17× slower
25,000 709.4 s 28.4 ms 44× slower

I'm not sure how readable that is, but the e.g. "3.6× slower" means that the interceptor was that much slower than regenerating an entire 50k row table of entries.

Assuming this is even approximately correct, I think we'd likely be better off if we simply regenerated the table if a morph-type changes instead of refreshing a select set of entries.

We should talk about it tomorrow.

@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 28, 2026

@myieye wrote:

I'm not sure how readable that is, but the e.g. "3.6× slower" means that the interceptor was that much slower than regenerating an entire 50k row table of entries.

Assuming this is even approximately correct, I think we'd likely be better off if we simply regenerated the table if a morph-type changes instead of refreshing a select set of entries.

Wow. Did not expect the attempted optimization to be that much slower! There's probably not all that much to talk about; I'll edit the interceptor to drop all attempts to be clever with restricting the list of modified entries, and just say "If any MorphType prefix or postfix tokens changed, then drop and recreate the whole table". And I'll go ahead and do that before the meeting, because there's no reason to wait until the meeting. I already agree that this approach is not nearly as feasible as it looked.

Benchmarking the attempt at optimizing (just replace the search table
for entries that used the changed morph types) revealed that it was far
SLOWER to try to optimize than to drop and recreate the entire search
table. So we will just drop and recreate the entire table if any morph
types have had their prefix or postfix tokens change. (We will not
trigger this if only descriptions changed).
@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 29, 2026

Commit 1d7f18c changes the RegenerateEntrySearchTable method into a static method, which wants to be passed a DbContext. I'm mostly happy with it, but not 100% happy. One reason is because the interceptor class doesn't have an LcmCrdtDbContext, just a DbContext. (Which might well be an LcmCrdtDbContext, but it's not necessarily safe to assume that). Which means it doesn't have access to the WritingSystemOrdered property, so I had to duplicate the WritingSystemOrdered sort logic. Which is fine... until we decide later that we want to change that sort order, and forget that it's been duplicated.

@myieye , I would welcome your thoughts on how to solve this. Here are my ideas.

  1. The static RegenerateEntrySearchTable method should take an LcmCrdtDbContext, so that it has access to WritingSystemOrdered and we don't repeat ourselves. Then the interceptor code would check if (dbContext is LcmCrdtDbContext db) { RegenerateEntrySearchTable(db); } in order to type-check. (It would never be set up in DbContexts other than LcmCrdtDbContext, so that check should always succeed, but in theory we might have set it up somewhere else).
  2. The static RegenerateEntrySearchTable method should ask to be passed in a writing systems array, already in order. And possibly also a morph table lookup.
  3. ... Possibly other ideas, but while typing this I basically rubber-ducked myself into deciding to go with option 1.

rmunn added 5 commits April 29, 2026 12:25
This allows us to safely use the .WritingSystemsOrdered property.
Since the UpdateEntrySearchTable method needs an LcmCrdtDbContext
anyway, and is casting to that type (unsafely, which will fail with a
runtime error if that cast is incorrect), we might as well check at the
start of the method whether we're in the right DB context, and exit fast
if we aren't.
The pattern-matching check that safely casts dbContext to be an
LcmCrdtDbContext will return false if it's null, so there's no need for
an explicit null check at the start of the method any more.
Since we're already using the typed ChangeTracker.Entries method for
morph types lower down, we might as well use it for writing systems too.
@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 29, 2026

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 29, 2026

@CodeRabbit fullreview

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@rmunn: Sure! I'll kick off a full review of the PR now.


🧠 Learnings used
Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:33:24.170Z
Learning: In backend/FwLite/LcmCrdt, the MorphTypeData table has a unique index on the MorphType column (corresponding to the MorphType C# enum). The intended conflict resolution is "first one wins" using SQLite's ON CONFLICT IGNORE, because: (1) CRDT's NewEntity() cannot return null, so duplicate CreateMorphTypeDataChange objects (from concurrent offline users) result in a second row with DeletedAt set, potentially violating the unique constraint; (2) In practice, users cannot create new morph types in FieldWorks Classic or Lite. SQLite's ON CONFLICT IGNORE clause can only be applied to constraints defined inline in CREATE TABLE (not to separately-created indexes via CREATE UNIQUE INDEX), so implementing this requires either rebuilding the table with an inline UNIQUE ... ON CONFLICT IGNORE constraint or using an EF Core SaveChanges interceptor with INSERT OR IGNORE.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/Changes/CreateMorphTypeDataChange.cs:21-32
Timestamp: 2026-03-11T02:23:28.891Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, the MorphTypeData table's UNIQUE ON CONFLICT IGNORE constraint on the MorphType column is implemented in migration 20260306045826_AddMorphTypeDataTable.cs using migrationBuilder.Sql() with a hand-written CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate migrationBuilder.CreateIndex() for IX_MorphTypeData_MorphType is also emitted so EF Core's model snapshot stays consistent with HasIndex(...).IsUnique() configuration. A separate migration 20260310065457_UniqueIndexOnMorphType (which used CREATE UNIQUE INDEX without ON CONFLICT support) was deleted as part of this consolidation.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/CrdtMiniLcmApi.cs:352-365
Timestamp: 2026-03-12T06:58:58.411Z
Learning: In sillsdev/languageforge-lexbox backend/FwLite/LcmCrdt, the CrdtRepository class handles tombstone (soft-delete) filtering centrally in its ProjectSnapshot method, which removes deleted entries from the EF Core DB context before SaveChanges() is called. As a result, MiniLcmApi classes (CrdtMiniLcmApi, FwDataMiniLcmApi, etc.) do NOT need to add `DeletedAt == null` filters when querying the repository — deleted objects are never present in the live DB view. Do not suggest adding such filters in code reviews of these API classes.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:50:35.872Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, when a SQLite table needs `UNIQUE ON CONFLICT IGNORE` on a column (e.g., MorphTypeData.MorphType), the EF Core migration must use `migrationBuilder.Sql()` for the CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate named `CREATE UNIQUE INDEX` should also be emitted via `migrationBuilder.Sql()` so EF Core's model snapshot stays consistent with `HasIndex(...).IsUnique()` configuration. SQLite allows both the inline constraint and the named index on the same column; the inline ON CONFLICT IGNORE fires first on duplicate inserts.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:34:03.262Z
Learning: In the LcmCrdt SQLite database for sillsdev/languageforge-lexbox, `ON CONFLICT IGNORE` for a unique constraint on MorphTypeData.MorphType cannot be implemented via a standalone `CREATE UNIQUE INDEX` statement (SQLite doesn't support ON CONFLICT clauses on CREATE INDEX). It must be a table-level constraint in the CREATE TABLE DDL, requiring a table-rebuild migration. The practical risk is low because morph type creation is restricted in both FieldWorks Classic and FW Lite, but the theoretical CRDT offline-sync race condition exists.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/MiniLcm/Models/Entry.cs : Entry is a core model that affects all downstream systems. When adding fields to Entry, ensure proper implementation in Copy(), GetReferences(), RemoveReference(), CRDT mirroring, change handling, sync logic, FwData mapping, and comprehensive test coverage across all three test projects
✅ Actions performed

Full review triggered.

@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 29, 2026

@CodeRabbit fullreview

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@rmunn: Sure! I'll kick off a full review of the PR now.


🧠 Learnings used
Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:33:24.170Z
Learning: In backend/FwLite/LcmCrdt, the MorphTypeData table has a unique index on the MorphType column (corresponding to the MorphType C# enum). The intended conflict resolution is "first one wins" using SQLite's ON CONFLICT IGNORE, because: (1) CRDT's NewEntity() cannot return null, so duplicate CreateMorphTypeDataChange objects (from concurrent offline users) result in a second row with DeletedAt set, potentially violating the unique constraint; (2) In practice, users cannot create new morph types in FieldWorks Classic or Lite. SQLite's ON CONFLICT IGNORE clause can only be applied to constraints defined inline in CREATE TABLE (not to separately-created indexes via CREATE UNIQUE INDEX), so implementing this requires either rebuilding the table with an inline UNIQUE ... ON CONFLICT IGNORE constraint or using an EF Core SaveChanges interceptor with INSERT OR IGNORE.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/Changes/CreateMorphTypeDataChange.cs:21-32
Timestamp: 2026-03-11T02:23:28.891Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, the MorphTypeData table's UNIQUE ON CONFLICT IGNORE constraint on the MorphType column is implemented in migration 20260306045826_AddMorphTypeDataTable.cs using migrationBuilder.Sql() with a hand-written CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate migrationBuilder.CreateIndex() for IX_MorphTypeData_MorphType is also emitted so EF Core's model snapshot stays consistent with HasIndex(...).IsUnique() configuration. A separate migration 20260310065457_UniqueIndexOnMorphType (which used CREATE UNIQUE INDEX without ON CONFLICT support) was deleted as part of this consolidation.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:50:35.872Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, when a SQLite table needs `UNIQUE ON CONFLICT IGNORE` on a column (e.g., MorphTypeData.MorphType), the EF Core migration must use `migrationBuilder.Sql()` for the CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate named `CREATE UNIQUE INDEX` should also be emitted via `migrationBuilder.Sql()` so EF Core's model snapshot stays consistent with `HasIndex(...).IsUnique()` configuration. SQLite allows both the inline constraint and the named index on the same column; the inline ON CONFLICT IGNORE fires first on duplicate inserts.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:34:03.262Z
Learning: In the LcmCrdt SQLite database for sillsdev/languageforge-lexbox, `ON CONFLICT IGNORE` for a unique constraint on MorphTypeData.MorphType cannot be implemented via a standalone `CREATE UNIQUE INDEX` statement (SQLite doesn't support ON CONFLICT clauses on CREATE INDEX). It must be a table-level constraint in the CREATE TABLE DDL, requiring a table-rebuild migration. The practical risk is low because morph type creation is restricted in both FieldWorks Classic and FW Lite, but the theoretical CRDT offline-sync race condition exists.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/CrdtMiniLcmApi.cs:352-365
Timestamp: 2026-03-12T06:58:58.411Z
Learning: In sillsdev/languageforge-lexbox backend/FwLite/LcmCrdt, the CrdtRepository class handles tombstone (soft-delete) filtering centrally in its ProjectSnapshot method, which removes deleted entries from the EF Core DB context before SaveChanges() is called. As a result, MiniLcmApi classes (CrdtMiniLcmApi, FwDataMiniLcmApi, etc.) do NOT need to add `DeletedAt == null` filters when querying the repository — deleted objects are never present in the live DB view. Do not suggest adding such filters in code reviews of these API classes.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/MiniLcm/Models/Entry.cs : Entry is a core model that affects all downstream systems. When adding fields to Entry, ensure proper implementation in Copy(), GetReferences(), RemoveReference(), CRDT mirroring, change handling, sync logic, FwData mapping, and comprehensive test coverage across all three test projects

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs:125-127
Timestamp: 2026-03-11T04:58:21.676Z
Learning: In sillsdev/languageforge-lexbox backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs SyncInternal, the two-pass sync pattern is intentional and lossless: pass 1 (e.g., line 126 for MorphTypeData) updates the live CRDT from FW data using the project snapshot as "before"; pass 2 (e.g., line 127) reads the *already-updated* live CRDT and pushes it to FW. This means a null value in the CRDT snapshot cannot destructively wipe an FW value — it will be reconciled to the FW value in pass 1 before pass 2 ever runs. A null reaching FW in pass 2 only happens when the user intentionally deleted it in CRDT, which is correct bidirectional sync behavior.

Learnt from: myieye
Repo: sillsdev/languageforge-lexbox PR: 2215
File: backend/FwLite/LcmCrdt/Changes/CreateCustomViewChange.cs:13-19
Timestamp: 2026-03-27T09:11:53.218Z
Learning: In the sillsdev/languageforge-lexbox LcmCrdt change classes (backend/FwLite/LcmCrdt/Changes/*.cs), it is the established pattern to assign object/array references directly in constructors without defensive copying — e.g., CreateEntryChange assigns MultiString references, CreateSenseChange assigns SemanticDomains list references, CreateExampleSentenceChange assigns Translations list references. Do not flag reference-sharing in these change classes as a bug; it is intentional and consistent across the codebase.

Learnt from: myieye
Repo: sillsdev/languageforge-lexbox PR: 2215
File: backend/FwLite/LcmCrdt/Changes/CreateCustomViewChange.cs:35-47
Timestamp: 2026-03-27T15:17:33.729Z
Learning: In sillsdev/languageforge-lexbox (backend/FwLite/LcmCrdt/Changes), the "Always check if referenced entities have been deleted before applying changes" guideline applies only to CRDT entity GUID references (i.e., IObjectWithId instances tracked by Harmony, returned via GetReferences()). For CustomView, ViewField.FieldId and ViewWritingSystem.WsId are plain string identifiers — NOT CRDT entity references. CustomView.GetReferences() returns an empty Guid[] and RemoveReference is a no-op. Do not flag string field/writing-system IDs in CustomView changes as missing entity-deletion checks.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/MiniLcm/SyncHelpers/*.cs : When adding new fields to MiniLcm models, add diff logic for the new field in the appropriate sync helper file (e.g., EntrySync.cs for Entry fields). Sync bugs here cause data loss.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 54-67: The current SavingChangesAsync branch calls
EntrySearchService.RegenerateEntrySearchTable(dbContext) when changedMorphTypes
exist, which runs before EF has persisted Added/Deleted Entry changes; move the
full rebuild logic out of SavingChangesAsync into SavedChangesAsync (implement
SavedChangesAsync to detect changed MorphType items and then call
EntrySearchService.RegenerateEntrySearchTable(dbContext) after commit) or
alternatively change the branch to call RegenerateEntrySearchTable using a
post-save view by merging the pending toUpdate, toRemove and newWritingSystems
into the regeneration inputs; update code paths that reference
SavingChangesAsync, changedMorphTypes, toUpdate, toRemove, newWritingSystems and
EntrySearchService.RegenerateEntrySearchTable accordingly so regeneration always
happens against the committed state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 08fa0476-e8f6-488f-acba-30b14ef0a133

📥 Commits

Reviewing files that changed from the base of the PR and between 4028077 and 9ac3909.

📒 Files selected for processing (2)
  • backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
  • backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs

Comment on lines +54 to +67
// Morph types with changes to prefix or postfix tokens will also require updated entry search records
// (Note that morph types can't be added or deleted, so we only need to catch changes, which will be rare)
var changedMorphTypes = dbContext.ChangeTracker.Entries<MorphType>()
.Where(e => e.State == EntityState.Modified && (e.Property(m => m.Prefix).IsModified || e.Property(m => m.Postfix).IsModified))
.Select(e => e.Entity).ToList();
if (toUpdate is [] && toRemove is [] && changedMorphTypes is []) return;
if (changedMorphTypes is [])
{
await EntrySearchService.UpdateEntrySearchTable(toUpdate, toRemove, newWritingSystems, dbContext);
}
else
{
await EntrySearchService.RegenerateEntrySearchTable(dbContext);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In EF Core, when ISaveChangesInterceptor.SavingChangesAsyncexecutes, haveAddedandDeletedchanges already been written to the database, and will a tracking query againstDbSet reflect the pre-save database snapshot or the eventual post-save state?

💡 Result:

No, Added and Deleted changes have NOT been written to the database when ISaveChangesInterceptor.SavingChangesAsync executes. This interceptor runs before EF Core generates and executes the database commands for inserts, updates, and deletes. A tracking query against DbSet (e.g. context.Set.Where(...)) during SavingChangesAsync will reflect the pre-save database snapshot. It will NOT see pending Added entities (since they don't exist in DB yet) and WILL see entities marked Deleted (since the DELETE has not executed). The query executes against the current database state, which excludes the pending changes from this SaveChanges call. Tracked entities may influence results via identity resolution, but new data from pending changes is not visible until after SavedChangesAsync (post-DB write and transaction commit).

Citations:


Defer the full rebuild until after the save succeeds.

At line 66, when morph types have changes to prefix or postfix, the code calls EntrySearchService.RegenerateEntrySearchTable(dbContext) from within SavingChangesAsync. At this point, EF has not yet written Added or Deleted changes to the database. A query against DbSet<Entry>() will reflect only the pre-save database snapshot—excluding any pending new entries and pending deletes that are part of this same SaveChanges call. If a single transaction changes morph tokens and also adds or removes entries, the search table will be rebuilt from incomplete data and remain stale after commit.

Move the full regeneration to SavedChangesAsync instead, or reconstruct it from the post-save state by merging the pending adds/removes/new writing systems into the regeneration input.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`
around lines 54 - 67, The current SavingChangesAsync branch calls
EntrySearchService.RegenerateEntrySearchTable(dbContext) when changedMorphTypes
exist, which runs before EF has persisted Added/Deleted Entry changes; move the
full rebuild logic out of SavingChangesAsync into SavedChangesAsync (implement
SavedChangesAsync to detect changed MorphType items and then call
EntrySearchService.RegenerateEntrySearchTable(dbContext) after commit) or
alternatively change the branch to call RegenerateEntrySearchTable using a
post-save view by merging the pending toUpdate, toRemove and newWritingSystems
into the regeneration inputs; update code paths that reference
SavingChangesAsync, changedMorphTypes, toUpdate, toRemove, newWritingSystems and
EntrySearchService.RegenerateEntrySearchTable accordingly so regeneration always
happens against the committed state.

@rmunn
Copy link
Copy Markdown
Contributor Author

rmunn commented Apr 30, 2026

@myieye - As we discussed, we will want to postpone the entry table rebuilding until after the DB updates are saved. I remember you're working on a PR for that, so I'm going to mark this one as draft to prevent it being merged until yours is ready. Then I'll rebase this, or merge it into yours, or something. (I won't close it though, because closed PRs are very easy to forget about and get lost).

@rmunn rmunn marked this pull request as draft April 30, 2026 03:52
@rmunn rmunn force-pushed the feat/sync-morph-types branch from 4028077 to 9683cf5 Compare May 8, 2026 08:37
@myieye myieye force-pushed the feat/sync-morph-types branch 5 times, most recently from 418b97c to 54eb419 Compare May 13, 2026 09:02
Base automatically changed from feat/sync-morph-types to develop May 13, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants