Update entry search records if morph type tokens change#2246
Conversation
Implement CRUD for MorphTypeData, add DB migration Make LeadingToken and TrailingToken nullable Register JsonPatchChange for morphTypeData Reject MorphTypeData updates that change MorphType Do not allow ANY JsonPatch changes to MorphType Rename MorphType enum to MorphTypeKind Rename MorphTypeData to MorphType LeadingToken -> Prefix, TrailingToken -> Postfix Matches the names from liblcm, which may help reduce confusion.
* Add filtering on token aware headwords * Respect SecondaryOrder when sorting * Make FTS Headword column contain morph-tokens and all vernacular WS's
* Initial work on morph types in UI Morph types now show leading/trailing tokens in headword, but do not yet have a dropdown for editing them in the entry UI. * Citation forms should not be decorated Lexeme forms should be decorated with prefix/postfix tokens according to the morph type, but citation forms are meant as "overrides" and should be reproduced exactly as-is, without morph type tokens. This is the rule used by FLEx for how it displays words, so FW Lite should do the same. As a bonus, there is now only one `headword` function in the writing system service, instead of two functions with the same name that did two slightly different things. * Fix tests --------- Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
* Seed canonical morph-types into CRDT projects - Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM) - Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes - Seed morph-types for existing projects in MigrateDb (before FTS refresh) - Add EF migration to clear FTS table so headwords are rebuilt with morph tokens - Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates * Stop creating morph-types in tests. They're now prepopulated * Stop printing verify diff content. It's too much. * Seed morph types before API testing * Add descriptions to canonical morph types * Sync morph-types when importing, because they already exist in CRDT * Verify our canonical morph-types match new fwdata projects * Fix non-FTS relevance order with morph-tokens in query
📝 WalkthroughWalkthroughRefactors regeneration by extracting logic into a new static Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 76-78: The query that appends entries into toUpdate misses loading
Entry.Senses, so later ToEntrySearchRecord will rebuild Gloss/Definition from a
null/empty Senses and erase FTS content; modify the additional fetch (the call
using dbContext.Set<Entry>() with changedMorphTypeKinds, updatePending and
removePending) to eagerly load the senses (e.g., Include(e => e.Senses) and any
nested properties used when building gloss/definition) so that
ToEntrySearchRecord has the full Entry.Senses available when called.
- Around line 56-73: The MorphType lookup should be done with AsNoTracking() to
avoid comparing the same tracked instance (fix the creation of
morphTypeDataLookup by calling
dbContext.Set<MorphType>().AsNoTracking().ToDictionaryAsync(m => m.Kind)), and
when building the Entry query include Senses (add .Include(e => e.Senses) to the
query that projects entries before calling ToEntrySearchRecord) so that
Definition() and Gloss() have Senses available; update references to
changedMorphTypes, morphTypeDataLookup, changedMorphTypeKinds, and the Entry
query used before ToEntrySearchRecord accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f36ebc2e-470d-4fbe-ad4f-48335645dc4f
📒 Files selected for processing (2)
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.csbackend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs
We want to look up the *old* values from the DB context before comparing them to the changed values, because we specifically want to find out which morph types have changed prefix/postfix tokens in order to do as little work as required when we update entry search records.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs (2)
78-82: Prefer async materialization for the affected-entries query.
AddRangewill enumerate theIQueryablesynchronously here, executing a blocking DB query on the save path despiteUpdateSearchTableOnSavebeing async. Materialize withToListAsyncto stay consistent with the rest of the interceptor (which usesFirstAsync/ToDictionaryAsync).♻️ Proposed refactor
- toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where( - e => changedMorphTypeKinds.Contains(e.MorphType) - && !updatePending.Contains(e.Id) - && !removePending.Contains(e.Id) - )); + toUpdate.AddRange(await dbContext.Set<Entry>() + .Include(e => e.Senses) + .Where(e => changedMorphTypeKinds.Contains(e.MorphType) + && !updatePending.Contains(e.Id) + && !removePending.Contains(e.Id)) + .ToListAsync());As per coding guidelines: "Use async/await for asynchronous operations in C# code; avoid using .Result or .Wait()".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs` around lines 78 - 82, The synchronous enumeration occurs when calling toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where(...)), which blocks the async interceptor UpdateSearchTableOnSave; fix it by materializing the query asynchronously: replace the synchronous AddRange of the IQueryable with an awaited ToListAsync call on the query (preserving the same predicates: changedMorphTypeKinds.Contains(e.MorphType) && !updatePending.Contains(e.Id) && !removePending.Contains(e.Id)), then AddRange the resulting List<Entry>; ensure proper using of cancellation token if available and that the method remains async.
56-63: Optional: scope the MorphType change detection to Prefix/Postfix modifications.
changedMorphTypescurrently includes everyModifiedMorphTypeentity regardless of which property changed, which triggers an extraMorphTypetable round-trip (line 63) even when nothing relevant to headwords was touched. You can filter at change-tracker level to skip the lookup entirely in that case:♻️ Proposed refactor
- var changedMorphTypes = dbContext.ChangeTracker.Entries() - .Where(e => e.Entity is MorphType && e.State == EntityState.Modified) - .Select(e => (MorphType)e.Entity).ToList(); + var changedMorphTypes = dbContext.ChangeTracker.Entries<MorphType>() + .Where(e => e.State == EntityState.Modified + && (e.Property(m => m.Prefix).IsModified || e.Property(m => m.Postfix).IsModified)) + .Select(e => e.Entity).ToList();The subsequent original-vs-current comparison at line 68 still correctly guards against no-op token updates, so this is purely an efficiency tweak.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs` around lines 56 - 63, The change detection currently collects any Modified MorphType, causing unnecessary round-trips; narrow the dbContext.ChangeTracker filter in UpdateEntrySearchTableInterceptor to only include MorphType entries where the Prefix or Postfix property was modified (e.g., replace the .Where(e => e.Entity is MorphType && e.State == EntityState.Modified) with a check on e.Properties.Any(p => p.Metadata.Name == "Prefix" || p.Metadata.Name == "Postfix" && p.IsModified) so changedMorphTypes only contains MorphTypes with relevant Prefix/Postfix edits, leaving the later original-vs-current comparison and the morphTypeDataLookup fetch unchanged except that it will now be skipped when no relevant morph types were modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 78-82: The synchronous enumeration occurs when calling
toUpdate.AddRange(dbContext.Set<Entry>().Include(e => e.Senses).Where(...)),
which blocks the async interceptor UpdateSearchTableOnSave; fix it by
materializing the query asynchronously: replace the synchronous AddRange of the
IQueryable with an awaited ToListAsync call on the query (preserving the same
predicates: changedMorphTypeKinds.Contains(e.MorphType) &&
!updatePending.Contains(e.Id) && !removePending.Contains(e.Id)), then AddRange
the resulting List<Entry>; ensure proper using of cancellation token if
available and that the method remains async.
- Around line 56-63: The change detection currently collects any Modified
MorphType, causing unnecessary round-trips; narrow the dbContext.ChangeTracker
filter in UpdateEntrySearchTableInterceptor to only include MorphType entries
where the Prefix or Postfix property was modified (e.g., replace the .Where(e =>
e.Entity is MorphType && e.State == EntityState.Modified) with a check on
e.Properties.Any(p => p.Metadata.Name == "Prefix" || p.Metadata.Name ==
"Postfix" && p.IsModified) so changedMorphTypes only contains MorphTypes with
relevant Prefix/Postfix edits, leaving the later original-vs-current comparison
and the morphTypeDataLookup fetch unchanged except that it will now be skipped
when no relevant morph types were modified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 72bad84f-1088-47fa-b314-cbde88861ce8
📒 Files selected for processing (1)
backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs
|
@myieye - Please take a look at #2246 (review) and in particular the "Nitpick comment" about lines 78-82. I think that's more than a nitpick. My intention was to avoid pulling in all the entry and sense data from the database, and I hoped that by using Could you let me know your opinion? I don't want to merge this until I've had a chance to talk it over with you. |
|
@rmunn I did my best (with the help of AI) to do a rough benchmark of how expensive it is to update a specific set of entries (i.e. using the interceptor) vs. simply regenerating the entire table. Here's what I got:
I'm not sure how readable that is, but the e.g. "3.6× slower" means that the interceptor was that much slower than regenerating an entire 50k row table of entries. Assuming this is even approximately correct, I think we'd likely be better off if we simply regenerated the table if a morph-type changes instead of refreshing a select set of entries. We should talk about it tomorrow. |
|
@myieye wrote:
Wow. Did not expect the attempted optimization to be that much slower! There's probably not all that much to talk about; I'll edit the interceptor to drop all attempts to be clever with restricting the list of modified entries, and just say "If any MorphType prefix or postfix tokens changed, then drop and recreate the whole table". And I'll go ahead and do that before the meeting, because there's no reason to wait until the meeting. I already agree that this approach is not nearly as feasible as it looked. |
Benchmarking the attempt at optimizing (just replace the search table for entries that used the changed morph types) revealed that it was far SLOWER to try to optimize than to drop and recreate the entire search table. So we will just drop and recreate the entire table if any morph types have had their prefix or postfix tokens change. (We will not trigger this if only descriptions changed).
|
Commit 1d7f18c changes the @myieye , I would welcome your thoughts on how to solve this. Here are my ideas.
|
This allows us to safely use the .WritingSystemsOrdered property.
Since the UpdateEntrySearchTable method needs an LcmCrdtDbContext anyway, and is casting to that type (unsafely, which will fail with a runtime error if that cast is incorrect), we might as well check at the start of the method whether we're in the right DB context, and exit fast if we aren't.
The pattern-matching check that safely casts dbContext to be an LcmCrdtDbContext will return false if it's null, so there's no need for an explicit null check at the start of the method any more.
Since we're already using the typed ChangeTracker.Entries method for morph types lower down, we might as well use it for writing systems too.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
|
@CodeRabbit fullreview |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
|
@CodeRabbit fullreview |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`:
- Around line 54-67: The current SavingChangesAsync branch calls
EntrySearchService.RegenerateEntrySearchTable(dbContext) when changedMorphTypes
exist, which runs before EF has persisted Added/Deleted Entry changes; move the
full rebuild logic out of SavingChangesAsync into SavedChangesAsync (implement
SavedChangesAsync to detect changed MorphType items and then call
EntrySearchService.RegenerateEntrySearchTable(dbContext) after commit) or
alternatively change the branch to call RegenerateEntrySearchTable using a
post-save view by merging the pending toUpdate, toRemove and newWritingSystems
into the regeneration inputs; update code paths that reference
SavingChangesAsync, changedMorphTypes, toUpdate, toRemove, newWritingSystems and
EntrySearchService.RegenerateEntrySearchTable accordingly so regeneration always
happens against the committed state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 08fa0476-e8f6-488f-acba-30b14ef0a133
📒 Files selected for processing (2)
backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.csbackend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs
| // Morph types with changes to prefix or postfix tokens will also require updated entry search records | ||
| // (Note that morph types can't be added or deleted, so we only need to catch changes, which will be rare) | ||
| var changedMorphTypes = dbContext.ChangeTracker.Entries<MorphType>() | ||
| .Where(e => e.State == EntityState.Modified && (e.Property(m => m.Prefix).IsModified || e.Property(m => m.Postfix).IsModified)) | ||
| .Select(e => e.Entity).ToList(); | ||
| if (toUpdate is [] && toRemove is [] && changedMorphTypes is []) return; | ||
| if (changedMorphTypes is []) | ||
| { | ||
| await EntrySearchService.UpdateEntrySearchTable(toUpdate, toRemove, newWritingSystems, dbContext); | ||
| } | ||
| else | ||
| { | ||
| await EntrySearchService.RegenerateEntrySearchTable(dbContext); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In EF Core, when ISaveChangesInterceptor.SavingChangesAsyncexecutes, haveAddedandDeletedchanges already been written to the database, and will a tracking query againstDbSet reflect the pre-save database snapshot or the eventual post-save state?
💡 Result:
No, Added and Deleted changes have NOT been written to the database when ISaveChangesInterceptor.SavingChangesAsync executes. This interceptor runs before EF Core generates and executes the database commands for inserts, updates, and deletes. A tracking query against DbSet (e.g. context.Set.Where(...)) during SavingChangesAsync will reflect the pre-save database snapshot. It will NOT see pending Added entities (since they don't exist in DB yet) and WILL see entities marked Deleted (since the DELETE has not executed). The query executes against the current database state, which excludes the pending changes from this SaveChanges call. Tracked entities may influence results via identity resolution, but new data from pending changes is not visible until after SavedChangesAsync (post-DB write and transaction commit).
Citations:
- 1: https://learn.microsoft.com/en-us/ef/core/logging-events-diagnostics/interceptors
- 2: https://www.learnentityframeworkcore.com/saving/save-changes
- 3: https://learn.microsoft.com/en-us/ef/core/change-tracking/
- 4: https://stackoverflow.com/questions/73545092/ef-core-query-not-reflecting-updated-property-without-saving
Defer the full rebuild until after the save succeeds.
At line 66, when morph types have changes to prefix or postfix, the code calls EntrySearchService.RegenerateEntrySearchTable(dbContext) from within SavingChangesAsync. At this point, EF has not yet written Added or Deleted changes to the database. A query against DbSet<Entry>() will reflect only the pre-save database snapshot—excluding any pending new entries and pending deletes that are part of this same SaveChanges call. If a single transaction changes morph tokens and also adds or removes entries, the search table will be rebuilt from incomplete data and remain stale after commit.
Move the full regeneration to SavedChangesAsync instead, or reconstruct it from the post-save state by merging the pending adds/removes/new writing systems into the regeneration input.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/FwLite/LcmCrdt/FullTextSearch/UpdateEntrySearchTableInterceptor.cs`
around lines 54 - 67, The current SavingChangesAsync branch calls
EntrySearchService.RegenerateEntrySearchTable(dbContext) when changedMorphTypes
exist, which runs before EF has persisted Added/Deleted Entry changes; move the
full rebuild logic out of SavingChangesAsync into SavedChangesAsync (implement
SavedChangesAsync to detect changed MorphType items and then call
EntrySearchService.RegenerateEntrySearchTable(dbContext) after commit) or
alternatively change the branch to call RegenerateEntrySearchTable using a
post-save view by merging the pending toUpdate, toRemove and newWritingSystems
into the regeneration inputs; update code paths that reference
SavingChangesAsync, changedMorphTypes, toUpdate, toRemove, newWritingSystems and
EntrySearchService.RegenerateEntrySearchTable accordingly so regeneration always
happens against the committed state.
|
@myieye - As we discussed, we will want to postpone the entry table rebuilding until after the DB updates are saved. I remember you're working on a PR for that, so I'm going to mark this one as draft to prevent it being merged until yours is ready. Then I'll rebase this, or merge it into yours, or something. (I won't close it though, because closed PRs are very easy to forget about and get lost). |
4028077 to
9683cf5
Compare
418b97c to
54eb419
Compare
Fixes #2213.
I chose to hook into the existing interceptor rather than creating a new one. That way if entries and morph types change in the same "transaction" (or the EF equivalent of a transaction), we can avoid duplicating the search record updates for entries that changed (and we can use the changed entries, rather than their original values, when calculating the updated headwords with the new morph types — if we used a separate interceptor then we wouldn't be able to guarantee that these two steps ran in the desired order).