-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Rust: Refactor MaD provenance-based filtering #21072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
868e36f to
8ffed9d
Compare
8ffed9d to
f23351a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the Models as Data (MaD) provenance-based filtering logic in the Rust dataflow analysis, with the key goal of prioritizing specific trait implementation models over generic trait models.
Key Changes:
- Introduced a new filtering mechanism (
summaryModelRelevant) that distinguishes between exact models (specific implementations) and inexact models (trait models), preventing unnecessary duplication of flow steps - Simplified the callable resolution logic in
DataFlowImpl.qllby removing complex filtering that is now handled inModelsAsData.qll - Added specific manual models for
String::Addoperations to override generic trait models and improve analysis precision
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll |
Core refactoring - introduces summaryModel and summaryModelRelevant predicates to filter models based on exactness (trait vs specific implementation) and provenance (manual vs generated) |
rust/ql/lib/codeql/rust/dataflow/internal/DataFlowImpl.qll |
Simplifies viableCallable by delegating model filtering to ModelsAsData.qll and improves toString() for summarized callables with [summarized] prefix |
rust/ql/lib/codeql/rust/frameworks/stdlib/alloc.model.yml |
Adds specific manual models for String as Add trait to override generated generic trait models |
rust/ql/test/library-tests/dataflow/models/models.ext.yml |
Adds test model for <core::i32 as core::cmp::PartialOrd>::lt to verify specific implementations override trait models |
rust/ql/test/library-tests/dataflow/models/main.rs |
Adds test case for i32 comparison to verify new filtering behavior |
rust/ql/test/library-tests/dataflow/strings/main.rs |
Updates test expectation - removes SPURIOUS flow annotation, confirming the refactoring fixed a false positive |
Test expectation files (.expected) |
Updates model IDs and edge numbering reflecting the reduced number of duplicate models being applied |
rust/ql/test/library-tests/dataflow/global/viableCallable.expected |
Updates to show [summarized] prefix for better clarity in test output |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // only apply trait models to concrete implementations when they are not | ||
| // defined in source code, and when there are no exact model for the functions | ||
| isExact = false and | ||
| not summaryModel(f, _, _, _, provenance, true, _) and |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition checks for the absence of an exact model with the same provenance. However, this means a generated trait model could still be applied even when a manual specific implementation model exists (since they have different provenance values). Based on the comment stating "when there are no exact model for the functions" (line 148), this should likely check for any exact model regardless of provenance: not summaryModel(f, _, _, _, _, true, _)
| not summaryModel(f, _, _, _, provenance, true, _) and | |
| not summaryModel(f, _, _, _, _, true, _) and |
| isExact = true | ||
| or | ||
| // only apply trait models to concrete implementations when they are not | ||
| // defined in source code, and when there are no exact model for the functions |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states "when there are no exact model for the functions" but should be clearer about whether this means "no exact model with any provenance" or "no exact model with the same provenance". Consider rephrasing for clarity.
| // defined in source code, and when there are no exact model for the functions | |
| // defined in source code, and when there is no exact model with the same provenance for the function |
| provenance.isManual() | ||
| or | ||
| // only apply generated models to functions not defined in source code, and | ||
| // when there are no exact manual models for the functions |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment uses plural "functions" but should use singular "function" since it refers to a specific function f.
| isExact = true | ||
| or | ||
| // only apply trait models to concrete implementations when they are not | ||
| // defined in source code, and when there are no exact model for the functions |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment uses plural "functions" but should use singular "function" since it refers to a specific function f.
| // defined in source code, and when there are no exact model for the functions | |
| // defined in source code, and when there is no exact model for the function |
paldepind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Copilot have some thoughts on the grammar that seems correct.
| private class SummarizedCallableFromModel extends SummarizedCallable::Range { | ||
| private string path; | ||
| private predicate summaryModel( | ||
| Function f, string input, string output, string kind, Provenance provenance, boolean isExact, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about isDirect or isInherited (with opposite value) instead of "exact"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for isExact, since that is what other languages use, and will be shared once #21051 is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but maybe that's an additional reason to get the naming right? :)
To me isExact sounds like something about the accuracy of the model itself and where the opposite would be an "approximate" model. The suggested names (and isInherited in particular) sounds more like what this is actually about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I have changed it to isInherited.
| exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) | | ||
| ( | ||
| provenance.isManual() | ||
| or | ||
| // only apply generated models to functions not defined in source code, and | ||
| // when there are no exact manual models for the functions | ||
| provenance.isGenerated() and | ||
| not any(Provenance manual | summaryModel(f, _, _, _, manual, true, _)).isManual() and | ||
| not f.fromSource() | ||
| ) and | ||
| ( | ||
| isExact = true | ||
| or | ||
| // only apply trait models to concrete implementations when they are not | ||
| // defined in source code, and when there are no exact model for the functions | ||
| isExact = false and | ||
| not summaryModel(f, _, _, _, provenance, true, _) and | ||
| not f.fromSource() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like this to avoid a bit of repetition?
| exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) | | |
| ( | |
| provenance.isManual() | |
| or | |
| // only apply generated models to functions not defined in source code, and | |
| // when there are no exact manual models for the functions | |
| provenance.isGenerated() and | |
| not any(Provenance manual | summaryModel(f, _, _, _, manual, true, _)).isManual() and | |
| not f.fromSource() | |
| ) and | |
| ( | |
| isExact = true | |
| or | |
| // only apply trait models to concrete implementations when they are not | |
| // defined in source code, and when there are no exact model for the functions | |
| isExact = false and | |
| not summaryModel(f, _, _, _, provenance, true, _) and | |
| not f.fromSource() | |
| ) | |
| exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) | | |
| // Only apply generated or inherited models to functions in library code and | |
| // when no strictly better model exist | |
| if provenance.isGenerated() or isExact = false | |
| then | |
| not f.fromSource() and | |
| not exists(Provenance provenance2 | summaryModel(f, _, _, _, provenance2, true, _) | | |
| provenance.isGenerated() and provenance2.isManual() | |
| or | |
| provenance = provenance2 and isExact = false | |
| ) | |
| else any() | |
| ) |
e4865b8 to
646a932
Compare
646a932 to
da6d0ab
Compare
paldepind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR 👍 and for considering my comments :)
Refactors existing logic, and also prioritizes specific trait implementation models over generic trait models.