Rust: Refactor MaD provenance-based filtering #21072

hvitved · 2025-12-19T12:55:30Z

Refactors existing logic, and also prioritizes specific trait implementation models over generic trait models.

Copilot

Pull request overview

This PR refactors the Models as Data (MaD) provenance-based filtering logic in the Rust dataflow analysis, with the key goal of prioritizing specific trait implementation models over generic trait models.

Key Changes:

Introduced a new filtering mechanism (summaryModelRelevant) that distinguishes between exact models (specific implementations) and inexact models (trait models), preventing unnecessary duplication of flow steps
Simplified the callable resolution logic in DataFlowImpl.qll by removing complex filtering that is now handled in ModelsAsData.qll
Added specific manual models for String::Add operations to override generic trait models and improve analysis precision

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll`	Core refactoring - introduces `summaryModel` and `summaryModelRelevant` predicates to filter models based on exactness (trait vs specific implementation) and provenance (manual vs generated)
`rust/ql/lib/codeql/rust/dataflow/internal/DataFlowImpl.qll`	Simplifies `viableCallable` by delegating model filtering to `ModelsAsData.qll` and improves toString() for summarized callables with `[summarized]` prefix
`rust/ql/lib/codeql/rust/frameworks/stdlib/alloc.model.yml`	Adds specific manual models for `String as Add` trait to override generated generic trait models
`rust/ql/test/library-tests/dataflow/models/models.ext.yml`	Adds test model for `<core::i32 as core::cmp::PartialOrd>::lt` to verify specific implementations override trait models
`rust/ql/test/library-tests/dataflow/models/main.rs`	Adds test case for i32 comparison to verify new filtering behavior
`rust/ql/test/library-tests/dataflow/strings/main.rs`	Updates test expectation - removes SPURIOUS flow annotation, confirming the refactoring fixed a false positive
Test expectation files (`.expected`)	Updates model IDs and edge numbering reflecting the reduced number of duplicate models being applied
`rust/ql/test/library-tests/dataflow/global/viableCallable.expected`	Updates to show `[summarized]` prefix for better clarity in test output

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-19T18:42:08Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

+      // only apply trait models to concrete implementations when they are not
+      // defined in source code, and when there are no exact model for the functions
+      isExact = false and
+      not summaryModel(f, _, _, _, provenance, true, _) and


The condition checks for the absence of an exact model with the same provenance. However, this means a generated trait model could still be applied even when a manual specific implementation model exists (since they have different provenance values). Based on the comment stating "when there are no exact model for the functions" (line 148), this should likely check for any exact model regardless of provenance: not summaryModel(f, _, _, _, _, true, _)

Suggested change

not summaryModel(f, _, _, _, provenance, true, _) and

not summaryModel(f, _, _, _, _, true, _) and

Copilot · 2025-12-19T18:42:09Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

+      isExact = true
+      or
+      // only apply trait models to concrete implementations when they are not
+      // defined in source code, and when there are no exact model for the functions


The comment states "when there are no exact model for the functions" but should be clearer about whether this means "no exact model with any provenance" or "no exact model with the same provenance". Consider rephrasing for clarity.

Suggested change

// defined in source code, and when there are no exact model for the functions

// defined in source code, and when there is no exact model with the same provenance for the function

Copilot · 2025-12-19T18:42:09Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

+      provenance.isManual()
+      or
+      // only apply generated models to functions not defined in source code, and
+      // when there are no exact manual models for the functions


The comment uses plural "functions" but should use singular "function" since it refers to a specific function f.

Copilot · 2025-12-19T18:42:09Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

+      isExact = true
+      or
+      // only apply trait models to concrete implementations when they are not
+      // defined in source code, and when there are no exact model for the functions


The comment uses plural "functions" but should use singular "function" since it refers to a specific function f.

Suggested change

// defined in source code, and when there are no exact model for the functions

// defined in source code, and when there is no exact model for the function

paldepind

Looks good! Copilot have some thoughts on the grammar that seems correct.

paldepind · 2026-01-06T12:00:15Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

-private class SummarizedCallableFromModel extends SummarizedCallable::Range {
-  private string path;
+private predicate summaryModel(
+  Function f, string input, string output, string kind, Provenance provenance, boolean isExact,


What about isDirect or isInherited (with opposite value) instead of "exact"?

I went for isExact, since that is what other languages use, and will be shared once #21051 is ready.

Makes sense, but maybe that's an additional reason to get the naming right? :)

To me isExact sounds like something about the accuracy of the model itself and where the opposite would be an "approximate" model. The suggested names (and isInherited in particular) sounds more like what this is actually about.

Fair, I have changed it to isInherited.

paldepind · 2026-01-06T12:04:44Z

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

+  exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) |
+    (
+      provenance.isManual()
+      or
+      // only apply generated models to functions not defined in source code, and
+      // when there are no exact manual models for the functions
+      provenance.isGenerated() and
+      not any(Provenance manual | summaryModel(f, _, _, _, manual, true, _)).isManual() and
+      not f.fromSource()
+    ) and
+    (
+      isExact = true
+      or
+      // only apply trait models to concrete implementations when they are not
+      // defined in source code, and when there are no exact model for the functions
+      isExact = false and
+      not summaryModel(f, _, _, _, provenance, true, _) and
+      not f.fromSource()
+    )


Maybe something like this to avoid a bit of repetition?

Suggested change

exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) |

(

provenance.isManual()

or

// only apply generated models to functions not defined in source code, and

// when there are no exact manual models for the functions

provenance.isGenerated() and

not any(Provenance manual | summaryModel(f, _, _, _, manual, true, _)).isManual() and

not f.fromSource()

) and

(

isExact = true

or

// only apply trait models to concrete implementations when they are not

// defined in source code, and when there are no exact model for the functions

isExact = false and

not summaryModel(f, _, _, _, provenance, true, _) and

not f.fromSource()

)

exists(boolean isExact | summaryModel(f, input, output, kind, provenance, isExact, madId) |

// Only apply generated or inherited models to functions in library code and

// when no strictly better model exist

if provenance.isGenerated() or isExact = false

then

not f.fromSource() and

not exists(Provenance provenance2 | summaryModel(f, _, _, _, provenance2, true, _) |

provenance.isGenerated() and provenance2.isManual()

or

provenance = provenance2 and isExact = false

)

else any()

)

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll

paldepind

Thanks for the PR 👍 and for considering my comments :)

github-actions bot added the Rust Pull requests that update Rust code label Dec 19, 2025

hvitved force-pushed the rust/mad-provenance-refactor branch 2 times, most recently from 868e36f to 8ffed9d Compare December 19, 2025 14:16

Rust: Refactor MaD provanance-based filtering

f23351a

hvitved force-pushed the rust/mad-provenance-refactor branch from 8ffed9d to f23351a Compare December 19, 2025 18:36

hvitved added the no-change-note-required This PR does not need a change note label Dec 19, 2025

hvitved marked this pull request as ready for review December 19, 2025 18:38

hvitved requested a review from a team as a code owner December 19, 2025 18:38

Copilot AI review requested due to automatic review settings December 19, 2025 18:38

Copilot started reviewing on behalf of hvitved December 19, 2025 18:38 View session

Copilot AI reviewed Dec 19, 2025

View reviewed changes

hvitved requested a review from paldepind January 6, 2026 08:20

paldepind requested changes Jan 6, 2026

View reviewed changes

hvitved requested a review from paldepind January 6, 2026 12:30

hvitved force-pushed the rust/mad-provenance-refactor branch from e4865b8 to 646a932 Compare January 6, 2026 12:34

paldepind reviewed Jan 6, 2026

View reviewed changes

rust/ql/lib/codeql/rust/dataflow/internal/ModelsAsData.qll Outdated Show resolved Hide resolved

Address review comment

da6d0ab

hvitved force-pushed the rust/mad-provenance-refactor branch from 646a932 to da6d0ab Compare January 6, 2026 13:35

paldepind approved these changes Jan 6, 2026

View reviewed changes

hvitved merged commit 1a2f722 into github:main Jan 6, 2026
19 checks passed

hvitved deleted the rust/mad-provenance-refactor branch January 6, 2026 13:59

	not summaryModel(f, _, _, _, provenance, true, _) and
	not summaryModel(f, _, _, _, _, true, _) and

	// defined in source code, and when there are no exact model for the functions
	// defined in source code, and when there is no exact model with the same provenance for the function

Rust: Refactor MaD provenance-based filtering #21072

Rust: Refactor MaD provenance-based filtering #21072

Uh oh!

Conversation

hvitved commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

paldepind left a comment

Choose a reason for hiding this comment

Uh oh!

paldepind Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

hvitved Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

paldepind Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

hvitved Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

paldepind Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

paldepind left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hvitved commented Dec 19, 2025 •

edited

Loading