feat(nanno): if snpeff deems a variant to be in multiple genes report them all by holmeso · Pull Request #400 · AdamaJava/adamajava

holmeso · 2026-02-27T06:51:42Z

This pull request refactors and improves the annotation extraction logic for VCF and SnpEff VCF files, optimizes TSV annotation field extraction, and enhances test coverage. The main focus is on making the annotation extraction more robust, efficient, and maintainable, especially for cases with multiple consequences or large INFO fields.

VCF and SnpEff VCF annotation improvements:

Refactored AnnotationSourceSnpEffVCF to extract all relevant consequences per gene, not just the first, and to support multiple consequences per alternate allele. This results in more comprehensive and accurate annotation extraction. [1] [2]
Simplified and consolidated logic in getAnnotation and related methods, reducing code duplication and improving maintainability. [1] [2]

Performance and maintainability enhancements:

Improved VCF INFO field parsing in AnnotationSourceVCF to efficiently handle large fields or requests for many annotation fields by parsing the INFO field once into a map when needed. [1] [2] [3]
Optimized TSV annotation extraction in AnnotationSourceTSV by precomputing field name and position arrays for faster lookup during annotation extraction. [1] [2] [3]

Testing improvements:

Added and expanded tests for SnpEff VCF annotation extraction, including setup for temporary files and use of ChrPositionRefAlt utilities.

Minor code cleanups:

Replaced deprecated or verbose code patterns (e.g., use of contains(",") replaced with more efficient indexOf(',') >= 0 checks).
Removed commented-out and redundant code for clarity. [1] [2]

These changes collectively improve the accuracy, efficiency, and maintainability of the annotation extraction logic across VCF, SnpEff VCF, and TSV sources.

… them all

Copilot

Pull request overview

This PR improves annotation extraction across TSV, generic VCF INFO, and SnpEff VCF sources, with a focus on supporting multiple-gene SnpEff annotations and reducing per-record parsing overhead.

Changes:

Update SnpEff VCF parsing to return consequences for multiple genes (and add/extend tests around multi-gene behavior).
Optimize VCF INFO parsing by optionally parsing INFO once into a map for large INFO strings or many requested fields.
Speed up TSV extraction by precomputing field name/position arrays.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
qannotate/test/au/edu/qimr/qannotate/nanno/AnnotationSourceTSVTest.java	Updates tests to match new TSV extraction signature (arrays rather than map).
qannotate/test/au/edu/qimr/qannotate/nanno/AnnotationSourceSnpEffVCFTest.java	Adds coverage for multi-gene and multi-allele SnpEff behaviors using temp VCF files.
qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceVCF.java	Adds INFO parsing heuristics to parse once into a map for performance on large fields/many keys.
qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceTSV.java	Precomputes field arrays for faster per-record TSV extraction.
qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceSnpEffVCF.java	Refactors annotation matching and expands extraction to return all relevant gene consequences.
qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSource.java	Simplifies `getAnnotation` flow and makes the “requested < next” early-return explicit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-27T06:55:49Z

qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceSnpEffVCF.java

-        String worstConsequence = "";
+        Map<String, String> worstByGene = new java.util.LinkedHashMap<>();
        for (String aa : annArray) {
            if (aa.startsWith(alt)) {


getWorstConsequence uses aa.startsWith(alt) to select consequences for an allele. This can incorrectly match alleles where alt is a prefix (e.g., requested alt A will also match an ANN entry starting with AA|...). To avoid mis-annotating multi-allelic records, match the allele token exactly (e.g., aa.startsWith(alt + "|") or parse up to the first '|').

Suggested change

if (aa.startsWith(alt)) {

int pipeIndex = aa.indexOf('|');

if (pipeIndex <= 0) {

// Malformed ANN entry or missing allele token; skip

continue;

}

String alleleToken = aa.substring(0, pipeIndex);

if (alleleToken.equals(alt)) {

@Coplit please implement your suggestion

Copilot · 2026-02-27T06:55:50Z

qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceTSV.java

+		// precompute arrays for fast extraction
+		this.fieldNames = headerNameAndPosition.keySet().toArray(new String[0]);
+		this.fieldPositions = new int[this.fieldNames.length];
+		for (int i = 0; i < this.fieldNames.length; i++) {
+			this.fieldPositions[i] = headerNameAndPosition.get(this.fieldNames[i]);
+		}


The precomputed fieldNames array is built from headerNameAndPosition.keySet(), which has non-deterministic iteration order for a HashMap. This can cause extracted annotations to be emitted in an arbitrary order that may not match the user-requested field order (and can diverge from emptyRecordResult, which preserves the input order). Consider building fieldNames/fieldPositions by iterating over the requested fieldNames string (split by ',') or using an order-preserving map (e.g., LinkedHashMap).

@copilot open a new pull request to apply changes based on this feedbacl

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-02-27T06:55:50Z

qannotate/src/au/edu/qimr/qannotate/nanno/AnnotationSourceTSV.java

+	public static String extractFieldsFromRecord(String[] record, String[] fieldNames, int[] fieldPositions) {
 		StringBuilder dataToReturn = new StringBuilder();
 		int recordLength = null != record ? record.length : 0;
-		if ( recordLength > 0 && null != fields) {
-//			String [] recordArray = TabTokenizer.tokenize(record);
-			for (Entry<String, Integer> entry : fields.entrySet()) {
-				/*
-				 * make sure that array length is not shorter than entry value
-				 */
-				if (recordLength > entry.getValue()) {
-					dataToReturn.append(( ! dataToReturn.isEmpty()) ? FIELD_DELIMITER_TAB : "").append(entry.getKey()).append("=").append(record[entry.getValue()]);
+		if (recordLength > 0 && null != fieldNames && null != fieldPositions) {
+			for (int i = 0; i < fieldNames.length; i++) {
+				int pos = fieldPositions[i];
+				if (recordLength > pos) {
+					dataToReturn.append((!dataToReturn.isEmpty()) ? FIELD_DELIMITER_TAB : "")


extractFieldsFromRecord iterates over fieldNames.length but indexes into fieldPositions[i] without verifying the arrays are the same length. If a caller passes mismatched arrays, this will throw ArrayIndexOutOfBoundsException. Consider validating lengths up front (and returning "" / throwing IllegalArgumentException), or iterating up to Math.min(fieldNames.length, fieldPositions.length).

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-02-27T06:57:04Z

@holmeso I've opened a new pull request, #401, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot · 2026-02-27T06:59:25Z

@holmeso I've opened a new pull request, #402, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: holmeso <7066552+holmeso@users.noreply.github.com>

Copilot · 2026-02-27T07:02:19Z

@holmeso I've opened a new pull request, #403, to work on those changes. Once the pull request is ready, I'll request review from you.

fix(nanno): guard against mismatched array lengths in extractFieldsFromRecord

feat(nanno): if snpeff deems a variant to be in multiple genes report…

6516f60

… them all

holmeso requested a review from Copilot February 27, 2026 06:51

Copilot started reviewing on behalf of holmeso February 27, 2026 06:52 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

Initial plan

460c9c4

Copilot AI mentioned this pull request Feb 27, 2026

fix(nanno): guard against mismatched array lengths in extractFieldsFromRecord #401

Merged

9 tasks

Copilot AI mentioned this pull request Feb 27, 2026

fix(nanno): preserve user-requested field order in AnnotationSourceTSV #402

Draft

16 tasks

fix: guard against mismatched array lengths in extractFieldsFromRecord

283d201

Co-authored-by: holmeso <7066552+holmeso@users.noreply.github.com>

Copilot AI mentioned this pull request Feb 27, 2026

Fix non-deterministic field ordering in AnnotationSourceTSV annotation extraction #403

Draft

9 tasks

Merge pull request #401 from AdamaJava/copilot/sub-pr-400

6f724e2

fix(nanno): guard against mismatched array lengths in extractFieldsFromRecord

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(nanno): if snpeff deems a variant to be in multiple genes report them all#400

feat(nanno): if snpeff deems a variant to be in multiple genes report them all#400
holmeso wants to merge 4 commits intomasterfrom
nanno_multi_gene

holmeso commented Feb 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

holmeso Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

holmeso Feb 27, 2026

Uh oh!

holmeso Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

holmeso Feb 27, 2026

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-            if (aa.startsWith(alt)) {
+            int pipeIndex = aa.indexOf('|');
+            if (pipeIndex <= 0) {
+                // Malformed ANN entry or missing allele token; skip
+                continue;
+            }
+            String alleleToken = aa.substring(0, pipeIndex);
+            if (alleleToken.equals(alt)) {

Conversation

holmeso commented Feb 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

holmeso Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

holmeso Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

holmeso Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

holmeso Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Copilot AI commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants