Workflow status script #13

Open
gm119 wants to merge 43 commits into main from workflow_status

gm119 (Collaborator) commented Feb 3, 2026

This PR adds populate_workflow_status.pl, which generates the JSON used to load FlyBase curation status information into the Alliance ABC literature database, for those types of curation that the ABC stores in the 'workflow_tag' table.

Three types of FB curation are mapped to the appropriate Alliance information by this script:

  • community curation
  • first pass curation by a biocurator ('skim' curation at FB)
  • manual indexing ('thin' curation at FB)

Script logic:

  • Uses FB 'curated_by' pubprop information to determine the curation status of the three FB curation types being mapped to workflow_tag. Sets curation status to 'done' when a file of the standard expected filename format is found for a given curation type.

  • Uses the 'nocur' flag to identify papers that contain 'no genetic information'. Validates that the nocur flag is correct and then sets manual_indexing status to 'won't curate' (with 'no genetic information' curation_tag), overriding any 'done' status added in the first step above.

  • Identifies papers that have not yet been manually indexed but that contain high-priority data. Sets curation status to 'curation needed' for manual indexing, with a note explaining why the paper is high priority.

  • Adds publication-level internal notes to the 'note' of the appropriate workflow_tag curation type:

    • First filters out internal notes that are either not being submitted to the Alliance or will be submitted by a different script (e.g. notes attached to a topic or to a topic curation status).
    • Uses the internal note timestamp to identify which of the three workflow_tag curation types to add the internal note to.
    • For any internal note whose timestamp does not match any of the workflow_tag timestamps for that publication (this can happen if the note was added as an edit record), adds it to the manual indexing status (if that exists), or otherwise to the first-pass curation status (if that exists).
    • Any internal notes that have not been matched up and added in the above steps are printed in the FB_workflow_status_data_errors.err file.
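
The logic above can be read as a precedence rule for the manual indexing status plus a routing rule for internal notes. The following is a minimal Python sketch of that reading, for reviewers who want the rules in one place. It is not the Perl in populate_workflow_status.pl: the function names, arguments and dictionary keys (`manual_indexing_status`, `route_note`, `has_thin_file`, `status_timestamps`, and so on) are hypothetical, only the quoted status strings come from the description, and the note pre-filtering step is left out. It also assumes that when two curation types share a timestamp, the first one listed wins.

```python
from typing import Optional

# Hypothetical labels for the three workflow_tag curation types.
COMMUNITY = "community curation"
FIRST_PASS = "first pass curation"
MANUAL_INDEXING = "manual indexing"


def manual_indexing_status(has_thin_file: bool,
                           nocur_validated: bool,
                           high_priority_reason: Optional[str]) -> Optional[dict]:
    """Resolve the manual indexing status using the precedence described above."""
    # A validated nocur flag overrides any 'done' status.
    if nocur_validated:
        return {"status": "won't curate",
                "curation_tag": "no genetic information"}
    # A file in the expected filename format means curation is done.
    if has_thin_file:
        return {"status": "done"}
    # Not yet indexed but high priority: flag it, with a note saying why.
    if high_priority_reason:
        return {"status": "curation needed", "note": high_priority_reason}
    return None


def route_note(note_timestamp: str, status_timestamps: dict) -> Optional[str]:
    """Return the curation type an internal note belongs to, or None.

    status_timestamps maps each curation type that exists for the
    publication to the timestamp of its status.
    """
    # Prefer an exact timestamp match.
    for curation_type, timestamp in status_timestamps.items():
        if timestamp == note_timestamp:
            return curation_type
    # Otherwise fall back to manual indexing, then first-pass curation.
    for fallback in (MANUAL_INDEXING, FIRST_PASS):
        if fallback in status_timestamps:
            return fallback
    # Unmatched: the caller would report this note in the .err file.
    return None
```

In this sketch, a note that matches no timestamp on a publication with only a community curation status returns `None`, which corresponds to the case reported in FB_workflow_status_data_errors.err.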

Changes in lib/AuditTable.pm and lib/Util.pm add the subroutines needed to get the relevant data for populate_workflow_status.pl.

gm119 added 30 commits January 20, 2026 12:22
… with edge case (multiple curators for same timestamp)
…roper assignment of curator when multiple possibilites
… subs get_all_currec_data and get_relevant_curator_from_candidate_list_using_pub_and_timestamp
gm119 requested a review from ianlongden February 3, 2026 16:12