Add a system verification/diagnostic/self test script #159

@collectoss-issue-migrator

Note

Migrated from augurlabs/augur#3485
Originally opened by @MoralCode on 2025-12-19


I think it would help for augur to provide a verification script for people to optionally run on their installations to automatically help detect and flag potential problems before they arise.

I think this should take the form of a new CLI command and should check both things that should already be true about an augur install, and also maybe pre-emptively check and warn for things that may affect future/planned migrations.
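
One possible shape for such a command, as a minimal sketch rather than a proposed implementation: each check is a small function registered under a category, and a runner collects the results so that one crashing check does not abort the whole report. Every name here (`Severity`, `CheckResult`, `check`, `run_checks`, `exit_code`) is hypothetical and not part of augur today.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class CheckResult:
    name: str
    category: str
    severity: Severity
    message: str


# Hypothetical convention for this sketch: a check returns (severity, message).
CheckFn = Callable[[], Tuple[Severity, str]]
REGISTRY: List[Tuple[str, CheckFn]] = []


def check(category: str):
    """Register a function as a self-test check under the given category."""
    def register(func: CheckFn) -> CheckFn:
        REGISTRY.append((category, func))
        return func
    return register


def run_checks(registry=None) -> List[CheckResult]:
    """Run every check; a check that raises is reported as an error, not fatal."""
    results = []
    for category, func in (REGISTRY if registry is None else registry):
        try:
            severity, message = func()
        except Exception as exc:
            severity, message = Severity.ERROR, f"check crashed: {exc}"
        results.append(CheckResult(func.__name__, category, severity, message))
    return results


def exit_code(results: List[CheckResult]) -> int:
    """Non-zero only when at least one check reported an error."""
    return 1 if any(r.severity is Severity.ERROR for r in results) else 0


@check("informational")
def example_info_check():
    # Placeholder so the registry is not empty in this sketch.
    return Severity.INFO, "placeholder check, always passes"
```

A CLI entry point could print one line per `CheckResult` and exit with `exit_code(results)`, which would also make the output easy to paste into a bug report.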

Basic informational checks:
Basic information about your augur instance/setup

  • report current DB alembic version, postgres version, checked out commit of augur and whether docker is in use or not
  • are all migrations applied?
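
A rough sketch of how these informational values might be gathered, assuming a DB-API connection (for example from psycopg2) whose search path can see Alembic's `alembic_version` table. The function names are hypothetical, the expected head revision is passed in by the caller rather than looked up, and the `/.dockerenv` test is only a common heuristic for detecting Docker, not a guarantee.

```python
import os
import subprocess
from typing import Optional


def alembic_version(conn) -> Optional[str]:
    """Read the applied revision from Alembic's alembic_version table."""
    cur = conn.cursor()
    cur.execute("SELECT version_num FROM alembic_version")
    row = cur.fetchone()
    return row[0] if row else None


def migrations_applied(conn, expected_head: str) -> bool:
    """True when the database revision matches the head the code expects."""
    return alembic_version(conn) == expected_head


def postgres_version(conn) -> str:
    """Ask the server for its version string (PostgreSQL only)."""
    cur = conn.cursor()
    cur.execute("SHOW server_version")
    return cur.fetchone()[0]


def git_commit(repo_dir: str) -> Optional[str]:
    """Return the checked-out commit hash, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def probably_in_docker() -> bool:
    """Heuristic: Docker creates /.dockerenv inside its containers."""
    return os.path.exists("/.dockerenv")
```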

Basic operational checks:
are the important baseline expectations for operation being met?

  • are filesystem permissions correct for things like facade and logs (possibly with an attempt to automatically fix them in lieu of a post-start hook; see "podman compose unable to run docker lifecycle hooks" #96)
  • Check that all git repos in facade have no changes made to them (clean working trees)
  • check that the ratio of commits with no linked contributor is within acceptable parameters.
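
The operational checks above could be sketched roughly as follows. This assumes the facade repositories are ordinary (non-bare) clones so that `git status` works in them, and the 5% threshold for unlinked commits is an invented placeholder, since the issue does not say what "acceptable parameters" are. All function names are hypothetical.

```python
import os
import subprocess


def path_is_writable(path: str) -> bool:
    """True if the current user may write to the given directory or file."""
    return os.access(path, os.W_OK)


def porcelain_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints nothing when the working tree is clean."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_path: str) -> bool:
    """Run git in the repository and check for local modifications."""
    result = subprocess.run(
        ["git", "-C", repo_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return porcelain_is_clean(result.stdout)


def unlinked_commit_ratio(total_commits: int, unlinked_commits: int) -> float:
    """Fraction of commits that have no linked contributor."""
    if total_commits == 0:
        return 0.0
    return unlinked_commits / total_commits


def ratio_within_limit(total_commits: int, unlinked_commits: int,
                       max_ratio: float = 0.05) -> bool:
    """max_ratio is an assumed threshold, not a value taken from augur."""
    return unlinked_commit_ratio(total_commits, unlinked_commits) <= max_ratio
```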

Benign Data Cleanliness:
not issues, just things that could use cleaning up

  • check for duplicate aliases for a contributor (same email present many times)
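
As an illustration of the duplicate-alias check, a query along these lines could work. The table name `contributors_aliases` and the columns `cntrb_id` and `alias_email` are assumptions about the schema, and comparing emails case-insensitively is a choice made for this sketch.

```python
# Assumed table/column names; adjust to the real schema before use.
DUPLICATE_ALIAS_SQL = """
SELECT cntrb_id, LOWER(alias_email) AS email, COUNT(*) AS occurrences
FROM contributors_aliases
GROUP BY cntrb_id, LOWER(alias_email)
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
"""


def duplicate_aliases(conn):
    """Return (cntrb_id, email, occurrences) for emails repeated per contributor."""
    cur = conn.cursor()
    cur.execute(DUPLICATE_ALIAS_SQL)
    return cur.fetchall()
```

Because this category is described as benign, a non-empty result would presumably be reported as informational rather than as a failure.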

Bug Mitigation/regression prevention:
issues from previous bugs that may still affect older instances if cleanup was not performed

  • are there any duplicate repos (i.e. multiple entries in the repos table with the same repo_src_id)?
  • detect duplicate issue urls (collectoss.tasks.github.events.collect_events: duplicate key value violates unique constraint "issue-insert-unique" #151, see below query)
  • detect bad commits table data (from the empty name bug)
  • detect unreasonable numbers of aliases per contributor (resolution bugs as a result of the prior one)
    • special spot checks for “dave” “andy” etc common names that were also being mis-connected
  • count contributors with null created_at dates (from the resolution bugs of late 2025/early 2026). There shouldn't be many, as there's an automatic task to correct these; if the count doesn't go down within a few hours for no good reason, there may be an issue
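
A few of the checks in this group reduce to simple aggregate queries. The sketch below assumes tables named `repo`, `contributors` and `contributors_aliases` and a column named `cntrb_created_at`; only `repo_src_id` is taken from the list above. The alias limit of 20 is an arbitrary placeholder for "unreasonable", and the function names are hypothetical.

```python
# Assumed table/column names; only repo_src_id comes from the issue text.
DUPLICATE_REPO_SQL = """
SELECT repo_src_id, COUNT(*) AS copies
FROM repo
WHERE repo_src_id IS NOT NULL
GROUP BY repo_src_id
HAVING COUNT(*) > 1
ORDER BY repo_src_id
"""

NULL_CREATED_AT_SQL = """
SELECT COUNT(*) FROM contributors WHERE cntrb_created_at IS NULL
"""

ALIAS_COUNT_SQL = """
SELECT cntrb_id, COUNT(*) FROM contributors_aliases GROUP BY cntrb_id
"""


def duplicate_repos(conn):
    """Return (repo_src_id, copies) for every source id stored more than once."""
    cur = conn.cursor()
    cur.execute(DUPLICATE_REPO_SQL)
    return cur.fetchall()


def contributors_missing_created_at(conn) -> int:
    """Count contributors whose creation date is still null."""
    cur = conn.cursor()
    cur.execute(NULL_CREATED_AT_SQL)
    return cur.fetchone()[0]


def contributors_with_many_aliases(conn, max_aliases: int = 20) -> dict:
    """Map cntrb_id to alias count for contributors above the assumed limit."""
    cur = conn.cursor()
    cur.execute(ALIAS_COUNT_SQL)
    return {cntrb_id: n for cntrb_id, n in cur.fetchall() if n > max_aliases}
```

Since the null created_at count is expected to shrink over time, a report could record the value and compare it against a run made a few hours later.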

Future-thinking checks:
planned schema or other major changes we could proactively check for:

  • Check that all dates in the columns affected by "repo_deps_libyear column name misspelled" #29 are in an expected format (for future migration)
  • check that all gl_ prefixed columns in the contributors table are null (for future removal/merge into generic platform tables)
  • check for differences between contributors table cntrb_email and cntrb_canonical for future merge of those columns
  • check for cntrb_canonical values in the email aliases table that are different/not present as actual aliases (prepare for column deletion)
  • print warnings about misspelled column names potentially changing in future ("repo_deps_libyear column name misspelled" #29)
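
Two of these forward-looking checks can be sketched as below. `cntrb_email` and `cntrb_canonical` come from the list above; the `contributors` table name and the specific gl_ column names passed in by the caller are assumptions. The mismatch query only compares rows where both values are present, which is one of several reasonable interpretations of "differences".

```python
import re

# Only plain lowercase identifiers are accepted, because table and column
# names cannot be passed as bound query parameters.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")

CANONICAL_MISMATCH_SQL = """
SELECT COUNT(*) FROM contributors
WHERE cntrb_email IS NOT NULL
  AND cntrb_canonical IS NOT NULL
  AND LOWER(cntrb_email) <> LOWER(cntrb_canonical)
"""


def canonical_mismatches(conn) -> int:
    """Count contributors whose email and canonical email differ."""
    cur = conn.cursor()
    cur.execute(CANONICAL_MISMATCH_SQL)
    return cur.fetchone()[0]


def non_null_counts(conn, table: str, columns: list) -> dict:
    """Return {column: number of non-null values}; COUNT(col) skips nulls."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    counts = {}
    for column in columns:
        if not _IDENTIFIER.match(column):
            raise ValueError(f"unexpected column name: {column!r}")
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT({column}) FROM {table}")
        counts[column] = cur.fetchone()[0]
    return counts
```

Any gl_ column with a non-zero count would be worth a warning before those columns are removed or merged.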

Consistency checks:
sanity checks not born from any particular issue, but intended to act as canaries when stuff goes wrong

  • make sure collection dates are all after the author/committer dates in the commits table
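
The date consistency check could look something like this sketch, which works on rows already fetched from the commits table. The row layout (hash, author timestamp, committer timestamp, collection timestamp) is an assumption for illustration, and the timestamps are expected to be timezone-aware so that they compare correctly.

```python
from datetime import datetime, timezone


def commits_collected_too_early(rows):
    """Return hashes of commits whose collection time precedes their own dates.

    rows: iterable of (commit_hash, author_ts, committer_ts, collected_ts),
    an assumed layout for this sketch.
    """
    suspicious = []
    for commit_hash, author_ts, committer_ts, collected_ts in rows:
        # Data cannot legitimately be collected before the commit existed.
        if collected_ts < max(author_ts, committer_ts):
            suspicious.append(commit_hash)
    return suspicious


def utc(year, month, day):
    """Small helper for building timezone-aware example timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)
```

For example, a commit authored on 2025-06-01 but recorded as collected on 2025-05-01 would be flagged, while one collected after both its author and committer dates would not.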

Running this report could potentially also become part of the process for reporting a bug

Status

In Progress
