fix: strip pgaudit logging flags from target during DMS migration #298
Merged
pgaudit.log and pgaudit.log_parameter flags on the target instance cause DMS migration jobs to fail with 'Internal error' at startup. The pgaudit hooks log DMS internal replication operations, which interferes with the migration process.
This fix strips pgaudit.log and pgaudit.log_parameter from the target instance spec during migration, while keeping cloudsql.enable_pgaudit so the shared library stays loaded and the pgaudit extension can be restored from the source database dump.
After migration and promotion, the original app spec (with all flags) is re-applied by naiserator. Users must re-run 'nais postgres enable-audit' to re-create the extension and per-user config on the new instance.
Resolves #240
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Testing revealed that keeping cloudsql.enable_pgaudit=on on the target still causes DMS 'Internal error': even just loading the shared library interferes with DMS replication.
Two-pronged approach:
1. Strip ALL pgaudit flags from target (including cloudsql.enable_pgaudit)
2. Drop pgaudit extension from source databases before migration starts, so the DMS dump does not contain CREATE EXTENSION pgaudit
After migration, users must re-run 'nais postgres enable-audit'.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
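The flag stripping described in this commit can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: the DatabaseFlag type and the isPgauditFlag and stripPgauditFlags helpers are hypothetical names, the real instance spec type may be shaped differently, and max_connections is only there as an example of an unrelated flag that should survive.

```go
package main

import (
	"fmt"
	"strings"
)

// DatabaseFlag is a hypothetical stand-in for one database flag entry
// in the instance spec (not the type used in the actual code base).
type DatabaseFlag struct {
	Name  string
	Value string
}

// isPgauditFlag reports whether a flag either loads pgaudit
// (cloudsql.enable_pgaudit) or configures it (pgaudit.*).
func isPgauditFlag(name string) bool {
	return name == "cloudsql.enable_pgaudit" || strings.HasPrefix(name, "pgaudit.")
}

// stripPgauditFlags returns a new slice without any pgaudit-related flags.
// The input slice is not modified, so the source spec keeps its flags.
func stripPgauditFlags(flags []DatabaseFlag) []DatabaseFlag {
	kept := make([]DatabaseFlag, 0, len(flags))
	for _, f := range flags {
		if !isPgauditFlag(f.Name) {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	source := []DatabaseFlag{
		{Name: "cloudsql.enable_pgaudit", Value: "on"},
		{Name: "pgaudit.log", Value: "write,ddl,role"},
		{Name: "pgaudit.log_parameter", Value: "on"},
		{Name: "max_connections", Value: "100"}, // example of an unrelated flag
	}
	target := stripPgauditFlags(source)
	for _, f := range target {
		fmt.Printf("%s=%s\n", f.Name, f.Value)
	}
}
```

Returning a filtered copy instead of editing in place keeps the source flags intact, which matters because a later commit in this PR relies on the source keeping its pgaudit flags until the extension has been dropped.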
The postgres user may not own the pgaudit extension in the app database (it's typically owned by the app user). Use ALTER EXTENSION OWNER TO postgres before DROP to ensure we can remove it regardless of owner. Also make the drop a hard error instead of a warning since DMS validates that extensions match between source and target.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In Cloud SQL, the postgres user is not a true superuser and cannot drop extensions owned by other users. The pgaudit extension in the app database is typically owned by the app user, so we must connect as the app user (who has the password from PrepareSourceDatabase) to drop it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pgaudit hooks on the source instance also interfere with DMS pglogical replication. Strip pgaudit flags from both source and target instances during PrepareSourceInstance.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'nais postgres enable-audit' command sets per-user database config:
    ALTER USER app_user IN DATABASE db SET pgaudit.log TO 'none'
This is included in the pg_dump and causes pg_restore to fail on the target with 'role does not exist' because the source app user (e.g. audit-test-src) doesn't exist on the target instance.
Reset this setting before dropping the extension so the DMS dump is completely clean of pgaudit references.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
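As a rough sketch of the cleanup order described in this commit (reset the per-user setting first, then drop the extension), the statements could be generated like this. The helper names quoteIdent and pgauditCleanupStatements are hypothetical and not taken from this PR, and the use of IF EXISTS on the DROP is an assumption made here so the step can be re-run safely. The statements are assumed to be executed while connected to the app database as the app user, as an earlier commit describes.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent quotes a PostgreSQL identifier and doubles embedded double
// quotes, so names containing hyphens (e.g. audit-test-src) are valid.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// pgauditCleanupStatements returns the SQL to run against the source
// database, in order: first reset the per-user pgaudit.log setting,
// then drop the extension. Hypothetical helper for illustration only.
func pgauditCleanupStatements(appUser, database string) []string {
	return []string{
		fmt.Sprintf("ALTER ROLE %s IN DATABASE %s RESET pgaudit.log",
			quoteIdent(appUser), quoteIdent(database)),
		// IF EXISTS is an assumption; the commit message does not specify it.
		"DROP EXTENSION IF EXISTS pgaudit",
	}
}

func main() {
	for _, stmt := range pgauditCleanupStatements("audit-test-src", "db") {
		fmt.Println(stmt + ";")
	}
}
```

Quoting the identifiers matters here because role names such as audit-test-src are not valid unquoted identifiers in PostgreSQL.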
Removing cloudsql.enable_pgaudit from the source in PrepareSourceInstance (step 11) causes the shared library to be unloaded. When step 12 then tries to connect to install pglogical, PostgreSQL fails because the pgaudit extension references a library that is no longer loaded. Step 12b (which would DROP the extension) never gets to run.
Fix: keep pgaudit flags on the source so the shared library stays loaded until step 12b can DROP EXTENSION pgaudit and reset per-user settings. The target already has pgaudit flags stripped (since commit d079777).
Tested end-to-end: setup + promote both complete successfully on a source instance with pgaudit enabled (including per-user pgaudit.log settings).
Background
@mortenlj confirmed in #240 that audit logging breaks DMS migration. All three test databases (in dev-nais-dev/basseng) failed with "Internal error" when the migration job started.
Root cause
DefineInstance() uses DeepCopy(), which copies all database flags from source to target, including:
- pgaudit.log=write,ddl,role
- pgaudit.log_parameter=on
These flags enable the pgaudit hooks on the target instance. When DMS sets up pglogical replication and runs its internal DDL/write operations, pgaudit logs them, which causes the "Internal error" and fails the migration job.
Solution
Same pattern as the HA/PITR fix in #294:
- CreateInstance(): strips pgaudit.log and pgaudit.log_parameter from the target instance spec
- PrepareTargetInstance(): strips the same flags at the CNRM level (safety net)
- ValidateSourceInstance(): logs a warning that pgaudit is temporarily disabled and that the user must re-run nais postgres enable-audit after the migration
Important: we keep cloudsql.enable_pgaudit=on on the target so that the pgaudit shared library stays loaded. This is required for CREATE EXTENSION pgaudit from the source database dump to work during the DMS restore.
After migration
Once the target has been promoted and the original app spec is deployed by naiserator, pgaudit.log and pgaudit.log_parameter come back automatically. The user must then re-run nais postgres enable-audit. This is already documented in nais-docs.
Verification
The test databases in dev-nais-dev/basseng from @mortenlj can be used to verify that the fix works.