
Stream backup archive directly to S3 to reduce ephemeral storage #4263

Draft
Copilot wants to merge 3 commits into master from copilot/reduce-ephemeral-storage-backup-job

Conversation

Contributor

Copilot AI commented Apr 10, 2026

Devin review: https://app.devin.ai/review/sillsdev/TheCombine/pull/4263

The daily backup job was consuming ~14 GB of node ephemeral storage because the tar.gz archive was written to local disk in full before being uploaded to S3, making peak usage roughly raw_files + archive_size.

Changes

  • aws_backup.py: Add push_stream(dest), which launches aws s3 cp - s3://... as a subprocess and returns the Popen handle so callers can stream data via stdin. Add delete(dest), which removes an S3 object (using check_results=False so a missing object does not raise an error).
  • combine_backup.py: Replace the local tarfile.open(file, "x:gz") + aws.push(file) sequence with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz"), piping the archive directly to S3 as it is built; the archive is never materialized on disk. On any failure during tar creation (Exception or SystemExit), the upload process is killed and the partial S3 object is deleted, with a warning logged if the cleanup deletion itself fails. The same delete-and-warn logic applies if the upload process exits with a non-zero return code.
```python
# Before: write full archive to disk, then upload
with tarfile.open(backup_file, "x:gz") as tar:
    tar.add(...)
aws.push(backup_file, aws_file)

# After: stream archive directly to S3, with cleanup on failure
upload_proc = aws.push_stream(aws_file)
try:
    with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz") as tar:
        tar.add(...)
except (Exception, SystemExit):
    upload_proc.kill()
    upload_proc.wait()
    aws.delete(aws_file)
    raise
```
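The two new AwsBackup helpers could look roughly like this. This is a sketch, not the PR's actual code: the method names push_stream and delete come from the PR description, but the bucket wiring, the _uri helper, and the subprocess details are assumptions.

```python
import subprocess


class AwsBackup:
    """Sketch of the S3 helper; constructor and bucket handling are assumed."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket

    def _uri(self, dest: str) -> str:
        return f"s3://{self.bucket}/{dest}"

    def push_stream(self, dest: str) -> subprocess.Popen:
        # "aws s3 cp - <uri>" reads the object body from stdin, so the
        # caller can stream the archive into the upload as it is built.
        return subprocess.Popen(
            ["aws", "s3", "cp", "-", self._uri(dest)], stdin=subprocess.PIPE
        )

    def delete(self, dest: str) -> None:
        # check=False plays the role of the PR's check_results=False:
        # deleting an object that was never created is not an error.
        subprocess.run(["aws", "s3", "rm", self._uri(dest)], check=False)
```

The key design point is that push_stream returns the Popen handle rather than blocking, so the caller owns both the stdin pipe (to feed the tar stream) and the process lifecycle (to kill it and clean up on failure).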

Peak ephemeral storage drops from roughly max(db_dump, backend_files) + archive_size to just max(db_dump, backend_files).



Copilot AI changed the title from "[WIP] Reduce ephemeral storage used by backup job" to "Stream backup archive directly to S3 to reduce ephemeral storage" on Apr 10, 2026
Copilot AI requested a review from imnasnainaec April 10, 2026 21:41
@imnasnainaec
Collaborator

@copilot

The streaming approach (push_stream) starts the S3 upload before the tarball is fully created. If any step fails mid-stream, a partial backup gets silently uploaded to S3. Two possible approaches to fix this:

Add cleanup on failure: In the except/finally handling, if an error occurs during tar creation, delete the partial S3 object. You could call aws s3 rm on the S3 URI after the upload process completes (or is killed). This would require adding a delete/rm method to AwsBackup and wrapping the streaming section in a try/except that catches both Exception and SystemExit.

Kill the upload process on failure: Instead of letting the aws process complete the partial upload, kill it (upload_proc.kill()) in the error path. For multipart uploads, AWS automatically cleans up incomplete multipart uploads (or an S3 lifecycle policy can do this). However, for small uploads that complete as a single PUT, the object might still appear.
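For reference, the lifecycle rule mentioned above could look like the following. This is an illustrative configuration, not part of the PR; the rule ID is a placeholder, and it would be applied with aws s3api put-bucket-lifecycle-configuration.

```json
{
  "Rules": [
    {
      "ID": "abort-stale-multipart-uploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
    }
  ]
}
```

This only covers the multipart case; as noted above, a small upload that completes as a single PUT still needs an explicit delete.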

The most robust approach is a combination: kill the upload process on failure AND then attempt to delete the S3 object. The cleanup code should be in a finally/except block that catches SystemExit (raised by sys.exit in run_cmd) in addition to regular exceptions.

Relevant files: combine_backup.py (main function around lines 97-130), aws_backup.py (push_stream method, would need a delete method).

Contributor Author

Copilot AI commented Apr 23, 2026

@copilot

The streaming approach (push_stream) starts the S3 upload before the tarball is fully created. If any step fails mid-stream, a partial backup gets silently uploaded to S3. Two possible ap...

Implemented the combined approach in c4d01a2:

  • Added AwsBackup.delete() in aws_backup.py (uses check_results=False so a missing object doesn't raise).
  • Wrapped the streaming section in except (Exception, SystemExit). On any failure, the upload process is killed (upload_proc.kill() + wait()), then aws.delete(aws_file) removes any partial S3 object, with a warning logged if the deletion itself fails. The original exception is then re-raised.
  • Also calls aws.delete + logs a warning if the upload process exits with a non-zero code after the tar finishes.



Development

Successfully merging this pull request may close these issues.

Reduce ephemeral storage used by backup job

2 participants