feat: implement cronjob to export data from snowflake to s3 POC [CM-945] #3833
+2,117
−1,155
This pull request introduces a new scheduled job to export data from Snowflake to S3 and significantly refactors the Snowflake client to support programmatic access token authentication in addition to key-pair authentication. It also updates dependencies to use the latest Snowflake SDK. The changes improve flexibility, security, and automation for Snowflake data exports.
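As a rough illustration of the dual authentication support described above, the sketch below models the two modes as a discriminated union and derives connection options from it. The type and function names (`SnowflakeAuth`, `SnowflakeConfig`, `buildConnectionOptions`) are hypothetical and not taken from this PR. The authenticator values `SNOWFLAKE_JWT` and `PROGRAMMATIC_ACCESS_TOKEN` reflect my understanding of the Snowflake Node.js driver options and should be checked against the driver documentation.

```typescript
// Hypothetical config shapes for illustration; the PR's actual interface may differ.
type SnowflakeAuth =
  | { kind: 'keyPair'; privateKeyPem: string }
  | { kind: 'token'; token: string }

interface SnowflakeConfig {
  account: string
  username: string
  database: string
  warehouse: string
  auth: SnowflakeAuth
}

// Builds the options object that would be passed to the SDK's createConnection().
// Only the auth-specific fields differ between the two modes.
function buildConnectionOptions(config: SnowflakeConfig): Record<string, string> {
  const base = {
    account: config.account,
    username: config.username,
    database: config.database,
    warehouse: config.warehouse,
  }
  if (config.auth.kind === 'keyPair') {
    // Key-pair authentication (assumed option names: authenticator + privateKey).
    return { ...base, authenticator: 'SNOWFLAKE_JWT', privateKey: config.auth.privateKeyPem }
  }
  // Programmatic access token (assumed option names: authenticator + token).
  return { ...base, authenticator: 'PROGRAMMATIC_ACCESS_TOKEN', token: config.auth.token }
}
```

Keeping the choice of mode in a single tagged field means a misconfigured client fails at type-check time rather than at connection time.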
Snowflake S3 Export Job:
- Added a new job (`snowflakeS3Export.job.ts`) that exports batches of data from Snowflake to S3 as Parquet files, supports incremental exports based on the last run timestamp, and tracks progress using Redis.
Snowflake Client Refactor:
- Refactored `SnowflakeClient` to support both programmatic access token and key-pair authentication, using a new configuration interface and dynamic connection options.
- Added a `fromToken` method to `SnowflakeClient` for convenient instantiation using environment variables for token-based authentication.
Dependency Updates:
- Updated the `snowflake-sdk` dependency from version `^1.14.0` to `^2.3.3` to ensure compatibility with new features and authentication methods.
- Added `@crowd/snowflake` as a dependency to the cron service to enable use of the new client and job.
Note
Medium Risk
Touches Snowflake authentication/connection setup and introduces an automated data export pipeline that writes to S3 and emits to Kafka, so misconfiguration could impact data movement or job stability.
Overview
Adds a new `snowflake-s3-export` cron job that incrementally exports `EVENT_REGISTRATIONS` from Snowflake to S3 as Parquet via `COPY INTO`, persists the last-run timestamp/manifest state in Redis, then reads the exported files back from S3 and emits mapped activities to Kafka.
Refactors `SnowflakeClient` to accept either key-pair or programmatic access token (PAT) authentication (including a new `fromToken` helper), and bumps `snowflake-sdk` to `^2.3.3` with `@crowd/snowflake` added to the cron service dependencies.
Written by Cursor Bugbot for commit b703935. This will update automatically on new commits.
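The incremental export described in the overview might build its `COPY INTO` statement along these lines. This is a minimal sketch, not the job's actual code: the `ExportRequest` shape, the `buildCopyIntoSql` helper, the `UPDATED_TS` column, and the `@export_stage` path used in the usage note are all assumptions made for illustration. It also assumes the export target is a named external stage pointing at the S3 bucket.

```typescript
// Hypothetical request shape; none of these names come from the PR.
interface ExportRequest {
  table: string
  timestampColumn: string
  stagePath: string // assumed: an external stage path backed by S3
  since: Date | null // last successful run, or null for a full export
  until: Date // upper bound, captured once at the start of this run
}

// Builds a COPY INTO <location> statement that unloads only the rows
// changed in the (since, until] window as Parquet files.
function buildCopyIntoSql(req: ExportRequest): string {
  const conditions = [`${req.timestampColumn} <= '${req.until.toISOString()}'`]
  if (req.since !== null) {
    // Strict lower bound so rows exported by the previous run are not repeated.
    conditions.unshift(`${req.timestampColumn} > '${req.since.toISOString()}'`)
  }
  return [
    `COPY INTO ${req.stagePath}`,
    `FROM (SELECT * FROM ${req.table} WHERE ${conditions.join(' AND ')})`,
    `FILE_FORMAT = (TYPE = PARQUET)`,
    `HEADER = TRUE`,
  ].join('\n')
}
```

Under these assumptions, a run would call `buildCopyIntoSql({ table: 'EVENT_REGISTRATIONS', timestampColumn: 'UPDATED_TS', stagePath: '@export_stage/event_registrations/', since, until })` and, only after the statement succeeds, persist `until` as the new last-run timestamp. Fixing `until` up front keeps consecutive windows contiguous even if the export itself takes a while.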