🐛 [firestore-bigquery-export] String partition values silently become NULL after 0.3.0 (regression vs 0.2.x) #2803

@kokky

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.3.1 (regression appears to have been introduced in 0.3.0 / change-tracker 2.x; see analysis below)
  • Configuration values (redacted):
    • COLLECTION_PATH: myCollection
    • TIME_PARTITIONING: DAY
    • TIME_PARTITIONING_FIELD: order_week
    • TIME_PARTITIONING_FIELD_TYPE: DATE
    • TIME_PARTITIONING_FIRESTORE_FIELD: order_week

The Firestore field order_week holds an ISO 8601 date string (e.g. "2026-01-01"), and the BigQuery column type is DATE. This worked correctly under 0.2.x because BigQuery streaming insert implicitly casts ISO 8601 date strings to DATE.

[REQUIRED] Step 3: Describe the problem

After upgrading from 0.2.x to 0.3.1, the partition column is written as NULL for every row whose Firestore field is a string (including valid ISO 8601 date strings such as "2026-01-01"). All such rows now end up in the __NULL__ partition, which breaks partition pruning and existing analytics queries.

The change is not mentioned in the CHANGELOG:

  • 0.2.11: only documents chore: bump firestore-bigquery-change-tracker dependency to v2
  • 0.3.0: only documents partitioning config-validation changes and NONE / omit sentinel normalization

The CHANGELOG gives no hint that the runtime contract for partition field values has narrowed.

Steps to reproduce

  1. Create a Firestore collection where each document has a string ISO 8601 date field, e.g. { order_week: "2026-01-01" }
  2. Install firestore-bigquery-export 0.3.1 with:
    • TIME_PARTITIONING=DAY
    • TIME_PARTITIONING_FIELD=order_week
    • TIME_PARTITIONING_FIELD_TYPE=DATE
    • TIME_PARTITIONING_FIRESTORE_FIELD=order_week
  3. Write a document to the collection
  4. Inspect the row in <table>_raw_changelog

Expected result

Top-level order_week column equals 2026-01-01; the row lives in the 2026-01-01 DATE partition. This matches 0.2.x behaviour and is what BigQuery streaming insert supports natively.

Actual result

Top-level order_week column is NULL; the row lives in the __NULL__ partition. The data JSON column still contains the original order_week string, confirming the value was not lost upstream; only the partition column extraction drops it.

SELECT
  document_id,
  order_week,                                     -- top-level partition column: NULL
  JSON_VALUE(data, '$.order_week') AS data_order_week  -- still present
FROM `<project>.<dataset>.<table>_raw_changelog`
WHERE order_week IS NULL
ORDER BY timestamp DESC
LIMIT 5;

Possible cause (from reading the source)

In 0.2.x, Partitioning.getPartitionValue() appears to accept strings as-is via isValidPartitionTypeString, leaving the cast to BigQuery on streaming insert:

    /* Return as Datetime value */
    if (timePartitioningFieldType === PartitionFieldType.DATETIME) {
      return BigQuery.datetime(fieldValue.toISOString()).value;
    }
    /* Return as Date value */
    if (timePartitioningFieldType === PartitionFieldType.DATE) {
      return BigQuery.date(fieldValue.toISOString().substring(0, 10)).value;
    }
    /* Return as Timestamp */
    return BigQuery.timestamp(fieldValue).value;
  }
  /*
    Extracts a valid Partition field from the Document Change Event.
    Matches result based on a pre-defined Firestore field matching the event data object.
    Return an empty object if no field name or value provided.
    Returns empty object if not a string or timestamp (or result of serializing a timestamp)
    Logs warning if not a valid datatype
    Delete changes events have no data, return early as cannot partition on empty data.
  **/
  getPartitionValue(event: FirestoreDocumentChangeEvent) {
    // When old data is disabled and the operation is delete
    // the data and old data will be null
    if (event.data == null && event.oldData == null) return {};
    const firestoreFieldName = this.config.timePartitioningFirestoreField;
    const fieldName = this.config.timePartitioningField;
    const fieldValue =
      event.operation === ChangeType.DELETE
        ? event.oldData[firestoreFieldName]
        : event.data[firestoreFieldName];
    if (!fieldName || !fieldValue) {
      return {};
    }
    if (this.isValidPartitionTypeString(fieldValue)) {
      return { [fieldName]: fieldValue };
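
To make the 0.2.x pass-through concrete, here is a minimal TypeScript sketch of that style of handling. It is not the extension's source: looksLikeIsoDate and partitionValueV2Style are hypothetical names, and the YYYY-MM-DD check is only an assumption about what a "valid" string means; the real isValidPartitionTypeString may test something different.

```typescript
// Hypothetical model of a 0.2.x-style string pass-through.
// None of these names come from the extension.

// Assumption: a valid DATE string is a plain calendar date, YYYY-MM-DD.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function looksLikeIsoDate(value: unknown): value is string {
  if (typeof value !== "string" || !ISO_DATE.test(value)) return false;
  // Reject impossible dates such as 2026-02-30 by round-tripping through Date.
  const parsed = new Date(`${value}T00:00:00Z`);
  return (
    !Number.isNaN(parsed.getTime()) &&
    parsed.toISOString().slice(0, 10) === value
  );
}

// Returns the fragment merged into the row. The string is forwarded unchanged,
// leaving the cast to DATE to BigQuery at insert time.
function partitionValueV2Style(
  fieldName: string,
  fieldValue: unknown
): Record<string, string> {
  return looksLikeIsoDate(fieldValue) ? { [fieldName]: fieldValue } : {};
}

console.log(partitionValueV2Style("order_week", "2026-01-01"));
console.log(partitionValueV2Style("order_week", "not-a-date"));
```

Under this model a well-formed string yields a populated order_week key, while anything else yields an empty fragment.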

In 0.3.x, after the partitioning refactor (#2447), PartitionValueConverter.convert() seems to only accept firebase.firestore.Timestamp, { _seconds, _nanoseconds }, or Date, and to return null for any other type, which would include ISO 8601 date strings:

https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/firestore-bigquery-change-tracker/src/bigquery/partitioning/converter.ts
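
The following TypeScript sketch models the converter behaviour as I understand it from reading the source. It is a simplified stand-in, not a copy of converter.ts: convertStrict, toDate, hasToDate and isSerializedTimestamp are hypothetical names, and the output formatting is an assumption. The point is only that a string never matches any accepted shape.

```typescript
// Hypothetical model of a converter that only accepts timestamp-like inputs.
// The real PartitionValueConverter may differ in names and edge cases.

type PartitionFieldType = "TIMESTAMP" | "DATETIME" | "DATE";

interface SerializedTimestamp {
  _seconds: number;
  _nanoseconds: number;
}

function isSerializedTimestamp(v: unknown): v is SerializedTimestamp {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as SerializedTimestamp)._seconds === "number" &&
    typeof (v as SerializedTimestamp)._nanoseconds === "number"
  );
}

// Timestamp-like: any object exposing toDate(), as Firestore's Timestamp does.
function hasToDate(v: unknown): v is { toDate(): Date } {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as { toDate?: unknown }).toDate === "function"
  );
}

function toDate(value: unknown): Date | null {
  if (value instanceof Date) return value;
  if (hasToDate(value)) return value.toDate();
  if (isSerializedTimestamp(value)) {
    return new Date(value._seconds * 1000 + value._nanoseconds / 1e6);
  }
  return null; // strings, numbers and everything else end up here
}

function convertStrict(
  value: unknown,
  type: PartitionFieldType
): string | null {
  const date = toDate(value);
  if (date === null) return null;
  const iso = date.toISOString();
  if (type === "DATE") return iso.slice(0, 10);
  if (type === "DATETIME") return iso.slice(0, -1); // drop the trailing "Z"
  return iso;
}

console.log(convertStrict("2026-01-01", "DATE")); // null
console.log(convertStrict(new Date("2026-01-01T00:00:00Z"), "DATE")); // 2026-01-01
```

In this model the string "2026-01-01" is rejected even though it is already in the exact form the DATE branch would produce.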

getPartitionValue then omits the column, which would explain the NULL we observe in BigQuery:

https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/firestore-bigquery-change-tracker/src/bigquery/partitioning/index.ts
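
As a rough illustration of why an omitted column shows up as NULL, the sketch below builds an insert row from a conversion result. buildRow and the Row type are placeholders of mine, not identifiers from index.ts, and the row shape is simplified.

```typescript
// Hypothetical illustration of how a null conversion result becomes a
// missing column. Names are placeholders, not the extension's identifiers.

type Row = Record<string, unknown>;

function buildRow(
  base: Row,
  partitionField: string,
  converted: string | null
): Row {
  // A null result contributes no key at all, so the insert payload simply
  // lacks the partition column.
  const partition = converted === null ? {} : { [partitionField]: converted };
  return { ...base, ...partition };
}

const base: Row = {
  document_id: "doc1",
  data: JSON.stringify({ order_week: "2026-01-01" }),
};

// String input rejected upstream: conversion result is null.
const rowFromString = buildRow(base, "order_week", null);
// Timestamp input converted upstream: conversion result is a DATE string.
const rowFromDate = buildRow(base, "order_week", "2026-01-01");

console.log("order_week" in rowFromString); // false
console.log(rowFromDate.order_week); // 2026-01-01
```

BigQuery stores NULL for a nullable column that is absent from an inserted row, and rows with NULL in the partitioning column are placed in the __NULL__ partition, which matches what the query above returns. The data column still carries the original string in both cases.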

Labels: type: bug (Something isn't working)
