Fix pandas bug by QuanMPhm · Pull Request #290 · nerc-project/coldfront-plugin-cloud

QuanMPhm · 2026-01-28T17:19:20Z

While the cause is not entirely clear, it seems the recent pandas release (3.0.0)
changed how read_csv() casts to pyarrow data types, causing an error.
Specifying the pyarrow engine seems to fix the issue.

src/coldfront_plugin_cloud/management/commands/fetch_daily_billable_usage.py

knikolla

I believe the right way to go about this is to explicitly specify the engine.

The error seems to point to an issue with the default engine (`c`), and switching to `pyarrow` fixes it.

df = pandas.read_csv(
    location,
    engine="pyarrow",
    dtype={INVOICE_COLUMN_COST: pandas.ArrowDtype(pyarrow.decimal128(12, 2))},
)

QuanMPhm · 2026-02-11T21:31:20Z

@knikolla I see. That makes sense. I'll make the change too on the invoicing code later today. Out of curiosity, how did you arrive at this solution? I didn't realize engine was an option, or solution to this, at least from the googling I did. The error stack trace referred to deep pandas internals that I didn't look closely into.

knikolla · 2026-02-11T21:48:54Z

> @knikolla I see. That makes sense. I'll make the change too on the invoicing code later today. Out of curiosity, how did you arrive at this solution? I didn't realize engine was an option, or solution to this, at least from the googling I did. The error stack trace referred to deep pandas internals that I didn't look closely into.

@QuanMPhm For obscure internal errors that require digging into documentation, I find something like Gemini to be pretty helpful (50% of the time). I pasted the stack trace and it gave me the following information.

Then I verified that by reading the pandas docs with regards to the engine option and tested the code myself.

The error pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16) is a bit of a "low-level" protest from Arrow.

In short: the pandas C engine (the default) is trying to pass data to the Arrow dtype, but they aren't speaking the same language. The C engine processes the CSV as strings/bytes, and when it hands those bytes to the Arrow decimal handler, the internal byte-length doesn't match what a decimal128 expects.

To fix this, you need to tell pandas to use the Arrow engine for the entire reading process, not just for the final data type.

The Fix: Switch the Engine
Add engine="pyarrow" to your read_csv call. This ensures that PyArrow handles the parsing from the very first byte.

knikolla · 2026-02-12T14:54:39Z

Going to trial having Copilot help with code reviews.

Copilot

Pull request overview

Adjusts invoice CSV ingestion in the daily billable usage management command to avoid a pandas 3.0 dtype-casting regression when reading cost values.

Changes:

Forces pandas.read_csv to use the pyarrow engine for invoice CSV parsing.

src/coldfront_plugin_cloud/management/commands/fetch_daily_billable_usage.py

As mentioned by Kristi [1], the better solution to the pandas read_csv bug is to specify the engine as "pyarrow", rather than keeping the loading and casting steps separate. [1] nerc-project/coldfront-plugin-cloud#290 (review)

QuanMPhm · 2026-02-12T15:02:55Z

Interesting

QuanMPhm · 2026-02-12T15:04:42Z

@knikolla I'll let you resolve all the comments and merge.

While the cause is not entirely clear, it seems the recent pandas release (3.0.0) changed how `read_csv()` casts to pyarrow data types, causing an error. Specifying the `pyarrow` engine seems to fix the issue. Pinned pandas version to >=3.0, <4.0.

QuanMPhm requested review from jtriley, knikolla and naved001 January 28, 2026 17:19

QuanMPhm force-pushed the fix/pandas.3 branch 3 times, most recently from 61bd579 to 7054761 Compare January 28, 2026 18:02

knikolla approved these changes Feb 5, 2026

knikolla reviewed Feb 11, 2026

src/coldfront_plugin_cloud/management/commands/fetch_daily_billable_usage.py (outdated)

knikolla requested changes Feb 11, 2026

QuanMPhm force-pushed the fix/pandas.3 branch from 7054761 to 326dd45 Compare February 12, 2026 14:47

knikolla requested a review from Copilot February 12, 2026 14:54

Copilot started reviewing on behalf of knikolla February 12, 2026 14:54

knikolla approved these changes Feb 12, 2026

Copilot AI reviewed Feb 12, 2026

QuanMPhm mentioned this pull request Feb 12, 2026

Fixed Pandas read_csv bug by specifying engine CCI-MOC/invoicing#262

Open

Fix compatibility issues with pandas 3

b508c66

While the cause is not entirely clear, it seems the recent pandas release (3.0.0) changed how `read_csv()` casts to pyarrow data types, causing an error. Specifying the `pyarrow` engine seems to fix the issue. Pinned pandas version to >=3.0, <4.0.

QuanMPhm force-pushed the fix/pandas.3 branch from 326dd45 to b508c66 Compare February 12, 2026 15:19

knikolla merged commit aa758db into nerc-project:main Feb 12, 2026
4 of 7 checks passed
