[SYNPY-1749] Allow quote, apostrophe and ellipsis in store_row_async #1316
Conversation
@danlu1 Is this still WIP, or are you looking for reviews?
@andrewelamb sorry, I should have marked this as a draft.
…ctly when upload data from a dataframe
…ger output json string
The integration test failures are in the recordset and submission modules and do not appear to be related to my changes.
linglp left a comment
I think overall it looks good. The tests can be consolidated a bit to cover all the edge cases in fewer integration tests, which would improve performance, and the docstring should be updated to reflect the new state of the code since json.dumps() was removed. There is also some redundant logic that can be simplified: sample_values is created but never actually used. The function name could also be more descriptive of what it actually does now (sanitizing special values rather than just converting dtypes).
tests/integration/synapseclient/models/async/test_table_async.py
tests/integration/synapseclient/models/synchronous/test_table.py
BryanFauble left a comment
This is looking great, once we get the last few items handled (and develop merged in), I can approve!
BryanFauble left a comment
Nice work on this fix -- the backslash-escaping approach for embedded quotes makes sense, and the Ellipsis/pd.NA handling is solid. I flagged one issue that I think needs to be addressed before merge, plus a few cleanup nits.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
tests/integration/synapseclient/models/async/test_table_async.py
Hi @danlu1 ! Thanks for your hard work. I think I found two bugs after a few rounds of testing myself:
Bug 1: Nested np.nan not handled
df = pd.DataFrame({"val": [[1.0, np.nan], [2.0, 3.0]]})
The nested np.nan would currently pass through to JSON serialization, which could cause issues since np.nan is not valid JSON.
This happens because convert_dtypes() only converts top-level np.nan to pd.NA; nested np.nan remains unchanged. And even though your code has:
if obj is pd.NA:
But this won't handle np.nan. The fix would just be:
def _reformat_special_values(obj):
if pd.isna(obj): # Catches pd.NA, np.nan, and None
return None
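The reviewer's point can be verified directly: pd.isna catches all three scalar missing markers, while an identity check against pd.NA misses the other two. A minimal check:

```python
import numpy as np
import pandas as pd

# pd.isna covers every scalar "missing" marker pandas may produce
for missing in (None, np.nan, pd.NA):
    assert pd.isna(missing)

# an identity check against pd.NA misses np.nan and None
assert np.nan is not pd.NA
assert None is not pd.NA
```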
Bug 2: Top-level missing values not handled
def _serialize_json_value(x):
if isinstance(x, (list, dict)):
# pd.NA handling is only here, inside _reformat_special_values
...
if x is ...:
return "..."
return x # <-- Top-level pd.NA, np.nan, None just pass through
Top-level pd.NA (or np.nan) isn't converted to None. It only works now because .replace({pd.NA: None}) is called later.
df[col] = (
    df[col].replace({pd.NA: None}).astype(object)
)  # this will convert the int64 and float64 columns to object columns
sample_values = df[col].dropna()
The dropna() + len() check is probably unnecessary.
Since apply() works fine on empty series and the function handles nulls, we could just apply the function directly:
for col in df.columns:
df[col] = df[col].apply(_serialize_json_value)
I tried this change and all the unit tests still pass.
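The claim that apply() works fine on an empty Series can be checked in isolation (a minimal sketch, not the PR's actual code):

```python
import pandas as pd

# applying a function to an empty Series is a no-op, so no length guard is needed
empty = pd.Series([], dtype=object)
result = empty.apply(lambda v: v)
assert result.empty
```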
df[col] = df[col].apply(_serialize_json_value)
# restore the original values of the column especially for the int64 and float64 columns since apply function changes the dtype
df[col] = df[col].convert_dtypes()
df[col] = df[col].replace({pd.NA: None}).astype(object)
the .replace({pd.NA: None}) is doing two things:
- Catching top-level pd.NA that _serialize_json_value missed
- Converting any pd.NA reintroduced by convert_dtypes()
I think it might be cleaner to have _serialize_json_value handle top-level NaN like this:
def _serialize_json_value(x):
if pd.isna(x): # Handle top-level NA
return None
As an example, the pd.NA here in the dataframe is handled by df[col].replace({pd.NA: None}) rather than _serialize_json_value.
df = pd.DataFrame({
"name": ["Alice", pd.NA, "Charlie"],
"age": [25, 30, pd.NA]
})
df = convert_dtypes_to_json_serializable(df)
I am not sure if this is intended.
        }
    ],
}).convert_dtypes()
Do we still need to call .convert_dtypes() here, since convert_dtypes_to_json_serializable already calls convert_dtypes() in one of its steps?
def _reformat_special_values(obj):
    if obj is ...:
        return "..."
    if obj is pd.NA:
I tried:
df = pd.DataFrame({"val": [[1.0, np.nan], [3.0, pd.NA], [None, None]]})
convert_dtypes_to_json_serializable(df)
print(df)
and then I saw:
val
0 [1.0, nan]
1 [3.0, None]
2 [None, None]
As you can see, np.nan remains as nan. Should this be changed to if pd.isna(obj) to catch all the different representations of missing values, including pd.NA, np.nan, and None?
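The behavior described above can be reproduced in isolation: convert_dtypes only promotes top-level scalars, so a raw np.nan nested inside a list cell is left untouched (a minimal sketch with made-up data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [[1.0, np.nan], [3.0, 2.0]]})
df = df.convert_dtypes()

# the np.nan inside the first list cell is still a raw float nan
nested = df["val"].iloc[0]
assert any(isinstance(v, float) and np.isnan(v) for v in nested)
```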
Problem:
A JSON serialization issue occurs when a DataFrame passed to store_row_async contains a list or dictionary with strings that include both double quotes and apostrophes.
Solution:
"escapechar": "\\"toto_csv_kwargsinstore_rows_asyncto_csv_kwargsto_stream_and_update_from_dfso it can take the passedto_csv_kwargsvalues for downstream data processing.Testing:
Unit test and integration test have been added.