@CristianLara
Contributor

Pandas 3.0 introduces several breaking changes (https://pandas.pydata.org/docs/whatsnew/v3.0.0.html#other-api-changes) that required fixes across the codebase:

  1. StringDtype inference: Pandas 3.0 infers string columns as the new str dtype (a StringDtype) instead of object dtype. This breaks DataFrame comparisons because our Data class expects object dtype (defined in COLUMN_DATA_TYPES). Fixed by setting pd.options.future.infer_string = False in modules that construct DataFrames.

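  2. Chained inplace replace(): With Copy-on-Write enabled by default in pandas 3.0, calling replace(..., inplace=True) on a column selected from a DataFrame no longer updates the parent DataFrame. Changed to the assignment pattern: df["col"] = df["col"].replace(...).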

  3. read_json() no longer accepts strings: pd.read_json() now requires file paths or file-like objects, not raw JSON strings. Wrapped JSON strings with StringIO().

  4. Read-only arrays from DataFrame.to_numpy(): Arrays returned by to_numpy() can now be read-only views of the DataFrame's data. Added .copy() before passing them to torch.from_numpy(), which expects writable arrays.

  5. Test dtype mismatches: Tests comparing DataFrames failed because manually constructed expected DataFrames had different dtypes than production code. Fixed by wrapping expected DataFrames with Data(df=...).df to ensure consistent dtype casting.

@meta-cla (bot) added the "CLA Signed" label on Jan 29, 2026
@meta-codesync

meta-codesync bot commented Jan 29, 2026

@CristianLara has imported this pull request. If you are a Meta employee, you can view this in D91825190.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.74%. Comparing base (5190f4b) to head (2275a23).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4838      +/-   ##
==========================================
- Coverage   96.74%   96.74%   -0.01%     
==========================================
  Files         588      588              
  Lines       61526    61532       +6     
==========================================
+ Hits        59524    59527       +3     
- Misses       2002     2005       +3     

