@alphaleporus

What does this PR implement/fix?
This PR modifies attributes_arff_from_df to make it more robust when handling pandas DataFrames. Previously, the function raised a ValueError as soon as it encountered a categorical column whose categories were not strings (e.g., integer-encoded categories). It now attempts to convert those categories to strings automatically.
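The conversion described above can be sketched with plain pandas. This is an illustration only, not the code in this PR: stringify_categories is a hypothetical helper name, and the actual change inside attributes_arff_from_df may be structured differently.

```python
import pandas as pd


def stringify_categories(series: pd.Series) -> pd.Series:
    """Hypothetical helper: return the series with all categories as strings."""
    if not isinstance(series.dtype, pd.CategoricalDtype):
        raise TypeError("expected a categorical series")
    if all(isinstance(c, str) for c in series.cat.categories):
        return series  # categories are already strings, nothing to do
    # Passing a callable maps every category through str(); the underlying
    # codes are untouched, so each row keeps its category.
    return series.cat.rename_categories(str)


s = pd.Series([0, 1, 0]).astype("category")
converted = stringify_categories(s)
print(list(converted.cat.categories))
```

One caveat with this approach: rename_categories requires the new categories to be unique, so a column that mixes 1 and "1" would still fail and would need separate handling.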

Why is this change necessary?
Currently, attributes_arff_from_df raises a ValueError when a user provides a DataFrame whose data is valid but whose categories are integers (e.g., [0, 1]). This forces users to cast the categories to strings manually before calling the function. With this change, the conversion happens inside the function, so callers no longer need that extra step.
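For context, the manual cast that users need today might look like the snippet below. It is one possible workaround, assuming the column holds integer-encoded categories; other spellings are equally valid.

```python
import pandas as pd

df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")  # categories are the integers 0 and 1

# Workaround: convert the values to strings first, then rebuild the
# categorical so that its categories are "0" and "1".
df["target"] = df["target"].astype(str).astype("category")

print(list(df["target"].cat.categories))
```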

How can I reproduce the issue?
Create a DataFrame with integer categories and pass it to attributes_arff_from_df.

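import pandas as pd
# ds_funcs refers to the module that defines attributes_arff_from_df (its import is not shown here)

df = pd.DataFrame({"target": [0, 1]})
df["target"] = df["target"].astype("category")  # categories are the integers 0 and 1
# Before this PR: raises ValueError
# After this PR: converts the categories to strings and succeeds
ds_funcs.attributes_arff_from_df(df)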

Development

Successfully merging this pull request may close these issues.

[ENH] Automatically convert non-string categorical data in attributes_arff_from_df
