[ENH] Auto-convert categorical columns to string in attributes_arff_from_df #1490
Metadata
attributes_arff_from_df
Details
What does this PR implement/fix?
This PR modifies attributes_arff_from_df to improve robustness when handling pandas DataFrames. Instead of immediately raising a ValueError when it encounters a categorical column with non-string values (e.g., integer-encoded categories), it now attempts to convert the categories to strings automatically.
Why is this change necessary?
Currently, the function raises a ValueError if a user provides a DataFrame with valid data but integer-based categories (e.g., [0, 1]). This forces users to manually cast categories to strings before calling the function. This change improves the user experience by handling the conversion internally.
How can I reproduce the issue?
Create a DataFrame with integer categories and pass it to attributes_arff_from_df.
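A minimal sketch of the reproduction input, using only pandas. The helper stringify_categories is hypothetical and is not the code from this PR; it only illustrates the kind of cast users currently have to do by hand, and which the PR aims to perform automatically. The call to attributes_arff_from_df is left as a comment because its import path is not stated in this description.

```python
import pandas as pd

# A DataFrame whose categorical column has integer categories (0 and 1).
df = pd.DataFrame({"label": pd.Series([0, 1, 0, 1], dtype="category")})


def stringify_categories(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy in which every categorical
    column has its categories cast to str. Other columns are untouched."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            new_categories = [str(c) for c in col.cat.categories]
            out[name] = col.cat.rename_categories(new_categories)
    return out


converted = stringify_categories(df)

# Integer categories before the cast, string categories after it.
print(list(df["label"].cat.categories))
print(list(converted["label"].cat.categories))

# Passing `df` (integer categories) is the case described above:
# attributes_arff_from_df(df)
```

Renaming the categories, rather than casting the whole column with astype(str), keeps the column categorical, so only the category labels change.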