You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Oct 19, 2025. It is now read-only.
This would help for defining the sequence of variables to impute or synthesize. Something like this would fit well in other functions:
def most_predictable(df, base_cols, candidate_cols, algorithm):
""" Identifies the most predictable column from a set of base columns.
Args:
df: DataFrame with base and candidate columns.
base_cols: List of column names to predict from.
candidate_cols: List of column names to compare on predictability given base_cols.
algorithm: Algorithm for determining predictability.
Returns:
Column name from candidate_cols which is most predictable from base_cols.
"""
This could be done with something like correlations, or algorithms like random forests (after standardizing data, and the standardization technique might be another arg).
cc @rickecon, per our chat if you can take a stab at this that'd be awesome.