Function to identify variable that can be best predicted from a set of base variables

This would help define the order in which variables are imputed or synthesized. Something like this could be called from other functions:
```python
def most_predictable(df, base_cols, candidate_cols, algorithm):
    """Identifies the most predictable column from a set of base columns.

    Args:
        df: DataFrame with base and candidate columns.
        base_cols: List of column names to predict from.
        candidate_cols: List of column names to compare on predictability given base_cols.
        algorithm: Algorithm for determining predictability.

    Returns:
        Column name from candidate_cols which is most predictable from base_cols.
    """
```

Predictability could be measured with something simple like correlations, or with a model such as a random forest (after standardizing the data; the standardization technique could be another arg).

cc @rickecon, per our chat if you can take a stab at this that'd be awesome.

Function to identify variable that can be best predicted from a set of base variables #38

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Function to identify variable that can be best predicted from a set of base variables #38

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions