Query: train data shape and integrate extra datasets #3

@peterthorpe5

Description

Hello Anomaly Detection Team,
I’m currently testing the workflow using a single plate, following the instructions and your paper. However, I foresee some issues and would appreciate your advice:

  1. Very few training rows: After splitting the controls (train/val/test), I have only 16 control rows in total, with just 6 for training. This seems low - do you expect reasonable results with so few controls? What is a typical shape of your data here? Mine is currently:

    INFO: Train controls: (6, 391), Validation controls: (1, 391), Test controls: (9, 391), Treatments: (368, 391)

  2. Shape consistency: I assume the number of columns (features) must be identical between training and inference (as in CLIPn/PyTorch workflows), correct? Otherwise PyTorch complains?

  3. Multi-plate datasets: If I want to use more than one plate, what is your suggested approach? Should I standardise features on each plate with StandardScaler before concatenating, to reduce batch effects? That would let me integrate more data for training/validation, and then I could include other reference datasets … Is this how you approach it?

  4. Any guidance or best practices for combining multiple plates and increasing training data would be really helpful.
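To make point 2 concrete, here is a minimal sketch of the kind of column-consistency check I mean: verifying that the training and inference matrices carry the same feature columns before handing them to a PyTorch model. The function and column names are my own illustration, not part of your codebase.

```python
# Hypothetical check: training and inference feature matrices must share
# the same columns (a PyTorch model expects a fixed input width).
import pandas as pd

def check_feature_alignment(train: pd.DataFrame, infer: pd.DataFrame) -> None:
    """Raise if the inference frame is missing or adds feature columns."""
    missing = set(train.columns) - set(infer.columns)
    extra = set(infer.columns) - set(train.columns)
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")

# Toy frames with matching columns pass silently.
train = pd.DataFrame({"f1": [0.1], "f2": [0.2]})
infer = pd.DataFrame({"f1": [0.3], "f2": [0.4]})
check_feature_alignment(train, infer)
```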
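And for point 3, this is the per-plate standardisation I have in mind, as a rough sketch: z-score each feature within each plate with StandardScaler, then concatenate. The `plate` column name and toy data are assumptions for illustration; I would be glad to hear whether this matches your approach.

```python
# Sketch (my assumption, not the team's method): scale each plate
# independently before concatenating, to reduce plate-level batch effects.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_per_plate(df: pd.DataFrame, plate_col: str = "plate") -> pd.DataFrame:
    """Z-score every feature column within each plate, then recombine."""
    feature_cols = [c for c in df.columns if c != plate_col]
    scaled_parts = []
    for _, part in df.groupby(plate_col):
        part = part.copy()
        # fit a fresh scaler per plate so each plate is centred on itself
        part[feature_cols] = StandardScaler().fit_transform(part[feature_cols])
        scaled_parts.append(part)
    return pd.concat(scaled_parts).sort_index()

# Toy example: two plates whose feature values sit at different offsets.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "plate": ["A"] * 5 + ["B"] * 5,
    "f1": np.concatenate([rng.normal(0, 1, 5), rng.normal(10, 1, 5)]),
})
scaled = scale_per_plate(df)
# after scaling, each plate's features have ~zero mean and unit variance
```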
    Thanks for your time!

Peter Thorpe
