Query: train data shape and integrate extra datasets #3

@peterthorpe5

Description

Hello Anomaly Detection Team,
I’m currently testing the workflow using a single plate, following the instructions and your paper. However, I foresee some issues and would appreciate your advice:

  1. Very few training rows: After splitting the controls (train/val/test), I have only 16 control rows in total, with just 6 for training. This seems low - do you expect reasonable results with so few controls? What is a typical shape of your data here? Mine is currently:

    INFO: Train controls: (6, 391), Validation controls: (1, 391), Test controls: (9, 391), Treatments: (368, 391)

  2. Shape consistency: I assume the number of columns (features) must be identical between training and inference (as in CLIPn/PyTorch workflows), correct? Otherwise PyTorch complains?

  3. Multi-plate datasets: If I want to use more than one plate, what is your suggested approach? Should I standardise features on each plate with StandardScaler before concatenating, to reduce batch effects? That would let me integrate more data for training/validation, and then I could include other reference datasets … Is this how you approach it?

  4. Any guidance or best practices for combining multiple plates and increasing training data would be really helpful.
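To make point 2 concrete, here is a minimal sketch of the kind of column-consistency check I mean: verifying that the training and inference matrices carry the same feature columns before handing them to a PyTorch model. The function and column names are my own illustration, not part of your codebase.

```python
# Hypothetical check: training and inference feature matrices must share
# the same columns (a PyTorch model expects a fixed input width).
import pandas as pd

def check_feature_alignment(train: pd.DataFrame, infer: pd.DataFrame) -> None:
    """Raise if the inference frame is missing or adds feature columns."""
    missing = set(train.columns) - set(infer.columns)
    extra = set(infer.columns) - set(train.columns)
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")

# Toy frames with matching columns pass silently.
train = pd.DataFrame({"f1": [0.1], "f2": [0.2]})
infer = pd.DataFrame({"f1": [0.3], "f2": [0.4]})
check_feature_alignment(train, infer)
```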
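And for point 3, this is the per-plate standardisation I have in mind, as a rough sketch: z-score each feature within each plate with StandardScaler, then concatenate. The `plate` column name and toy data are assumptions for illustration; I would be glad to hear whether this matches your approach.

```python
# Sketch (my assumption, not the team's method): scale each plate
# independently before concatenating, to reduce plate-level batch effects.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_per_plate(df: pd.DataFrame, plate_col: str = "plate") -> pd.DataFrame:
    """Z-score every feature column within each plate, then recombine."""
    feature_cols = [c for c in df.columns if c != plate_col]
    scaled_parts = []
    for _, part in df.groupby(plate_col):
        part = part.copy()
        # fit a fresh scaler per plate so each plate is centred on itself
        part[feature_cols] = StandardScaler().fit_transform(part[feature_cols])
        scaled_parts.append(part)
    return pd.concat(scaled_parts).sort_index()

# Toy example: two plates whose feature values sit at different offsets.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "plate": ["A"] * 5 + ["B"] * 5,
    "f1": np.concatenate([rng.normal(0, 1, 5), rng.normal(10, 1, 5)]),
})
scaled = scale_per_plate(df)
# after scaling, each plate's features have ~zero mean and unit variance
```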
    Thanks for your time!

Peter Thorpe
