Description
Over at MLJFlux, @tiemvanderdeure has pointed out the following issue, which is actually generic to MLJ.
As the example below shows, a user who trains a model on a table cannot present new data for prediction with a different ordering of the table columns: depending on the reordering, prediction either errors or silently returns different results.
```julia
using MLJ, MLJFlux, Random  # Random provides bitrand

N = 1000
X = (x1 = rand(Float32, N), x2 = randn(Float32, N), x3 = categorical(rand('a':'c', N)))
y = categorical(bitrand(N))
model = MLJFlux.NeuralNetworkBinaryClassifier(epochs = 10, builder = MLJFlux.MLP(; hidden = (5, 4)), batch_size = 100)
mach = machine(model, X, y)
fit!(mach)

# this errors
predict(mach, (x3 = X.x3, x1 = X.x1, x2 = X.x2))

# this is false!
all(predict(mach, (x2 = X.x2, x1 = X.x1, x3 = X.x3)) .≈ predict(mach, X))
```

Here is my response from the original post:
Mmm. I think this kind of implicit assumption (that the columns of tables are ordered, and that they are presented in a consistent order) is everywhere in MLJ, and probably elsewhere. [Transferring this issue to MLJ].
One could either allow tables to be presented in any column order, or throw a warning when the original order is violated. Personally, I think the latter would be sufficient. If MLJ had a generic data front end for dealing with tables, apart from Tables.matrix, which discards the feature names, this could be an easy fix either way. But a lot of interfaces just don't save the feature names.
I'd support some kind of resolution, but it's a big ask to adapt across the ecosystem.
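To make the two options above concrete, here is a minimal sketch of what a check at prediction time might look like. It assumes the feature names were saved at fit time (which, as noted, many interfaces do not do) and that the table is a plain named tuple of columns. The helper `reorder_columns` and its `warn` keyword are hypothetical and not part of MLJ or MLJFlux.

```julia
# Hypothetical helper, not part of MLJ: restore the column order seen in training.
# `training_names` is assumed to have been stored when the model was fit.
function reorder_columns(Xnew::NamedTuple, training_names::Tuple{Vararg{Symbol}}; warn::Bool = true)
    new_names = keys(Xnew)

    # Nothing to do if the order already matches.
    new_names == training_names && return Xnew

    # A different *set* of names cannot be fixed by reordering.
    Set(new_names) == Set(training_names) ||
        throw(ArgumentError("Expected columns $(training_names) but got $(new_names)."))

    # Option 2 from the discussion: tell the user the original order was violated.
    warn && @warn "Columns were presented as $(new_names); reordering to $(training_names)."

    # Option 1 from the discussion: select the columns in the training order.
    return NamedTuple{training_names}(Xnew)
end

X = (x1 = [1.0, 2.0], x2 = [3.0, 4.0], x3 = ['a', 'b'])
training_names = keys(X)  # saved at fit time in this sketch

Xshuffled = (x3 = X.x3, x1 = X.x1, x2 = X.x2)
Xfixed = reorder_columns(Xshuffled, training_names)
```

Here `Xfixed` has its columns back in the training order, so it could be passed to `predict` safely. Other Tables.jl-compatible tables could presumably be converted with `Tables.columntable` first, but the hard part remains that each model interface would need to store the feature names.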