Would it be possible to check for common columns in X and y after the recipe has been applied?
```python
import pandas as pd
import ibis
import ibis_ml as ml

con = ibis.duckdb.connect()

df = pd.DataFrame({
    'cat1': ['AA', 'BBB', 'AA', 'BBB', 'CCC'],
    'cat2': ['X', 'Y', 'Y', 'X', 'Z'],
    'value': [10, 20, 30, 40, 50]
})
tbl = con.create_table("tmp", df, overwrite=True)

# `value` is passed as y while still present in X; the recipe drops it from X.
tr_oe = ml.Recipe(
    ml.OrdinalEncode(ml.string(), min_frequency=2),
    ml.Drop("value")
).fit(tbl, tbl.value)
# ValueError: `X` and `y` must not share column names
```
Currently, if `X` and `y` have common columns, the error ``ValueError: `X` and `y` must not share column names`` is thrown. Would it be possible to check for common columns in `X` and `y` after the recipe has been applied?

Given that `Drop` and `Select` steps exist, it would make more sense to enforce that there are no common columns after the pipeline has run, not before.
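To make the request concrete, here is a minimal sketch of the check ordering I have in mind. It is plain Python over column names only and is not ibis-ml's actual implementation: `output_columns` and `check_no_shared_columns` are hypothetical helpers, and the drop step is simulated with a simple filter.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def output_columns(input_columns: Sequence[str], dropped: Iterable[str]) -> list[str]:
    """Hypothetical stand-in for a Drop step: the columns of X that survive it."""
    dropped_set = set(dropped)
    return [c for c in input_columns if c not in dropped_set]


def check_no_shared_columns(x_columns: Iterable[str], y_columns: Iterable[str]) -> None:
    """Raise if X and y share any column name (hypothetical helper)."""
    shared = sorted(set(x_columns) & set(y_columns))
    if shared:
        raise ValueError(f"`X` and `y` must not share column names: {shared}")


x_in = ["cat1", "cat2", "value"]
y_cols = ["value"]

# Checking before the recipe runs (analogous to the current behavior) would fail,
# because `value` is still in X at that point.
# Checking after the drop step has been applied succeeds:
x_out = output_columns(x_in, dropped=["value"])
check_no_shared_columns(x_out, y_cols)  # no error: `value` is no longer in X
```

With this ordering, the example from the report would fit without error, while a recipe that leaves `value` in `X` would still be rejected.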