
Sklearn Logistic Regression ValueError: X Has 42 Features Per Sample; Expecting 1423

I'm stuck trying to fix an issue. Here is what I'm trying to do: I'd like to predict missing (NaN) values in a categorical column using logistic regression. Here is my code: df_1 : my …

Solution 1:

A rule of thumb is to never call pandas.get_dummies separately on multiple dataframes: it does not guarantee that the results have the same columns, and therefore the same number of features.

import pandas as pd

print(pd.get_dummies(['a', 'b', 'c']))
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1

print(pd.get_dummies(['b', 'c']))
   b  c
0  1  0
1  0  1

It is only safe if you call pandas.get_dummies once on the full data and only then split it into x_train and x_test.
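For instance, here is a minimal sketch of that "encode first, then split" pattern; the frame and column names (df, color, target) are hypothetical:

import pandas as pd

# Hypothetical frame: 'color' is the categorical feature,
# 'target' is the column with some values still to predict.
df = pd.DataFrame({
    "color":  ["red", "blue", "green", "blue", "red"],
    "target": ["yes", "no", "yes", None, None],
})

# Encode the categorical column ONCE, on the full frame...
X = pd.get_dummies(df[["color"]])

# ...and only then split into the rows used for fitting and the rows to predict.
train_mask = df["target"].notna()
x_train, x_pred = X[train_mask], X[~train_mask]

# Both splits now share exactly the same dummy columns.
assert list(x_train.columns) == list(x_pred.columns)

But instead of relying on that, you can use sklearn.preprocessing.OneHotEncoder: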

import numpy as np
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(sparse=False)  # dense output; in scikit-learn >= 1.2 this argument is sparse_output=False

ohe.fit_transform(np.reshape(['a', 'b', 'c'], (-1, 1)))

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

ohe.transform(np.reshape(['b', 'c'], (-1, 1)))  # It's transform, NOT fit_transform
array([[0., 1., 0.],
       [0., 0., 1.]])

Notice that the two different inputs now produce the same number of columns, because the encoder reuses the categories it learned during fit.
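Putting it together for the original problem (filling in missing categorical values with a logistic regression), a rough sketch could look like the following; the frame and column names (df, city, grade) are hypothetical, and handle_unknown="ignore" simply zero-encodes categories the encoder never saw during fit:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: predict the missing categorical 'grade' from the feature 'city'.
df = pd.DataFrame({
    "city":  ["paris", "lyon", "paris", "nice", "lyon", "nice"],
    "grade": ["A", "B", "A", None, "B", None],
})

known = df[df["grade"].notna()]    # rows used to fit the model
missing = df[df["grade"].isna()]   # rows whose 'grade' we want to predict

# Fit the encoder on the training rows only; handle_unknown="ignore" keeps
# transform() from failing on categories never seen during fit.
ohe = OneHotEncoder(handle_unknown="ignore")
x_train = ohe.fit_transform(known[["city"]])
x_missing = ohe.transform(missing[["city"]])   # transform, NOT fit_transform

# Both matrices now have the same number of columns, so LogisticRegression
# no longer raises the "X has N features per sample; expecting M" error.
clf = LogisticRegression()
clf.fit(x_train, known["grade"])
df.loc[df["grade"].isna(), "grade"] = clf.predict(x_missing)
print(df)

For categories that never appeared during fit (like "nice" above), the encoder produces an all-zero row and the model falls back on its intercept, so in a real pipeline you would want the training rows to cover the categories you expect at prediction time.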
