"AttributeError" when running "over_sample.fit_resample(x_train, y_train.ravel())" with RandomOverSampler after train_test_split




Good day. I need some help.

I encountered an "AttributeError" when running "x_train_res, y_train_res = over_sample.fit_resample(x_train, y_train.ravel())" in

SUP-4: Classification / Handling Data Imbalance / Exercise / Balancing the data.

The preceding code (the preprocess assignment and RandomOverSampler) seemed OK and did not raise any error.

The code is included below. Any idea what the cause is?


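from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# ColumnTransformer takes a list of (name, transformer, columns) tuples
preprocess = ColumnTransformer([
    ('standardscaler', StandardScaler(), num_features),
    ('onehotencoder', OneHotEncoder(), cat_features)
])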

preprocessed_data = preprocess.fit_transform(input_data)

x_train, x_test, y_train, y_test = train_test_split(preprocessed_data, output_var, test_size=0.3, random_state=42)

1 over_sample = RandomOverSampler(random_state=0)
----> 2 x_train_res, y_train_res = over_sample.fit_resample(x_train, y_train.ravel())
3 print("After OverSampling, counts of label '1': {}".format(sum(y_train_res==1)))
4 print("After OverSampling, counts of label '0': {} \n".format(sum(y_train_res==0)))

C:\ProgramData\Anaconda3\lib\site-packages\imblearn\ in fit_resample(self, X, y)
75 check_classification_targets(y)
76 arrays_transformer = ArraysTransformer(X, y)
---> 77 X, y, binarize_y = self._check_X_y(X, y)
79 self.sampling_strategy_ = check_sampling_strategy(

C:\ProgramData\Anaconda3\lib\site-packages\imblearn\over_sampling\ in _check_X_y(self, X, y)
77 def _check_X_y(self, X, y):
78 y, binarize_y = check_target_type(y, indicate_one_vs_all=True)
---> 79 X, y = self._validate_data(
80 X, y, reset=True, accept_sparse=["csr", "csc"], dtype=None,
81 force_all_finite=False,

AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'

3 Answers


I encountered exactly the same problem.

I have Anaconda on my personal laptop. Without thinking, I went to the Anaconda Prompt and used:

conda install -c conda-forge imbalanced-learn

No error message came from this, but I got the same AttributeError as you when I ran the cell in the notebook.

Then I noticed the package was for the Anaconda Cloud platform. I am not a technical person, but I thought I should use this at the prompt instead:

pip install -U imbalanced-learn

This time, the cell ran successfully without the AttributeError.

I am not sure what I did wrong initially, whether I should uninstall what was installed the first time round, or how to do that. Perhaps the technical friends amongst us can advise. Thanks, ym.
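
One possible explanation, offered as a guess rather than a diagnosis: the conda command and the notebook may not have been using the same Python environment. A minimal sketch for checking which interpreter the notebook kernel runs, and for building a pip command that targets that same interpreter (the helper name `pip_upgrade_command` is made up for illustration):

```python
import sys


def pip_upgrade_command(package):
    """Build a pip upgrade command bound to the running interpreter.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    return [sys.executable, "-m", "pip", "install", "-U", package]


# Path of the Python interpreter behind the current kernel or script.
print("Interpreter:", sys.executable)

# Command that would upgrade the package for this exact interpreter.
print("Command:", " ".join(pip_upgrade_command("imbalanced-learn")))
```

To actually run it, the list can be passed to `subprocess.run(..., check=True)`. The kernel needs a restart afterwards so that the upgraded package is picked up.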




I'm also facing the same problem. If you are using Jupyter Notebook via Anaconda, there is an issue with the version of scikit-learn. The imbalanced-learn team on GitHub is aware of this problem:

Unfortunately, the solutions I can think of are quite messy, and the best option is probably to wait for Anaconda to update its scikit-learn package.
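
To see whether a version mismatch is what you are hitting, here is a rough sketch that prints the installed versions of both packages. It assumes, to the best of my knowledge, that `_validate_data` first appeared in scikit-learn 0.23, so an older scikit-learn paired with a newer imbalanced-learn would produce exactly this AttributeError. The helper names `installed_version` and `major_minor` are made up for illustration.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Parse the leading 'major.minor' of a version string into two ints."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# Report what is installed for the interpreter running this code.
for name in ("scikit-learn", "imbalanced-learn"):
    print(name, "->", installed_version(name))

sk_version = installed_version("scikit-learn")
# Assumption: 0.23 is the first scikit-learn release with _validate_data.
if sk_version is not None and major_minor(sk_version) < (0, 23):
    print("scikit-learn looks older than 0.23; upgrading it may fix the error.")
```

Run it inside the notebook itself, not in a separate prompt, so the versions reported are the ones the failing cell actually imports.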



Thank you.

Will try.



