[Solved] Value Error: Classification exercise 2 (classification using KNN)  


Hi,

I am currently on exercise 2, classification using KNN. This is the exercise where we work on Ex_KnnClassification_start.ipynb after producing telco_churn.csv in the previous section.

I am able to run the code up to line 16 of the solution code, but I am unable to run the following:

# Train the pipeline. You can add a semicolon (';') at the end of the line to suppress the output printing
model.fit(x_train, y_train)


I received a ValueError both in my own work and in the solution provided. Can someone explain why? I don't see anyone else in the forum having the same problem.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
      1 # Train the pipeline. You can add a semi-colon (';') at the end of the line to supresses the output printing
----> 2 model.fit(x_train, y_train)

~\anaconda3\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    333             if self._final_estimator != 'passthrough':
    334                 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 335                 self._final_estimator.fit(Xt, y, **fit_params_last_step)
    336 
    337         return self

~\anaconda3\lib\site-packages\sklearn\neighbors\_base.py in fit(self, X, y)
   1130         if not isinstance(X, (KDTree, BallTree)):
   1131             X, y = self._validate_data(X, y, accept_sparse="csr",
-> 1132                                        multi_output=True)
   1133 
   1134         if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:

~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    430                 y = check_array(y, **check_y_params)
    431             else:
--> 432                 X, y = check_X_y(X, y, **check_params)
    433             out = X, y
    434 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    801                     ensure_min_samples=ensure_min_samples,
    802                     ensure_min_features=ensure_min_features,
--> 803                     estimator=estimator)
    804     if multi_output:
    805         y = check_array(y, accept_sparse='csr', force_all_finite=True,

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    644         if force_all_finite:
    645             _assert_all_finite(array,
--> 646                                allow_nan=force_all_finite == 'allow-nan')
    647 
    648     if ensure_min_samples > 0:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     98                     msg_err.format
     99                     (type_err,
--> 100                      msg_dtype if msg_dtype is not None else X.dtype)
    101             )
    102     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').




1 Answer

Hi @foamling.

I have just verified that the solution code does work.

Here is the telco_churn.csv that I used, which was generated from the solution notebook for exercise 1. My guess is that you didn't remove the NaNs in the Excel file before converting it to CSV, so sklearn rejected them. Hope this helps!
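A quick way to confirm this kind of guess is to count the missing values per column before fitting. The snippet below is only a sketch: the small DataFrame stands in for the result of `pd.read_csv("telco_churn.csv")`, and its column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for: df = pd.read_csv("telco_churn.csv")
# Column names and values are assumptions, not the real exercise data.
df = pd.DataFrame({
    "tenure": [1, 34, 2],
    "MultipleLines": [0.0, np.nan, 1.0],
    "Churn": [0, 0, 1],
})

# Count missing values per column and show only the affected columns
nan_counts = df.isna().sum()
print(nan_counts[nan_counts > 0])

# Inspect the rows that contain at least one NaN
rows_with_nan = df[df.isna().any(axis=1)]
print(rows_with_nan)
```

If any column shows a non-zero count, `model.fit` will most likely raise the same ValueError until those values are fixed or removed upstream.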

 

Hi @siowy,

Thank you so much for your response. You are right: I failed to run one of the cells in the exercise 1 notebook, the one that replaces "No phone service" with "No", which resulted in NaN values in the data file.
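For anyone who hits the same error, here is one plausible way a skipped replacement ends up as NaN. The exercise 1 notebook is not shown in this thread, so the column name and the Yes/No mapping step below are assumptions. The pandas behaviour itself is standard: `Series.map` with a dict returns NaN for any value that is not a key in the dict.

```python
import pandas as pd

# Hypothetical column; the real notebook's column names may differ
s = pd.Series(["Yes", "No", "No phone service"], name="MultipleLines")

# Assumed encoding step: map Yes/No to 1/0
mapping = {"Yes": 1, "No": 0}

# Without the replacement, "No phone service" has no key in the dict,
# so map() turns it into NaN
skipped = s.map(mapping)
print(skipped.isna().sum())

# With the replacement run first, every value has a key
fixed = s.replace("No phone service", "No").map(mapping)
print(fixed.isna().sum())
```

Re-running the exercise 1 notebook from the top, with every cell executed in order, regenerates the CSV without the NaNs.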
