# Any advice for the following questions?

To 2 decimal places, what is the coefficient of determination (R2) value for the lower order model (i.e. Linear Regression) calculated for the training data?

How many rows are there in the original dataset (from ‘Telco_customer_churn.xlsx’)?

How many rows of data are missing from the TotalCharges column?

What is the Precision score calculated on the test data (to the nearest 2 decimal places)?

What is the area under the curve (AUC) for the Receiver Operating Characteristics (ROC) graph when evaluated on the test data (to the nearest 2 decimal places)?

Using GridSearchCV, what is the optimum `max_depth` found for the Decision Tree model?

What was the optimum value found for the `max_features` parameter in the Random Forest model?

Hello,

For the optimum values, you can set GridSearchCV to search over a range of values. It is hard to give you the answer for the optimal `max_depth` and `max_features`, as these depend on the `random_state` that you set and the other parameters that you used.
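As a rough illustration of searching over a range of values, here is a minimal sketch. It runs on synthetic data from `make_classification`, not the Telco dataset, and the grids, `cv=5`, `n_estimators=50` and `random_state=42` are assumptions chosen for the example, so the best values it prints will not match the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with your own features and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Search a range of depths for the Decision Tree.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": list(range(1, 11))},
    cv=5,
)
tree_search.fit(X_train, y_train)
print("Best max_depth:", tree_search.best_params_["max_depth"])

# Search a range of max_features values for the Random Forest.
forest_search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid={"max_features": [2, 4, 6, 8, 10]},
    cv=5,
)
forest_search.fit(X_train, y_train)
print("Best max_features:", forest_search.best_params_["max_features"])
```

Changing `random_state`, the split, or the grid can change which value comes out as best, which is why the answer has to come from your own run.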

To find the number of rows, I believe `df.shape` or `len(df)` gives the answer directly (`df.head()` only shows the first few rows, and `df.describe()` reports non-null counts for numeric columns only). Use `df.isnull().sum()` to get the number of missing values in each column. The `df` here refers to the DataFrame you get when you read your dataset in with pandas.

You can check the documentation here ( https://pandas.pydata.org/docs/getting_started/basics.html ) for more information.
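A small sketch of counting rows and missing values, in case it helps. The DataFrame below is a made-up four-row stand-in (an assumption for illustration), so the numbers it prints are not the answers for the real file; in practice `df` would come from something like `pd.read_excel("Telco_customer_churn.xlsx")`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data (assumption); not the real Telco dataset.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "TotalCharges": [29.85, np.nan, 108.15, np.nan],
})

# Number of rows: len(df) is the same as df.shape[0].
n_rows = len(df)

# Missing values per column, and for one column specifically.
missing_per_column = df.isnull().sum()
missing_total_charges = int(df["TotalCharges"].isnull().sum())

print("Rows:", n_rows)
print(missing_per_column)
print("Missing TotalCharges:", missing_total_charges)
```

One caveat: `isnull()` only counts real missing values (NaN/None). If a column was read in as text, blank strings are not counted, and `pd.to_numeric(df["TotalCharges"], errors="coerce")` would turn them into NaN first.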

To 2 decimal places, what is the coefficient of determination (R2) value for the lower order model (i.e. Linear Regression) calculated for the training data?

I would like to ask for guidance for the above question as well. This is my code, but the r2 answer derived seems to be wrong?

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# NOTE: model, features and output_var are defined earlier (not shown here).

# TRAINING
##########
# Train/Test Split
x_train, x_test, y_train, y_test = train_test_split(
    features, output_var, test_size=0.3, random_state=42
)

# Train the pipeline
model.fit(x_train, y_train)

# SCORING/EVALUATION
####################
# Predict on the training data
pred_train = model.predict(x_train)

# Display the results of the metrics
rmse = np.sqrt(mean_squared_error(y_train, pred_train))
r2 = r2_score(y_train, pred_train)

print("Results on Training Data")
print("########################")
print("RMSE: {:.2f}".format(rmse))
print("R2 Score: {:.5f}".format(r2))
```

The r2 answer I got using the above code was 0.69, but apparently it is wrong. Any help here is much appreciated!

@elsx

I had the same answer, but I realized the question is referring to the BiasVariance exercise. Just refer to the "Lower Order Model (i.e. model1)" for the correct r2.
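For anyone unsure what "r2 on the training data" means for a lower order model, here is a minimal sketch. It is not the BiasVariance exercise itself: the data is synthetic, and `model1` here is just a plain `LinearRegression` used as a hypothetical stand-in, so the printed scores will differ from the expected answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic curved data (assumption); a straight line will underfit it slightly.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical stand-in for the exercise's lower order model.
model1 = LinearRegression()
model1.fit(X_train, y_train)

# R2 on the training data uses the training targets and training predictions.
r2_train = r2_score(y_train, model1.predict(X_train))
# R2 on the test data, for comparison.
r2_test = r2_score(y_test, model1.predict(X_test))

print("Train R2: {:.2f}".format(r2_train))
print("Test R2: {:.2f}".format(r2_test))
```

The point is only that the score depends on which model and which split you pass in, so running the right calculation on a different exercise's model gives a different number.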
