Any advice for the following questions?

To the nearest 2 decimal places, what is the coefficient of determination (R2) value for the lower order model (i.e. Linear Regression) calculated for the training data?

 

How many rows are there in the original dataset (from ‘Telco_customer_churn.xlsx’)?

 

 

How many rows of data are missing from the TotalCharges column?

 

What is the Precision score calculated on the test data (to the nearest 2 decimal places)?

 

 

What is the area under the curve (AUC) for the Receiver Operating Characteristic (ROC) curve when evaluated on the test data (to the nearest 2 decimal places)?

 

 

Using GridSearchCV, what is the optimum max_depth found for the Decision Tree model?

 

 

What was the optimum value found for the max_features parameter in the Random Forest model?

3 Answers

Hello,

For the optimum values, you can set GridSearchCV to search over a range of candidate values. It is hard to give you the answer for the optimal max_depth and max_features, as these will depend on the `random_state` you set and the other parameters you used.
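For illustration, here is a minimal sketch of searching over a range of max_depth values with GridSearchCV. The synthetic data and the particular grid values are assumptions for the example, not the exercise's actual data or grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features/labels
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Candidate values to search over -- widen the range if the best
# value lands on the edge of the grid
param_grid = {"max_depth": [2, 3, 4, 5, 6]}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best max_depth:", search.best_params_["max_depth"])
```

The same pattern works for the Random Forest's max_features: put the candidate values in param_grid and read the winner off best_params_.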

To find the number of rows, you can use df.shape (the first element is the row count) or len(df). Use df.isnull().sum() to get the number of missing values per column. The df here refers to the dataframe you get after reading the dataset in with pandas.
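As a quick sketch (a toy dataframe here, not the actual Telco data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for pd.read_excel('Telco_customer_churn.xlsx')
df = pd.DataFrame({
    "customerID": ["0001", "0002", "0003", "0004"],
    "TotalCharges": [29.85, np.nan, 108.15, 1840.75],
})

print(df.shape)                            # (rows, columns) -> row count is df.shape[0]
print(df.isnull().sum())                   # missing values per column
print(df["TotalCharges"].isnull().sum())   # missing values in one specific column
```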

You can check the documentation here ( https://pandas.pydata.org/docs/getting_started/basics.html ) for more information.

 


@tplim

Thanks, I managed to obtain the answers.


To the nearest 2 decimal places, what is the coefficient of determination (R2) value for the lower order model (i.e. Linear Regression) calculated for the training data?

I would like to ask for guidance on the above question as well. This is my code, but the R2 answer derived seems to be wrong:

# Imports needed for this snippet
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# TRAINING
##########
# Train/Test Split
x_train, x_test, y_train, y_test = train_test_split(features, output_var, test_size=0.3, random_state=42)

# Train the pipeline
model.fit(x_train, y_train)

# SCORING/EVALUATION
####################
# Predict on the training data (the question asks for training R2)
pred_train = model.predict(x_train)

# Display the results of the metrics
rmse = np.sqrt(mean_squared_error(y_train, pred_train))
r2 = r2_score(y_train, pred_train)
print("Results on Training Data")
print("########################")
print("RMSE: {:.2f}".format(rmse))
print("R2 Score: {:.2f}".format(r2))

0

The R2 answer I got using the above code was 0.69, but apparently it is wrong. Any help here is much appreciated!

@elsx
I had the same answer, but I realized the question is referring to the BiasVariance exercise. Just refer to the "Lower Order Model (i.e. model1)" for the correct R2.
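For anyone else stuck here, a minimal sketch of what computing the training R2 for a lower order (degree-1) model looks like. The data and the model1 name are illustrative stand-ins, not the BiasVariance exercise's actual code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Illustrative noisy data (stand-in for the exercise's dataset)
rng = np.random.default_rng(42)
x_train = np.sort(rng.uniform(0, 1, 50)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.2, 50)

# Lower order model: plain linear regression (degree 1)
model1 = LinearRegression()
model1.fit(x_train, y_train)

# R2 on the *training* data, as the question asks
r2 = r2_score(y_train, model1.predict(x_train))
print("R2 Score: {:.2f}".format(r2))
```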

THANKS!
