Help for SUP-2: Regression  


Hi all,  

I am doing an EDA on the “Ames Housing” dataset.

I am having trouble answering these questions about the data:

# For the numerical columns, what do the distributions look like?
# How are the various attributes correlated to the outcome variable?
# What visualizations can you use to highlight outliers in the data?

 

The housing dataset (https://www.kaggle.com/c/home-data-for-ml-course/overview) has 38 numerical columns, including the target variable, sale price.

I use a scatter matrix to see the distributions, but it takes too long to render. Am I on the right track?

Should I use a covariance matrix to check which attributes are correlated with sale price?

Should I draw a scatter plot of each attribute against sale price to spot outliers?

Is there a more efficient way to answer these questions?
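
A minimal sketch of one possible numeric first pass at the correlation and outlier questions, assuming pandas is available. It ranks numeric columns by their Pearson correlation with the target (a scaled covariance, so columns in different units are comparable) and counts values outside the 1.5 × IQR fences that a box plot would draw. The helper names and the toy data are hypothetical and made up for illustration; they are not real Ames rows.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target,
    ordered by absolute strength (strongest first)."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count, per numeric column, the values outside
    [Q1 - k*IQR, Q3 + k*IQR] (the usual box-plot fences)."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Tiny made-up frame so the sketch runs on its own (not real Ames data).
toy = pd.DataFrame({
    "GrLivArea": [900, 1200, 1500, 1800, 2100, 6000],
    "YearBuilt": [1950, 1960, 1975, 1990, 2005, 2008],
    "SalePrice": [100000, 130000, 160000, 200000, 240000, 180000],
    "Street": ["Pave"] * 6,  # non-numeric column, ignored by both helpers
})

print(correlation_with_target(toy, "SalePrice"))
print(iqr_outlier_counts(toy))
```

On the real data, the same helpers could be applied to the frame returned by `pd.read_csv` on the training file. For the visual side, `df.hist()` and `df.boxplot()` (both need matplotlib) draw one panel per numeric column, which is likely much cheaper than a full 38 × 38 scatter matrix.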

Thank you.

Aaron
