Help for SUP-2: Regression  



Hi all,  

I am doing an EDA on the “Ames Housing” dataset.


I am having trouble answering these questions about the data.

# For the numerical columns, what do the distributions look like?
# How are the various attributes correlated with the outcome variable?
# What visualizations can you use to highlight outliers in the data?


The housing dataset has 38 numerical columns, including the target variable, sale price.

I used a scatter matrix to look at the distributions, but it takes too long to render. Am I on the right track?
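
For context on the slowness: a scatter matrix over 38 columns draws 38 × 38 = 1,444 panels, while one histogram per column needs only 38. Below is a minimal sketch of that lighter option, assuming the data sits in a pandas DataFrame. The toy values and the column names (`LotArea`, `GrLivArea`, `SalePrice`, `Neighborhood`) are stand-ins I made up for illustration, not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop it in a notebook
import numpy as np
import pandas as pd

# Toy stand-in for the housing data (made-up values, assumed column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "LotArea": rng.lognormal(mean=9, sigma=0.5, size=200),
    "GrLivArea": rng.normal(1500, 400, size=200),
    "SalePrice": rng.lognormal(mean=12, sigma=0.4, size=200),
    "Neighborhood": rng.choice(["A", "B", "C"], size=200),  # non-numeric, skipped below
})

# Keep only the numerical columns.
numeric = df.select_dtypes(include="number")

# One histogram per numeric column: n panels instead of the n * n of a scatter matrix.
axes = numeric.hist(bins=30, figsize=(10, 8))

# Skewness gives a quick numeric summary of each distribution's shape.
skew = numeric.skew().sort_values(ascending=False)
print(skew)
```

With the real data, `df` would come from the dataset file instead of the toy frame; the rest should carry over unchanged.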

Should I use a covariance matrix to check which attributes are correlated with sale price?
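
One thing I noticed is that covariance depends on each column's units, so its values are hard to compare across attributes, whereas Pearson correlation is scaled to the range -1 to 1. A rough sketch of ranking attributes by their correlation with the target, again on made-up toy data with assumed column names (`GrLivArea`, `RandomNoise`, `SalePrice`):

```python
import numpy as np
import pandas as pd

# Toy data: SalePrice is built to depend on GrLivArea, RandomNoise is unrelated.
rng = np.random.default_rng(0)
area = rng.normal(1500, 400, size=200)
df = pd.DataFrame({
    "GrLivArea": area,
    "RandomNoise": rng.normal(0, 1, size=200),
    "SalePrice": 100 * area + rng.normal(0, 10_000, size=200),
})

# Pearson correlation matrix of the numerical columns.
corr = df.select_dtypes(include="number").corr()

# Take the target's column, drop its self-correlation,
# and sort by absolute strength while keeping the sign.
corr_with_price = (
    corr["SalePrice"]
    .drop("SalePrice")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_price)
```

Pearson correlation only captures linear relationships; `corr(method="spearman")` is an option if the relationship is monotonic but not linear.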

Should I draw a scatter plot of each attribute against sale price to spot outliers?
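
Box plots are another common option for this, since they mark points beyond 1.5 × IQR from the quartiles. A small sketch of that rule plus the plot, assuming a single numerical column; the column name `GrLivArea` and the planted outlier are hypothetical, only there to show the idea:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy column with one planted extreme value (made-up data).
rng = np.random.default_rng(0)
area = pd.Series(rng.normal(1500, 100, size=200), name="GrLivArea")
area.iloc[0] = 10_000

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = area.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (area < q1 - 1.5 * iqr) | (area > q3 + 1.5 * iqr)
print(area[is_outlier])

# The box plot draws the same flagged points as individual markers.
fig, ax = plt.subplots()
ax.boxplot(area.to_numpy())
ax.set_ylabel("GrLivArea")
```

The 1.5 × IQR threshold is a convention, not a hard definition of an outlier, so flagged points still need a look before being dropped.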

Is there a more efficient way to answer these questions?

Thank you.



