Help for SUP-2: Regression
I am doing an EDA on the "Ames Housing" dataset.
I am having trouble answering these questions about the data:
# For the numerical columns, what do the distributions look like?
# How are the various attributes correlated to the outcome variable?
# What visualizations can you use to highlight outliers in the data?
The housing dataset (https://www.kaggle.com/c/home-data-for-ml-course/overview) has 38 numerical columns, including the outcome variable, sale price.
I used a scatter matrix to look at the distributions, but it takes too long to load. Am I on the right track?
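For the distributions alone, one histogram per column might be cheaper, since a scatter matrix draws a panel for every pair of columns (38 x 38 = 1444 panels here). A minimal sketch, assuming pandas and matplotlib; the small synthetic frame and its column names are placeholders standing in for the real data, not the actual file:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "LotArea": rng.lognormal(9, 0.5, 500),
    "GrLivArea": rng.normal(1500, 400, 500),
    "SalePrice": rng.lognormal(12, 0.4, 500),
})

# Keep only the numerical columns.
numeric = df.select_dtypes(include="number")

# One histogram per numerical column, drawn on a single grid of axes.
axes = numeric.hist(bins=30, figsize=(12, 8))

# Skewness per column as a numeric summary of each distribution's shape.
skew = numeric.skew().sort_values(ascending=False)
print(skew)
```

With the real data, `df` would instead be loaded from the competition's CSV file.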
Should I use a covariance matrix to check which attributes are correlated with sale price?
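Covariance depends on the units of each column, so a correlation matrix (covariance scaled by the standard deviations) may be easier to compare across attributes. A minimal sketch, assuming pandas; the synthetic data and column names are placeholders, and Pearson correlation only captures linear association:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: SalePrice depends linearly on GrLivArea plus noise,
# while YrSold is unrelated. Column names are illustrative only.
rng = np.random.default_rng(0)
area = rng.normal(1500, 400, 500)
df = pd.DataFrame({
    "GrLivArea": area,
    "YrSold": rng.integers(2006, 2011, 500),
    "SalePrice": 100 * area + rng.normal(0, 20000, 500),
})

# Pearson correlation of every numerical column with the outcome,
# ranked by absolute value so strong negative correlations are not buried.
corr_with_price = (
    df.select_dtypes(include="number")
      .corr()["SalePrice"]
      .drop("SalePrice")
      .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_price)
```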
Should I draw a scatter plot of each attribute against sale price to see outliers?
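A box plot per column is another option for outliers in a single attribute; by default its whiskers extend 1.5 times the interquartile range (IQR) beyond the quartiles, and points outside are drawn individually. A minimal sketch of that rule, assuming pandas and matplotlib; the data is synthetic with three planted extreme values, and the 1.5 factor is a convention rather than a property of the dataset:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in with three planted extreme values in the first rows.
rng = np.random.default_rng(0)
values = rng.normal(1500, 300, 500)
values[:3] = [6000, 7000, 8000]
df = pd.DataFrame({"GrLivArea": values})

# Box plot: points beyond the whiskers show up as individual markers.
ax = df.boxplot(column="GrLivArea")

# The same 1.5 * IQR rule applied numerically, to count flagged rows.
q1, q3 = df["GrLivArea"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["GrLivArea"] < q1 - 1.5 * iqr) | (df["GrLivArea"] > q3 + 1.5 * iqr)
print(is_outlier.sum())
```

A scatter plot against sale price would still be needed to spot points that are unusual only in combination, such as a very large house sold at a low price.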
Is there a more efficient way to answer these questions?