Blake Meschke, Corey Tabor, Isaak Reyes, Norberto Huerta, and Dr. Ivona Grzegorczyk
When working on a regression problem, an abundance of data can hinder the model rather than help it. A dataset may include a large number of variables, yet some of them carry little of the information needed for prediction. One solution to this problem is Principal Component Analysis (PCA), an algorithm that reduces the dimensionality of the dataset. PCA helps us visualize the principal components of the data and can improve our modeling results. We use data from a Kaggle competition consisting of a training set and a testing set, each with 24 variables. Our goal is to take an extensive look at the process of PCA and how it performs. We do this by comparing the modeling results obtained on the Kaggle dataset with and without PCA.
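The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (24 correlated variables generated from 3 hidden factors, an assumption made only so the example runs on its own), the regression model is ordinary least squares, and the helper names `fit_pca`, `fit_linear`, `predict`, and `rmse` are hypothetical. PCA is computed here through a singular value decomposition of the centered training data, which is one standard way to obtain the principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data (NOT the Kaggle dataset):
# 24 observed variables driven by 3 hidden factors plus small noise.
n_samples, n_features, n_latent = 200, 24, 3
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
y = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n_samples)

# Simple train/test split (first 150 rows train, last 50 test).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def fit_pca(X, k):
    """Return the training mean, the top-k principal directions,
    and the fraction of variance each of them explains."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:k], explained[:k]


def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Fit PCA on the training set only, then project both sets
# with the same mean and components.
mean, components, explained = fit_pca(X_train, k=3)
Z_train = (X_train - mean) @ components.T
Z_test = (X_test - mean) @ components.T

# Same model, fitted once on all 24 variables and once on 3 components.
rmse_full = rmse(y_test, predict(fit_linear(X_train, y_train), X_test))
rmse_pca = rmse(y_test, predict(fit_linear(Z_train, y_train), Z_test))

print(f"variance explained by 3 components: {explained.sum():.3f}")
print(f"test RMSE, all 24 variables:        {rmse_full:.3f}")
print(f"test RMSE, 3 principal components:  {rmse_pca:.3f}")
```

Fitting the projection on the training set alone and reusing it for the test set keeps the comparison fair, since no information from the test data enters the model. Which of the two errors is lower depends on the data and on the number of components kept, so the sketch only shows how such a comparison can be set up.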