Build a set of regression models that use advanced methods to predict the incidence of violent crime, focusing on reducing the number of predictor variables through subset selection and regularization.
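Lasso (L1-penalized least squares) is one regularization method that also performs variable selection, because its penalty can set coefficients exactly to zero. The sketch below is a minimal pure-Python coordinate-descent fit of the objective (1/(2n))·||y - b0 - Xb||² + lam·||b||₁. It assumes a small in-memory dataset given as lists of rows, with predictors already on a common scale (as the 0-1 normalized attributes here are), since the penalty is scale-sensitive. The names `lasso_coordinate_descent`, `soft_threshold` and `lam` are hypothetical. In practice a library routine such as scikit-learn's `Lasso`, with the penalty chosen by cross-validation, would typically be used instead.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit a lasso model by cyclic coordinate descent (hypothetical helper).

    Minimizes (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.
    X: list of rows (lists of floats); y: list of floats.
    Columns are centered but NOT rescaled, so they should share a common scale.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center predictors and response so the intercept drops out of the updates.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    beta = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ beta; equals yc while beta is all zeros
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that leaves out column j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_bj

    intercept = y_mean - sum(m * b for m, b in zip(x_means, beta))
    return intercept, beta


if __name__ == "__main__":
    # Toy data: y depends only on the first predictor; the second is irrelevant.
    x0 = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    x1 = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    X = [[a, b] for a, b in zip(x0, x1)]
    y = [0.2 + 0.5 * a for a in x0]
    for lam in (0.01, 0.05, 1.0):
        print(lam, lasso_coordinate_descent(X, y, lam))
```

Raising `lam` drives more coefficients to exactly zero, so sweeping it (or selecting it by cross-validation) shows how many predictors survive at each level of regularization.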
The dependent variable, per capita violent crimes (VioCrime), was calculated from each community's population and the sum of the crime variables classified as violent crimes in the United States. The values in the dataset are the normalized version of violent crimes per 100,000 persons annually.

All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method, while the attribute descriptions refer to the original (unnormalized) values. For example, an attribute described as “mean people per household” is the normalized version of that variable on a 0-1 scale. Normalization preserves each attribute's distribution and skew; the population attribute, for instance, has a mean value of 0.06 because most communities are small.
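The two computations described above can be sketched as follows, under stated assumptions. The raw crime counts and populations are hypothetical inputs, and the normalization is modeled as a linear min-max rescale onto [0, 1] rounded to two decimals (roughly equal-width bins of 0.01). The exact binning procedure is not spelled out here, so treat this as an illustration of why a skewed attribute keeps its skew, not as the dataset's actual preprocessing code. The function names are hypothetical.

```python
def violent_crimes_per_100k(violent_counts, population):
    """Annual violent crimes per 100,000 persons for one community.

    violent_counts: yearly counts for each crime category treated as violent
    (a hypothetical input layout; the categories are not listed here).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return sum(violent_counts) * 100_000 / population


def normalize_0_1(values, decimals=2):
    """Rescale values linearly onto [0.0, 1.0] and round to `decimals` places.

    Assumption: this approximates the dataset's equal-interval normalization.
    A linear rescale keeps relative spacing, so a skewed attribute stays skewed.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant attribute: nothing to spread out
    return [round((v - lo) / (hi - lo), decimals) for v in values]


# Hypothetical communities: several small towns and one large city.
populations = [1_000, 2_000, 3_000, 5_000, 1_000_000]
print(violent_crimes_per_100k([5, 10, 35], 25_000))  # prints 200.0
scaled = normalize_0_1(populations)
print(scaled)  # prints [0.0, 0.0, 0.0, 0.0, 1.0]
print(sum(scaled) / len(scaled))  # most values sit near 0, so the mean stays low
```

With one very large community in the sample, the smaller populations collapse toward 0.0, which is the same effect that gives the real population attribute its small mean.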