I wanted to see whether school districts are reporting arrests of students as required under Massachusetts state law. I also wanted to learn how to use R to join datasets, clean data and analyze large amounts of data.
I found that just one in 10 school districts and charter schools in Massachusetts reported arrests in the 2021-2022 school year. I reached out to several school districts and heard back from Lowell, which had six arrests that school year but reported only one.
I used R to analyze data maintained by the Massachusetts Department of Elementary and Secondary Education.
I used the following data sets for the state of Massachusetts:
School Safety and Discipline Report
I contacted the state to learn how to download data for the 2021-2022 school year. That step was necessary because the public-facing database shows only percentages, not the underlying counts (numerators), for statistics such as the number of school-based arrests.
I learned R to go through the data, clean it, join 2018 and 2022 data for another analysis and create customized data frames so I could create filters. I figured out how to summarize and group by multiple conditions at once and spent a lot of time removing tables and recreating them until I got exactly what I wanted. I learned new techniques to replace NA values and filter out rows based on specific conditions.
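Here is a minimal sketch of the kinds of steps described above: joining two years of data, replacing NA values, filtering rows and summarizing by more than one grouping column. It assumes the dplyr and tidyr packages. The data frames, column names and counts are made up for illustration and are not the state's data or my actual project code.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: codes, names and counts are invented for illustration.
arrests_2018 <- tibble(
  district_code = c("001", "002", "003"),
  district_name = c("District A", "District B", "District C"),
  district_type = c("Public", "Public", "Charter"),
  arrests_2018  = c(4, NA, 0)
)

arrests_2022 <- tibble(
  district_code = c("001", "002", "004"),
  arrests_2022  = c(1, 3, NA)
)

# Join the two years on a shared district code, keeping every 2018 row.
joined <- arrests_2018 %>%
  left_join(arrests_2022, by = "district_code") %>%
  # Assumption: treat a missing count as zero reported arrests.
  # In real data, NA may instead mean "not reported" and deserve its own flag.
  mutate(across(starts_with("arrests_"), ~ replace_na(.x, 0)))

# Keep only districts that reported at least one arrest in either year.
reported_any <- joined %>%
  filter(arrests_2018 > 0 | arrests_2022 > 0)

# Summarize by two grouping columns at once.
summary_by_type <- joined %>%
  mutate(reported_2022 = arrests_2022 > 0) %>%
  group_by(district_type, reported_2022) %>%
  summarise(
    districts          = n(),
    total_arrests_2022 = sum(arrests_2022),
    .groups            = "drop"
  )

print(summary_by_type)
```

Whether to replace NA with zero is a judgment call. For a question about underreporting, it may be safer to keep NA values and count them separately, so that "reported zero arrests" and "did not report" are not mixed together.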
New skills and approaches I used, and where I grew the most during the project
I gained new skills in R and in using GitHub to find tools created by other users and modify them to suit my own purposes. I have gained much more confidence in R after spending hours on this project.
Things I tried or wanted to do but did not have the skills or time for (and might do with more time)
It took many hours to gain confidence and expertise in R through trial and error.
Link to GitHub project: here