This is a submission for the Gitcoin Bounty here. I only dabble in the data science world, and this was my first real shot at putting together a comprehensive analysis. It's definitely not that mind-blowing or complex, but I enjoyed putting it together. It showed me that I have a lot to learn about statistical analysis tools and approaches, but also that this can be rewarding work.
See the notebook here. GitHub doesn't seem to render the pandas_profiling report in its repo view, so download the notebook and run it locally if you want to see it. The report gives a great overview of the data.
See the post on gov.gitcoin.co here.
- Use Plotly, which has a lot more features and a slicker interface. I discovered it late in this process, and it turns out it's not easy to just drop in as a replacement for the `pandas` default plotting setup.
- Refresh and improve my statistics knowledge.
- Improve my `pandas` skills. They definitely got better through this process, but there are certain parts that I did quite naively that I'm sure could have been accomplished with better syntax and performance.
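
On the Plotly point in the list above, here is a minimal sketch of one way to try it without rewriting every plotting call. It assumes reasonably recent versions (pandas 0.25 or later with plotly 4.8 or later), where `DataFrame.plot` accepts a `backend` argument. The data and column names are made up for illustration and are not from the bounty dataset, and this is not how the notebook itself does its plotting.

```python
import pandas as pd

# Made-up data purely for illustration; not taken from the bounty dataset.
df = pd.DataFrame({
    "round": [1, 2, 3, 4],
    "contributions": [120, 340, 560, 910],
})

try:
    # backend="plotly" asks pandas to hand this one call to Plotly
    # instead of matplotlib (assumes pandas >= 0.25 and plotly >= 4.8).
    fig = df.plot(x="round", y="contributions", kind="line", backend="plotly")
except (ImportError, ValueError):
    # pandas could not load the Plotly backend (e.g. plotly is not installed).
    fig = None

if fig is not None:
    # The returned object is a Plotly figure, so Plotly styling calls apply.
    fig.update_layout(title="Contributions per round (illustrative data)")
    # fig.show()  # uncomment in a notebook to render the interactive chart
```

Passing `backend` per call keeps the rest of the notebook on the default plotting setup, so individual charts can be moved over one at a time. Arguments specific to matplotlib will probably still need adjusting, which may be part of why the switch doesn't feel like a pure drop-in.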