a2-timconnors33-Tim-Connors #26

Open: wants to merge 2 commits into base: main
Binary file added Excel/Excel.xlsx
Binary file not shown.
Binary file added Google_Sheets/Sheets.pdf
Binary file not shown.
Binary file added Google_Sheets/Sheets.xlsx
Binary file not shown.
98 changes: 98 additions & 0 deletions Python_pyplot/cars-sample.csv
@@ -0,0 +1,98 @@
"","Car","Manufacturer","MPG","Cylinders","Displacement","Horsepower","Weight","Acceleration","Model.Year","Origin"
"5","torino","ford",17,8,302,140,3449,10.5,70,"American"
"6","galaxie 500","ford",15,8,429,198,4341,10,70,"American"
"13","torino (sw)","ford",NA,8,351,153,4034,11,70,"American"
"18","mustang boss 302","ford",NA,8,302,140,3353,8,70,"American"
"21","corona mark ii","toyota",24,4,113,95,2372,15,70,"Japanese"
"24","maverick","ford",21,6,200,85,2587,16,70,"American"
"30","2002","bmw",26,4,121,113,2234,12.5,70,"European"
"32","f250","ford",10,8,360,215,4615,14,70,"American"
"38","corona","toyota",25,4,113,95,2228,14,71,"Japanese"
"39","pinto","ford",25,4,98,NA,2046,19,71,"American"
"44","torino 500","ford",19,6,250,88,3302,15.5,71,"American"
"48","galaxie 500","ford",14,8,351,153,4154,13.5,71,"American"
"51","country squire (sw)","ford",13,8,400,170,4746,12,71,"American"
"56","mustang","ford",18,6,250,88,3139,14.5,71,"American"
"61","corolla 1200","toyota",31,4,71,65,1773,19,71,"Japanese"
"65","corona hardtop","toyota",24,4,113,95,2278,15.5,72,"Japanese"
"69","pinto runabout","ford",21,4,122,86,2226,16.5,72,"American"
"73","galaxie 500","ford",14,8,351,153,4129,13,72,"American"
"82","gran torino (sw)","ford",13,8,302,140,4294,16,72,"American"
"88","pinto (sw)","ford",22,4,122,86,2395,16,72,"American"
"90","corona mark ii (sw)","toyota",23,4,120,97,2506,14.5,72,"Japanese"
"92","corolla 1600 (sw)","toyota",27,4,97,88,2100,16.5,72,"Japanese"
"96","gran torino","ford",14,8,302,137,4042,14.5,73,"American"
"100","ltd","ford",13,8,351,158,4363,13,73,"American"
"108","maverick","ford",18,6,250,88,3021,16.5,73,"American"
"112","country","ford",12,8,400,167,4906,12.5,73,"American"
"116","carina","toyota",20,4,97,88,2279,19,73,"Japanese"
"120","pinto","ford",19,4,122,85,2310,18.5,73,"American"
"131","mark ii","toyota",20,6,156,122,2807,13.5,73,"Japanese"
"134","maverick","ford",21,6,200,NA,2875,17,74,"American"
"138","pinto","ford",26,4,122,80,2451,16.5,74,"American"
"139","corolla 1200","toyota",32,4,71,65,1836,21,74,"Japanese"
"144","gran torino","ford",16,8,302,140,4141,14,74,"American"
"147","gran torino (sw)","ford",14,8,302,140,4638,16,74,"American"
"152","corona","toyota",31,4,76,52,1649,16.5,74,"Japanese"
"157","civic","honda",24,4,120,97,2489,15,74,"Japanese"
"163","maverick","ford",15,6,250,72,3158,19.5,75,"American"
"167","ltd","ford",14,8,351,148,4657,13.5,75,"American"
"174","mustang ii","ford",13,8,302,129,3169,12,75,"American"
"175","corolla","toyota",29,4,97,75,2171,16,75,"Japanese"
"176","pinto","ford",23,4,140,83,2639,17,75,"American"
"179","corona","toyota",24,4,134,96,2702,13.5,75,"Japanese"
"182","pinto","ford",18,6,171,97,2984,14.5,75,"American"
"189","civic cvcc","honda",33,4,91,53,1795,17.5,75,"Japanese"
"198","gran torino","ford",14.5,8,351,152,4215,12.8,76,"American"
"201","maverick","ford",24,6,200,81,3012,17.6,76,"American"
"206","civic","honda",33,4,91,53,1795,17.4,76,"Japanese"
"208","granada ghia","ford",18,6,250,78,3574,21,76,"American"
"213","corolla","toyota",28,4,97,75,2155,16.4,76,"Japanese"
"214","pinto","ford",26.5,4,140,72,2565,13.6,76,"American"
"218","mark ii","toyota",19,6,156,108,2930,15.5,76,"Japanese"
"219","280s","mercedes",16.5,6,168,120,3820,16.7,76,"European"
"222","f108","ford",13,8,302,130,3870,15,76,"American"
"224","accord cvcc","honda",31.5,4,98,68,2045,18.5,77,"Japanese"
"236","granada","ford",18.5,6,250,98,3525,19,77,"American"
"240","thunderbird","ford",16,8,351,149,4335,14.5,77,"American"
"243","corolla liftback","toyota",26,4,97,75,2265,18.2,77,"Japanese"
"244","mustang ii 2+2","ford",25.5,4,140,89,2755,15.8,77,"American"
"250","320i","bmw",21.5,4,121,110,2600,12.8,77,"European"
"253","fiesta","ford",36.1,4,98,66,1800,14.4,78,"American"
"256","civic cvcc","honda",36.1,4,91,60,1800,16.4,78,"Japanese"
"262","fairmont (auto)","ford",20.2,6,200,85,2965,15.8,78,"American"
"263","fairmont (man)","ford",25.1,4,140,88,2720,15.4,78,"American"
"272","futura","ford",18.1,8,302,139,3205,11.2,78,"American"
"275","corona","toyota",27.5,4,134,95,2560,14.2,78,"Japanese"
"278","celica gt liftback","toyota",21.1,4,134,95,2515,14.8,78,"Japanese"
"287","accord lx","honda",29.5,4,98,68,2135,16.6,78,"Japanese"
"290","fairmont 4","ford",22.3,4,140,88,2890,17.3,79,"American"
"294","ltd landau","ford",17.6,8,302,129,3725,13.4,79,"American"
"298","country squire (sw)","ford",15.5,8,351,142,4054,14.3,79,"American"
"305","300d","mercedes",25.4,5,183,77,3530,20.1,79,"European"
"318","corolla tercel","toyota",38.1,4,89,60,1968,18.8,80,"Japanese"
"322","fairmont","ford",26.4,4,140,88,2870,18.1,80,"American"
"326","corona liftback","toyota",29.8,4,134,90,2711,15.5,80,"Japanese"
"329","corolla","toyota",32.2,4,108,75,2265,15.2,80,"Japanese"
"336","240d","mercedes",30,4,146,67,3250,21.8,80,"European"
"337","civic 1500 gl","honda",44.6,4,91,67,1850,13.8,80,"Japanese"
"344","mustang cobra","ford",23.6,4,140,NA,2905,14.3,80,"American"
"345","accord","honda",32.4,4,107,72,2290,17,80,"Japanese"
"351","starlet","toyota",39.1,4,79,58,1755,16.9,81,"Japanese"
"353","civic 1300","honda",35.1,4,81,60,1760,16.1,81,"Japanese"
"356","tercel","toyota",37.7,4,89,62,2050,17.3,81,"Japanese"
"359","escort 4w","ford",34.4,4,98,65,2045,16.2,81,"American"
"360","escort 2h","ford",29.9,4,98,65,2380,20.7,81,"American"
"363","prelude","honda",33.7,4,107,75,2210,14.4,81,"Japanese"
"364","corolla","toyota",32.4,4,108,75,2350,16.8,81,"Japanese"
"370","cressida","toyota",25.4,6,168,116,2900,12.6,81,"Japanese"
"374","granada gl","ford",20.2,6,200,88,3060,17.1,81,"American"
"382","fairmont futura","ford",24,4,140,92,2865,16.4,82,"American"
"390","accord","honda",36,4,107,75,2205,14.5,82,"Japanese"
"391","corolla","toyota",34,4,108,70,2245,16.9,82,"Japanese"
"392","civic","honda",38,4,91,67,1965,15,82,"Japanese"
"393","civic (auto)","honda",32,4,91,67,1965,15.7,82,"Japanese"
"398","granada l","ford",22,6,232,112,2835,14.7,82,"American"
"399","celica gt","toyota",32,4,144,96,2665,13.9,82,"Japanese"
"402","mustang gl","ford",27,4,140,86,2790,15.6,82,"American"
"405","ranger","ford",28,4,120,79,2625,18.6,82,"American"
25 changes: 25 additions & 0 deletions Python_pyplot/pythonViz.py
@@ -0,0 +1,25 @@
import pandas as pd
import matplotlib.pyplot as plt
import mplcursors

# https://saralgyaan.com/posts/matplotlib-tutorial-in-python-chapter-6-scatter-plotting/

carData = pd.read_csv('cars-sample.csv')
manufacturer = carData['Manufacturer']
mpg = carData['MPG']
weight = carData['Weight']

# https://stackoverflow.com/questions/26139423/plot-different-color-for-different-categorical-levels-using-matplotlib
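# Map each manufacturer name to a fixed marker color.
colors = {"bmw": "orange", "ford": "yellow", "honda": "green",
          "mercedes": "cyan", "toyota": "pink"}

# s is the marker area in points^2; weight/10 keeps the bubbles readable.
plt.scatter(weight, mpg, alpha=0.5, s=weight/10, c=manufacturer.map(colors))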
plt.xlabel("Weight")
plt.ylabel("MPG")

# https://stackoverflow.com/questions/7908636/how-to-add-hovering-annotations-in-matplotlib
mplcursors.cursor(hover=True)

plt.show()


202 changes: 62 additions & 140 deletions README.md
@@ -1,142 +1,64 @@
# 02-DataVis-5ways

Assignment 2 - Data Visualization, 5 Ways
===

Now that you have successfully made a "visualization" of shapes and lines using d3, your next assignment is to successfully make an *actual visualization*... 5 times.

The goal of this project is to gain experience with as many data visualization libraries, languages, and tools as possible.

I have provided a small dataset about cars, `cars-sample.csv`.
Each row contains a car and several variables about it, including miles-per-gallon, manufacturer, and more.

Your goal is to use 5 different tools to make the following chart:

![ggplot2](img/ggplot2.png)

These features should be preserved as much as possible in your replication:

- Data positioning: it should be a downward-trending scatterplot as shown. Weight should be on the x-axis and MPG on the y-axis.
- Scales: Note the scales do not start at 0.
- Axis ticks and labels: both axes are labeled and there are tick marks at 10, 20, 30, etcetera.
- Color mapping to Manufacturer.
- Size mapping to Weight.
- Opacity of circles set to 0.5 or 50%.

Other features are not required. This includes:

- The background grid.
- The legends.

Note that some software packages will make it **impossible** to perfectly preserve the above requirements.
Be sure to note where these deviate.

Improvements are also welcome as part of Technical and Design achievements.

Libraries, Tools, Languages
---

You are required to use 5 different tools or libraries.
Of the 5 tools, you must use at least 3 libraries (libraries require code of some kind).
This could be `Python, R, Javascript`, or `Java, Javascript, Matlab` or any other combination.
Dedicated tools (i.e. Excel) do not count towards the language requirement.

Otherwise, you should seek tools and libraries to fill out your 5.

Below are a few ideas. Do not limit yourself to this list!
Some may be difficult choices, like Matlab or SPSS, which require large installations, licenses, and occasionally difficult UIs.

I have marked a few that are strongly suggested.

- R + ggplot2 `<- definitely worth trying`
- Excel
- d3 `<- since the rest of the class uses this, we're requiring it`
- Matplotlib
- three.js `<- well, it's a 3d library. not really recommended, but could be interesting and fun`
- p5js `<- good for playing around. not really a chart lib`
- Tableau
- Java 2d
- GNUplot `<- the CS department head uses this all the time :)`
- Vega-lite `<- very interesting formal visualization model; might be the future of the field`
- Flourish `<- popular in recent years`
- PowerBI
- SPSS

You may write everything from scratch, or start with demo programs from books or the web.
If you do start with code that you found, please identify the source of the code in your README and, most importantly, make non-trivial changes to the code to make it your own so you really learn what you're doing.

Tips
---

- If you're using d3, key to this assignment is knowing how to load data.
You will likely use the [`d3.json` or `d3.csv` functions](https://github.com/mbostock/d3/wiki/Requests) to load the data you found.
Beware that these functions are *asynchronous*, meaning it's possible to "build" an empty visualization before the data actually loads.

- *For web languages like d3* Don't forget to run a local webserver when you're debugging.
See this [ebook](http://chimera.labs.oreilly.com/books/1230000000345/ch04.html#_setting_up_a_web_server) if you're stuck.


Readme Requirements
---

A good readme with screenshots and structured documentation is required for this project.
It should be possible to scroll through your readme to get an overview of all the tools and visualizations you produced.

- Each visualization should start with a top-level heading (e.g. `# d3`)
- Each visualization should include a screenshot. Put these in an `img` folder and link through the readme (markdown command: `![caption](img/<imgname>)`).
- Write a paragraph for each visualization tool you use. What was easy? Difficult? Where could you see the tool being useful in the future? Did you have to use any hacks or data manipulation to get the right chart?

Other Requirements
---

0. Your code should be forked from the GitHub repo.
1. Place all code, Excel sheets, etcetera in a named folder. For example, `r-ggplot, matlab, mathematica, excel` and so on.
2. Your writeup (readme.md in the repo) should also contain the following:

- Description of the Technical achievements you attempted with this visualization.
- Some ideas include interaction, such as mousing over to see more detail about the point selected.
- Description of the Design achievements you attempted with this visualization.
- Some ideas include consistent color choice, font choice, element size (e.g. the size of the circles).

GitHub Details
---

- Fork the GitHub Repository. You now have a copy associated with your username.
- Make changes to fulfill the project requirements.
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.

Grading
---

Grades on a 120 point scale.
24 points will be based on your Technical and Design achievements, as explained in your readme.

Make sure you include the files necessary to reproduce your plots.
You should structure these in folders if helpful.
We will choose some at random to run and test.

**NOTE: THE BELOW IS A SAMPLE ENTRY TO GET YOU STARTED ON YOUR README. YOU MAY DELETE THE ABOVE.**

# R + ggplot2 + R Markdown

R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.
R Markdown is a document format that compiles to HTML or PDF and allows you to include the output of R code directly in the document.

To visualize the cars dataset, I made use of ggplot2's `geom_point()` layer, with aesthetics functions for the color and size.

While it takes time to find the correct documentation, these functions made the effort creating this chart minimal.

![ggplot2](img/ggplot2.png)

# d3...

(And so on...)


# R + ggplot2

R is a language commonly used in the fields of statistics and probability. ggplot2 is an open-source visualization library for R that pairs the statistical power of R with data visualization.

I was surprised by how easy it was to replicate the visualization with R and ggplot2. It took a minimal amount of code to create a fairly intricate chart. As a starting point, I used the code Professor Harrison demonstrated in class.

I could see R and ggplot being useful in the future when wanting to combine statistics and data visualization in one language. I did not have to manipulate the data to get the right graph, and the data with 'NA' for its MPG field was automatically filtered out.

![ggplot2](https://github.com/timconnors33/a2-DataVis-5Ways/blob/main/img/ggplot2.png)

# d3

d3 is an extremely powerful and customizable library for creating data visualizations in Javascript.

With this immense level of freedom also comes much more work to achieve the same results as the other tools. I am a little more comfortable using d3 after having spent some time with it, but I find it comparatively difficult to use for creating a simple visualization.

d3 would definitely be useful for making more intricate and interactive visualizations. It offers much more depth than the other tools I used. I filtered out the null data points by simply setting their y-coordinate to be outside the SVG. All code I based my program on is shown in the comments of the file.

![d3](https://github.com/timconnors33/a2-DataVis-5Ways/blob/main/img/d3.PNG)

# Excel

Excel is a spreadsheet tool created by Microsoft. It is perhaps the most common data analysis application in the world.

I ran into some problems importing the CSV data to Excel, which led to some tedious debugging. Other than that, Excel was one of the more intuitive tools I used and was fairly easy to use once I got past my initial difficulties.

Excel is great for light data analysis and visualization, where complex functionality is not needed. I had to create a separate series for each manufacturer's cars in order to map the manufacturer to node color.

![Excel](https://github.com/timconnors33/a2-DataVis-5Ways/blob/main/img/Excel.PNG)
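
The "one series per manufacturer" layout described above can also be prepared before the data reaches Excel. The following is only a sketch, assuming pandas is available; the inline `SAMPLE_CSV` holds a few rows copied from `cars-sample.csv`, and the name `series_table` is hypothetical. It is not the method used to build the workbook in this repo.

```python
import io

import pandas as pd

# A few rows copied from cars-sample.csv so the sketch is self-contained.
SAMPLE_CSV = '''"","Car","Manufacturer","MPG","Cylinders","Displacement","Horsepower","Weight","Acceleration","Model.Year","Origin"
"5","torino","ford",17,8,302,140,3449,10.5,70,"American"
"21","corona mark ii","toyota",24,4,113,95,2372,15,70,"Japanese"
"30","2002","bmw",26,4,121,113,2234,12.5,70,"European"
"157","civic","honda",24,4,120,97,2489,15,74,"Japanese"
"219","280s","mercedes",16.5,6,168,120,3820,16.7,76,"European"
'''

cars = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Start with the shared x values, then add one MPG column per manufacturer.
# Rows that belong to a different manufacturer are left empty (NaN), so a
# spreadsheet can treat each column as its own colored series.
series_table = pd.DataFrame({"Weight": cars["Weight"]})
for name, group in cars.groupby("Manufacturer"):
    series_table[name] = group["MPG"]

print(series_table)
```

Each manufacturer column can then be added to the chart as a separate series with its own fill color.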

# Python + pyplot

Python is an extremely versatile language used in a wide variety of programming fields. pyplot is the matplotlib module that provides a MATLAB-like plotting interface.

I have very minimal experience in Python and found the syntax a bit perplexing at first. Although I did not have to write too many lines of code to create what I wanted, the lines I did write took a long time to get right.

Python and pyplot could be useful for visualizing program performance. It would be simple to measure an algorithm's performance and chart it using pyplot. I noticed that the bubbles were much too big when I mapped size directly to weight, so I divided the weight by 10 to get a readable chart. All code I based my program on is cited in the comments of the file.

![Pyplot](https://github.com/timconnors33/a2-DataVis-5Ways/blob/main/img/Pyplot.PNG)
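
The size and color mappings described above can be sketched as follows. This is an illustration, not the contents of `pythonViz.py`: the inline `SAMPLE_CSV`, the `SIZE_DIVISOR` constant, the explicit `dropna` step, and the tick positions are assumptions added for the example. The color dictionary is the one used in the script.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from cars-sample.csv, including one with a missing MPG.
SAMPLE_CSV = '''"","Car","Manufacturer","MPG","Cylinders","Displacement","Horsepower","Weight","Acceleration","Model.Year","Origin"
"5","torino","ford",17,8,302,140,3449,10.5,70,"American"
"13","torino (sw)","ford",NA,8,351,153,4034,11,70,"American"
"21","corona mark ii","toyota",24,4,113,95,2372,15,70,"Japanese"
"30","2002","bmw",26,4,121,113,2234,12.5,70,"European"
"157","civic","honda",24,4,120,97,2489,15,74,"Japanese"
"219","280s","mercedes",16.5,6,168,120,3820,16.7,76,"European"
'''

COLORS = {"bmw": "orange", "ford": "yellow", "honda": "green",
          "mercedes": "cyan", "toyota": "pink"}
SIZE_DIVISOR = 10  # hypothetical name for the "divide by 10" scaling

cars = pd.read_csv(io.StringIO(SAMPLE_CSV))  # "NA" is read as NaN by default
plotted = cars.dropna(subset=["MPG"])        # explicitly drop rows without MPG

# scatter's s argument is the marker area in points^2, so raw weights
# (about 1600 to 4900) would produce very large bubbles.
marker_area = plotted["Weight"] / SIZE_DIVISOR
marker_color = plotted["Manufacturer"].map(COLORS)

fig, ax = plt.subplots()
ax.scatter(plotted["Weight"], plotted["MPG"],
           s=marker_area, c=marker_color.tolist(), alpha=0.5)
ax.set_xlabel("Weight")
ax.set_ylabel("MPG")
ax.set_yticks([10, 20, 30, 40])  # assumed tick positions matching the assignment
```

Changing `SIZE_DIVISOR` is the only adjustment needed to rescale every bubble at once.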

# Google Sheets

Google Sheets is the Google equivalent of Excel. It is free for anyone with access to Google Drive and has a lot of the same functionality as Excel.

Sheets was almost too easy to use. It sacrificed depth in features for being incredibly straightforward. It took much less time to create the chart than the other tools I used.

Sheets could be used to offer an introduction to data analysis and visualization and as a free alternative to Excel. It could also be used to work on the same spreadsheet remotely with others. Unfortunately, Sheets does not allow you to edit the bubble size of the data points, which made the data a lot harder to read.

Note: the chart did not seem to export correctly when I tried downloading the Sheets file as a .xlsx, so I included a .pdf file as well.

![Sheets](https://github.com/timconnors33/a2-DataVis-5Ways/blob/main/img/Sheets.PNG)

## Technical Achievements
- **Proved P=NP**: Using a combination of...
- **Solved AI Forever**: ...

- **Gained experience with CSV files**: I had never worked with CSV files before this assignment. I figured out how a CSV file is formatted and how to utilize its different fields in a variety of languages.
- **Learned how to bind data in d3**: I learned how to bind data from the CSV file to individual circles in d3. To do this, I had to learn how to import the CSV and use the `.data()` method on a d3 selection, both of which I had never done before.
- **Learned how to use Python**: Having little experience with Python before this, I was able to pick up what I needed to know on the fly as I was doing the assignment.
- **Resolved Excel datatype issue**: For some reason, the MPG field in the CSV file was imported as text to Excel and not as numbers (I am guessing because of the 'NA's). This caused the MPG to not display correctly on the chart. I had to change the datatype in order to create the scatterplot correctly.
- **Incorporated Margins in d3 program**: Margins were something Professor Harrison mentioned in class as being commonly used in the d3 community. I tried to understand how they work and used them in the design of my chart.
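
The Excel datatype issue in the list above comes down to a numeric column that contains the text `NA`. As a rough illustration in Python (an assumption for comparison only; the actual fix was made inside Excel), the sketch below first reads MPG as text to mimic the faulty import, then converts it to numbers and drops the rows that cannot be converted. The inline `SAMPLE_CSV` is an excerpt of `cars-sample.csv`.

```python
import io

import pandas as pd

# Three rows copied from cars-sample.csv; the second has no MPG value.
SAMPLE_CSV = '''"","Car","Manufacturer","MPG","Cylinders","Displacement","Horsepower","Weight","Acceleration","Model.Year","Origin"
"5","torino","ford",17,8,302,140,3449,10.5,70,"American"
"13","torino (sw)","ford",NA,8,351,153,4034,11,70,"American"
"21","corona mark ii","toyota",24,4,113,95,2372,15,70,"Japanese"
'''

# Mimic a text import: every field stays a string, including "NA".
raw = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str, keep_default_na=False)

# Convert MPG to numbers; anything non-numeric (here "NA") becomes NaN.
mpg = pd.to_numeric(raw["MPG"], errors="coerce")

# Keep only the rows that have a usable MPG value.
clean = raw.assign(MPG=mpg).dropna(subset=["MPG"])

print(clean[["Car", "MPG"]])
```

Note that `pd.read_csv` with its default settings already treats `NA` as missing, which may be why the Python chart did not need this step.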

## Design Achievements
- **Re-vamped Apple's Design Philosophy**: As demonstrated in my colorscheme...
- **Used interactive tooltip in Python chart**: My Python/pyplot chart uses a tooltip that displays the exact Weight and MPG values of the datapoint the mouse is hovering over.
- **Used consistent color style**: Across all the tools and libraries, I used the same general coloring for every manufacturer.
- **Filtered out null data values**: I dealt with the cars that had 'NA' as their MPG value by not displaying them in my charts, making the chart easier to read and more intuitive.
- **Added legends**: Several of my charts have legends showing which manufacturer each color represents. One chart's legend also shows the relationship between datapoint size and weight, as well as the opacity being used.