Clean up for 2.0 #145

Closed
4 tasks done
domoritz opened this issue Feb 23, 2020 · 6 comments · Fixed by #187

Comments

@domoritz
Member

domoritz commented Feb 23, 2020

For the 2.0 release, let's clean up datasets we don't need anymore.

@eitanlees
Contributor

Here are the current weather related datasets:

  • annual-precip.json
  • climate.json
  • co2-concentration.csv
  • seattle-temps.csv
  • seattle-weather.csv
  • sf-temps.csv
  • weather.csv
  • weather.json
  • windvectors.csv

What were you thinking with respect to consolidation?

@domoritz
Member Author

I think we can probably merge (after looking at examples and the data more carefully).

  • seattle-temps.csv
  • seattle-weather.csv
  • sf-temps.csv
  • weather.csv

weather.json is similar but contains a prediction and is used for a specific chart so let's keep it.

@eitanlees
Contributor

Thoughts on weather data

There seems to be one set of daily records and one set of hourly records.

Daily Weather

seattle-weather.csv contains daily weather information.

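|   | date | precipitation | temp_max | temp_min | wind | weather |
|---|------|---------------|----------|----------|------|---------|
| 0 | 2012/01/01 | 0 | 12.8 | 5 | 4.7 | drizzle |
| 1 | 2012/01/02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012/01/03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012/01/04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012/01/05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |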

weather.csv contains the same data as seattle-weather.csv, plus data for New York.

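|   | location | date | precipitation | temp_max | temp_min | wind | weather |
|---|----------|------|---------------|----------|----------|------|---------|
| 0 | Seattle | 2012-01-01 | 0 | 12.8 | 5 | 4.7 | drizzle |
| 1 | Seattle | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | Seattle | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | Seattle | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | Seattle | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |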

Note: the date formats differ slightly (2012/01/01 vs. 2012-01-01).
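A minimal sketch, using only the Python standard library, of how the Seattle rows of weather.csv could be checked against seattle-weather.csv: keep the rows whose location is Seattle, drop the location column, and normalize the slash-separated dates to ISO form. The inline CSV strings reproduce the first two rows shown above; the helper names are hypothetical.

```python
import csv
import io

# First rows of each file, copied from the samples above.
SEATTLE_WEATHER = """date,precipitation,temp_max,temp_min,wind,weather
2012/01/01,0,12.8,5,4.7,drizzle
2012/01/02,10.9,10.6,2.8,4.5,rain
"""

WEATHER = """location,date,precipitation,temp_max,temp_min,wind,weather
Seattle,2012-01-01,0,12.8,5,4.7,drizzle
Seattle,2012-01-02,10.9,10.6,2.8,4.5,rain
"""


def normalize_date(value: str) -> str:
    """Turn '2012/01/01' into ISO '2012-01-01'; ISO input passes through unchanged."""
    return value.replace("/", "-")


def seattle_rows(weather_csv: str) -> list[dict[str, str]]:
    """Keep only Seattle rows from weather.csv and drop the location column."""
    rows = []
    for row in csv.DictReader(io.StringIO(weather_csv)):
        if row["location"] == "Seattle":
            rows.append({k: v for k, v in row.items() if k != "location"})
    return rows


def seattle_weather_rows(seattle_csv: str) -> list[dict[str, str]]:
    """Read seattle-weather.csv rows with their dates normalized to ISO form."""
    rows = []
    for row in csv.DictReader(io.StringIO(seattle_csv)):
        row["date"] = normalize_date(row["date"])
        rows.append(row)
    return rows


print(seattle_rows(WEATHER) == seattle_weather_rows(SEATTLE_WEATHER))
```

Comparing parsed dictionaries rather than raw lines sidesteps column order, so only the date format has to be reconciled.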

Hourly Weather

seattle-temps.csv contains hourly weather information.

|   | date | temp |
|---|------|------|
| 0 | 2010-01-01T01:00:00-08:00 | 39.2 |
| 1 | 2010-01-01T02:00:00-08:00 | 39 |
| 2 | 2010-01-01T03:00:00-08:00 | 38.9 |
| 3 | 2010-01-01T04:00:00-08:00 | 38.8 |
| 4 | 2010-01-01T05:00:00-08:00 | 38.7 |

sf-temps.csv also contains hourly weather information with the same dates.

|   | date | temp |
|---|------|------|
| 0 | 2010-01-01T01:00:00-08:00 | 47.4 |
| 1 | 2010-01-01T02:00:00-08:00 | 46.9 |
| 2 | 2010-01-01T03:00:00-08:00 | 46.5 |
| 3 | 2010-01-01T04:00:00-08:00 | 46 |
| 4 | 2010-01-01T05:00:00-08:00 | 45.9 |

They can be easily concatenated with the addition of a location variable.
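As an illustration of that concatenation, here is a standard-library sketch that stacks the two `date,temp` files and prepends a location column. The inline CSV strings are the first rows from the samples above, and `concat_with_location` is a hypothetical helper, not part of the repository.

```python
import csv
import io

# First rows of seattle-temps.csv and sf-temps.csv, copied from above.
SEATTLE_TEMPS = """date,temp
2010-01-01T01:00:00-08:00,39.2
2010-01-01T02:00:00-08:00,39
"""

SF_TEMPS = """date,temp
2010-01-01T01:00:00-08:00,47.4
2010-01-01T02:00:00-08:00,46.9
"""


def concat_with_location(sources: dict[str, str]) -> str:
    """Stack several date,temp CSVs into one CSV with a leading location column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["location", "date", "temp"])
    for location, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            writer.writerow([location, row["date"], row["temp"]])
    return out.getvalue()


combined = concat_with_location({"Seattle": SEATTLE_TEMPS, "San Francisco": SF_TEMPS})
print(combined)
```

Because both files share identical timestamps, the location column is the only thing needed to tell the rows apart after stacking.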

Proposal

I think a sensible consolidation would be to combine the daily and hourly datasets, respectively, into:

  • weather-daily.csv
  • weather-hourly.csv

Concerns

Removing seattle-weather.csv would break many examples and tutorials. If the data were combined, every example would have to filter by location before visualizing it.

I haven't checked how the other datasets are used, but I expect similar issues.
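For illustration, this is roughly what that extra filtering step would look like in a Vega-Lite spec, built here as a Python dict and serialized to JSON. `weather-daily.csv` is only the file name proposed above and does not exist; the mark and encodings are arbitrary choices for the sketch, and `seattle_only_spec` is a hypothetical helper.

```python
import json


def seattle_only_spec(url: str) -> dict:
    """Vega-Lite spec that filters a combined daily dataset down to Seattle."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "data": {"url": url},
        # The extra step every example would need if the files were merged.
        "transform": [{"filter": {"field": "location", "equal": "Seattle"}}],
        "mark": "line",
        "encoding": {
            "x": {"field": "date", "timeUnit": "month", "type": "ordinal"},
            "y": {"aggregate": "mean", "field": "temp_max", "type": "quantitative"},
        },
    }


# Hypothetical path for the proposed combined file.
print(json.dumps(seattle_only_spec("data/weather-daily.csv"), indent=2))
```

With seattle-weather.csv kept as-is, the `transform` entry can simply be omitted, which is the convenience the concern is about.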

@domoritz
Member Author

Thank you for the analysis @eitanlees. I agree that removing the seattle-weather dataset would break too many examples so let's keep it. I like that you can quickly import it and create a demo visualization without having to filter/facet by location.

seattle-temps.csv and sf-temps.csv are not used much and only contain temperature information. The source description (which I know you wrote) says "30-year temperature averages recorded hourly from the Seattle Tacoma International Airport weather station", but the dates are all in 2010. I'm inclined to remove both datasets and add seattle-weather-hourly.csv instead.

The data for Seattle is used in https://vega.github.io/vega-lite/examples/trellis_area_seattle.html and https://vega.github.io/vega/examples/annual-temperature/ and https://vega.github.io/vega/examples/heatmap/.

@eitanlees
Contributor

The temperature data comes from a 30-year observation period, and the reported values are averages over that period. The study ended in 2010, which is where the date comes from. This type of measurement is called an Hourly Normal: rather than a direct temperature reading at a single moment, it summarizes many years of readings for the same hour and is used for studying broader climate trends. For more information, see the NOAA Hourly Normal Documentation.

I agree with the recommended changes.
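To make the idea of an hourly normal concrete, here is a deliberately simplified sketch: average all readings that share the same month, day, and hour across years, then stamp the result with a single label year (2010, matching the files above). This is not NOAA's actual procedure, which is more involved (see its documentation); the function name, the sample readings, and the choice to drop Feb 29 are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_normals(observations, label_year=2010):
    """Simplified hourly normals: mean temperature per (month, day, hour) across years.

    observations: iterable of (ISO-8601 timestamp string, temperature) pairs.
    Returns a dict mapping an ISO timestamp in label_year to the mean temperature.
    """
    buckets = defaultdict(list)
    for stamp, temp in observations:
        t = datetime.fromisoformat(stamp)
        if (t.month, t.day) == (2, 29):
            continue  # assumption: leap days are dropped, since 2010 has no Feb 29
        buckets[(t.month, t.day, t.hour)].append(temp)
    return {
        datetime(label_year, m, d, h).isoformat(): mean(temps)
        for (m, d, h), temps in sorted(buckets.items())
    }


# Hypothetical readings for 1 a.m. on January 1 in three different years.
obs = [
    ("1981-01-01T01:00:00", 38.0),
    ("1995-01-01T01:00:00", 40.0),
    ("2010-01-01T01:00:00", 39.5),
]
print(hourly_normals(obs))
```

The single label year explains why every timestamp in seattle-temps.csv and sf-temps.csv falls in 2010 even though the values summarize three decades.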

@domoritz
Member Author

domoritz commented Jun 2, 2020

Oh, I see. Maybe it would be nice to join the normal temperatures (and other values) into the dataset then so you can compare actual to normal temperatures.
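One way such a join could work, sketched under assumptions: because the normals are all stamped 2010, actual readings from any year are matched on (month, day, hour) only, and the difference is reported as an anomaly. The sample values suggest the hourly normals are in °F while seattle-weather.csv is in °C, so real data would need a unit conversion first; the readings below are hypothetical and assumed to be °F already, and `join_with_normals` is a hypothetical helper.

```python
from datetime import datetime


def join_with_normals(actual, normals):
    """Attach the matching hourly normal to each actual reading.

    actual: list of (ISO timestamp, temperature) pairs from any year.
    normals: dict mapping 2010 ISO timestamps (as in seattle-temps.csv) to temperatures.
    Readings are matched on (month, day, hour); the year is ignored.
    """
    by_key = {}
    for stamp, temp in normals.items():
        t = datetime.fromisoformat(stamp)
        by_key[(t.month, t.day, t.hour)] = temp

    joined = []
    for stamp, temp in actual:
        t = datetime.fromisoformat(stamp)
        normal = by_key.get((t.month, t.day, t.hour))  # None when no normal matches
        joined.append({
            "date": stamp,
            "temp": temp,
            "normal": normal,
            "anomaly": None if normal is None else temp - normal,
        })
    return joined


# Normals copied from the seattle-temps.csv sample; actual readings are hypothetical.
normals = {"2010-01-01T01:00:00-08:00": 39.2, "2010-01-01T02:00:00-08:00": 39.0}
actual = [("2012-01-01T01:00:00-08:00", 41.0), ("2012-01-01T02:00:00-08:00", 37.5)]
for row in join_with_normals(actual, normals):
    print(row)
```

Matching on local clock hour keeps the join simple; readings that fall outside the normals' coverage get `None` instead of raising.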


2 participants