Faceted map example chart #1711

Open
6 tasks
palewire opened this issue Sep 27, 2019 · 21 comments
Labels
documentation good first issue vega: vega-datasets Requires upstream action in `vega-datasets`

Comments

@palewire
Contributor

palewire commented Sep 27, 2019

Tasks

Based on (#1711 (comment)), (#1711 (comment))

Tip

The tasks below should be completed in order of appearance.
Earlier tasks can be performed by any contributor (new or old) familiar with alt.Chart.mark_geoshape

  • Find an appropriate dataset
    • Contains a categorical field, not directly dependent on location
    • License that enables adding to vega/vega-datasets and using/transforming within vega projects (non-commercial)
      • Current dataset licenses: 1, 2
  • Demonstrate using the dataset (in this issue)
  • Open an upstream issue, linking back to this one
  • Follow up by opening an upstream PR
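The first task's criteria can be screened for mechanically before a dataset is proposed. The sketch below is only an illustration under stated assumptions: the sample rows, the field names, and the helper `candidate_facet_fields` are all hypothetical, and "not directly dependent on location" is approximated as "takes more than one value within at least one location".

```python
from collections import defaultdict

# Hypothetical rows standing in for a candidate dataset (values are made up).
rows = [
    {"state": "Alabama", "region": "South", "group": "<10000", "pct": 0.10},
    {"state": "Alabama", "region": "South", "group": "10000 to 14999", "pct": 0.05},
    {"state": "Alaska", "region": "West", "group": "<10000", "pct": 0.04},
    {"state": "Alaska", "region": "West", "group": "10000 to 14999", "pct": 0.03},
]


def candidate_facet_fields(rows, location_field, max_categories=8):
    """Return string-valued fields that vary within at least one location."""
    values = defaultdict(set)        # field -> distinct values overall
    per_location = defaultdict(set)  # (field, location) -> distinct values
    for row in rows:
        for name, value in row.items():
            values[name].add(value)
            per_location[(name, row[location_field])].add(value)

    candidates = []
    for name, seen in values.items():
        if name == location_field:
            continue
        # Categorical: only strings, and a small number of distinct values.
        is_categorical = (
            all(isinstance(v, str) for v in seen)
            and 2 <= len(seen) <= max_categories
        )
        # Independent of location: some location shows more than one value.
        varies_within_location = any(
            len(vals) > 1
            for (field, _), vals in per_location.items()
            if field == name
        )
        if is_categorical and varies_within_location:
            candidates.append(name)
    return sorted(candidates)


print(candidate_facet_fields(rows, location_field="state"))
```

In this toy input `region` is rejected because each state maps to exactly one region, while `group` passes because every state has several groups.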

Example (not altair)

(#1711 (comment)) @mattijn

Maybe something like species richness for a limited number of species.
Similar to below, from here, but then preferably aggregated per county, instead of raster cells:

Image
@mattijn
Contributor

mattijn commented Sep 30, 2019

Try this:

import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, 'states')
source = data.income.url

alt.Chart(source).mark_geoshape().encode(
    shape=alt.Shape(field='geo', type='geojson'),
    color='pct:Q',
    column='group:N',
    tooltip=['name:N', 'group:N', 'pct:Q']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=states, key='id'),
    as_='geo'
).properties(
    width=75,
    height=150
).project(
    type='albersUsa'
)

image

I noticed there is no shorthand for type='geojson' (otherwise you could write something like shape='geo:G'). It's also not mentioned in the Altair docs, whereas it is in the Vega-Lite docs.
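To make the shorthand point concrete, here is a small stand-in for what such an expansion would do. It is not Altair's parser: `expand_shorthand` and `TYPE_CODES` are hypothetical names, and the `G` code is simply the one proposed in the comment above, so whether a given Altair release accepts it should be checked against that release's documentation.

```python
# Illustrative only: a tiny stand-in for shorthand expansion, not Altair's parser.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
    "G": "geojson",  # the code proposed in the comment; assumed here
}


def expand_shorthand(shorthand):
    """Split 'field:CODE' into a Vega-Lite style field definition."""
    field, sep, code = shorthand.rpartition(":")
    if not sep or code not in TYPE_CODES:
        # No recognised type code: treat the whole string as a field name.
        return {"field": shorthand}
    return {"field": field, "type": TYPE_CODES[code]}


# The long form used in the example, written as a plain encoding dict.
long_form = {"field": "geo", "type": "geojson"}
print(expand_shorthand("geo:G") == long_form)
```

The comparison shows that the proposed `'geo:G'` would carry the same information as the explicit `field='geo', type='geojson'` pair.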

@irisslee

irisslee commented Oct 4, 2019

Here's an example using the LA riots sample dataset:

import altair as alt
from vega_datasets import data

df = data.la_riots()

n = alt.topo_feature('https://gist.githubusercontent.com/irisslee/70039051188dac8f64e14182b5a459a9/raw/2412c45551cff577f7b10604ca523bd3f4dd31d3/countytopo.json', 'county')

LAbasemap = alt.Chart(n).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(width=400, height=400).project('mercator')

points = alt.Chart().mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.value(15),
    color='gender:N'
)

alt.layer(LAbasemap, points, data=df).facet('gender:N')

visualization

@jakevdp
Collaborator

jakevdp commented Oct 4, 2019

That's a nice example of the mechanics of a faceted map, but I think for this particular dataset the visualization would be more effective without splitting it across facets.

@palewire
Contributor Author

palewire commented Oct 4, 2019 via email

@jakevdp
Collaborator

jakevdp commented Oct 4, 2019

I haven't been able to come up with a good example.

@mattijn
Contributor

mattijn commented Oct 4, 2019

I already added one in #1714.

@palewire
Contributor Author

palewire commented Oct 4, 2019 via email

@mattijn
Contributor

mattijn commented Oct 4, 2019

@palewire
Contributor Author

palewire commented Oct 4, 2019

I think facets by time series segment or by a quantitative bracket are interesting, but I'd wager that both are much less common than charts that facet by a nominal category.
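One reading of "quantitative bracket" is a numeric field binned into a few ordered ranges, with the range label used as the facet field. A minimal sketch, assuming made-up bracket edges, labels, and values (none of them come from a dataset in this thread):

```python
from bisect import bisect_right

# Hypothetical bracket edges and labels, chosen only for illustration.
EDGES = [10, 20]
LABELS = ["under 10", "10 to under 20", "20 and over"]


def bracket(value):
    """Map a number to the label of the bracket it falls in."""
    # bisect_right counts how many edges are <= value, which is the bracket index.
    return LABELS[bisect_right(EDGES, value)]


populations = [8.5, 10, 19.9, 36.1]  # made-up values, in millions
print([bracket(p) for p in populations])
```

The derived label has a natural order, so a facet built on it would behave more like an ordinal category than a nominal one.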

@mattijn
Contributor

mattijn commented Oct 4, 2019

What does a facet by quantitative data look like? Although years can be a quantitative data type as well, aren't they used as nominal categories here?

import altair as alt
from vega_datasets import data

countries = alt.topo_feature(data.world_110m.url, 'countries')
source = 'https://raw.githubusercontent.com/mattijn/datasets/master/cities_prediction_population.csv'

base = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white',
    strokeWidth=0.2
).properties(width=300, height=200).project('naturalEarth1')

cities = alt.Chart().mark_circle().encode(
    latitude='lat:Q',
    longitude='lon:Q',
    size=alt.Size(
        'population:Q',
        scale=alt.Scale(range=[0, 1000]),
        legend=alt.Legend(title="Population (million)")
    ),
    fill=alt.value('green'),
    stroke=alt.value('white'),
    tooltip=['city:N', 'population:Q']
)

alt.layer(base, cities, data=source).facet(
    facet='year:N',
    columns=2,
    title='The 20 Most Populous Cities in the World by 2100'
)

image

Based on https://www.visualcapitalist.com/animated-map-worlds-populous-cities-2100/

@palewire
Contributor Author

palewire commented Oct 4, 2019 via email

@mattijn
Contributor

mattijn commented Oct 4, 2019

Yeah, my example is more ordinal than nominal.

@palewire
Contributor Author

palewire commented Oct 4, 2019

In my opinion, the best Altair examples import from vega_datasets and do not require any transformation of data prior to plotting.

With those requirements, I'm not sure there's a suitable dataset in the current example list other than the LA riots dataset used by @irisslee. However, that set may require the import of outside geographies for the base map, something I think we should also aim to avoid.

Unless we can find a good candidate with the examples, or solve the issue of the base map for the riots data, I think we should consider nominating a new example dataset for vega_datasets to document this relatively common news chart.

@dangotbanned
Member

@mattijn was this not solved by?

If we were going by the issue title/description alone, your example (US Income by State: Wrapped Facet) seems like the solution.

Reading through the thread, it isn't clear to me what the additional requirements would be for closing the issue.

Note

I'm trying to do some housekeeping on old issues, e.g. closing, labelling, adding relationships.

@mattijn
Contributor

mattijn commented Jan 1, 2025

Happy new year @dangotbanned!
This is still a valid issue in the sense that there is currently no good dataset that matches what @palewire is after.

Maybe something like species richness for a limited number of species.
Similar to:

image

Taken from here, but then preferably aggregated per county, instead of raster cells.

@dangotbanned
Member

#1711 (comment)

@mattijn in that case, how about we source a suitable dataset and open an issue in https://github.com/vega/vega-datasets?

We could then have access to it after dealing with:

@mattijn
Contributor

mattijn commented Jan 1, 2025

Sounds good to me!

@dangotbanned
Member

Sounds good to me!

Great @mattijn !

I'm not 100% sure the license for the dataset you linked would work for us:
https://creativecommons.org/licenses/by-nc-nd/4.0/

A potential starting point might be https://github.com/datasets, though.

@giswqs

giswqs commented Jan 4, 2025

Two data sources for consideration:

Zillow Research Data: https://www.zillow.com/research/data/
Data Commons: https://datacommons.org/

A sample US housing dataset derived from the Zillow data:
https://github.com/opengeos/data/blob/main/housing/Core/RDC_Inventory_Core_Metrics_State.csv

@giswqs

giswqs commented Jan 4, 2025

A demo I created a few years ago for using the Zillow housing data. I am not quite sure about the dataset license though.

Demo: https://www.youtube.com/watch?v=gMghAsNuTbw
Web App: https://huggingface.co/spaces/giswqs/Streamlit/blob/main/pages/2_%F0%9F%8F%A0_U.S._Housing.py

If the Zillow dataset license does not work for you, https://datacommons.org/ has a lot of other open datasets.

@dangotbanned
Member

#1711 (comment), #1711 (comment)

Thanks for showing interest in this issue @giswqs!

I've just updated the description with an overview of what work I'm thinking we need to do. Hope you find that helpful.

I don't personally work with spatial data, but I'm quite impressed by your GitHub profile! 🧠

@dangotbanned dangotbanned added the vega: vega-datasets Requires upstream action in `vega-datasets` label Jan 7, 2025