whosonfirst-data/whosonfirst-brands

whosonfirst-brands

Brands in Who's On First documents.

Caveats

This is a work in progress, still very much "wet paint", and there is little to no tooling for this stuff yet.

Where do all these #brands come from?

At the moment, they come from the Elasticsearch index running the Who's On First Spelunker. They are the product of a not very sophisticated faceting process on an unanalyzed copy of the wof:name field (called, unsurprisingly, name_not_analyzed). Like this:

curl -s -v --max-time 600 'http://localhost:9200/spelunker/_search?from=0&size=50' -d '{"query": {"term": {"wof:placetype": "venue"}}, "aggregations": {"brands": {"terms": {"field": "name_not_analyzed", "size": 0}}}, "size": 0}' > brands.json

That produces something like 16 million distinct names. We have not imported most of those. Instead we have limited the #brands included here to only those with 50 (or more) venues. So instead of 16 million #brands we have about 7,400 as of this writing. Maybe the cut-off point should be 25, maybe it should be 10. Maybe it should be 5. We don't know yet. We're figuring it out as we go.
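As a rough illustration of that cut-off, here is a minimal sketch of filtering the aggregation output by venue count. It assumes brands.json has the usual Elasticsearch terms-aggregation shape (a list of buckets, each with a `key` and a `doc_count`). The function names `load_response` and `select_brands` and the sample data are made up for this example; this is not the tooling actually used to build this repository.

```python
import json


def load_response(path):
    """Read a saved Elasticsearch response (for example brands.json) from disk."""
    with open(path) as fh:
        return json.load(fh)


def select_brands(response, min_venues=50):
    """Return (name, venue_count) pairs for names with at least min_venues venues."""
    # Assumed response shape: aggregations -> brands -> buckets -> {key, doc_count}
    buckets = response["aggregations"]["brands"]["buckets"]
    return [
        (bucket["key"], bucket["doc_count"])
        for bucket in buckets
        if bucket["doc_count"] >= min_venues
    ]


# Hypothetical sample data standing in for the real response.
sample = {
    "aggregations": {
        "brands": {
            "buckets": [
                {"key": "Example Coffee", "doc_count": 120},
                {"key": "Example Bank", "doc_count": 50},
                {"key": "Corner Store", "doc_count": 3},
            ]
        }
    }
}

print(select_brands(sample))
print(len(select_brands(sample, min_venues=1)))
```

Against the real output, something like `select_brands(load_response("brands.json"), min_venues=25)` would show how many names a lower cut-off lets in.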

It is assumed that a whole bunch of these records will be superseded or deprecated or both. That work remains tomorrow's problem.
