diff --git a/README.md b/README.md
index 1ce50cf4..7299deab 100644
--- a/README.md
+++ b/README.md
@@ -17,57 +17,101 @@
 
 Node.js is required. See [Pelias software requirements](https://github.com/pelias/documentation/blob/master/requirements.md) for required and recommended versions.
 
-## Types
+## Quickstart Usage
 
-There are two major categories of Who's on First data supported: hierarchy (or admin) data, and venues.
+To install the required Node.js module dependencies, download data for the entire planet (20GB+) and execute the importer, run:
 
-Hierarchy data represents things like cities, countries, counties, boroughs, etc.
+```bash
+npm install
+npm run download
+npm start
+```
 
-Venues represent individual places like the Statue of Liberty, a gas station, etc. Venues are subdivided by country, and sometimes regions within a country.
+## Configuration
 
-Currently, the supported hierarchy types are:
+This importer is configured using the [`pelias-config`](https://github.com/pelias/config) module.
+The following configuration options are supported by this importer.
 
-- borough
-- continent
-- country
-- county
-- dependency
-- disputed
-- [empire](https://www.youtube.com/watch?v=-bzWSJG93P8)
-- localadmin
-- locality
-- macrocounty
-- macrohood
-- macroregion
-- marinearea
-- neighbourhood
-- ocean
-- region
-- postalcodes (optional, see configuration)
+### `imports.whosonfirst.datapath`
 
-Other types may be included in the future.
+* Required: yes
+* Default: ``
 
-[The Who's on First documentation](https://github.com/whosonfirst/whosonfirst-placetypes) has a description of all the types supported by Who's on First.
+Full path to where Who's on First data is located (note: the included [downloader script](#downloading-the-data) will automatically place the WOF data here, and is the recommended way to obtain WOF data)
 
-## Configuration
+### `imports.whosonfirst.importPlace`
 
-This importer is configured using the [`pelias-config`](https://github.com/pelias/config) module.
-The following configuration options are supported by this importer.
+* Required: no
+* Default: ``
+
+Set to a WOF ID or array of IDs to import data only for descendants of those records, rather than the entire planet.
+
+You can use the [Who's on First Spelunker](https://spelunker.whosonfirst.org) or the `source_id` field from any WOF result of a Pelias query to determine these values.
+
+Specifying a value for `importPlace` will download the full planet SQLite database (27GB). Support for individual country downloads [may be added in the future](https://github.com/pelias/whosonfirst/issues/459)
+
+### `imports.whosonfirst.importVenues`
+
+* Required: no
+* Default: `false`
+
+Set to true to enable importing venue records. There are over 15 million venues so this option will add substantial download and disk usage requirements.
+
+It is currently [not recommended to import venues](https://github.com/pelias/whosonfirst/issues/94).
+
+
+### `imports.whosonfirst.importPostalcodes`
+
+* Required: no
+* Default: `false`
+
+Set to true to enable importing postalcode records. There are over 3 million postal code records.
+
+Setting this option to `true` is well tested and [may become the default in the future](https://github.com/pelias/config/issues/61).
+
+### `imports.whosonfirst.missingFilesAreFatal`
+
+* Required: no
+* Default: `false`
 
-| key | required | default | description |
-| --- | --- | --- | --- |
-| `imports.whosonfirst.datapath` | yes | | full path to where Who's on First data is located (note: the included [downloader script](#downloading-the-data) will automatically place the WOF data here, and is the recommended way to obtain WOF data) |
-| `imports.whosonfirst.importPostalcodes` | no | false | set to `true` to include postalcodes in the data download and import process |
-| `imports.whosonfirst.importVenues` | no | false | set to `true` to include venues in the data download and import process |
-| `imports.whosonfirst.importPlace` | no | | set to a WOF id (number or string) indicating the region of interest, only data pertaining to that place shall be downloaded. Use the WOF [spelunker tool](https://spelunker.whosonfirst.org) search for an ID of a place. |
-| `imports.whosonfirst.missingFilesAreFatal` | no | false | set to `true` for missing files from [Who's on First bundles](https://dist.whosonfirst.org/bundles/) to stop the import process |
-| `imports.whosonfirst.maxDownloads` | no | 4 | the maximum number of files to download simultaneously. Higher values can be faster, but can also cause donwload errors |
-| `imports.whosonfirst.dataHost` | no | `https://dist.whosonfirst.org/` | The location to download Who's on First data from. Changing this can be useful to use custom data, pin data to a specific date, etc |
-| `imports.whosonfirst.sqlite` | no | false | Set to `true` to use Who's on First SQLite databases instead of GeoJSON bundles. |
+Set to `true` for missing files from [Who's on First bundles](https://dist.whosonfirst.org/bundles/) to stop the import process.
+
+This flag is useful if you consider it vital that all Who's on First data is successfully imported, and can be helpful to guard against incomplete downloads or other types of failure.
+
+### `imports.whosonfirst.maxDownloads`
+
+* Required: no
+* Default: `4`
+
+The maximum number of files to download simultaneously. Higher values can be faster, but can also cause download errors.
+
+### `imports.whosonfirst.dataHost`
+
+* Required: no
+* Default: `https://dist.whosonfirst.org/`
+
+The location to download Who's on First data from. Changing this can be useful to use custom data, pin data to a specific date, etc.
+
+### `imports.whosonfirst.sqlite`
+
+* Required: no
+* Default: `false`
+
+Set to `true` to use Who's on First SQLite databases instead of GeoJSON bundles.
+
+SQLite databases take up less space on disk and can be much more efficient to
+download and extract.
+
+This option may [become the default in the near future](https://github.com/pelias/whosonfirst/issues/460).
+
+However, both the Who's on First processes to generate
+these files and the Pelias code to use them are new and not yet considered
+production ready.
 
 ## Downloading the Data
 
-* The `download` script will download the required bundles/sqlite databases and place the data into the datapath configured in [pelias-config](https://github.com/pelias/config) in the required directory layout.
+The `download` script will download the required bundles/sqlite databases into the datapath configured in `imports.whosonfirst.datapath`.
+
 To install the required node module dependencies and run the download script:
 
 ```bash
@@ -79,11 +123,9 @@ npm run download
 npm run download -- --admin-only # to only download hierarchy data, without venues or postalcodes
 ```
 
-When running an instance intended to provide coverage for an area smaller than the entire world,
-it is recommended that the `importPlace` config parameter is used to limit the data download to records
-that are parents or descendants of the specified place. See the configuration details in the above section of this document.
-We currently only support a single ID at a time. If multiple places need to be downloaded, the script can be executed multiple times;
-one for each desired place.
+**Note:** The download script will always download data for the entire planet. Support for downloading data for specific countries is [a possible future enhancement](https://github.com/pelias/whosonfirst/issues/459).
+
+When using `imports.whosonfirst.importPlace`, a new SQLite database will only be downloaded if new data is available. Otherwise, the existing download will be reused.
 
 **Warning**: Who's on First data is _big_. Just the hierarchy data is tens of GB, and the full dataset is over 100GB on disk.
 Additionally, Who's on First uses one file per record. In addition to lots of disk space,
@@ -92,14 +134,38 @@ Linux/Mac, `df -ih` can show you how many free inodes you have.
 
 Expect to use a few million inodes for Who's on First. You probably don't want to store multiple copies of the Who's on First data due to its disk requirements.
 
-## Usage
+## Types
 
-To install the required node module dependencies and execute the importer, run:
+There are two major categories of Who's on First data supported: hierarchy (or admin) data, and venues.
+
+Hierarchy data represents things like cities, countries, counties, boroughs, etc.
+
+Venues represent individual places like the Statue of Liberty, a gas station, etc. Venues are subdivided by country, and sometimes regions within a country.
+
+Currently, the supported hierarchy types are:
+
+- borough
+- continent
+- country
+- county
+- dependency
+- disputed
+- [empire](https://www.youtube.com/watch?v=-bzWSJG93P8)
+- localadmin
+- locality
+- macrocounty
+- macrohood
+- macroregion
+- marinearea
+- neighbourhood
+- ocean
+- region
+- postalcodes (optional, see configuration)
+
+Other types may be included in the future.
+
+[The Who's on First documentation](https://github.com/whosonfirst/whosonfirst-placetypes) has a description of all the types supported by Who's on First.
 
-```bash
-$> npm install
-$> npm start
-```
 
 ### In Other Projects