Improved documentation #461

Merged
merged 4 commits into from
Aug 19, 2019
164 changes: 115 additions & 49 deletions README.md
@@ -17,57 +17,101 @@ Node.js is required.

See [Pelias software requirements](https://github.com/pelias/documentation/blob/master/requirements.md) for required and recommended versions.

## Types
## Quickstart Usage

There are two major categories of Who's on First data supported: hierarchy (or admin) data, and venues.
To install the required Node.js module dependencies, download data for the entire planet (20GB+) and execute the importer, run:

Hierarchy data represents things like cities, countries, counties, boroughs, etc.
```bash
npm install
npm run download
npm start
```

Venues represent individual places like the Statue of Liberty, a gas station, etc. Venues are subdivided by country, and sometimes regions within a country.
## Configuration

Currently, the supported hierarchy types are:
This importer is configured using the [`pelias-config`](https://github.com/pelias/config) module.
The following configuration options are supported by this importer.

- borough
- continent
- country
- county
- dependency
- disputed
- [empire](https://www.youtube.com/watch?v=-bzWSJG93P8)
- localadmin
- locality
- macrocounty
- macrohood
- macroregion
- marinearea
- neighbourhood
- ocean
- region
- postalcodes (optional, see configuration)
### `imports.whosonfirst.datapath`

Other types may be included in the future.
* Required: yes
* Default: ``

[The Who's on First documentation](https://github.com/whosonfirst/whosonfirst-placetypes) has a description of all the types supported by Who's on First.
Full path to the directory where Who's on First data is located. The included [downloader script](#downloading-the-data) automatically places the WOF data here and is the recommended way to obtain it.

## Configuration
### `imports.whosonfirst.importPlace`

This importer is configured using the [`pelias-config`](https://github.com/pelias/config) module.
The following configuration options are supported by this importer.
* Required: no
* Default: ``

Set to a WOF ID or array of IDs to import data only for descendants of those records, rather than the entire planet.

You can use the [Who's on First Spelunker](https://spelunker.whosonfirst.org) or the `source_id` field from any WOF result of a Pelias query to determine these values.

Specifying a value for `importPlace` will download the full planet SQLite database (27GB). Support for individual country downloads [may be added in the future](https://github.com/pelias/whosonfirst/issues/459).

### `imports.whosonfirst.importVenues`

* Required: no
* Default: `false`

Set to `true` to enable importing venue records. There are over 15 million venues, so enabling this option substantially increases download size and disk usage.

It is currently [not recommended to import venues](https://github.com/pelias/whosonfirst/issues/94).


### `imports.whosonfirst.importPostalcodes`

* Required: no
* Default: `false`

Set to true to enable importing postalcode records. There are over 3 million postal code records.

Setting this option to `true` is well tested and [may become the default in the future](https://github.com/pelias/config/issues/61).

### `imports.whosonfirst.missingFilesAreFatal`

* Required: no
* Default: `false`

| key | required | default | description |
| --- | --- | --- | --- |
| `imports.whosonfirst.datapath` | yes | | full path to where Who's on First data is located (note: the included [downloader script](#downloading-the-data) will automatically place the WOF data here, and is the recommended way to obtain WOF data) |
| `imports.whosonfirst.importPostalcodes` | no | false | set to `true` to include postalcodes in the data download and import process |
| `imports.whosonfirst.importVenues` | no | false | set to `true` to include venues in the data download and import process |
| `imports.whosonfirst.importPlace` | no | | set to a WOF id (number or string) indicating the region of interest, only data pertaining to that place shall be downloaded. Use the WOF [spelunker tool](https://spelunker.whosonfirst.org) search for an ID of a place. |
| `imports.whosonfirst.missingFilesAreFatal` | no | false | set to `true` for missing files from [Who's on First bundles](https://dist.whosonfirst.org/bundles/) to stop the import process |
| `imports.whosonfirst.maxDownloads` | no | 4 | the maximum number of files to download simultaneously. Higher values can be faster, but can also cause download errors |
| `imports.whosonfirst.dataHost` | no | `https://dist.whosonfirst.org/` | The location to download Who's on First data from. Changing this can be useful to use custom data, pin data to a specific date, etc |
| `imports.whosonfirst.sqlite` | no | false | Set to `true` to use Who's on First SQLite databases instead of GeoJSON bundles. |
Set to `true` to stop the import process when files from [Who's on First bundles](https://dist.whosonfirst.org/bundles/) are missing.

This flag is useful if you consider it vital that all Who's on First data is successfully imported, and can be helpful to guard against incomplete downloads or other types of failure.

### `imports.whosonfirst.maxDownloads`

* Required: no
* Default: `4`

The maximum number of files to download simultaneously. Higher values can be faster, but can also cause download errors.

### `imports.whosonfirst.dataHost`

* Required: no
* Default: `https://dist.whosonfirst.org/`

The location to download Who's on First data from. Changing this can be useful to use custom data, pin data to a specific date, etc.

### `imports.whosonfirst.sqlite`

* Required: no
* Default: `false`

Set to `true` to use Who's on First SQLite databases instead of GeoJSON bundles.

SQLite databases take up less space on disk and can be much more efficient to
download and extract.

This option may [become the default in the near future](https://github.com/pelias/whosonfirst/issues/460).

However, both the Who's on First processes to generate
these files and the Pelias code to use them are new and not yet considered
production ready.
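
As a minimal sketch of how the options above fit together, the following writes a small `pelias.json` and points `pelias-config` at it. The data path `/data/whosonfirst` and the choice to write the file into the current directory are assumptions made for illustration; `PELIAS_CONFIG` is the environment variable `pelias-config` reads to locate a config file outside the default `~/pelias.json`.

```bash
# Write a minimal config file. The datapath below is a hypothetical
# example; replace it with a directory that has enough free space.
cat > pelias.json <<'EOF'
{
  "imports": {
    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true,
      "importVenues": false,
      "missingFilesAreFatal": true,
      "maxDownloads": 4
    }
  }
}
EOF

# Tell pelias-config to load this file instead of ~/pelias.json.
export PELIAS_CONFIG="$PWD/pelias.json"
```

Options left out of the file keep the defaults listed above.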

## Downloading the Data

* The `download` script will download the required bundles/sqlite databases and place the data into the datapath configured in [pelias-config](https://github.com/pelias/config) in the required directory layout.
The `download` script will download the required bundles/sqlite databases into the datapath configured in `imports.whosonfirst.datapath`.

To install the required node module dependencies and run the download script:

```bash
@@ -79,11 +123,9 @@ npm run download
npm run download -- --admin-only # to only download hierarchy data, without venues or postalcodes
```

When running an instance intended to provide coverage for an area smaller than the entire world,
it is recommended that the `importPlace` config parameter is used to limit the data download to records
that are parents or descendants of the specified place. See the configuration details in the above section of this document.
We currently only support a single ID at a time. If multiple places need to be downloaded, the script can be executed multiple times;
one for each desired place.
**Note:** The download script will always download data for the entire planet. Support for downloading data for specific countries is [a possible future enhancement](https://github.com/pelias/whosonfirst/issues/459).

When using `imports.whosonfirst.importPlace`, a new SQLite database will only be downloaded if new data is available. Otherwise, the existing download will be reused.

**Warning**: Who's on First data is _big_. Just the hierarchy data is tens of GB, and the full dataset is over 100GB on disk.
Additionally, Who's on First uses one file per record. In addition to lots of disk space,
@@ -92,14 +134,38 @@ Linux/Mac, `df -ih` can show you how many free inodes you have.

Expect to use a few million inodes for Who's on First. You probably don't want to store multiple copies of the Who's on First data due to its disk requirements.
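
Before downloading, it can help to check both free space and free inodes on the filesystem that will hold the data. The sketch below assumes a hypothetical `WOF_DATAPATH` shell variable set to the same directory as `imports.whosonfirst.datapath`; it falls back to the current directory if the variable is unset.

```bash
# Hypothetical variable: set this to your imports.whosonfirst.datapath.
WOF_DATAPATH="${WOF_DATAPATH:-.}"

# Free disk space on the filesystem that holds the data directory.
df -h "$WOF_DATAPATH"

# Inode usage on the same filesystem; look at the free inodes column.
df -ih "$WOF_DATAPATH"
```

Some filesystems allocate inodes dynamically and may report zero or no inode counts, in which case the second command is not informative.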

## Usage
## Types

To install the required node module dependencies and execute the importer, run:
There are two major categories of Who's on First data supported: hierarchy (or admin) data, and venues.

Hierarchy data represents things like cities, countries, counties, boroughs, etc.

Venues represent individual places like the Statue of Liberty, a gas station, etc. Venues are subdivided by country, and sometimes regions within a country.

Currently, the supported hierarchy types are:

- borough
- continent
- country
- county
- dependency
- disputed
- [empire](https://www.youtube.com/watch?v=-bzWSJG93P8)
- localadmin
- locality
- macrocounty
- macrohood
- macroregion
- marinearea
- neighbourhood
- ocean
- region
- postalcodes (optional, see configuration)

Other types may be included in the future.

[The Who's on First documentation](https://github.com/whosonfirst/whosonfirst-placetypes) has a description of all the types supported by Who's on First.

```bash
$> npm install
$> npm start
```

### In Other Projects
