Geonames track

This track is based on the geonames dump file allCountries.zip, retrieved on April 27, 2017.

For further details about the semantics of individual fields, please see the geonames dump README.

Modifications:

The original CSV data have been converted to JSON.
The original longitude and latitude fields have been combined into a new location field of type geo_point.
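
The two modifications can be illustrated with a minimal sketch. It is not the script that was used to build the track's corpus; the helper name `to_document` is hypothetical, the column order is assumed from the geonames dump README (tab-separated, 19 columns), and keeping every non-empty column is a simplification. Columns whose values are not shown in this README are left empty in the sample row.

```python
import json

# Column order assumed from the geonames dump README (tab-separated).
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def to_document(line):
    """Convert one dump row into a JSON-ready dict (hypothetical helper)."""
    row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    # Merge the two coordinate columns into one geo_point array: [lon, lat].
    lat = float(row.pop("latitude"))
    lon = float(row.pop("longitude"))
    # Assumption: empty columns are dropped, all others are kept as strings.
    doc = {key: value for key, value in row.items() if value != ""}
    doc["geonameid"] = int(doc["geonameid"])
    if "population" in doc:
        doc["population"] = int(doc["population"])
    doc["location"] = [lon, lat]
    return doc


# Sample row built from the values in this README; unknown columns stay empty.
sample = "\t".join([
    "2986043", "Pic de Font Blanca", "Pic de Font Blanca",
    "Pic de Font Blanca,Pic du Port", "42.64991", "1.53335",
    "T", "PK", "AD", "", "00", "", "", "", "0", "", "2860",
    "Europe/Andorra", "",
])
print(json.dumps(to_document(sample), indent=2))
```

Note that the array form of geo_point puts longitude first, which is why the source columns are swapped when they are merged.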

Example Document

{
  "geonameid": 2986043,
  "name": "Pic de Font Blanca",
  "asciiname": "Pic de Font Blanca",
  "alternatenames": "Pic de Font Blanca,Pic du Port",
  "feature_class": "T",
  "feature_code": "PK",
  "country_code": "AD",
  "admin1_code": "00",
  "population": 0,
  "dem": "2860",
  "timezone": "Europe/Andorra",
  "location": [
    1.53335,
    42.64991
  ]
}
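
When reading such a document, keep in mind that the location array is ordered longitude first, then latitude. The following sketch, which uses a shortened copy of the example document and a hypothetical helper `location_as_object`, converts the array into the equivalent object form with explicit lat and lon keys:

```python
import json

# Shortened copy of the example document above.
example = json.loads(
    '{"geonameid": 2986043, "name": "Pic de Font Blanca",'
    ' "location": [1.53335, 42.64991]}'
)


def location_as_object(doc):
    """Return the geo_point as an object with explicit keys (hypothetical helper)."""
    lon, lat = doc["location"]  # array form is [lon, lat]
    return {"lat": lat, "lon": lon}


print(location_as_object(example))
```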

Parameters

This track allows you to override the following parameters using --track-params (requires Rally 0.8.0 or later):

bulk_size (default: 5000)
bulk_indexing_clients (default: 8): Number of clients that issue bulk indexing requests.
ingest_percentage (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
conflicts (default: "random"): Type of id conflicts to simulate. Valid values are: 'sequential' (a document id is replaced with a sequentially increasing id), 'random' (a document id is replaced with a randomly chosen other id).
conflict_probability (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge. Combining conflicts=sequential and conflict_probability=0 makes Rally generate index ids itself instead of relying on Elasticsearch's automatic id generation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation).
on_conflict (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
recency (default: 0): A number between 0 and 1 that defines whether to bias towards more recent ids when simulating conflicts. See the Rally docs for the full definition of this parameter. This requires running the respective challenge.
number_of_replicas (default: 0)
number_of_shards (default: 5)
max_num_segments: The maximum number of segments to force-merge to.
source_enabled (default: true): A boolean defining whether the _source field is stored in the index.
index_settings: A list of index settings. Index settings defined elsewhere (e.g. number_of_replicas) need to be overridden explicitly.
cluster_health (default: "green"): The minimum required cluster health.
error_level (default: "non-fatal"): Sets ignore-response-error-level. Available for bulk operations only.
post_ingest_sleep (default: false): Whether to pause after ingest and prior to subsequent operations.
post_ingest_sleep_duration (default: 30): Sleep duration in seconds.
include_non_serverless_index_settings (default: true for non-serverless clusters, false for serverless clusters): Whether to include non-serverless index settings.
include_force_merge (default: true for non-serverless clusters, false for serverless clusters): Whether to include the force-merge operation.
include_target_throughput (default: true for non-serverless clusters, false for serverless clusters): Whether to apply target throughput.
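
As a rough illustration of how a set of overrides might be prepared, the sketch below checks a few values against the ranges and choices stated in the list above and prints them as JSON. The helper `validate_track_params` is hypothetical and not part of Rally; the checks cover only the constraints this README spells out. The printed JSON could be saved to a file and passed to --track-params, assuming a Rally version that accepts a JSON file there.

```python
import json

# Numeric ranges as stated in the parameter list above.
RANGES = {
    "ingest_percentage": (0, 100),
    "conflict_probability": (0, 100),
    "recency": (0, 1),
}

# Allowed values as stated in the parameter list above.
CHOICES = {
    "conflicts": {"sequential", "random"},
    "on_conflict": {"index", "update"},
}


def validate_track_params(params):
    """Raise ValueError if a value violates a documented constraint."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"{name} must be one of {sorted(CHOICES[name])}")
    return params


overrides = validate_track_params({
    "bulk_size": 1000,
    "ingest_percentage": 50,
    "conflicts": "sequential",
    "conflict_probability": 10,
    "number_of_replicas": 1,
})
print(json.dumps(overrides, indent=2))
```

Parameters that are not overridden keep the defaults listed above.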

License

We use the same license for the data as the original data from Geonames:

This work is licensed under a Creative Commons Attribution 3.0 License,
see http://creativecommons.org/licenses/by/3.0/
The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.
