Geonames track

This track is based on the geonames dump file allCountries.zip, retrieved on April 27, 2017.

For further details about the semantics of individual fields, please see the geonames dump README.

Modifications:

  • The original CSV data have been converted to JSON.
  • We combine the original longitude and latitude fields into a new location field of type geo_point.
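
The two modifications above can be sketched as follows. This is a minimal illustration, not the script used to build the track: it assumes the dump is tab-separated without a header row and that the columns follow the order given in the geonames dump README. The names `COLUMNS`, `row_to_doc` and `read_dump` are hypothetical, and dropping empty fields is an assumption made so that the output resembles the example document. The sample line is assembled from the example document's values, with unknown columns left empty.

```python
import csv
import io
import json

# Assumed column order of the geonames dump (tab-separated, no header row).
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def row_to_doc(row):
    """Convert one parsed dump row (a list of strings) into a JSON-ready dict."""
    record = dict(zip(COLUMNS, row))
    # Remove the separate coordinate fields; they are merged into "location".
    lat = float(record.pop("latitude"))
    lon = float(record.pop("longitude"))
    # Assumption: empty columns are omitted from the resulting document.
    doc = {key: value for key, value in record.items() if value != ""}
    for field in ("geonameid", "population"):
        if field in doc:
            doc[field] = int(doc[field])
    # geo_point in array form is ordered [longitude, latitude].
    doc["location"] = [lon, lat]
    return doc


def read_dump(stream):
    """Yield one document per line of a tab-separated dump."""
    # QUOTE_NONE: place names may contain quote characters that are not CSV quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row_to_doc(row)


# Sample line built from the example document; unknown columns are left empty.
SAMPLE_LINE = "\t".join([
    "2986043", "Pic de Font Blanca", "Pic de Font Blanca",
    "Pic de Font Blanca,Pic du Port", "42.64991", "1.53335",
    "T", "PK", "AD", "", "00", "", "", "", "0", "", "2860",
    "Europe/Andorra", "",
]) + "\n"

docs = list(read_dump(io.StringIO(SAMPLE_LINE)))
print(json.dumps(docs[0], indent=2))
```

Note that the latitude comes before the longitude in the dump, while the combined location array lists the longitude first.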

Example Document

{
  "geonameid": 2986043,
  "name": "Pic de Font Blanca",
  "asciiname": "Pic de Font Blanca",
  "alternatenames": "Pic de Font Blanca,Pic du Port",
  "feature_class": "T",
  "feature_code": "PK",
  "country_code": "AD",
  "admin1_code": "00",
  "population": 0,
  "dem": "2860",
  "timezone": "Europe/Andorra",
  "location": [
    1.53335,
    42.64991
  ]
}
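
When reading such a document, keep in mind that the array form of a geo_point stores the longitude first, whereas the string form expects "lat,lon". The short sketch below unpacks the location of an abbreviated copy of the example document; the helper name `lat_lon_string` is hypothetical.

```python
import json

# Abbreviated copy of the example document (only the fields used here).
EXAMPLE = """
{
  "geonameid": 2986043,
  "name": "Pic de Font Blanca",
  "location": [1.53335, 42.64991]
}
"""


def lat_lon_string(doc):
    """Return the document's location in the "lat,lon" string form."""
    lon, lat = doc["location"]  # array form is [longitude, latitude]
    return f"{lat},{lon}"


doc = json.loads(EXAMPLE)
print(lat_lon_string(doc))
```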

Parameters

This track allows the following parameters to be overridden with Rally 0.8.0+ using --track-params:

  • bulk_size (default: 5000)
  • bulk_indexing_clients (default: 8): Number of clients that issue bulk indexing requests.
  • ingest_percentage (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
  • conflicts (default: "random"): Type of id conflicts to simulate. Valid values are 'sequential' (a document id is replaced with a sequentially increasing id) and 'random' (a document id is replaced with a random other id).
  • conflict_probability (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge. Combining conflicts=sequential and conflict_probability=0 makes Rally generate index ids by itself, instead of relying on Elasticsearch's automatic id generation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation).
  • on_conflict (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
  • recency (default: 0): A number between 0 and 1 that defines whether to bias towards more recent ids when simulating conflicts. See the Rally docs for the full definition of this parameter. This requires running the respective challenge.
  • number_of_replicas (default: 0)
  • number_of_shards (default: 5)
  • max_num_segments: The maximum number of segments to force-merge to.
  • source_enabled (default: true): A boolean defining whether the _source field is stored in the index.
  • index_settings: A list of index settings. Index settings defined elsewhere (e.g. number_of_replicas) need to be overridden explicitly.
  • cluster_health (default: "green"): The minimum required cluster health.
  • error_level (default: "non-fatal"): Applies to bulk operations only; sets the value of ignore-response-error-level.
  • post_ingest_sleep (default: false): Whether to pause after ingest and prior to subsequent operations.
  • post_ingest_sleep_duration (default: 30): Sleep duration in seconds.
  • include_non_serverless_index_settings (default: true for non-serverless clusters, false for serverless clusters): Whether to include non-serverless index settings.
  • include_force_merge (default: true for non-serverless clusters, false for serverless clusters): Whether to include the force-merge operation.
  • include_target_throughput (default: true for non-serverless clusters, false for serverless clusters): Whether to apply target throughput.
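
As an illustration of how a set of overrides might be prepared, the sketch below checks a few values against the ranges stated in the list above and renders them as comma-separated key:value pairs, which is the form we understand --track-params to accept on the command line (Rally's documentation also describes passing a JSON file). The helpers `validate` and `to_track_params` are hypothetical and not part of Rally; verify the exact value format against your Rally version.

```python
import json


def validate(params):
    """Check a few overrides against the ranges documented for this track."""
    for name in ("ingest_percentage", "conflict_probability"):
        if name in params and not 0 <= params[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    if "recency" in params and not 0 <= params["recency"] <= 1:
        raise ValueError("recency must be between 0 and 1")
    if params.get("conflicts", "random") not in ("sequential", "random"):
        raise ValueError("conflicts must be 'sequential' or 'random'")
    if params.get("on_conflict", "index") not in ("index", "update"):
        raise ValueError("on_conflict must be 'index' or 'update'")
    return params


def to_track_params(params):
    """Render overrides as comma-separated key:value pairs (assumed CLI form)."""
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are written in lowercase.
            value = "true" if value else "false"
        parts.append(f"{key}:{value}")
    return ",".join(parts)


overrides = validate({
    "bulk_size": 1000,
    "bulk_indexing_clients": 4,
    "ingest_percentage": 50,
    "number_of_replicas": 1,
})

# Value to pass after --track-params= on the command line.
print(to_track_params(overrides))

# Alternative: the same overrides serialized as JSON, e.g. to save in a file.
print(json.dumps(overrides, indent=2))
```

Parameters that are not listed keep the defaults given above.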

License

We use the same license for the data as the original data from Geonames:

This work is licensed under a Creative Commons Attribution 3.0 License,
see http://creativecommons.org/licenses/by/3.0/
The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.