Docs: Rewrote the migrating-to-2.0 section
clintongormley committed Aug 14, 2015
1 parent 0240b58 commit db1e838
Showing 16 changed files with 1,545 additions and 982 deletions.
1,004 changes: 22 additions & 982 deletions docs/reference/migration/migrate_2_0.asciidoc

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/reference/migration/migrate_2_0/aggs.asciidoc
@@ -0,0 +1,69 @@
=== Aggregation changes

==== Min doc count defaults to zero

Both the `histogram` and `date_histogram` aggregations now have a default
`min_doc_count` of `0` instead of `1`.

==== Timezone for date field

The `time_zone` parameter in queries or aggregations on fields of type `date`
must now be either an ISO 8601 UTC offset or a timezone id. For example, the
value `+1:00` must now be written as `+01:00`.
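
When migrating stored requests, offsets written in the old lenient form can be
normalized mechanically. The following is a minimal sketch, not part of
Elasticsearch: the class `TimeZoneOffsets` and its `normalize` method are
hypothetical helpers, and the sketch assumes that the only problem to fix is a
missing leading zero in the hour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical migration helper; not part of the Elasticsearch API.
final class TimeZoneOffsets {
    // A sign, a one- or two-digit hour, and a two-digit minute, e.g. "+1:00" or "-05:30".
    private static final Pattern OFFSET = Pattern.compile("([+-])(\\d{1,2}):(\\d{2})");

    // Returns the offset with a zero-padded hour, or the input unchanged
    // when it is not a numeric offset (for example, a timezone id).
    static String normalize(String timeZone) {
        Matcher m = OFFSET.matcher(timeZone);
        if (!m.matches()) {
            return timeZone;
        }
        int hours = Integer.parseInt(m.group(2));
        return String.format("%s%02d:%s", m.group(1), hours, m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(normalize("+1:00"));
        System.out.println(normalize("Europe/Berlin"));
    }
}
```

Values that are already valid, such as `-05:30` or a timezone id, pass through
unchanged, so the helper can be applied to every stored `time_zone` value.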

==== Time zones and offsets

The `histogram` and `date_histogram` aggregations now support a simplified
`offset` option that replaces the previous `pre_offset` and `post_offset`
rounding options. Instead of specifying two separate shifts of the underlying
buckets, you set a single `offset`, which moves the bucket boundaries in a
positive or negative direction depending on its value.
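
The effect of a single `offset` on bucket boundaries can be sketched with plain
arithmetic. The code below is an illustration under stated assumptions, not the
Elasticsearch implementation: `HistogramOffsetSketch` and `bucketKey` are
hypothetical names, and the formula assumes that a value is rounded down to the
nearest boundary of the form `n * interval + offset`.

```java
// Illustrative only; mirrors the rounding idea, not Elasticsearch internals.
final class HistogramOffsetSketch {
    // Rounds `value` down to the start of its bucket when bucket boundaries
    // sit at n * interval + offset. floorDiv keeps negative values correct.
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        // Without an offset, buckets start at multiples of the interval.
        System.out.println(bucketKey(12, 10, 0));
        // A positive offset moves every boundary forward by that amount.
        System.out.println(bucketKey(12, 10, 5));
        // A value below the first shifted boundary falls into the previous bucket.
        System.out.println(bucketKey(3, 10, 5));
    }
}
```

With an interval of `10`, the value `12` lands in the bucket starting at `10`
when there is no offset, and in the bucket starting at `5` when the offset is
`5`.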

The `date_histogram` options `pre_zone` and `post_zone` are replaced by the
`time_zone` option. The behavior of `time_zone` is equivalent to the former
`pre_zone` option. Setting `time_zone` to a value like `+01:00` now causes the
bucket calculations to be applied in the specified time zone. The `key` is
returned as the timestamp in UTC, but the `key_as_string` is returned in the
time zone specified.
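
The relationship between `key` and `key_as_string` can be illustrated with
`java.time`. This is a sketch under assumptions rather than Elasticsearch code:
`BucketKeySketch` and its methods are hypothetical, and the string pattern used
here is only an example, since the actual `key_as_string` format depends on the
date format in effect for the field or aggregation.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative only; not part of the Elasticsearch API.
final class BucketKeySketch {
    // Example pattern only; the real format depends on the configured date format.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // The numeric key is the bucket start expressed as epoch milliseconds (UTC based).
    static long key(OffsetDateTime bucketStart) {
        return bucketStart.toInstant().toEpochMilli();
    }

    // The string key renders the same instant in the requested time zone.
    static String keyAsString(long key, ZoneOffset zone) {
        return FORMAT.format(Instant.ofEpochMilli(key).atOffset(zone));
    }

    public static void main(String[] args) {
        ZoneOffset zone = ZoneOffset.of("+01:00");
        // A daily bucket that starts at local midnight in +01:00.
        long key = key(OffsetDateTime.of(2015, 8, 14, 0, 0, 0, 0, zone));
        System.out.println(key);
        System.out.println(keyAsString(key, zone));
    }
}
```

A daily bucket that starts at midnight in `+01:00` has a `key` equal to the
epoch milliseconds of 23:00 UTC on the previous day, while the string form
shows local midnight with the `+01:00` suffix.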

The `pre_zone_adjust_large_interval` option has also been removed because
dates and bucket keys are now always returned in UTC.

==== Including/excluding terms

`include`/`exclude` filtering on the `terms` aggregation now uses the same
syntax as <<regexp-syntax,regexp queries>> instead of the Java regular
expression syntax. While simple regexps should still work, more complex ones
might need some rewriting. Also, the `flags` parameter is no longer supported.

==== Boolean fields

Aggregations on `boolean` fields will now return `0` and `1` as keys, and
`"true"` and `"false"` as string keys. See <<migration-bool-fields>> for more
information.


==== Java aggregation classes

The `date_histogram` aggregation now returns a `Histogram` object in the
response, and the `DateHistogram` class has been removed. Similarly the
`date_range`, `ipv4_range`, and `geo_distance` aggregations all return a
`Range` object in the response, and the `IPV4Range`, `DateRange`, and
`GeoDistance` classes have been removed.

The motivation for this is to have a single response API for the Range and
Histogram aggregations regardless of the type of data being queried. To
support this, some changes were made to the `MultiBucketAggregation` interface,
which applies to all bucket aggregations:

* The `getKey()` method now returns `Object` instead of `String`. The actual
object type returned depends on the type of aggregation requested (e.g. the
`date_histogram` will return a `DateTime` object for this method whereas a
`histogram` will return a `Number`).
* A `getKeyAsString()` method has been added to return the String
representation of the key.
* All other `getKeyAsX()` methods have been removed.
* The `getBucketByKey(String)` methods have been removed on all aggregations
except the `filters` and `terms` aggregations.


129 changes: 129 additions & 0 deletions docs/reference/migration/migrate_2_0/crud.asciidoc
@@ -0,0 +1,129 @@
=== CRUD and routing changes

==== Explicit custom routing

Custom `routing` values can no longer be extracted from the document body, but
must be specified explicitly as part of the query string, or in the metadata
line in the <<docs-bulk,`bulk`>> API. See <<migration-meta-fields>> for an
example.

==== Routing hash function

The default hash function that is used for routing has been changed from
`djb2` to `murmur3`. This change should be transparent unless you relied on
very specific properties of `djb2`. The new function should help balance
document counts more evenly across shards.
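
To see why the hash function matters, recall that the target shard is derived
from the hash of the routing value. The sketch below is an assumption-laden
illustration, not Elasticsearch code: `RoutingSketch`, `djb2`, and `shardFor`
are hypothetical names, the hash is the textbook `djb2` string hash, and the
shard is assumed to be the hash modulo the number of primary shards.

```java
// Illustrative only; not the Elasticsearch routing implementation.
final class RoutingSketch {
    // Textbook djb2: start at 5381, then hash * 33 + character for each character.
    static int djb2(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Assumed shard selection: non-negative remainder of the hash.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(djb2(routing), numberOfShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("user1", 5));
        System.out.println(shardFor("user2", 5));
    }
}
```

Because the shard depends directly on the hash value, swapping the hash
function would send existing routing values to different shards, which is
consistent with the new function being enforced only on new indices.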

In addition, the following routing-related node settings have been deprecated:

`cluster.routing.operation.hash.type`::

This was an undocumented setting that allowed you to configure which hash
function to use for routing. `murmur3` is now enforced on new indices.

`cluster.routing.operation.use_type`::

This was an undocumented setting that allowed the `_type` of the document to
be taken into account when computing its shard (default: `false`). `false` is
now enforced on new indices.

==== Delete API with custom routing

The delete API used to be broadcast to all shards in the index, which meant
that, when using custom routing, the `routing` parameter was optional. Now,
the delete request is forwarded only to the shard holding the document. If you
are using custom routing, then you should specify the `routing` value when
deleting a document, just as is already required for the `index`, `create`,
and `update` APIs.

To make sure that you never forget a routing value, make routing required with
the following mapping:

[source,js]
---------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true
      }
    }
  }
}
---------------------------

==== All stored meta-fields returned by default

Previously, meta-fields like `_routing`, `_timestamp`, etc., would only be
included in a GET request if specifically requested with the `fields`
parameter. Now, all meta-fields which have stored values will be returned by
default. Additionally, they are now returned at the top level (along with
`_index`, `_type`, and `_id`) instead of in the `fields` element.

For instance, the following request:

[source,sh]
---------------
GET /my_index/my_type/1
---------------

might return:

[source,js]
---------------
{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_timestamp": 10000000, <1>
  "_source": {
    "foo" : [ "bar" ]
  }
}
---------------
<1> The `_timestamp` is returned by default, and at the top level.


==== Async replication

The `replication` parameter has been removed from all CRUD operations
(`index`, `create`, `update`, `delete`, `bulk`) as it interfered with the
<<indices-synced-flush,synced flush>> feature. These operations are now
synchronous only and a request will only return once the changes have been
replicated to all active shards in the shard group.

If you relied on asynchronous replication for throughput, use more client
processes to send requests in parallel instead.

==== Documents must be specified without a type wrapper

Previously, the document body could be wrapped in another object with the name
of the `type`:

[source,js]
--------------------------
PUT my_index/my_type/1
{
  "my_type": { <1>
    "text": "quick brown fox"
  }
}
--------------------------
<1> This `my_type` wrapper is not part of the document itself, but represents the document type.

This feature was previously deprecated but could be reenabled with the
`mapping.allow_type_wrapper` index setting. This setting is no longer
supported. The above document should be indexed as follows:

[source,js]
--------------------------
PUT my_index/my_type/1
{
  "text": "quick brown fox"
}
--------------------------

==== Term Vectors API

Usage of `/_termvector` is deprecated in favor of `/_termvectors`.

42 changes: 42 additions & 0 deletions docs/reference/migration/migrate_2_0/index_apis.asciidoc
@@ -0,0 +1,42 @@
=== Index API changes

==== Index aliases


Fields used in alias filters no longer have to exist in the mapping at alias
creation time. Previously, alias filters were parsed at alias creation time
and the parsed form was cached in memory. Now, alias filters are parsed at
request time and the fields in filters are resolved from the current mapping.

This also means that index aliases now support `has_parent` and `has_child`
queries.

The <<alias-retrieving, GET alias API>> will now throw an exception if no
matching aliases are found. This change brings the defaults for this API in
line with the other Indices APIs. The <<multi-index>> options can be used on a
request to change this behavior.

==== File based index templates

Index templates can no longer be configured on disk. Use the
<<indices-templates,`_template`>> API instead.

==== Analyze API changes


The Analyze API now returns the `position` of the first token as `0`
instead of `1`.

The `prefer_local` parameter has been removed. The `_analyze` API is a light
operation and the caller shouldn't be concerned about whether it executes on
the node that receives the request or another node.

The `text()` method on `AnalyzeRequest` now returns `String[]` instead of
`String`.

==== Removed `id_cache` from clear cache API

The <<indices-clearcache,clear cache>> API no longer supports the `id_cache`
option. Instead, use the `fielddata` option to clear the cache for the
`_parent` field.

76 changes: 76 additions & 0 deletions docs/reference/migration/migrate_2_0/java.asciidoc
@@ -0,0 +1,76 @@
=== Java API changes

==== Transport API construction

The `TransportClient` construction code has changed and now uses the builder
pattern. Instead of:

[source,java]
--------------------------------------------------
Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
--------------------------------------------------

Use the following:

[source,java]
--------------------------------------------------
Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = TransportClient.builder().settings(settings).build();
--------------------------------------------------

==== Automatically thread client listeners

Previously, the user had to set request listener threads to `true` on the
client side in order not to block IO threads on heavy operations. This proved
to be error-prone for users and ended up creating problems that are very hard
to debug.

In 2.0, Elasticsearch automatically threads listeners that are used from the
client when the client is a node client or a transport client. Threading can
no longer be manually set.


==== Query/filter refactoring

`org.elasticsearch.index.queries.FilterBuilders` has been removed as part of the merge of
queries and filters. These filters are now available in `QueryBuilders` with the same name.
All methods that used to accept a `FilterBuilder` now accept a `QueryBuilder` instead.

In addition, some query builders have been removed or renamed:

* `commonTerms(...)` renamed to `commonTermsQuery(...)`
* `queryString(...)` renamed to `queryStringQuery(...)`
* `simpleQueryString(...)` renamed to `simpleQueryStringQuery(...)`
* `textPhrase(...)` removed
* `textPhrasePrefix(...)` removed
* `textPhrasePrefixQuery(...)` removed
* `filtered(...)` removed. Use `filteredQuery(...)` instead.
* `inQuery(...)` removed.

==== GetIndexRequest

`GetIndexRequest.features()` now returns an array of Feature Enums instead of an array of String values.

The following deprecated methods have been removed:

* `GetIndexRequest.addFeatures(String[])` - Use
`GetIndexRequest.addFeatures(Feature[])` instead.

* `GetIndexRequest.features(String[])` - Use
`GetIndexRequest.features(Feature[])` instead.

* `GetIndexRequestBuilder.addFeatures(String[])` - Use
`GetIndexRequestBuilder.addFeatures(Feature[])` instead.

* `GetIndexRequestBuilder.setFeatures(String[])` - Use
`GetIndexRequestBuilder.setFeatures(Feature[])` instead.


==== BytesQueryBuilder removed

The redundant `BytesQueryBuilder` has been removed in favour of the
`WrapperQueryBuilder` internally.
