Use the dedup command to remove duplicate documents, as identified by the values of one or more fields, from the search result.
Syntax:
dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]
- int: optional. The dedup command retains up to <int> events for each combination of field values when you specify <int>. The number for <int> must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. Default: 1.
- keepempty: optional. If set to true, keeps the document if any field in the field-list has a NULL value or is MISSING. Default: false.
- consecutive: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. Default: false.
- field-list: mandatory. The comma-delimited field list. At least one field is required.
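The way these parameters interact can be sketched in a few lines of Python. This is only an illustrative model of the documented behavior, not the plugin's implementation: the function name `dedup`, the `keep` argument (standing in for `<int>`), and the sample rows are hypothetical, and the email addresses are placeholders. How `<int>` combines with consecutive=true, and whether a NULL row interrupts a consecutive run, are assumptions here.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def dedup(rows: Sequence[Row], fields: Sequence[str], keep: int = 1,
          keepempty: bool = False, consecutive: bool = False) -> List[Row]:
    """Illustrative model of `dedup [int] <field-list> [keepempty] [consecutive]`."""
    if keep < 1:
        raise ValueError("<int> must be greater than 0")
    if not fields:
        raise ValueError("at least one field is required")

    seen: Dict[Tuple[Any, ...], int] = {}  # combination of values -> times seen
    out: List[Row] = []
    for row in rows:
        # A missing field is treated like NULL (None).
        key = tuple(row.get(f) for f in fields)
        if any(value is None for value in key):
            # Rows with a NULL/MISSING field are kept only when keepempty=true.
            # Assumption: such rows do not interrupt a consecutive run.
            if keepempty:
                out.append(row)
            continue
        if consecutive and key not in seen:
            # Assumption: a new combination starts a new run, so forget earlier ones.
            seen.clear()
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= keep:
            out.append(row)
    return out


# Hypothetical sample data; the email values are placeholders.
accounts = [
    {"account_number": 1, "gender": "M", "email": "a@example.com"},
    {"account_number": 6, "gender": "M", "email": "b@example.com"},
    {"account_number": 13, "gender": "F", "email": None},
    {"account_number": 18, "gender": "M", "email": "c@example.com"},
]


def numbers(rows: List[Row]) -> List[int]:
    return [r["account_number"] for r in rows]


print(numbers(dedup(accounts, ["gender"])))                    # [1, 13]
print(numbers(dedup(accounts, ["gender"], keep=2)))            # [1, 6, 13]
print(numbers(dedup(accounts, ["email"], keepempty=True)))     # [1, 6, 13, 18]
print(numbers(dedup(accounts, ["email"])))                     # [1, 6, 18]
print(numbers(dedup(accounts, ["gender"], consecutive=True)))  # [1, 13, 18]
```

With more than one field, the tuple of values acts as the combination, so two rows count as duplicates only when every listed field matches.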
This example removes duplicate documents based on the gender field.
PPL query:
os> source=accounts | dedup gender | fields account_number, gender;
fetched rows / total rows = 2/2
+------------------+----------+
| account_number   | gender   |
|------------------+----------|
| 1                | M        |
| 13               | F        |
+------------------+----------+
This example removes duplicate documents based on the gender field, keeping up to two documents for each value.
PPL query:
os> source=accounts | dedup 2 gender | fields account_number, gender;
fetched rows / total rows = 3/3
+------------------+----------+
| account_number   | gender   |
|------------------+----------|
| 1                | M        |
| 6                | M        |
| 13               | F        |
+------------------+----------+
This example removes duplicate documents based on the email field and keeps documents whose email value is null.
PPL query:
os> source=accounts | dedup email keepempty=true | fields account_number, email;
fetched rows / total rows = 4/4
+------------------+-----------------------+
| account_number   | email                 |
|------------------+-----------------------|
| 1                | [email protected]     |
| 6                | [email protected]     |
| 13               | null                  |
| 18               | [email protected]     |
+------------------+-----------------------+
This example removes duplicate documents based on the email field and drops documents whose email value is null or missing.
PPL query:
os> source=accounts | dedup email | fields account_number, email;
fetched rows / total rows = 3/3
+------------------+-----------------------+
| account_number   | email                 |
|------------------+-----------------------|
| 1                | [email protected]     |
| 6                | [email protected]     |
| 18               | [email protected]     |
+------------------+-----------------------+
This example removes only consecutive duplicate documents based on the gender field.
PPL query:
os> source=accounts | dedup gender consecutive=true | fields account_number, gender;
fetched rows / total rows = 3/3
+------------------+----------+
| account_number   | gender   |
|------------------+----------|
| 1                | M        |
| 13               | F        |
| 18               | M        |
+------------------+----------+
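One way to picture consecutive=true is as "keep the first document of each run of equal values". The Python sketch below illustrates that idea with `itertools.groupby`. The helper name `first_of_each_run` and the sample rows are hypothetical, and this is not how the plugin implements the command. Account 18 survives because the F row between it and the earlier M rows starts a new run.

```python
from itertools import groupby

# Hypothetical rows, in the order the query would see them.
rows = [
    {"account_number": 1, "gender": "M"},
    {"account_number": 6, "gender": "M"},
    {"account_number": 13, "gender": "F"},
    {"account_number": 18, "gender": "M"},
]


def first_of_each_run(rows, field):
    # groupby only merges adjacent rows with equal keys, so each group is one run.
    return [next(group) for _, group in groupby(rows, key=lambda r: r[field])]


kept = first_of_each_run(rows, "gender")
print([r["account_number"] for r in kept])  # [1, 13, 18]
```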
The dedup command is not rewritten to OpenSearch DSL; it is executed only on the coordinating node.