Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern #91521

Merged
merged 6 commits into from
Nov 14, 2022

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Nov 10, 2022

  • Changes the return type of splitCombinedPatterns to avoid an array->collection conversion
  • Direct for-each iteration over a collection in forPattern rather than the streams approach

It's a relatively small change to the code, but since we run this each time a DateProcessor executes on an IngestDocument, it's a pretty hot code path. In my little microbenchmark it cuts ~20% of the time spent in a DateProcessor, or about 1% of the ingest time overall (but again for emphasis, that's just my little microbenchmark).

Here's the before/after flamegraph:

Screen Shot 2022-11-10 at 2 48 13 PM

Screen Shot 2022-11-10 at 2 49 20 PM

to match how it's done on the previous ternary for timezone
to keep with the return type from delimitedListToStringArray, rather
than wrapping it
This is still pretty simple as a for each, and the for each is faster.
@joegallo joegallo added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.6.0 labels Nov 10, 2022
@joegallo joegallo requested a review from grcevski November 10, 2022 20:00
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Nov 10, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I would just add a comment on line 116 in forPattern to say it's performance sensitive code ;), just so that we don't undo it by a refactor in the future.

@joegallo joegallo merged commit 0ee5c76 into elastic:main Nov 14, 2022
@joegallo joegallo deleted the ingest-date-processor-speedup branch November 14, 2022 17:58
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 15, 2022
* main: (163 commits)
  [DOCS] Edits frequent items aggregation (elastic#91564)
  Handle providers of optional services in ubermodule classloader (elastic#91217)
  Add `exportDockerImages` lifecycle task for exporting docker tarballs (elastic#91571)
  Fix CSV dependency report output file location in DRA CI job
  Fix variable placeholder for Strings.format calls (elastic#91531)
  Fix output dir creation in ConcatFileTask (elastic#91568)
  Fix declaration of dependencies in DRA snapshots CI job (elastic#91569)
  Upgrade Gradle Enterprise plugin to 3.11.4 (elastic#91435)
  Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern (elastic#91521)
  Fix inter project handling of generateDependenciesReport (elastic#91555)
  [Synthetics] Add synthetics-* read to fleet-server (elastic#91391)
  [ML] Copy more settings when creating DF analytics destination index (elastic#91546)
  Reduce CartesianCentroidIT flakiness (elastic#91553)
  Propagate last node to reinitialized routing tables (elastic#91549)
  Forecast write load during rollovers (elastic#91425)
  [DOCS] Warn about potential overhead of named queries (elastic#91512)
  Datastream unavailable exception metadata (elastic#91461)
  Generate docker images and dependency report in DRA ci job (elastic#91545)
  Support cartesian_bounds aggregation on point and shape (elastic#91298)
  Add support for EQL samples queries (elastic#91312)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue Team:Data Management Meta label for data/management team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants