Fix problems in GeoIPv2 code #71598
Pinging @elastic/es-core-features (Team:Core/Features)
@elasticmachine update branch
I left two small comments. Otherwise LGTM
@@ -219,6 +219,9 @@ private static XContentBuilder mappings() {
.startObject("chunk")
.field("type", "integer")
.endObject()
.startObject("timestamp")
.field("type", "long")
Maybe use type "date"? It still accepts time in milliseconds since the epoch and treats the values as dates.
As the timestamp is now part of the _id, it doesn't need to be indexed separately. I've removed it from the mapping.
MessageDigest md = MessageDigests.md5();
for (byte[] buf = getChunk(is); buf.length != 0; buf = getChunk(is)) {
    md.update(buf);
    client.prepareIndex(DATABASES_INDEX).setId(name + "_" + chunk)
Maybe keep the _id but add the timestamp to it? That way the _id still has meaning, and if for some reason we index a document with the same _id, we fail with an error (because create=true).
I switched back to using the _id with the timestamp added, as you suggested.
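The thread above settles on keeping a meaningful _id, adding the download timestamp to it, and relying on create=true to reject duplicates. The following is a minimal plain-Java sketch of that idea, not the Elasticsearch implementation: an in-memory HashMap stands in for the databases index, putIfAbsent imitates create=true, java.security.MessageDigest replaces the MessageDigests.md5() helper, and the names ChunkIndexSketch, CHUNK_SIZE, create and the "<name>_<chunk>_<timestamp>" id layout are all hypothetical. It needs Java 11+ for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory map stands in for the databases index, and the
// "<name>_<chunk>_<timestamp>" id layout is a hypothetical choice.
class ChunkIndexSketch {
    // _id -> chunk bytes; plays the role of the index.
    static final Map<String, byte[]> INDEX = new HashMap<>();
    // Deliberately tiny so a short test input spans several chunks.
    static final int CHUNK_SIZE = 4;

    // Reads up to CHUNK_SIZE bytes; returns an empty array once the stream is exhausted.
    static byte[] getChunk(InputStream is) throws IOException {
        return is.readNBytes(CHUNK_SIZE);
    }

    // Mimics indexing with create=true: an existing _id is an error, never an overwrite.
    static void create(String id, byte[] chunk) {
        if (INDEX.putIfAbsent(id, chunk) != null) {
            throw new IllegalStateException("document with _id [" + id + "] already exists");
        }
    }

    // Indexes every chunk of one download under ids that include that download's
    // timestamp, and returns the hex MD5 of the complete stream.
    static String indexChunks(String name, long timestamp, InputStream is)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        int chunk = 0;
        for (byte[] buf = getChunk(is); buf.length != 0; buf = getChunk(is)) {
            md.update(buf);
            create(name + "_" + chunk + "_" + timestamp, buf);
            chunk++;
        }
        return hex(md.digest());
    }

    // Lowercase hex encoding of a digest.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Indexing the same input under two different timestamps produces two disjoint sets of chunk documents, while retrying with a timestamp that was already used fails on the first chunk instead of silently overwriting it.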
This change fixes a number of problems in the GeoIPv2 code:
- closes the streams returned by Files.list in GeoIpCli, which should fix the tests on Windows
- makes sure that the total download time in GeoIP stats is non-negative (we serialize it as a vInt, which can cause problems with negative numbers, and a negative value can occur when the clock is changed during the operation)
- fixes the handling of failed and simultaneous downloads: #69951 was meant to prevent two persistent tasks from indexing chunks, but it would block any update if a single download failed mid-indexing. This change uses the timestamp (lastUpdate) as a sort of UUID. This should still prevent two tasks from stepping on each other's toes (overwriting chunks), while in the end only a single task should be able to update the task state (this is handled by the persistent tasks framework).

Closes #71145
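The first two fixes in the list above can be sketched in plain Java. This is an illustration under assumptions, not the code from the PR: GeoIpFixesSketch, listFileNames, DownloadStats and recordDownload are hypothetical names. encodeVInt shows the common 7-bits-per-byte vInt layout only to make concrete one reason a negative value is awkward to serialize that way (it always takes the maximum five bytes); the description does not say exactly which problem a negative value caused.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GeoIpFixesSketch {

    // Files.list returns a lazily populated Stream that holds an open directory handle
    // until it is closed, so it belongs in try-with-resources. On Windows, a leaked
    // handle can prevent the directory from being deleted afterwards.
    static List<String> listFileNames(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Hypothetical stats holder: each recorded duration is clamped at zero, so a wall
    // clock that moves backwards mid-download cannot make the total negative.
    static final class DownloadStats {
        private int totalDownloadTimeMillis;

        void recordDownload(long startMillis, long endMillis) {
            long elapsed = Math.max(0L, endMillis - startMillis);
            // Also cap at Integer.MAX_VALUE so the int total cannot overflow into negatives.
            totalDownloadTimeMillis = (int) Math.min(Integer.MAX_VALUE, totalDownloadTimeMillis + elapsed);
        }

        int totalDownloadTimeMillis() {
            return totalDownloadTimeMillis;
        }
    }

    // Common 7-bits-per-byte variable-length int encoding (high bit = "more bytes follow").
    // Small non-negative values take one or two bytes; any negative int takes five.
    static byte[] encodeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
        return out.toByteArray();
    }
}
```

Clamping each duration, rather than the running total, keeps a single clock jump from erasing time that was already recorded. Measuring with System.nanoTime(), which is monotonic, would be another way to avoid negative durations in the first place.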