Elasticsearch: Make the span range configurable #91

Merged

Conversation

johanneswuerbach
Contributor

Which problem is this PR solving?

We don't want to load multiple GBs of spans per day into memory to determine our dependency graph. Instead, we are fine with an approximation based on the connections we saw in the past 2h.

Short description of the changes

Make the loaded span range configurable. Query taken from #86 ❤️

@johanneswuerbach
Contributor Author

It looks like those tests are also failing on master (https://travis-ci.org/github/jaegertracing/spark-dependencies/builds/607266470). I am testing the image manually at the moment.

README.md Outdated
Defaults to false
* `ES_INDEX_PREFIX`: index prefix of Jaeger indices. By default unset.
* `ES_SPAN_RANGE`: How far into the past should the job look, the maximum and default is 24h.
Member

Could you add a link to ES docs that lists what values can be used in this query?

Member

Could we rename this to ES_TIME_RANGE? And document that span start time is used for the comparison.

Contributor Author

Added, I struggled a bit with wording. How does the change sound?

Member

It looks good 👍
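As a rough illustration of which values fit into a `now-<value>` expression: Elasticsearch date math accepts an integer followed by a unit (`y`, `M`, `w`, `d`, `h`, `H`, `m`, `s`). The sketch below checks a value against that simple shape. The class `TimeRangeCheck` and its method are hypothetical and not part of this PR; the job itself may not validate the value at all.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not from the PR): checks a value before it is used as "now-<value>". */
final class TimeRangeCheck {
    // One integer followed by one Elasticsearch date-math unit:
    // y (years), M (months), w (weeks), d (days), h or H (hours), m (minutes), s (seconds).
    private static final Pattern SIMPLE_RANGE = Pattern.compile("\\d+[yMwdhHms]");

    /** Returns true for values such as "2h" or "30m", false for null or anything else. */
    static boolean isSimpleRange(String value) {
        return value != null && SIMPLE_RANGE.matcher(value).matches();
    }
}
```

With this check, `2h` and `24h` pass, while `2 hours` or an empty string are rejected.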

@@ -219,7 +228,9 @@ void run(String[] spanIndices, String[] depIndices,String peerServiceTag) {
String spanIndex = spanIndices[i];
String depIndex = depIndices[i];
log.info("Running Dependencies job for {}, reading from {} index, result storing to {}", day, spanIndex, depIndex);
JavaPairRDD<String, Iterable<Span>> traces = JavaEsSpark.esJsonRDD(sc, spanIndex)
// Send raw query to ES to select only the docs / spans we want to consider for this job
String esQuery = String.format("{\"range\": {\"startTimeMillis\": { \"gte\": \"now-%s\" }}}", spanRange);
Member

Please add a comment noting that this does not change the default behavior, because the daily indices cover at most 24h.
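For readers following the diff, here is a minimal, self-contained sketch of how the range query string could be assembled. It reuses the `String.format` template from the hunk above. The class `SpanRangeQuery`, the method names, and reading the value from an environment map under the name `ES_TIME_RANGE` (the rename suggested in review) are assumptions for illustration, not the PR's actual code.

```java
import java.util.Map;

/** Hypothetical sketch (not the PR's code): builds the raw ES query that limits the loaded spans. */
final class SpanRangeQuery {
    // Daily indices hold at most 24h of data, so 24h keeps the previous behavior.
    static final String DEFAULT_RANGE = "24h";

    /** Picks the configured range from an environment-like map, falling back to the default. */
    static String resolveRange(Map<String, String> env) {
        String value = env.get("ES_TIME_RANGE"); // assumed variable name
        return (value == null || value.trim().isEmpty()) ? DEFAULT_RANGE : value.trim();
    }

    /** Builds a range filter on the span start time, using the same template as the diff. */
    static String build(String range) {
        return String.format("{\"range\": {\"startTimeMillis\": { \"gte\": \"now-%s\" }}}", range);
    }
}
```

In the job, the result of `SpanRangeQuery.build(SpanRangeQuery.resolveRange(System.getenv()))` would presumably be passed as the query argument when reading the span index, so that only spans whose start time falls inside the window are loaded.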

@johanneswuerbach johanneswuerbach force-pushed the configurable-range branch 3 times, most recently from e686110 to 1d37d18 Compare June 9, 2020 09:23
@pavolloffay pavolloffay closed this Jun 9, 2020
@pavolloffay pavolloffay reopened this Jun 9, 2020
@pavolloffay
Member

Reopened the PR to trigger Travis.

If Travis does not start, could you please change the commit SHA to trigger it again?

@johanneswuerbach
Contributor Author

johanneswuerbach commented Jun 9, 2020

Pushed a few times before noticing that it actually seems to trigger a CI run https://travis-ci.org/github/jaegertracing/spark-dependencies/pull_requests :-(, but the status isn't reflected here.

@johanneswuerbach
Contributor Author

@pavolloffay looks like Travis passed now 😍

@pavolloffay
Member

Travis had a good day finally 😄

Thanks for the PR @johanneswuerbach

@pavolloffay pavolloffay merged commit 6333604 into jaegertracing:master Jun 10, 2020
@johanneswuerbach
Contributor Author

@pavolloffay any chance the new version could be pushed to Docker Hub? That would be awesome 🙇

@pavolloffay
Member

It should be there. There is only a `latest` tag, which is built on every merge to the master branch.

@johanneswuerbach
Contributor Author

I already checked Docker Hub, but it seems the last build was 8 months ago: https://hub.docker.com/r/jaegertracing/spark-dependencies/builds

When I pull the latest image I also don't see those changes.

Maybe there is something broken with the integration?

@pavolloffay
Member

Probably. I will look into it.

@pavolloffay
Member

#96
