Adds Cassandra support for Autocomplete tags #2309
Good start... next, in-memory, I think, since then we can add some tests.
zipkin-storage/cassandra/src/main/resources/zipkin2-schema-indexes.cql
return new SelectTagKeys(factory); | ||
} | ||
|
||
static class AccumulateTagsAllResults extends AccumulateAllResults<List<String>> { |
Possible refactoring opportunity, as I recognise this class :P
Factory(Session session) { | ||
this.session = session; | ||
this.preparedStatement = | ||
session.prepare(QueryBuilder.select("key").distinct().from(TABLE_TAGS)); |
I don't know what the cardinality of tags will be, but this won't scale. For example, if there are 1 million tags in a Zipkin storage, this full-table scan is going to be painful (if it doesn't time out outright).
Right. I would do a progressive query: simple pagination only, with no ordering and definitely not with an offset, but with "after".
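A minimal sketch of the progressive, "after"-style query, assuming tag keys can be read back in some stable order. `TagKeyPager` and its methods are hypothetical, not Zipkin's API, and the `TreeSet` only stands in for the table; in Cassandra the cursor would more likely be the partition token (a `token(key) > token(?)` restriction plus `LIMIT`) than an explicit ordering. Each call resumes from the last key returned instead of skipping an offset, so no single request touches the whole table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical pager illustrating "after"-style (keyset) pagination; not Zipkin's API.
final class TagKeyPager {
  // Stand-in for the tags table, read in a stable (here: sorted) order.
  private final NavigableSet<String> keys;

  TagKeyPager(List<String> allKeys) {
    this.keys = new TreeSet<>(allKeys);
  }

  // Returns up to `limit` keys strictly after `after` (null = start from the beginning).
  // No offset: the caller resumes from the last key it received.
  List<String> page(String after, int limit) {
    if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
    NavigableSet<String> rest = (after == null) ? keys : keys.tailSet(after, false);
    List<String> result = new ArrayList<>();
    for (String key : rest) {
      if (result.size() == limit) break;
      result.add(key);
    }
    return result;
  }

  // Reads every key progressively, one bounded page per request.
  List<String> readAll(int pageSize) {
    List<String> all = new ArrayList<>();
    String after = null;
    while (true) {
      List<String> page = page(after, pageSize);
      all.addAll(page);
      if (page.size() < pageSize) return all; // a short page means nothing is left
      after = page.get(page.size() - 1);
    }
  }
}
```

With a page size of, say, 1000, a table holding a million tag keys would be read in roughly a thousand bounded requests rather than one scan that risks a timeout.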
If we want a whitelist of tags to index, rather than indexing all tags, you might get away without this query at all and just return the list of whitelisted tags.
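A sketch of the whitelist idea, assuming the indexed tag keys come from configuration. `WhitelistedTagIndex` and its methods are hypothetical names, not Zipkin's API. Because the key list is fixed, `getKeys()` answers without touching storage, and only values for whitelisted keys are ever indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical whitelist-based tag index; not Zipkin's API.
final class WhitelistedTagIndex {
  private final List<String> whitelist;
  private final Map<String, Set<String>> valuesByKey = new HashMap<>();

  WhitelistedTagIndex(List<String> whitelist) {
    this.whitelist = Collections.unmodifiableList(new ArrayList<>(whitelist));
    for (String key : whitelist) valuesByKey.put(key, new LinkedHashSet<>());
  }

  // Indexes only tags whose key is whitelisted; all other tags are ignored.
  void accept(Map<String, String> spanTags) {
    for (Map.Entry<String, String> tag : spanTags.entrySet()) {
      Set<String> values = valuesByKey.get(tag.getKey());
      if (values != null) values.add(tag.getValue());
    }
  }

  // No scan needed: the answer is simply the configured list.
  List<String> getKeys() {
    return whitelist;
  }

  // Values seen so far for a whitelisted key, in first-seen order.
  List<String> getValues(String key) {
    Set<String> values = valuesByKey.get(key);
    return values == null ? new ArrayList<String>() : new ArrayList<String>(values);
  }
}
```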
Added the whitelist implementation. Let me know your thoughts.
I have added the in-memory API. I will revisit the Cassandra implementation tomorrow and address the concerns above.
I made a comment about fixed cardinality; we definitely need to document this, as it is indeed inappropriate for unbounded values. #2236 (comment) One thing @zeagord and I discussed is initially inheriting the config for the other names (service/span), for simplicity. In the future, we could add a timestamp/lookback parameter to fetch only the values for a range. However, the same problem would apply to service/span names, so we can think about that later.
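A sketch of what a timestamp/lookback parameter might look like, assuming every (key, value) sighting is recorded with its timestamp. The `endTs`/`lookback` pair mirrors the parameters Zipkin already uses for trace queries; `LookbackTagValues` itself is hypothetical. Keeping every timestamp is only for clarity; a real store would more likely bucket sightings coarsely (for example by day).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical store that answers "values for key seen within a time range".
final class LookbackTagValues {
  // key -> value -> timestamps (epoch millis) at which the pair was seen
  private final Map<String, Map<String, NavigableSet<Long>>> seen = new HashMap<>();

  void record(String key, String value, long timestampMillis) {
    seen.computeIfAbsent(key, k -> new HashMap<>())
        .computeIfAbsent(value, v -> new TreeSet<>())
        .add(timestampMillis);
  }

  // Values for `key` seen at least once within [endTs - lookback, endTs], sorted.
  List<String> getValues(String key, long endTs, long lookback) {
    if (lookback < 0) throw new IllegalArgumentException("lookback must be >= 0");
    long beginTs = endTs - lookback;
    List<String> result = new ArrayList<>();
    Map<String, NavigableSet<Long>> values = seen.get(key);
    if (values == null) return result;
    for (Map.Entry<String, NavigableSet<Long>> e : values.entrySet()) {
      // keep the value if any sighting falls inside the window (both ends inclusive)
      if (!e.getValue().subSet(beginTs, true, endTs, true).isEmpty()) result.add(e.getKey());
    }
    Collections.sort(result);
    return result;
  }
}
```

Storing only a last-seen time would be cheaper, but it would hide values that were seen inside a past window and again afterwards.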
Need to add tests and revisit the Elasticsearch design.
zipkin-server/src/main/java/zipkin2/server/internal/ZipkinQueryApiV2.java
In Elasticsearch, how about using
> reuse _q field in elasticsearch

We can think about it, but the performance might be bad. For example, getting the key names would require an expression I am not sure how to write unless we hard-code the possible key names. If we hard-code the possible key names, yes, it could work, but we would have to check the performance of scanning all span documents to get the values.
Good news is we can try it. Actually, I think we need to hard-code key names anyway, especially in Cassandra.
(In reply to Adrian Cole's comment above, Mon, 10 Dec 2018, 08:27.)
So on Cassandra (and Elasticsearch) we'll need to ensure the "deduper" is in use to avoid thrashing writes. In both cases, it might be helpful to reverse-engineer the service-span mapping to reuse the same table, e.g. PRIMARY KEY ((type, key), value). This could make data management in general easier long term. Since there is a time bomb on Elasticsearch (#2219), we might want to solve that first before merging this (or at least before cutting a release with it). Meanwhile, we can let UI testing work with a statically managed list of tags (e.g. there may be only a few values associated with phase, so one option is to let the UI configure predefined values where the list is small).
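An in-memory model of the shared-table idea, assuming service/span names are stored as `type = "span"`, `key =` service name, `value =` span name, and tags as `type = "tag"`. `SharedAutocompleteTable` and its deduper are hypothetical; the point is that `(type, key)` selects one partition and `value` is the clustering column, so listing autocomplete values is a single-partition read. The deduper here is an unbounded set for illustration only; a production one would need to be bounded or expire entries so values are eventually rewritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of one table with PRIMARY KEY ((type, key), value).
final class SharedAutocompleteTable {
  // Partition key, e.g. ("tag", "phase") or ("span", "frontend").
  static final class PartitionKey {
    final String type;
    final String key;

    PartitionKey(String type, String key) {
      this.type = type;
      this.key = key;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof PartitionKey)) return false;
      PartitionKey that = (PartitionKey) o;
      return type.equals(that.type) && key.equals(that.key);
    }

    @Override public int hashCode() {
      return Objects.hash(type, key);
    }
  }

  // Each partition keeps its values sorted, like a clustering column.
  private final Map<PartitionKey, TreeSet<String>> partitions = new HashMap<>();
  // The deduper: rows already written are not written again.
  private final Set<List<String>> alreadyWritten = new HashSet<>();
  private int storageWrites; // counts writes that actually reached "storage"

  // Returns true if a write was issued, false if the deduper suppressed it.
  boolean insert(String type, String key, String value) {
    if (!alreadyWritten.add(Arrays.asList(type, key, value))) return false;
    partitions.computeIfAbsent(new PartitionKey(type, key), k -> new TreeSet<>()).add(value);
    storageWrites++;
    return true;
  }

  // Single-partition read; values come back in clustering (sorted) order.
  List<String> values(String type, String key) {
    TreeSet<String> values = partitions.get(new PartitionKey(type, key));
    return values == null ? new ArrayList<String>() : new ArrayList<String>(values);
  }

  int storageWrites() {
    return storageWrites;
  }
}
```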
BodyConverters for ES looks for "key" in the aggregation results, which clashes with the key in the tag {k,v}. It could be the name of the aggregation. I will rename only the aggregations and see if that works.
zipkin-storage/cassandra-v1/src/main/resources/cassandra-schema-cql3-upgrade-1.txt
If service-span and tags are collapsed into one table, it does make full-table scans (e.g.
made most of a pass!
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraStorage.java
...n-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraSpanConsumer.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraTagStore.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraTagStore.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraTagStore.java
zipkin-storage/cassandra/src/main/java/zipkin2/storage/cassandra/CassandraSpanConsumer.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/InsertTags.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/SelectTagValues.java
zipkin-storage/cassandra-v1/src/main/java/zipkin2/storage/cassandra/v1/CassandraTagStore.java
zipkin-storage/cassandra/src/main/java/zipkin2/storage/cassandra/CassandraSpanConsumer.java
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/ElasticsearchTagStore.java
This chops off the basic functionality and will allow the UI work to start immediately when merged. I can help rework the other commits similarly. #2332
Force-pushed from 10c728f to fcceb20.
Force-pushed from fcceb20 to 109c40f.
I think this is nearly ready. We need to test the auto-upgrade logic and also update the README files to explain how autocomplete works.
Force-pushed from 5119fd7 to 3ea8062.
Code changes LGTM. One comment, though: most of the diff is whitespace/code-style changes. For the reviewer, reading diffs that have nothing to do with the actual PR is a huge waste of time. It would be great if those changes were moved into a separate follow-up commit (still within the PR). That way, reviewing just the first commit of the PR would save a lot of time.
@michaelsembwever sorry about the formatting thing. I was in a rush to get something stable before I turned off the internet for vacation, but that created extra effort for others. Not sure what the better call was, but I apologize nevertheless. Thanks for reviewing despite this.
FYI, Travis is still failing on the same tests as on the last push; this needs to be looked into prior to merge.
Force-pushed from 0745564 to cf61fdc.