-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use span name table instead of materialized view #1374
Conversation
llinder
commented
Oct 31, 2016
- This switches span name and service name lookups to use a table instead of a materialized view. This change will fix Empty span name list in UI with Cassandra3 storage component #1360
- Change limit logic to limit on a set of trace IDs instead of limiting on the provided collection first. In pracitce this didn't make a noticiable difference in the results but it seems like the intended logic.
merging tomorrow unless someone says not too cc @openzipkin/cassandra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to check the README to ensure nothing is invalidated there
please hold @adriancole |
@michaelsembwever no problemo |
This looks good. It's a shame that MV didn't work out, but i can't see a way around it compared to the simplicity of a manually denormalised table. Am updating the stress yaml files appropriately for the schema change. Looking into compatibility too. I see no incompatibility, although the MV should be dropped, and make sure the table gets created within an existing keyspace. |
Stress resultsVery rough runs on a dual-core laptop. Write OnlyExisting
Changes here
Write&ReadExisting
Changes here
|
A concern i've had with the cassandra3 schema is the partition keys of This problem can be addressed, if need be, on the schema proposed here by re-introducing the |
The stress tests are interesting in that they (potentially) show that the schema proposed here is slower than the existing materialised view. I did not expect this. I wonder if some of the complexity and its latency to the materialised view is hidden from these results. I have no objections to accepting the new schema and any (potentially) small lost in performance. |
I pushed my commit to this branch containing the updates to the stress yaml profiles. I didn't actually mean to push it to your fork. I had no idea I could do that… @llinder |
} | ||
} | ||
|
||
static BigInteger traceId(Span span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a rebase on master, as this shifted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically this is in CassandraUtil now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks necessary to keep the experience up. We can re-introduce the deduper as a separate change or add it here. If we add it here, let's make sure the test is carried over, too.
just needs a rebase to buff-out the trace id thing.
@michaelsembwever thanks for updating the stress yaml profiles and running a comparison test. It is indeed interesting to see the results of the stress test. Really wish the MV worked as well, especially given it appears to perform better as well. I will review the Regarding the schema change. Is the preference to create a new CQL that contains the drop/add statements? Thanks for the detailed review and comments! |
- This switches span name and service name lookups to use a table instead of a materialized view. This change will fix openzipkin#1360 - Change limit logic to limit on a set of trace IDs instead of limiting on the provided collection first. In pracitce this didn't make a noticiable difference in the results but it seems like the intended logic.
Rebased the latest upstream changes on master. Also added the deduper on writes writes on I think the only outstanding thing to address is the schema changes. Given that this is still marked experimental do we want to start maintaining schema change sets? If thats what is preferred I will pull the table/view changes out to a separate file. |
Let's leave the DeduplicatingExecutor out for now.
Check out how the Schema class in the old cassandra storage dealt with upgrades. |
I don't think we should be dealing with upgrades at this point. One of the reasons we merged earlier was under the assumption that this is experimental. With that in mind, there should be no expectations around schema upgrade logic (that tends to very much clutter the code when in place) |
} catch (RuntimeException ex) { | ||
return Futures.immediateFailedFuture(ex); | ||
} | ||
} | ||
|
||
static BigInteger traceId(Span span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's still some drift here, but I'll take care of it post merge
This reverts #1374 and associated cleanup as the tests are failing. It wasn't noticed earlier that circleci wasn't running cassandra3 tests! https://circleci.com/gh/openzipkin/zipkin/374 This reverts commit 64abfdd. This reverts commit f9c7af1. This reverts commit b2e3425.
@llinder @michaelsembwever @abesto I think there's something wrong with our circleci or something as the cassandra3 tests skip. This led me to merge on false-green. Today, I noticed locally that the tests fail, which is bad, so I reverted this commit and the associated cleanups. sorry about the confusion. Once it is working locally, please raise another PR. Probably best to start by reverting the revert commit above, so you can start with cleanups applied. |
+1 |
I'm not too sure what happened with this merge, but it did not include my commit. |
- This switches span name and service name lookups to use a table instead of a materialized view. This change will fix #1360 - Change limit logic to limit on a set of trace IDs instead of limiting on the provided collection first. In pracitce this didn't make a noticiable difference in the results but it seems like the intended logic. - update stress profiles for new span_name_by_service tables references: - #1392 - #1360 - #1374
Re-created in #1392 |
- This switches span name and service name lookups to use a table instead of a materialized view. This change will fix #1360 - Change limit logic to limit on a set of trace IDs instead of limiting on the provided collection first. In pracitce this didn't make a noticiable difference in the results but it seems like the intended logic. - update stress profiles for new span_name_by_service tables references: - #1392 - #1360 - #1374