
Haystack Traces: Supporting Multiple Backends


Limitations of the current design

  • Currently all the backend driver code resides in the haystack traces codebase, so adding more drivers can lead to dependency conflicts.
  • As of now we support only a single Cassandra cluster as the backend. With the increase in incoming trace data this could become a challenge, since scaling Cassandra, or any other backend for that matter, is not a trivial thing to do.

High Level Architecture

The diagram below depicts, at a high level, how we plan to change the haystack traces components to support multiple storage backends.

Design Highlights

  • We plan to run a sidecar container alongside the indexer and reader components; the sidecar is responsible for persisting data to the backing store and retrieving it. The components talk to the sidecar over gRPC. This makes adding more backends easier: anyone can write a gRPC service for a new backend and swap that sidecar in during deployments (a sketch of such a contract follows after this list).

  • We are rewriting the Cassandra backend so that it can write to multiple Cassandra clusters in a round-robin manner, while the reader does a scatter-gather across all the clusters (all of this logic is abstracted inside the cassandra-storage-backend app). This makes scaling easier: we can simply spin up a new Cassandra cluster instead of having to add nodes to the existing cluster (see the round-robin sketch after this list).

    Note that this behaviour is not mandated by the code; if one chooses, they can still write to a single Cassandra cluster, as is the case today.
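
Below is a minimal sketch of what the write/read contract between the indexer/reader components and the sidecar could look like. The names (`SpanStore`, `SpanRecord`, `writeSpans`, `readTrace`) are hypothetical and are not the actual Haystack gRPC service definition; in the proposed design this contract would be expressed as a gRPC/protobuf service that each backend implements.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical record representing a serialized span destined for the backend.
// Field names are illustrative, not the actual Haystack proto schema.
final class SpanRecord {
    final String traceId;
    final byte[] spanBytes;

    SpanRecord(String traceId, byte[] spanBytes) {
        this.traceId = traceId;
        this.spanBytes = spanBytes;
    }
}

// Hypothetical contract the indexer/reader would use to talk to the sidecar.
// The transport behind this contract is gRPC, so adding a new backend only
// means writing a new gRPC service that implements the equivalent operations.
interface SpanStore {
    // Persist a batch of spans in the backing store.
    CompletableFuture<Void> writeSpans(List<SpanRecord> spans);

    // Fetch all spans recorded for a given trace id.
    CompletableFuture<List<SpanRecord>> readTrace(String traceId);
}
```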
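
And a minimal sketch of the round-robin write / scatter-gather read behaviour described above, assuming the hypothetical `SpanStore` contract from the previous sketch. The real cassandra-storage-backend app would wrap actual Cassandra clients; class and method names here are illustrative only. Note that with a single delegate this degenerates to today's single-cluster behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a store that fans writes out round-robin across several Cassandra
// clusters and scatter-gathers reads from all of them.
final class MultiClusterSpanStore implements SpanStore {
    private final List<SpanStore> clusters;   // one SpanStore per Cassandra cluster
    private final AtomicLong writeCounter = new AtomicLong();

    MultiClusterSpanStore(List<SpanStore> clusters) {
        this.clusters = List.copyOf(clusters);
    }

    @Override
    public CompletableFuture<Void> writeSpans(List<SpanRecord> spans) {
        // Round robin: each write batch goes to the next cluster in turn.
        int idx = (int) (writeCounter.getAndIncrement() % clusters.size());
        return clusters.get(idx).writeSpans(spans);
    }

    @Override
    public CompletableFuture<List<SpanRecord>> readTrace(String traceId) {
        // Scatter-gather: query every cluster and merge the partial results,
        // since any cluster may hold a slice of the trace.
        List<CompletableFuture<List<SpanRecord>>> partials = new ArrayList<>();
        for (SpanStore cluster : clusters) {
            partials.add(cluster.readTrace(traceId));
        }
        return CompletableFuture.allOf(partials.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<SpanRecord> merged = new ArrayList<>();
                    for (CompletableFuture<List<SpanRecord>> partial : partials) {
                        merged.addAll(partial.join());
                    }
                    return merged;
                });
    }
}
```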