Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Cassandra db schema on session initialization #5922

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

akstron
Copy link
Contributor

@akstron akstron commented Sep 2, 2024

Create Schema (if not present) on Session Initialization

Once a session is established with cassandra db, the added code parses the template file containing queries for creating schema and create queries out of it. Post which it executes those queries to create the required types and tables.

Which problem is this PR solving?

Resolves #5797

Description of the changes

  • The PR includes the following changes:
    1. Embedding template files into binary
    1. Creation of database schema in initialization steps once session to database is established.

How was this change tested?

  • Schema rendering is being tested with unit test.

Checklist

plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
datacenter string
trace_ttl int
dependencies_ttl int
keyspace string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should already be defined in the main config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about this config: jaeger/cmd/jaeger/config-cassandra.yaml ? How would I know what all config key-value pairs are already defined?

Copy link
Contributor Author

@akstron akstron Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yurishkuro , I needed some help here. Is this: jaeger/cmd/jaeger/internal/all-in-one.yaml the example for the config file which would be used in v2? If yes, under what section should I expect the cassandra specific configs to be put? I couldn't find any example for this in the current code base.

Just to get the complete picture, would the schema configs I am adding look something like this:

schema:
    datacenter: <datacenterName>
    keyspace: <keyspace>
    .......

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd/jaeger/config-cassandra.yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if the current place of defining it works?

plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
return result
}

func constructQueriesFromTemplateFiles(session cassandra.Session, params *StorageConfigParams) ([]cassandra.Query, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cassandra.Session not able to execute multiple queries at once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about running individual queries in parallel or a batch query option? I didn't try executing parallel queries.

plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
@akstron akstron force-pushed the create-database-scheme-cassandra branch 2 times, most recently from 69275fb to bcad4c0 Compare October 25, 2024 13:34
@akstron akstron marked this pull request as ready for review October 25, 2024 13:52
@akstron akstron requested a review from a team as a code owner October 25, 2024 13:52
@@ -24,6 +24,12 @@ extensions:
cassandra:
schema:
keyspace: "jaeger_v1_dc1"
datacenter: "test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add these values somewhere in create default config function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to func DefaultConfiguration. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case do you need the values to still be defined here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was thinking this acts as a quick reference to how the config file should look. I will remove this since this may not be required.

@@ -51,13 +51,26 @@ type Connection struct {
ProtoVersion int `mapstructure:"proto_version"`
}

type SchemaConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments to each field with sample values/units.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

pkg/cassandra/config/config.go Outdated Show resolved Hide resolved
}

for _, query := range casQueries {
err := query.Exec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a corresponding table already exists, will this be a no-op or will it update the settings in the table? We need to allow users to be able to customize the settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the table creation queries use: CREATE TABLE IF NOT EXISTS, so this will be a no-op.

@akstron akstron force-pushed the create-database-scheme-cassandra branch from f9dd90e to 90368b1 Compare October 28, 2024 10:46
…ution for initialize database

Signed-off-by: Alok Kumar Singh <[email protected]>
@akstron akstron force-pushed the create-database-scheme-cassandra branch from 90368b1 to afc786d Compare October 28, 2024 10:50
//Datacenter is the name for network topology
Datacenter string `mapstructure:"datacenter" valid:"optional"`
// TraceTTL is Time To Live (TTL) for the trace data in seconds
TraceTTL int `mapstructure:"trace_ttl" valid:"optional"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the type here be time.Duration so that the user could specify 72h?

Copy link
Contributor Author

@akstron akstron Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that there should be a validation to make sure that user can't specify something like "ms"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the point of using strong type - the business logic does not need to worry about validations, they should happen separately during parsing. I believe if you simply change int to time.Duration and use 10ms as a value in YAML, the parser will do the right thing.

// CasVersion is version of cassandra used
CasVersion int `mapstructure:"cas_version" valid:"optional"`
// CompactionWindow of format "^[0-9]+[mhd]$" tells the compaction window of the db
CompactionWindow string `mapstructure:"compaction_window" valid:"optional"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is defining a time interval? Can we then also use time.Duration type?

// CompactionWindowSize is the numberical part of CompactionWindow. Extracted from CompactionWindow
CompactionWindowSize int `mapstructure:"compaction_window_size" valid:"optional"`
// CompactionWindowUnit is the time unit of the CompactionWindow. Extracted from CompactionWindow
CompactionWindowUnit string `mapstructure:"compaction_window_unit" valid:"optional"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use time.Duration type we will not need the unit separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Size and Unit seems to be derived fields, if so they should not be declared in the Config struct. You can create a different struct for use with Go templates, e.g.

type TemplateParams struct {
  config.Schema // embed config
  // add other derived fields
}

Copy link
Contributor Author

@akstron akstron Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TemplateParams with embedded config.Schema and other derived fields.

@@ -0,0 +1,84 @@
-- There are total 9 queries here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "test"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Improved test and changed name in recent commit] The test is to check if the extracted query strings are same as the expected query strings.

traceTTLMinutes := cfg.TraceTTL / 60

cfg.CompactionWindowSize = (traceTTLMinutes + 30 - 1) / 30
cfg.CompactionWindowUnit = "m"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below you use "MINUTES", is "m" also valid?

Copy link
Contributor Author

@akstron akstron Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, m, h and d are valid.

Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
@@ -0,0 +1,43 @@
-- There are total 4 queries here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of test schema?

Copy link
Contributor Author

@akstron akstron Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of testing on all the queries and manually creating expected result for all of them, I reduced the number of queries to ease the creation of expected result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't really make sense to me. We can either test that the template is correct or that the code using the template is correct. You're not doing the former by having another template, and the latter you can do with the primary template. We don't need to validate that the output of running template is "as expected", that's like testing the Go template package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to validate that the output of running template is "as expected", that's like testing the Go template package.
We are also removing comments and constructing individual query strings out of it by iterating over lines in the template output. The test checks this "individual query string" construction, which not just involves using the template package.

and the latter you can do with the primary template.
Sure, I think I can write an integration test for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create database scheme in Cassandra automatically
3 participants