Increase data sizes #138

Merged: dahankzter merged 7 commits into master from increase_data_sizes on Jun 17, 2019
Conversation

@dahankzter (Contributor) commented on Jun 13, 2019

This is an attempt to simply generate more data on disk. Tuples and UDTs have more fields, and string-like types are made significantly longer.

Fixes: #121
Fixes: #75

@dahankzter requested a review from penberg on June 13, 2019 10:50
The number of partition keys, clustering keys and regular columns
can now be configured by the user via CLI args.
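As an aside, a minimal sketch of how such limits might be exposed on the command line, assuming the standard library flag package; the flag names and defaults below are illustrative and may not match the project's actual CLI.

	package main

	import "flag"

	// Illustrative flags only; the real option names and defaults in the tool may differ.
	var (
		maxPartitionKeys  = flag.Int("max-partition-keys", 2, "maximum number of partition key columns per table")
		maxClusteringKeys = flag.Int("max-clustering-keys", 4, "maximum number of clustering key columns per table")
		maxColumns        = flag.Int("max-columns", 16, "maximum number of regular columns per table")
	)

	func main() {
		flag.Parse()
		// The parsed values would then be passed on to the schema generator, e.g.
		// GenSchema(cs, *maxPartitionKeys, *maxClusteringKeys, *maxColumns).
	}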
schema.go Outdated
 )

-func GenSchema(cs *CompactionStrategy) *Schema {
+func GenSchema(cs *CompactionStrategy, maxPartitionKeys, maxClusteringKeys, maxColumns int) *Schema {

Contributor:

We could wrap these configuration parameters into a SchemaConfig struct in a follow-up patch.

Contributor (Author):

Makes sense. While it opens up significant refactoring possibilities, it also allows the developer to "forget to apply" a certain config, but that can happen with an argument as well, I guess.

Contributor (Author):

I can do it now. It's a quick fix and makes it much nicer.

Contributor (Author):

SchemaConfig added.
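For reference, a rough sketch of what such a wrapper could look like; the field names below are illustrative and may not match the merged patch, while CompactionStrategy and Schema are the project's existing types.

	// Illustrative only: group the generation limits into a single struct so the
	// generator takes one argument instead of a growing parameter list.
	type SchemaConfig struct {
		CompactionStrategy *CompactionStrategy // compaction strategy for generated tables
		MaxPartitionKeys   int                 // upper bound on partition key columns
		MaxClusteringKeys  int                 // upper bound on clustering key columns
		MaxColumns         int                 // upper bound on regular columns
	}

	// The generator signature could then shrink back to a single parameter:
	//
	//	func GenSchema(sc SchemaConfig) *Schema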

schema.go Outdated
	}
	var mvs []MaterializedView
	numMvs := 1
	for i := 0; i < numMvs; i++ {
		col, err := validMVColumn()
		if err != nil {
			fmt.Println(err)

Contributor:

Perhaps something more structured for error reporting?

@dahankzter (Author) commented Jun 17, 2019:

I don't know what to do in this case though. It simply means that we didn't generate a proper column that can take part in the MV.

Contributor (Author):

Better message added.
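For illustration, one way the loop body above could report the failure with more context than a bare fmt.Println(err); this is only a sketch, and the merged change simply improved the printed message.

	// Sketch: say which materialized view is being skipped and why, then move on.
	col, err := validMVColumn()
	if err != nil {
		fmt.Printf("skipping materialized view %d: no suitable column found: %v\n", i, err)
		continue
	}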

@@ -39,7 +39,10 @@ const (
	TYPE_VARCHAR = SimpleType("varchar")
	TYPE_VARINT  = SimpleType("varint")

	MaxUDTParts     = 10
	MaxStringLength = 1000

Contributor:

This limits blob maximum size too, no? We should allow much larger blobs (for example, 1 MB or even 10 MB).

Contributor (Author):

Ah yes, I can add a separate one for blobs.

Contributor (Author):

It's 1e6 bytes now; perhaps enough to start with?
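A sketch of how a separate blob limit could sit alongside the existing constants; the name MaxBlobLength is an assumption here, and only the 1e6-byte figure comes from the comment above.

	const (
		MaxUDTParts     = 10
		MaxStringLength = 1000
		// MaxBlobLength is a hypothetical name for the separate blob limit
		// discussed above; 1000000 bytes (1e6, roughly 1 MB) as a starting point.
		MaxBlobLength = 1000000
	)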

@penberg (Contributor) commented Jun 17, 2019:

Looks good, please merge.

@dahankzter merged commit fad33c4 into master on Jun 17, 2019
@penberg deleted the increase_data_sizes branch on December 11, 2019 07:06
Successfully merging this pull request may close these issues:

Low rate of db population of 1TB dataset
Column count configuration