Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports multi-region #625

Closed
1 of 6 tasks
v0y4g3r opened this issue Nov 24, 2022 · 9 comments · Fixed by #653
Closed
1 of 6 tasks

Supports multi-region #625

v0y4g3r opened this issue Nov 24, 2022 · 9 comments · Fixed by #653
Assignees
Labels
C-enhancement Category Enhancements tracking-issue A tracking issue for a feature.
Milestone

Comments

@v0y4g3r
Copy link
Contributor

v0y4g3r commented Nov 24, 2022

What type of enhancement is this?

Tech debt reduction

What does the enhancement do?

Currently in GreptimeDB, only one region can be created for a table on one datanode, which will cause data corruption when the number of table partitons exceeds the number of datanodes.

In fact, table engine supports creating multiple regions for one table, but we have to go through all those components on datanodes whether they can accommodate multiple regions.

Implementation challenges

To support multiple regions on datanode, both standalone and distributed mode should be considered.

  • In distributed mode, requests are parsed on frontend, and frontend forward queries to datanode. Thus, datanode's incoming requests must carry region info, for example, the region to insert, regions to query from etc. It should be relatively easy to add this parameters.
  • While in standalone mode, things get a bit tricky. If datanode has only one region, then query engine is not aware of partition rules. But if datanode has multiple regions for one table, then query engine should associate requests with corresponding regions according to partition rules. There are two problems here:
  • In standalone mode, catalog manager does not support storing partition rules, we must add an options field to SystemCatalogTable
    pub struct SystemCatalogTable {
  • Find regions for queries according to partition rules. Currently it's implemented in DistTable. But in standalone mode, requests are directly forwarded to datanode. Thus we need to extract the partition logic to some separate struct to reused it in both distributed mode and standalone mode.
    async fn insert(&self, request: InsertRequest) -> table::Result<usize> {
    let partition_rule = self.find_partition_rule().await.map_err(TableError::new)?;
    let spliter = WriteSpliter::with_partition_rule(partition_rule);
    let inserts = spliter.split(request).map_err(TableError::new)?;
    let result = match self.dist_insert(inserts).await.map_err(TableError::new)? {
    client::ObjectResult::Select(_) => unreachable!(),
    client::ObjectResult::Mutate(result) => result,
    };
    Ok(result.success as usize)
    }

Future work

  • Scan on schema change: when schema change happens while scanning multiple regions on one datanode, currently it just raises an error, we may need to try to adapt to schema change.
    ensure!(
    first_schema.version() == schema.version(),
    RegionSchemaMismatchSnafu {
    table: common_catalog::format_full_table_name(
    &table_info.catalog_name,
    &table_info.schema_name,
    &table_info.name
    )
    }
  • We need to find a way to pass region filters to datanode through substrait plans so that we don't have to iterate all regions on some datanode Pass region filters through substrait plans #926
  • DROP TABLE SQL is not currently implemented in parser, we need to support drop all regions on datanode. Support drop table clause #497
  • We need to adapt DDL operations to multi regions to procedure framework to allow fault recovery as soon as Tracking issue for the procedure framework #286 finishes.
@v0y4g3r v0y4g3r added the C-enhancement Category Enhancements label Nov 24, 2022
@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Nov 28, 2022

MitoTable should be modified to accommodate multiple regions, and it's create/open/insert/select methods must explicitly provide regions.

Currently all partition rules are parsed in frontend and datanode does not validate if data inserted satisfies partition rule, but we may add this validation once repartition is supported.

@v0y4g3r v0y4g3r self-assigned this Nov 28, 2022
@v0y4g3r

This comment was marked as resolved.

@evenyag

This comment was marked as resolved.

@v0y4g3r
Copy link
Contributor Author

v0y4g3r commented Nov 29, 2022

Maybe we should add an regions: Option<Vec<RegionNumber>> argument.

Vec<RegionNumber> should be adequate as None and an empty vector should represent the same thing.

Also, if there're regions with different schema in a table, then what will be the schema for record batches yielded? Can we compare the versions of schemas of all regions and find the schema with the min version?

@evenyag
Copy link
Contributor

evenyag commented Nov 29, 2022

Can we compare the versions of schemas of all regions and find the schema with the min version

This might be the most applicable way before #275 is done. BTW, this reminds me that we need to move the version field somewhere else, as mentioned in #349

@evenyag

This comment was marked as resolved.

@v0y4g3r

This comment was marked as resolved.

@xtang xtang mentioned this issue Nov 30, 2022
24 tasks
@xtang xtang added this to the Release v0.1 milestone Nov 30, 2022
@v0y4g3r v0y4g3r reopened this Feb 7, 2023
@v0y4g3r v0y4g3r changed the title Table engine supports multi-region Supports multi-region Feb 7, 2023
@v0y4g3r v0y4g3r modified the milestones: v0.3, v0.1 Feb 15, 2023
@v0y4g3r v0y4g3r added the tracking-issue A tracking issue for a feature. label Feb 15, 2023
@MichaelScofield
Copy link
Collaborator

Looks like region failover(#1126) also requires multi region support.

@fengjiachun
Copy link
Collaborator

Close this issue as version 0.4 already supports multi-region.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements tracking-issue A tracking issue for a feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants