Provide a DM Upstream Simulator for Testing Incremental Replication from Upstream #4835

Open
dsdashun opened this issue Mar 10, 2022 · 0 comments
Labels
area/dm Issues or PRs related to DM. type/feature Issues about a new feature

@dsdashun
Contributor

Is your feature request related to a problem?

Currently, when testing DM's incremental replication, we need the upstream to continuously generate binlog streams, so we need a tool that continuously simulates upstream workloads. The existing solutions are not very convenient (see the alternatives discussed below).

Describe the feature you'd like

DM could provide a simulator with the following features, which address the problems described in the alternatives section below:

  • It continuously applies meaningful modifications (statements that actually change data) to upstream tables, based on the provided table schemas.
  • Specific workloads can be defined easily.
  • It can apply a batch of DDL changes to several sharded tables with a single command.
  • After a table schema changes, the simulation automatically uses the latest table structure.
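A minimal Python sketch of the core idea, assuming an integer primary key and integer-typed columns. The simulator remembers which primary keys currently exist, so every UPDATE and DELETE targets a real row and therefore produces a binlog event, and it builds each INSERT from the current in-memory schema, so a column added by DDL appears in later DMLs. `TableSchema`, `TableSimulator`, and their methods are hypothetical names; a real simulator would presumably execute these statements on the upstream and refresh the schema from the database rather than from an in-memory list.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TableSchema:
    # Hypothetical schema model: an integer primary key plus integer-valued columns.
    name: str
    pk: str
    columns: List[str]


@dataclass
class TableSimulator:
    # Hypothetical simulator. It tracks which primary keys exist so UPDATE/DELETE
    # always hit a row, unlike purely random WHERE clauses that often match nothing.
    schema: TableSchema
    rng: random.Random
    next_id: int = 1
    live_ids: Set[int] = field(default_factory=set)

    def insert(self) -> str:
        row_id = self.next_id
        self.next_id += 1  # monotonically increasing, so deleted keys are never reused
        self.live_ids.add(row_id)
        cols = [self.schema.pk] + self.schema.columns  # always the *current* columns
        vals = [str(row_id)] + [str(self.rng.randint(0, 999)) for _ in self.schema.columns]
        return f"INSERT INTO {self.schema.name} ({', '.join(cols)}) VALUES ({', '.join(vals)})"

    def update(self) -> Optional[str]:
        if not self.live_ids or not self.schema.columns:
            return None  # no existing row (or no non-key column) to modify yet
        row_id = self.rng.choice(sorted(self.live_ids))
        col = self.rng.choice(self.schema.columns)
        return (f"UPDATE {self.schema.name} SET {col} = {self.rng.randint(0, 999)} "
                f"WHERE {self.schema.pk} = {row_id}")

    def delete(self) -> Optional[str]:
        if not self.live_ids:
            return None
        row_id = self.rng.choice(sorted(self.live_ids))
        self.live_ids.discard(row_id)
        return f"DELETE FROM {self.schema.name} WHERE {self.schema.pk} = {row_id}"

    def add_column(self, col: str) -> str:
        # Update the in-memory schema together with the DDL,
        # so subsequent DMLs automatically use the new column.
        self.schema.columns.append(col)
        return f"ALTER TABLE {self.schema.name} ADD COLUMN {col} INT"


if __name__ == "__main__":
    sim = TableSimulator(TableSchema("t1", "id", ["c1"]), random.Random(42))
    print(sim.insert())
    print(sim.update())
    print(sim.add_column("c2"))
    print(sim.insert())  # this INSERT now includes c2
    print(sim.delete())
```

Tracking live keys on the client side is one possible design choice; it keeps every generated statement effective without querying the upstream before each DML.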

Describe alternatives you've considered

Usually, we simulate upstream workloads either with benchmark tools such as sysbench, or with random SQL generators such as sql-smith. However, these approaches are inconvenient in some cases:

  • For benchmark tools, the table schemas are predefined. If we need binlog streams from upstream clusters with specific table schemas, we have to modify the tool's code.
  • For random SQL generators, we can define the table schemas ourselves. However, when the generated SQL is executed on the upstream, usually no data is actually modified, because the filter clauses are purely random and rarely match existing rows. So they cannot provide a stable stream of binlogs from the upstream.
  • Some workloads are hard to simulate with existing tools. For example, a transaction that inserts a row into table A, then updates several records in table B, and finally deletes the inserted row can hardly be produced by existing tools; we have to write our own code for this kind of simulation.
  • If the upstream clusters have several sharded tables, there is no way to apply a DDL to the whole set of sharded tables with a single command.
  • There is no way to simulate DMLs that follow online table schema changes. For example, we first let the upstream generate binlog streams on the current table structure, then execute a DDL on the upstream to change that structure. After that, we expect the simulator to generate DMLs against the latest table structure on the fly, without modifying any code.
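To make the custom-workload and sharded-DDL points concrete, here is a minimal Python sketch of what a simulator could let users express. The table names `a` and `b`, the column `v`, the shard naming scheme, and all three helper functions are hypothetical; the functions only build MySQL-style SQL text and do not connect to any upstream.

```python
from typing import List, Sequence


def insert_update_delete_txn(a_id: int, b_ids: Sequence[int]) -> List[str]:
    # Hypothetical workload: insert a row into A, update several rows in B,
    # then delete the row just inserted into A, all in one transaction.
    stmts = ["BEGIN", f"INSERT INTO a (id, v) VALUES ({a_id}, 0)"]
    stmts += [f"UPDATE b SET v = v + 1 WHERE id = {b_id}" for b_id in b_ids]
    stmts += [f"DELETE FROM a WHERE id = {a_id}", "COMMIT"]
    return stmts


def shard_tables(db_prefix: str, n_dbs: int, table_prefix: str, n_tables: int) -> List[str]:
    # Hypothetical naming scheme: every table_prefix{j} in every db_prefix{i}.
    return [f"{db_prefix}{d}.{table_prefix}{t}" for d in range(n_dbs) for t in range(n_tables)]


def fan_out_ddl(ddl_template: str, shards: Sequence[str]) -> List[str]:
    # Expand one DDL template containing a "{table}" placeholder into one
    # statement per sharded table, i.e. "one command" for the whole shard set.
    return [ddl_template.format(table=shard) for shard in shards]


if __name__ == "__main__":
    for stmt in insert_update_delete_txn(7, [1, 2, 3]):
        print(stmt)
    for ddl in fan_out_ddl("ALTER TABLE {table} ADD COLUMN c2 INT",
                           shard_tables("db_", 2, "t_", 2)):
        print(ddl)
```

In an actual tool, the fanned-out DDLs would be executed on the corresponding upstream sources, after which the simulator would pick up the new column for subsequent DMLs.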

Teachability, Documentation, Adoption, Migration Strategy

No response

@dsdashun dsdashun added area/dm Issues or PRs related to DM. type/feature Issues about a new feature labels Mar 10, 2022