---
title: Data Domiciling with ALTER DATABASE ... PLACEMENT RESTRICTED
summary: Learn how to implement data domiciling using CockroachDB's multi-region SQL capabilities and the ALTER DATABASE ... PLACEMENT RESTRICTED statement.
toc: true
---
{% include {{page.version.version}}/sql/data-domiciling-intro.md %}
This page has instructions for data domiciling in multi-region clusters using the `ALTER DATABASE ... PLACEMENT RESTRICTED` statement. At a high level, this process involves:

- Controlling the placement of specific row or table data using regional tables: specifically, the `REGIONAL BY ROW` and `REGIONAL BY TABLE` table localities.
- Further restricting where the data in those regional tables is stored using the `ALTER DATABASE ... PLACEMENT RESTRICTED` statement, which constrains the replicas for a partition or table to be stored in only the home regions associated with those rows or tables.

For more information, see the sections below.
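For illustration, here is a minimal sketch of how a table's locality is set with `ALTER TABLE ... SET LOCALITY`; the MovR tables used in the example below are configured with similar statements by the included steps:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch only: give each row of users a home region (REGIONAL BY ROW),
-- and make promo_codes readable quickly from every region (GLOBAL).
ALTER TABLE users SET LOCALITY REGIONAL BY ROW;
ALTER TABLE promo_codes SET LOCALITY GLOBAL;
~~~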
This page assumes you are already familiar with:
- CockroachDB's multi-region SQL abstractions.
- The fact that CockroachDB stores your data in a distributed key-value store, which is split into chunks called ranges.
In the following example, you will go through the process of configuring the MovR data set using multi-region SQL statements. Then, as part of implementing a data domiciling strategy, you will apply restricted replica settings using the `ALTER DATABASE ... PLACEMENT RESTRICTED` statement. Finally, you will verify that the resulting replica placements are as expected using a combination of replication reports, the `SHOW RANGE FOR ROW` statement, and queries against the `crdb_internal.ranges_no_leases` table.
For the purposes of this example, the data domiciling requirement is to configure a multi-region deployment of the MovR database such that data for EU-based users, vehicles, etc. is stored on CockroachDB nodes running in EU localities.

## Step 1. Start a simulated multi-region cluster
{% include {{page.version.version}}/sql/start-a-multi-region-demo-cluster.md %}
You now have a cluster running across 9 nodes, with 3 nodes each in the following regions:

- `us-east1`
- `us-west1`
- `europe-west1`

You can verify this using the `SHOW REGIONS` statement:
{% include_cached copy-clipboard.html %}
~~~ sql
SHOW REGIONS;
~~~
~~~
     region    |  zones  | database_names | primary_region_of
---------------+---------+----------------+--------------------
  europe-west1 | {b,c,d} | {}             | {}
  us-east1     | {b,c,d} | {}             | {}
  us-west1     | {a,b,c} | {}             | {}
(3 rows)
~~~
## Step 2. Apply multi-region SQL abstractions

Execute the following statements to set the database regions. This information is necessary so that CockroachDB can later move data around to optimize access to particular data from particular regions.
{% include copy-clipboard.html %}
~~~ sql
ALTER DATABASE movr PRIMARY REGION "europe-west1";
ALTER DATABASE movr ADD REGION "us-east1";
ALTER DATABASE movr ADD REGION "us-west1";
~~~
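To confirm that the regions were added as expected, one option (shown as a sketch; your output will vary) is to run `SHOW REGIONS FROM DATABASE`:

{% include_cached copy-clipboard.html %}
~~~ sql
SHOW REGIONS FROM DATABASE movr;
~~~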
{% include {{page.version.version}}/sql/multiregion-movr-global.md %}
{% include {{page.version.version}}/sql/multiregion-movr-regional-by-row.md %}
## Step 3. View noncompliant replicas

Next, run a replication report to see which ranges are still not in compliance with your desired domiciling: that data on EU-based entities (users, etc.) does not leave EU-based nodes.
With the default settings, you should expect some replicas in the cluster to be violating this constraint. This is because non-voting replicas are enabled by default in multi-region clusters to enable stale reads of data in regional tables from outside those tables' home regions. For many use cases, this is preferred, but it keeps you from meeting the domiciling requirements for this example.
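For reference, such stale reads are typically issued with `AS OF SYSTEM TIME follower_read_timestamp()`. The following is a sketch only; whether a read is actually served by a nearby replica depends on your CockroachDB version and cluster configuration:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: a stale read that can be served by a nearby (possibly non-voting) replica.
SELECT * FROM users AS OF SYSTEM TIME follower_read_timestamp() LIMIT 1;
~~~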
{% include_cached copy-clipboard.html %}
~~~ sql
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
~~~
~~~
  zone_id | subzone_id |    type    |         config         | report_id |        violation_start        | violating_ranges
----------+------------+------------+------------------------+-----------+-------------------------------+-------------------
       52 |          0 | constraint | +region=europe-west1:1 |         1 | 2022-01-19 16:33:49.485535+00 |               10
       52 |          0 | constraint | +region=us-east1:1     |         1 | 2022-01-19 16:34:49.930886+00 |               58
       52 |          0 | constraint | +region=us-west1:1     |         1 | 2022-01-19 16:34:49.930886+00 |               61
~~~
Based on this output, you can see that plenty of replicas are out of compliance (see the `violating_ranges` column) for the reason described above: the presence of non-voting replicas in other regions to enable fast stale reads from those regions.
{{site.data.alerts.callout_info}} The Replication Reports do not consider non-voting replicas located outside of a table's home region to be in compliance with the constraints on that table. {{site.data.alerts.end}}
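Note that the replication reports are regenerated periodically rather than on every query. If the numbers look stale, you can shorten the refresh interval; this is a sketch, and the default interval and required privileges may differ in your deployment:

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING kv.replication_reports.interval = '2m';
~~~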
Next, run the query suggested in the Replication Reports documentation, which shows which database and table names contain the violating ranges.
{% include_cached copy-clipboard.html %}
~~~ sql
WITH
    partition_violations
        AS (
            SELECT
                *
            FROM
                system.replication_constraint_stats
            WHERE
                violating_ranges > 0
        ),
    report
        AS (
            SELECT
                crdb_internal.zones.zone_id,
                crdb_internal.zones.subzone_id,
                target,
                database_name,
                table_name,
                index_name,
                partition_violations.type,
                partition_violations.config,
                partition_violations.violation_start,
                partition_violations.violating_ranges
            FROM
                crdb_internal.zones, partition_violations
            WHERE
                crdb_internal.zones.zone_id
                = partition_violations.zone_id
        )
SELECT * FROM report;
~~~
~~~
  zone_id | subzone_id |    target     | database_name | table_name | index_name |    type    |         config         |        violation_start        | violating_ranges
----------+------------+---------------+---------------+------------+------------+------------+------------------------+-------------------------------+-------------------
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=europe-west1:1 | 2022-01-19 16:33:49.485535+00 |               16
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=us-west1:1     | 2022-01-19 16:34:49.930886+00 |               78
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=us-east1:1     | 2022-01-19 16:34:49.930886+00 |               78
~~~
This output shows that the `movr` database has ranges out of compliance, which you saw previously. Unfortunately, this output does not contain the table or index names due to a current limitation of the replication reports: non-voting replicas are not associated with any tables or indexes by the reports.
Now that you know some replicas are out of compliance, you need to see exactly where the ranges for a given row of data are stored. To accomplish this, you must use the `SHOW RANGE FOR ROW` statement.

The values needed to find a range using `SHOW RANGE FOR ROW` are the columns in the row's primary index. To find out which columns are in the primary index for the `users` table, use the `SHOW INDEXES` statement:
{% include_cached copy-clipboard.html %}
~~~ sql
SHOW INDEXES FROM users;
~~~
~~~
  table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
-------------+------------+------------+--------------+-------------+-----------+---------+-----------
  users      | primary    |   false    |            1 | region      | ASC       |  false  |   true
  users      | primary    |   false    |            2 | city        | ASC       |  false  |  false
  users      | primary    |   false    |            3 | id          | ASC       |  false  |  false
  users      | primary    |   false    |            4 | name        | N/A       |  true   |  false
  users      | primary    |   false    |            5 | address     | N/A       |  true   |  false
  users      | primary    |   false    |            6 | credit_card | N/A       |  true   |  false
(6 rows)
~~~
The columns in the primary index are those values of `column_name` where the value of `index_name` is `primary` and the value of `storing` is `false`. The columns that meet these criteria are:

- `region`
- `city`
- `id`
To get the values of those columns for one row of EU-based data, use the following query:
{% include_cached copy-clipboard.html %}
~~~ sql
SELECT region, city, id FROM users WHERE region = 'europe-west1' LIMIT 1;
~~~
~~~
     region    |   city    |                  id
---------------+-----------+---------------------------------------
  europe-west1 | amsterdam | ae147ae1-47ae-4800-8000-000000000022
(1 row)
~~~
To see exactly where the ranges for this row of data are stored in the cluster, pass the column values above to the `SHOW RANGE FOR ROW` statement:
{% include_cached copy-clipboard.html %}
~~~ sql
SHOW RANGE FROM TABLE users FOR ROW ('europe-west1', 'amsterdam', 'ae147ae1-47ae-4800-8000-000000000022');
~~~
~~~
  start_key |      end_key      | range_id | lease_holder |  lease_holder_locality   |  replicas   |                                                        replica_localities
------------+-------------------+----------+--------------+--------------------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------
  /"\x80"   | /"\x80"/PrefixEnd |       75 |            9 | region=europe-west1,az=d | {1,6,7,8,9} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"}
(1 row)
~~~
The output above shows that:

- The replicas associated with this row of the `users` table have their home region in the `europe-west1` region.
- In particular, the leaseholder is in the `europe-west1` region.
- However, some of the non-leaseholder replicas are not in the `europe-west1` region, according to the value of the `replica_localities` column. That column shows that there are also non-leaseholder replicas located in the `us-east1` and `us-west1` regions. These are the non-voting replicas referenced previously. This shows that not all of the replicas associated with this row are meeting the requirement to be stored on nodes within EU localities.
To confirm the above replica placement information by other means, use the statement below to query the `crdb_internal.ranges_no_leases` table and see the value of its `replica_localities` column. The value of that column should match the output from the `SHOW RANGE FOR ROW` query above.
{% include_cached copy-clipboard.html %}
~~~ sql
SELECT * FROM crdb_internal.ranges_no_leases WHERE range_id = 162 AND database_name = 'movr' AND table_name = 'users' ORDER BY start_key ASC LIMIT 1;
~~~
~~~
  range_id | start_key                | start_pretty       | end_key                  | end_pretty                   | table_id | database_name | schema_name | table_name | index_name | replicas    | replica_localities                                                                                                                | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until
-----------+--------------------------+--------------------+--------------------------+------------------------------+----------+---------------+-------------+------------+------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------+------------------+-----------------------
       162 | \xbd\x8a\x12\x80\x00\x01 | /Table/53/2/"\x80" | \xbd\x8a\x12\x80\x00\x02 | /Table/53/2/"\x80"/PrefixEnd |       53 | movr          |             | users      |            | {3,5,7,8,9} | {"region=us-east1,az=d","region=us-west1,az=b","region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"}  | {8,7,9}         | {3,5}               | {}               | NULL
(1 row)
~~~
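The `non_voting_replicas` column in this output identifies exactly the replicas that the placement restriction in the next step will remove. As a hypothetical variation on the query above, you could list every `users` range that still carries non-voting replicas:

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: find users ranges that still carry non-voting replicas.
SELECT range_id, replicas, voting_replicas, non_voting_replicas
  FROM crdb_internal.ranges_no_leases
 WHERE database_name = 'movr'
   AND table_name = 'users'
   AND array_length(non_voting_replicas, 1) > 0;
~~~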
## Step 4. Apply stricter replica placement settings

To ensure that data on EU-based users, vehicles, etc. from `REGIONAL BY ROW` tables is stored only on EU-based nodes in the cluster, you must disable the use of non-voting replicas on all of these tables. You can do this using the `ALTER DATABASE ... PLACEMENT RESTRICTED` statement.
To use this statement, you must set either the `enable_multiregion_placement_policy` session setting or the `sql.defaults.multiregion_placement_policy.enabled` cluster setting:
{% include_cached copy-clipboard.html %}
~~~ sql
SET enable_multiregion_placement_policy = on;
~~~

~~~
SET
~~~
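Alternatively, to enable the feature for all sessions, you can set the cluster setting named above (a sketch; this requires admin privileges):

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING sql.defaults.multiregion_placement_policy.enabled = true;
~~~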
Next, use the `ALTER DATABASE ... PLACEMENT RESTRICTED` statement to disable non-voting replicas for regional tables:
{% include_cached copy-clipboard.html %}
~~~ sql
ALTER DATABASE movr PLACEMENT RESTRICTED;
~~~

~~~
ALTER DATABASE PLACEMENT
~~~
The restricted replica placement settings should start to apply immediately.
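If you later need to restore the default behavior, which re-enables non-voting replicas for regional tables, the inverse statement is sketched below:

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER DATABASE movr PLACEMENT DEFAULT;
~~~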
{{site.data.alerts.callout_info}}
`ALTER DATABASE ... PLACEMENT RESTRICTED` does not affect the replica placement for global tables, which are designed to provide fast, up-to-date reads from all database regions.
{{site.data.alerts.end}}
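To check which of the database's tables are global and which are regional, one option is `SHOW TABLES` (shown as a sketch; for multi-region databases its output includes a `locality` column):

{% include_cached copy-clipboard.html %}
~~~ sql
SHOW TABLES FROM movr;
~~~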
## Step 5. Verify the updated replica placement

Now that you have restricted the placement of non-voting replicas for all regional tables, you can run another replication report to see the effects:
{% include_cached copy-clipboard.html %}
~~~ sql
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
~~~
~~~
  zone_id | subzone_id |    type    |       config       | report_id |        violation_start        | violating_ranges
----------+------------+------------+--------------------+-----------+-------------------------------+-------------------
       57 |          0 | constraint | +region=us-east1:1 |         1 | 2022-01-19 19:09:00.235247+00 |                1
       57 |          0 | constraint | +region=us-west1:1 |         1 | 2022-01-19 19:09:00.235247+00 |                1
~~~
The output above shows that there are now far fewer replicas that do not meet the data domiciling goal. As described above, `ALTER DATABASE ... PLACEMENT RESTRICTED` does not affect the replica placement for `GLOBAL` tables, so it's likely that these few replicas are part of a global table.
To verify that the constraint-violating replicas are indeed part of a `GLOBAL` table, run the replication report query from Step 3 again, as shown below. This will display the database and table names of these replicas.
{% include_cached copy-clipboard.html %}
~~~ sql
WITH
    partition_violations
        AS (
            SELECT
                *
            FROM
                system.replication_constraint_stats
            WHERE
                violating_ranges > 0
        ),
    report
        AS (
            SELECT
                crdb_internal.zones.zone_id,
                crdb_internal.zones.subzone_id,
                target,
                database_name,
                table_name,
                index_name,
                partition_violations.type,
                partition_violations.config,
                partition_violations.violation_start,
                partition_violations.violating_ranges
            FROM
                crdb_internal.zones, partition_violations
            WHERE
                crdb_internal.zones.zone_id
                = partition_violations.zone_id
        )
SELECT * FROM report;
~~~
~~~
  zone_id | subzone_id |            target             | database_name | table_name  | index_name |    type    |       config       |        violation_start        | violating_ranges
----------+------------+-------------------------------+---------------+-------------+------------+------------+--------------------+-------------------------------+-------------------
       57 |          0 | TABLE movr.public.promo_codes | movr          | promo_codes | NULL       | constraint | +region=us-east1:1 | 2022-01-19 19:09:00.235247+00 |                1
       57 |          0 | TABLE movr.public.promo_codes | movr          | promo_codes | NULL       | constraint | +region=us-west1:1 | 2022-01-19 19:09:00.235247+00 |                1
~~~
As expected, these replicas are part of the `promo_codes` table, which was configured to use the `GLOBAL` table locality in Step 2.
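If you want to double-check that table's locality, you can inspect its definition; the output of the following statement should include a `LOCALITY GLOBAL` clause:

{% include_cached copy-clipboard.html %}
~~~ sql
SHOW CREATE TABLE promo_codes;
~~~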
To verify the replica placement using `SHOW RANGE FOR ROW` for the same row as in Step 3, run the same query you ran earlier:
{% include_cached copy-clipboard.html %}
~~~ sql
SHOW RANGE FROM TABLE users FOR ROW ('europe-west1', 'amsterdam', 'ae147ae1-47ae-4800-8000-000000000022');
~~~
~~~
  start_key |      end_key      | range_id | lease_holder |  lease_holder_locality   | replicas |                                 replica_localities
------------+-------------------+----------+--------------+--------------------------+----------+-------------------------------------------------------------------------------------
  /"\x80"   | /"\x80"/PrefixEnd |      137 |            8 | region=europe-west1,az=c | {7,8,9}  | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"}
(1 row)
~~~
The output shows that the replicas underlying this data are now in compliance with the data domiciling requirements listed above: that data for EU-based entities (users, etc.) does not leave EU-based nodes.
Specifically, the output above shows that:

- The replicas underlying this row of the `users` table are based in the `europe-west1` region.
- The leaseholder is in the `europe-west1` region.
- All of the `replica_localities` are in the `europe-west1` region as well (unlike in Step 3), proving that the replicas underlying this row of data are meeting the domiciling requirement.
You can also return just the replica localities for this row by wrapping the statement in a selection query:

{% include_cached copy-clipboard.html %}
~~~ sql
SELECT replica_localities FROM [SHOW RANGE FROM TABLE users FOR ROW ('europe-west1', 'amsterdam', 'ae147ae1-47ae-4800-8000-000000000022')];
~~~
~~~
                                  replica_localities
--------------------------------------------------------------------------------------
  {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"}
(1 row)
~~~
To confirm the above replica placement information by other means, use the statement below to query the `crdb_internal.ranges_no_leases` table and see the value of its `replica_localities` column. The value of that column should match the output from the `SHOW RANGE FOR ROW` query above.
{% include_cached copy-clipboard.html %}
~~~ sql
SELECT replica_localities FROM crdb_internal.ranges_no_leases WHERE range_id = 162 AND database_name = 'movr' AND table_name = 'users' ORDER BY start_key ASC LIMIT 1;
~~~
~~~
                                  replica_localities
--------------------------------------------------------------------------------------
  {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"}
(1 row)
~~~
Now that you have verified that the system is configured to meet the domiciling requirement, it's a good idea to run these verification queries on a regular basis (via automation of some kind) to ensure that the requirement continues to be met.
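For example, an automated check could alert whenever the following query returns a nonzero count (sketch only; how you schedule it and alert on it depends on your tooling):

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: a periodic compliance check; a nonzero count means some
-- ranges are violating their configured placement constraints.
SELECT count(*) AS violating_range_count
  FROM system.replication_constraint_stats
 WHERE violating_ranges > 0;
~~~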
{{site.data.alerts.callout_info}} The steps above are necessary but not sufficient to accomplish a data domiciling solution using CockroachDB. Be sure to review the limitations of CockroachDB for data domiciling and design your total solution with those limitations in mind. {{site.data.alerts.end}}
{% include {{page.version.version}}/sql/data-domiciling-limitations.md %}
## See also

- Data Domiciling with CockroachDB
- Data Domiciling with Separate Databases
- Choosing a multi-region configuration
- Install CockroachDB
- Migrate to Multi-region SQL
- Multi-region Overview
- Multi-region SQL Performance
- Reads and Writes in CockroachDB
- When to use `REGIONAL` vs. `GLOBAL` tables
- When to use `ZONE` vs. `REGION` survival goals
- `ADD REGION`