Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for database sharding #181

Open
1 task done
MatzeKitt opened this issue Oct 16, 2023 · 5 comments
Open
1 task done

Add support for database sharding #181

MatzeKitt opened this issue Oct 16, 2023 · 5 comments

Comments

@MatzeKitt
Copy link

MatzeKitt commented Oct 16, 2023

Feature Request

Describe your use case and the problem you are facing

We’re using database sharding along with HyperDB and face some problems when it comes to use WP-CLI. Especially, but not exclusively, it’s hard until impossible to use the wp db commands. E.g. running wp db tables results in the error mentioned in wp-cli/wp-cli#4670, and also the workarounds there don’t work with sharding, since WP-CLI doesn’t know the shards here.

We’re running a pretty normal HyperDB configuration with the following db-config.php:

<?php
// Exit if accessed directly
defined( 'ABSPATH' ) || exit;

$wpdb->charset = 'utf8mb4';
$wpdb->collate = 'utf8mb4_unicode_ci';
$wpdb->save_queries = false;
$wpdb->persistent = false;
$wpdb->max_connections = 10;
$wpdb->check_tcp_responsiveness = true;

// Add global database
$wpdb->add_database( [
	'host' => DB_HOST,
	'user' => DB_USER,
	'password' => DB_PASSWORD,
	'name' => DB_NAME,
	'write' => 1,
	'read' => 2,
	'dataset' => 'global',
	'timeout' => .5,
] );

// Add 16 shards for blog databases
foreach ( array_merge( range( 0, 9 ), range( 'a', 'f' ) ) as $v ) {
	$wpdb->add_database( [
		'host' => DB_HOST,
		'user' => DB_USER,
		'password' => DB_PASSWORD,
		'name' => DB_NAME.'_'.$v,
		'write' => 1,
		'read' => 2,
		'dataset' => $v,
		'timeout' => .5,
	] );
}

$wpdb->add_callback( 'get_blog_shard' );

function get_blog_shard( $query, $wpdb ) {
	if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
		return substr( md5( $wpdb->blogid ), 0, 1 );
	}
}

function wpdb_connection_error( $host, $port, $op, $tbl, $ds, $dbhname, $wpdb ) {
	dead_db();
}

$wpdb->add_callback( 'wpdb_connection_error', 'db_connection_error' );

Describe the solution you'd like

I would like to get support for the sharding callback added to $wpdb in order to determine the correct database tables just by specifying the --url parameter in a WP-CLI request.

@danielbachhuber
Copy link
Member

Sounds like it would be a great improvement!

Feel free to submit a pull request, if you'd like. Here is some guidance on our pull request best practices.

@MatzeKitt
Copy link
Author

I’ll try 😅

@MatzeKitt
Copy link
Author

MatzeKitt commented Jan 19, 2024

Before creating a pull request, I want to make sure whether my current approach is valid at all in my current working version. The current version is limited due to the fact that I only tested one command: vendor/bin/wp db search 'string' --network

First of all, in Utils\wp_get_table_names() I need to get the tables from information_schema.tables, since HyperDB doesn’t store any information about the tables. $wpdb->dbname is empty in this case.

		if ( empty( $wpdb->dbname ) && $wpdb instanceof \HyperDB ) {
			$mode = str_ends_with( $wpdb->dbhname, '__r' ) ? 'read' : 'write';
			$tables = [];

			foreach ( $wpdb->hyper_servers as $hyper_server ) {
				if ( empty( $hyper_server[ $mode ] ) ) {
					continue;
				}

				foreach ( $hyper_server[ $mode ] as $server ) {
					foreach ( $server as $server_data ) {
						$databases[] = $server_data['name'];
					}
				}
			}

			$table_schemas = $wpdb->get_results( sprintf( "SELECT table_schema as db, table_name as table_name FROM information_schema.tables WHERE table_schema IN ('%s')", implode( "', '", $wpdb->_escape( $databases ) ) ) );

			foreach ( $table_schemas as $table_schema ) {
				$tables[] = $table_schema->db . '.' . $table_schema->table_name;
			}
		} else {
			// Note: BC change 1.5.0, tables are sorted (via TABLES view).
			// phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared -- uses esc_sql_ident() and $wpdb->_escape().
			$tables = $wpdb->get_col( sprintf( "SHOW TABLES WHERE %s IN ('%s')", esc_sql_ident( 'Tables_in_' . $wpdb->dbname ), implode( "', '", $wpdb->_escape( $wp_tables ) ) ) );
		}

The else is the current implementation. For our HyperDB version with database shardings, the result is a list of table names with database prefix: db.table, e.g. wordpress.wp_options if the database is called wordpress.

Now, since the table name is escaped using DB_Command::esc_sql_ident() in DB_Command::search() as well as DB_Command::get_columns(), I need to adjust that to properly escape the database and the table name:

		if ( str_contains( $table, '.' ) ) {
			list( $db, $table ) = explode( '.', $table );
			$table_sql          = self::esc_sql_ident( $db ) . '.' . self::esc_sql_ident( $table );
		} else {
			$table_sql = self::esc_sql_ident( $table );
		}

This way it works, but it feels a little bit hacky to me. What do you think about it? We can also discuss it via Slack, my username there is KittMedia.

@danielbachhuber
Copy link
Member

danielbachhuber commented Feb 11, 2024

How many total tables do you have? And what's the particular use case you're trying to solve for?

A couple of thoughts:

  • We should probably create a holistic solution for HyperDB support, instead of adding support to each command on a one-off basis.
  • We'll probably want to make sure the solution gracefully accommodates the scenario where you have thousands of database tables.

@MatzeKitt
Copy link
Author

We have currently around 1.500 sites in a multisite in 16 different shards, so roughly 16.500 tables. The problem I try to solve is that any global db operation fails since WP-CLI is not aware of all the tables.

If you can suggest a holistic solution, I would love to hear it so that I can try to implement it. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants