From 52eef65d37ebf685731559556c4fac8baf06d7bc Mon Sep 17 00:00:00 2001
From: Duo Zhang
Date: Thu, 20 Jun 2024 22:09:32 +0800
Subject: [PATCH] HBASE-28548 Add documentation about the URI based connection registry (#5981)

Signed-off-by: Nick Dimiduk
---
 src/main/asciidoc/_chapters/architecture.adoc | 68 ++++++++++++++++++-
 .../asciidoc/_chapters/configuration.adoc     | 35 +++++++---
 2 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 4aead5e3e841..08f353972468 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -357,7 +357,7 @@ as bootstrap nodes, not only masters
 . Support refreshing bootstrap nodes, for spreading loads across the nodes in the cluster, and
   also remove the dead nodes in bootstrap nodes.
 
-To explicitly enable the Master-based registry, use
+To explicitly enable the rpc-based registry, use
 
 [source, xml]
 ----
@@ -417,6 +417,72 @@ configuration to fallback to the ZooKeeper based connection registry implementat
 ----
 
+[[client.connectionuri]]
+=== Connection URI
+Starting from 2.7.0, we support specifying the connection information for an HBase
+cluster through a URI, which we call a "connection URI". We've also added several methods in
+_ConnectionFactory_ to let you get a connection to the cluster specified by the URI. It looks
+like:
+
+[source, java]
+----
+  URI uri = new URI("hbase+rpc://server1:16020,server2:16020,server3:16020");
+  try (Connection conn = ConnectionFactory.createConnection(uri)) {
+    ...
+  }
+----
+
+==== Supported Schemes
+Currently two schemes are supported: _hbase+rpc_ for _RpcConnectionRegistry_ and _hbase+zk_
+for _ZKConnectionRegistry_. _MasterRegistry_ is deprecated, so we do not expose it through
+the connection URI.
+
+For _hbase+rpc_, it looks like
+[source, shell]
+----
+hbase+rpc://server1:16020,server2:16020,server3:16020
+----
+
+The authority part _server1:16020,server2:16020,server3:16020_ specifies the bootstrap nodes and
+their rpc ports, i.e., the configuration value for _hbase.client.bootstrap.servers_ in the past.
+
+For _hbase+zk_, it looks like
+[source, shell]
+----
+hbase+zk://zk1:2181,zk2:2181,zk3:2181/hbase
+----
+
+The authority part _zk1:2181,zk2:2181,zk3:2181_ is the zk quorum, i.e., the configuration value
+for _hbase.zookeeper.quorum_ in the past.
+The path part _/hbase_ is the znode parent, i.e., the configuration value for
+_zookeeper.znode.parent_ in the past.
+
+==== Specify Configuration through URI Queries
+To let users fully specify the connection information through a connection URI, we support
+specifying configuration values through URI queries. It looks like:
+
+[source, shell]
+----
+hbase+rpc://server1:16020?hbase.client.operation.timeout=10000
+----
+
+In this way you can set the operation timeout to 10 seconds. Note that the configuration values
+specified in the connection URI will override the ones in the configuration file.
+
+==== Implement Your Own Connection Registry
+We use _ServiceLoader_ to load different connection registry implementations; the entry point is
+_org.apache.hadoop.hbase.client.ConnectionRegistryURIFactory_. So if you implement your own
+_ConnectionRegistryURIFactory_ which has a different scheme, and register it in the services file,
+we can load it at runtime.
+
+Connection URI is still a very new feature which has not been used extensively in production, so
+we do not want to expose the ability to customize _ConnectionRegistryURIFactory_ yet, as the API
+may change frequently in the beginning.
+
+If you really want to implement your own connection registry, you can use the approach above at
+your own risk.
+
+
 [[client.filter]]
 == Client Request Filters
 
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index 8e25bf9ed787..47481ab5c559 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -746,8 +746,8 @@ be changed for particular daemons via the HBase UI.
 If you are running HBase in standalone mode, you don't need to configure anything for your client
 to work provided that they are all on the same machine.
 
-Starting release 3.0.0, the default connection registry has been switched to a master based
-implementation. Refer to <> for more details about what a connection
+Starting release 3.0.0, the default connection registry has been switched to an rpc based
+implementation. Refer to <> for more details about what a connection
 registry is and implications of this change. Depending on your HBase version, following is the
 expected minimal client configuration.
 
@@ -772,11 +772,11 @@ before they can do anything. This can be configured in the client configuration
 ----
 
-==== Starting 3.0.0 release
+==== Starting from 3.0.0 release
 
-The default implementation was switched to a master based connection registry. With this
-implementation, clients always contact the active or stand-by master RPC end points to fetch the
-connection registry information. This means that the clients should have access to the list of
+The default implementation was switched to an rpc based connection registry. With this
+implementation, by default clients contact the active or stand-by master RPC end points to fetch
+the connection registry information. This means that the clients should have access to the list of
 active and master end points before they can do anything. This can be configured in the client
 configuration xml as follows:
 
@@ -796,8 +796,22 @@ configuration xml as follows:
 The configuration value for _hbase.masters_ is a comma separated list of _host:port_ values.
 If no port value is specified, the default of _16000_ is assumed.
 
-Usually this configuration is kept out in the _hbase-site.xml_ and is picked up by the client from
-the `CLASSPATH`.
+Of course you are free to specify bootstrap nodes other than masters, like:
+[source,xml]
+----
+
+
+
+ hbase.client.bootstrap.servers
+ server1:16020,server2:16020,server3:16020
+
+----
+
+The configuration value for _hbase.client.bootstrap.servers_ is a comma separated list of
+_host:port_ values. Note that the port must be specified here.
+
+Usually these configurations are kept in the _hbase-site.xml_ and are picked up by the client
+from the `CLASSPATH`.
 
 If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on
 your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up
@@ -827,14 +841,15 @@ in the content of the first _hbase-site.xml_ found on the client's `CLASSPATH`,
 the _hbase.X.X.X.jar_). It is also possible to specify configuration directly without having to
 read from a _hbase-site.xml_.
 
-For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:
+For example, to set the ZooKeeper ensemble or bootstrap nodes for the cluster programmatically
+do as follows:
 
 [source,java]
 ----
 Configuration config = HBaseConfiguration.create();
 config.set("hbase.zookeeper.quorum", "localhost"); // Until 2.x.y versions
 // ---- or ----
-config.set("hbase.masters", "localhost:1234"); // Starting 3.0.0 version
+config.set("hbase.client.bootstrap.servers", "localhost:1234"); // Starting 3.0.0 version
 ----
 
 [[config_timeouts]]
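
Reading aid for the connection URI layout documented in the patch above: the sketch below uses only `java.net.URI` to split a connection URI into the parts the new section names (scheme, authority, path, query) and to show the stated precedence of URI query values over file-based configuration. The names `ConnectionUriSketch`, `Parts`, `parse`, and `effectiveConf` are hypothetical and are not HBase APIs; this is an illustration under those assumptions, not how HBase itself resolves a connection URI.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical helper, not part of HBase. */
public class ConnectionUriSketch {

  /** Hypothetical holder for the parts of a connection URI. */
  public static final class Parts {
    public final String scheme;     // e.g. "hbase+rpc" or "hbase+zk"
    public final String authority;  // bootstrap nodes or zk quorum
    public final String path;       // znode parent for hbase+zk, may be empty
    public final Map<String, String> queryConf; // configuration given as URI queries

    Parts(String scheme, String authority, String path, Map<String, String> queryConf) {
      this.scheme = scheme;
      this.authority = authority;
      this.path = path;
      this.queryConf = queryConf;
    }
  }

  /** Splits the URI; query pairs are kept raw (no percent-decoding) for simplicity. */
  public static Parts parse(String connectionUri) {
    URI uri = URI.create(connectionUri);
    Map<String, String> conf = new LinkedHashMap<>();
    String query = uri.getRawQuery();
    if (query != null) {
      for (String pair : query.split("&")) {
        if (pair.isEmpty()) {
          continue;
        }
        int eq = pair.indexOf('=');
        String key = eq < 0 ? pair : pair.substring(0, eq);
        String value = eq < 0 ? "" : pair.substring(eq + 1);
        conf.put(key, value);
      }
    }
    return new Parts(uri.getScheme(), uri.getAuthority(), uri.getPath(), conf);
  }

  /** Mirrors the documented precedence: values from the URI override file values. */
  public static Map<String, String> effectiveConf(Map<String, String> fileConf, Parts parts) {
    Map<String, String> merged = new LinkedHashMap<>(fileConf);
    merged.putAll(parts.queryConf);
    return merged;
  }

  public static void main(String[] args) {
    Parts zk = parse("hbase+zk://zk1:2181,zk2:2181,zk3:2181/hbase");
    System.out.println(zk.scheme + " | " + zk.authority + " | " + zk.path);

    Parts rpc = parse("hbase+rpc://server1:16020?hbase.client.operation.timeout=10000");
    Map<String, String> fileConf = new LinkedHashMap<>();
    fileConf.put("hbase.client.operation.timeout", "1200000"); // assumed file value
    System.out.println(effectiveConf(fileConf, rpc));
  }
}
```

With a real cluster, the URI string would instead be handed to `ConnectionFactory.createConnection(uri)` as shown in the patch; the sketch only makes the mapping from URI parts to the older configuration keys concrete.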