HBASE-28548 Add documentation about the URI based connection registry (#5981)

Signed-off-by: Nick Dimiduk <[email protected]>
Apache9 authored Jun 20, 2024
1 parent 62e7fe8 commit 52eef65
Showing 2 changed files with 92 additions and 11 deletions.
68 changes: 67 additions & 1 deletion src/main/asciidoc/_chapters/architecture.adoc
@@ -357,7 +357,7 @@ as bootstrap nodes, not only masters
. Support refreshing bootstrap nodes, for spreading loads across the nodes in the cluster, and also
remove the dead nodes in bootstrap nodes.

To explicitly enable the Master-based registry, use
To explicitly enable the rpc-based registry, use

[source, xml]
----
@@ -417,6 +417,72 @@ configuration to fallback to the ZooKeeper based connection registry implementat
</property>
----

[[client.connectionuri]]
=== Connection URI
Starting from 2.7.0, HBase supports specifying the connection information for an HBase cluster
through a URI, which we call a "connection URI". Several methods have been added to
_ConnectionFactory_ to let you get a connection to the cluster specified by the URI. For example:

[source, java]
----
URI uri = new URI("hbase+rpc://server1:16020,server2:16020,server3:16020");
try (Connection conn = ConnectionFactory.createConnection(uri)) {
...
}
----

==== Supported Schemes
Two schemes are currently supported: _hbase+rpc_ for _RpcConnectionRegistry_ and _hbase+zk_
for _ZKConnectionRegistry_. _MasterRegistry_ is deprecated, so it is not exposed through
connection URI.

For _hbase+rpc_, it looks like
[source, shell]
----
hbase+rpc://server1:16020,server2:16020,server3:16020
----

The authority part _server1:16020,server2:16020,server3:16020_ specifies the bootstrap nodes and
their rpc ports, i.e., the value previously configured through _hbase.client.bootstrap.servers_.

For _hbase+zk_, it looks like
[source, shell]
----
hbase+zk://zk1:2181,zk2:2181,zk3:2181/hbase
----

The authority part _zk1:2181,zk2:2181,zk3:2181_ is the zk quorum, i.e., the value previously
configured through _hbase.zookeeper.quorum_.
The path part _/hbase_ is the znode parent, i.e., the value previously configured through
_zookeeper.znode.parent_.
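
As a rough illustration of how the authority and path parts map onto a URI object, the sketch
below splits the two example URIs using only `java.net.URI`. The class `ConnectionUriParts` and
its `endpoints` method are hypothetical names used for illustration and are not part of HBase;
the sketch assumes the multi-host authority is read as a whole through `getAuthority()`.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a connection URI into the
// pieces described above using only java.net.URI.
final class ConnectionUriParts {

  private ConnectionUriParts() {
  }

  // Returns the comma separated host:port entries of the authority part.
  static List<String> endpoints(URI uri) {
    // A comma separated authority is not a single host:port pair, so it is
    // read as a whole through getAuthority() rather than getHost()/getPort().
    String authority = uri.getAuthority();
    if (authority == null || authority.isEmpty()) {
      throw new IllegalArgumentException("no authority in " + uri);
    }
    return Arrays.asList(authority.split(","));
  }

  public static void main(String[] args) {
    URI rpc = URI.create("hbase+rpc://server1:16020,server2:16020,server3:16020");
    System.out.println(rpc.getScheme() + " -> " + endpoints(rpc));

    URI zk = URI.create("hbase+zk://zk1:2181,zk2:2181,zk3:2181/hbase");
    // For hbase+zk the path part carries the znode parent.
    System.out.println(zk.getScheme() + " -> " + endpoints(zk) + ", znode parent " + zk.getPath());
  }
}
```

The scheme decides how the remaining parts are interpreted, so a caller would normally check
`getScheme()` first and only then read the authority and, for _hbase+zk_, the path.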

==== Specify Configuration through URI Queries
To let users fully specify the connection information through a connection URI, we support
specifying configuration values through URI queries. For example:

[source, shell]
----
hbase+rpc://server1:16020?hbase.client.operation.timeout=10000
----

This sets the operation timeout to 10 seconds (the value is given in milliseconds). Note that
configuration values specified in the connection URI override the ones in the configuration file.
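
The override order can be sketched with plain maps standing in for the HBase configuration
object. The class `UriQueryOverrides` and its methods are hypothetical and only illustrate the
precedence rule; they are not how HBase itself applies the query, and the handling of malformed
entries is an assumption of this sketch.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: plain maps stand in for the
// HBase Configuration object to show the precedence of URI query values.
final class UriQueryOverrides {

  private UriQueryOverrides() {
  }

  // Parses "k1=v1&k2=v2" from the URI query into an ordered map.
  static Map<String, String> parseQuery(URI uri) {
    Map<String, String> params = new LinkedHashMap<>();
    String query = uri.getQuery();
    if (query == null || query.isEmpty()) {
      return params;
    }
    for (String pair : query.split("&")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        continue; // assumption of this sketch: skip entries without a key or '='
      }
      params.put(pair.substring(0, eq), pair.substring(eq + 1));
    }
    return params;
  }

  // Returns a copy of fileConf with the values from the URI query applied on top.
  static Map<String, String> effectiveConf(Map<String, String> fileConf, URI uri) {
    Map<String, String> merged = new LinkedHashMap<>(fileConf);
    merged.putAll(parseQuery(uri));
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> fileConf = new LinkedHashMap<>();
    // Stand-in for a value read from hbase-site.xml.
    fileConf.put("hbase.client.operation.timeout", "1200000");
    URI uri = URI.create("hbase+rpc://server1:16020?hbase.client.operation.timeout=10000");
    System.out.println(effectiveConf(fileConf, uri));
  }
}
```

The merge copies the file-based values first and applies the query values last, so a key present
in both places ends up with the value from the URI.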

==== Implement Your Own Connection Registry
We use _ServiceLoader_ to load the different connection registry implementations; the entry point
is _org.apache.hadoop.hbase.client.ConnectionRegistryURIFactory_. If you implement your own
_ConnectionRegistryURIFactory_ with a different scheme and register it in the services file,
it will be loaded at runtime.

Connection URI is still a very new feature that has not been used extensively in production, so
we do not want to expose the ability to customize _ConnectionRegistryURIFactory_ yet, as the API
may change frequently in the beginning.

If you really want to implement your own connection registry, you can use the approach above,
but do so at your own risk.


[[client.filter]]
== Client Request Filters

35 changes: 25 additions & 10 deletions src/main/asciidoc/_chapters/configuration.adoc
@@ -746,8 +746,8 @@ be changed for particular daemons via the HBase UI.
If you are running HBase in standalone mode, you don't need to configure anything for your client
to work provided that they are all on the same machine.

Starting release 3.0.0, the default connection registry has been switched to a master based
implementation. Refer to <<client.masterregistry>> for more details about what a connection
Starting release 3.0.0, the default connection registry has been switched to an rpc based
implementation. Refer to <<client.rpcconnectionregistry>> for more details about what a connection
registry is and implications of this change. Depending on your HBase version, the following is the
expected minimal client configuration.

@@ -772,11 +772,11 @@ before they can do anything. This can be configured in the client configuration
</configuration>
----

==== Starting 3.0.0 release
==== Starting from 3.0.0 release

The default implementation was switched to a master based connection registry. With this
implementation, clients always contact the active or stand-by master RPC end points to fetch the
connection registry information. This means that the clients should have access to the list of
The default implementation was switched to an rpc based connection registry. With this
implementation, by default clients contact the active or stand-by master RPC end points to fetch
the connection registry information. This means that the clients should have access to the list of
active and stand-by master end points before they can do anything. This can be configured in the
client configuration xml as follows:

@@ -796,8 +796,22 @@ configuration xml as follows:
The configuration value for _hbase.masters_ is a comma separated list of _host:port_ values. If no
port value is specified, the default of _16000_ is assumed.

Usually this configuration is kept out in the _hbase-site.xml_ and is picked up by the client from
the `CLASSPATH`.
You can also specify bootstrap nodes other than masters, for example:
[source,xml]
----
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.client.bootstrap.servers</name>
    <value>server1:16020,server2:16020,server3:16020</value>
  </property>
</configuration>
----

The configuration value for _hbase.client.bootstrap.servers_ is a comma separated list of
_host:port_ values. Note that the port must be specified here.
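
The two port rules described above (a default of _16000_ for _hbase.masters_, a mandatory port for
_hbase.client.bootstrap.servers_) can be sketched as a small parser. The class `EndpointList` and
its `parse` method are hypothetical and only illustrate the rules; this is not the parsing code
HBase uses, and IPv6 literals are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of HBase: parses a comma separated host:port list.
final class EndpointList {

  private EndpointList() {
  }

  // Returns normalized "host:port" entries. An entry without a port gets
  // defaultPort when one is given; otherwise the entry is rejected.
  static List<String> parse(String value, OptionalInt defaultPort) {
    List<String> result = new ArrayList<>();
    for (String entry : value.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      int colon = trimmed.lastIndexOf(':');
      if (colon < 0) {
        if (!defaultPort.isPresent()) {
          throw new IllegalArgumentException("port must be specified: " + trimmed);
        }
        result.add(trimmed + ":" + defaultPort.getAsInt());
      } else {
        // Integer.parseInt throws NumberFormatException on a malformed port.
        int port = Integer.parseInt(trimmed.substring(colon + 1));
        result.add(trimmed.substring(0, colon) + ":" + port);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // hbase.masters style: a missing port falls back to 16000.
    System.out.println(parse("master1,master2:16010", OptionalInt.of(16000)));
    // hbase.client.bootstrap.servers style: every entry carries its port.
    System.out.println(parse("server1:16020,server2:16020", OptionalInt.empty()));
  }
}
```

Passing an empty `OptionalInt` models the bootstrap servers rule, so an entry such as `server1`
fails fast instead of silently picking up a port.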

Usually these configurations are kept in _hbase-site.xml_ and are picked up by the client
from the `CLASSPATH`.

If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on
your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up
@@ -827,14 +841,15 @@ in the content of the first _hbase-site.xml_ found on the client's `CLASSPATH`,
the _hbase.X.X.X.jar_). It is also possible to specify configuration directly without having to
read from a _hbase-site.xml_.

For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:
For example, to set the ZooKeeper ensemble or bootstrap nodes for the cluster programmatically,
do as follows:

[source,java]
----
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost"); // Until 2.x.y versions
// ---- or ----
config.set("hbase.masters", "localhost:1234"); // Starting 3.0.0 version
config.set("hbase.client.bootstrap.servers", "localhost:1234"); // Starting 3.0.0 version
----

[[config_timeouts]]