
Catalog to Datasource changes (#1027)
Signed-off-by: vamsi-amazon <[email protected]>
vmmusings authored Nov 4, 2022
1 parent 79340e8 commit 3e30379
Showing 20 changed files with 159 additions and 159 deletions.
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@


/**
* Table implementation to handle show catalogs command.
* Table implementation to handle show datasources command.
* Since catalog information is not tied to any storage engine, this info
* is handled via Catalog Table.
*
@@ -51,7 +51,7 @@ public void open() {
for (Catalog catalog : catalogs) {
exprValues.add(
new ExprTupleValue(new LinkedHashMap<>(ImmutableMap.of(
"CATALOG_NAME", ExprValueUtils.stringValue(catalog.getName()),
"DATASOURCE_NAME", ExprValueUtils.stringValue(catalog.getName()),
"CONNECTOR_TYPE", ExprValueUtils.stringValue(catalog.getConnectorType().name())))));
}
iterator = exprValues.iterator();
@@ -22,7 +22,7 @@ public enum CatalogTableSchema {

CATALOG_TABLE_SCHEMA(new LinkedHashMap<>() {
{
put("CATALOG_NAME", STRING);
put("DATASOURCE_NAME", STRING);
put("CONNECTOR_TYPE", STRING);
}
}
@@ -61,7 +61,7 @@ void testIterator() {
assertTrue(catalogTableScan.hasNext());
for (Catalog catalog : catalogSet) {
assertEquals(new ExprTupleValue(new LinkedHashMap<>(ImmutableMap.of(
"CATALOG_NAME", ExprValueUtils.stringValue(catalog.getName()),
"DATASOURCE_NAME", ExprValueUtils.stringValue(catalog.getName()),
"CONNECTOR_TYPE", ExprValueUtils.stringValue(catalog.getConnectorType().name())))),
catalogTableScan.next());
}
@@ -33,7 +33,7 @@ void testGetFieldTypes() {
CatalogTable catalogTable = new CatalogTable(catalogService);
Map<String, ExprType> fieldTypes = catalogTable.getFieldTypes();
Map<String, ExprType> expectedTypes = new HashMap<>();
expectedTypes.put("CATALOG_NAME", ExprCoreType.STRING);
expectedTypes.put("DATASOURCE_NAME", ExprCoreType.STRING);
expectedTypes.put("CONNECTOR_TYPE", ExprCoreType.STRING);
assertEquals(expectedTypes, fieldTypes);
}
2 changes: 1 addition & 1 deletion docs/category.json
@@ -10,7 +10,7 @@
"user/ppl/cmd/ad.rst",
"user/ppl/cmd/dedup.rst",
"user/ppl/cmd/describe.rst",
"user/ppl/cmd/showcatalogs.rst",
"user/ppl/cmd/showdatasources.rst",
"user/ppl/cmd/information_schema.rst",
"user/ppl/cmd/eval.rst",
"user/ppl/cmd/fields.rst",
30 changes: 15 additions & 15 deletions docs/user/general/identifiers.rst
@@ -184,18 +184,18 @@ Fully Qualified Table Names

Description
-----------
With the introduction of different datasource catalogs along with Opensearch, support for fully qualified table names became compulsory to resolve tables to a catalog.
With the introduction of different datasources alongside OpenSearch, support for fully qualified table names became necessary to resolve tables to a datasource.

Format for fully qualified table name.
``<catalogName>.<schemaName>.<tableName>``
``<datasourceName>.<schemaName>.<tableName>``

* catalogName:[Mandatory] Catalog information is mandatory when querying over tables from catalogs other than opensearch connector.
* datasourceName:[Mandatory] Datasource information is mandatory when querying tables from datasources other than the opensearch connector.

* schemaName:[Optional] Schema is a logical abstraction for a group of tables. In the current state, we only support ``default`` and ``information_schema``. Any schema mentioned in the fully qualified name other than these two will be resolved to be part of tableName.

* tableName:[Mandatory] tableName is mandatory.

The current resolution algorithm works in such a way, the old queries on opensearch work without specifying any catalog name.
The current resolution algorithm works in such a way that old queries on OpenSearch work without specifying any datasource name.
So queries on OpenSearch indices don't need a fully qualified table name.

Table Name Resolution Algorithm.
@@ -205,24 +205,24 @@ Fully qualified Name is divided into parts based on ``.`` character.

TableName resolution algorithm works in the following manner.

1. Take the first part of the qualified name and resolve it to a catalog from the list of catalogs configured.
If it doesn't resolve to any of the catalog names configured, catalog name will default to ``@opensearch`` catalog.
1. Take the first part of the qualified name and resolve it to a datasource from the list of datasources configured.
If it doesn't resolve to any of the configured datasource names, the datasource name defaults to the ``@opensearch`` datasource.

2. Take the first part of the remaining qualified name after capturing the catalog name.
If this part represents any of the supported schemas under catalog, it will resolve to the same otherwise schema name will resolve to ``default`` schema.
2. Take the first part of the remaining qualified name after capturing the datasource name.
If this part matches one of the supported schemas under the datasource, it resolves to that schema; otherwise the schema name resolves to the ``default`` schema.
Currently ``default`` and ``information_schema`` are the only schemas supported.

3. The rest of the parts are combined to resolve the table name.

** Only table name identifiers are supported with fully qualified names, identifiers used for columns and other attributes doesn't require prefixing with catalog and schema information.**
**Only table name identifiers are supported with fully qualified names; identifiers used for columns and other attributes don't require prefixing with datasource and schema information.**
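The three resolution steps above can be sketched in a few lines. This is a hypothetical Python illustration of the documented algorithm, not the plugin's actual implementation; the function and variable names are invented for clarity.

```python
# Illustrative sketch of the table-name resolution algorithm described above.
# Assumption: function and variable names are hypothetical, not plugin APIs.

SUPPORTED_SCHEMAS = {"default", "information_schema"}

def resolve_table(qualified_name, configured_datasources):
    """Split a fully qualified name into (datasource, schema, table)."""
    parts = qualified_name.split(".")

    # Step 1: the first part resolves to a configured datasource;
    # otherwise fall back to the @opensearch datasource (part is kept).
    if parts[0] in configured_datasources:
        datasource = parts.pop(0)
    else:
        datasource = "@opensearch"

    # Step 2: the next part resolves to a supported schema;
    # otherwise the schema defaults to "default" (part is kept).
    if parts and parts[0] in SUPPORTED_SCHEMAS:
        schema = parts.pop(0)
    else:
        schema = "default"

    # Step 3: everything left combines into the table name.
    return datasource, schema, ".".join(parts)
```

Running this against the worked examples below reproduces their results, e.g. ``logs.12.13.1`` resolves to the ``@opensearch`` datasource with the whole string as the table name.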

Examples
--------
Assume [my_prometheus] is the only catalog configured other than default opensearch engine.
Assume [my_prometheus] is the only datasource configured other than the default opensearch engine.

1. ``my_prometheus.default.http_requests_total``

catalogName = ``my_prometheus`` [Is in the list of catalogs configured].
datasourceName = ``my_prometheus`` [Is in the list of datasources configured].

schemaName = ``default`` [Is in the list of schemas supported].

Expand All @@ -231,7 +231,7 @@ tableName = ``http_requests_total``.
2. ``logs.12.13.1``


catalogName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only catalog configured name.]
datasourceName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only datasource name configured.]

schemaName = ``default`` [No supported schema found, so default to `default`].

Expand All @@ -241,23 +241,23 @@ tableName = ``logs.12.13.1``.
3. ``my_prometheus.http_requests_total``


catalogName = ```my_prometheus`` [Is in the list of catalogs configured].
datasourceName = ``my_prometheus`` [Is in the list of datasources configured].

schemaName = ``default`` [No supported schema found, so default to `default`].

tableName = ``http_requests_total``.

4. ``prometheus.http_requests_total``

catalogName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only catalog configured name.]
datasourceName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only datasource name configured.]

schemaName = ``default`` [No supported schema found, so default to `default`].

tableName = ``prometheus.http_requests_total``.

5. ``prometheus.default.http_requests_total.1.2.3``

catalogName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only catalog configured name.]
datasourceName = ``@opensearch`` [Resolves to default @opensearch connector since [my_prometheus] is the only datasource name configured.]

schemaName = ``default`` [No supported schema found, so default to `default`].

83 changes: 0 additions & 83 deletions docs/user/ppl/admin/catalog.rst

This file was deleted.

83 changes: 83 additions & 0 deletions docs/user/ppl/admin/datasources.rst
@@ -0,0 +1,83 @@
.. highlight:: sh

===================
Datasource Settings
===================

.. rubric:: Table of contents

.. contents::
:local:
:depth: 1

Introduction
============

The concept of ``datasource`` is introduced to support the federation of SQL/PPL query engine to multiple data stores.
This helps PPL users to leverage data from multiple data stores and derive correlation and insights.
Datasource definition provides the information to connect to a data store and also gives a name to them to refer in PPL commands.


Definitions of datasource and connector
=======================================
* Connector is a component that adapts the query engine to a datastore. For example, the Prometheus connector adapts the query engine to run queries against a Prometheus datastore. Only the connector name is required in the datasource definition JSON.
* Datasource is a construct that defines how to connect to a data store and which connector the query engine should use.

Example Prometheus Datasource Definition ::

[{
"name" : "my_prometheus",
"connector": "prometheus",
"properties" : {
"prometheus.uri" : "http://localhost:8080",
"prometheus.auth.type" : "basicauth",
"prometheus.auth.username" : "admin",
"prometheus.auth.password" : "admin"
}
}]

Datasource configuration restrictions:

* ``name``, ``connector``, ``properties`` are required fields in the datasource configuration.
* All datasource names should be unique and match the following regex: ``[@*A-Za-z]+?[*a-zA-Z_\-0-9]*``.
* Allowed connectors:
* ``prometheus`` [More details: `Prometheus Connector <prometheus_connector.rst>`_]
* All the allowed config parameters in ``properties`` are defined in individual connector pages mentioned above.
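The restrictions above can be checked mechanically. The following is a minimal, hypothetical sketch of such a validation; the helper name and the connector list are assumptions for illustration, not part of the plugin.

```python
import re

# Name pattern and required fields taken from the restrictions above.
# Assumption: validate_datasources() is an illustrative helper, not a plugin API.
NAME_PATTERN = re.compile(r"[@*A-Za-z]+?[*a-zA-Z_\-0-9]*")
ALLOWED_CONNECTORS = {"prometheus"}  # currently the only allowed connector

def validate_datasources(configs):
    """Validate a list of datasource definitions against the documented rules."""
    names = set()
    for cfg in configs:
        # name, connector, properties are required fields.
        for field in ("name", "connector", "properties"):
            if field not in cfg:
                raise ValueError(f"missing required field: {field}")
        name = cfg["name"]
        # Names must match the documented regex and be unique.
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid datasource name: {name}")
        if name in names:
            raise ValueError(f"duplicate datasource name: {name}")
        names.add(name)
        if cfg["connector"] not in ALLOWED_CONNECTORS:
            raise ValueError(f"unsupported connector: {cfg['connector']}")
```

For instance, the ``my_prometheus`` definition shown above passes, while a name starting with a digit or a duplicated name raises an error.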

Configuring a datasource in OpenSearch
======================================

* Datasources are configured in the OpenSearch keystore as secure settings under the ``plugins.query.federation.datasources.config`` key, as they contain credential info.
* A JSON file containing an array of datasource configurations should be added to the keystore under the above-mentioned key. A sample JSON file is shown in the section above.


[**To be run on all the nodes in the cluster**] Command to add the datasources.json file to the OpenSearch keystore::

>> bin/opensearch-keystore add-file plugins.query.federation.datasource.config datasources.json

Datasources can be configured at OpenSearch startup or updated while OpenSearch is running.
If a datasource configuration is updated at runtime, the following API should be called to refresh the query engine with the latest changes.

[**Required only if keystore settings are updated at runtime**] Secure settings refresh API::

>> curl --request POST \
--url http://{{opensearch-domain}}:9200/_nodes/reload_secure_settings \
--data '{"secure_settings_password":"{{keystore-password}}"}'


Using a datasource in PPL command
====================================
A datasource is referenced in the ``source`` command as shown in the code block below.
Based on the abstraction designed by the connector,
one can refer to the corresponding entity as a table in the ``source`` command.
For example, in the prometheus connector, each metric is abstracted as a table,
so we can refer to a metric and apply stats over it in the following way.

Example source command with prometheus datasource ::

>> source = my_prometheus.prometheus_http_requests_total | stats avg(@value) by job;


Limitations of datasource
====================================
Datasource settings are global, and users with PPL access are allowed to fetch data from all the defined datasources.
PPL access can be controlled using roles. (More details: `Security Settings <security.rst>`_)
14 changes: 7 additions & 7 deletions docs/user/ppl/cmd/information_schema.rst
@@ -11,19 +11,19 @@ Metadata queries using information_schema

Description
============
| Use ``information_schema`` in source command to query tables information under a catalog.
| Use ``information_schema`` in the ``source`` command to query table information under a datasource.
In the current state, ``information_schema`` only supports table metadata.
This schema will be extended for views, columns and other metadata info in future.


Syntax
============
source = catalog.information_schema.tables;
source = datasource.information_schema.tables;

Example 1: Fetch tables in prometheus catalog.
Example 1: Fetch tables in prometheus datasource.
==================================================

The examples fetches tables in the prometheus catalog.
This example fetches tables in the prometheus datasource.

PPL query for fetching PROMETHEUS TABLES with where clause::

@@ -36,10 +36,10 @@ PPL query for fetching PROMETHEUS TABLES with where clause::
+-----------------+----------------+--------------------------------+--------------+--------+---------------------------+


Example 2: Search tables in prometheus catalog.
===============================================
Example 2: Search tables in prometheus datasource.
=================================================

The examples searches tables in the prometheus catalog.
This example searches tables in the prometheus datasource.

PPL query for searching PROMETHEUS TABLES::

36 changes: 0 additions & 36 deletions docs/user/ppl/cmd/showcatalogs.rst

This file was deleted.
