Tableau Web Data Connector for Elasticsearch
This is a Tableau Web Data Connector for Elasticsearch. It extracts a set of data from Elasticsearch based on the cluster URL, index name, type and optional custom query, and creates a Tableau extract of the data. The connector queries the specified type's mapping in Elasticsearch to report the fields and data types that Tableau can expect to see.
The extract should be refreshed periodically, as it only includes data from the point in time at which it was executed.
The connector works by retrieving 'pages' of data from Elasticsearch up to either the limit specified, or up to the total number of hits. The user can override the batch size to retrieve more records per page if desired.
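The paging described above can be sketched as follows. This is a minimal illustration that assumes plain from/size paging; the helper names planPages and withPaging are hypothetical and are not taken from the connector's source, whose actual request logic may differ.

```javascript
// Hypothetical helper: plans the from/size pairs needed to page through results.
// totalHits is the hit count reported by Elasticsearch; limit may be null for "no limit".
function planPages(totalHits, batchSize, limit) {
  if (batchSize <= 0) {
    throw new Error('batchSize must be greater than zero');
  }
  // Stop at the user's limit or at the total number of hits, whichever is smaller.
  var rowsWanted = limit == null ? totalHits : Math.min(limit, totalHits);
  var pages = [];
  for (var from = 0; from < rowsWanted; from += batchSize) {
    pages.push({ from: from, size: Math.min(batchSize, rowsWanted - from) });
  }
  return pages;
}

// Hypothetical helper: copies a query body and overwrites any from/size it contains.
function withPaging(query, page) {
  return Object.assign({}, query, { from: page.from, size: page.size });
}
```

With 25 hits, a batch size of 10 and no limit, planPages yields three pages, the last one holding the remaining 5 rows. A larger batch size means fewer round trips to Elasticsearch for the same extract.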
The 2.3 release (in the release-2.3 branch and current development version in master) supports Tableau 10.4 or later.
The 2.0 release (in the release-2.0 branch) supports Tableau 10.0 or later.
The 1.0 release (in the release-1.0 branch) supports Tableau 9.1.6 or later, 9.2.4 or later, and 9.3.
Elasticsearch 5.+ is recommended
- Fields with array values will have the value from the first element used, otherwise the entire array will be passed as a value (which probably will not display in Tableau correctly)
- Extremely large datasets can cause issues and crash Tableau as all available memory is consumed
- 'Incremental Refresh' is only supported in 'Search Result' mode
There is some configuration needed in Elasticsearch for the connector to work:
- You must enable CORS support in your Elasticsearch server. Set the following setting in elasticsearch.yml:
http.cors.enabled: true
Additionally, in current versions of Elasticsearch (2.3+), it is required to define which origins are allowed to send CORS requests (this is defined by the origin HTTP request header). The following configuration in elasticsearch.yml will allow ALL origins but is considered insecure:
http.cors.allow-origin: "*"
For more detailed information on Elasticsearch configuration options refer to: Elasticsearch Configuration Reference
As an alternative to enabling CORS through the Elasticsearch configuration file, you can set up a proxy in front of Elasticsearch that will set the proper HTTP response headers.
As an example, for AWS, here is a link that describes how to set up an API gateway that sends CORS headers: http://enable-cors.org/server_awsapigateway.html
An instance of an API gateway (with CORS enabled) is used that forwards requests to your Elasticsearch instance.
The Elasticsearch URL used in the Tableau connector configuration should be the URL of your API Gateway.
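As a rough sketch of what such a proxy has to add to each response it forwards, the function below builds the CORS response headers for a request origin. The function name corsHeaders and the allow-list approach are assumptions made for illustration, as is the exact set of methods and headers the connector needs; only the header names themselves come from the CORS standard.

```javascript
// Hypothetical helper: returns the CORS response headers a proxy in front of
// Elasticsearch could attach, or an empty object if the origin is not allowed.
function corsHeaders(requestOrigin, allowedOrigins) {
  if (!requestOrigin || allowedOrigins.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    // Echo back the specific origin instead of allowing every origin with "*".
    'Access-Control-Allow-Origin': requestOrigin,
    // Assumed set of methods and headers; adjust to what your requests use.
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
    // Tell caches that the response varies by the Origin request header.
    'Vary': 'Origin'
  };
}
```

Listing the allowed origins explicitly avoids the wildcard configuration that is described above as insecure.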
Install grunt:
npm install -g grunt
Install bower:
npm install -g bower
Run the following from the command line:
npm install
bower install
From the command line execute:
grunt build:dist
This will package the connector files in the dist folder, combining JavaScript and CSS into single files.
You can build, watch sources for changes, and run the application from the command line with:
grunt
This will watch all sub-directories for changes and reload the application if anything changes. Running this app simply hosts all the connector resources; requested on its own, it will not do anything useful. You should either use the connector within the Web Data Connector SDK test harness, or use the connector from Tableau Desktop or Server.
Note that internally there are tasks that run the build:dev target to perform HTML templating and copy all source files to the public/ folder, which the NodeJS server serves static resources from.
Make note of the URL that the connector app is running on, e.g.:
Elasticsearch Tableau Web Data connector server listening at http://0.0.0.0:3000
Simply choose the 'Web Data Connector' as your data source from within Tableau Desktop, or use the Web Connector SDK, and enter the URL.
A Dockerfile is supplied in docker/Dockerfile that will build an image that creates a development build of the latest source from GitHub, and runs the node server.
You can build an image from the root of the project:
docker build docker -t <name of tag>
You can then start a container from this image, mapping the server to the host's port 3000, with:
docker run -p 3000:3000 <name of tag>
For convenience, the connector comes with winser to install the connector web server as a Windows service.
To install as a Windows service:
npm run-script install-service
This will install a service named elasticsearch-tableau-connector. Open the Windows service manager (services.msc) to start the service.
To uninstall the service:
npm run-script uninstall-service
The connector UI can be loaded from a web browser (outside of Tableau Desktop). Simply enter the URL of the connector (defaults to http://localhost:3000/elasticsearch-connector.html if running the project locally).
The 'Submit' button will not be displayed, but you can still use the 'Preview' feature of the connector.
Execute the build for this project from the command line:
grunt build:dist
Import each file in the dist/ folder into Tableau Server as follows:
- Ensure the Tableau command line tools are in your PATH
- From a command line (with the dist/ folder as your working directory) execute the following:
tabadmin import_webdataconnector elasticsearch-connector.html
tabadmin import_webdataconnector elasticsearch-connector.min.css
tabadmin import_webdataconnector elasticsearch-connector.min.js
Get the URL of the elasticsearch-connector.html on the Tableau Server by executing:
tabadmin list_webdataconnectors --urls
And from Tableau go to 'Web Data Connector' and enter the URL of the connector:
http://<your tableau server>/webdataconnectors/elasticsearch-connector.html
If you are running this web app locally, and testing from the Tableau Web Data Connector SDK, simply enter:
http://localhost:3000/elasticsearch-connector.html
into the Web Connector URL input field in the SDK's form.
From there you should see this connector's UI:
Connector UI when in Aggregation mode:
Connector UI after fetching preview data:
The Elasticsearch connector UI includes the following fields:
Field Name | Data Type | Description |
---|---|---|
Connection Name | String | Name of the data source connection displayed in the Tableau workbook |
Elasticsearch URL | String | [Required] URL of the Elasticsearch cluster |
Use HTTP Basic authentication | Boolean | [Required] Indicates if the Elasticsearch cluster requires HTTP Basic Auth |
Username | String | If 'Use HTTP Basic Auth' is checked, this is the user name |
Password | String | If 'Use HTTP Basic Auth' is checked, this is the password |
Index name | String | [Required] Name of the index in the Elasticsearch cluster |
Index Filter for Type Selection | String | If the index selected is an alias, this selection is required to choose the index to filter types by. Only types from this selection will be available in the 'Type' selection |
Use fields for type from all indexes? | Boolean | If selected, then fields in the extract will be a union of all fields of the selected type across all indexes of the alias. Note this only applies when selecting an index alias. Defaults to false |
Type | String | [Required] Name of the type in the Elasticsearch cluster to query |
Override Field Defaults | Boolean | If selected, then additional options to override default handling of fields are available |
Parse date fields in local time? | Boolean | If selected then all date or timestamp fields will be parsed in local time, the default (unselected) will parse as UTC |
Improve Field Names? | Boolean | If selected, then additional logic will be applied to improve field names in ways similar to those described in the Tableau help article. Default is false. |
Result Mode | Option | Option to retrieve search results from Elasticsearch (Search Result Mode) or from a query using aggregation (Aggregation Mode) |
Use custom query? | Boolean | If true, the extract will use a custom query against Elasticsearch in search result mode; if false, the extract will be a 'match all' |
Query | String | If Use custom query? is true, this will be the JSON request payload sent to Elasticsearch in search result mode. from and size will be overwritten if supplied. Refer to Elasticsearch Query DSL for a reference on writing a query |
Use Incremental Refresh | Boolean | If checked, then Tableau can fetch data using incremental refresh mode |
Incremental Refresh Column | String | Name of the column to use for incremental refreshes. Should be a date, time or integer column |
Batch size of per request to Elasticsearch | Integer | Number of rows to retrieve at once, defaults to 10, should probably be 1000+ |
Total limit on number of rows to sync | Integer | Limit of rows to include in extract, defaults to 100, but generally should be left blank to indicate that all matching rows should be included |
Use custom query? (aggregation mode) | Boolean | If true, indicates the data extract should use a custom query that includes an aggregation request |
Custom query | String | JSON payload sent in the request for Elasticsearch, must include aggregations or aggs element for Terms, Range, Date Range or Date Histogram |
Filter data included in aggregations | Boolean | If checked, then you can enter a filter that will be used against data used in an aggregation request |
Filter | String | Uses Lucene Query String Syntax to define a filter applied against aggregation data |
Metrics | Metric | One or more metrics to calculate for the aggregation results. Valid options are Count, Min, Max, Sum, Average, Stats, and Extended Stats. Refer to 'Metrics' section |
Buckets | Bucket | Bucket to aggregate results to and calculate metrics for, or multiple levels of child buckets. See buckets for more information |
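To make the aggregation-mode fields more concrete, here is a sketch of the kind of request body that a single Terms bucket, one metric and an optional Lucene filter could translate to. The function buildAggregationRequest and the aggregation names "bucket" and "metric" are hypothetical; the body the connector actually generates is not shown in this document and may be structured differently.

```javascript
// Hypothetical helper: builds an Elasticsearch request body with one terms
// bucket, one metric calculated per bucket, and an optional query string filter.
// metric is an Elasticsearch metric aggregation name such as 'min', 'max',
// 'sum', 'avg', 'stats' or 'extended_stats'.
function buildAggregationRequest(bucketField, metric, metricField, filter) {
  var metricAgg = {};
  metricAgg[metric] = { field: metricField };
  var body = {
    size: 0, // only aggregation results are needed, not the matching documents
    aggs: {
      bucket: {
        terms: { field: bucketField },
        aggs: { metric: metricAgg }
      }
    }
  };
  if (filter) {
    // Lucene query string syntax, applied to the data being aggregated.
    body.query = { query_string: { query: filter } };
  }
  return body;
}
```

A custom aggregation query entered in the UI would take the place of a generated body like this one, and must itself contain an aggregations or aggs element.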
Supported metrics:
The connector supports requesting data from Elasticsearch from the UI to preview the data that will be created in the data extract in Tableau. The preview button will send this request to Elasticsearch based on the current configuration and populate a table at the bottom of the view. This feature is useful for debugging, to make sure any custom queries and other configuration return a valid response.
Note - it is recommended to set a small limit when in 'Search Result' mode to limit the amount of data returned.
The submit button will save the configuration for the data extract with Tableau and continue the process of creating the extract.
Tableau only supports field names with alphanumeric characters and underscores. The connector will replace non-supported characters with '_' underscore characters.
e.g.:
Elasticsearch field name | Resulting 'safe' Tableau field name |
---|---|
field_name | field_name |
@timestamp | _timestamp |
name.first | name_first |
car.@name | car__name |
Note: Field names available to select from the Connector UI will be the Elasticsearch field names, but data in Preview Mode and what is actually provided to Tableau will be converted to the safe names
Note: Additional logic will be applied to rename fields according to the Tableau help article when 'Improve Field Names' has been selected under 'Override Field Defaults'
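The character replacement described above can be sketched in a single function. The name toSafeFieldName is hypothetical, and this sketch covers only the underscore substitution, not the additional 'Improve Field Names' logic.

```javascript
// Hypothetical helper: replaces every character that is not a letter, digit
// or underscore with an underscore, one for one.
function toSafeFieldName(name) {
  return name.replace(/[^A-Za-z0-9_]/g, '_');
}
```

Because each unsupported character is replaced individually, a name such as car.@name ends up with two consecutive underscores.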
For types whose mapping includes objects (fields with their own set of properties), a concatenated field name will be created. For example, given the following mapping:
{
  "person": {
    "properties": {
      "firstName": {
        "type": "string"
      },
      "lastName": {
        "type": "string"
      },
      "address": {
        "properties": {
          "street": {
            "type": "string"
          },
          "city": {
            "type": "string"
          },
          "zip": {
            "type": "string"
          }
        }
      }
    }
  }
}
the connector will create the following fields:
person_firstName
person_lastName
person_address_street
person_address_city
person_address_zip
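One way to derive these names is a recursive walk over the mapping that joins each level with an underscore. The function flattenMapping below is a hypothetical sketch of that idea and is not taken from the connector's source.

```javascript
// Hypothetical helper: walks a mapping object and returns the concatenated
// field names. Any entry with a "properties" key is treated as an object
// whose children are prefixed with the parent name.
function flattenMapping(properties, prefix) {
  var names = [];
  Object.keys(properties).forEach(function (key) {
    var field = properties[key];
    var path = prefix ? prefix + '_' + key : key;
    if (field && field.properties) {
      // Nested object: recurse and collect the children's names.
      names = names.concat(flattenMapping(field.properties, path));
    } else {
      // Leaf field: the accumulated path is the field name.
      names.push(path);
    }
  });
  return names;
}
```

Calling flattenMapping with the mapping above (parsed into an object) returns the five names listed, in the order the properties appear.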
For geo_point fields in Elasticsearch, this connector will create two separate Tableau fields by parsing the lat, lon value:
- Latitude - field will be named <field-name>_latitude - float type
- Longitude - field will be named <field-name>_longitude - float type
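A sketch of this split, assuming the geo_point value arrives as a "lat, lon" string (Elasticsearch also accepts other geo_point representations, which this sketch does not handle). The function name splitGeoPoint is hypothetical.

```javascript
// Hypothetical helper: turns a "lat, lon" string into the two float fields
// described above, keyed by <field-name>_latitude and <field-name>_longitude.
function splitGeoPoint(fieldName, value) {
  var parts = String(value).split(',');
  var row = {};
  // parseFloat ignores the whitespace that may follow the comma.
  row[fieldName + '_latitude'] = parseFloat(parts[0]);
  row[fieldName + '_longitude'] = parseFloat(parts[1]);
  return row;
}
```

For a field named location, the resulting row contains location_latitude and location_longitude.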
The connector supports Tableau's incremental refresh feature in 'Search Result' mode. This can be used to extract large datasets from Elasticsearch that can be incrementally imported into Tableau.
Your Elasticsearch type should have a date, time or integer field that is used to query for incremental data. The last value for this column is stored and used on subsequent extracts as the starting value. E.g., if the last value seen for a field @timestamp was 1/1/2000 00:00:00 then the next time an incremental extract is processed, the query to Elasticsearch will filter on the @timestamp field for values greater than 1/1/2000 00:00:00.
Generally the value should be unique and automatically incremented as new data is added to Elasticsearch, which is why a timestamp or an auto-incrementing sequence number is a good choice.
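The filtering step can be sketched with an Elasticsearch range query on the incremental refresh column. The function incrementalQuery is hypothetical, and wrapping the original query in a bool query is an assumption about how the two could be combined, not a description of the connector's code.

```javascript
// Hypothetical helper: restricts a query to rows whose incremental refresh
// column is greater than the last value stored from the previous extract.
function incrementalQuery(baseQuery, column, lastValue) {
  // First (full) extract: no stored value yet, so the query is left unchanged.
  if (lastValue === undefined || lastValue === null || lastValue === '') {
    return baseQuery;
  }
  var range = {};
  range[column] = { gt: lastValue };
  return {
    bool: {
      must: [baseQuery],
      filter: [{ range: range }]
    }
  };
}
```

Using gt rather than gte means rows equal to the last value seen are not fetched again, which is one reason the column's values should be unique.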
For more information refer to:
This project has been made possible in part by support from DialogTech