Created druid.rst and added documentation #35251
Closed
docs/apache-airflow-providers-apache-druid/connections/druid.rst (120 additions, 0 deletions)
.. _druid_integration:

Apache Druid Integration with Apache Airflow
============================================

Introduction
------------

Apache Druid, a high-performance real-time analytics database, can be integrated with Apache Airflow through the ``DruidHook`` and ``DruidDbApiHook``. The ``DruidHook`` submits ingestion tasks to the Druid cluster, while the ``DruidDbApiHook`` runs queries against the Druid broker.
Establishing the Connection
---------------------------

To connect Apache Airflow to Apache Druid, follow these steps:

1. **Install the required package**: Install the ``apache-airflow-providers-apache-druid`` provider package so that the Druid hooks are available.

2. **Configure the Druid connection**: Set up the connection details, including the connection ID, host, port, and any authentication credentials required to access the Druid cluster.

3. **Initialize the hooks**: Create instances of ``DruidHook`` and ``DruidDbApiHook`` in your DAG tasks to communicate with the Druid cluster and broker, respectively.
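The first two steps above can be carried out from the command line. The sketch below assumes a standard Airflow installation; the host, port, and extra values are placeholders to adapt to your cluster:

```shell
# Step 1: install the Druid provider package
pip install apache-airflow-providers-apache-druid

# Step 2: register an ingestion connection (placeholder values)
airflow connections add druid_ingest_default \
    --conn-type druid \
    --conn-host druid.example.com \
    --conn-port 8081 \
    --conn-schema http \
    --conn-extra '{"endpoint": "druid/indexer/v1/task"}'
```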

.. _druid_connection:

Druid Connection
================

The Druid connection type enables integration with the Apache Druid database from Apache Airflow.

Default Connection IDs
----------------------

By default, ``DruidHook`` uses the ``druid_ingest_default`` connection ID and ``DruidDbApiHook`` uses ``druid_broker_default``.

Configuring the Connection
--------------------------

The following parameters can be configured for a Druid connection:

- **Host**: Host name or IP address of the Druid server.
- **Port**: Port number of the Druid server.
- **Schema**: Protocol scheme for the connection, such as ``http`` or ``https``.
- **Login**: Username for authentication (if applicable).
- **Password**: Password for authentication (if applicable).
- **Extra**: Additional parameters, stored as JSON. For example, ``DruidHook`` reads the ingestion ``endpoint`` (such as ``druid/indexer/v1/task``) from here.

With these configuration details in place, Airflow can reach the Druid cluster for both data ingestion and querying.
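Airflow can also pick up a connection from an ``AIRFLOW_CONN_<CONN_ID>`` environment variable holding a connection URI. The helper below is a purely illustrative sketch of how such a URI maps to the parameters above; it is not part of the provider, and in real code Airflow's own ``Connection.get_uri()`` should be preferred:

```python
from urllib.parse import quote_plus, urlencode


def build_druid_conn_uri(host, port, login=None, password=None, extra=None):
    """Assemble a ``druid://`` connection URI (illustrative helper only)."""
    auth = ""
    if login:
        auth = quote_plus(login)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    # Extra parameters travel as URI query arguments
    query = "?" + urlencode(extra) if extra else ""
    return f"druid://{auth}{host}:{port}{query}"


uri = build_druid_conn_uri(
    "druid.example.com", 8081,
    login="username", password="password",
    extra={"endpoint": "druid/indexer/v1/task"},
)
# e.g. export AIRFLOW_CONN_DRUID_INGEST_DEFAULT='<uri>'
print(uri)
```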

Examples
--------

Below is an example of defining a Druid connection programmatically in Apache Airflow:

.. code-block:: python

    import json

    from airflow.models.connection import Connection

    # Describe the Druid connection; persist it via the Airflow UI, the CLI,
    # a secrets backend, or an AIRFLOW_CONN_* environment variable.
    druid_connection = Connection(
        conn_id="druid_ingest_default",
        conn_type="druid",
        host="druid.example.com",
        port=8081,  # example value; use the port of your Druid endpoint
        schema="http",
        login="username",
        password="password",
        extra=json.dumps({"endpoint": "druid/indexer/v1/task"}),
    )

For further details on advanced configurations and best practices, refer to the Apache Druid and Apache Airflow documentation.

Executing Operations
--------------------

Once the connection is established, you can perform several kinds of operations against the Druid cluster and broker:

1. **Submitting ingestion tasks**: Use ``DruidHook`` to submit ingestion tasks that load data into Druid for analysis and querying.

2. **Querying data**: Use ``DruidDbApiHook`` to run SQL queries via the Druid broker against the data stored in Druid.

3. **Managing the Druid database**: Combine both hooks to manage and maintain Druid datasources within your Apache Airflow workflows.

For more detailed usage and configuration instructions, refer to the Apache Druid and Apache Airflow documentation.
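As a concrete illustration of the ingestion step, the snippet below assembles a minimal Druid native-batch ingestion spec as a plain dictionary. ``make_index_spec`` is a hypothetical helper, not part of the provider, and a production spec would also carry a ``tuningConfig`` and a granularity spec tuned to the data:

```python
import json


def make_index_spec(datasource, base_dir, timestamp_column="timestamp", dimensions=()):
    """Build a minimal ``index_parallel`` ingestion spec (hypothetical helper)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": "*.json"},
                "inputFormat": {"type": "json"},
            },
        },
    }


spec = make_index_spec("wikipedia", "/data/wikipedia", dimensions=["page", "user"])
# A spec like this could then be passed to DruidHook.submit_indexing_job(spec)
print(json.dumps(spec, indent=2))
```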

Example
-------

Below is an example of using ``DruidHook`` and ``DruidDbApiHook`` to interact with the Druid cluster and broker:

.. code-block:: python

    # Import the hooks from the Apache Druid provider package
    from airflow.providers.apache.druid.hooks.druid import DruidDbApiHook, DruidHook

    # Initialize the hooks (each uses its default connection ID)
    druid_hook = DruidHook()
    druid_db_api_hook = DruidDbApiHook()

    # Submit an ingestion task to the Druid cluster;
    # ``index_spec`` is a Druid ingestion spec (a dict or JSON string)
    druid_hook.submit_indexing_job(index_spec)

    # Execute a SQL query against the Druid broker
    rows = druid_db_api_hook.get_records("SELECT COUNT(*) FROM my_datasource")

    # Process the query results for further analysis and visualization
    process_results(rows)

For additional usage examples and best practices, consult the Apache Druid and Apache Airflow documentation.
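Within a DAG, ingestion is often expressed with the provider's ``DruidOperator``, which reads a templated ingestion spec file and submits it for you. The sketch below is a DAG configuration fragment; the DAG ID, schedule, and file path are placeholder values:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.druid.operators.druid import DruidOperator

with DAG(
    dag_id="druid_ingestion_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Submit the ingestion spec stored in a (templated) JSON file
    submit_job = DruidOperator(
        task_id="submit_druid_ingestion",
        json_index_file="druid_index_spec.json",
        druid_ingest_conn_id="druid_ingest_default",
    )
```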

Conclusion
----------

By integrating Apache Druid with Apache Airflow, you can bring real-time analytics into your data workflows, scheduling data ingestion, querying, and analysis alongside the rest of your pipelines.
Same comment as for #35244 (comment)
We need to align the connections first