26.0.0: upgrade the branch to docusaurus2 #14417

Merged · 19 commits · Jul 24, 2023
1 change: 0 additions & 1 deletion .github/workflows/static-checks.yml
@@ -150,7 +150,6 @@ jobs:
run: |
(cd website && npm install)
cd website
npm run link-lint
npm run spellcheck

- name: web console
2 changes: 2 additions & 0 deletions .gitignore
@@ -35,6 +35,8 @@ integration-tests/gen-scripts/
**/.local/
**/druidapi.egg-info/
examples/quickstart/jupyter-notebooks/docker-jupyter/notebooks
website/.docusaurus/


# ignore NetBeans IDE specific files
nbproject
8 changes: 6 additions & 2 deletions README.md
@@ -84,9 +84,13 @@ Use the built-in query workbench to prototype [DruidSQL](https://druid.apache.or

### Documentation

See the [latest documentation](https://druid.apache.org/docs/latest/) for the documentation for the current official release. If you need information on a previous release, you can browse [previous releases documentation](https://druid.apache.org/docs/).

Make documentation and tutorials updates in [`/docs`](https://github.com/apache/druid/tree/master/docs) using [MarkDown](https://www.markdownguide.org/) and contribute them using a pull request.
Make documentation and tutorials updates in [`/docs`](https://github.com/apache/druid/tree/master/docs) using [MarkDown](https://www.markdownguide.org/) or extended Markdown [(MDX)](https://mdxjs.com/). Then, open a pull request.

To build the site locally, you need Node 16.14 or higher and to install Docusaurus 2 with `npm|yarn install` in the `website` directory. Then you can run `npm|yarn start` to launch a local build of the docs.
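As a quick reference, the local build workflow described above amounts to roughly the following commands (a sketch; it assumes Node 16.14 or higher is already installed and that the `website` directory contains the Docusaurus 2 site):

```bash
# Sketch of a local docs build; assumes Node 16.14+ is installed.
cd website
npm install   # or `yarn install` -- installs Docusaurus 2 and the site dependencies
npm start     # or `yarn start`  -- serves a live-reloading local build of the docs
```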

If you're looking to update non-doc pages like Use Cases, those files are in the [`druid-website-src`](https://github.com/apache/druid-website-src/tree/master) repo.

### Community

4 changes: 2 additions & 2 deletions docs/design/architecture.md
@@ -29,7 +29,7 @@ Druid has a distributed architecture that is designed to be cloud-friendly and e

The following diagram shows the services that make up the Druid architecture, how they are typically organized into servers, and how queries and data flow through this architecture.

<img src="../assets/druid-architecture.png" width="800"/>
![](../assets/druid-architecture.png)

The following sections describe the components of this architecture.

@@ -107,7 +107,7 @@ example, a single day, if your datasource is partitioned by day). Within a chunk
[_segments_](../design/segments.md). Each segment is a single file, typically comprising up to a few million rows of data. Since segments are
organized into time chunks, it's sometimes helpful to think of segments as living on a timeline like the following:

<img src="../assets/druid-timeline.png" width="800" />
![](../assets/druid-timeline.png)

A datasource may have anywhere from just a few segments, up to hundreds of thousands and even millions of segments. Each
segment is created by a MiddleManager as _mutable_ and _uncommitted_. Data is queryable as soon as it is added to
2 changes: 1 addition & 1 deletion docs/design/processes.md
@@ -43,7 +43,7 @@ Druid processes can be deployed any way you like, but for ease of deployment we
* **Query**
* **Data**

<img src="../assets/druid-architecture.png" width="800"/>
![](../assets/druid-architecture.png)

This section describes the Druid processes and the suggested Master/Query/Data server organization, as shown in the architecture diagram above.

2 changes: 1 addition & 1 deletion docs/ingestion/ingestion-spec.md
@@ -237,7 +237,7 @@ Dimension objects can have the following components:

| Field | Description | Default |
|-------|-------------|---------|
| type | Either `auto`, `string`, `long`, `float`, `double`, or `json`. For the `auto` type, Druid determines the most appropriate type for the dimension and assigns one of the following: STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json> columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is `auto`. | `string` |
| type | Either `auto`, `string`, `long`, `float`, `double`, or `json`. For the `auto` type, Druid determines the most appropriate type for the dimension and assigns one of the following: STRING, ARRAY<STRING\>, LONG, ARRAY<LONG\>, DOUBLE, ARRAY<DOUBLE\>, or COMPLEX<json\> columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is `auto`. | `string` |
| name | The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments.<br /><br />Note that you can use a [`transformSpec`](#transformspec) if you want to rename columns during ingestion time. | none (required) |
| createBitmapIndex | For `string` typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for `string` typed dimensions. | `true` |
| multiValueHandling | Specify the type of handling for [multi-value fields](../querying/multi-value-dimensions.md). Possible values are `sorted_array`, `sorted_set`, and `array`. `sorted_array` and `sorted_set` order the array upon ingestion. `sorted_set` removes duplicates. `array` ingests data as-is | `sorted_array` |
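For illustration, a minimal sketch of how these dimension objects could be combined inside a `dimensionsSpec` (the dimension names here are hypothetical):

```json
"dimensionsSpec": {
  "dimensions": [
    { "type": "string", "name": "country", "createBitmapIndex": true },
    { "type": "long", "name": "bytesAdded" },
    { "type": "string", "name": "tags", "multiValueHandling": "sorted_array" },
    { "type": "auto", "name": "userAttributes" }
  ]
}
```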
80 changes: 59 additions & 21 deletions docs/multi-stage-query/api.md
@@ -3,6 +3,8 @@ id: api
title: SQL-based ingestion and multi-stage query task API
sidebar_label: API
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
@@ -52,9 +54,10 @@ As an experimental feature, this endpoint also accepts SELECT queries. SELECT qu
by the controller, and written into the [task report](#get-the-report-for-a-query-task) as an array of arrays. The
behavior and result format of plain SELECT queries (without INSERT or REPLACE) is subject to change.

<!--DOCUSAURUS_CODE_TABS-->
<Tabs>

<TabItem value="1" label="HTTP">

<!--HTTP-->

```
POST /druid/v2/sql/task
@@ -69,7 +72,10 @@ POST /druid/v2/sql/task
}
```

<!--curl-->
</TabItem>

<TabItem value="2" label="curl">


```bash
# Make sure you replace `username`, `password`, `your-instance`, and `port` with the values for your deployment.
@@ -83,7 +89,10 @@ curl --location --request POST 'https://<username>:<password>@<your-instance>:<p
}'
```

<!--Python-->
</TabItem>

<TabItem value="3" label="Python">


```python
import json
@@ -108,7 +117,9 @@ print(response.text)

```

<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>

</Tabs>

#### Response

@@ -132,22 +143,29 @@ You can retrieve status of a query to see if it is still running, completed succ

#### Request

<!--DOCUSAURUS_CODE_TABS-->
<Tabs>

<TabItem value="4" label="HTTP">

<!--HTTP-->

```
GET /druid/indexer/v1/task/<taskId>/status
```

<!--curl-->
</TabItem>

<TabItem value="5" label="curl">


```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/status'
```

<!--Python-->
</TabItem>

<TabItem value="6" label="Python">


```python
import requests
@@ -163,7 +181,9 @@ response = requests.get(url, headers=headers, data=payload, auth=('USER', 'PASSW
print(response.text)
```

<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>

</Tabs>

#### Response

@@ -208,22 +228,29 @@ For an explanation of the fields in a report, see [Report response fields](#repo

#### Request

<!--DOCUSAURUS_CODE_TABS-->
<Tabs>

<TabItem value="7" label="HTTP">

<!--HTTP-->

```
GET /druid/indexer/v1/task/<taskId>/reports
```

<!--curl-->
</TabItem>

<TabItem value="8" label="curl">


```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/reports'
```

<!--Python-->
</TabItem>

<TabItem value="9" label="Python">


```python
import requests
@@ -236,7 +263,9 @@ response = requests.get(url, headers=headers, auth=('USER', 'PASSWORD'))
print(response.text)
```

<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>

</Tabs>

#### Response

@@ -511,7 +540,7 @@ The response shows an example report for a query.
"0": 1,
"1": 1,
"2": 1
},
"totalMergersForUltimateLevel": 1,
"progressDigest": 1
}
@@ -587,22 +616,29 @@ The following table describes the response fields when you retrieve a report for

#### Request

<!--DOCUSAURUS_CODE_TABS-->
<Tabs>

<TabItem value="10" label="HTTP">

<!--HTTP-->

```
POST /druid/indexer/v1/task/<taskId>/shutdown
```

<!--curl-->
</TabItem>

<TabItem value="11" label="curl">


```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request POST 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/shutdown'
```

<!--Python-->
</TabItem>

<TabItem value="12" label="Python">


```python
import requests
@@ -618,7 +654,9 @@ response = requests.post(url, headers=headers, data=payload, auth=('USER', 'PASS
print(response.text)
```

<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>

</Tabs>

#### Response
