Add task report fields in response of SQL statements endpoint #16808

@Akshat-Jain Akshat-Jain commented Jul 26, 2024

Description

This PR adds an optional query parameter, detail, to the /v2/sql/statements/query-id endpoint.

As per offline discussion with @vogievetsky, the PR adds 3 additional fields to the API response when detail=true is passed:

  1. stages: Stages for the query from MSQ task report
  2. counters: Stage counters for the query from MSQ task report
  3. warnings: Warning reports for the query from MSQ task report

This gives us a single API that returns both the status and the task report for MSQ queries.
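As a rough illustration of how a client might request the extra fields, here is a minimal sketch that builds (but does not send) the GET request. The base URL passed in is an assumption about the deployment, and the class and method names are hypothetical; only the path and the detail=true parameter come from this PR.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class StatementStatusRequest {
  // Builds the status URI. baseUrl is an assumed prefix, e.g. "http://localhost:8888/druid".
  // The query ID is appended as-is; the IDs in the sample contain only URL-safe characters.
  static URI statusUri(String baseUrl, String queryId, boolean detail) {
    String uri = baseUrl + "/v2/sql/statements/" + queryId;
    return URI.create(detail ? uri + "?detail=true" : uri);
  }

  // Builds a GET request for the status endpoint without sending it.
  static HttpRequest statusRequest(String baseUrl, String queryId, boolean detail) {
    return HttpRequest.newBuilder(statusUri(baseUrl, queryId, detail))
        .header("Accept", "application/json")
        .GET()
        .build();
  }
}
```

Omitting the parameter (detail set to false here) yields the plain status URI, which matches the "optional" behavior described above.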

Sample API response when detail=true is passed
{
  "queryId": "query-db88b7ba-7c1e-4eb1-b553-3076b0284365",
  "state": "SUCCESS",
  "createdAt": "2024-07-26T18:38:22.855Z",
  "schema": [
    {
      "name": "__time",
      "type": "TIMESTAMP",
      "nativeType": "LONG"
    },
    {
      "name": "name",
      "type": "VARCHAR",
      "nativeType": "STRING"
    },
    {
      "name": "rollNumber",
      "type": "BIGINT",
      "nativeType": "LONG"
    },
    {
      "name": "country",
      "type": "VARCHAR",
      "nativeType": "STRING"
    },
    {
      "name": "age",
      "type": "BIGINT",
      "nativeType": "LONG"
    },
    {
      "name": "grade",
      "type": "VARCHAR",
      "nativeType": "STRING"
    }
  ],
  "durationMs": 1656,
  "result": {
    "numTotalRows": 6,
    "totalSizeInBytes": 701,
    "dataSource": "__query_select",
    "sampleRecords": [
      [
        1442278018771,
        "Ankit Singh",
        6,
        "India",
        26,
        "E"
      ],
      [
        1442018818771,
        "Adarsh",
        1,
        "India",
        20,
        "A"
      ],
      [
        1442191618771,
        "Amit",
        4,
        "India",
        26,
        "C"
      ],
      [
        1442105218771,
        "Akshat",
        3,
        "India",
        24,
        "A"
      ],
      [
        1442191618771,
        "Ankit Kumar",
        5,
        "India",
        26,
        "D"
      ],
      [
        1442018818771,
        "Ajith",
        2,
        "India",
        22,
        "B"
      ]
    ],
    "pages": [
      {
        "id": 0,
        "numRows": 6,
        "sizeInBytes": 701
      }
    ]
  },
  "stages": [
    {
      "stageNumber": 0,
      "definition": {
        "id": "query-db88b7ba-7c1e-4eb1-b553-3076b0284365_0",
        "input": [
          {
            "type": "table",
            "dataSource": "roll_number_datasource",
            "intervals": [
              "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
            ]
          }
        ],
        "processor": {
          "type": "scan",
          "query": {
            "queryType": "scan",
            "dataSource": {
              "type": "inputNumber",
              "inputNumber": 0
            },
            "intervals": {
              "type": "intervals",
              "intervals": [
                "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
              ]
            },
            "resultFormat": "compactedList",
            "limit": 1001,
            "columns": [
              "__time",
              "age",
              "country",
              "grade",
              "name",
              "rollNumber"
            ],
            "context": {
              "__resultFormat": "array",
              "__user": "allowAll",
              "enableWindowing": true,
              "executionMode": "async",
              "finalize": true,
              "maxNumTasks": 3,
              "maxParseExceptions": 0,
              "queryId": "db88b7ba-7c1e-4eb1-b553-3076b0284365",
              "scanSignature": "[{\"name\":\"__time\",\"type\":\"LONG\"},{\"name\":\"age\",\"type\":\"LONG\"},{\"name\":\"country\",\"type\":\"STRING\"},{\"name\":\"grade\",\"type\":\"STRING\"},{\"name\":\"name\",\"type\":\"STRING\"},{\"name\":\"rollNumber\",\"type\":\"LONG\"}]",
              "sqlOuterLimit": 1001,
              "sqlQueryId": "db88b7ba-7c1e-4eb1-b553-3076b0284365",
              "sqlStringifyArrays": false
            },
            "columnTypes": [
              "LONG",
              "LONG",
              "STRING",
              "STRING",
              "STRING",
              "LONG"
            ],
            "granularity": {
              "type": "all"
            },
            "legacy": false
          }
        },
        "signature": [
          {
            "name": "__boost",
            "type": "LONG"
          },
          {
            "name": "__time",
            "type": "LONG"
          },
          {
            "name": "age",
            "type": "LONG"
          },
          {
            "name": "country",
            "type": "STRING"
          },
          {
            "name": "grade",
            "type": "STRING"
          },
          {
            "name": "name",
            "type": "STRING"
          },
          {
            "name": "rollNumber",
            "type": "LONG"
          }
        ],
        "shuffleSpec": {
          "type": "mix"
        },
        "maxWorkerCount": 2
      },
      "phase": "FINISHED",
      "workerCount": 2,
      "partitionCount": 1,
      "shuffle": "mix",
      "output": "localStorage",
      "startTime": "2024-07-26T18:38:23.102Z",
      "duration": 1202
    },
    {
      "stageNumber": 1,
      "definition": {
        "id": "query-db88b7ba-7c1e-4eb1-b553-3076b0284365_1",
        "input": [
          {
            "type": "stage",
            "stage": 0
          }
        ],
        "processor": {
          "type": "limit",
          "limit": 1001
        },
        "signature": [
          {
            "name": "__boost",
            "type": "LONG"
          },
          {
            "name": "__time",
            "type": "LONG"
          },
          {
            "name": "age",
            "type": "LONG"
          },
          {
            "name": "country",
            "type": "STRING"
          },
          {
            "name": "grade",
            "type": "STRING"
          },
          {
            "name": "name",
            "type": "STRING"
          },
          {
            "name": "rollNumber",
            "type": "LONG"
          }
        ],
        "shuffleSpec": {
          "type": "maxCount",
          "clusterBy": {
            "columns": [
              {
                "columnName": "__boost",
                "order": "ASCENDING"
              }
            ]
          },
          "partitions": 1
        },
        "maxWorkerCount": 1
      },
      "phase": "FINISHED",
      "workerCount": 1,
      "partitionCount": 1,
      "shuffle": "globalSort",
      "output": "localStorage",
      "startTime": "2024-07-26T18:38:24.303Z",
      "duration": 14,
      "sort": true
    }
  ],
  "counters": {
    "0": {
      "0": {
        "input0": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            4039
          ],
          "files": [
            2
          ],
          "totalFiles": [
            2
          ]
        },
        "output": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            351
          ],
          "frames": [
            2
          ]
        },
        "shuffle": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            351
          ],
          "frames": [
            2
          ]
        }
      },
      "1": {
        "input0": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            4038
          ],
          "files": [
            2
          ],
          "totalFiles": [
            2
          ]
        },
        "output": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            350
          ],
          "frames": [
            2
          ]
        },
        "shuffle": {
          "type": "channel",
          "rows": [
            3
          ],
          "bytes": [
            350
          ],
          "frames": [
            2
          ]
        }
      }
    },
    "1": {
      "0": {
        "input0": {
          "type": "channel",
          "rows": [
            6
          ],
          "bytes": [
            701
          ],
          "frames": [
            4
          ]
        },
        "output": {
          "type": "channel",
          "rows": [
            6
          ],
          "bytes": [
            701
          ],
          "frames": [
            4
          ]
        },
        "shuffle": {
          "type": "channel",
          "rows": [
            6
          ],
          "bytes": [
            599
          ],
          "frames": [
            1
          ]
        },
        "sortProgress": {
          "type": "sortProgress",
          "totalMergingLevels": 3,
          "levelToTotalBatches": {
            "0": 2,
            "1": 1,
            "2": 1
          },
          "levelToMergedBatches": {
            "0": 2,
            "1": 1,
            "2": 1
          },
          "totalMergersForUltimateLevel": 1,
          "progressDigest": 1.0
        }
      }
    }
  },
  "warnings": []
}
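To show how the counters layout in the sample can be read (stage number, then worker number, then channel name), here is a small sketch that sums the rows of one channel across the workers of a stage. It assumes the JSON has already been parsed into nested maps by some JSON library; the class, the method names, and the hand-built sample data are hypothetical and only mirror the stage 0 "output" values shown above.

```java
import java.util.List;
import java.util.Map;

class CounterTotals {
  // Sums the "rows" array of one channel (for example "output") across all workers of a stage.
  // stageCounters maps worker number -> channel name -> counter fields.
  static long totalRows(Map<String, Map<String, Map<String, Object>>> stageCounters, String channel) {
    long total = 0;
    for (Map<String, Map<String, Object>> workerCounters : stageCounters.values()) {
      Map<String, Object> snapshot = workerCounters.get(channel);
      if (snapshot == null) {
        continue; // this worker reported nothing for the channel
      }
      Object rows = snapshot.get("rows");
      if (rows instanceof List<?>) {
        for (Object value : (List<?>) rows) {
          if (value instanceof Number) {
            total += ((Number) value).longValue();
          }
        }
      }
    }
    return total;
  }

  // Hand-built copy of the stage 0 "output" counters from the sample response.
  static Map<String, Map<String, Map<String, Object>>> sampleStage0() {
    Map<String, Object> worker0Output = Map.of("type", "channel", "rows", List.of(3), "bytes", List.of(351));
    Map<String, Object> worker1Output = Map.of("type", "channel", "rows", List.of(3), "bytes", List.of(350));
    return Map.of(
        "0", Map.of("output", worker0Output),
        "1", Map.of("output", worker1Output)
    );
  }

  public static void main(String[] args) {
    System.out.println(totalRows(sampleStage0(), "output")); // prints 6
  }
}
```

The total of 6 agrees with numTotalRows in the result section of the sample.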

Release Notes

Add an optional boolean query parameter, detail, to the API that returns query status.

Path: /v2/sql/statements/query-id?detail=true
Method: GET

If detail=true is supplied, the response also includes the following:

  • A stages array that summarizes each stage of query execution, including stage number, phase, start time, duration, input and output information, processing method, and partitioning.
  • A counters object that reports the rows, bytes, and files processed by each worker at each stage, per channel, along with sort progress.
  • A warnings array that lists any warnings reported for the query.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 26, 2024

@LakshSingla LakshSingla left a comment

Is it better to have detail as the parameter name, or detailed? detailed=true seems better than detail=true. Maybe I am overthinking this, but perhaps someone with more API design experience could weigh in here.

@@ -108,4 +108,14 @@ private void putAll(final Map<Integer, Map<Integer, CounterSnapshots>> otherMap)
}
}
}

+@Override
+public String toString()
Contributor

We don't need this now, right?

Contributor Author

It's needed for the new test added in this PR, testMSQSelectRunningQueryWithDetail, because the assertSqlStatementResult() method uses the following assertion logic for counters:

if (actual.getCounters() == null || expected.getCounters() == null) {
  Assert.assertEquals(expected.getCounters(), actual.getCounters());
} else {
  Assert.assertEquals(expected.getCounters().toString(), actual.getCounters().toString());
}
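To illustrate why a toString() override matters for an assertion like the one above, here is a minimal sketch with a hypothetical Snapshot class that defines toString() but not equals(). It is an assumption for illustration only and says nothing about how the real counters class is implemented. Two instances with the same contents fail an equals() check but compare equal through their string form.

```java
import java.util.Map;
import java.util.TreeMap;

class ToStringComparison {
  // Hypothetical stand-in for a report class that defines toString() but not equals().
  static final class Snapshot {
    private final Map<Integer, Long> rowsByWorker = new TreeMap<>();

    Snapshot put(int worker, long rows) {
      rowsByWorker.put(worker, rows);
      return this;
    }

    @Override
    public String toString() {
      // TreeMap keeps keys sorted, so equal contents always render identically.
      return "Snapshot" + rowsByWorker;
    }
  }

  // Null-safe comparison in the same shape as the test helper quoted above.
  static boolean sameCounters(Snapshot expected, Snapshot actual) {
    if (expected == null || actual == null) {
      return expected == actual; // equal only when both are null
    }
    return expected.toString().equals(actual.toString());
  }
}
```

Without the override, toString() would fall back to the identity-based default from Object, and the string comparison would fail for distinct instances.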

@LakshSingla

@Akshat-Jain Please add the release notes in the description and relevant documentation for this flag.

@LakshSingla LakshSingla merged commit bb4d6cc into apache:master Aug 1, 2024
87 of 88 checks passed
@@ -605,7 +626,10 @@ private Optional<SqlStatementResult> getStatementStatus(
sqlStatementState,
msqControllerTask.getQuerySpec().getDestination()
).orElse(null) : null,
-null
+null,
+detail ? SqlStatementResourceHelper.getQueryStagesReport(msqTaskReportPayloadSupplier.get().orElse(null)) : null,
Contributor

Won't you call the Overlord 3 times here?

Contributor Author

@cryptoe I have raised a PR to get rid of the redundant calls: #16839

Thanks for pointing this out!
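For readers following the thread, one generic way to avoid repeated remote calls is to memoize the supplier so the payload is fetched once and reused for all three fields. The sketch below is an assumption for illustration, not the approach taken in #16839; all names are hypothetical, and the lambda only simulates a round trip with a counter.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class MemoizedReport {
  // Wraps a supplier so the underlying call runs at most once; later calls reuse the cached value.
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean loaded;
      private T value;

      @Override
      public synchronized T get() {
        if (!loaded) {
          value = delegate.get();
          loaded = true;
        }
        return value;
      }
    };
  }

  // Simulates building the three detail fields and returns how many fetches happened.
  static int fetchCountFor(boolean memoized) {
    AtomicInteger fetches = new AtomicInteger();
    Supplier<Optional<String>> fetchReport = () -> {
      fetches.incrementAndGet(); // stands in for one round trip to the Overlord
      return Optional.of("report-payload");
    };
    Supplier<Optional<String>> supplier = memoized ? memoize(fetchReport) : fetchReport;
    String stages = supplier.get().orElse(null);
    String counters = supplier.get().orElse(null);
    String warnings = supplier.get().orElse(null);
    return fetches.get();
  }
}
```

In this simulation, fetchCountFor(false) performs three fetches and fetchCountFor(true) performs one. Guava's Suppliers.memoize offers the same behavior where Guava is already on the classpath.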

sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
…#16808)

@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Labels
Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Design Review Release Notes