Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(async): long-running async job API #698

Merged
merged 24 commits into from
Dec 11, 2024

Conversation

Josh-Matsuoka
Copy link
Contributor

@Josh-Matsuoka Josh-Matsuoka commented Oct 29, 2024

Fixes #286

Hi,

This adds a prototype for long running API operations (first step of #286).

The workflow proceeds as described in the issue:

  • Cryostat receives a patch request to archive a recording
  • Basic validation is done (check that target is valid and that the recording exists)
  • Following this an ArchiveRequest is created that tracks a job ID and the requested recording
  • The archive request is then passed on to the ArchiveRequestGenerator (works similarly to the InterruptibleReportsGenerator)
  • ArchiveRequestGenerator performs the archiving on a separate thread, firing a notification to the web client for success or failure which includes the job ID

TODO: Web Client needs to be adjusted to account for the new notifications, refreshing the tables when it receives them.
TODO: Implement the same framework for Grafana uploads of active/archived recordings, as well as report generation for active/archived recordings

@mergify mergify bot added the safe-to-test label Oct 29, 2024
@Josh-Matsuoka Josh-Matsuoka added the feat New feature or request label Oct 29, 2024
@Josh-Matsuoka
Copy link
Contributor Author

@andrewazores which other endpoints do you think could use a framework like this? What other operations are likely to block the client for a long time?

@andrewazores
Copy link
Member

andrewazores commented Oct 30, 2024

Archiving an active recording has the obvious long-running concerns where the current architecture can lead to the client waiting a long time for Cryostat to respond. Another similar case right now is uploading recordings to the jfr-datasource for viewing in Grafana. Both the Grafana feature and the report generation feature also run into it for both active and archived recordings.

In the future, features like taking a heap dump should also fit into this same long-running task architecture.

@Josh-Matsuoka
Copy link
Contributor Author

/build_test

Copy link

Workflow started at 11/21/2024, 5:30:33 PM. View Actions Run.

Copy link

OpenAPI schema change detected:

diff --git a/schema/openapi.yaml b/schema/openapi.yaml
index 4b168c9..8c8c378 100644
--- a/schema/openapi.yaml
+++ b/schema/openapi.yaml
@@ -1,25 +1,13 @@
 ---
 components:
   schemas:
-    AnalysisResult:
-      properties:
-        evaluation:
-          $ref: '#/components/schemas/Evaluation'
-        name:
-          type: string
-        score:
-          format: double
-          type: number
-        topic:
-          type: string
-      type: object
     Annotations:
       properties:
         cryostat:
           additionalProperties:
             type: string
           type: object
         platform:
           additionalProperties:
             type: string
           type: object
@@ -232,33 +220,20 @@ components:
           format: uri
           type: string
         id:
           $ref: '#/components/schemas/UUID_Flat'
         realm:
           $ref: '#/components/schemas/DiscoveryNode_Flat'
       required:
         - id
         - realm
       type: object
-    Evaluation:
-      properties:
-        explanation:
-          type: string
-        solution:
-          type: string
-        suggestions:
-          items:
-            $ref: '#/components/schemas/Suggestion'
-          type: array
-        summary:
-          type: string
-      type: object
     Event:
       properties:
         clazz:
           type: string
         description:
           type: string
         fields:
           items:
             $ref: '#/components/schemas/Field'
           type: array
@@ -302,20 +277,30 @@ components:
         relationKey:
           type: string
       type: object
     FileUpload:
       type: object
     GitInfo:
       properties:
         hash:
           type: string
       type: object
+    HttpServerResponse:
+      properties:
+        chunked:
+          type: boolean
+        statusCode:
+          format: int32
+          type: integer
+        statusMessage:
+          type: string
+      type: object
     Instant:
       example: 2022-03-10T16:15:50Z
       format: date-time
       type: string
     JsonObject:
       items:
         properties:
           key:
             type: string
           value: {}
@@ -547,29 +532,20 @@ components:
       type: object
     SerializableOptionDescriptor:
       properties:
         defaultValue:
           type: string
         description:
           type: string
         name:
           type: string
       type: object
-    Suggestion:
-      properties:
-        name:
-          type: string
-        setting:
-          type: string
-        value:
-          type: string
-      type: object
     Target:
       properties:
         agent:
           readOnly: true
           type: boolean
         alias:
           pattern: \S
           type: string
         annotations:
           $ref: '#/components/schemas/Annotations'
@@ -1218,22 +1194,27 @@ paths:
       tags:
         - Event Templates
   /api/v4/grafana/{encodedKey}:
     post:
       parameters:
         - in: path
           name: encodedKey
           required: true
           schema:
             type: string
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
@@ -1449,34 +1430,27 @@ paths:
       tags:
         - Archived Recordings
   /api/v4/reports/{encodedKey}:
     get:
       parameters:
         - in: path
           name: encodedKey
           required: true
           schema:
             type: string
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
-          content:
-            text/plain:
-              schema:
-                type: string
-          description: OK
         "200":
-          content:
-            application/json:
-              schema:
-                additionalProperties:
-                  $ref: '#/components/schemas/AnalysisResult'
-                type: object
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
         - SecurityScheme: []
       tags:
         - Reports
   /api/v4/rules:
@@ -2126,59 +2100,60 @@ paths:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
       requestBody:
         content:
-          text/plain:
+          application/json:
             schema:
-              type: string
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
-        "404":
-          description: Not Found
-        "502":
-          description: Target not Reachable
       security:
         - SecurityScheme: []
       tags:
         - Active Recordings
   /api/v4/targets/{targetId}/recordings/{remoteId}/upload:
     post:
       parameters:
         - in: path
           name: remoteId
           required: true
           schema:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
@@ -2193,34 +2168,27 @@ paths:
           required: true
           schema:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
         "200":
-          content:
-            application/json:
-              schema:
-                additionalProperties:
-                  $ref: '#/components/schemas/AnalysisResult'
-                type: object
-          description: OK
-        "202":
-          content:
-            text/plain:
-              schema:
-                type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
         - SecurityScheme: []
       tags:
         - Reports
   /api/v4/targets/{targetId}/snapshot:

Copy link

No GraphQL schema changes detected.

Copy link

Schema changes committed by the CI.

Copy link

CI build and push: At least one test failed ❌
https://github.com/cryostatio/cryostat/actions/runs/11962766557

@Josh-Matsuoka
Copy link
Contributor Author

Looking deeper into the test failures it looks like the endpoint set up for the test doesn't have the cache set up

`2024-11-26 15:56:19,348 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-3) HTTP Request to /api/v4/targets/30/reports/10 failed, error id: 19faaffe-b9cd-402e-a5c0-af736e49801d-1: java.lang.IllegalStateException: This cache is not an instance of io.quarkus.cache.CaffeineCache
at io.quarkus.cache.runtime.AbstractCache.as(AbstractCache.java:26)
at io.cryostat.reports.MemoryCachingReportsService.keyExists(MemoryCachingReportsService.java:117)
at io.cryostat.reports.ReportsServiceImpl_Subclass.keyExists(Unknown Source)
at io.cryostat.reports.ReportsServiceImpl_ClientProxy.keyExists(Unknown Source)
at io.cryostat.reports.Reports.getActive(Reports.java:131)
at io.cryostat.reports.Reports$quarkusrestinvoker$getActive_6023c517109fbf80c83578744a8b35943949dbe9.invoke(Unknown Source)
at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1521)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)

2024-11-26 15:56:19,349 INFO [io.qua.htt.access-log] (executor-thread-3) 127.0.0.1 - - [26/Nov/2024:15:56:19 -0500] "GET /api/v4/targets/30/reports/10 HTTP/1.1" 500 1852
HTTP/1.1 500 Internal Server Error
Cache-Control: no-cache
content-type: application/json; charset=utf-8
content-length: 1852
`

@andrewazores
Copy link
Member

java.lang.IllegalStateException: This cache is not an instance of io.quarkus.cache.CaffeineCache

Huh. What is it an instance of, then? And whatever it is, does it have getIfPresent() or an equivalent?

@andrewazores
Copy link
Member

I guess it might just not have any cache instance at all in tests:

quarkus.cache.enabled=false

That's set up on purpose because we want to make sure that we actually execute through the real implementation during tests, for the most part, rather than just hitting cache layers. So I guess another possibility would be to just check whether the cache is enabled/available in those keyExists methods, too. The quarkusCache and memoryCache fields in the class should be useful here.

@Josh-Matsuoka
Copy link
Contributor Author

/build_test

Copy link

Workflow started at 11/28/2024, 2:44:30 PM. View Actions Run.

Copy link

No OpenAPI schema changes detected.

Copy link

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/12284128716

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request safe-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Request] Long-running API operations should return Accepted response and notify by WebSocket
2 participants