Configurable behavior on HTTP status codes #4

Closed
berndruecker opened this issue Nov 18, 2019 · 7 comments

Comments

@berndruecker
Contributor

berndruecker commented Nov 18, 2019

This originally came up as a request to allow HTTP 202 in the HTTP worker, since a service task could be used to invoke an HTTP endpoint:

[Image: 68870485-3f7dfa00-06fb-11ea-9cff-a46e7e9c6a66]

If that HTTP endpoint returns a 202, it means that the processing will happen asynchronously. In this case the following behavior of the HTTP Worker would be beneficial:

  • Lock the task for a configured timeout (e.g. 10 minutes), so that the HTTP request is not done again
  • Do NOT complete the current job, so that the workflow keeps waiting in the service task
  • Some external component needs to call back Zeebe with the job id in order to complete it, so that the workflow moves on
  • If this does not happen within the timeout, the HTTP request is retried

This might happen around https://github.com/camunda-cloud/zeebe-http-worker/blob/master/src/worker/worker.ts#L136
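The behavior in the list above could be sketched roughly as a small decision function. This is only an illustration and not the code in worker.ts: the names `decideJobAction`, `WorkerConfig`, `asyncLockTimeoutMs` and `waitOnAccepted` are hypothetical, and treating every other 2xx code as a plain success is an assumption.

```typescript
// Hypothetical sketch; none of these names come from zeebe-http-worker.
type JobAction =
  | { kind: "complete" }                      // synchronous success: complete the job
  | { kind: "keepLocked"; timeoutMs: number } // 202: leave the job open for a callback
  | { kind: "fail" };                         // anything else: report a failure

interface WorkerConfig {
  // How long the job stays locked after a 202 before the request is retried.
  asyncLockTimeoutMs: number;
  // Whether a 202 keeps the job open at all (the configurable part).
  waitOnAccepted: boolean;
}

function decideJobAction(statusCode: number, config: WorkerConfig): JobAction {
  if (statusCode === 202 && config.waitOnAccepted) {
    // Do NOT complete the job; an external component completes it via the job id.
    return { kind: "keepLocked", timeoutMs: config.asyncLockTimeoutMs };
  }
  if (statusCode >= 200 && statusCode < 300) {
    return { kind: "complete" };
  }
  return { kind: "fail" };
}

// Example configuration matching the "10 minutes" mentioned in the list.
const tenMinutes: WorkerConfig = {
  asyncLockTimeoutMs: 10 * 60 * 1000,
  waitOnAccepted: true,
};
```

With `waitOnAccepted` set to `false`, a 202 falls through to the ordinary 2xx branch and the workflow simply continues.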

One open question is whether this behavior needs to be configurable. In other words: do people want the workflow to continue in case of HTTP 202? (I could imagine that this is the expectation for most people.)

Together with #3 this improves the support for orchestration of functions. So for example in AWS it could work like this:

[Image: 68871059-26c21400-06fc-11ea-9515-79107439b966]

As an alternative you could go for an explicit modeling of the asynchronous indirection:

[Image: 68927888-173cdc80-0789-11ea-92b7-7aca265afe01]

The downside is that the workflow model gets pretty bloated. While it is technically correct and you could argue that it is good that you can see it is asynchronous under the hood, I am pretty sure a lot of people will freak out about it, especially compared to AWS Step Functions where it is always one box.

Another challenge with that alternative is the correlation (how to route the message to the right waiting workflow instance). In the 202 scenario described above this is solved by the "jobId" from Zeebe. In the async modeling scenario somebody has to create a UUID to be used for correlation. This is currently not available out of the box with the HTTP cloud worker.

@saig0
Contributor

saig0 commented Nov 22, 2019

As a first step, if the status code is 202, the worker completes the job with a variable jobKey that holds the key of the completed job. The jobKey can be used to correlate asynchronous results, for example by using it as the correlationKey on a following message catch event or receive task.

See b485db5.
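As a rough illustration of that idea (not the code from the commit), the variables used to complete the job could be built like this. The function name `completionVariables` and the `statusCode` and `body` variable names are assumptions made for the sketch.

```typescript
// Hypothetical sketch; function and field names are illustrative only.
interface HttpResult {
  statusCode: number;
  body: unknown;
}

function completionVariables(
  jobKey: string,
  result: HttpResult
): Record<string, unknown> {
  const variables: Record<string, unknown> = {
    statusCode: result.statusCode,
    body: result.body,
  };
  if (result.statusCode === 202) {
    // Expose the key so a later message catch event can use it as correlationKey.
    variables["jobKey"] = jobKey;
  }
  return variables;
}
```

The jobKey only reaches the workflow instance if the task's output mappings include it.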

@berndruecker
Contributor Author

That does not work IMHO as you might have parallel paths overwriting that message.
At the moment the jobKey can be passed on to the external lambda and used to complete a waiting task/job - so no need to send in a message.

@berndruecker changed the title from "Support HTTP 202 for HTTP Worker" to "Configurable behavior on HTTP status codes" on Nov 28, 2019
@saig0
Contributor

saig0 commented Dec 2, 2019

Currently, jobs are not failed if the returned status code is in the given statusCodeFailure task header (default: 3xx, 4xx, 5xx). The job command is not sent.
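For reference, checking a status code against a header value like "3xx, 4xx, 5xx" could look roughly like the sketch below. It assumes the header is a comma-separated list of patterns in which "x" stands for any digit; the helper name `matchesStatusPattern` is hypothetical and the worker's actual parsing may differ.

```typescript
// Hypothetical sketch; assumes comma-separated patterns with "x" as a digit wildcard.
function matchesStatusPattern(statusCode: number, header: string): boolean {
  const code = String(statusCode);
  return header
    .split(",")
    .map((pattern) => pattern.trim().toLowerCase())
    .filter((pattern) => pattern.length > 0)
    .some(
      (pattern) =>
        pattern.length === code.length &&
        // Every character must be a wildcard or equal the digit at that position.
        pattern.split("").every((ch, i) => ch === "x" || ch === code.charAt(i))
    );
}

// Default taken from the comment above.
const DEFAULT_FAILURE_CODES = "3xx, 4xx, 5xx";
```

A worker would then send the fail command whenever `matchesStatusPattern(statusCode, header)` is true.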

@berndruecker
Contributor Author

Thanks! I will add a test case and fix that.

@berndruecker
Contributor Author

berndruecker commented Dec 2, 2019

@saig0: c3f6a52. Also opened #22 for later

@saig0
Contributor

saig0 commented Dec 2, 2019

Regarding the jobKey variable, I don't see a big issue here. Since the HTTP worker is a reusable worker, I assume that the task has output variable mappings to map the generic job variables into concrete workflow instance variables. If the jobKey is not listed then it is not merged into the workflow instance variables.

My idea was that if the worker gets a 202 status code, it completes the job with its jobKey, which is used as correlationKey in a following message catch event. So, the workflow instance continues without waiting for the result in the task, until it enters the message catch event. The invoked application then publishes a message with the correlation key (directly or via Kafka etc.) and the message is correlated to the workflow instance.

However, it is okay to remove it for now and wait until someone requests it.

@berndruecker
Contributor Author

Thanks for the explanation! My hope would be that camunda/camunda#3417 allows a more flexible approach without the worker doing this (e.g. using the jobKey in an EL on input/output), but let's wait for that to be sketched out.
