Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to associate an ID with tasks #27764

Merged
merged 7 commits into from
Jan 12, 2018

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Dec 11, 2017

Adds support for capturing the X-Opaque-Id header from a REST request and storing it's value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes #23250

Usage:

$ curl -H "X-Opaque-Id: imotov" -H "foo:bar" "localhost:9200/_tasks?pretty&group_by=parents"
{
  "tasks" : {
    "7qrTVbiDQKiZfubUP7DPkg:6998" : {
      "node" : "7qrTVbiDQKiZfubUP7DPkg",
      "id" : 6998,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1513029940042,
      "running_time_in_nanos" : 266794,
      "cancellable" : false,
      "headers" : {
        "X-Opaque-Id" : "imotov"
      },
      "children" : [
        {
          "node" : "V-PuCjPhRp2ryuEsNw6V1g",
          "id" : 6088,
          "type" : "netty",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 67785,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        },
        {
          "node" : "7qrTVbiDQKiZfubUP7DPkg",
          "id" : 6999,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 98754,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        }
      ]
    }
  }
}

@imotov imotov added :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. WIP labels Dec 11, 2017
@imotov imotov requested a review from s1monw December 11, 2017 22:26
@mattweber
Copy link
Contributor

+1

I could definitely use this feature!

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks reasonable @imotov lets make it reviewable ie. move it out of PoC mode and we take it from there?

Adds support for capturing the X-Opaque-Id header from a REST request and storing it's value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes elastic#23250
@imotov imotov changed the title POC for adding ability to associate a search task ID Add adding ability to associate an ID with tasks Dec 21, 2017
@imotov imotov added review and removed WIP labels Dec 21, 2017
@imotov
Copy link
Contributor Author

imotov commented Dec 21, 2017

Added pluggable headers, tests and documentation.

@imotov imotov requested a review from nik9000 December 21, 2017 03:04
@s1monw
Copy link
Contributor

s1monw commented Jan 8, 2018

@imotov I just saw this that you updated it. Sorry. I will do a review now but it needs a merge with master.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM I left one question / suggestion

@@ -80,7 +89,15 @@ public void setTaskResultsService(TaskResultsService taskResultsService) {
* Returns the task manager tracked task or null if the task doesn't support the task manager
*/
public Task register(String type, String action, TaskAwareRequest request) {
Task task = request.createTask(taskIdGenerator.incrementAndGet(), type, action, request.getParentTask());
Map<String, String> headers = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though we have http.max_header_size I wonder if we should also add a hard limit to the total size here. We can simply accumulate the size of the keys and values for this and fail hard if is exceeded. I really don't want to hold on to MB of strings here that are literally user input. It can be simply a derivative of http.max_header_size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I am going to resolve conflicts and add this check.

@imotov
Copy link
Contributor Author

imotov commented Jan 8, 2018

@s1monw I merged in master, resolved conflicts and pushed check for the headers size.

@imotov imotov changed the title Add adding ability to associate an ID with tasks Add ability to associate an ID with tasks Jan 9, 2018

/**
* Task Manager service for keeping track of currently running tasks on the nodes
*/
public class TaskManager extends AbstractComponent implements ClusterStateApplier {
private static final TimeValue WAIT_FOR_COMPLETION_POLL = timeValueMillis(100);

/** Rest headers that are copied to the task */
private final Set<String> taskHeaders;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we iterate this Set a lot so we'd be better off making sure it isn't a HashSet which requires iteration of the buckets. Premature optimization, I know, but I'd feel comfortable if we copied it to a List here.

@imotov
Copy link
Contributor Author

imotov commented Jan 10, 2018

@s1monw do you want to take another look before I merge this in? Also should this be 7.0-only or should I try back-porting this to 6.x? What do you think?

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imotov imotov added the v7.0.0 label Jan 11, 2018
@nik9000
Copy link
Member

nik9000 commented Jan 12, 2018

Also should this be 7.0-only or should I try back-porting this to 6.x? What do you think?

It is a little breaking for the plugin API but not very. I think it'd be fine to backport.

@imotov imotov merged commit c75ac31 into elastic:master Jan 12, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 13, 2018
* master: (30 commits)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (elastic#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (elastic#28174)
  [Docs] Remove Kerberos/SPNEGO Shield plugin (elastic#28019)
  Ignore null value for range field (elastic#27845) (elastic#28116)
  Fix environment variable substitutions in list setting (elastic#28106)
  ...
matarrese added a commit to matarrese/elasticsearch that referenced this pull request Jan 13, 2018
* master: (59 commits)
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  AwaitsFix #testRecoveryAfterPrimaryPromotion
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  ...
imotov added a commit that referenced this pull request Jan 15, 2018
Adds support for capturing the X-Opaque-Id header from a REST request and storing it's value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes #23250

Usage:
```
$ curl -H "X-Opaque-Id: imotov" -H "foo:bar" "localhost:9200/_tasks?pretty&group_by=parents"
{
  "tasks" : {
    "7qrTVbiDQKiZfubUP7DPkg:6998" : {
      "node" : "7qrTVbiDQKiZfubUP7DPkg",
      "id" : 6998,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1513029940042,
      "running_time_in_nanos" : 266794,
      "cancellable" : false,
      "headers" : {
        "X-Opaque-Id" : "imotov"
      },
      "children" : [
        {
          "node" : "V-PuCjPhRp2ryuEsNw6V1g",
          "id" : 6088,
          "type" : "netty",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 67785,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        },
        {
          "node" : "7qrTVbiDQKiZfubUP7DPkg",
          "id" : 6999,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 98754,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        }
      ]
    }
  }
}
```
imotov added a commit that referenced this pull request Jan 15, 2018
Update the serialization version after backporting #27764 to 6.x.
matarrese added a commit to matarrese/elasticsearch that referenced this pull request Jan 15, 2018
* master: (74 commits)
  Update version of TaskInfo header serialization after backport
  TEST: Tightens file-based condition in peer-recovery
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  AwaitsFix #testRecoveryAfterPrimaryPromotion
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  ...
@Mpdreamz
Copy link
Member

Super late to this but is there a particular reason for this being a request header and not a common query string parameter? I would personally much rather settle on the latter.

@imotov
Copy link
Contributor Author

imotov commented Jan 16, 2018

@Mpdreamz it was the most straightforward way to implement it internally across all requests in a transparent way. It also seems different in the nature to common request parameters, that change the output format. In our case we just mark the request, which is more similar to authentication or "run-as" headers then to the common format parameters.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 16, 2018
* compile-with-jdk-9: (56 commits)
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (elastic#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (elastic#28174)
  ...
@codebrain
Copy link
Contributor

+1 to @Mpdreamz sentiment around this being preferred as a querystring parameter.

Implemented in the .NET client - https://github.com/elastic/elasticsearch-net/pull/3277/files

As far as I can see this parameter is only valid on certain user-initiated actions, yet the parameter value is set at the request headers level? In the .NET client this means that the value can theoretically be set on any API call - which may start to mislead client users. I can see the justification for the RunAs parameter, but this one does look different in scope.

Using the parameter in the headers collection as a filter to the task lists is also a little non-obvious and ought to be on the querystring call to tasks list, imo.

@imotov
Copy link
Contributor Author

imotov commented Jun 7, 2018

As far as I can see this parameter is only valid on certain user-initiated actions

I am not totally sure what you mean by this. Could you elaborate?

@codebrain
Copy link
Contributor

As far as I can tell this parameter only makes sense on API calls where there are potentials for long running tasks? i.e. if a client set this parameter (unique id) on every API call, would every API call result in a task in the task management API?

@imotov
Copy link
Contributor Author

imotov commented Jun 8, 2018

As far as I can tell this parameter only makes sense on API calls where there are potentials for long running tasks?

That's the main use case, but nobody prevents you from using it on every call since sometimes it's not very clear how long a call will take and definition of "long" is somewhat vague.

if a client set this parameter (unique id) on every API call, would every API call result in a task in the task management API?

Every API call (including internal API calls) result in a task in the task manager regardless. Even if you perform a call to list tasks in the task manager you will see this call in the list of returned tasks.

@codebrain
Copy link
Contributor

Thanks for clarifying @imotov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement v6.2.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants