Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Adding an open job to a group does not update autodetect with related scheduled events #31651

Closed
davidkyle opened this issue Jun 28, 2018 · 12 comments
Assignees
Labels

Comments

@davidkyle
Copy link
Member

Steps to reproduce:

  1. Create a calendar with a scheduled event
  2. Add a new job group to the calendar
  3. Create a job and open it. In the UI this is achieved by creating the job but not starting the datafeed
  4. Edit the job to add it to the new group
  5. Run the job and see that is has not picked up the calendar associated with the group

The problem is that updating the job with a group does not update autodetect with scheduled events.

cc @pheyos

@davidkyle davidkyle added good first issue low hanging fruit v7.0.0 v6.3.0 :ml Machine learning v6.4.0 labels Jun 28, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@jucaleb4
Copy link

jucaleb4 commented Jun 30, 2018

Hi, can I take a stab at this? @davidkyle @pheyos

@jucaleb4
Copy link

jucaleb4 commented Jul 2, 2018

I tried reproducing your steps and got this, it looks like the job job_id2 picked up the calendar planned-outages when they shared a common job and running it. Am I missing a step, as it looks like this solves step 5?

@davidkyle
Copy link
Member Author

Hi @jucaleb4 thank for your interest I'm delighted you want to take a stab at this.

For background on ML Calendars and Scheduled events see the docs, the basic idea is to define periods of time where anomalies won't be generated.

Below are the detailed steps to reproduce, you will need some sample data and a script into index it I've created a gist for that purpose.

  1. Extract the files from the gist and index the farequote dataset by running python upload_farequote.py with the file farequote_epoch_ms.csv in the same directory

  2. Create the index pattern farequote for the farequote index

  3. In Maching Learning create a single metric job on the farequote index pattern.

  4. Set the aggregation to count and run the job. Notice the single large anomaly on the 2016/06/25

  5. Go to Machine Learning -> Settings -> Calendar Management

  6. Create a new Calendar farequote

  7. Add the new group farequoute-group to the calendar

  8. Add a new event for a single day on 2016/06/25 with the description farequote anomaly

  9. Save the calendar

  10. Clone the original farequote job and name it farequote-with-calendar and add the Job Group farequote-group

  11. Save the job and click Start datafeed then Start

  12. View the job results and notice that there is no anomaly on 2016/06/25

This time the calendar worked as expected as no anomalies were created you should see results similar to this.
calendar-results

The bug occurs when you add the job to a group after the job has been created. To recreate the bug:

  1. Clone the original farequote job again. Name it farequote-bug and don't add the job group

  2. Click Save, click Ok to the influencers question then click Close instead of Start Datafeed. At this point the job is open.

  3. Go back the the Job Management page and edit the job farequote-bug (edit is one of the actions on the right of the screen)

  4. Add the job group farequote-group and click Update

  5. Now start the datafeed for job farequote-bug (the first action under actions on the right of the screen)

  6. View the results. The Calendar entry is not picked up and an anomaly is created on 2016/06/25.

The job is part of the job group which uses the farequote calendar. All jobs in the group should pick up the calendar entries.

The relevant code is in JobManager.updateJob, in particular when the job is changed by adding a group it needs to send an update to the native ML process

@jucaleb4
Copy link

jucaleb4 commented Jul 3, 2018

This will help me get started, thanks @davidkyle. If I have anymore questions, I'll let you know along the way.

@jucaleb4
Copy link

jucaleb4 commented Jul 7, 2018

@davidkyle Hi David, so the current elasticsearch source code is es7.0. However, the kibana download version I have is 6.3.0. So when I try to run ES, the Kibana complains about the type mismatch. Is there a way I can either a) downgrade es to 6.3.0 or b) upgrade kibana to 7.0 to test this code?

edit: figured it out!

@droberts195
Copy link
Contributor

Hi @jucaleb4. Are you still planning to submit a PR for this?

@jucaleb4
Copy link

@droberts195 I have not submitted a PR, I have been trying to get this to work but have not gotten a working solution. If it's fine with @davidkyle, you can tackle it

@droberts195
Copy link
Contributor

Thanks for having a go @jucaleb4. We'll transfer it to someone in the ML team.

@benwtrent
Copy link
Member

@jucaleb4 hey, I just opened a PR, let me know if you want to go over how I figured it out/fixed it :)

@jucaleb4
Copy link

@benwtrent hey, thanks for fixing it. Your changes make sense, it's something similar to what I did by adding a new boolean for a new group inside isAutodetectProcessUpdate(). When I ran it locally the changes didn't register. How did it work for you?

@benwtrent
Copy link
Member

Job has to be open for the change to be captured. Additionally, you have to signal that a change has occurred: https://github.com/elastic/elasticsearch/pull/32881/files#diff-150690f539e82196e503011ae92ed227R69

Scheduled events are separate but related things to a job, so one has to take into account letting the system know that the scheduled events need refreshed for the job as well.

benwtrent added a commit that referenced this issue Aug 17, 2018
* ML: fix updating opened jobs scheduled events (#31651)

* Adding UpdateParamsTests license header

* Adding integration test and addressing PR comments

* addressing test and job names
benwtrent added a commit that referenced this issue Aug 17, 2018
* ML: fix updating opened jobs scheduled events (#31651)

* Adding UpdateParamsTests license header

* Adding integration test and addressing PR comments

* addressing test and job names
jasontedor added a commit that referenced this issue Aug 18, 2018
* elastic/master: (46 commits)
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Make Geo Context Mapping Parsing More Strict (#32821)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597)
  Tests: Fix timezone conversion in DateTimeUnitTests
  Enable FIPS140LicenseBootstrapCheck (#32903)
  Fix InternalAutoDateHistogram reproducible failure (#32723)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  HLRC: Move ML request converters into their own class (#32906)
  ...
jasontedor added a commit that referenced this issue Aug 18, 2018
* 6.x: (42 commits)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Enable FIPS140LicenseBootstrapCheck (#32903)
  HLRC: Move ML request converters into their own class (#32906)
  [DOCS] Update getting-started.asciidoc (#29518)
  Fix allowed value for HighlighterBuilder encoder in javadocs (#32780)
  [DOCS] Add "remove a tag" script logic as an example (#32556)
  RFC: Test that example plugins build stand-alone (#32235)
  Security: remove put privilege API (#32879)
  ...
@colings86 colings86 removed the v7.0.0 label Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants