
Duplicate journey id silently groups synthetic pings under the last journey in list #442

Closed
awahab07 opened this issue Jan 13, 2022 · 12 comments · Fixed by #568

@awahab07

If an id is repeated across journeys in a suite, Uptime groups all pings of the journeys sharing that id under the journey that occurs last in order. The monitors overview page shows only one monitor, titled after the journey read last.

No warning/error is reported in logs.

Correct data and screenshots are shown under each individual ping, though.

Example:
A journey script such as:

import { journey, step, expect } from '@elastic/synthetics';

journey({id: 'todo-journey-id', name: 'check if title is present'}, ({ page, params }) => {
  step('launch app', async () => {
    await page.goto(params.url);
  });

  step('assert title', async () => {
    const header = await page.$('h1');
    expect(await header.textContent()).toBe('todos');
    await page.type('input.new-todo', 'New Title');
  });
});

journey({id: 'todo-journey-id', name: 'check if input placeholder is correct'}, ({ page, params }) => {
  step('launch app', async () => {
    await page.goto(params.url);
  });

  step('assert placeholder value', async () => {
    const input = await page.$('input.new-todo');
    expect(await input.getAttribute('placeholder')).toBe(
      'What needs to be done?'
    );
  });
});

Will result in Uptime showing:
[screenshot: Uptime monitors overview showing a single monitor]

Whereas the list of pings contains pings of both monitors, with correct data and screenshots:
[screenshot: ping list showing pings from both journeys]

Heartbeat version: heartbeat version 8.1.0 (amd64), libbeat 8.1.0 [84924e9f32d018e360303714c26f754dbab34f93 built 2021-12-30 16:14:09 +0000 UTC]

Synthetics version: 1.0.0-beta.17
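
For comparison, giving each journey a distinct id ('todo-title' and 'todo-placeholder' below are hypothetical values) makes Uptime track them as two separate monitors:

import { journey } from '@elastic/synthetics';

// Same journeys as above, but with distinct ids, so pings are grouped
// under two separate monitors in Uptime.
journey({ id: 'todo-title', name: 'check if title is present' }, ({ page, params }) => {
  // ...same steps as above
});

journey({ id: 'todo-placeholder', name: 'check if input placeholder is correct' }, ({ page, params }) => {
  // ...same steps as above
});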

@awahab07 awahab07 added the bug Something isn't working label Jan 13, 2022
@paulb-elastic

I understand this is by design: the id is used for uniqueness in Heartbeat (which is where the id is passed for the results).

For instance, if you want to run the same monitor (e.g. ping my-endpoint every 30 seconds) from multiple servers, you’d use the same id in the YML configuration and they’d appear under the same monitor in Uptime (with different locations).

From https://www.elastic.co/guide/en/beats/heartbeat/current/heartbeat-reference-yml.html:

  # ID used to uniquely identify this monitor in elasticsearch even if the config changes
  id: my-monitor

cc @andrewvc / @vigneshshanmugam - is that right?

@andrewvc

Yes, that's right @paulb-elastic. Ultimately, we can only do so much to prevent ID duplication. If you run two Heartbeats in two locations with the same ID and totally different tests, things will be weird.

Today Heartbeat just lets the last configured ID win (which is really just whichever one the config parser reads most recently); we also log an error.

We can definitely do something about shared IDs within a suite. We could either copy the Heartbeat behavior or just have Heartbeat run neither journey, instead reporting an error status for those journeys until fixed.

Curious to hear others' preferences, but I kind of prefer the latter. Thoughts?

@paulb-elastic

Running neither probably makes it more obvious that something is wrong, as users won't see any results for either.

With the last-wins approach, they may see results for one and think everything is OK, without waiting to see whether results for the second come through (although surfacing errors more prominently, as is being done specifically for suites, may make it more obvious than hunting in log files).

@vigneshshanmugam

We can definitely do something about shared IDs within a suite. We could either copy the Heartbeat behavior or just have Heartbeat run neither journey, instead reporting an error status for those journeys until fixed.

I prefer this solution over the current one; reporting errors on both journeys would make it obvious.

@andrewvc

Just to explain the current Heartbeat behavior: Heartbeat behaves that way partly to have a softer failure mode in the case of a race in Fleet provisioning, e.g. a situation where Fleet or libbeat autodiscover replaces a monitor by adding it twice and then removing the old one, which I sort of suspect happened in the past but was hard to conclusively prove.

This constraint doesn't exist for suites, where such a thing is not possible. I propose that if we fix this, we fix it at the synthetics level. Basically, you can define a monitor twice, and the monitor 'runs', but it just returns an error "monitor defined more than once" as a result.
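
A minimal sketch of the kind of check this implies, assuming a hypothetical registry of journeys with optional ids (not the actual synthetics internals):

// Hypothetical journey shape; the real synthetics registry may differ.
interface RegisteredJourney {
  id?: string;
  name: string;
}

// Collect ids that are defined more than once, so every journey sharing
// one of them can be flagged with "monitor defined more than once"
// instead of producing a normal up/down result.
function findDuplicateIds(journeys: RegisteredJourney[]): Set<string> {
  const counts = new Map<string, number>();
  for (const { id } of journeys) {
    if (!id) continue;
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return new Set([...counts].filter(([, n]) => n > 1).map(([id]) => id));
}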

@andrewvc

A last point: if duplicated, it should still be discoverable via '--dry-run', so that the error appears in the Kibana UI.

@andrewvc

When running synthetics locally:

  1. We should report the status of the journey as being in error due to the duplicate ID
  2. In the summary line at the end of the local run we should not report the journey as up or down, but rather in a new category of 'invalid'
  3. For the JSON output we should similarly report this as invalid; we'll have to look at the current JSON schema and determine what makes the most sense here (see the sketch after this list)
  4. We'll need a follow-up heartbeat PR to handle the new JSON data
  5. When heartbeat runs suites, these errors will only show up as an error in the discovery phase, with the monitor no longer being reported to ES.
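
A purely hypothetical sketch of what point 3 could look like; the field names and status values are placeholders for illustration, not the actual synthetics JSON reporter schema:

// Hypothetical per-journey result, for illustration only.
type JourneyStatus = 'up' | 'down' | 'invalid';

interface JourneyResult {
  journey: { id: string; name: string };
  status: JourneyStatus;
  // Present when status is 'invalid', e.g. for a duplicate id.
  error?: { message: string };
}

const duplicated: JourneyResult = {
  journey: { id: 'todo-journey-id', name: 'check if title is present' },
  status: 'invalid',
  error: { message: 'monitor defined more than once' },
};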

@andrewvc

What we've agreed on in our meeting just now:

  1. Duplicate monitors will be reported as errors in the suite discovery phase
  2. The Suites tab will show these errors
  3. The error will also be logged
  4. The duplicate monitors will not be run in any form
  5. This means that alerts will not fire since we will cease sending new data to the monitors

The error will be at the top level of the response sent from the discovery phase

Additional notes

  • During the reconciliation phase of monitor management, monitors with duplicate monitor IDs will be soft deleted with a new field, "deleted_at: timestamp", indicating that they are deleted.

Issues to make:

  1. Heartbeat implementation
  2. Synthetics implementation
  3. Uptime Monitor management reconciliation
  4. Add details to errors in suites table issue
  5. Alerts for suite errors

@dominiqueclarke

Raised elastic/kibana#123969 and elastic/kibana#123928.

For 4, isn't that covered by elastic/uptime#435?

@vigneshshanmugam

Since we have moved away from the discovery phase approach, the Synthetics agent would instead skip pushing the monitor id that got duplicated.

  1. User would run npx @elastic/synthetics push ./examples/
  2. Journeys that contain duplicate ids will not be uploaded to Kibana
  3. Errors will be reported on the CLI for the journeys that failed due to duplication.

We can either go with the proposed approach, or error early and abort the push command if two journeys contain duplicate ids (sketched below).

We wouldn't need any guards on the Kibana/Heartbeat side; the entire logic would sit on the agent alone.
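
A minimal sketch of that early-error variant, assuming a hypothetical journey shape and a hypothetical upload step (this is not the actual push implementation in @elastic/synthetics):

// Hypothetical journey shape and upload callback, for illustration only.
interface PushableJourney {
  id: string;
  name: string;
}

// Error early and abort the push if any journey id appears more than once.
async function pushJourneys(
  journeys: PushableJourney[],
  upload: (js: PushableJourney[]) => Promise<void>
): Promise<void> {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const { id } of journeys) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  if (duplicates.size > 0) {
    throw new Error(`duplicate journey id(s): ${[...duplicates].join(', ')}; aborting push`);
  }
  await upload(journeys);
}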

@paulb-elastic

Can this be closed now with push?

@vigneshshanmugam

I guess with push this shouldn't happen, as Kibana would fail to save the SO (saved object) in case there is a duplicate monitor. I would like to check with @dominiqueclarke. If it's not implemented on the Kibana side, we could solve this for the time being on the Synthetics agent side before uploading the monitors.
