Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RUM] Keep unsampled transactions for the rum intake #6260

Closed
vigneshshanmugam opened this issue Sep 29, 2021 · 19 comments · Fixed by #6669
Closed

[RUM] Keep unsampled transactions for the rum intake #6260

vigneshshanmugam opened this issue Sep 29, 2021 · 19 comments · Fixed by #6669
Assignees
Milestone

Comments

@vigneshshanmugam
Copy link
Member

vigneshshanmugam commented Sep 29, 2021

APM server is introducing a breaking change in 8.0 - #5997

As the User Experience Dashboard and Exploratory UI is built based on transaction events and not taking advantage of the transaction metrics documents at the moment, We(synthetics team) still need a way to keep the un-sampled transaction for all RUM events and also probably for agents(Ex- mobile agents iOS) by default that are part of the UX and Exploratory view app.

Discussion from slack thread

Vignesh Shanmugam  1 day ago
As Andrew is on vacation. I am delegating this question to @silvia Is there a way to exclude RUM events from getting dropped as part of the metrics initiative?

Silvia Mitter  1 day ago
not directly; should it be configurable, or never be applied to RUM events?

Silvia Mitter  1 day ago
Dropping samples is done in a dedicated processor; excluding all events from a certain agent seems doable. @v can you open an issue with the description whether this should always be excluded or be configurable?

Silvia Mitter  1 day ago
@bryce.buchanan is dropping unsampled events also a concern for iOS?

Vignesh Shanmugam  1 day ago
Sure let me bring it up on our refinement call and will do post that. Thanks for double checking @silvia

Bryce Buchanan:flag-us:  1 day ago
we don’t have any customized UI for the iOS agent at this point, so I think whatever implications this has for the standard apm agents apply to the iOS agent. I’m not familiar with what the ‘UX app’ is, however.

Note: Mobile agents is just an example, this issue is more focussed towards RUM User Experience app at the moment

Open questions (on the UI side)

  • Would we still be able to link between APM UI and UX app in terms of transactions and errors

@paulb-elastic @drewpost

@simitt
Copy link
Contributor

simitt commented Oct 1, 2021

This is a requirement for #5997 to not break the UX for RUM.

Indexing RUM events should be fairly straight forward, as we can make an exception from dropping them based on the app name.

@vigneshshanmugam, can you please clarify what exactly you mean by

We(synthetics team) still need a way to keep the un-sampled transaction for all RUM events and also probably for agents(Ex- mobile agents iOS) by default that are part of the UX and Exploratory view app.

@vigneshshanmugam
Copy link
Member Author

Indexing RUM events should be fairly straight forward, as we can make an exception from dropping them based on the app name.

👍🏽

@vigneshshanmugam, can you please clarify what exactly you mean by

The current iOS agent is being used in the Exploratory view UI which again is based on un-sampled transaction events. I am delegating this to @paulb-elastic and @drewpost

Do we want to make an exception for iOS agent for to keep it working on the Exploratory view?

@axw
Copy link
Member

axw commented Oct 4, 2021

@vigneshshanmugam we discussed this quite a while ago, and at that time I thought we concluded that the RUM agent would just not set the sampled property. Do you recall that discussion? However, thinking about it some more I'm not sure if that's a good idea. The UI uses the property to search for complete trace samples, and if the agent doesn't set sampled the server assumes it is sampled.

I think it makes sense to also capture unsampled transactions for iOS. If we are going to make it configurable, I think it should probably be in the agent. i.e. it can choose to send unsampled transactions if it wants to. Perhaps we should rethink #5997, and update the agents to stop sending unsampled transactions rather than adding config to the server.

@vigneshshanmugam
Copy link
Member Author

@vigneshshanmugam we discussed this quite a while ago, and at that time I thought we concluded that the RUM agent would just not set the sampled property

I did remember we had some initial discussions around this, but forgot about the outcome. Thanks for bringing it up Andrew.

I think it makes sense to also capture unsampled transactions for iOS. If we are going to make it configurable, I think it should probably be in the agent. i.e. it can choose to send unsampled transactions if it wants to.

++ on keeping the decision on the Agent side instead of the server as we can allow users to control the sampling based on tranasctionSampleRate and other means. I am not sure about the impact it brings to the Metrics powered UI, but feels like a simpler solution than excluding on the server side.

@felixbarny
Copy link
Member

To me, it also seems like a simpler mental modal for users if the config is on the agent side.
However, the downside is that it's harder to centrally ensure to only store sampled transactions. We should add that option to agent central config. But still, all agents need to be updated which gets more difficult with the number of distinct services.

Overall, I think it's a tradeoff we can live with. As this issue shows, "centrally ensure to only store sampled transactions" can have unintended side effects. Maintaining a list of agents where unsampled transactions should or should not be dropped doesn't seem very desirable, too.

@simitt
Copy link
Contributor

simitt commented Oct 5, 2021

Somewhat related, we are discussing having different ILM default policies, with different retention periods for traces-apm.*, logs-apm.* and metrics-apm.* (#6223). The assumption for the default delete phases is that the UI will by default be powered by metrics, and traces and logs lose value really fast.
If RUM agents do not support metrics based UI, we also need to discuss keeping these traces around for longer, which would potentially require us to having dedicated data streams. If that's what we end up with, I believe we could also keep the configs server side, as there is some dependency and implicit knowledge involved anyways.

@zube zube bot added this to the 8.0 milestone Oct 6, 2021
@zube zube bot added the [zube]: Inbox label Oct 6, 2021
@drewpost
Copy link

@vigneshshanmugam - Please keep RUM based on upsampled transactions

@bryce-b @graphaelli @alex-fedotyev - I would suggest keeping iOS based on the upsampled transactions for exploratory, as well. Especially as it's the main UI for that at the moment. WDYT?

To be clear, I don't see being in the Exploratory View as the leading requirement for being upsampled, it should be able to deal with whatever fidelity the data is at although I appreciate that this may require further discussion in terms of implementation.

@felixbarny
Copy link
Member

the User Experience Dashboard and Exploratory UI is built based on transaction events and not taking advantage of the transaction metrics documents at the moment

Do the unsampled transaction documents contain information about the OS, device type, and location? If not, are you upscaling the metrics in the exploratory view based on sampling rate? If yes, then there's not a big difference in terms of size of sampled vs unsampled transaction documents, right?

The current iOS agent is being used in the Exploratory view UI which again is based on un-sampled transaction events.

The iOS agent does not send unsampled transaction documents. It's based on the OTel SDK which only sends spans if they are sampled. Implementing something like unsampled documents on top of the OTel SDK could be tricky. And probably also not really desirable.

We have heard numerous times from our user base that the unsampled transaction documents add a lot of storage overhead which increases TCO. The idea behind sampling is that a smaller sub-set of the overall population should yield statistically meaningful results for big enough cohorts.

What I'm trying to say is, could the metrics explorer use the sample_rate property on transaction documents to upscale the results?

@axw
Copy link
Member

axw commented Nov 1, 2021

Do the unsampled transaction documents contain information about the OS, device type, and location

For RUM they do.

@axw
Copy link
Member

axw commented Nov 1, 2021

I've written up a proposal here: https://docs.google.com/document/d/17wNGGh3u53tQsl938FZmQBa-If8TFKEGyto-Jvjzobk/edit?usp=sharing

@drewpost @vigneshshanmugam a couple of questions:

  • Would you be able to link some reference materials on why we should not rely on sampling and pre-aggregated transaction metric for User Experience?
  • Is it a problem for the User Experience and Exploratory views that iOS does not send unsampled transactions? I don't know how much control we have over this.

I think it's worth noting that even if we do store unsampled transactions for RUM, we can still introduce configuration later for users that don't want that. Maybe some users don't care about the UX/Exploratory views at all and only want to look at the APM UI; or maybe they would be happy with using the UX/Exploratory views and only visualising a % of the transactions if it means they have much lower storage cost.

@vigneshshanmugam
Copy link
Member Author

think it's worth noting that even if we do store unsampled transactions for RUM, we can still introduce configuration later for users that don't want that.

++

Is it a problem for the User Experience and Exploratory views that iOS does not send unsampled transactions? I don't know how much control we have over this.

@drewpost Who would be the better person to answer this question. I don't know much about how the Exploratory view UI works. @dominiqueclarke @shahzad31 Could you help here.

@simitt simitt added the blocked label Nov 10, 2021
@shahzad31
Copy link

I think from User experience app and exploratory point of view, we would like to move to a more performant solution like transaction metrics, querying directly against raw transactions is costly and slow.

But it does provides the extra flexibility in terms of filtering/breakdown. In any case, we first need to audit if we can use transaction metrics in the UX app and Exploratory view , until then we will have to live with raw transactions.

And it is a signifiant work, first making sure, it will fulfil our use case, and then doing the actual UI work, which it seems like wont be possible in 8.0, i will leave it up to @paulb-elastic on where we should put it on our roadmap.

Until then i think we will have to continue using transactions in user experience app and exploratory view.

@felixbarny
Copy link
Member

I think the UI doesn't have to be solely based on metrics. Use metrics for the default views and if the dimensions you need are available in metrics. Otherwise, you could fall back to sampled transaction documents and use sample_rate to compensate for the transactions that are dropped due to sampling.

@sorenlouv sorenlouv added the impact:low Long-term priority, unless it's a quick fix. label Nov 15, 2021
@sorenlouv
Copy link
Member

sorenlouv commented Nov 15, 2021

For RUM they do.

@axw Would this also be the case for transaction metrics for RUM? Would they contain these extra dimensions? That would make it easier for UX to transition to transaction metrics (although if it defeats the purpose of metrics I agree we shouldn't do it)

@sorenlouv sorenlouv removed the impact:low Long-term priority, unless it's a quick fix. label Nov 15, 2021
@axw
Copy link
Member

axw commented Nov 16, 2021

@sqren no, unfortunately the transaction metrics do not contain the extra dimensions. We rely on ingest node doing that enrichment, and the raw information (naturally - it's high cardinality) isn't included in metrics.

In the past we planned to do User-Agent parsing and geoIP enrichment in apm-server for this purpose. That would enable us to include, for example, country name in the dimensions. Whether it's practical depends on how fine-grained we want to go.

As Felix says above, if we can rely on metrics then I also think we can rely on sampled transactions (if we record the sample rate/representative count).

@paulb-elastic
Copy link

@drewpost can you take a look and comment on #6260 (comment) please?

@drewpost
Copy link

I've written up a proposal here: https://docs.google.com/document/d/17wNGGh3u53tQsl938FZmQBa-If8TFKEGyto-Jvjzobk/edit?usp=sharing

@drewpost @vigneshshanmugam a couple of questions:

  • Would you be able to link some reference materials on why we should not rely on sampling and pre-aggregated transaction metric for User Experience?
  • Is it a problem for the User Experience and Exploratory views that iOS does not send unsampled transactions? I don't know how much control we have over this.

I think it's worth noting that even if we do store unsampled transactions for RUM, we can still introduce configuration later for users that don't want that. Maybe some users don't care about the UX/Exploratory views at all and only want to look at the APM UI; or maybe they would be happy with using the UX/Exploratory views and only visualising a % of the transactions if it means they have much lower storage cost.

Hi @axw re sampling and pre-aggregations: To be clear, there's some nuance here. I'm not against sampling as a concept, it's just that the tools we have to do it are very blunt. We also don't have the ability to do it at the session level nor to ensure we get a statistically relevant volume of impressions to power things like core web vitals etc. Conceptually it also starts to bias the data against lesser represented user populations.

That's the big problem right now with moving to upsampled.

Now, given the adoption of our RUM for these types of use cases, I can't justifiably stand in the way of this decision from APM. If you could make it an option/feature flag for us to work with an enable to facilitate a graceful transition to what a DEM based RUM solution will look like, it would be very helpful.

@paulb-elastic
Copy link

@axw is there anything else you need from us for this? Is it still possible to add an exception and not drop unsampled transactions for RUM data, for 8.0?

@axw
Copy link
Member

axw commented Nov 18, 2021

Thank you @drewpost and @paulb-elastic. My takeaways are:

  • User Experience would still ideally use all RUM transactions events until we have better sampling mechanisms (Drew's comment)
  • There is no time to make significant UI changes for 8.0 (Shahzad's comment)
  • Pre-aggregated transaction metrics don't have all dimensions necessary to power the User Experience app anyway (Søren's comment)

So for 8.0, I don't think we have any option other than keeping unsampled transaction events for RUM.

Later on, we should come back to this and consider what Felix suggested: record the sample rate/representative count in transaction event documents, and extrapolate metrics on the UI side when unsampled transactions are not stored. That would allow users to choose the tradeoff:

  • either by default or by opting in, store all events for more flexible and precise aggregations, i.e. Drew's "option/feature flag for us to work with an enable to facilitate a graceful transition to what a DEM based RUM solution will look like"
  • either by default or by opting in, store only sampled events when they are primarily interested in APM, or otherwise do not require fine-grained aggregations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants