
Support for streaming paged data and dynamic chart updating #326

Open
rivadblins opened this issue Nov 21, 2024 · 8 comments
Labels
🧰 feature-request New feature or request

Comments

@rivadblins

Is your feature request related to a problem? Please describe.
The dashboard currently loads each chart in a single batch, waiting for all the results to return when using a REST endpoint.

Describe the solution you'd like
The dashboard should support streaming paged data and load immediately, updating the charts as more data is returned, similar to Splunk, or via streaming/paging support in the MongoDB datasource. The charts should not enter a spinning-loader state; we want customers to be able to watch the data stream in rather than waiting for a spinner to finish.

Describe alternatives you've considered
Right now we return the data as a single batch. When we asked on Discord, the consensus among your developers was that this cannot currently be done.

Additional context
N/A

@rivadblins rivadblins added the 🧰 feature-request New feature or request label Nov 21, 2024
@MLopezIG

Hi! I'm curious about your scenario. For one of your charts:

  • Have you identified how long an initial 'page' of data takes to be generated?
  • How long does it take to generate all the data?
  • How many data points are there?

Thanks in advance!

@rivadblins
Author

Have you identified how long an initial 'page' of data takes to be generated?

Based on performance estimates, Mongo loads very linearly; within a few seconds we can process 10k documents on average.

How long does it take to generate all the data?

For a 5.6 million document collection it takes 5 minutes on average to process.

How many data points are there?

We were testing with a 5.6 million document collection for an average small customer of ours; we also tested 56 million. Our largest customers that want this dashboard feature would dwarf these numbers.
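For context, a quick sanity check on those figures (a sketch using only the numbers reported above, which are estimates, not benchmarks taken here):

```python
# Rough throughput implied by the numbers above.
docs_small = 5_600_000        # documents in the "small customer" collection
full_load_seconds = 5 * 60    # reported end-to-end processing time

docs_per_second = docs_small / full_load_seconds
print(f"{docs_per_second:,.0f} docs/sec")  # ~18,667 docs/sec at the reported rate

# At that rate a first "page" of 10k documents is ready in well under a second,
# which is why an incremental first paint is attractive.
first_page_seconds = 10_000 / docs_per_second
print(f"{first_page_seconds:.2f} s for the first 10k documents")
```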

@MLopezIG


I'm guessing this number of data points corresponds to raw data, but on the chart itself the number of data points will be completely different and much smaller, right?

Having people wait 5 minutes for the charts to completely show what they're supposed to show doesn't look like a great user experience. What if you had your users leverage the dashboard filter to 'page' through days/weeks of data?

I might not be understanding your scenario correctly; having an idea of how your dashboard looks might help here.

@rivadblins
Author

I'm not sure I follow what you are asking, I can give a simplified breakdown.

In collection A we have log lines ingested from our service. These logs correspond to a user's "Sync", i.e. data going from their email system into their CRM system. One log is generated per user every 5 minutes on average, Monday to Friday. You can see how this can grow very quickly the further back in time we go.

If we point Reveal at this collection we can see all the fields and build the aggregation as we see fit, but when it tries to retrieve the data and build the visualization it hangs for 5 minutes. The user who is syncing is not the one who will be looking at the dashboard; the administrator of that "Sync" is. The data in collection A is therefore aggregated across every single user that is "Syncing", not just one, so that the admin can verify the health of their "Sync": how many are syncing, how long they wait to sync, how long the sync takes, and how many creates/updates/deletes there are across every object type (contacts/appointments/emails, etc.).
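As a sketch of that cross-user rollup, a per-day MongoDB aggregation might look like the pipeline below. The field names (`ts`, `durationMs`) are hypothetical, not taken from the real schema; with pymongo it would run as `db.collectionA.aggregate(pipeline)`.

```python
from datetime import datetime, timezone

# Hypothetical field names; the real log schema in collection A may differ.
window_start = datetime(2024, 11, 1, tzinfo=timezone.utc)

pipeline = [
    {"$match": {"ts": {"$gte": window_start}}},  # limit the time window
    {"$group": {                                 # one bucket per day, across ALL users
        "_id": {"$dateTrunc": {"date": "$ts", "unit": "day"}},
        "syncs": {"$sum": 1},
        "avgDurationMs": {"$avg": "$durationMs"},
    }},
    {"$sort": {"_id": -1}},                      # most recent day first
]
```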

The paging I am requesting is for this kind of solution:

  1. Admin loads the dashboard.
  2. Visualizations request page 1 of their data.
  3. Visualizations get page 1 and display that data immediately, updating the charts.
  4. Visualizations then request page 2.
  5. Visualizations get page 2 and update with the additional data, adding onto the page 1 data rather than replacing it, with no loading bar/spinner.
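The five steps above can be sketched as a client-side loop. This is a minimal simulation, not Reveal's actual API: `fetch_page` is a hypothetical stand-in for a paged REST call.

```python
def fetch_page(page: int, page_size: int = 3) -> list:
    """Hypothetical stand-in for a paged REST call; returns most-recent-first data."""
    data = list(range(10, 0, -1))  # pretend backend result set, newest first
    start = page * page_size
    return data[start:start + page_size]

def stream_into_chart() -> list:
    chart_points = []
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:                   # no more pages: the charts simply stop updating
            break
        chart_points.extend(batch)      # append older data; page 1 is never replaced
        # a real dashboard would redraw the chart here, with no spinner
        page += 1
    return chart_points

print(stream_into_chart())  # -> [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```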

The data would be returned from most recent to least recent. A time-based chart, a line chart for example, would then grow to show data that is further back in time as it arrives from the endpoint.

This way the user can see the most recent up to date view of their data and start analyzing it while more is loaded. Essentially we want to hide the fact that we are loading more data behind the scenes by displaying it as it comes in.

If you are familiar with Splunk, we use that internally and it has this functionality. When a query is run, the visualizations immediately show the data that has been processed but will continually update themselves, without a loading spinner, until all the data is returned.

This is a complicated ask, and I realize that, but it is another middle ground for issue #325: while related, this is still a feature we would like even if #325 gets resolved, hence the second ticket.

@MLopezIG

MLopezIG commented Dec 4, 2024

Thanks for the detailed explanation. If I'm following you correctly, the chart, once it displays all the data, might not show more than, e.g., one value per day... the need for paging comes from the fact that producing the chart's data is resource-intensive, not from the chart needing to show many values. Is that correct?

@rivadblins
Author

@MLopezIG I'm not sure I follow what you are saying. Once every page has been retrieved, the charts will no longer update. There are 108 logs per day per user on average; however, the dashboard aggregates across all users because a single user doesn't see their own data. This dashboard is an administration tool for someone to look at 1000+ users' data. In a 1000-user sync (the customer requesting this feature has many, many more than that) we expect 108,000 Mongo documents per day that need to be aggregated. I hope this helps clarify; apologies for the late reply.
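The volume arithmetic above works out as follows (using only the figures stated in this thread):

```python
# Back-of-the-envelope daily volume from the figures above.
logs_per_user_per_day = 108   # roughly one log every 5 minutes over a working day
users = 1_000                 # the smallest deployment discussed; real ones are larger

docs_per_day = logs_per_user_per_day * users
print(docs_per_day)  # -> 108000 documents to aggregate per day
```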

@MLopezIG

Hi @rivadblins, what I'm trying to ask is how many data points there will be AFTER doing the aggregations. I'm interested in that to know whether the particular chart(s) also need to handle a really large amount of data (charts show data AFTER aggregation).

From what I've understood so far, the answer to my question is no: there is not a very large amount of data after aggregation for the charts to show. On the other hand, the amount of data to aggregate is really big.

@rivadblins
Author

I see what you are saying: the data after aggregation will be condensed down to 1 document per day, regardless of the number of users.
