feature: multiple x axis to combine different time precisions #405

Closed
milahu opened this issue Dec 17, 2020 · 11 comments
Labels
question Further information is requested

Comments

@milahu

milahu commented Dec 17, 2020

assume we have multiple time series with different time precisions / resolutions / divisions:
some series are value per year, others are value per month, others are value per week

currently we must preprocess our data to match the highest value frequency (here: value per week)
and repeat values with lower frequencies (annual data: same value for all 52 weeks)
(please tell me im wrong ..)

here is a plot of annual and monthly values:

what is ugly here are the circles in the white annual line - they are too many

disabling the circles completely is a bad solution

possible workarounds:
use nulls/gaps to encode missing values
show only every N-th circle
(others?)
.. but these still require to merge x values into one axis

possible solution:
currently, data[0] holds the x values, and all other data[i] hold y values
we could introduce a mapping between arrays, mapping x values to y values
or more general, map input values to output values

the default mapping would be

datamap: [
  [0, 1], // f(d0) -> d1
  [0, 2], // f(d0) -> d2
  [0, 3], // f(d0) -> d3
  [0, 4], // f(d0) -> d4
  // ....
]

to combine annual and weekly x values, we could then use

datamap: [
  [0, 1], // f(d0) -> d1
  [2, 3], // f(d2) -> d3
]

or plot functions with multiple inputs

datamap: [
  [0, 1, 2], // f(d0, d1) -> d2
  [3, 4], // f(d3) -> d4
]

or we extend opt.series[i] like

series: [
  {
    label: "T year",
    axis: 0, // x axis
  },
  {
    label: "T month",
    axis: 0, // x axis
  },
  {
    label: "N year",
    axis: 1, // y axis
    input: 0,
    // this is an output/value series
    // with series 0 (T year) as single input/key
  },
  {
    label: "N month",
    axis: 1, // y axis
    input: 1,
    // this is an output/value series
    // with series 1 (T month) as single input/key
  },
]
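For comparison with the current single-x format, here is a minimal sketch of how the two-element `datamap` pairs proposed above could be resolved into one `[x, y]` table per pair. `datamapToTables` is a hypothetical helper, `datamap` is only the proposal from this comment and not an existing uPlot option, and the sample values are made up.

```javascript
// Hypothetical helper: resolves [xIndex, yIndex] pairs into [xs, ys] tables.
// Multi-input entries such as [0, 1, 2] are deliberately not handled here.
function datamapToTables(data, datamap) {
  return datamap.map((entry) => {
    if (entry.length !== 2) {
      throw new Error("this sketch only handles [x, y] pairs");
    }
    const [xi, yi] = entry;
    return [data[xi], data[yi]];
  });
}

// Made-up sample data: d0/d1 are annual, d2/d3 are half-yearly.
const data = [
  [2000, 2001],         // d0: x values for d1
  [10, 20],             // d1
  [2000, 2000.5, 2001], // d2: x values for d3
  [10, 15, 20],         // d3
];

const tables = datamapToTables(data, [
  [0, 1], // f(d0) -> d1
  [2, 3], // f(d2) -> d3
]);
```

Each resulting table carries its own x values, so series with different resolutions stay separate until something merges them onto one axis.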

in the future we might need 3D plotting and MISO functions (multi input, single output)
(not sure if MIMO makes much sense)

@leeoniya please share your thoughts so i can make a better PR : )

@leeoniya
Owner

leeoniya commented Dec 17, 2020

and repeat values with lower frequencies (annual data: same value for all 52 weeks)
(please tell me im wrong ..)

you're wrong :)

you need to fill them with null and set spanGaps: true for those series.

there is now a utility function that can do this for you called uPlot.join(). see how it's used in https://github.com/leeoniya/uPlot/blob/master/demos/path-gap-clip.html#L120
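A minimal sketch of the null-filling approach described above, in plain JavaScript. `joinTables` is a hypothetical stand-in that only illustrates the kind of alignment `uPlot.join()` is described as doing; it is not the library's implementation, and the sample values are made up.

```javascript
// Hypothetical helper: merges [xs, ys, ...] tables onto one sorted x axis,
// writing null wherever a series has no sample for a given x.
function joinTables(tables) {
  // Every distinct x value across all tables, sorted ascending.
  const xs = [...new Set(tables.flatMap(([tx]) => tx))].sort((a, b) => a - b);

  // Position of each x value on the shared axis.
  const pos = new Map(xs.map((x, i) => [x, i]));

  const joined = [xs];
  for (const [tx, ...series] of tables) {
    for (const ys of series) {
      const out = new Array(xs.length).fill(null);
      tx.forEach((x, i) => {
        out[pos.get(x)] = ys[i];
      });
      joined.push(out);
    }
  }
  return joined;
}

// Made-up sample data: x is a month index, annual values land on months 0 and 12.
const annual = [[0, 12], [100, 110]];
const monthly = [[0, 3, 6, 9, 12], [100, 102, 105, 107, 110]];

const data = joinTables([annual, monthly]);
// data[0] holds the shared x values
// data[1] holds the annual series, null between the yearly samples
// data[2] holds the monthly series

// Series options in the shape the comment describes: spanGaps on the sparse series.
const series = [
  {},
  { label: "N year", spanGaps: true },
  { label: "N month" },
];
```

With this layout the annual series has only two real points, so only two circles would be drawn for it, and `spanGaps: true` asks for the line to be connected across the nulls.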

@milahu
Author

milahu commented Dec 17, 2020

as i said ..

possible workarounds:
use nulls/gaps to encode missing values
show only every N-th circle
(others?)
.. but these still require to merge x values into one axis

this still *feels* like a non-ideal solution
(or is my optimization premature?)

the feature would allow a space-time tradeoff
and as side-effect, allow to plot MISO functions f(d1, d2) -> d3
and: easily plot data with different x ranges

there is now a utility function that can do this for you

sweet, but on runtime, i want to do as little work as possible
all my data is precompiled/cached to an optimal format
so i can dynamically add/remove data with little cost

one problem: we get more snap points
choice: snap to nearest value (multiple x) or show average value (one x)
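The "snap to nearest value" option can be sketched for a null-filled series as a search outward from the hovered index. `nearestNonNull` is a hypothetical helper; it measures distance in array indices, not in x values, which is an assumption that only holds for evenly spaced x. How it would be wired into a chart's cursor is not shown.

```javascript
// Hypothetical helper: index of the closest non-null sample to `idx`,
// searching outward in both directions. On a tie the left neighbour wins.
// Returns -1 if every value is null.
function nearestNonNull(values, idx) {
  for (let d = 0; d < values.length; d++) {
    const left = idx - d;
    const right = idx + d;
    if (left >= 0 && values[left] != null) return left;
    if (right < values.length && values[right] != null) return right;
  }
  return -1;
}

// Made-up annual series aligned to a finer axis.
const annualValues = [100, null, null, null, 110];
const snapped = nearestNonNull(annualValues, 3); // 4
```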

@leeoniya
Owner

the complexity of doing anything else will be significant. the overhead does become significant for aligning many completely unaligned datasets that are several thousand points each, but in what i think are typical cases, i've tried to make the join function as efficient as possible.

i have an unlisted synthetic demo that allows you to assess the alignment cost here: https://github.com/leeoniya/uPlot/blob/master/demos/align-data.html. i don't expect real-world cases to be random()-levels of unaligned, so it's a good stress test.

i'd be interested to see your actual datasets and how much it costs to align them.

@milahu
Author

milahu commented Dec 17, 2020

the complexity of doing anything else will be significant.

im happy to help .. or do you mean SIGNIFICANT?

i'd be interested to see your actual datasets and how much it costs to align them.

simple, im plotting global population data over the last 500 years (or more)
where datasets have different lengths and resolutions
workaround: use "now" as zero index and count back
(i assume plotting stops at x == data[s].length)

edit: this would make #107 easier to solve = plot SIMO fn f(d1) -> (d2, d3, d4)
naah, only useful for MISO fns

@leeoniya
Owner

leeoniya commented Dec 17, 2020

im happy to help .. or do you mean SIGNIFICANT?

you're welcome to help. but, yes, i think it will be very substantial as the underlying data format assumptions permeate many parts of the internals. it's not gonna be as simple as tweaking just the pathbuilder, for example. the probability of not breaking many things to get this done is basically zero, imo.

the ultimate question is, what gains do you expect in non-artificial cases, and can you prove that this will work robustly and generally. if someone tells me that they need better perf on a 10M pts scatter dataset, the simple answer is that this is not the right library to use - at some point, this just becomes true. so, it's important to evaluate real use-cases, costs and possible gains from this effort.

simple, im plotting global population data over the last 500 years (or more)

this is an arbitrary number. is that 500 datapoints? 500 * 52 datapoints? 500 * 365 * 24 * 3600 datapoints? as i said, i would be interested to see how uPlot.join() performs on your dataset, and its details.

edit: this would make #107 easier to solve

scatter/bubble is much easier (not easy) to solve with a different underlying data structure (as described there), which is the plan. if you'd like to help with that, i think it will be a much better use of time :) i don't plan to work on that for a few more months due to other work, so cannot promise a timely PR review either, unfortunately.

@leeoniya
Owner

gonna close this since i don't think there is anything actionable here. feel free to follow up in this thread if you have perf issues with a specific use-case/dataset.

leeoniya added the "question" label (Further information is requested) Dec 21, 2020
@graphefruit

Hey @leeoniya,
Sorry to bring this topic up again.
I came here explicitly for this.

My use case:
I develop a coffee app (https://github.com/graphefruit/Beanconqueror) which connects to different bluetooth devices.
Each bluetooth device sends its data with its own timestamps.
A bluetooth scale e.g. 10 values per second, a pressure sensor 30 values per second, a temperature sensor maybe 5 values.
So the data arrives with different unix timestamps, but everything should be displayed on the same axis.

Currently I use Plotly, and I'd like to switch because of issues if possible, but this is holding me back. Has there been any work on this in the last few years that I just didn't find?

Or do I need to update the old datas which are already plotted, to insert the data?

Thanks so far & Have a great cup of coffee
Lars

@milahu
Author

milahu commented May 4, 2024

do I need to update the old datas which are already plotted, to insert the data?

you will have to preprocess your data, so all graphs have the same time resolution, in your case 30Hz

A bluetooth scale e.g. 10 values per second, a pressure sensor 30 values per second, a temperature sensor maybe 5 values.

PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP @ 30Hz
S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  S @ 10Hz
T     T     T     T     T     T     T     T     T @  5Hz

to get 30Hz, repeat all S values 3 times, all T values 6 times
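A minimal sketch of the repeat-to-30Hz idea above. `repeatEach` is a hypothetical helper and the readings are made up; it assumes the three devices sample exactly in phase and at perfectly even intervals, which real Bluetooth timestamps may not do.

```javascript
// Hypothetical helper: repeats every value `times` times, keeping order.
function repeatEach(values, times) {
  return values.flatMap((v) => new Array(times).fill(v));
}

// Made-up readings covering the same 0.2 s window.
const pressure = [9.0, 9.1, 9.0, 8.9, 9.0, 9.2]; // 30Hz -> 6 samples
const scale = [18.2, 18.9];                      // 10Hz -> 2 samples
const temperature = [93.0];                      //  5Hz -> 1 sample

const scale30 = repeatEach(scale, 3);             // 30 / 10 = 3 repeats
const temperature30 = repeatEach(temperature, 6); // 30 / 5 = 6 repeats
```

All three arrays then have the same length and can share one x array, at the cost of storing repeated values.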

@leeoniya
Owner

leeoniya commented May 4, 2024

you can keep a separate data buffer for each device and use uPlot.join() just prior to calling u.setData(joinedBuffers)
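A minimal sketch of the per-device buffers described above. `DeviceBuffer` and `collectTables` are hypothetical names, the timestamps and readings are made up, and the final join and `setData` step is left as a comment because it needs uPlot and a chart instance `u`, neither of which is set up here.

```javascript
// Hypothetical per-device buffer: one [timestamps, values] table per device.
class DeviceBuffer {
  constructor() {
    this.xs = [];
    this.ys = [];
  }

  // Append one reading with the device's own timestamp.
  push(timestamp, value) {
    this.xs.push(timestamp);
    this.ys.push(value);
  }

  // Returns the live arrays, not copies.
  toTable() {
    return [this.xs, this.ys];
  }
}

// One table per device, in the insertion order of the `buffers` object.
function collectTables(buffers) {
  return Object.values(buffers).map((b) => b.toTable());
}

const buffers = {
  scale: new DeviceBuffer(),
  pressure: new DeviceBuffer(),
};

buffers.scale.push(1000, 18.2);
buffers.pressure.push(1000, 9.0);
buffers.pressure.push(1033, 9.1);

const tables = collectTables(buffers);

// With uPlot loaded and a chart instance `u` (both assumed):
//   u.setData(uPlot.join(tables));
```

Keeping the buffers separate means each incoming reading is a cheap append, and the alignment cost is paid once per redraw.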

@graphefruit

Thanks for the fast responses!

you can keep a separate data buffer for each device and use uPlot.join() just prior to calling u.setData(joinedBuffers)

Is there any sample I can quickly have a look at?

@leeoniya
Owner

leeoniya commented May 4, 2024

you can search the demos folder here for "uPlot.join".

e.g. https://github.com/leeoniya/uPlot/blob/master/demos/nearest-non-null.html
