Make .data() binding work with iterable protocol #321

Open
wants to merge 1 commit into
base: main

@bsidhom bsidhom commented Dec 18, 2024

The current data binding implementation relies on array-style random access, but it only ever reads data sequentially. Some data structures (such as linked lists) cannot easily expose random access. Moreover, because the implementation specifically uses bracket indexing, it accepts only `Array`s or first-class array-like objects. Making a custom data structure indexable this way requires either an expensive `Proxy` (indexes have to be round-tripped from integers to strings on every access) or explicit integer properties on the custom class for each contained item; the latter adds memory overhead linear in the size of the data structure and requires unnecessary bookkeeping.
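To illustrate the round-trip cost of the `Proxy` approach, here is a minimal sketch. It assumes a hypothetical `indexable` helper and a source object exposing `nth(i)` and `size`; none of these names come from d3. Property keys reach a `Proxy` `get` trap as strings or symbols, so each `data[i]` access has to be parsed back into an integer:

```javascript
// Hypothetical helper, not part of d3: makes a sequential structure look
// array-like so that bracket indexing (data[i]) works.
function indexable(source) {
  return new Proxy(source, {
    get(target, key) {
      if (key === "length") return target.size;
      // Numeric indexes arrive as strings ("0", "1", ...) and must be parsed.
      if (typeof key === "string" && /^(0|[1-9]\d*)$/.test(key)) {
        return target.nth(Number(key));
      }
      return target[key];
    },
  });
}

// Hypothetical stand-in for a custom structure without bracket indexing.
const source = {
  size: 3,
  nth(i) {
    return ["a", "b", "c"][i];
  },
};

const data = indexable(source);
for (let i = 0; i < data.length; ++i) {
  console.log(data[i]); // each access goes through the trap above
}
```

Every element read pays for a trap call, a string test, and a number conversion, which is the overhead the description refers to.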

If random access were required, it might be reasonable to require some cheaply implemented but customizable accessor method (see, for example, `Array.prototype.at()`). Since it is not, it makes more sense to require only that input data types be iterable. This opens up efficient implementations for structures such as singly-linked lists and other purely applicative structures. Backward compatibility can be maintained by adding a cheap wrapper that implements iteration in terms of sequential indexing.
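The backward-compatibility wrapper could look something like the following. This is only a sketch: `toIterable` and `iterateArrayLike` are hypothetical names, not existing d3 functions, and the sketch assumes that any non-iterable input has a numeric `length` and integer-keyed properties.

```javascript
// Hypothetical helper: yields arrayLike[0], arrayLike[1], ... in order.
function* iterateArrayLike(arrayLike) {
  for (let i = 0; i < arrayLike.length; ++i) {
    yield arrayLike[i];
  }
}

// Hypothetical helper: returns the input unchanged if it is already
// iterable; otherwise wraps it so it can be consumed with for...of.
function toIterable(data) {
  if (data != null && typeof data[Symbol.iterator] === "function") {
    return data;
  }
  return { [Symbol.iterator]: () => iterateArrayLike(data) };
}

// An array-like object that is indexable but not iterable.
const legacy = { length: 2, 0: "a", 1: "b" };
for (const d of toIterable(legacy)) {
  console.log(d);
}
```

The wrapper allocates one small object per call and copies no elements, so existing array-like callers would pay very little for the change.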

@bsidhom
Author

bsidhom commented Dec 18, 2024

Note that this is just a sketch of the suggested functionality. I have not added the shim required to make this backward compatible with code that implements random bracketed access but is not iterable.

There's another question as to whether `.data()` should also require a `.length` property or whether it's sufficient to compute the length dynamically during iteration. As far as I can tell, `.length` is essentially used as a proxy to infer whether we're dealing with an array-like object. Ideally, this method would only check for the `Symbol.iterator` property and use that for iteration; sadly, that breaks compatibility.
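As a rough illustration of computing the length during iteration, here is a simplified index-based bind that consumes any iterable without reading `.length` from the data. It is a sketch, not d3's implementation: `bindByIndex` is a hypothetical name, plain objects stand in for DOM nodes, and a bare `{ __data__ }` object stands in for an enter placeholder.

```javascript
// Hypothetical, simplified index-based bind. `group` is an array of
// existing nodes (possibly with holes); `data` is any iterable.
function bindByIndex(group, data) {
  const update = [];
  const enter = [];
  const exit = [];
  let i = 0;

  // Pair each datum with the node at the same index, counting as we go.
  for (const datum of data) {
    const node = group[i];
    if (node) {
      node.__data__ = datum;
      update[i] = node;
    } else {
      enter[i] = { __data__: datum }; // placeholder for a node to create
    }
    ++i;
  }
  const dataLength = i; // known only after iteration finishes

  // Any remaining nodes have no datum and go to the exit selection.
  for (; i < group.length; ++i) {
    if (group[i]) exit[i] = group[i];
  }
  return { update, enter, exit, dataLength };
}

const nodes = [{}, {}, {}];
const result = bindByIndex(nodes, new Set(["a", "b"]));
console.log(result.dataLength, result.exit[2] === nodes[2]);
```

One trade-off of this approach is that the result arrays cannot be preallocated, since the data length is not known up front.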

@curran
Contributor

curran commented Dec 18, 2024

I'm curious, what limitation of Arrays are you facing?

Is there something wrong with converting the data to an Array first?

@bsidhom
Author

bsidhom commented Dec 28, 2024

The main issue is just the expense of repeatedly converting to arrays at each render step. I'm working on some visualizations that either use fully persistent data structures (which rely heavily on linking and structural sharing) or depict those same data structures. It turns out that visualizing the internals of the data structures themselves is fine because any given element should only have $O(1)$ links. However, it becomes quite expensive to convert the entire data structure to a temporary array on every render/update. That work is then thrown away: the temporary array isn't retained, but is discarded once the underlying enter/update/exit arrays, etc. are created.
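For concreteness, a persistent structure of the kind described might look like the sketch below. The `Cons` class is a hypothetical example, not code from this PR; it shows two lists sharing a tail and being traversed through the iterable protocol with no temporary array.

```javascript
// Hypothetical persistent singly-linked list cell; `null` is the empty list.
class Cons {
  constructor(head, tail) {
    this.head = head;
    this.tail = tail;
  }

  // Walks the cells in order, using O(1) extra space.
  *[Symbol.iterator]() {
    for (let cell = this; cell !== null; cell = cell.tail) {
      yield cell.head;
    }
  }
}

const shared = new Cons(2, new Cons(3, null));
const a = new Cons(1, shared); // 1 -> 2 -> 3
const b = new Cons(0, shared); // 0 -> 2 -> 3, reusing the same tail cells

for (const value of a) console.log(value);
console.log(a.tail === b.tail); // true: the tail is shared, not copied
```

Passing `a` or `b` to an API that requires bracket indexing would mean first copying every element into an array, even though only sequential access is needed.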

On the other hand, I'm starting to realize that using the d3 join verbs is not necessarily the best way to spell this given that I want to use incremental updates with structural updates. It might make sense to use d3 for display only and roll my own diff/incremental update operation that aligns more closely with my data structures.

In any case, switching data binding to use iteration rather than random access in the interface more closely aligns with what d3 is already doing under the hood, requires a weaker contract from callers, and makes it easier to adapt for custom data structures, with minimal computational and storage overhead.
