Use value indexing by default #84
Comments
I've started hacking on this on a fork here. |
I'm coming to the conclusion that supporting both by-axis-value and by-axis-index within the Pandas used to support four different ways of indexing into their dataframes: If we decide to completely detangle these things we have two decisions to make:
|
I'm a user of AxisArrays and I like its approach a lot. On the other hand, I'm just an average user with little knowledge of the inner functionality, so regarding the architectural decisions I fully trust people like M. Bauman and Tim Holy. This post is only to show how I'm using it and why I find this package great. I'm almost always using the AxisArrays package in combination with Unitful or with Images. In both cases I use AxisArray as a way to annotate the physical meaning of the integer indexes and dimensions. The huge advantage for me is using AxisArray to steer dispatch into dedicated functions, and the dispatch is a real pleasure to write thanks to rdeits' suggestion on Discourse, with constructs like
const QAxis{Name, Dim} = Axis{Name, AX} where AX <: AbstractArray{Q} where Q <: Dim
const TimeAxis = QAxis{S, Unitful.Time} where S;
const TimeArray{T, N, V} = AxisArray{T, N, V, <:Tuple{TimeAxis, Vararg{Axis}}};
const QAxisW{Name, Dim} = Axis{Name, AX} where AX <: Range{Q} where Q <: Dim
const TimeAxisW = QAxisW{S, Unitful.Time} where S;
const TimeWaveformArray{T, N, V} = AxisArray{T, N, V, <:Tuple{TimeAxisW, Vararg{Axis}}};
const TimeWaveform{T, V} = TimeWaveformArray{T, 1, V}
const EnergyAxis = QAxis{S, Unitful.Energy} where S;
const EnergySpectrumArray{T, N, V} = AxisArray{T, N, V, <:Tuple{EnergyAxis, Vararg{Axis}}};
I use these definitions to define specialized functions: some of them expect an Axis of the StepRange kind (FIR, image processing, ...), while others, such as plotting, can work with arbitrary axis values (e.g. after removal of outliers). I know I can define my own type and dispatch on it, but this way it comes for free, and it all plays well with the rest of Julia. A second use case is when I use it instead of DataFrames: accessing any column of my measurement dataset is so simple, my main axis (time, or energy, or length) is still present and propagated, and I can pass it to any function expecting an AbstractArray and it just works without explicitly writing any conversions or macros, just pure math-like syntax. So my opinion is:
My feeling is, the three characters But again, don't put too much weight on my examples and suggestions. |
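The dispatch pattern described in the comment above can be sketched without AxisArrays or Unitful. This is only a minimal illustration under stated assumptions, not the package's implementation: `ToyAxis`, `ToyAxisVector`, `RangeAxis`, `Waveform`, and `describe` are hypothetical names invented here, and `AbstractRange` stands in for the "evenly sampled axis" kind.

```julia
# Hypothetical stand-in for an axis: a name (type parameter) plus its values.
struct ToyAxis{name, V<:AbstractVector}
    vals::V
end
ToyAxis{name}(vals::V) where {name, V<:AbstractVector} = ToyAxis{name, V}(vals)

# Hypothetical stand-in for a 1-d axis array: data plus one axis.
struct ToyAxisVector{T, D, Ax} <: AbstractVector{T}
    data::D
    axis::Ax
    ToyAxisVector(data::AbstractVector{T}, axis::ToyAxis) where {T} =
        new{T, typeof(data), typeof(axis)}(data, axis)
end
Base.size(A::ToyAxisVector) = size(A.data)
Base.getindex(A::ToyAxisVector, i::Int) = A.data[i]

# Type aliases that select arrays by the kind of their axis values.
const RangeAxis = ToyAxis{name, <:AbstractRange} where name
const Waveform{T, D} = ToyAxisVector{T, D, <:RangeAxis}

# The generic method accepts any axis; the specialized one requires a range axis.
describe(::ToyAxisVector) = "arbitrary axis values"
describe(::Waveform) = "evenly sampled axis"

w = ToyAxisVector([1.0, 2.0, 3.0], ToyAxis{:time}(0.0:0.5:1.0))
s = ToyAxisVector([1.0, 2.0, 3.0], ToyAxis{:time}([0.0, 0.7, 1.0]))
```

Here `describe(w)` selects the `Waveform` method because its axis is a range, while `describe(s)` falls back to the generic method. Both values are still ordinary `AbstractVector`s, so they can be passed to any function that expects one.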
Since the upgrade-to-0.7 train is at full speed now and the exported name My proposal is to orthogonalize the concepts as much as possible, using the following new packages:
Either of the latter two can be combined with NamedAxisIndexing at the user's or package author's option. That way everyone gets to leverage common functionality without there being so much angst over what things should actually mean. We might need a small glue package (AnnotatedAxisArrays?) that defines trivial implementations of core trait functions that all of these packages can extend for their specific types. |
I'm not sure what the difference between |
The key point is the subtype relationship, |
It sounds like |
You have to make a decision about subtyping when you define the structure:
struct NamedAxisArray{T,N,A<:AbstractArray,S<:NTuple{N,Symbol}} <: AbstractArray{T,N}
    data::A
end
# Anything that isn't an Array but supports getindex/setindex!
struct NamedAxisDict{K,V,D,S<:Tuple{Vararg{Symbol}}} <: AbstractDict{K,V}
    data::D
end
but the implementations of Like I said, if we were trait-based rather than inheritance-based then this wouldn't be an issue, since the |
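The trait-based alternative mentioned in the comment above could look roughly like the following. This is a sketch of the general "Holy trait" pattern, not something proposed verbatim in the thread: `AxisNaming`, `HasAxisNames`, `NoAxisNames`, `axisnaming`, `dimnames`, `ToyNamedArray`, and `ToyNamedDict` are all hypothetical names. The point is that the two wrappers keep different supertypes yet share one implementation of the name-related function.

```julia
# Hypothetical trait: does a container carry axis names?
abstract type AxisNaming end
struct HasAxisNames <: AxisNaming end
struct NoAxisNames <: AxisNaming end

axisnaming(::Type) = NoAxisNames()          # default for any type
axisnaming(x) = axisnaming(typeof(x))       # convenience for instances

# Array-like wrapper: must subtype AbstractArray.
struct ToyNamedArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
    names::NTuple{N,Symbol}
    ToyNamedArray(data::AbstractArray{T,N}, names::NTuple{N,Symbol}) where {T,N} =
        new{T,N,typeof(data)}(data, names)
end
Base.size(A::ToyNamedArray) = size(A.data)
Base.getindex(A::ToyNamedArray{T,N}, I::Vararg{Int,N}) where {T,N} = A.data[I...]

# Dict-like wrapper: must subtype AbstractDict instead.
struct ToyNamedDict{K,V,D<:AbstractDict{K,V}} <: AbstractDict{K,V}
    data::D
    names::Tuple{Vararg{Symbol}}
    ToyNamedDict(data::AbstractDict{K,V}, names::Tuple{Vararg{Symbol}}) where {K,V} =
        new{K,V,typeof(data)}(data, names)
end
Base.length(d::ToyNamedDict) = length(d.data)
Base.iterate(d::ToyNamedDict, state...) = iterate(d.data, state...)
Base.get(d::ToyNamedDict, key, default) = get(d.data, key, default)

# Both wrappers opt in to the same trait despite unrelated supertypes.
axisnaming(::Type{<:ToyNamedArray}) = HasAxisNames()
axisnaming(::Type{<:ToyNamedDict}) = HasAxisNames()

# One shared implementation, selected by trait rather than by supertype.
dimnames(x) = dimnames(axisnaming(x), x)
dimnames(::HasAxisNames, x) = x.names
dimnames(::NoAxisNames, x) = ()

a = ToyNamedArray([1 2; 3 4], (:row, :col))
d = ToyNamedDict(Dict("a" => 1), (:key,))
```

With this layout `dimnames(a)` and `dimnames(d)` go through the same method, and a plain `Vector` simply reports no names.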
Hmm, though I realized there's still nothing to prevent someone from asking for value-based lookup with a Overall I have the perception that existential angst about this package not quite being right is holding it back. Personally, I don't share this angst because I really do think this package should just be an axis metadata wrapper. |
In my mind AxisArrays has these basic levels of processing (let me know if I'm missing anything):
A. Index into an AxisArray with integers/ranges:
B. Index into an AxisArray with positional indexes:
C. Index into an AxisArray with named indexes:
I see B as handled with Does that sound right? |
I agree with most of that, but I'd say that C (named axes) is actually orthogonal to B (value-indexing), in the sense that you can use integer/range indexing with named axes. |
I agree, that makes sense |
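The orthogonality agreed on above can be made concrete with a small sketch. It assumes a toy container and helper names (`ToyAxisArray`, `dimof`, `positionof`, `slice`) that are invented here and are not part of AxisArrays. Resolving an axis name to a dimension and resolving an axis value to an integer position are two independent steps, so either can be combined with plain integers.

```julia
# Hypothetical container: data plus, per dimension, a name and a vector of axis values.
struct ToyAxisArray{N, A<:AbstractArray}
    data::A
    names::NTuple{N,Symbol}
    axisvalues::NTuple{N,AbstractVector}
end

# Naming step: axis name -> dimension number. Knows nothing about axis values.
function dimof(A::ToyAxisArray, name::Symbol)
    d = findfirst(==(name), A.names)
    d === nothing && throw(ArgumentError("no axis named $name"))
    return d
end

# Value step: axis value -> integer position along a dimension. Knows nothing about names.
function positionof(A::ToyAxisArray, dim::Int, value)
    i = findfirst(isequal(value), A.axisvalues[dim])
    i === nothing && throw(KeyError(value))
    return i
end

# Final step always works on plain integers.
slice(A::ToyAxisArray, dim::Int, i::Int) = selectdim(A.data, dim, i)

A = ToyAxisArray([1 2; 3 4; 5 6], (:time, :channel), ([10, 20, 30], ["a", "b"]))

named_integer = slice(A, dimof(A, :channel), 2)            # name + integer position
positional_value = slice(A, 1, positionof(A, 1, 20))       # dimension number + value
named_value = slice(A, dimof(A, :channel),
                    positionof(A, dimof(A, :channel), "b")) # name + value
```

Each of the three calls mixes the two steps differently, and none of them requires the other step to be present.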
Matt has pointed out that AxisArrays.to_index will sometimes produce AxisArray{T, N} indexes (where N>=2), as in the README example. Since the output dimensionality and axis names depend on the index dimensionality and type, I think this means that named indexing and at least the indexing-by-value interface need to live in the same package. |
Following the lead of a proposal I just made for Interpolations, perhaps we should make The advantage here is that AxisArrays could continue to be "normal" |
I find the function call interface to make much more sense for interpolations (where it feels like I'm actually executing a calculation to determine the value) vs. here (where it feels more like lookup). I know the distinction there is rather fuzzy, but I find it meaningful. I'd argue that the tension we feel here is because we're trying to be two data structures at once, both a [Sorted|Ordered]CartesianDict with named dimensions and an array with named and valued axes. While we could move the by-value indexing behind a function call interface, I'm more tempted to rip it out altogether and move it into a I don't think @andyferris follows this repository, so I'm pinging you because I know you've put in lots of effort on blurring the lines between dictionaries and arrays. I'd be interested in hearing your thoughts here. |
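For readers weighing the function-call option discussed above, here is one way it might look. This is a sketch under assumptions, not the design proposed in the thread: `CallableAxisVector` and its `axisvalues` field are hypothetical, and exact-match lookup via `findfirst` is chosen only for brevity. Square brackets keep the usual positional meaning, while calling the object performs the by-value lookup.

```julia
# Hypothetical wrapper: an ordinary AbstractVector that is also callable.
struct CallableAxisVector{T, D<:AbstractVector{T}, V<:AbstractVector} <: AbstractVector{T}
    data::D
    axisvalues::V
    function CallableAxisVector(data::AbstractVector{T}, axisvalues::AbstractVector) where {T}
        length(data) == length(axisvalues) ||
            throw(DimensionMismatch("data and axis values must have equal length"))
        return new{T, typeof(data), typeof(axisvalues)}(data, axisvalues)
    end
end

# Positional indexing: the normal AbstractArray contract is untouched.
Base.size(A::CallableAxisVector) = size(A.data)
Base.getindex(A::CallableAxisVector, i::Int) = A.data[i]

# By-value lookup lives behind the call syntax instead of getindex.
function (A::CallableAxisVector)(value)
    i = findfirst(isequal(value), A.axisvalues)
    i === nothing && throw(KeyError(value))
    return A.data[i]
end

A = CallableAxisVector([0.1, 0.2, 0.3], [30, 10, 20])
```

With this layout `A[2]` and `A(10)` both return `0.2`, but for different reasons: the first is position 2, the second is the element whose axis value is 10. Generic array code only ever sees the positional behavior.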
Thanks @mbauman for the link. Yes, this is very interesting. I like the way this mixes up arrays, offset arrays, generalizations of dictionaries and Cartesian indices all in the one place :). A few (some of them obvious) things:
I haven't had the time to experiment with any of this lately - but what I'd like to see is basically a dictionary that can be as fast as an array, and supporting multidimensional indices (via positional or named axes - I'm hoping that |
It's been almost 3 years since this first opened - would it be safe to say that indexing by integers will remain a special case for AxisArrays? |
In the single use case I have for AxisArrays right now, it would be very convenient if the default indexing of an AxisArray used the values of the indices instead of the indices of the indices. E.g. if I do it would be handy if it returned the second and fifth element of the AxisVector. You can always refer to the indices of the indices by using the Axis type. Could this work?
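The behavior requested here can be illustrated with a toy type. Everything in this sketch is an assumption made for illustration: `ValueIndexedVector`, `ByPosition`, and `positionof` are hypothetical names, the axis values `[5, 3, 2, 1, 4]` are made up, and the type deliberately does not subtype `AbstractVector` because redefining integer `getindex` would break that interface.

```julia
# Hypothetical vector whose default indexing is by axis value.
struct ValueIndexedVector{T, K}
    data::Vector{T}
    axisvalues::Vector{K}
end

# Hypothetical marker for "indices of the indices", i.e. plain positions.
struct ByPosition{I}
    index::I
end

# Map an axis value to its integer position, or fail loudly.
function positionof(v::ValueIndexedVector, value)
    i = findfirst(isequal(value), v.axisvalues)
    i === nothing && throw(KeyError(value))
    return i
end

# Default: look up by axis value (scalar or vector of values).
Base.getindex(v::ValueIndexedVector, value) = v.data[positionof(v, value)]
Base.getindex(v::ValueIndexedVector, vals::AbstractVector) = [v[x] for x in vals]

# Explicit opt-in: positional indexing.
Base.getindex(v::ValueIndexedVector, p::ByPosition) = v.data[p.index]

v = ValueIndexedVector([0.1, 0.2, 0.3, 0.4, 0.5], [5, 3, 2, 1, 4])

by_value = v[[3, 4]]                  # axis values 3 and 4 sit at positions 2 and 5
by_position = v[ByPosition([3, 4])]   # positions 3 and 4
```

In this sketch `by_value` holds the second and fifth elements, while `by_position` holds the third and fourth, which is the distinction the request draws between values of the indices and indices of the indices.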