Broadcasting ? #24
Comments
|
No, that does not work and it generates this: ValueError: array 'tt' has len 1 but other arrays have len 100
|
By the way, hist.fill(x=xarray, condition=["test"]*len(xarray)) seems to be very inefficient. It severely slows things down. |
The
I put the single string in quotations in the wrong place. It's supposed to be a constant expression. Also, because of #23, I might need to fix constant expressions.
|
No, that is not the use case I meant. I meant this:
Real-life example: I want to plot some distribution for a number of datasets. So I call fill() some number of times for dataset "a", then for dataset "b", and so on, and then I want to see a stacked histogram with all the datasets together. So I am not grouping by a constant. The value is constant only within a single call to fill(). The next call to fill() will use another value. |
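To make the use case concrete, here is a minimal sketch in plain NumPy. It is a stand-in for the idea only, not histbook code: the `fill` function, the `counts` dict, and the bin edges are all hypothetical names chosen for illustration. Each call carries one dataset label that applies to every value in that call.

```python
import numpy as np

edges = np.linspace(0.0, 1.0, 11)  # 10 equal-width bins on [0, 1] (arbitrary choice)
counts = {}                        # dataset label -> accumulated bin counts

def fill(dataset, x):
    """Hypothetical stand-in: add the values in x to the histogram for one label."""
    h, _ = np.histogram(x, bins=edges)
    # The label is a single value for the whole call, so no per-entry array is needed.
    counts[dataset] = counts.get(dataset, 0) + h

rng = np.random.RandomState(0)
fill("a", rng.uniform(size=100))
fill("a", rng.uniform(size=50))    # a second batch for the same dataset
fill("b", rng.uniform(size=200))   # the next call uses another label
```

After these calls, `counts` holds one array of bin counts per label, which is what a stacked plot over datasets would need.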
I figured that out in the other issue. You want to use the |
I appreciate your help, Jim. Sorry you have to do this in such inhumane conditions :) Maybe grouping would work too, but I got it figured out, implementing the "broadcasting" myself for now. Here is the fragment of real code which works:
|
I don't see the
One of the complaints about Histogrammar was that everything that appeared in a final plot had to come from a single dataset, but it's common to combine data from different Monte Carlo and data samples. It seemed unnatural to have to create new fields in the data that were piecewise constant to emulate what the physicists had in mind: bringing together data from different places. So histbook has methods for adding information (like "group") in addition to removing information (like "select", "rebin", and "project"). |
Yes, the actual filling happens remotely and once in a while the Histbook histogram gets picked and shipped back and then cleared. Here is the implementation of the broadcasting at the remote:
|
…adcasted to the lengths of the other arrays
|
Numerical values also get broadcasted:
|
I am trying to use broadcasting with groupby() and a string as the groupby axis value, and it is not working:
This fragment gives me a warning and an error:
This works fine:
My version is 1.2.0. |
Hmmm.
>>> from histbook import *
>>> import numpy as np
>>> h = Hist(bin("NJets", 10, 0, 10), groupby("dataset"))
>>> h.fill(dataset="dataset1", NJets=np.arange(100))
>>> h.pandas()
                      count()  err(count())
dataset  NJets
dataset1 [-inf, 0.0)      0.0      0.000000
         [0.0, 1.0)       1.0      1.000000
         [1.0, 2.0)       1.0      1.000000
         [2.0, 3.0)       1.0      1.000000
         [3.0, 4.0)       1.0      1.000000
         [4.0, 5.0)       1.0      1.000000
         [5.0, 6.0)       1.0      1.000000
         [6.0, 7.0)       1.0      1.000000
         [7.0, 8.0)       1.0      1.000000
         [8.0, 9.0)       1.0      1.000000
         [9.0, 10.0)      1.0      1.000000
         [10.0, inf)     90.0      9.486833
         {NaN}            0.0      0.000000
My version is 1.2.0 also. What's your Numpy version?
>>> histbook.__version__
'1.2.0'
>>> np.__version__
'1.14.4'
This might imply a higher minimum Numpy version: right now I say |
|
I found Numpy 1.11.3 on lxplus and tried it there. Indeed:
>>> numpy.full(10, "hello")
/afs/cern.ch/user/p/pivarski/miniconda3/lib/python3.5/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full(10, 'hello') will return an array of dtype('<U5')
  format(shape, fill_value, array(fill_value).dtype), FutureWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/afs/cern.ch/user/p/pivarski/miniconda3/lib/python3.5/site-packages/numpy/core/numeric.py", line 302, in full
    multiarray.copyto(a, fill_value, casting='unsafe')
ValueError: could not convert string to float: 'hello'
I'll keep the Numpy requirement minimum where it is and put a work-around in this bit of code. I think this just affects everywhere |
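For anyone hitting the same traceback: one way to sidestep it is to pass the dtype to `numpy.full` explicitly, so the result does not depend on which dtype the installed NumPy infers from the fill value. This is only a sketch of that idea and not necessarily the work-around that went into histbook; `full_with_dtype` is a hypothetical helper name.

```python
import numpy as np

def full_with_dtype(n, value):
    """Hypothetical helper: length-n array filled with value, keeping value's own dtype."""
    # np.asarray("hello").dtype is a unicode dtype, so the filled array stays a
    # string array instead of falling back to a float default.
    dtype = np.asarray(value).dtype
    return np.full(n, value, dtype=dtype)

labels = full_with_dtype(10, "hello")  # string fill value
sevens = full_with_dtype(3, 7)         # numeric fill values go through the same path
```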
I upgraded numpy to 1.14.5 and now the error is different:
Sorry, my bad. The call to h.fill() was wrong. It works fine with numpy 1.14.5. |
I just patched master so that it works as shown in my comment in Numpy 1.14.4 and 1.11.3. But your problem is that you're missing a keyword argument: you wrote
h.fill("dataset1", NJets=np.arange(100))
instead of
h.fill(dataset="dataset1", NJets=np.arange(100))
It's trying to interpret the positional (non-keyword) argument as a dict of arrays, a Pandas DataFrame, or a Spark DataFrame. I wanted to be loose about it so that anything with df["column"] → array would work in that slot (i.e. duck typing). But if the "df" is a string, it applies square-bracket-string-close-square-bracket to what is actually a string, hence "string indices must be integers, not str." |
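The failure mode can be reproduced without histbook. The sketch below uses a hypothetical `get_column` function as a stand-in for any duck-typed lookup of the form df["column"]; it is not histbook's internal code. A dict passes through that lookup fine, while a string reaches the same square-bracket operation and raises a TypeError.

```python
def get_column(df, name):
    """Hypothetical duck-typed lookup: accepts anything that supports df[name]."""
    return df[name]

# A dict of arrays (or a DataFrame) works in the "df" slot.
columns = {"NJets": [0, 1, 2]}
njets = get_column(columns, "NJets")

# A string in the same slot gets indexed by a string, which Python rejects.
try:
    get_column("dataset1", "NJets")
except TypeError as err:
    message = str(err)
    print(message)
```

The exact wording of the message differs slightly between Python versions, but it always starts with "string indices must be integers".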
Imagine I have a histogram like this:
Hist(bin("x", 100, 0.0, 1.0), groupby("condition"))
and I have an array of x collected under condition="test". So I want to put these data into the histogram. To do that, I need to do something like this:
hist.fill(x=xarray, condition=["test"]*len(xarray))
I think it would be useful to adopt a simple "broadcasting" rule to allow this:
hist.fill(x=xarray, condition="test")
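A minimal sketch of what such a rule could look like, written against plain NumPy. `broadcast_columns` is a hypothetical helper for illustration, not part of histbook. Scalars (strings or numbers) are expanded to the length of the array arguments with `np.broadcast_to`, which returns a read-only view rather than building a Python list of repeated values.

```python
import numpy as np

def broadcast_columns(**columns):
    """Hypothetical helper: expand scalar values to the common length of the arrays."""
    arrays = {name: np.asarray(value) for name, value in columns.items()}
    # Collect the distinct lengths of the non-scalar arguments.
    lengths = sorted({a.shape[0] for a in arrays.values() if a.ndim > 0})
    if len(lengths) != 1:
        raise ValueError("expected exactly one array length, got %r" % lengths)
    n = lengths[0]
    # Zero-dimensional values become length-n views; arrays pass through unchanged.
    return {name: (np.broadcast_to(a, (n,)) if a.ndim == 0 else a)
            for name, a in arrays.items()}

xarray = np.linspace(0.0, 1.0, 5)
cols = broadcast_columns(x=xarray, condition="test", weight=2.0)
```

Arrays of different lengths still raise a ValueError, so only true scalars are broadcast.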