ENH: unit of measurement / physical quantities #10349
Comments
see #2485. The tricky thing with this is how to actually propagate this metadata. I think this could work if it was attached to the index itself (as an optional additional array of metadata). If this were achieved, then it should be straightforward to have operations work on it (though to be honest that is a bit out of scope for main pandas; perhaps a subclass or another library would be better). |
Thanks for your comment. Maybe the proposed way of adding an attribute 'unit' to the columns is not the best way, but hopefully units are significantly less difficult than arbitrary metadata. I think pint (there are other modules, but I do not know them, sorry) is capable of taking care of the units itself (also raising errors when they are misused), so this would not be a pandas issue. Here is a small snippet that demonstrates how a new unit could be created when two columns are multiplied:
# setup assumed here (not shown in the original snippet): `units` is a pint unit registry
import pint
units = pint.UnitRegistry()
# prototype column1, omitting the name and index
value1 = [1]
unit1 = 'meter'
# column1: representation of value and unit
column1 = value1 * units(unit1)
# column2: representation of value and unit
column2 = [2] * units('meter')
# creating a new column: column1 * column2
column12 = column1 * column2
print('column12: {}'.format(column12))
# value could go to a new column of a DataFrame
print('value of column12: {}'.format(column12.magnitude))
# str(column12.units) could serve as the unit-attribute for the new column
print('unit of column12: {}'.format(column12.units))
output:
column12: [2] meter ** 2
value of column12: [2]
unit of column12: meter ** 2 |
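To make the bookkeeping behind that result explicit, here is a minimal sketch in plain Python with no pint involved. It assumes a unit can be represented as a mapping from base-unit name to exponent; the names `multiply_units` and `format_unit` are hypothetical and only illustrate why meter times meter yields `meter ** 2`. It is not how pint is implemented.

```python
from collections import Counter


def multiply_units(unit_a, unit_b):
    """Combine two units, each given as {base_unit: exponent}, by adding exponents."""
    combined = Counter(unit_a)
    for base, exponent in unit_b.items():
        combined[base] += exponent
    # Drop base units whose exponents cancelled out (e.g. meter * meter ** -1).
    return {base: exp for base, exp in combined.items() if exp != 0}


def format_unit(unit):
    """Render {base_unit: exponent} as a string such as 'meter ** 2'."""
    parts = [
        base if exp == 1 else "{} ** {}".format(base, exp)
        for base, exp in sorted(unit.items())
    ]
    return " * ".join(parts) if parts else "dimensionless"


# Hypothetical stand-ins for two columns: plain values plus a unit mapping.
column1 = {"values": [1], "unit": {"meter": 1}}
column2 = {"values": [2], "unit": {"meter": 1}}

column12 = {
    "values": [a * b for a, b in zip(column1["values"], column2["values"])],
    "unit": multiply_units(column1["unit"], column2["unit"]),
}

print(column12["values"], format_unit(column12["unit"]))  # [2] meter ** 2
```

The values and the unit are computed separately, which is why the magnitude could live in an ordinary column while the unit string is kept as an attribute next to it.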
@mdk73 as I said this could be done, but there are lots and lots of test cases and behaviors that are needed, e.g.
so what are
these may seem completely obvious, and for the most part they are, but you have to propagate things very carefully. As I said, this is a natural attachment for the The way to investigate is to add the property and write a suite of tests that ensure correct propagation on the |
Unit-aware arrays would indeed be extremely valuable for some use cases, but it's difficult to see how they could be integrated into the core of pandas in a way that is agnostic about the particular implementation. We definitely do not want to duplicate the work of |
What about having a user define a dictionary containing any units she or he uses via
|
There are several existing approaches to units in Python -- notably pint and |
+1 on units, especially for plots with multiple axes: http://matplotlib.org/examples/axes_grid/demo_parasite_axes2.html |
Similar to #2494 |
👍 Unit awareness for a column as a string is a good enough first step: it would be nice to be able to store the unit associated with the column. I think the community will not be able to align on a unit naming convention, so this should be managed outside pandas; the conversion factors can also be managed outside pandas. Unit name challenge: there are a lot of unit aliases and in some cases conflicts. In my domain, UOM from Energistics (http://www.energistics.org/asset-data-management/unit-of-measure-standard) covers most of my needs, but I agree that for people who manage digital storage units or date-time units this may be out of scope. |
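The alias problem mentioned above can be sketched outside pandas in a few lines. This is only an illustration under stated assumptions: the function `build_alias_table`, the unit names and the alias lists are all hypothetical and are not taken from the Energistics standard or from any units package. Matching is deliberately case-sensitive, since case can carry meaning in unit symbols.

```python
def build_alias_table(definitions):
    """Map every alias to one canonical unit name; raise on conflicting aliases."""
    table = {}
    for canonical, aliases in definitions.items():
        for alias in [canonical] + list(aliases):
            key = alias.strip()
            if key in table and table[key] != canonical:
                raise ValueError(
                    "alias {!r} maps to both {!r} and {!r}".format(
                        key, table[key], canonical
                    )
                )
            table[key] = canonical
    return table


# Hypothetical definitions: canonical name -> accepted aliases.
definitions = {
    "meter": ["m", "metre"],
    "foot": ["ft"],
    "pound_mass": ["lbm"],
}
aliases = build_alias_table(definitions)

# Units stored as plain strings per column, then normalised outside pandas.
column_units = {"depth": "m", "pipe_length": "ft"}
canonical = {col: aliases[unit.strip()] for col, unit in column_units.items()}
print(canonical)  # {'depth': 'meter', 'pipe_length': 'foot'}
```

A conflicting definition, for example the same alias listed under two canonical names, fails loudly at table-building time instead of silently picking one meaning.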
I think a very straightforward way of doing this (though it will get a bit of flak from @shoyer, @wesm, @njsmith @teoliphant for not doing this in C :<) is to simply define an 'extension pandas dtype' along the lines of E.g. you would have a Then would need some modification to the ops routines to handle the interactions. |
Just to make things more explicit, this same discussion is happening in a pint issue (which is actually referenced here). I think there should be an exchange of information from both sides to make a robust solution and to avoid "reinventing the wheel", but IMHO the actual implementation should come from pint, with pandas only providing a good base for it (as some comments here have already said). |
@Bernhard10 any reason you chose not to use Pint or Quantities or another established, mature, tested, robust, popular units package? |
@Bernhard10 I think the additional About implementing it in Pint, unfortunately I'm not the man to do it (at least right now). I still have a lot to learn about Pint and I have some other urgent priorities to take care of. @mikofski I guess Pint looks like a better candidate (at least for me) because it seems more intuitive and simpler. But I guess there would be no strong argument against using Quantities. I think the point of providing a general basis for the implementation in Pandas (such as the |
@tomchor, I do think a backend approach that allows the units package to be swappable is the best approach. I also agree that Pint is easier and more popular IMHO than Quantities right now, although before Pint, Quantities was definitely the most popular, and it is still very good. @Bernhard10, if you are implementing |
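One way to picture the swappable-backend idea is a small interface that any units package could be adapted to. The sketch below is an assumption-laden illustration, not an existing pandas or Pint API: `UnitsBackend`, `TableBackend` and `UnitColumn` are hypothetical names, and the toy backend only knows a fixed table of length units.

```python
from typing import List, Protocol, Sequence


class UnitsBackend(Protocol):
    """Hypothetical interface a units package would be adapted to."""

    def convert(self, values: Sequence[float], src: str, dst: str) -> List[float]:
        ...


class TableBackend:
    """Toy backend: converts between length units via a fixed table of meters."""

    _to_meter = {"meter": 1.0, "kilometer": 1000.0, "centimeter": 0.01}

    def convert(self, values, src, dst):
        factor = self._to_meter[src] / self._to_meter[dst]
        return [v * factor for v in values]


class UnitColumn:
    """Values plus a unit name; all unit logic is delegated to the backend."""

    def __init__(self, values, unit, backend: UnitsBackend):
        self.values = list(values)
        self.unit = unit
        self.backend = backend

    def to(self, unit):
        converted = self.backend.convert(self.values, self.unit, unit)
        return UnitColumn(converted, unit, self.backend)


distance = UnitColumn([1.0, 2.5], "kilometer", TableBackend())
in_meters = distance.to("meter")
print(in_meters.values, in_meters.unit)  # [1000.0, 2500.0] meter
```

Replacing `TableBackend` with an adapter around Pint or Quantities would leave `UnitColumn` unchanged, which is the point of keeping the container agnostic about the units implementation.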
@mikofski I am currently testing with pint, but the idea of the |
684: Add pandas support r=hgrecco a=znicholls

This pull request adds pandas support to pint (hence is related to #645, #401 and pandas-dev/pandas#10349).

An example can be seen in `example-notebooks/basic-example.ipynb`.

It's a little bit hackish; feedback would be greatly appreciated by me and @andrewgsavage. One obvious example is that we have to run all the interface tests with `pytest` to fit with the `pandas` test suite, which introduces a dependency for the CI and currently gives us this awkward testing setup (see the alterations we had to make to `testsuite`). This also means that our code coverage tests are fiddly too.

If you'd like us to squash the commits, that can be done.

If pint has a linter, it would be good to run that over this pull request too, as we're a little bit all over the place re style.

Things to discuss:
- [x] general feedback and changes
- [x] test setup, especially need for pytest for pandas tests and hackish way to get around automatic discovery
- [x] squashing/rebasing
- [x] linting/other code style (related to #664 and #628: we're happy with whatever, I've found using an automatic linter e.g. black and/or flake8 has made things much simpler in other projects)
- [x] including notebooks in the repo (if we want to, I'm happy to put them under CI so we can make sure they run)
- [x] setting up the docs correctly

Co-authored-by: Zebedee Nicholls <[email protected]>
Co-authored-by: andrewgsavage <[email protected]>
On the issue of dealing with units in pandas: I think pandas can deal very well with time series, and I am very thankful for that because it is a very useful feature. Dealing with units should, in my opinion, be another core feature of pandas (especially as part of NumFOCUS). Thanks in advance for your feedback. |
Hi there, it would indeed be great to be able to store the measurement unit for the columns of a DataFrame, or as some kind of metadata. Given the breadth of use of pandas, of which only a small subset cares deeply about units, I am of the opinion that unit conversion could be handled externally by packages like Is there a suggested way to simply store measurement units in a Pandas DataFrame column? |
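For storage only, one option that exists in recent pandas versions is the `DataFrame.attrs` dictionary, which the pandas documentation marks as experimental. The sketch below assumes pandas 1.0 or later; the `"units"` key and the column names are made up for illustration, and pandas itself attaches no meaning to them.

```python
import pandas as pd

df = pd.DataFrame({"length": [1.0, 2.0], "mass": [70.0, 80.0]})

# Assumption: units live under a self-chosen "units" key in df.attrs.
# pandas does not interpret, check or convert these strings.
df.attrs["units"] = {"length": "meter", "mass": "kilogram"}

for column in df.columns:
    print(column, df.attrs["units"].get(column, "unknown"))
```

Because `attrs` is experimental, whether the dictionary survives a given operation should be checked rather than assumed, and any conversion would still be done by an external package.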
I think I have an idea how to marry Extending a pandas DataFrame:
import pandas as pd

class QDataFrame(pd.DataFrame):
    # normal properties
    _metadata = ["units"]

    def __init__(self, *args, units=None, **kwargs):
        super().__init__(*args, **kwargs)
        # use None as the default to avoid sharing one mutable list between instances
        self.units = [] if units is None else units

    @property
    def _constructor(self):
        return QDataFrame
Then an accessor like this for the series.
@pd.api.extensions.register_series_accessor("quantity")
class QuantityAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def to(self, unit):
        # relies on the Series carrying a `unit` attribute, which is the open problem described below
        return (self._obj.values * self._obj.unit).to(unit)
I'm struggling with how to propagate the list of units to the single column via |
As an update on efforts with pint-pandas, we have hit the wall with #35131. The crux is how to handle this scalar/array ambiguity without potentially destroying performance elsewhere. The first attempt at a fix was #35127, that was superseded by #39790 but that went stale. The latest person to raise this (as far as I'm aware) is #43196. It seems this is still an issue, but not one which is high enough priority to get over the tricky hurdles required to allow things to move forward. |
Is there a way to make an extra "Dimension" of the Database (not just Row & Column)? So you could have "Unit" strings (or any other data), associated with each Column, in addition to the Column "name". I am thinking of a solution that may already exist within Pandas - and a documentation suggestion/example may show users a simple implementation. |
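One existing pandas feature that comes close to this is a column `MultiIndex`, where the unit is stored as a second header level next to the column name. This is a minimal sketch, assuming units are plain strings that pandas does not interpret; the level names `name` and `unit` and the sample data are made up.

```python
import pandas as pd

# Each column label is a (name, unit) pair; the unit is an uninterpreted string.
columns = pd.MultiIndex.from_tuples(
    [("length", "meter"), ("mass", "kilogram")],
    names=["name", "unit"],
)
df = pd.DataFrame([[1.0, 70.0], [2.0, 80.0]], columns=columns)

# Look up the unit recorded for each column name.
units = dict(df.columns.tolist())
print(units)  # {'length': 'meter', 'mass': 'kilogram'}

# Drop the unit level to get back an ordinary single-level frame.
plain = df.droplevel("unit", axis=1)
print(plain["length"].tolist())  # [1.0, 2.0]
```

The unit level travels with selections and is written out as an extra header row by `to_csv`, but arithmetic between columns does not check or combine units; that part still needs a units package.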
Looks like pint-pandas (https://github.com/hgrecco/pint-pandas) is a 3rd party library that provides this support which is mentioned in our ecosystem docs so closing |
quantities related
xref #2494
xref #1071
custom meta-data
xref #2485
It would be very convenient if unit support could be integrated into pandas.
Idea: pandas checks for the presence of a unit-attribute of columns and - if present - uses it
For my example I use the module pint and add an attribute 'unit' to columns (and a 'title'...).
Example: