-
-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solving crop_by_coord problem when data are rotated #113
Conversation
ndcube/ndcube.py
Outdated
@@ -502,7 +537,7 @@ class NDCubeOrdered(NDCube): | |||
for standard deviation or "var" for variance. A metaclass defining | |||
such an interface is NDUncertainty - but isn’t mandatory. If the uncertainty | |||
has no such attribute the uncertainty is stored as UnknownUncertainty. | |||
Defaults to None. | |||
Defaults to None.list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mistake removed from the file
ndcube/ndcube.py
Outdated
@@ -85,22 +87,30 @@ def world_axis_physical_types(self): | |||
pass | |||
|
|||
@abc.abstractmethod | |||
def crop_by_coords(self, min_coord_values, interval_widths): | |||
def crop_by_coords(self, lower_corner, interval_widths=None, upper_corner=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An API point here, I think it would be worth moving interval_widths
into **kwargs
so that it is no longer part of the function signature. We can still support it by doing a kwargs.pop('interval_widths', None)
as the ~first line of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @DanRyanIrish told me to do it like that because some people will use this version as the previous : by inputting corp_by_coords( (low1, low2), (inter1, inter2) )
.
If I move interval_widths
into **kwargs
, I am afraid that people use upper_corner
instead of interval_widths
and the result will not be correct.
I don't know if you saw it but I add a warning about using interval_widths
where I ask the user to use upper_corner
.
Of course I am open to every modification and if you want I change it, I'll do it !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, you are correct. never mind then.
Since we are contemplating an API change, can we perhaps consider moving the units into a separate tuple on things like
Why not have the units on a separate argument, e.g.:
This seems much more readable. If |
@tiagopereira your first example is a little more verbose than it needs be the following would also work: >>> import astropy.units as u
>>> my_cube_roi = my_cube.crop_by_coords(
... [*(0.7, 1.3e-5)*u.deg, 1.04e-9*u.m],
... [*(0.6, 1.)*u.deg, 0.08e-9*u.m]) I am however not adverse to a (sorry for accidentally posting and closing the issue lol) |
I believe the current API is better for philosophical reasons, namely that all values of a physical quantity should be represented with units attached. I believe it is also more useful in many cases. However, in the case where the user is entering these values manually on the command line, I agree it does require duplication of the unit definition in the corner arguments. So I am open to discussing whether we support this additional API simultaneous with the current one. My main concern about this API from a coding point of view, is that it introduces a potentially unnecessary for loop at the start of the method where the I agree with @Cadair that it should not have a default unit, but rather require explicit definition. The default would be derived from the WCS units, which are in SI. These are not the typical units used in solar physics so the majority of solar users will have to explicitly give the units anyway. It also leaves open the chance that users will be confused and frustrated when they enter values in arcsec, nm or erg without defining the units and the cropping happens in deg, m, or J. |
raise ValueError("lower_corner and interval_widths must have " | ||
"same number of elements as number of data " | ||
"dimensions.") | ||
upper_corner = [lower_corner[i] + interval_widths[i] for i in range(n_dim)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if you reverse the order of these if
statements. That way you can issue the warning and construct upper_corner
from interval_widths
without checking the number of elements is right. Then you can just check the length of upper_corner
without requiring an if
statement since if it wasn't defined by the user, you will have defined in the if interval_widths:
if statement.
Then when it comes time to stop supporting the interval_widths
API, we just have to delete the if interval_widths:
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i delete the elements number checking between lower_corner
, interval_widths
and n_dim
, an error will come by creating the upper_corner
values because one of these tuples will be out of range. Do you want I remove this error raising ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're absolutely right! Yes, leave in the length checking in both statements, but check interval_widths
first. Then you can also add in a check that interval_widths
and upper_corner
were not both set:
# Calculation of upper_corner with the inputing interval_widths
# This part of the code will be removed in version 2.0
if interval_widths:
warnings.warn("interval_widths will be removed from the API in "
"version 2.0, please use upper_corner argument.")
# If both intervals_widths and upper_corner set, raise error.
if upper_corner:
raise ValueError("Only one of interval_widths or upper_corner can be set. Recommend using upper_corner as interval_widths is deprecated.")
if (len(lower_corner) != len(interval_widths)) or (len(lower_corner) != n_dim):
raise ValueError("lower_corner and interval_widths must have "
"same number of elements as number of data "
"dimensions.")
upper_corner = [lower_corner[i] + interval_widths[i] for i in range(n_dim)]
# Raising a value error if the arguments have not the same dimensions.
if (len(lower_corner) != len(upper_corner)) or (len(lower_corner) != n_dim):
raise ValueError("lower_corner and upper_corner must have "
"same number of elements as number of data "
"dimensions.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will input this line in my code, I already exchanged the if statements positions.
ndcube/ndcube.py
Outdated
@@ -409,21 +419,46 @@ def extra_coords(self): | |||
"value": self._extra_coords_wcs_axis[key]["value"]} | |||
return result | |||
|
|||
def crop_by_coords(self, min_coord_values, interval_widths): | |||
def crop_by_coords(self, lower_corner, interval_widths=None, upper_corner=None): | |||
# The docstring is defined in NDDataBase | |||
|
|||
n_dim = len(self.dimensions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this wasn't your change, but can you replace this is n_dim = self.data.ndim
? It's more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
In fact, I have a suggestion for third API option. @Cadair wont like it because I don't see an easy way to maintain the current API and this one simultaenously. But I think it makes the code more efficient and succinct. It solves @tiagopereira's concern about duplicating the unit definition as my concern about not separating the values from the units. Have the user supply N (number of dimensions) my_cube_roi = my_cube.crop_by_coords(Quantity([0.7, 1.3], unit="deg"),
Quantity([1.3e-5, 1+1.3e-5], unit="deg"),
Quantity([1.04e-9, 1.12e-9], unit="m")) This is the same API as with def crop_by_coords(self, *quantity_list):
# Get all corners of region of interest. Perhaps there is a tidier/more efficient way of doing this.
all_world_corners_grid = np.meshgrid(*[axis.value for axis in quantity_list])
all_world_corners = [all_world_corners_grid[i].flatten()*quantity_list[i].unit
for i in range(n_dim)]
# Convert to pixel coordinates
all_pix_corners = self.world_to_pixel(*all_world_corners)
# Derive slicing item with which to slice NDCube.
# Be sure to round down min pixel and round up + 1 the max pixel.
item = tuple([slice(int(axis_pixels.value.min()),
int(np.ceil(axis_pixels.value.max()))+1)
for axis_pixels in all_pix_corners])
return self[item] As you can see, |
@DanRyanIrish you are quite correct I won't like it. However, not really because of the API change. The two main objections I have are:
|
The extension to def crop_by_coords(self, lower_corner, upper_corner):
# Get all corners of region of interest. Perhaps there is a tidier/more efficient way of doing this?
all_world_corners_grid = np.meshgrid(
*[Quantity([lower_corner[i], upper_corner[i]], unit=lower_corner[i].unit).value
for i in range(self.data.ndim)])
all_world_corners = [all_world_corners_grid[i].flatten()*lower_corner[i].unit
for i in range(n_dim)]
# Convert to pixel coordinates
all_pix_corners = self.world_to_pixel(*all_world_corners)
# Derive slicing item with which to slice NDCube.
# Be sure to round down min pixel and round up + 1 the max pixel.
item = tuple([slice(int(axis_pixels.value.min()),
int(np.ceil(axis_pixels.value.max()))+1)
for axis_pixels in all_pix_corners])
return self[item] |
ndcube/ndcube.py
Outdated
item = tuple([slice(int(lower_pixels[i]), int(upper_pixels[i])) for i in range(n_dim)]) | ||
|
||
item = tuple([slice(int(lower_pixels[i]), int(upper_pixels[i])) | ||
for i in range(n_dim)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, why are you breaking up this line and others like it into two? According to the PEP8 standard, lines of code can be up to 99 characters long. It doesn't affect the functionality. It's just a coding style question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am coding on Atom and I have a line at 70 characters that I have to avoid to exceed.
Then I am coding with three windows on my screen so I only can see around 75 characters. This is better for me. If you want, I can modify it but I think that for user with a lot of code windows, it is hard to see everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's general to stick with the PEP8 standard and have lines up to 99 characters. The 70 line limit and size of your screen is specific to you and wont make it the code more readable for others. Not every line has to be 99 characters of course. Split lines where it makes sense. But if one line is less than 99 characters, don't split it over two. As I said, it's more readable and keeps the style standardised with the majority of other Python code out there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will change that ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 says "Limit all lines to a maximum of 79 characters." This is a matter of preference, but I think it is a bad idea to go over 80. If the python standard library can make it under 80 characters, I don't see why we shouldn't be able to. The vertical line in Atom should default to 80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general policy in SunPy core is 99 since they added this to PEP8:
For code maintained exclusively or primarily by a team that can reach agreement on this issue, it is okay to increase the nominal line length from 80 to 100 characters (effectively increasing the maximum length to 99 characters), provided that comments and docstrings are still wrapped at 72 characters.
I leave the decision up to @DanRyanIrish though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with the SunPy core policy of 99 character-long lines. I believe the 80 character limit comes from the days when computers were operated with punch cards! Nowadays most people have big and/or multiple screens so viewing 99 characters isn't a problem. It also encourages (or doesn't inhibit) the use of longer, more explicit variable names which is a good thing. I find that this and reducing cases where code is split over multiple line makes code far more readable. And I'm sure you can change the right margin of editor's like Atom to 100 characters.
Hi @BaptistePellorceAstro. I know we are still discussing implementation and API changes so there is a bit of uncertainty. But irrespective of that, The tests for
|
Hi, I follow all you ask me to do and now I want to try the test to see if it is working before push my work. can you explain me how to do ? |
Great! Of course. On a terminal change to the top level ndcube directory. It should have a file called setup.py. Once you're there, type on the command line |
I succeed to run the test but there is an error with the previous data from the test. I will try to understand why. Do you want I push ? I used an other command, I will try with yours. |
Ok, I have the same return : So what I understand is that the first tuple of data is failing in the test no ? |
Sure. There's never harm in pushing. The error in the last test may be because you've changed something. So yes, it is important to understand why that is now failing. |
Hello @BaptistePellorceAstro! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on April 27, 2018 at 09:10 Hours UTC |
I think I have found the problem. They have : Firstly, they ask to crop the X axis from 1nm to 1m. This is a wide crop, so data are not crop in this axis because the data shape in X axis is 4 pixels and values are increasing by step of 0.02 nm per pixel. Secondly, they ask to crop the Z axis from 0.7 deg to 1.7 deg. This is also a wide crop for the data because the data shape for this axis is only 2 pixels. By step of 0.4 deg as they ask, they are out of range (they need at least 4 pixels). One thing I saw here and on my test, the plotting function seems to be incorrect : in this axis, one pixel is missing in the plot. Do you know why ? Then, they ask to crop the Y axis from 1.3e-5 deg to 1 deg. This crop is taking 3 pixels because the step is 0.5 deg per pixel. However 3 pixel is the shape of the data in Y axis. To conclude, this crop is useless and crop nothing from these data. However, they ask a return as |
This analysis seems reasonable. Feel free to adjust the inputs to the first test to make a more reasonable check of this method in the case where the coordinate and pixel grids align. As for the plotting, don't worry about it. There are open PRs which will be merged once sunpy 0.9 is released and will make major changes to the plotting. Energy chasing down that bug at this time probably isn't worth it. |
Nice, I was just wanting to return you that. I will change the first inputs values to solve this test. |
Good job! |
Hi, I have modified the inputs and the output expected for the align grid test. I am limited by selecting a part of data due to a plotting error. As you said before, I don't take care of that and I focused on the smaller part I can select. |
I added the |
ndcube/ndcube.py
Outdated
if type(lower_corner[i]) is not u.quantity.Quantity: | ||
lower_corner[i] = u.Quantity(lower_corner[i], unit=units[i]) | ||
if type(upper_corner[i]) is not u.quantity.Quantity: | ||
upper_corner[i] = u.Quantity(upper_corner[i], unit=units[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to check the types in this for loop. If the type is already a Quantity
the effect will be to convert the Quantity
to the given unit. This is probably a more intuitive behaviour if a user gives both Quantities
and sets the units
kwarg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, I didn't think that it is working like that ! I go modify that.
ndcube/ndcube.py
Outdated
if type(lower_corner[i]) is not u.quantity.Quantity or \ | ||
type(upper_corner[i]) is not u.quantity.Quantity: | ||
raise ValueError("The inputs are not Quantity objects, you " | ||
"can use the units argument to set it") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing this error message slightly to something like: "lower_corner and interval_widths/upper_corner must be of type astropy.units.Quantity or the units kwarg must be set."
.
Alos, the type of error should be TypeError
, not ValueError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
@BaptistePellorceAstro, this is looking very good. I'll have another look at it a bit later. But I think it's getting to be being ready to merge. |
One of the tests is now failing so something needs to be fixed. Have a look at the online test report here: https://travis-ci.org/sunpy/ndcube/jobs/371049917 If you have questions I can answer them a bit later. |
ndcube/ndcube.py
Outdated
if type(upper_corner[i]) is not u.quantity.Quantity: | ||
upper_corner[i] = u.Quantity(upper_corner[i], unit=units[i]) | ||
else: | ||
for i in range(n_dim): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to do this like loop and if statement in one go:
if any([not isinstance(x, u.Quantity) for x in lower_corner + upper_corner]):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice ! That is what I was looking for, thanks !
ndcube/ndcube.py
Outdated
'number of data dimensions.') | ||
# If inputs are not Quantity objects, they are modified into specified units | ||
for i in range(n_dim): | ||
if type(lower_corner[i]) is not u.quantity.Quantity: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need these if statements here just:
for i in range(n_dim):
lower_corner[i] = u.Quantity(lower_corner[i], unit=units[i])
upper_corner[i] = u.Quantity(upper_corner[i], unit=units[i])
is sufficient. (Converting units in this case is cleaner and probably faster)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated lines, changes are already made.
This is looking good @BaptistePellorceAstro. Let us know when you feel it is ready for another review. |
I have finished to work on this version. I am waiting your reviews to continue. |
ndcube/ndcube.py
Outdated
lower_pixels = corners_array.min(0) | ||
upper_pixels = corners_array.max(0) | ||
# Creating a tuple to crop the data with inputed coordinates | ||
item = tuple([slice(int(lower_pixels[i]), int(upper_pixels[i]) + 1) for i in range(n_dim)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the algorithm from L466 to 483 should be replaced with the following:
# Get all corners of region of interest.
all_world_corners_grid = np.meshgrid(*[u.Quantity([lower_corner[i], upper_corner[i]], unit=lower_corner[i].unit).value for i in range(self.data.ndim)])
all_world_corners = [all_world_corners_grid[i].flatten()*lower_corner[i].unit for i in range(n_dim)]
# Convert to pixel coordinates
all_pix_corners = self.world_to_pixel(*all_world_corners)
# Derive slicing item with which to slice NDCube.
# Be sure to round down min pixel and round up + 1 the max pixel.
item = tuple([slice(int(axis_pixels.value.min()), int(np.ceil(axis_pixels.value.max()))+1) for axis_pixels in all_pix_corners])
This should make the code more efficient as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are your lines checking the boundary conditions of the data ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you have any questions about what this code is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent point! If the derived indices are larger than the extent of that axis, that's fine. It'll just return to the maximum extent of the data. But if the pixels are negative that will cause incorrect behaviour.
So we need to ensure the start pixel values in the slice item are non-negative. There's probably a clever way to do this. Let me give it some quick though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok it's clear ! I update and commit again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This can be done using np.clip
. See the documentation here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.clip.html
So in our case we would do:
item = tuple([slice(int(np.clip(axis_pixels.value.min(), 0, None)), int(np.ceil(axis_pixels.value.max()))+1) for axis_pixels in all_pix_corners])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, your comments just appears, I modify that !
ndcube/ndcube.py
Outdated
# Get all corners of region of interest. | ||
all_world_corners_grid = np.meshgrid(*[u.Quantity([lower_corner[i], upper_corner[i]], | ||
unit=lower_corner[i].unit).value | ||
for i in range(self.data.ndim)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for code readability:
all_world_corners_grid = np.meshgrid(
*[u.Quantity([lower_corner[i], upper_corner[i]], unit=lower_corner[i].unit).value
for i in range(self.data.ndim)]
It makes the code more readable but minimizing the break up of single items (e.g. Quantities) over multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @BaptistePellorceAstro. This comment got outdated incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I was thinking about it ! I am solving some test bugs from the last commits.
ndcube/ndcube.py
Outdated
@@ -1,6 +1,8 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
import abc | |||
import warnings | |||
from itertools import product |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this import as it is no longer used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I missed this one. I will delete it tomorrow.
@Cadair, the algorithm in this PR has changed. Any comments? |
Congratulations @BaptistePellorceAstro one another merged PR! 🎉 |
@DanRyanIrish I assume this should be milestoned 1.0.2 to be released in a bugfix? |
Hi @BaptistePellorceAstro. Yes, that's a good point. I should speak with @Cadair and figure out how to backport the bugfixes, including this one, that we want to release. |
Sorry @Cadair. Misread who this comment was from!!! Yes, we should include it in the next bugfix release. |
@DanRyanIrish can we try and make sure that all PRs have a milestone so that we know if they should be backported or not? i.e. we should have a 1.1 and a 1.0.x (currently 1.0.2) milestone for the next bugfix release. I will backport this. |
Solving crop_by_coord problem when data are rotated
Hi @Cadair. Yes, we can do that. I will go through the PRs when I get a chance and milestone them. |
Here a first version to solve the
crop_by_coords
problem when the data are rotated.Discussion about it was on issue #99 and issue #100
Ping @DanRyanIrish