Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transformations to polygons and points bounding box query #143

Merged
merged 8 commits into from
Feb 24, 2023

Conversation

kevinyamauchi
Copy link
Collaborator

Hey @LucaMarconato . Here are the transforms added to the bounding box query for points and polygons. This gets the main infrastructure in. We can extend the tests for a few corner cases in a follow up PR.

A couple of notes:

  • the queries now return None if no data matches the query. This way the user doesn't have to figure out the specific way to check in an element is empty for each of the different types of elements.
  • I had to reset the index on the DaskDataFrame when subsetting the points. Otherwise, the tables get misaligned during the transform because the index doesn't start from 0 and the index of the transformed points that are re-inserted do start from 0. This feels a bit hacky, but I think it's okay for now.

I won't be able to work on this tomorrow, so please feel free to take the PR over and directly push to it!

@@ -206,7 +207,7 @@ def _(data: DaskDataFrame, transformation: BaseTransformation) -> DaskDataFrame:
indices = xtransformed["dim"] == ax
new_ax = xtransformed[:, indices].data.flatten()
# mypy says that from_array is not a method of DaskDataFrame, but it is
transformed[ax] = da.from_array(np.array(new_ax)) # type: ignore[attr-defined]
transformed[ax] = dd.from_dask_array(da.from_array(np.array(new_ax))) # type: ignore[attr-defined]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one is supposed to add a Series, not an array here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this code, now it is fully lazy. I still use a dask array and not a series. I think it works both way, but if there will be performance problems or some other problems I will keep this in mind and maybe check again.

@LucaMarconato
Copy link
Member

LucaMarconato commented Feb 23, 2023

I have reviewed this code and made mainly two changes:

  • the transform points function is now lazy and doesn't convert back and forth to numpy arrays
  • I realized that I could simplify the bounding box query for polygons (not points) by using shapely + geopandas APIs.

If you are ok with the changes, please merge the PR into feature/query_transforms; and I will add the image part.

@kevinyamauchi kevinyamauchi merged commit 98e7f31 into feature/query_transforms Feb 24, 2023
@kevinyamauchi kevinyamauchi deleted the feature/better_query branch February 24, 2023 06:05
@kevinyamauchi
Copy link
Collaborator Author

Thanks @LucaMarconato ! I went ahead and merged to keep things moving forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants