Potential speed-up for creating spot column in atl03sp #388

Closed
alma-pi opened this issue Apr 16, 2024 · 2 comments
alma-pi commented Apr 16, 2024

In sliderule/clients/python/sliderule/icesat2.py, the 'spot' column is calculated with a row-wise apply (axis=1) on the GeoDataFrame, which calls the __calcspot function once per row. Using pandas.Series.map here instead should be much faster.

atl03['spot'] = atl03.apply(lambda row: sliderule.icesat2.__calcspot(row["sc_orient"], row["track"], row["pair"]), axis=1)

For a granule of about 2 million photons, this takes about 19s. Using a dictionary and pandas.Series.map takes less than 2s:

# Create dictionary mapping (sc_orient, track, pair) to spot
map_spot = {(0,1,0): 1,
            (0,1,1): 2,
            (0,2,0): 3,
            (0,2,1): 4,
            (0,3,0): 5,
            (0,3,1): 6,
            (1,1,0): 6,
            (1,1,1): 5,
            (1,2,0): 4,
            (1,2,1): 3,
            (1,3,0): 2,
            (1,3,1): 1,}

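# Build one (sc_orient, track, pair) tuple per photon, then look each one up
tmp = pd.Series(zip(atl03['sc_orient'], atl03['track'], atl03['pair']))
# .values assigns by position, so the index of tmp need not match atl03's
atl03['spot'] = tmp.map(map_spot).values
del tmp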

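Note that pandas.Series.map returns NaN for any key that is missing from the dictionary, so an unexpected (sc_orient, track, pair) combination yields NaN and the column becomes float.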
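To illustrate the missing-key behaviour, here is a minimal sketch on a made-up three-row frame rather than real ATL03 data. The frame contents and the cast to the nullable `Int64` dtype are assumptions for illustration, not part of the SlideRule code.

```python
import pandas as pd

# Mapping (sc_orient, track, pair) -> spot, copied from the dictionary above
map_spot = {(0, 1, 0): 1, (0, 1, 1): 2, (0, 2, 0): 3, (0, 2, 1): 4,
            (0, 3, 0): 5, (0, 3, 1): 6, (1, 1, 0): 6, (1, 1, 1): 5,
            (1, 2, 0): 4, (1, 2, 1): 3, (1, 3, 0): 2, (1, 3, 1): 1}

# Toy stand-in for the atl03 frame (assumption); the last row uses
# sc_orient == 2, which is not a key in map_spot
atl03 = pd.DataFrame({"sc_orient": [0, 1, 2],
                      "track": [1, 3, 1],
                      "pair": [0, 1, 0]})

keys = pd.Series(list(zip(atl03["sc_orient"], atl03["track"], atl03["pair"])))
spot = keys.map(map_spot)  # the unmatched key becomes NaN, so dtype is float64

# Optional: keep integer spots with an explicit missing marker (<NA>)
atl03["spot"] = spot.astype("Int64").values
```

If a float column is acceptable, the cast can be dropped; whether the client should keep NaN or substitute some sentinel value is a design choice for the maintainers.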
It's also possible to change the current function to accept a tuple as input. Along the lines of:

def __calcspot(input_tuple):
    sc_orient, track, pair = input_tuple
    [...]

tmp = pd.Series(zip(atl03['sc_orient'], atl03['track'], atl03['pair']))
atl03['spot'] = tmp.map(__calcspot).values
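As a rough sketch of that second option, the function below takes a tuple and reproduces the dictionary's mapping with arithmetic. `calcspot_from_tuple` is a hypothetical name, its body is derived only from the table above and is not the actual `__calcspot` implementation, and returning 0 for unrecognized input is an assumption.

```python
import pandas as pd

def calcspot_from_tuple(input_tuple):
    """Hypothetical stand-in for a tuple-accepting __calcspot."""
    sc_orient, track, pair = input_tuple
    if sc_orient not in (0, 1) or track not in (1, 2, 3) or pair not in (0, 1):
        return 0  # assumed placeholder for unrecognized input
    # Numbering used when sc_orient == 0: spots 1..6 in (track, pair) order
    forward_spot = 2 * (track - 1) + pair + 1
    # When sc_orient == 1 the numbering is reversed (6..1)
    return forward_spot if sc_orient == 0 else 7 - forward_spot

# Toy stand-in for the atl03 frame (assumption)
atl03 = pd.DataFrame({"sc_orient": [0, 0, 1, 1],
                      "track": [1, 3, 1, 2],
                      "pair": [0, 1, 0, 1]})

tmp = pd.Series(list(zip(atl03["sc_orient"], atl03["track"], atl03["pair"])))
atl03["spot"] = tmp.map(calcspot_from_tuple).values
```

Unlike the dictionary lookup, this variant still makes one Python function call per row, so it may not reach the same speed-up as mapping with a dictionary.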

@jpswinski
Member

Thanks @alma-pi! This is a great optimization. I'll update this issue when we get it into the code, and it should be out with the next release.

@jpswinski
Member

@alma-pi, the change you outlined has been made and pushed. It didn't make it into this past release, but it will come out with the next one.

After implementing your change, I saw a significant speed-up in atl03 processing calls: requests that took ~60 seconds now take ~35 seconds.
