Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add /timeseries endpoints #33

Merged
merged 7 commits into from
Nov 7, 2024
Merged

add /timeseries endpoints #33

merged 7 commits into from
Nov 7, 2024

Conversation

hrodmn
Copy link
Contributor

@hrodmn hrodmn commented Oct 22, 2024

The line count looks scary but the big changes here include ~600 lines of code in the new titiler/cmr/timeseries.py file. The rest is in a notebook, a VCR cassette, and tests.

The goal is to start refining the timeseries API spec by testing out a bunch of CMR collections and seeing what kind of parameters make sense for most cases.

To see some examples of the /timeseries endpoints in action and some descriptions of the API for specifying a timeseries, check out the new timeseries notebook

I am really interested in getting some feedback on the specification of the timeseries parameters. It feels like there are a lot of knobs so maybe there are ways to simplify it.

@hrodmn hrodmn self-assigned this Oct 22, 2024
@hrodmn hrodmn force-pushed the feature/timeseries branch 3 times, most recently from 8a664f0 to a827171 Compare October 28, 2024 18:48
@hrodmn hrodmn changed the title DRAFT: add /timeseries endpoints add /timeseries endpoints Oct 31, 2024
@hrodmn hrodmn marked this pull request as ready for review October 31, 2024 02:50
@hrodmn hrodmn force-pushed the feature/timeseries branch 2 times, most recently from 3a4253e to 96c28d6 Compare October 31, 2024 12:16
Comment on lines 35 to 191
Optional[str],
Query(
description="Start datetime for timeseries request",
),
] = None
end_datetime: Annotated[
Optional[str],
Query(
description="End datetime for timeseries request",
),
] = None
step: Annotated[
Optional[str],
Query(
description="Time step between timeseries intervals, expressed as [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations)"
),
] = None
step_idx: Annotated[
Optional[int],
Query(description="Optional (zero-indexed) index of the desired time step"),
] = None
exact: Annotated[
Optional[bool],
Query(
description="If true, queries will be made for a point-in-time at each step. If false, queries will be made for the entire interval between steps"
),
] = None
datetimes: Annotated[
Optional[str],
Query(
description="Optional list of comma-separated specific time points or time intervals to summarize over"
),
] = None


def parse_duration(duration: str) -> relativedelta:
"""Parse ISO 8601 duration string to relativedelta."""
match = re.match(
r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?",
duration,
)
if not match or not any(m for m in match.groups()):
raise ValueError(f"{duration} is an invalid duration format")

years, months, weeks, days, hours, minutes, seconds = [
float(g) if g else 0 for g in match.groups()
]
return relativedelta(
years=int(years),
months=int(months),
weeks=int(weeks),
days=int(days),
hours=int(hours),
minutes=int(minutes),
seconds=int(seconds),
microseconds=int((seconds % 1) * 1e6),
)


def generate_datetime_ranges(
start_datetime: datetime, end_datetime: datetime, step: str, exact: bool = False
) -> List[Union[Tuple[datetime], Tuple[datetime, datetime]]]:
"""Generate datetime ranges"""
start = start_datetime
end = end_datetime
step_delta = parse_duration(step)

ranges: List[Union[Tuple[datetime], Tuple[datetime, datetime]]] = []
current = start

step_timedelta = (current + step_delta) - current
is_small_timestep = step_timedelta <= timedelta(seconds=1)

while current < end:
if exact:
# For exact case, return a tuple with just one exact datetime
next_step = current + step_delta
ranges.append((current,))
else:
next_step = min(current + step_delta, end)
if next_step == end:
ranges.append((current, next_step))
break

if is_small_timestep:
# Subtract 1 millisecond for small timesteps
ranges.append((current, next_step - timedelta(microseconds=1)))
else:
# Subtract 1 second for larger timesteps
ranges.append((current, next_step - timedelta(seconds=1)))

current = next_step

if current == end:
ranges.append((end,))

if not ranges:
return [(start, end)]

return ranges
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this would eventually move to titiler.extensions.timeseries but I wanted to do the development all in one place to facilitate easier iteration during early stages of development.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of the classes and models defined at the top of this module will eventually go to titiler.extensions.timeseries so they can be re-used by other applications.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@abarciauskas-bgse abarciauskas-bgse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, I just made comments about clarifying some of the function and dependency descriptions. I focused on the notebook and titiler/cmr/timeseries.py. If time allows, I would like to understand how the VCR and data/podaac... files are being used (presumably in the tests).

titiler/cmr/timeseries.py Outdated Show resolved Hide resolved
titiler/cmr/timeseries.py Show resolved Hide resolved
titiler/cmr/timeseries.py Outdated Show resolved Hide resolved
titiler/cmr/timeseries.py Show resolved Hide resolved
base_url=str(factory.url_for(request, "geojson_statistics")),
request=request,
param_list=query,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand how this works: the timeseries params are generated as part of the dependency timeseries_query_no_bbox - which generates a query object that is list of objects each containing a concept id and datetime. Build request urls returns a list of urls, each of which is a statistics endpoint, and then in the line below a request is made to each of those statistics endpoints. Is that understanding correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abarciauskas-bgse yes that is correct! The code in the /timeseries endpoints just converts the timeseries query parameters into a list of datetime parameters, then a fires off a lower-level request for each unique datetime parameter. Then there is a little bit of code to organize the results (e.g. list of PNGs to a GIF, list of statistics geojson output into a single geojson).

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally blown away by the GIF possibilities 🎉 🌍 I unfortunately wasn't able to fully dive into a review today, so I will try it out more on Monday but wanted to share a couple questions/comments so far.

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time. If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

titiler/cmr/timeseries.py Outdated Show resolved Hide resolved
@hrodmn
Copy link
Contributor Author

hrodmn commented Nov 2, 2024

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time.

I think we could just extend the original /bbox endpoint but there would wind up being a big if "gif" then ... else: ... control flow since the timeseries functionality is just farming out the requests to the /bbox endpoint again. Maybe there would be an elegant way to write a recursive approach.

If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

This would definitely make sense if we are just going to stick with the gif format but I want to add support for mp4 export, too, so I kept the format argument in there.

I think the /bbox endpoint label does a bad job describing what it actually does, so maybe it would make more sense to do /timeseries/{format} where format could be gif or mp4.

@@ -26,5 +26,5 @@ RUN if [ -z "$EARTHDATA_USERNAME" ] || [ -z "$EARTHDATA_PASSWORD" ]; then \
# http://www.uvicorn.org/settings/
ENV HOST 0.0.0.0
ENV PORT 80
CMD uv run uvicorn titiler.cmr.main:app --host ${HOST} --port ${PORT} --log-level debug
CMD uv run uvicorn titiler.cmr.main:app --host ${HOST} --port ${PORT} --log-level debug --reload
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't really need the reload here because the code is copied not mounted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it because I mount the titiler/ directory in docker-compose.yml. It is redundant to COPY and mount the volume but it is convenient for local development to have the reload functionality.

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@maxrjones
Copy link
Member

maxrjones commented Nov 4, 2024

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time.

I think we could just extend the original /bbox endpoint but there would wind up being a big if "gif" then ... else: ... control flow since the timeseries functionality is just farming out the requests to the /bbox endpoint again. Maybe there would be an elegant way to write a recursive approach.

If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

This would definitely make sense if we are just going to stick with the gif format but I want to add support for mp4 export, too, so I kept the format argument in there.

I think the /bbox endpoint label does a bad job describing what it actually does, so maybe it would make more sense to do /timeseries/{format} where format could be gif or mp4.

The timeseries/bbox url calls the images/bbox url to get the individual images to combine into a gif, right? If so, it might be safer to keep them separate to avoid a potential for recursive lambda calls. I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

@hrodmn
Copy link
Contributor Author

hrodmn commented Nov 6, 2024

I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

I went with /timeseries/bbox/{xmin},{ymin},{xmax},{ymax}.{format} to be consistent with the non-timeseries /bbox endpoint, but I agree with @maxrjones that /timeseries/bbox path makes it less than obvious what it actually does. @vincentsarago do you have an opinion on staying consistent with /bbox or more creating a more descriptive with /timeseries/{format} (-> /timeseries/gif, /timeseries/mp4)?

@maxrjones
Copy link
Member

I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

I went with /timeseries/bbox/{xmin},{ymin},{xmax},{ymax}.{format} to be consistent with the non-timeseries /bbox endpoint, but I agree with @maxrjones that /timeseries/bbox path makes it less than obvious what it actually does. @vincentsarago do you have an opinion on staying consistent with /bbox or more creating a more descriptive with /timeseries/{format} (-> /timeseries/gif, /timeseries/mp4)?

sounds good, I think this could be considered a documentation issue. the self-describing APIs are nice for maintenance, but a downside is new people may not find it sufficient for finding features.

@hrodmn
Copy link
Contributor Author

hrodmn commented Nov 7, 2024

Thank you all for your thoughtful reviews of these changes!!

After thinking it over some more I want to stick with /timeseries/bbox for the sake of consistency with other titiler APIs. I added some more verbose descriptions to the openapi documentation to make the purpose of each /timeseries endpoint more clear. I also opened up an issue to actually publish the docs to the web! #38

@hrodmn hrodmn merged commit 96e0d8c into develop Nov 7, 2024
4 checks passed
@hrodmn hrodmn deleted the feature/timeseries branch November 7, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants