Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add validation handler and deps #269

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nickmasson
Copy link
Contributor

@nickmasson nickmasson commented Nov 14, 2024

Changes

Features

Adds a ValidateTrajectoryHandler to pycontrails.datalib.spire.

The handler takes a pandas dataframe, where each row represents a single flight instance trajectory waypoint.

Certain columns are required (ValidateTrajectoryHandler.SCHEMA) beyond what is required in e.g. a Flight instance (pycontrails.core.flight), as those additional columns are used in the validation ruleset that evaluates the trajectory.

TODO

  • mypy wrestling
  • update changelog
  • update release notes
  • add tutorial/cheat-sheet (proposed location: https://py.contrails.org/flight.html)
  • (?) additional consolidation and structuring of existing methods in spire.py (could also be a follow-up release, as those changes will be backward incompatible)

Tests

  • QC test passes locally (make test)
  • CI tests pass

@nickmasson nickmasson requested a review from a team November 14, 2024 21:32
"""
Evaluates a trajectory and identifies if it violates any verification rules.

<LINK HERE TO HOSTED REFERENCE EXAMPLE(S)>.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

Copy link
Contributor

@zebengberg zebengberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nickmasson! I made a ton of comments. Let me know if I can better explain anything!

I'm also happy to do some work here to better use some of the existing pycontrails tooling.

<LINK HERE TO HOSTED REFERENCE EXAMPLE(S)>.
"""

CRUISE_ROCD_THRESHOLD_FPS = 4.2 # 4.2 ft/sec ~= 250 ft/min
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the existing value in pycontrails?

nominal_rocd: float = 12.7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, absolutely. It is OK if I keep it as FPS within the boundaries of the handler? It will just save having to refactor broadly.

What I'll do is pull this value in from constants.py, but still do it as a class variable, so that an implementer can monkey-patch the handler if they want.

"""

CRUISE_ROCD_THRESHOLD_FPS = 4.2 # 4.2 ft/sec ~= 250 ft/min
CRUISE_LOW_ALTITUDE_THRESHOLD_FT = 15000 # lowest expected cruise altitude
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here we have two competing thresholds ... could we consolidate?

min_cruise_altitude_ft: float = 20000.0,

pycontrails/datalib/spire.py Outdated Show resolved Hide resolved
"altitude_baro": pdtypes.is_numeric_dtype,
}

airports_db = airports.global_airport_database()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the @functools.cache decorator onto airports.global_airport_database()? That's how we've been handling data loading throughout pycontrails. (I think this is the first place we're actually using airports.global_airport_database() in the source code.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
The way the handler is designed, an implementer shouldn't be reinstantiating an instance of the class multiple times in a given runtime... so this should (ideally) be loaded only once per program. But I don't see a reason using the functools cacher would hurt, so happy to add (we could also move this to a module-level singleton, so it is similarly loaded only once per program, with more confidence than if it were a class obj).

pycontrails/datalib/spire.py Outdated Show resolved Hide resolved
pycontrails/datalib/exceptions.py Outdated Show resolved Hide resolved
Comment on lines +1043 to +1060
self._df = self._df.assign(
elapsed_seconds=[
self._rolling_time_delta_seconds(window) for window in self._df.rolling(window=2)
],
)
self._df = self._df.assign(
elapsed_distance_m=[
self._rolling_distance_meters(window) for window in self._df.rolling(window=2)
],
)
self._df = self._df.assign(
ground_speed_m_s=self._df["elapsed_distance_m"]
.divide(self._df["elapsed_seconds"])
.replace(np.inf, np.nan)
)
self._df = self._df.assign(
rocd_fps=[self._rolling_rocd_fps(window) for window in self._df.rolling(window=2)]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main piece of feedback for this PR: there are a number of pycontrails.Flight methods and other helper functions that already do these computations in a vectorized manner. Some of them may need slight tweaking to match exactly what you're doing here, but it shouldn't be too hard. Is there a reason to avoid them?

If you have a good reason to avoid the existing pycontrails tooling (and there definitely are some good reasons -- some of the existing pycontrails function have opinions / conventions baked into them), all of these calculations can still be vectorized. For example, your calculation for elapsed_seconds is essentially:

elapsed_seconds = self._df["timestamp"].diff().dt.total_seconds().astype("Int64")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. No reason to have avoided methods that have parity of behavior using a vectorized approach (other than my fairly shallow experience w. pd and np). I'd like to avoid referencing existing pycontrails methods that have domain logic/assumptions, but otherwise welcome refactor of these to avoid pd.rolling() (I assume that will help w/ perf notably).

@zebengberg would you mind helping with that refactor? Feel free to push changes directly to this branch, or throw up small PRs pointed to this branch with the proposed refactor.

pycontrails/datalib/spire.py Outdated Show resolved Hide resolved
pycontrails/datalib/spire.py Outdated Show resolved Hide resolved
Comment on lines +1391 to +1392
flight_length_check: None | FlightTooShortError | FlightTooLongError
flight_length_check = self._is_valid_flight_length()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to delete the line flight_length_check: None | FlightTooShortError | FlightTooLongError -- the type hint is already provided on the method self._is_valid_flight_length().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed w.r.t type hinting and mypy etc...

The hinting here was motivated as a hint in the spirit of how a developer might use the handler.

Given that the handler returns a list of one or more typed violations if one or more violations are encountered, my expectation is that a developer would likely use this method definitions in drafting their approach to managing the handler response. It seems to me like having a hint of the possible objects aggregated & returned into all_violations will help a developer avoid drilling up and down into all the methods. If these were removed, then someone looking to understand the full scope of what the handler is returned would have to traverse a lot more src code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants