-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add validation handler and deps #269
base: main
Are you sure you want to change the base?
Conversation
""" | ||
Evaluates a trajectory and identifies if it violates any verification rules. | ||
|
||
<LINK HERE TO HOSTED REFERENCE EXAMPLE(S)>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nickmasson! I made a ton of comments. Let me know if I can better explain anything!
I'm also happy to do some work here to better use some of the existing pycontrails tooling.
<LINK HERE TO HOSTED REFERENCE EXAMPLE(S)>. | ||
""" | ||
|
||
CRUISE_ROCD_THRESHOLD_FPS = 4.2 # 4.2 ft/sec ~= 250 ft/min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the existing value in pycontrails?
pycontrails/pycontrails/physics/constants.py
Line 116 in 814f822
nominal_rocd: float = 12.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, absolutely. It is OK if I keep it as FPS within the boundaries of the handler? It will just save having to refactor broadly.
What I'll do is pull this value in from constants.py
, but still do it as a class variable, so that an implementer can monkey-patch the handler if they want.
pycontrails/datalib/spire.py
Outdated
""" | ||
|
||
CRUISE_ROCD_THRESHOLD_FPS = 4.2 # 4.2 ft/sec ~= 250 ft/min | ||
CRUISE_LOW_ALTITUDE_THRESHOLD_FT = 15000 # lowest expected cruise altitude |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here we have two competing thresholds ... could we consolidate?
pycontrails/pycontrails/core/flight.py
Line 727 in 814f822
min_cruise_altitude_ft: float = 20000.0, |
"altitude_baro": pdtypes.is_numeric_dtype, | ||
} | ||
|
||
airports_db = airports.global_airport_database() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the @functools.cache
decorator onto airports.global_airport_database()
? That's how we've been handling data loading throughout pycontrails. (I think this is the first place we're actually using airports.global_airport_database()
in the source code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The way the handler is designed, an implementer shouldn't be reinstantiating an instance of the class multiple times in a given runtime... so this should (ideally) be loaded only once per program. But I don't see a reason using the functools cacher would hurt, so happy to add (we could also move this to a module-level singleton, so it is similarly loaded only once per program, with more confidence than if it were a class obj).
self._df = self._df.assign( | ||
elapsed_seconds=[ | ||
self._rolling_time_delta_seconds(window) for window in self._df.rolling(window=2) | ||
], | ||
) | ||
self._df = self._df.assign( | ||
elapsed_distance_m=[ | ||
self._rolling_distance_meters(window) for window in self._df.rolling(window=2) | ||
], | ||
) | ||
self._df = self._df.assign( | ||
ground_speed_m_s=self._df["elapsed_distance_m"] | ||
.divide(self._df["elapsed_seconds"]) | ||
.replace(np.inf, np.nan) | ||
) | ||
self._df = self._df.assign( | ||
rocd_fps=[self._rolling_rocd_fps(window) for window in self._df.rolling(window=2)] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main piece of feedback for this PR: there are a number of pycontrails.Flight
methods and other helper functions that already do these computations in a vectorized manner. Some of them may need slight tweaking to match exactly what you're doing here, but it shouldn't be too hard. Is there a reason to avoid them?
If you have a good reason to avoid the existing pycontrails tooling (and there definitely are some good reasons -- some of the existing pycontrails function have opinions / conventions baked into them), all of these calculations can still be vectorized. For example, your calculation for elapsed_seconds
is essentially:
elapsed_seconds = self._df["timestamp"].diff().dt.total_seconds().astype("Int64")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. No reason to have avoided methods that have parity of behavior using a vectorized approach (other than my fairly shallow experience w. pd and np). I'd like to avoid referencing existing pycontrails methods that have domain logic/assumptions, but otherwise welcome refactor of these to avoid pd.rolling()
(I assume that will help w/ perf notably).
@zebengberg would you mind helping with that refactor? Feel free to push changes directly to this branch, or throw up small PRs pointed to this branch with the proposed refactor.
flight_length_check: None | FlightTooShortError | FlightTooLongError | ||
flight_length_check = self._is_valid_flight_length() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to delete the line flight_length_check: None | FlightTooShortError | FlightTooLongError
-- the type hint is already provided on the method self._is_valid_flight_length()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed w.r.t type hinting and mypy etc...
The hinting here was motivated as a hint in the spirit of how a developer might use the handler.
Given that the handler returns a list of one or more typed violations if one or more violations are encountered, my expectation is that a developer would likely use this method definitions in drafting their approach to managing the handler response. It seems to me like having a hint of the possible objects aggregated & returned into all_violations
will help a developer avoid drilling up and down into all the methods. If these were removed, then someone looking to understand the full scope of what the handler is returned would have to traverse a lot more src code.
Changes
Features
Adds a
ValidateTrajectoryHandler
topycontrails.datalib.spire
.The handler takes a pandas dataframe, where each row represents a single flight instance trajectory waypoint.
Certain columns are required (
ValidateTrajectoryHandler.SCHEMA
) beyond what is required in e.g. aFlight
instance (pycontrails.core.flight
), as those additional columns are used in the validation ruleset that evaluates the trajectory.TODO
spire.py
(could also be a follow-up release, as those changes will be backward incompatible)Tests
make test
)