Trimmed Pipeline Stat Tracking #994

Open
wants to merge 1 commit into base: master

@TeachMeTW TeachMeTW commented Nov 16, 2024

Optimize Pipeline by Streamlining Statistic Tracking

Summary

Streamlines pipeline statistic tracking by focusing on key metrics and removing unnecessary tracking identified in the bottom 80% and top 20% of statistics. The aim is to improve performance and reduce resource overhead.

Changes

  • Analyzed and reviewed bottom 80% and top 20% statistics to identify redundancies.
  • Removed unnecessary tracking logic related to these stats from the pipeline.

Testing

  • Verified that removing the unnecessary tracking does not impact critical functionality by clearing the pipeline and testing on an opcode.
  • Validated the accuracy of the remaining statistics and their integration into the pipeline by filtering today's pipeline stats.

Notes

Results from this test run. Readings are in seconds.

Top 20%:

data.name,data.reading
TRIP_SEGMENTATION,2537.513639458
TRIP_SEGMENTATION/segment_into_trips,2479.786787083
TRIP_SEGMENTATION/segment_into_trips_time/loop,2450.691215375
MODE_INFERENCE,1903.8646702920005

Bottom 80%:

data.name,data.reading
CLEAN_RESAMPLING,342.42430512500005
SECTION_SEGMENTATION,159.14454279100028
JUMP_SMOOTHING,64.29351804199996
TRIP_SEGMENTATION/create_places_and_trips,34.72458166600018
TRIP_SEGMENTATION/get_data_df,26.0028838125
TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_points_pre_ts_diff_df,22.336031124999998
USER_INPUT_MATCH_INCOMING,0.41017545800000005
CREATE_COMPOSITE_OBJECTS,0.08192466699983925
USERCACHE,0.012076972333333344
CREATE_CONFIRMED_OBJECTS,0.010660459000064293
LABEL_INFERENCE,0.0088373339995087
ACCURACY_FILTERING,0.008748396000000325
EXPECTATION_POPULATION,0.006602458000088518
STORE_USER_STATS,0.00648329100022238
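
For reference, a minimal sketch of how a split like the one above could be produced. It assumes the split is by rank (the slowest 20% of entries by count versus the rest), which is a guess and not something this PR states. `split_by_rank` is a hypothetical helper, not part of the pipeline code, and only a subset of the readings is used.

```python
import math


def split_by_rank(readings, top_fraction=0.2):
    """Sort (name, seconds) pairs by reading, slowest first, and split by count.

    Assumption: "top 20%" means the slowest 20% of entries by rank,
    not 20% of the total time.
    """
    ranked = sorted(readings, key=lambda item: item[1], reverse=True)
    cutoff = math.ceil(len(ranked) * top_fraction)
    return ranked[:cutoff], ranked[cutoff:]


# A subset of the top-level readings listed in this PR (seconds).
readings = [
    ("CLEAN_RESAMPLING", 342.42430512500005),
    ("TRIP_SEGMENTATION", 2537.513639458),
    ("JUMP_SMOOTHING", 64.29351804199996),
    ("MODE_INFERENCE", 1903.8646702920005),
    ("SECTION_SEGMENTATION", 159.14454279100028),
]

top, bottom = split_by_rank(readings)
for name, seconds in top:
    print(f"top: {name} {seconds:.1f}s")
for name, seconds in bottom:
    print(f"bottom: {name} {seconds:.1f}s")
```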

…emoving unnecessary tracking for these stats
@shankari

@TeachMeTW the problem with this analysis is that it doesn't take nesting into account.

To understand what that means, think about the high-level reason why we instrumented in the first place. If we remove the bottom 80% of stats, will it help us achieve that goal? Concretely, what is the next step after removing the bottom 80% of stats, and will that take us closer to the eventual goal of improved scalability?
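
To make the nesting point concrete, here is a rough sketch. It assumes that a "/" in a stat name marks a measurement taken inside the stage named before it, which is inferred from the names in the listing and not confirmed in this thread. Under that assumption, a nested reading is already included in its stage's total, so ranking the two side by side counts the same time twice. `split_by_nesting` is a hypothetical helper.

```python
def split_by_nesting(readings):
    """Separate stage-level readings from readings nested under a stage.

    Assumption: names look like "STAGE" or "STAGE/inner/...", and anything
    after the first "/" was measured inside that stage.
    """
    top_level = {}
    nested = {}
    for name, seconds in readings.items():
        stage, _, rest = name.partition("/")
        if rest:
            nested.setdefault(stage, {})[rest] = seconds
        else:
            top_level[name] = seconds
    return top_level, nested


# Readings copied from the PR description (seconds).
readings = {
    "TRIP_SEGMENTATION": 2537.513639458,
    "TRIP_SEGMENTATION/segment_into_trips": 2479.786787083,
    "TRIP_SEGMENTATION/get_data_df": 26.0028838125,
    "MODE_INFERENCE": 1903.8646702920005,
}

top_level, nested = split_by_nesting(readings)
for stage, seconds in top_level.items():
    print(stage, round(seconds, 1))
    for inner, inner_seconds in nested.get(stage, {}).items():
        print("    " + inner, round(inner_seconds, 1))
```

Ranking only `top_level` compares whole stages with each other; the `nested` entries then show where the time goes inside one stage.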

@TeachMeTW

@shankari There was a 'problem' with the stage snapshot: when I was going through the top and bottom timings, there were some functions that do NOT exist in the recent master. They might be artifacts from the past.

I got rid of the tracking for the bottom-end functions that did exist and re-ran the pipeline. Yes, there is nesting. The next step, which I am currently working on, is to look into the top 20%, see where it is taking so long, and add further instrumentation there to zoom in on the actual bottlenecks, e.g. TRIP_SEGMENTATION/segment_into_trips_time/loop,2450.691215375. I am figuring out what is going on in that loop, since the label is so general. After re-adding instrumentation within that nest, I will reanalyze and repeat until we reach a good point.
