Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes the misuse/mixuse of time.time()/time.monotonic() #3220

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

sighingnow
Copy link
Contributor

@sighingnow sighingnow commented Mar 6, 2024

I have noticed that there's some mixuse of time.time() and time.monotonic(), however, these two function cannot be used to subtract to compute the duration, e.g., self.metrics.time_in_queue = time - self.metrics.arrival_time.

The time.time() is preferred as it denotes the timepoint of events, e.g., arrive, scheduled, and finished.

After this PR, only the StatLogger uses time.monotonic() to decide when to print a log message.

Signed-off-by: Tao He <sighingnow@gmail.com>
@simon-mo
Copy link
Collaborator

simon-mo commented Mar 6, 2024

The clock is not affected by system clock updates. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid.

I thought the point of monotonic clock is they can be used to compute time deltas?

@sighingnow
Copy link
Contributor Author

We currently have:

@dataclass
class RequestMetrics:
    """Metrics associated with a request.

    Args:
        arrival_time: The time when the request arrived.
        first_scheduled_time: The time when the request was first scheduled.
        first_token_time: The time when the first token was generated.
        time_in_queue: The time the request spent in the queue.
        finished_time: The time when the request was finished.
    """
    arrival_time: float
    last_token_time: float
    first_scheduled_time: Optional[float]
    first_token_time: Optional[float]
    time_in_queue: Optional[float]
    finished_time: Optional[float] = None

I thought the point of monotonic clock is they can be used to compute time deltas?

Yes, monotonic clock is better for compute time deltas like time_in_queue (as well as in SequenceGroup.get_last_latency()), but intuitively it would to better to let user pass the epoch time (time.time() as arrive time when adding requests, and record epoch time for scheded, finished time as metrics (as the metrics may be further processed).

Should we add new fields like arrive_timepoint, last_token_timepoint for delta computation? Ideally monotonic clocks should only be used internally, and user APIs it is better to have epoch time arguments/properties/results. How do you folks think about that?

@simon-mo simon-mo enabled auto-merge (squash) March 15, 2024 16:52
@mitchellstern
Copy link

mitchellstern commented Mar 15, 2024

Adding some brief feedback here as requested by @simon-mo:

For the first pass at fixing this bug, I think using time.time() everywhere for consistency provides the most intuitive behavior for users and is a reasonable thing to do.

As an enhancement, you could also imagine adding an arrival_time_monotonic field in the future based on time.monotonic(), which could serve as a reference point for latency-oriented fields like time_in_queue.

@simon-mo simon-mo merged commit 14b8ae0 into vllm-project:main Mar 15, 2024
24 checks passed
@sighingnow
Copy link
Contributor Author

As an enhancement, you could also imagine adding an arrival_time_monotonic field in the future based on time.monotonic(), which could serve as a reference point for latency-oriented fields like time_in_queue.

Thanks for the comment!

@sighingnow sighingnow deleted the ht/fixes-timing branch March 16, 2024 03:51
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
…#3220)

Signed-off-by: Tao He <sighingnow@gmail.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants