
Speed up DogStatsd._report() #233

Merged (1 commit) on Oct 30, 2017


@shargan (Contributor) commented Oct 25, 2017

As currently written, DogStatsd._report() can become a bottleneck for high-throughput applications. Profiling against a rough microbenchmark (see below) showed that the main time sinks were str.join() and DogStatsd._report() itself.
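One way to reproduce this kind of profile without a running agent is to wrap a stand-in for the old formatting path in `cProfile`. This is a minimal sketch, assuming a list-and-join builder similar in spirit to the pre-change `_report()`; `report_with_join` and `run` are hypothetical names, not datadogpy code.

```python
import cProfile
import pstats


def report_with_join(metric, metric_type, value, tags=None, sample_rate=1):
    # Hypothetical stand-in for the old path: collect the datagram
    # pieces in a list, then join them into one string.
    payload = [metric, ":", str(value), "|", metric_type]
    if sample_rate != 1:
        payload.extend(["|@", str(sample_rate)])
    if tags:
        payload.extend(["|#", ",".join(tags)])
    return "".join(payload)


def run(num):
    # Cycle through five metric names, as the benchmark does.
    for i in range(num):
        report_with_join("foo.bar.baz.stat%d" % (i % 5), "c", 1)


profiler = cProfile.Profile()
profiler.enable()
run(50000)
profiler.disable()

# Sort by time spent inside each function body (excluding callees) and
# print the top five entries; str.join() appears as its own row.
stats = pstats.Stats(profiler).sort_stats("tottime")
stats.print_stats(5)
```

Sorting by `tottime` charges time to a function's own body rather than to the functions it calls, which is what lets a builtin such as `str.join()` surface separately from its caller.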

This diff is a little ugly, but it is dramatically faster than extending lists and joining strings:
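The merged code itself isn't reproduced in this thread, so the following only sketches the general idea, assuming the win comes from replacing list extension plus `"".join()` with a single `%`-format call. `build_with_join` and `build_with_format` are hypothetical helpers; both emit the DogStatsD datagram shape `metric:value|type|@sample_rate|#tags`.

```python
import timeit


def build_with_join(metric, metric_type, value, tags=None, sample_rate=1):
    # Hypothetical "before" style: accumulate parts, then join once.
    payload = [metric, ":", str(value), "|", metric_type]
    if sample_rate != 1:
        payload.extend(["|@", str(sample_rate)])
    if tags:
        payload.extend(["|#", ",".join(tags)])
    return "".join(payload)


def build_with_format(metric, metric_type, value, tags=None, sample_rate=1):
    # Hypothetical "after" style: one format call, with the optional
    # sample-rate and tag suffixes reduced to empty strings when unused.
    return "%s:%s|%s%s%s" % (
        metric,
        value,
        metric_type,
        ("|@%s" % sample_rate) if sample_rate != 1 else "",
        ("|#" + ",".join(tags)) if tags else "",
    )


if __name__ == "__main__":
    for fn in (build_with_join, build_with_format):
        seconds = timeit.timeit(
            lambda: fn("foo.bar.baz.stat1", "c", 1), number=200000
        )
        print("%s: %.3fs" % (fn.__name__, seconds))
```

Both helpers return identical strings, so the comparison is purely about cost. The size of the gap depends on the interpreter, which is why it is worth timing on each Python version you care about.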

```
(.venv) datadogpy (master)$ python bench.py
Simple: 1000000 metrics in 8.31s (120314.07 metrics/s)
Buffered: 1000000 metrics in 3.87s (258083.36 metrics/s)

(.venv) datadogpy (master)$ git checkout shargan/performance
Switched to branch 'shargan/performance'

(.venv) datadogpy (shargan/performance)$ python bench.py
Simple: 1000000 metrics in 5.32s (187969.19 metrics/s)
Buffered: 1000000 metrics in 1.72s (580796.61 metrics/s)
```

I see similar performance improvements on both Python 2.7 and 3.5, as well as when the stats get more complicated (including several tags and a sample rate).

The microbenchmark in question:

```python
import itertools
import time

from datadog.dogstatsd import DogStatsd

# Rotate through a small set of metric names on every call.
STAT_NAMES = itertools.cycle(["stat1", "stat2", "stat3", "stat4", "stat5"])


def bench_buffered(num, stats):
    # Metrics reported between open_buffer() and close_buffer() are batched.
    stats.open_buffer()
    for _ in range(num):
        stats.increment(next(STAT_NAMES))
    stats.close_buffer()


def bench_simple(num, stats):
    # Each metric is reported on its own, with no buffering.
    for _ in range(num):
        stats.increment(next(STAT_NAMES))


if __name__ == "__main__":
    num = 1000000

    for name, func in (("Simple", bench_simple), ("Buffered", bench_buffered)):
        # Fresh client for each run, using the default host and port.
        stats = DogStatsd(namespace="foo.bar.baz")
        start = time.time()
        func(num, stats)
        delta = time.time() - start
        print("%s: %d metrics in %.2fs (%.2f metrics/s)"
              % (name, num, delta, num / delta))
```
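Part of the gap between the Simple and Buffered runs is likely that buffering amortizes the per-packet socket call across many metrics. The sketch below illustrates that idea only: `BufferedReporter` and its `send` callback are hypothetical, not datadogpy's classes, and the cap of 50 metrics per packet is an assumption. It mirrors the `open_buffer()`/`close_buffer()` calls from the benchmark and joins buffered lines with newlines, which is how DogStatsD accepts several metrics in one datagram.

```python
class BufferedReporter(object):
    """Hypothetical sketch of metric buffering; not datadogpy's code."""

    def __init__(self, send, max_buffer_size=50):
        self._send = send  # callable that receives one packet (a str)
        self._max = max_buffer_size  # assumed cap on metrics per packet
        self._buffer = None  # None means buffering is off

    def open_buffer(self):
        self._buffer = []

    def close_buffer(self):
        # Send whatever is left, then return to unbuffered mode.
        self._flush()
        self._buffer = None

    def report(self, line):
        if self._buffer is None:
            self._send(line)  # unbuffered: one packet per metric
            return
        self._buffer.append(line)
        if len(self._buffer) >= self._max:
            self._flush()

    def _flush(self):
        if self._buffer:
            # Newline-separated metrics travel together in one datagram.
            self._send("\n".join(self._buffer))
            self._buffer = []


if __name__ == "__main__":
    packets = []
    reporter = BufferedReporter(packets.append)
    reporter.open_buffer()
    for i in range(120):
        reporter.report("foo.bar.baz.stat%d:1|c" % (i % 5))
    reporter.close_buffer()
    print(len(packets))  # 3 packets: 50 + 50 + 20 metrics
```

With buffering on, 120 metrics cost three sends instead of 120, so any per-metric formatting overhead in the reporting path becomes a larger share of the remaining work.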

@yannmh self-requested a review on October 30, 2017 20:12

@yannmh (Member) commented Oct 30, 2017

Thanks a lot @shargan for the detailed content on your PR 🙇

The changes look great, and I am looking forward to seeing them included in the next release!

@yannmh merged commit b9b14d7 into DataDog:master on Oct 30, 2017
@shargan deleted the shargan/performance branch on October 31, 2017 18:54