[pgbouncer] Fix stats deprecated in pgbouncer 1.8 #1016

Merged
merged 7 commits into DataDog:master on Jan 19, 2018

Conversation

sj26
Contributor

@sj26 sj26 commented Jan 16, 2018

When using pgbouncer version 1.8 the pgbouncer check fails with:

$ datadog-agent check pgbouncer
  =========
  Collector
  =========

    Running Checks
    ==============
      pgbouncer
      ---------
        Total Runs: 1
        Metrics: 0, Total Metrics: 0
        Events: 0, Total Events: 0
        Service Checks: 0, Total Service Checks: 0
        Error:
        Traceback (most recent call last):
          File "/opt/datadog-agent/bin/agent/dist/checks/__init__.py", line 300, in run
            self.check(copy.deepcopy(self.instances[0]))
          File "/opt/datadog-agent/checks.d/pgbouncer.py", line 227, in check
            self._collect_stats(db, tags)
          File "/opt/datadog-agent/checks.d/pgbouncer.py", line 121, in _collect_stats
            assert len(row) == len(cols) + len(desc)
        AssertionError

This is due to some stats being renamed in an upstream commit as part of pgbouncer 1.8: pgbouncer/pgbouncer@876d8a5

The attached commit updates the Datadog pgbouncer check a little, adds a test flavor version for pgbouncer 1.8, and modifies the check to cope with the renamed and added stats while maintaining backwards compatibility.
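
A minimal sketch of the column-name detection described above, not the code from this PR. It assumes each stats row behaves like a mapping keyed by column name (a plain dict stands in for the cursor row), and `extract_metrics` plus the sample rows are hypothetical. The column and metric names are the ones that appear in the diff under review.

```python
GAUGE, RATE = "gauge", "rate"

# Column name -> (metric name, metric type). Names come from the diff in this
# PR; the list is deliberately partial and only illustrates the approach.
METRICS = [
    ("total_received", ("pgbouncer.stats.bytes_received_per_second", RATE)),
    ("total_sent", ("pgbouncer.stats.bytes_sent_per_second", RATE)),
    ("avg_req", ("pgbouncer.stats.avg_req", GAUGE)),  # < 1.8
    ("total_xact_time", ("pgbouncer.stats.total_transaction_time", RATE)),  # >= 1.8
]


def extract_metrics(row, metrics=METRICS):
    """Return (name, type, value) for every known column present in the row.

    Columns missing from the result set are skipped, so the same metric table
    works for result sets from before and after the rename.
    """
    found = []
    for column, (name, kind) in metrics:
        if column not in row:
            continue  # column does not exist in this pgbouncer version
        found.append((name, kind, row[column]))
    return found


# Hypothetical rows: one shaped like a pre-1.8 result, one like a 1.8 result.
old_row = {"database": "app", "total_received": 10, "total_sent": 20, "avg_req": 3}
new_row = {"database": "app", "total_received": 10, "total_sent": 20, "total_xact_time": 500}
old_metrics = extract_metrics(old_row)
new_metrics = extract_metrics(new_row)
```

Keying off the column names rather than a version number means the check does not need to query the pgbouncer version at all.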

(We're now running this version of the check in production.)

I didn't see that this work had already been started in #1011; feel free to take this commit and fold it in there, or whatever you'd like. (I've kept version and changelog updates out of here so it can be cherry-picked or rebased.)

Adds new column support while maintaining backward compat, detected
using result set column names.

See pgbouncer/pgbouncer@876d8a5
Collaborator

@nmuesch nmuesch left a comment

Hey, thanks for the PR! (Also appreciate the tests!) Left a couple of quick thoughts as well, let me know what you think.


desc = scope['descriptors']
try:
    self.log.debug("Running query: %s" % query)
Collaborator

It may be helpful to have a debug line here that also reports the output of the query to help troubleshoot possible issues in the future. What do you think?

Contributor Author

I did add something like this but was worried it might be too much output to the log so omitted from the final commit. If you think it's a good idea I'll add it back.
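
One way to address the log-volume concern is to cap how many rows are written and to build the message only when debug logging is enabled. This is a sketch under assumptions, not what the PR ended up doing: `summarize_rows`, `log_query_result`, and the row limit are hypothetical, and plain dicts stand in for cursor rows.

```python
import logging

log = logging.getLogger("checks.pgbouncer")


def summarize_rows(rows, limit=5):
    """Render at most `limit` rows, noting how many were left out."""
    shown = [dict(r) for r in rows[:limit]]
    extra = len(rows) - len(shown)
    suffix = " (+%d more rows)" % extra if extra > 0 else ""
    return "%r%s" % (shown, suffix)


def log_query_result(query, rows, logger=log, limit=5):
    """Log a truncated view of the result set at debug level only."""
    # Skip the formatting work entirely unless debug output is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Query %s returned %d rows: %s",
                     query, len(rows), summarize_rows(rows, limit))
```

With the default agent log level the call costs a single level check, so the extra output appears only when someone is actively troubleshooting.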

@@ -22,6 +23,9 @@ class TestPgbouncer(AgentCheckTest):
    CHECK_NAME = 'pgbouncer'

    def test_checks(self):
        pgbouncer_version = os.environ.get('FLAVOR_VERSION', 'latest')
        pgbouncer_deprecated = pgbouncer_version in ('1.5', '1.7')
Collaborator

Thanks for writing the tests! I'm not sure we want to deprecate older versions of PGBouncer. Could we rename this to something like `pgbouncer_pre18` here?

Contributor Author

Good idea! I was trying to use something other than "old" and "new" 😅
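
As an illustration of the suggested naming, the flag could also be derived by comparing version numbers instead of listing known old versions. This is only a sketch: `is_pre18` is a hypothetical helper, and it assumes `FLAVOR_VERSION` is either `latest` or a dotted numeric version such as `1.7`, with `latest` meaning 1.8 or newer.

```python
import os


def is_pre18(version):
    """Return True if the pgbouncer version predates the 1.8 stats rename."""
    if version == "latest":
        return False  # assumption: "latest" is always 1.8 or newer
    # Compare numerically so that e.g. "1.10" sorts after "1.8".
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor < (1, 8)


pgbouncer_version = os.environ.get("FLAVOR_VERSION", "latest")
pgbouncer_pre18 = is_pre18(pgbouncer_version)
```

A numeric comparison avoids having to extend the tuple of version strings each time another old flavor is added to the test matrix.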

@sj26
Contributor Author

sj26 commented Jan 18, 2018

(Separated commits for review purposes, probably should be squashed for merge.)

Member

@hush-hush hush-hush left a comment

Hi @sj26,

Thanks a lot for this amazing PR! I added a few comments, but nothing blocking.

Could you also update the CHANGELOG (creating a new Unreleased section), the manifest.json, and postgres/datadog_checks/postgres/__init__.py to update the check version? You can find an example here

        if row[0] == self.DB_NAME:
            continue
except pg.Error as e:
    self.log.exception("Not all metrics may be available")
Member

I think we should still log the PG exception, it might be useful. What do you think ?

Contributor Author

Using log.exception will print the exception details in the log per the docs, e.g.:

checks.pgbouncer: ERROR: Not all metrics may be available
Traceback (most recent call last):
  File "/Users/sj26/Projects/datadog/integrations-core/pgbouncer/datadog_checks/pgbouncer/pgbouncer.py", line 106, in _collect_stats
    cursor.execute(query)
  File "/Users/sj26/Projects/datadog/integrations-core/venv/lib/python2.7/site-packages/psycopg2/extras.py", line 144, in execute
    return super(DictCursor, self).execute(query, vars)
ProgrammingError: unrecognized configuration parameter "pools"

Member

indeed, thanks !
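
The difference discussed in this thread can be reproduced with the standard `logging` module alone. The sketch below is written for Python 3 (the check itself ran on Python 2.7 at the time), uses a `RuntimeError` as a stand-in for `pg.Error`, and the `capture` helper and logger name are made up for the demonstration.

```python
import io
import logging


def capture(log_call):
    """Run log_call inside an except block and return what was logged."""
    stream = io.StringIO()
    logger = logging.getLogger("demo.pgbouncer")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        try:
            # Stand-in for the pg.Error raised by cursor.execute(query).
            raise RuntimeError('unrecognized configuration parameter "pools"')
        except RuntimeError as exc:
            log_call(logger, exc)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


# error() logs only the formatted message.
with_error = capture(lambda lg, exc: lg.error("Connection error: %s", exc))
# exception() logs the message at ERROR level plus the active traceback.
with_exception = capture(lambda lg, exc: lg.exception("Not all metrics may be available"))
```

Because `exception()` picks up the active exception on its own, the handler does not need to bind it to a name.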


assert len(row) == len(cols) + len(desc)
tags = list(instance_tags)
tags += ["%s:%s" % (tag, row[column]) for (column, tag) in descriptors]
Member

We should test if row contains column to avoid raising an exception if the column in descriptor does not exist.
Something like:

 tags += ["%s:%s" % (tag, row[column]) for (column, tag) in descriptors if column in row]

Contributor Author

These are pretty important and should never be missing. If they are missing then it should probably explode.

Contributor Author

I'll add the conditional anyway as protection against column name changes 👍
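
A small sketch of the guarded tag construction agreed on above. It assumes the row supports `column in row` and `row[column]` lookups by name; a plain dict stands in for the cursor row here, and the `build_tags` wrapper and the sample descriptors are hypothetical.

```python
def build_tags(instance_tags, row, descriptors):
    """Combine instance tags with per-row tags, skipping absent columns."""
    tags = list(instance_tags)  # copy so the caller's list is not mutated
    tags += ["%s:%s" % (tag, row[column])
             for (column, tag) in descriptors
             if column in row]
    return tags


# Illustrative input: "user" is not in the row, so it is silently skipped.
row = {"database": "app", "total_sent": 1}
descriptors = [("database", "db"), ("user", "user")]
tags = build_tags(["env:prod"], row, descriptors)
```

The trade-off raised in the thread still applies: skipping a missing descriptor avoids a crash, but metrics are then submitted without that tag, so a renamed column could go unnoticed.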

('total_received', ('pgbouncer.stats.bytes_received_per_second', RATE)),
('total_sent', ('pgbouncer.stats.bytes_sent_per_second', RATE)),
('total_query_time', ('pgbouncer.stats.total_query_time', GAUGE)),
('avg_req', ('pgbouncer.stats.avg_req', GAUGE)),
('total_xact_time', ('pgbouncer.stats.total_transaction_time', GAUGE)), # >= 1.8
Member

I think this should be a rate since the documentation says:

Total number of microseconds spent by **pgbouncer** when connected to PostgreSQL

Contributor Author

I was copying total_query_time — if this should change, so should that. Probably makes sense that all the "total" are rates, and all the "avg" are gauges. Will update.
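
To illustrate why a "total" column fits a rate better than a gauge: the value is a counter that only grows, so the useful number is how fast it changes between two samples. The function below is a simplified sketch of that idea, not the Agent's implementation; `per_second_rate` and the sample values are hypothetical.

```python
def per_second_rate(previous, current):
    """Compute the per-second change between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = previous, current
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / float(t1 - t0)


# Two hypothetical samples of total_xact_time (microseconds), 10 seconds apart.
first = (0, 1000)
second = (10, 4000)
xact_time_rate = per_second_rate(first, second)
```

Reported as a gauge, the same column would show an ever-increasing line whose absolute value depends mostly on how long pgbouncer has been running.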

except pg.Error as e:
    self.log.error("Connection error: %s" % str(e))
    self.log.exception("Connection error")
Member

Also, e is never used, which makes flake8 fail (that's why the Travis tests are red).

Contributor Author

I'll remove!

@hush-hush hush-hush added this to the 5.22 milestone Jan 19, 2018
@hush-hush hush-hush merged commit cabaec2 into DataDog:master Jan 19, 2018
@hush-hush
Member

Thanks a lot for this PR @sj26 !

It will be available in the next version of the agent, 5.22.0.
