[sqllab] More granular sqllab logging #5736
Conversation
```python
def upgrade():
    op.alter_column('query', 'start_running_time', nullable=False, new_column_name='time_at_db')
```
This table tends to have hanging locks and is hard to alter. I've had issues every time I've had to alter this model; it just completely locks up everything. We'd all need to schedule downtime to allow for this.
Maybe we can use some sort of tracing instead.
@mistercrunch Can you flesh this out a little more or provide an example? I'll be happy to investigate.
I mean just using statsd.timing(DURATION)
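A minimal sketch of that suggestion: instead of persisting a timestamp column on the locked-up `query` table, emit the duration to statsd via the stats logger's `timing(key, value)` interface. The `DummyStatsLogger` class here is a hypothetical stand-in for Superset's configured stats logger, used only to make the sketch self-contained.

```python
import time


class DummyStatsLogger:
    """Hypothetical stand-in for the app's stats_logger (statsd-style API)."""

    def __init__(self):
        self.timings = {}

    def timing(self, key, value):
        # A real statsd client would ship this over UDP; here we just record it.
        self.timings[key] = value


stats_logger = DummyStatsLogger()

# Measure the duration around the work and emit it as a timing metric,
# with no schema change required.
start = time.time()
# ... execute the query here ...
stats_logger.timing('time_in_database', time.time() - start)
```

The upside of this approach, as suggested, is that no migration (and therefore no downtime) is needed; the downside is that per-query durations live only in the metrics backend, not in the `query` table.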
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #5736      +/-   ##
==========================================
- Coverage   63.75%   63.74%   -0.01%
==========================================
  Files         374      374
  Lines       23320    23330      +10
  Branches     2608     2608
==========================================
+ Hits        14867    14872       +5
- Misses       8440     8445       +5
  Partials       13       13
```
Continue to review full report at Codecov.
For the statsd keys, it seems like they should potentially be more descriptive, i.e., start with the sqllab prefix, so rather than time_pending it could be something like sqllab.query.time_pending. You can look at Datadog for examples of how other applications define their keys.
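A small illustration of the naming suggestion. The `sqllab_metric_key` helper is hypothetical (not part of Superset); it just shows how a shared prefix makes raw metric names self-describing in a dashboard like Datadog.

```python
# Hypothetical helper: namespace raw metric names under a 'sqllab.query'
# prefix so each key identifies its source application and subsystem.
def sqllab_metric_key(name, prefix='sqllab.query'):
    return '{}.{}'.format(prefix, name)


print(sqllab_metric_key('time_pending'))  # sqllab.query.time_pending
```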
superset/sql_lab.py
```python
db_engine_spec.execute(cursor, query.executed_sql, async_=True)
logging.info('Handling cursor')
db_engine_spec.handle_cursor(cursor, query, session)
logging.info('Fetching data: {}'.format(query.to_dict()))
data = db_engine_spec.fetch_data(cursor, query.limit)
stats_logger.timing('time_in_database', utils.now_as_float() - query_start_time)
```
I wonder if there's a better (more descriptive) name than time_in_database? Maybe time_executing_query?
Done
That makes sense. I updated the names.
* quote hive column names (apache#5368) * create db migration * use stats_logger timing * trigger build (cherry picked from commit 9a4bba4)
This PR improves how sqllab logs its performance by logging at a more granular level, primarily for asynchronous queries.
It logs the:
- time spent pending (for asynchronous queries)
- time spent in the database
- time spent writing to the results_backend
- time spent reading from the results_backend
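The four measurements above could be captured with a statsd-style timing call at each stage boundary of an async query's lifecycle. The sketch below is schematic, not the PR's actual implementation: `run_async_query`, its callback parameters, and `DummyStatsLogger` are all hypothetical, and the prefixed key names follow the naming suggested in review rather than the code as merged.

```python
import time


class DummyStatsLogger:
    """Hypothetical stand-in for the app's stats_logger (statsd-style API)."""

    def __init__(self):
        self.timings = {}

    def timing(self, key, value):
        self.timings[key] = value


stats_logger = DummyStatsLogger()


def run_async_query(execute, write_results, read_results, submitted_at):
    """Schematic async-query lifecycle emitting the four timings from the PR."""
    start = time.time()
    # Time the query spent queued before a worker picked it up.
    stats_logger.timing('sqllab.query.time_pending', start - submitted_at)

    execute()
    executed = time.time()
    # Time spent actually running in the database.
    stats_logger.timing('sqllab.query.time_executing_query', executed - start)

    write_results()
    written = time.time()
    # Time spent serializing and writing to the results backend.
    stats_logger.timing('sqllab.query.time_results_backend_write', written - executed)

    read_results()
    # Time spent reading the payload back from the results backend.
    stats_logger.timing('sqllab.query.time_results_backend_read', time.time() - written)
```

Each metric covers one disjoint stage, so the per-stage timings can be summed or graphed independently to see where an async query spends its time.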
This PR depends on #5844
@mistercrunch @john-bodley