Add %z for %(asctime)s to fix timezone for logs on UI #24373

Merged: 6 commits, merged Jun 24, 2022
12 changes: 9 additions & 3 deletions airflow/config_templates/airflow_local_settings.py
@@ -31,13 +31,16 @@
 # settings.py and cli.py. Please see AIRFLOW-1455.
 LOG_LEVEL: str = conf.get_mandatory_value('logging', 'LOGGING_LEVEL').upper()


 # Flask appbuilder's info level log is very verbose,
 # so it's set to 'WARN' by default.
 FAB_LOG_LEVEL: str = conf.get_mandatory_value('logging', 'FAB_LOGGING_LEVEL').upper()

 LOG_FORMAT: str = conf.get_mandatory_value('logging', 'LOG_FORMAT')

+LOG_FORMATTER_CLASS: str = conf.get_mandatory_value(
+    'logging', 'LOG_FORMATTER_CLASS', fallback='airflow.utils.log.timezone_aware.TimezoneAware'
+)
+
 COLORED_LOG_FORMAT: str = conf.get_mandatory_value('logging', 'COLORED_LOG_FORMAT')

 COLORED_LOG: bool = conf.getboolean('logging', 'COLORED_CONSOLE_LOG')
@@ -60,10 +63,13 @@
     'version': 1,
     'disable_existing_loggers': False,
     'formatters': {
-        'airflow': {'format': LOG_FORMAT},
+        'airflow': {
+            'format': LOG_FORMAT,
+            'class': LOG_FORMATTER_CLASS,
+        },
         'airflow_coloured': {
             'format': COLORED_LOG_FORMAT if COLORED_LOG else LOG_FORMAT,
-            'class': COLORED_FORMATTER_CLASS if COLORED_LOG else 'logging.Formatter',
+            'class': COLORED_FORMATTER_CLASS if COLORED_LOG else LOG_FORMATTER_CLASS,
         },
     },
     'filters': {
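The `'class'` entry added to the `airflow` formatter above is what switches `%(asctime)s` over to an offset-bearing format. A standalone sketch of that effect, assuming no Airflow installation: `OffsetFormatter` is a hypothetical stand-in with the same two overrides as `TimezoneAware`, and it is plugged in through `logging.config.dictConfig`'s `'()'` factory key rather than a dotted `'class'` path, so the example needs no importable module. The `demo` logger and `buf` handler names are arbitrary.

```python
import io
import logging
import logging.config


class OffsetFormatter(logging.Formatter):
    # Hypothetical stand-in carrying the same overrides as TimezoneAware.
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = '%s %03dms'


def render(formatter_config: dict) -> str:
    """Configure a throwaway logger with one formatter entry and return one formatted line."""
    buf = io.StringIO()
    logging.config.dictConfig({
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {'airflow': formatter_config},
        'handlers': {
            'buf': {'class': 'logging.StreamHandler', 'formatter': 'airflow', 'stream': buf},
        },
        'loggers': {'demo': {'handlers': ['buf'], 'level': 'INFO', 'propagate': False}},
    })
    logging.getLogger('demo').info('hello')
    return buf.getvalue().strip()


fmt = '%(asctime)s %(levelname)s - %(message)s'
print(render({'format': fmt}))                      # stock logging.Formatter: no UTC offset
print(render({'()': OffsetFormatter, 'fmt': fmt}))  # host UTC offset, then an "NNNms" token
```

The first line keeps the stock `HH:MM:SS,mmm` shape; the second carries the host's UTC offset followed by the milliseconds token.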
6 changes: 6 additions & 0 deletions airflow/config_templates/config.yml
@@ -633,6 +633,12 @@
       type: string
       example: ~
       default: "%%(asctime)s %%(levelname)s - %%(message)s"
+    - name: log_formatter_class
+      description: ~
+      version_added: 2.3.3
+      type: string
+      example: ~
+      default: "airflow.utils.log.timezone_aware.TimezoneAware"
     - name: task_log_prefix_template
       description: |
         Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
1 change: 1 addition & 0 deletions airflow/config_templates/default_airflow.cfg
@@ -349,6 +349,7 @@ colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatte
 # Format of Log line
 log_format = [%%(asctime)s] {{%%(filename)s:%%(lineno)d}} %%(levelname)s - %%(message)s
 simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
+log_formatter_class = airflow.utils.log.timezone_aware.TimezoneAware

 # Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
 # Example: task_log_prefix_template = {{ti.dag_id}}-{{ti.task_id}}-{{execution_date}}-{{try_number}}
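The doubled `%%` in the `.cfg` template looks like standard `configparser` escaping: with the default `BasicInterpolation`, a literal `%` has to be written as `%%`. A minimal sketch with the standard library's `configparser` and a cut-down, hypothetical `[logging]` section (Airflow's own config parser layers more behaviour on top of this):

```python
import configparser

# A cut-down [logging] section using the same escaping as default_airflow.cfg.
SAMPLE = """
[logging]
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
log_formatter_class = airflow.utils.log.timezone_aware.TimezoneAware
"""

parser = configparser.ConfigParser()  # BasicInterpolation: '%%' reads back as '%'
parser.read_string(SAMPLE)

simple_log_format = parser.get('logging', 'simple_log_format')
formatter_class = parser.get('logging', 'log_formatter_class')
print(simple_log_format)  # %(asctime)s %(levelname)s - %(message)s
print(formatter_class)    # airflow.utils.log.timezone_aware.TimezoneAware
```

Following Airflow's usual `AIRFLOW__{SECTION}__{KEY}` convention, the new option should also be settable as `AIRFLOW__LOGGING__LOG_FORMATTER_CLASS`.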
4 changes: 4 additions & 0 deletions airflow/utils/log/colored_log.py
@@ -43,6 +43,10 @@ class CustomTTYColoredFormatter(TTYColoredFormatter):
     traceback.
     """

+    # copy of airflow.utils.log.timezone_aware.TimezoneAware
+    default_time_format = '%Y-%m-%d %H:%M:%S%z'
+    default_msec_format = '%s %03dms'
Contributor:
Why are we adding milliseconds to this PR? I don't think the logs showed ms before...

rino0601 (Contributor Author), Jun 16, 2022:

@bbovenzi

Before this override, the value here was %s,%03d.
If left as is, the log file would contain timestamps like 2022-06-14 06:00:00-0600,123, which is awkward, and moment.js cannot parse that format.
Ideally the format would be 2022-06-14 06:00:00,123-0600, but that would be very difficult without changing the standard library: logging.Formatter.formatTime always appends the milliseconds after the whole formatted time, including the %z offset.
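The ordering problem described above can be reproduced directly. In this sketch, `OffsetOnly` (a hypothetical name) adds `%z` to `default_time_format` but keeps the stock `default_msec_format` of `'%s,%03d'`; the fixed record time and the `time.gmtime` converter are assumptions made only so the output is deterministic.

```python
import calendar
import logging
import time


class OffsetOnly(logging.Formatter):
    # Adds %z but keeps the stock default_msec_format ('%s,%03d').
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    converter = time.gmtime  # render in UTC so %z is '+0000' regardless of host TZ


def asctime_for(formatter: logging.Formatter, msecs: int) -> str:
    """Format a fixed 2022-06-14 12:00:00 UTC record with the given milliseconds."""
    record = logging.LogRecord('demo', logging.INFO, 'demo.py', 1, 'msg', None, None)
    record.created = calendar.timegm((2022, 6, 14, 12, 0, 0))
    record.msecs = msecs
    return formatter.formatTime(record)


print(asctime_for(OffsetOnly(), 123))  # 2022-06-14 12:00:00+0000,123
```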

There are 3 ways to solve this problem:

  • Avoid emitting milliseconds in the first place. That way the UI and the file would carry the same level of detail (currently they don't: the file is more verbose than the UI). This is possible by setting default_msec_format to None, but that requires Python 3.9 or higher. Dropping 3.7 and 3.8 for such a small change would be unreasonable, so this is not an option. (If there were a way to consume an argument in a %-format without printing it, we could do this without dropping 3.7 and 3.8, but I don't think such a method exists.)
  • To keep the current behavior, extend the regex to also match the trailing part of the datetime, and then discard that part in replaceAll. In this case the file keeps more detail than the UI; either way, I think -0600 123ms is easier to read than -0600,123.
  • (The one I chose.) To give the UI and the file the same level of detail, the regex does not match the milliseconds part, so it is displayed as is. As a result, the UI shows more information than before, but I don't think that is a problem.
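Options 1 and 3 differ only in the `default_msec_format` override. A sketch comparing them on one record, with hypothetical class names and the same deterministic setup as before (fixed time, `time.gmtime` converter); the `None` variant is only exercised on Python 3.9+, because older versions fail when they try to %-format with `None`.

```python
import calendar
import logging
import sys
import time


class NoMsec(logging.Formatter):
    # Option 1: drop milliseconds entirely. Only honoured on Python 3.9+,
    # where formatTime skips the msec step when default_msec_format is None.
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = None
    converter = time.gmtime


class OffsetThenMs(logging.Formatter):
    # Option 3 (what this PR does): keep milliseconds as a separate "NNNms" token.
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = '%s %03dms'
    converter = time.gmtime


record = logging.LogRecord('demo', logging.INFO, 'demo.py', 1, 'msg', None, None)
record.created = calendar.timegm((2022, 6, 14, 12, 0, 0))
record.msecs = 123

print(OffsetThenMs().formatTime(record))  # 2022-06-14 12:00:00+0000 123ms
if sys.version_info >= (3, 9):
    print(NoMsec().formatTime(record))  # 2022-06-14 12:00:00+0000
```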

Previously, milliseconds were not shown because they were dropped by the moment.js display format added on August 3, 2018. While looking into this, I found that the most recently added format (presumably used in the Grid view) does include milliseconds, so I guessed there is demand for them.


If you think displaying milliseconds is too verbose, I'll change the code to use the second method. However, the regular expression would become a bit more complicated than it is now, and the replaceAll callback would end up capturing the milliseconds value only to ignore it. That could look odd to anyone who hasn't read the discussion in this PR.
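For comparison, the second option would look roughly like this. It is written in Python with `re.sub` rather than the JavaScript `replaceAll` callback the PR actually touches, and the pattern and the `drop_millis` name are hypothetical; it shows the milliseconds group being captured only to be thrown away.

```python
import re

# Matches 'YYYY-MM-DD HH:MM:SS+HHMM NNNms' and captures the trailing ms separately.
OFFSET_THEN_MS = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})([+-]\d{4}) (\d{3})ms')


def drop_millis(line: str) -> str:
    # Group 3 (milliseconds) is matched but deliberately not used.
    return OFFSET_THEN_MS.sub(lambda m: f'{m.group(1)}{m.group(2)}', line)


print(drop_millis('[2022-06-14 06:00:00-0600 123ms] {foo.py:1} INFO - hi'))
# [2022-06-14 06:00:00-0600] {foo.py:1} INFO - hi
```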

Contributor:

We should send the UI a date that moment can parse, but that is annoying if it's non-trivial. We can go with this for now, but we should add that single log-parsing function soon so the grid view takes this into account too.
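A single parsing routine, as suggested here, would need to accept both shapes of `%(asctime)s`. A minimal Python sketch of that idea, assuming the same regular expression the PR adds on the JavaScript side; `parse_asctime` is a hypothetical name, and treating the legacy `,NNN` form as UTC mirrors the PR's backward-compatibility branch.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Same alternation as the PR's JavaScript dateRegex:
#   legacy:        '2022-06-15 10:30:06,020'
#   TimezoneAware: '2022-06-15 10:30:06+0900 123ms'
DATE_RE = re.compile(
    r'(\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2})((,\d{3})|([+-]\d{4} \d{3}ms))'
)


def parse_asctime(text: str) -> Optional[datetime]:
    """Return a timezone-aware datetime for the first timestamp in text, or None."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    date = re.sub(r'[./]', '-', m.group(1))  # normalise date separators for strptime
    rest = m.group(2)
    if rest.startswith(','):
        # Legacy format without an offset: treat it as UTC (previous UI behaviour).
        naive = datetime.strptime(f'{date}.{rest[1:]}', '%Y-%m-%d %H:%M:%S.%f')
        return naive.replace(tzinfo=timezone.utc)
    offset, ms = rest.split(' ')
    return datetime.strptime(f'{date}.{ms[:3]}{offset}', '%Y-%m-%d %H:%M:%S.%f%z')


print(parse_asctime('[2022-06-15 10:30:06+0900 123ms] {x.py:1} INFO - hi').isoformat())
# 2022-06-15T10:30:06.123000+09:00
```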

Contributor Author:

@bbovenzi

After rebasing onto main:

[Screenshot: Grid view]

A lot of thoughts went through my mind when I saw this screen. 😢

It would be better to fix it in this PR, because finding and handling it later would take more effort, so I've spent some time trying to fix it. It seems that fixing the regular expression in grid/details/content/taskInstance/Logs/utils.js would solve it for now, but if I do, then, as you were concerned, we will have two log parsers.

So I stopped there to ask for your comments. Also, my front-end knowledge is frozen in the React 15 days, so I need some time to check whether I understand the grid code properly.


Anyway, will the logs-in-grid-view feature be released in 2.3 or 2.4?
If it is going out in 2.4, meaning this PR would also have to go out in 2.4, I'd like to know in advance.

At work I'm using 1.10.15, and dozens of instances are stuck on that version. I recently migrated one instance to 2.3.2 as a pilot and then discovered this time zone problem, so the other instances are waiting on the outcome of this PR.

My co-worker's previous contribution was classified for a minor release, and it took 5 months to ship. If this PR is also going into a minor release, I would tell my colleagues to "migrate now (with a workaround config)", since I can guess it will be a few months before it ships.

Contributor:

Go ahead and do the simple fix for now. I'm happy to help combine them into a single parser in a subsequent PR. I think we can target 2.3.3 for this change, so it shouldn't take months.

+
     def __init__(self, *args, **kwargs):
         kwargs["stream"] = sys.stdout or kwargs.get("stream")
         kwargs["log_colors"] = DEFAULT_COLORS
39 changes: 39 additions & 0 deletions airflow/utils/log/timezone_aware.py
@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import logging


class TimezoneAware(logging.Formatter):
    """
    Override `default_time_format` and `default_msec_format` to specify utc offset.

    utc offset is the matter, without it, time conversion could be wrong. With this Formatter, `%(asctime)s`
    will be formatted containing utc offset. (e.g. 2022-06-12 13:00:00+0000 123ms)

    moments.js couldn't parse milliseconds comes after utc offset, so it would be ideal `%(asctime)s`
    formatted with millisecond comes before utc offset in th first place. (e.g 2022-06-12 13:00:00.123+0000)
    But python standard lib doesn't support format like that.

    Omitting milliseconds is possible by assigning `default_msec_format` to `None`. But this requires
    python3.9 or higher, so we can't omit milliseconds until dropping support 3.8 or under.

    Therefore, to use in moments.js, formatted `%(asctime)s` has to be re-formatted by javascript side.
    """

    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = '%s %03dms'
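To see what the new formatter produces outside Airflow, the sketch below re-declares a class with the same two attributes (so it runs without Airflow installed; in a real deployment you would import `airflow.utils.log.timezone_aware.TimezoneAware` instead) and attaches it to a handler writing into a `StringIO`. The logger name and format string are arbitrary choices for the example.

```python
import io
import logging


class TimezoneAware(logging.Formatter):
    # Local copy of the two overrides added in this PR.
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = '%s %03dms'


buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(TimezoneAware('%(asctime)s %(levelname)s - %(message)s'))

log = logging.getLogger('timezone_aware_demo')
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info('hello')
# Prints the current local time, shaped like '2022-06-12 13:00:00+0000 123ms INFO - hello'.
print(buf.getvalue(), end='')
```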
@@ -48,17 +48,39 @@ export const parseLogs = (data, timezone, logLevelFilter, fileSourceFilter) => {
}

const regExp = /\[(.*?)\] \{(.*?)\}/;
// e.g) '2022-06-15 10:30:06,020' or '2022-06-15 10:30:06+0900'
const dateRegex = /(\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2})((,\d{3})|([+-]\d{4} \d{3}ms))/;
// above regex is a kind of duplication of 'dateRegex'
// in airflow/www/static/js/tl_log.js
const matches = line.match(regExp);
let logGroup = '';
if (matches) {
// Replace UTC with the local timezone.
// Replace system timezone with user selected timezone.
const dateTime = matches[1];
[logGroup] = matches[2].split(':');
if (dateTime && timezone) {
const localDateTime = moment.utc(dateTime).tz(timezone).format(defaultFormatWithTZ);
parsedLine = line.replace(dateTime, localDateTime);
}

// e.g) '2022-06-15 10:30:06,020' or '2022-06-15 10:30:06+0900 123ms'
const dateMatches = dateTime?.match(dateRegex);
if (dateMatches) {
const [date, msecOrUTCOffset] = [dateMatches[1], dateMatches[2]];
if (msecOrUTCOffset.startsWith(',')) { // e.g) date='2022-06-15 10:30:06', msecOrUTCOffset=',020'
// for backward compatibility. (before 2.3.3)
// keep previous behavior if utcoffset not found. (consider it UTC)
//
if (dateTime && timezone) { // dateTime === fullMatch
const localDateTime = moment.utc(dateTime).tz(timezone).format(defaultFormatWithTZ);
parsedLine = line.replace(dateTime, localDateTime);
}
} else {
// e.g) date='2022-06-15 10:30:06', msecOrUTCOffset='+0900 123ms'
// (formatted by airflow.utils.log.timezone_aware.TimezoneAware) (since 2.3.3)
const [utcoffset, threeDigitMs] = msecOrUTCOffset.split(' ');
const msec = threeDigitMs.replace(/\D+/g, ''); // drop 'ms'
// e.g) datetime='2022-06-15 10:30:06.123+0900'
const localDateTime = moment(`${date}.${msec}${utcoffset}`).tz(timezone).format(defaultFormatWithTZ);
parsedLine = line.replace(dateTime, localDateTime);
}
}
[logGroup] = matches[2].split(':');
fileSources.add(logGroup);
}
if (!fileSourceFilter || fileSourceFilter === logGroup) {
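The `dateRegex` alternation in `parseLogs` yields two useful groups: the bare date, and a tail that is either `,NNN` (legacy) or `+HHMM NNNms` (TimezoneAware). Python's `re` accepts the same pattern, so what the JavaScript destructuring `[date, msecOrUTCOffset]` receives can be checked with a quick sketch; the helper name `split_timestamp` is hypothetical.

```python
import re

# Copied from the PR's JavaScript dateRegex (the syntax is valid in Python's re too).
DATE_RE = re.compile(
    r'(\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2})((,\d{3})|([+-]\d{4} \d{3}ms))'
)


def split_timestamp(text: str):
    """Return (date, msecOrUTCOffset) as the JS code sees them, or None if no match."""
    m = DATE_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


print(split_timestamp('2022-06-15 10:30:06,020'))        # ('2022-06-15 10:30:06', ',020')
print(split_timestamp('2022-06-15 10:30:06+0900 123ms'))  # ('2022-06-15 10:30:06', '+0900 123ms')
```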
21 changes: 19 additions & 2 deletions airflow/www/static/js/ti_log.js
@@ -102,7 +102,9 @@ function autoTailingLog(tryNumber, metadata = null, autoTailing = false) {

 // Detect urls and log timestamps
 const urlRegex = /http(s)?:\/\/[\w.-]+(\.?:[\w.-]+)*([/?#][\w\-._~:/?#[\]@!$&'()*+,;=.%]+)?/g;
-const dateRegex = /\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2},\d{3}/g;
+const dateRegex = /(\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2})((,\d{3})|([+-]\d{4} \d{3}ms))/g;
+// above regex is a kind of duplication of 'dateRegex'
+// in airflow/www/static/js/grid/details/content/taskinstance/Logs/utils.js

 res.message.forEach((item) => {
   const logBlockElementId = `try-${tryNumber}-${item[0]}`;
@@ -120,7 +122,22 @@ function autoTailingLog(tryNumber, metadata = null, autoTailing = false) {
   const escapedMessage = escapeHtml(item[1]);
   const linkifiedMessage = escapedMessage
     .replace(urlRegex, (url) => `<a href="${url}" target="_blank">${url}</a>`)
-    .replaceAll(dateRegex, (date) => `<time datetime="${date}+00:00" data-with-tz="true">${formatDateTime(`${date}+00:00`)}</time>`);
+    .replaceAll(dateRegex, (dateMatches, date, msecOrUTCOffset) => {
+      // e.g) '2022-06-15 10:30:06,020' or '2022-06-15 10:30:06+0900 123ms'
+      if (msecOrUTCOffset.startsWith(',')) { // e.g) date='2022-06-15 10:30:06', msecOrUTCOffset=',020'
+        // for backward compatibility. (before 2.3.3)
+        // keep previous behavior if utcoffset not found.
+        //
+        return `<time datetime="${dateMatches}+00:00" data-with-tz="true">${formatDateTime(`${dateMatches}+00:00`)}</time>`;
+      }
+      // e.g) date='2022-06-15 10:30:06', msecOrUTCOffset='+0900 123ms'
+      // (formatted by airflow.utils.log.timezone_aware.TimezoneAware) (since 2.3.3)
+      const [utcoffset, threeDigitMs] = msecOrUTCOffset.split(' ');
+      const msec = threeDigitMs.replace(/\D+/g, ''); // drop 'ms'
+      const dateTime = `${date}.${msec}${utcoffset}`; // e.g) datetime='2022-06-15 10:30:06.123+0900'
+      //
+      return `<time datetime="${dateTime}" data-with-tz="true">${formatDateTime(`${dateTime}`)}</time>`;
+    });
   logBlock.innerHTML += `${linkifiedMessage}\n`;
 });

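`ti_log.js` performs the same split inside a `replaceAll` callback and emits `<time>` elements. A rough Python equivalent using `re.sub` with a function replacement, assuming simplified markup: `formatDateTime` is a UI helper that is not reproduced here, so the sketch just echoes the normalised value as the element text, and `to_time_element` is a hypothetical name.

```python
import re

DATE_RE = re.compile(
    r'(\d{4}[./-]\d{2}[./-]\d{2} \d{2}:\d{2}:\d{2})((,\d{3})|([+-]\d{4} \d{3}ms))'
)


def to_time_element(m: re.Match) -> str:
    date, msec_or_offset = m.group(1), m.group(2)
    if msec_or_offset.startswith(','):
        # Legacy timestamp: keep the full match and pin it to UTC, as before 2.3.3.
        value = f'{m.group(0)}+00:00'
    else:
        # TimezoneAware timestamp: '+0900 123ms' becomes '.123+0900'.
        offset, three_digit_ms = msec_or_offset.split(' ')
        value = f'{date}.{three_digit_ms[:3]}{offset}'
    return f'<time datetime="{value}" data-with-tz="true">{value}</time>'


print(DATE_RE.sub(to_time_element, '[2022-06-15 10:30:06+0900 123ms] INFO - done'))
```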
23 changes: 23 additions & 0 deletions newsfragments/24373.significant.rst
@@ -0,0 +1,23 @@
Added new config ``[logging]log_formatter_class`` to fix timezone display for logs on UI

If you are using a custom Formatter subclass in your ``[logging]logging_config_class``, please inherit from ``airflow.utils.log.timezone_aware.TimezoneAware`` instead of ``logging.Formatter``.
For example, in your ``custom_config.py``:

.. code-block:: python

    from airflow.utils.log.timezone_aware import TimezoneAware

    # before
    class YourCustomFormatter(logging.Formatter):
        ...


    # after
    class YourCustomFormatter(TimezoneAware):
        ...


    AIRFLOW_FORMATTER = LOGGING_CONFIG["formatters"]["airflow"]
    AIRFLOW_FORMATTER["class"] = "somewhere.your.custom_config.YourCustomFormatter"
    # or use TimezoneAware class directly. If you don't have custom Formatter.
    AIRFLOW_FORMATTER["class"] = "airflow.utils.log.timezone_aware.TimezoneAware"
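To check that a formatter keeps the offset after switching its base class as described above, the following self-contained sketch re-declares `TimezoneAware` locally (so it runs without Airflow; in a real `custom_config.py` you would import it from `airflow.utils.log.timezone_aware`). `YourCustomFormatter` here is a hypothetical subclass that only prefixes each line, and the `time.gmtime` converter plus the fixed record time exist only to make the output deterministic.

```python
import calendar
import logging
import time


class TimezoneAware(logging.Formatter):
    # Local copy of airflow.utils.log.timezone_aware.TimezoneAware.
    default_time_format = '%Y-%m-%d %H:%M:%S%z'
    default_msec_format = '%s %03dms'


class YourCustomFormatter(TimezoneAware):
    # Hypothetical customisation: prefix every formatted line.
    converter = time.gmtime  # deterministic output for the example only

    def format(self, record: logging.LogRecord) -> str:
        return '[custom] ' + super().format(record)


record = logging.LogRecord('demo', logging.INFO, 'demo.py', 1, 'hello', None, None)
record.created = calendar.timegm((2022, 6, 12, 13, 0, 0))
record.msecs = 123

formatter = YourCustomFormatter('%(asctime)s %(levelname)s - %(message)s')
print(formatter.format(record))  # [custom] 2022-06-12 13:00:00+0000 123ms INFO - hello
```

Because `default_time_format` and `default_msec_format` are plain class attributes, any subclass that does not override them (or `formatTime`) inherits the offset-bearing `%(asctime)s`.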