Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing-appender 0.2.1 panics while rolling in debug mod #1987

Closed
clia opened this issue Mar 12, 2022 · 5 comments · Fixed by #1989
Closed

tracing-appender 0.2.1 panics while rolling in debug mod #1987

clia opened this issue Mar 12, 2022 · 5 comments · Fixed by #1989

Comments

@clia
Copy link

clia commented Mar 12, 2022

Bug Report

Version

├── tracing v0.1.31
│ ├── tracing-attributes v0.1.19 (proc-macro)
│ └── tracing-core v0.1.22
├── tracing-appender v0.2.1
│ └── tracing-subscriber v0.3.9
│ ├── tracing v0.1.31 ()
│ ├── tracing-core v0.1.22 (
)
│ └── tracing-log v0.1.2
│ └── tracing-core v0.1.22 ()
├── tracing-log v0.1.2 (
)
└── tracing-subscriber v0.3.9 (*)

Platform

Darwin clia-mbp-2.local 21.3.0 Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64 x86_64

Crates

tracing-appender

Description

I tried this code:

    let file_appender = tracing_appender::rolling::minutely(directory, file_name);
    let (file_writer, guard) = tracing_appender::non_blocking(file_appender);

    let offset = UtcOffset::UTC;
    let timer = OffsetTime::new(
        offset, 
        format_description!("[year]-[month]-[day] [hour]:[minute]:[second].[subsecond digits:3]"),
    );

Then it panics while rolling file.

RUST_BACKTRACE=full

thread '<unnamed>' panicked at 'if we have &mut access to the appender, no other thread can have advanced the timestamp...', /Users/clia/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/rolling.rs:176:13
stack backtrace:
   0:        0x107f0b701 - std::backtrace_rs::backtrace::libunwind::trace::hfd8fbd06638442ca
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:        0x107f0b701 - std::backtrace_rs::backtrace::trace_unsynchronized::hc1cd6876eeb6d18a
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x107f0b701 - std::sys_common::backtrace::_print_fmt::hdee2a5e2d8205c55
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:66:5
   3:        0x107f0b701 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h55afb1013a03bec9
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:45:22
   4:        0x107f3018b - core::fmt::write::hf346347c8bd39f92
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/core/src/fmt/mod.rs:1190:17
   5:        0x107f0388e - std::io::Write::write_fmt::h3ee7a608c2b4575b
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/io/mod.rs:1657:15
   6:        0x107f0db00 - std::sys_common::backtrace::_print::h8493d033a9894b49
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:48:5
   7:        0x107f0db00 - std::sys_common::backtrace::print::hdada1c68af67b168
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:35:9
   8:        0x107f0db00 - std::panicking::default_hook::{{closure}}::hd379186b8ed80e63
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:295:22
   9:        0x107f0d7e4 - std::panicking::default_hook::hb6f0fa309b9519b8
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:314:9
  10:        0x107f0e25e - std::panicking::rust_panic_with_hook::h5485f9739aa6a2f1
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:698:17
  11:        0x107666e3d - std::panicking::begin_panic::{{closure}}::ha1324d25c5d0d075
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:617:9
  12:        0x107666c88 - std::sys_common::backtrace::__rust_end_short_backtrace::hd41ad3d88fb6b2d8
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:138:18
  13:        0x107f42f85 - std::panicking::begin_panic::h7cb933f2aaaf0450
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:616:12
  14:        0x10744e4a6 - <tracing_appender::rolling::RollingFileAppender as std::io::Write>::write::hcc627a5619d2b70a
                               at /Users/clia/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/rolling.rs:176:13
  15:        0x1072f6f06 - std::io::Write::write_all::h7c5531b1075768af
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/io/mod.rs:1517:19
  16:        0x1072a13a7 - tracing_appender::worker::Worker<T>::handle_recv::he52450f8592e1ed6
                               at /Users/clia/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/worker.rs:33:17
  17:        0x1072a156f - tracing_appender::worker::Worker<T>::work::h1994e579b2159111
                               at /Users/clia/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/worker.rs:58:32
  18:        0x1072a19d2 - tracing_appender::worker::Worker<T>::worker_thread::{{closure}}::hc1fc66cbed01b8aa
                               at /Users/clia/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/worker.rs:73:23
  19:        0x1073d390b - std::sys_common::backtrace::__rust_begin_short_backtrace::ha549c90baef05e80
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys_common/backtrace.rs:122:18
  20:        0x1072a6dc2 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h38c6655a5aba6a7c
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/thread/mod.rs:498:17
  21:        0x1073fdab5 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h5ab111b5c8a21af3
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/core/src/panic/unwind_safe.rs:271:9
  22:        0x1071cb7ca - std::panicking::try::do_call::hb1f16a5d3503ac71
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:492:40
  23:        0x1071cbbed - ___rust_try
  24:        0x1071cb512 - std::panicking::try::h5556e8e9d5e73d4c
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panicking.rs:456:19
  25:        0x1071e4645 - std::panic::catch_unwind::heead4878bbeecd25
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/panic.rs:137:14
  26:        0x1072a5e7e - std::thread::Builder::spawn_unchecked_::{{closure}}::h03d17e644985cf0a
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/thread/mod.rs:497:30
  27:        0x1071f4e81 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hfc834c1d6e3953cd
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/core/src/ops/function.rs:227:5
  28:        0x107f16e67 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h87f91e975c83ebc5
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/alloc/src/boxed.rs:1854:9
  29:        0x107f16e67 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hfd5e77ce3fe72807
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/alloc/src/boxed.rs:1854:9
  30:        0x107f16e67 - std::sys::unix::thread::Thread::new::thread_start::h44050836aaa9f63b
                               at /rustc/68369a041cea809a87e5bd80701da90e0e0a4799/library/std/src/sys/unix/thread.rs:108:17
  31:     0x7ff80c8784f4 - __pthread_start

In advance_date fn I print some time info:

    fn advance_date(&self, now: OffsetDateTime) -> bool {
        let next_date = self
            .rotation
            .next_date(&now)
            .map(|date| date.unix_timestamp() as usize)
            .unwrap_or(0);
        eprintln!("now: {}", now.unix_timestamp() as usize);
        eprintln!("next_date: {}", next_date);
        eprintln!("self.next_date: {:?}", self.next_date);

        self.next_date
            .compare_exchange(
                now.unix_timestamp() as usize,
                next_date,
                Ordering::AcqRel,
                Ordering::Acquire,
            )
            .is_ok()
    }

And it outputs:

now: 1647066781
next_date: 1647066840
self.next_date: 1647066780

now is not equal to self.next_date.

Wish this information is helpful!

@clia
Copy link
Author

clia commented Mar 12, 2022

And, this code:

impl io::Write for RollingFileAppender {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let now = OffsetDateTime::now_utc().to_offset(self.offset);
        let writer = self.writer.get_mut();
        if self.state.should_rollover(now) {
            let _did_cas = self.state.advance_date(now);
            debug_assert!(_did_cas, "if we have &mut access to the appender, no other thread can have advanced the timestamp...");
            self.state.refresh_writer(now, writer);
        }
        writer.write(buf)
    }

Should self.state.refresh_writer(now, writer); be before let _did_cas = self.state.advance_date(now);?

After advance_date, the debug_assert in refresh_writer may fail and cause a panic.

@hawkw
Copy link
Member

hawkw commented Mar 12, 2022

Looking at this code, I believe the advance_date function is incorrect. If I'm reading it correctly, I believe the CAS fails because the rollover condition is now >= next_date, but the "current" value that's passed to the CAS is now, which may be greater than next_date.

hawkw added a commit that referenced this issue Mar 12, 2022
This reproduces the bug described in #1987

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this issue Mar 12, 2022
This branch fixes a bug in `RollingFileAppender` where
`compare_exchange` is called with the _current timestamp_ as the
"current value" of the next rollover timestamp, rather than the actual
current value. This means that if the current time is *greater* than the
rollover time, the CAS will fail and the appender will never roll over
--- currently, rolling only functions correctly if we try to write to
the file at *precisely* the rollover time. This means that, in practice,
the appender almost never rolls over.

I've fixed this by ensuring that the compare-and-swap is always
performed with the current value of the atomic, rather than the current
timestamp. `should_rollover` now returns an `Option` with the current
value in it to indicate it's time to roll over, so that we perform the
CAS with the value loaded in `should_rollover`.

Fixes #1987
@hansonchar
Copy link

I think I encountered the same panic after leaving the server/app idle for like an hour (or few?), running on both x86_64 OSX and x86_64 CentOS 7:

thread '<unnamed>' panicked at 'if we have &mut access to the appender, no other thread can have advanced the timestamp...', /home/centos/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.1/src/rolling.rs:176:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Here is my setup:

#[tokio::main]
async fn main() {
    // initialize tracing
    let file_appender: RollingFileAppender =
        tracing_appender::rolling::hourly("/path/to/logdir", "logfile");
    let (non_blocking, _guard) = tracing_appender::non_blocking(file_appender);
    let level: Level = Level::from_str("DEBUG").unwrap();
    tracing_subscriber::fmt()
        .with_writer(non_blocking)
        .with_max_level(level)
        .init();
    // ...
}

hansonchar pushed a commit to hansonchar/tracing that referenced this issue Mar 12, 2022
This branch fixes a bug in `RollingFileAppender` where
`compare_exchange` is called with the _current timestamp_ as the
"current value" of the next rollover timestamp, rather than the actual
current value. This means that if the current time is *greater* than the
rollover time, the CAS will fail and the appender will never roll over
--- currently, rolling only functions correctly if we try to write to
the file at *precisely* the rollover time. This means that, in practice,
the appender almost never rolls over.

I've fixed this by ensuring that the compare-and-swap is always
performed with the current value of the atomic, rather than the current
timestamp. `should_rollover` now returns an `Option` with the current
value in it to indicate it's time to roll over, so that we perform the
CAS with the value loaded in `should_rollover`.

Fixes tokio-rs#1987
@hawkw
Copy link
Member

hawkw commented Mar 13, 2022

#1989 will fix this!

hawkw added a commit that referenced this issue Mar 13, 2022
This branch fixes a bug in `RollingFileAppender` where
`compare_exchange` is called with the _current timestamp_ as the
"current value" of the next rollover timestamp, rather than the actual
current value. This means that if the current time is *greater* than the
rollover time, the CAS will fail and the appender will never roll over
--- currently, rolling only functions correctly if we try to write to
the file at *precisely* the rollover time. This means that, in practice,
the appender almost never rolls over.

I've fixed this by ensuring that the compare-and-swap is always
performed with the current value of the atomic, rather than the current
timestamp. `should_rollover` now returns an `Option` with the current
value in it to indicate it's time to roll over, so that we perform the
CAS with the value loaded in `should_rollover`.

I've also added a test that exercises a file rollover using a mock
time. This would have caught the bug described in #1987.

Fixes #1987

Signed-off-by: Eliza Weisman <[email protected]>
@hansonchar
Copy link

Testing so far is positive - the pull request seems to have fixed this issue. Thanks!

davidbarsky pushed a commit that referenced this issue Mar 17, 2022
This branch fixes a bug in `RollingFileAppender` where
`compare_exchange` is called with the _current timestamp_ as the
"current value" of the next rollover timestamp, rather than the actual
current value. This means that if the current time is *greater* than the
rollover time, the CAS will fail and the appender will never roll over
--- currently, rolling only functions correctly if we try to write to
the file at *precisely* the rollover time. This means that, in practice,
the appender almost never rolls over.

I've fixed this by ensuring that the compare-and-swap is always
performed with the current value of the atomic, rather than the current
timestamp. `should_rollover` now returns an `Option` with the current
value in it to indicate it's time to roll over, so that we perform the
CAS with the value loaded in `should_rollover`.

I've also added a test that exercises a file rollover using a mock
time. This would have caught the bug described in #1987.

Fixes #1987

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this issue Mar 17, 2022
This branch fixes a bug in `RollingFileAppender` where
`compare_exchange` is called with the _current timestamp_ as the
"current value" of the next rollover timestamp, rather than the actual
current value. This means that if the current time is *greater* than the
rollover time, the CAS will fail and the appender will never roll over
--- currently, rolling only functions correctly if we try to write to
the file at *precisely* the rollover time. This means that, in practice,
the appender almost never rolls over.

I've fixed this by ensuring that the compare-and-swap is always
performed with the current value of the atomic, rather than the current
timestamp. `should_rollover` now returns an `Option` with the current
value in it to indicate it's time to roll over, so that we perform the
CAS with the value loaded in `should_rollover`.

I've also added a test that exercises a file rollover using a mock
time. This would have caught the bug described in #1987.

Fixes #1987

Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants