rm_stm returning invalid_lso as last stable offset after node restart #11130

Closed
graphcareful opened this issue May 31, 2023 · 0 comments
Labels: area/transactions, kind/bug

Version & Environment

Found on dev while testing locally

What went wrong?

After a restart, I noticed that the rm_stm returns 0 for max_collectible_offset. Looking closer, I see that this condition is being hit:

    auto last_applied = last_applied_offset();
    if (unlikely(
          !_bootstrap_committed_offset
          || last_applied < _bootstrap_committed_offset.value())) {
        // To preserve the monotonicity of LSO from a client perspective,
        // we return this unknown offset marker that is translated to
        // an appropriate retry-able Kafka error code for clients.
        vlog(
          _ctx_log.info,
          "Returning invalid_lso: {}",
          _bootstrap_committed_offset.has_value());
        return model::invalid_lso;
    }

It seems this condition only occurs if the stm had consumed to the true end of the log before the node restart. In the snippet above, _bootstrap_committed_offset has no value because apply() hasn't been called yet: raft::state_machine_next is already at the end of the log, since the stm successfully snapshotted the final offset before it crashed.

Therefore, when last_stable_offset is queried, 0 is returned instead of the actual value.
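To make the sequence concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the rm_stm code: `toy_stm`, `offset_t`, `invalid_lso_marker` and its value of -1 are all hypothetical stand-ins, and the assumption that only apply() records the bootstrap offset is taken from the description in this report.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

namespace lso_bug_sketch {

// Hypothetical stand-ins; none of these are Redpanda types or values.
using offset_t = std::int64_t;
constexpr offset_t invalid_lso_marker = -1; // placeholder sentinel

struct toy_stm {
    // Assumption: only apply() ever fills this in.
    std::optional<offset_t> bootstrap_committed_offset;
    offset_t last_applied = 0;
    offset_t stable_offset = 0;

    // Restores state from a snapshot but leaves the bootstrap offset unset.
    void apply_snapshot(offset_t snapshot_offset) {
        last_applied = snapshot_offset;
        stable_offset = snapshot_offset;
    }

    // The first applied entry records the committed offset seen at startup.
    void apply(offset_t entry_offset, offset_t committed_offset) {
        if (!bootstrap_committed_offset) {
            bootstrap_committed_offset = committed_offset;
        }
        last_applied = entry_offset;
        stable_offset = entry_offset;
    }

    // Same shape as the condition quoted in this issue.
    offset_t last_stable_offset() const {
        if (!bootstrap_committed_offset
            || last_applied < *bootstrap_committed_offset) {
            return invalid_lso_marker;
        }
        return stable_offset;
    }
};

// Replays the restart: the snapshot already covers the end of the log,
// so nothing is applied until a new entry is written.
inline void replay_restart() {
    toy_stm stm;
    stm.apply_snapshot(100);
    assert(stm.last_stable_offset() == invalid_lso_marker);
    stm.apply(101, 101); // only a later write unblocks the LSO
    assert(stm.last_stable_offset() == 101);
}

} // namespace lso_bug_sketch
```

In this model the stm stays stuck on the marker for as long as no new entry arrives, which matches what I observed after the restart.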

What should happen?

The actual last_stable_offset the stm has processed should be returned instead of 0.

How to reproduce the issue?

Shut down a node after the rm_stm is up to date, restart it, then query last_stable_offset.

graphcareful added the kind/bug label May 31, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 1, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 7, 2023
- This stm has a conditional in its last_stable_offset() method that
returns an invalid offset if it hasn't completed bootstrapping.

- The issue is that the bootstrap phase isn't considered finished after
bootstrapping from apply_snapshot(). This would cause other stms to
pause, thinking the rm_stm had work to do at offset 0, causing
that other stm to time out and fail to process said event.

- The solution is simple: set `_bootstrap_committed_offset` within the
`apply_snapshot()` method.

- Fixes: redpanda-data#11131

- Fixes: redpanda-data#11130
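
As a rough illustration of the fix described in this commit message, the sketch below records the bootstrap offset inside the snapshot path. Everything here is hypothetical: `toy_stm_fixed`, `offset_t` and `invalid_lso_marker` are stand-ins rather than Redpanda names, and using the snapshot's own offset as the bootstrap value is an assumption, since the commit message does not say which value is stored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

namespace lso_fix_sketch {

// Hypothetical stand-ins; none of these are Redpanda types or values.
using offset_t = std::int64_t;
constexpr offset_t invalid_lso_marker = -1; // placeholder sentinel

struct toy_stm_fixed {
    std::optional<offset_t> bootstrap_committed_offset;
    offset_t last_applied = 0;
    offset_t stable_offset = 0;

    // Bootstrapping from a snapshot now counts as finishing bootstrap.
    // Assumption: the snapshot offset is a suitable value to record.
    void apply_snapshot(offset_t snapshot_offset) {
        last_applied = snapshot_offset;
        stable_offset = snapshot_offset;
        bootstrap_committed_offset = snapshot_offset;
    }

    // Unchanged guard: still reports the marker before any bootstrap.
    offset_t last_stable_offset() const {
        if (!bootstrap_committed_offset
            || last_applied < *bootstrap_committed_offset) {
            return invalid_lso_marker;
        }
        return stable_offset;
    }
};

// After a restart that only replays a snapshot, the LSO is available
// immediately instead of waiting for the next applied entry.
inline void replay_restart_with_fix() {
    toy_stm_fixed stm;
    assert(stm.last_stable_offset() == invalid_lso_marker); // not bootstrapped
    stm.apply_snapshot(100);
    assert(stm.last_stable_offset() == 100);
}

} // namespace lso_fix_sketch
```

The guard in last_stable_offset() is left untouched in this sketch, so an stm that has neither applied a snapshot nor an entry still returns the marker.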
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 8, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 9, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 13, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 14, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 16, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 20, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jun 21, 2023