Log offset is outside the translation range with low retention and segment time #10221

Closed
oliverfrye opened this issue Apr 20, 2023 · 4 comments
Labels
kind/bug Something isn't working

Comments

oliverfrye commented Apr 20, 2023

Version & Environment

Redpanda version: v23.1.7 (rev 8ce86e4)
OS version: Debian 11 (bullseye)

What went wrong?

I can no longer consume from my topic (i.e. it seems to be "frozen" and the consumer just hangs), and the following log message is repeatedly printed to the journal.

Apr 20 14:41:14 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:41:14,202 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))

What should have happened instead?

I can consume from the topic as normal, and no warning is printed to the journal.

How to reproduce the issue?

  1. Create a new topic with the following configuration:
rpk topic create queue
rpk topic alter-config queue --set retention.ms=1000
rpk topic alter-config queue --set segment.ms=1000
  2. Produce some messages to the topic.
  3. Wait for a segment to be created (as evidenced by "Creating new segment [...]" in the journal).
  4. Produce more messages to the topic - it seems to be important that you produce a large number of messages quickly (in our case, about 1500 messages in less than a second). In my testing, I haven't been able to reproduce the issue by producing one message at a time with something like rpk topic produce.
  5. Try to consume (using rpk topic consume queue).
  6. Observe that no messages are available (even if you continue to produce messages while consuming) and that the warning message is printed to the journal. A consolidated sketch of these steps follows the list.
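
For convenience, here is a rough consolidation of the steps above into a single script. The warmup count, burst size, and sleep duration are assumptions picked to mirror the behaviour described in step 4, not exact values from my environment.

# Create the topic with aggressive retention and segment rolling.
rpk topic create queue
rpk topic alter-config queue --set retention.ms=1000
rpk topic alter-config queue --set segment.ms=1000

# Produce an initial batch, then wait for the first segment roll
# ("Creating new segment ..." in the journal).
seq 1 100 | sed 's/^/warmup-/' | rpk topic produce queue
sleep 5

# Produce a burst of ~1500 messages in quick succession.
seq 1 1500 | sed 's/^/burst-/' | rpk topic produce queue

# With the bug present, this hangs and the broker repeatedly logs
# "log offset ... is outside the translation range".
rpk topic consume queue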

Additional information

Please attach any relevant logs, backtraces, or metric charts.

Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,923 [shard 2] offset_translator - ntp: {kafka/queue/0} - offset_translator.cc:105 - resetting offset translation state
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,944 [shard 2] offset_translator - ntp: {kafka/queue/0} - offset_translator.cc:156 - started, state: {base offset/delta: {-9223372036854775808}/0, map size: 1, last delta: 0}, highest_known_offset: -9223372036854775808
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,944 [shard 2] raft - [group_id:91, {kafka/queue/0}] consensus.cc:1261 - Read bootstrap state: data_seen 0 config_seen 0 eol false commit 0 term 0 prev_idx 0 prev_term 0 config_tracker -9223372036854775808 commit_base_tracker -9223372036854775808 configurations {}
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,944 [shard 2] raft - [group_id:91, {kafka/queue/0}] consensus.cc:1262 - Current log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,944 [shard 2] raft - [group_id:91, {kafka/queue/0}] consensus.cc:1286 - Truncating configurations at -9223372036854775808
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,965 [shard 2] offset_translator - ntp: {kafka/queue/0} - offset_translator.cc:226 - synced with log, state: {base offset/delta: {-9223372036854775808}/0, map size: 1, last delta: 0}, highest_known_offset: -9223372036854775808
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,965 [shard 2] raft - [group_id:91, {kafka/queue/0}] consensus.cc:1386 - started raft, log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}, term: 0, configuration: {current: {voters: {{id: {1}, revision: {414}}}, learners: {}}, old:{nullopt}, revision: 414, update: {nullopt}, version: 4, brokers: {{id: 1, kafka_advertised_listeners: {{:{host: 0.0.0.0, port: 9092}}}, rpc_address: {host: 0.0.0.0, port: 33145}, rack: {nullopt}, properties: {cores 8, mem_available 31, disk_available 249}}}}
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,976 [shard 2] raft - [group_id:91, {kafka/queue/0}] vote_stm.cc:264 - becoming the leader term:1
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,976 [shard 2] storage - segment.cc:663 - Creating new segment /var/lib/redpanda/data/kafka/queue/0_414/0-1-v1.log
Apr 20 14:48:25 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:25,997 [shard 2] raft - [group_id:91, {kafka/queue/0}] vote_stm.cc:279 - became the leader term: 1
Apr 20 14:48:57 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:48:57,048 [shard 2] storage - segment.cc:663 - Creating new segment /var/lib/redpanda/data/kafka/queue/0_414/1693-1-v1.log
Apr 20 14:49:03 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:49:03,560 [shard 2] storage - disk_log_impl.cc:1240 - Removing "/var/lib/redpanda/data/kafka/queue/0_414/0-1-v1.log" (remove_prefix_full_segments, {offset_tracker:{term:1, base_offset:0, committed_offset:1692, dirty_offset:1692}, compacted_segment=0, finished_self_compaction=0, generation={97}, reader={/var/lib/redpanda/data/kafka/queue/0_414/0-1-v1.log, (107320 bytes)}, writer=nullptr, cache={cache_size=48}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/queue/0_414/0-1-v1.base_index, offsets:{0}, index:{header_bitflags:0, base_offset:{0}, max_offset:{1692}, base_timestamp:{timestamp: 1681966131716}, max_timestamp:{timestamp: 1681966132717}, batch_timestamps_are_monotonic:1, index(3,3,3)}, step:32768, needs_persistence:0}})
Apr 20 14:49:07 oliver-debian-vm rpk[2667960]: INFO  2023-04-20 14:49:07,079 [shard 2] storage - segment.cc:663 - Creating new segment /var/lib/redpanda/data/kafka/queue/0_414/3385-1-v1.log
Apr 20 14:49:11 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:49:11,298 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))
Apr 20 14:49:11 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:49:11,531 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))
Apr 20 14:49:11 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:49:11,958 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))
Apr 20 14:49:12 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:49:12,955 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))
Apr 20 14:49:15 oliver-debian-vm rpk[2667960]: WARN  2023-04-20 14:49:15,321 [shard 2] kafka - connection_context.cc:451 - Error processing request: std::runtime_error (ntp {kafka/queue/0}: log offset 1693 is outside the translation range (starting at 1694))

JIRA Link: CORE-1284

oliverfrye added the kind/bug label Apr 20, 2023
jbguerraz commented Jul 19, 2023

We're experiencing the same behavior (tons of "is outside the translation range" logs). We'll report back if we find anything.

@jbguerraz

Possibly related to #8978? Maybe @andrwng could say whether there's a chance it relates?

andrwng commented Jul 19, 2023

@jbguerraz #8978 was an issue found when Redpanda uploads data to remote storage. Are you seeing it in that context? Do you have tiered storage enabled?

Otherwise, #11403 looks relevant

@oliverfrye sorry for only getting to this now! I believe this issue should be addressed in the next release (#11403 also mentions a low retention setting).

dotnwat commented Apr 5, 2024

Closing this. v23.1 is EOL and there have been several fixes to offset translation which may have fixed this issue. Please re-open if you experience this again.

dotnwat closed this as completed Apr 5, 2024