Close KafkaProducer explicitly to ensure errors are caught. #63

Open
anjackson opened this issue Dec 6, 2019 · 2 comments
@anjackson
Contributor

After noticing a small gap in the crawl, the launcher log showed the following:

2019-12-05 10:00:12,826 INFO: Worker Worker(salt=769972995, workers=1, host=ingest, username=root, pid=28327) was stopped. Shutting down Keep-Alive thread
2019-12-05 10:00:12,835 INFO:
===== Luigi Execution Summary =====

Scheduled 2 tasks of which:
* 1 complete ones were encountered:
    - 1 w3act.CrawlFeed(frequency=all, feed=npld, date=2019-12-05T10)
* 1 ran successfully:
    - 1 crawl.LaunchCrawls(frequency=all, date=2019-12-05T10, kafka_server=crawler05.n45.bl.uk:9094, queue=fc.tocrawl.npld)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

2019-12-05 10:00:12,966 INFO: Closing the Kafka producer with 0 secs timeout.
2019-12-05 10:00:12,966 INFO: Proceeding to force close the producer since pending requests could not be completed within timeout 0.
2019-12-05 10:00:12,968 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=9) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,969 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=13) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,969 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=1) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,970 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=10) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,970 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=5) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,972 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=8) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,973 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=2) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,974 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=14) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,974 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=9) with base offset None and error IllegalStateError: Producer is closed forcefully..
2019-12-05 10:00:12,977 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=10) with base offset None and error IllegalStateError: Producer is closed forcefully..
....
2019-12-05 10:00:13,009 INFO: <BrokerConnection node_id=1 host=192.168.45.15:9094 <connected> [IPv4 ('192.168.45.15', 9094)]>: Closing connection.
2019-12-05 10:00:13,010 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=8) with base offset -1 and error Cancelled: <BrokerConnection node_id=1 host=192.168.45.15:9094 <connected> [IPv4 ('192.168.45.15', 9094)]>.
2019-12-05 10:00:13,010 WARNING: Batch is already closed -- ignoring batch.done()
2019-12-05 10:00:13,010 ERROR: Error processing errback
Traceback (most recent call last):
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/kafka/future.py", line 79, in _call_backs
    f(value)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/kafka/producer/sender.py", line 186, in _failed_produce
    self._complete_batch(batch, error, -1, None)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/kafka/producer/sender.py", line 244, in _complete_batch
    self._accumulator.deallocate(batch)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/kafka/producer/record_accumulator.py", line 506, in deallocate
    self._incomplete.remove(batch)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/kafka/producer/record_accumulator.py", line 586, in remove
    return self._incomplete.remove(batch)
KeyError: <kafka.producer.record_accumulator.ProducerBatch object at 0x7f116f7a67f0>
2019-12-05 10:00:13,016 WARNING: Produced messages to topic-partition TopicPartition(topic='fc.tocrawl.npld', partition=14) with base offset -1 and error Cancelled: <BrokerConnection node_id=1 host=192.168.45.15:9094 <connected> [IPv4 ('192.168.45.15', 9094)]>.
2019-12-05 10:00:13,016 WARNING: Batch is already closed -- ignoring batch.done()
....
2019-12-05 10:00:13,098 INFO: Kafka producer closed

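i.e. the KafkaProducer is only being closed after the task has completed, and with a zero-second timeout, so it is force-closed before it has flushed. The pending messages are dropped, but the task has already been reported as successful.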
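
A minimal sketch of the kind of fix this implies, not the actual code in this repository: flush and close the producer inside the task, and check each send result, so that delivery failures raise while the task is still running. The function name `send_all_and_close` and the 30-second default are made up for illustration. The `producer` argument is assumed to behave like kafka-python's `KafkaProducer`, where `send()` returns a future and `flush()` and `close()` accept a timeout in seconds.

```python
def send_all_and_close(producer, topic, messages, timeout=30):
    """Send every message, then flush and close before returning.

    `producer` is assumed to look like kafka-python's KafkaProducer.
    The function name and default timeout are illustrative only.
    """
    futures = []
    try:
        for message in messages:
            # send() only queues the record; delivery happens in the background.
            futures.append(producer.send(topic, value=message))
        # Block until the queued records have been sent (or the timeout expires).
        producer.flush(timeout=timeout)
        # Raise any per-record delivery error here, inside the task.
        return [future.get(timeout=timeout) for future in futures]
    finally:
        # Close explicitly with a real timeout instead of leaving it to
        # whatever cleanup happens when the process exits.
        producer.close(timeout=timeout)
```

With kafka-python this would be called with something like `KafkaProducer(bootstrap_servers=kafka_server)` as the producer and `fc.tocrawl.npld` as the topic. Any exception from `flush()` or `future.get()` then propagates out of the Luigi task, so the run is marked as failed rather than appearing in the summary as successful.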

anjackson added a commit that referenced this issue Dec 6, 2019
@anjackson
Contributor Author

Implemented a flush, and added support for an explicit close to crawl-streams. Will update and trial it.

@anjackson anjackson self-assigned this Dec 6, 2019
@anjackson
Contributor Author

Hmm, code for crawl-streams includes a refreshDepth in the launcher. Need to check if that's ready to go as far as Heritrix is concerned and then review the launcher/enqueue implementation.
