2020-03-19 20:39:21,135 INFO: Getting http://cdx.api.wa.bl.uk/data-heritrix?q=type%3Aurlquery+url%3Ahttps%253A%252F%252Ftwitter.com%252Fi%252Fjs_inst%253Fc_name%253Dui_metrics+limit%3A25000+offset%3A1650000
2020-03-19 20:39:32,662 ERROR: [pid 3787] Worker Worker(salt=086575122, workers=4, host=access, username=root, pid=3665) failed access.index.CheckCdxIndexForWARC(input_file=/heritrix/output/frequent-npld/20200227133858/warcs/BL-NPLD-WEBRENDER-frequent-npld-20200227133858-20200311061302718-00540-0o4xyiz2.warc.gz, cdx_service=http://cdx.api.wa.bl.uk/data-heritrix, sampling_rate=500, max_records_to_check=10)
Traceback (most recent call last):
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/urllib3/response.py", line 479, in read
    data = self._fp.read(amt)
  File "/usr/local/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/usr/local/lib/python3.6/http/client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/luigi/worker.py", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/luigi/worker.py", line 141, in _run_get_new_deps
    task_gen = self.task.run()
  File "/root/github/ukwa-manage-p3/tasks/access/update_cdx_index.py", line 137, in run
    for record in reader:
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/warcio/archiveiterator.py", line 119, in _iterate_records
    self.read_to_end()
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/warcio/archiveiterator.py", line 212, in read_to_end
    b = self.record.raw_stream.read(BUFF_SIZE)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/warcio/limitreader.py", line 28, in read
    buff = self.stream.read(length)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/warcio/bufferedreaders.py", line 162, in read
    self._fillbuff()
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/warcio/bufferedreaders.py", line 111, in _fillbuff
    data = self.stream.read(block_size)
  File "/root/github/ukwa-manage-p3/tasks/access/update_cdx_index.py", line 105, in read
    chunk = self.stream.read(size)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/urllib3/response.py", line 496, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/github/ukwa-manage-p3/venv/lib/python3.6/site-packages/urllib3/response.py", line 415, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
2020-03-19 20:39:32,990 INFO: Informed scheduler that task access.index.CheckCdxIndexForWARC_http___cdx_api_w__heritrix_output_10_797ecab879 has status FAILED