
"OSError 24 Too many open files" when uploading many small files on S3 #1402

Closed
several27 opened this issue Dec 27, 2017 · 7 comments

@several27

Hey all, I got stuck while trying to upload about 50 million small files to S3.

Here is a small code sample showing how I'm trying to approach the problem:

import boto3

for path, data in files:
    # note: a new resource/Bucket object is created on every iteration
    s3 = boto3.resource('s3').Bucket('bucket')
    s3.Object(path).put(Body=data)

After around 1000 correctly uploaded files, I'm getting the following error:

File "/usr/local/lib/python3.5/dist-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
....
botocore.vendored.requests.exceptions.ConnectionError: ('Connection aborted.', OSError(24, 'Too many open files'))

I'm guessing boto3 uses some sort of connection pool under the hood, but for some reason it isn't working as expected here. Is there a better way of doing this? Thanks in advance!

@jamesls
Member

jamesls commented Jan 2, 2018

What's creating the files iterator? I would double check that the files are being closed after reading their contents.

The default settings for the connection pool will only open 10 connections, so it seems unlikely to be caused by socket creation.
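
For reference, the pool size is controlled through botocore's Config object; a minimal sketch (10 is the documented default for max_pool_connections and is shown here only for illustration):

import boto3
from botocore.config import Config

# max_pool_connections defaults to 10; raising it only helps if that many
# requests are actually in flight at the same time
s3 = boto3.client('s3', config=Config(max_pool_connections=10))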

I would recommend using the transfer manager in boto3 (http://boto3.readthedocs.io/en/latest/guide/s3.html#uploads) or potentially using the low-level s3transfer package, which is a dependency of boto3. The s3transfer package will give you a nice performance boost, which would help if you're uploading 50 million files.
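
As an illustration, a minimal sketch of that approach, reusing a single client for every upload (the files iterable of (key, bytes) pairs and the bucket name are placeholders standing in for the report above):

import io

import boto3
from boto3.s3.transfer import TransferConfig

files = [('some/key', b'payload')]          # placeholder for the real iterable
s3 = boto3.client('s3')                     # create the client once and reuse it
config = TransferConfig(max_concurrency=10)

for path, data in files:
    # upload_fileobj hands a file-like object to the transfer manager
    s3.upload_fileobj(io.BytesIO(data), 'bucket', path, Config=config)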

Let me know if you have any more questions.

@jamesls added the closing-soon label ("This issue will automatically close in 4 days unless further comments are made.") on Jan 2, 2018
@several27
Author

Hey @jamesls, thanks for the response.

The real code is obviously more complicated than this iterator; I was just trying to make the point. I'm actually iterating over str paths, and for each path I open the file with with open(path, 'rb') as _in:. So I'm pretty sure the files are being closed.

Later today I'll try to write a full proof of concept that you can run to reproduce the error.

@stealthycoin
Contributor

Closing due to inactivity.

@aalvrz

aalvrz commented Jul 27, 2018

@jamesls

I am experiencing a similar issue when uploading many files to S3. Here is my error trace:

[Errno 24] Too many open files: OSError
Traceback (most recent call last):
File "/var/task/chalice/app.py", line 989, in __call__
return self.func(event_obj)
File "/var/task/app.py", line 213, in generate_pdf_images
s3_uploader.upload_files()
File "/var/task/chalicelib/aws/s3.py", line 40, in upload_files
process.start()
File "/var/lang/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/var/lang/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/var/lang/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/var/lang/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/var/lang/lib/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

I am using the transfer manager like this:

def _upload_to_s3(self, file_path, key, conn):
    # upload_file reads from the path itself; no file object is opened here
    self._bucket.upload_file(file_path, key)
    conn.close()  # close the child end of the Pipe when the upload finishes

I am iterating over hundreds of file paths, yet I am not explicitly opening any of those files; I simply pass each path to the upload_file function.

However, I am using Python's multiprocessing module to speed up these uploads:

for file_path, key in self._files_keys.items():
    parent_conn, child_conn = Pipe()  # each Pipe opens two file descriptors
    parent_connections.append(parent_conn)
    process = Process(
        target=self._upload_to_s3, args=(file_path, key, child_conn)
    )
    processes.append(process)  # one Process per file, started later
Could multiprocessing be the reason I am getting this error?
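
For what it's worth, each Pipe() opens two file descriptors and each forked Process opens more (the os.pipe() call in the traceback above), and they are only released once the processes are joined and the connections closed, so spawning one process per file can exhaust the limit by itself. A rough sketch of a bounded alternative using a fixed-size pool (the bucket name and function names here are hypothetical, not the code above):

import boto3
from concurrent.futures import ProcessPoolExecutor

_s3 = None  # one client per worker process, created lazily

def _upload_one(item):
    global _s3
    if _s3 is None:
        _s3 = boto3.client('s3')
    file_path, key = item
    _s3.upload_file(file_path, 'my-bucket', key)  # 'my-bucket' is a placeholder
    return key

def upload_all(files_keys):
    # a fixed-size pool keeps the number of processes (and their pipes)
    # bounded, instead of creating one Process + Pipe per file
    with ProcessPoolExecutor(max_workers=8) as pool:
        for key in pool.map(_upload_one, files_keys.items()):
            pass  # each yielded key marks a completed upload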

@yeachan-park

(quoting @aalvrz's comment above in full)

Did you manage to fix this issue? Could you share a workaround?

@littlehomelessman

I encountered a similar error; is there any workaround?

@HassanAthmani

HassanAthmani commented Oct 7, 2023

You can try raising the open file limit of your OS by using the command:
ulimit -n 4096

Plus check out the following Stack Overflow solution:
https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files
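
If you want to inspect or raise the limit from inside the Python process instead (Unix only; the soft limit can be raised up to the hard limit without extra privileges), the standard-library resource module is one way to do it, sketched here:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# raise the soft limit to the hard limit for this process and its children;
# on macOS the hard limit can report as unlimited and may need a smaller value
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))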
