Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memfix.rb kills download process #348

Closed
padmassun opened this issue Jun 8, 2018 · 8 comments
Closed

memfix.rb kills download process #348

padmassun opened this issue Jun 8, 2018 · 8 comments

Comments

@padmassun
Copy link
Member

padmassun commented Jun 8, 2018

Please see #348 (comment) for more details.

The file is being sent using Rail's send_file https://github.com/NREL/OpenStudio-server/blob/pad_nrcan_2.4.3/server/app/controllers/analyses_controller.rb#L732

nginx.conf file: https://github.com/NREL/OpenStudio-server/blob/pad_nrcan_2.4.3/docker/server/nginx.conf

The download file is about 930 MB, and it is being killed after approximately 10 to 12 minutes.

From nginx error log from var/log/nginx/error.log

App 7073 stdout:
App 6913 stderr: [ 2018-06-08 19:42:44.8914 7073/0x00000002851b30(Worker 1) utils.rb:87 ]: *** Exception IOError (stream closed) (process 7073, thread 0x00000002851b30(Worker 1)):
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:136:in `accept'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:136:in `accept_and_process_next_request'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:113:in `main_loop'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler.rb:416:in `block (3 levels) in start_threads'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
App 6913 stderr: [ 2018-06-08 19:42:44.8921 7073/0x00000002891460(HTTP helper worker) utils.rb:87 ]: *** Exception IOError (stream closed) (process 7073, thread 0x00000002891460(HTTP helper worker)):
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:136:in `accept'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:136:in `accept_and_process_next_request'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:113:in `main_loop'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/request_handler.rb:431:in `block (2 levels) in start_threads'
App 6913 stderr:        from /usr/local/lib/ruby/gems/2.2.0/gems/passenger-5.0.25/src/ruby_supportlib/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
[ 2018-06-08 19:42:47.0085 6645/7fd86cedc700 age/Cor/App/Poo/AnalyticsCollection.cpp:105 ]: Process (pid=6939, group=/opt/openstudio/server/public (docker)) no longer exists! Detaching it from the pool.
[ 2018-06-08 19:42:47.0085 6645/7fd86cedc700 age/Cor/App/Poo/AnalyticsCollection.cpp:105 ]: Process (pid=7073, group=/opt/openstudio/server/public (docker)) no longer exists! Detaching it from the pool.
[ 2018-06-08 19:42:47.0086 6645/7fd86cedc700 age/Cor/CoreMain.cpp:794 ]: Checking whether to disconnect long-running connections for process 6939, application /opt/openstudio/server/public (docker)
[ 2018-06-08 19:42:47.0091 6645/7fd86cedc700 age/Cor/CoreMain.cpp:794 ]: Checking whether to disconnect long-running connections for process 7073, application /opt/openstudio/server/public (docker)
App 7390 stdout:
App 7414 stdout:
[ 2018-06-08 19:43:26.7031 6645/7fd864e05700 age/Cor/Con/ForwardResponse.cpp:189 ]: [Client 3-1] Application sent EOF before finishing response body: 292753984 bytes already read, 976077852 bytes expected
[ 2018-06-08 19:43:26.7031 6645/7fd864e05700 Ser/Server.h:937 ]: [Client 3-1] Disconnecting client with error: application did not send a complete response

nginx access log from /var/log/nginx/access.log

10.255.0.2 - - [08/Jun/2018:19:25:57 +0000] "GET /favicon.ico HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0"
10.255.0.2 - - [08/Jun/2018:19:30:10 +0000] "GET /analyses/71cb4da9-4631-4c5a-8226-bc9e3839302c/download_BTAP_results_zip HTTP/1.1" 200 282753274 "http://ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0"
10.255.0.2 - - [08/Jun/2018:19:30:43 +0000] "GET /analyses/71cb4da9-4631-4c5a-8226-bc9e3839302c/download_BTAP_results_zip HTTP/1.1" 200 33050 "http://ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0"
10.255.0.2 - - [08/Jun/2018:19:43:27 +0000] "GET /analyses/71cb4da9-4631-4c5a-8226-bc9e3839302c/download_BTAP_results_zip HTTP/1.1" 200 292619770 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36"
10.255.0.2 - - [08/Jun/2018:19:51:54 +0000] "GET / HTTP/1.1" 200 11839 "-" "Mozilla/5.0 (Windows NT 6.2;en-US) AppleWebKit/537.32.36 (KHTML, live Gecko) Chrome/57.0.3070.92 Safari/537.32"
`
@nllong
Copy link
Member

nllong commented Jun 11, 2018

Hi @padmassun. I thought you submitted a fix for this off your branch? Did I miss something?

#347

Nick

@padmassun
Copy link
Member Author

padmassun commented Jun 11, 2018

Hello @nllong ,

Re #347
docker/deployment/scripts/aws_osserver_init.sh

docker/server/nginx.conf

  • This was an attempt to fix large file download stopping halfway.

I thought the downloads were not working because the expire was not set to max, But, it seems that it was not the issue.

After digging a bit deeper, I found that the passenger was the main culprit, which is killing process which are responsible for sending the download file.

@padmassun
Copy link
Member Author

padmassun commented Jun 11, 2018

Hello,

It seems the main culprit is the memfix.rb from https://github.com/NREL/OpenStudio-server/blob/develop/docker/server/memfix.rb#L3

The PID killed by the memfix.rb corresponds to the Passenger's message. Please see the log output below.

From /opt/openstudio/server/log/mem_saver.log

Killing pid 316 using 2333704 kB of memory
Killing pid 3279 using 505432 kB of memory
Killing pid 2863 using 1403240 kB of memory

From var/log/nginx/

[ 2018-06-11 16:55:10.0104 207/7fa8ef35f700 age/Cor/App/Poo/AnalyticsCollection.cpp:105 ]: Process (pid=3279, group=/opt/openstudio/server/public (docker)) no longer exists! Detaching it from the pool.

@padmassun padmassun changed the title Phusion Passenger kills download process memfix.rb kills download process Jun 11, 2018
padmassun added a commit that referenced this issue Jun 11, 2018
@padmassun
Copy link
Member Author

padmassun commented Jun 11, 2018

The memfix.rb kills processes if any of the ruby process in the web container exceeds the given limit of 500000 (not sure about the units). This is checked every 901 seconds (as stated in the memfix.rb file).

The rail's send_data uses memory greater than the specified limit of 500000 and it gets killed by the memfix.rb. This process is killed because it is sending a large file, which consumes a bit more memory.

As the process is killed, this triggers phusion passenger, which identifies that the PID does not exist and logs it in the error.log file, var/log/nginx/error.log

So I have commented out the memfix.rb for now, which is a protection against Out of Memory (OOM) issue. I am looking into methods to reduce memory consumption while transferring/streaming larger files.

@nllong
Copy link
Member

nllong commented Jun 11, 2018

well done with the investigation.

@henryHorsey FYI

@padmassun
Copy link
Member Author

For now I have commented out the part of the code (72e8098) which calls the memfix.rb as a temporary workaround. So this issue is atleast solved on the nrcan's branch.

padmassun added a commit that referenced this issue Jun 12, 2018
padmassun added a commit that referenced this issue Jul 10, 2018
@nllong nllong added P2 and removed P2 labels Jul 24, 2018
@anyaelena anyaelena added the P2 label Jul 31, 2018
@brianlball
Copy link
Contributor

removing this for P+W in this branch
https://github.com/NREL/OpenStudio-server/tree/memfix-pw

This was referenced Nov 21, 2019
@brianlball
Copy link
Contributor

#527
#528

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants