Inclusion of monit_wrapper::default stops executing rest of the recipe if a monit service cannot be started #21

Open
amalakar opened this issue Jun 12, 2016 · 2 comments

Comments

@amalakar
Contributor

Hi,

I have noticed that whenever the spark recipe has a bug in it (such as an incorrect port number), later runs of the recipe that include the fix still fail, because they never get past monit_wrapper::default, which checks whether the service is running. The fix is therefore never applied, so the next run fails again: a chicken-and-egg problem. I had to manually delete the Monit config (rm /etc/monit/conf.d/spark-standalone-worker.conf) to make the Chef recipe run.

recipe: monit_wrapper::default

  • chef_gem[waitutil] action install (up to date)

    Recipe Compile Error in /var/chef/cache/cookbooks/analytics-spark-deploy/recipes/query-spark-worker-next-staging.rb

    RuntimeError

    Timed out waiting to get the status of spark-standalone-worker (currently "Does not exist")

    Cookbook Trace:

    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:65:in `get_stable_monit_service_status'
    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:87:in `monit_service_running?'
    /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:101:in `monit_service_exists_and_running?'
    /var/chef/cache/cookbooks/apache_spark/recipes/spark-standalone-worker.rb:104:in `block in from_file'
    /var/chef/cache/cookbooks/apache_spark/recipes/spark-standalone-worker.rb:95:in `from_file'
    /var/chef/cache/cookbooks/ooyala-apache-spark/recipes/spark-worker.rb:2:in `from_file'
    /var/chef/cache/cookbooks/analytics-spark-deploy/recipes/query-spark-worker-next-staging.rb:4:in `from_file'

Relevant File Content:


  /var/chef/cache/cookbooks/monit_wrapper/libraries/status.rb:

   58:        def get_stable_monit_service_status(service_name)
   59:          start_time = Time.now
   60:          timeout_sec = 120
   61:          logged_message = false
   62:          status = get_monit_summary[service_name]
   63:          until monit_status_stable?(status)
   64:            if Time.now - start_time >= timeout_sec
   65>>             raise "Timed out waiting to get the status of #{service_name} " +
   66:                    "(currently #{status.inspect})"
   67:            end
   68:            unless logged_message
   69:              Chef::Log.info('Waiting for Monit to initialize the status of service ' +
   70:                             "#{service_name} for up to #{timeout_sec} seconds")
   71:              logged_message = true
   72:            end
   73:            sleep(1)
   74:            status = get_monit_summary[service_name]
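
To illustrate why the excerpt above aborts the run: the loop raises once the timeout passes, and because the call happens while the recipe is being compiled, nothing after it is evaluated. A minimal sketch of a non-raising variant follows. It is plain Ruby, not monit_wrapper code; the name `wait_for_stable_status`, its parameters, and the idea of returning nil on timeout are all assumptions made for illustration.

```ruby
# Hypothetical helper (not part of monit_wrapper): poll a status source until
# the status is considered stable. On timeout it returns nil instead of
# raising, so the caller can decide to treat the service as "not running".
def wait_for_stable_status(timeout_sec: 120, interval_sec: 1, stable: ->(s) { !s.nil? })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_sec
  loop do
    status = yield # the block supplies the current status
    return status if stable.call(status)
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval_sec)
  end
end

# Usage with a stand-in status source that never becomes stable.
status = wait_for_stable_status(timeout_sec: 0.2, interval_sec: 0.05,
                                stable: ->(s) { s == 'Running' }) { 'Does not exist' }
puts(status.nil? ? 'timed out; treating service as not running' : status)
```

Whether returning nil is the right behaviour for the cookbook is a design question for the maintainers; the sketch only shows that a timeout does not have to stop the rest of the recipe.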
@amalakar
Contributor Author

amalakar commented Jul 1, 2016

The workaround is:
rm /etc/monit/conf.d/spark-standalone-*; monit reload; chef-client
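
For anyone who wants to script the cleanup step, here is a rough Ruby sketch of the `rm` part of the workaround. The helper name `remove_stale_monit_configs` is hypothetical, and the default directory and prefix are simply taken from the command above; adjust them for your layout.

```ruby
require 'fileutils'

# Hypothetical helper: delete Monit config fragments whose file names start
# with the given prefix and return the paths that were removed.
# conf_dir and prefix default to the values used in the workaround command.
def remove_stale_monit_configs(conf_dir: '/etc/monit/conf.d', prefix: 'spark-standalone-')
  stale = Dir.glob(File.join(conf_dir, "#{prefix}*")).sort
  FileUtils.rm_f(stale) # rm_f ignores files that have already disappeared
  stale
end
```

After calling it (as root), you would still run `monit reload` and then `chef-client`, as in the one-liner above.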

@strong-code

Ran into the same problem (weirdly never encountered while testing in Vagrant though, only on a remote server). 👍 for the workaround
