Successful jobs exit with error code #4

Open
cwbeitel opened this issue Jan 19, 2018 · 0 comments
@cwbeitel
Owner

Currently, every run that reaches the end of main() ends in an ungraceful system exit: tf.app.run() raises SystemExit even when the job succeeded. For example:

import os
import sys
import unittest

import tensorflow as tf


class TestRun(unittest.TestCase):

    def test_non_distributed_runs(self):
        # Simulate a single-master cloud cluster via TF_CONFIG.
        os.environ['TF_CONFIG'] = '{"cluster":{"master":["pybullet-kuka-ff-c2f81017-master-v3k7-0:2222"]},"task":{"type":"master","index":0},"environment":"cloud"}'
        tmp_logdir = '/tmp/agents-logs/test/non-distributed-2'
        sys.argv.extend(["--steps=100",
                         "--sync_replicas=False",
                         "--num_agents=1",
                         "--logdir=%s" % tmp_logdir])
        # tf.app.run() calls sys.exit() once main() returns, raising SystemExit.
        tf.app.run()

Yields

======================================================================
ERROR: test_non_distributed_runs (__main__.TestRun)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/app/trainer/task_test.py", line 34, in test_non_distributed_runs
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
SystemExit

----------------------------------------------------------------------
Ran 1 test in 21.829s

This is problematic for several reasons. For one, jobs on Kubeflow restart because the job is treated as failed when it has actually just finished.
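To make the behaviour easier to reason about without TensorFlow installed, here is a minimal sketch of the same pattern. It assumes that tf.app.run() behaves like the `_sys.exit(main(...))` line in the traceback above. The names `run_like_tf_app`, `main`, and `exit_code_of` are hypothetical stand-ins, not part of TensorFlow or this project. The sketch shows one possible way for a test to tell a clean exit (code `None` or `0`) from a real failure, rather than treating every SystemExit as an error.

```python
import sys
import unittest


def run_like_tf_app(main, argv=None):
    """Hypothetical stand-in for tf.app.run(): always exits via sys.exit()."""
    argv = sys.argv[:1] if argv is None else argv
    sys.exit(main(argv))


def main(argv):
    """Hypothetical training entry point that finishes successfully."""
    return None  # Falling off the end of main() also returns None.


def exit_code_of(func, *args):
    """Run func and return the code carried by the SystemExit it raises."""
    try:
        func(*args)
    except SystemExit as exc:
        return exc.code
    raise AssertionError("expected SystemExit")


class TestRun(unittest.TestCase):

    def test_successful_run_exits_cleanly(self):
        # SystemExit is raised even on success; None and 0 both mean status 0.
        code = exit_code_of(run_like_tf_app, main, ["task.py"])
        self.assertIn(code, (None, 0))
```

With a check like this, the test only fails when the exit code is non-zero. Whether the restarts on Kubeflow come from the exit code itself or from something else in the job spec is not established here.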
