Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task failing with logmon failed to create fifo for extracting logs #5803

Closed
fprovencher opened this issue Jun 10, 2019 · 15 comments · Fixed by #5864
Closed

Task failing with logmon failed to create fifo for extracting logs #5803

fprovencher opened this issue Jun 10, 2019 · 15 comments · Fixed by #5864

Comments

@fprovencher
Copy link

fprovencher commented Jun 10, 2019

Nomad version

Affect versions:

  • Nomad v0.9.1 (4b2bdbd)
  • Nomad v0.9.2
  • Nomad v0.9.3
  • Nomad v0.9.0

The issue is NOT present in Nomad v0.8.7.

Operating system and Environment details

Windows 10 Pro, 64bit

Issue

Task failing with logmon error:
logmon: rpc error: code = Unknown desc = failed to create fifo for extracting logs: open //./pipe/kapacitor-153973e1.stdout: Access is denied.

2019-06-10 10_44_36-Nomad

Reproduction steps

Job file

job "tick" {
  datacenters = ["dc1"]

  group "tick" {
	task "influxdb" {
	  driver = "raw_exec"

	  config {
		command = "C:\\tick\\bin\\influxd.exe"
		args = [
		  "-config",
		  "${NOMAD_TASK_DIR}/influxdb.conf"
		]
	  }

	  template {
		destination = "local/influxdb.conf"
		left_delimiter = "<<"
		right_delimiter = ">>"
		data = <<EOF
reporting-disabled = false
bind-address = "<<env `NOMAD_ADDR_influxrpc`>>"

[meta]
  dir = "<<env `NOMAD_TASK_DIR`|js>>\\meta"

[data]
  dir = "<<env `NOMAD_TASK_DIR`|js>>\\data"
  wal-dir = "<<env `NOMAD_TASK_DIR`|js>>\\wal"

[http]
  enabled = true
  bind-address = "<<env `NOMAD_ADDR_influxhttp`>>"
  https-enabled = false
  unix-socket-enabled = false

[[udp]]
  # Default UDP for Telegraf
  enabled = true
  bind-address = "<<env `NOMAD_ADDR_influxudp`>>" # the bind address
  database = "telegraf" # Name of the database that will be written to
  batch-size = 5000 # will flush if this many points get buffered
  batch-timeout = "1s" # will flush at least this often even if the batch-size is not reached
  batch-pending = 10 # number of batches that may be pending in memory
  read-buffer = 0 # UDP read buffer size, 0 means to use OS default

EOF
	  }

	  resources {
		network {
		  # Used for backup and restore
		  port "influxrpc" {
			# static = 8088
		  }
		  port "influxudp" {
			# static = 8089
		  }
		  port "influxhttp" {
			# HACK: Must be static for chronograph to work.
			static = 8086
		  }
		  mbits = 5
		}
	  }

	  service {
		name = "influxrpc"
		port = "influxrpc"

		check {
		  type = "tcp"
		  interval = "20s"
		  timeout = "2s"
		}
	  }

	  service {
		name = "influxhttp"
		port = "influxhttp"

		check {
		  type = "http"
		  path = "/ping"
		  interval = "20s"
		  timeout = "2s"
		}
	  }

	  service {
		name = "influxudp"
		port = "influxudp"

	  }
	  
	}

	task "kapacitor" {
	  driver = "raw_exec"

	  config {
		command = "C:\\tick\\bin\\kapacitord.exe"
		args = [
		  "-config",
		  "${NOMAD_TASK_DIR}\\kapacitor.conf"
		]
	  }

	  template {
		destination = "local/kapacitor.conf"
		left_delimiter = "<<"
		right_delimiter = ">>"
		data = <<EOF
httppost = []
data_dir = "<<env `NOMAD_TASK_DIR`|js>>\\data"
skip-config-overrides = false
default-retention-policy = ""

[http]
  bind-address = "<<env `NOMAD_ADDR_kapacitor`>>"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = false
  https-enabled = false
  shutdown-timeout = "10s"
  shared-secret = ""

[replay]
  dir = "<<env `NOMAD_TASK_DIR`|js>>\\replay"

[storage]
  boltdb = "<<env `NOMAD_TASK_DIR`|js>>\\kapacitor.db"

[task]
  dir = "<<env `NOMAD_TASK_DIR`|js>>\\tasks"
  snapshot-interval = "1m0s"

<<with service `influxhttp`>><<with index . 0>>
[[influxdb]]
  enabled = true
  name = "default"
  default = false
	<<if .Address | contains ":" >>
  urls = ["http://[<<.Address>>]:<<.Port>>"]
	<<else>>
	urls = ["http://<<.Address>>:<<.Port>>"]
	<<end>>
  insecure-skip-verify = false
  disable-subscriptions = false
  subscription-protocol = "http"
  [influxdb.excluded-subscriptions]
    _kapacitor = ["autogen"]
<<end>><<end>>

[logging]
  file = "STDERR"
  level = "INFO"

[config-override]
  enabled = true

EOF
	  }

	  resources {
		network {
		  port "kapacitor" {
			# HACK: Must be static for chronograph to work.
			static = 9092
		  }
		  mbits = 5
		}
	  }

	  service {
		name = "kapacitor"
		port = "kapacitor"

		check {
		  type = "tcp"
		  interval = "20s"
		  timeout = "2s"
		}
	  }
	}

	task "chronograf" {
	  driver = "raw_exec"

	  config {
		command = "C:\\tick\\bin\\chronograf.exe"
		args = [
		  "/host:${NOMAD_IP_chronograf}",
		  "/port:${NOMAD_PORT_chronograf}",
		  "/bolt-path:${NOMAD_TASK_DIR}/chronograf-v1.db",
		]
	  }

	   	template {
	   	  left_delimiter = "<<"
	   	  right_delimiter = ">>"
	   	  destination = "local/chronograf.env"
	   	  data = <<EOF
	   <<with service `influxhttp`>><<with index . 0>>
		 <<if .Address | contains ":">>
	   INFLUXDB_URL=http://[<<.Address>>]:<<.Port>>
	   <<else>>
		 INFLUXDB_URL=http://<<.Address>>:<<.Port>>
		 <<end>>
		 <<with service `kapacitor`>><<with index . 0>>
		 <<if .Address | contains ":">>
	   KAPACITOR_URL=http://[<<.Address>>]:<<.Port>>
		 <<else>>
		 KAPACITOR_URL=http://<<.Address>>:<<.Port>>
		 <<end>>
	   <<end>><<end>><<end>><<end>>
	   EOF
	   	  env = true
	   	}

	  resources {
		network {
		  port "chronograf" {
		  }
		  mbits = 5
		}
	  }

	  service {
		name = "chronograf"
		port = "chronograf"
		check {
		  type = "http"
		  path = "/"
		  interval = "10s"
		  timeout  = "2s"		
		}
	  }
	}
  }
}

Nomad Client logs (if appropriate)

 2019-06-10T14:19:04.201-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor @module=logmon path=//./pipe/kapacitor-2cd52e07.stdout timestamp=2019-06-10T14:19:04.201-0400
    2019-06-10T14:19:04.202-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor @module=logmon path=//./pipe/kapacitor-2cd52e07.stderr timestamp=2019-06-10T14:19:04.201-0400
    2019/06/10 14:19:04.203209 [INFO] (runner) creating new runner (dry: false, once: false)
    2019/06/10 14:19:04.203209 [INFO] (runner) creating watcher
    2019/06/10 14:19:04.203705 [INFO] (runner) starting
    2019-06-10T14:19:04.206-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=influxdb @module=logmon path=//./pipe/influxdb-128b35b7.stdout timestamp=2019-06-10T14:19:04.206-0400
    2019-06-10T14:19:04.212-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=influxdb @module=logmon path=//./pipe/influxdb-128b35b7.stderr timestamp=2019-06-10T14:19:04.212-0400
    2019-06-10T14:19:04.212-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=chronograf @module=logmon path=//./pipe/chronograf-e7084e4d.stdout timestamp=2019-06-10T14:19:04.212-0400
    2019-06-10T14:19:04.213-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=chronograf @module=logmon path=//./pipe/chronograf-e7084e4d.stderr timestamp=2019-06-10T14:19:04.213-0400
    2019/06/10 14:19:04.213625 [INFO] (runner) creating new runner (dry: false, once: false)
    2019/06/10 14:19:04.214121 [INFO] (runner) creating watcher
    2019/06/10 14:19:04.214121 [INFO] (runner) starting
    2019/06/10 14:19:04.214121 [INFO] (runner) creating new runner (dry: false, once: false)
    2019/06/10 14:19:04.214121 [INFO] (runner) creating watcher
    2019/06/10 14:19:04.214617 [INFO] (runner) starting
    2019/06/10 14:19:04.217605 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\kapacitor\\local\\kapacitor.conf"
    2019-06-10T14:19:04.218-0400 [WARN ] client.alloc_runner.task_runner: some environment variables not available for rendering: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor keys=
    2019-06-10T14:19:04.218-0400 [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:C:\repos\fast\samples\tick\bin\kapacitord.exe Args:[-config C:\Users\Owner\.genvid\nomad\data\alloc\3167c038-6a1c-fb41-2476-34bb7c621bb6\kapacitor\local\kapacitor.conf]}"
    2019/06/10 14:19:04.221563 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\influxdb\\local\\influxdb.conf"
    2019/06/10 14:19:04.222062 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\chronograf\\local\\chronograf.env"
    2019-06-10T14:19:04.222-0400 [WARN ] client.alloc_runner.task_runner: some environment variables not available for rendering: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=chronograf keys=
    2019-06-10T14:19:04.222-0400 [WARN ] client.alloc_runner.task_runner: some environment variables not available for rendering: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=influxdb keys=
    2019-06-10T14:19:04.223-0400 [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:C:\repos\fast\samples\tick\bin\chronograf.exe Args:[/host:::1 /port:28027 /bolt-path:C:\Users\Owner\.genvid\nomad\data\alloc\3167c038-6a1c-fb41-2476-34bb7c621bb6\chronograf\local/chronograf-v1.db]}"
    2019-06-10T14:19:04.223-0400 [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:C:\repos\fast\samples\tick\bin\influxd.exe Args:[-config C:\Users\Owner\.genvid\nomad\data\alloc\3167c038-6a1c-fb41-2476-34bb7c621bb6\influxdb\local/influxdb.conf]}"
    2019/06/10 14:19:06.404288 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\chronograf\\local\\chronograf.env"
    2019/06/10 14:19:06.404288 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\kapacitor\\local\\kapacitor.conf"
    2019/06/10 14:19:06.418174 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\chronograf\\local\\chronograf.env"
    2019/06/10 14:19:06.419166 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\.genvid\\nomad\\data\\alloc\\3167c038-6a1c-fb41-2476-34bb7c621bb6\\kapacitor\\local\\kapacitor.conf"
    2019-06-10T14:19:09.119-0400 [WARN ] client.driver_mgr.raw_exec.executor: error closing client during Kill: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 driver=raw_exec task_name=kapacitor err="rpc error: code = Canceled desc = grpc: the client connection is closing"
    2019-06-10T14:19:09.119-0400 [WARN ] client.driver_mgr.raw_exec.executor: plugin failed to exit gracefully: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 driver=raw_exec task_name=kapacitor
    2019-06-10T14:19:09.119-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 driver=raw_exec task_name=kapacitor error="rpc error: code = Unavailable desc = transport is closing"
    2019-06-10T14:19:09.135-0400 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor reason= delay=0s
    2019-06-10T14:19:09.139-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor @module=logmon path=//./pipe/kapacitor-2cd52e07.stdout timestamp=2019-06-10T14:19:09.139-0400
    2019-06-10T14:19:09.139-0400 [ERROR] client.alloc_runner.task_runner.task_hook: failed to start logmon: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor error="rpc error: code = Unknown desc = failed to create fifo for extracting logs: open //./pipe/kapacitor-2cd52e07.stdout: Access is denied."
    2019-06-10T14:19:09.139-0400 [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor error="prestart hook "logmon" failed: rpc error: code = Unknown desc = failed to create fifo for extracting logs: open //./pipe/kapacitor-2cd52e07.stdout: Access is denied."
    2019-06-10T14:19:09.139-0400 [INFO ] client.alloc_runner.task_runner: not restarting task: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 task=kapacitor reason="Error was unrecoverable"
    2019-06-10T14:19:09.163-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 driver=raw_exec task_name=influxdb error="rpc error: code = Unavailable desc = transport is closing"
    2019-06-10T14:19:09.172-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6 driver=raw_exec task_name=chronograf error="rpc error: code = Unavailable desc = transport is closing"
    2019/06/10 14:19:09.210163 [INFO] (runner) stopping
    2019/06/10 14:19:09.210163 [INFO] (runner) received finish
    2019/06/10 14:19:09.254509 [INFO] (runner) stopping
    2019/06/10 14:19:09.254509 [INFO] (runner) received finish
    2019/06/10 14:19:09.254509 [INFO] (runner) stopping
    2019/06/10 14:19:09.255006 [INFO] (runner) received finish
    2019-06-10T14:19:09.255-0400 [INFO ] client.gc: marking allocation for GC: alloc_id=3167c038-6a1c-fb41-2476-34bb7c621bb6
@fprovencher
Copy link
Author

Updated actual job file to the posting.

@fprovencher
Copy link
Author

Please note that the error is not just happening with kapacitor, it might happen with any other executable.

@fprovencher
Copy link
Author

Could it be related to: #5577 #5616 ?

@fprovencher
Copy link
Author

Issue still present in Nomad v0.9.3 (c5e8b66)

@nickethier
Copy link
Member

Hmm Access is denied. seems suspicious as a user permissions issue. Have you tried running nomad in an admin terminal?

@fprovencher
Copy link
Author

fprovencher commented Jun 13, 2019

We have been using Nomad this way for a while and are in a position where we cannot run it as an admin. This issue started happening a week ago only. It seems to be a race condition that happens sometimes. If we keep restarting the task, it eventually succeeds.

@fprovencher
Copy link
Author

fprovencher commented Jun 14, 2019

Another easier example:

job "example" {
  datacenters = ["dc1"]
  type = "service"
  task "test" {
    driver = "raw_exec"
    config {
      command = "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
      args = [
          "-ExecutionPolicy","bypass",
          "-File","${NOMAD_TASK_DIR}\\t.ps1"
      ]
    }

    template {
      data = "Write-Output 'Works!'"
      destination = "local\\t.ps1"
    }
  }
}

Task run as expected and write "Works!" on stdout of the job, but we still get the same error:
image

@fprovencher
Copy link
Author

Just wondering, this task seems to be tagged with driver/exec when we are using raw_exec, is this normal?

@fprovencher
Copy link
Author

@nickethier same result when running Nomad agent -dev from an admin terminal, with the last job:
`

2019/06/14 08:31:59.600089 [DEBUG] (runner) initiating run
2019/06/14 08:31:59.600089 [DEBUG] (runner) checking template 31ad620415d7655ecb6cf42f2b405f1e
2019/06/14 08:31:59.601087 [DEBUG] (runner) rendering "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient306057918\\23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5\\test\\local\\t.ps1"
2019/06/14 08:31:59.619240 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient306057918\\23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5\\test\\local\\t.ps1"
2019/06/14 08:31:59.619240 [DEBUG] (runner) diffing and updating dependencies
2019/06/14 08:31:59.620216 [DEBUG] (runner) watching 0 dependencies
2019-06-14T08:31:59.619-0400 [WARN ] client.alloc_runner.task_runner: some environment variables not available for rendering: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test keys=
2019/06/14 08:31:59.621214 [DEBUG] (runner) all templates rendered
2019-06-14T08:31:59.622-0400 [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe Args:[-ExecutionPolicy bypass -File C:\Users\Owner\AppData\Local\Temp\NomadClient306057918\23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5\test\local\t.ps1]}"
2019-06-14T08:31:59.624-0400 [DEBUG] client.driver_mgr.raw_exec.executor: starting plugin: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe args="[C:\repos\fast\bin\nomad.exe executor {"LogFile":"C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient306057918\\23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5\\test\\executor.out","LogLevel":"debug","FSIsolation":false}]"
2019-06-14T08:31:59.627-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin started: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe pid=4184
2019-06-14T08:31:59.627-0400 [DEBUG] client.driver_mgr.raw_exec.executor: waiting for RPC address: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe
2019-06-14T08:32:01.322-0400 [DEBUG] client.driver_mgr.raw_exec.executor: using plugin: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test version=2
2019-06-14T08:32:01.542-0400 [DEBUG] client: updated allocations: index=37 total=3 pulled=2 filtered=1
2019-06-14T08:32:01.542-0400 [DEBUG] client: allocation updates: added=0 removed=0 updated=2 ignored=1
2019-06-14T08:32:01.544-0400 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=2 ignored=1 errors=0
2019-06-14T08:32:01.554-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test error="rpc error: code = Unavailable desc = transport is closing"
2019-06-14T08:32:01.555-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.stats_hook: error fetching stats of task: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test error="rpc error: code = Canceled desc = grpc: the client connection is closing"
2019-06-14T08:32:01.584-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin process exited: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe pid=4184
2019-06-14T08:32:01.584-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin exited: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 driver=raw_exec task_name=test
2019-06-14T08:32:01.586-0400 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test reason="Restart within policy" delay=18.299734646s
2019-06-14T08:32:01.741-0400 [DEBUG] client: updated allocations: index=38 total=3 pulled=2 filtered=1
2019-06-14T08:32:01.741-0400 [DEBUG] client: allocation updates: added=0 removed=0 updated=2 ignored=1
2019-06-14T08:32:01.742-0400 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=2 ignored=1 errors=0
2019-06-14T08:32:19.889-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test path=//./pipe/test-15e1de7a.stdout @module=logmon timestamp=2019-06-14T08:32:19.889-0400
2019-06-14T08:32:19.889-0400 [ERROR] client.alloc_runner.task_runner.task_hook: failed to start logmon: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test error="rpc error: code = Unknown desc = failed to create fifo for extracting logs: open //./pipe/test-15e1de7a.stdout: Access is denied."
2019-06-14T08:32:54.062-0400 [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test error="prestart hook "logmon" failed: rpc error: code = Unknown desc = failed to create fifo for extracting logs: open //./pipe/test-15e1de7a.stdout: Access is denied."
2019-06-14T08:32:54.062-0400 [INFO ] client.alloc_runner.task_runner: not restarting task: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test reason="Error was unrecoverable"
2019-06-14T08:32:54.063-0400 [INFO ] client.gc: marking allocation for GC: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5
2019-06-14T08:32:54.085-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test path=C:\repos\fast\bin\nomad.exe pid=8552
2019-06-14T08:32:54.085-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=23e1f3ec-90af-36cd-3ff0-ff7b7bbc5fb5 task=test
2019/06/14 08:32:54.086638 [INFO] (runner) stopping`

@ninoles
Copy link
Contributor

ninoles commented Jun 14, 2019

Hmm Access is denied. seems suspicious as a user permissions issue. Have you tried running nomad in an admin terminal?

Access is denied is often used not only for permission, but also concurrent access of non-shared handles in Windows. It's a typical error in case of dying process and log file access. Also, handles of dying (terminated) processes in Windows are clear by the kernel after the process is reported as dead. The problem could be around such race condition.

@fprovencher
Copy link
Author

fprovencher commented Jun 14, 2019

Running the example job above on Nomad 0.8.7 works correctly:
image

    2019/06/14 17:33:18.391162 [INFO] (runner) starting
    2019/06/14 17:33:18.391162 [DEBUG] (runner) running initial templates
    2019/06/14 17:33:18.391162 [INFO] (runner) initiating run
    2019/06/14 17:33:18.392159 [DEBUG] (runner) checking template 31ad620415d7655ecb6cf42f2b405f1e
    2019/06/14 17:33:18.392159 [DEBUG] (runner) rendering "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient357850230\\6bd1a849-7f20-de03-4f86-eba90d17b206\\test\\local\\t.ps1"
    2019/06/14 17:33:18.397148 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient357850230\\6bd1a849-7f20-de03-4f86-eba90d17b206\\test\\local\\t.ps1"
    2019/06/14 17:33:18.397148 [DEBUG] (runner) diffing and updating dependencies
    2019/06/14 17:33:18.397148 [DEBUG] (runner) watching 0 dependencies
    2019/06/14 17:33:18.397148 [DEBUG] (runner) all templates rendered
    2019/06/14 17:33:18.397148 [DEBUG] client: unblocking task test for alloc "6bd1a849-7f20-de03-4f86-eba90d17b206": Template
2019-06-14T17:33:18.398-0400 [DEBUG] plugin: starting plugin: path=C:\repos\fast\nomad.exe args="[C:\repos\fast\nomad.exe executor {"LogFile":"C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient357850230\\6bd1a849-7f20-de03-4f86-eba90d17b206\\test\\executor.out","LogLevel":"DEBUG"}]"
2019-06-14T17:33:18.401-0400 [DEBUG] plugin: waiting for RPC address: path=C:\repos\fast\nomad.exe
    2019/06/14 17:33:18.614565 [DEBUG] client: updated allocations at index 13 (total 1) (pulled 0) (filtered 1)
    2019/06/14 17:33:18.614565 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
==> Newer Nomad version available: 0.9.3 (currently running: 0.8.7)
    2019/06/14 17:33:19.358577 [DEBUG] http: Request GET /v1/evaluation/8305923c-d8bb-9c04-5a3b-d13cea74e39a (0s)
    2019/06/14 17:33:19.359574 [DEBUG] http: Request GET /v1/evaluation/8305923c-d8bb-9c04-5a3b-d13cea74e39a/allocations (0s)
2019-06-14T17:33:19.495-0400 [DEBUG] plugin.nomad.exe: plugin address: timestamp=2019-06-14T17:33:19.494-0400 network=tcp address=127.0.0.1:14000
    2019/06/14 17:33:20.224877 [DEBUG] http: Request GET /v1/job/example (993.3µs)
    2019/06/14 17:33:20.239839 [DEBUG] http: Request GET /v1/job/example/allocations (996.8µs)
    2019/06/14 17:33:20.241833 [DEBUG] http: Request GET /v1/job/example/evaluations (0s)
    2019/06/14 17:33:20.316644 [DEBUG] http: Request GET /v1/job/example/deployments (0s)
    2019/06/14 17:33:20.386447 [DEBUG] http: Request GET /v1/job/example/summary?index=1 (0s)
    2019/06/14 17:33:20.389436 [ERR] http: Request /v1/namespace/default, error: Nomad Enterprise only endpoint
    2019/06/14 17:33:20.389436 [DEBUG] http: Request GET /v1/namespace/default (0s)
    2019/06/14 17:33:20.539038 [DEBUG] driver.raw_exec: started process with pid: 20244
    2019/06/14 17:33:20.612847 [DEBUG] client: updated allocations at index 14 (total 1) (pulled 0) (filtered 1)
    2019/06/14 17:33:20.612847 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
2019-06-14T17:33:20.914-0400 [DEBUG] plugin: plugin process exited: path=C:\repos\fast\nomad.exe
    2019/06/14 17:33:20.914039 [INFO] client: task "test" for alloc "6bd1a849-7f20-de03-4f86-eba90d17b206" completed successfully
    2019/06/14 17:33:20.914039 [INFO] client: Restarting task "test" for alloc "6bd1a849-7f20-de03-4f86-eba90d17b206" in 15.061440252s
    2019/06/14 17:33:21.012771 [DEBUG] client: updated allocations at index 15 (total 1) (pulled 0) (filtered 1)
    2019/06/14 17:33:21.012771 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
    2019/06/14 17:33:22.263000 [DEBUG] http: Request GET /v1/job/example/summary?index=11 (0s)
    2019/06/14 17:33:25.173389 [DEBUG] http: Request GET /v1/job/example/allocations?index=13 (0s)
    2019/06/14 17:33:25.255170 [DEBUG] http: Request GET /v1/node/2bee33cc-dd28-7c46-3e01-b11f54d8148a (0s)
    2019/06/14 17:33:25.262154 [DEBUG] http: Request GET /v1/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206 (998.5µs)
    2019/06/14 17:33:25.297059 [DEBUG] http: Request GET /v1/client/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206/stats (0s)
    2019/06/14 17:33:25.310026 [DEBUG] http: Request GET /v1/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206 (0s)
    2019/06/14 17:33:25.321027 [DEBUG] http: Request GET /v1/client/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206/stats (0s)
    2019/06/14 17:33:25.827641 [DEBUG] http: Request GET /v1/client/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206/stats (0s)
2019-06-14T17:33:35.976-0400 [DEBUG] plugin: starting plugin: path=C:\repos\fast\nomad.exe args="[C:\repos\fast\nomad.exe executor {"LogFile":"C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient357850230\\6bd1a849-7f20-de03-4f86-eba90d17b206\\test\\executor.out","LogLevel":"DEBUG"}]"
2019-06-14T17:33:35.981-0400 [DEBUG] plugin: waiting for RPC address: path=C:\repos\fast\nomad.exe
2019-06-14T17:33:36.958-0400 [DEBUG] plugin.nomad.exe: plugin address: timestamp=2019-06-14T17:33:36.957-0400 address=127.0.0.1:14000 network=tcp
    2019/06/14 17:33:37.999559 [DEBUG] driver.raw_exec: started process with pid: 9452
    2019/06/14 17:33:38.213022 [DEBUG] client: updated allocations at index 16 (total 1) (pulled 0) (filtered 1)
    2019/06/14 17:33:38.213022 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
    2019/06/14 17:33:38.213022 [DEBUG] http: Request GET /v1/job/example/summary?index=15 (12.9438874s)
    2019/06/14 17:33:38.213022 [DEBUG] http: Request GET /v1/job/example/summary?index=15 (13.947247s)
    2019/06/14 17:33:38.213022 [DEBUG] http: Request GET /v1/allocation/6bd1a849-7f20-de03-4f86-eba90d17b206?index=15 (12.1430284s)
2019-06-14T17:33:38.334-0400 [DEBUG] plugin: plugin process exited: path=C:\repos\fast\nomad.exe
    2019/06/14 17:33:38.334737 [INFO] client: task "test" for alloc "6bd1a849-7f20-de03-4f86-eba90d17b206" completed successfully
    2019/06/14 17:33:38.334737 [INFO] client: Restarting task "test" for alloc "6bd1a849-7f20-de03-4f86-eba90d17b206" in 18.726233854s
    2019/06/14 17:33:38.414521 [DEBUG] client: updated allocations at index 17 (total 1) (pulled 0) (filtered 1)
    2019/06/14 17:33:38.414521 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)

@fprovencher
Copy link
Author

Running the example job above on Nomad 0.9.0 shows the issue:
image

    2019/06/14 17:28:41.128774 [INFO] (runner) starting
    2019/06/14 17:28:41.128774 [DEBUG] (runner) running initial templates
    2019/06/14 17:28:41.128774 [DEBUG] (runner) initiating run
    2019/06/14 17:28:41.128774 [DEBUG] (runner) checking template 31ad620415d7655ecb6cf42f2b405f1e
    2019/06/14 17:28:41.128774 [DEBUG] (runner) rendering "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient948345938\\d3fb25b2-d20d-9d3f-99aa-f002786e7430\\test\\local\\t.ps1"
    2019/06/14 17:28:41.146055 [INFO] (runner) rendered "(dynamic)" => "C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient948345938\\d3fb25b2-d20d-9d3f-99aa-f002786e7430\\test\\local\\t.ps1"
    2019/06/14 17:28:41.146055 [DEBUG] (runner) diffing and updating dependencies
    2019/06/14 17:28:41.146055 [DEBUG] (runner) watching 0 dependencies
    2019/06/14 17:28:41.146055 [DEBUG] (runner) all templates rendered
    2019-06-14T17:28:41.147-0400 [WARN ] client.alloc_runner.task_runner: some environment variables not available for rendering: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test keys=
    2019-06-14T17:28:41.147-0400 [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe Args:[-ExecutionPolicy bypass -File C:\Users\Owner\AppD          ata\Local\Temp\NomadClient948345938\d3fb25b2-d20d-9d3f-99aa-f002786e7430\test\local\t.ps1]}"
    2019-06-14T17:28:41.147-0400 [DEBUG] client.driver_mgr.raw_exec.executor: starting plugin: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe args="[C:\repos\fast\bin\noma          d.exe executor {"LogFile":"C:\\Users\\Owner\\AppData\\Local\\Temp\\NomadClient948345938\\d3fb25b2-d20d-9d3f-99aa-f002786e7430\\test\\executor.out","LogLevel":"debug","FSIsolation":false}]"
    2019-06-14T17:28:41.150-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin started: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe pid=12628
    2019-06-14T17:28:41.150-0400 [DEBUG] client.driver_mgr.raw_exec.executor: waiting for RPC address: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe
    2019-06-14T17:28:43.139-0400 [DEBUG] client.driver_mgr.raw_exec.executor: using plugin: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test version=2
    2019-06-14T17:28:43.139-0400 [DEBUG] client.driver_mgr.raw_exec.executor.nomad.exe: plugin address: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test network=tcp address=127.0.0.1:14000 timestamp=2019-06          -14T17:28:43.139-0400
    2019-06-14T17:28:43.141-0400 [DEBUG] client.driver_mgr.raw_exec.executor.nomad.exe: launching command: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test @module=executor args="-ExecutionPolicy bypass -Fi          le C:\Users\Owner\AppData\Local\Temp\NomadClient948345938\d3fb25b2-d20d-9d3f-99aa-f002786e7430\test\local\t.ps1" command=C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe timestamp=2019-06-14T17:28:43.141-0400
    2019-06-14T17:28:43.329-0400 [DEBUG] client: updated allocations: index=27 total=2 pulled=1 filtered=1
    2019-06-14T17:28:43.329-0400 [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=1
    2019-06-14T17:28:43.329-0400 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=1 errors=0
    2019-06-14T17:28:43.472-0400 [ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test error="rpc error: code =           Unavailable desc = transport is closing"
    2019-06-14T17:28:43.472-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.stats_hook: error fetching stats of task: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test error="rpc error: code = Canceled desc = grpc: the cl          ient connection is closing"
    2019-06-14T17:28:43.497-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin process exited: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test path=C:\repos\fast\bin\nomad.exe pid=12628
    2019-06-14T17:28:43.497-0400 [DEBUG] client.driver_mgr.raw_exec.executor: plugin exited: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 driver=raw_exec task_name=test
    2019-06-14T17:28:43.497-0400 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test reason="Restart within policy" delay=15.987098826s
    2019-06-14T17:28:43.530-0400 [DEBUG] client: updated allocations: index=28 total=2 pulled=1 filtered=1
    2019-06-14T17:28:43.530-0400 [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=1
    2019-06-14T17:28:43.530-0400 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=1 errors=0
    2019-06-14T17:28:43.729-0400 [DEBUG] client: updated allocations: index=29 total=2 pulled=1 filtered=1
    2019-06-14T17:28:43.729-0400 [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=1
    2019-06-14T17:28:43.729-0400 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=1 errors=0
    2019-06-14T17:28:59.488-0400 [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad.exe: opening fifo: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test @module=logmon path=//./pipe/test-edc44ef1.stdout timestamp=201          9-06-14T17:28:59.488-0400
    2019-06-14T17:28:59.488-0400 [ERROR] client.alloc_runner.task_runner.task_hook: failed to start logmon: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test error="rpc error: code = Unknown desc = failed to create fifo for extr          acting logs: open //./pipe/test-edc44ef1.stdout: Access is denied."
    2019-06-14T17:28:59.488-0400 [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test error="prestart hook "logmon" failed: rpc error: code = Unknown desc = failed to create           fifo for extracting logs: open //./pipe/test-edc44ef1.stdout: Access is denied."
    2019-06-14T17:28:59.488-0400 [INFO ] client.alloc_runner.task_runner: not restarting task: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test reason="Error was unrecoverable"
    2019-06-14T17:28:59.488-0400 [INFO ] client.gc: marking allocation for GC: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430
    2019-06-14T17:28:59.514-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test path=C:\repos\fast\bin\nomad.exe pid=26352
    2019-06-14T17:28:59.514-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test
    2019/06/14 17:28:59.514837 [INFO] (runner) stopping
    2019/06/14 17:28:59.514837 [DEBUG] (runner) stopping watcher
    2019/06/14 17:28:59.514837 [DEBUG] (watcher) stopping all views
    2019-06-14T17:28:59.514-0400 [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=d3fb25b2-d20d-9d3f-99aa-f002786e7430 task=test

@endocrimes
Copy link
Contributor

Hey folks ✨,

I'm taking a look into this now that I have a windows test environment back up. It seems to be an issue with the way we handle restarts inside logmon, but I need to dig a little bit further to try and figure out the correct fix.

@endocrimes endocrimes self-assigned this Jun 18, 2019
@urjitbhatia
Copy link

Also seeing this on Ubuntu:
logmon: rpc error: code = Unknown desc = failed to create stdout logfile for "goyaad-runner.stdout": open /opt/nomad/data/alloc/a3f7dd51-c207-ea3b-2a92-adc57441b50b/alloc/logs: no such file or directory slightly different error message though

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants