
provider 0.3.2-rc2 panics because one of the two services in the SDL lacks global: true #112

Closed
andy108369 opened this issue Aug 6, 2023 · 2 comments
Labels: awaiting-triage, repo/provider (Akash provider-services repo issues)

Comments


andy108369 commented Aug 6, 2023

Provider-services 0.3.2-rc2 panics on an SDL with two services, where one of them does not expose a global service (has no global: true).

provider

  "owner": "akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh",
  "host_uri": "https://provider.provider-02.sandbox-01.aksh.pw:8443",

version

provider-services v0.3.2-rc2
sandbox v0.23.2-rc3

SDL

I've tested this SDL, after removing the 33060 port since it generally should not be there:
https://github.com/akash-network/awesome-akash/blob/c3a8fedb5685078e7313be51732e178b6b3ca3f8/wordpress/deploy.yaml#L35-L37
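For context, the db expose block in that deploy.yaml looks roughly like this (a paraphrase, not an exact copy; the linked file has the real lines). Port 3306 is exposed only to the wordpress service, and no expose entry has global: true, which is what triggers the panic:

```yaml
# Rough paraphrase of the db expose block from the linked deploy.yaml
# (not an exact copy). 3306 is exposed only to the wordpress service;
# no expose entry carries `global: true`.
services:
  db:
    expose:
      - port: 3306
        to:
          - service: wordpress
      - port: 33060          # removed in my tests, it should not be there
        to:
          - service: wordpress
```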

Here is a working SDL where I've added global: true to the db service and removed the 33060 port, which should not be there: https://gist.githubusercontent.com/andy108369/733cd5c4a191211885808469d7b4ec35/raw/420c5ecc2c2dc1bcb3cbcb5a3fee9d1d0eca37d1/wordpress-working-sdl.yaml
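The shape of the change is roughly the following (a sketch assuming the usual SDL expose syntax; the gist above has the exact file):

```yaml
# Sketch of the fixed db expose block (see the gist for the exact file):
# the 33060 entry is dropped and a `global: true` target is added,
# which is enough to avoid the panic on 0.3.2-rc2.
services:
  db:
    expose:
      - port: 3306
        to:
          - service: wordpress
          - global: true
```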

Logs

root@node1:~# kubectl -n akash-services logs akash-provider-0 |grep ^E

E[2023-08-06|15:32:49.397] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E0806 15:34:22.761207       1 v2.go:104] io: read/write on closed pipe
E[2023-08-06|15:40:05.957] recovered from panic: runtime error: index out of range [1] with length 1 cmp=provider client=kube
E[2023-08-06|15:40:05.957] unable to deploy lid=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v. last known state:
E[2023-08-06|15:40:05.957] deploying workload                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v manifest-group=akash err="kube: internal error"
E[2023-08-06|15:40:05.957] execution error                              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v manifest-group=akash state=deploy-active err="kube: internal error"

All provider logs:

https://transfer.sh/QTY7vMopwi/provider.logs

andy108369 added the repo/provider and awaiting-triage labels on Aug 6, 2023

arno01 commented Aug 7, 2023

New logs with provider v0.3.2-rc3 (Artur enabled extra debug info in it).

SDL tested => https://transfer.sh/mKzHRYPPNj/deploy.yaml.1

I[2023-08-07|13:16:22.267] manifest received                            module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280
I[2023-08-07|13:16:22.267] watchdog done                                module=provider-manifest cmp=provider lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280
I[2023-08-07|13:16:22.271] data received                                module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 version=b69728b8071fdd2f08fbcef2153f4b204e2a82cee675ad84010d37a7d52a0998
D[2023-08-07|13:16:22.272] requests valid                               module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 num-requests=1
D[2023-08-07|13:16:22.272] publishing manifest received                 module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 num-leases=1
D[2023-08-07|13:16:22.272] publishing manifest received for lease       module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 lease_id=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
I[2023-08-07|13:16:22.272] manifest received                            module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
I[2023-08-07|13:16:22.280] hostnames withheld                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash cnt=0
D[2023-08-07|13:16:22.280] no services                                  cmp=provider client=kube lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh service=db
W0807 13:16:22.356779       1 warnings.go:70] unknown field "status"
E[2023-08-07|13:16:22.420] recovered from panic: 
goroutine 346 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/akash-network/provider/cluster/kube.(*client).Deploy.func1()
	github.com/akash-network/provider/cluster/kube/client.go:247 +0x71
panic({0x2d1c440, 0xc001e1fa58})
	runtime/panic.go:884 +0x213
github.com/akash-network/provider/cluster/kube.(*client).Deploy(0xc00055c6c0, {0x3943250, 0xc000d7d2c0}, {0x39272e0, 0xc00131a640})
	github.com/akash-network/provider/cluster/kube/client.go:324 +0x1e90
github.com/akash-network/provider/cluster.(*deploymentManager).doDeploy(0xc0012ab680, {0x39431e0?, 0xc00005e048?})
	github.com/akash-network/provider/cluster/manager.go:378 +0xd5a
github.com/akash-network/provider/cluster.(*deploymentManager).startDeploy.func1()
	github.com/akash-network/provider/cluster/manager.go:273 +0x48
created by github.com/akash-network/provider/cluster.(*deploymentManager).startDeploy
	github.com/akash-network/provider/cluster/manager.go:272 +0xfc
 cmp=provider client=kube
E[2023-08-07|13:16:22.420] unable to deploy lid=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh. last known state:
 cmp=provider client=kube
E[2023-08-07|13:16:22.420] deploying workload                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash err="kube: internal error"
E[2023-08-07|13:16:22.420] execution error                              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash state=deploy-active err="kube: internal error"
D[2023-08-07|13:16:22.469] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.469] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] teardown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
I[2023-08-07|13:16:22.548] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
I[2023-08-07|13:16:22.549] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
D[2023-08-07|13:16:22.549] unreserving capacity                         module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] attempting to removing reservation           module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] removing reservation                         module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] unreserve capacity complete                  module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
D[2023-08-07|13:16:22.549] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=0

Entire provider log: https://transfer.sh/tpLhHlvWRn/logs.log


andy108369 commented Aug 9, 2023

No panic with provider 0.3.2-rc4 🥳!


I[2023-08-09|16:54:14.873] order detected                               module=bidengine-service cmp=provider order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
I[2023-08-09|16:54:14.876] group fetched                                module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
I[2023-08-09|16:54:14.877] requesting reservation                       module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
D[2023-08-09|16:54:14.877] reservation requested                        module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1 resources[{resource:{id:1,cpu:{units:{val:1000}},memory:{size:{val:1073741824}},storage:[{name:default,size:{val:1073741824}},{name:wordpress-db,size:{val:1073741824},attributes:[{key:class,value:beta3},{key:persistent,value:true}]}],gpu:{units:{val:0}},endpoints:null},count:1,price:{denom:uakt,amount:10000.000000000000000000}},{resource:{id:2,cpu:{units:{val:1000}},memory:{size:{val:1073741824}},storage:[{name:default,size:{val:1073741824}},{name:wordpress-data,size:{val:1073741824},attributes:[{key:class,value:beta3},{key:persistent,value:true}]}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:10000.000000000000000000}}]=(MISSING)
D[2023-08-09|16:54:14.877] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=1
I[2023-08-09|16:54:14.877] Reservation fulfilled                        module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
D[2023-08-09|16:54:15.714] submitting fulfillment                       module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1 price=15.000000000000000000uakt


I[2023-08-09|16:54:21.127] bid complete                                 module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
[332870-1-1]$ akash_status 
Detected provider for 332870/1/1: akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
{
  "services": {
    "db": {
      "name": "db",
      "available": 1,
      "total": 1,
      "uris": null,
      "observed_generation": 1,
      "replicas": 1,
      "updated_replicas": 1,
      "ready_replicas": 0,
      "available_replicas": 1
    },
    "wordpress": {
      "name": "wordpress",
      "available": 1,
      "total": 1,
      "uris": [
        "rjo2kb21s5evffgll5o5grqq54.ingress.provider-02.sandbox-01.aksh.pw"
      ],
      "observed_generation": 1,
      "replicas": 1,
      "updated_replicas": 1,
      "ready_replicas": 0,
      "available_replicas": 1
    }
  },
  "forwarded_ports": {},
  "ips": null
}

As you can see, db gets no forwarded_ports for its 3306 port, since that port is exposed only to the wordpress service.

I've also made sure the wordpress service can reach the db service (the one without global: true); bash's /dev/tcp redirection exits 0 when the TCP connection succeeds:

root@wordpress-0:/var/www/html# echo >/dev/tcp/db/3306
root@wordpress-0:/var/www/html# echo $?
0
