Workspace never starts and stays stuck in unknown state #9506

Closed
axonasif opened this issue Apr 23, 2022 · 7 comments
Labels
team: workspace (Issue belongs to the Workspace team), type: bug (Something isn't working)

Comments

@axonasif
Member

axonasif commented Apr 23, 2022

Bug description

I have been trying to open https://github.com/gitpod-io/template-flutter/tree/axon/refactor_dockerfile since yesterday. At that time an image build was running, but every time I tried it stayed stuck at the # exporting layers stage (that is what I could see in the log box under the jumping G).

Today it is just stuck at Preparing Workspace. The dashboard shows an Unknown state, and judging by the URL the workspace never lands on any cluster.

Steps to reproduce

Try to open https://github.com/gitpod-io/template-flutter/tree/axon/refactor_dockerfile on Gitpod.

Workspace affected

No response

Expected behavior

No response

Example repository

No response

Anything else?

I was able to open a workspace from a different git context. However, because of #9507, I'm not sure whether my changes were included.
Internal Slack discussion

@axonasif axonasif added type: bug Something isn't working team: workspace Issue belongs to the Workspace team labels Apr 23, 2022
@axonasif
Member Author

axonasif commented Apr 26, 2022

I've seen a few users on Discord and Front who might be facing the same issue with new custom Dockerfiles.
Here's one from today: https://canary.discord.com/channels/816244985187008514/968502538699669604/968508799608565833

@axonasif
Member Author

Update: It seems to be working now.
Closing the issue 👍

@axonasif
Member Author

My bad, I spoke too soon 😓
It still gets stuck at # exporting to image after a new change to my Dockerfile.

@axonasif axonasif reopened this Apr 26, 2022
@axonasif axonasif moved this from Done to Scheduled in 🌌 Workspace Team Apr 26, 2022
@Furisto
Member

Furisto commented Apr 27, 2022

One reason this can fail is that buildkitd is stuck in uninterruptible sleep (state D). Inspecting the kernel call stack of the stuck task shows this:

[<0>] request_wait_answer+0x12a/0x200
[<0>] fuse_simple_request+0x185/0x270
[<0>] fuse_statfs+0xd8/0x150
[<0>] statfs_by_dentry+0x6d/0x90
[<0>] vfs_statfs+0x1b/0xc0
[<0>] ovl_check_namelen.isra.0+0x33/0x70 [overlay]
[<0>] ovl_fill_super+0x329/0x9e0 [overlay]
[<0>] mount_nodev+0x49/0x90
[<0>] ovl_mount+0x18/0x20 [overlay]
[<0>] legacy_get_tree+0x2b/0x50
[<0>] vfs_get_tree+0x2a/0xc0
[<0>] do_mount+0x7b1/0x9c0
[<0>] ksys_mount+0x82/0xd0
[<0>] __x64_sys_mount+0x25/0x30
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
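A stack like the one above can be collected by looking for threads in state D and reading their kernel stacks from /proc. The following is only a minimal sketch, not the tooling that was actually used here: the function names are made up for illustration, and reading /proc/<pid>/task/<tid>/stack normally requires root. Threads are scanned rather than just processes because a single blocked thread of a multi-threaded daemon would otherwise be missed.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def threads_in_d_state(proc="/proc"):
    """Yield (pid, tid, comm) for every thread in uninterruptible sleep."""
    try:
        pids = [name for name in os.listdir(proc) if name.isdigit()]
    except OSError:
        return  # no procfs available (for example, not running on Linux)
    for pid in pids:
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, comm, state = parse_stat(f.read())
            except (OSError, ValueError):
                continue  # thread exited, or the line was not parseable
            if state == "D":
                yield int(pid), int(tid), comm


def kernel_stack(pid, tid, proc="/proc"):
    """Return the kernel stack of one thread as text."""
    path = os.path.join(proc, str(pid), "task", str(tid), "stack")
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    for pid, tid, comm in threads_in_d_state():
        print(f"{comm} pid={pid} tid={tid}")
        try:
            print(kernel_stack(pid, tid))
        except OSError as err:
            print("  could not read stack:", err)
```

A thread that is only briefly in state D (ordinary disk I/O) will also show up, so the scan is most useful when repeated: a thread that stays in D with the same stack across runs is the interesting one.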

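So buildkitd is trying to do IO with FUSE as the underlying filesystem: the overlay mount calls statfs on a FUSE-backed directory and then waits for the FUSE daemon to answer. FUSE is used by the stargz snapshotter, as the mount table shows: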

3757 6583 0:688 / /workspace/buildkit/runc-stargz/snapshots/snapshotter/snapshots/5/fs rw,nodev,relatime - fuse.rawBridge stargz rw,user_id=0,group_id=0,allow_other
3758 6583 0:689 / /workspace/buildkit/runc-stargz/snapshots/snapshotter/snapshots/6/fs rw,nodev,relatime - fuse.rawBridge stargz rw,user_id=0,group_id=0,allow_other
3759 6583 0:690 / /workspace/buildkit/runc-stargz/snapshots/snapshotter/snapshots/7/fs rw,nodev,relatime - fuse.rawBridge stargz rw,user_id=0,group_id=0,allow_other
3760 6583 0:692 / /workspace/buildkit/runc-stargz/snapshots/snapshotter/snapshots/8/fs rw,nodev,relatime - fuse.rawBridge stargz rw,user_id=0,group_id=0,allow_other
3761 6583 0:733 / /workspace/buildkit/runc-stargz/snapshots/snapshotter/snapshots/9/fs rw,nodev,relatime - fuse.rawBridge stargz rw,user_id=0,group_id=0,allow_other

https://github.com/containerd/stargz-snapshotter/blob/main/docs/INSTALL.md
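To check whether a node has FUSE-backed snapshot mounts like the ones listed above, the mountinfo table can be filtered by filesystem type. This is a small sketch under the assumption that the input is in the /proc/<pid>/mountinfo format; the function name is illustrative and not part of any Gitpod or BuildKit tooling.

```python
def fuse_mounts(mountinfo_text):
    """Return (mount_point, fs_type, source) for each FUSE mount.

    Each mountinfo line has a variable number of optional fields, then a
    lone "-" separator, then the filesystem type, the mount source and
    the superblock options.
    """
    found = []
    for line in mountinfo_text.splitlines():
        head, sep, tail = line.partition(" - ")
        if not sep:
            continue  # not a mountinfo line
        head_fields = head.split()
        tail_fields = tail.split()
        if len(head_fields) < 6 or len(tail_fields) < 2:
            continue  # malformed line
        mount_point = head_fields[4]
        fs_type, source = tail_fields[0], tail_fields[1]
        if fs_type == "fuse" or fs_type.startswith("fuse."):
            found.append((mount_point, fs_type, source))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/self/mountinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # no procfs available
    for mount_point, fs_type, source in fuse_mounts(text):
        print(fs_type, source, mount_point)
```

Applied to the lines quoted above, each entry is reported with filesystem type fuse.rawBridge and source stargz.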

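The kernel's hung-task report shows the same stack. The "Code: Bad RIP value." line most likely only means that the kernel could not read the instruction bytes at the user-space RIP while printing the report; on its own it does not show that the instruction pointer is invalid: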

[5564802.278447] buildkitd       D    0 480451 3670504 0x900043a4
[5564802.278451] Call Trace:
[5564802.278463]  __schedule+0x2e3/0x740
[5564802.278466]  schedule+0x42/0xb0
[5564802.278469]  request_wait_answer+0x12a/0x200
[5564802.278475]  ? wait_woken+0x80/0x80
[5564802.278482]  fuse_simple_request+0x185/0x270
[5564802.278486]  fuse_statfs+0xd8/0x150
[5564802.278492]  statfs_by_dentry+0x6d/0x90
[5564802.278495]  vfs_statfs+0x1b/0xc0
[5564802.278506]  ovl_check_namelen.isra.0+0x33/0x70 [overlay]
[5564802.278512]  ovl_fill_super+0x329/0x9e0 [overlay]
[5564802.278520]  ? free_prealloced_shrinker+0x20/0x40
[5564802.278524]  ? ovl_get_lower_layers+0x3b0/0x3b0 [overlay]
[5564802.278532]  mount_nodev+0x49/0x90
[5564802.278536]  ovl_mount+0x18/0x20 [overlay]
[5564802.278540]  legacy_get_tree+0x2b/0x50
[5564802.278542]  vfs_get_tree+0x2a/0xc0
[5564802.278548]  ? ns_capable+0x10/0x20
[5564802.278555]  do_mount+0x7b1/0x9c0
[5564802.278560]  ksys_mount+0x82/0xd0
[5564802.278563]  __x64_sys_mount+0x25/0x30
[5564802.278567]  do_syscall_64+0x57/0x190
[5564802.278577]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[5564802.278583] RIP: 0033:0x4c2105
[5564802.278589] Code: Bad RIP value.
[5564802.278591] RSP: 002b:000000c00355eee0 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[5564802.278595] RAX: ffffffffffffffda RBX: 000000c00355ee58 RCX: 00000000004c2105
[5564802.278596] RDX: 000000c000bec328 RSI: 000000c006697c80 RDI: 000000c000bec320
[5564802.278597] RBP: 000000c00355ef88 R08: 000000c0031502c0 R09: 0000000000000000
[5564802.278598] R10: 0000000000000000 R11: 0000000000000246 R12: 000000c00355ef68
[5564802.278599] R13: 0000000000000001 R14: 000000c00cff0b60 R15: ffffffffffffffff
[5564923.020456] INFO: task buildkitd:3670504 blocked for more than 362 seconds.
[5564923.027806]       Tainted: G        W  OE     5.4.0-1059-gke #62-Ubuntu
[5564923.034755] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

We have deactivated stargz as a workaround and will observe whether the error still occurs.
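While observing, reports like the one above can be pulled out of a kernel log with two regular expressions. This is a rough sketch, assuming the log text has the same shape as the excerpt in this comment; the names are illustrative. ORIG_RAX holds the number of the system call the task entered, and on x86-64 number 165 (0xa5) is mount, which matches the __x64_sys_mount frame in the trace.

```python
import re

# Matches the hung-task detector's summary line.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Matches the saved syscall number in the register dump.
ORIG_RAX = re.compile(r"ORIG_RAX: (?P<nr>[0-9a-f]{16})")


def hung_tasks(log_text):
    """Return (comm, pid, seconds) for each hung-task report in the log."""
    return [
        (m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(log_text)
    ]


def syscall_numbers(log_text):
    """Return the syscall numbers found in ORIG_RAX register dumps."""
    return [int(m["nr"], 16) for m in ORIG_RAX.finditer(log_text)]


# Two lines copied from the kernel log quoted in this comment.
SAMPLE = (
    "[5564802.278591] RSP: 002b:000000c00355eee0 EFLAGS: 00000246 "
    "ORIG_RAX: 00000000000000a5\n"
    "[5564923.020456] INFO: task buildkitd:3670504 blocked for more than 362 seconds.\n"
)

if __name__ == "__main__":
    print(hung_tasks(SAMPLE))
    print(syscall_numbers(SAMPLE))
```

If no new buildkitd entries appear with stargz deactivated, that supports the FUSE explanation, though it does not prove it.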

@axonasif
Copy link
Member Author

Hey @Furisto, it has been working since last night 👍
Thanks for looking into it 🙏

@sagor999
Contributor

@axonasif should this be closed now?

@axonasif
Member Author

@axonasif should this be closed now?

Yes, thanks for the nudge 😄

Repository owner moved this from Scheduled to Done in 🌌 Workspace Team May 23, 2022
3 participants