Cannot receive data from the simulator. The connection is blocked or the simulator is not running. #15

Open
joeljosephjin opened this issue Apr 30, 2020 · 23 comments

Comments

@joeljosephjin

xvfb-run python eval.py --agent-config baselines/config/random-agent.yaml --episode-config config/check-ground-truth.yaml

gives this

Set current directory to /home/joel/goseek-challenge
Found path: /home/joel/goseek-challenge/simulator/goseek-v0.1.4.x86_64
Mono path[0] = '/home/joel/goseek-challenge/simulator/goseek-v0.1.4_Data/Managed'
Mono config path = '/home/joel/goseek-challenge/simulator/goseek-v0.1.4_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Display 0 'screen': 640x480 (primary device).
Logging to /home/joel/.config/unity3d/Editor/Player.log
Evaluation episode on episode 0, scene 3
Traceback (most recent call last):
  File "eval.py", line 85, in <module>
    results = main(episode_cfg, agent_args)
  File "eval.py", line 66, in main
    return benchmark.evaluate(agent)
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek_benchmark.py", line 97, in evaluate
    scene_id=self.scenes[episode], random_seed=self.random_seeds[episode]
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek.py", line 138, in reset
    super().reset(scene_id, random_seed)
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 247, in reset
    observation = self.get_synced_observation()
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 279, in get_synced_observation
    response = self.observe()
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek_full_perception.py", line 95, in observe
    return self._data_request(DataRequest(metadata=True, cameras=cameras))
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 383, in _data_request
    raise TesseConnectionError()
tesse_gym.core.utils.TesseConnectionError: Cannot receive data from the simulator. The connection is blocked or the simulator is not running. 

I am running this on my Google Cloud instance through a Chrome Remote Desktop connection. It has an NVIDIA GPU.

@ZacRavichandran
Member

It seems like the client can't connect to the simulator for some reason. This usually happens when the DISPLAY variable isn't set, but the output Display 0 'screen': 640x480 (primary device) indicates otherwise.

Could you check the network connection by running the commands below? They use the low-level interface, tesse-interface, to query the simulator. The following output is expected.

>>> from tesse.msgs import DataRequest; from tesse.env import Env
>>> print(Env().request(DataRequest()).metadata)
<TESSE_Agent_Metadata_v0.5>
  <position x='-6.649457' y='0.4999968' z='-5.790709'/>
  <quaternion x='0' y='0.9426415' z='0' w='-0.3338069'/>
  <velocity x_dot='0' y_dot='2.233302E-06' z_dot='0'/>
  <angular_velocity x_ang_dot='0' y_ang_dot='0' z_ang_dot='0'/>
  <acceleration x_ddot='0' y_ddot='0' z_ddot='0'/>
  <angular_acceleration x_ang_ddot='0' y_ang_ddot='0' z_ang_ddot='0'/>
  <time>295.8927</time>
  <collision status='false' name=''/>
  <collider status='true'/>
</TESSE_Agent_Metadata_v0.5>

Thanks!

@joeljosephjin
Author

Error:

>>> from tesse.msgs import DataRequest; from tesse.env import Env
>>> print(Env().request(DataRequest()).metadata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'metadata'
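The AttributeError means Env().request(DataRequest()) returned None rather than a response object, i.e. nothing came back from the simulator. A minimal sketch that reports that case explicitly instead of failing on .metadata; it uses only the Env and DataRequest calls shown above, and the helper names describe_metadata_response and check_simulator are hypothetical:

```python
def describe_metadata_response(response):
    """Return the metadata text, or a readable hint if nothing arrived."""
    if response is None:
        # Env().request(...) returned None: no data came back from the simulator
        return "no response from simulator (is it running and able to open a display?)"
    return response.metadata


def check_simulator():
    """Query the running simulator once via tesse-interface and print the result."""
    # Imported lazily so the helper above can be used without tesse installed
    from tesse.env import Env
    from tesse.msgs import DataRequest

    print(describe_metadata_response(Env().request(DataRequest())))
```

Calling check_simulator() from a Python shell while the simulator is running should print either the metadata XML or the hint.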

@joeljosephjin
Author

Here is the content of /home/joel/.config/unity3d/Editor/Player.log:

Desktop is 640 x 480 @ 0 Hz
Unable to find a supported OpenGL core profile
Failed to create valid graphics context: please ensure you meet the minimum requirements
E.g. OpenGL core profile 3.2 or later for OpenGL Core renderer
[Vulkan init] extensions: count=15
[Vulkan init] extensions: name=VK_KHR_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=1
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
Vulkan detection: 0
No supported renderers found, exiting
 
(Filename: ./PlatformDependent/LinuxStandalone/main.cpp Line: 639)
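The decisive lines in this log are the renderer failures near the top and "No supported renderers found, exiting" at the end. A small sketch for scanning a Player.log for those messages; the marker list is copied from this log and is not an exhaustive list of Unity errors, and the function names are hypothetical. The default path is the "Logging to" location printed by eval.py earlier in the thread.

```python
from pathlib import Path

# Messages taken from the Player.log above; a starting point, not an exhaustive list
FATAL_MARKERS = (
    "Unable to find a supported OpenGL core profile",
    "Failed to create valid graphics context",
    "No supported renderers found",
)


def find_renderer_errors(log_text):
    """Return the log lines that contain one of the known renderer-failure markers."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FATAL_MARKERS)
    ]


def scan_player_log(path=None):
    """Read the Unity Player.log (default: ~/.config/unity3d/Editor/Player.log) and scan it."""
    if path is None:
        path = Path.home() / ".config" / "unity3d" / "Editor" / "Player.log"
    return find_renderer_errors(Path(path).read_text(errors="replace"))
```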

Here is the output of:
glxinfo | grep "version"

server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Max core profile version: 3.3
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.8
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.1 Mesa 19.2.8
OpenGL shading language version string: 1.40
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.2.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00

And here is the output of:
nvidia-smi

Fri May  1 07:22:36 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It seems the Unity player requires an OpenGL core profile of 3.2 or later, which, as you can see, my machine meets. Also, I don't understand how this could be caused by an outdated OpenGL, since this is the latest NVIDIA driver on a newly installed Google Cloud instance with a Tesla T4 GPU.
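Unity's log asks for an OpenGL core profile of 3.2 or later. A small sketch that parses glxinfo output like the one above, compares the core-profile version against 3.2, and also reports which implementation the version string names; in the output above that line reports Mesa 19.2.8 rather than the NVIDIA driver. The provider check is a heuristic on the text of that one line, and the function names are hypothetical. Feed it glxinfo output captured on the same DISPLAY the simulator uses.

```python
import re


def parse_core_profile(glxinfo_text):
    """Return ((major, minor), version_string) from glxinfo's core-profile line, or None."""
    match = re.search(r"OpenGL core profile version string:\s*(\d+)\.(\d+)(.*)", glxinfo_text)
    if match is None:
        return None
    major, minor, rest = int(match.group(1)), int(match.group(2)), match.group(3)
    return (major, minor), f"{major}.{minor}{rest}".strip()


def check_core_profile(glxinfo_text, minimum=(3, 2)):
    """Report whether the core profile meets `minimum` and which implementation provides it."""
    parsed = parse_core_profile(glxinfo_text)
    if parsed is None:
        return {"found": False}
    version, full = parsed
    if "NVIDIA" in full:
        provider = "NVIDIA"
    elif "Mesa" in full:
        provider = "Mesa"
    else:
        provider = "unknown"
    return {"found": True, "version": version, "meets_minimum": version >= minimum, "provider": provider}
```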

@ZacRavichandran
Member

Thanks for providing those diagnostics, that's really helpful.

You're right that OpenGL shouldn't be causing the issue.

I noticed that the output of nvidia-smi does not include an X server process, which is required. If one were running, it would look something like this in the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1687      G   /usr/lib/xorg/Xorg                           618MiB |
+-----------------------------------------------------------------------------+
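A quick way to check for that Xorg row without reading the table by eye: a sketch that parses the Processes table of nvidia-smi's text output. It assumes the five-column layout (GPU, PID, Type, Process name, Usage) printed by the 440-series driver in this thread; newer drivers add columns, and process names containing spaces would need a sturdier parser. The function names are hypothetical.

```python
def gpu_processes(nvidia_smi_text):
    """Return the process-name column of each data row in nvidia-smi's Processes table."""
    names = []
    in_table = False
    for line in nvidia_smi_text.splitlines():
        if "Processes:" in line:
            in_table = True
            continue
        if in_table and line.startswith("|") and not line.startswith("|="):
            fields = line.strip("|").split()
            # Data rows look like: GPU PID Type Process-name Usage
            if len(fields) >= 5 and fields[0].isdigit() and fields[1].isdigit():
                names.append(fields[3])
    return names


def has_x_server(nvidia_smi_text):
    """True if an Xorg process appears in nvidia-smi's process list."""
    return any(name.endswith("Xorg") for name in gpu_processes(nvidia_smi_text))
```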

We have a writeup on how to set up a headless machine, which includes running an X server, here. Could you see if any of those instructions help?

@joeljosephjin
Author

The last command there sudo X :0 & gave this error:

X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
Current Operating System: Linux ubuntu-bionic-2 5.3.0-1018-gcp #19~18.04.1-Ubuntu SMP Tue Apr 14 12:49:45 UTC 2020 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1018-gcp root=UUID=0e3040b1-b682-430d-8f18-b7db7004a9e3 ro scsi_mod.use_blk_mq=Y console=ttyS0
Build Date: 14 November 2019  06:20:00PM
xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support) 
Current version of pixman: 0.34.0
	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri May  1 21:24:53 2020
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) Server terminated with error (1). Closing log file.

I tried running the eval.py command again but the same error comes up. Thank you for helping btw :)

@joeljosephjin
Author

Contents of Xorg.0.log file:

[    69.328] 
X.Org X Server 1.19.6
Release Date: 2017-12-20
[    69.328] X Protocol Version 11, Revision 0
[    69.328] Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
[    69.328] Current Operating System: Linux ubuntu-bionic-2 5.3.0-1018-gcp #19~18.04.1-Ubuntu SMP Tue Apr 14 12:49:45 UTC 2020 x86_64
[    69.328] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1018-gcp root=UUID=0e3040b1-b682-430d-8f18-b7db7004a9e3 ro scsi_mod.use_blk_mq=Y console=ttyS0
[    69.328] Build Date: 14 November 2019  06:20:00PM
[    69.329] xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support) 
[    69.329] Current version of pixman: 0.34.0
[    69.329] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[    69.329] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    69.329] (==) Log file: "/var/log/Xorg.0.log", Time: Fri May  1 21:24:53 2020
[    69.330] (==) Using config file: "/etc/X11/xorg.conf"
[    69.330] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[    69.380] (==) ServerLayout "Layout0"
[    69.380] (**) |-->Screen "Screen0" (0)
[    69.380] (**) |   |-->Monitor "Monitor0"
[    69.380] (**) |   |-->Device "Device0"
[    69.380] (**) |-->Input Device "Keyboard0"
[    69.380] (**) |-->Input Device "Mouse0"
[    69.380] (==) Automatically adding devices
[    69.380] (==) Automatically enabling devices
[    69.380] (==) Automatically adding GPU devices
[    69.380] (==) Automatically binding GPU devices
[    69.381] (==) Max clients allowed: 256, resource mask: 0x1fffff
[    69.381] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[    69.381] 	Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[    69.381] 	Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[    69.381] 	Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[    69.381] 	Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[    69.381] 	Entry deleted from font path.
[    69.381] (==) FontPath set to:
	/usr/share/fonts/X11/misc,
	/usr/share/fonts/X11/Type1,
	built-ins
[    69.381] (==) ModulePath set to "/usr/lib/xorg/modules"
[    69.381] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[    69.381] (WW) Disabling Keyboard0
[    69.381] (WW) Disabling Mouse0
[    69.382] (II) Loader magic: 0x556e3830c020
[    69.382] (II) Module ABI versions:
[    69.382] 	X.Org ANSI C Emulation: 0.4
[    69.382] 	X.Org Video Driver: 23.0
[    69.382] 	X.Org XInput driver : 24.1
[    69.382] 	X.Org Server Extension : 10.0
[    69.383] (--) using VT number 2

[    69.383] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[    69.383] (II) xfree86: Adding drm device (/dev/dri/card0)
[    69.383] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[    69.385] (--) PCI: (0:0:4:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0xc0000000/16777216, 0x10000000000/268435456, 0x10010000000/33554432
[    69.385] (II) no primary bus or device found
[    69.385] (II) LoadModule: "glx"
[    69.406] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[    69.443] (II) Module glx: vendor="X.Org Foundation"
[    69.443] 	compiled for 1.19.6, module version = 1.0.0
[    69.443] 	ABI class: X.Org Server Extension, version 10.0
[    69.443] (II) LoadModule: "nvidia"
[    69.443] (WW) Warning, couldn't open module nvidia
[    69.443] (II) UnloadModule: "nvidia"
[    69.443] (II) Unloading nvidia
[    69.443] (EE) Failed to load module "nvidia" (module does not exist, 0)
[    69.443] (==) Matched modesetting as autoconfigured driver 0
[    69.443] (==) Matched fbdev as autoconfigured driver 1
[    69.443] (==) Matched vesa as autoconfigured driver 2
[    69.443] (==) Assigned the driver to the xf86ConfigLayout
[    69.443] (II) LoadModule: "modesetting"
[    69.443] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[    69.462] (II) Module modesetting: vendor="X.Org Foundation"
[    69.462] 	compiled for 1.19.6, module version = 1.19.6
[    69.462] 	Module class: X.Org Video Driver
[    69.462] 	ABI class: X.Org Video Driver, version 23.0
[    69.462] (II) LoadModule: "fbdev"
[    69.462] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[    69.486] (II) Module fbdev: vendor="X.Org Foundation"
[    69.486] 	compiled for 1.19.3, module version = 0.4.4
[    69.486] 	Module class: X.Org Video Driver
[    69.486] 	ABI class: X.Org Video Driver, version 23.0
[    69.486] (II) LoadModule: "vesa"
[    69.487] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[    69.514] (II) Module vesa: vendor="X.Org Foundation"
[    69.514] 	compiled for 1.19.3, module version = 2.3.4
[    69.514] 	Module class: X.Org Video Driver
[    69.514] 	ABI class: X.Org Video Driver, version 23.0
[    69.514] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[    69.514] (II) FBDEV: driver for framebuffer: fbdev
[    69.514] (II) VESA: driver for VESA chipsets: vesa
[    69.514] (WW) Falling back to old probe method for modesetting
[    69.515] (WW) Falling back to old probe method for fbdev
[    69.515] (II) Loading sub module "fbdevhw"
[    69.515] (II) LoadModule: "fbdevhw"
[    69.515] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[    69.516] (II) Module fbdevhw: vendor="X.Org Foundation"
[    69.516] 	compiled for 1.19.6, module version = 0.0.2
[    69.516] 	ABI class: X.Org Video Driver, version 23.0
[    69.516] (EE) open /dev/fb0: No such file or directory
[    69.516] (WW) Falling back to old probe method for vesa
[    69.516] (EE) Screen 0 deleted because of no matching config section.
[    69.516] (II) UnloadModule: "modesetting"
[    69.516] (EE) Device(s) detected, but none match those in the config file.
[    69.517] (EE) 
Fatal server error:
[    69.517] (EE) no screens found(EE) 
[    69.517] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[    69.517] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    69.517] (EE) 
[    69.517] (EE) Server terminated with error (1). Closing log file.
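Most of that log is informational; the lines that matter carry the (EE) marker (here: the DRM permission error, the missing nvidia module, the missing /dev/fb0, and "no screens found"). A sketch that pulls those messages out, assuming the bracketed-timestamp line format shown above; the function name is hypothetical. It only matches (EE) at the start of a line, so the "Markers:" legend is skipped.

```python
import re

# Optional "[   69.383]" timestamp, then the (EE) marker, then the message
ERROR_LINE = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?\(EE\)\s*(.*)$")


def xorg_errors(log_text):
    """Return the non-empty messages of all (EE) lines in an Xorg log."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match and match.group(1).strip():
            errors.append(match.group(1).strip())
    return errors
```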

Contents of /etc/X11/xorg.conf file: https://pastebin.com/wFtGvvQv
Contents of /usr/share/X11/xorg.conf.d/10-nvidia.conf file:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia-440/xorg"
EndSection
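The line Failed to load module "nvidia" (module does not exist, 0) in the log above means Xorg could not find the NVIDIA X driver module (nvidia_drv.so) in its module search path. A sketch that gathers the ModulePath entries from a config snippet such as 10-nvidia.conf, adds the default /usr/lib/xorg/modules reported in the log, and looks for nvidia_drv.so under each; it searches recursively because the exact subdirectory depends on how the driver was packaged (an assumption to verify on your system). The function names are hypothetical.

```python
import re
from pathlib import Path

# Default module directory reported in the Xorg log ("ModulePath set to ...")
DEFAULT_MODULE_DIR = "/usr/lib/xorg/modules"


def module_paths(conf_text):
    """Return the quoted ModulePath values in an Xorg config snippet."""
    return re.findall(r'^\s*ModulePath\s+"([^"]+)"', conf_text, flags=re.MULTILINE)


def candidate_dirs(conf_text):
    """ModulePath entries from the snippet, followed by the default module directory."""
    return module_paths(conf_text) + [DEFAULT_MODULE_DIR]


def find_x_driver(search_dirs, filename="nvidia_drv.so"):
    """Return every file called `filename` found (recursively) under `search_dirs`."""
    hits = []
    for directory in search_dirs:
        base = Path(directory)
        if base.is_dir():
            hits.extend(str(path) for path in base.rglob(filename))
    return sorted(hits)
```

For example, find_x_driver(candidate_dirs(Path("/usr/share/X11/xorg.conf.d/10-nvidia.conf").read_text())) returning an empty list would be consistent with the log's error.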

@lexavtanke

Hello,

I am struggling with pretty much the same issue. I am also trying to use goseek on Google servers.
Here is my /var/log/Xorg.0.log; it is a little bit different:

[ 952.445]
X.Org X Server 1.19.6
Release Date: 2017-12-20
[ 952.445] X Protocol Version 11, Revision 0
[ 952.445] Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
[ 952.445] Current Operating System: Linux cbf3aa56d9a9 4.19.104+ #1 SMP Wed Feb 19 05:26:34 PST 2020 x86_64
[ 952.445] Kernel command line: BOOT_IMAGE=/syslinux/vmlinuz.A init=/usr/lib/systemd/systemd boot=local rootwait ro noresume noswap loglevel=7 noinitrd console=ttyS0 security=apparmor virtio_net.napi_tx=1 systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false csm.disabled=1 dm_verity.error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 i915.modeset=1 cros_efi loadpin.enabled=0 module.sig_enforce=0 root=/dev/dm-0 "dm=1 vroot none ro 1,0 2539520 verity payload=PARTUUID=76B3E38C-A464-A94B-9AB0-DF60201C4CD1 hashtree=PARTUUID=76B3E38C-A464-A94B-9AB0-DF60201C4CD1 hashstart=2539520 alg=sha256 root_hexdigest=434f37a2ee1ed91037365abcdd2fbbdd8bd44393af171292826221bb605c406b salt=0bd8a061534d73349529369f10447da9f13414a20bdbf0686146f776d5867fa8" mitigations=off
[ 952.445] Build Date: 14 November 2019 06:20:00PM
[ 952.445] xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support)
[ 952.445] Current version of pixman: 0.34.0
[ 952.445] Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
[ 952.445] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 952.445] (==) Log file: "/var/log/Xorg.0.log", Time: Sat May 2 21:41:37 2020
[ 952.445] (==) Using config file: "/etc/X11/xorg.conf"
[ 952.445] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 952.446] (==) ServerLayout "Layout0"
[ 952.446] (**) |-->Screen "Screen0" (0)
[ 952.446] (**) |   |-->Monitor "Monitor0"
[ 952.446] (**) |   |-->Device "Device0"
[ 952.446] (**) |-->Input Device "Keyboard0"
[ 952.446] (**) |-->Input Device "Mouse0"
[ 952.446] (==) Automatically adding devices
[ 952.446] (==) Automatically enabling devices
[ 952.446] (==) Automatically adding GPU devices
[ 952.446] (==) Automatically binding GPU devices
[ 952.446] (==) Max clients allowed: 256, resource mask: 0x1fffff
[ 952.446] (WW) The directory "/usr/share/fonts/X11/misc" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (==) FontPath set to:
built-ins
[ 952.446] (==) ModulePath set to "/usr/lib/xorg/modules"
[ 952.446] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[ 952.446] (WW) Disabling Keyboard0
[ 952.446] (WW) Disabling Mouse0
[ 952.446] (II) Loader magic: 0x556c8bb1e020
[ 952.446] (II) Module ABI versions:
[ 952.446] X.Org ANSI C Emulation: 0.4
[ 952.446] X.Org Video Driver: 23.0
[ 952.446] X.Org XInput driver : 24.1
[ 952.446] X.Org Server Extension : 10.0
[ 952.446] (EE) dbus-core: error connecting to system bus: org.freedesktop.DBus.Error.FileNotFound (Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory)
[ 952.449] (--) PCI: (0:0:4:0) 10de:15f8:10de:118f rev 161, Mem @ 0xc0000000/16777216, 0x10000000000/17179869184, 0x10400000000/33554432, I/O @ 0x0000c000/128
[ 952.449] (II) no primary bus or device found
[ 952.449] (II) LoadModule: "glx"
[ 952.449] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 952.450] (II) Module glx: vendor="X.Org Foundation"
[ 952.450] compiled for 1.19.6, module version = 1.0.0
[ 952.450] ABI class: X.Org Server Extension, version 10.0
[ 952.450] (II) LoadModule: "nvidia"
[ 952.450] (WW) Warning, couldn't open module nvidia
[ 952.450] (II) UnloadModule: "nvidia"
[ 952.450] (II) Unloading nvidia
[ 952.450] (EE) Failed to load module "nvidia" (module does not exist, 0)
[ 952.450] (==) Matched modesetting as autoconfigured driver 0
[ 952.450] (==) Matched fbdev as autoconfigured driver 1
[ 952.450] (==) Matched vesa as autoconfigured driver 2
[ 952.450] (==) Assigned the driver to the xf86ConfigLayout
[ 952.450] (II) LoadModule: "modesetting"
[ 952.450] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[ 952.450] (II) Module modesetting: vendor="X.Org Foundation"
[ 952.450] compiled for 1.19.6, module version = 1.19.6
[ 952.450] Module class: X.Org Video Driver
[ 952.450] ABI class: X.Org Video Driver, version 23.0
[ 952.450] (II) LoadModule: "fbdev"
[ 952.450] (WW) Warning, couldn't open module fbdev
[ 952.450] (II) UnloadModule: "fbdev"
[ 952.450] (II) Unloading fbdev
[ 952.450] (EE) Failed to load module "fbdev" (module does not exist, 0)
[ 952.450] (II) LoadModule: "vesa"
[ 952.450] (WW) Warning, couldn't open module vesa
[ 952.450] (II) UnloadModule: "vesa"
[ 952.450] (II) Unloading vesa
[ 952.450] (EE) Failed to load module "vesa" (module does not exist, 0)
[ 952.450] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[ 952.450] (EE)
Fatal server error:
[ 952.450] (EE) parse_vt_settings: Cannot open /dev/tty0 (No such file or directory)
[ 952.450] (EE)
[ 952.451] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 952.451] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 952.451] (EE)
[ 952.451] (WW) xf86CloseConsole: KDSETMODE failed: Bad file descriptor
[ 952.451] (WW) xf86CloseConsole: VT_GETMODE failed: Bad file descriptor
[ 952.451] (EE) Server terminated with error (1). Closing log file.
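The two Xorg logs fail on different device nodes: the first on /dev/dri/card0 (permission denied) and /dev/fb0 (missing), the second on /dev/tty0 (missing). A sketch that reports the state of those paths; run it as the same user that starts Xorg, since os.access answers for the calling user. The function name and path list are illustrative.

```python
import os

# Device nodes that the two Xorg logs above tried to open
DEVICE_NODES = ("/dev/tty0", "/dev/fb0", "/dev/dri/card0")


def device_report(paths=DEVICE_NODES):
    """Map each device path to 'missing', 'no access', or 'ok' for the current user."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK | os.W_OK):
            report[path] = "no access"
        else:
            report[path] = "ok"
    return report
```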

@ZacRavichandran
Member

@joeljosephjin np, glad to help 😃. Hopefully we're close to getting this working! I have two questions:

  1. Could you paste the contents of your /etc/X11/xorg.conf here, or perhaps in a gist? I can't reach the link you sent due to VPN issues.

  2. Were you able to complete up to step 3 on the linked instructions? (here for reference).

@lexavtanke, were you able to get through the linked instructions for setting up a headless server? If so, could you also provide the contents of your /etc/X11/xorg.conf? Thanks!

@lexavtanke

@ZacRavichandran Thank you for your fast reply.
Yes, I tried to get through your instructions, but they don't work for me, and now I think I know why.

Here is a solution, though not a straightforward one; they use OpenGL:
https://github.com/demotomohiro/remocolab

But the most interesting part is this:

import subprocess

# Without the "-seat seat-1" option, Xorg tries to open /dev/tty0, but it doesn't exist.
# You can create /dev/tty0 with "mknod /dev/tty0 c 4 0", but you will get a permission denied error.
subprocess.Popen(["Xorg", "-seat", "seat-1", "-allowMouseOpenFail", "-novtswitch", "-nolisten", "tcp"])

This option isn't set in your instructions.
Tomorrow I will try your instructions again with this option; I hope it will work too.
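The remocolab invocation can be wrapped so that the display number and the environment for later commands are explicit. A sketch under the assumptions that Xorg is on PATH, that it is run with sufficient privileges, and that the xorg.conf Device section carries a matching MatchSeat "seat-1" (as in the working config posted below). The display argument ":0" and the function names are additions for illustration, not part of remocolab.

```python
import os
import subprocess

# Flags copied from the remocolab snippet above
REMOCOLAB_FLAGS = ("-seat", "seat-1", "-allowMouseOpenFail", "-novtswitch", "-nolisten", "tcp")


def xorg_command(display=":0", extra_flags=REMOCOLAB_FLAGS):
    """Build an Xorg command line for `display` using the flags above."""
    return ["Xorg", display, *extra_flags]


def start_xorg(display=":0"):
    """Start Xorg in the background; return the process and an env with DISPLAY set."""
    proc = subprocess.Popen(xorg_command(display))
    env = dict(os.environ, DISPLAY=display)
    return proc, env
```

Passing the returned env to later subprocess calls (for example the simulator or eval.py) keeps them pointed at the same display.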

Here is the working /etc/X11/xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 418.67
Section "DRI"
	Mode 0666
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/mouse"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla P100-PCIE-16GB"
    BusID          "PCI:0:4:0"
    MatchSeat      "seat-1"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Virtual     1920 1200
        Depth       24
    EndSubSection
EndSection
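The BusID "PCI:0:4:0" in this config is the decimal form of the PCI address that nvidia-smi prints in hexadecimal (00000000:00:04.0 in the nvidia-smi output earlier in the thread; the Xorg logs show the same slot as (0:0:4:0)). A sketch that converts one to the other and renders a minimal Device section; the function names are hypothetical, and the PCI:bus@domain:device:function form for a non-zero domain follows the xorg.conf manual page.

```python
def xorg_bus_id(nvidia_smi_bus_id):
    """Convert an nvidia-smi bus id like '00000000:00:04.0' to Xorg's 'PCI:0:4:0'.

    nvidia-smi prints hexadecimal domain:bus:device.function; xorg.conf's BusID
    uses decimal numbers.
    """
    domain, bus, dev_fn = nvidia_smi_bus_id.strip().split(":")
    device, function = dev_fn.split(".")
    bus_n, dev_n, fn_n, dom_n = (int(x, 16) for x in (bus, device, function, domain))
    if dom_n:
        return f"PCI:{bus_n}@{dom_n}:{dev_n}:{fn_n}"
    return f"PCI:{bus_n}:{dev_n}:{fn_n}"


def device_section(bus_id, seat="seat-1", identifier="Device0"):
    """Render a minimal Device section like the one in the config above."""
    return "\n".join([
        'Section "Device"',
        f'    Identifier     "{identifier}"',
        '    Driver         "nvidia"',
        f'    BusID          "{bus_id}"',
        f'    MatchSeat      "{seat}"',
        "EndSection",
    ])
```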

@ZacRavichandran
Member

@lexavtanke That's very helpful, thanks for sharing the link! We tested those instructions against some internal servers and AWS, so it's possible that there are additional steps required on Google Cloud or Colab. Please let me know what you find!

@lexavtanke

@ZacRavichandran Sadly, the new config with your instructions doesn't work. Maybe it's because of the driver versions for the video card and the kernel; I think so because of this part of the code in remocolab:

import subprocess

def _setup_nvidia_gl():
  # Install TESLA DRIVER FOR LINUX X64.
  # Kernel module in this driver is already loaded and can be neither removed nor updated.
  # (nvidia, nvidia_uvm, nvidia_drm. See dmesg)
  # Version number of nvidia driver for Xorg must match version number of these kernel modules.
  # But existing nvidia driver for Xorg might not match.
  # So overwrite them with the nvidia driver that is the same version as the loaded kernel module.
  ret = subprocess.run(
      ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
      stdout=subprocess.PIPE,
      check=True,
      universal_newlines=True)
  # (the rest of this remocolab function is not shown here)
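remocolab's point is that the NVIDIA driver used by Xorg has to be the same version as the kernel module that is already loaded. A sketch that reads that loaded version two ways: the nvidia-smi query remocolab runs, and the "Kernel Module <version>" line of /proc/driver/nvidia/version (that file's location and format are assumptions to verify on your system). The function names are hypothetical.

```python
import re
import subprocess

PROC_VERSION_FILE = "/proc/driver/nvidia/version"  # assumed standard location


def kernel_module_version(proc_text):
    """Pull the version out of a 'Kernel Module  440.64.00 ...' line, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def read_kernel_module_version(path=PROC_VERSION_FILE):
    """Read the loaded kernel module version from /proc, or None if unavailable."""
    try:
        with open(path) as handle:
            return kernel_module_version(handle.read())
    except OSError:
        return None


def smi_driver_version():
    """Driver version reported by nvidia-smi (same query as the remocolab snippet)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    # nvidia-smi prints one line per GPU; the first is enough on a single-GPU machine
    return result.stdout.splitlines()[0].strip()
```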

@ZHMA1996

ZHMA1996 commented May 7, 2020

@ZacRavichandran I faced the same issue. The difference is that an X server process was running on my GPU.

python eval.py --agent-config baselines/config/random-agent.yaml --episode-config config/check-ground-truth.yaml gives this:

No protocol specified
Set current directory to /home/zhma/goseek-challenge
Found path: /home/zhma/goseek-challenge/simulator/goseek-v0.1.4.x86_64
Mono path[0] = '/home/zhma/goseek-challenge/simulator/goseek-v0.1.4_Data/Managed'
Mono config path = '/home/zhma/goseek-challenge/simulator/goseek-v0.1.4_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Logging to /home/zhma/.config/unity3d/Editor/Player.log
No protocol specified
Traceback (most recent call last):
  File "eval.py", line 85, in <module>
    results = main(episode_cfg, agent_args)
  File "eval.py", line 63, in main
    ground_truth_mode=episode_args.ENV.ground_truth_mode,
  File "/home/zhma/anaconda3/envs/goseek/lib/python3.7/site-packages/tesse_gym-0.1.3-py3.7.egg/tesse_gym/tasks/goseek/goseek_benchmark.py", line 70, in __init__
  File "/home/zhma/anaconda3/envs/goseek/lib/python3.7/site-packages/tesse_gym-0.1.3-py3.7.egg/tesse_gym/tasks/goseek/goseek.py", line 97, in __init__
  File "/home/zhma/anaconda3/envs/goseek/lib/python3.7/site-packages/tesse_gym-0.1.3-py3.7.egg/tesse_gym/core/tesse_gym.py", line 152, in __init__
  File "/home/zhma/anaconda3/envs/goseek/lib/python3.7/site-packages/tesse_gym-0.1.3-py3.7.egg/tesse_gym/core/tesse_gym.py", line 343, in _init_pose
AttributeError: 'NoneType' object has no attribute 'metadata'

@ZacRavichandran
Member

@lexavtanke ah yes, it looks like there's a bit of configuration required to get a proper virtual display running on colab. I'll see if I can track down a solution. In the meantime, please let me know if you find any useful resources!

@ZacRavichandran
Member

@ZHMA1996 I noticed that there is no Display value in your output. Normally, we would expect to see something like

...
Preloaded 'ScreenSelector.so'
Display 0 '0': 3840x2160 (primary device).
...

Sometimes this is because the DISPLAY environment variable has not been set. In the same terminal in which you run eval.py, could you try running

export DISPLAY=:0

Thanks!
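export DISPLAY=:0 only affects the shell it runs in and processes started from it; in the outputs earlier in this thread the simulator is launched by eval.py itself, so it appears to inherit DISPLAY from there. If you launch the evaluation from a Python script instead, a sketch that sets DISPLAY for the child process only; the command is the one from this thread and the helper names are hypothetical.

```python
import os
import subprocess

# Command from this thread; run it from the goseek-challenge checkout
EVAL_CMD = [
    "python", "eval.py",
    "--agent-config", "baselines/config/random-agent.yaml",
    "--episode-config", "config/check-ground-truth.yaml",
]


def env_with_display(display=":0", base=None):
    """Copy of the environment with DISPLAY set, as `export DISPLAY=:0` would do."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_eval(display=":0"):
    """Run eval.py with DISPLAY set only for that child process; return its exit code."""
    return subprocess.run(EVAL_CMD, env=env_with_display(display), check=False).returncode
```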

@ZacRavichandran
Member

Should that be export DISPLAY=:3 instead of export DISPLAY:=3 :) ?

And to confirm, are you running this remotely?

@ZHMA1996

ZHMA1996 commented May 7, 2020

Thanks for your reply. I tried the command you suggested, but it doesn't work.
Here is the output of nvidia-smi:
[Screenshot from 2020-05-07 21-40-34]

Neither export DISPLAY=:0 nor export DISPLAY:=3 helped.

Here is the output after running eval.py:

[Screenshot from 2020-05-07 21-44-39]

@ZHMA1996

ZHMA1996 commented May 7, 2020

Yes, I am running this remotely.

@ZacRavichandran
Member

It's odd that the simulator is still not finding the display. Could you try two things to dig into this further?

  1. Double-check the value of DISPLAY:
>>> echo $DISPLAY
  2. Launch the simulator directly and observe the output:
>>> cd ~/goseek-challenge/simulator
>>> ./goseek-v0.1.4.x86_64
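Because the simulator keeps running when it starts successfully, waiting for it to exit is not useful. A sketch that starts a command, collects its combined stdout and stderr for a fixed number of seconds, and then stops it; capture_startup_output is a hypothetical helper name.

```python
import subprocess


def capture_startup_output(cmd, timeout=10.0, cwd=None):
    """Run `cmd` for up to `timeout` seconds and return its combined stdout/stderr.

    A process still running at the deadline is killed, so this also works for
    programs such as the simulator that normally keep running.
    """
    proc = subprocess.Popen(
        cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever was written before the kill
        output, _ = proc.communicate()
    return output
```

For example: capture_startup_output(["./goseek-v0.1.4.x86_64"], timeout=10, cwd=os.path.expanduser("~/goseek-challenge/simulator")).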

@ZHMA1996

ZHMA1996 commented May 7, 2020

I tried what you suggested, and here is the output:

[Screenshot from 2020-05-07 22-39-08]

It seems nothing happened.

@ZacRavichandran
Member

Ok thanks, that's helpful.

To confirm the issue is with the display and not Unity, could you test with glxgears? The output should look like the following:

>>> glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
301 frames in 5.0 seconds = 60.150 FPS
300 frames in 5.0 seconds = 59.995 FPS

@ZHMA1996

ZHMA1996 commented May 7, 2020

@ZacRavichandran

The issue does seem to be with the display.

Here is the output after testing with glxgears:

Error: couldn't open display (null)
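glxgears prints "(null)" where the display name would go, which suggests DISPLAY was not set at all in that shell, rather than pointing at a display with no server behind it. A sketch that distinguishes those cases; it assumes the usual convention that a local X server on display :N listens on the Unix socket /tmp/.X11-unix/XN, and the function names are hypothetical.

```python
import os
import re


def local_x_socket(display):
    """Return the Unix socket path for a local DISPLAY like ':0' or ':3.0', else None."""
    match = re.fullmatch(r"(?:unix)?:(\d+)(?:\.\d+)?", display or "")
    if match is None:
        return None  # unset, malformed, or a remote host:display value
    return f"/tmp/.X11-unix/X{match.group(1)}"


def diagnose_display(environ=None):
    """Explain what the current DISPLAY value points at."""
    env = os.environ if environ is None else environ
    display = env.get("DISPLAY")
    if not display:
        return "DISPLAY is not set"
    socket_path = local_x_socket(display)
    if socket_path is None:
        return f"DISPLAY={display!r} is not a local :N display"
    if not os.path.exists(socket_path):
        return f"DISPLAY={display}, but {socket_path} does not exist (no X server on it?)"
    return f"DISPLAY={display}, socket {socket_path} present"
```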

@joeljosephjin
Author

So far, GCP, AWS, RemoColab, and Oracle Cloud instances show this connection error, no matter what.
But a Genesis Cloud instance works.

@ZacRavichandran
Member

Which AWS instance type are you using? We're using G4 instances with Nvidia driver version 440.64 and CUDA 10.2.

On our side, everything works as expected after setting up a virtual display. Could we walk through the steps you used to configure your AWS instance? Hopefully that'll solve the issues you're seeing.

4 participants