Perception module won't start with RTX 2080 Ti #12088

Open
dfremont opened this issue Aug 11, 2020 · 45 comments
Labels: Module: CarOS (ROS related issues), Module: Perception (perception related issues)

Comments

@dfremont

Per @lemketron and @storypku's discussion here, I'm opening a new issue about the Perception module not working on my machine, which has an RTX 2080 Ti. Please let me know if there's any additional information I can provide to help debug the problem.

Describe the bug
The Perception module will not start. When activated in Dreamview, the slider moves immediately back to "off". No perception.INFO file is produced at all.

To Reproduce
Here are the steps I took:

  1. Start and enter Docker container.
  2. ./scripts/bootstrap_lgsvl.sh
  3. ./scripts/bridge.sh
  4. Start LGSVL Simulator (outside the container) and begin simulation (I used Borregas Ave.).
  5. In Dreamview, select mode, map, and vehicle.
  6. In Dreamview, turn on Localization and Transform modules (map then appeared correctly in Dreamview).
  7. Turn on Perception module (module did not start, as described above).

System setup:

  • OS: Ubuntu 18.04.5 LTS
  • GPU: NVIDIA RTX 2080 Ti
  • Apollo version: built from master branch, commit dbc9f1b
@jinghaomiao
Contributor

@dfremont Perception had a core dump, which has been fixed. Please pull the latest update and let us know how it works. Thanks for using Apollo.

@jinghaomiao added the Module: CarOS (ROS related issues) and Module: Perception (perception related issues) labels Aug 11, 2020
@lfcarol
Contributor

lfcarol commented Aug 11, 2020

The RTX 2080 Ti is supported by the Docker image, so I suspect something else is preventing Perception from working normally. Please wait while I check it tomorrow.

@rongguodong
Contributor

I pulled the latest master and still cannot start the "perception" module. When I tried to start it manually from the command line, I got the following errors:

[guodong@in-dev-docker:/apollo]$ cyber_launch start modules/perception/production/launch/perception.launch
[cyber_launch_11585] INFO Launch file [/apollo/modules/perception/production/launch/perception.launch]
[cyber_launch_11585] INFO ========================================================================================================================
[cyber_launch_11585] INFO Load module [perception] library: [lidar_perception] [CYBER_DEFAULT] conf: [/apollo/modules/perception/production/dag/dag_streaming_perception.dag] exception_handler: []
[cyber_launch_11585] INFO Start process [lidar_perception] successfully. pid: 11590
[cyber_launch_11585] INFO ------------------------------------------------------------------------------------------------------------------------
[cyber_launch_11585] INFO Load module [perception_camera] library: [camera_perception] [CYBER_DEFAULT] conf: [/apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag] exception_handler: []
[cyber_launch_11585] INFO Start process [camera_perception] successfully. pid: 11592
[cyber_launch_11585] INFO ------------------------------------------------------------------------------------------------------------------------
[cyber_launch_11585] INFO Load module [motion_service] library: [motion_service] [CYBER_DEFAULT] conf: [/apollo/modules/perception/production/dag/dag_motion_service.dag] exception_handler: []
[cyber_launch_11585] INFO Start process [motion_service] successfully. pid: 11594
[cyber_launch_11585] INFO ------------------------------------------------------------------------------------------------------------------------
[lidar_perception] WARNING: Logging before InitGoogleLogging() is written to STDERR
[lidar_perception] I0811 10:42:30.886588 11590 module_argument.cc:81] []command: mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception.dag -p lidar_perception -s CYBER_DEFAULT
[lidar_perception] I0811 10:42:30.886924 11590 global_data.cc:153] []host ip: 192.168.0.185
[lidar_perception] I0811 10:42:30.887106 11590 module_argument.cc:57] []binary_name_ is mainboard, process_group_ is lidar_perception, has 1 dag conf
[lidar_perception] I0811 10:42:30.887110 11590 module_argument.cc:60] []dag_conf: /apollo/modules/perception/production/dag/dag_streaming_perception.dag
[motion_service] WARNING: Logging before InitGoogleLogging() is written to STDERR
[motion_service] I0811 10:42:30.887111 11594 module_argument.cc:81] []command: mainboard -d /apollo/modules/perception/production/dag/dag_motion_service.dag -p motion_service -s CYBER_DEFAULT
[motion_service] I0811 10:42:30.887447 11594 global_data.cc:153] []host ip: 192.168.0.185
[motion_service] I0811 10:42:30.887624 11594 module_argument.cc:57] []binary_name_ is mainboard, process_group_ is motion_service, has 1 dag conf
[motion_service] I0811 10:42:30.887630 11594 module_argument.cc:60] []dag_conf: /apollo/modules/perception/production/dag/dag_motion_service.dag
[camera_perception] WARNING: Logging before InitGoogleLogging() is written to STDERR
[camera_perception] I0811 10:42:30.887990 11592 module_argument.cc:81] []command: mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag -p camera_perception -s CYBER_DEFAULT
[camera_perception] I0811 10:42:30.888280 11592 global_data.cc:153] []host ip: 192.168.0.185
[camera_perception] I0811 10:42:30.888448 11592 module_argument.cc:57] []binary_name_ is mainboard, process_group_ is camera_perception, has 1 dag conf
[camera_perception] I0811 10:42:30.888456 11592 module_argument.cc:60] []dag_conf: /apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag
[camera_perception] E0811 10:42:30.903358 11592 module_controller.cc:87] [mainboard]Path does not exist: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so
[camera_perception] E0811 10:42:30.903404 11592 module_controller.cc:67] [mainboard]Failed to load module: /apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag
[camera_perception] E0811 10:42:30.903412 11592 mainboard.cc:39] [mainboard]module start error.
[lidar_perception] E0811 10:42:30.920011 11590 class_loader_utility.cc:220] [mainboard]poco LibraryLoadException: libc10.so: cannot open shared object file: No such file or directory
[lidar_perception] E0811 10:42:30.920049 11590 class_loader_utility.cc:236] [mainboard]poco shared library failed: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_lidar.so
[lidar_perception] E0811 10:42:30.920063 11590 class_loader_manager.h:70] [mainboard]Invalid class name: DetectionComponent
[lidar_perception] E0811 10:42:30.920076 11590 module_controller.cc:67] [mainboard]Failed to load module: /apollo/modules/perception/production/dag/dag_streaming_perception.dag
[lidar_perception] E0811 10:42:30.920083 11590 class_loader_utility.cc:258] [mainboard]Attempt to UnloadLibrary lib, but can't find lib: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_lidar.so
[lidar_perception] E0811 10:42:30.920089 11590 mainboard.cc:39] [mainboard]module start error.
[motion_service] E0811 10:42:30.980911 11594 module_controller.cc:87] [mainboard]Path does not exist: /apollo/bazel-bin/modules/perception/camera/lib/motion_service/libmotion_service.so
[motion_service] E0811 10:42:30.980934 11594 module_controller.cc:67] [mainboard]Failed to load module: /apollo/modules/perception/production/dag/dag_motion_service.dag
[motion_service] E0811 10:42:30.980940 11594 mainboard.cc:39] [mainboard]module start error.
[camera_perception]
[lidar_perception]
[motion_service]
[cyber_launch_11585] ERROR Process [lidar_perception] has finished. [pid 11590, cmd mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception.dag -p lidar_perception -s CYBER_DEFAULT].
[cyber_launch_11585] ERROR Process [camera_perception] has finished. [pid 11592, cmd mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag -p camera_perception -s CYBER_DEFAULT].
[cyber_launch_11585] ERROR Process [motion_service] has finished. [pid 11594, cmd mainboard -d /apollo/modules/perception/production/dag/dag_motion_service.dag -p motion_service -s CYBER_DEFAULT].
[cyber_launch_11585] INFO All processes has died.
[cyber_launch_11585] INFO Cyber exit.
[cyber_launch_11585] INFO All processes have been stopped.
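
For triage, the glog-style error lines can be separated from the launcher chatter. The following is only a minimal sketch: it assumes each module line has the shape `[process] E<MMDD> <time> <pid> <file>:<line>] message`, as in the output above, and the names `ERROR_LINE` and `group_errors` are illustrative, not part of Apollo or Cyber RT.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [lidar_perception] E0811 10:42:30.920011 11590 class_loader_utility.cc:220] [mainboard]poco ...
# Launcher lines ("[cyber_launch_...] ERROR ...") and INFO lines ("I0811 ...") do not match.
ERROR_LINE = re.compile(
    r"^\[(?P<process>[^\]]+)\]\s+E\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ "
    r"(?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)


def group_errors(log_text):
    """Return {process: [(source, message), ...]} for glog ERROR lines only."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            errors[match["process"]].append((match["source"], match["message"]))
    return dict(errors)


# A few lines copied from the output above.
sample = """\
[cyber_launch_11585] INFO Start process [motion_service] successfully. pid: 11594
[camera_perception] I0811 10:42:30.888280 11592 global_data.cc:153] []host ip: 192.168.0.185
[camera_perception] E0811 10:42:30.903358 11592 module_controller.cc:87] [mainboard]Path does not exist: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_camera.so
[lidar_perception] E0811 10:42:30.920011 11590 class_loader_utility.cc:220] [mainboard]poco LibraryLoadException: libc10.so: cannot open shared object file: No such file or directory
"""

errors = group_errors(sample)
for process, entries in errors.items():
    for source, message in entries:
        print(f"{process}: {source}: {message}")
```

Grouped this way, the run above shows two different kinds of failure: shared libraries missing from bazel-bin (camera_perception, motion_service) and a library that exists but cannot load because libc10.so is not found (lidar_perception).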

@rongguodong
Contributor

BTW: The latest master seems to be missing an environment setting. I have to run "source cyber/setup.bash" before I can run any cyber commands (e.g. cyber_monitor, cyber_launch, etc.). Is this a new bug?

@jinghaomiao
Contributor

Camera-based perception is not working at the moment because of a system and dependency upgrade. Please either use the toggle in Dreamview to turn on perception, or use "apollo/modules/perception/production/launch/perception_lidar.launch" to launch lidar-based perception.

@jinghaomiao
Contributor

BTW: The latest master seems to be missing an environment setting. I have to run "source cyber/setup.bash" before I can run any cyber commands (e.g. cyber_monitor, cyber_launch, etc.). Is this a new bug?

@storypku @changsh726 please take a look

@rongguodong
Contributor

Camera-based perception is not working at the moment because of a system and dependency upgrade. Please either use the toggle in Dreamview to turn on perception, or use "apollo/modules/perception/production/launch/perception_lidar.launch" to launch lidar-based perception.

Does it mean the "traffic light" detection is also not working?

This has been the case for a long time. Do you have a target date for fixing it?

@jinghaomiao
Contributor

Camera-based perception is not working at the moment because of a system and dependency upgrade. Please either use the toggle in Dreamview to turn on perception, or use "apollo/modules/perception/production/launch/perception_lidar.launch" to launch lidar-based perception.

Does it mean the "traffic light" detection is also not working?

This has been the case for a long time. Do you have a target date for fixing it?

Traffic light is not working either. The team is working on an upgrade. Thanks!

@rongguodong
Contributor

I just tried lidar-based perception and got the following errors:

[guodong@in-dev-docker:/apollo]$ cyber_launch start modules/perception/production/launch/perception_lidar.launch
[cyber_launch_11104] INFO Launch file [/apollo/modules/perception/production/launch/perception_lidar.launch]
[cyber_launch_11104] INFO ========================================================================================================================
[cyber_launch_11104] INFO Load module [perception] library: [lidar_perception] [CYBER_DEFAULT] conf: [/apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag] exception_handler: []
[cyber_launch_11104] INFO Start process [lidar_perception] successfully. pid: 11109
[cyber_launch_11104] INFO ------------------------------------------------------------------------------------------------------------------------
[lidar_perception] WARNING: Logging before InitGoogleLogging() is written to STDERR
[lidar_perception] I0811 11:48:26.392305 11109 module_argument.cc:81] []command: mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag -p lidar_perception -s CYBER_DEFAULT
[lidar_perception] I0811 11:48:26.392665 11109 global_data.cc:153] []host ip: 192.168.0.185
[lidar_perception] I0811 11:48:26.392858 11109 module_argument.cc:57] []binary_name_ is mainboard, process_group_ is lidar_perception, has 1 dag conf
[lidar_perception] I0811 11:48:26.392864 11109 module_argument.cc:60] []dag_conf: /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag
[lidar_perception] E0811 11:48:26.596405 11109 class_loader_utility.cc:220] [mainboard]poco LibraryLoadException: libc10.so: cannot open shared object file: No such file or directory
[lidar_perception] E0811 11:48:26.596462 11109 class_loader_utility.cc:236] [mainboard]poco shared library failed: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_lidar.so
[lidar_perception] E0811 11:48:26.596477 11109 class_loader_manager.h:70] [mainboard]Invalid class name: DetectionComponent
[lidar_perception] E0811 11:48:26.596493 11109 module_controller.cc:67] [mainboard]Failed to load module: /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag
[lidar_perception] E0811 11:48:26.596505 11109 class_loader_utility.cc:258] [mainboard]Attempt to UnloadLibrary lib, but can't find lib: /apollo/bazel-bin/modules/perception/onboard/component/libperception_component_lidar.so
[lidar_perception] E0811 11:48:26.596513 11109 mainboard.cc:39] [mainboard]module start error.
[lidar_perception]
[cyber_launch_11104] ERROR Process [lidar_perception] has finished. [pid 11109, cmd mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag -p lidar_perception -s CYBER_DEFAULT].
[cyber_launch_11104] INFO All processes has died.
[cyber_launch_11104] INFO Cyber exit.
[cyber_launch_11104] INFO All processes have been stopped.

@jinghaomiao
Contributor

jinghaomiao commented Aug 11, 2020

Thanks for reporting the error. The team will take a look. @jeroldchen @lfcarol

@storypku
Contributor

storypku commented Aug 12, 2020

BTW: The latest master seems to be missing an environment setting. I have to run "source cyber/setup.bash" before I can run any cyber commands (e.g. cyber_monitor, cyber_launch, etc.). Is this a new bug?

Yep. Try running source scripts/apollo_base.sh before running cyber commands as a workaround.

And it was fixed in #12106 @rongguodong

@lfcarol
Contributor

lfcarol commented Aug 12, 2020

I just tried lidar-based perception and got the following errors:

[guodong@in-dev-docker:/apollo]$ cyber_launch start modules/perception/production/launch/perception_lidar.launch
[lidar_perception]
[cyber_launch_11104] ERROR Process [lidar_perception] has finished. [pid 11109, cmd mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag -p lidar_perception -s CYBER_DEFAULT].
[cyber_launch_11104] INFO All processes has died.
[cyber_launch_11104] INFO Cyber exit.
[cyber_launch_11104] INFO All processes have been stopped.

Please source /apollo/scripts/apollo_base.sh

@lfcarol
Contributor

lfcarol commented Aug 12, 2020

Hello, which "setup mode" did you choose? Is it "Mkz lgsvl"?

@dfremont
Author

dfremont commented Aug 12, 2020

Hello, which "setup mode" did you choose? Is it "Mkz lgsvl"?

Yes (sorry for not making that clear). Also, I just tried rebuilding Apollo to see if the commits since Monday fixed the problem, but now it won't compile:

(16:03:34) ERROR: /apollo/modules/audio/inference/BUILD:6:11: C++ compilation of rule '//modules/audio/inference:moving_detection' failed (Exit 1)
modules/audio/inference/moving_detection.cc:19:10: fatal error: fftw3.h: No such file or directory
 #include <fftw3.h>

@storypku
Contributor

storypku commented Aug 12, 2020

Hello, which "setup mode" did you choose? Is it "Mkz lgsvl"?

Yes (sorry for not making that clear). Also, I just tried rebuilding Apollo to see if the commits since Monday fixed the problem, but now it won't compile:

(16:03:34) ERROR: /apollo/modules/audio/inference/BUILD:6:11: C++ compilation of rule '//modules/audio/inference:moving_detection' failed (Exit 1)
modules/audio/inference/moving_detection.cc:19:10: fatal error: fftw3.h: No such file or directory
 #include <fftw3.h>

Try pulling the latest code with git pull, then rerun dev_start.sh and dev_into.sh.

Or install fftw3 manually via sudo apt-get -y update && sudo apt-get -y install libfftw3-dev.

BTW, could you please show me the output of ./apollo.sh config?

@dfremont
Author

Try pulling the latest code with git pull, then rerun dev_start.sh and dev_into.sh.

Thanks, I had done git pull but didn't realize I had to restart the container also. It now compiles, but has the same problem as before (Perception won't start).

BTW, could you please show me the output of ./apollo.sh config?

[INFO] Apollo Environment Settings:
[INFO]     APOLLO_ROOT_DIR: /apollo
[INFO]     APOLLO_CACHE_DIR: /apollo/.cache
[INFO]     APOLLO_IN_DOCKER: true
[INFO]     APOLLO_VERSION: master-2020-08-12-935b1a2937
[INFO]     DOCKER_IMG: dev-x86_64-18.04-20200811_2001
[INFO]     APOLLO_ENV:  STAGE=dev USE_ESD_CAN=false USE_GPU=1

@lfcarol
Contributor

lfcarol commented Aug 13, 2020

Hello, which "setup mode" did you choose? Is it "Mkz lgsvl"?

Yes (sorry for not making that clear). Also, I just tried rebuilding Apollo to see if the commits since Monday fixed the problem, but now it won't compile:

(16:03:34) ERROR: /apollo/modules/audio/inference/BUILD:6:11: C++ compilation of rule '//modules/audio/inference:moving_detection' failed (Exit 1)
modules/audio/inference/moving_detection.cc:19:10: fatal error: fftw3.h: No such file or directory
 #include <fftw3.h>

Please pull the latest master branch and try again. Perception should work now.
Please let me know how it goes.

@storypku
Contributor

storypku commented Aug 13, 2020

Try pulling the latest code with git pull, then rerun dev_start.sh and dev_into.sh.

Thanks, I had done git pull but didn't realize I had to restart the container also. It now compiles, but has the same problem as before (Perception won't start).

BTW, could you please show me the output of ./apollo.sh config?

[INFO] Apollo Environment Settings:
[INFO]     APOLLO_ROOT_DIR: /apollo
[INFO]     APOLLO_CACHE_DIR: /apollo/.cache
[INFO]     APOLLO_IN_DOCKER: true
[INFO]     APOLLO_VERSION: master-2020-08-12-935b1a2937
[INFO]     DOCKER_IMG: dev-x86_64-18.04-20200811_2001
[INFO]     APOLLO_ENV:  STAGE=dev USE_ESD_CAN=false USE_GPU=1

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.
The team is still working to get the other parts ready.
We will let everyone know as soon as that is done.

@jinghaomiao
Contributor

@dfremont @rongguodong This ticket is being closed. Please feel free to open new issues if needed. Thanks for using Apollo.

@dfremont
Author

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.

Not working for me, unfortunately. I now get many errors like this:

[lidar_perception]  E0813 13:44:20.566957  7464 detection_component.cc:132] [mainboard]Failed to get pose at time: 1.59735e+09
[lidar_perception]  E0813 13:44:20.688186  7461 transform_wrapper.cc:222] [mainboard]Can not find transform. 1.59735e+09 frame_id: world child_frame_id: novatel Error info: Lookup would require extrapolation into the future.  Requested time 1597351453275430912 but the latest data is at time 1597351453195430912, when looking up transform from frame [novatel] to frame [world]

One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on.
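
The two nanosecond timestamps in the second message can be compared directly. A small sketch (variable names are illustrative) showing how far the requested time is ahead of the newest TF data:

```python
# Nanosecond timestamps copied from the transform_wrapper.cc error above.
requested_ns = 1597351453275430912  # time the lookup asked for
latest_tf_ns = 1597351453195430912  # newest transform reported as available

# Python integers are exact, so the subtraction loses no precision.
gap_ms = (requested_ns - latest_tf_ns) / 1e6
print(f"lookup is {gap_ms:.0f} ms ahead of the newest TF data")
```

For this particular log line the gap works out to 80 ms.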

@jinghaomiao
Contributor

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.

Not working for me, unfortunately. I now get many errors like this:

[lidar_perception]  E0813 13:44:20.566957  7464 detection_component.cc:132] [mainboard]Failed to get pose at time: 1.59735e+09
[lidar_perception]  E0813 13:44:20.688186  7461 transform_wrapper.cc:222] [mainboard]Can not find transform. 1.59735e+09 frame_id: world child_frame_id: novatel Error info: Lookup would require extrapolation into the future.  Requested time 1597351453275430912 but the latest data is at time 1597351453195430912, when looking up transform from frame [novatel] to frame [world]

One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on.

@storypku @lfcarol Please check if it's TF related

@rongguodong
Contributor

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.

Not working for me, unfortunately. I now get many errors like this:

[lidar_perception]  E0813 13:44:20.566957  7464 detection_component.cc:132] [mainboard]Failed to get pose at time: 1.59735e+09
[lidar_perception]  E0813 13:44:20.688186  7461 transform_wrapper.cc:222] [mainboard]Can not find transform. 1.59735e+09 frame_id: world child_frame_id: novatel Error info: Lookup would require extrapolation into the future.  Requested time 1597351453275430912 but the latest data is at time 1597351453195430912, when looking up transform from frame [novatel] to frame [world]

One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on.

I got the same errors when running Lidar-perception. I found out one of the following two changes can make it work:

  1. Increase the TF buffer size (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/perception_common.flag#L69) from 0.01 to 0.1; or
  2. Change the query_time (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/onboard/transform_wrapper/transform_wrapper.cc#L217) to 0.

However, I still want to understand why Lidar timestamp can be larger than TF timestamp without the above changes, and why the above changes can make it work. Any ideas would be helpful!
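
One way to picture the first change, offered as an assumption rather than a confirmed explanation: a later comment in this thread says the flag is passed as "timeout_second" to modules/transform/buffer.cc, so the lookup may simply wait that long for newer TF data. The toy simulation below uses a simulated clock and no Apollo APIs. The 80 ms delay matches the gap between the two timestamps in the error log above, while the 10 ms TF spacing, the 1 ms polling step, and the function name are hypothetical.

```python
def lookup_with_timeout(query_ms, tf_arrivals, timeout_ms, step_ms=1):
    """Poll a simulated TF buffer until data covering query_ms has arrived.

    tf_arrivals: list of (arrival_ms, stamp_ms) pairs.
    Returns True if a transform stamped at or after query_ms becomes
    visible within timeout_ms of simulated waiting, otherwise False.
    """
    waited_ms = 0
    while True:
        now_ms = query_ms + waited_ms  # assume the query is issued at query_ms
        visible = [stamp for arrival, stamp in tf_arrivals if arrival <= now_ms]
        if visible and max(visible) >= query_ms:
            return True
        if waited_ms >= timeout_ms:
            return False
        waited_ms += step_ms


# TF stamped every 10 ms; hypothetically each sample becomes visible 80 ms late.
tf_arrivals = [(stamp + 80, stamp) for stamp in range(0, 400, 10)]
query_ms = 200

short_wait = lookup_with_timeout(query_ms, tf_arrivals, timeout_ms=10)   # like 0.01 s
long_wait = lookup_with_timeout(query_ms, tf_arrivals, timeout_ms=100)   # like 0.1 s
print(short_wait, long_wait)
```

Under these assumptions the 10 ms wait gives up before the matching transform shows up, while the 100 ms wait outlasts the delay. Whether the real system behaves this way depends on how and when TF data is actually published.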

@changsh726
Contributor

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.

Not working for me, unfortunately. I now get many errors like this:

[lidar_perception]  E0813 13:44:20.566957  7464 detection_component.cc:132] [mainboard]Failed to get pose at time: 1.59735e+09
[lidar_perception]  E0813 13:44:20.688186  7461 transform_wrapper.cc:222] [mainboard]Can not find transform. 1.59735e+09 frame_id: world child_frame_id: novatel Error info: Lookup would require extrapolation into the future.  Requested time 1597351453275430912 but the latest data is at time 1597351453195430912, when looking up transform from frame [novatel] to frame [world]

One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on.

The slider will turn on after all related components have been started. This may take a while for complex modules like perception, which contain multiple components.

@lfcarol
Contributor

lfcarol commented Aug 14, 2020

The lidar part should work now, cyber_launch start modules/perception/production/launch/perception_lidar.launch should run smoothly.

Not working for me, unfortunately. I now get many errors like this:

[lidar_perception]  E0813 13:44:20.566957  7464 detection_component.cc:132] [mainboard]Failed to get pose at time: 1.59735e+09
[lidar_perception]  E0813 13:44:20.688186  7461 transform_wrapper.cc:222] [mainboard]Can not find transform. 1.59735e+09 frame_id: world child_frame_id: novatel Error info: Lookup would require extrapolation into the future.  Requested time 1597351453275430912 but the latest data is at time 1597351453195430912, when looking up transform from frame [novatel] to frame [world]

One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on.

Please try again with the startup order transform -> perception -> ... . Starting the modules in the wrong order can cause this error.

@storypku reopened this Aug 14, 2020
@rongguodong
Contributor

@storypku It is simple to reproduce this locally: basically, just follow our instructions here until the step saying "Open the Module Controller tab". Then, turn on the "localization" and "transform" modules.
Next, use "cyber_launch start modules/perception/production/launch/perception_lidar.launch" to start the lidar-based perception module.

Now, you should see lots of timestamp errors in the console (as @dfremont posted above).

Note that this may be an old issue. We do not have this error in our fork of Apollo 5.0 because we applied the second change I posted above (i.e. changed the query_time from "timestamp" to "0"). But this change seems hacky. I would like to understand the reason behind it and find a better solution.

@storypku
Contributor

@storypku It is simple to reproduce this locally: basically, just follow our instructions here until the step saying "Open the Module Controller tab". Then, turn on the "localization" and "transform" modules.
Next, use "cyber_launch start modules/perception/production/launch/perception_lidar.launch" to start the lidar-based perception module.

Now, you should see lots of timestamp errors in the console (as @dfremont posted above).

Note that this may be an old issue. We do not have this error in our fork of Apollo 5.0 because we applied the second change I posted above (i.e. changed the query_time from "timestamp" to "0"). But this change seems hacky. I would like to understand the reason behind it and find a better solution.

Thanks, Guodong, for the updates. We will check it ASAP.

@jinghaomiao
Contributor

Thanks @dfremont @rongguodong . The team is looking into this issue: @storypku @jeroldchen @lfcarol

@lemketron
Contributor

I got the same errors when running Lidar-perception. I found out one of the following two changes can make it work:

  1. Increase the TF buffer size (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/perception_common.flag#L69) from 0.01 to 0.1

This (mostly) worked for me. It got perception to stop crashing immediately on launch. However, I noticed that LiDAR perception is taking 4GB of GPU memory and we don't even have camera perception (traffic light) going...

I hope camera perception will be working again soon, and hope it still uses less GPU memory than the LiDAR one (in 5.0, it uses 1165MB, and traffic light adds another 1088MB for a total of 2253MB, just over half what LiDAR perception alone consumes in master now).

In any case I'm happy to report that I'm actually seeing Apollo master able to drive (on small maps, without traffic light perception) in LGSVL Simulator on a Razer Blade i7 laptop with an 8GB RTX 2070 MaxQ. :-)

@jeroldchen
Contributor

@storypku It is simple to reproduce this locally: basically, just follow our instructions here until the step saying "Open the Module Controller tab". Then, turn on the "localization" and "transform" modules.
Next, use "cyber_launch start modules/perception/production/launch/perception_lidar.launch" to start the lidar-based perception module.

Now, you should see lots of timestamp errors in the console (as @dfremont posted above).

Note that this may be an old issue. We do not have this error in our fork of Apollo 5.0 because we applied the second change I posted above (i.e. changed the query_time from "timestamp" to "0"). But this change seems hacky. I would like to understand the reason behind it and find a better solution.

@rongguodong @lemketron Changing query_time to 0 makes the lookup call the getLatestCommonTime function from the third-party tf2 library to get the latest common time of the target frame and the source frame:

    if (time == 0) {
      int retval = getLatestCommonTime(target_id, source_id, time, error_string);
      if (retval != tf2_msgs::TF2Error::NO_ERROR) {
        return retval;
      }
    }

As a result, the transform from novatel to world can be found through a common timestamp. However, it is not a stable strategy. The key to making lidar perception work in LGSVL is the lidar_query_tf_offset param. In #12163, I added a config for velodyne 128 detection in LGSVL where I set lidar_query_tf_offset to 200, which means that LGSVL's lidar 128 has a slight time delay. This setting is the same as in https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/lidar/velodyne128_segmentation_conf_lgsvl.pb.txt. It should make lidar perception work again.
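
The two strategies described in this comment can be contrasted on a toy buffer. This is only a sketch: `ToyTfBuffer`, `lookup`, and `try_lookup` are hypothetical stand-ins rather than the tf2 or Apollo API, and it assumes the offset is given in milliseconds and subtracted from the lidar timestamp before the lookup.

```python
class ToyTfBuffer:
    """Holds transform stamps (ms) for one frame pair; hypothetical stand-in."""

    def __init__(self, stamps_ms):
        self.stamps_ms = sorted(stamps_ms)

    def lookup(self, time_ms):
        """Return the stamp that would be used, or raise on extrapolation."""
        if time_ms == 0:
            # Mirrors the `time == 0` branch: fall back to the latest data.
            return self.stamps_ms[-1]
        if time_ms > self.stamps_ms[-1]:
            raise LookupError("would require extrapolation into the future")
        if time_ms < self.stamps_ms[0]:
            raise LookupError("would require extrapolation into the past")
        return time_ms  # a real buffer would interpolate between neighbours


tf_buffer = ToyTfBuffer(range(1000, 1200, 10))  # newest stamp: 1190 ms
lidar_stamp_ms = 1270                           # 80 ms ahead of the newest TF


def try_lookup(time_ms):
    """Return the stamp used for the lookup, or None if it fails."""
    try:
        return tf_buffer.lookup(time_ms)
    except LookupError:
        return None


direct = try_lookup(lidar_stamp_ms)         # raw lidar time
latest = try_lookup(0)                      # query_time forced to 0
shifted = try_lookup(lidar_stamp_ms - 200)  # offset of 200 ms applied
print(direct, latest, shifted)
```

In the sketch, the raw lidar time fails, time 0 pairs the point cloud with whatever transform happens to be newest, and the offset query lands on a specific, slightly older time that the buffer already covers. That difference may be part of why the time-0 approach is called unstable above.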

@rongguodong
Contributor

@storypku @jeroldchen Thanks for your replies!
For the "time == 0" if-branch, can you explain in more detail what it does and how it differs from the other branch (i.e. time != 0)?

I tried adding the 200 ms delay as in the PR, and I no longer get the timestamp errors. Previously, I thought this 200 ms delay was needed because of the time difference between the simulator and Apollo, and that it should not be needed once Apollo uses sim_time. However, even if I set "MODE_MOCK" in the Cyber config, I still get the timestamp errors without this 200 ms delay. Do you have a better explanation of why we still need it now?

Even so, I still have several issues:

  1. Since image-based perception is not available, there is no traffic signal. So the ego can only drive to an intersection and wait there for signals. I have to manually drive it across the intersection, and then it can continue driving automatically. Is this the expected behavior of current master?
  2. Although I do not see errors for lidar-based perception now, and cyber_monitor shows it at around 10Hz, the detected obstacles in Dreamview are very unstable. It seems it can only detect NPCs very close to the ego. NPCs farther away are very hard to detect. For example, at the initial position of the Borregas map, it cannot detect any NPCs in the intersection. Only when they drive toward the ego car can they be detected. The detection range seems to be only 10-20 meters.
  3. When I enabled point cloud display in Dreamview, sometimes I do not see the point cloud at all. Sometimes I can see the point cloud, but it is extremely delayed (or does not update at all). For example, there are only trees to the right of the ego at the initial position of Borregas. When I drive the ego into the middle of the intersection, the point cloud of the trees is still to the right of the ego, which is wrong. Do you see this issue on your end? I am not sure whether it is just a visualization bug. If not, it may be related to point 2 above.
    (I will try to capture a video showing points 2 & 3 soon.)

@rongguodong
Contributor

rongguodong commented Aug 18, 2020

https://drive.google.com/file/d/1cHc9OPJwGTvSkE7PtRyo8Eik-iyKB6H_/view?usp=sharing
Here is a video showing issues 2 and 3. The first half of the video shows correct point clouds: when the ego vehicle moves, the point cloud of the trees stays static (which is correct). However, NPCs are detected only when they are very close to the ego vehicle.
The second half of the video (from around 1 minute to the end) shows the "frozen" point cloud: two buses on the left are stopped, and the whole point cloud moves together with the ego vehicle, which is wrong.

@rongguodong
Contributor

rongguodong commented Aug 20, 2020

https://drive.google.com/file/d/1FLhbm3shwVws_W0D98lb0gxVf7Y-TtI-/view?usp=sharing
I tested our fork of Apollo 5.0, and here is the video. As you can see, perception stably recognizes NPCs that are very far away, and when the ego vehicle moves, the point cloud does not move with it.

Our fork of Apollo 5.0 can be found at: https://github.com/lgsvl/apollo-5.0. It includes the change of query_time to 0 (i.e. change #2 that I mentioned above).

For this video, I disabled camera-based perception (by deleting "dag_streaming_perception_camera.dag" from https://github.com/lgsvl/apollo-5.0/blob/simulator/modules/perception/production/launch/perception.launch). So I think this means only Lidar-based perception is used (right?). Even so, it behaves much better than the Lidar-based perception in master (as shown in the video I posted earlier).

@rongguodong
Contributor

So, there are two questions here:

  1. Why does the old (5.0) Lidar-based perception work much better than the one in the latest master?
  2. Why do we need to change query_time to 0?

@rongguodong
Contributor

After more debugging, I am confused about the TF buffer size (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/perception_common.flag#L69):

The default value is 0.01. This value is used as "timeout_second" here (https://github.com/ApolloAuto/apollo/blob/master/modules/transform/buffer.cc#L194). As the linked code shows, it waits in a while-loop for up to "timeout_second" to see whether a TF matching the Lidar timestamp can be found.

However, cyber_monitor shows that the /tf channel's update rate is only around 12 Hz, so it is updated roughly every 0.08 s. Waiting 0.01 s is therefore useless most of the time (since /tf will NOT be updated within that window).

So the question is: what is the point of setting the default TF buffer size to 0.01? Would it be more meaningful to set it to 0.1 (my option #1 above)?
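
To put rough numbers on this, here is a small sketch. It assumes /tf messages arrive at a perfectly steady rate and that a query starts at a uniformly random point between two updates; both are simplifications, and the function names are hypothetical rather than Apollo APIs.

```cpp
#include <algorithm>
#include <cassert>

// Period between updates, in seconds, for a channel publishing at rate_hz.
double UpdatePeriodSec(double rate_hz) {
  assert(rate_hz > 0.0);
  return 1.0 / rate_hz;
}

// Fraction of queries for which a fresh update arrives before the timeout
// expires, assuming perfectly periodic updates and a query that starts at a
// uniformly random phase within one period (both are assumptions).
double ChanceUpdateArrivesInTime(double rate_hz, double timeout_sec) {
  assert(timeout_sec >= 0.0);
  return std::min(1.0, timeout_sec / UpdatePeriodSec(rate_hz));
}
```

Under these assumptions, a 12 Hz channel has a period of about 0.083 s, so a 0.01 s wait sees a new update for only about 12% of queries, while a 0.1 s wait always spans at least one update.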

@rongguodong
Contributor

@storypku @jeroldchen @jinghaomiao Can you check my videos and reply to my questions above? I hope we can find the root cause and fix this issue as soon as possible. Thanks!

@jeroldchen
Contributor

@rongguodong Sorry for not replying in time. Let me answer your questions point by point:

  1. Timestamp. First, setting "lidar_query_tf_offset" to 200 ms is only necessary, and only works, in LGSVL Simulator. In other environments, such as a real driving system, this offset is 0 by default. I guess that LGSVL Simulator has some internal time delay in the lidar128 driver, since the offset is added to lidar timestamps, which come from the lidar128 driver. Second, as I said above, setting "query_time" to 0 calls another function of the tf2 library to get the latest common timestamp of the novatel frame and the world frame, but that timestamp is not guaranteed to be the same as the timestamp of the frame we are querying. What the function really does internally is complicated and needs deeper exploration. Third, the tf buffer size is better kept small, since this forces you to focus on real-time capability rather than on just making it work.
  2. Lidar perception. We updated the lidar obstacle detection method to a newer model called "PointPillars". Previous releases used segmentation + recognition models to output perception obstacles, while we now use the PointPillars detection model to do this in a single stage. As you can see, the shapes of the detected obstacles are all cuboids, and that's why we chose to use a detection model. Besides, improvement of this model is still in progress. For now, it is certainly not stable. However, we will keep upgrading the model to more stable versions.
  3. Traffic light detection. Because of the Apollo system upgrade, we have to refactor the traffic light detection module due to missing dependencies. For now, this module is not available. We believe that it can be reactivated in the next release.
  4. Dreamview. As for the point cloud not being displayed as expected, I think displaying it costs Dreamview a lot of CPU while the perception module is running at the same time. You can record the whole process using the recorder and then play back the record with the point cloud displayed but without the perception module running.

I hope this answers your questions. If you have any further questions, please let me know. Thank you.

@rongguodong
Contributor

@jeroldchen Thanks for your reply!
For the first part (i.e. timestamp), I still have some questions:

  1. Yes, we do have a delay in the timestamp, because we are using our sim_time instead of real time. Although our sim_time is initialized to be the same as real time, it can gradually fall further and further behind real time. This is why we worked with Baidu to add sim_time support to Cyber/Apollo. With all the changes now merged into Apollo, when you set "cyber_mode" (https://github.com/ApolloAuto/apollo/blob/master/cyber/conf/cyber.pb.conf#L30) to "MODE_MOCK", all Apollo modules should use our sim_time (i.e. the value from the /clock channel) as "now".
    So I thought this feature would remove the need for the additional 200 ms. However, it seems we still need it. Why? Is it because the sim_time support is not complete yet?

  2. I agree that keeping the tf buffer size small is good. However, since the /tf channel runs at only 12 Hz, what is the point of making the buffer size so much smaller (i.e. 0.01)? It seems to be the same as having no buffer at all. What is the reason behind the 0.01 default value?

For Lidar perception, the major issue (as shown in my videos) is that it cannot detect distant objects. Is it the same with real data? Have you compared perception in 5.0 with perception in master on real data? Do you see the same issue?

@jeroldchen
Contributor

@rongguodong

  1. tf buffer size. First, this parameter is actually a timeout limit within which the tf2 lib queries the transform every 3 ms. With the default value of 0.01, i.e. 10 ms, tf2 queries around 3 times, which is enough in case a query fails to get a correct response for some reason. In my opinion, the tf buffer size is meant to guarantee a query result, not to cover cases where some time delay exists.
  2. Lidar perception. The model can detect objects within 50 meters along both the x axis and the y axis. In my validation results on real data, objects are detected within this range. For the perception module in Apollo 5.0, the detection range is bigger, which matches what you have seen.

For the sim_time support question, perhaps @storypku can help.

@rongguodong
Contributor

@jeroldchen I see the related code here. What it does is: if "retval" is false, wait 3 ms and try again, until the timeout threshold is reached (i.e. 10 ms with the default 0.01 value).

Why can "retval" be false? My understanding is that "retval" is false because no match can be found between "target_frame" and "source_frame" at the query time "time". If neither of these frames changes within 10 ms (because of their low update rate), what is the point of trying again with the same query time? Won't it always be false in that case?

In other words, what is the purpose of the retry in the while-loop? Is it waiting for new updates to target_frame? If so, should we wait longer there (at least longer than the update period)?
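
The retry question can be made concrete with a small simulation. This is a sketch under stated assumptions, not the code in buffer.cc: it assumes a lookup succeeds only once the newest transform in the buffer is at least as recent as the query time, that each transform is stamped with its arrival time, and that lookups themselves take no time. It uses a simulated clock in integer milliseconds, and `CanTransformWithRetry` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct RetryResult {
  bool found;    // true if a usable transform showed up before the timeout
  int attempts;  // number of lookups performed
};

// tf_arrival_ms: sorted times (ms) at which transforms arrive; each transform
//                is assumed to be stamped with its arrival time.
// query_ms:      timestamp being looked up (e.g. the lidar frame time).
// start_ms:      simulated time at which the first lookup is made.
RetryResult CanTransformWithRetry(const std::vector<std::int64_t>& tf_arrival_ms,
                                  std::int64_t query_ms, std::int64_t start_ms,
                                  std::int64_t timeout_ms,
                                  std::int64_t sleep_ms = 3) {
  assert(sleep_ms > 0);
  RetryResult result{false, 0};
  for (std::int64_t now = start_ms; now < start_ms + timeout_ms;
       now += sleep_ms) {
    ++result.attempts;
    // Newest transform that has already arrived at simulated time `now`.
    std::int64_t latest = -1;
    for (std::int64_t t : tf_arrival_ms) {
      if (t <= now) latest = t;
    }
    if (latest >= query_ms) {  // no extrapolation into the future needed
      result.found = true;
      return result;
    }
  }
  return result;  // timed out without a usable transform
}
```

With transforms arriving at 0, 83 and 166 ms and a query for t = 60 ms issued at 60 ms, a 10 ms timeout makes four lookups in this idealized model and all of them fail, because the next transform only arrives at 83 ms. A 100 ms timeout succeeds on the ninth lookup, at simulated time 84 ms.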

@weidezhang
Contributor

weidezhang commented Aug 24, 2020

@fengqikai1414, do you know the answer to guodong's question?

@wangzhensuo

wangzhensuo commented Nov 18, 2020

Here is some info that may help solve the lidar and TF timestamp problems.
My local environment:
Apollo v6.0.0
CARLA 0.9.8

In my local environment, I did not see the behavior described as "Changing query_time to 0 will call the getLatestCommonTime function"; getLatestCommonTime was not called. The relevant code is:
if (time == 0) {
  int retval = getLatestCommonTime(target_id, source_id, time, error_string);
  if (retval != tf2_msgs::TF2Error::NO_ERROR) {
    return retval;
  }
}

My crash log is shown below:

E1117 02:36:22.352905 12017 transform_wrapper.cc:223] [mainboard]Can not find transform. 1605580582.304761887 frame_id: world child_frame_id: novatel
Error info: Lookup would require extrapolation into the future.
Requested time 1605580582304761856 but the latest data is at time 1605580582281150976, when looking up transform from frame [novatel] to frame [world]:timeout

Thread 11 "mainboard" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc6ffd700 (LWP 12017)]
0x00007fff56b5516e in apollo::perception::onboard::TransformCache::QueryTransform(double, apollo::perception::onboard::StampedTransform*, double) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Stransform_Uwrapper_Slibtransform_Uwrapper.so
(gdb)
(gdb) bt
#0 0x00007fff56b5516e in apollo::perception::onboard::TransformCache::QueryTransform(double, apollo::perception::onboard::StampedTransform*, double) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Stransform_Uwrapper_Slibtransform_Uwrapper.so
#1 0x00007fff56b59296 in apollo::perception::onboard::TransformWrapper::GetSensor2worldTrans(double, Eigen::Transform<double, 3, 2, 0>, Eigen::Transform<double, 3, 2, 0>) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Stransform_Uwrapper_Slibtransform_Uwrapper.so
#2 0x00007fffe413de6e in apollo::perception::onboard::DetectionComponent::InternalProc(std::shared_ptr<apollo::drivers::PointCloud const> const&, std::shared_ptr<apollo::perception::onboard::LidarFrameMessage> const&) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Scomponent_Slibdetection_Ucomponent.so
#3 0x00007fffe413e84e in apollo::perception::onboard::DetectionComponent::Proc(std::shared_ptr<apollo::drivers::PointCloud> const&) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Scomponent_Slibdetection_Ucomponent.so
#4 0x00007fffe4143977 in apollo::cyber::croutine::RoutineFactory apollo::cyber::croutine::CreateRoutineFactory<apollo::drivers::PointCloud, apollo::cyber::Component<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType>::Initialize(apollo::cyber::proto::ComponentConfig const&)::{lambda(std::shared_ptr<apollo::drivers::PointCloud> const&)#1}&>(apollo::cyber::Component<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType>::Initialize(apollo::cyber::proto::ComponentConfig const&)::{lambda(std::shared_ptr<apollo::drivers::PointCloud> const&)#1}&, std::shared_ptr<apollo::cyber::data::DataVisitor<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType> > const&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Scomponent_Slibdetection_Ucomponent.so
#5 0x00007fffe4143c6c in std::_Function_handler<void (), apollo::cyber::croutine::RoutineFactory apollo::cyber::croutine::CreateRoutineFactory<apollo::drivers::PointCloud, apollo::cyber::Component<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType>::Initialize(apollo::cyber::proto::ComponentConfig const&)::{lambda(std::shared_ptr<apollo::drivers::PointCloud> const&)#1}&>(apollo::cyber::Component<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType>::Initialize(apollo::cyber::proto::ComponentConfig const&)::{lambda(std::shared_ptr<apollo::drivers::PointCloud> const&)#1}&, std::shared_ptr<apollo::cyber::data::DataVisitor<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType> > const&)::{lambda()#1}::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
from /apollo/bazel-bin/modules/perception/onboard/component/../../../../_solib_local/libmodules_Sperception_Sonboard_Scomponent_Slibdetection_Ucomponent.so
#6 0x00007fffee34b21a in apollo::cyber::croutine::(anonymous namespace)::CRoutineEntry(void*) ()
from /apollo/.cache/bazel/540135163923dd7d5820f3ee4b306b32/execroot/apollo/bazel-out/k8-opt/bin/cyber/../_solib_local/libcyber_Scroutine_Slibcroutine.so
#7 0x0000000000000000 in ?? ()

[Image]

File: third_party\tf2\src\cache.cpp

else if (target_time > latest_time)  // execution reaches this branch, which causes the perception crash
{
  cache::createExtrapolationException2(target_time, latest_time, error_str);
  return 0;
}

The fix described above (setting lidar_query_tf_offset to 200) also worked.
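
The numbers in the crash log above can be checked directly. The sketch below mirrors the `target_time > latest_time` comparison quoted from cache.cpp and, as an assumption, models lidar_query_tf_offset as a number of milliseconds by which the query time is moved earlier. The thread does not show how Apollo applies the offset, so that sign convention and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when a lookup at target_ns would need data newer than latest_ns,
// i.e. the "extrapolation into the future" case.
bool NeedsFutureExtrapolation(std::int64_t target_ns, std::int64_t latest_ns) {
  return target_ns > latest_ns;
}

// Query time after shifting it earlier by offset_ms (assumed semantics).
std::int64_t ShiftedQueryTimeNs(std::int64_t lidar_stamp_ns,
                                std::int64_t offset_ms) {
  assert(offset_ms >= 0);
  return lidar_stamp_ns - offset_ms * 1000000LL;
}

// Timestamps copied from the error message in the crash log.
constexpr std::int64_t kRequestedNs = 1605580582304761856LL;
constexpr std::int64_t kLatestNs = 1605580582281150976LL;
```

The requested time is 23,610,880 ns (about 23.6 ms) ahead of the latest transform, so the unshifted lookup falls into the future-extrapolation case, while a 200 ms shift places the query behind the latest data. Whether the shifted time is still inside the buffer depends on how far back the cache reaches, which this sketch does not model.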

@ideasplus

ideasplus commented Jan 21, 2021

@jeroldchen Hello, I would like to know why the shapes of the obstacles detected by PointPillars are all cuboids while those from cnnseg are not. As far as I know, the model outputs the obstacle polygon in the tracking phase, not in the detection phase.

@jeroldchen
Contributor

Hi @ideasplus, the PointPillars model is designed to output cuboid results.

// read params of bounding box
float x = detections->at(i * FLAGS_num_output_box_feature + 0);
float y = detections->at(i * FLAGS_num_output_box_feature + 1);
float z = detections->at(i * FLAGS_num_output_box_feature + 2);
float dx = detections->at(i * FLAGS_num_output_box_feature + 4);
float dy = detections->at(i * FLAGS_num_output_box_feature + 3);
float dz = detections->at(i * FLAGS_num_output_box_feature + 5);
float yaw = detections->at(i * FLAGS_num_output_box_feature + 6);
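
To illustrate why a box parameterized this way is always a cuboid, here is a minimal sketch that turns the (x, y, dx, dy, yaw) values read above into the four corners of the box footprint. It assumes that dx and dy are full extents along the box's own axes and that yaw is a counter-clockwise rotation in radians about the z axis. These conventions, and the names `Point2D` and `BoxFootprint`, are assumptions for illustration, not Apollo's definitions.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

struct Point2D {
  float x;
  float y;
};

// Returns the four footprint corners of a box centered at (x, y) with
// extents dx, dy and heading yaw, in counter-clockwise order.
std::array<Point2D, 4> BoxFootprint(float x, float y, float dx, float dy,
                                    float yaw) {
  const float c = std::cos(yaw);
  const float s = std::sin(yaw);
  const float hx = 0.5f * dx;  // half extent along the box's x axis
  const float hy = 0.5f * dy;  // half extent along the box's y axis
  // Corner offsets expressed in the box's own frame.
  const std::array<Point2D, 4> local = {
      {{hx, hy}, {-hx, hy}, {-hx, -hy}, {hx, -hy}}};
  std::array<Point2D, 4> corners{};
  for (std::size_t i = 0; i < local.size(); ++i) {
    // Rotate by yaw, then translate to the box center.
    corners[i].x = x + c * local[i].x - s * local[i].y;
    corners[i].y = y + s * local[i].x + c * local[i].y;
  }
  return corners;
}
```

Whatever values the model predicts, seven numbers can only describe a rectangular footprint extruded over dz around z, so the result is a cuboid rather than a free-form polygon.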

@ideasplus

@jeroldchen Oh, I see... Thank you for your prompt reply.

@Xxfore

Xxfore commented Nov 19, 2021

Hi wangzhensuo,

I am currently hitting a core dump similar to the one you posted above. I enabled debug info, and the backtrace is shown below:

#0 _mm256_load_pd (__P=) at /usr/lib/gcc/x86_64-linux-gnu/7/include/avxintrin.h:862
#1 Eigen::internal::pload<double __vector(4)>(Eigen::internal::unpacket_traits<double __vector(4)>::type const*) (from=) at external/eigen/Eigen/src/Core/arch/AVX/PacketMath.h:215
#2 Eigen::internal::ploadt<double __vector(4), 32>(Eigen::internal::unpacket_traits<double __vector(4)>::type const*) (from=) at external/eigen/Eigen/src/Core/GenericPacketMath.h:463
#3 Eigen::internal::evaluator<Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 1, 0, 4, 1> > >::packet<32, double __vector(4)>(long, long) const (this=, col=, row=)
at external/eigen/Eigen/src/Core/CoreEvaluators.h:197
#4 Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::assign_op<double, double>, 0>::assignPacket<32, 32, double __vector(4)>(long, long) (this=, this=, col=, row=)
at external/eigen/Eigen/src/Core/AssignEvaluator.h:652
#5 Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::assign_op<double, double>, 0>::assignPacketByOuterInner<32, 32, double __vector(4)>(long, long) (outer=0, inner=0, this=) at external/eigen/Eigen/src/Core/AssignEvaluator.h:666
#6 Eigen::internal::copy_using_evaluator_innervec_CompleteUnrolling<Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::assign_op<double, double>, 0>, 0, 4>::run (kernel=...) at external/eigen/Eigen/src/Core/AssignEvaluator.h:274
#7 Eigen::internal::dense_assignment_loop<Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::evaluator<Eigen::Matrix<double, 4, 1, 0, 4, 1> >, Eigen::internal::assign_op<double, double>, 0>, 2, 2>::run (kernel=...) at external/eigen/Eigen/src/Core/AssignEvaluator.h:468
#8 Eigen::internal::call_dense_assignment_loop<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::internal::assign_op<double, double> > (func=..., src=..., dst=...)
at external/eigen/Eigen/src/Core/AssignEvaluator.h:741
#9 Eigen::internal::Assignment<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::internal::assign_op<double, double>, Eigen::internal::Dense2Dense, void>::run (func=...,
src=..., dst=...) at external/eigen/Eigen/src/Core/AssignEvaluator.h:879
#10 Eigen::internal::call_assignment_no_alias<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::internal::assign_op<double, double> > (func=..., src=..., dst=...)
at external/eigen/Eigen/src/Core/AssignEvaluator.h:836
#11 Eigen::internal::call_assignment<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::internal::assign_op<double, double> >(Eigen::Matrix<double, 4, 1, 0, 4, 1>&, Eigen::Matrix<double, 4, 1, 0, 4, 1> const&, Eigen::internal::assign_op<double, double> const&, Eigen::internal::enable_if<!Eigen::internal::evaluator_assume_aliasing<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::internal::evaluator_traits<Eigen::Matrix<double, 4, 1, 0, 4, 1> >::Shape>::value, void*>::type) (func=..., src=..., dst=...) at external/eigen/Eigen/src/Core/AssignEvaluator.h:804
#12 Eigen::internal::call_assignment<Eigen::Matrix<double, 4, 1, 0, 4, 1>, Eigen::Matrix<double, 4, 1, 0, 4, 1> > (src=..., dst=...) at external/eigen/Eigen/src/Core/AssignEvaluator.h:782
#13 Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 1, 0, 4, 1> >::_set<Eigen::Matrix<double, 4, 1, 0, 4, 1> > (other=..., this=0x7f991a3fea80) at external/eigen/Eigen/src/Core/PlainObjectBase.h:714
#14 Eigen::Matrix<double, 4, 1, 0, 4, 1>::operator= (other=..., this=0x7f991a3fea80) at external/eigen/Eigen/src/Core/Matrix.h:208
#15 Eigen::QuaternionBase<Eigen::Quaternion<double, 0> >::operator= (other=..., this=0x7f991a3fea80) at external/eigen/Eigen/src/Geometry/Quaternion.h:490
#16 Eigen::Quaternion<double, 0>::operator= (other=..., this=0x7f991a3fea80) at external/eigen/Eigen/src/Geometry/Quaternion.h:240
#17 apollo::perception::onboard::StampedTransform::operator= (this=0x7f991a3fea60) at ./modules/perception/onboard/transform_wrapper/transform_wrapper.h:39
#18 apollo::perception::onboard::TransformCache::QueryTransform (this=this@entry=0x55d934ed7b48, timestamp=timestamp@entry=1637306695.0115578, transform=transform@entry=0x7f991a3fea60,
max_duration=) at modules/perception/onboard/transform_wrapper/transform_wrapper.cc:111

It seems to be related to the StampedTransform assignment.
Could you share more about how you traced the exception to the
file "third_party\tf2\src\cache.cpp"?

Thanks a lot
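
One thing that may be worth checking here, offered as a guess rather than a diagnosis: frame #0 is `_mm256_load_pd`, the AVX load that requires a 32-byte-aligned address, so a misaligned source object would fault at exactly this point. The helpers below are hypothetical stand-alone checks (not part of Apollo or Eigen) that test whether an address meets a given alignment.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
bool IsAligned(std::uintptr_t addr, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}

// Same check for a pointer, e.g. the address of an element in a container.
bool IsPointerAligned(const void* ptr, std::uintptr_t alignment) {
  return IsAligned(reinterpret_cast<std::uintptr_t>(ptr), alignment);
}
```

Applied to the trace, the destination address `this=0x7f991a3fea80` in frames #13 to #16 is 32-byte aligned, so the source operand (whose address is not shown in the `from=` field) would be the one to inspect.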
