Perception module won't start with RTX 2080 Ti #12088
Comments
@dfremont Perception had a core dump, which has been fixed. Please pull the latest update and let us know how it works. Thanks for using Apollo. |
The RTX 2080 Ti is supported by the Docker image, so I suspect something else is preventing it from working. Please give me until tomorrow to check. |
I pulled the latest master and still cannot start "perception" module. When I tried to start it manually in command line, I got the following errors: [guodong@in-dev-docker:/apollo]$ cyber_launch start modules/perception/production/launch/perception.launch |
BTW: The latest master seems to be missing the environment setup. I have to run "source cyber/setup.bash" before I can run any cyber commands (e.g. cyber_monitor, cyber_launch, etc.). Is this a new bug? |
Camera based perception is not working at this moment because of system and dependency upgrade. Please either use the toggle box on dreamview to turn on perception, or use "apollo/modules/perception/production/launch/perception_lidar.launch" to launch lidar based perception. |
@storypku @changsh726 please take a look |
Does this mean that "traffic light" detection is also not working? This problem has been around for a long time. Do you have a target date for fixing it? |
Traffic light is not working either. The team is working on an upgrade. Thanks! |
Just tried lidar based perception and get the following errors: [guodong@in-dev-docker:/apollo]$ cyber_launch start modules/perception/production/launch/perception_lidar.launch |
Thanks for reporting the error. The team will take a look. @jeroldchen @lfcarol |
Yep. Try running And it was fixed in #12106 @rongguodong |
Please source /apollo/scripts/apollo_base.sh |
Hello, which "setup mode" did you choose? Is it "Mkz lgsvl"? |
Yes (sorry for not making that clear). Also, I just tried rebuilding Apollo to see if the commits since Monday fixed the problem, but now it won't compile:
|
Try Or, install fftw3 it manually via BTW, could you please show me the output of |
Thanks, I had done
|
Please pull the latest master branch, try again. Perception will be able to work. |
The lidar part should work now, |
@dfremont @rongguodong This ticket is being closed. Please feel free to open new issues if needed. Thanks for using Apollo. |
Not working for me, unfortunately. I now get many errors like this:
One time the process quit after some of these errors; other times it stayed running but the "Perception" slider didn't turn on. |
|
I got the same errors when running Lidar-perception. I found out one of the following two changes can make it work:
However, I still want to understand why Lidar timestamp can be larger than TF timestamp without the above changes, and why the above changes can make it work. Any ideas would be helpful! |
The slider will turn on after all related components have been started. It may take a while for some complex modules like perception that contains multiple components. |
Please try again with the startup order transform->perception-> ... , because an incorrect order can cause this error. |
@storypku It is simple to reproduce this locally: basically, just follow our instructions here until the step saying "Open the Module Controller tap". Then, turn on "localization" and "transform" modules. Now, you should see lots of errors about timestamp in the console (as @dfremont posted above). Note that this may be an old issue. We do not have this error in our fork of Apollo 5.0 because we applied the second change I posted above (i.e. changed the querry_time from "timestamp" to "0"). But this change seems to be hacky. I would like to understand the reason behind it and find out a better solution. |
Thanks, Guodong, for the updates. We will check it ASAP. |
Thanks @dfremont @rongguodong . The team is looking into this issue: @storypku @jeroldchen @lfcarol |
This (mostly) worked for me. It got perception to stop crashing immediately on launch. However, I noticed that LiDAR perception is taking 4GB of GPU memory and we don't even have camera perception (traffic light) going... I hope camera perception will be working again soon, and hope it still uses less GPU memory than the LiDAR one (in 5.0, it uses 1165MB, and traffic light adds another 1088MB for a total of 2253MB, just over half what LiDAR perception alone consumes in master now). In any case I'm happy to report that I'm actually seeing Apollo master able to drive (on small maps, without traffic light perception) in LGSVL Simulator on a Razer Blade i7 laptop with an 8GB RTX 2070 MaxQ. :-) |
@rongguodong @lemketron Changing apollo/third_party/tf2/src/buffer_core.cpp Lines 349 to 354 in 82a1b88
novatel to world can be found through a common timestamp. However, it is not a stable strategy. The key to making lidar perception work in LGSVL is the lidar_query_tf_offset param. In #12163, I added a config for velodyne 128 detection in LGSVL in which I set lidar_query_tf_offset to 200, which means that LGSVL's lidar 128 has a slight time delay. This setting is the same as in https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/lidar/velodyne128_segmentation_conf_lgsvl.pb.txt. It should make lidar perception work again.
|
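To illustrate why a query offset can make the lookup succeed, here is a minimal Python sketch. It assumes, as the surrounding comments suggest, that lidar_query_tf_offset is given in milliseconds and is subtracted from the lidar timestamp before the TF lookup. `ToyTfBuffer` and `query_time_for` are hypothetical names, not Apollo's API, and all timestamps are made up.

```python
class ToyTfBuffer:
    """Hypothetical stand-in for a transform buffer (not Apollo's API)."""

    def __init__(self):
        self.stamps = []  # transform timestamps in seconds, kept sorted

    def add(self, stamp):
        self.stamps.append(stamp)
        self.stamps.sort()

    def can_transform(self, query_time):
        # Assumption: a lookup succeeds only when the query time lies inside
        # the buffered range, so it never extrapolates into the future.
        if not self.stamps:
            return False
        return self.stamps[0] <= query_time <= self.stamps[-1]


def query_time_for(lidar_stamp, offset_ms):
    # Assumed semantics: the offset (in ms) is subtracted from the lidar stamp.
    return lidar_stamp - offset_ms * 0.001


buf = ToyTfBuffer()
for i in range(13):          # /tf at roughly 12 Hz, stamps 0.0 .. 1.0 s
    buf.add(i / 12)

lidar_stamp = 1.05           # lidar frame is newer than the latest transform
without_offset = buf.can_transform(query_time_for(lidar_stamp, 0))
with_offset = buf.can_transform(query_time_for(lidar_stamp, 200))
print(without_offset, with_offset)
```

In this toy setup the lookup fails without the offset because the lidar stamp is ahead of the newest buffered transform, and succeeds once the query is shifted 200 ms into the past. Whether that matches the real cause in LGSVL is exactly the open question in this thread.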
@storypku @jeroldchen Thanks for your replies! I tried adding the 200ms delay as in the PR, and I no longer get the timestamp errors. Previously, I thought this 200ms delay was needed because of the time difference between the simulator and Apollo, and that it would not be needed once Apollo used sim_time. However, even if I set "MODE_MOCK" in the Cyber config, I still get the timestamp errors without this 200ms delay. Do you have a better explanation of why we still need it now? Even now, I still have several issues:
|
https://drive.google.com/file/d/1cHc9OPJwGTvSkE7PtRyo8Eik-iyKB6H_/view?usp=sharing |
https://drive.google.com/file/d/1FLhbm3shwVws_W0D98lb0gxVf7Y-TtI-/view?usp=sharing Our fork of Apollo 5.0 can be found at: https://github.com/lgsvl/apollo-5.0. It has the change of querry_time to 0 (i.e. the #2 change I mentioned above). For this video, I disabled camera-based perception (by deleting "dag_streaming_perception_camera.dag" from https://github.com/lgsvl/apollo-5.0/blob/simulator/modules/perception/production/launch/perception.launch). So I think this means only Lidar-based perception is used (right?). But it behaves much better than the Lidar-based perception in master (as shown in the video I posted yesterday). |
So, there are two questions here:
|
After more debugging, I am confused by the TF buffer size (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/perception_common.flag#L69): The default value is 0.01. This value is used as "timeout_second" here (https://github.com/ApolloAuto/apollo/blob/master/modules/transform/buffer.cc#L194). As we can see in the code below, it waits there (using a while-loop) for "timeout_second" to see if it can find a TF matching the timestamp from Lidar. However, from cyber_monitor, I see that the /tf channel's update rate is only around 12Hz, so it is updated roughly every 0.08s. Waiting there for 0.01s is therefore useless most of the time (since /tf will NOT be updated). So the question is: what is the point of setting the default value of this TF buffer size to 0.01? Would it be more meaningful to set it to 0.1? (As in my option #1 above) |
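The arithmetic behind this question can be checked with a small Python sketch. It assumes /tf updates are perfectly periodic at 12 Hz and that a lookup starts at a uniformly random moment within one update period; `chance_new_tf_arrives` is a hypothetical helper written for this illustration.

```python
def chance_new_tf_arrives(timeout_s, tf_rate_hz):
    """Fraction of lookups that see a fresh /tf message before timing out.

    Assumes periodic updates and a lookup start time that is uniformly
    distributed within one update period.
    """
    period_s = 1.0 / tf_rate_hz
    return min(1.0, timeout_s / period_s)


tf_rate_hz = 12.0
period_s = 1.0 / tf_rate_hz
chance_default = chance_new_tf_arrives(0.01, tf_rate_hz)
chance_longer = chance_new_tf_arrives(0.1, tf_rate_hz)

print(f"update period: {period_s:.3f} s")
print(f"timeout 0.01 s: {chance_default:.0%} of lookups see a new /tf")
print(f"timeout 0.10 s: {chance_longer:.0%} of lookups see a new /tf")
```

Under these assumptions, a 0.01 s timeout covers only about 12% of an update period, while 0.1 s exceeds the period, so a new /tf message always arrives before the timeout expires.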
@storypku @jeroldchen @jinghaomiao Can you check my videos and reply to my questions above? I hope we can find the root cause and fix this issue as soon as possible. Thanks! |
@rongguodong Sorry for not replying in time. Let me answer your questions from the following points:
Hope you are satisfied with my answer. If you have any questions, please let me know. Thank you. |
@jeroldchen Thanks for your reply!
For Lidar perception, the major issue (as shown in my videos) is it cannot detect far objects. Is it the same for real data? Have you compared perception in 5.0 and perception in master with real data? Do you see the same issue? |
For the sim_time support question, perhaps @storypku can help. |
@jeroldchen I see the related code at here. What it is doing is: if "retval" is false, wait 3ms and try again, until the timeout threshold is reached (i.e. 10ms with the default 0.01 value). Why can "retval" be false? My understanding is that "retval" is false because you cannot find a match between "target_frame" and "source_frame" using the query time "time". If none of these frames will be changed within 10ms, what is the point to try more times with the same query time (because of their low update rate)? Will it always be false in that case? In other words, what is the purpose of this re-try in the while-loop? Is it waiting for new updates in target_frame? If it is, should we wait longer there (at least longer than the update time)? |
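The retry behaviour described above can be sketched in Python with a simulated clock. This is not Apollo's code: `lookup_with_retry`, `FakeClock`, and `make_lookup` are hypothetical names, the 3 ms retry interval is taken from the description in this comment, and the scenario (next /tf message arriving 0.08 s after the lookup starts) is invented for illustration.

```python
class FakeClock:
    """Simulated clock so the example runs instantly and deterministically."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, dt):
        self.t += dt


def lookup_with_retry(can_transform, query_time, timeout_s, clock, retry_s=0.003):
    """Poll can_transform(query_time) every retry_s until success or timeout."""
    start = clock.now()
    ok = can_transform(query_time)
    while not ok and clock.now() - start < timeout_s:
        clock.sleep(retry_s)
        ok = can_transform(query_time)
    return ok


def make_lookup(clock, next_arrival_s):
    # Latest buffered stamp is 1.0 s until the next /tf message arrives,
    # after which it advances by one 12 Hz period.
    def can_transform(query_time):
        latest = 1.0 if clock.now() < next_arrival_s else 1.0 + 1.0 / 12.0
        return query_time <= latest
    return can_transform


results = {}
for timeout_s in (0.01, 0.1):
    clock = FakeClock()
    lookup = make_lookup(clock, next_arrival_s=0.08)
    results[timeout_s] = lookup_with_retry(lookup, 1.05, timeout_s, clock)

print(results)
```

In this scenario the retries within a 10 ms timeout all see the same buffer contents and fail, whereas a 100 ms timeout outlasts the update interval and succeeds. That matches the concern raised here: retrying only helps if the wait is long enough for a new transform to arrive.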
@fengqikai1414 , do you know the answer to guodong's question ? |
I am adding some info that may help solve the lidar and TF timestamp problems. In my local environment, I did not find this: "Changing query_time to 0 will call the getLatestCommonTime function". My crash log is shown below:
Thread 11 "mainboard" received signal SIGSEGV, Segmentation fault. file :third_party\tf2\src\cache.cpp
The fix above (setting lidar_query_tf_offset to 200) also worked. |
@jeroldchen Hello, I want to know why the shapes of the detected obstacles by PointPillars are all cuboids while cnnseg isn't? As far as I know, the model doesn't output the polygon of obstacles in the detection phase but in the tracking phase. |
Hi @ideasplus , PointPillars model is designed to output cuboid results. apollo/modules/perception/lidar/lib/detection/lidar_point_pillars/point_pillars_detection.cc Lines 287 to 294 in 5dfaaf8
|
@jeroldchen Oh, I see... Thank you for your prompt reply. |
Hi wangzhensuo,
#0 _mm256_load_pd (__P=) at /usr/lib/gcc/x86_64-linux-gnu/7/include/avxintrin.h:862 It seems related to the StampedTransform assignment. Thanks a lot |
Per @lemketron and @storypku's discussion here, I'm opening a new issue about the Perception module not working on my machine, which has an RTX 2080 Ti. Please let me know if there's any additional information I can provide to help debug the problem.
Describe the bug
The Perception module will not start. When activated in Dreamview, the slider moves immediately back to "off". No perception.INFO file is produced at all.
To Reproduce
Here are the steps I took:
./scripts/bootstrap_lgsvl.sh
./scripts/bridge.sh
System setup: