This repository is continuously updated. We prioritize including articles that have already been submitted to arXiv.
We warmly invite you to our platform, Auto Driving Heart, where papers are interpreted and shared. If you would like to promote your paper, please feel free to contact me.
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
PlanKD: Compressing End-to-End Motion Planner for Autonomous Driving
VLP: Vision Language Planning for Autonomous Driving
ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
RegionGPT: Towards Region Understanding Vision Language Model
Towards Learning a Generalist Model for Embodied Navigation
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
SemCity: Semantic Scene Generation with Triplane Diffusion
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
SemCity: Semantic Scene Generation with Triplane Diffusion
- Paper:
- Code: https://github.com/zoomin-lee/SemCity
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images
UniMODE: Unified Monocular 3D Object Detection
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection
MonoCD: Monocular 3D Object Detection with Complementary Depths
- Paper:
- Code: https://github.com/dragonfly606/MonoCD
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Neural Markov Random Field for Stereo Matching
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
SNI-SLAM: Semantic Neural Implicit SLAM
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Implicit Event-RGBD Neural SLAM
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
3DSFLabeling: Boosting 3D Scene Flow Estimation by Pseudo Auto Labeling
- Paper: https://arxiv.org/pdf/2402.18146.pdf
- Code: https://github.com/jiangchaokang/3DSFLabelling
Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency
Point Transformer V3: Simpler, Faster, Stronger
- Paper: https://arxiv.org/pdf/2312.10035.pdf
- Code: https://github.com/Pointcept/PointTransformerV3
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
- Paper:
- Code: https://github.com/jihun1998/AO
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
- Paper:
- Code: https://github.com/GLiDR-CVPR2024/GLiDR
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
RepViT: Revisiting Mobile CNN From ViT Perspective
OMG-Seg: Is One Model Good Enough For All Segmentation?
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
DART: Doppler-Aided Radar Tomography
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
- Paper: https://arxiv.org/pdf/2312.07920.pdf
- Code: https://github.com/VDIGPKU/DrivingGaussian
Dynamic LiDAR Re-simulation using Compositional Neural Fields
- Paper: https://arxiv.org/pdf/2312.05247.pdf
- Code: https://github.com/prs-eth/Dynamic-LiDAR-Resimulation
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
- Paper: https://arxiv.org/pdf/2403.16439.pdf
- Code: https://github.com/alfredgu001324/MapUncertaintyPrediction
AFNet: Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
Seeing Motion at Nighttime with an Event Camera
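The Auto Driving Heart Knowledge Planet is China's first learning and discussion community built around the autonomous driving technology stack (and also the largest in China), a place where cutting-edge technology is published and learned. We have compiled learning roadmaps for almost every sub-field: autonomous driving perception (BEV, multimodal perception, Occupancy, millimeter-wave radar and vision perception, lane detection, 3D perception, object tracking, multimodal and multi-sensor fusion, Transformer, etc.), autonomous driving localization and mapping (online HD maps, HD maps, SLAM), multi-sensor calibration (nearly 20 solutions for Camera/Lidar/Radar/IMU, etc.), NeRF, vision-language models, world models, planning and control, trajectory prediction, industry technical solutions, and AI model deployment.
In addition, we have set up internal referral channels with dozens of autonomous driving companies, so your résumé goes straight to them. You can ask questions and exchange ideas freely here, and many algorithm engineers as well as master's and PhD students are active every day to help solve problems. Our original goal is to bring together the expertise of industry leaders to help everyone with learning and employment. The community's weekly activity consistently ranks in the top 50, and we put great emphasis on encouraging participation and discussion. You are welcome to join and grow with us!
Join link: Auto Driving Heart Knowledge Planet | China's first full-stack autonomous driving learning community, with 30+ learning roadmaps covering perception, fusion, planning, calibration, prediction, and more
China's first tutorial on Transformer-based segmentation and detection plus large vision models
Full-stack systematic tutorial on multi-sensor calibration (nearly 20 online/offline hands-on solutions for Camera/Lidar/Radar/IMU)
Full-stack tutorial on TensorRT deployment of four model types (CNN/Transformer/detection/BEV), with code and CUDA acceleration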