UniPAD is a novel self-supervised learning framework designed for 3D perception tasks in autonomous driving. Inspired by the paper *UniPAD: A Universal Pre-training Paradigm for Autonomous Driving*, this repository implements key components of UniPAD, including modular encoders for LiDAR and multi-view images, a unified volumetric representation, and a neural rendering decoder.
UniPAD leverages a unique self-supervised pre-training strategy based on 3D differentiable rendering. It learns rich geometric and semantic representations from both LiDAR point clouds and multi-view images, enabling significant improvements in downstream tasks such as 3D object detection and semantic segmentation.
Key features include:
- Mask Generator: Strategically masks input data to enhance representation learning.
- Modal-Specific Encoders: Extract features from LiDAR point clouds and multi-view images.
- Volumetric Representation: Transforms features into a unified 3D voxel space.
- Neural Rendering Decoder: Produces continuous geometric and semantic predictions.
- Memory-Efficient Ray Sampling: Optimizes training efficiency with minimal memory overhead.
- Modular design for encoders (LiDAR and image modalities) and decoders; see the pipeline sketch below.
- Neural rendering for RGB and depth prediction.
- End-to-end training and evaluation scripts.
- Support for the nuScenes dataset or other point cloud and image datasets.
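These pieces compose into a single pre-training pipeline. The following is a minimal sketch of how they fit together; the class and method names are illustrative, not the exact API of `src/model/unipad.py`:

```python
import torch.nn as nn

class UniPADPipeline(nn.Module):
    """Illustrative composition of the UniPAD pre-training stages."""

    def __init__(self, mask_generator, lidar_encoder, image_encoder, decoder):
        super().__init__()
        self.mask_generator = mask_generator
        self.lidar_encoder = lidar_encoder
        self.image_encoder = image_encoder
        self.decoder = decoder

    def forward(self, points, images, rays):
        # 1. Strategically mask portions of each modality.
        points, images = self.mask_generator(points, images)
        # 2. Encode each modality and fuse into a shared voxel volume
        #    (the fusion here is a placeholder sum).
        voxels = self.lidar_encoder(points) + self.image_encoder(images)
        # 3. Render continuous RGB/depth predictions along sampled rays.
        return self.decoder(voxels, rays)
```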
Project structure:

```
UniPAD/
├── README.md                 # Project documentation
├── requirements.txt          # Dependencies for the project
├── datasets/                 # Directory for datasets
├── src/                      # Source code
│   ├── data_processing.py    # Dataset loading and preprocessing
│   ├── model/                # Model components
│   │   ├── encoder.py        # Encoders for LiDAR and image data
│   │   ├── decoder.py        # Neural rendering decoder
│   │   └── unipad.py         # Full UniPAD pipeline
│   ├── training.py           # Training script
│   ├── evaluation.py         # Evaluation script
│   └── utils.py              # Utility functions
├── examples/                 # Examples for using the pipeline
│   ├── 3d_detection.py       # Example for 3D object detection
│   └── segmentation.py       # Example for semantic segmentation
```
1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/UniPAD.git
   cd UniPAD
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Prepare the dataset (e.g., nuScenes):
   - Store LiDAR point clouds in `datasets/nuscenes/lidar`.
   - Store corresponding images in `datasets/nuscenes/images`.
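A quick way to sanity-check that the files landed in the expected locations (plain Python, no project code required):

```python
from pathlib import Path

root = Path("datasets/nuscenes")
lidar_files = sorted((root / "lidar").iterdir())
image_files = sorted((root / "images").iterdir())
print(f"Found {len(lidar_files)} LiDAR files and {len(image_files)} image files")
```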
Train the model using the `examples/3d_detection.py` script:

```bash
python examples/3d_detection.py
```
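During pre-training, the loss compares the rendered RGB and depth against the raw sensor data. A minimal sketch of one optimization step under that objective (the batch keys and loss weighting are assumptions, not the exact code in `src/training.py`):

```python
import torch.nn.functional as F

def pretrain_step(model, batch, optimizer, depth_weight=1.0):
    """One self-supervised step: mask, encode, render, compare, update."""
    rgb_pred, depth_pred = model(batch["points"], batch["images"], batch["rays"])
    loss = (F.l1_loss(rgb_pred, batch["rgb_gt"])
            + depth_weight * F.l1_loss(depth_pred, batch["depth_gt"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```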
Evaluate the model using the `examples/segmentation.py` script:

```bash
python examples/segmentation.py
```
The mask generator applies strategic masking to both the point-cloud and image modalities to enhance model generalization.
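A minimal sketch of such a masking step, assuming point masking by random subsampling and image masking by zeroing random patches (the ratios and patch size here are illustrative):

```python
import torch

def mask_inputs(points, images, point_keep=0.2, patch=32, img_mask_ratio=0.3):
    """Drop a random subset of LiDAR points and zero out random image patches."""
    # Point masking: keep roughly `point_keep` of the points.
    keep = torch.rand(points.shape[0]) < point_keep
    masked_points = points[keep]

    # Image masking: zero patch-aligned regions with probability img_mask_ratio.
    masked_images = images.clone()
    _, _, h, w = images.shape
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if torch.rand(1).item() < img_mask_ratio:
                masked_images[:, :, top:top + patch, left:left + patch] = 0.0
    return masked_points, masked_images
```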
- LiDAREncoder: Processes 3D point cloud data.
- ImageEncoder: Extracts features from multi-view image data using a convolutional backbone.
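For reference, a toy version of the image branch (the real backbones live in `src/model/encoder.py`; the layers below are stand-ins):

```python
import torch.nn as nn

class TinyImageEncoder(nn.Module):
    """Stand-in convolutional backbone for the multi-view image encoder."""

    def __init__(self, in_channels=3, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, images):  # images: (B, 3, H, W)
        return self.backbone(images)  # features: (B, feat_dim, H/4, W/4)
```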
The NeuralRenderingDecoder uses volumetric rendering techniques to predict RGB and depth values.
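Conceptually, the decoder samples densities and colors along each ray and alpha-composites them into per-ray RGB and depth; a simplified version of that compositing math (a sketch, not the decoder's actual implementation):

```python
import torch

def volume_render(sigmas, rgbs, deltas):
    """Composite per-sample densities/colors along rays into RGB and depth.

    sigmas: (R, S) densities; rgbs: (R, S, 3) colors;
    deltas: (R, S) spacing between consecutive samples on each ray.
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)          # opacity per sample
    ones = torch.ones_like(alphas[:, :1])
    trans = torch.cumprod(
        torch.cat([ones, 1.0 - alphas + 1e-10], dim=1), dim=1
    )[:, :-1]                                           # transmittance to each sample
    weights = alphas * trans                            # contribution per sample
    rgb = (weights.unsqueeze(-1) * rgbs).sum(dim=1)     # (R, 3)
    depth = (weights * torch.cumsum(deltas, dim=1)).sum(dim=1)  # (R,)
    return rgb, depth
```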
This implementation aims to replicate the improvements reported in the paper on the nuScenes benchmark, such as:
- +9.1 NDS for LiDAR-based detection.
- +7.7 NDS for camera-based detection.
- +6.9 NDS for LiDAR-camera fusion.
(Note: Fine-tuning may be necessary to achieve optimal performance.)
Contributions are welcome! If you'd like to enhance this project, feel free to:
- Submit a pull request.
- Open an issue for bugs or feature requests.
This implementation is inspired by the paper *UniPAD: A Universal Pre-training Paradigm for Autonomous Driving* by Honghui Yang et al. Special thanks to the authors for their groundbreaking work in self-supervised learning for autonomous driving.
This project is licensed under the MIT License. See the LICENSE file for details.