UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

UniPAD is a self-supervised learning framework for 3D perception tasks in autonomous driving. Inspired by the paper UniPAD: A Universal Pre-training Paradigm for Autonomous Driving, this repository implements its key components, including modular encoders for LiDAR and multi-view images, a unified volumetric representation, and a neural rendering decoder.

Overview

UniPAD uses a self-supervised pre-training strategy based on 3D differentiable rendering. It learns rich geometric and semantic representations from both LiDAR point clouds and multi-view images, enabling significant improvements in downstream tasks such as 3D object detection and semantic segmentation.
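As a rough sketch of the resulting objective (illustrative only: names such as pretraining_loss and depth_weight are assumptions, not this repository's API), the colors and depths rendered for a set of sampled rays are supervised against the unmasked targets:

    import torch.nn.functional as F

    def pretraining_loss(pred_rgb, pred_depth, gt_rgb, gt_depth, depth_weight=1.0):
        """Self-supervised rendering loss: L1 on rendered colors and depths.

        pred_rgb / gt_rgb:     (N_rays, 3) colors for the sampled rays.
        pred_depth / gt_depth: (N_rays,)   depths along the same rays.
        depth_weight is an assumed balancing coefficient.
        """
        rgb_loss = F.l1_loss(pred_rgb, gt_rgb)
        depth_loss = F.l1_loss(pred_depth, gt_depth)
        return rgb_loss + depth_weight * depth_loss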

Key features include:

  • Mask Generator: Strategically masks input data to enhance representation learning.
  • Modal-Specific Encoders: Extract features from LiDAR point clouds and multi-view images.
  • Volumetric Representation: Transforms features into a unified 3D voxel space.
  • Neural Rendering Decoder: Produces continuous geometric and semantic predictions.
  • Memory-Efficient Ray Sampling: Keeps training efficient with minimal memory overhead (sketched below).
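
The ray-sampling idea in the last bullet, sketched under assumed shapes (not the repository's implementation): rather than rendering every pixel, only a random subset of rays per view is rendered each step, which bounds memory regardless of image resolution.

    import torch

    def sample_rays(ray_origins, ray_dirs, target_rgb, num_rays=512):
        """Randomly keep num_rays rays from one view to cap rendering memory.

        ray_origins, ray_dirs: (H*W, 3) per-pixel rays.
        target_rgb:            (H*W, 3) ground-truth colors for supervision.
        """
        idx = torch.randperm(ray_origins.shape[0])[:num_rays]
        return ray_origins[idx], ray_dirs[idx], target_rgb[idx]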

Implemented Features

  • Modular design for encoders (LiDAR and image modalities) and decoders.
  • Neural rendering for RGB and depth prediction.
  • End-to-end training and evaluation scripts.
  • Support for the nuScenes dataset as well as other point cloud and image datasets.

Repository Structure

UniPAD/
├── README.md               # Project documentation
├── requirements.txt        # Dependencies for the project
├── datasets/               # Directory for datasets
├── src/                    # Source code
│   ├── data_processing.py  # Dataset loading and preprocessing
│   ├── model/              # Model components
│   │   ├── encoder.py      # Encoders for LiDAR and image data
│   │   ├── decoder.py      # Neural rendering decoder
│   │   └── unipad.py       # Full UniPAD pipeline
│   ├── training.py         # Training script
│   ├── evaluation.py       # Evaluation script
│   └── utils.py            # Utility functions
├── examples/               # Examples for using the pipeline
│   ├── 3d_detection.py     # Example for 3D object detection
│   └── segmentation.py     # Example for semantic segmentation

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/UniPAD.git
    cd UniPAD
  2. Install dependencies:

    pip install -r requirements.txt

Usage

1. Dataset Preparation

Prepare the dataset (e.g., nuScenes):

  • Store LiDAR point clouds in datasets/nuscenes/lidar.
  • Store corresponding images in datasets/nuscenes/images.
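
A minimal loading sketch for this layout, assuming nuScenes-style .bin sweeps (float32, five values per point: x, y, z, intensity, ring index) and ordinary image files; the paths and the helper name load_sample are illustrative, not the data_processing.py API:

    import numpy as np
    from PIL import Image

    def load_sample(lidar_path, image_paths):
        """Load one LiDAR sweep and its multi-view images."""
        points = np.fromfile(lidar_path, dtype=np.float32).reshape(-1, 5)
        images = [np.asarray(Image.open(p).convert("RGB")) for p in image_paths]
        return points, images

    # Hypothetical filenames following the directories above:
    # points, images = load_sample(
    #     "datasets/nuscenes/lidar/sample_0001.bin",
    #     ["datasets/nuscenes/images/sample_0001_cam_front.jpg"],
    # )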

2. Training

Train the model using the examples/3d_detection.py script:

python examples/3d_detection.py

3. Evaluation

Evaluate the model using the examples/segmentation.py script:

python examples/segmentation.py

Key Components

Mask Generator

A strategic masking mechanism for both point cloud and image modalities to enhance model generalization.
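
A minimal sketch of patch-style masking for the image branch (block masking for points follows the same idea); the patch size and mask ratio are assumed hyperparameters, not values from the paper or this repository:

    import torch

    def mask_image(images, patch_size=16, mask_ratio=0.3):
        """Zero out a random subset of non-overlapping patches.

        images: (B, C, H, W) with H and W divisible by patch_size.
        Returns the masked images and the boolean patch mask (B, H/ps, W/ps).
        """
        b, _, h, w = images.shape
        mask = torch.rand(b, h // patch_size, w // patch_size) < mask_ratio
        pixel_mask = mask.repeat_interleave(patch_size, 1).repeat_interleave(patch_size, 2)
        return images * (~pixel_mask).unsqueeze(1), mask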

Encoders

  • LiDAREncoder: Processes 3D point cloud data.
  • ImageEncoder: Extracts features from multi-view image data using a convolutional backbone.
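
A minimal sketch of the image branch, with a toy two-stage convolutional backbone standing in for the one in encoder.py (channel counts and strides are assumptions):

    import torch.nn as nn

    class ImageEncoder(nn.Module):
        """Toy convolutional backbone: multi-view images -> 2D feature maps."""

        def __init__(self, out_channels=64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, images):
            # images: (B * num_views, 3, H, W) -> (B * num_views, C, H/4, W/4)
            return self.backbone(images)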

Decoder

The NeuralRenderingDecoder uses volumetric rendering techniques to predict RGB and depth values along sampled rays.
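
The core of that step is standard NeRF-style alpha compositing; a self-contained sketch of the general technique (not necessarily the exact formulation in decoder.py):

    import torch

    def composite(densities, colors, deltas):
        """Alpha-composite per-sample predictions along each ray.

        densities: (N_rays, N_samples)    non-negative volume densities.
        colors:    (N_rays, N_samples, 3) per-sample RGB.
        deltas:    (N_rays, N_samples)    spacing between adjacent samples.
        Returns rendered RGB (N_rays, 3) and expected depth (N_rays,).
        """
        alpha = 1.0 - torch.exp(-densities * deltas)
        # Transmittance: probability a ray reaches each sample unoccluded.
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
            dim=1,
        )[:, :-1]
        weights = alpha * trans                        # (N_rays, N_samples)
        rgb = (weights.unsqueeze(-1) * colors).sum(1)  # (N_rays, 3)
        depth = (weights * torch.cumsum(deltas, dim=1)).sum(1)
        return rgb, depth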


Results

This implementation aims to replicate the improvements reported in the paper, such as:

  • +9.1 NDS for LiDAR-based detection.
  • +7.7 NDS for camera-based detection.
  • +6.9 NDS for LiDAR-camera fusion.

(Note: Fine-tuning may be necessary to achieve optimal performance.)


Contributing

Contributions are welcome! If you'd like to enhance this project, feel free to:

  • Submit a pull request.
  • Open an issue for bugs or feature requests.

Acknowledgements

This implementation is inspired by the paper UniPAD: A Universal Pre-training Paradigm for Autonomous Driving by Honghui Yang et al. Special thanks to the authors for their groundbreaking work in self-supervised learning for autonomous driving.


License

This project is licensed under the MIT License. See the LICENSE file for details.
