Skip to content

A huge challenge for autonomous vehicles(ACs) is to have a dataset that captures real-world multitudinous driving conditions. The currently available video datasets are not annotated & most of them aren't high resolution videos which is again an impediment for object detection. I am excited to solve this problem by annotating & generating photo-…

License

Notifications You must be signed in to change notification settings

anubhav0fnu/Synthetic-video-generation-for-Autonomous-cars

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Why I have selected this problem?

Statistically, each year up to 1.2 million deaths occur due to car accidents across the globe are caused by human errors. Autonomous vehicle technology could drastically avoid these accidents. Self-driving car companies are constantly trying to make their autonomous vehicles more robust by capturing a wide distribution of all possible driving scenarios but have failed to achieve at this point due to past recurring crashes. These autonomous systems actually learn from driving videos and the problem with currently available video datasets are

  1. Not annotated.
  2. Most of them aren't high-resolution videos which is again an impediment for object detection.

What I am offering?

An AI software solution with help of which you can:

  1. Generate Semantic Segmentation Masks for existing videos.

2. Generate photo-realistic, high-resolution new driving videos.


How my Full Architechure looks like?

It mainly constitutes of 3 components:

  1. Video to frame sequence generator using OpenCV.
  2. Generation of Semantic Segmentation masked frames for each associated frame sequences using DEEPLAB model.
  3. Generating new photo-realistic, high-resolution videos from the sequence of Semantic Segmentation masked frames using conditional Generative Networks framework.

The full architecture: (Starting from LEFT to RIGHT.)

Brief description of the main directory structure.
Folder/Files Description
src Contains the main components of the architecture.
src/data_prepration Scripts for cutting videos into frames using openCV.
src/semantic_seg_gen Generates semantic segmentation masked frames.
src/synthetic_video_gen Generates new synthetic RGB frames.
results stiched frames in form of gifs
docker Dockerfile to run the complete project as standalone application in a container(work in progress)
datasets Samples of BDD & Citycapes datasets used in the project.
utility python implementation of useful decorators. (assist development)
README.md Overview of the project & guide to use this codebase.

Diving into codebase:

Pre-Requisites:

I have used AWS cloud sources for implementation, training/testing/validation of models. The Deep Learning AMI (Ubuntu) has provided stable pre-intalled conda environments for TensorFlow 1.10.0 & PyTorch 0.4.1 with CUDA 9.0. First time users of AWS could use this to setup there environment.

The p2.xlarge instance was used for second component & p3.2xlarge instance for third component as described below.

EC2-instance size GPUs GPU Memory (GB)
p2.xlarge 1 (NVIDIA K80) --
p3.2xlarge 1 (NVIDIA Tesla V100) 16

I have also used S3 bucket for data dumps using a python script. You could also leverage my hacks & different IDE integration options here to quickly get started working on cloud sources on a local workstation.

Setting it up!

Once you have setup your AWS AMI for an EC2 instance, ssh into the machine, and follow the below instructions:

Installation:

Switch to tensorflow_p36 conda environment & install :

Component 1: opencv pillow

pip install opencv-python
pip install Pillow==2.2.1

Component 2: tf Slim ( It picks up the binaries from tensorflow installation!)

jupyter notebook (Needed for quick visualization.)

conda install -c anaconda jupyter

Now, switch to pytorch_p36 conda environment & install :

Component 3:

dominate

pip install dominate requests

Download and compile a snapshot of FlowNet2 by running:

python src/synthetic_video_gen/scripts/download_flownet2.py

NOTE: Very soon, I will be providing a dockerfile that which will take care of you environment & repository setup in a docker container as explained above.

Component 1: Data prepration.

src/data_prepration/data_prep.py

This script uses OpenCV to cut videos in to frames and save it into the desired folder.

Component 2: Semantic Segmentation mask generation.

The below directory structure is needed for src/semantic_seg_gen because it contains deeplab code & datasets in TFrecord format. For this component, I have used a well documented pre-existing implementation of deeplab.

As, GitHub doesn't support large files, I have written extra instructions while describing the directory structure. Datasets, models, & frozen graphs could be downloaded using this script.


semantic_seg_gen
├── bdd100k (dataset-1)
│   ├── bdd100kscripts
│   ├── checkpoints
│   ├── exp (create these directories, required by deeplab scripts.)
│   │   └── train_on_train_set
│   │       ├── test
│   │       ├── train
│   │       └── val
│   ├── images
│   ├── labels
│   └── tfrecord
├── build_cityscapes_data.py (find scripts at "https://github.com/tensorflow/models/tree/master/research/deeplab/datasets/")
├── build_data.py
├── cityscapes (dataset-2)
│   ├── checkpoints
│   │   └── deeplabv3_cityscapes_train
│   ├── cityscapesscripts (maintain hierarchy, git clone "https://github.com/mcordts/cityscapesscripts.git")
│   │   ├── annotation
│   │   ├── evaluation
│   │   ├── helpers
│   │   ├── preparation
│   │   └── viewer
│   ├── exp
│   │   └── train_on_train_set
│   │       ├── eval
│   │       ├── train
│   │       └── vis
│   │           ├── raw_segmentation_results
│   │           └── segmentation_results
│   ├── gtfine (login & download the "gtfine_trainvaltest.zip" dataset)
│   │   ├── test
│   │   ├── train
│   │   └── val
│   ├── leftimg8bit (login & download the "leftimg8bit_trainvaltest.zip" dataset)
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── tfrecord (filled by "convert_cityscapes.sh" script.)
├── convert_cityscapes.sh (split data into train & val sets & converts totfrecords's shards.)
├── deeplab ( git clone https://github.com/tensorflow/models/blob/master/research/deeplab)
│   └── ...
├── deeplab_train_1.sh (script to run deeplab/train.py)
├── deeplab_eval_1.sh   (script to run deeplab/eval.py)
├── deeplab_vis_1.sh   (script to run deeplab/vis.py)
└── download_data_in_dir.sh (after creating above directory structure, could be used for populating directories.)

The deeplab implementation is in tensorflow, and we need to first convert our dataset into TFrecord. You can use this script for this purpose. Once you have your dataset in a proper format, start with training and evaluation.

The second froze checkpoint was used for evaluation and comparing results among other 3 will be posted soon here.


Number checkpoint name pre-trained dataset
1 deeplab_cityscapes_xception71_trainfine ImageNet+ MS-COCO + {Cityscapes train_fine set}
2 deeplabv3_cityscapes_train ImageNet+ {Cityscapes train_fine set}
3 deeplab_cityscapes_xception71_trainvalfine ImageNet+ MS-COCO+ {Cityscapes trainval_fine and coarse set}

Training:

src/semantic_seg_gen/deepLab_train_1.sh

It is the local training job using xception_65 model. I have highlighted some of the problems I have faced during training in the comments within the script.

Evaluation:

Later, using the latest checkpoint collected in src/semantic_seg_gen/cityscapes/exp/train_on_train_set/train/train_00_result directory, we could generate the semantic masks for our sequence of images.

Component 3: Video Synthesis using Conditional GAN.

Once we have our labels a.k.a Semantic Segmentation Masks (SSM) available for our videos(sequence of frames). We can start with

Training:

For single GPU, use

src/synthetic_video_gen/scripts/street/test_g1_1024.sh

For multiple GPUs use

src/synthetic_video_gen/scripts/street/train_2048.sh

Testing:

For single GPU, use

src/synthetic_video_gen/scripts/street/test_g1_1024.sh

For multiple GPUs use

src/synthetic_video_gen/scripts/street/test_2048.sh

The below are results on two scales. In each category the initial results are based on 3 inputs to the sequential generator : current (SSM), previous 2 SSMs, previous 2 generated synthetic frames. and later results are based on an additional 4th input to the generator which is the foreground feature of the input SSM's.

Medium(scale: 1024) trained Generator model.

Before


After


Fine (scale: 2048) trained Generator model.

Before


After


Future Steps:

I will continue developing on top of the current codebase and below are some of my ideas:

  1. Adding support for Multi-modal synthesis & semantic Manipulation techniques to generate rare events in the videos.
  2. Perfoming a Joint training of the two Neural networks (DeepLab NN + Conditional Generative adverserial NN).
  3. Planning to add support of CARLA: An Open Urban Driving Simulator inorder to evaluate the newly generated synthetic videos within a self-driving car environment. I need to override my dataset as an input to their simulator.
  4. Benchmarking the results from the current approach with others such as PIX-2-PIXHD & COVST approaches.
  5. Coming up with an Evaluation metric!

**Credits: I would like to thank the authors for providing vid2vid framework in pyTorch and DeepLab implementation in tensorFlow.

About

A huge challenge for autonomous vehicles(ACs) is to have a dataset that captures real-world multitudinous driving conditions. The currently available video datasets are not annotated & most of them aren't high resolution videos which is again an impediment for object detection. I am excited to solve this problem by annotating & generating photo-…

Topics

Resources

License

Stars

Watchers

Forks