Object Detection
We are using the TensorFlow Object Detection API to retrain the models with the objects that we need. Check its documentation for installation and more info. All of this has been tested with Python 2/3 and tensorflow==1.15.
Colab offers free access to a machine with a GPU that is reasonable for training. It is a cloud service based on Jupyter notebooks, and internet connectivity is required for access.
The first step is to take pictures of the objects to train on. After doing that, you have to reduce the resolution of the pictures (this reduces training time) and split the images into train/test folders.
Run the transform_image_resolution.py script:
python transform_image_resolution.py -d ../images/ -s 800 600
- -d is the directory containing all the images.
- -s is the resolution that will be applied to the images.
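In case you want to see what the resizing step amounts to, here is a minimal sketch of an equivalent script, assuming OpenCV is installed; the real transform_image_resolution.py may differ in details:
import argparse
import glob
import os

import cv2

parser = argparse.ArgumentParser()
parser.add_argument('-d', '--directory', required=True)
parser.add_argument('-s', '--size', type=int, nargs=2, required=True)  # width height
args = parser.parse_args()

for path in glob.glob(os.path.join(args.directory, '*.jpg')):
    image = cv2.imread(path)
    resized = cv2.resize(image, tuple(args.size))  # dsize is (width, height)
    cv2.imwrite(path, resized)  # overwrite the original with the smaller version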
Run the split_images.py script:
python split_images.py -d ../images/complete_dataset -o ../images --train 80
- -d is the directory containing all the images.
- -o is the directory where the train and test folders will be created.
- --train is the percentage of images that will be used for training; 80% for train and 20% for test is recommended.
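Again, a minimal sketch of what the split amounts to, in case you want to reproduce it by hand (the real split_images.py may differ):
import argparse
import glob
import os
import random
import shutil

parser = argparse.ArgumentParser()
parser.add_argument('-d', '--directory', required=True)
parser.add_argument('-o', '--output', required=True)
parser.add_argument('--train', type=int, default=80)  # percentage used for training
args = parser.parse_args()

images = glob.glob(os.path.join(args.directory, '*.jpg'))
random.shuffle(images)  # avoid ordering bias in the split
cut = len(images) * args.train // 100

for subset, subset_images in (('train', images[:cut]), ('test', images[cut:])):
    subset_dir = os.path.join(args.output, subset)
    os.makedirs(subset_dir, exist_ok=True)
    for image_path in subset_images:
        shutil.copy(image_path, subset_dir)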
Now you need to label the images. Use the labelImg open source tool to label all the pictures in both train/test directories (this is a tedious and long process).
At this point you should have an images directory that contains your train and test images, with a respective xml file for each image.
Download the generate_tfrecord.py script and change the class_text_to_int function by adding your own labels.
def class_text_to_int(row_label):
    if row_label == 'powerade':
        return 1
    elif row_label == 'chocolate':
        return 2
    elif row_label == 'dr_pepper':
        return 3
    elif row_label == 'danup':
        return 4
    else:
        return None
Download the labelmap file and change the id and name fields to your own labels. NOTE: Be consistent with the ids you wrote in the class_text_to_int function.
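For the labels used in the class_text_to_int example above, the labelmap.pbtxt would look like this:
item {
  id: 1
  name: 'powerade'
}
item {
  id: 2
  name: 'chocolate'
}
item {
  id: 3
  name: 'dr_pepper'
}
item {
  id: 4
  name: 'danup'
}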
We are using the Faster-RCNN-Inception-V2 model. Download the model here. Open the downloaded faster_rcnn_inception_v2_coco_2018_01_28.tar.gz file with a file archiver and extract the faster_rcnn_inception_v2_coco_2018_01_28 folder.
Download the faster_rcnn_inception_v2_pets.config file. Here you can find several parameters like batch size, learning rate, etc. Then, change:
- Line 9. Change num_classes to the number of different objects you want the classifier to detect.
- Line 130. Change num_examples to the number of images you have in the \images\test directory.
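For reference, the relevant parts of the config look roughly like this (the values shown are illustrative, matching the four-class example above):
model {
  faster_rcnn {
    num_classes: 4  # line 9: one per label in class_text_to_int
    ...
  }
}
eval_config: {
  num_examples: 67  # line 130: number of images in \images\test
  ...
}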
- Create a directory in your Google Drive.
- Download the train_model notebook from the RoBorregos @Home repo.
- Go to Colab, sign in with the same account you used to create the directory, create a new notebook and open the train_model notebook.
- In the notebook, go to Runtime > Change Runtime Type and make sure to select GPU as the hardware accelerator.
- Click Connect to start using the notebook.
The first command is to check that you are using a GPU.
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
You should see Found GPU at: /device:GPU:0
The second command mounts Google Drive in the notebook: click on the link, sign in with your Google Drive account, and grant it access. You will be redirected to a page; copy the code on that page and paste it into the text box of the Colab session you are running.
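The mount command itself is the standard Colab one:
from google.colab import drive
drive.mount('/content/gdrive')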
The following commands are straightforward; just make sure to adapt the paths by replacing the name of your folder. E.g.:
cd /content/gdrive/My Drive/@Home/models/research/
To:
cd /content/gdrive/My Drive/Your_Folder/models/research/
After cloning the repo, you have to upload the previous files you edited to the object_detection folder.
- Upload the generate_tfrecord.py script to your_folder/models/research/object_detection
- Download the xml_to_csv.py script and upload it to your_folder/models/research/object_detection
- Create a folder in your_folder/models/research/object_detection named training and upload the labelmap.pbtxt and faster_rcnn_inception_v2_pets.config files.
- Create a folder in your_folder/models/research/object_detection named images and upload the test/train folders containing the images and xml files.
- Upload the faster_rcnn_inception_v2_coco_2018_01_28 folder to the object_detection folder.
Follow the notebook and continue with the commands.
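For reference, the xml_to_csv and generate_tfrecord conversion steps are typically invoked like this from the object_detection folder (the exact commands are already in the notebook, and the flag names may differ slightly in your copy of generate_tfrecord.py):
python xml_to_csv.py
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record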
Once you run the xml_to_csv and generate_tfrecord scripts, the next step is to run the train.py script to start the training. Keep following the notebook; after you run the train script, in the absence of errors you should see an output like this:
INFO:tensorflow:global step 1: loss = 25.45 (5.327 sec/step)
........
........
INFO:tensorflow:global step 1350: loss = 0.6345 (0.231 sec/step)
INFO:tensorflow:global step 1351: loss = 0.5220 (0.332 sec/step)
INFO:tensorflow:global step 1352: loss = 0.6718 (0.133 sec/step)
INFO:tensorflow:global step 1353: loss = 0.6758 (0.432 sec/step)
INFO:tensorflow:global step 1354: loss = 0.7454 (0.452 sec/step)
INFO:tensorflow:global step 1355: loss = 0.8354 (0.323 sec/step)
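The command that produces this output is already in the notebook; it is typically the legacy TF1 training script, along the lines of:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config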
As we are running this in Colab, 3-4 hours of training is usually enough; approximately every 5 minutes the progress is automatically saved as checkpoints in the training folder. Stop the training with CTRL + C.
The final step is to export your inference graph. The command is already in the notebook; the only thing you have to change is the --trained_checkpoint_prefix flag. Go to the training folder and you will see the last checkpoints the model saved; copy the id of the latest checkpoint and change it in the command:
--trained_checkpoint_prefix training/model.ckpt-158879
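With that change, the full export command typically looks like this (the output directory name here is illustrative):
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/faster_rcnn_inception_v2_pets.config --trained_checkpoint_prefix training/model.ckpt-158879 --output_directory inference_graph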
Finally, you can run the code in the last cell of the notebook to test your model. Take new pictures to test the model and place them in a folder named test_images. Remember that for inference you should use at least the same TensorFlow version as the one used for training.
Change the IMAGE_NAME in the script:
IMAGE_NAME = 'test_images/IMG_0681.jpg'
Run the cell and you should see the image with the objects detected.
In order to use the pre-trained object detection model, please make sure to follow the release forms below.
Some useful scripts have been written under scripts/ to handle different types of output detections.
In order to run them, just run the following command in the shell with the desired script:
python scripts/object_detection_image.py
Most of the Python scripts have been optimized to be run from any directory within robocup-home:
import os

base_directory = 'object_detection'
path_to_file = os.path.abspath(__file__)  # absolute path to the file being run
index_of_base_directory = path_to_file.find(base_directory)
# Keep the path up to and including the base directory name
WORKING_DIR = path_to_file[0:index_of_base_directory + len(base_directory)]
The model and label_map files are located relative to the file being run and the working directory:
MODEL_NAME = 'saved_model'
CWD_PATH = os.path.join(WORKING_DIR, 'models', 'model_tf2')
PATH_TO_SAVED_MODEL = os.path.join(CWD_PATH, MODEL_NAME)
PATH_TO_LABELS = os.path.join(CWD_PATH, 'label_map.pbtxt')
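With those paths defined, loading the model and the label map typically looks like this (assuming the TF2 Object Detection API utilities are installed):
import tensorflow as tf
from object_detection.utils import label_map_util

# Load the TF2 SavedModel and build an id -> class name lookup
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)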
video_detection.py is a script dedicated to performing object detection on a live feed from either a PC webcam or the Intel D435i:
# Load the model from get_object_and_coordinates
run_inferance_on_image = load_model()
if use_intelRS_camera:
    pipeline = create_intelrs_pipeline()
    run_inference_with_intel_camera(run_inferance_on_image, pipeline, show_video)
else:
    cap = cv2.VideoCapture(0)
    run_inference_with_pc_camera(run_inferance_on_image, cap, show_video)
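The webcam branch might look roughly like the sketch below; the helper name comes from the snippet above, but the body here is an assumption, not the actual implementation:
import cv2

def run_inference_with_pc_camera(run_inference_on_image, cap, show_video):
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        detections = run_inference_on_image(frame)  # hypothetical signature
        if show_video:
            cv2.imshow('detections', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
                break
    cap.release()
    cv2.destroyAllWindows()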
To decide which device is going to be used, the -i flag chooses whether the Intel camera will be used, and the -s flag enables streaming of the video to see the live detections:
python scripts/video_detection.py -i True -s False
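Since argparse does not parse 'True'/'False' strings as booleans by default, the flag handling is presumably something like this sketch (the real script may differ):
import argparse

def str2bool(value):
    # Interpret 'True', 'true', '1', etc. as boolean True
    return str(value).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--intel', type=str2bool, default=False,
                    help='use the Intel RealSense camera instead of the webcam')
parser.add_argument('-s', '--stream', type=str2bool, default=False,
                    help='stream the video to see the live detections')
args = parser.parse_args()
use_intelRS_camera, show_video = args.intel, args.stream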