# tf_object_detection

Custom object detection using the TensorFlow Object Detection API.

## Overview

  1. Setup
  2. How to make your own dataset
  3. Train on your own custom object
  4. Run predictions with your own trained model

## Setup

  1. Go to https://github.com/tensorflow/models and clone the repository.
  2. To run the TensorFlow Object Detection API we need to install some dependencies:

```
# For CPU
pip install tensorflow
# For GPU
pip install tensorflow-gpu
```

The remaining libraries can be installed with:

```
sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
pip install --user Cython
pip install --user contextlib2
pip install --user jupyter
pip install --user matplotlib
```
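Before going further, it's worth a quick sanity check that the core Python dependencies import cleanly. A minimal sketch of my own (not part of the original guide); this walkthrough assumes a TensorFlow 1.x install:

```python
# Quick sanity check: these are the imports the scripts below rely on.
import tensorflow as tf
import PIL
import lxml
import matplotlib
import pandas  # used by xml_to_csv.py below; `pip install pandas` if this fails

print(tf.__version__)  # the tf.app/tf.gfile calls used later expect TF 1.x
```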

### Protobuf Compilation

The TensorFlow Object Detection API uses Protobufs to configure model and training parameters. Before the framework can be used, the Protobuf libraries must be compiled. This should be done by running the following command from the tensorflow/models/research/ directory:

```
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
```
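A quick way to confirm the compilation worked (a small check of my own, run from tensorflow/models/research/): each .proto file should now have a matching _pb2.py file next to it.

```python
# Run from tensorflow/models/research/ after protoc has been invoked.
import glob

protos = glob.glob('object_detection/protos/*.proto')
generated = glob.glob('object_detection/protos/*_pb2.py')
print('{} .proto files, {} generated _pb2.py files'.format(len(protos), len(generated)))
# If the second number is zero, protoc did not run or failed silently.
```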

Note: If you're getting errors while compiling, you might be using an incompatible protobuf compiler. If that's the case, use the following manual installation.

### Manual protobuf-compiler installation and usage

If you are on Linux:

Download and install the 3.0 release of protoc, then unzip the file:

```
# From tensorflow/models/research/
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
```

Run the compilation process again, but use the downloaded version of protoc:

```
# From tensorflow/models/research/
./bin/protoc object_detection/protos/*.proto --python_out=.
```

### Add Libraries to PYTHONPATH

When running locally, the tensorflow/models/research/ and slim directories should be appended to PYTHONPATH. This can be done by running the following from tensorflow/models/research/:

```
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```

Then run this command to complete the local setup:

```
# From tensorflow/models/research/
sudo python3 setup.py install
```

For more details on how to configure and run the Object Detection API on platforms other than Linux, go to https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
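To verify the paths are set correctly, this import should now succeed without an ImportError (a quick check of my own, not from the original guide):

```python
# Should succeed once PYTHONPATH includes research/ and slim/.
from object_detection.utils import label_map_util  # noqa: F401
print('object_detection is importable')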
  3. Once everything is installed and configured, `cd models/research/object_detection`, start Jupyter by typing `jupyter notebook` in the command line, open `object_detection_tutorial.ipynb`, and run all the cells.

## Using your own dataset

### Make your own dataset

  1. Open a terminal and make a new directory named images: `mkdir images`
  2. Download images from Google into your images directory. For example, I'm using stapler images; download at least 250-300 images.
  3. Once you have downloaded all the images, you need to annotate each one as a .xml file (these will later be converted into .csv files). For that, clone the labelImg repo: `git clone https://github.com/tzutalin/labelImg.git`
  4. Once you have cloned it, set up the environment by running the commands below:

```
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
cd labelImg
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```

For more details about installation on platforms other than Linux, and about how to label images as .xml files, go to https://github.com/tzutalin/labelImg or refer to this video.

  5. Once that's done, make two directories inside your images directory, test and train. Copy 90% of your images with their .xml files into the train directory and the remaining 10% with their .xml files into the test directory (if you'd rather script this split, see the sketch after the directory layout below).

  6. Make a new directory, `mkdir custom_object_detection`, and move your images directory into it. In the same custom_object_detection directory, create two files, xml_to_csv.py and generate_tfrecord.py, and also make a training directory and a data directory, which we will use later.

At this point, you should have the following structure:

```
custom_object_detection
-data/
--test_labels.csv
--train_labels.csv
-images/
--test/
---testingimages.jpg
--train/
---trainingimages.jpg
--...yourimages.jpg
-training/
-xml_to_csv.py
-generate_tfrecord.py
```
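If you would rather script the 90/10 split from step 5 than copy files by hand, here is a minimal sketch (my own helper, not part of the original repo; it assumes every labelled .jpg starts out in images/train with its .xml next to it):

```python
# Randomly move ~10% of the labelled images (plus their .xml files)
# from images/train to images/test.
import os
import random
import shutil

random.seed(0)  # make the split reproducible
train_dir, test_dir = 'images/train', 'images/test'
images = [f for f in os.listdir(train_dir) if f.lower().endswith('.jpg')]
for name in random.sample(images, k=len(images) // 10):
    xml = os.path.splitext(name)[0] + '.xml'
    for f in (name, xml):
        src = os.path.join(train_dir, f)
        if os.path.exists(src):
            shutil.move(src, os.path.join(test_dir, f))
```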
  7. Open xml_to_csv.py in your favourite editor and copy the code below into it:
```python
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    for directory in ['train', 'test']:  # loop over the train and test dirs
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)  # convert the xml annotations to a dataframe
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)  # save train and test
                                                                           # labels into the data directory
        print('Successfully converted xml to csv.')


main()
```

and save the file

  8. Now open generate_tfrecord.py and copy the code below into it:
"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python3 generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=data/train.record

  # Create test data:
  python3 generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS


# TO-DO: replace this with a label map
def class_text_to_int(row_label):
    if row_label == 'stapler':
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(FLAGS.image_dir)
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
```

and save it

Note: if you have more than one class, extend class_text_to_int with one branch per class (the returned ids must match your label map):

```python
# Add one branch per class; the ids must match training/object-detection.pbtxt.
# 'scissors' below is just a hypothetical second class.
def class_text_to_int(row_label):
    if row_label == 'stapler':
        return 1
    elif row_label == 'scissors':
        return 2
    else:
        return None
```
  9. Now run xml_to_csv.py using `python3 xml_to_csv.py`. This will convert all your .xml annotations into .csv files stored at data/train_labels.csv and data/test_labels.csv.

  10. Now run generate_tfrecord.py using:

```
python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/train

python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/test
```

This will create the test.record and train.record files in the data/ directory.
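Before moving on, you can sanity-check the records with the TF 1.x record iterator (a small check of my own, not from the original guide):

```python
# Count the examples written into each record file.
import tensorflow as tf

for record in ('data/train.record', 'data/test.record'):
    count = sum(1 for _ in tf.python_io.tf_record_iterator(record))
    print('{}: {} examples'.format(record, count))
```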

  11. Here, we have two options. We can use a pre-trained model and then apply transfer learning to teach it a new object, or we could learn new objects entirely from scratch. The benefit of transfer learning is that training can be much quicker and requires far less data. For this reason, we're going to be doing transfer learning here.

TensorFlow has quite a few pre-trained models with checkpoint files available, along with configuration files. You can do all of this yourself if you like by checking out their configuring jobs documentation. The object API also provides some sample configurations to choose from.

I am going to go with MobileNet, using the following checkpoint and configuration file.

Run the commands below in your terminal from the custom_object_detection folder:

```
wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
```
  12. After downloading the tar.gz file, extract it in the same directory, e.g. `tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz`.

  13. Open ssd_mobilenet_v1_pets.config in your favourite editor and either change the relevant fields or paste in all of the configuration below:

```
# SSD with Mobilenet v1, configured for the stapler dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "${YOUR_GCS_BUCKET}" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1  # if you have more than one class, replace 1 with your number of classes
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10    # you can also change the batch size
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 20K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 20000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"   # input training data 
  }
  label_map_path: "training/object-detection.pbtxt"  # path to the label map
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1100
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"   # input test data
  }
  label_map_path: "training/object-detection.pbtxt"  # path to the label map
  shuffle: false
  num_readers: 1
}
```

  14. Inside the training directory, add a file named object-detection.pbtxt:
```
# You can add more items, one per class; in my case there is only one class.
item {
  id: 1
  name: 'stapler'
}
```
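For reference, a hypothetical two-class label map would look like this (the ids must line up with what class_text_to_int returns in generate_tfrecord.py; 'scissors' is just an example name):

```
item {
  id: 1
  name: 'stapler'
}
item {
  id: 2
  name: 'scissors'
}
```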
  15. Now copy the data, images, ssd_mobilenet_v1_coco_11_06_2017, and training directories, along with the ssd_mobilenet_v1_pets.config file, into the TensorFlow repo at models/research/object_detection.

  16. Move ssd_mobilenet_v1_pets.config into the training directory.

  17. And now, the moment of truth! From within models/research/object_detection, run:

```
python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config
```

Barring errors, you should see output like:

```
INFO:tensorflow:global step 11788: loss = 0.6717 (0.398 sec/step)
INFO:tensorflow:global step 11789: loss = 0.5310 (0.436 sec/step)
INFO:tensorflow:global step 11790: loss = 0.6614 (0.405 sec/step)
INFO:tensorflow:global step 11791: loss = 0.7758 (0.460 sec/step)
INFO:tensorflow:global step 11792: loss = 0.7164 (0.378 sec/step)
INFO:tensorflow:global step 11793: loss = 0.8096 (0.393 sec/step)
```

Your steps start at 1 and the loss will be much higher at first. Depending on your GPU and how much training data you have, this process will take a varying amount of time. On something like a 1080 Ti, it should take only about an hour or so; if you have a lot of training data, it might take much longer. You want to shoot for a loss of about ~1 on average (or lower), and I wouldn't stop training until you are for sure under 2. You can check how the model is doing via TensorBoard: your models/research/object_detection/training directory will have new event files that can be viewed with TensorBoard.

From models/research/object_detection, you can start TensorBoard in a terminal with:

```
tensorboard --logdir='training'
```

This runs on 127.0.0.1:6006 (visit in your browser).

Looks good enough, but does it detect a stapler?!

In order to use the model to detect things, we need to export the graph.

Luckily for us, in the models/research/object_detection directory, there is a script that does this for us: export_inference_graph.py.

To run it, you just need to pass in your checkpoint, your pipeline config, and the directory where you want the inference graph to be placed. For example:

```
python3 export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix training/model.ckpt-10856 \
    --output_directory stappler_model_graphs
```

Your checkpoint files should be in the training directory. Just look for the one with the largest step (the largest number after the dash); that's the one you want to use. Next, make sure pipeline_config_path is set to whatever config file you chose, and finally choose a name for the output directory; I went with stappler_model_graphs.

Run the above command from models/research/object_detection

If you get an error about no module named 'nets', then you need to re-run:

```
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
# switch back to object_detection after this and re-run the above command
```

Otherwise, you should have a new directory; in my case, mine is stappler_model_graphs. Inside it, I have new checkpoint data, a saved_model directory, and, most importantly, the frozen_inference_graph.pb file.
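As a quick check that the export succeeded, the frozen graph should load cleanly. A minimal TF 1.x sketch of my own, not from the original guide:

```python
# Load the exported graph and confirm it parses.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('stappler_model_graphs/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
print('Loaded graph with {} ops'.format(len(graph.get_operations())))
```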

Now we're just going to use the sample notebook, edit it, and see how our model does on some test images. I copied some of my models/research/object_detection/images/test images into the models/research/object_detection/test_images directory and renamed them image3.jpg, image4.jpg, etc. Alternatively, you can download new images and move them into the models/research/object_detection/test_images directory.

In the notebook, update the model variables to point at your exported graph and label map:

```python
# What model to use.
MODEL_NAME = 'stappler_model_graphs'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')
```

Next, we can just delete the entire Download Model section, since we no longer need to download anything.

Finally, in the Detection section, change the TEST_IMAGE_PATHS var to:

```python
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 6) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
```

With that, you can go to the Cell menu option, and then "Run All."

Here are a few of my results:

Overall, I am extremely pleased with how well this all works. Even with a very small dataset you can still have success, and you only need to train the model for about an hour (on a decent GPU, anyway) thanks to transfer learning. Very cool!

Extra!! If you want to test your object detection model in real time, download object_detection_tutorial_tensorflow.py, put it into models/research/object_detection/, and run it with `python object_detection_tutorial_tensorflow.py`.
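I can't vouch for the exact contents of that script here, but a real-time loop over a webcam typically looks something like the sketch below (my own sketch; it assumes OpenCV is installed, and the tensor names are the standard ones produced by export_inference_graph.py):

```python
# Rough sketch of real-time detection with the exported frozen graph (TF 1.x + OpenCV).
import cv2
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('stappler_model_graphs/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

cap = cv2.VideoCapture(0)  # default webcam
with tf.Session(graph=graph) as sess:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # the model expects RGB input
        boxes, scores = sess.run(
            ['detection_boxes:0', 'detection_scores:0'],
            feed_dict={'image_tensor:0': np.expand_dims(rgb, axis=0)})
        h, w = frame.shape[:2]
        for box, score in zip(boxes[0], scores[0]):
            if score < 0.5:  # confidence threshold, tune to taste
                continue
            ymin, xmin, ymax, xmax = box  # box coordinates are normalized to [0, 1]
            cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                          (int(xmax * w), int(ymax * h)), (0, 255, 0), 2)
        cv2.imshow('detections', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```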