obj detection training with faster_rcnn_inception_v2_coco Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted []. #3948

Closed
FalcoGer opened this issue Apr 11, 2018 · 9 comments

@FalcoGer

FalcoGer commented Apr 11, 2018

System information

  • What is the top-level directory of the model you are using:
    TensorFlowModels\research\object_detection
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No, though I use PycocoAPI from here.
    The problem also appeared without the following modifications I made:
diff --git a/research/object_detection/core/box_predictor.py b/research/object_detection/core/box_predictor.py
index 6a13970..298a1dc 100644
--- a/research/object_detection/core/box_predictor.py
+++ b/research/object_detection/core/box_predictor.py
@@ -392,7 +392,7 @@ class MaskRCNNBoxPredictor(BoxPredictor):
         the proposals.
     """
     spatial_averaged_image_features = tf.reduce_mean(image_features, [1, 2],
-                                                     keep_dims=True,
+                                                     keepdims=True,
                                                      name='AvgPool')
     flattened_image_features = slim.flatten(spatial_averaged_image_features)
     if self._use_dropout:
diff --git a/research/object_detection/core/losses.py b/research/object_detection/core/losses.py
index 8bc044c..70959b6 100644
--- a/research/object_detection/core/losses.py
+++ b/research/object_detection/core/losses.py
@@ -311,7 +311,7 @@ class WeightedSoftmaxClassificationLoss(Loss):
     num_classes = prediction_tensor.get_shape().as_list()[-1]
     prediction_tensor = tf.divide(
         prediction_tensor, self._logit_scale, name='scale_logit')
-    per_row_cross_ent = (tf.nn.softmax_cross_entropy_with_logits(
+    per_row_cross_ent = (tf.nn.softmax_cross_entropy_with_logits_v2(
         labels=tf.reshape(target_tensor, [-1, num_classes]),
         logits=tf.reshape(prediction_tensor, [-1, num_classes])))
     return tf.reshape(per_row_cross_ent, tf.shape(weights)) * weights
diff --git a/research/object_detection/trainer.py b/research/object_detection/trainer.py
index cf3429a..196af61 100644
--- a/research/object_detection/trainer.py
+++ b/research/object_detection/trainer.py
@@ -225,7 +225,7 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,

     # Place the global step on the device storing the variables.
     with tf.device(deploy_config.variables_device()):
-      global_step = slim.create_global_step()
+      global_step = tf.train.create_global_step()

     with tf.device(deploy_config.inputs_device()):
       input_queue = create_input_queue(
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Windows 7 x64
  • TensorFlow installed from (source or binary):
    Installed with pip
  • TensorFlow version (use command below):
    1.7.0
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version:
    CPU version
  • GPU model and memory:
    CPU version
  • Exact command to reproduce:
    python train.py --pipeline_config_path="PATH_TO_MODELS/Models/faster_rcnn_inception_v2_coco/pipeline.config" --train_dir="PATH_TO_MODELS/Models/faster_rcnn_inception_v2_coco/train"

Describe the problem

Running the command above with TFRecords of the form

tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))

and the faster_rcnn_inception_v2_coco model from 2018_01_28 produces the following errors:

Source code / logs

python : WARNING:tensorflow:From C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 214, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 442, in make_tensor_proto
    _GetDenseDimensions(values)))
ValueError: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 524, in _apply_op_helper
    values, as_ref=input_arg.is_ref).dtype.name
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 214, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 442, in make_tensor_proto
    _GetDenseDimensions(values)))
ValueError: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "train.py", line 167, in <module>
    tf.app.run()
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\temp\ObjDetection\TensorFlowModels\research\object_detection\trainer.py", line 255, in train
    train_config.optimizer)
  File "C:\temp\ObjDetection\TensorFlowModels\research\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "C:\temp\ObjDetection\TensorFlowModels\research\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "C:\temp\ObjDetection\TensorFlowModels\research\object_detection\utils\learning_schedules.py", line 169, in manual_stepping
    [0] * num_boundaries))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2650, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 7112, in select
    "Select", condition=condition, t=x, e=y, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].

Note: I last used the Object Detection API when TensorFlow 1.4 was current. Since updating, I can no longer run ssd_mobilenet_v1, because the pretrained model checkpoint seems to mismatch the configuration. So I decided to try faster_rcnn and got this issue instead.

My pipeline config:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 1.0
        aspect_ratios: 2.0
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.70
    first_stage_max_proposals: 100
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        use_dropout: false
        dropout_keep_probability: 1.0
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.30
        iou_threshold: 0.60
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config {
  batch_size: 1
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    momentum_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
# ref: https://github.com/tensorflow/models/issues/3794
#          schedule {
#            step: 0
#            learning_rate: 0.0003
#          }
          schedule {
            step: 900000
            learning_rate: 3.0e-05
          }
          schedule {
            step: 1200000
            learning_rate: 3.0e-06
          }
        }
      }
      momentum_optimizer_value: 0.90
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/train/model.ckpt"
  from_detection_checkpoint: true
}

# cd							cd C:/temp/ObjDetection/TensorFlowModels/research/object_detection
# train:						python train.py --pipeline_config_path="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/pipeline.config" --train_dir="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/train"
# eval:							python eval.py --pipeline_config_path="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/pipeline.config" --eval_dir="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/eval" --checkpoint_dir="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/train"
# tb:							tensorboard --logdir="PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/"
# extract graph:				python export_inference_graph.py --input_type image_tensor --pipeline_config_path "PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/pipeline.config" --output_directory "PATH_TO_PROJECT/ExportGraph" --trained_checkpoint_prefix "PATH_TO_PROJECT/Models/faster_rcnn_resnet101_coco/train/model.ckpt-REPLACETHIS"

train_input_reader {
  label_map_path: "PATH_TO_PROJECT/Data/labelMap.txt"
  shuffle: true
  tf_record_input_reader {
    input_path: "PATH_TO_PROJECT/Data/train.tfrecord"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
#  metrics_set: "coco_detection_metrics"
}
eval_input_reader {
  label_map_path: "PATH_TO_PROJECT/Data/labelMap.txt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_PROJECT/Data/val.tfrecord"
  }
}

PATH_TO_PROJECT is of course a placeholder. The real path would give away what I'm working on, and I signed an NDA, so I replaced it here.
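
For readers unfamiliar with manual_step_learning_rate, here is a minimal pure-Python sketch of what the schedule in the config above is meant to express. It does not use TensorFlow. The helper name manual_step_rate is hypothetical, and the choice that a new rate takes effect exactly at its boundary step is an assumption, not something confirmed in this thread.

```python
from bisect import bisect_right


def manual_step_rate(global_step, boundaries, rates):
    """Return the learning rate for global_step.

    rates[0] applies before the first boundary; rates[i] applies from
    boundaries[i - 1] onward (assumption: the boundary step itself
    already uses the new rate).
    """
    if len(rates) != len(boundaries) + 1:
        raise ValueError("need exactly one more rate than boundaries")
    # bisect_right counts how many boundaries are <= global_step.
    return rates[bisect_right(boundaries, global_step)]


# Values copied from the pipeline config above.
boundaries = [900000, 1200000]
rates = [0.0003, 3.0e-05, 3.0e-06]

for step in (0, 899999, 900000, 1200000):
    print(step, manual_step_rate(step, boundaries, rates))
```

With two schedule entries plus the initial rate there are three rates in total, which would be consistent with the range(0, 3) in the error message if the library counts the initial rate as a boundary at step 0.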

@FalcoGer
Author

FalcoGer commented Apr 11, 2018

I just uninstalled TensorFlow 1.7.0 and installed 1.5.0, then reran the protoc compilation and the setup.py install of the TensorFlow models.
The problem persists.
The same problem also occurs with faster_rcnn_resnet101_coco.
The same problem also occurs with rfcn_resnet101_coco.

SSD runs fine except for #3922, but it seems to train decently enough.

@elkalash

I'm actually having the same issue here:
WARNING:tensorflow:From "path" is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
File "src/run_webcap.py", line 32, in <module>

@FalcoGer
Author

@elkalash the deprecation warning is not the issue here.

@robieta
Contributor

robieta commented Apr 12, 2018

Hi. Could you modify object_detection\utils\learning_schedules.py to print out global_step, boundaries, and num_boundaries and report the values?
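
One way to collect the requested values is sketched below in plain Python, without TensorFlow. The helper summarize_schedule_args is hypothetical, and the sample boundaries are an assumption chosen to match the range(0, 3) in the traceback and the two schedule entries in the posted config. In the real module global_step is a tensor, so its repr would show the tensor rather than a number.

```python
def summarize_schedule_args(global_step, boundaries):
    """Collect the values requested above, plus the type of the index sequence."""
    num_boundaries = len(boundaries)
    # The traceback shows range(0, 3) reaching tf.where, so record the type too.
    index_source = range(num_boundaries)
    return {
        "global_step": repr(global_step),
        "boundaries": repr(boundaries),
        "num_boundaries": num_boundaries,
        "index_source_type": type(index_source).__name__,
    }


# Assumed sample values, not taken from an actual run.
summary = summarize_schedule_args(0, [0, 900000, 1200000])
for name, value in summary.items():
    print("{}: {}".format(name, value))
```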

@Lanbig

Lanbig commented Apr 12, 2018

Got the same issue.

ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].

Found the solution: #3705 (comment)

@robieta
Contributor

robieta commented Apr 12, 2018

That does indeed seem to be the case. Thanks a lot for finding and linking that.

@robieta robieta closed this as completed Apr 12, 2018
@Edward1900

I got the same problem on Windows 10. How do I solve it?

@FalcoGer
Author

@yafengwa
#3705 (comment)

@mikemwx

mikemwx commented Jul 26, 2018

I got the same error when I tried to train faster_rcnn_resnet101_coco from the model zoo.
I solved it by wrapping the list/range that appears in (maybe)
spatial_averaged_image_features = tf.reduce_mean(image_features, [1, 2], keep_dims=True, name='AvgPool')
with
tf.convert_to_tensor(list( """your range or list""" ),dtype=np.int64)
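
The traceback earlier in this thread ends in tf.where inside manual_stepping in learning_schedules.py, with range(0, 3) as the value that could not be converted. A minimal sketch of the underlying Python 3 behaviour follows, runnable without TensorFlow. The commented tf.where call at the end is an assumed shape of the workaround, not code quoted from the library or from #3705.

```python
# In Python 3, range() returns a lazy sequence object rather than a list,
# which is what the "range(0, 3)" in the error message refers to.
num_boundaries = 3  # matches the range(0, 3) in the traceback

lazy = range(num_boundaries)
materialized = list(range(num_boundaries))

print(type(lazy).__name__)          # range
print(type(materialized).__name__)  # list
print(materialized)                 # [0, 1, 2]

# Assumed shape of the workaround (TF 1.x, not executed here):
#   tf.where(tf.greater_equal(global_step, boundaries),
#            list(range(num_boundaries)),
#            [0] * num_boundaries)
```

Materializing the range with list(...) hands TensorFlow an ordinary Python list, the same kind of value as the [0] * num_boundaries argument visible in the traceback.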


6 participants