Training centered_instance model with input_scaling < 1.0 #1993

Open
BenjaminBo opened this issue Oct 10, 2024 · 6 comments
Labels: 2024-hackathon, bug

@BenjaminBo

Bug description

SLEAP: 1.3.4
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.12
OS: Linux-5.15.0-122-generic-x86_64-with-debian-bookworm-sid
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: None

I am training the top-down model remotely on a cluster server, since I don't have a local GPU to properly train a SLEAP model on.
My videos are pretty high in dimensionality (2160, 3840, 3), which doesn't allow me to train at their original size. This brings me to the input_scaling parameter:
For the centroid model everything works fine. I can set the input scale to <1.0 and it runs without issues.
When I run the centered_instance model, however, I get the following error (here with input_scale = 0.4):

File "main.py", line 80, in <module>
    loop_topdown()
  File "main.py", line 68, in loop_topdown 
    enforced_max_size=2048)
  File "/home/bboche/repos/cimd-micera/utils.py", line 28, in wrap
    result = f(*args, **kw)
  File "/home/bboche/repos/cimd-micera/train_sleap.py", line 137, in train_topdown
    trainer.train()
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
    verbose=2,
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1346, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1326, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2088, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 1536, 1536, 3), found shape=(1, 614, 614, 3)

This is interesting, because the non-scaled input size, with auto_crop on, is 3840.

Assumption 1

What seems to be happening is that input_scaling is applied twice, since
3840 * 0.4 = 1536 and
1536 * 0.4 = 614 (rounded).
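A quick sanity check of that arithmetic (values taken straight from the error message; just illustrative):

input_scaling = 0.4
full_width = 3840  # un-scaled frame width

print(full_width * input_scaling)              # 1536.0 -> the input size the model expects
print(round(full_width * input_scaling ** 2))  # 614    -> the size actually fed to it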

So I try to find the section where it is applied a second time.
Following the error message above, I looked at the following code section in sleap/nn/training.py, lines 1315-1340:

        # Create an instance peak finding layer.
        find_peaks = FindInstancePeaks(
            keras_model=self.keras_model,
            input_scale=self.config.data.preprocessing.input_scaling,
            peak_threshold=0.2,
            refinement="local",
            return_confmaps=True,
        )

        def visualize_example(example):
            # Find peaks by evaluating model.
            preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
            img = example["instance_image"].numpy()
            cms = preds["instance_confmaps"][0][0].numpy()
            pts_gt = example["center_instance"].numpy()
            pts_pr = preds["instance_peaks"][0][0].numpy()

            scale = 1.0
            if img.shape[0] < 512:
                scale = 2.0
            if img.shape[0] < 256:
                scale = 4.0
            fig = plot_img(img, dpi=72 * scale, scale=scale)
            plot_confmaps(cms, output_scale=cms.shape[0] / img.shape[0])
            plot_peaks(pts_gt, pts_pr, paired=True)
            return fig

More specifically, this line:

1318    input_scale=self.config.data.preprocessing.input_scaling,

Assumption 2

This might be the section that applies the scaling factor a second time.
It also seems to be "less relevant", since I understand it to be handling visualization only.
I adjust line 1318 in the following way:

1318    input_scale=1.0,
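For clarity, the patched block then reads (identical to the excerpt above except for this one argument):

        # Create an instance peak finding layer.
        find_peaks = FindInstancePeaks(
            keras_model=self.keras_model,
            input_scale=1.0,  # was: self.config.data.preprocessing.input_scaling
            peak_threshold=0.2,
            refinement="local",
            return_confmaps=True,
        )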

Unlike before, the training now runs through. But then I get the following error:

Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?
Traceback (most recent call last):
  File "main.py", line 82, in <module>
    loop_topdown()
  File "main.py", line 70, in loop_topdown 
    enforced_max_size=2048)
  File "/home/bboche/repos/cimd-micera/utils.py", line 28, in wrap
    result = f(*args, **kw)
  File "/home/bboche/repos/cimd-micera/train_sleap.py", line 137, in train_topdown
    trainer.train()
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 953, in train
    self.evaluate()
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 966, in evaluate
    split_name="train",
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/evals.py", line 744, in evaluate_model
    labels_pr: Labels = predictor.predict(labels_gt, make_labels=True)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
    for ex in generator:
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 436, in _predict_generator
    ex = process_batch(ex)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 399, in process_batch
    preds = self.inference_model.predict_on_batch(ex, numpy=True)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 1069, in predict_on_batch
    outs = super().predict_on_batch(data, **kwargs)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1986, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1621, in predict_function  *
        return step_function(self, iterator)
    File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1611, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1604, in run_step  **
        outputs = model.predict_step(data) 
    File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1572, in predict_step
        return self(x, training=False)
    File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None

    ValueError: Exception encountered when calling layer "top_down_inference_model" (type TopDownInferenceModel).

    in user code:

        File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2265, in call  *
            peaks_output = self.instance_peaks(crop_output)
        File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
            raise e.with_traceback(filtered_tb) from None

        ValueError: Exception encountered when calling layer "find_instance_peaks_1" (type FindInstancePeaks).

        in user code:

            File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2088, in call  *
                out = self.keras_model(crops)
            File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
                raise e.with_traceback(filtered_tb) from None
            File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/keras/engine/input_spec.py", line 263, in assert_input_compatibility
                raise ValueError(f'Input {input_index} of layer "{layer_name}" is '

            ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 1536, 1536, 3), found shape=(None, 614, 614, 3)


        Call arguments received:
          • inputs={'crops': 'tf.RaggedTensor(values=Tensor("top_down_inference_model/centroid_crop_ground_truth/Reshape:0", shape=(None, 1536, 1536, 3), dtype=uint8), row_splits=Tensor("top_down_inference_model/centroid_crop_ground_truth/RaggedFromValueRowIds/control_dependency:0", shape=(None,), dtype=int64))', 'crop_offsets': 'tf.RaggedTensor(values=Tensor("top_down_inference_model/centroid_crop_ground_truth/sub:0", shape=(None, 2), dtype=float32), row_splits=Tensor("top_down_inference_model/centroid_crop_ground_truth/RaggedFromValueRowIds_1/control_dependency:0", shape=(None,), dtype=int64))', 'centroids': 'tf.RaggedTensor(values=Tensor("RaggedFromVariant/RaggedTensorFromVariant:1", shape=(None, 2), dtype=float32), row_splits=Tensor("RaggedFromVariant/RaggedTensorFromVariant:0", shape=(5,), dtype=int64))', 'centroid_vals': 'tf.RaggedTensor(values=Tensor("top_down_inference_model/centroid_crop_ground_truth/ones:0", shape=(None,), dtype=float32), row_splits=Tensor("top_down_inference_model/centroid_crop_ground_truth/RaggedFromValueRowIds_2/control_dependency:0", shape=(None,), dtype=int64))'}


    Call arguments received:
      • example={'image': 'tf.Tensor(shape=(4, 2160, 3840, 3), dtype=uint8)', 'raw_image_size': 'tf.Tensor(shape=(4, 3), dtype=int32)', 'example_ind': 'tf.Tensor(shape=(4, 1), dtype=int64)', 'video_ind': 'tf.Tensor(shape=(4, 1), dtype=int32)', 'frame_ind': 'tf.Tensor(shape=(4, 1), dtype=int64)', 'scale': 'tf.Tensor(shape=(4, 2), dtype=float32)', 'instances': 'tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("RaggedFromVariant_1/RaggedTensorFromVariant:2", shape=(None, 2), dtype=float32), row_splits=Tensor("RaggedFromVariant_1/RaggedTensorFromVariant:1", shape=(None,), dtype=int64)), row_splits=Tensor("RaggedFromVariant_1/RaggedTensorFromVariant:0", shape=(5,), dtype=int64))', 'skeleton_inds': 'tf.RaggedTensor(values=Tensor("RaggedFromVariant_2/RaggedTensorFromVariant:1", shape=(None,), dtype=int32), row_splits=Tensor("RaggedFromVariant_2/RaggedTensorFromVariant:0", shape=(5,), dtype=int64))', 'track_inds': 'tf.RaggedTensor(values=Tensor("RaggedFromVariant_3/RaggedTensorFromVariant:1", shape=(None,), dtype=int32), row_splits=Tensor("RaggedFromVariant_3/RaggedTensorFromVariant:0", shape=(5,), dtype=int64))', 'n_tracks': 'tf.Tensor(shape=(4, 1), dtype=int32)', 'offset_x': 'tf.Tensor(shape=(4, 1), dtype=int32)', 'offset_y': 'tf.Tensor(shape=(4, 1), dtype=int32)', 'centroids': 'tf.RaggedTensor(values=Tensor("RaggedFromVariant/RaggedTensorFromVariant:1", shape=(None, 2), dtype=float32), row_splits=Tensor("RaggedFromVariant/RaggedTensorFromVariant:0", shape=(5,), dtype=int64))'}

terminate called without an active exception
/usr/local/bin/run_and_cleanup.sh: line 26: 2313990 Aborted                 (core dumped) "$PROGRAM_PATH" "$TRAIN_FILE" "${@:3}"
Cleaning for NAME: and PPID:2313990
No child processes running.
Finished!

This is where I get confused.
This suggests that the section above might not have been handling visualization only? Or that the problem runs deeper? But the error above only happens after training, not in between epochs.
At this point, there must have been images from both the training and validation set of size 1536 running through the model, unless I am misunderstanding something. Why did I not get errors then? Also: are the assumptions I made above incorrect?

Scaling

Also, at this point I am not sure if I am addressing my core issue. I want to scale the input data down. I do not want to crop it (which works with an input_scale of 1.0), because I need all of the information in the frames.

  1. I thought that would be possible with input_scaling. I will try to further understand the error above and maybe adjust your code to make it work for me, but I am not sure if I can. Can you help me with the issues that I described above? (For reference, the sketch after this list shows roughly how I set the scaling factor.)
  2. Another thought I had was to scale the data down in the dataset itself. In the case of SLEAP (afaik), the dataset is a Labels()-object. My problem is that I am a bit overwhelmed and not quite sure where to start, since the object looks like this:
all_instances =
[Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), ...]
has_missing_videos =
False
has_predicted_instances =
False
has_user_instances =
True
is_multi_instance =
False
labeled_frames =
[LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), ...]
labels =
[LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), ...]
max_user_instances =
1
min_user_instances =
1
negative_anchors =
{}
nodes =
[Node(name='forelegR_...eight=1.0), Node(name='forelegR1...eight=1.0), Node(name='hindlegR1...eight=1.0), Node(name='hindlegL1...eight=1.0), Node(name='hindlegL1...eight=1.0), Node(name='forelegR1...eight=1.0), Node(name='forelegL_...eight=1.0), Node(name='hindlegL_...eight=1.0), Node(name='hindlegL_...eight=1.0), Node(name='forelegR1...eight=1.0), Node(name='forelegL_...eight=1.0), Node(name='forelegR1...eight=1.0), Node(name='forelegR1...eight=1.0), Node(name='hindlegR_...eight=1.0), ...]
predicted_instances =
[]
provenance =
{}
skeleton =
Skeleton(name='Skeleton-1', description='None', nodes=['nose', 'chin', 'middle', 'tail', 'tail_middle', 'tail_tip', 'hindlegR_5s', 'forelegL_5a', 'forelegR_5a', 'forelegL_2a', 'forelegR_1a', 'hindlegR_4a', 'forelegR_2a', 'forelegL_4a', 'hindlegR_3s', 'forelegL_4s', 'hindlegR_1a', 'hindlegR_2s', 'forelegL_3a', 'forelegL_5s', 'hindlegL_5a', 'hindlegL_2a', 'hindlegR_1s', 'hindlegL_1a', 'forelegR_4a', 'forelegR_3a', 'hindlegL_3s', 'hindlegL_4a', 'hindlegR', 'hindlegL_5s', 'hindlegR_4s', 'hindlegR_5a', 'hindlegL_1s', 'forelegL_3s', 'forelegL', 'hindlegR_3a', 'hindlegR_2a', 'forelegL_1s', 'hindlegL_4s', 'forelegR_1s', 'forelegL_1a', 'hindlegL', 'hindlegL_2s', 'forelegL_2s', 'forelegR', 'forelegR_4s', 'forelegR_2s', 'hindlegL_3a', 'forelegR_3s', 'forelegR_5s'], edges=[('middle', 'chin'), ('chin', 'nose'), ('middle', 'hindlegR'), ('middle', 'hindlegL'), ('middle', 'forelegR'), ('middle', 'forelegL'), ('middle', 'tail'), ('forelegR', 'forelegR_1a'), ('forelegR', 'forelegR_2a'), ('forelegR', 'forelegR_3a'), ('forelegR'...
skeletons =
[Skeleton(name='Skele...egR_5s')])]
suggestions =
[SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), SuggestionFrame(vide...roup=None), ...]
tracks =
[]
unlabeled_suggestions =
[]
user_instances =
[Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), Instance(video=Video...rack=None), ...]
user_labeled_frame_inds =
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...]
user_labeled_frames =
[LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), LabeledFrame(video=H...stances=1), ...]
video =
'Traceback (most recent call last):\n  File "/home/bboche/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_resolver.py", line 189, in _get_py_dictionary\n    attr = getattr(var, name)\n  File "/home/bboche/mambaforge/envs/sleap/lib/python3.7/site-packages/sleap/io/dataset.py", line 577, in video\n    "Labels.video can only be used when there is only a single video saved "\nValueError: Labels.video can only be used when there is only a single video saved in the labels. Use Labels.videos instead.\n'
videos =
[Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), Video(backend=HDF5Vi...ge=False)), ...]
_Labels__temp_dir =
None
_abc_impl =
<_abc_data object at 0x7f4b6ec5f600>
_cache =
LabelsDataCache(labels=Labels(labeled_frames=404, videos=32, skeletons=1, tracks=0))
_update_containers =
<bound method Labels._update_containers of Labels(labeled_frames=404, videos=32, skeletons=1, tracks=0)>

In short, there are quite a few attributes in this object, some containing frames, instances, and bounding boxes. I am just not sure which of them I'd have to scale and which I shouldn't touch. Do you have a function or a simpler way for me to scale the dataset?
3. There doesn't seem to be a way to scale the data in sleap-label (which is where I created my ground truth and where I created and exported my dataset). I want to avoid going all the way back to the original videos and scaling them down, because I'd have to relabel all of the hundreds of instances I have now.
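For reference (point 1), this is roughly how I set the scaling factor in my training script. This is a simplified sketch: the file name is a placeholder and I am assuming TrainingJobConfig.load_json is the right loader for a training profile; my actual script wraps this in more logic.

from sleap.nn.config import TrainingJobConfig
from sleap.nn.training import Trainer

cfg = TrainingJobConfig.load_json("centered_instance.json")  # placeholder profile name
cfg.data.preprocessing.input_scaling = 0.4  # downscale frames before training
trainer = Trainer.from_config(cfg)
trainer.train()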

@BenjaminBo BenjaminBo added the bug Something isn't working label Oct 10, 2024
@delaroob

Hi!
I think I'm having a similar issue(?).
I wanted to train a top-down model for a multi-animal project. Training the centroid model worked with no problem, but I also seem to have this layer-shape issue in the case of the centered model:
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 64, 64, 3), found shape=(1, 8, 8, 3)

(I was also working with auto crop; I tried changing that and scaling the input, and it still didn't work.)

Software versions:
SLEAP: 1.3.4
TensorFlow: 2.7.0
Numpy: 1.21.6
Python: 3.7.12
OS: Windows-10-10.0.22621-SP0

...

INFO:sleap.nn.training:Finished trainer set up. [3.2s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [6.2s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
Traceback (most recent call last):
  File "C:\anaconda\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.4', 'console_scripts', 'sleap-train')())
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 941, in train
    verbose=2,
  File "C:\anaconda\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1346, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1326, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2088, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 64, 64, 3), found shape=(1, 8, 8, 3)

Call arguments received:
  • inputs=tf.Tensor(shape=(1, 64, 64, 3), dtype=float32)
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.

@eberrigan
Contributor

Hi @BenjaminBo,

We are working to fix this. In the meantime, what happens when you only use input scaling for the centroid model?

"""
I want to scale the input data down. I do not want to crop it (which works with an input_scale of 1.0).
"""

The centroid model will always result in cropping around the animal instance. Are the features on your animal very small compared to the animal itself? This would be the only time when the centered-instance model needs input scaling < 1.

Best,

Elizabeth

@eberrigan eberrigan self-assigned this Oct 22, 2024
@BenjaminBo
Author

BenjaminBo commented Nov 15, 2024

Hey @eberrigan,

what happens when you only use input scaling for the centroid model?

as I wrote in my bug description:

For the centroid model everything works fine. I can set the input scale to <1.0 and it runs without issues.

But since I can't (at least not that I am aware of) chain the centroid and centered-instance models together, so that the centroid output becomes the centered-instance input during training, it isn't really helping me for the centered-instance model. I have also tried this in one of the configs:

"heads": {
            "single_instance": null,
            "centroid": {
                "anchor_part": null,
                "sigma": 2.5,
                "output_stride": 2,
                "loss_weight": 1.0,
                "offset_refinement": false
            },
            "centered_instance": {
                "anchor_part": null,
                "part_names": null,
                "sigma": 2.5,
                "output_stride": 4,
                "loss_weight": 1.0,
                "offset_refinement": false
            },
            "multi_instance": null,
            "multi_class_bottomup": null,
            "multi_class_topdown": null
        },

, but that doesn't work either.
If there is a way to train the centroid and centered-instance models together, so that I could just use input scaling < 1 for the centroid model and have the centered-instance model continue from that, I would love to know.


Other than that, I wrote the following function, whose goal is to scale a labels dataset. I am not sure I understood the whole Labels() structure, but here is my understanding of it, given that I want to scale the whole dataset by one value (i.e., apply input_scaling myself). The idea is to use it as a kind of preprocessing step on the dataset, since scaling on the fly doesn't work for me. Notice that I am not trying to overwrite the Labels()-object (labels) that I am reading from, since it doesn't seem to support item assignment. Instead, what I do in this function is extract the data to be scaled and save it in a new Labels()-object (new_labels):

import os
from typing import List, Tuple, Union

import cv2
import numpy as np
import sleap as slp
# LOGGER and visualize_sleap come from my own utility modules.

def scale_slp_data(labels_path:Union[str, os.PathLike], scaling_factor:float, show:bool=False, save_path:os.PathLike = None) -> Tuple[slp.Labels, os.PathLike]:
    '''
    :param labels_path: path to the .slp file to scale
    :param scaling_factor: factor by which frames and keypoints are scaled
    :param show: whether to visualize the scaled instances
    :param save_path: where to save the scaled labels; defaults to a file next to the input
    :returns : the new Labels()-object and the path it was saved to
    '''
    if save_path is None:
        # If no other path is given, write the new data to the folder of the old data. Named after original name + scaling factor.
        labels_name = os.path.basename(labels_path)
        save_path = os.path.join(os.path.dirname(labels_path), f"{labels_name}_scaled_{scaling_factor}.pkg.slp")

    labels = slp.load_file(labels_path, detect_videos=False)
    
    def scale_frame(frame:np.ndarray) -> np.ndarray:
        if len(frame.shape) == 4:
            frame = cv2.resize(frame[0,:], (int(frame.shape[2]*scaling_factor), int(frame.shape[1]*scaling_factor)))
        elif len(frame.shape) == 3:
            frame = cv2.resize(frame, (int(frame.shape[1]*scaling_factor), int(frame.shape[0]*scaling_factor)))
        frame = np.expand_dims(frame, axis=0)
        return frame
    
    def scale_video(video:slp.Video) -> np.ndarray:
        frame_idxs = [int(idx_key) for idx_key in video.backend._HDF5Video__original_to_current_frame_idx.keys()]
        frames = video.get_frames(frame_idxs)
        np_video = np.concatenate(list(map(scale_frame, frames)))
        return np_video
    
    new_labeled_frames:List[slp.LabeledFrame] = [] #list where newly instantiated labeled frames will be appended
    video_dict:dict = {} #dict that maps a video's filename to its data and its "original_to_current_frame_idx".
    for i, instance in enumerate(labels.all_instances):
        instance_video_name = instance.video.backend.source_video_available.backend.filename #video filename
        #Video
        if  not(instance_video_name in video_dict.keys()):  #only need to scale video if it hasn't been scaled before through another instance. 
                                                            #save it in a dictionary.
            new_video:np.ndarray = None

            new_video = scale_video(instance.video) #scale every frame in instance.video.
            LOGGER.debug(f"{instance_video_name}\n    original shape: {instance.video.shape}\n    scaled shape: {new_video.shape}") #if these don't align, it means that 
            
            new_video = slp.Video.from_numpy(new_video)
            new_video.backend.filename = save_path # filename is the path to the newly generated Labels()-object
            video_dict[instance_video_name] = {
                "video": new_video,
                "original_to_current_frame_idx": instance.video.backend._HDF5Video__original_to_current_frame_idx
            }
        #Instance
        for point in instance._points:#scale all points
            point.x = int(point.x*scaling_factor)
            point.y = int(point.y*scaling_factor)
        new_instance = slp.Instance(skeleton=labels.skeleton,
                                    points=instance._points)
        
        new_frame_idx = video_dict[instance_video_name]["original_to_current_frame_idx"][instance.frame_idx]

        #LabeledFrame
        new_labeled_frames.append(slp.LabeledFrame(video=video_dict[instance_video_name]["video"],
                                                   frame_idx=new_frame_idx,
                                                   instances=[new_instance]))
    #Labels
    new_labels = slp.Labels(labeled_frames=new_labeled_frames)
    #Visualize
    if show:
        visualization_save = os.path.join(os.path.dirname(labels_path), f"show_scaling_{scaling_factor}")
        visualize_sleap.visualize_instances_or_pairs_of(instances=new_labels.all_instances,
                                                        save_path=visualization_save,
                                                        any_info_txt_file=f"scaling of {labels_path}",
                                                        save_as_video=False)
    
    new_labels.save(save_path)

    return new_labels, save_path
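Called roughly like this (the file name is just a placeholder):

scaled_labels, scaled_path = scale_slp_data("dataset.pkg.slp", scaling_factor=0.4, show=False)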

This works! It runs through and I am generating an output Labels()-object. (I also only have 1 instance per frame.) I do visualize these instances and their frames in visualize_instances_or_pairs_of and they do look correct.

When trying to load the data though, I get the following issue:

Cannot load file containing pickled data when allow_pickle=False
  File "/home/bboche/repos/cimd-micera/train_sleap.py", line 72, in train_slp_model
    trainer = slp.nn.training.Trainer.from_config(config_mod)
  File "/home/bboche/repos/cimd-micera/main.py", line 83, in topdown_train_and_infer_and_evaluate
    config_modifications=config["topdown"][0]))
  File "/home/bboche/repos/cimd-micera/main.py", line 160, in <module>
    topdown_train_and_infer_and_evaluate()
ValueError: Cannot load file containing pickled data when allow_pickle=False

I am not sure what to do with this error.

@BenjaminBo
Author

Hey @eberrigan ,
I just realized I didn't answer

Are the features on your animal very small compared to the animal itself?

The answer is yes, they are pretty small. I really think that being able to scale my input images down a bit would help.

As an addition to my code in my previous message and the allow_pickle=False error:
Is there a way I can create/save my new Labels()-object so that this doesn't happen?

Thank you for your help :)

@gitttt-1234
Contributor

gitttt-1234 commented Dec 16, 2024

Hi @BenjaminBo,

Apologies for the delay! We are working on fixing the input_scaling issue for the centered-instance model. Also, we wouldn't suggest modifying the .slp file directly (changing the source images).
Here's my analysis of why the errors occur:


1. Error while visualizing the results during training

Thank you for your analysis! The error isn't because of scaling being applied twice.

  • While setting up the visualizer here during training:

    sleap/sleap/nn/training.py

    Lines 1301 to 1306 in 3417a18

    def _setup_visualization(self):
        """Set up visualization pipelines and callbacks."""
        # Create visualization/inference pipelines.
        self.training_viz_pipeline = self.pipeline_builder.make_viz_pipeline(
            self.data_readers.training_labels_reader
        )

    , we initialize the dataset training_viz_ds_iter by calling make_viz_pipeline(), which generates samples without any resizing applied, to visualize the results on the source images.
  • However, for training, we set up the data pipeline

    sleap/sleap/nn/training.py

    Lines 786 to 791 in 3417a18

    self.training_pipeline = (
        self.pipeline_builder.make_training_pipeline(
            self.data_readers.training_labels_reader
        )
        + key_mapper
    )

    by calling make_training_pipeline(), which resizes the image before getting crops from the full images.
  • When crop_size is set to "auto", we compute the crop_size here:

    sleap/sleap/nn/training.py

    Lines 1248 to 1255 in 3417a18

    # Compute crop size that is divisible by max stride
    self.config.data.instance_cropping.crop_size = sleap.nn.data.instance_cropping.find_instance_crop_size(
        self.data_readers.training_labels,
        padding=self.config.data.instance_cropping.crop_size_detection_padding,
        maximum_stride=self.model.maximum_stride,
        input_scaling=self.config.data.preprocessing.input_scaling,
        min_crop_size=self.config.data.instance_cropping.crop_size,
    )

    , which is 1536*1536 in your case, by considering the scaling that will be applied to the image. Thus, the computed crop size is meant to be used on the resized image and not on the original image. Here's the first problem: the crop_size is used on non-resized images in the visualizer pipeline.
  • The next issue is with this line:

    sleap/sleap/nn/training.py

    Lines 1315 to 1322 in 3417a18

    # Create an instance peak finding layer.
    find_peaks = FindInstancePeaks(
        keras_model=self.keras_model,
        input_scale=self.config.data.preprocessing.input_scaling,
        peak_threshold=0.2,
        refinement="local",
        return_confmaps=True,
    )

    We run inference by setting up the FindInstancePeaks class and passing the input_scale argument to it. When we call find_peaks(), we resize the images by the input scaling provided, which shouldn't happen here. Because of this, the crop size is now reduced to 614*614 (see the sketch below).
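To put numbers on both points (just a sketch: training_labels is a placeholder for your training Labels object, and the stride value depends on your backbone config):

from sleap.nn.data.instance_cropping import find_instance_crop_size

# The crop size is computed with the scaling already factored in, so it describes
# a crop of the *resized* image (it came out to 1536 for 0.4-scaled, 3840-wide frames here):
crop_size = find_instance_crop_size(
    training_labels,
    padding=0,
    maximum_stride=16,
    input_scaling=0.4,
)

# The visualizer crops 1536 px from the un-resized frame, and FindInstancePeaks
# then rescales that crop by input_scale once more:
print(round(1536 * 0.4))  # 614 -> mismatch against the model's expected (1536, 1536) input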

2. Error during inference after training is done

  • Once the training is complete, we evaluate the model and initialize our Predictor.
    Since the inference is run for a centered instance model after training, we create a centroid ground truth class, which just returns crops from the original images.
  • Here again, we use the crop_size (which is supposed to be used on the resized image) on the original image to generate crops. These crops are then resized in the preprocess() call, and the crop size is now reduced to 614*614.

We've created a PR #2054 to fix both the above errors! Let us know if you have any more questions!

Thanks,

Divya

@gitttt-1234 gitttt-1234 self-assigned this Dec 16, 2024
@eberrigan eberrigan removed their assignment Jan 2, 2025
@MHRosenberg

MHRosenberg commented Jan 17, 2025

Hi all, I just discovered this issue. I think I'm struggling with similar problems, but currently for the single-animal case. I used the approach linked below to attempt to downsize the input images, but it occurs to me that the labels now need to be downsized as well. I see that this is being discouraged above. Is there a timeline for when a streamlined in-house SLEAP fix is planned to be released? This is currently bottlenecking my usage of the library.

Here's a link to my approach: #2085 (comment)
