Found multiple skeletons after running inference #2025

Open
Desvino opened this issue Nov 19, 2024 · 6 comments
Labels
2024-hackathon bug Something isn't working

Comments


Desvino commented Nov 19, 2024

Hi! Thanks for your help!

Bug description

We first encountered this bug when we tried to run training on relabelled instances.

In a single-animal project, training failed with:

ValueError: Labels.skeleton can only be used when there is only a single skeleton saved in the labels. Use Labels.skeletons instead. 

Training did run after we restarted SLEAP.

However, in a multi-animal top-down project, training failed with:

Traceback (most recent call last):
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.4', 'console_scripts', 'sleap-train')())
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 924, in train
    self.setup()
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 910, in setup
    self._setup_model()
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 727, in _setup_model
    base_example = next(iter(base_pipeline.make_dataset()))
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 800, in __next__
    return self._next_internal()
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 786, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2844, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\MiaoLab-Guest\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 2 is not in [0, 1)
         [[{{node GatherV2}}]] [Op:IteratorGetNext]

In this case, restarting SLEAP did not help.
I think it is a similar issue to #713 and #1090.
When we tried the methods you provided in #713, we found that there were 3 skeletons in our project. These 3 skeletons were identical. After deleting the extra skeletons, we could run training.
So I think the multiple skeletons appeared as follows:

  • Add videos into new project
  • Run inference using previous model
  • Edit instances
  • Run training
  • See error

We would like to know how to avoid this bug. Thank you very much!
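For readers who hit the same problem, the workaround described above (keeping one skeleton and dropping the identical copies) can be sketched in plain Python. This is only an illustration under stated assumptions, not SLEAP's API or the eventual fix: skeletons are modelled as hypothetical tuples of node names and edges, instances as a list of skeleton indices, and `merge_identical_skeletons` is a made-up helper name.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a skeleton: (node names, edges as node-name pairs).
# This is NOT SLEAP's Skeleton class, just hashable plain data for illustration.
Skeleton = Tuple[Tuple[str, ...], Tuple[Tuple[str, str], ...]]


def merge_identical_skeletons(
    skeletons: List[Skeleton], instance_skeleton_ids: List[int]
) -> Tuple[List[Skeleton], List[int]]:
    """Collapse identical skeletons and remap instance references to them."""
    unique: List[Skeleton] = []
    first_seen: Dict[Skeleton, int] = {}  # skeleton -> index in `unique`
    remap: Dict[int, int] = {}            # old index -> new index

    for old_idx, skel in enumerate(skeletons):
        if skel not in first_seen:
            first_seen[skel] = len(unique)
            unique.append(skel)
        remap[old_idx] = first_seen[skel]

    # Every instance now points at the surviving copy of its skeleton.
    new_ids = [remap[i] for i in instance_skeleton_ids]
    return unique, new_ids


# Three identical skeletons, as reported in this issue.
mouse: Skeleton = (("nose", "neck", "tail"), (("nose", "neck"), ("neck", "tail")))
skeletons = [mouse, mouse, mouse]
instance_ids = [0, 2, 1, 2]  # an instance pointing at index 2 is out of range if only 1 skeleton is expected

unique, new_ids = merge_identical_skeletons(skeletons, instance_ids)
print(len(unique), new_ids)
```

After merging, every instance refers to index 0, which would be consistent with the `indices[0] = 2 is not in [0, 1)` message going away once the extra skeletons were deleted. That link between the error and the skeleton index is an inference from this report, not something confirmed in the thread.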

Your personal set up

  • OS: Windows 10
  • Version(s): Sleap v1.3.4
@Desvino Desvino added the bug Something isn't working label Nov 19, 2024
Contributor

gitttt-1234 commented Dec 17, 2024

Hi @Desvino !

Apologies for the delay! I tried the steps that you mentioned, but I'm not able to replicate this error. Here's what I did:

  • Added videos into new project
  • Ran inference with an existing checkpoint
  • Modified the predicted instances
  • Ran training

Could you please let me know whether you added any skeletons (using the Load Skeleton option) to the project before running inference with the previous model?

Thanks,

Divya

Author

Desvino commented Dec 18, 2024

Hi @gitttt-1234 !
Thanks for your kind reply!
I'm sure we didn't use the Load Skeleton option in the project. Recently we found that if we added 3 videos to a new project, we ended up with 3 skeletons, and if we added 11 videos, we ended up with 11 skeletons. I wonder whether there is something wrong with our trained model.

Thanks,
Desvino

Contributor

gitttt-1234 commented Dec 18, 2024

Hi @Desvino,

Thank you for getting back! Sorry, I'm still not able to replicate the error on my end. Let's try to isolate where the duplicate skeletons are created. Could you please confirm whether you see multiple skeletons right after running inference with an existing model (in the .slp file generated by inference)? To help us debug, it would be great if you could share the training_config.json for the existing ckpt, the model folder, and a couple of the videos on which you ran inference. You can upload the files here.
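One way to do that check from Python is sketched below. The helper `report_skeletons` is a hypothetical name, and it only relies on the object exposing a `skeletons` list, which is the attribute the `ValueError` above points to. The stand-in object lets the sketch run without SLEAP installed; the commented lines show the assumed usage with a real file, where `predictions.slp` is a placeholder path.

```python
from types import SimpleNamespace


def report_skeletons(labels) -> int:
    """Print and return how many skeletons a labels-like object holds."""
    n = len(labels.skeletons)
    if n > 1:
        print(f"Found {n} skeletons; a single-skeleton project should have 1.")
    else:
        print(f"Found {n} skeleton(s).")
    return n


# Stand-in for a loaded project so this sketch runs without SLEAP.
fake_labels = SimpleNamespace(skeletons=["skel_a", "skel_b", "skel_c"])
count = report_skeletons(fake_labels)

# Assumed usage with SLEAP installed (not executed here):
# import sleap
# labels = sleap.load_file("predictions.slp")  # placeholder path
# report_skeletons(labels)
```

Running this on the file produced by inference, before any editing in the GUI, would show whether the extra skeletons are already present at that stage.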

Thanks,

Divya

Author

Desvino commented Dec 19, 2024

Hi @gitttt-1234 !

I have confirmed that multiple skeletons appear right after running inference with an existing model, without any other steps. I have also uploaded the files you mentioned.
Thanks a lot!

Desvino

@gitttt-1234
Contributor

Hi @Desvino!

Thank you for sending the data, I was able to replicate the issue. Thanks @roomrys for your detailed analysis here.
We're currently working on fixing this in #2075! I'll update once it's merged in and then you can install sleap from source using the below steps, which would resolve the issue:

Thanks,

Divya

Author

Desvino commented Dec 24, 2024

Hi @gitttt-1234 !
Thank you for your quick response and for looking into the issue. I really appreciate your efforts to fix the bug and will look forward to the update once it's resolved!

Best regards,
Desvino
