[BUG] UnicodeDecodeError when running add_yolo_labels() #1497

Closed
valentindbdg opened this issue Dec 30, 2021 · 10 comments
Labels
bug Bug fixes

Comments

@valentindbdg

valentindbdg commented Dec 30, 2021

System information

  • Google Colab:
  • FiftyOne installed from pip:

Commands to reproduce

import fiftyone as fo
import fiftyone.utils.yolo as fouy

dataset = fo.Dataset.from_dir(
    dataset_dir="/content/drive/MyDrive/yolodataset",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions",
    labels_path="/content/DATASET/validation/data",
)

Describe the problem

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-56-0017c2b745f4> in <module>()
      3     sample_collection=dataset,
      4     label_field="predictions",
----> 5     labels_path= "/content/DATASET/validation/data",
      6 )

4 frames
/usr/lib/python3.7/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I get this error when trying to add YOLO detections. Is there a known solution?

@valentindbdg valentindbdg added the bug Bug fixes label Dec 30, 2021
@benjaminpkane
Contributor

Your TXT files can't be decoded. I would double check what is in them. They should only contain numbers, per the format
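
A byte of 0xff at position 0 means the file being read is not text at all; JPEG files, for instance, begin with the bytes FF D8. One way to find the offending files is to scan the labels directory and flag anything that is not UTF-8 text or whose rows are not numeric. The sketch below uses only the standard library; `find_bad_label_files` is an illustrative name, not a FiftyOne function, and it assumes each row has 5 fields (or 6 with a trailing confidence).

```python
import os


def find_bad_label_files(labels_dir):
    """Return (path, reason) pairs for files that do not look like YOLO TXT labels."""
    bad = []
    for root, _, files in os.walk(labels_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                raw = f.read()
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # Binary files such as images typically fail here
                bad.append((path, "not UTF-8 text: %s" % e.reason))
                continue
            for line in text.splitlines():
                fields = line.split()
                if not fields:
                    continue  # blank lines are harmless
                if len(fields) not in (5, 6):
                    bad.append((path, "unexpected field count: %d" % len(fields)))
                    break
                try:
                    [float(v) for v in fields]
                except ValueError:
                    bad.append((path, "non-numeric value"))
                    break
    return bad
```

Calling `find_bad_label_files("/content/DATASET/validation/data")` and printing the result should show whether anything other than label files lives in that directory.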

@brimoor brimoor changed the title [QUESTION] UnicodeDecodeError when trying to add_yolo_labels [BUG] UnicodeDecodeError when running add_yolo_labels() Dec 30, 2021
@valentindbdg
Author

valentindbdg commented Jan 3, 2022

Thank you @benjaminpkane.
All my files contain numbers, per the format, and it works when I load and visualize the dataset using the following command:

import fiftyone as fo

name = "pred-dataset"
dataset_dir = "/content/DATASET/validation"

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.YOLOv4Dataset,
    name=name,
    label_field="predictions",
)

session = fo.launch_app(dataset)

However, I'd like to add this dataset (predictions) to another one (ground truth). I get this error when I try to do so with add_yolo_labels(). Is there another way to do that using the code above?

@benjaminpkane
Contributor

Interesting. We will try to reproduce.

@valentindbdg
Author

valentindbdg commented Jan 3, 2022

Thank you @benjaminpkane
I can add the ground_truth labels to the predictions dataset, but not the other way around. As a result, I cannot visualize the images in my dataset that have ground truth but no predictions.

If there is another solution to display the images while loading predictions:


dataset = fo.Dataset.from_dir(
    dataset_dir="/content/DATASET/validation",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="predictions",
)

and then add the ground truth, but with ALL the images in ground_truth (not only the ones with predictions):

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="ground_truth",
    labels_path="/content/yolodataset/data",
)

session = fo.launch_app(dataset)

then that would solve my problem. (Both directories contain all the images in the dataset.)

@benjaminpkane
Contributor

benjaminpkane commented Jan 3, 2022

Using merge_samples() might be an alternative solution for you.

import fiftyone as fo

pred = fo.Dataset.from_dir(...)
gt = fo.Dataset.from_dir(...)

both = fo.Dataset("both")

both.merge_samples(pred)
both.merge_samples(gt)
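
One caveat with merging: as far as I know, merge_samples() matches samples on their filepath by default, so predictions and ground truth that were loaded from different directories (as in this issue) would not line up. A minimal sketch of matching on the image filename instead, assuming filenames are unique within each dataset and that your FiftyOne version accepts the `key_fcn` argument; `basename_key` and `merge_by_basename` are illustrative helper names, not FiftyOne functions.

```python
import os


def basename_key(sample):
    """Treat two samples as the same image when their filenames match."""
    return os.path.basename(sample.filepath)


def merge_by_basename(dst, src):
    """Merge `src` into `dst`, matching samples by image filename.

    Both arguments are expected to be FiftyOne datasets; `key_fcn` is
    passed through to `merge_samples()`.
    """
    dst.merge_samples(src, key_fcn=basename_key)
    return dst
```

With the datasets from the snippet above, `merge_by_basename(gt, pred)` would add the prediction labels to the matching ground truth samples.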

@ehofesmann
Member

@valentindbdg There may be an issue with the images in /content/DATASET/validation/data being read instead of the TXT files. To test this, could you try to create a new directory and only copy over the TXT files from /content/DATASET/validation/data, then call add_yolo_labels() and set labels_path to the new directory?

Out of curiosity, what extension do the images have in your dataset?
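
The test described above could be scripted roughly as follows. This is a minimal sketch using only the standard library; `copy_txt_files` is a hypothetical helper name, and it only looks at files directly inside the source directory.

```python
import os
import shutil


def copy_txt_files(src_dir, dst_dir):
    """Copy every .txt file directly inside src_dir into dst_dir.

    Returns the sorted list of copied filenames.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src) and name.lower().endswith(".txt"):
            shutil.copy2(src, os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

After copying, pass the new directory (any empty location of your choosing) as labels_path to add_yolo_labels().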

@brimoor
Contributor

brimoor commented Jan 3, 2022

@valentindbdg I see the problem. In this syntax:

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions", 
    labels_path="/content/yolodataset/data",
)

the labels_path argument of add_yolo_labels() assumes that every file is a TXT file, but you have both images and TXT files in that directory.

There are a variety of ways to resolve this.

  1. Use the alternate add_yolo_labels() syntax where labels_path is a dict mapping image filenames to TXT filepaths:
import os
import eta.core.utils as etau
import fiftyone.utils.yolo as fouy

labels_path = "/content/yolodataset/data"
labels_dict = {
    os.path.splitext(os.path.basename(p))[0] + ".jpg": p  # assumes your images are JPG
    for p in etau.list_files(labels_path, abs_paths=True, recursive=True)
    if p.endswith(".txt")
}

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions", 
    labels_path=labels_dict,
)
  2. Re-organize your files like this:
/path/to/images
    image1.ext
    image2.ext
    ...

/path/to/ground_truth
    image1.txt
    image2.txt
    ...

/path/to/predictions
    image1.txt
    image2.txt
    ...

and then load everything like this:

import fiftyone as fo
import fiftyone.utils.yolo as fouy

dataset = fo.Dataset.from_dir(
    data_path="/path/to/images",
    labels_path="/path/to/ground_truth",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

fouy.add_yolo_labels(dataset, "predictions", "/path/to/predictions")
  3. Load the two datasets separately and use merge_samples() per Ben's suggestion.

@valentindbdg
Author

Thank you @benjaminpkane and @brimoor !
I tried solution 2 and it worked well.

I also have confidences stored for my predictions. How can I add them to my dataset in FiftyOne so that I can visualize them too? Should I open a new issue for this?

Note: I previously had them stored in a .csv file alongside each prediction, before converting to YOLO format:

frame | prediction_class | confidence | left_x | top_y | width | height
000000086755.jpg | person | 0.7 | 320 | 211 | 76 | 98
000000441468.jpg | person | 0.54 | 240 | 388 | 122 | 198
000000441468.jpg | person | 0.57 | 373 | 124 | 11 | 45

@brimoor
Contributor

brimoor commented Jan 3, 2022

Support for loading confidence from YOLO TXT files was just added in #1465. It hasn't been released yet but you could use it via a source install.

However, since your data isn't natively stored in YOLO format but instead in a CSV format you devised, I would instead recommend one of these approaches:

  1. Just write a simple Python loop that constructs your dataset from your CSV format
  2. Formalize 1 by writing a custom DatasetImporter that directly loads your CSV format. Here's an example of that
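
A rough sketch of the parsing half of approach 1, using only the standard library. It assumes the CSV is comma-delimited with the header row shown earlier in the thread, that the boxes are in pixels, and that the image dimensions are available from elsewhere; `load_csv_predictions` and `image_sizes` are hypothetical names. The boxes are converted to relative [top-left-x, top-left-y, width, height] values in [0, 1], which is the layout FiftyOne uses for bounding boxes.

```python
import csv
from collections import defaultdict


def load_csv_predictions(csv_file, image_sizes):
    """Group CSV rows by frame and convert pixel boxes to relative coordinates.

    csv_file: an open text file with the header
        frame,prediction_class,confidence,left_x,top_y,width,height
    image_sizes: dict mapping frame name -> (image_width, image_height) in pixels

    Returns a dict mapping frame name -> list of dicts with the keys
    label, bounding_box and confidence.
    """
    detections = defaultdict(list)
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        img_w, img_h = image_sizes[row["frame"]]
        detections[row["frame"]].append(
            {
                "label": row["prediction_class"],
                "bounding_box": [
                    float(row["left_x"]) / img_w,
                    float(row["top_y"]) / img_h,
                    float(row["width"]) / img_w,
                    float(row["height"]) / img_h,
                ],
                "confidence": float(row["confidence"]),
            }
        )
    return dict(detections)
```

Each resulting dict can be passed as keyword arguments to fo.Detection, and the per-frame lists wrapped in fo.Detections and stored in the "predictions" field of the matching sample.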

@brimoor brimoor closed this as completed Jan 3, 2022
@valentindbdg
Author

valentindbdg commented Jan 4, 2022

Thank you @brimoor
I did a source install in Google Colab:
https://github.com/voxel51/fiftyone#source-installs-in-google-colab

I then added a confidence column to the TXT files, following this format:
<target> <x-center> <y-center> <width> <height> <confidence>

It worked well.
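
For anyone doing the same conversion, here is a minimal sketch of writing one such row from a pixel-space box like those in the CSV above. It assumes YOLO's usual convention of a numeric class index and coordinates normalized by the image size; `to_yolo_line` is a hypothetical helper, and the image dimensions and class index have to come from your own data.

```python
def to_yolo_line(class_index, left_x, top_y, width, height, img_w, img_h, confidence=None):
    """Format one pixel-space box as a YOLO TXT row.

    The box is given by its top-left corner and size in pixels; the output
    uses the box center and size, each divided by the image dimensions.
    """
    x_center = (left_x + width / 2) / img_w
    y_center = (top_y + height / 2) / img_h
    fields = [str(class_index)] + [
        "%.6f" % v for v in (x_center, y_center, width / img_w, height / img_h)
    ]
    if confidence is not None:
        # Optional sixth column, as in the format line above
        fields.append("%.6f" % confidence)
    return " ".join(fields)
```

Omitting the confidence argument yields the plain five-column row used for ground truth labels.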
