[BUG] UnicodeDecodeError when running add_yolo_labels() #1497

valentindbdg · 2021-12-30T09:35:37Z

System information

Google Colab
FiftyOne installed from pip

Commands to reproduce

import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="/content/drive/MyDrive/yolodataset",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

import fiftyone.utils.yolo as fouy
fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions", 
    labels_path= "/content/DATASET/validation/data",
)

Describe the problem

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-56-0017c2b745f4> in <module>()
      3     sample_collection=dataset,
      4     label_field="predictions",
----> 5     labels_path= "/content/DATASET/validation/data",
      6 )

4 frames
/usr/lib/python3.7/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I am having this problem when trying to add YOLO detections. Is there a known solution?


benjaminpkane · 2021-12-30T15:19:06Z

Your TXT files can't be decoded. I would double-check what is in them. They should contain only numbers, per the format.
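One way to do that double-check is a minimal, standard-library-only sketch like the following, which scans every file in a labels directory and reports files that are not valid UTF-8 or that contain a non-numeric token. The helper name `find_invalid_label_files` is hypothetical (it is not part of FiftyOne), and the sketch assumes a flat directory of label files.

```python
from pathlib import Path


def _first_non_numeric(text):
    """Return (line_number, token) for the first non-numeric token, else None."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            try:
                float(token)
            except ValueError:
                return lineno, token
    return None


def find_invalid_label_files(labels_dir):
    """Return (filename, reason) pairs for files that are not numeric UTF-8 text.

    Every file is checked, not just *.txt, because a non-label file (such as
    an image) in the directory can also trigger a UnicodeDecodeError.
    """
    problems = []
    for path in sorted(Path(labels_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as e:
            problems.append((path.name, f"not UTF-8 ({e.reason} at byte {e.start})"))
            continue
        bad = _first_non_numeric(text)
        if bad is not None:
            lineno, token = bad
            problems.append((path.name, f"non-numeric token {token!r} on line {lineno}"))
    return problems
```

For example, `find_invalid_label_files("/content/DATASET/validation/data")` would list any image files sitting alongside the TXT files, since those fail the UTF-8 check.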

valentindbdg · 2022-01-03T15:14:35Z

Thank you @benjaminpkane.
All my files contain numbers, per the format, and it works when I load and visualize the dataset using the following command:

import fiftyone as fo

name = "pred-dataset"
dataset_dir = "/content/DATASET/validation"

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.YOLOv4Dataset,
    name=name,
    label_field = "predictions",
)

session = fo.launch_app(dataset)

However, I'd like to add this dataset (predictions) to another one (ground truth), and I get this error when I try to use add_yolo_labels(). Is there another way to do that using the code above?

benjaminpkane · 2022-01-03T15:28:04Z

Interesting. We will try to reproduce.

valentindbdg · 2022-01-03T15:53:31Z

Thank you @benjaminpkane
I can add the ground_truth to the predictions, but not the other way around. Therefore I cannot visualize the images in my dataset that are part of ground_truth but have no predictions.

Is there another way to do this? For example, first loading the predictions:


dataset = fo.Dataset.from_dir(
    dataset_dir= "/content/DATASET/validation",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field = "predictions",
)

and then adding the ground truth for ALL the images in ground_truth (not only the ones with predictions):

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="ground_truth", 
    labels_path= "/content/yolodataset/data",
)

session = fo.launch_app(dataset)

That would solve my problem (both directories contain all the images in the dataset).

benjaminpkane · 2022-01-03T16:05:18Z

Merging the two datasets might be an alternative solution for you.

import fiftyone as fo

pred = fo.Dataset.from_dir(...)
gt = fo.Dataset.from_dir(...)

both = fo.Dataset("both")

# merge_samples() matches samples by their "filepath" field by default
both.merge_samples(pred)
both.merge_samples(gt)

ehofesmann · 2022-01-03T16:27:05Z

@valentindbdg There may be an issue with the images in /content/DATASET/validation/data being read instead of the TXT files. To test this, could you try to create a new directory and only copy over the TXT files from /content/DATASET/validation/data, then call add_yolo_labels() and set labels_path to the new directory?

Out of curiosity, what extension do the images have in your dataset?
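The test suggested above can be scripted with the standard library. This is a sketch; `copy_txt_files` and any destination directory you pass it are hypothetical names, and the `*.txt` pattern is case-sensitive on Linux, so files ending in `.TXT` would be skipped.

```python
import shutil
from pathlib import Path


def copy_txt_files(src_dir, dst_dir):
    """Copy only the *.txt files in src_dir into dst_dir, creating it if needed.

    Returns the sorted names of the copied files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    # glob() is case-sensitive on Linux, so "*.TXT" files are not matched
    for path in sorted(src.glob("*.txt")):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

For example, `copy_txt_files("/content/DATASET/validation/data", "/content/prediction_labels")`, then `add_yolo_labels(..., labels_path="/content/prediction_labels")`, where `/content/prediction_labels` is a hypothetical path.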

brimoor · 2022-01-03T16:33:53Z

@valentindbdg I see the problem. In this syntax:

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions", 
    labels_path="/content/yolodataset/data",
)

the labels_path argument of add_yolo_labels() assumes that every file is a TXT file, but you have both images and TXT files in that directory.
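The byte in the error message is consistent with this diagnosis. JPEG files start with the marker bytes `0xFF 0xD8`, and `0xFF` can never appear in valid UTF-8, so reading a JPEG as text fails at position 0. A minimal reproduction, assuming the images are JPEGs (a PNG would instead fail on byte `0x89`):

```python
# JPEG files begin with the "start of image" marker bytes 0xFF 0xD8. The byte
# 0xFF never occurs in valid UTF-8, so decoding the file as text fails at once.
jpeg_header = b"\xff\xd8\xff\xe0"

message = None
try:
    jpeg_header.decode("utf-8")
except UnicodeDecodeError as e:
    message = str(e)

print(message)
# 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```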

There are a variety of ways to resolve this.

1. Use the alternate add_yolo_labels() syntax, where labels_path is a dict mapping image filenames to TXT filepaths:

import os
import eta.core.utils as etau
import fiftyone.utils.yolo as fouy

labels_path = "/content/yolodataset/data"
labels_dict = {
    os.path.splitext(os.path.basename(p))[0] + ".jpg": p  # assumes your images are JPG
    for p in etau.list_files(labels_path, abs_paths=True, recursive=True)
    if p.endswith(".txt")
}

fouy.add_yolo_labels(
    sample_collection=dataset,
    label_field="predictions", 
    labels_path=labels_dict,
)

2. Re-organize your files like this:

/path/to/images
    image1.ext
    image2.ext
    ...

/path/to/ground_truth
    image1.txt
    image2.txt
    ...

/path/to/predictions
    image1.txt
    image2.txt
    ...

and then load everything like this:

import fiftyone as fo
import fiftyone.utils.yolo as fouy

dataset = fo.Dataset.from_dir(
    data_path="/path/to/images",
    labels_path="/path/to/ground_truth",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

fouy.add_yolo_labels(dataset, "predictions", "/path/to/predictions")

3. Load the two datasets separately and use merge_samples(), per Ben's suggestion.
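For the dict syntax of add_yolo_labels(), the `.jpg` assumption in the comprehension can be avoided by pairing each TXT file with the image that shares its filename stem. The sketch below does that with the standard library instead of `eta`; `build_labels_dict` and the `IMAGE_EXTS` set are hypothetical and should be adjusted to your data.

```python
from pathlib import Path

# Hypothetical list of image extensions; adjust to match your dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def build_labels_dict(data_dir):
    """Map image filenames to the absolute path of the TXT file with the same stem.

    The image extension is taken from the files actually present in data_dir,
    so no ".jpg" assumption is needed. Images without a TXT file are skipped.
    """
    data_dir = Path(data_dir)
    txt_by_stem = {p.stem: p for p in data_dir.glob("*.txt") if p.is_file()}
    return {
        img.name: str(txt_by_stem[img.stem].resolve())
        for img in sorted(data_dir.iterdir())
        if img.is_file()
        and img.suffix.lower() in IMAGE_EXTS
        and img.stem in txt_by_stem
    }
```

The result could then be passed as `labels_path=build_labels_dict("/content/yolodataset/data")`.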

valentindbdg · 2022-01-03T17:32:49Z

Thank you @benjaminpkane and @brimoor !
I tried solution 2 and it worked well.

I also have confidences stored for my predictions. How can I load them into my FiftyOne dataset so that I can visualize them too? Should I open a new issue for this?

Note: before converting to YOLO format, I had them stored in a CSV file, next to each prediction:

frame | prediction_class | confidence | left_x | top_y | width | height
000000086755.jpg | person | 0.7 | 320 | 211 | 76 | 98
000000441468.jpg | person | 0.54 | 240 | 388 | 122 | 198
000000441468.jpg | person | 0.57 | 373 | 124 | 11 | 45

brimoor · 2022-01-03T17:38:02Z

Support for loading confidence from YOLO TXT files was just added in #1465. It hasn't been released yet but you could use it via a source install.

However, since your data isn't natively stored in YOLO format but in a CSV format you devised, I would instead recommend one of these approaches:

1. Write a simple Python loop that constructs your dataset from your CSV format
2. Formalize 1 by writing a custom DatasetImporter that directly loads your CSV format. Here's an example of that
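A minimal sketch of the simple-loop approach, assuming the CSV is comma-separated with the column names shown above and stores boxes in absolute pixels. FiftyOne bounding boxes are relative `[top-left-x, top-left-y, width, height]` values in `[0, 1]`, so the sketch needs each image's pixel size; the `image_sizes` mapping, the helper names, and the `predictions` field name are all hypothetical.

```python
import csv
import os
from collections import defaultdict


def read_csv_detections(csv_file, image_sizes):
    """Group CSV rows by frame as dicts of fo.Detection keyword arguments.

    csv_file: open text file with columns frame, prediction_class, confidence,
        left_x, top_y, width, height (boxes in absolute pixels)
    image_sizes: hypothetical mapping of frame filename -> (width, height)
    """
    detections = defaultdict(list)
    for row in csv.DictReader(csv_file):
        img_w, img_h = image_sizes[row["frame"]]
        detections[row["frame"]].append(
            {
                "label": row["prediction_class"],
                "confidence": float(row["confidence"]),
                # FiftyOne expects relative [top-left-x, top-left-y, width, height]
                "bounding_box": [
                    float(row["left_x"]) / img_w,
                    float(row["top_y"]) / img_h,
                    float(row["width"]) / img_w,
                    float(row["height"]) / img_h,
                ],
            }
        )
    return dict(detections)


def build_dataset(images_dir, detections, name=None):
    """Create a FiftyOne dataset with a "predictions" field from the dicts above."""
    import fiftyone as fo  # deferred so the parsing step works without FiftyOne

    samples = []
    for frame, dets in detections.items():
        sample = fo.Sample(filepath=os.path.join(images_dir, frame))
        sample["predictions"] = fo.Detections(
            detections=[fo.Detection(**d) for d in dets]
        )
        samples.append(sample)

    dataset = fo.Dataset(name)
    dataset.add_samples(samples)
    return dataset
```

The parsing step has no FiftyOne dependency; `build_dataset()` then wraps each dict in `fo.Detection`, which accepts `label`, `bounding_box`, and `confidence` keyword arguments. Open the CSV with `open(path, newline="")` before passing it in.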

valentindbdg · 2022-01-04T04:19:24Z

Thank you @brimoor
I did a source install in the google colab:
https://github.com/voxel51/fiftyone#source-installs-in-google-colab

Then I added a confidence column to the TXT files, following this format:
<target> <x-center> <y-center> <width> <height> <confidence>

It worked well.
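For reference, a minimal sketch of how one line in this six-column format can be interpreted. `parse_yolo_line` is a hypothetical helper, not FiftyOne's own parser; it assumes the coordinates are relative to the image size, as in standard YOLO TXT files, and converts the center-based box to the top-left-based box FiftyOne displays.

```python
def parse_yolo_line(line):
    """Parse one YOLO TXT line with an optional trailing confidence column.

    Returns (class_index, [top_left_x, top_left_y, width, height], confidence),
    with the box converted from YOLO's relative center format to a relative
    top-left format; confidence is None when the sixth column is absent.
    """
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(fields)}: {line!r}")
    target = int(fields[0])
    x_center, y_center, width, height = map(float, fields[1:5])
    confidence = float(fields[5]) if len(fields) == 6 else None
    box = [x_center - width / 2, y_center - height / 2, width, height]
    return target, box, confidence
```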

valentindbdg added the bug Bug fixes label Dec 30, 2021

brimoor changed the title ~~[QUESTION] UnicodeDecodeError when trying to add_yolo_labels~~ [BUG] UnicodeDecodeError when running add_yolo_labels() Dec 30, 2021

brimoor closed this as completed Jan 3, 2022
