Nutrition facts chart transcription

This page contains a small dataset for the task of recovering precise, structured nutritional information from an image of a product.

Dataset description

Size:

This dataset contains 633 images.

Data description:

The images are natural close-up photographs of one side of a food product, with the nutrition facts box clearly visible. All of them show Spanish products.

Examples

Ground truth description

All images have been labeled by hand with the ground-truth values of 8 nutritional fields: energy (kJ), energy (kcal), fat (g), saturated fat (g), carbohydrates (g), sugar (g), protein (g) and salt (g). Other nutritional values may be present in the annotations, but we did not take them into account because they do not appear in the nutrition facts chart of most products. Since all the products and their charts are in Spanish, the ground truth is provided in Spanish. The English equivalents are:

energía_kj = energy (kJ)
energía_kcal = energy (kcal)
grasa = fat
saturada = saturated fat
hidratos = carbohydrates
azúcar = sugar
proteínas = protein
sal = salt
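When loading the annotations programmatically, it can help to keep this correspondence in one place. The following is a minimal Python sketch, assuming the field names in the files are exactly the Spanish keys listed above. The English identifiers (such as energy_kj and fat) and the function translate_fields are hypothetical names chosen for illustration; the function also drops any field outside the 8 evaluated ones.

```python
import unicodedata
from typing import Dict

# Hypothetical mapping from the Spanish ground-truth field names to English
# identifiers; only the 8 evaluated fields are listed.
SPANISH_TO_ENGLISH = {
    "energía_kj": "energy_kj",
    "energía_kcal": "energy_kcal",
    "grasa": "fat",
    "saturada": "saturated_fat",
    "hidratos": "carbohydrates",
    "azúcar": "sugar",
    "proteínas": "protein",
    "sal": "salt",
}


def translate_fields(values: Dict[str, float]) -> Dict[str, float]:
    """Rename the 8 target fields to English and drop every other field."""
    translated = {}
    for name, value in values.items():
        # Normalise to NFC in case accented letters are stored decomposed
        # (an assumption about the files' encoding, not documented).
        key = unicodedata.normalize("NFC", name.strip())
        if key in SPANISH_TO_ENGLISH:
            translated[SPANISH_TO_ENGLISH[key]] = value
    return translated


# "fibra" (fiber) is not one of the 8 evaluated fields, so it is dropped.
print(translate_fields({"grasa": 4.5, "azúcar": 3.4, "fibra": 2.1}))
# {'fat': 4.5, 'sugar': 3.4}
```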

The annotations are provided as one CSV file per image. Each file has two or three columns, depending on how much information appears on the product. The first column always lists the nutritional fields that appear in the image; the second gives the quantity of each field per 100 g of product; the third, if present, gives the quantity per serving. Since not every product label includes per-serving values, this column is only provided when the label does.

Example


energía_kj	802	3246
energía_kcal	190	770
grasa	4.5	18
saturada	2.7	11
hidratos	28	112
azúcar	3.4	14
proteínas	8.5	34
sal	1.2	5.1
fibra	2.1	8.6
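A file like this can be read into a dictionary with a few lines of Python. The dataset description does not specify the delimiter, a header row, or the decimal separator, so this sketch assumes one field per line, cells separated by a tab (as in the example above) or a comma, and "." as the decimal separator. parse_annotation is a hypothetical helper, not part of the repository.

```python
from typing import Dict, Optional, Tuple


def parse_annotation(text: str) -> Dict[str, Tuple[float, Optional[float]]]:
    """Parse one ground-truth file into {field: (per_100g, per_serving)}.

    Assumed format: one field per line, tab- or comma-separated cells,
    "." as decimal separator. per_serving is None when column 3 is absent.
    """
    result: Dict[str, Tuple[float, Optional[float]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        sep = "\t" if "\t" in line else ","
        cells = [cell.strip() for cell in line.split(sep)]
        try:
            per_100g = float(cells[1])
        except (IndexError, ValueError):
            continue  # tolerate a possible header row or malformed line
        has_serving = len(cells) > 2 and cells[2] != ""
        per_serving = float(cells[2]) if has_serving else None
        result[cells[0]] = (per_100g, per_serving)
    return result


example = "energía_kj\t802\t3246\ngrasa\t4.5\t18\nfibra\t2.1\t8.6\n"
print(parse_annotation(example)["grasa"])  # (4.5, 18.0)
```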

Data download

You can download the images from here and the ground truth from here

Data extraction task

The goal of this task is to obtain the 8 nutritional values mentioned above, per 100 g, directly from the image. To that end, we used Google's Tesseract OCR engine together with our nutrition facts chart detector. Several preprocessing and post-processing algorithms are also applied so that the final output is structured and as accurate as possible. As a metric, we use the number of errors made in each image; the results are reported in the tables below.
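The post-processing algorithms are not described in detail here. As an illustration only, this sketch shows one simple way a post-processing step could map OCR output (for example, Tesseract's text split into lines) onto the 8 ground-truth fields: match Spanish labels with regular expressions, convert decimal commas, and keep the first number on each line. The label rules, the helper names, and the assumption that the first number on a line is the per-100 g value are all hypothetical; this is not the method used to obtain the reported results.

```python
import re
from typing import Dict, List

# A number with an optional decimal part; Spanish labels usually write "4,5".
NUMBER = r"(\d+(?:[.,]\d+)?)"

# Hypothetical label rules (regex -> ground-truth field name). More specific
# labels come first, so "saturadas" is tested before the plain "grasas".
RULES = [
    (r"saturad", "saturada"),
    (r"az[uú]car", "azúcar"),
    (r"grasa", "grasa"),
    (r"hidratos", "hidratos"),
    (r"prote[ií]na", "proteínas"),
    (r"\bsal\b", "sal"),
]


def to_float(token: str) -> float:
    """Convert "4,5" or "4.5" to 4.5."""
    return float(token.replace(",", "."))


def extract_values(ocr_lines: List[str]) -> Dict[str, float]:
    """Assign numbers from OCR'd chart lines to fields (first match wins)."""
    values: Dict[str, float] = {}
    for raw in ocr_lines:
        line = raw.lower()
        kj = re.search(NUMBER + r"\s*kj", line)
        kcal = re.search(NUMBER + r"\s*kcal", line)
        if kj or kcal:
            # Energy is often printed as "802 kJ / 190 kcal" on one line.
            if kj:
                values.setdefault("energía_kj", to_float(kj.group(1)))
            if kcal:
                values.setdefault("energía_kcal", to_float(kcal.group(1)))
            continue
        for pattern, field in RULES:
            if re.search(pattern, line):
                # Assumption: the first number is the per-100 g value.
                number = re.search(NUMBER, line)
                if number:
                    values.setdefault(field, to_float(number.group(1)))
                break  # a line describes at most one field
    return values
```

Real charts need more care than this, for instance when a label and its value end up on different OCR lines, or when the per-serving column comes first.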

We also calculated the accuracy for each nutritional value: the last table shows the number of errors made for each of them across the 633 images.

Number of errors per image	Number of transcriptions
0	144 (23%)
1	95 (15%)
2	105 (17%)
3	71 (11%)
4	62 (10%)
5	58 (9%)
6	46 (7%)
7	18 (3%)
8	34 (5%)
Total	633
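The per-image metric can be sketched in Python as follows. The exact criterion for an error is not stated, so this example assumes a field counts as wrong when it is missing from the prediction or differs from the ground truth by more than a small tolerance, and that the ground truth contains all 8 fields. count_errors and error_histogram are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TARGET_FIELDS = ["energía_kj", "energía_kcal", "grasa", "saturada",
                 "hidratos", "azúcar", "proteínas", "sal"]

Pair = Tuple[Dict[str, float], Dict[str, float]]  # (predicted, truth)


def count_errors(predicted: Dict[str, float], truth: Dict[str, float],
                 tol: float = 1e-6) -> int:
    """Errors (0-8) in one image: a field is wrong if missing or different."""
    errors = 0
    for field in TARGET_FIELDS:
        value = predicted.get(field)
        if value is None or abs(value - truth[field]) > tol:
            errors += 1
    return errors


def error_histogram(pairs: Iterable[Pair]) -> List[int]:
    """Number of images with exactly 0, 1, ..., 8 errors."""
    counts = Counter(count_errors(pred, truth) for pred, truth in pairs)
    return [counts[k] for k in range(len(TARGET_FIELDS) + 1)]


truth = {field: 1.0 for field in TARGET_FIELDS}
one_wrong = dict(truth, grasa=2.0)
print(error_histogram([(dict(truth), truth), (one_wrong, truth), ({}, truth)]))
# [1, 1, 0, 0, 0, 0, 0, 0, 1]
```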

Nutritional value	Number of errors
Energy (kJ)	137 (22%)
Energy (kcal)	139 (22%)
Fat (g)	226 (36%)
Saturated fat (g)	232 (37%)
Carbohydrates (g)	225 (36%)
Sugar (g)	227 (36%)
Protein (g)	285 (45%)
Salt (g)	259 (41%)
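The per-field counts can be computed in a similar way. This Python sketch makes the same assumptions (a field is wrong when it is missing or differs from the ground truth; hypothetical function names) and takes percentages over the number of images, which matches the table: for example, 226 of 633 images (36%) have a fat error.

```python
from typing import Dict, Iterable, Tuple

TARGET_FIELDS = ["energía_kj", "energía_kcal", "grasa", "saturada",
                 "hidratos", "azúcar", "proteínas", "sal"]


def errors_per_field(pairs: Iterable[Tuple[Dict[str, float], Dict[str, float]]],
                     tol: float = 1e-6) -> Dict[str, int]:
    """For each field, count the images where it is missing or wrong."""
    counts = {field: 0 for field in TARGET_FIELDS}
    for predicted, truth in pairs:
        for field in TARGET_FIELDS:
            value = predicted.get(field)
            if value is None or abs(value - truth[field]) > tol:
                counts[field] += 1
    return counts


def with_percentages(counts: Dict[str, int], n_images: int) -> Dict[str, str]:
    """Format counts as "137 (22%)", rounded to a whole percent."""
    return {field: f"{n} ({round(100 * n / n_images)}%)"
            for field, n in counts.items()}


print(with_percentages({"energía_kj": 137, "proteínas": 285}, 633))
# {'energía_kj': '137 (22%)', 'proteínas': '285 (45%)'}
```

As a sanity check, the two tables agree with each other: the per-field counts add up to 137 + 139 + 226 + 232 + 225 + 227 + 285 + 259 = 1730 errors, which equals the per-image total 1×95 + 2×105 + 3×71 + 4×62 + 5×58 + 6×46 + 7×18 + 8×34 = 1730.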


jofuelo/nutrition_facts_chart_transcription
