This page contains a small dataset for the task of recovering precise, structured nutritional information from an image of a product.
This dataset contains 633 images.
The images are natural close-up photographs of one side of a food product, with the nutrition facts box clearly visible. All of them show Spanish products.
All images have been labeled by hand with the ground truth for eight nutritional values: energy (kJ), energy (kcal), fat (g), saturated fat (g), carbohydrates (g), sugar (g), protein (g) and salt (g). Other nutritional values may be present in the annotations, but we did not take them into account because they do not appear in the nutrition facts chart of most products. As all the products are Spanish and the charts in the images are in Spanish, the ground truth is provided in Spanish. The English equivalents are the following:
- energía_kj = energy (kJ)
- energía_kcal = energy (kcal)
- grasa = fat
- saturada = saturated fat
- hidratos = carbohydrates
- azúcar = sugar
- proteínas = protein
- sal = salt
These annotations are provided as one CSV file per image, which contains all the information. The file has two or three columns, depending on how much information appears on the product. The first column is always the list of nutritional values that appear in the image. The second contains the quantity of each value per 100 g of product. The third, if present, contains the quantity of each value per serving. Not every product label includes per-serving values, so this column is provided only when the label has them. For example:
Nutritional value | Per 100 g | Per serving |
---|---|---|
energía_kj | 802 | 3246 |
energía_kcal | 190 | 770 |
grasa | 4.5 | 18 |
saturada | 2.7 | 11 |
hidratos | 28 | 112 |
azúcar | 3.4 | 14 |
proteínas | 8.5 | 34 |
sal | 1.2 | 5.1 |
fibra | 2.1 | 8.6 |
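As an illustration of the annotation layout described above, here is a minimal Python sketch that reads one annotation file into a dictionary. It rests on several assumptions that the page does not state: that the files are comma-delimited, have no header row, and use a dot as the decimal separator. The names `parse_annotation` and `FIELD_NAMES` are hypothetical and are not part of the dataset.

```python
import csv
import io

# Spanish field names from the annotations, mapped to the English
# equivalents listed on this page.
FIELD_NAMES = {
    "energía_kj": "energy (kJ)",
    "energía_kcal": "energy (kcal)",
    "grasa": "fat",
    "saturada": "saturated fat",
    "hidratos": "carbohydrates",
    "azúcar": "sugar",
    "proteínas": "protein",
    "sal": "salt",
}


def parse_annotation(text, delimiter=","):
    """Parse the text of one annotation file.

    Returns {field: (per_100g, per_serving)}, where per_serving is None
    when the file has only two columns. The delimiter is an assumption.
    """
    values = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [cell.strip() for cell in row]
        if len(cells) < 2 or not cells[0]:
            continue  # skip blank or malformed rows
        per_100g = float(cells[1])
        per_serving = float(cells[2]) if len(cells) > 2 and cells[2] else None
        values[cells[0]] = (per_100g, per_serving)
    return values


sample = "energía_kj,802,3246\ngrasa,4.5,18\nfibra,2.1,8.6\n"
parsed = parse_annotation(sample)
# Keep only the eight target values, keyed by their English names.
targets = {FIELD_NAMES[k]: v[0] for k, v in parsed.items() if k in FIELD_NAMES}
```

Extra fields such as `fibra` are kept by the parser and filtered out only when the eight target values are selected, which mirrors how the annotations may contain more values than the task uses.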
You can download the images from here and the ground truth from here
The goal of this task is to obtain the eight nutritional values listed above, per 100 g, from the image itself. To that end, we used Google's OCR engine Tesseract and our nutrition facts chart detector. Several preprocessing and postprocessing algorithms are also applied so that the final information is structured and as correct as possible. As a metric we use the number of errors made in each image, obtaining the following results.
We also measured performance per nutritional value. The last table shows the number of errors for each one of them.
Number of errors | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
---|---|---|---|---|---|---|---|---|---|---|
Number of transcriptions | 144 (23%) | 95 (15%) | 105 (17%) | 71 (11%) | 62 (10%) | 58 (9%) | 46 (7%) | 18 (3%) | 34 (5%) | 633 |
Nutritional value | Energy (kJ) | Energy (kcal) | Fat (g) | Saturated fat (g) | Carbohydrates (g) | Sugar (g) | Protein (g) | Salt (g) |
---|---|---|---|---|---|---|---|---|
Number of errors | 137 (22%) | 139 (22%) | 226 (36%) | 232 (37%) | 225 (36%) | 227 (36%) | 285 (45%) | 259 (41%) |
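
The two tables above can be reproduced from per-image predictions with a tally like the following Python sketch. It is not the evaluation code behind the reported numbers. It assumes that a value counts as an error when it is missing from the prediction or differs from the ground truth, and that predictions and ground truth are dictionaries of per-100 g values keyed by the Spanish field names. The names `wrong_fields` and `summarize` are hypothetical.

```python
import math
from collections import Counter

TARGET_FIELDS = [
    "energía_kj", "energía_kcal", "grasa", "saturada",
    "hidratos", "azúcar", "proteínas", "sal",
]


def wrong_fields(predicted, truth, fields=TARGET_FIELDS):
    """Return the target fields whose predicted value is missing or wrong."""
    wrong = []
    for name in fields:
        p, t = predicted.get(name), truth.get(name)
        if p is None or t is None:
            # Assumption: an error if exactly one side lacks the value.
            if p is not t:
                wrong.append(name)
        elif not math.isclose(p, t, rel_tol=0.0, abs_tol=1e-6):
            wrong.append(name)
    return wrong


def summarize(pairs):
    """Tally errors over (predicted, truth) pairs, one pair per image.

    Returns (per_image, per_field): per_image[k] is the number of images
    with exactly k errors, per_field[name] the number of images in which
    that field was wrong.
    """
    per_image, per_field = Counter(), Counter()
    for predicted, truth in pairs:
        wrong = wrong_fields(predicted, truth)
        per_image[len(wrong)] += 1
        per_field.update(wrong)
    return per_image, per_field


truth = {name: 1.0 for name in TARGET_FIELDS}
pred = dict(truth, sal=2.0)   # one wrong value
del pred["grasa"]             # one missing value
per_image, per_field = summarize([(pred, truth), (truth, truth)])
```

Here `per_image` corresponds to the first table (images grouped by number of errors) and `per_field` to the second (errors per nutritional value).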