A dataset and preliminary results for the task of obtaining structured nutritional information from an image

jofuelo/nutrition_facts_chart_transcription


Nutrition facts chart transcription

This repository contains a small dataset for the task of recovering precise, structured nutritional information from a photograph of a food product.

Dataset description

Size:

This dataset contains 633 images.

Data description:

The images are natural close-up photographs of one side of a food product, with the nutrition facts box clearly visible. All of the products are Spanish.

Examples

[Images: Example1, Example2]

Ground truth description

All images have been labeled by hand with the ground truth for 8 nutritional values: energy (kJ), energy (kcal), fat (g), saturated fat (g), carbohydrates (g), sugar (g), protein (g) and salt (g). Other nutritional values may be present in the annotations, but we did not take them into account because they do not appear in the nutrition facts chart of most products. Since all the products are Spanish and the charts in the images are in Spanish, the ground truth is provided in Spanish. The English equivalents are the following:

  • energía_kj = energy (kJ)
  • energía_kcal = energy (kcal)
  • grasa = fat
  • saturada = saturated fat
  • hidratos = carbohydrates
  • azúcar = sugar
  • proteínas = protein
  • sal = salt

The annotations are provided as one CSV file per image. Each file has two or three columns, depending on how much information appears on the product. The first column always lists the nutritional values that appear in the image. The second column contains the quantity of each value per 100 g of product. The third column, if present, contains the quantity of each value per serving. Not every product label includes per-serving information, so this column is provided only when it appears on the label.

Example

energía_kj 802 3246
energía_kcal 190 770
grasa 4.5 18
saturada 2.7 11
hidratos 28 112
azúcar 3.4 14
proteínas 8.5 34
sal 1.2 5.1
fibra 2.1 8.6
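
As an illustration of the format described above, the following is a minimal sketch of how one annotation file could be read in Python. It assumes the columns are separated by whitespace or commas and that decimals use a point, as in the example; the actual files may use a different delimiter. The names `TARGET_FIELDS` and `parse_annotation` are hypothetical and not part of this repository.

```python
import re

# The eight evaluated fields (Spanish annotation name -> English meaning).
TARGET_FIELDS = {
    "energía_kj": "energy (kJ)",
    "energía_kcal": "energy (kcal)",
    "grasa": "fat",
    "saturada": "saturated fat",
    "hidratos": "carbohydrates",
    "azúcar": "sugar",
    "proteínas": "protein",
    "sal": "salt",
}


def parse_annotation(text):
    """Return {field: (per_100g, per_serving or None)} for one annotation file.

    Assumption: columns are separated by whitespace or commas.
    """
    values = {}
    for line in text.splitlines():
        parts = [p for p in re.split(r"[,\s]+", line.strip()) if p]
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            per_100g = float(parts[1])
            per_serving = float(parts[2]) if len(parts) > 2 else None
        except ValueError:
            continue  # non-numeric row, e.g. a header
        values[parts[0]] = (per_100g, per_serving)
    return values


sample = """energía_kj 802 3246
energía_kcal 190 770
grasa 4.5 18
saturada 2.7 11
hidratos 28 112
azúcar 3.4 14
proteínas 8.5 34
sal 1.2 5.1
fibra 2.1 8.6"""

parsed = parse_annotation(sample)
# Keep only the per-100 g values of the eight evaluated fields.
targets = {name: v[0] for name, v in parsed.items() if name in TARGET_FIELDS}
```

Here `targets` drops `fibra`, which is present in the file but is not one of the eight evaluated values.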

Data download

You can download the images from here and the ground truth from here

Data extraction task

The goal of this task is to extract the 8 nutritional values listed above, per 100 g, from the image itself. To do so, we used the open-source OCR engine Tesseract together with our nutrition facts chart detector. Several preprocessing and postprocessing algorithms are also applied so that the extracted information is structured and as accurate as possible. As a metric, we use the number of errors made in each image, obtaining the following results.

We also evaluated each nutritional value separately. The second table shows the number of errors for each of them.

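Number of errors per image

| Number of errors | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of transcriptions | 144 (23%) | 95 (15%) | 105 (17%) | 71 (11%) | 62 (10%) | 58 (9%) | 46 (7%) | 18 (3%) | 34 (5%) | 633 |

Number of errors per nutritional value

| | Energy (kJ) | Energy (kcal) | Fat (g) | Saturated fat (g) | Carbohydrates (g) | Sugar (g) | Protein (g) | Salt (g) |
|---|---|---|---|---|---|---|---|---|
| Number of errors | 137 (22%) | 139 (22%) | 226 (36%) | 232 (37%) | 225 (36%) | 227 (36%) | 285 (45%) | 259 (41%) |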
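
The error metric reported above can be sketched as follows. This is a minimal illustration rather than the evaluation code used for these results. It assumes that a value counts as an error when the predicted per-100 g number is missing or differs from the ground truth, and that every ground-truth annotation contains all 8 fields. The function names and the toy values are made up for the example.

```python
from collections import Counter

FIELDS = [
    "energía_kj", "energía_kcal", "grasa", "saturada",
    "hidratos", "azúcar", "proteínas", "sal",
]


def wrong_fields(predicted, truth):
    """List the fields whose prediction is missing or differs from the truth.

    Assumption: exact numeric match is required for a value to be correct.
    """
    return [f for f in FIELDS if predicted.get(f) != truth[f]]


def summarize(pairs):
    """Aggregate (predicted, truth) pairs into the two tallies shown above."""
    errors_per_image = Counter()  # number of errors -> number of images
    errors_per_field = Counter()  # field name -> number of errors
    for predicted, truth in pairs:
        wrong = wrong_fields(predicted, truth)
        errors_per_image[len(wrong)] += 1
        errors_per_field.update(wrong)
    return errors_per_image, errors_per_field


# Toy data, invented for illustration only.
truth = {
    "energía_kj": 802.0, "energía_kcal": 190.0, "grasa": 4.5, "saturada": 2.7,
    "hidratos": 28.0, "azúcar": 3.4, "proteínas": 8.5, "sal": 1.2,
}
perfect = dict(truth)
flawed = {**truth, "sal": 12.0}  # misread decimal point
del flawed["grasa"]              # value not found by the OCR step

per_image, per_field = summarize([(perfect, truth), (flawed, truth)])
```

With this toy input, one image has 0 errors and one has 2 errors (`grasa` and `sal`). The two tables above are consistent with each other in the same way: both add up to 1,730 errors in total.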
