
service_test meaning #20

Open
ericluo04 opened this issue Feb 15, 2022 · 3 comments

@ericluo04

ericluo04 commented Feb 15, 2022

In the labeled data for training and validation, I noticed a service_test column. I might have missed it, but I couldn't find any information on it in the paper or the repository. What does this variable refer to? Is it the data used for testing the "classification accuracy of commercial services"?

Thanks!

@andsteing

andsteing commented Feb 24, 2022

I noticed that, for the images with service_test=True, the number of examples within every age group is roughly balanced with respect to gender and race (this is not the case for the images with service_test=False).

For example, within the training dataset:

| service_test   | age          |   Female |   Male |
|:---------------|:-------------|---------:|-------:|
| False          | 0-2          |      506 |    908 |
| False          | 10-19        |     2583 |   1743 |
| False          | 20-29        |     6809 |   3773 |
| False          | 3-9          |     2819 |   3902 |
| False          | 30-39        |     4228 |   6624 |
| False          | 40-49        |     1810 |   4398 |
| False          | 50-59        |     1125 |   2970 |
| False          | 60-69        |      573 |   1202 |
| False          | more than 70 |      243 |    276 |
| True           | 0-2          |      186 |    192 |
| True           | 10-19        |     2414 |   2363 |
| True           | 20-29        |     7454 |   7562 |
| True           | 3-9          |     1820 |   1867 |
| True           | 30-39        |     4192 |   4206 |
| True           | 40-49        |     2264 |   2272 |
| True           | 50-59        |     1069 |   1064 |
| True           | 60-69        |      467 |    537 |
| True           | more than 70 |      196 |    127 |

| service_test   | age          |   Black |   East Asian |   Indian |   Latino_Hispanic |   Middle Eastern |   Southeast Asian |   White |
|:---------------|:-------------|--------:|-------------:|---------:|------------------:|-----------------:|------------------:|--------:|
| False          | 0-2          |     221 |          358 |      116 |               135 |               76 |               164 |     344 |
| False          | 10-19        |     854 |          474 |      848 |               941 |               42 |               688 |     479 |
| False          | 20-29        |    1013 |         2616 |      926 |              1387 |              204 |              1398 |    3038 |
| False          | 3-9          |    1540 |         1215 |      991 |              1047 |              207 |               989 |     732 |
| False          | 30-39        |    1437 |         1162 |     1579 |              1733 |             1195 |               810 |    2936 |
| False          | 40-49        |     809 |          300 |      992 |              1306 |              876 |               410 |    1515 |
| False          | 50-59        |     452 |          219 |      680 |               819 |              536 |               321 |    1068 |
| False          | 60-69        |     115 |          132 |      321 |               201 |              293 |               188 |     525 |
| False          | more than 70 |      38 |           39 |      112 |                42 |               73 |               102 |     113 |
| True           | 0-2          |      58 |           50 |       53 |                54 |               58 |                50 |      55 |
| True           | 10-19        |     664 |          702 |      681 |               668 |              693 |               700 |     669 |
| True           | 20-29        |    2158 |         2149 |     2151 |              2144 |             2119 |              2112 |    2183 |
| True           | 3-9          |     532 |          521 |      539 |               542 |              514 |               534 |     505 |
| True           | 30-39        |    1182 |         1220 |     1192 |              1199 |             1203 |              1199 |    1203 |
| True           | 40-49        |     650 |          639 |      643 |               651 |              641 |               648 |     664 |
| True           | 50-59        |     312 |          305 |      304 |               305 |              301 |               295 |     311 |
| True           | 60-69        |     164 |          138 |      136 |               154 |              142 |               119 |     151 |
| True           | more than 70 |      34 |           48 |       55 |                39 |               43 |                68 |      36 |

| service_test   | age          |   Female/Black |   Female/East Asian |   Female/Indian |   Female/Latino_Hispanic |   Female/Middle Eastern |   Female/Southeast Asian |   Female/White |   Male/Black |   Male/East Asian |   Male/Indian |   Male/Latino_Hispanic |   Male/Middle Eastern |   Male/Southeast Asian |   Male/White |
|:---------------|:-------------|---------------:|--------------------:|----------------:|-------------------------:|------------------------:|-------------------------:|---------------:|-------------:|------------------:|--------------:|-----------------------:|----------------------:|-----------------------:|-------------:|
| False          | 0-2          |             62 |                 122 |              51 |                       80 |                       0 |                       51 |            140 |          159 |               236 |            65 |                     55 |                    76 |                    113 |          204 |
| False          | 10-19        |            512 |                 275 |             551 |                      575 |                       0 |                      364 |            306 |          342 |               199 |           297 |                    366 |                    42 |                    324 |          173 |
| False          | 20-29        |            699 |                1820 |             648 |                      946 |                       0 |                      823 |           1873 |          314 |               796 |           278 |                    441 |                   204 |                    575 |         1165 |
| False          | 3-9          |            584 |                 414 |             436 |                      570 |                       0 |                      430 |            385 |          956 |               801 |           555 |                    477 |                   207 |                    559 |          347 |
| False          | 30-39        |            733 |                 509 |             577 |                      778 |                       0 |                      328 |           1303 |          704 |               653 |          1002 |                    955 |                  1195 |                    482 |         1633 |
| False          | 40-49        |            353 |                  26 |             343 |                      516 |                       0 |                       88 |            484 |          456 |               274 |           649 |                    790 |                   876 |                    322 |         1031 |
| False          | 50-59        |            217 |                  38 |             237 |                      276 |                       0 |                      101 |            256 |          235 |               181 |           443 |                    543 |                   536 |                    220 |          812 |
| False          | 60-69        |             85 |                  32 |             158 |                       90 |                       0 |                       88 |            120 |           30 |               100 |           163 |                    111 |                   293 |                    100 |          405 |
| False          | more than 70 |             25 |                  13 |              72 |                       26 |                       0 |                       57 |             50 |           13 |                26 |            40 |                     16 |                    73 |                     45 |           63 |
| True           | 0-2          |             29 |                  24 |              27 |                       27 |                      28 |                       25 |             26 |           29 |                26 |            26 |                     27 |                    30 |                     25 |           29 |
| True           | 10-19        |            338 |                 357 |             339 |                      343 |                     343 |                      354 |            340 |          326 |               345 |           342 |                    325 |                   350 |                    346 |          329 |
| True           | 20-29        |           1070 |                1082 |            1056 |                     1058 |                    1041 |                     1048 |           1099 |         1088 |              1067 |          1095 |                   1086 |                  1078 |                   1064 |         1084 |
| True           | 3-9          |            258 |                 261 |             259 |                      267 |                     248 |                      272 |            255 |          274 |               260 |           280 |                    275 |                   266 |                    262 |          250 |
| True           | 30-39        |            590 |                 606 |             587 |                      597 |                     616 |                      588 |            608 |          592 |               614 |           605 |                    602 |                   587 |                    611 |          595 |
| True           | 40-49        |            329 |                 322 |             316 |                      316 |                     327 |                      324 |            330 |          321 |               317 |           327 |                    335 |                   314 |                    324 |          334 |
| True           | 50-59        |            154 |                 148 |             151 |                      150 |                     155 |                      157 |            154 |          158 |               157 |           153 |                    155 |                   146 |                    138 |          157 |
| True           | 60-69        |             80 |                  62 |              66 |                       71 |                      67 |                       48 |             73 |           84 |                76 |            70 |                     83 |                    75 |                     71 |           78 |
| True           | more than 70 |             19 |                  30 |              35 |                       29 |                      22 |                       37 |             24 |           15 |                18 |            20 |                     10 |                    21 |                     31 |           12 |
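For anyone who wants to reproduce tables like these, here is a minimal standard-library sketch of the grouping. It assumes the label CSV has `age`, `gender`, `race` and `service_test` columns (matching the table headers above); the file name in the comment is an assumption, and the inline sample rows are made up purely for illustration, not taken from FairFace.

```python
import csv
import io
from collections import Counter


def count_by(rows, keys):
    """Count label rows for every combination of values in the given columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# Made-up stand-in for the label CSV (not real FairFace rows). csv.DictReader
# returns every value as a string, so service_test is "True" or "False".
SAMPLE = """file,age,gender,race,service_test
a.jpg,20-29,Female,White,True
b.jpg,20-29,Male,White,True
c.jpg,20-29,Male,Black,False
d.jpg,20-29,Male,Black,False
e.jpg,0-2,Female,Indian,False
"""

# For the real labels, something like this (file name assumed):
# with open("fairface_label_train.csv", newline="") as f:
#     rows = list(csv.DictReader(f))
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Same grouping as the first table (gender) and the third table (gender/race).
by_gender = count_by(rows, ["service_test", "age", "gender"])
by_gender_race = count_by(rows, ["service_test", "age", "gender", "race"])

for key, n in sorted(by_gender.items()):
    print(*key, n)
```

Turning the counts into a wide table (one column per gender or race) is then just a matter of pivoting on the last key element.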

@aaronsnoswell

+1 for this issue. I would also like to know what the service_test column means. Is it just an indicator marking the balanced subset (i.e., the rows with service_test == True)?

@zoltanfarkasgis

There are great statistics in the previous posts, and they prompted me to look into the label statistics as well.

When filtered by service_test == True, both the Train and Test datasets are fairly balanced for race and gender (and for every combination of the two). Without this filter, the dataset is skewed towards White and Male.
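To put a number on "fairly balanced", one simple option (my own choice of metric, not something from the paper or the repository) is the ratio of the largest to the smallest group count. The sketch below applies it to the 20-29 race counts from the training-set table posted above.

```python
def max_min_ratio(counts):
    """Largest group count divided by the smallest; 1.0 means perfectly balanced."""
    return max(counts) / min(counts)


# Race counts for the 20-29 age group in the training set, copied from the
# table above (order: Black, East Asian, Indian, Latino_Hispanic,
# Middle Eastern, Southeast Asian, White).
race_20_29 = {
    "False": [1013, 2616, 926, 1387, 204, 1398, 3038],
    "True": [2158, 2149, 2151, 2144, 2119, 2112, 2183],
}

for flag, counts in race_20_29.items():
    print(f"service_test={flag}: max/min = {max_min_ratio(counts):.2f}")
```

For these rows the ratio is roughly 14.9 without the filter and about 1.03 with it. Max/min is deliberately crude: it depends only on the two extreme groups (here the small Middle Eastern count when service_test=False), so an entropy-based measure may be preferable when comparing many groups.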

So if we use race and gender as a combined filter (e.g., White vs. Black, or White+Male vs. Black+Female), the distribution within the age groups is also fairly balanced, although the population gets very small in some subcategories and the balance at such small numbers can be off. Good job by the authors.
The second post in this thread demonstrates this nicely.

The balance between age groups is trickier: the full dataset is strongly imbalanced by age. The 20-29 and 30-39 groups together make up more than 50% of the images, and the 20-29 group alone is roughly 30% (if all nine age groups were equal, each would be about 11%).
Presumably the authors never aimed to balance the age groups (it was likely deemed too difficult or too constraining to collect a similar number of images for, e.g., the 0-2 and 20-29 groups, especially across ALL race and gender combinations). This is reasonable.
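These percentages can be checked against the training-set table in the second post. The sketch below simply sums the Female and Male counts posted there (both service_test values); it covers only the training split, so the figures for the full dataset including validation may differ slightly.

```python
# (Female, Male) counts per age group copied from the training-set table in the
# second post: first pair is service_test=False, second is service_test=True.
training_counts = {
    "0-2": [(506, 908), (186, 192)],
    "3-9": [(2819, 3902), (1820, 1867)],
    "10-19": [(2583, 1743), (2414, 2363)],
    "20-29": [(6809, 3773), (7454, 7562)],
    "30-39": [(4228, 6624), (4192, 4206)],
    "40-49": [(1810, 4398), (2264, 2272)],
    "50-59": [(1125, 2970), (1069, 1064)],
    "60-69": [(573, 1202), (467, 537)],
    "more than 70": [(243, 276), (196, 127)],
}

# Total images per age group, then each group's share of the whole split.
age_counts = {age: sum(f + m for f, m in pairs) for age, pairs in training_counts.items()}
total = sum(age_counts.values())
shares = {age: n / total for age, n in age_counts.items()}
uniform = 1 / len(age_counts)  # share per group if all nine were equal

print(f"total images:        {total}")
print(f"20-29 share:         {shares['20-29']:.1%}")
print(f"20-29 + 30-39 share: {shares['20-29'] + shares['30-39']:.1%}")
print(f"uniform share:       {uniform:.1%}")
```

With these numbers the training split has 86,744 images; 20-29 accounts for about 29.5% and 20-29 plus 30-39 for about 51.7%, against 11.1% per group under a uniform split.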

This does raise a question for training, though: since race and gender are balanced but age is not, the class weighting (e.g., in the loss calculation) might need to differ between the attributes. Unfortunately, the paper gives no such detail about the training recipe.
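As an illustration only (the paper does not describe its recipe, so this is not the authors' method), one common way to compensate for the age imbalance is inverse-frequency class weighting, `w_c = N / (K * n_c)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. The counts below are the service_test=True training rows from the table in the second post (Female + Male per age group).

```python
def inverse_frequency_weights(counts):
    """Per-class weights w_c = N / (K * n_c); rarer classes get larger weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


# Female + Male counts per age group for the service_test=True training rows,
# summed from the table in the second post.
age_counts = {
    "0-2": 378, "3-9": 3687, "10-19": 4777, "20-29": 15016, "30-39": 8398,
    "40-49": 4536, "50-59": 2133, "60-69": 1004, "more than 70": 323,
}

weights = inverse_frequency_weights(age_counts)
for age, w in weights.items():
    print(f"{age:>12}: {w:.2f}")
```

The weights come out at roughly 0.3 for 20-29 and about 13.8 for "more than 70". They could be passed as per-class weights to a weighted cross-entropy loss (for example the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`, in the age head's class-index order). Weights this far apart can make training noisy, so dampened variants (such as the square root of these weights) are sometimes used instead; whether any of this helps on FairFace is an empirical question.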


4 participants