Determine which labels to exclude from Rekognition’s label set #4643
Comments
I have gone through the 3k labels (🥲) and pulled out the labels that I think we should exclude or consider to be excluded. Many of the "questionable" ones I think are fine, but we said (per #4662) that we would review each one so I think they're worth considering. Personally, I think they're okay to keep but open for a blocking objection there. Exclude
Questionable
|
@AetherUnbound I agree with both of your lists but am also willing to contend with anyone else's blocking objections on some of the questionable terms. |
On the excluded list, why are there labels like |
@sarayourfriend Here's the issue! |
Here are my suggestions. For context (because we discussed this process in the IP), I have not looked at Madison's recommendations, though I did read Krystle's comment which mentioned two specific terms in Madison's recommendations. Questionable
Exclude
Notes and observations
|
@AetherUnbound Can you include explanations for the terms you've suggested to exclude and ones you felt were questionable? There is some overlap between our questionable/exclude lists, and it looks like we definitely covered the unambiguously gendered terms in our exclude lists between the two of us (for which I also didn't add an explanation because they are clearly gendered). I'm very curious about some of the others, you recommended far more than I did, in particular those related to babies, childbirth, and hair style/colour. Regarding age: I had adult, teen, and child in the questionable category, but didn't include baby or senior citizen. Baby is, I think, rather unambiguous, and I couldn't think of any culturally sensitive reasons to exclude it like I could for adult, teen, and child. What was your reasoning? My thought process was similar for senior citizen. For "Ballerina", I didn't include it in my lists because I thought it was gender neutral, but it looks like it's a good candidate like "fireman" for us to transpose to a gender neutral term. Fireman can easily be turned into firefighter, and ballerina could easily be "ballet dancer". I think these are unambiguous gender neutral alternatives to those words. I noted some other possible terms like this but none were as clear as firefighter and ballet dancer. Regarding "barbie": what is the reason to exclude it? It comes from the toys category of labels, I'd expect it to only refer to the barbie style toy. Were you operating from a different assumption? I missed "Exchange Of Vows" in my review, but likewise cannot find a reason on my own to exclude it, so would appreciate an explanation of your thought process there. |
Thanks for providing your own list and notes @sarayourfriend! To answer you and @krysal - my inclusion of the other hair style/color labels was because they seemed to apply to the broader notion of "demographics". I don't feel particularly strongly about them (I was merely surprised to see them as part of the labels), so I'd be comfortable including them and removing them from my exclusions list. As for the age related ones, "Age" was one of the categories we discussed excluding explicitly in the IP. Even though I agree that baby is fairly unambiguous, there are still some cases that might be questionable or mislabeled (would an adult in baby clothes or a diaper be labeled "baby"? I could imagine cases where the model may mislabel those). On the other side, there may be conditions that might mistakenly label a child or young adult as a senior citizen (progeria, for instance). Given the nuance there, I do think it's best to avoid age terms entirely. "Barbie" also seemed like a gendered term to me but if we consider it as referring to the toy itself then perhaps it's not something we need to consider. I hadn't thought about mapping terms though! That gives us an interesting opportunity to capture some of these using a more appropriate term; I'm all for it and I like your suggestions. But your notes also bring forth an assumption I didn't realize I was making: I thought we might make all of the labels lower case before inserting them into the catalog. There indeed appears to be some information encapsulated in the capitalization though, so perhaps that may not always work. The reason I was thinking of doing this is that our tag collection endpoint is case-sensitive, and the more generic nature of these labels would be better represented without casing. Additionally, the Clarifai tags are all lower-case. What do you think? Perhaps we can lower case them if they are Caps Case, and leave as-is all other cases? (e.g. GPS or iPod) |
I feel strongly that we should not lowercase everything. Aside from brand names, there are proper nouns in these tags that shouldn't be generically lower cased like "Buddha", "Christ the Redeemer" (referring to the monument in Rio de Janeiro). The tags endpoint is case sensitive, and whether it should or shouldn't be is not a question relevant to the metadata in the catalogue, as far as I'm concerned, it's a question about how we index and search that metadata. The catalogue only provides that metadata, it does not dictate the indexed format of it with respect to search, and such a separation would be inherently and needlessly limiting to the flexibility of search. We don't need to make a decision like "lowercase all incoming tags" in the catalogue (there's no technical reason the catalogue of openly licensed works needs to do that) and whether search does that or not doesn't have anything to do with the catalogue. In fact, the catalogue should enable whatever option deemed necessary for search, whether search treats tags as case sensitive or not. Doing a blanket transformation like lowercasing the tags in the catalogue removes that flexibility from search. I don't think we should do anything other than correct what we deem to be orthographic errors, whether those are errors like "Ipod" where the capitalisation is plainly incorrect or "GPS" and "Atm", where the error is in stylistic consistency across the incoming dataset, rather than in the single instance. I'd only say that about enrichment metadata, not provider tags, for what it's worth. We do have full control here, so while I think we shouldn't do things like lowercase all the tags in the catalogue, correcting orthographic errors or inconsistencies does seem worthwhile. Whatever the clarifai tags are doing or did doesn't hold much sway here, I think. 
We didn't make decisions about that and don't know the context of them, and there's no direct reason to choose to be consistent with something we cannot explain or even know the provenance of. On the other hand, I don't think that applies to how we do the inclusion check. For the purposes of checking whether a label should be included, I think it's fine to use a separate list of normalised labels so that e.g., using a Python of the labels makes the check O(1) for each label. To clarify, I'm only talking about label storage in the catalogue, not saying we need to at all times treat the labels in their original cases. In fact, this would solve the problem of needing to make sure our comparison of labels for the purposes of inclusion/exclusion does not affect how we fix orthographic errors, which we should do after the check (so that our orthographic fixes aren't limited by needing to keep characters in the same location in the string to use the upstream lists) but at least reduces the cognitive overhead of wondering whether those orthographic differences could affect the include/exclude logic.
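To illustrate the inclusion check described above, here is a minimal sketch, assuming the normalised labels are held in a Python set and that normalisation means trimming whitespace and case folding. The names `EXCLUDED_LABELS`, `normalise`, and `filter_labels` are hypothetical and do not come from the Openverse codebase; the sample entries are only terms mentioned in this thread.

```python
# Hypothetical sample of excluded labels, using only terms named in this thread.
EXCLUDED_LABELS = ["Baby", "Senior Citizen", "Tribe"]


def normalise(label: str) -> str:
    """Normalise a label for comparison only (assumed: strip + casefold)."""
    return label.strip().casefold()


# Built once; membership checks against a frozenset are O(1) on average.
EXCLUDED_NORMALISED = frozenset(normalise(label) for label in EXCLUDED_LABELS)


def filter_labels(labels: list[str]) -> list[str]:
    """Drop excluded labels, returning the survivors in their original casing."""
    return [
        label for label in labels if normalise(label) not in EXCLUDED_NORMALISED
    ]


if __name__ == "__main__":
    # Casing is used only for the comparison; kept labels are not modified.
    print(filter_labels(["Christ the Redeemer", "SENIOR CITIZEN", "baby", "GPS"]))
    # ['Christ the Redeemer', 'GPS']
```

Because the comparison only ever sees the normalised form, any orthographic fixes can be applied to the surviving labels afterwards without affecting which labels pass the check.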
I don't think it's whether we consider it to be that. Rekognition says that is the case, the label is in the toys category. Regarding the age labels, I agree, let's filter all of them. For the hair, I might agree with hair style but I'm not sure about hair colour. On the other hand, I also don't think it's important enough to argue for including them (I'd just as well exclude all of these tags in favour of other ways of enriching our metadata that don't rely on machine vision) so let's filter them as well. Can you explain your rationale behind excluding "Exchange of Vows"? The exclude list would be then, your list, less Barbie, plus the addition of tribe, and hoe from my list? And then whether to exclude exchange of vows depends on understanding the rationale for it? Also: reading the Rekognition docs, it looks like the label lists we were looking at might be different from the labels used by Rekognition at the time that they processed the dataset in the grant. That's based on this section of their documentation on aliases:
(Emphasis mine) I think this includes an assumption that labels are being added and removed, and that they might not be in the current list of tags which we've reviewed in this issue. Two things come from this:
|
@AetherUnbound Thank you for clarifying. Regarding Exclude
I reviewed the Person Description, Profession, Symbols, and Flags categories, as these seemed the most likely to contain something related to the indicated criteria. If labels in Religion are all considered potentially sensitive (there are only 11), it might be better to exclude them altogether as well. |
FWIW I'm in favour of a broader list of exclusions, with the knowledge that we can easily reverse any of those decisions or change how we decide if a label is in/out using more context in the future. We won't make search worse by excluding "too many" labels or anything like that, and I suspect there may be greater value from using the labels in context with the existing metadata for many works than in isolation anyway. |
That's some really solid logic around keeping capitalization (especially given our thoughts around how we're treating the catalog), thanks for expressing that! IIRC, our search is case insensitive which is certainly what matters the most, so I agree that we don't need to alter the casing (except in the "errors" you've pointed out).
Again to me, this felt like it had the propensity to be gendered. But I concede that it is more neutral, and I'm fine with leaving it out of the excluded labels. And thanks too for your notes about the actual inclusion vs exclusion logic. I like the idea of having an
I hadn't considered using the categories as a way of doing more blanketed exclusions! That's a great idea Krystle, and I'm all for it. So to summarize, the final list is:
Which would mean this is the full exclusion list: Final Exclusions
And the orthographic/gendered corrections would be: Corrections made during insertion
@krysal @sarayourfriend @zackkrida, does the above look right? Another thing that this has me thinking...I think I may have been assuming going into this that we were going to exclude the labels as they were being added to the catalog so they never even made it in, but given the approach we've been taking with the catalog as a data warehouse, I'm not sure that's the best move anymore. What do you think? (CC @stacimc as well for just this particular paragraph in case you don't want to load in context for the rest of the convo!) |
I definitely missed this detail from the IP and at the time would have pushed to include them in the database and filter them at the filter data step of the data refresh: indeed, to stay consistent with the data warehouse approach. However, the stakes are far lower if we don't plan to use the excluded tags for any of the ideas we've discussed about them in the near-term and are making sure to keep the dataset in S3 (as recently clarified in the project thread); we can re-load the tags with new parameters and logic at any time, in that case. It would seem more consistent to have a single place where we filter data though 🙂 Also: exclusion list looks good to me! I would need to go back and look at the tags again to see if there were other capitalisation or orthographic changes, I don't remember them all off the top of my head and I stopped writing them down after the first few examples. The other note I brought up was about tags with hyphens in them, present in the tags from the "damage detection" category (presumably intended for insurance use cases?). They'll get excluded in their current form by the filter data step. Mostly wanted to just make sure that was documented so that we could address it in some way in the future if we wanted (maybe explicitly ignore tags from that category as well? or make appropriate orthographic transformations? or re-evaluate the hyphen-in-string exclusion logic?).
If the framing I used about the tag capitalisation helps clarify the role of the catalogue data compared to the data as searched, it would be a good thing for us to pull out into documentation about the architecture of our data and probably make sure the whole Openverse maintainers team are aware of it. It, like the catalogue being a data warehouse, is an important conceptual division between the cataloguing and retrieval aspects of our search domain. Our ongoing architectural discussion already bearing good fruit 😊 |
I'll go ahead and modify the IP one more time to include an explicit note that we'll be inserting all available Rekognition data, but filter it during the data refresh process. Then I'll make an issue based on that to ensure that work is captured, with clarity on using an inclusion-based filter while taking note of the "unreviewed" labels that the filter encounters. I'll also go through the Rekognition list one more time to see if there are any other capitalization/orthographic errors we'll need to mitigate.
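A rough sketch of what an inclusion-based filter that takes note of "unreviewed" labels could look like. Everything here is an assumption for illustration: the names `INCLUDED`, `EXCLUDED`, and `filter_reviewed` are hypothetical, the two sets hold only a couple of labels from this discussion, and "Hologram" stands in for a made-up label that was never reviewed.

```python
from collections import Counter

# Hypothetical reviewed lists, stored case-folded for comparison.
INCLUDED = frozenset({"gps", "barbie"})  # reviewed and kept
EXCLUDED = frozenset({"baby", "tribe"})  # reviewed and excluded


def filter_reviewed(labels: list[str], unreviewed: Counter) -> list[str]:
    """Keep only labels on the inclusion list.

    Labels on neither reviewed list are dropped as well, but they are counted
    in `unreviewed` so that they can be surfaced for a later review.
    """
    kept = []
    for label in labels:
        key = label.casefold()
        if key in INCLUDED:
            kept.append(label)
        elif key not in EXCLUDED:
            unreviewed[key] += 1
    return kept


if __name__ == "__main__":
    seen_unreviewed = Counter()
    kept = filter_reviewed(
        ["GPS", "Baby", "Hologram", "Barbie", "Hologram"], seen_unreviewed
    )
    print(kept)             # ['GPS', 'Barbie']
    print(seen_unreviewed)  # Counter({'hologram': 2})
```

The point of filtering by inclusion is that a label Rekognition added or renamed after our review is held back by default instead of slipping through an exclusion list that has never seen it.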
I'm actually not sure this is the case; we don't have any code in the current filter step that removes tags that include hyphens as far as I'm aware, nor in the enrich tags portion of the |
I've gone through and identified a few other corrections:
|
Perfect, thanks for looking for those orthographic errors too.
I misremembered the
Fine by me. I mentioned this privately and forgot to share it here: shall we also exclude the expressions ("Expressions and Emotions" category), based on Lisa Feldman Barrett's research on the accuracy of human emotional perception based on facial characteristics (video of Barrett discussing the topic)? I'd particularly wonder about the negative emotions, due to the significance of stigma and gendered aspects of those judgements, but we can keep it simple at the start by excluding them altogether? To give context for anyone reading who is worried about the expansiveness of the exclusion list: there are 3082 labels in Rekognition's current data set, and we're talking about excluding around 80 labels, so roughly 2.6% of the possible labels. In other words, a minuscule amount, and one that says nothing about the actual effective number of labels we're excluding from the real dataset of labelled images, which may not encompass all 3082 available labels. Besides that, I believe we have justified these exclusions under the conditions discussed in the implementation plan. |
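For reference, the share quoted above can be checked with a couple of lines. The figure of 80 is the approximate count mentioned in the comment, not an exact tally of the final list.

```python
total_labels = 3082    # labels in Rekognition's current label set
excluded_labels = 80   # approximate number of exclusions discussed here

share = excluded_labels / total_labels * 100
print(f"{share:.1f}% of the available labels")  # 2.6% of the available labels
```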
That's a fair point about the expressions & emotions too - I'm also fine excluding those. I've added the final list of exclusions & corrections to the issue description at the top. I feel like we've reached a good consensus on this, which is exciting! 😊 I think it might make sense to have that list codified in the project planning documents, I'll have a PR to do that which closes this issue. |
Description
This will involve a manual process of looking through each of the available labels for Rekognition and seeing if they match any of the criteria to be filtered. This process should be completed by two maintainers, and their list of exclusions discussed & combined. The excluded labels should then be saved in an accessible location, either on S3 or within the sensitive terms repository as a new file. Consent & approval should be sought from two other maintainers on the accuracy of the exclusion list prior to publishing.
Additional context
See this section of the IP.
Final exclusions
This is the list of exclusions we've determined, based on the discussion below.
Exclusions
Corrections
For various reasons (removing gender, capitalization correction, etc.) we plan on mapping the following terms to the corrected values.
Corrections
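As an illustration only, the corrections could be applied with a simple lookup table, run after the inclusion/exclusion check. The mapping below is an assumption: it contains just the examples raised in the discussion (the capitalisation of the replacement terms is a guess), not the final corrections list, and `CORRECTIONS` and `correct_label` are hypothetical names.

```python
# Illustrative subset only, built from examples mentioned in the discussion.
CORRECTIONS = {
    "Fireman": "Firefighter",      # gender-neutral replacement
    "Ballerina": "Ballet Dancer",  # gender-neutral replacement
    "Ipod": "iPod",                # capitalisation fix
    "Atm": "ATM",                  # consistency with labels such as "GPS"
}


def correct_label(label: str) -> str:
    """Return the corrected form of a label, or the label unchanged."""
    return CORRECTIONS.get(label, label)


if __name__ == "__main__":
    print([correct_label(label) for label in ["Fireman", "Atm", "Buddha"]])
    # ['Firefighter', 'ATM', 'Buddha']
```

Labels without an entry, including proper nouns like "Buddha", pass through untouched, which keeps the original capitalisation wherever no correction was agreed.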