Should `utils.iana_name()` return the actual IANA name? #572
Labels: enhancement (New feature or request)
`charset_normalizer.utils.iana_name('utf-8')` returns `'utf_8'`, which does not appear at all on https://www.iana.org/assignments/character-sets/character-sets.xhtml. It's called `UTF-8` there, or possibly `utf-8` (as the table notes, "no distinction is made between use of upper and lower case letters").

(The concrete use case that brought this up was serving arbitrary files over HTTP and generating an appropriate `content-type: text/plain; charset=UTF-8` header for them. I was quite surprised to get `charset=utf_8` instead, which browsers don't understand and then interpret wrongly.)

I've looked at the current implementation, which is based on `encodings.aliases` from the stdlib, but that explicitly talks about normalizing the names beforehand, because it is meant to look up Python modules AFAIU, whose syntax rules are quite different from those of IANA encoding names. So I'm not sure if that's actually an appropriate data source for that use case, or am I completely misunderstanding something here? I'll be grateful for any light that someone could shed on this.
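To make the mismatch concrete, here is a minimal sketch using only the stdlib. It first shows that the alias table yields Python module-style names with underscores, then works around this with a small lookup table. The table `PYTHON_TO_IANA` and the helper `content_type_header` are hypothetical names of my own, not part of `charset_normalizer`, and the mapping is deliberately incomplete.

```python
import encodings
from encodings.aliases import aliases

# The stdlib alias table maps normalized spellings to Python codec *module*
# names, which use underscores because they must be importable identifiers.
python_name = aliases[encodings.normalize_encoding("utf8")]  # 'utf_8'

# Hypothetical, hand-maintained mapping from Python codec names to the
# preferred MIME names listed in the IANA registry. Not exhaustive.
PYTHON_TO_IANA = {
    "utf_8": "UTF-8",
    "utf_16": "UTF-16",
    "latin_1": "ISO-8859-1",
    "ascii": "US-ASCII",
    "cp1252": "windows-1252",
}


def content_type_header(python_codec_name: str, mime_type: str = "text/plain") -> str:
    """Build a Content-Type value from a Python codec name (hypothetical helper)."""
    iana = PYTHON_TO_IANA.get(python_codec_name)
    if iana is None:
        # Unknown codec: omit the charset parameter rather than guess a name.
        return mime_type
    return f"{mime_type}; charset={iana}"


print(content_type_header(python_name))
```

Omitting the charset for unmapped codecs is a design choice in this sketch: a missing parameter seems safer than one that clients may not recognize.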