Encoding not handled correctly for natural earth data #739
Comments
This sounds reasonable if it isn't part of the core shapefile capability. Out of interest, have you tried loading the shapefiles with Fiona? Is there an assumed encoding there?
I tried it. Here's a small sample that reads the 50m and 10m files with fiona under each of the cp1252 and utf-8 encodings, so you can see the difference in the 'name' property (which is 'NAME' in the 10m file).

    import cartopy.io.shapereader as sr
    import fiona
    from itertools import product
    import regex as re

    for resolution, encoding in product(['50m', '10m'], ['cp1252', 'utf-8']):
        filename = sr.natural_earth(
            resolution=resolution,
            category='cultural',
            name='admin_0_map_subunits')
        # Open the same file under each candidate encoding.
        source = fiona.open(filename, encoding=encoding)
        print(filename, len(source))
        try:
            for f in source:
                # The property is 'name' in the 50m file and 'NAME' in the 10m file.
                for propname in ['name', 'NAME']:
                    try:
                        name = f['properties'][propname]
                        break
                    except KeyError:
                        name = '<missing>'
                # Report only names that contain non-ASCII characters.
                if re.search(r'[\u0080-\u7fff]', name):
                    print("%-4s %-8s %-30s ==> %-30s" % (resolution, encoding, repr(name), name))
        except Exception:
            print("ERROR DURING", resolution, encoding)
        source.close()
Incidentally, the fiona user's manual even says:
Excellent. Thank you @jtbraun. I think fiona is becoming more readily installable, and is a reasonable optional dependency for cartopy. The upshot will be huge performance boosts, which is always nice 😄
Prior to version 3.x of the natural earth data, the strings inside the *.dbf files were encoded as Windows-1252 as documented here: http://www.naturalearthdata.com/features/
Starting with the 3.x versions, the *.dbf files are encoded with UTF-8, as mentioned here: nvkelso/natural-earth-vector#89
At some point the zip files began including a .cpg file (like ne_10m_admin_0_map_subunits.cpg), whose contents specify the character encoding (UTF-8 in the example given).
In my opinion, since cartopy.io.shapereader.natural_earth() does the magic downloading of the Natural Earth data, it should also unzip and cache the *.cpg file and the *.VERSION.txt file. It should read the encoding from the *.cpg file; if that file doesn't exist, it should read the version and assume Windows-1252 for versions before 3.x and UTF-8 for 3.x and later.
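A minimal sketch of that lookup order, not cartopy code. The helper name `guess_dbf_encoding` is hypothetical, and several details are assumptions: that the sidecar files sit next to the .shp file with the same base name, that the version file starts with a dotted version number, and that Windows-1252 is the right fallback when neither file is present.

```python
import os
import re


def guess_dbf_encoding(shp_path, default="cp1252"):
    """Guess the .dbf text encoding for a Natural Earth shapefile.

    Hypothetical helper: the file layout and the default are assumptions.
    """
    base = os.path.splitext(shp_path)[0]

    # 1. Prefer the .cpg sidecar, which names the encoding directly.
    cpg_path = base + ".cpg"
    if os.path.exists(cpg_path):
        with open(cpg_path, "r", encoding="ascii", errors="ignore") as f:
            declared = f.read().strip()
        if declared:
            return declared

    # 2. Fall back to the version file: 3.x and later are UTF-8,
    #    earlier releases are Windows-1252.
    version_path = base + ".VERSION.txt"
    if os.path.exists(version_path):
        with open(version_path, "r", encoding="ascii", errors="ignore") as f:
            match = re.match(r"\s*(\d+)", f.read())
        if match:
            return "utf-8" if int(match.group(1)) >= 3 else "cp1252"

    # 3. No information at all: use the caller's default (an assumption).
    return default
```

The value from the .cpg file is returned as written, so a caller would still need to check that it is a codec name Python recognises before passing it on.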
Then, pyshp (shapefile.py) needs to be modified to allow the encoding to be specified. Today it assumes UTF-8 when sys.version_info[0] == 3, and assumes nothing (passes the bytes back and forth) when sys.version_info[0] != 3. (See GeospatialPython/pyshp#46.)
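To illustrate why the encoding has to be a parameter rather than a guess, here is a small standard-library sketch. `decode_dbf_text` is a hypothetical helper, not part of pyshp, and the sample bytes are made up to mimic a UTF-8 encoded text field as a 3.x .dbf file would store it.

```python
def decode_dbf_text(raw, encoding):
    """Decode one raw .dbf text field (hypothetical helper, not pyshp API)."""
    # Character fields are typically padded with spaces (sometimes NULs);
    # strip the padding before decoding.
    return raw.rstrip(b" \x00").decode(encoding)


# Sample bytes: "Curaçao" encoded as UTF-8 and space-padded to 20 bytes.
raw = "Cura\u00e7ao".encode("utf-8").ljust(20)

right = decode_dbf_text(raw, "utf-8")   # the intended name
wrong = decode_dbf_text(raw, "cp1252")  # same bytes, wrong codec

print(right)
print(wrong)
```

Decoding with the wrong codec does not raise here; it silently yields mojibake, which is why the reader needs to be told the encoding instead of assuming one.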