-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent creation of invalid CF variable names. #3009
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
* All var names being written to NetCDF are now CF compliant. Non alpha-numeric characters are replaced with '_', and must always have a leading letter. | ||
Ref: https://github.com/SciTools/iris/pull/2930 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1339,6 +1339,27 @@ def _get_dim_names(self, cube): | |
dimension_names.append(dim_name) | ||
return dimension_names | ||
|
||
@staticmethod | ||
def cf_valid_var_name(var_name): | ||
""" | ||
Return a valid CF var_name given a potentially invalid name. | ||
|
||
Args: | ||
|
||
* var_name (str): | ||
The var_name to normalise | ||
|
||
Returns: | ||
A var_name suitable for passing through for variable creation. | ||
|
||
""" | ||
# Replace invalid charaters with an underscore ("_"). | ||
var_name = re.sub(r'[^a-zA-Z0-9]', "_", var_name) | ||
# Ensure the variable name starts with a letter. | ||
if re.match(r'^[^a-zA-Z]', var_name): | ||
var_name = 'var_{}'.format(var_name) | ||
return var_name | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. According to this by unidata, it is valid to have multibyte, UTF-8 encoded, NFC-normalized Unicode characters as part of the naming convention for netCDF variables. I guess we should also support that, otherwise we may impact some users. I had a peek at the CF Metedata conventions for variables, but it appears to state that the convention doesn't standardise variable names. But the naming conventions don't mention unicode at all.. Do we follow that to the letter, or consider that an oversight? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to also consider There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Quite possibly, yes. I kept this one limited, but happy to creep the scope if you force me 😄 |
||
|
||
@staticmethod | ||
def _cf_coord_identity(coord): | ||
""" | ||
|
@@ -1448,6 +1469,7 @@ def _get_cube_variable_name(self, cube): | |
# Convert to lower case and replace whitespace by underscores. | ||
cf_name = '_'.join(cube.name().lower().split()) | ||
|
||
cf_name = self.cf_valid_var_name(cf_name) | ||
return cf_name | ||
|
||
def _get_coord_variable_name(self, cube, coord): | ||
|
@@ -1480,6 +1502,8 @@ def _get_coord_variable_name(self, cube, coord): | |
name = 'unknown_scalar' | ||
# Convert to lower case and replace whitespace by underscores. | ||
cf_name = '_'.join(name.lower().split()) | ||
|
||
cf_name = self.cf_valid_var_name(cf_name) | ||
return cf_name | ||
|
||
def _create_cf_cell_measure_variable(self, cube, dimension_names, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# (C) British Crown Copyright 2013 - 2017, Met Office | ||
# (C) British Crown Copyright 2013 - 2018, Met Office | ||
# | ||
# This file is part of Iris. | ||
# | ||
|
@@ -472,6 +472,33 @@ def test_masked_byte_fill_value_passed(self): | |
pass | ||
|
||
|
||
class Test_cf_valid_var_name(tests.IrisTest): | ||
def test_no_replacement(self): | ||
self.assertEqual(Saver.cf_valid_var_name('valid_Nam3'), | ||
'valid_Nam3') | ||
|
||
def test_special_chars(self): | ||
self.assertEqual(Saver.cf_valid_var_name('inv?alid'), | ||
'inv_alid') | ||
|
||
def test_leading_underscore(self): | ||
self.assertEqual(Saver.cf_valid_var_name('_invalid'), | ||
'var__invalid') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we care to have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't. Do you? |
||
|
||
def test_leading_number(self): | ||
self.assertEqual(Saver.cf_valid_var_name('2invalid'), | ||
'var_2invalid') | ||
|
||
def test_leading_invalid(self): | ||
self.assertEqual(Saver.cf_valid_var_name('?invalid'), | ||
'var__invalid') | ||
|
||
def test_no_hyphen(self): | ||
# CF explicitly prohibits hyphen, even though it is fine in NetCDF. | ||
self.assertEqual(Saver.cf_valid_var_name('valid-netcdf'), | ||
'valid_netcdf') | ||
|
||
|
||
class _Common__check_attribute_compliance(object): | ||
def setUp(self): | ||
self.container = mock.Mock(name='container', attributes={}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just use
r'[\W]'
here. And what about unicode characters?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I can never remember if
\W
contains the-
character. Given that, my preference is explicit. Happy to be convinced otherwise though.