Commit

Address code page issues w/ Windows file paths
On Windows, HDF5 attempted to convert file paths passed to open() and
remove() to UTF-16 in order to handle Unicode file paths. This scheme
does not work when the system uses code pages to handle non-ASCII
file names.

As suggested in the forum post below, we now also try to see if we
can open the file with open(), which should handle systems where
non-ASCII code pages are in use.

https://forum.hdfgroup.org/t/open-create-hdf5-files-with-non-utf8-chars-such-as-shift-jis/11785
derobins committed Mar 18, 2024
1 parent 840476e commit 73f788a
Showing 1 changed file with 45 additions and 17 deletions.
src/H5system.c (62 changes: 45 additions & 17 deletions)
@@ -516,13 +516,10 @@ H5_get_utf16_str(const char *s)
 /*-------------------------------------------------------------------------
  * Function:    Wopen_utf8
  *
- * Purpose:     UTF-8 equivalent of open(2) for use on Windows.
- *              Converts a UTF-8 input path to UTF-16 and then opens the
- *              file via _wopen() under the hood
+ * Purpose:     UTF-8 equivalent of open(2) for use on Windows
  *
  * Return:      Success:    A POSIX file descriptor
  *              Failure:    -1
- *
  *-------------------------------------------------------------------------
  */
 int
@@ -532,10 +529,6 @@ Wopen_utf8(const char *path, int oflag, ...)
     wchar_t *wpath = NULL; /* UTF-16 version of the path */
     int      pmode = 0;    /* mode (optionally set via variable args) */

-    /* Convert the input UTF-8 path to UTF-16 */
-    if (NULL == (wpath = H5_get_utf16_str(path)))
-        goto done;
-
     /* _O_BINARY must be set in Windows to avoid CR-LF <-> LF EOL
      * transformations when performing I/O. Note that this will
      * produce Unix-style text files, though.
@@ -551,12 +544,33 @@ Wopen_utf8(const char *path, int oflag, ...)
         va_end(vl);
     }

-    /* Open the file */
+    /* First try opening the file with the normal POSIX open() call.
+     * This will handle ASCII without additional processing as well as
+     * systems where code pages are being used instead of true Unicode.
+     */
+    if ((fd = open(path, oflag, pmode)) >= 0) {
+        /* If this succeeds, we're done */
+        goto done;
+    }
+
+    if (errno == ENOENT) {
+        /* Not found, reset errno and try with UTF-16 */
+        errno = 0;
+    }
+    else {
+        /* Some other error (like permissions), so just exit */
+        goto done;
+    }
+
+    /* Convert the input UTF-8 path to UTF-16 */
+    if (NULL == (wpath = H5_get_utf16_str(path)))
+        goto done;
+
+    /* Open the file using a UTF-16 path */
     fd = _wopen(wpath, oflag, pmode);

 done:
-    if (wpath)
-        H5MM_xfree((void *)wpath);
+    H5MM_xfree(wpath);

     return fd;
 } /* end Wopen_utf8() */
@@ -565,12 +579,9 @@ Wopen_utf8(const char *path, int oflag, ...)
  * Function:    Wremove_utf8
  *
  * Purpose:     UTF-8 equivalent of remove(3) for use on Windows.
- *              Converts a UTF-8 input path to UTF-16 and then opens the
- *              file via _wremove() under the hood
  *
  * Return:      Success:    0
  *              Failure:    -1
- *
  *-------------------------------------------------------------------------
  */
 int
@@ -579,16 +590,33 @@ Wremove_utf8(const char *path)
     wchar_t *wpath = NULL; /* UTF-16 version of the path */
     int      ret   = -1;

+    /* First try opening the file with the normal POSIX open() call.
+     * This will handle ASCII without additional processing as well as
+     * systems where code pages are being used instead of true Unicode.
+     */
+    if ((ret = remove(path)) >= 0) {
+        /* If this succeeds, we're done */
+        goto done;
+    }
+
+    if (errno == ENOENT) {
+        /* Not found, reset errno and try with UTF-16 */
+        errno = 0;
+    }
+    else {
+        /* Some other error (like permissions), so just exit */
+        goto done;
+    }
+
     /* Convert the input UTF-8 path to UTF-16 */
     if (NULL == (wpath = H5_get_utf16_str(path)))
         goto done;

-    /* Open the file */
+    /* Remove the file using a UTF-16 path */
     ret = _wremove(wpath);

 done:
-    if (wpath)
-        H5MM_xfree((void *)wpath);
+    H5MM_xfree(wpath);

     return ret;
 } /* end Wremove_utf8() */
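
The control flow that both functions now share (try the narrow-path call first, fall back only on ENOENT, give up on any other error) can be sketched in isolation. The following is a minimal POSIX-only illustration, not the HDF5 code: `open_with_fallback`, `fallback_open_fn`, `stub_fallback`, and `fallback_calls` are hypothetical names introduced here, and the fallback callback merely stands in for the `H5_get_utf16_str()` + `_wopen()` step, which exists only on Windows.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type standing in for the UTF-16 path
 * (H5_get_utf16_str() + _wopen() in the commit). */
typedef int (*fallback_open_fn)(const char *path, int oflag, int pmode);

/* Try open() first; call the fallback only if the file was not found. */
int
open_with_fallback(const char *path, int oflag, int pmode, fallback_open_fn fallback)
{
    int fd;

    assert(path != NULL);
    assert(fallback != NULL);

    /* First attempt: the plain narrow-character open() */
    fd = open(path, oflag, (mode_t)pmode);
    if (fd >= 0)
        return fd;

    /* Any error other than "not found" (e.g. permissions) is final */
    if (errno != ENOENT)
        return -1;

    /* Not found: reset errno and let the fallback try */
    errno = 0;
    return fallback(path, oflag, pmode);
}

/* Hypothetical stub used only to observe whether the fallback ran */
int fallback_calls = 0;

int
stub_fallback(const char *path, int oflag, int pmode)
{
    (void)path;
    (void)oflag;
    (void)pmode;
    fallback_calls++;
    errno = ENOENT;
    return -1;
}
```

Passing `stub_fallback` as the last argument makes the three cases visible: an existing path returns a descriptor without touching the fallback, a missing path invokes the fallback exactly once, and a path that fails for another reason (such as ENOTDIR) returns -1 without invoking it.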

