Encode Unicode in X-XlsForm-FormId-Fallback #54

Closed
matthew-white opened this issue May 8, 2023 · 3 comments · Fixed by #55

@matthew-white
Member

Software and hardware versions

pyodk 0.3.0, Python v3.11.3

Problem description

I'm noticing that pyODK doesn't encode the X-XlsForm-FormId-Fallback header. Central expects the header to be ASCII. Unicode is expected to be URL-encoded. (pyxform-http is the one to decode it.) This came up in Central in getodk/central#196.

That said, I'm not sure to what extent this is a real problem. I tried using client.forms.update() to send an XLSForm with Unicode in its filename, and pyODK seemed happy to send a Unicode header. If the Central API and pyxform-http are happy to receive a Unicode header, then the only issue would be filenames that contain % (filenames for which the filename and the URL-decoded filename are not the same).
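
To illustrate the `%` concern, here is a minimal sketch using only Python's standard library. It assumes the receiving side URL-decodes the header value, as described above. The helper name `survives_unencoded` and the sample stems are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote, unquote


def survives_unencoded(stem: str) -> bool:
    """Return True if URL-decoding the raw stem gives back the same stem.

    Hypothetical helper: it models a receiver that always URL-decodes
    the header value, even when the sender did not encode it.
    """
    return unquote(stem) == stem


# A plain ASCII stem is unchanged by decoding.
print(survives_unencoded("survey_v2"))
# A stem containing a valid percent escape is altered by decoding.
print(survives_unencoded("50%25_done"))
# Encoding first makes the round trip lossless.
print(unquote(quote("50%25_done")) == "50%25_done")
```

If the sender always encodes the stem, the receiver's decode step restores the original value, including any literal `%`.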

@lindsay-stevens
Contributor

@matthew-white thanks for the report. It would help a lot if you could please 1) add example code to reproduce the issue, 2) show the expected result, and 3) show the actual result.

@matthew-white
Member Author

matthew-white commented May 10, 2023

I've uploaded a form with an ID of ✅ here: https://staging.getodk.cloud/#/projects/22/forms/%E2%9C%85. The issue can be reproduced by downloading that form, then running the following:

client.forms.update(project_id=22, form_id='✅', definition='✅.xlsx')

Without changing the version string in the XLSForm, I think I should receive a 409 error response. However, the request doesn't seem to get off the ground. I see the following error:

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1256, in putheader
    values[i] = one_value.encode('latin-1')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2705' in position 0: ordinal not in range(256)

I'm not very familiar with Python, but I'm thinking that this could be solved by calling urllib.parse.quote() on file_path.stem here:

"X-XlsForm-FormId-Fallback": file_path.stem,
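
A rough sketch of that suggestion, not the actual pyODK code: the function name `fallback_header` is hypothetical, and only `pathlib.Path.stem` and `urllib.parse.quote` are real standard-library calls. It assumes the header value should be the percent-encoded UTF-8 form of the file stem.

```python
from pathlib import Path
from urllib.parse import quote, unquote


def fallback_header(definition: str) -> dict:
    """Build the fallback header with a URL-encoded file stem.

    Hypothetical helper for illustration only; the real pyODK code
    may be structured differently.
    """
    file_path = Path(definition)
    # quote() percent-encodes the stem as UTF-8 bytes, so the result is ASCII.
    return {"X-XlsForm-FormId-Fallback": quote(file_path.stem)}


headers = fallback_header("✅.xlsx")
value = headers["X-XlsForm-FormId-Fallback"]
print(value)  # %E2%9C%85

# The encoded value is plain ASCII, so the latin-1 encoding step in
# http.client no longer raises UnicodeEncodeError.
value.encode("latin-1")

# A receiver that URL-decodes the header recovers the original stem.
print(unquote(value) == "✅")
```

The encoded value matches the form ID as it appears in the staging URL above (`%E2%9C%85`).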

That said, I'm not sure to what extent this is a real problem. … pyODK seemed happy to send a Unicode header. … the only issue would be filenames that contain %

Looking at it more, I think I was wrong about this. I think pyODK actually generally isn't willing to send a Unicode X-XlsForm-FormId-Fallback header. I think I got confused by the form at getodk/central#196. When I download that form from GitHub, the resulting file name is tést.xlsx (encoded as te%CC%81st.xlsx). But the form ID in the XLSForm is tést, and even though the two look the same, the latter is encoded as t%C3%A9st. I guess there are multiple ways to input é, and while one (%C3%A9) can be encoded as latin-1, the other (e%CC%81) cannot. I tried to avoid this issue in my reproduction steps above by using ✅, which is definitely not latin-1.
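
The two spellings of é can be compared with a short standard-library sketch. It assumes the downloaded file name uses the decomposed form (e followed by a combining acute accent, U+0301) and the form ID uses the precomposed character (U+00E9), which is what the encodings quoted above suggest. The helper name `fits_latin1` is hypothetical.

```python
import unicodedata
from urllib.parse import quote

composed = "t\u00e9st"     # precomposed é (U+00E9)
decomposed = "te\u0301st"  # e + combining acute accent (U+0301)


def fits_latin1(text: str) -> bool:
    """Return True if every character can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(quote(composed))    # t%C3%A9st
print(quote(decomposed))  # te%CC%81st
print(fits_latin1(composed), fits_latin1(decomposed))  # True False

# The strings differ as code points but are equal after NFC normalization.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

This is why one file name slipped through as a raw header while the other did not: only the precomposed character falls inside the latin-1 range.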

@lindsay-stevens
Contributor

@matthew-white thanks for these details. I've put together a draft PR (linked above). I haven't tested it against Central yet but if you would like to try it out please let me know how it goes.
