Cannot export texts containing certain characters as UIMA CAS XMI #4058
Comments
The XML standard does not allow certain characters to be part of an XML document. While the XML 1.1 standard allows more characters than the XML 1.0 standard, some characters remain forbidden even in XML 1.1.
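To make the difference between the two standards concrete, here is a minimal sketch in Java that checks code points against the `Char` productions of XML 1.0 and XML 1.1 and reports the offset of the first forbidden character. The class and method names (`XmlCharCheck`, `firstIllegalOffset`) are hypothetical and are not taken from INCEpTION or UIMA. The sample string imitates the start of a ZIP local file header decoded as text, assuming a version field of `0x14 0x00`.

```java
final class XmlCharCheck {
    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if the code point matches the Char production of XML 1.1. */
    static boolean isLegalXml11(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char offset of the first forbidden character, or -1 if there is none. */
    static int firstIllegalOffset(String text, boolean xml11) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // an unpaired surrogate is returned as-is and rejected
            boolean legal = xml11 ? isLegalXml11(cp) : isLegalXml10(cp);
            if (!legal) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample: "PK", 0x03, 0x04, 0x14, 0x00
        String zipLike = "PK\u0003\u0004\u0014\u0000";
        System.out.println(firstIllegalOffset(zipLike, false)); // prints 2
        System.out.println(firstIllegalOffset(zipLike, true));  // prints 5
    }
}
```

Under these assumptions, XML 1.0 already rejects the control character at offset 2, while XML 1.1 accepts the control characters and only fails on `0x0` at offset 5.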
… XMI - Rename packages in XMI support module - Switch to modern spring factories declaration
Thanks. The thing is, I can export the individual documents I have annotated to the format. I just cannot export the entire project. Any idea why this might be?
Can you provide the part of the log output that contains the stack trace and maybe a few lines before it?
If you download the file as a plain text file and open it in a hex editor, you should see that the third byte in the data is a control character.
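As an alternative to a hex editor, the leading bytes can be checked programmatically. The following is a rough sketch, assuming the document content is available as a byte array. It tests for the ZIP local file header signature (`0x50 0x4B 0x03 0x04`, i.e. "PK" followed by two control bytes). The class and method names are hypothetical.

```java
final class ZipSignature {
    // Local file header signature defined by the ZIP format: "PK", 0x03, 0x04.
    private static final byte[] MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns true if the data starts with the ZIP local file header signature. */
    static boolean looksLikeZip(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

This is only a heuristic: an empty ZIP archive starts with a different signature, and a text file could in principle begin with the same four bytes.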
I'm able to export that file just fine in either XML 1.0 or 1.1:
That is very interesting since the code used to export the document should be the same in both instances. I wonder if you could share a project export privately with me for investigation? (Exported using "no secondary format"). Btw. does the document text actually start with
@reckart Sent you the file. I'm not sure where the
Removing the ZIP file from your document list fixes the problem.
Interesting. The ZIP file was an export of a file that I reimported to try to test the curation mode. Do you know why the import didn't properly unpack the ZIP file?
If you export a project as ZIP, you need to import that project through the project overview page again, not as a document. If you export a document as XMI, it comes down as a ZIP too, but you cannot import that ZIP back in directly. To upload an XMI file, you'd have to unzip the file and only upload the
… XMI - Enable sanitziing of illegal characters on export for XMI formats - Added option to control if sanitation happens or not - Updated documentation of XMI formats
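For illustration, sanitizing on export could look roughly like the sketch below. This is not the code from the commits referenced here. It is a hypothetical Java example that replaces every character forbidden in XML 1.1 with a caller-chosen replacement character. The names `XmlSanitizer` and `sanitize` are assumptions. Replacing one character with exactly one character keeps the text length unchanged, so annotation offsets stay valid.

```java
final class XmlSanitizer {
    /** True if the code point matches the Char production of XML 1.1. */
    static boolean isLegalXml11(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces each character that is forbidden in XML 1.1 with the given
     * replacement. All forbidden code points (0x0, unpaired surrogates,
     * 0xFFFE, 0xFFFF) occupy a single Java char, so the length is preserved.
     */
    static String sanitize(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isLegalXml11(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

For example, `XmlSanitizer.sanitize("PK\u0003\u0004\u0014\u0000", ' ')` would keep the first five characters and turn the trailing `0x0` into a space.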
…port-texts-containing-certain-characters-as-UIMA-CAS-XMI #4058 - Cannot export texts containing certain characters as UIMA CAS XMI
…cters as UIMA CAS XMI - Rename packages in XMI support module - Switch to modern spring factories declaration
…cters as UIMA CAS XMI - Enable sanitziing of illegal characters on export for XMI formats - Added option to control if sanitation happens or not - Updated documentation of XMI formats
* main: No issue. Slightly improving PDF editor visual No issue: Experiment with a Jenkinsfile No issue: Experiment with a Jenkinsfile No issue: Experiment with a Jenkinsfile No issue: Experiment with a Jenkinsfile No issue: Experiment with a Jenkinsfile #4058 - Cannot export texts containing certain characters as UIMA CAS XMI #3673 - Update dependencies #4066 - Display document name on export failure No issue. Minor additions to BioC format description #4062 - ViewportTracker should focus on block-like elements #4058 - Cannot export texts containing certain characters as UIMA CAS XMI #4032 - Allow using externalized strings from backend code #4060 - Clean up redundant code in annotation handlers #4026: Support for error tracking with Sentry
Describe the bug
Project backup (xmi-xml1.1)
Unexpected error during project export: SAXParseException: Trying to serialize non-XML 1.1 character: 0x0 at offset 5 in string starting with PK
To Reproduce
Project Settings > Export > Backup export with Secondary format: UIMA CAS XMI (XML 1.1)
Expected behavior
No response
Screenshots
No response
Environment
Version and build ID: INCEpTION -- 28.1 (2023-05-26 16:54:12, build 867bcf1)
Operating system: macOS 13.3.1 (a)
Java: openjdk version "11.0.19" 2023-04-18
Browser: Firefox 114.0
Additional context
CAS Doctor doesn't show anything suspicious.