mapping of XML document properties #3573

Closed
1 task done
SlowFox71 opened this issue May 18, 2023 · 5 comments · Fixed by #3589

Comments

@SlowFox71

SlowFox71 commented May 18, 2023

This is:

- [ ] a bug report
- [X] a feature request
- [X] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

Parse <o:DocumentProperties> from XML files

What features do you think are causing the issue

  • Reader

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

XML only

I finally managed to find an XSD of the office namespace used in DocumentProperties: https://schemas.liquid-technologies.com/Office/2003/?page=office_xsd.html

With this a mapping of tags to PhpSpreadsheet-methods should be straightforward. So far I identified:

Title => setTitle()
Subject => setSubject()
Keywords => setKeywords()
Description => setDescription()
Author => setCreator()
LastAuthor => setLastModifiedBy()
Created => setCreated()
LastSaved => setModified()
Manager => setManager()
Company => setCompany()
Category => setCategory()

I have not found PhpSpreadsheet equivalents so far for:

HyperlinkBase
Version
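
The mapping listed above can be sketched as a simple lookup table. This is only an illustration in Python, not PhpSpreadsheet code: the sample document, the `map_document_properties` helper, and the idea of returning setter names as strings are all assumptions made for the example. The office namespace URI is the one SpreadsheetML 2003 files declare for the `o:` prefix.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Tag name -> PhpSpreadsheet setter name, as listed in the issue.
TAG_TO_SETTER = {
    "Title": "setTitle",
    "Subject": "setSubject",
    "Keywords": "setKeywords",
    "Description": "setDescription",
    "Author": "setCreator",
    "LastAuthor": "setLastModifiedBy",
    "Created": "setCreated",
    "LastSaved": "setModified",
    "Manager": "setManager",
    "Company": "setCompany",
    "Category": "setCategory",
}

# Hypothetical sample input, invented for this sketch.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Title>Quarterly report</o:Title>
    <o:Author>Jane Doe</o:Author>
    <o:Created>2023-05-18T09:00:00Z</o:Created>
    <o:HyperlinkBase>https://example.com/</o:HyperlinkBase>
  </o:DocumentProperties>
</Workbook>"""


def map_document_properties(xml_text):
    """Return (mapped, unmapped) lists of (name, value) pairs."""
    root = ET.fromstring(xml_text)
    props = root.find(f"{{{OFFICE_NS}}}DocumentProperties")
    mapped, unmapped = [], []
    if props is None:
        return mapped, unmapped
    for child in props:
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        tag = child.tag.split("}", 1)[-1]
        value = (child.text or "").strip()
        if tag in TAG_TO_SETTER:
            mapped.append((TAG_TO_SETTER[tag], value))
        else:
            # Tags without a known setter, e.g. HyperlinkBase or Version.
            unmapped.append((tag, value))
    return mapped, unmapped


mapped, unmapped = map_document_properties(SAMPLE)
```

Tags that have no entry in the table end up in `unmapped`, which is where HyperlinkBase and Version would land under this mapping.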

CustomDocumentProperties are supported as well, although in a somewhat unusual form: the property name is the tag name, the type is a dt-namespaced dt attribute, and the value is the element content (see example). The schema for the dt namespace is also available at the URL above, but I could only use the types "string", "dateTime.tz", "boolean" and "float"; anything else (even "int") was converted to "string" by my Excel 365.
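
To make the layout of custom properties concrete, here is a minimal Python sketch that reads the property name from the tag, the type from the `dt:dt` attribute, and the value from the element content. The sample properties and the `read_custom_properties` helper are invented for illustration; treating a missing or unrecognised type as a plain string is an assumption of the sketch, not documented Excel or PhpSpreadsheet behaviour.

```python
import xml.etree.ElementTree as ET

# Namespace URI that SpreadsheetML 2003 files declare for the dt: prefix.
DT_NS = "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"

# Hypothetical sample input, invented for this sketch.
SAMPLE_CUSTOM = """<CustomDocumentProperties
    xmlns="urn:schemas-microsoft-com:office:office"
    xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882">
  <Reviewed dt:dt="boolean">1</Reviewed>
  <Score dt:dt="float">4.5</Score>
  <Owner dt:dt="string">Alice</Owner>
  <Count dt:dt="int">7</Count>
</CustomDocumentProperties>"""


def read_custom_properties(xml_text):
    """Return {property name: (declared type, converted value)}."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        name = child.tag.split("}", 1)[-1]           # tag name is the property name
        dtype = child.get(f"{{{DT_NS}}}dt", "string")  # dt:dt attribute is the type
        text = (child.text or "").strip()            # element content is the value
        if dtype == "boolean":
            value = text in ("1", "true")
        elif dtype == "float":
            value = float(text)
        else:
            # "string", "dateTime.tz" and anything unrecognised stay as text
            # here, so no time-zone handling is guessed at.
            value = text
        result[name] = (dtype, value)
    return result


custom = read_custom_properties(SAMPLE_CUSTOM)
```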

test.txt

@MarkBaker
Member

@oleibman Thanks both to you and @SlowFox71 for picking up all these improvements to the Xml Reader.
It was written as an MVP, but so few people ever used the SpreadsheetML format that I never expanded it beyond that. It's good to know that you've been working on all these improvements.

@oleibman
Collaborator

@SlowFox71 As far as I can tell, almost all of these are already supported. When I use PhpSpreadsheet to load your spreadsheet and save it as Xlsx, all the standard properties (except the not-yet-supported HyperlinkBase) are present in the output file (and equal to the values in the input file); the same is true for all but one of the custom properties (DateProperty). I shall certainly look into DateProperty, and maybe see about adding HyperlinkBase. I do see HyperlinkBase, but I don't see Version; where do you see it? In the meantime, do your results differ from mine?

@oleibman
Collaborator

Just to clarify about Version - I see it in the XML, but I don't see it listed as a Property (or Advanced Property) in Excel.

@oleibman
Collaborator

HyperlinkBase is really interesting. All the other document properties are 'meta'; but HyperlinkBase is functional - if you supply a relative address for a link, Excel will use HyperlinkBase, if supplied, to convert to an absolute address. (Default is directory where spreadsheet is located.) Now, guess what? Excel messes up this processing for Xml spreadsheets. It gets it right for Xlsx. I don't know, or particularly care, about Xls for the moment. Gnumeric and Odt, and of course Csv, have no equivalent that I can find. Html allows for an equivalent base tag in the head section. I'll have to think about how far I want to go with this.

@SlowFox71
Author

> @SlowFox71 As far as I can tell, almost all of these are already supported.

Oh dear, my bad - sorry :-(
I am using an older version of PhpSpreadsheet (which does not support them) and was missing them due to the changed file structure.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue May 28, 2023
Fix PHPOffice#3573. The original issue concerned non-support of Document Properties in Xml spreadsheets. However, most of the Properties mentioned there were already supported. But the investigation revealed some gaps in Html coverage.

HyperlinkBase is the one property mentioned in the issue that was not supported for Xml, nor indeed for any other format. All the other document properties are 'meta'; but HyperlinkBase is functional - if you supply a relative address for a link, Excel will use HyperlinkBase, if supplied, to convert to an absolute address. (Default is directory where spreadsheet is located.) Here's a summary of how this PR will handle this property for various formats:
- Support is added for Xlsx read and write.
- Support is added for Xml read (there is no Xml writer). Ironically, Excel messes up this processing when reading an Xml spreadsheet; however, PhpSpreadsheet will get it right.
- HyperlinkBase is supported for Xls, but I have no idea how to read or write this property. For now, when writing hyperlinked cells, PhpSpreadsheet will be changed to convert any relative addresses that it can detect to absolute references by adding HyperlinkBase to the relative address. In a similar vein, Xls supports custom properties, but PhpSpreadsheet does not know how to read or write those.
- Gnumeric has no equivalent property, so nothing needs to be done to its reader. Since we don't have a Gnumeric writer, that's not really a problem for us.
- Odt has no equivalent property, so nothing needs to be done to its reader. The Odt writer does not have any special logic for hyperlinks, so, at least for now, will remain unchanged.
- Csv has no equivalent property, so nothing needs to be done to its reader. The Csv writer does not have any special logic for hyperlinks, so, at least for now, will remain unchanged.
- Html allows for an equivalent `base` tag in the head section. Support for this is added to Html reader and writer.
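
The relative-to-absolute conversion described above can be sketched as follows. This is a Python illustration only, not the PhpSpreadsheet implementation: the `apply_hyperlink_base` name and the detection rule (a link counts as relative when it has no URL scheme and is not an internal `#` reference) are assumptions made for the example.

```python
from urllib.parse import urlsplit


def apply_hyperlink_base(url, hyperlink_base):
    """Prefix a relative link with the base; leave other links untouched."""
    if not hyperlink_base:
        return url                      # no base supplied: nothing to do
    if url.startswith("#"):
        return url                      # internal reference within the document
    if urlsplit(url).scheme:
        return url                      # already absolute (https:, mailto:, ...)
    # Assumed joining rule: base + "/" + relative address, without doubling slashes.
    return hyperlink_base.rstrip("/") + "/" + url.lstrip("/")


resolved = apply_hyperlink_base("docs/page.html", "https://example.com/base/")
```

With these inputs `resolved` is `https://example.com/base/docs/page.html`, while absolute and internal links pass through unchanged.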

Html Writer was only handling 8 of the 11 'core' properties. Support is added for `created`, `modified`, and `lastModifiedBy`. Custom properties were not supported at all, and now are.

Html Reader did not support any properties. It will now support all of them.
oleibman added a commit that referenced this issue Jun 3, 2023
* HyperlinkBase Property, and Html Handling of Properties

* Scrutinizer

Remove one dead reference.

3 participants