mapping of XML document properties #3573
@oleibman Thanks both to you and @SlowFox71 for picking up all these improvements to the Xml Reader.
@SlowFox71 As far as I can tell, almost all of these are already supported. When I use PhpSpreadsheet to load your spreadsheet and save it as Xlsx, all the standard properties (except the not-yet-supported HyperlinkBase) are present in the output file and equal to the values in the input file; the same is true for all but one of the custom properties (DateProperty). I shall certainly look into DateProperty, and maybe see about adding HyperlinkBase, which I do see. However, I don't see Version; where do you see it? In the meantime, do your results differ from mine?

Just to clarify about Version: I see it in the XML, but I don't see it listed as a Property (or Advanced Property) in Excel.
HyperlinkBase is really interesting. All the other document properties are 'meta', but HyperlinkBase is functional: if you supply a relative address for a link, Excel will use HyperlinkBase, if supplied, to convert it to an absolute address. (The default is the directory where the spreadsheet is located.) Now, guess what? Excel messes up this processing for Xml spreadsheets. It gets it right for Xlsx. I don't know, or particularly care, about Xls for the moment. Gnumeric and Ods, and of course Csv, have no equivalent that I can find. Html allows for an equivalent
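To make the resolution rule concrete, here is a minimal Python sketch (not PhpSpreadsheet code) of how a relative link target could be resolved: against HyperlinkBase when it is set, otherwise against the workbook's own directory. It assumes both bases are URL-style strings such as `file:///...` or `https://...` and relies on the standard library's `urljoin`; the function name `resolve_link` is hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def resolve_link(target: str, hyperlink_base: Optional[str], workbook_dir: str) -> str:
    """Resolve a hyperlink target against HyperlinkBase or the workbook directory.

    Assumes URL-style strings; Windows drive-letter paths are not handled.
    """
    # A target with a scheme (https:, mailto:, file:, ...) is already absolute.
    if urlparse(target).scheme:
        return target
    # Fall back to the workbook's own directory when HyperlinkBase is unset or empty.
    base = hyperlink_base or workbook_dir
    # urljoin replaces the last path segment unless the base ends with "/".
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, target)


# The same relative link, with and without HyperlinkBase.
print(resolve_link("q1/report.xlsx", "https://example.com/files", "file:///home/me/"))
# https://example.com/files/q1/report.xlsx
print(resolve_link("q1/report.xlsx", None, "file:///home/me/"))
# file:///home/me/q1/report.xlsx
```

Using `urljoin` means `..` segments in the relative address are also resolved, which is one plausible reading of "convert to an absolute address"; plain string concatenation would leave them in place.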
Oh dear, my bad - sorry :-(
Fix PHPOffice#3573. The original issue concerned non-support of Document Properties in Xml spreadsheets. However, most of the Properties mentioned there were already supported. But the investigation revealed some gaps in Html coverage.

HyperlinkBase is the one property mentioned in the issue that was not supported for Xml, nor indeed for any other format. All the other document properties are 'meta', but HyperlinkBase is functional: if you supply a relative address for a link, Excel will use HyperlinkBase, if supplied, to convert it to an absolute address. (The default is the directory where the spreadsheet is located.) Here's a summary of how this PR handles this property for the various formats:

- Xlsx: support is added for read and write.
- Xml: support is added for read (there is no Xml writer). Ironically, Excel messes up this processing when reading an Xml spreadsheet; PhpSpreadsheet, however, gets it right.
- Xls: the format supports HyperlinkBase, but I have no idea how to read or write this property. For now, when writing hyperlinked cells, PhpSpreadsheet will convert any relative addresses that it can detect to absolute references by prefixing them with HyperlinkBase. In a similar vein, Xls supports custom properties, but PhpSpreadsheet does not know how to read or write those.
- Gnumeric: there is no equivalent property, so nothing needs to be done to its reader. Since we don't have a Gnumeric writer, that's not really a problem for us.
- Ods: there is no equivalent property, so nothing needs to be done to its reader. The Ods writer does not have any special logic for hyperlinks, so, at least for now, it will remain unchanged.
- Csv: there is no equivalent property, so nothing needs to be done to its reader. The Csv writer does not have any special logic for hyperlinks, so, at least for now, it will remain unchanged.
- Html: allows for an equivalent `base` tag in the head section. Support for this is added to the Html reader and writer.
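The Xls approach of prefixing HyperlinkBase onto "any relative addresses that it can detect" could look roughly like the Python sketch below. The detection rules (URL scheme, drive letter, leading slash or backslash) and the names `looks_relative` and `absolutize` are assumptions for illustration, not PhpSpreadsheet's actual logic. Internal links written in PhpSpreadsheet's `sheet://` form carry a scheme, so they are left untouched.

```python
import re
from urllib.parse import urlparse

# Assumed heuristics for spotting addresses that are already absolute.
_DRIVE_LETTER = re.compile(r"^[A-Za-z]:[\\/]")


def looks_relative(address: str) -> bool:
    """Return True if the address has no detectable absolute form."""
    if _DRIVE_LETTER.match(address):     # C:\dir\file.xlsx or C:/dir/file.xlsx
        return False
    if address.startswith(("/", "\\")):  # /abs/path or \\server\share
        return False
    if urlparse(address).scheme:         # https:, mailto:, file:, sheet:, ...
        return False
    return True


def absolutize(address: str, hyperlink_base: str) -> str:
    """Prefix a detectably relative address with HyperlinkBase."""
    if not hyperlink_base or not looks_relative(address):
        return address
    # Insert a separator only if HyperlinkBase does not already end with one.
    separator = "" if hyperlink_base.endswith(("/", "\\")) else "/"
    return hyperlink_base + separator + address


print(absolutize("q1/report.xlsx", "https://example.com/files"))
# https://example.com/files/q1/report.xlsx
```

Simple concatenation matches the "adding HyperlinkBase to the relative address" wording; it does not normalize `..` segments, which a reader application would still resolve when following the link.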
Html Writer was only handling 8 of the 11 'core' properties. Support is added for `created`, `modified`, and `lastModifiedBy`. Custom properties were not supported at all, and now are. Html Reader did not support any properties. It will now support all of them.
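On the Html side, both the `base` tag and the document properties live in `<head>`. A minimal round-trip sketch in Python, using only `html.escape` and `html.parser.HTMLParser` from the standard library: `html_head` writes an optional `<base href>` plus one `<meta name content>` per property, and `HeadPropertyParser` reads them back. The function and class names, and the meta names in the example, are illustrative assumptions; they are not necessarily what PhpSpreadsheet's Html writer and reader use.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser


def html_head(properties: dict[str, str], hyperlink_base: str = "") -> str:
    """Write a <head> with an optional <base href> and one <meta> per property."""
    lines = ["<head>"]
    if hyperlink_base:
        lines.append(f'  <base href="{escape(hyperlink_base)}" />')
    for name, value in properties.items():
        lines.append(f'  <meta name="{escape(name)}" content="{escape(value)}" />')
    lines.append("</head>")
    return "\n".join(lines)


class HeadPropertyParser(HTMLParser):
    """Read <base href> and <meta name content> back out of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hyperlink_base = None
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and unescapes attribute values.
        attributes = dict(attrs)
        if tag == "base" and attributes.get("href"):
            self.hyperlink_base = attributes["href"]
        elif tag == "meta" and attributes.get("name"):
            self.properties[attributes["name"]] = attributes.get("content") or ""


head = html_head({"author": "Example Author", "title": "Q1 & Q2"}, "https://example.com/files/")
parser = HeadPropertyParser()
parser.feed(head)
print(parser.hyperlink_base)  # https://example.com/files/
print(parser.properties)      # {'author': 'Example Author', 'title': 'Q1 & Q2'}
```

Escaping on write and relying on the parser's unescaping on read keeps values such as `&` intact across the round trip; custom properties would simply be additional `meta` entries under whatever naming scheme the writer chooses.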
* HyperlinkBase Property, and Html Handling of Properties
* Scrutinizer: Remove one dead reference.
What is the expected behavior?
Parse `<o:DocumentProperties>` from XML files
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
XML only
I finally managed to find an XSD (schema) for the office namespace used in DocumentProperties: https://schemas.liquid-technologies.com/Office/2003/?page=office_xsd.html
With this, mapping the tags to PhpSpreadsheet methods should be straightforward. So far I have identified:

- Title => setTitle()
- Subject => setSubject()
- Keywords => setKeywords()
- Description => setDescription()
- Author => setCreator()
- LastAuthor => setLastModifiedBy()
- Created => setCreated()
- LastSaved => setModified()
- Manager => setManager()
- Company => setCompany()
- Category => setCategory()
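The tag-to-setter list above can be exercised with a small Python sketch that reads `<o:DocumentProperties>` from an Excel 2003 XML workbook using `xml.etree.ElementTree`. It assumes the `o:` prefix is bound to `urn:schemas-microsoft-com:office:office`, as in Excel-generated files, and it returns setter names mapped to raw text values instead of calling any PhpSpreadsheet API. Tags without a mapping (HyperlinkBase, Version) are skipped; `map_document_properties` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace bound to the "o:" prefix in Excel 2003 XML workbooks (assumed).
O_NS = "urn:schemas-microsoft-com:office:office"

# <o:DocumentProperties> tag names mapped to the setter names listed in the issue.
TAG_TO_SETTER = {
    "Title": "setTitle",
    "Subject": "setSubject",
    "Keywords": "setKeywords",
    "Description": "setDescription",
    "Author": "setCreator",
    "LastAuthor": "setLastModifiedBy",
    "Created": "setCreated",
    "LastSaved": "setModified",
    "Manager": "setManager",
    "Company": "setCompany",
    "Category": "setCategory",
}


def map_document_properties(xml_text: str) -> dict[str, str]:
    """Return {setter name: raw value} for every recognised DocumentProperties tag."""
    root = ET.fromstring(xml_text)
    props = root.find(f"{{{O_NS}}}DocumentProperties")
    if props is None:
        return {}
    mapped = {}
    for child in props:
        local_name = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        setter = TAG_TO_SETTER.get(local_name)
        if setter is not None:  # HyperlinkBase, Version, ... have no mapping yet
            mapped[setter] = (child.text or "").strip()
    return mapped


SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Title>Quarterly report</o:Title>
    <o:Author>Example Author</o:Author>
    <o:LastSaved>2023-05-01T10:00:00Z</o:LastSaved>
    <o:HyperlinkBase>https://example.com/files/</o:HyperlinkBase>
  </o:DocumentProperties>
</Workbook>"""

print(map_document_properties(SAMPLE))
# {'setTitle': 'Quarterly report', 'setCreator': 'Example Author', 'setModified': '2023-05-01T10:00:00Z'}
```

Date values such as LastSaved are left as strings here; a real reader would still need to convert them before passing them to `setModified()`.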
I have not found PhpSpreadsheet equivalents so far for:

- HyperlinkBase
- Version
CustomDocumentProperties are supported as well, although their encoding is somewhat unusual: the property name is used as the tag name, the type is given in a `dt:dt` attribute (from the dt namespace), and the value is the element content (see example). The schema for the dt namespace is also available at the URL above, but I could only use the types "string", "dateTime.tz", "boolean" and "float"; anything else (even "int") was converted to "string" by my Excel 365.
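The custom-property encoding described above (name as tag, type in `dt:dt`, value as content) could be decoded along these lines. This Python sketch assumes the `dt` prefix is bound to `uuid:C2F41010-65B3-11d1-A29F-00AA00C14882`, treats a boolean value of `1` or `true` as true, and maps any type other than the four that Excel 365 accepted to "string", mirroring the behavior reported above. `read_custom_properties` is a hypothetical name, not a PhpSpreadsheet method.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime

O_NS = "urn:schemas-microsoft-com:office:office"
# Assumed URI bound to the "dt" prefix in Excel 2003 XML files.
DT_NS = "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
DT_ATTR = f"{{{DT_NS}}}dt"
SUPPORTED_TYPES = {"string", "dateTime.tz", "boolean", "float"}


def _convert(raw: str, dt_type: str):
    """Convert the element text according to its dt:dt type."""
    if dt_type == "float":
        return float(raw)
    if dt_type == "boolean":
        return raw.strip().lower() in ("1", "true")
    if dt_type == "dateTime.tz":
        text = raw.strip()
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        return datetime.fromisoformat(text)
    return raw  # "string"


def read_custom_properties(xml_text: str) -> dict[str, tuple[str, object]]:
    """Return {property name: (dt type, converted value)}."""
    root = ET.fromstring(xml_text)
    custom = root.find(f"{{{O_NS}}}CustomDocumentProperties")
    result = {}
    if custom is None:
        return result
    for prop in custom:
        name = prop.tag.split("}", 1)[-1]      # the tag name is the property name
        dt_type = prop.get(DT_ATTR, "string")  # a missing type is treated as string
        if dt_type not in SUPPORTED_TYPES:
            dt_type = "string"                 # e.g. "int", as Excel 365 did
        result[name] = (dt_type, _convert(prop.text or "", dt_type))
    return result
```

Returning the type alongside the value keeps enough information to recreate each property with the matching type on the PhpSpreadsheet side.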
test.txt