Misc element properties are not lining up with their ECMA definitions #780

Open
1 of 2 tasks
rmboggs opened this issue Jul 21, 2020 · 14 comments
@rmboggs
Contributor

rmboggs commented Jul 21, 2020

Before submitting an issue, please fill this out

Is this a:

  • Issue with the OpenXml library
  • Question on library usage

Description

The DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyValues enum contains only the values On and Off, but if I am reading the ECMA-376 standard correctly, properties of this type may contain the values on, off, 0, 1, true, and false. When a value such as true or false is present in an XML attribute of this type, the Value property of the EnumValue<OnOffOnlyValues> object throws a System.FormatException.

The attached document contains an element /document[0]/tbl[0]/tr[0]/trPr[0]/cantSplit which contains the property in question.
file-sample_500kB.docx

I'm open to discussing this further, but this doesn't seem like the correct behavior.

Information

  • .NET Target: .NET Framework 4.7.2
  • DocumentFormat.OpenXml Version: 2.11.3

Repro

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.IO;
using System.Linq;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            // Change this to the path of the test file
            const string pathToFile = @"C:\path\to\file-sample_500kB.docx";

            using (var f = new FileStream(pathToFile, FileMode.Open))
            {
                using (var docx = WordprocessingDocument.Open(f, false))
                {
                    var sample = docx.MainDocumentPart.Document.Descendants<CantSplit>().FirstOrDefault();

                    if (sample != null)
                    {
                        var val = sample.Val;
                        Console.WriteLine($"Sample has value: {val.HasValue}");
                        Console.WriteLine($"Sample inner text: {val.InnerText}");

                        try
                        {
                            Console.WriteLine($"Sample value: {val.Value}");
                        }
                        catch (Exception x)
                        {
                            Console.WriteLine($"Sample value threw {x.GetType().Name}");
                            Console.WriteLine($"Sample value exception message: {x.Message}");
                        }
                    }
                }
            }
            Console.WriteLine("Press any key to quit");
            Console.ReadKey(true);
        }
    }
}

Observed

Sample has value: False
Sample inner text: false
Sample value threw FormatException
Sample value exception message: The text value is not a valid enumeration value.
Press any key to quit

Expected

Sample has value: True
Sample inner text: false
Sample value: DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyValues.False or DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyValues.Off
Press any key to quit
@rmboggs
Contributor Author

rmboggs commented Jul 21, 2020

FYI - I caught this while testing out the Serialize.OpenXml.CodeGen library.

@twsouthwick
Member

Interesting. @tomjebo can you take a look at the spec and weigh in here?

@twsouthwick
Member

@rmboggs Can you share the section of the spec you were looking at?

@rmboggs
Contributor Author

rmboggs commented Jul 22, 2020

I was referencing the first edition of the Office Open XML ECMA standard (ECMA-376), located here.
PDF page 308 - cantSplit definition, which references the possible values defined by the ST_OnOff simple type
PDF page 1786 - ST_OnOff possible values, which include the values I listed in the initial issue report above.

Please let me know if there are more questions.

@rmboggs
Contributor Author

rmboggs commented Jul 22, 2020

Actually, I dug a bit deeper here, as there are other elements that have the same possible values: DocumentFormat.OpenXml.Wordprocessing.Caps, for example. It has the same set of possible values as DocumentFormat.OpenXml.Wordprocessing.CantSplit, but the two inherit from different base classes. Caps inherits from DocumentFormat.OpenXml.Wordprocessing.OnOffType, while CantSplit inherits from DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyType. Since OnOffType appears to offer the appropriate values for this kind of element, I believe that changing CantSplit to inherit from OnOffType would solve this issue without rewriting the OnOffOnlyType class. I'm hoping this won't be a shocking/breaking change.

@twsouthwick
Member

My gut says that's a breaking change. I'll dig into it to see. We'll probably do a v3 soon and can handle any breaking changes then.

@rmboggs
Contributor Author

rmboggs commented Jul 23, 2020

It makes sense to be cautious. I don't think this is urgent, since the CantSplit class has been set up this way for quite some time and the difference is only being noticed now, in a non-production environment. I have an idea how to get around this in my project, so I should be ok. In the meantime, would it be ok to label this for the 3.0 release so it doesn't get lost?

@tomjebo
Collaborator

tomjebo commented Jul 24, 2020

@twsouthwick @rmboggs I think we have a mistake in the schema processing (backend). cantSplit looks like it was defined in our backend processing as CT_OnOffOnly which is not correct based on the ISO definition or what I see in the Office source schemas. I see it actually associated with both CT_OnOffOnly and CT_OnOff in our backend code and the former looks like it's overriding the latter. I think this is wrong. I'm not sure if correcting this to CT_OnOff would be breaking as it would be expanding the possible values, not taking any away. I'll need to check the other elements that are set to this type as well.

@tomjebo tomjebo added the bug label Jul 24, 2020
@tomjebo tomjebo self-assigned this Jul 24, 2020
@rmboggs
Contributor Author

rmboggs commented Jul 24, 2020

Thanks @tomjebo. Please let me know if you need more information on this.

@twsouthwick twsouthwick added this to the v3.0 milestone Jul 24, 2020
@twsouthwick
Member

This is definitely a v3.0 change as it will be a breaking change. We'll track this and get that in.

@rmboggs
Contributor Author

rmboggs commented Jul 25, 2020

Thanks guys, I have a workaround for my stuff in the meantime.

It's not the prettiest workaround, but it will do.
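
For anyone else hitting the same exception, here is a minimal sketch of one possible workaround (not necessarily the one referred to above). The repro shows that `InnerText` is still readable when `Value` throws, so the raw text can be mapped by hand. `OnOffText.TryParse` is a hypothetical helper name, not an SDK API, and the accepted strings assume the ST_OnOff values listed at the top of this issue, matched case-sensitively.

```csharp
using System;

// Hypothetical helper, not part of DocumentFormat.OpenXml.
public static class OnOffText
{
    // Maps the ST_OnOff lexical values listed in this issue to a bool.
    // Returns false (and sets value to false) for null or unrecognized text.
    public static bool TryParse(string text, out bool value)
    {
        switch (text)
        {
            case "true":
            case "on":
            case "1":
                value = true;
                return true;
            case "false":
            case "off":
            case "0":
                value = false;
                return true;
            default:
                value = false;
                return false;
        }
    }
}

public static class OnOffDemo
{
    public static void Main()
    {
        // In the repro, each string would come from sample.Val.InnerText.
        foreach (var raw in new[] { "false", "on", "1", "maybe" })
        {
            Console.WriteLine(OnOffText.TryParse(raw, out var parsed)
                ? $"{raw} -> {parsed}"
                : $"{raw} -> unrecognized");
        }
    }
}
```

This deliberately avoids the typed `Value` property, so it keeps working whether or not the element is later moved to a different base type.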

@rmboggs
Contributor Author

rmboggs commented Jul 27, 2020

Hi,

I'm finding other properties, ones that are not EnumValue types, whose values in some of my sample documents do not seem to match their declared types. At first glance these also look like schema mismatches similar to this one, so should I append what I find to this issue or create a new one to keep things clean? Please let me know.

@twsouthwick
Member

Yes, let's make this an uber-issue for those kinds of issues. Then we can track it for v3.0.

@rmboggs
Contributor Author

rmboggs commented Jul 27, 2020

Ok, let me see if I can modify the title to be broader in a bit.

In the meantime, I'm seeing more inconsistencies between the sample docx file mentioned at the beginning of this issue and the ECMA standard. The first is the DocGrid.CharacterSpace property. Its type is Int32Value, but the attached sample document has 4294961151 as its value in the SectionProperties element, which is out of range for that type. The ECMA standard says values of this type should be a positive or negative decimal number (defined in 2.18.16 ST_DecimalNumber (Decimal Number Value)), which is vague about what the minimum and maximum values should be. On the flip side, DivId.Val, which has the same definition in the ECMA standard (2.18.16 ST_DecimalNumber (Decimal Number Value)), is set up as a StringValue in the SDK. I have yet to check the actual code for these two properties, but if they are generated the same way as the CantSplit type, then there is another issue with the generation process. Chances are that the disconnect between the two properties is due to the ambiguity of the ECMA schema definition, but it should be looked at, imho.

Please let me know if more details are needed for this.
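
To make the range problem concrete, here is a small sketch that uses only the .NET base library (no SDK types). It shows that 4294961151 does not parse as a signed 32-bit integer, but does fit an unsigned one, and that reinterpreting those 32 bits as signed gives -6145. Whether the producing application actually meant a wrapped negative number is an assumption on my part; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class CharacterSpaceDemo
{
    // True if the text parses as a signed 32-bit integer.
    public static bool FitsInt32(string raw) =>
        int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out _);

    // Parses the text as unsigned 32-bit, then reinterprets the same bits as signed.
    public static int WrapToInt32(string raw)
    {
        uint asUInt32 = uint.Parse(raw, CultureInfo.InvariantCulture);
        return unchecked((int)asUInt32);
    }

    public static void Main()
    {
        const string raw = "4294961151"; // value reported in the sample document

        Console.WriteLine($"Fits Int32: {FitsInt32(raw)}");            // False
        Console.WriteLine($"Reinterpreted as Int32: {WrapToInt32(raw)}"); // -6145
    }
}
```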

@rmboggs rmboggs changed the title from "OnOffOnlyValues Enum may be missing values" to "Misc element properties are not lining up with their ECMA definitions" Jul 27, 2020
@twsouthwick twsouthwick modified the milestones: v3.0, v4.0 Nov 15, 2023
edwintorok added a commit to edwintorok/pandoc that referenced this issue Dec 19, 2023
```
    {
        "FilePath": "test/docx/golden/tables.docx",
        "ValidationErrors": "[{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'true'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[1]/w:tr[1]/w:trPr[1]/w:tblHeader[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"}]"
    }
```

Although this one might actually be a bug in Open-XML-SDK similar to
this, or a subtle difference between standard versions:
dotnet/Open-XML-SDK#780

Signed-off-by: Edwin Török <[email protected]>
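
The `ValidationErrors` field in the record above is a JSON array that has been serialized into a string, so it needs to be decoded twice before the individual errors can be read. A minimal sketch with `System.Text.Json` follows; the class and method names are hypothetical, and the sample record is a shortened stand-in built in code rather than the exact text from the commit message.

```csharp
using System;
using System.Text.Json;

public static class ValidationErrorsDemo
{
    // Returns the Description of every entry in the nested ValidationErrors array.
    public static string[] Descriptions(string recordJson)
    {
        using var record = JsonDocument.Parse(recordJson);

        // First decode: ValidationErrors is a string holding more JSON.
        string inner = record.RootElement.GetProperty("ValidationErrors").GetString();

        // Second decode: the string contains an array of error objects.
        using var errors = JsonDocument.Parse(inner);
        var result = new string[errors.RootElement.GetArrayLength()];
        int i = 0;
        foreach (var error in errors.RootElement.EnumerateArray())
        {
            result[i++] = error.GetProperty("Description").GetString();
        }
        return result;
    }

    public static void Main()
    {
        // Build a shortened sample record with the same double-encoded shape.
        string inner = JsonSerializer.Serialize(new[]
        {
            new { Description = "The Enumeration constraint failed.", ErrorType = "Schema" }
        });
        string record = JsonSerializer.Serialize(new
        {
            FilePath = "test/docx/golden/tables.docx",
            ValidationErrors = inner
        });

        foreach (var description in Descriptions(record))
        {
            Console.WriteLine(description);
        }
    }
}
```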
jgm pushed a commit to jgm/pandoc that referenced this issue Dec 19, 2023
3 participants