Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globalization Extension #239

Merged
merged 9 commits into from
Aug 28, 2018
Merged

Globalization Extension #239

merged 9 commits into from
Aug 28, 2018

Conversation

mhmd-azeez
Copy link
Contributor

@mhmd-azeez mhmd-azeez commented Aug 25, 2018

Closes #238

This extension adds dir="rtl" attribute to elements that start with RTL characters, RTL characters are specified by section D of RFC3454.

The main work on the extension done, what remains is for me to test it a bit more, possibly make it more efficient and less verbose and write a couple of unit tests and documentation.

@mhmd-azeez
Copy link
Contributor Author

@xoofx I think the extension is ready, I added a spec for it too. However, I am not sure how the test generation is supposed to work exactly. Do I only need to add globalization to specFiles of Specs.tt?

// Generated from Table D.1 of RFC3454
// http://www.ietf.org/rfc/rfc3454.txt

return c >= 0x0005D0 && c <= 0x0005EA ||
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that these are not Unicode code points instead? You need one or two chars to build a unicode codepoint actually (a char is UTF16 in .NET but can span across two chars to get a full unicode codepoint). You should probably iterate differently on the char taking into account that

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually you do this by checking that the first char is char.IsHighSurrogate(c1), if it is not, it is directly a codepoint otherwise you need to read the next character, then verify that it is a char.IsLowSurrogate(c2) and if it is, you can convert c1 and c2 to char.ConvertToUtf32(c1, c2) to a codepoint.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also side note, add an internal comment to the method IsRightToLeft to specify that the implementation is inefficient and should probably be transformed to a binary search approach at some point.

Copy link
Contributor Author

@mhmd-azeez mhmd-azeez Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xoofx I worked on your suggestion and this is what I came up with:

private bool StartsWithRtlCharacter(string text)
{
    for (int i = 0; i < text.Length; i++)
    {

#if PORTABLE
        int c = text[i];
#else
        int c;
        if (char.IsHighSurrogate(text[i]) &&
            i < text.Length - 1 &&
            char.IsLowSurrogate(text[i + 1]))
        {
            c = char.ConvertToUtf32(text[i], text[i + 1]);
        }
        else
        {
            c = text[i];
        }
#endif
        if (CharHelper.IsRightToLeft(c))
            return true;
        else if (CharHelper.IsLeftToRight(c))
            return false;
    }

    return false;
}

As you might have noticed, char.IsHighSurrogate, char.IsLowSurrogatee and char.ConvertToUtf32 are not supported by portable40-net40+sl5+win8+wp8+wpa81. So we need to add a new compilation symbol to represent that target platform.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the following code to CharHelper:

        private const char HIGH_SURROGATE_START = '\ud800';
        private const char HIGH_SURROGATE_END = '\udbff';
        private const char LOW_SURROGATE_START = '\udc00';
        private const char LOW_SURROGATE_END = '\udfff';

        public static bool IsHighSurrogate(char c)
        {
            return ((c >= HIGH_SURROGATE_START) && (c <= HIGH_SURROGATE_END));
        }

        public static bool IsLowSurrogate(char c)
        {
            return ((c >= LOW_SURROGATE_START) && (c <= LOW_SURROGATE_END));
        }

Copy link
Contributor

@Kryptos-FR Kryptos-FR Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need to support portable profiles such as portable40-net40+sl5+win8+wp8+wpa81? I thought that with .net standard those would be gone (and win8 is a dead OS anyway).

Copy link
Contributor Author

@mhmd-azeez mhmd-azeez Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xoofx and for ConvertToUtf32, can we use .net core's code?

public static int ConvertToUtf32(char highSurrogate, char lowSurrogate)
{
    if (!IsHighSurrogate(highSurrogate))
    {
        throw new ArgumentOutOfRangeException(nameof(highSurrogate), SR.ArgumentOutOfRange_InvalidHighSurrogate);
    }
    if (!IsLowSurrogate(lowSurrogate))
    {
        throw new ArgumentOutOfRangeException(nameof(lowSurrogate), SR.ArgumentOutOfRange_InvalidLowSurrogate);
    }
    return (((highSurrogate - CharUnicodeInfo.HIGH_SURROGATE_START) * 0x400) + (lowSurrogate - CharUnicodeInfo.LOW_SURROGATE_START) + UNICODE_PLANE01_START);
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes (we will probably have to copy the MIT license in CharHelper at least)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need to support portable profiles such as portable40-net40+sl5+win8+wp8+wpa81? I thought that with .net standard those would be gone (and win8 is a dead OS anyway).

Well, I would not, but I'm worried that it may break a few apps out there, so I would prefer that we keep this compatibility for a period and I will probably kill it at some point

@@ -1,6 +1,27 @@
// Copyright (c) Alexandre Mutel. All rights reserved.
Copy link
Owner

@xoofx xoofx Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not remove the original license but append the .NET MIT License after (with a one line preamble just before the MIT license that explains which code was taken from .NET Core, ,that should be enough)

@xoofx
Copy link
Owner

xoofx commented Aug 28, 2018

@xoofx I think the extension is ready, I added a spec for it too. However, I am not sure how the test generation is supposed to work exactly. Do I only need to add globalization to specFiles of Specs.tt?

For the tests, you need to modify the Specs.tt file to add a reference to the new spec file, and when saving the tt file it should regenerate the .cs file, and your tests will be able to run.

@mhmd-azeez
Copy link
Contributor Author

@xoofx Good, I will also split up the spec by html element (list, heading, table)

@mhmd-azeez
Copy link
Contributor Author

I mean in the same file, just split the example

@mhmd-azeez mhmd-azeez changed the title WIP Globalization Extension Globalization Extension Aug 28, 2018
@xoofx xoofx merged commit 6947474 into xoofx:master Aug 28, 2018
@xoofx
Copy link
Owner

xoofx commented Aug 28, 2018

Amazing, thanks for this whole PR! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants