-
-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Globalization Extension #239
Conversation
…a block should be rtl or not
@xoofx I think the extension is ready, I added a spec for it too. However, I am not sure how the test generation is supposed to work exactly. Do I only need to add globalization to |
src/Markdig/Helpers/CharHelper.cs
Outdated
// Generated from Table D.1 of RFC3454 | ||
// http://www.ietf.org/rfc/rfc3454.txt | ||
|
||
return c >= 0x0005D0 && c <= 0x0005EA || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that these are not Unicode code points instead? You need one or two chars to build a unicode codepoint actually (a char is UTF16 in .NET but can span across two chars to get a full unicode codepoint). You should probably iterate differently on the char taking into account that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually you do this by checking that the first char is char.IsHighSurrogate(c1)
, if it is not, it is directly a codepoint otherwise you need to read the next character, then verify that it is a char.IsLowSurrogate(c2)
and if it is, you can convert c1
and c2
to char.ConvertToUtf32(c1, c2)
to a codepoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also side note, add an internal comment to the method IsRightToLeft
to specify that the implementation is inefficient and should probably be transformed to a binary search approach at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xoofx I worked on your suggestion and this is what I came up with:
private bool StartsWithRtlCharacter(string text)
{
for (int i = 0; i < text.Length; i++)
{
#if PORTABLE
int c = text[i];
#else
int c;
if (char.IsHighSurrogate(text[i]) &&
i < text.Length - 1 &&
char.IsLowSurrogate(text[i + 1]))
{
c = char.ConvertToUtf32(text[i], text[i + 1]);
}
else
{
c = text[i];
}
#endif
if (CharHelper.IsRightToLeft(c))
return true;
else if (CharHelper.IsLeftToRight(c))
return false;
}
return false;
}
As you might have noticed, char.IsHighSurrogate
, char.IsLowSurrogatee
and char.ConvertToUtf32
are not supported by portable40-net40+sl5+win8+wp8+wpa81
. So we need to add a new compilation symbol to represent that target platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add the following code to CharHelper:
private const char HIGH_SURROGATE_START = '\ud800';
private const char HIGH_SURROGATE_END = '\udbff';
private const char LOW_SURROGATE_START = '\udc00';
private const char LOW_SURROGATE_END = '\udfff';
public static bool IsHighSurrogate(char c)
{
return ((c >= HIGH_SURROGATE_START) && (c <= HIGH_SURROGATE_END));
}
public static bool IsLowSurrogate(char c)
{
return ((c >= LOW_SURROGATE_START) && (c <= LOW_SURROGATE_END));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need to support portable profiles such as portable40-net40+sl5+win8+wp8+wpa81
? I thought that with .net standard those would be gone (and win8 is a dead OS anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xoofx and for ConvertToUtf32
, can we use .net core's code?
public static int ConvertToUtf32(char highSurrogate, char lowSurrogate)
{
if (!IsHighSurrogate(highSurrogate))
{
throw new ArgumentOutOfRangeException(nameof(highSurrogate), SR.ArgumentOutOfRange_InvalidHighSurrogate);
}
if (!IsLowSurrogate(lowSurrogate))
{
throw new ArgumentOutOfRangeException(nameof(lowSurrogate), SR.ArgumentOutOfRange_InvalidLowSurrogate);
}
return (((highSurrogate - CharUnicodeInfo.HIGH_SURROGATE_START) * 0x400) + (lowSurrogate - CharUnicodeInfo.LOW_SURROGATE_START) + UNICODE_PLANE01_START);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes (we will probably have to copy the MIT license in CharHelper at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need to support portable profiles such as portable40-net40+sl5+win8+wp8+wpa81? I thought that with .net standard those would be gone (and win8 is a dead OS anyway).
Well, I would not, but I'm worried that it may break a few apps out there, so I would prefer that we keep this compatibility for a period and I will probably kill it at some point
@@ -1,6 +1,27 @@ | |||
// Copyright (c) Alexandre Mutel. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not remove the original license but append the .NET MIT License after (with a one line preamble just before the MIT license that explains which code was taken from .NET Core, ,that should be enough)
For the tests, you need to modify the Specs.tt file to add a reference to the new spec file, and when saving the tt file it should regenerate the .cs file, and your tests will be able to run. |
@xoofx Good, I will also split up the spec by html element (list, heading, table) |
I mean in the same file, just split the example |
Amazing, thanks for this whole PR! 👍 |
Closes #238
This extension adds dir="rtl" attribute to elements that start with RTL characters, RTL characters are specified by section D of RFC3454.
The main work on the extension done, what remains is for me to test it a bit more, possibly make it more efficient and less verbose and write a couple of unit tests and documentation.