Future Direction for i18n of Web Applications #50
Comments
Very interesting. You say you would like to remove GetText; I tend to agree, since it means requiring yet another dependency with cross-platform concerns. Instead, this is much more suitable as a standard with platform-specific implementations, or in our case Mono/CIL. Is your proposal to replace it with a new parser that outputs nuggets into PO format? I would be in favor of this, as I am biased toward localization happening in the HTTP pipeline, irrespective of tier. _("") supports neither. On 2013-04-06, at 8:00 AM, Martin Connell wrote:
Yes, drop GetText and replace with our own parser to extract nuggets from sources in a Visual Studio project and/or folder branch. This could be called GetMessages to keep with PO terminology, or GetNuggets :) I'm not so sure about dropping msgmerge, however. This seems to live more in the PO world and does a good job as far as I can see, though of course perfectly includable in GetMessages.
Back to GetText: the current regex in the v2.0 branch for post-processing the
My plan at the moment is to write this parser when my current web project needs to go international. Not sure when that will be and it could be a year or so away; but in the meantime I'm coding translatable messages in the app as nuggets. Given they don't exist in the PO file at present (because of course GetText ignores them), they are being output as they are with the markers removed (by the v2.0 post processing).
I just want to add that [ and ] are not super comfortable to write on a Swedish keyboard. But on the other hand, all a Swedish person has access to without pushing a secondary button is so that sort of sucks... but I write _() faster than I write []
As Martin mentioned, it's less about what the nugget tokens are vs. the
On Sat, Apr 6, 2013 at 1:07 PM, Rickard wrote:
Daniel Crenna
Agreed, I think post-processing sounds lovely, since it should make arbitrary attributes and the like work better without needing overloads. While someone is here: I can't find how to programmatically change the language. LanguageFilter.RedirectWithLanguage is set protected but would do the trick. Am I missing something?
Good point about the markers/tokens being user-configurable. I can't see any reason why not, and that would be very nice. It might be slightly more tricky when it comes down to formatted nuggets. An example of the syntax at the moment for these is:
which is used, say, in Razor like:
The extra level of indirection is required to pass the userName and lastDate values through to the post processor where they are passed through formatting once more (with any message string got from the PO file, the translator thus having the freedom to put %0 and %1 anywhere they want in the message). So we have:
Making the start and end strings different, such that they don't overlap, eases the parsing of nuggets considerably. For instance, with '###' for both start and end, any parser needs to keep track of whether it is on a start or an end marker, and checking for unclosed markers becomes necessary, etc. That is why I went for square brackets: they naturally form open and close pairs, aren't HTML/XML markup, and are less common than (). Oh, and they aren't used by the C# string formatter, i.e. {}. Personally I would have used
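To make the two points in this comment concrete, here is a minimal sketch (in Python for brevity, not the project's C# code) of a post-processing pass over formatted nuggets. It assumes a nugget of the form `[[[message|||arg0|||arg1]]]` with `%0`, `%1` placeholders, as described above; the function name `render_nuggets`, the `lookup` dictionary standing in for the PO file, and the sample strings are all hypothetical.

```python
import re

def render_nuggets(text, lookup, start="[[[", end="]]]", delim="|||"):
    """Replace every nugget in text with its (optionally translated) message.

    lookup is a hypothetical stand-in for the PO catalog: msgid -> translation.
    """
    # Because the start and end strings differ, one non-greedy pattern is
    # enough to pair them up; no open/closed state has to be tracked.
    pattern = re.compile(re.escape(start) + r"(.+?)" + re.escape(end), re.DOTALL)

    def render(match):
        # The first field is the message id; the rest are format arguments.
        msgid, *args = match.group(1).split(delim)
        template = lookup.get(msgid, msgid)  # fall back to the msgid itself

        def fill(placeholder):
            index = int(placeholder.group(1))
            # Leave the placeholder untouched if no argument was supplied.
            return args[index] if index < len(args) else placeholder.group(0)

        # Second formatting pass: the translator may have moved %0 and %1.
        return re.sub(r"%(\d+)", fill, template)

    return pattern.sub(render, text)
```

For example, `render_nuggets("<p>[[[Welcome %0, last visit %1|||Bob|||Tuesday]]]</p>", {})` leaves the message untranslated and simply fills in the two values, while a `lookup` entry that reorders `%0` and `%1` still receives the right arguments.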
No, sorry, I have no idea how to create those on my keyboard, nor on my girlfriend's German one ;-) but you have a valid point that macros are a sweet way to go. I have never written a VS plugin but several ideas pop into my head... such as select any text and double tap ctrl to wrap... or similar... a VS plugin would make macros the ultimate way to go, I think
And then when the VS plugin supports inline translation it will be absolutely amazing. I think big parts of the world are in my situation... have a "small" language like Swedish... Swedish is required but English is usually needed soon thereafter. After that, though, it's usually fine for a while. Now the point here is that programmers in my situation usually speak both their native tongue and English... so I can translate everything myself to the first language (Swedish, since the default is English). And it would be pretty slick to be able to do that "inline" with a popup of some sort from the VS plugin, as soon as I have typed a line. Oh well, one can dream :-)
This might be mission creep, but agree it would be nice if the nugget syntax allowed for inline translations. E.g.
IIRC, the post-processor at the moment stops at the first ||| when it extracts the identity of the token, so only The post-processor would then check for any PO translations first, then any inline ones, and fall back to the token/default.
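The lookup order proposed here (PO translation first, then an inline translation, then the token itself) could be sketched as follows. This is only an illustration in Python: the thread does not settle on an inline syntax, so the sketch assumes the inline translations have already been parsed into a dictionary, and the names `resolve`, `po_catalog` and `inline` are hypothetical.

```python
def resolve(msgid, lang, po_catalog, inline):
    """Pick the text to emit for one nugget.

    po_catalog: {language: {msgid: translation}}, standing in for PO files.
    inline:     {language: translation} parsed from the nugget itself
                (the inline syntax is an assumption, not defined here).
    """
    # 1. A translation from the PO file wins if present and non-empty.
    po_hit = po_catalog.get(lang, {}).get(msgid)
    if po_hit:
        return po_hit
    # 2. Otherwise use a translation written inline by the developer.
    inline_hit = inline.get(lang)
    if inline_hit:
        return inline_hit
    # 3. Otherwise fall back to the token/default message.
    return msgid
```

With this ordering a translator's PO entry always overrides whatever the developer typed inline, which keeps the PO file the authoritative source.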
Actually I am not sure I would like to have multiples in the file. I would rather have a tool come up and merge the translation into the PO file... but details. Martin, do you have a minute? I have two questions on multiple projects and postbuild... I input both projects with inputpaths but it does not seem to parse the second project.
This all sounds good. I'm a bit worried about supporting backwards compatibility; maybe we should just make a clean break to avoid having to package gettext at all.
I agree, I think it is time to drop all backwards compatibility: not just gettext/msgmerge but also all the classes, overloads and interfaces that were there simply to handle the _() function call.
No doubt we have all come to this i18n project looking for a better way to internationalize our web applications. We see that doing the old .NET resource look-up is backward. I expect we also see that leveraging the PO infrastructure for getting messages translated is the way forward.
Unfortunately, the PO infrastructure (i.e. the GNU Portable Object file format specification and the world of tools for translating the files) is very much tethered to GetText, the latter being very backward IMO.
Whoever invented GetText had a brain-wave: we can encode strings in our source code in such a way that A) they can be hooked at run-time, and B) we can find and extract those strings from the source code. A very nice duality! So he or she wrote a library of functions that can look-up and swap message strings, and a tool for scanning source code files for those function calls and extracting message strings to be translated. It therefore assumes that all your message strings are contained in source code files which it can parse, and that they can be encoded as an argument to a function call e.g. _("Translate me!");
For someone facing the problem of how to internationalize a GUI app written in C, GetText is a good approach. For a back-end server program (like a web application), I suggest it is also reasonable, but not the best. With a back-end application, we have access to the output stream, and with a web application it is very easy to get at the HTTP response and do our translations there.
Now, as soon as we drop one side of the duality, one might start to wonder about the other side.
The question is, why bother with all those _() functions when we only need them to mark the message strings (given that we can hook into the HTTP response body)? The reason, of course, is that we still need to mark the message strings so they can be extracted into the PO file. Okay, but if we were going to choose a method for marking message strings for extraction, unhindered by any considerations other than it needs to be reliable, would we choose prefixing the string with _(" and suffixing with ")?
There must be a better way to mark message strings, so that they can be easily picked up in source code and the HTTP response. The same algorithm can be used for both. Better still would be compatibility with SQL LIKE so that they can be extracted from database tables too e.g. product descriptions.
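As a rough illustration of the SQL LIKE idea, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The `products` table, its columns and the `[[[`/`]]]` markers are assumptions made for the example. One caveat: SQLite treats only `%` and `_` as LIKE wildcards, whereas SQL Server's LIKE also gives `[` a special meaning, so the pattern would need escaping there.

```python
import sqlite3

def find_marked_descriptions(conn):
    """Return product descriptions that contain a [[[...]]] nugget."""
    rows = conn.execute(
        "SELECT description FROM products "
        "WHERE description LIKE ? ORDER BY id",
        ("%[[[%]]]%",),  # any text, start marker, any text, end marker
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data: two marked descriptions and one plain value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)"
)
conn.executemany(
    "INSERT INTO products (description) VALUES (?)",
    [("[[[Red bicycle]]]",), ("SKU-1234",), ("[[[Blue helmet]]] (size M)",)],
)
marked = find_marked_descriptions(conn)
```

Here `marked` ends up holding only the two rows that carry a nugget, which is the set an extraction tool would then feed into the PO file.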
The marking can be done in the string itself, so message strings can be written straight into source files without the need to call any helper functions. Very useful for const strings such as C# attributes and data annotations. They would be entirely language independent: C#, Razor, JavaScript, HTML. They can also be written straight into database fields. No need to think "how do I access that helper function?"
Performing the translations at the HTTP response layer has the advantage of confining message look-up and patching to a single place, hence efficiency gains. It reduces dependency on any particular web development platform; we can forget about MVC and drop down the stack to ASP.NET (or even lower).
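A minimal sketch of what "look-up and patching in a single place" might look like, written in Python rather than as an ASP.NET filter. The `[[[`/`]]]` markers follow the Nugget syntax mentioned in this thread; the function name `translate_response` and the `catalogs` dictionary (standing in for loaded PO files) are hypothetical.

```python
import re

# Matches one nugget; non-greedy so adjacent nuggets are not merged.
NUGGET = re.compile(r"\[\[\[(.+?)\]\]\]", re.DOTALL)

def translate_response(body, lang, catalogs):
    """Patch every nugget in an outgoing response body.

    catalogs: {language: {msgid: translation}}. Messages with no
    translation are emitted as-is with the markers removed.
    """
    catalog = catalogs.get(lang, {})
    return NUGGET.sub(
        lambda m: catalog.get(m.group(1), m.group(1)),
        body,
    )
```

Because the pass works on the response text alone, it does not care whether a given nugget came from a view, an attribute, JavaScript or a database field.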
So where are we with this? With Issue #37 I have taken a stab at defining a suitable message marking syntax, called the Nugget syntax. There will be scope for improvement on the syntax I have no doubt (and a better name). It would be great to have a discussion with you guys on this. I'm sure we can come up with a syntax that is easy to remember and use, and yet robust. Support for string formatting is essential (i.e. {0} substitution), and pluralization would be nice.
With the marking syntax defined, the only outstanding work is to swap out (or augment) the GetText-dependent post-build task with new logic for extracting the marked message strings and adding them to the PO output. My preference here would be to drop GetText altogether (along with the _() calls), but that would mean dropping backward compatibility for projects.
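The extraction step could be sketched roughly as below. This is a simplified Python illustration, not the project's post-build task: it assumes `[[[`/`]]]` markers, nuggets that do not span lines, and `|||` separating the message from any format arguments. The function names and the sample file name are hypothetical. The output uses the standard PO entry layout: a `#:` reference comment, then `msgid` and an empty `msgstr`.

```python
import re

NUGGET = re.compile(r"\[\[\[(.+?)\]\]\]")

def po_escape(text):
    """Escape a string for use inside a double-quoted PO value."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def extract_to_po(sources):
    """Build PO template text from {file name: file contents}."""
    references = {}  # msgid -> list of "file:line" locations
    for name, text in sources.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in NUGGET.finditer(line):
                # Only the part before the first ||| identifies the message.
                msgid = match.group(1).split("|||")[0]
                references.setdefault(msgid, []).append(f"{name}:{line_no}")
    entries = []
    for msgid, where in references.items():
        entries.append(
            "#: " + " ".join(where) + "\n"
            + f'msgid "{po_escape(msgid)}"\n'
            + 'msgstr ""\n'
        )
    # Entries already end in a newline, so this leaves a blank line between.
    return "\n".join(entries)
```

A message that appears in several places yields a single entry listing every location, so existing PO tooling (msgmerge, translation editors) can still be applied to the result.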
The v2.0 branch includes all the other support necessary for post-processing the HTTP response. At the moment it has support for processing the Nugget marking syntax, and changing that to support any new syntax would be trivial.
We then get to keep the best bits of the GNU translation project:
It has been a few months now that I have been developing a web app using the i18n v2.0 branch, where there is the option to encode a message string as either _("Translate Me") or "[[[Translate Me]]]". Given the latter takes no extra thought other than including the [[[ and ]]], it wins every time.
Martin Connell