Language dependent configuration
The languages.xml file (located in the config directory of the toolkit) describes language-dependent variables that are required for processing Wikipedia dumps. It is often necessary to modify and update this file to process new dumps of Wikipedia.
##Format
The language configuration file contains one Language element per language version of Wikipedia. Each Language element contains RootCategory, DisambiguationCategory, DisambiguationTemplate, and RedirectIdentifier elements as children, and may also contain NamespaceAlias elements.
###Language
The Language element must provide the following attributes:
attribute | description |
---|---|
code | A short (typically two letter) code for the language (e.g. en, de, fr) |
name | A name (in English) for the language version |
localName | A name (in the given language) for the language version |
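As a quick sanity check before processing a dump, the required attributes can be verified with a few lines of Python. This is only a sketch: the `Languages` wrapper element, the `SAMPLE` string, and the `missing_attributes` helper are assumptions made for illustration and are not part of the toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the name of the wrapper element is an assumption.
SAMPLE = """<Languages>
  <Language code="de" name="German" localName="Deutsch">
    <RootCategory>!Hauptkategorie</RootCategory>
  </Language>
  <Language code="xx" name="Broken example" />
</Languages>"""

REQUIRED_ATTRIBUTES = ("code", "name", "localName")


def missing_attributes(xml_text):
    """Map each Language's code (or its position) to the required attributes it lacks."""
    problems = {}
    root = ET.fromstring(xml_text)
    for index, language in enumerate(root.iter("Language")):
        missing = [a for a in REQUIRED_ATTRIBUTES if a not in language.attrib]
        if missing:
            problems[language.get("code", f"#{index}")] = missing
    return problems


print(missing_attributes(SAMPLE))  # {'xx': ['localName']}
```

An empty dictionary means every Language element carries all three attributes.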
###RootCategory
Describes the root category, from which all other content-based categories descend (e.g. this one for the en version)
There should be only one root category per language.
###DisambiguationCategory
Describes a category to which disambiguation pages often belong (e.g. this one for the en version)
There may be multiple disambiguation categories per language.
###DisambiguationTemplate
Describes a template that is often invoked on disambiguation pages (e.g. this one for the en version)
There may be multiple disambiguation templates per language.
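To make the role of these template names concrete, the sketch below checks whether a page's markup invokes any of them. It is not the toolkit's own detection logic: `invokes_template` is a hypothetical helper, and comparing names case-insensitively is an assumption.

```python
import re


def invokes_template(markup, template_names):
    """Return True if the markup invokes any of the named templates."""
    wanted = {name.strip().lower() for name in template_names}
    # Capture the template name: the text after "{{" up to "|" or "}}".
    for match in re.finditer(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})", markup):
        if match.group(1).lower() in wanted:
            return True
    return False


templates = ["disambiguation", "disambig", "geodis"]
page = "'''Mercury''' may refer to:\n* Mercury (planet)\n{{Disambig}}"
print(invokes_template(page, templates))                # True
print(invokes_template("{{cite web|url=x}}", templates))  # False
```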
###RedirectIdentifier
Describes the syntax used to create redirects. For example, if redirects are identified like this:
#REDIRECT [[target]]
then you should use the following redirect identifier:
<RedirectIdentifier>REDIRECT</RedirectIdentifier>
There may be multiple redirect identifiers per language.
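The following sketch shows one way a list of redirect identifiers could be turned into a matcher for redirect pages. It illustrates the idea only and makes no claim about how the toolkit does this internally; `build_redirect_pattern` and `redirect_target` are hypothetical names, and case-insensitive matching is an assumption.

```python
import re


def build_redirect_pattern(identifiers):
    """Compile a pattern matching '#IDENTIFIER [[target]]' at the start of a page."""
    alternatives = "|".join(re.escape(identifier) for identifier in identifiers)
    return re.compile(
        r"^\s*#\s*(?:" + alternatives + r")\s*:?\s*\[\[([^\]|#]+)",
        re.IGNORECASE,
    )


def redirect_target(markup, pattern):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = pattern.match(markup)
    return match.group(1).strip() if match else None


pattern = build_redirect_pattern(["REDIRECT", "WEITERLEITUNG"])
print(redirect_target("#REDIRECT [[target]]", pattern))       # target
print(redirect_target("#WEITERLEITUNG [[Berlin]]", pattern))  # Berlin
print(redirect_target("Berlin is a city.", pattern))          # None
```

Listing every identifier used by a language version matters here: a page that starts with an unlisted identifier would be treated as an ordinary article.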
###NamespaceAlias
Describes alternative names for namespaces. For example, if the File namespace can also be called Image, then you should add the following alias:
<NamespaceAlias from='Image' to='File' />
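As an illustration of what an alias implies for page titles, the sketch below reads NamespaceAlias elements and rewrites an aliased namespace prefix to its canonical name. The helpers `load_aliases` and `canonical_title` are hypothetical, and matching namespace names case-insensitively is an assumption rather than documented toolkit behaviour.

```python
import xml.etree.ElementTree as ET

LANGUAGE_XML = """<Language code="en" name="English" localName="English">
  <NamespaceAlias from='Image' to='File' />
  <NamespaceAlias from='WP' to='Wikipedia' />
</Language>"""


def load_aliases(language_element):
    """Build a lookup from lower-cased alias to canonical namespace name."""
    return {
        alias.get("from").lower(): alias.get("to")
        for alias in language_element.iter("NamespaceAlias")
    }


def canonical_title(title, aliases):
    """Replace an aliased namespace prefix; leave other titles unchanged."""
    namespace, separator, rest = title.partition(":")
    if not separator:
        return title  # no namespace prefix at all
    canonical = aliases.get(namespace.strip().lower())
    return f"{canonical}:{rest}" if canonical else title


aliases = load_aliases(ET.fromstring(LANGUAGE_XML))
print(canonical_title("Image:Example.jpg", aliases))  # File:Example.jpg
print(canonical_title("Category:Birds", aliases))     # Category:Birds
```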
##Current languages
Below are examples of Language elements for processing different versions of Wikipedia. Please feel free to add and modify these.
###Full English Wikipedia
<Language code="en" name="English" localName="English">
  <RootCategory>Fundamental categories</RootCategory>
  <DisambiguationCategory>disambiguation</DisambiguationCategory>
  <DisambiguationTemplate>disambiguation</DisambiguationTemplate>
  <DisambiguationTemplate>disambig</DisambiguationTemplate>
  <DisambiguationTemplate>geodis</DisambiguationTemplate>
  <DisambiguationTemplate>hndis</DisambiguationTemplate>
  <DisambiguationTemplate>hospitaldis</DisambiguationTemplate>
  <DisambiguationTemplate>mathdab</DisambiguationTemplate>
  <DisambiguationTemplate>mountianindex</DisambiguationTemplate>
  <DisambiguationTemplate>numberdis</DisambiguationTemplate>
  <DisambiguationTemplate>roaddis</DisambiguationTemplate>
  <DisambiguationTemplate>schooldis</DisambiguationTemplate>
  <DisambiguationTemplate>shipindex</DisambiguationTemplate>
  <DisambiguationTemplate>SIA</DisambiguationTemplate>
  <RedirectIdentifier>REDIRECT</RedirectIdentifier>
  <NamespaceAlias from='WP' to='Wikipedia' />
  <NamespaceAlias from='WT' to='Wikipedia talk' />
  <NamespaceAlias from='Image' to='File' />
  <NamespaceAlias from='Image talk' to='File talk' />
  <NamespaceAlias from='Project' to='Wikipedia' />
  <NamespaceAlias from='Project talk' to='Wikipedia talk' />
</Language>
###Simple English Wikipedia
<Language code="simple" name="Simple English" localName="Simple English">
  <RootCategory>articles</RootCategory>
  <DisambiguationCategory>disambiguation</DisambiguationCategory>
  <DisambiguationTemplate>disambiguation</DisambiguationTemplate>
  <DisambiguationTemplate>disambig</DisambiguationTemplate>
  <DisambiguationTemplate>2CC</DisambiguationTemplate>
  <DisambiguationTemplate>3CC</DisambiguationTemplate>
  <DisambiguationTemplate>hndis</DisambiguationTemplate>
  <RedirectIdentifier>REDIRECT</RedirectIdentifier>
  <NamespaceAlias from='WP' to='Wikipedia' />
  <NamespaceAlias from='WT' to='Wikipedia talk' />
  <NamespaceAlias from='Image' to='File' />
  <NamespaceAlias from='Image talk' to='File talk' />
  <NamespaceAlias from='Project' to='Wikipedia' />
  <NamespaceAlias from='Project talk' to='Wikipedia talk' />
</Language>
###German Wikipedia
<Language code="de" name="German" localName="Deutsch">
  <RootCategory>!Hauptkategorie</RootCategory>
  <DisambiguationCategory>Begriffsklärung</DisambiguationCategory>
  <DisambiguationTemplate>Begriffsklärung</DisambiguationTemplate>
  <RedirectIdentifier>REDIRECT</RedirectIdentifier>
  <RedirectIdentifier>WEITERLEITUNG</RedirectIdentifier>
</Language>
###Spanish Wikipedia
<Language code="es" name="Spanish" localName="Español">
  <RootCategory>Artículos</RootCategory>
  <DisambiguationCategory>Wikipedia:Desambiguación</DisambiguationCategory>
  <DisambiguationTemplate>desambiguación</DisambiguationTemplate>
  <DisambiguationTemplate>des</DisambiguationTemplate>
  <DisambiguationTemplate>desambiguacion</DisambiguationTemplate>
  <DisambiguationTemplate>disambig</DisambiguationTemplate>
  <DisambiguationTemplate>desambig</DisambiguationTemplate>
  <RedirectIdentifier>REDIRECT</RedirectIdentifier>
  <RedirectIdentifier>des</RedirectIdentifier>
  <RedirectIdentifier>otros usos</RedirectIdentifier>
  <RedirectIdentifier>redirige aquí</RedirectIdentifier>
  <RedirectIdentifier>ico-des</RedirectIdentifier>
</Language>