
Language dependent configuration

dnmilne edited this page Aug 22, 2013 · 5 revisions

The `languages.xml` file (located in the `config` directory of the toolkit) describes the language-dependent variables required for processing Wikipedia dumps. You will often need to modify or update this file before processing a new Wikipedia dump.


The file contains one `Language` element per language version of Wikipedia. Each `Language` element has `RootCategory`, `DisambiguationCategory`, and `DisambiguationTemplate` child elements, as well as one or more redirect identifiers.


The `Language` element must provide the following attributes:

| Attribute | Description |
| --- | --- |
| `code` | A short (typically two-letter) code for the language, e.g. `en`, `de`, `fr` |
| `name` | The name of the language version, in English |
| `localName` | The name of the language version, in that language |
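
To illustrate this structure, here is a minimal Python sketch (not the toolkit's own loader) that reads `Language` elements with the standard-library `xml.etree.ElementTree` module and collects each one's attributes and child elements. The `SAMPLE` document, its `<Languages>` wrapper element, its disambiguation values, and the `load_languages` helper are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: the <Languages> wrapper and the disambiguation
# values are assumptions, not copied from the toolkit's languages.xml.
SAMPLE = """<Languages>
  <Language code="en" name="English" localName="English">
    <RootCategory>Fundamental categories</RootCategory>
    <DisambiguationCategory>Disambiguation pages</DisambiguationCategory>
    <DisambiguationTemplate>disambiguation</DisambiguationTemplate>
    <DisambiguationTemplate>disambig</DisambiguationTemplate>
  </Language>
</Languages>"""


def load_languages(xml_text):
    """Hypothetical helper: map each language code to its attributes and children."""
    root = ET.fromstring(xml_text)
    languages = {}
    # iter() finds <Language> elements at any depth, so the name of the
    # wrapper element does not matter here.
    for lang in root.iter("Language"):
        children = {}
        for child in lang:
            # Repeatable elements (e.g. several DisambiguationTemplate entries)
            # are gathered into a list under their tag name.
            children.setdefault(child.tag, []).append((child.text or "").strip())
        languages[lang.get("code")] = {
            "name": lang.get("name"),
            "localName": lang.get("localName"),
            "children": children,
        }
    return languages


langs = load_languages(SAMPLE)
print(langs["en"]["localName"])  # English
print(langs["en"]["children"]["DisambiguationTemplate"])  # ['disambiguation', 'disambig']
```

Grouping children by tag name keeps the sketch independent of which element names a given version of the file uses.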

###RootCategory

Describes the root category, from which all other content-based categories descend (e.g. `Fundamental categories` for the `en` version).

There should be only one root category per language.
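
Since each `Language` element should declare exactly one root category, a loader might check this up front. The following is a minimal Python sketch of such a check, assuming the configuration is parsed with `xml.etree.ElementTree`; the `root_category` helper and the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_category(language):
    """Hypothetical check: return the single RootCategory of a <Language>, or raise."""
    roots = language.findall("RootCategory")
    if len(roots) != 1:
        raise ValueError(
            f"language {language.get('code')!r} has {len(roots)} "
            "RootCategory elements; expected 1"
        )
    return (roots[0].text or "").strip()


# Illustrative elements: one valid, one missing its RootCategory.
ok = ET.fromstring(
    '<Language code="en" name="English" localName="English">'
    "<RootCategory>Fundamental categories</RootCategory></Language>"
)
print(root_category(ok))  # Fundamental categories

bad = ET.fromstring('<Language code="xx" name="Test" localName="Test"/>')
try:
    root_category(bad)
except ValueError as err:
    print(err)  # language 'xx' has 0 RootCategory elements; expected 1
```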


###DisambiguationCategory

Describes a category to which disambiguation pages often belong (e.g. [this one|] for the `en` version).

There may be multiple disambiguation categories per language.
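
Because a language may list several disambiguation categories, a page can be flagged if it belongs to any of them. This Python sketch is a hypothetical illustration, not the toolkit's logic: `normalise_title` and `in_disambiguation_category` are made-up helpers, and the configured category name is an illustrative value. It normalises titles the way MediaWiki does by default (underscores treated as spaces, first letter upper-cased) before comparing.

```python
def normalise_title(title):
    """Normalise a MediaWiki title: underscores and runs of whitespace become
    single spaces, and the first letter is upper-cased (MediaWiki's default)."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def in_disambiguation_category(page_categories, disambiguation_categories):
    """Hypothetical check: is the page in any configured disambiguation category?"""
    wanted = {normalise_title(c) for c in disambiguation_categories}
    return any(normalise_title(c) in wanted for c in page_categories)


# Illustrative configuration value and page data (assumptions).
configured = ["Disambiguation pages"]
print(in_disambiguation_category(
    ["Place name disambiguation pages", "disambiguation_pages"], configured
))  # True
print(in_disambiguation_category(["Chemical elements"], configured))  # False
```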


###DisambiguationTemplate

Describes a template that is often invoked on disambiguation pages (e.g. [this one|] for the `en` version).

There may be multiple disambiguation templates per language.
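
Disambiguation templates can instead be detected in a page's raw wikitext. The Python sketch below is a simplified, hypothetical illustration: it uses a regular expression to pick out the name of each `{{...}}` invocation and compares it against the configured template names. The helper names and the template names `disambiguation` and `disambig` are illustrative assumptions. It ignores prefixes such as `Template:` or `subst:`.

```python
import re

# Captures the template name in "{{name}}" or "{{name|...}}".
TEMPLATE_CALL = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def normalise_name(name):
    """Treat underscores as spaces and upper-case the first letter, as MediaWiki does by default."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def is_disambiguation_by_template(wikitext, disambiguation_templates):
    """Hypothetical check: does the wikitext invoke any configured template?"""
    wanted = {normalise_name(t) for t in disambiguation_templates}
    return any(
        normalise_name(m.group(1)) in wanted for m in TEMPLATE_CALL.finditer(wikitext)
    )


templates = ["disambiguation", "disambig"]  # illustrative values
page = "'''Mercury''' may refer to:\n* [[Mercury (planet)]]\n\n{{Disambig}}"
print(is_disambiguation_by_template(page, templates))  # True
print(is_disambiguation_by_template("{{Infobox planet|name=Mercury}}", templates))  # False
```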


###RedirectIdentifier

Describes the syntax used to create redirects. For example, if redirects are identified like this:

    #REDIRECT [[target]]

then you should use the following redirect identifier


There may be multiple redirect identifiers per language.
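
Given one or more redirect identifiers, a redirect page can be recognised and its target extracted. The following Python sketch is only an illustration. It assumes each identifier is the bare keyword without the leading `#` (e.g. `REDIRECT`), which may not match how the toolkit stores it. The `redirect_target` helper and the identifier list are hypothetical. `WEITERLEITUNG` is included as an example of a localised keyword, the form used on German Wikipedia.

```python
import re


def redirect_target(wikitext, identifiers):
    """Hypothetical helper: return the redirect target title, or None.

    Assumes each identifier is the keyword without the leading '#'.
    Matching is case-insensitive and tolerates whitespace and an
    optional colon before the link.
    """
    keywords = "|".join(re.escape(i) for i in identifiers)
    pattern = re.compile(
        r"^\s*#\s*(?:" + keywords + r")\s*:?\s*\[\[([^\]|#]+)",
        re.IGNORECASE,
    )
    match = pattern.match(wikitext)
    # The captured title stops before any "#section" or "|label" part.
    return match.group(1).strip() if match else None


ids = ["REDIRECT", "WEITERLEITUNG"]  # illustrative identifiers
print(redirect_target("#REDIRECT [[Target page]]", ids))  # Target page
print(redirect_target("#weiterleitung [[Zielseite#Abschnitt]]", ids))  # Zielseite
print(redirect_target("'''Target page''' is an article.", ids))  # None
```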

##Current languages

Below are examples of Language elements for processing different versions of Wikipedia. Please feel free to add and modify these.

###Full English Wikipedia

    <Language code="en" name="English" localName="English">
        <RootCategory>Fundamental categories</RootCategory>
        <!-- further child elements omitted -->
    </Language>

###Simple English Wikipedia

    <Language code="simple" name="Simple English" localName="Simple English">
        <!-- child elements omitted -->
    </Language>

###German Wikipedia

    <Language code="de" name="German" localName="Deutsch">
        <!-- child elements omitted -->
    </Language>


