This is a command line tool to convert the contents of a Confluence space into a MediaWiki import data format.
- PHP >= 7.4 with the XML extension must be installed
- The pandoc tool must be installed and available in the PATH (https://pandoc.org/installing.html)
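The prerequisites above can be checked with a small script before the first run. This is a minimal sketch, not part of the tool: the function names `have_cmd`, `version_ge` and `check_prerequisites` are made up for illustration, and the version comparison only looks at the major and minor number.

```sh
#!/bin/sh
# Sketch: verify the prerequisites listed above (hypothetical helper names).

# Succeeds if the given command is found in the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if version $1 is at least $2, comparing MAJOR.MINOR only.
# Assumes both arguments contain at least "MAJOR.MINOR".
version_ge() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Checks php (>= 7.4, XML extension loaded) and pandoc.
check_prerequisites() {
    have_cmd php || { echo "php not found in PATH" >&2; return 1; }
    have_cmd pandoc || { echo "pandoc not found in PATH" >&2; return 1; }
    php_version=$(php -r 'echo PHP_VERSION;')
    version_ge "$php_version" "7.4" ||
        { echo "PHP $php_version is older than 7.4" >&2; return 1; }
    php -m | grep -qix 'xml' ||
        { echo "PHP XML extension is not loaded" >&2; return 1; }
    echo "All prerequisites found"
}
```

Calling `check_prerequisites` after sourcing the script prints the first missing requirement, if any.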
- Download migrate-confluence.phar from https://github.com/hallowelt/migrate-confluence/releases/tag/latest
- Copy migrate-confluence.phar to /usr/local/bin/migrate-confluence
- Create an export of your Confluence space
- Save it to a location that is accessible by this tool (e.g. /tmp/confluence/input/Confluence-export.zip)
- Extract the ZIP file (e.g. /tmp/confluence/input/Confluence-export)
  - The folder should contain the files entities.xml and exportDescriptor.properties, as well as the folder attachments
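The expected layout of the extracted export can be verified before starting the migration. The following is a minimal sketch; `check_export_dir` is a hypothetical helper, not a command of the tool.

```sh
#!/bin/sh
# Sketch: check that a directory looks like an extracted Confluence export.

# Succeeds if directory $1 contains entities.xml, exportDescriptor.properties
# and the attachments folder; reports every missing entry on stderr.
check_export_dir() {
    export_dir=$1
    status=0
    for file in entities.xml exportDescriptor.properties; do
        if [ ! -f "$export_dir/$file" ]; then
            echo "missing file: $export_dir/$file" >&2
            status=1
        fi
    done
    if [ ! -d "$export_dir/attachments" ]; then
        echo "missing folder: $export_dir/attachments" >&2
        status=1
    fi
    return "$status"
}
```

With the example paths used here, the call would be `check_export_dir /tmp/confluence/input/Confluence-export`.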
- Create the "workspace" directory (e.g. /tmp/confluence/workspace/)
- From the parent directory (e.g. /tmp/confluence/), run the migration commands
  - Run migrate-confluence analyze --src input/ --dest workspace/ to create "working files". After the script has run, you can check those files and apply changes if required (e.g. structural changes).
  - Run migrate-confluence extract --src input/ --dest workspace/ to extract all contents, such as wiki page contents, attachments and images, into the workspace
  - Run migrate-confluence convert --src workspace/ --dest workspace/ to convert the wiki page contents from Confluence Storage XML to MediaWiki WikiText
  - Run migrate-confluence compose --src workspace/ --dest workspace/ to create importable data
If you re-run the scripts you will need to clean up the "workspace" directory!
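The four commands and the workspace cleanup can be wrapped in one function. This is only a sketch under a few assumptions: `run_migration` and the `MIGRATE_CMD` variable are names invented for this example, and the function deletes the whole workspace on every call, so it is not suitable if you edit the "working files" between the analyze and extract steps.

```sh
#!/bin/sh
# Sketch: run all migration steps in order with a fresh workspace.

# Command to invoke; override MIGRATE_CMD if the PHAR is installed elsewhere.
MIGRATE_CMD=${MIGRATE_CMD:-migrate-confluence}

# Runs the four steps from the parent directory $1 (e.g. /tmp/confluence),
# which must contain "input/". Any existing "workspace/" is deleted first.
# Stops at the first step that fails.
run_migration() {
    base_dir=$1
    [ -d "$base_dir/input" ] || { echo "no input/ in $base_dir" >&2; return 1; }
    (
        cd "$base_dir" || exit 1
        rm -rf workspace
        mkdir workspace
        "$MIGRATE_CMD" analyze --src input/ --dest workspace/ &&
        "$MIGRATE_CMD" extract --src input/ --dest workspace/ &&
        "$MIGRATE_CMD" convert --src workspace/ --dest workspace/ &&
        "$MIGRATE_CMD" compose --src workspace/ --dest workspace/
    )
}
```

For the example layout, `run_migration /tmp/confluence` would run the complete chain.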
- Copy the "workspace/result" directory (e.g. /tmp/confluence/workspace/result/) to your target wiki server (e.g. /tmp/result)
- Go to your MediaWiki installation directory
- Make sure you have the target namespaces set up properly
- Use php maintenance/importImages.php /tmp/result/images/ to first import all attachment files and images
- Use php maintenance/importDump.php /tmp/result/output.xml to import the actual pages
You may need to update your MediaWiki search index afterwards.
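The two import commands can be combined so that pages are only imported once the files are in place. The sketch below assumes the result directory is given as an absolute path; `import_result` and `PHP_CMD` are hypothetical names used only in this example.

```sh
#!/bin/sh
# Sketch: import files first, then pages, from a MediaWiki installation.

# PHP binary to use; override PHP_CMD if needed.
PHP_CMD=${PHP_CMD:-php}

# $1: MediaWiki installation directory
# $2: absolute path of the copied result directory (e.g. /tmp/result)
import_result() {
    wiki_dir=$1
    result_dir=$2
    [ -f "$result_dir/output.xml" ] ||
        { echo "no output.xml in $result_dir" >&2; return 1; }
    (
        cd "$wiki_dir" || exit 1
        # Attachments and images first, so that pages can reference them.
        "$PHP_CMD" maintenance/importImages.php "$result_dir/images/" &&
        "$PHP_CMD" maintenance/importDump.php "$result_dir/output.xml"
    )
}
```

The function does not set up namespaces or update the search index; those steps remain manual.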
If the tool cannot migrate some content or functionality, it will create a category, so you can manually fix the issues after the import:
- Broken_link
- Broken_user_link
- Broken_page_link
- Broken_image
- Broken_layout
- Broken_macro/<macro-name>
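To estimate the amount of manual rework before importing, the categories can be counted in the generated output. The sketch below rests on an assumption that is not confirmed here: that the categories appear in the converted text as wikitext links of the form `[[Category:Broken_...]]`. The function name `count_broken_categories` is hypothetical.

```sh
#!/bin/sh
# Sketch: count "Broken_..." categories in a file such as output.xml.
# Assumes they are written as [[Category:Broken_...]] wikitext links.

# Prints each category found in file $1 with its number of occurrences,
# most frequent first.
count_broken_categories() {
    grep -o '\[\[Category:Broken_[^]|]*' "$1" |
        sed 's/^\[\[Category://' |
        sort | uniq -c | sort -rn
}
```

For example, `count_broken_categories /tmp/confluence/workspace/result/output.xml` would list the categories for the example workspace.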
- User identities
- Comments
- Various macros
- Various layouts
- Blog posts
- Clone this repo
- Run composer update
- Run box build to actually create the PHAR file in dist/. See also https://github.com/humbug/box
- Reduce multiple line breaks (<br />) to one
- Remove line breaks and arbitrary formatting (e.g. <b>) from headings
- Mask external images (<img />)
- Preserve filename of "Broken_attachment"
- Add wikitable as default class to <table>
- Merge multiple <code> lines into <pre>
- Remove bold/italic formatting from wikitext headings (e.g. === '''Some heading''' ===)
- Fix unconverted HTML lists in wikitext (e.g. <ul><li>==== Lorem ipsum ====</li><li>'''<span class="confluence-link"> </span>[[Media:Some_file.pdf]]'''</li></ul><ul>)
- Remove empty Confluence storage format fragments (e.g. <span class="confluence-link"> </span>, <span class="no-children icon">)