How to migrate pages from MediaWiki to Confluence


In this post I will walk through migrating pages from MediaWiki to a Confluence server, including their page history and image attachments.

Environment

Source:

  • CentOS 5
  • MediaWiki 1.24.1 (also tested on a pilot site with 1.28.1, the latest version at the time)
  • MySQL
  • Wiki type: MediaWiki

Target:

  • CentOS 7
  • Confluence 6.1
  • PostgreSQL

Converter Tool

  • UWC Ver 1.3.2

Introduction

Atlassian used to offer a supported tool for exactly this job, the Universal Wiki Converter (UWC). It is still available in the archives, but it is marked as unsupported for Confluence versions after 4. I gave it a go anyway, and it still did what we needed.

The only limitation at this stage is that I couldn’t work out how to map the original creators to Confluence authors. The config files have a section for this, but it requires a plugin (UDMF) that is no longer available for newer versions of Confluence. As a result, all pages are created under the user named in the config file and dated to the import date. This is still a significant limitation, so I will keep looking for a workaround and update the post if I find one. For now it does what I needed, but be aware of this limitation if you follow this post.

Download UWC

Download the uwc.tar archive to the Confluence server.

I have also uploaded a copy here in case Atlassian removes it in future.

#Download the uwc tool to ~/wikiconvert/uwc.tar on the Confluence server.
cd ~/wikiconvert
tar -xvf uwc.tar

Export MediaWiki

Prepare export config file

To export the pages, follow these steps. You will need the wiki’s MySQL database name, user name and password.

mkdir my-wiki   #for exported files in next step
cp conf/exporter.mediawiki.properties conf/exporter.my-mediawiki.properties
vim conf/exporter.my-mediawiki.properties

Now update:

  • databaseName
  • dbUrl
  • login (MySQL user)
  • password (MySQL user’s password)
  • output

values, and uncomment:

  • history
  • history-suffix
  • udmf=true

 

# Exporter class
exporter.class=com.atlassian.uwc.exporters.MediaWikiExporter

### Mediawiki database connection information
## database name is the name of the mediawiki database
databaseName=wikidb

## dbUrl is the JDBC connection url. The following is an example mysql url
dbUrl=jdbc:mysql://10.1.10.23:3306

# The JDBC driver you wish to use
# Note: You will have to provide the JAR, unless you use MySQL. See:
# http://confluence.atlassian.com/display/CONFEXT/UWC+F.A.Q.#UWCF.A.Q.-HowdoIaddalibraryjartotheUWC%3F
# The Mysql driver has been provided. The following would be the class if you use MySQL.
jdbc.driver.class=com.mysql.jdbc.Driver

## Login info to connect to this database. You will need to replace this.
login=wiki
password=wikipass

## This is the output directory. The export will send text files with Mediawiki syntax to this directory.
output=my-wiki/

## OPTIONAL properties
## database prefix is the prefix that mediawiki assigns to every table.
## You can find it in LocalSettings.php as $wgDBprefix
## This is only used by the default SQL
dbPrefix=

## encoding, if this is not set, the default is utf-8
## See this list for available options:
## http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
#encoding=utf-8
## You can turn on url-encoding with the page titles (if your file system can't
## handle certain characters) by uncommenting this setting. If you do this,
## remember to set to true its counterpart in converter.mediawiki.properties
## Mediawiki.0002.illegalnames-urldecode.property=true
#urlencoding=true
## page history properties
## Set history to true if you wish to preserve page histories
## Choose a filename suffix that follows the requirements of the page
## history framework. Described here: http://confluence.atlassian.com/display/CONFEXT/UWC+Page+History+Framework#UWCPageHistoryFramework-pagehistorysuffixproperty
history=true
history-suffix=-#.txt

## User and Timestamp data properties (udmf)
## Set this to true if you would like user and timestamp metadata to be
## preserved. Data will be added to the beginning of each file as:
## {user:foobar}
## {timestamp:yyyymmddhhmmss}
udmf=true

 

## output the original title to page content as an {orig-title:foo bar} macro
## Useful for hierarchy data that's maintained in the page title like that
## maintained with SubPageList3 plugin
#origtitle=true
 

## Provide your own sql for querying the database.
## In order for these statements to be used instead of the default:
## All properties (except db.sql.revdata) must be defined and non-empty.
## The textdata SQL will be run once for every result row of the pagedata SQL.
## The revdata SQL will be run, if the above page history properties are set.
## You can use any column properties in the the textdata sql.
## You can use the pageid column property in the revdata sql.
## Those refs will be replaced with appropriate data from the results
## of the pagedata SQL.
## Here is an example of what the default SQL looks like.
## It is compatible with Mediawiki 1.7.1, and should (untested)
## be backward compatible with Mediawiki 1.5
##
#db.sql.pagedata=select page_id, page_namespace, page_title, page_latest from mw_page where page_namespace!='8' and page_namespace!='12';
#db.sql.textdata=select old_text from mw_text where old_id="db.column.textid";
#db.sql.textiddata=select rev_text_id from mw_revision where rev_id="db.column.textid";
#db.sql.revdata=select rev_id from mw_revision where rev_page = "db.column.pageid";
#db.column.title=page_title
#db.column.namespace=page_namespace
#db.column.pageid=page_id
#db.column.textid=page_latest
#db.column.text=old_text
##
## Here is an example of some suggested Mediawiki 1.4 SQL, submitted by Blair Labatt via jira issue UWC-189 - http://developer.atlassian.com/jira/browse/UWC-189
#db.sql.pagedata=select cur_id, cur_namespace, cur_title from cur;

#db.sql.textdata=(select old_text from old where old_id in( select rc_this_oldid from recentchanges where rc_cur_id = "db.column.textid" )) union (select cur_text from cur where cur_id = "db.column.textid");

#db.column.title=cur_title
#db.column.namespace=cur_namespace
#db.column.pageid=cur_id
#db.column.textid=cur_id
#db.column.text=cur_text
## For examples relating to other versions, see http://confluence.atlassian.com/display/CONFEXT/UWC+Mediawiki+Notes#UWCMediawikiNotes-UserSuggestedOptionalSQLProperties
## Namespaces
## Identifies which namespaces to output when not using optional sql properties.
## The default behavior will be:
## Namespace | Out directory
## Main -> Pages
## Users -> Users
## Any custom namespaces -> idnum or define them here
## namespaces.ids=comma seperated list of namespace id numbers to be exported
#namespaces.ids=0,2
## namespaces.exportallcustom=true (default), if you want all custom
## namespaces to be exported. false, if you want to define the custom namespaces
## explicitly in namespaces.ids
#namespaces.exportallcustom=true
## namespaces.customnamespace.mapping=comma delimted key value pairs.
## key is namespace id
## value is namespace name, values should be able to be legal directory names
## use syntax: id=>name,id2=>name2
#namespaces.customnamespace.mapping=100=>Foo,101=>Foo_talk
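Before running the export, it can help to confirm that the keys listed above are actually set and uncommented. The bash function below is a hypothetical helper (check_exporter_props is my own name, not part of UWC). It only checks that each key appears at the start of a line with a non-empty value; it cannot tell you whether the values themselves are correct.

```shell
# Hypothetical helper (not part of UWC): report keys from this post's
# checklist that are missing, empty or still commented out.
check_exporter_props() {
  local file="$1"
  local status=0
  local key
  for key in databaseName dbUrl login password output history history-suffix udmf; do
    # An active setting starts at column 1 as "key=" followed by a value.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: ${key}"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the UWC directory:
# check_exporter_props conf/exporter.my-mediawiki.properties && echo "exporter config looks complete"
```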

Run export command

./run_cmdline.sh -e conf/exporter.my-mediawiki.properties
# check for the exported files:
ls -l my-wiki/exported_mediawiki_pages/Pages

There should now be one text file for every wiki page (and one per revision if history is enabled).
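To get a rough idea of how much was exported, you can count the files and the distinct page titles. This is a sketch under an assumption: revisions are written as <Title>-<N>.txt, following the history-suffix=-#.txt setting, so a page whose own title ends in -<number> would be miscounted. summarize_export is a hypothetical helper, and find -printf requires GNU find (as shipped with CentOS).

```shell
# Hypothetical helper: count exported files and distinct page titles.
# Assumes revision files are named "<Title>-<N>.txt" (history-suffix=-#.txt)
# and plain pages "<Title>.txt". Requires GNU find for -printf.
summarize_export() {
  local dir="$1"
  local files pages
  files=$(find "$dir" -type f -name '*.txt' | wc -l)
  # Strip the revision suffix (or a plain .txt) so all revisions of a page
  # collapse into one title, then count the unique titles.
  pages=$(find "$dir" -type f -name '*.txt' -printf '%f\n' \
    | sed -E 's/-[0-9]+\.txt$//; s/\.txt$//' | sort -u | wc -l)
  echo "files=$((files)) pages=$((pages))"
}

# Usage:
# summarize_export my-wiki/exported_mediawiki_pages/Pages
```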

Backup Images/Attachments

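The exporter only writes page text, so you need to back up images manually. MediaWiki keeps uploads in the images directory of the wiki installation by default ($wgUploadDirectory), for example /var/www/html/wiki/images. Find the images directory on your server and make a copy of it.

Run the following on the original wiki server: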

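mkdir -p ~/wiki-bup
cd /var/www/html/wiki    # adjust to your wiki's install directory
tar -cvf ~/wiki-bup/wiki-images.tar images

Copy wiki-images.tar to the Confluence server and run the following there. Extracting into my-wiki/ creates my-wiki/images/, which matches the attachments= path in the Confluence settings file below:

scp wikiserver:/home/saeed/wiki-bup/wiki-images.tar ~/wikiconvert/
cd ~/wikiconvert
tar -xvf wiki-images.tar -C my-wiki/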
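MediaWiki also keeps generated and non-current files under the images directory (thumb/ for thumbnails, archive/ for older file versions, plus temp/ and deleted/), so a raw file count overstates the number of real uploads. The hypothetical count_uploads helper below is a quick sanity check on the copied directory, for example my-wiki/images/. It says nothing about how UWC itself treats those subdirectories.

```shell
# Hypothetical helper: count uploaded originals in a copied MediaWiki images
# directory, skipping thumb/, archive/, temp/ and deleted/ plus .htaccess/README.
count_uploads() {
  local images="${1%/}"   # drop a trailing slash so the -path tests match
  local n
  n=$(find "$images" \
        \( -path "$images/thumb" -o -path "$images/archive" \
           -o -path "$images/temp" -o -path "$images/deleted" \) -prune \
        -o -type f ! -name '.htaccess' ! -name 'README' -print | wc -l)
  echo "$((n))"
}

# Usage:
# count_uploads my-wiki/images/
```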

Convert/Import to Confluence

All the exported files and images are now ready on the Confluence machine.

Prepare Confluence

It is best to create a dedicated Confluence space for the imported pages. I created a space called “Wiki Migration Space” with the key WIKI. You will need this key later, in the Confluence settings file.
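If you prefer the command line to the Confluence UI, the space can also be created through Confluence’s REST API (POST /rest/api/space). I created mine through the UI, so treat this as an untested sketch: space_payload is a hypothetical helper, nothing is escaped (so the name must not contain double quotes or backslashes), and the URL and credentials are the placeholder values from the settings file in this post.

```shell
# Hypothetical helper: build the JSON body for Confluence's POST /rest/api/space.
# No escaping is done, so the name must not contain double quotes or backslashes.
space_payload() {
  local key="$1" name="$2"
  printf '{"key":"%s","name":"%s"}' "$key" "$name"
}

# Usage (placeholder URL and credentials from this post; untested):
# curl -u myconfluenceuser:myconfluencepass -X POST \
#   -H 'Content-Type: application/json' \
#   -d "$(space_payload WIKI 'Wiki Migration Space')" \
#   http://myconfluence.com:8090/rest/api/space
```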

Install udmf plugin (do not follow this step)

I tried this step several times without success. I am leaving it here only to document what I tried and the problem with the UDMF plugin in newer Confluence versions.

cd ~/wikiconvert
wget https://marketplace-cdn.atlassian.com/files/artifact/fd6fd717-9fbb-431d-9150-2a6446ffeeda/udmf-rpc-1.1.jar
sudo cp udmf-rpc-1.1.jar /opt/atlassian/confluence/confluence/WEB-INF/lib/
sudo systemctl restart confluence

Although the UDMF jar is now in the right directory, the plugin still cannot be enabled.

 

Update confluence config file

cp conf/confluenceSettings.properties conf/my-confluenceSettings.properties
vim conf/my-confluenceSettings.properties
#Thu Apr 10 22:19:42 EDT 2017 
current.tab.index=0
space=WIKI
url=http://myconfluence.com:8090
trustpass=
pages=my-wiki/exported_mediawiki_pages/Pages/
uploadOrphanAttachments=false
pageChooserDir=
attachments=my-wiki/images/
trustall=
attachment.size.max=-1
sendToConfluence=true
pattern=
login=myconfluenceuser
truststore=
feedback.option=true
password=myconfluencepass
wikitype=mediawiki
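The pages= and attachments= paths must exist on the machine that runs UWC (see the comments below about attachments not being found). This sketch assumes relative paths are resolved from the UWC directory you run run_cmdline.sh from; check_settings_dirs is a hypothetical pre-flight helper, not part of UWC.

```shell
# Hypothetical pre-flight check: confirm that the "pages" and "attachments"
# directories named in a UWC confluenceSettings file exist relative to the
# current directory (assumed to be the UWC directory).
check_settings_dirs() {
  local file="$1"
  local rc=0
  local key value
  for key in pages attachments; do
    # Take everything after the first "=" on the first active "key=" line.
    value=$(grep -E "^${key}=" "$file" | head -n 1 | cut -d= -f2- || true)
    if [ -z "$value" ]; then
      echo "${key}: not set"
      rc=1
    elif [ ! -d "$value" ]; then
      echo "${key}: directory not found: ${value}"
      rc=1
    else
      echo "${key}: ok (${value})"
    fi
  done
  return "$rc"
}

# Usage, from the UWC directory:
# check_settings_dirs conf/my-confluenceSettings.properties
```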

Update convert config file

The last thing to customise is the MediaWiki converter config file. Make a copy first, then edit it. This is the file I used:

cp conf/converter.mediawiki.properties conf/converter.my-mediawiki.properties
vim conf/converter.my-mediawiki.properties
Mediawiki.0001.allow-tilde-in-links.property=true
Mediawiki.0002.illegalnames-urldecode.property=false
Mediawiki.0003.xmlevents.property=com.atlassian.uwc.converters.xml.DefaultXmlEvents
Mediawiki.0003.xml-fragments.property=true
Mediawiki.0003.xml-fragments-root.property=html
## Html Tidy - Turn this option on to automatically correct malformed html/xml
#Mediawiki.0003.xml-use-htmltidy.property=true

## Html Tidy Options
##           - set JTidy options using the following naming convention:
##           - xml-tidyopt-key.property=val
##           - where key is any supported JTidy key listed here:
##             http://www.docjar.com/docs/api/org/w3c/tidy/Tidy.html
##             but using the property naming convention listed here:
##             http://tidy.sourceforge.net/docs/quickref.html 
##           - and value is one of the options described here:
##             http://tidy.sourceforge.net/docs/quickref.html
##           - Example. To turn on numeric-entities, uncomment:
#Mediawiki.0003.xml-tidyopt-numeric-entities.property=true

## User Date Metadata - Uncomment these if you want the UWC to convert
## optionally exportable user and timestamp metadata
## change the users-must-exist property to false if you do not want the 
## framework to check if the username is an existing confluence user
## Requires UDMF Plugin https://studio.plugins.atlassian.com/wiki/display/udmf
## If you need to get rid of user/timestamp macros, but don't want the 
## user timestamp data actually associated, set userdate-disabled to true
#Mediawiki.0004.userdate.class=com.atlassian.uwc.converters.mediawiki.UserDateConverter
Mediawiki.0004.users-must-exist.property=false
Mediawiki.0004.userdate-disabled.property=true
Mediawiki.0005.tokenize-math.java-regex-tokenizer=((?s)<math>.*?<\/math>){replace-with}$1

## Hierarchy 
## If your hierarchy can be represented in the page content, uncomment and
## configure the 0006 properties to maintain your hierarchy data.
## CONFIGURE:
## * content-hierarchy-pattern - this is the regex that will find your hierarchy
##   data. The contents of group 1 in this regex should list the current pages
##   ancestors.
## * content-hierarchy-pattern-includes-current - 
##   if true, content-hierarchy-pattern will be expected to list the current
##   page title as well. If false, only the ancestors will be expected.
## * content-hierarchy-delim - the delimiter that seperates the ancestors in
##   the ancestor string found by content-hierarchy-pattern
## * content-hierarchy-default-root - The pagename of the parent page for
##   pages with no ancestor. If blank, or a nonexistant pagename, the page will
##   be given no parent, ie. will be an orphan page, and sibling to Home
## * content-hierarchy-setname - If true, the origtitle data will be used
##   to set the title. Set to true when setting char encoding.
## * remove-content - If you want the hierarchy content to be scrubbed from 
##   the page content, set the regex to the same property as 
##   content-hierarchy-pattern. Remember to leave the {replace-with} at the end.
#Mediawiki.0006.switch.hierarchy-builder=UseBuilder
#MediaWiki.0006.classname.hierarchy-builder=com.atlassian.uwc.hierarchies.ContentHierarchy
#Mediawiki.0006.content-hierarchy-pattern.property=\{orig-title:([^}]*)\}
#Mediawiki.0006.content-hierarchy-pattern-includes-current.property=true
#Mediawiki.0006.content-hierarchy-delim.property=/
#MediaWiki.0006.content-hierarchy-default-root.property=Home
#MediaWiki.0006.content-hierarchy-setname.property=false
#MediaWiki.0006.remove-content.java-regex=\{orig-title:([^}]*)\}{replace-with}
Mediawiki.0009-escapebraces.class=com.atlassian.uwc.converters.mediawiki.EscapeBracesConverter
Mediawiki.0010-win-newline-cleaner.class=com.atlassian.uwc.converters.tikiwiki.WinNewlinesConverter
Mediawiki.0014-tokenizedollars.java-regex-tokenizer=([$]){replace-with}$1

## Page Histories - if you exported with page history options turned on,
## uncomment and configure these options. See 
## http://confluence.atlassian.com/display/CONFEXT/UWC+Page+History+Framework
Mediawiki.0050.switch.page-history-preservation=true
Mediawiki.0051.suffix.page-history-preservation=-#.txt
## Code, Pre, and Leading Spaces
Mediawiki.0090-re_pre.java-regex-tokenizer=\<pre\>((?s).*?)\<\/pre\>{replace-with}{code}$1{code}
Mediawiki.0095-re_code.java-regex-tokenizer=\<code\>((?s).*?)\<\/code\>{replace-with}{code}$1{code}
## requires the Table of Contents Macro, tested with version 1.4.7, 
## http://www.randombits.org/display/CONF/Table+of+Contents+Plugin
Mediawiki.0100-toc.java-regex=__TOC__{replace-with}{toc:outline=true|printable=false|style=none|indent=20px}
Mediawiki.0300-re_bold_italics.java-regex='{5}(.*?)'{5}{replace-with}*_$1_*
Mediawiki.0330-multiline-edge1.java-regex=''+\r?\n(=+.*?=+\r?\n)''+{replace-with}$1
Mediawiki.0340-re_multiline.class=com.atlassian.uwc.converters.mediawiki.MultilineBasicConverter
Mediawiki.0350-re_bold.java-regex='{3}\s*(.*?)\s*'{3}{replace-with}*$1*
Mediawiki.0350-re_italics.java-regex='{2}\s*(.*?)\s*'{2}{replace-with}_$1_
Mediawiki.0355-removehtmlcomments.java-regex=<!--.*?-->{replace-with}
## Spans need to be cleared before tables
#Mediawiki.0356.span-color.java-regex=<span style=\"color:([^\"]+)\">(.*?)<\/span>{replace-with}{color:$1}$2{color}
#Mediawiki.0357.span-clear.java-regex=<span[^>]+>(.*?)<\/span>{replace-with}$1

## Tables
## Output can be configured with the tableoutput property. Options are:
## "confluence", and "contentformatting" which refer to the default confluence
## table syntax and the Content Formatting Plugin's table syntax, respectively
## Note: switch the 1509 table parser option, if html tables were used as well
#Mediawiki.0359.enforce-table-columns-minimum.class=com.atlassian.uwc.converters.mediawiki.EnforceTableMinimumConverter
Mediawiki.0360.tableoutput.property=confluence
Mediawiki.0360-re_tables.class=com.atlassian.uwc.converters.mediawiki.TableConverter
Mediawiki.0361-cleannestedtables.java-regex=(\{panel\}\s*)\|\\\}{replace-with}$1
Mediawiki.0362-re_noformat.java-regex=<nowiki>((?s).*?)<\/nowiki>{replace-with}{noformat}$1{noformat}
Mediawiki.0365.tokenize-exclamation.java-regex-tokenizer=([!]){replace-with}\\!

## Handle redirect syntax before link syntax - you'll need to install the
## Redirection Plugin to use this converter
## http://www.customware.net/repository/display/AtlassianPlugins/Redirection+Plugin
#Mediawiki.0368.redirect.class=com.atlassian.uwc.converters.mediawiki.RedirectConverter
## NOTE: Images must come after tables or the whitespace gets screwed up if a table has images
## Images must come before Links or Alias handling will make Image conversion more complicated
Mediawiki.0370-re_images.class=com.atlassian.uwc.converters.mediawiki.ImageConverter
Mediawiki.0400-categories2labels.class=com.atlassian.uwc.converters.mediawiki.CategoryConverter
Mediawiki.0401-re_links_no_categories.java-regex=((?i)\[\[(category:[^\]]+)\]\]\s*)+{replace-with}
Mediawiki.0402-re_bild_to_images.java-regex=(?i)\[\[Bild:\s*([^\]\|\s]+)\s*\]\]{replace-with}!$1!
Mediawiki.0402-re_file_to_images.java-regex=(?i)\[\[file:\s*([^\]\|\s]+)\s*\]\]{replace-with}!$1!
Mediawiki.0403-re_links_to_images.java-regex=(?i)\[\[:?Media:\s*([^\]]+)\]\]{replace-with}[[^$1]]

## 404 and 405 are wikipedia interwiki links. See uwc-296.
## Uncomment to have [[wikipedia:tags]] become [tags@wikipedia].
## You'll need to add the appropriate Confluence shortcut link
## 404 and 405 must be before the namespace cleaner.
#Mediawiki.0404.allow-at-in-links.property=true
#Mediawiki.0405.wikipedia.java-regex=\[wikipedia:([^|\]]+){replace-with}[$1@wikipedia
Mediawiki.0406-mailto.java-regex=(?i)(?<=\[)(mailto:[^\] ]*) +([^\]]*){replace-with}$2|$1
Mediawiki.0407-re_links_colons.class=com.atlassian.uwc.converters.mediawiki.NamespaceCleaner
Mediawiki.0408-re_links-ws1.java-regex=\[\s+([^\]]+?)\]{replace-with}[$1]
Mediawiki.0409-re_links-ws2.java-regex=\[([^\]]+?)\s+\]{replace-with}[$1]
Mediawiki.0410-re_links.java-regex=\[\[([^|\]]*)\]\]{replace-with}[$1]
Mediawiki.0420-re_links_alias.java-regex=\[\[([^|]*)\| *([^\]]*)\]\]{replace-with}[$2|$1]
Mediawiki.0421-re_links_ws.java-regex=(\[[^|\]]+\|[^\]]+) (\]){replace-with}$1$2
Mediawiki.0430-re_links_external_alias.java-regex=\[((?:(?:https?)|(?:file)):\/\/\S+) ([^\]]*)\]{replace-with}[$2|$1]
Mediawiki.0500-re_h4.java-regex=(?s)={5}\s*(.*?)\s*={5}{replace-with}h4. $1
Mediawiki.0510-re_h3.java-regex=(?s)={4}\s*(.*?)\s*={4}{replace-with}h3. $1
Mediawiki.0520-re_h2.java-regex=(?s)={3}\s*(.*?)\s*={3}{replace-with}h2. $1
Mediawiki.0530-re_h1.java-regex=(?s)={2}\s*(.*?)\s*={2}{replace-with}h1. $1
Mediawiki.0540-re_title.java-regex=(^|\n)={1}([^=]+)={1}(\n|$){replace-with}$1h1. $2$3
#Mediawiki.0710-images_ws2underscore.class=com.atlassian.uwc.converters.mediawiki.ImageWhitespaceConverter

## Set the external-internal-links-identifier to your mediawiki's domain, if 
## your users might have included links to raw internal mediawiki urls
## example: [http://mymediawiki.org/index.php/Some_Mediawiki_Page Link] 
#Mediawiki.0790.external-internal-links.class=com.atlassian.uwc.converters.mediawiki.ExternalInternalLinksConverter
#Mediawiki.0790.external-internal-links-identifier.property=https?:\/\/wiki.someplace.com\/
Mediawiki.0800-attachments.class=com.atlassian.uwc.converters.mediawiki.AttachmentConverter
Mediawiki.0910-linebreaks.class=com.atlassian.uwc.converters.mediawiki.BreakConverter
## Lists
Mediawiki.0950-lists.java-regex=(^|\n)([*#]+)([^*#\s])([^*\n]*)(?=\n){replace-with}$1$2 $3$4
Mediawiki.0955-lists-w-bold.java-regex=(^|\n)([*#]*)([*])(\s)([^*\n]*?)( *[*]){replace-with}$1$2$4*$5*
Mediawiki.0960-definitionlists.class=com.atlassian.uwc.converters.mediawiki.DefinitionList
Mediawiki.0970-indenting.java-regex=(^|\n):+(.*){replace-with}$1$2
## SubPageList3
## If you used the SubPageList3 <splist ...> tag syntax to automatically
## list children, uncomment this converter to have them transformed to 
## confluence children macro
#Mediawiki.0980-subpagelist3-children.class=com.atlassian.uwc.converters.mediawiki.SubpagelistConverter
## Discussion Pages -> Comments 
## Use the delim properties to seperate the Discussion page into distinct 
## comments. Here are 3 examples based on wikipedia discussion pages
## Tell the CommentConverter where the Discussion pages are in relation
## to the Pages by setting a relative directory in the location property
## For more info see UWC Mediawiki Notes - Comments section
#Mediawiki.1000.discussionpages2comments.class=com.atlassian.uwc.converters.mediawiki.CommentConverter
Mediawiki.1000.discussion-delim-end-1.property=\[\[User.*?UTC\)[^\n]*
Mediawiki.1000.discussion-delim-start-2.property=\n[=]
Mediawiki.1000.discussion-delim-start-3.property=\n[----]
Mediawiki.1000.discussion-location.property=../Discussions/

## Filenames
## strip out filename extensions when importing
Mediawiki.1010-remove-extension.class=com.atlassian.uwc.converters.ChopPageExtensionsConverter
## replace single _ with space
Mediawiki.1020-underscore2space.class=com.atlassian.uwc.converters.mediawiki.ConvertUnderscoresInTitle
## set this property to true if you want underscores in links to be translated to spaces. See UWC-291
Mediawiki.1021.underscore2space-links.property=false
## HTML
## HTML: prep for the sax parser
Mediawiki.1400.amp-entity.java-regex=[&](?![#a-zA-Z0-9]{2,5};){replace-with}&amp;
Mediawiki.1410.tokenize-math-again.java-regex-tokenizer=((?s)<math>.*?<\/math>){replace-with}$1
## HTML: If you are getting sax errors complaining about namespaces that aren't
##       bound, add converters like 1420 and 1421, to remove refs to them:
##       where, t = the first letter of the tags that are having the problem
##       and, x = the namespace that isn't bound
##       1420 handles an attribute with no value. Ex: x:abc
##       1421 handles an attribute with a value. Ex: x:foo="bar"
#Mediawiki.1420.unbound-namespace-noval.java-regex=<(t[^ >]* )[^>]*?x:[^ >]*{replace-with}<$1
#Mediawiki.1421.unbound-namespace-hasval.java-regex=<(t[^ >]* )[^>]*?(x:[^">]*"[^">]*"\s*)+{replace-with}<$1
## Alternative to XmlConverter for some unnested html syntaxes
#Mediawiki.1470.optionalunnestedhtml-bold.java-regex=<\/?b>{replace-with}*
#Mediawiki.1471.optionalunnestedhtml-ital.java-regex=<\/?i>{replace-with}_
#Mediawiki.1472.optionalunnestedhtml-strike.java-regex=<\/?strike>{replace-with}-
#Mediawiki.1473.optionalunnestedhtml-tt.java-regex=<tt>(.*?)</tt>{replace-with}{{$1}}
#Mediawiki.1474.optionalunnestedhtml-strong.java-regex=<\/?strong>{replace-with}*
#Mediawiki.1475.optionalunnestedhtml-u.java-regex=<\/?u>{replace-with}+
#Mediawiki.1476.optionalunnestedhtml-header.java-regex=<h(\d)>(.*?)</h\d>{replace-with}h$1. $2
#Mediawiki.1477.optionalunnestedhtml-em.java-regex=<\/?em>{replace-with}_
#Mediawiki.1478.optionalunnestedxml-graphiz.java-regex-tokenizer=<\/?graphviz[^>]*>{replace-with}{graphviz}
#Mediawiki.1479.optionalunnestedxml-emptycell.java-regex=> (<\/td>){replace-with}>&nbsp;&nbsp;$1
#Mediawiki.1480.optionalunnestedxml-tablenoparams.java-regex=<\/?((?:table)|(?:tr)|(?:td))>{replace-with}{$1}
#Mediawiki.1481.optionalunnestedxml-tableparams.class=com.atlassian.uwc.converters.mediawiki.UnnestedTableHtmlParams
#Mediawiki.1482.optionalunnestedxml-hidetable.java-regex-tokenizer=(?s)(\{table.*?\{table\}){replace-with}$1
## HTML: set up the parser events
Mediawiki.1501.bold.xmlevent={tag}b{class}com.atlassian.uwc.converters.xml.example.BoldParser
Mediawiki.1501.strong.xmlevent={tag}strong{class}com.atlassian.uwc.converters.xml.example.BoldParser
Mediawiki.1502.italic.xmlevent={tag}i{class}com.atlassian.uwc.converters.xml.ItalicParser
Mediawiki.1503.emph.xmlevent={tag}em{class}com.atlassian.uwc.converters.xml.ItalicParser
Mediawiki.1504.underline.xmlevent={tag}u{class}com.atlassian.uwc.converters.xml.UnderlineParser
Mediawiki.1505.monospace.xmlevent={tag}tt{class}com.atlassian.uwc.converters.xml.MonoParser
Mediawiki.1506.header.xmlevent={tag}h1, h2, h3, h4, h5, h6{class}com.atlassian.uwc.converters.xml.HeaderParser
Mediawiki.1507.lists.xmlevent={tag}ol,ul,li{class}com.atlassian.uwc.converters.xml.ListParser
Mediawiki.1508.horizrule.xmlevent={tag}hr{class}com.atlassian.uwc.converters.xml.HorizRuleParser
## If you want confluence tables output, use SimpleTableParser, or if you
## want content formatting plugin output, use ContentFormattingTableParser
Mediawiki.1509.table.xmlevent={tag}table,tr,td{class}com.atlassian.uwc.converters.xml.SimpleTableParser
#Mediawiki.1509.table.xmlevent={tag}table,tr,td,th{class}com.atlassian.uwc.converters.xml.ContentFormattingTableParser
## HTML: Parse the xml document
Mediawiki.1590.xmlconverter.class=com.atlassian.uwc.converters.xml.XmlConverter
## Leading Spaces -> panel or noformat macros
## Set leading-spaces-noformat property to true if you want the output 
## to be noformat lines instead of one big panel macro. 
## Note: using noformat will be more effecient
## Set the leading-spaces-delim to 'code','noformat', or 'panel'. the noformat 
##     property must be false.  Default is panel
Mediawiki.1600.leading-spaces-noformat.property=false
#Mediawiki.1600.leading-spaces-delim.property=code
Mediawiki.1600-ws2panel.class=com.atlassian.uwc.converters.mediawiki.LeadingSpacesConverter
## For any tokenizer regex above, strip out tokens
Mediawiki.2000-detokenize.class=com.atlassian.uwc.converters.DetokenizerConverter
## Do math last, after math tags are detokenized
Mediawiki.2100-math.class=com.atlassian.uwc.converters.mediawiki.MathConverter

Note: After experimenting with a few parameters and testing the results, I decided to leave the file unchanged. Our MediaWiki is a vanilla installation with little customisation, and all pages live in the main namespace.
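Because the converter file is mostly comments, comparing your copy with the shipped default is easier if you look only at the active lines. active_props is a hypothetical helper that drops blank lines and lines starting with # or !, trims trailing whitespace and sorts the rest; the process substitution in the usage example needs bash.

```shell
# Hypothetical helper: print only the active (non-comment, non-blank) lines
# of a properties file, trailing whitespace trimmed, sorted for diffing.
active_props() {
  grep -Ev '^[[:space:]]*([#!]|$)' "$1" | sed 's/[[:space:]]*$//' | sort
}

# Usage: show which settings differ from the shipped defaults.
# diff <(active_props conf/converter.mediawiki.properties) \
#      <(active_props conf/converter.my-mediawiki.properties)
```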

Convert pages

./run_cmdline.sh conf/my-confluenceSettings.properties conf/converter.my-mediawiki.properties
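On a large wiki the conversion can take a while, so it is worth keeping the output and checking it afterwards. This sketch assumes UWC’s log messages go to the console (so tee can capture them) and looks for the FATAL, ERROR and CONVERTER_ERROR markers that appear in the messages quoted in the comments on this post; summarize_uwc_log is a hypothetical helper.

```shell
# Hypothetical helper: count lines carrying each error marker in a saved
# UWC log. grep -w keeps "ERROR" from matching inside "CONVERTER_ERROR".
summarize_uwc_log() {
  local log="$1"
  local level n
  for level in FATAL ERROR CONVERTER_ERROR; do
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps going.
    n=$(grep -c -w -- "$level" "$log" || true)
    echo "${level}=${n}"
  done
}

# Usage:
# ./run_cmdline.sh conf/my-confluenceSettings.properties conf/converter.my-mediawiki.properties 2>&1 | tee uwc-import.log
# summarize_uwc_log uwc-import.log
```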

 

18 Comments

  1. Could you put a direct link to the uwc.tar file? Also, under “Update convert config file” you don’t list the updates.

    Thank you for the post! Positive experiences using the UWC are seldom found so seeing a detailed post is a welcome sight.

    1. Hi Jon,
      thanks for your comment. I added a link to the file and also added the content of converter file. I must have been desperate to get to bed after midnight when I was documenting this.

      I agree with your comment about the experience with UWC. When I started researching this, after reading other people’s comments, I didn’t have much faith that it would work, especially given the comments from the original authors, who now seem to run a business providing this as a service.
      Anyway, I was lucky that the wiki server I was dealing with was a vanilla installation with no customisation and a flat structure for all files.
      Hope you have a smooth transition too.

  2. I’m experiencing a lot of failures due to the nature of our content, so I expect some manual work. That being said, if I can convert the bulk of our 26000 pages I’ll be happy. I’m also doing a Confluence SSL conversion, so another of your posts is a time saver. Keep up the great work!

    1. That’s a lot of pages. Good luck with the migration. I hope they are at least in some sort of logical structure. I ended up with 6000 pages in one big bucket, and it will be a long process to go through them.
      Have you figured out how to bring the original creators’ names across to Confluence?

      1. Also, good luck with the SSL. It should be straightforward once you have the instructions 🙂

      2. I thought I had, but they all came across as me! That’s not as critical in our setup, as this is mostly legacy content. It looks like I got around 15000 pages in. We have a lot of in the failed pages & the converter chokes looking for a closing tag. Is there a converter option for that?

  3. Works pretty well for Mediawiki 1.11.1 to Confluence 6.2.0 too. Thanks Saeed for this post!

    Just some minor notes.

    In the part about “Export MediaWiki” – “Prepare export config file” you indicate “values and un-comment” the “history-prefix” but in the actual config below that it’s called “history-suffix” not “history-prefix”.

    I’m also not getting my attachments to be uploaded. For pages with an attachment I’m getting:
    2017-07-24 18:20:15,506 INFO [main] - no attachment files found in directory: =/home/user/wikiconvert/my-wiki/images/

    Any idea what this could be? I’m running UWC on a server that connects to Confluence (they’re not on the same server). Page import works fine, just not the attachments and images.

    1. Hi Graham,
      Thanks for reading, and for the note on the prefix issue. It should be suffix; I’ve updated it. Thanks.

      The image directory should be on the same machine that you run your script on. You need to update confluenceSettings.properties file to point to the right image directory.

  4. hi i need help

    2017-08-29 16:40:47,338 FATAL [main] - Could not connect to database: wikidb_conversao. Check database settings: url, name, user, and pass.
    2017-08-29 16:40:47,338 ERROR [main] - Problem while exporting
    java.sql.SQLException: No suitable driver found for a1-orinovo1.srv/wikidb_conversao

    1. It is hard to help with the limited information you have provided. Does this happen when you run the export command on your MediaWiki server, or when you run the import script on your Confluence server?

      If it is the former, check the database settings in the first 15 lines of the exporter config file.

      Also, the second error means no JDBC driver accepted the connection URL. Check that dbUrl is in the jdbc:mysql://host:port form; if your wiki uses a database other than MySQL, you will also need a driver for it (see the link in the first few lines of the exporter config file). Our database was MySQL, which is the most straightforward scenario, so unfortunately I won’t be of much help with other databases.

  5. I tried using the same converter.mediawiki.properties that you used and I ended up getting an error “CONVERTER_ERROR Error while parsing xml. Skipping”
    I do have internal links in mediawiki that point to different pages how can we configure to get those internal links working in confluence (which is one confluence page should be pointing to another confluence page internally)

  6. Never mind, I figured out how to solve the above issue, but now there is a new one: code blocks from MediaWiki are not showing properly. Say I have this code in MediaWiki:

    use Traders;

    select * from SendBidEmailConfig

    -- to disable EmailService
    exec ActivateObj 1675220 -- UseEmailServiceBit = 0

    -- to enable EmailService (after disruption is resolved)
    exec ActivateObj 1507174 -- UseEmailServiceBit = 1

    then this code should ideally be shown in the panel macro in Confluence, could you suggest the properties that should be set to make this happen.

  7. Hi Saed,

    My MediaWiki attachments are stored in the database rather than in the filesystem. Is there any way I can import them into Confluence?

    Also, after starting the import in UWC it gets stuck at 80%, but I can see pages being created in my output folder. I have around 20000 content pages and 72000 pages in total.

  8. Hi Saeed,

    Very good stuff on this topic. I couldn’t find much information on UWC in the Atlassian community. Our migration requirement is a little different: we want to migrate page by page from MediaWiki to Confluence rather than in bulk, as we want the owner of each MediaWiki page to choose his/her own page for migration. I see that we can specify an SQL query in exporter.mediawiki.properties to fetch a page’s content by page_id from the MediaWiki MySQL DB.

    But I have one concern: UWC calls Confluence APIs that are implemented using XML-RPC. These are old Confluence APIs, and Atlassian has clearly stated that they are deprecated and recommends using the newer REST APIs instead. Switching would mean modifying the UWC source code to call the REST APIs, which also changes the payload structure since it has to be sent as JSON. I don’t yet know how much effort that would be.

    Does the UWC take care of migrating hyper links, attachments and comments of a page to Confluence?

    Thanks
    Ayaskant

    1. hi Ayaskant,

      Attachments transferred with no problem. Most of the hyperlinks also came through with no issues. I have noticed some problems around the parsing of the hyperlinks to external links. They are just shown as html codes rather than the links.

      Some of them look like this:

      unnamed |

      [| [http://mediawiki.org|http://mediawiki.org]

      ] |

      Most of the transferred pages were too old, outdated and for archive purposes so I didn’t bother spending more time on getting them right.

      Good luck with the migration 🙂

  9. Hi Saeed,

    How does UWC fetch the attachments? I see it first exports each page’s content from the MediaWiki DB to a .txt file. Where does it store a page’s attachments and images? As far as I know, MediaWiki does not store attachments and images in its DB; they are stored as files on disk or on a mounted file system.

    Ayaskant
