This tool allows you to take data from a MySQL server (only tested on 5.x) and write a PostgreSQL-compatible (8.2 or higher) dump file, or pipe it directly into your running PostgreSQL server (8.2 or higher).
Attention!
Currently there is no support for importing spatial data from MySQL.
If you're like me, you don't like random stuff polluting your Python install. Might I suggest installing this in a virtualenv?
> virtualenv --no-site-packages ~/envs/py-mysql2pgsql
> source ~/envs/py-mysql2pgsql/bin/activate
- Python 2.7
- MySQL-python
- psycopg2
- PyYAML
- termcolor (unless you're installing on Windows)
- pytz
I have only done limited testing on Windows using Python 2.7. Install the driver dependencies for Windows before attempting to install py-mysql2pgsql, or the install will fail.
All dependencies should be installed automatically when you install the app in either of the following ways:
> pip install py-mysql2pgsql
> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install
Looking for help?
> py-mysql2pgsql -h
usage: py-mysql2pgsql [-h] [-v] [-f FILE]

Tool for migrating/converting data from mysql to postgresql.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Show progress of data migration.
  -f FILE, --file FILE  Location of configuration file (default:
                        mysql2pgsql.yml). If none exists at that path, one
                        will be created for you.
Don't worry if this is your first time, it'll be gentle.
> py-mysql2pgsql
No configuration file found.
A new file has been initialized at: mysql2pgsql.yml
Please review the configuration and retry...
As the output suggests, a file was created at mysql2pgsql.yml for you to edit. For the impatient, here is what the file contains.
# a socket connection will be selected if a 'socket' is specified
# also 'localhost' is a special 'hostname' for MySQL that overrides the 'port' option
# and forces it to use a local socket connection
# if tcp is chosen, you can use compression
# if use a schema, use colon like this 'mydatabase:schema', else will import to schema 'public'
# if sameschame is true, the 'schema' of 'mydatabase:schema' will use mysql.database
# if getdbinfo is true, only get mysql database satistics info, not convert anything

mysql:
 hostname: localhost
 port: 3306
 socket: /tmp/mysql.sock
 username: mysql2psql
 password:
 database: mysql2psql_test
 compress: false
 getdbinfo: false
destination:
 # if file is given, output goes to file, else postgres
 file:
 postgres:
  hostname: localhost
  port: 5432
  username: mysql2psql
  password:
  database: mysql2psql_test
  sameschame: true

# if only_tables is given, only the listed tables will be converted. leave empty to convert all tables.
#only_tables:
#- table1
#- table2
# if exclude_tables is given, exclude the listed tables from the conversion.
#exclude_tables:
#- table3
#- table4

# if supress_data is true, only the schema definition will be exported/migrated, and not the data
supress_data: false

# if supress_ddl is true, only the data will be exported/imported, and not the schema
supress_ddl: false

# if force_truncate is true, forces a table truncate before table loading
force_truncate: false

# if timezone is true, forces to append/convert to UTC tzinfo mysql data
timezone: false

# if index_prefix is given, indexes will be created whith a name prefixed with index_prefix
index_prefix:

# For Greenplum Database(base on PSQL) , advise this true
# if is_gpdb is true, ignore INDEXES(not PRIMARY KEY INDEXE), CONSTRAINTS, AND TRIGGERS
is_gpdb: false
Pretty self-explanatory, right? A couple of things to note. First, if destination -> file is populated, all output will be dumped to the specified location regardless of what is contained in destination -> postgres. So if you want to dump directly to your server, make sure the file value is blank.
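The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the tool's actual code: `resolve_destination` is a hypothetical helper, and the dict stands in for the parsed mysql2pgsql.yml (a blank `file:` entry parses to `None` in YAML).

```python
def resolve_destination(config):
    """Return ('file', path) or ('postgres', settings) for a parsed config dict.

    Hypothetical helper: mirrors the rule that a populated 'file' value
    always wins over whatever is in the 'postgres' block.
    """
    destination = config.get("destination") or {}
    file_path = destination.get("file")
    if file_path:  # any non-empty value means "dump to this file"
        return ("file", file_path)
    return ("postgres", destination.get("postgres") or {})


# Stand-in for the parsed YAML; 'file' is blank, so output goes to the server.
config = {
    "destination": {
        "file": None,
        "postgres": {"hostname": "localhost", "port": 5432},
    }
}
print(resolve_destination(config))
```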
Say you have a MySQL db with many, many tables, but you're only interested in exporting a subset of those tables. No problem. Add only the tables you want to include to only_tables, or the tables that you don't want exported to exclude_tables.
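Here is a rough sketch of how such a filter could behave. `select_tables` is a hypothetical function, and applying both lists together when both are set is my assumption; the text above does not say what the tool does in that case.

```python
def select_tables(all_tables, only_tables=None, exclude_tables=None):
    """Filter table names the way only_tables / exclude_tables are described.

    Assumption: an empty or missing only_tables means "convert everything",
    and exclude_tables is applied afterwards.
    """
    only = set(only_tables or [])
    excluded = set(exclude_tables or [])
    selected = []
    for name in all_tables:
        if only and name not in only:
            continue  # not on the allow-list
        if name in excluded:
            continue  # explicitly excluded
        selected.append(name)
    return selected


tables = ["table1", "table2", "table3", "table4"]
print(select_tables(tables, only_tables=["table1", "table2"]))     # ['table1', 'table2']
print(select_tables(tables, exclude_tables=["table3", "table4"]))  # ['table1', 'table2']
```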
Other items of interest may be to skip moving the data and just create the schema, or vice versa. To skip the data and only create the schema, set supress_data to true. To migrate only the data and not recreate the tables, set supress_ddl to true; if there's existing data that you want to drop before importing, also set force_truncate to true. force_truncate is not necessary when supress_ddl is set to false.
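To make the interaction between these three options easier to see, here is a small sketch that lists which steps would run per table. `plan_table_steps` is hypothetical and the step names are borrowed from the verbose output; the real tool may order or gate things differently.

```python
def plan_table_steps(supress_data=False, supress_ddl=False, force_truncate=False):
    """List the per-table steps implied by the three options (a sketch only)."""
    steps = []
    if not supress_ddl:
        steps.append("create table")
    elif force_truncate:
        # Only meaningful when the table is not being recreated.
        steps.append("truncate table")
    if not supress_data:
        steps.append("write data")
    if not supress_ddl:
        steps.extend(["add indexes", "add constraints"])
    return steps


print(plan_table_steps(supress_data=True))
print(plan_table_steps(supress_ddl=True, force_truncate=True))
```

Under these assumptions, force_truncate changes nothing when supress_ddl is false, which matches the note above.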
Note that when migrating, it's sometimes possible to knock your sequences out of whack. When this happens, you may get IntegrityErrors about your primary keys saying things like, "duplicate key value violates unique constraint." See this page for a fix
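One common way to resynchronize a sequence is to set it to the current maximum of its column using PostgreSQL's `setval` and `pg_get_serial_sequence` functions. The sketch below only builds that SQL statement; `reset_sequence_sql` is a hypothetical helper, it assumes plain lowercase identifiers and a serial-backed column, and it is not necessarily the fix described on the page mentioned above.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def reset_sequence_sql(table, column="id"):
    """Build a statement that moves a serial column's sequence to MAX(column).

    Assumes simple, unquoted identifiers; anything else is rejected rather
    than interpolated into the SQL.
    """
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError("unsupported identifier: %r" % (name,))
    return (
        "SELECT setval(pg_get_serial_sequence('{t}', '{c}'), "
        "COALESCE(MAX({c}), 1), MAX({c}) IS NOT NULL) FROM {t};"
    ).format(t=table, c=column)


print(reset_sequence_sql("table_one"))
```

You would run the resulting statement once per affected table, for example with psql, after the migration finishes.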
Due to different naming conventions in MySQL and PostgreSQL, there is a chance that the tool generates index names that collide with table names. This can be circumvented by setting index_prefix.
One last thing: the --verbose flag. Without it, the tool will just go on its merry way without bothering you with any output until it's done. With it, you'll get a play-by-play summary of what's going on. Here's an example.
> py-mysql2pgsql -v -f mysql2pgsql.yml
START PROCESSING table_one
  START - CREATING TABLE table_one
  FINISH - CREATING TABLE table_one
  START - WRITING DATA TO table_one
  24812.02 rows/sec [20000]
  FINISH - WRITING DATA TO table_one
  START - ADDING INDEXES TO table_one
  FINISH - ADDING INDEXES TO table_one
  START - ADDING CONSTRAINTS ON table_one
  FINISH - ADDING CONSTRAINTS ON table_one
FINISHED PROCESSING table_one
START PROCESSING table_two
  START - CREATING TABLE table_two
  FINISH - CREATING TABLE table_two
  START - WRITING DATA TO table_two
  FINISH - WRITING DATA TO table_two
  START - ADDING INDEXES TO table_two
  FINISH - ADDING INDEXES TO table_two
  START - ADDING CONSTRAINTS ON table_two
  FINISH - ADDING CONSTRAINTS ON table_two
FINISHED PROCESSING table_two
Since there is not a one-to-one mapping between MySQL and PostgreSQL data types, listed below are the conversions that are applied. I've taken some liberties with some; others should come as no surprise.
MySQL | PostgreSQL |
---|---|
char | character |
varchar | character varying |
tinytext | text |
mediumtext | text |
text | text |
longtext | text |
tinyblob | bytea |
mediumblob | bytea |
blob | bytea |
longblob | bytea |
binary | bytea |
varbinary | bytea |
bit | bit varying |
tinyint | smallint |
tinyint unsigned | smallint |
smallint | smallint |
smallint unsigned | integer |
mediumint | integer |
mediumint unsigned | integer |
int | integer |
int unsigned | bigint |
bigint | bigint |
bigint unsigned | numeric |
float | real |
float unsigned | real |
double | double precision |
double unsigned | double precision |
decimal | numeric |
decimal unsigned | numeric |
numeric | numeric |
numeric unsigned | numeric |
date | date |
datetime | timestamp without time zone |
time | time without time zone |
timestamp | timestamp without time zone |
year | smallint |
enum | character varying (with check constraint) |
set | ARRAY[]::text[] |
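If you want to check ahead of time what a column will become, the table above can be expressed as a plain lookup. This is a sketch built from the table only; `TYPE_MAP` and `convert_type` are hypothetical names, not part of the tool, and length or precision arguments such as `varchar(255)` are not handled here.

```python
# MySQL type -> PostgreSQL type, copied row by row from the table above.
TYPE_MAP = {
    "char": "character",
    "varchar": "character varying",
    "tinytext": "text",
    "mediumtext": "text",
    "text": "text",
    "longtext": "text",
    "tinyblob": "bytea",
    "mediumblob": "bytea",
    "blob": "bytea",
    "longblob": "bytea",
    "binary": "bytea",
    "varbinary": "bytea",
    "bit": "bit varying",
    "tinyint": "smallint",
    "tinyint unsigned": "smallint",
    "smallint": "smallint",
    "smallint unsigned": "integer",
    "mediumint": "integer",
    "mediumint unsigned": "integer",
    "int": "integer",
    "int unsigned": "bigint",
    "bigint": "bigint",
    "bigint unsigned": "numeric",
    "float": "real",
    "float unsigned": "real",
    "double": "double precision",
    "double unsigned": "double precision",
    "decimal": "numeric",
    "decimal unsigned": "numeric",
    "numeric": "numeric",
    "numeric unsigned": "numeric",
    "date": "date",
    "datetime": "timestamp without time zone",
    "time": "time without time zone",
    "timestamp": "timestamp without time zone",
    "year": "smallint",
    "enum": "character varying",  # plus a check constraint, per the table
    "set": "ARRAY[]::text[]",     # kept exactly as written in the table
}


def convert_type(mysql_type):
    """Look up the PostgreSQL type for a bare MySQL type name."""
    key = " ".join(mysql_type.lower().split())  # normalize case and spacing
    if key not in TYPE_MAP:
        raise ValueError("no mapping listed for %r" % (mysql_type,))
    return TYPE_MAP[key]


print(convert_type("INT UNSIGNED"))  # bigint
```

Note how the unsigned integer types move up one size, since PostgreSQL has no unsigned integers.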
Not every valid MySQL database schema can be converted cleanly to PostgreSQL. If you end up with a different database schema, please note that:
- Most MySQL versions don't enforce the NOT NULL constraint on date and enum fields. Because of that, NOT NULL is skipped for these types. Here's an excuse for the dates: http://bugs.mysql.com/bug.php?id=59526.
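As a rough illustration of that rule, the sketch below decides whether a column keeps its NOT NULL clause. `nullability_clause` is a hypothetical helper, and treating only the literal `date` and `enum` types as exempt is my assumption; the tool may cover related types as well.

```python
# Assumption: only these two MySQL types are exempt from NOT NULL.
TYPES_WITHOUT_NOT_NULL = ("date", "enum")


def nullability_clause(mysql_type, declared_not_null):
    """Return 'NOT NULL' unless the rule above says to skip it."""
    if not declared_not_null:
        return ""
    if mysql_type.lower() in TYPES_WITHOUT_NOT_NULL:
        return ""  # MySQL may hold values here that PostgreSQL would reject
    return "NOT NULL"


print(repr(nullability_clause("int", True)))   # 'NOT NULL'
print(repr(nullability_clause("date", True)))  # ''
```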
I ported much of this from an existing project written in Ruby by Max Lapshin over at https://github.com/maxlapshin/mysql2postgres. I found that it worked fine for most things, but for migrating large tables with millions of rows it started to break down. This motivated me to write py-mysql2pgsql, which uses a server-side cursor, so there is no "paging", which means there is no slowdown while working its way through a large dataset.
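To show the idea, here is a minimal sketch of streaming rows from a single query in batches instead of issuing one paged query after another. `stream_rows` is a hypothetical helper written against the generic Python DB-API. The example runs on an in-memory sqlite3 database purely so it is self-contained; sqlite3 is not a server-side cursor. With MySQL-python you would get one by asking for `MySQLdb.cursors.SSCursor`, though whether py-mysql2pgsql uses exactly that class is an assumption on my part.

```python
import sqlite3


def stream_rows(cursor, query, batch_size=1000):
    """Yield rows from one query, fetching them in batches.

    The query runs once; there is no LIMIT/OFFSET paging, so later
    batches do not get slower as the offset grows.
    """
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Stand-in database so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_one (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table_one (name) VALUES (?)",
    [("row%d" % i,) for i in range(5)],
)

rows = list(
    stream_rows(conn.cursor(), "SELECT id, name FROM table_one ORDER BY id", batch_size=2)
)
print(len(rows))  # 5
```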