Notes below were as written during development of the adaptor in a bit of a effort to reduce the number of comments in the files although there are still a lot of comments in some areas.
This is being built using the Postgresql adaptor as a basis and using
go-sql-driver/mysql. It's noted that
go-mysql-org and in particular
canal look like a good
alternative though. NOTE: We switched to go-mysql-org/go-mysql
for
replication/tailing.
-
Install client and server
sudo pkgin install mysql-client sudo pkgin install mysql-server
-
Edit
/opt/pkg/etc/my.cnf
and pointdata-dir
somewhere (I opted for~/Database/mysql
). Addsecure_file_priv = "/tmp"
too. -
Run
mysql_install_db
-
Run it
cd /opt/pkg ; /opt/pkg/bin/mysqld_safe &
Alternatively (because right now only 5.6 is available via Pkgsrc),
obtain a DMG of the community server for MacOS.
Version 5.7.31 for macos10.14
is available and works on Monterey.
You'll need to change the root password to empty/blank for the tests though:
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('');
Postgresql has an ARRAY data type so for each array also pulls the element type within
When a table column [...] the respective information schema view only contains ARRAY in the column data_type.
This happens under the iterateTable
function. Note that here the c
is a sql
variable; Not to be confused with the c
variable outside of this; Yay for
naming. If we want to run these queries manually the only bits that change are
the %v
. E.g: ...WHERE c.table_schema = 'public' AND c.table_name = 't_random'
.
The query will output something like this:
column_name | data_type | element_type
-------------+-----------+--------------
s | integer |
md5 | text |
(2 rows)
Comparing differences from Postgresql using these sources:
- https://www.postgresql.org/docs/9.6/datatype.html
- https://dev.mysql.com/doc/refman/5.7/en/data-types.html
There are three code areas that need changing:
colXXX
constants at top of adaptor_test.gosetupData
in adaptor_test.goTestReadComplex
in reader_test.go
Some comments:
- No ARRAY in MySQL
- Timestamp assumes UTC
- The
--colbytea
bits are all just comments so it's easier to match things up - On that note I'm re-ording things so it's consistent
- Inserting binary can be done like this
- No BIGSERIAL, etc
- Geometry is there, just a bit different
- No CIDR
- ENUM has to be done a bit differently, no need to CREATE TYPE
I'm currently developing with a ye-olde 5.6 version so it doesn't like:
- ENUM
- SET
- VARBINARY
- JSON
Remove newline for now for text
:
--- FAIL: TestReadComplex (0.01s)
reader_test.go:117: Expected coltext of row to equal this is \n extremely important (string), but was this is
extremely important (string)
Float MySQL also supports this optional precision specification, but the precision value in FLOAT(p) is used only to determine storage size. A precision from 0 to 23 results in a 4-byte single-precision FLOAT column. A precision from 24 to 53 results in an 8-byte double-precision DOUBLE column.
Tried using go:embded
and inserting the blob data as a string, but couldn't
get it to work. I.e.
// For testing Blob
//go:embed logo-mysql-170x115.png
blobdata string
And then:
fmt.Sprintf(`INSERT INTO %s VALUES (... '%s');`, blobdata)
Tried with '%s'
(didn't work at all) and %q
which inserted, but didn't
extract correctly.
In the end I used LOAD_FILE
to insert (like you are probably meant to), but
would be nice to do directly from Go.
Ultimately I'll probably remove this test.
This is a handy package: https://github.com/paulmach/orb
Need to think about how we want to handle spatial types:
- Decode from WKB in reader.go before we get to testing OR
- Leave as WKB, decode for the test only OR
- Leave as WKB, don't decode at all, instead encode the test data to match
Another good option: https://github.com/twpayne/go-geom
Struggling. I think I'd like to take the "raw" data and decode for the test.
Another option: https://github.com/paulsmith/gogeos
From here: https://dev.mysql.com/doc/refman/5.6/en/gis-data-formats.html
Internally, MySQL stores geometry values in a format that is not identical to either WKT or WKB format. (Internal format is like WKB but with an initial 4 bytes to indicate the SRID.)
For the WKB part, these MySQL-specific considerations apply:
The byte-order indicator byte is 1 because MySQL stores geometries as little-endian values.
MySQL supports geometry types of Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Other geometry types are not supported.
Maybe we should just strip the SRID? Then we'd be left with just wkb
Getting ahead a bit, but need to think about how we transfer things MySQL to MySQL and MySQL to X.
I managed to get things working with go-geom and reading the MySQL data as hex. go-geom has handy wkbhex functions that Orb doesn't. It's possible we fell foul of this with Orb:
Scanning directly from MySQL columns is supported. By default MySQL returns geometry data as WKB but prefixed with a 4 byte SRID. To support this, if the data is not valid WKB, the code will strip the first 4 bytes, the SRID, and try again. This works for most use cases.
Emphasis mine.
I've had to strip off the SRID to get things to work with go-geom. Going to Hex allows us to do that.
TODO (write words here)
This is probably very similar to Blob. At the moment we store a Hex value and for the purposes of testing and comparison we then convert that to a string representation of the hex value on read.
Postgresql uses this format:
query := fmt.Sprintf("INSERT INTO %v (%v) VALUES (%v);", m.Namespace(), strings.Join(keys, ", "), strings.Join(placeholders, ", "))
log.Infoln(query1)
// INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES ($1, $2, $3);
_, err := s.Exec(query, data...)
I.e. takes advantage of Postgresql parameters. MySQL... doesn't work the same. Can't find it documented, but can do this for mysql:
INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?);
Maybe we can also use named values with a :colon
prefix? But probably we don't need to.
Seeing some odd switching around though:
INFO[0000] INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?);
INFO[0000] INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?);
INFO[0000] INSERT INTO writer_insert_test.simple_test_table (coltimestamp, id, colvar) VALUES (?, ?, ?);
Needs to be ordered? Maybe not, seems it adjusts the order of the data too:
INFO[0000] INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?);
INFO[0000] [7 hello world 2021-12-16 13:14:20.575528 +0000 UTC]
INFO[0000] INSERT INTO writer_insert_test.simple_test_table (coltimestamp, id, colvar) VALUES (?, ?, ?);
INFO[0000] [2021-12-16 13:14:20.57585 +0000 UTC 8 hello world]
It's inserting data fine:
mysql> select * from simple_test_table;
+----+-------------+---------------------+
| id | colvar | coltimestamp |
+----+-------------+---------------------+
| 0 | hello world | 2021-12-16 13:14:21 |
| 1 | hello world | 2021-12-16 13:14:21 |
| 2 | hello world | 2021-12-16 13:14:21 |
| 3 | hello world | 2021-12-16 13:14:21 |
| 4 | hello world | 2021-12-16 13:14:21 |
| 5 | hello world | 2021-12-16 13:14:21 |
| 6 | hello world | 2021-12-16 13:14:21 |
| 7 | hello world | 2021-12-16 13:14:21 |
| 8 | hello world | 2021-12-16 13:14:21 |
| 9 | hello world | 2021-12-16 13:14:21 |
+----+-------------+---------------------+
10 rows in set (0.00 sec)
I was seeing this error:
After sorting the parameter issue (?
) I was then left with this failure:
--- FAIL: TestInsert (0.11s)
writer_test.go:93: Error on test query: sql: Scan error on column index 2, name "coltimestamp": unsupported Scan, storing driver.Value type []uint8 into type *time.Time
Reading more on go-sql-driver/mysql I found:
The default internal output type of MySQL
DATE
andDATETIME
values is[]byte
which allows you to scan the value into a[]byte
,string
orsql.RawBytes
variable in your program.However, many want to scan MySQL
DATE
andDATETIME
values intotime.Time
variables, which is the logical equivalent in Go toDATE
andDATETIME
in MySQL. You can do that by changing the internal output type from[]byte
totime.Time
with the DSN parameterparseTime=true
.
And so sticking that on in TestInsert
was enough:
mysql://root@tcp(localhost)/%s?parseTime=true
I think we can assume the SRID is 0:
Another rough note to self... do we need to look at using
interpolateParams=true
?
We switched to go-mysql-org/go-mysql from go-sql-driver/mysql because it has replication support. There are two parts to it:
- Replication
- Canal
AFAICT Canal is more a higher level abstraction on the Replication stuff. It uses that package. So possibly what we want to use is the Replication package as the brief example given looks close to what we want to do and what MySQL does.
Can't run SHOW MASTER STATUS;
on Compose to get what we need for replication.
Well, not as is anyway, will need additional grants.
Can build a dummy/simple app to test out the replication package:
package main
import (
"github.com/go-mysql-org/go-mysql/replication"
"github.com/go-mysql-org/go-mysql/mysql"
"github.com/go-mysql-org/go-mysql/client"
"net/url"
"os"
"context"
"fmt"
"strconv"
)
var (
dsn = "mysql://admin:[REDACTED]@aws-eu-west-1-portal.4.dblayer.com:15788/compose"
)
func main() {
// Could add a dns.Parse to the driver
parsedDSN, _ := url.Parse(dsn)
host := parsedDSN.Hostname()
port := parsedDSN.Port()
portInt, _ := strconv.Atoi(port)
user := parsedDSN.User.Username()
// stupid password makes things harder
pass, _ := parsedDSN.User.Password()
path := parsedDSN.Path[1:]
scheme := parsedDSN.Scheme
// Need to get the log and position. Use driver or client? I guess Transporter client properly, but
// for testing use package client directly?
conn, _ := client.Connect(fmt.Sprintf("%s:%s", host, port), user, pass, path)
r, _ := conn.Execute("SHOW MASTER STATUS")
binFile, _ := r.GetString(0, 0)
binPosition, _ := r.GetInt(0, 1)
cfg := replication.BinlogSyncerConfig {
ServerID: 100,
Flavor: scheme,
Host: host,
Port: uint16(portInt),
User: user,
Password: pass,
}
syncer := replication.NewBinlogSyncer(cfg)
streamer, _ := syncer.StartSync(mysql.Position{binFile, uint32(binPosition)})
// OR
//gtidSet, _ := mysql.ParseMysqlGTIDSet("a852989a-1894-4fcb-a060-a4aaaf06b9f0:1-36")
//streamer, _ := syncer.StartSyncGTID(gtidSet)
for {
ev, _ := streamer.GetEvent(context.Background())
ev.Dump(os.Stdout)
}
// Then need to start handling things here a bit differently.
}
Also, reading through the Postgresql logical decoding so can understand what the Postgresql Tailer is looking at versus what we get from the binlog, etc.
How does Postgresql get changes since last call? Magic inside Postgresql it seems, you only get the changes once.
If using the Pkgsrc MySQL then need to edit /opt/pkg/etc/my.cnf
and ensure:
log_bin
is uncommentedserver_id
is uncommented and has a value
to test tailing.
Need 5.7+ MySQL as 5.6 gives:
=== QueryEvent ===
Date: 2022-02-22 15:40:24
Log position: 138769
Event size: 197
Slave proxy ID: 1
Execution time: 0
Error code: 0
Schema: test
Query: INSERT INTO recipes (recipe_id, recipe_name) VALUES (1,"Tacos"), (2,"Tomato Soup"), (3,"Grilled Cheese")
I.e. under a QueryEvent
and not a RowsEvent
If using community server install...
sudo mkdir /usr/local/mysql/etc
sudo vim /usr/local/mysql/etc/my.cnf
[mysqld]
log_bin
server_id = 100
secure_file_priv = "/tmp"
Need at least that in to run tailing tests, etc.
The binlog appears to have two entries a before vs after:
=== UpdateRowsEventV2 ===
Date: 2022-02-22 19:49:19
Log position: 2716787
Event size: 71
TableID: 299
Flags: 3
Column count: 3
Values:
--
0:11
1:"Superwoman"
2:"2022-02-22 19:49:18"
--
0:11
1:"hello"
2:"2022-02-22 19:49:19"
mysql> select * from recipes;
+-----------+----------------+---------------+
| recipe_id | recipe_name | recipe_rating |
+-----------+----------------+---------------+
| 1 | Tacos | NULL |
| 2 | Tomato Soup | NULL |
| 3 | Grilled Cheese | NULL |
+-----------+----------------+---------------+
3 rows in set (0.00 sec)
mysql> update recipes set recipe_name = 'Nachos' where recipe_id = 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from recipes;
+-----------+----------------+---------------+
| recipe_id | recipe_name | recipe_rating |
+-----------+----------------+---------------+
| 1 | Nachos | NULL |
| 2 | Tomato Soup | NULL |
| 3 | Grilled Cheese | NULL |
+-----------+----------------+---------------+
3 rows in set (0.00 sec)
Results in:
[[1 Tacos <nil>] [1 Nachos <nil>]]
How does Transporter handle this? Well, what does Postgresql do?
compose=> update recipes set recipe_name = 'Nachos' where recipe_id = 1;
compose=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
lsn | xid | data
-----------+-----+------------------------------------------------------------------------------------------------------------------------
0/6000108 | 497 | BEGIN 497
0/6000108 | 497 | table public.recipes: UPDATE: recipe_id[integer]:1 recipe_name[character varying]:'Nachos' recipe_rating[integer]:null
0/60002D0 | 497 | COMMIT 497
(3 rows)
So just one row from Postgresql
So for MySQL we need to skip the first row if it's an update. Gah.