Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meterialize: Getting 'unexpected error' when making a workflow with sql join query #14749

Closed
Amila-Rukshan opened this issue Dec 11, 2023 · 4 comments · Fixed by #14752
Closed

Comments

@Amila-Rukshan
Copy link

Amila-Rukshan commented Dec 11, 2023

Overview of the Issue

I have a Vitess setup running on a local k8s setup. It's simple, no shards. I have two tables car and booking.
I have a simple join between the above two tables and it just works fine when executing as a SQL query via a client.
when I try to materialize it shows the workflow status message as an unexpected error.

Reproduction Steps

But I got a workflow message saying some issue in the query:

{
	"Workflow": "car_search",
	"SourceLocation": {
		"Keyspace": "car-listing-marketplace",
		"Shards": [
			"-"
		]
	},
	"TargetLocation": {
		"Keyspace": "car-listing-marketplace",
		"Shards": [
			"-"
		]
	},
	"MaxVReplicationLag": 4,
	"MaxVReplicationTransactionLag": 1702312270,
	"Frozen": false,
	"ShardStatuses": {
		"-/zone1-2186216262": {
			"PrimaryReplicationStatuses": [
				{
					"Shard": "-",
					"Tablet": "zone1-2186216262",
					"ID": 12,
					"Bls": {
						"keyspace": "car-listing-marketplace",
						"shard": "-",
						"filter": {
							"rules": [
								{
									"match": "car_search",
									"filter": "SELECT car.id, car.make, car.model, car.year, car.mileage, car.price, JSON_ARRAYAGG(JSON_OBJECT('start', booking.start_time, 'end', booking.end_time)) AS booked_times FROM car LEFT JOIN booking ON car.id = booking.car_id GROUP BY car.id"
								}
							]
						}
					},
					"Pos": "",
					"StopPos": "",
					"State": "Running",
					"DBName": "vt_car-listing-marketplace",
					"TransactionTimestamp": 0,
					"TimeUpdated": 1702312266,
					"TimeHeartbeat": 0,
					"TimeThrottled": 0,
					"ComponentThrottled": "",
					"Message": "unexpected: select car.id, car.make, car.model, car.`year`, car.mileage, car.price, JSON_ARRAYAGG(json_object('start', booking.start_time, 'end', booking.end_time)) as booked_times from car left join booking on car.id = booking.car_id group by car.id",
					"Tags": "",
					"WorkflowType": "Materialize",
					"WorkflowSubType": "None",
					"CopyState": null,
					"RowsCopied": 0
				}
			],
			"TabletControls": null,
			"PrimaryIsServing": true
		}
	},
	"SourceTimeZone": "",
	"TargetTimeZone": ""
}

This is the command I used:

vtctlclient Materialize -- '{"workflow": "car_search", "source_keyspace": "car-listing-marketplace", "target_keyspace": "car-listing-marketplace", "table_settings": [{"target_table": "car_search", "source_expression": "SELECT car.id, car.make, car.model, car.year, car.mileage, car.price, JSON_ARRAYAGG(JSON_OBJECT('\''start'\'', booking.start_time, '\''end'\'', booking.end_time)) AS booked_times FROM car LEFT JOIN booking ON car.id = booking.car_id GROUP BY car.id" }]}'

Schemas used:

CREATE TABLE car (
    id INT PRIMARY KEY AUTO_INCREMENT,
    make VARCHAR(255) NOT NULL,
    model VARCHAR(255) NOT NULL,
    year INT NOT NULL,
    mileage INT,
    price DECIMAL(10, 2) NOT NULL,
    description TEXT,
    image_url VARCHAR(255)
);

CREATE TABLE booking (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,
    car_id INT NOT NULL,
    start_time DATETIME NOT NULL,
    end_time DATETIME NOT NULL,
    created_at DATETIME NOT NULL,
    cancelled_at DATETIME,
    FOREIGN KEY (user_id) REFERENCES user(id),
    FOREIGN KEY (car_id) REFERENCES car(id)
);

// output table
CREATE TABLE car_search (
    id INT PRIMARY KEY,
    model VARCHAR(255),
    make VARCHAR(255),
    year INT,
    mileage INT,
    price DECIMAL(10, 2),
    description TEXT,
    image_url VARCHAR(255),
    booked_times JSON
);

Binary Version

Version: 18.0.0-SNAPSHOT (Git revision 1659386f4368ca42f1ce9841d02720ca0f545a54 branch 'main') built on Tue Aug 15 23:22:53 +0530 2023 by [email protected] using go1.20.6 darwin/amd64

Operating System and Environment details

-
Darwin 23.1.0
x86_64

Log Fragments

No response

@Amila-Rukshan Amila-Rukshan added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Dec 11, 2023
@Amila-Rukshan Amila-Rukshan changed the title Vitess Meterialize: Gerring 'unexpected error' when doing SQL join query Vitess Meterialize: Getting 'unexpected error' when making a workflow with sql join query Dec 11, 2023
@Amila-Rukshan Amila-Rukshan changed the title Vitess Meterialize: Getting 'unexpected error' when making a workflow with sql join query meterialize: Getting 'unexpected error' when making a workflow with sql join query Dec 11, 2023
@mattlord
Copy link
Contributor

Thanks, @Amila-Rukshan !

As discussed in Slack, this is because the query is using some unsupported expressions -- a JOIN. So I think the real bug/issue here is that:

  1. We don't have the limitations of Materialize clearly documented anywhere that I've found (ideally here: https://vitess.io/docs/reference/vreplication/materialize)
  2. The error message is not terribly helpful

We can use this issue for those items.

@mattlord mattlord self-assigned this Dec 11, 2023
@mattlord mattlord added Component: VReplication and removed Needs Triage This issue needs to be correctly labelled and triaged labels Dec 11, 2023
@derekperkins
Copy link
Member

derekperkins commented Dec 11, 2023

Here are some things we have had to workaround in the past. I haven't tested all of them with all the latest versions, so some of them might be possible now:

@mattlord
Copy link
Contributor

mattlord commented Dec 12, 2023

In the draft PR, the error message is now:

"Message": "unsupported from expression (*sqlparser.JoinTableExpr) in query: SELECT car.id, car.make, car.model, car.year, car.mileage, car.price, JSON_ARRAYAGG(JSON_OBJECT('start', booking.start_time, 'end', booking.end_time)) AS booked_times FROM car LEFT JOIN booking ON car.id = booking.car_id GROUP BY car.id",

@Amila-Rukshan what do you think of that?

And FWIW, this slimmed down query worked fine:

vtctlclient Materialize -- '{"workflow": "car_search", "source_keyspace": "commerce", "target_keyspace": "commerce", "table_settings": [{"target_table": "car_search", "source_expression": "SELECT id, make, model, year, mileage, price, JSON_ARRAYAGG(JSON_OBJECT('\''start'\'', start_time, '\''end'\'', end_time)) AS booked_times FROM car GROUP BY id" }]}'

@Amila-Rukshan
Copy link
Author

@mattlord, The message is now clearer on what's going on.
Thanks for the updated query as well. I will try it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants