`.eq_any([["foo"]])` on `SqlType = Array<Text>` causes runtime error on Postgres #4349

Ten0 · 2024-11-19T17:27:41Z

Setup

Versions

Diesel: ce6fa0b
Database: Postgres

Feature Flags

diesel: postgres

Problem Description

array.eq_any(array_of_array) doesn't work on Postgres, because Postgres doesn't have such a thing as "array of array". It has 2D array but "2-D array is a different concept".
This causes an error, so we should probably use the non-specialized ANSI IN (...) when dealing with arrays.

What are you trying to accomplish?

table! {
    posts (id) {
        id -> Int4,
        // ...
        tags -> Array<Text>,
    }
}

#[test]
#[cfg(feature = "postgres")]
fn filter_array_by_in() {
    use crate::schema::posts::dsl::*;

    let connection: &mut PgConnection = &mut connection();
    let tag_combinations_to_look_for: &[&[&str]] = &[&["foo"], &["foo", "bar"], &["baz"]];
    let result: Vec<i32> = posts
        .filter(tags.eq_any(tag_combinations_to_look_for))
        .select(id)
        .load(connection)
        .unwrap();
    assert_eq!(result, &[] as &[i32]);
}

What is the actual output?

called `Result::unwrap()` on an `Err` value: DatabaseError(Unknown, "could not find array type for data type text[]")

What is the expected output?

Either of:

test passes, serializing as IN ($1, $2, $3) with $1 = ["foo"]...
Doesn't compile, requiring an explicit writing (e.g. .eq_any(abc).ansi()) to tell it to use ANSI serialization

ATM AFAICT there's no way to even tell it to use ANSI serialization because of bounds here:

diesel/diesel/src/expression/array_comparison.rs

Lines 100 to 103 in ce6fa0b

    
           impl<T, U, DB> QueryFragment<DB, sql_dialect::array_comparison::AnsiSqlArrayComparison> for In<T, U> 
        
           where 
        
               DB: Backend 
        
                   + SqlDialect<ArrayComparison = sql_dialect::array_comparison::AnsiSqlArrayComparison>,

which forbids the use of the ANSI implementation on Pg Backend (which seems weird since ANSI should always work, no?)

(NB: All of this is useful only to the extent to which indexing will work and this behaves reasonably even if there are 500 different values in there. I recall that we did tend to prefer = ANY maybe because of this but I may be mistaken here, was it only for statement caching ?)

Steps to reproduce

Ten0@d6c93fd

or dedicated working branch to fixing this:

cd diesel
git remote add Ten0 https://github.com/Ten0/diesel.git
git checkout array_eq_any
cd diesel_tests
DATABASE_URL="..." cargo test --no-default-features --features "postgres" -- filter_array_by_in --nocapture

The text was updated successfully, but these errors were encountered:

Ten0 · 2024-11-19T18:03:52Z

= ANY (VALUES (ARRAY['foo']), (ARRAY['bar'])) works, so it looks like as a VALUES subquery in the Many specialization would also work (and maybe generate query plans that match better the = ANY (ARRAY) case but not sure)...
The main question is: how to detect that you actually have an Array of Array.
I thought about:

Using the type metadata (array_oid of array is zero): we can't ATM because that requires having a connection (for MetadataLookup), which we don't have for the ToSql AST pass (notably for debug reasons)
Adding that information to the SqlType trait.
a. We could have that as a type like is done for IsNullable but that may not be worth the additional complexity (and/or breaking changes), considering that "associated type defaults are unstable (Tracking issue for RFC 2532, "Associated type defaults" rust-lang/rust#29661)"... but maybe that's fine because that is mostly (always?) derived anyway?
b. It doesn't necessarily need to be a type though: even if it's just an associated const we could use it in the QueryFragment implementation. If it's just an associated const we can't use that to restrict compilation of additional array nesting though, because "associated const equality is incomplete (Tracking Issue for associated const equality rust-lang/rust#92827)", so if it's meant to actually be a type then we probably shouldn't do this...

This commit fixes an issue that allowed to trigger a runtime error by passing an array of arrays to `.eq_any()`. We rewrite queries containing `IN` expressions to `= ANY()` on postgresql as that more efficient (allows caching + allows binding all values at once). That's not possible for arrays of arrays as we do not support nested arrays yet. The fix introduces an associated constant for the `SqlType` trait that tracks if a SQL type is a `Array<T>` or not. This information is then used to conditionally generate the "right" SQL. By defaulting to `false` while adding this constant we do not break existing code. We use the derive to set the right value based on the type name.

Ten0 · 2024-11-19T19:24:19Z

A workaround that uses the VALUES (...), (...) writing:

use {
	diesel::{
		backend::Backend,
		expression::{
			array_comparison::{AsInExpression, MaybeEmpty},
			AsExpression, Expression, TypedExpressionType, ValidGrouping,
		},
		prelude::*,
		query_builder::{AstPass, QueryFragment, QueryId},
		serialize::ToSql,
		sql_types::{HasSqlType, SingleValue, SqlType},
	},
	std::marker::PhantomData,
};

/// Fixes `eq_any` usage with arrays of arrays:
/// https://github.com/diesel-rs/diesel/issues/4349
///
/// Usage: `column.eq_any(AsSingleColumnValueSubselect(iterator_of_arrays))`
pub struct AsSingleColumnValueSubselect<I>(pub I);
impl<I, T, ST> AsInExpression<ST> for AsSingleColumnValueSubselect<I>
where
	I: IntoIterator<Item = T>,
	T: AsExpression<ST>,
	ST: SqlType + TypedExpressionType,
{
	type InExpression = SingleColumnValuesSubselect<ST, T>;

	fn as_in_expression(self) -> Self::InExpression {
		SingleColumnValuesSubselect {
			values: self.0.into_iter().collect(),
			p: PhantomData,
		}
	}
}

#[derive(Debug, Clone)]
pub struct SingleColumnValuesSubselect<ST, I> {
	/// The values contained in the `= ANY ()` clause
	pub values: Vec<I>,
	p: PhantomData<ST>,
}

impl<ST, I, GB> ValidGrouping<GB> for SingleColumnValuesSubselect<ST, I>
where
	ST: SingleValue,
	I: AsExpression<ST>,
	I::Expression: ValidGrouping<GB>,
{
	type IsAggregate = <I::Expression as ValidGrouping<GB>>::IsAggregate;
}

// This implementation is fake, it's not an expression, but it's used to check consistency between the SQL types
// of both sides of the `= ANY ()` clause.
impl<ST, I> Expression for SingleColumnValuesSubselect<ST, I>
where
	ST: TypedExpressionType,
{
	type SqlType = ST;
}

impl<ST, I> MaybeEmpty for SingleColumnValuesSubselect<ST, I> {
	fn is_empty(&self) -> bool {
		self.values.is_empty()
	}
}

impl<ST, I, QS> SelectableExpression<QS> for SingleColumnValuesSubselect<ST, I>
where
	SingleColumnValuesSubselect<ST, I>: AppearsOnTable<QS>,
	ST: SingleValue,
	I: AsExpression<ST>,
	<I as AsExpression<ST>>::Expression: SelectableExpression<QS>,
{
}

impl<ST, I, QS> AppearsOnTable<QS> for SingleColumnValuesSubselect<ST, I>
where
	SingleColumnValuesSubselect<ST, I>: Expression,
	I: AsExpression<ST>,
	ST: SingleValue,
	<I as AsExpression<ST>>::Expression: SelectableExpression<QS>,
{
}

impl<ST, I, DB> QueryFragment<DB> for SingleColumnValuesSubselect<ST, I>
where
	DB: Backend + HasSqlType<ST>,
	ST: SingleValue,
	I: ToSql<ST, DB>,
{
	fn walk_ast<'b>(&'b self, mut out: AstPass<'_, 'b, DB>) -> QueryResult<()> {
		if self.values.is_empty() {
			return Err(diesel::result::Error::QueryBuilderError(
				"Cannot generate an empty `= ANY (VALUES (...), (...))` clause - avoid making a query in that case"
					.into(),
			));
		}
		out.unsafe_to_cache_prepared();
		out.push_sql("VALUES ");
		let mut first = true;
		for value in &self.values {
			if first {
				first = false;
			} else {
				out.push_sql(", ");
			}
			out.push_sql("(");
			out.push_bind_param(value)?;
			out.push_sql(")");
		}
		Ok(())
	}
}

impl<ST, I> QueryId for SingleColumnValuesSubselect<ST, I> {
	type QueryId = ();

	const HAS_STATIC_QUERY_ID: bool = false;
}

weiznich · 2024-11-19T19:28:56Z

I've pushed #4350 with another approach to fix that problem, but I'm open to different solutions.

As a more general note: I wonder if it would be meaningful (and possible) to have some sort of fuzzer that takes the documentation and keeps generating queries via the DSL and then checks if they compile and if they do not return an syntax error while executing them.

Ten0 · 2024-11-20T01:45:31Z

As a more general note: I wonder if it would be meaningful (and possible) to have some sort of fuzzer that takes the documentation and keeps generating queries via the DSL and then checks if they compile and if they do not return an syntax error while executing them.

This looks interesting. However I feel like it's probably fine if Diesel doesn't cover for mistakes people never make, and there ought to be some of those that the type system allows but the complexity of preventing isn't worth the probability of occurrence...
If this is true, then I'm not sure what would be a strategy to create a general test system that would test only "relevant" cases but still be able to infer them automatically.

Ten0 added the bug label Nov 19, 2024

Ten0 changed the title ~~.eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres~~ .eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres Nov 19, 2024

Ten0 added the enhancement label Nov 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`.eq_any([["foo"]])` on `SqlType = Array<Text>` causes runtime error on Postgres #4349

`.eq_any([["foo"]])` on `SqlType = Array<Text>` causes runtime error on Postgres #4349

Ten0 commented Nov 19, 2024

Ten0 commented Nov 19, 2024 •

edited

Loading

Ten0 commented Nov 19, 2024 •

edited

Loading

weiznich commented Nov 19, 2024

Ten0 commented Nov 20, 2024 •

edited

Loading

.eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres #4349

.eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres #4349

Comments

Ten0 commented Nov 19, 2024

Setup

Versions

Feature Flags

Problem Description

What are you trying to accomplish?

What is the actual output?

What is the expected output?

Steps to reproduce

Ten0 commented Nov 19, 2024 • edited Loading

Ten0 commented Nov 19, 2024 • edited Loading

weiznich commented Nov 19, 2024

Ten0 commented Nov 20, 2024 • edited Loading

`.eq_any([["foo"]])` on `SqlType = Array<Text>` causes runtime error on Postgres #4349

`.eq_any([["foo"]])` on `SqlType = Array<Text>` causes runtime error on Postgres #4349

Ten0 commented Nov 19, 2024 •

edited

Loading

Ten0 commented Nov 19, 2024 •

edited

Loading

Ten0 commented Nov 20, 2024 •

edited

Loading