Modernize CrateDB data type implementations #65

amotl · 2024-06-09T20:58:08Z

About

On behalf of GH-9, we added an SQLAlchemy data type implementation for CrateDB's FLOAT_VECTOR data type. On this task, we discovered a few details that made us think we should also modernize the other data type implementations, eventually.

The reason is because they have been conceived a while ago already, around SQLAlchemy 1.x, and additional mechanisms have been introduced to SQLAlchemy since then.

Implementation Notes

An improved implementation of the FloatVector data type for CrateDB,
compared to the previous implementation on behalf of the LangChain adapter.

The previous implementation, based on SQLAlchemy's UserDefinedType, didn't
respect the python_type property on backward/reverse resolution of types.
This was observed on Meltano's database connector machinery doing a
type cast, which led to a NotImplementedError.

typing.cast(type, sql_type.python_type) => NotImplementedError

The UserDefinedType approach is easier to implement, because it doesn't
need compiler support.

To get full SQLAlchemy type support, including support for forward- and
backward resolution / type casting, the custom data type should derive
from SQLAlchemy's TypeEngine base class instead.

When deriving from TypeEngine, you will need to set the __visit_name__
attribute, and add a corresponding visitor method to the CrateTypeCompiler,
in this case, visit_FLOAT_VECTOR.

Now, rendering a DDL succeeds. However, when reflecting the DDL schema back,
it doesn't work until you will establish a corresponding reverse type mapping.

By invoking SELECT DISTINCT(data_type) FROM information_schema.columns;,
you will find out that the internal type name is float_vector, so you
announce it to the dialect using TYPES_MAP["float_vector"] = FloatVector.

Still not there: NotImplementedError: Default TypeEngine.as_generic() heuristic method was unsuccessful for target_cratedb.sqlalchemy.vector.FloatVector. A custom as_generic() method must be implemented for this type class.

So, as it signals that the type implementation also needs an as_generic
property, let's supply one, returning sqltypes.ARRAY.

It looks like, in exchange to those improvements, the get_col_spec
method is not needed any longer.

TODO: Would it be a good idea to derive from SQLAlchemy's
ARRAY right away, to get a few of the features without
the need to redefine them?

Please note the outcome of this analysis and the corresponding implementation
has been derived from empirical observations, and from the feeling that we also
lack corresponding support on the other special data types of CrateDB (ARRAY and
OBJECT) within the SQLAlchemy dialect, i.e. "that something must be wrong or
incomplete". In this spirit, it is advisable to review and improve their
implementations correspondingly.

References

Types: Add support for CrateDB's FLOAT_VECTOR data type and KNN_MATCH function #9 (comment)

The text was updated successfully, but these errors were encountered:

amotl mentioned this issue Jun 9, 2024

Types: Add support for CrateDB's FLOAT_VECTOR data type and KNN_MATCH function #9

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modernize CrateDB data type implementations #65

Modernize CrateDB data type implementations #65

amotl commented Jun 9, 2024

Modernize CrateDB data type implementations #65

Modernize CrateDB data type implementations #65

Comments

amotl commented Jun 9, 2024

About

Implementation Notes

References