In the context of GH-9, we added a SQLAlchemy data type implementation for CrateDB's FLOAT_VECTOR data type. While working on it, we discovered a few details that suggest we should eventually modernize the other data type implementations as well.
The reason is that they were conceived a while ago, around SQLAlchemy 1.x, and additional mechanisms have been introduced to SQLAlchemy since then.
Implementation Notes
An improved implementation of the FloatVector data type for CrateDB,
compared to the previous implementation made for the LangChain adapter.
The previous implementation, based on SQLAlchemy's UserDefinedType, didn't
respect the python_type property on backward/reverse resolution of types.
This was observed when Meltano's database connector machinery performed a
type cast, which led to a NotImplementedError.
The UserDefinedType approach is easier to implement, because it doesn't
need compiler support.
To get full SQLAlchemy type support, including support for forward- and
backward resolution / type casting, the custom data type should derive
from SQLAlchemy's TypeEngine base class instead.
When deriving from TypeEngine, you will need to set the __visit_name__
attribute, and add a corresponding visitor method to the CrateTypeCompiler,
in this case, visit_FLOAT_VECTOR.
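The dispatch convention described above can be modelled without SQLAlchemy at all. The following is a simplified, library-free sketch: `TypeCompilerSketch` and this `FloatVector` are hypothetical stand-ins, not the dialect's actual classes, and the rendered `FLOAT_VECTOR(n)` syntax is an assumption made for illustration.

```python
class FloatVector:
    """Stand-in for a custom type; not the real SQLAlchemy class."""

    # The compiler looks up a method named "visit_" + this value.
    __visit_name__ = "FLOAT_VECTOR"

    def __init__(self, dimensions):
        self.dimensions = dimensions


class TypeCompilerSketch:
    """Stand-in for a type compiler that dispatches on __visit_name__."""

    def process(self, type_):
        # Resolve the visitor method by name; fail loudly if it is missing.
        visitor = getattr(self, "visit_" + type_.__visit_name__, None)
        if visitor is None:
            raise NotImplementedError(
                "no visitor for " + type_.__visit_name__
            )
        return visitor(type_)

    def visit_FLOAT_VECTOR(self, type_):
        # Assumed DDL rendering, for illustration only.
        return "FLOAT_VECTOR(%d)" % type_.dimensions


ddl = TypeCompilerSketch().process(FloatVector(3))
print(ddl)
```

Without the `visit_FLOAT_VECTOR` method, `process()` in this model would raise, which mirrors why the type needs compiler support.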
Now, rendering DDL succeeds. However, reflecting the schema back
doesn't work until you establish a corresponding reverse type mapping.
By invoking SELECT DISTINCT(data_type) FROM information_schema.columns;,
you will find out that the internal type name is float_vector, so you
announce it to the dialect using TYPES_MAP["float_vector"] = FloatVector.
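As a rough, library-free illustration of the reverse mapping: the lookup function `resolve_type`, the initial map contents, and the `None` fallback are assumptions made for this sketch, not the dialect's actual reflection code.

```python
class FloatVector:
    """Stand-in for the custom type class."""


# Hypothetical reverse map: type name reported by the database -> type class.
TYPES_MAP = {"integer": int, "text": str}


def resolve_type(reflected_name):
    """Return the mapped class, or None when the name is unknown."""
    return TYPES_MAP.get(reflected_name)


# Before registration, the reflected name cannot be resolved.
before = resolve_type("float_vector")

# Announce the type under the name the database reports.
TYPES_MAP["float_vector"] = FloatVector
after = resolve_type("float_vector")
```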
Still not there: NotImplementedError: Default TypeEngine.as_generic() heuristic method was unsuccessful for target_cratedb.sqlalchemy.vector.FloatVector. A custom as_generic() method must be implemented for this type class.
So, as it signals that the type implementation also needs an as_generic
method, let's supply one, returning sqltypes.ARRAY.
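A minimal sketch of such an override, assuming SQLAlchemy 1.4 or later is installed. The constructor signature and the choice of `Float` as the array's item type are assumptions made for illustration; the actual implementation may differ.

```python
from sqlalchemy import types as sqltypes
from sqlalchemy.types import TypeEngine


class FloatVector(TypeEngine):
    """Sketch of a custom vector type; not the dialect's actual code."""

    __visit_name__ = "FLOAT_VECTOR"

    def __init__(self, dimensions=None):
        self.dimensions = dimensions

    def as_generic(self, allow_nulltype=False):
        # Assumption: an array of floats is the closest generic equivalent.
        return sqltypes.ARRAY(sqltypes.Float)


generic = FloatVector(3).as_generic()
print(type(generic).__name__)
```

With the override in place, callers that ask for a generic equivalent receive an `ARRAY` instance instead of hitting the default heuristic.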
It looks like, in exchange for those improvements, the get_col_spec
method is no longer needed.
TODO: Would it be a good idea to derive from SQLAlchemy's ARRAY right away, to get a few of the features without
the need to redefine them?
Please note that the outcome of this analysis and the corresponding implementation
are derived from empirical observations, and from the impression that the
SQLAlchemy dialect also lacks corresponding support for CrateDB's other special
data types (ARRAY and OBJECT), i.e. "that something must be wrong or
incomplete". In this spirit, it is advisable to review and improve their
implementations accordingly.
References
FLOAT_VECTOR data type and KNN_MATCH function #9 (comment)