Related to the original Singer spec's `sql-datatype` (see #1323, #1903), this new metadata object would work analogously to the Python logging module's ability to import and instantiate arbitrary callables and classes from configuration.
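For readers unfamiliar with the logging analogy, here is a minimal sketch of how `logging.config.dictConfig` resolves a dotted path given under the special `"()"` key and calls it with the remaining keys as keyword arguments. The logger, handler, and formatter names are made up for illustration.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "custom": {
            # "()" names a factory by dotted path; logging imports it and
            # calls it with the other keys as keyword arguments.
            "()": "logging.Formatter",
            "fmt": "%(levelname)s:%(message)s",
        },
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "custom"},
    },
    "loggers": {
        # Hypothetical logger name, used only for this sketch.
        "sqlalchemy_type_sketch": {"handlers": ["console"], "level": "INFO"},
    },
}

logging.config.dictConfig(LOGGING)
sketch_logger = logging.getLogger("sqlalchemy_type_sketch")
sketch_formatter = sketch_logger.handlers[0].formatter
```

The proposed metadata would play the role of the `"()"` entry: a dotted path plus parameters, resolved at runtime.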
The story, specifically for vector/embedding fields for LLMs, would go something like:
As a user, I know that one or more fields in a stream represent a vector. For example, `{"id": 1, "my_vector": [1, 2, 3]}`.
Declaring the SQLAlchemy type for that field would trigger the SDK to import `pgvector.sqlalchemy.Vector` and instantiate it as `Vector(dim=3)`.
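The import-and-instantiate step could look roughly like the sketch below. It is not the SDK's implementation: the helper name `instantiate_from_path` and the `{"class": ..., "kwargs": ...}` shape of the override are assumptions made for illustration, and a stdlib class stands in for `Vector` so the snippet runs without pgvector installed.

```python
import importlib
from typing import Any


def instantiate_from_path(dotted_path: str, **kwargs: Any) -> Any:
    """Import the object named by `dotted_path` and call it with `kwargs`."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'package.module.Name', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    factory = getattr(module, attr_name)
    return factory(**kwargs)


# Hypothetical shape of the override for the `my_vector` field.
override = {"class": "pgvector.sqlalchemy.Vector", "kwargs": {"dim": 3}}
# With pgvector-python installed, this would build Vector(dim=3):
# column_type = instantiate_from_path(override["class"], **override["kwargs"])

# Stand-in that runs with the standard library only.
delta = instantiate_from_path("datetime.timedelta", days=3)
```

Because this imports whatever path the metadata names, a target may want to restrict the allowed paths, for example to a known list of SQLAlchemy extensions.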
The other requirement is that the user installs pgvector-python in the same virtual environment as target-postgres, which could be achieved with package extras, e.g. `target-postgres[vector]`, or by documenting known Postgres SQLAlchemy extensions in the target's README.
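If the extension package is missing, the target could fail early with a hint about the extra. A minimal sketch, assuming a hypothetical helper `require_optional` and the `vector` extra name mentioned above:

```python
import importlib.util


def require_optional(module_name: str, extra: str, package: str = "target-postgres") -> None:
    """Raise a helpful error if a top-level optional module is not installed."""
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise RuntimeError(
            f"{module_name!r} is not installed; "
            f"try installing '{package}[{extra}]'"
        )


# Example: a target would call require_optional("pgvector", "vector")
# before resolving a pgvector.sqlalchemy.Vector override.
```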
Potential issues
The biggest problem I can think of is figuring out the priority with which this type override is considered. For example, in target-postgres:
1. a few JSON schema types are first mapped to Postgres-specific column types, e.g. `int -> BIGINT`
2. then the SDK defaults are used
But by introducing this feature, we'd expect targets to resolve in the following order:
1. `sqlalchemy_type` overrides
2. custom target implementation overrides
3. SDK defaults
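The resolution order above can be sketched as a simple fallback chain. The mapping contents and the function name are illustrative placeholders, not the SDK's or target-postgres's actual tables; only `int -> BIGINT` comes from the description above, and type names are plain strings to keep the sketch dependency-free.

```python
from typing import Optional

# Illustrative placeholders only; real targets map to SQLAlchemy type objects.
SDK_DEFAULTS = {"integer": "INTEGER", "string": "VARCHAR", "array": "JSON"}
TARGET_OVERRIDES = {"integer": "BIGINT"}  # target-specific mapping


def resolve_column_type(json_type: str, sqlalchemy_type: Optional[str] = None) -> str:
    """Pick a column type using the proposed priority order."""
    # 1. An explicit sqlalchemy_type override wins.
    if sqlalchemy_type is not None:
        return sqlalchemy_type
    # 2. Otherwise, use the target's own mapping if it has one.
    if json_type in TARGET_OVERRIDES:
        return TARGET_OVERRIDES[json_type]
    # 3. Otherwise, fall back to the SDK default.
    return SDK_DEFAULTS[json_type]
```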
Thank you for converging this from MeltanoLabs/target-pinecone#20 so quickly. Is GH-1872 actually already setting the stage for your proposal, or is it something different?
Feature scope
Tap/target metadata.
Description
I would like to use MeltanoLabs/target-postgres alongside pgvector/pgvector-python to declare the SQLAlchemy type of `my_vector`: `pgvector.sqlalchemy.Vector`.
I would like not only to declare the type but also to pass arbitrary parameters for it, such as length or dimension.