1.1 content updating vector #759

Merged
merged 6 commits into from
Sep 20, 2023
45 changes: 37 additions & 8 deletions docs/MatrixOne/Develop/schema-design/1.1-vector.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,13 @@

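A database with vector capabilities can store, query, and analyze vector data. These vectors are typically associated with complex data analysis, machine learning, and data mining tasks. The advantages of vector capabilities in a database include: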

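- **Support for multiple application domains:** Vector capabilities make the database suitable for a range of applications, including face recognition, recommendation systems, and genomic analysis
- **Generative AI applications**: These databases can serve as the backend for generative AI applications, letting them retrieve nearest-neighbor results for user-supplied queries and improving the quality and relevance of the output

- **High-performance analytics:** The database can accelerate complex data analysis tasks through high-performance query and analysis operations on vector data. This is critical for large datasets and real-time data processing
- **Advanced object recognition**: They are invaluable for building advanced object recognition platforms that identify similarities between different datasets. This has practical applications in plagiarism detection, facial recognition, and DNA matching

- **Similarity search:** Vector capabilities allow the database to perform similarity searches, such as finding the stored vectors closest to a given vector. This is useful in applications such as recommendation systems and search engines
- **Personalized recommendation systems**: Vector databases can enhance recommendation systems by incorporating user preferences and choices. This yields more accurate and better-targeted recommendations, improving user experience and engagement

- **Dimensional flexibility:** Vector capabilities typically support vectors of different dimensions, so the database can handle a variety of data types and features.

- **Machine learning integration:** Because the database supports vector storage and querying, it can be integrated with machine learning models, making model training and inference more efficient and convenient.
- **Anomaly detection**: A vector database can store feature vectors that represent normal behavior. Anomalies can then be detected by comparing input vectors against the stored vectors. This is useful in network security and industrial quality control.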

## Before You Start

Expand Down Expand Up @@ -106,7 +104,7 @@ CREATE TABLE t1 (
);

-- Insert some sample data
INSERT INTO t1 (b) VALUES ('[1,2,3]'), ('[4,5,6]'), ('[2,1,1]'), ('[7,8,9]'), ('[0,0,0]'), ('[3,1,2]');
INSERT INTO t1 (id,b) VALUES (1, '[1,2,3]'), (2, '[4,5,6]'), (3, '[2,1,1]'), (4, '[7,8,9]'), (5, '[0,0,0]'), (6, '[3,1,2]');

-- Top K query by L1 distance (computed with l1_norm)
SELECT * FROM t1 ORDER BY l1_norm(b - '[3,1,2]') LIMIT 5;
Expand Down Expand Up @@ -162,7 +160,38 @@ mysql> SELECT * FROM t1 ORDER BY 1 - cosine_similarity(b, '[3,1,2]') LIMIT 5;
5 rows in set (0.00 sec)
```

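These queries show how to use different distance and similarity metrics to retrieve the 5 vectors most similar to the given vector `[3,1,2]`. Here, `l1_distance` and `l2_distance` denote the L1 and L2 distances, `cosine_similarity` denotes cosine similarity, and `cosine_distance` denotes cosine distance. With these queries, you can find the data that best matches the target vector under different metrics.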
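These queries show how to use different distance and similarity metrics to retrieve the 5 vectors most similar to the given vector `[3,1,2]`. With these queries, you can find the data that best matches the target vector under different metrics.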
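
To make the ranking behind these Top K queries concrete, here is a minimal Python sketch that mimics the `ORDER BY ... LIMIT 5` logic outside the database. It assumes the six sample rows from the `INSERT` statement; the helpers `l1_distance`, `l2_distance`, and `top_k` are plain illustrative functions written for this sketch, not MatrixOne APIs.

```python
import math

# Sample rows mirroring the INSERT statement: (id, vector)
rows = [
    (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [2, 1, 1]),
    (4, [7, 8, 9]), (5, [0, 0, 0]), (6, [3, 1, 2]),
]

def l1_distance(u, v):
    """Sum of absolute element-wise differences (l1 norm of u - v)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def l2_distance(u, v):
    """Square root of the sum of squared differences (l2 norm of u - v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def top_k(rows, query, distance, k):
    """Return the k rows closest to `query` under `distance` (hypothetical helper)."""
    return sorted(rows, key=lambda row: distance(row[1], query))[:k]

query = [3, 1, 2]
top5_l1 = top_k(rows, query, l1_distance, 5)
top5_l2 = top_k(rows, query, l2_distance, 5)
print([row_id for row_id, _ in top5_l1])  # [6, 3, 1, 5, 2]
```

For this small data set the L1 and L2 rankings happen to agree; in general, different metrics can order the same rows differently.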

## Best Practices

- **Vector type casting**: When casting a vector from one type to another, specify the dimension as well. For example:

```
SELECT b + CAST("[1,2,3]" AS vecf32(3)) FROM t1;
```

Specifying the dimension keeps the vector type cast accurate and consistent.

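- **Use binary format**: To improve overall insert performance, consider using a binary format instead of a text format. Make sure the array is in little-endian order before converting it to hex encoding. Example Python code: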

```python
import binascii

import numpy as np

# 'value' is a NumPy array (or any array-like object)
def to_binary(value):
    if value is None:
        return value

    # Little-endian float32 array
    value = np.asarray(value, dtype='<f')

    if value.ndim != 1:
        raise ValueError('expected ndim to be 1')

    return binascii.b2a_hex(value)
```

This approach can significantly improve the efficiency of data insertion.
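
To show what the hex encoding looks like, the following sketch packs a short vector as little-endian float32 values and hex-encodes it using only the Python standard library. The helper names `floats_to_hex` and `hex_to_floats` are hypothetical and exist only for illustration; how the resulting string is passed to the database is not covered here.

```python
import binascii
import struct

def floats_to_hex(values):
    """Pack floats as little-endian float32 and hex-encode them (hypothetical helper)."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return binascii.b2a_hex(packed)

def hex_to_floats(encoded):
    """Inverse of floats_to_hex: decode the hex string back into a list of floats."""
    raw = binascii.a2b_hex(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

encoded = floats_to_hex([1.0, 2.0, 3.0])
print(encoded)                 # b'0000803f0000004000004040'
print(hex_to_floats(encoded))  # [1.0, 2.0, 3.0]
```

Each float32 occupies 4 bytes, so a 3-dimensional vector becomes 24 hex characters. The least significant byte comes first, which is why `1.0` (`0x3F800000`) appears as `0000803f`.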

## Limitations

Expand Down
Expand Up @@ -4,7 +4,7 @@

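`cosine_similarity()` computes cosine similarity: the cosine of the angle between two vectors, which expresses their similarity by how closely they align in a multidimensional space. A value of 1 means the vectors are completely similar, and -1 means they are completely dissimilar. Cosine similarity is calculated by dividing the inner product of the two vectors by the product of their l2 norms.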

![cosine_similarity](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/cosine_similarity.png?raw=true)
![cosine_similarity](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/vector/cosine_similarity.png?raw=true)
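
As an illustration of the definition above, here is a minimal Python sketch of the calculation. It is not MatrixOne's implementation; raising an error for a zero vector is an assumption of this sketch, since the quotient is undefined in that case.

```python
import math

def cosine_similarity(u, v):
    """Inner product of u and v divided by the product of their l2 norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption of this sketch: treat a zero vector as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

same = cosine_similarity([1, 2, 3], [2, 4, 6])         # same direction, close to 1
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])  # opposite direction, close to -1
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])   # perpendicular, 0
```

Because only the angle matters, scaling either vector by a positive factor leaves the result unchanged.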

## **Function Syntax**

Expand Down
Expand Up @@ -4,7 +4,7 @@

The `INNER PRODUCT` function computes the inner product (dot product) of two vectors: the sum of the products of their corresponding elements.

![inner_product](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/inner_product.png?raw=true)
![inner_product](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/vector/inner_product.png?raw=true)
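
The definition can be sketched in a few lines of Python. The function name `inner_product` below is just a plain Python helper for illustration, and the dimension check is an assumption of this sketch.

```python
def inner_product(u, v):
    """Multiply corresponding elements and sum the results."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))

result = inner_product([1, 2, 3], [4, 5, 6])
print(result)  # 1*4 + 2*5 + 3*6 = 32
```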

## **Function Syntax**

Expand Down
Expand Up @@ -4,7 +4,7 @@

The `l1_norm` function computes the `l1` (Manhattan, or taxicab) norm. The `l1` norm is the sum of the absolute values of the vector's elements.

![l1_normy](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/l1_norm.png?raw=true)
![l1_normy](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/vector/l1_norm.png?raw=true)

You can use the `l1` norm to compute the `l1` distance.
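
A short Python sketch of both ideas: the `l1` norm of a single vector, and the `l1` distance obtained by taking the norm of the element-wise difference. The helper names are illustrative and are not part of MatrixOne.

```python
def l1_norm(v):
    """Sum of the absolute values of the elements."""
    return sum(abs(x) for x in v)

def l1_distance(u, v):
    """l1 distance between u and v: the l1 norm of their difference."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return l1_norm([a - b for a, b in zip(u, v)])

print(l1_norm([-1, -2, 3]))                # 1 + 2 + 3 = 6
print(l1_distance([1, 2, 3], [3, 1, 2]))   # 2 + 1 + 1 = 4
```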

Expand Down
Expand Up @@ -4,7 +4,7 @@

The `l2_norm` function computes the `l2` (Euclidean) norm. The `l2` norm is the square root of the sum of the squares of the vector's elements.

![l2_normy](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/l2_norm.png?raw=true)
![l2_normy](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/vector/l2_norm.png?raw=true)
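
As a quick illustration of the definition, a minimal Python sketch follows. The helper `l2_norm` here is a plain Python function written for this example, not the database function itself.

```python
import math

def l2_norm(v):
    """Square root of the sum of the squared elements."""
    return math.sqrt(sum(x * x for x in v))

length = l2_norm([1, 2, 2])
print(length)  # sqrt(1 + 4 + 4) = 3.0
```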

## **Function Syntax**

Expand Down
28 changes: 14 additions & 14 deletions docs/MatrixOne/Reference/Functions-and-Operators/1.1-Vector/misc.md
Expand Up @@ -66,14 +66,14 @@ mysql> select sqrt(b) from vec_table;
```sql
drop table if exists vec_table;
create table vec_table(a int, b vecf32(3), c vecf64(3));
insert into vec_table values(1, "[1,2,3]", "[4,5,6]");
mysql> select * from vec_table;
+------+-----------+-----------+
| a    | b         | c         |
+------+-----------+-----------+
|    1 | [1, 2, 3] | [4, 5, 6] |
+------+-----------+-----------+
1 row in set (0.00 sec)
insert into vec_table values(1, "[-1,-2,3]", "[4,5,6]");
mysql> select * from vec_table;
+------+-------------+-----------+
| a    | b           | c         |
+------+-------------+-----------+
|    1 | [-1, -2, 3] | [4, 5, 6] |
+------+-------------+-----------+
1 row in set (0.00 sec)

mysql> select abs(b) from vec_table;
+-----------+
Expand Down Expand Up @@ -119,12 +119,12 @@ mysql> select * from vec_table;
+------+-----------+-----------+
1 row in set (0.00 sec)

mysql> select abs(cast("[-1,-2,3]" as vecf32(3)));
+-----------------------------------+
| abs(cast([-1,-2,3] as vecf32(3))) |
+-----------------------------------+
| [1, 2, 3]                         |
+-----------------------------------+
mysql> select b + cast("[1,2,3]" as vecf32(3)) from vec_table;
+--------------------------------+
| b + cast([1,2,3] as vecf32(3)) |
+--------------------------------+
| [2, 4, 6]                      |
+--------------------------------+
1 row in set (0.00 sec)
```

Expand Down
Expand Up @@ -16,19 +16,22 @@
drop table if exists vec_table;
create table vec_table(a int, b vecf32(3), c vecf64(3));
insert into vec_table values(1, "[1,2,3]", "[4,5,6]");
insert into vec_table values(2, "[7,8,9]", "[1,2,3]");
mysql> select * from vec_table;
+------+-----------+-----------+
| a    | b         | c         |
+------+-----------+-----------+
|    1 | [1, 2, 3] | [4, 5, 6] |
|    2 | [7, 8, 9] | [1, 2, 3] |
+------+-----------+-----------+
1 row in set (0.00 sec)
2 rows in set (0.00 sec)

mysql> select vector_dims(b) from vec_table;
+----------------+
| vector_dims(b) |
+----------------+
|              3 |
|              3 |
+----------------+
1 row in set (0.01 sec)
2 rows in set (0.01 sec)
```