Merge pull request #28 from peng9808/main

update README
hwameistor · Jun 25, 2024 · a8da393
2 parents 64b3557 + 3272c8b
commit a8da393

Showing 5 changed files with 174 additions and 6 deletions.
diff --git a/README.md b/README.md
@@ -23,15 +23,18 @@ Datastore contains several modules:
 
 The Dataset Manager (DSM) is designed to manage datasets and allocate local acceleration cache volumes for datasets. 
 Other modules (such as DLM) can use the cache volumes provided by DSM to load datasets.
-[Learn more](https://github.com/hwameistor/hwameistor/docs/docs/modules/dsm.md)
+[Learn more](./docs/dsm.md)
 
 ### dataload-manager
 
 Dataload-manager (DLM) provides a service for loading data sets for local cache volumes.
 It aims to quickly pull data sets for application programs for training in AI training environments.
-[Learn more](https://github.com/hwameistor/hwameistor/docs/docs/modules/dlm.md)
+[Learn more](./docs/dlm.md)
 
 
+## Quick Use
+[Learn more](./docs/use.md)
+
 
 ## Documentation
 

diff --git a/README_zh.md b/README_zh.md
@@ -5,7 +5,7 @@
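 Datastore is a project developed to accelerate data loading for the local storage system hwameistor.
 It helps hwameistor work well in AI scenarios and in other scenarios that need to load data quickly for training.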
 
-![System architecture](https://github.com/hwameistor/hwameistor/docs/docs/img/datastore.png)
+![System architecture](./docs/img/datastore.png)
 
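 ## Release Status
 
@@ -23,13 +23,17 @@ Datastore contains several modules:
 
 The Dataset Manager (DSM) manages datasets and allocates local acceleration cache volumes for them.
 Other modules (such as DLM) can use the cache volumes provided by DSM to load datasets.
-[Learn more](https://github.com/hwameistor/hwameistor/docs/docs/modules/dsm.md)
+[Learn more](./docs/dsm.md)
 
 ### dataload-manager
 
 Dataload-manager (DLM) provides a service for loading datasets into local cache volumes.
 It aims to quickly pull datasets for applications to train on in AI training environments.
-[Learn more](https://github.com/hwameistor/hwameistor/docs/docs/modules/dlm.md)
+[Learn more](./docs/dlm.md)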
+
+
+## Quick Use
+[Learn more](./docs/use.md)
 
 
 ## Documentation
@@ -55,7 +59,7 @@ please check the [adopters list](https://github.com/hwameistor/hwameistor/adopte
 
 HwameiStor technical discussion group:
 
-![Scan the QR code to join the group](./docs/docs/img/wechat.png)
+![Scan the QR code to join the group](./docs/img/wechat.png)
 
 ## Discussion
 

diff --git a/docs/dlm.md b/docs/dlm.md
@@ -0,0 +1,15 @@
+
+# DataLoad Manager
+
+DataLoad Manager is a module of Datastore, a cloud-native solution for accelerating local storage systems in AI scenarios. It uses P2P technology to load remote data quickly.
+
+## Applicable scenarios
+
+DataLoad Manager supports multiple data loading protocols: `s3`, `nfs`, `ftp`, `http`, and `ssh`.
+
+In AI training scenarios, it loads data into local cache volumes faster.
+In particular, when the dataset can be pulled over the s3 protocol, P2P technology can be combined with it to significantly speed up data loading.
+## Usage with DataSet Manager
+
+DataLoad Manager is a component of HwameiStor and must work with the [DataSet Manager](./dsm.md) module.
+
diff --git a/docs/dsm.md b/docs/dsm.md
@@ -0,0 +1,8 @@
+
+# DataSet Manager
+
+DataSet Manager is one of the modules of Datastore, a cloud-native solution for accelerating local storage systems in AI scenarios. It provides high-performance local cache volumes for the datasets required by AI applications.
+
+Supported volume managers: `LVM`.
+
+Supported storage media: `HDD`, `SSD`, `NVMe`.
diff --git a/docs/use.md b/docs/use.md
@@ -0,0 +1,138 @@
+
+# Local Cache Volumes
+
+It is very simple to run AI training applications using HwameiStor.
+
+As an example, we will create a local cache volume and deploy an Nginx application that uses it.
+
+
+## Define `DataSet`
+
+Take MinIO as an example. Save the following manifest as `dataset.yaml`:
+
+```yaml
+apiVersion: datastore.io/v1alpha1
+kind: DataSet
+metadata:
+  name: dataset-test
+spec:
+  refresh: true
+  type: minio
+  minio:
+    endpoint: <your-service-ip>:9000  # address of your MinIO service
+    bucket: BucketName/Dir  # bucket name plus the directory path where your dataset is located
+    secretKey: minioadmin
+    accessKey: minioadmin
+    region: ap-southeast-2
+```
+
+## Create `DataSet`
+
+
+```Console
+$ kubectl apply -f dataset.yaml
+```
+
+Confirm that the cache volume has been created successfully:
+
+```Console
+$ kubectl get dataset
+NAME           TYPE    LASTREFRESHTIME   CONNECTED   AGE     ERROR
+dataset-test   minio                                 4m38s
+
+$ kubectl get lv
+NAME                                       POOL                   REPLICAS   CAPACITY     USED        STATE   PUBLISHED           AGE
+dataset-test                               LocalStorage_PoolHDD              211812352                Ready                       4m27s
+
+$ kubectl get pv
+NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                    STORAGECLASS                 REASON   AGE
+dataset-test                               202Mi      ROX            Retain           Available                                                                                                  35s
+
+```
+
+The size of the PV is determined by the size of your dataset.
+
+## Create a PVC and bind it to dataset PV
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: hwameistor-dataset
+  namespace: default
+spec:
+  accessModes:
+  - ReadOnlyMany
+  resources:
+    requests:
+      storage: 202Mi  # size of the dataset PV
+  volumeMode: Filesystem
+  volumeName: dataset-test  # name of the dataset PV
+```
+
+
+## Verify PVC
+
+Confirm that the PVC has been created successfully
+and is bound to the dataset PV:
+
+```Console
+$ kubectl get pvc
+NAME                 STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+hwameistor-dataset   Bound    dataset-test   202Mi      ROX                           4s
+```
+
+## Create `StatefulSet`
+
+```Console
+$ kubectl apply -f sts-nginx-AI.yaml
+```
+
+Note that `claimName` must be the name of the PVC bound to the dataset:
+
+```yaml
+    spec:
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: hwameistor-dataset
+```
+## Verify Nginx Pod 
+```Console
+$ kubectl get pod
+NAME               READY   STATUS            RESTARTS   AGE
+nginx-dataload-0   1/1     Running           0          3m58s
+$ kubectl logs nginx-dataload-0 hwameistor-dataloader
+Created custom resource
+Custom resource deleted, exiting
+DataLoad execution time: 1m20.24310857s
+```
+According to the log, loading the data took 1m20.24310857s.
+
+## [Optional] Scale Nginx out into a 3-node Cluster
+
+HwameiStor cache volumes support horizontal scaling of a `StatefulSet`. Each pod of the `StatefulSet` attaches and mounts a HwameiStor cache volume bound to the same dataset.
+
+```console
+$ kubectl scale sts/nginx-dataload --replicas=3
+
+$ kubectl get pod
+NAME               READY   STATUS    RESTARTS   AGE
+nginx-dataload-0   1/1     Running   0          41m
+nginx-dataload-1   1/1     Running   0          37m
+nginx-dataload-2   1/1     Running   0          35m
+
+
+$ kubectl logs nginx-dataload-1 hwameistor-dataloader
+Created custom resource
+Custom resource deleted, exiting
+DataLoad execution time: 3.24310857s
+
+$ kubectl logs nginx-dataload-2 hwameistor-dataloader
+Created custom resource
+Custom resource deleted, exiting
+DataLoad execution time: 2.598923144s
+
+```
+
+According to the log, the second and third data loads took only 3.24310857s and 2.598923144s respectively, far faster than the first load.
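
The figures quoted in docs/use.md can be sanity-checked with a short Python sketch. It is not part of the PR or of HwameiStor: the helper names `parse_go_duration` and `bytes_to_mib` are illustrative. It assumes that the CAPACITY column of `kubectl get lv` is in bytes and that the dataloader logs durations as Go-style strings. Both assumptions are inferred only from the sample output above.

```python
import re

# Seconds per unit for the Go-style duration units handled here.
# Assumption: the dataloader log prints durations in this format.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_go_duration(text):
    """Convert a duration such as '1m20.24310857s' into seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    # Reject input that the pattern did not consume completely.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in parts)


def bytes_to_mib(size_bytes):
    """Convert a byte count into mebibytes (the unit behind the `Mi` suffix)."""
    return size_bytes / 2**20


if __name__ == "__main__":
    # LV capacity from `kubectl get lv`, to compare with the 202Mi PV.
    print(f"LV capacity: {bytes_to_mib(211812352):g} MiB")

    # Load times taken from the hwameistor-dataloader logs of each pod.
    first = parse_go_duration("1m20.24310857s")
    later_loads = [
        ("nginx-dataload-1", "3.24310857s"),
        ("nginx-dataload-2", "2.598923144s"),
    ]
    for pod, logged in later_loads:
        later = parse_go_duration(logged)
        print(f"{pod}: {later:.2f}s, about {first / later:.1f}x faster than the first load")
```

Under these assumptions the LV capacity works out to exactly 202 MiB, matching the PV and the PVC request, and the second and third loads come out roughly 25 and 31 times faster than the first.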