Datapipe is a data processing pipeline that (currently) extracts audio clips from youtube videos and generates two transcription candidates with a Vosk (Kaldi) and a Wav2Vec2 model. The goal of the software is to ease the generation of datasets for ASR by automatically extracting and processing large audio sources.
Install k3s
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
# Check for Ready node,takes maybe 30 seconds
k3s kubectl get node
#Create alias for Kubectl
mkdir -p ~/.kube/ && sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config &&
sudo chown $USER:$USER ~/.kube/config && chmod 600 ~/.kube/config && export KUBECONFIG=~/.kube/config
Install kustomize
curl -s \
"https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash && \
sudo mv kustomize /usr/local/bin/
Create namespace
kubectl create namespace assistent
Encode secret password
#Get BASE64 encoded password
echo -n "password123#$" | base64 -i -
Create secret file and paste encoded password (k8s/postgresql/secret.yaml) As a recommendation keep POSTGRES_USER variable set to default (datapipe)
apiVersion: v1
kind: Secret
metadata:
namespace: assistent
name: datapipe-db-secret
data:
POSTGRES_USER: "ZGF0YXBpcGU="
POSTGRES_PASSWORD: "cGFzc3dvcmQxMjMjJA=="
make deploy
Access to any pod that was set up using projecteaina/datapipe image (example: converter-, fetcher-.. )
kubectl -n assistent exec -it fetcher-YOUR_POD_ID bash
Using the cli add new channel
python -m cli add-channel https://www.youtube.com/user/gencat/
Okteto allows you to develop inside a container. When you run okteto up your Kubernetes deployment is replaced by a development container that contains your development tools. Learn more about Okteto
Install okteto
curl https://get.okteto.com -sSfL | sh
In the case that your cluster setup is not local, please set the KUBECONFIG env variable to the path of your kube config file.
#Example for setting KUBECONFIG generated by goteleport to access remote cluster
export KUBECONFIG=${HOME?}/teleport-kubeconfig.yaml
If you are using a local cluster setup then run next command
mkdir -p ~/.kube/ && sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config &&
sudo chown $USER:$USER ~/.kube/config && chmod 600 ~/.kube/config && export KUBECONFIG=~/.kube/config
Select and start development container
okteto up
Licensed under the GNU Affero General Public License v3.0. Copy of the license
This tool was initially built by the community and its further development and maintanence is being funded by the Catalan Ministry of the Vice-presidency, Digital Policies and Territory of Generalitat within the framework of Projecte AINA.