⛰️Valley: Video Assistant with Large Language model Enhanced abilitY

Understanding Complex Videos Relying on Large Language and Vision Models

The online demo is no longer available, because we have released the code for offline demo deployment.



Generated by stablecog via "A cute llama with valley"

Usage and License Notices: The data, code, and checkpoints are intended and licensed for research use only. They are further restricted to uses that comply with the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained on the dataset should not be used outside of research purposes.

Install

  1. Clone this repository and navigate to the Valley folder

  2. Install the package

conda create -n valley python=3.10 -y
conda activate valley
pip install --upgrade pip
pip install -e .

Data & Weight

Coming soon.

Web UI


The framework of this web UI comes from LLaVA and FastChat; we modified part of the code so that this demo supports video and image input.

Launch a controller

python valley/serve/controller.py

Launch a model worker

python valley/serve/model_worker.py --model-path /path/to/valley-13b-v1

PS: At present, only single-GPU mode is supported for loading the model, and at least 30 GB of GPU memory is required, so the GPU needs to be at least a Tesla V100.

Launch a Gradio demo

python valley/serve/gradio_web_server_video.py --share
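The three commands above form a FastChat-style serving architecture: each model worker registers itself with the controller, and the Gradio web server asks the controller which worker hosts a given model. A conceptual sketch of that routing (class and method names here are illustrative, not the actual FastChat API):

```python
# Conceptual sketch of the controller's role in a FastChat-style setup:
# workers register their address under a model name, and the front end
# looks up a worker address before forwarding a request.
class Controller:
    def __init__(self):
        self.workers = {}  # model name -> worker address

    def register_worker(self, model_name, address):
        """Called by a model worker when it starts up."""
        self.workers[model_name] = address

    def get_worker_address(self, model_name):
        """Called by the web server to route a user request."""
        return self.workers.get(model_name)

ctrl = Controller()
ctrl.register_worker("valley-13b-v1", "http://localhost:21002")
```

In the real system the controller also tracks worker health via heartbeats and can balance load across multiple workers for the same model.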

Inference Valley in Command Line

We have updated the inference code to be more convenient; it now supports input in the form of the OpenAI API.

Inference CLI

python3 inference/run_valley.py --model-name [PATH TO VALLEY WEIGHT] --video_file [PATH TO VIDEO] --query [YOUR QUERY ON THE VIDEO]
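OpenAI-API-style input means the request is expressed as a list of role/content messages. A minimal sketch of assembling such a request for a video query (the exact schema Valley expects may differ; the field names here are assumptions):

```python
# Hypothetical sketch of an OpenAI-API-style request for a video query.
# The "video" field is an assumption about how the file path is attached;
# check inference/run_valley.py for the actual schema.
def build_request(video_path, query, system_prompt=None):
    """Assemble an OpenAI-style message list for a video question."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # The user turn carries both the text query and a reference to the video.
    messages.append({"role": "user", "content": query, "video": video_path})
    return {"messages": messages}

req = build_request("demo.mp4", "What happens in this video?")
```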

Inference Chinese Valley

python3 inference/run_valley.py --model-name [PATH TO CHINESE VALLEY WEIGHT] --video_file [PATH TO VIDEO] --query [YOUR QUERY ON THE VIDEO] --system-prompt "你是大型语言视觉助手 Chinese-Valley。你能够理解用户提供的视觉内容或视频,并使用自然语言协助用户完成各种任务。请仔细按照人类的指令进行回答,并详细解释你的答案。"

(The system prompt translates to: "You are the large language-and-vision assistant Chinese-Valley. You can understand visual content or videos provided by the user and assist the user with various tasks using natural language. Please answer carefully according to the human's instructions and explain your answers in detail.")

Inference in code

python valley/inference/run_valley_llamma_v2.py --video_file <path-to-video-file>
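Video LLMs like Valley typically do not feed every frame to the model; a fixed number of evenly spaced frames is sampled from the video first. A minimal sketch of uniform frame-index sampling (this is a common preprocessing pattern, offered as an assumption about the pipeline rather than Valley's exact code):

```python
# Pick a fixed number of evenly spaced frame indices from a video.
# num_samples=8 is an illustrative default, not Valley's actual setting.
def uniform_frame_indices(total_frames, num_samples=8):
    """Return num_samples frame indices spread uniformly over the video."""
    if total_frames <= num_samples:
        # Short video: use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

The sampled indices would then be used to decode just those frames (e.g. with decord or OpenCV) before passing them through the vision encoder.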

Train Valley Step By Step

We have modified our code for training Valley and manage the model hyperparameters with YAML files. Run the following two scripts to perform Valley training.

Pretrain

The LLM backbones currently supported for pre-training are LLaMA (7B, 13B), Vicuna (7B, 13B), StableVicuna (13B), and Llama 2 (chat-7B, chat-13B). You need to download these open-source language model weights yourself and convert them to the Hugging Face format.

bash valley/train/train.sh valley/configs/experiment/valley_stage1.yaml

Finetune

bash valley/train/train.sh valley/configs/experiment/valley_stage2.yaml
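Since the code is based on LLaVA, the two stages plausibly follow the LLaVA-style recipe: stage 1 trains only the vision-language projector with the LLM frozen, and stage 2 unfreezes the LLM for instruction tuning. A conceptual sketch of that freeze/unfreeze schedule (module names are illustrative and this split is an assumption; the YAML configs hold the actual settings):

```python
# Conceptual sketch of a LLaVA-style two-stage training schedule.
# Module names here are illustrative, not Valley's actual config keys.
def trainable_params(stage):
    """Return which modules receive gradients in the given stage."""
    frozen = {"vision_encoder"}  # the vision tower stays frozen in both stages
    if stage == 1:
        frozen |= {"llm"}        # stage 1: align the projector only
    all_modules = {"vision_encoder", "projector", "llm"}
    return sorted(all_modules - frozen)
```

Stage 1 is cheap because only the small projector is updated; stage 2 is the expensive instruction-tuning pass over the full LLM.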

Acknowledgement

  • LLaVA & MOSS: thanks to these two repositories for providing high-quality code; our code is based on them.