This repository provides a chatbot interface designed to interact with multimodal data (images and text) using vision-language models. The interface uses Gradio for a web-based front-end and communicates with models deployed via OpenAI-compatible APIs. It supports selecting models from multiple deployment endpoints, generating responses, and providing interactive feedback for image and text inputs.
## Features

- **Model Selection:**
  - Reads a YAML file to list available models deployed on specified IPs and ports. (TODO)
  - Automatically detects available models using OpenAI-compatible APIs, creating a dictionary of accessible models. (DONE)
  - Provides a dropdown in the Gradio interface to select models. (TODO)
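The automatic detection above can be sketched as follows. `list_models`, `extract_model_ids`, and `discover_models` are hypothetical helpers (not the repository's actual code) that assume each endpoint exposes the standard OpenAI-compatible `GET /models` route:

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen


def list_models(base_url: str, api_key: str = "EMPTY", timeout: float = 5.0) -> List[str]:
    """Ask an OpenAI-compatible server which models it serves (GET <base_url>/models)."""
    req = Request(f"{base_url.rstrip('/')}/models",
                  headers={"Authorization": f"Bearer {api_key}"})
    with urlopen(req, timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


def extract_model_ids(payload: dict) -> List[str]:
    """Pull the model IDs out of an OpenAI-style model-list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def discover_models(endpoints: List[str]) -> Dict[str, str]:
    """Build a dictionary mapping each reachable model ID to the base URL serving it."""
    available: Dict[str, str] = {}
    for base_url in endpoints:
        try:
            for model_id in list_models(base_url):
                available.setdefault(model_id, base_url)
        except OSError:
            continue  # unreachable endpoint: leave its models out of the dropdown
    return available
```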
- **Multimodal Input:**
  - Accepts both text and image inputs via a `MultimodalTextbox`, so users can interact with the chatbot by combining images and textual queries. (DONE)
  - Allows multiple image uploads in a single input. (DONE)
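Gradio's `MultimodalTextbox` delivers each submission as a dict with `text` and `files` keys. The sketch below shows one way to turn that value into the content-part list expected by OpenAI-style vision APIs; the helper name `to_openai_content` is illustrative, not the repository's actual function:

```python
import base64
import mimetypes
from typing import List


def to_openai_content(message: dict) -> List[dict]:
    """Convert a MultimodalTextbox value ({"text": ..., "files": [...]}) into
    the content-part list used by OpenAI-style vision chat APIs."""
    parts: List[dict] = [{"type": "text", "text": message.get("text", "")}]
    for path in message.get("files", []):
        mime = mimetypes.guess_type(path)[0] or "image/png"
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        # Images are inlined as base64 data URLs, which vision-capable
        # OpenAI-compatible servers accept in the image_url field.
        parts.append({"type": "image_url",
                      "image_url": {"url": f"data:{mime};base64,{encoded}"}})
    return parts
```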
- **Error Handling:**
  - Displays error messages in the front-end when model interactions fail. (TODO)
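Until front-end error display is implemented, a simple pattern is to wrap the model call and return a readable message instead of crashing. `safe_generate` below is an illustrative sketch, not code from this repository; in Gradio you could alternatively `raise gr.Error(...)` to surface the message as a pop-up:

```python
def safe_generate(generate, *args, **kwargs):
    """Run a model call and return a readable error string instead of crashing.

    In the Gradio front-end this string could be appended to the chat history,
    or replaced by `raise gr.Error(...)` to get a pop-up notification.
    """
    try:
        return generate(*args, **kwargs)
    except Exception as exc:  # e.g. API errors, connection failures
        return f"Model request failed: {exc}"
```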
- **Interactive Parameters:**
  - Users can adjust parameters like temperature and maximum output tokens directly in the interface.
- **Example Inputs:**
  - Predefined examples guide users on how to interact with the interface using images and text.
## Requirements

- Python 3.8 or later
- Required Python packages:

  ```bash
  pip install gradio openai pyyaml pillow
  ```
## Installation

Clone the repository:

```bash
git clone https://github.com/your-repo/multimodal-chatbot.git
cd multimodal-chatbot
```
## Configuration

To manage models, define their IPs and ports in a YAML file. For example:

```yaml
model:
  llama-3.1-8b:
    - ip: "localhost"
      port: 18001
    - ip: "100.1.100.122"
      port: 8001
  llama-3.1-70b: []
```

(Reading and using this file is currently marked as a TODO.)
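Once implemented, reading this file could look like the sketch below. `load_endpoints` is a hypothetical helper, and the `/v1` suffix is an assumption that the endpoints are OpenAI-compatible servers (e.g. vLLM) serving the API under that path:

```python
from typing import Dict, List

import yaml  # PyYAML, already listed in the install requirements


def load_endpoints(yaml_text: str) -> Dict[str, List[str]]:
    """Turn the YAML layout above into {model name: [base URLs]}.

    Models with an empty endpoint list (like llama-3.1-70b above) are kept,
    so the UI can still show them as currently undeployed.
    """
    config = yaml.safe_load(yaml_text) or {}
    endpoints: Dict[str, List[str]] = {}
    for name, hosts in (config.get("model") or {}).items():
        endpoints[name] = [f"http://{h['ip']}:{h['port']}/v1" for h in (hosts or [])]
    return endpoints
```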
## Usage

Run the following command to launch the Gradio interface:

```bash
python chatbot.py --host localhost --port 19000
```

Open your browser and navigate to:

```
http://localhost:19000
```
- **Model Name:** Select the model to interact with.
- **Model URL:** Specify the base URL of the model deployment.
- **API Key:** Provide the API key for authentication.
- **Temperature:** Controls the randomness of the model's responses.
- **Max Output Tokens:** Limits the maximum number of tokens generated in the response.
- **Examples:** Predefined example inputs guide users on how to interact with the chatbot using text and images.
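These parameters map directly onto an OpenAI-compatible chat completion call. `build_request` below is an illustrative sketch, not the repository's actual code; note that the "Max Output Tokens" field corresponds to the API's `max_tokens` argument:

```python
from typing import List


def build_request(model: str, messages: List[dict],
                  temperature: float, max_output_tokens: int) -> dict:
    """Collect the interface parameters into keyword arguments for
    client.chat.completions.create(**...)."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,       # randomness of sampling
        "max_tokens": max_output_tokens,  # cap on generated tokens
    }


# Illustrative usage (assumes a reachable deployment and the openai package):
#   client = OpenAI(base_url=model_url, api_key=api_key)
#   reply = client.chat.completions.create(**build_request(...))
```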
## Project Structure

```
├── chatbot.py                          # Main script for running the Gradio interface
├── configs/
│   └── gradio/
│       └── vision_language_model.yaml  # Example YAML configuration for model endpoints
├── data/
│   └── images/                         # Example images for predefined examples
└── README.md                           # Documentation for the repository
```
## Future Work

- **Read Model Endpoints from YAML:** Automatically populate the dropdown with available models listed in the YAML file.
- **Error Display in Front-end:** Show detailed error messages in the Gradio interface when model interactions fail.
- **Improved User Experience:** Enhance the visual design and interactivity of the interface.
## License

This project is licensed under the MIT License. See the LICENSE file for details.

Feel free to contribute to the repository by submitting issues or pull requests! 🚀