Sync dev with main #61

Merged
merged 16 commits into from
Aug 28, 2024
Changes from all commits
25 changes: 15 additions & 10 deletions README.md
@@ -3,7 +3,7 @@
</p>
<p align=center>
<img src=https://img.shields.io/badge/HACS-Custom-orange.svg?style=for-the-badg>
- <img src=https://img.shields.io/badge/version-1.0.3-blue>
+ <img src=https://img.shields.io/badge/version-1.1.0-blue>
<a href="https://github.com/valentinfrlch/ha-llmvision/issues">
<img src="https://img.shields.io/maintenance/yes/2024.svg">
<img alt="Issues" src="https://img.shields.io/github/issues/valentinfrlch/ha-llmvision?color=0088ff"/>
@@ -31,20 +31,19 @@
<br>

**LLM Vision** is a Home Assistant integration to analyze images, videos and camera feeds using the vision capabilities of multimodal LLMs.
- Supported providers are OpenAI, Anthropic, Google Gemini, [LocalAI](https://github.com/mudler/LocalAI) and [Ollama](https://ollama.com/).
+ Supported providers are OpenAI, Anthropic, Google Gemini, [LocalAI](https://github.com/mudler/LocalAI), [Ollama](https://ollama.com/) and any OpenAI compatible API.

## Features
- - Compatible with OpenAI, Anthropic Claude, Google Gemini, [LocalAI](https://github.com/mudler/LocalAI) and [Ollama](https://ollama.com/)
+ - Compatible with OpenAI, Anthropic Claude, Google Gemini, [LocalAI](https://github.com/mudler/LocalAI), [Ollama](https://ollama.com/) and custom OpenAI compatible APIs
- Takes images and video from camera entities as input
- Takes local image and video files as input
- Images can be downscaled for faster processing

## Resources
- Check the docs for detailed instructions on how to set up LLM Vision and each of the supported providers as well as usage examples and service call parameters:
+ Check the docs for detailed instructions on how to set up LLM Vision and each of the supported providers, get inspiration from examples or join the discussion on the Home Assistant Community.

- <a href="https://llm-vision.gitbook.io/getting-started"><img src="https://img.shields.io/badge/Documentation-blue?style=for-the-badge&logo=gitbook&logoColor=white&color=18bcf2"/></a>
+ <a href="https://llm-vision.gitbook.io/getting-started"><img src="https://img.shields.io/badge/Documentation-blue?style=for-the-badge&logo=gitbook&logoColor=white&color=18bcf2"/> </a><a href="https://llm-vision.gitbook.io/examples/"><img src="https://img.shields.io/badge/Examples-blue?style=for-the-badge&logo=gitbook&logoColor=black&color=39ffc2"/></a> </a><a href="https://community.home-assistant.io/t/llm-vision-let-home-assistant-see/729241"><img src="https://img.shields.io/badge/Community-blue?style=for-the-badge&logo=homeassistant&logoColor=white&color=03a9f4"/></a>

- Check [📖 Examples](https://llm-vision.gitbook.io/examples/) on how you can integrate llmvision into your Home Assistant setup or join the [🗨️ discussion](https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241) on the Home Assistant Community.

## Installation
[![Open a repository inside the Home Assistant Community Store.](https://my.home-assistant.io/badges/hacs_repository.svg)](https://my.home-assistant.io/redirect/hacs_repository/?owner=valentinfrlch&repository=ha-llmvision&category=Integration)
@@ -69,11 +68,12 @@ logger:
> These are planned features and ideas. They are subject to change and may not be implemented in the order listed or at all.

1. **New Provider**: NVIDIA ChatRTX
- 2. **New Provider**: Custom (OpenAI API compatible) Providers
+ 2. **Animation Support**: Support for animated GIFs
3. **HACS**: Include in HACS default
- 4. [x] ~~**Feature**: HTTPS support for LocalAI and Ollama~~
- 5. [x] ~~**Feature**: Support for video files~~
- 6. [x] ~~**Feature**: Analyze Frigate Recordings using frigate's `event_id`~~
+ 4. [x] ~~**New Provider**: Custom (OpenAI API compatible) Providers~~
+ 5. [x] ~~**Feature**: HTTPS support for LocalAI and Ollama~~
+ 6. [x] ~~**Feature**: Support for video files~~
+ 7. [x] ~~**Feature**: Analyze Frigate Recordings using frigate's `event_id`~~


## How to report a bug or request a feature
@@ -88,3 +88,8 @@ logger:
>
>[KBD]: https://github.com/valentinfrlch/ha-llmvision/issues/new/choose


## Support
You can support this project by starring this GitHub repository. If you want, you can also buy me a coffee here:
<br>
<a href="https://www.buymeacoffee.com/valentinfrlch"><img width="15%" src="https://img.buymeacoffee.com/button-api/?text=Buy me a coffee&emoji=☕&slug=valentinfrlch&button_colour=FFDD00&font_colour=000000&font_family=Inter&outline_colour=000000&coffee_colour=ffffff" /></a>
8 changes: 8 additions & 0 deletions custom_components/llmvision/__init__.py
@@ -10,6 +10,8 @@
CONF_OLLAMA_IP_ADDRESS,
CONF_OLLAMA_PORT,
CONF_OLLAMA_HTTPS,
CONF_CUSTOM_OPENAI_ENDPOINT,
CONF_CUSTOM_OPENAI_API_KEY,
MODEL,
PROVIDER,
MAXTOKENS,
@@ -45,6 +47,8 @@ async def async_setup_entry(hass, entry):
ollama_ip_address = entry.data.get(CONF_OLLAMA_IP_ADDRESS)
ollama_port = entry.data.get(CONF_OLLAMA_PORT)
ollama_https = entry.data.get(CONF_OLLAMA_HTTPS)
custom_openai_endpoint = entry.data.get(CONF_CUSTOM_OPENAI_ENDPOINT)
custom_openai_api_key = entry.data.get(CONF_CUSTOM_OPENAI_API_KEY)

# Ensure DOMAIN exists in hass.data
if DOMAIN not in hass.data:
@@ -63,6 +67,8 @@ async def async_setup_entry(hass, entry):
CONF_OLLAMA_IP_ADDRESS: ollama_ip_address,
CONF_OLLAMA_PORT: ollama_port,
CONF_OLLAMA_HTTPS: ollama_https,
CONF_CUSTOM_OPENAI_ENDPOINT: custom_openai_endpoint,
CONF_CUSTOM_OPENAI_API_KEY: custom_openai_api_key
}.items()
if value is not None
})
@@ -104,6 +110,8 @@ def _default_model(self, provider):
return "gpt-4-vision-preview"
elif provider == "Ollama":
return "llava"
elif provider == "Custom OpenAI":
return "gpt-4o-mini"


def setup(hass, config):
68 changes: 67 additions & 1 deletion custom_components/llmvision/config_flow.py
@@ -13,6 +13,8 @@
CONF_OLLAMA_IP_ADDRESS,
CONF_OLLAMA_PORT,
CONF_OLLAMA_HTTPS,
CONF_CUSTOM_OPENAI_API_KEY,
CONF_CUSTOM_OPENAI_ENDPOINT,
VERSION_ANTHROPIC,
)
import voluptuous as vol
@@ -120,6 +122,39 @@ async def openai(self):
_LOGGER.error("Could not connect to OpenAI server.")
raise ServiceValidationError("handshake_failed")

async def custom_openai(self):
self._validate_provider()
_LOGGER.debug(f"Splits: {len(self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(':'))}")
# URL with port
try:
if len(self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(":")) > 2:
protocol = self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(
"://")[0]
base_url = self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(
"://")[1].split("/")[0].split(":")[0]
port = ":" + self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(":")[
2].split("/")[0]
# URL without port
else:
protocol = self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(
"://")[0]
base_url = self.user_input[CONF_CUSTOM_OPENAI_ENDPOINT].split(
"://")[1].split("/")[0]
port = ""
endpoint = "/v1/models"
header = {'Content-type': 'application/json',
'Authorization': 'Bearer ' + self.user_input[CONF_CUSTOM_OPENAI_API_KEY]}
except Exception as e:
_LOGGER.error(f"Could not parse endpoint: {e}")
raise ServiceValidationError("endpoint_parse_failed")

_LOGGER.debug(
f"Connecting to: [protocol: {protocol}, base_url: {base_url}, port: {port}, endpoint: {endpoint}]")

if not await self._handshake(base_url=base_url, port=port, protocol=protocol, endpoint=endpoint, header=header):
_LOGGER.error("Could not connect to Custom OpenAI server.")
raise ServiceValidationError("handshake_failed")
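The hand-rolled splitting above is easy to get wrong (the port index in particular). A minimal sketch of the same extraction using the standard library's `urllib.parse`; the helper name `split_endpoint` is illustrative and not part of this PR:

```python
from urllib.parse import urlparse

def split_endpoint(endpoint: str):
    """Split an endpoint URL into the pieces the handshake needs, e.g.
    "https://example.com:8080/v1" -> ("https", "example.com", ":8080")."""
    parsed = urlparse(endpoint)
    protocol = parsed.scheme                         # "http" or "https"
    base_url = parsed.hostname or ""                 # host without the port
    port = f":{parsed.port}" if parsed.port else ""  # ":8080" or ""
    return protocol, base_url, port
```

Unlike manual `split(":")` indexing, accessing `parsed.port` raises `ValueError` on a non-numeric port instead of silently mis-splitting the URL.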

async def anthropic(self):
self._validate_provider()
if not await self._validate_api_key(self.user_input[CONF_ANTHROPIC_API_KEY]):
@@ -149,6 +184,8 @@ def get_configured_providers(self):
providers.append("LocalAI")
if CONF_OLLAMA_IP_ADDRESS in self.hass.data[DOMAIN] and CONF_OLLAMA_PORT in self.hass.data[DOMAIN]:
providers.append("Ollama")
if CONF_CUSTOM_OPENAI_ENDPOINT in self.hass.data[DOMAIN]:
providers.append("Custom OpenAI")
return providers


@@ -167,6 +204,7 @@ async def handle_provider(self, provider, configured_providers):
"Google": self.async_step_google,
"Ollama": self.async_step_ollama,
"LocalAI": self.async_step_localai,
"Custom OpenAI": self.async_step_custom_openai,
}

step_method = provider_steps.get(provider)
Expand All @@ -180,7 +218,7 @@ async def async_step_user(self, user_input=None):
data_schema = vol.Schema({
vol.Required("provider", default="OpenAI"): selector({
"select": {
"options": ["OpenAI", "Anthropic", "Google", "Ollama", "LocalAI"],
"options": ["OpenAI", "Anthropic", "Google", "Ollama", "LocalAI", "Custom OpenAI"],
"mode": "dropdown",
"sort": False,
"custom_value": False
@@ -339,3 +377,31 @@ async def async_step_google(self, user_input=None):
step_id="google",
data_schema=data_schema,
)

async def async_step_custom_openai(self, user_input=None):
data_schema = vol.Schema({
vol.Required(CONF_CUSTOM_OPENAI_ENDPOINT): str,
vol.Optional(CONF_CUSTOM_OPENAI_API_KEY): str,
})

if user_input is not None:
# save provider to user_input
user_input["provider"] = self.init_info["provider"]
validator = Validator(self.hass, user_input)
try:
await validator.custom_openai()
# add the mode to user_input
user_input["provider"] = self.init_info["provider"]
return self.async_create_entry(title="LLM Vision Custom OpenAI", data=user_input)
except ServiceValidationError as e:
_LOGGER.error(f"Validation failed: {e}")
return self.async_show_form(
step_id="custom_openai",
data_schema=data_schema,
errors={"base": "handshake_failed"}
)

return self.async_show_form(
step_id="custom_openai",
data_schema=data_schema,
)
2 changes: 1 addition & 1 deletion custom_components/llmvision/manifest.json
@@ -6,5 +6,5 @@
"documentation": "https://github.com/valentinfrlch/ha-llmvision",
"iot_class": "cloud_polling",
"issue_tracker": "https://github.com/valentinfrlch/ha-llmvision/issues",
"version": "1.0.3"
"version": "1.1.0"
}
18 changes: 15 additions & 3 deletions custom_components/llmvision/media_handlers.py
@@ -37,6 +37,7 @@ async def resize_image(self, target_width, image_path=None, image_data=None, img
img = await self.hass.loop.run_in_executor(None, Image.open, image_path)
with img:
# Check if the image is a GIF and convert if necessary
_LOGGER.debug(f"Image format: {img.format}")
if img.format == 'GIF':
# Convert GIF to RGB
img = img.convert('RGB')
Expand All @@ -58,6 +59,10 @@ async def resize_image(self, target_width, image_path=None, image_data=None, img
img_byte_arr.write(image_data)
img = await self.hass.loop.run_in_executor(None, Image.open, img_byte_arr)
with img:
_LOGGER.debug(f"Image format: {img.format}")
if img.format == 'GIF':
# Convert GIF to RGB
img = img.convert('RGB')
# calculate new height based on aspect ratio
width, height = img.size
aspect_ratio = width / height
@@ -131,8 +136,8 @@ async def add_images(self, image_entities, image_paths, target_width, include_fi
return self.client

async def add_videos(self, video_paths, event_ids, interval, target_width, include_filename):
- tmp_clips_dir = f"config/custom_components/{DOMAIN}/tmp_clips"
- tmp_frames_dir = f"config/custom_components/{DOMAIN}/tmp_frames"
+ tmp_clips_dir = f"/config/custom_components/{DOMAIN}/tmp_clips"
+ tmp_frames_dir = f"/config/custom_components/{DOMAIN}/tmp_frames"
if not video_paths:
video_paths = []
"""Wrapper for client.add_frame for videos"""
@@ -196,7 +201,14 @@ async def add_videos(self, video_paths, event_ids, interval, target_width, inclu
# Clean up tmp dirs
try:
await self.hass.loop.run_in_executor(None, shutil.rmtree, tmp_clips_dir)
_LOGGER.info(
f"Deleted tmp folder: {tmp_clips_dir}")
except FileNotFoundError as e:
_LOGGER.error(f"Failed to delete tmp folder: {e}")
try:
await self.hass.loop.run_in_executor(None, shutil.rmtree, tmp_frames_dir)
_LOGGER.info(
f"Deleted tmp folder: {tmp_frames_dir}")
except FileNotFoundError as e:
- pass
+ _LOGGER.error(f"Failed to delete tmp folders: {e}")
return self.client
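The two near-identical try/except blocks above could be collapsed. A sketch of equivalent cleanup using `shutil.rmtree`'s `ignore_errors` flag, at the cost of the per-directory log messages; the helper name is illustrative, not part of this PR:

```python
import shutil

def cleanup_tmp_dirs(*dirs):
    """Remove temporary directories, treating missing ones as already cleaned."""
    for path in dirs:
        # ignore_errors=True swallows FileNotFoundError, so a directory
        # that was never created does not abort the cleanup
        shutil.rmtree(path, ignore_errors=True)
```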
29 changes: 26 additions & 3 deletions custom_components/llmvision/request_handlers.py
@@ -13,12 +13,16 @@
CONF_OLLAMA_IP_ADDRESS,
CONF_OLLAMA_PORT,
CONF_OLLAMA_HTTPS,
CONF_CUSTOM_OPENAI_ENDPOINT,
CONF_CUSTOM_OPENAI_API_KEY,
VERSION_ANTHROPIC,
ENDPOINT_OPENAI,
ERROR_OPENAI_NOT_CONFIGURED,
ERROR_ANTHROPIC_NOT_CONFIGURED,
ERROR_GOOGLE_NOT_CONFIGURED,
ERROR_LOCALAI_NOT_CONFIGURED,
ERROR_OLLAMA_NOT_CONFIGURED,
ERROR_CUSTOM_OPENAI_NOT_CONFIGURED,
ERROR_NO_IMAGE_INPUT
)

@@ -103,16 +107,29 @@ async def make_request(self, call):
ip_address=ip_address,
port=port,
https=https)
elif call.provider == 'Custom OpenAI':
api_key = self.hass.data.get(DOMAIN).get(
CONF_CUSTOM_OPENAI_API_KEY, "")
endpoint = self.hass.data.get(DOMAIN).get(CONF_CUSTOM_OPENAI_ENDPOINT)

# Additional debug logging
_LOGGER.debug(f"Data from DOMAIN: {self.hass.data.get(DOMAIN)}")
_LOGGER.debug(f"API Key: {api_key}")
_LOGGER.debug(f"Endpoint: {endpoint}")

model = call.model
self._validate_call(provider=call.provider,
api_key=api_key,
base64_images=self.base64_images)
response_text = await self.openai(model=model, api_key=api_key, endpoint=endpoint)
return {"response_text": response_text}

def add_frame(self, base64_image, filename):
self.base64_images.append(base64_image)
self.filenames.append(filename)

# Request Handlers
- async def openai(self, model, api_key):
- from .const import ENDPOINT_OPENAI
+ async def openai(self, model, api_key, endpoint=ENDPOINT_OPENAI):
# Set headers and payload
headers = {'Content-type': 'application/json',
'Authorization': 'Bearer ' + api_key}
@@ -138,7 +155,7 @@ async def openai(self, model, api_key):
)

response = await self._post(
- url=ENDPOINT_OPENAI, headers=headers, data=data)
+ url=endpoint, headers=headers, data=data)

response_text = response.get(
"choices")[0].get("message").get("content")
@@ -301,6 +318,9 @@ async def ollama(self, model, ip_address, port, https):
async def _post(self, url, headers, data):
"""Post data to url and return response data"""
_LOGGER.info(f"Request data: {sanitize_data(data)}")
_LOGGER.debug(
f"URL type: {type(url)}, Headers type: {type(headers)}, Data type: {type(data)}")

try:
response = await self.session.post(url, headers=headers, json=data)
except Exception as e:
@@ -352,6 +372,9 @@ def _validate_call(self, provider, api_key, base64_images, ip_address=None, port
elif provider == 'Ollama':
if not ip_address or not port:
raise ServiceValidationError(ERROR_OLLAMA_NOT_CONFIGURED)
elif provider == 'Custom OpenAI':
if not api_key:
raise ServiceValidationError(ERROR_CUSTOM_OPENAI_NOT_CONFIGURED)
# Check media input
if base64_images == []:
raise ServiceValidationError(ERROR_NO_IMAGE_INPUT)
1 change: 1 addition & 0 deletions custom_components/llmvision/services.yaml
@@ -15,6 +15,7 @@ image_analyzer:
- 'Google'
- 'Ollama'
- 'LocalAI'
- 'Custom OpenAI'
model:
name: Model
required: false
8 changes: 8 additions & 0 deletions custom_components/llmvision/strings.json
@@ -43,6 +43,14 @@
"data": {
"google_api_key": "Your API key"
}
},
"custom_openai": {
"title": "Configure Custom OpenAI provider",
"description": "**Important**: Only works if the API is compatible with OpenAI's API. If the API doesn't require an API key, leave it empty. The endpoint should have the following format: `http(s)://baseURL(:port)/some/endpoint`",
"data": {
"custom_openai_endpoint": "Custom Endpoint",
"custom_openai_api_key": "Your API key"
}
}
},
"error": {