Improve local LLM workflow (no more environment variables) #419

Closed · cpacker opened this issue Nov 10, 2023 · 0 comments · Fixed by #422
Labels: enhancement (New feature or request)
cpacker commented Nov 10, 2023

Current setup

If a user wants to run dolphin on LM Studio with the airoboros wrapper:

export OPENAI_API_BASE=http://127.0.0.1:1234
export BACKEND_TYPE=lmstudio
memgpt run --model airoboros_xxx

Config (when using a local model)

  • model is "local", or can be "airoboros_xxx" in which case model == wrapper
  • model_endpoint stores the IP from OPENAI_API_BASE
[defaults]
model = local
model_endpoint = http://localhost:1234

Proposed setup (with memgpt run)

  • User does not specify any ENV variables; it's all in the config
  • Add a --wrapper arg and config variable (see the CLI sketch below)

If a user wants to run dolphin on LM Studio with the airoboros wrapper:

memgpt run --wrapper airoboros_xxx --endpoint http://localhost:1234 --endpoint_type lmstudio
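
A rough sketch of how the proposed flags could be wired up, assuming a typer-style CLI (illustrative only, not the actual memgpt run implementation; only the flag names come from this issue):

from typing import Optional
import typer

app = typer.Typer()

@app.command()
def run(
    model: Optional[str] = typer.Option(None, "--model", help="Only needed for Ollama (real model name)"),
    wrapper: Optional[str] = typer.Option(None, "--wrapper", help="Prompt formatter, e.g. airoboros_xxx"),
    endpoint: Optional[str] = typer.Option(None, "--endpoint", help="Base URL of the local backend"),
    endpoint_type: Optional[str] = typer.Option(None, "--endpoint_type", help="lmstudio, ollama, ..."),
):
    # CLI flags override whatever is stored in the config; no ENV variables involved.
    typer.echo(f"model={model} wrapper={wrapper} endpoint={endpoint} endpoint_type={endpoint_type}")

if __name__ == "__main__":
    app()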

For almost all backends, it's OK for the model to be unspecified, because what model is running is determined by the backend. The only exception to this is Ollama, which requires you to pass the model name in the POST request. This is already a special case in our documentation: https://memgpt.readthedocs.io/en/latest/ollama/ (currently, we ask the user to set an additional environment variable).

Special Ollama case:

memgpt run --model dolphin_xxx --wrapper airoboros_xxx  --endpoint http://localhost:11434 --endpoint_type ollama
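
For reference, the --model value is needed because Ollama expects the model name in the request body itself. A minimal illustration (the model name is reused from the config example below; the prompt is a placeholder):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "dolphin-2.2.1-mistral7b",  # Ollama will not infer this; it must be in the POST body
        "prompt": "Hello",
        "stream": False,
    },
)
print(resp.json()["response"])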

Proposed setup (with memgpt configure, then memgpt run)

  • If the user says no to OpenAI, no to Azure, then:
    • Ask for their endpoint type (lmstudio, ollama, etc)
    • Ask for their endpoint IP
      • We should do input checking / sanitization on the IP they provide (missing http:// prefix? hanging /v1/?); see the sanitization sketch after this list
    • Ask what prompt formatter / wrapper they want to use
      • IMO we should hide this and just use the default wrapper, but allow overriding with memgpt run --wrapper
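
A rough sketch of the kind of input sanitization mentioned above (hypothetical helper, not existing MemGPT code):

def clean_endpoint(raw: str) -> str:
    # Normalize whatever the user types in during memgpt configure.
    endpoint = raw.strip()
    if not endpoint.startswith(("http://", "https://")):
        endpoint = "http://" + endpoint          # assume a bare IP:port means http
    endpoint = endpoint.rstrip("/")              # drop trailing slash
    if endpoint.endswith("/v1"):
        endpoint = endpoint[: -len("/v1")]       # drop hanging /v1 (we append paths ourselves)
    return endpoint

assert clean_endpoint("127.0.0.1:1234/v1/") == "http://127.0.0.1:1234"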

Config (when using a local model)

[defaults]
model = optional for non-Ollama backends (default None); for Ollama this is the real model name (e.g. dolphin-2.2.1-mistral7b)
model_endpoint = http://localhost:1234
model_endpoint_type = lmstudio

lmstudio:

[defaults]
model = None
wrapper = None
model_endpoint = http://localhost:1234
model_endpoint_type = lmstudio

ollama:

[defaults]
model = dolphin-2.2.1-mistral7b
wrapper = None
model_endpoint = http://localhost:11434
model_endpoint_type = ollama
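
Since the proposed config is plain INI, the Ollama-only model requirement could be enforced with something as simple as the following sketch (configparser from the stdlib; the config path and helper name are assumptions, not MemGPT's actual code):

import configparser
import os

def load_defaults(path="~/.memgpt/config"):  # path is illustrative
    parser = configparser.ConfigParser()
    parser.read(os.path.expanduser(path))
    defaults = dict(parser["defaults"])
    # Only Ollama requires a real model name; other backends may leave model unset.
    if defaults.get("model_endpoint_type") == "ollama" and defaults.get("model") in ("", "None", None):
        raise ValueError("Ollama needs a real model name, e.g. dolphin-2.2.1-mistral7b")
    return defaults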

Special case where the user wants to use OpenAI, but swap the endpoint to a proxy

export OPENAI_API_BASE="<proxy_address>"
memgpt run

Config

We do NOT set model_endpoint to this proxy address; instead, we let openai-python handle it for us (on our end we act like nothing changed, it's just OpenAI):

[defaults]
model = gpt4 / ...
model_endpoint = n/a
model_endpoint_type = n/a
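
For reference, the openai-python 0.x client resolves OPENAI_API_BASE on its own, which is why model_endpoint can stay n/a here; a minimal illustration:

import openai  # openai-python 0.x

# The library reads OPENAI_API_BASE at import time, so exporting the proxy address
# before `memgpt run` is enough; no MemGPT config changes required.
print(openai.api_base)  # https://api.openai.com/v1 by default, or the exported proxy address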