
Add o1 Models #504

Merged · 4 commits into Niek:main · Oct 17, 2024
Conversation

Ryan526 (Contributor) commented Oct 9, 2024

Add the o1 models and change max_tokens to max_completion_tokens, since the former is now deprecated.

Streaming can't be enabled in profiles using these models.

Max output tokens for these new models:

o1-preview: up to 32,768 tokens
o1-mini: up to 65,536 tokens
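
For illustration, a minimal sketch of the request-body change (field names follow the OpenAI Chat Completions API; the builder function itself is hypothetical, not the exact code in this repo):

```typescript
// Hypothetical request builder showing the change in this PR: the
// deprecated `max_tokens` field is replaced by `max_completion_tokens`.
interface ChatCompletionRequest {
  model: string
  messages: { role: 'system' | 'user' | 'assistant', content: string }[]
  max_completion_tokens?: number // replaces the deprecated max_tokens
  stream?: boolean
}

function buildRequest (
  model: string,
  messages: ChatCompletionRequest['messages'],
  maxCompletionTokens: number
): ChatCompletionRequest {
  return {
    model,
    messages,
    max_completion_tokens: maxCompletionTokens,
    // o1 models don't support streaming, so leave it off for them
    stream: !model.startsWith('o1')
  }
}
```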

Ryan526 (Contributor, Author) commented Oct 9, 2024

I've tested and it's working fine except for one issue: if I try to suggest a chat name, it will not rename the chat. Is that set to always use the same model the chat was started with? Maybe it should be set to always use 4o-mini for speed and cost @Niek?

Niek (Owner) commented Oct 9, 2024

Thanks for this PR!

We need to check carefully that this doesn't break other models or Petals. We can't really hardcode the rename to use another model, as that could break other services that don't have it. Why is the summarization breaking?

Ryan526 (Contributor, Author) commented Oct 9, 2024

> Why is the summarization breaking?

Possibly because maxTokens for suggestName in Chat.svelte is set to 30. Reasoning tokens are probably eating all of that up before it can generate a response.

I tried changing it from 30 to 500; that still won't work. Changing it from 500 to 20,000 makes it work. We need a way to set reasoning tokens to 0 for chat summarization.
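
To make the failure mode concrete, here is a sketch of the kind of request involved (a raw fetch against the OpenAI API; apiKey is assumed to be defined elsewhere). With a tiny cap, the reasoning tokens spend the whole budget before any visible text is produced:

```typescript
// Illustrative only: for o1 models, reasoning tokens count against
// max_completion_tokens, so a cap of 30 is exhausted before any
// visible completion tokens appear.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}` // assumed available in scope
  },
  body: JSON.stringify({
    model: 'o1-mini',
    messages: [{ role: 'user', content: 'Suggest a short name for this chat.' }],
    max_completion_tokens: 30 // too small: eaten entirely by reasoning
  })
})
const data = await response.json()
// Expected with a too-small cap:
//   data.choices[0].finish_reason === 'length'
//   data.choices[0].message.content === ''   (no visible output, but billed)
```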

Ryan526 (Contributor, Author) commented Oct 9, 2024

https://platform.openai.com/docs/guides/reasoning/managing-the-context-window

> If the generated tokens reach the context window limit or the max_completion_tokens value you've set, you'll receive a chat completion response with the finish_reason set to length. This might occur before any visible completion tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.
>
> To prevent this, ensure there's sufficient space in the context window or adjust the max_completion_tokens value to a higher number. OpenAI recommends reserving at least 25,000 tokens for reasoning and outputs when you start experimenting with these models. As you become familiar with the number of reasoning tokens your prompts require, you can adjust this buffer accordingly.

Looks like reasoning tokens aren't something that can be controlled. Perhaps the chat profile could get an option to select a model to use for chat summarization.
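
A sketch of that idea, assuming a hypothetical summarizationModel field on the profile (none of this is in the PR):

```typescript
// Hypothetical profile option (not implemented in this PR): let each
// profile choose a cheaper, non-reasoning model for chat summarization,
// falling back to the chat's own model when unset.
interface ChatProfile {
  model: string
  summarizationModel?: string // e.g. 'gpt-4o-mini' for fast, cheap renames
}

function modelForSuggestName (profile: ChatProfile): string {
  return profile.summarizationModel ?? profile.model
}
```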

Ryan526 (Contributor, Author) commented Oct 16, 2024

@Niek it works with all OpenAI models from my testing. I didn't test Petals because I don't have access to that API and don't use it.

Ryan526 (Contributor, Author) commented Oct 17, 2024

I removed the max tokens limit from chat suggestions, so now all models work. It shouldn't use many tokens anyway on models other than o1.
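
A minimal sketch of what that fix looks like, assuming a hypothetical request builder (the actual change lives in Chat.svelte): the suggestion request simply omits the token cap so reasoning models can spend what they need.

```typescript
// Sketch: build the name-suggestion request with no token cap, so
// reasoning models aren't cut off before producing visible output.
// Non-reasoning models will use few tokens for a short title anyway.
function buildSuggestNameRequest (model: string, transcript: string) {
  return {
    model,
    messages: [{
      role: 'user' as const,
      content: 'Suggest a short, descriptive name for this chat:\n' + transcript
    }]
    // note: no max_completion_tokens here
  }
}
```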

Ryan526 (Contributor, Author) commented Oct 17, 2024

Can someone who has Petals access verify that it still works after these changes? My site is currently on this branch: https://gpt.jalynski.net/

Niek (Owner) commented Oct 17, 2024

Petals does not require a key; you can simply click the checkbox and use their models. I tried it and it seems broken, but TBH so is the main version. I will merge your PR now!

Niek merged commit a64337c into Niek:main on Oct 17, 2024
Niek (Owner) commented Oct 17, 2024

@all-contributors please add @Ryan526 for code

The all-contributors bot replied:

@Niek

We had trouble processing your request. Please try again later.

Niek (Owner) commented Oct 17, 2024

@all-contributors please add @Ryan526 for code

The all-contributors bot replied:

@Niek

I've put up a pull request to add @Ryan526! 🎉
