
server : (web ui) Various improvements, now use vite as bundler #10599

Merged 11 commits into ggerganov:master on Dec 3, 2024

Conversation

@ngxson (Collaborator) commented Nov 30, 2024

Motivation

The new web UI has received significantly more positive feedback than anticipated, prompting consideration for further enhancements.

Currently, we operate without a bundler, running code directly from index.html with third-party libraries fetched via deps.sh. However, this approach has limitations, particularly with daisyui component compatibility. It also inflates the binary size with many redundant parts.

Since many llama.cpp contributors come from a lower-level programming background, here's a brief overview of the current UI tech stack and why each piece is needed:

  • Tailwindcss: A CSS framework that simplifies styling, used by major platforms like ChatGPT, Claude, and Hugging Face
  • Daisyui: A tailwindcss-based component library offering ready-to-use elements like chat bubbles, buttons, and themes
  • Due to the large size of these libraries (up to 2MB for pre-compiled versions), we need vite as a bundler to eliminate unused code. This also enables compilation into a single .html file, eliminating runtime dependencies.

Improvements

Key updates in this PR:

  • Project relocated to server/webui using npm for dependency management (with deps.sh script removed)
  • Integration of the vite bundler (build instructions in server/README.md). The output index.html is just under 500 KB, smaller than the old deps.sh approach (note: this new index.html contains everything needed)
  • Enhanced mobile compatibility (screenshots below)

As for binary size, this PR reduces the final compiled binary by 1 MB compared to master:

# master
$ ls -lah llama-server
-rwxr-xr-x  1 ngxson  staff   5.5M Nov 30 15:02 llama-server

# PR
$ ls -lah llama-server
-rwxr-xr-x  1 ngxson  staff   4.5M Nov 30 14:52 llama-server

[Mobile screenshots: IMG_1847, IMG_1846, IMG_1845]


@github-actions bot added the examples, devops (improvements to build systems and github actions), and server labels Nov 30, 2024
@ngxson ngxson marked this pull request as ready for review November 30, 2024 13:57
@ngxson ngxson requested a review from ggerganov November 30, 2024 13:57
@ggerganov (Owner) commented:
Do I need to do any extra steps? I pulled your branch and ran llama-server like this:

cd build
make -j && ./bin/llama-server -m ../models/qwen2.5-32b-instruct/ggml-model-q8_0.gguf

...

0.01.190.151 I srv          init: initializing slots, n_slots = 1
0.01.190.163 I slot         init: id  0 | task -1 | new slot n_ctx_slot = 4096
0.01.190.233 I main: model loaded
0.01.190.255 I main: chat template, built_in: 1, chat_example: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
0.01.190.257 I main: server is listening on http://127.0.0.1:8080 - starting the main loop
0.01.190.259 I srv  update_slots: all slots are idle

But it renders like this in the browser:

[Screenshot: the web UI renders broken in the browser]

If I go back to master, it renders OK.

@ngxson (Collaborator, Author) commented Dec 1, 2024

@ggerganov Could you try make clean? In the initial PR, some users reported that the generated .hpp files were not up to date. I added some rules to the Makefile to clean them:

	find examples/server -type f -name "*.js.hpp"   -delete
	find examples/server -type f -name "*.mjs.hpp"  -delete
	find examples/server -type f -name "*.css.hpp"  -delete
	find examples/server -type f -name "*.html.hpp" -delete

@slaren (Collaborator) commented Dec 1, 2024

I had the same issue as @ggerganov. Running make clean and reconfiguring the cmake build fixed it for me.

However, just reconfiguring cmake on its own is not enough. That should be fixed, or it is going to be very confusing.

@ggerganov (Owner) commented:
Yes, make clean in the root folder and then reconfiguring the cmake build did it.

@ngxson (Collaborator, Author) commented Dec 1, 2024

I think I have found a fix after playing around a bit more with Makefile's .PHONY and cmake's set_source_files_properties. (Not sure if this is the correct way, but feel free to suggest other improvements.)
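
For readers less familiar with CMake, here is a minimal sketch of the idea behind that property (hypothetical file name; not the exact change in this PR):

# Mark the embedded web UI header as a build product rather than a
# hand-written source, so CMake does not treat a stale copy as up to date.
set_source_files_properties(
    index.html.hpp
    PROPERTIES GENERATED TRUE)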

@ggerganov (Owner) commented:
The CMake build still requires running make clean in the root folder if you previously used the Makefile build. Can we update the CMake build to do the find ... -delete step? We will soon remove the Makefile build, so no need to worry about that one, but the CMake build is affected as it stands.

@ngxson (Collaborator, Author) commented Dec 2, 2024

I added a pre-build step to remove the generated .hpp files
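
For illustration, a configure-time cleanup along these lines might look like the following CMake sketch (hypothetical paths; the actual step in the PR may differ):

# Remove generated headers left over from earlier builds so that this
# build regenerates them from the new web UI sources.
file(GLOB STALE_HPP
    "${CMAKE_CURRENT_SOURCE_DIR}/*.js.hpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/*.mjs.hpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/*.css.hpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/*.html.hpp")
if (STALE_HPP)
    file(REMOVE ${STALE_HPP})
endif()

Deleting the files at configure time keeps the source tree clean, though, as noted further down the thread, it can make the dependent target recompile on every configure.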

@ngxson ngxson merged commit 91c36c2 into ggerganov:master Dec 3, 2024
50 checks passed
@pothitos (Contributor) commented Dec 4, 2024

Many thanks for the new awesome front-end 🙏

> I added a pre-build step to remove the generated .hpp files

Unfortunately, this change causes llama-server to recompile every time you run CMake, even when no files have changed.

I don't know if it's correct to write this comment here, or whether I should open an issue.

@ngxson (Collaborator, Author) commented Dec 4, 2024

I don't observe the same behavior on my side (running cmake --build a second time does not rebuild everything):

cmake --build build -j --target llama-server
[  4%] Built target build_info
[ 20%] Built target ggml-base
[ 28%] Built target ggml-metal
[ 32%] Built target ggml-blas
[ 52%] Built target ggml-cpu
[ 56%] Built target ggml
[ 72%] Built target llama
[ 92%] Built target common
[100%] Built target llama-server

@slaren (Collaborator) commented Dec 4, 2024

It does rebuild server.cpp, since it depends on the generated file. You should be able to see it with -v. I think the cleaner solution would be to move all the generated files to the cmake directory, and do not delete them. cmake should be able to regenerate them automatically when necessary, and removing the build directory always results in a clean start.

I imagine that the reason it doesn't work like that in the first place is because we needed to support the Makefile, but that's no longer the case.
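
A minimal CMake sketch of that suggestion, assuming the HTML is embedded with an xxd-style step (hypothetical names and paths; the shell redirection assumes a Unix generator):

# Generate the embedded UI header into the build tree; removing the build
# directory then always yields a clean start, and CMake re-runs the command
# whenever public/index.html changes.
set(UI_INPUT  "${CMAKE_CURRENT_SOURCE_DIR}/public/index.html")
set(UI_OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/index.html.hpp")
add_custom_command(
    OUTPUT  "${UI_OUTPUT}"
    COMMAND xxd -i index.html > "${UI_OUTPUT}"
    WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/public"
    DEPENDS "${UI_INPUT}"
    COMMENT "Embedding web UI into index.html.hpp")
target_sources(llama-server PRIVATE "${UI_OUTPUT}")
target_include_directories(llama-server PRIVATE "${CMAKE_CURRENT_BINARY_DIR}")

Because the OUTPUT lives in the build tree and the DEPENDS edge points at the real source, nothing is deleted or regenerated unless index.html actually changes.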

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
server : (web ui) Various improvements, now use vite as bundler (ggerganov#10599)

* hide buttons in dropdown menu

* use npm as deps manager and vite as bundler

* fix build

* fix build (2)

* fix responsive on mobile

* fix more problems on mobile

* sync build

* (test) add CI step for verifying build

* fix ci

* force rebuild .hpp files

* cmake: clean up generated files pre build
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024 (same commit message as above)