RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding; by integrating it with ipex-llm, users can now easily leverage local LLMs running on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).
See the demo of RAGFlow running Qwen2:7B on Intel Arc A770 below.
You can also click here to watch the demo video.
- Prerequisites
- Install and Start Ollama Service on Intel GPU
- Pull Model
- Start RAGFlow Service
- Using RAGFlow
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
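The requirements above can be checked from a terminal before going further. The following is a minimal sketch, assuming a Linux host with GNU coreutils; the function names `version_ge` and `check_prereqs` are illustrative and not part of RAGFlow or ipex-llm. It only prints the measured values next to the documented minimums, and the disk figure refers to the filesystem of the current directory, which may differ from where Docker stores its images.

```shell
# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: print this machine's values next to the documented minimums.
check_prereqs() {
  local docker_ver compose_ver
  echo "CPU cores : $(nproc) (need >= 4)"
  # MemTotal is in kB; a nominal 16 GB machine usually reports slightly less.
  echo "RAM (GB)  : $(awk '/^MemTotal:/ {printf "%.1f", $2/1024/1024}' /proc/meminfo) (need >= 16)"
  echo "Free disk : $(df -BG --output=avail . | tail -n 1 | tr -d ' ') (need >= 50G)"

  docker_ver=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
  compose_ver=$(docker compose version --short 2>/dev/null)
  compose_ver=${compose_ver#v}   # some builds print a leading "v"

  if [ -n "$docker_ver" ] && version_ge "$docker_ver" 24.0.0; then
    echo "Docker    : $docker_ver (ok)"
  else
    echo "Docker    : ${docker_ver:-not found} (need >= 24.0.0)"
  fi
  if [ -n "$compose_ver" ] && version_ge "$compose_ver" 2.26.1; then
    echo "Compose   : $compose_ver (ok)"
  else
    echo "Compose   : ${compose_ver:-not found} (need >= 2.26.1)"
  fi
}

# Usage:
#   check_prereqs
```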
Follow the steps in Run Ollama with IPEX-LLM on Intel GPU Guide to install and run Ollama on Intel GPU. Ensure that ollama serve is running correctly and can be accessed through a local URL (e.g., http://127.0.0.1:11434) or a remote URL (e.g., http://your_ip:11434).
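One way to confirm that the service is reachable is to query it with curl. The sketch below assumes curl is installed and that Ollama listens on plain HTTP at its default port 11434; the helper names `ollama_base_url` and `ollama_is_up` are illustrative. If a proxy is configured, localhost may need to be listed in no_proxy first.

```shell
# ollama_base_url HOST [PORT]: build the base URL (11434 is Ollama's default port).
ollama_base_url() {
  printf 'http://%s:%s\n' "$1" "${2:-11434}"
}

# ollama_is_up HOST [PORT]: succeeds if the root endpoint answers "Ollama is running".
ollama_is_up() {
  curl --silent --fail --max-time 5 "$(ollama_base_url "$@")/" | grep -q 'Ollama is running'
}

# Usage:
#   ollama_is_up 127.0.0.1 && echo "local Ollama is reachable"
#   ollama_is_up your_ip   && echo "remote Ollama is reachable"
#   curl -s "$(ollama_base_url 127.0.0.1)/api/tags"   # JSON list of pulled models
```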
Important
If RAGFlow is not deployed on the same machine where Ollama is running (which means RAGFlow needs to connect to a remote Ollama service), you must configure the Ollama service to accept connections from any IP address. To achieve this, set or export the environment variable OLLAMA_HOST=0.0.0.0 before executing the command ollama serve.
Tip
If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), it is recommended to additionally set the following environment variable for optimal performance before executing ollama serve:
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
Now we need to pull a model for RAG using Ollama. Here we use the Qwen/Qwen2-7B model as an example. Open a new terminal window and run the following commands to pull qwen2:latest.
- For Linux users:
export no_proxy=localhost,127.0.0.1
./ollama pull qwen2:latest
- For Windows users:
Please run the following commands in Miniforge or Anaconda Prompt.
set no_proxy=localhost,127.0.0.1
ollama pull qwen2:latest
Tip
Besides Qwen2, there are other LLM models you might want to explore, such as Llama3, Phi3, Mistral, etc. You can find all available models in the Ollama model library. Simply search for the model, pull it in a similar manner, and give it a try.
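After pulling, it can be useful to confirm that the model is actually available locally. The following is a minimal sketch for Linux that filters the table printed by `ollama list`; the helper name `has_model` is illustrative, and the sketch assumes the model name appears in the first column under a single header row.

```shell
# has_model NAME: reads `ollama list` output on stdin and succeeds if NAME
# appears in the first column (the header row is skipped).
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage (drop the ./ prefix if ollama is on your PATH):
#   ./ollama list | has_model qwen2:latest && echo "qwen2:latest is available"
```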
Note
The steps in section 3 are verified on Linux systems only.
You can either clone the repository or download the source zip from GitHub:
$ git clone https://github.com/infiniflow/ragflow.git
Ensure vm.max_map_count is set to at least 262144. To check the current value of vm.max_map_count, use:
$ sysctl vm.max_map_count
To set the value temporarily, use:
$ sudo sysctl -w vm.max_map_count=262144
To make the change permanent and ensure it persists after a reboot, add or update the following line in /etc/sysctl.conf:
vm.max_map_count=262144
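The "add or update" step can be scripted so that it is safe to run more than once. This is a hedged sketch, assuming GNU sed and root privileges for the real file; the function name `persist_sysctl` is illustrative. The key is used as a regular expression, so the dots in it match any character, which is acceptable for a sketch but not strict.

```shell
# persist_sysctl KEY VALUE [FILE]: replace an existing "KEY=..." line in FILE,
# or append one if none exists. FILE defaults to /etc/sysctl.conf.
persist_sysctl() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf}
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root):
#   persist_sysctl vm.max_map_count 262144
#   sysctl -p    # reload /etc/sysctl.conf without rebooting
```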
Pull the pre-built Docker images and start up the server:
Note
Running the following commands automatically downloads the dev version of the RAGFlow Docker image. To download and run a specific version, update RAGFLOW_VERSION in docker/.env to the intended version, for example RAGFLOW_VERSION=v0.7.0, before running the following commands.
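The version pin can also be changed from the command line instead of a text editor. The sketch below assumes GNU sed and that docker/.env already contains a RAGFLOW_VERSION line; the function name `set_ragflow_version` is illustrative. It deliberately fails instead of appending when the key is missing, so that an unexpected file layout is noticed.

```shell
# set_ragflow_version VERSION [ENV_FILE]: rewrite the RAGFLOW_VERSION line in place.
# ENV_FILE defaults to docker/.env, relative to the repository root.
set_ragflow_version() {
  local version=$1 env_file=${2:-docker/.env}
  # Refuse to guess if the key is absent.
  grep -q '^RAGFLOW_VERSION=' "$env_file" 2>/dev/null || return 1
  sed -i "s|^RAGFLOW_VERSION=.*|RAGFLOW_VERSION=${version}|" "$env_file"
}

# Usage (from the ragflow repository root):
#   set_ragflow_version v0.7.0 && grep '^RAGFLOW_VERSION=' docker/.env
```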
$ export no_proxy=localhost,127.0.0.1
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
Note
The core image is about 9 GB in size and may take a while to load.
Once the server is up and running, check its status:
$ docker logs -f ragflow-server
Upon successful deployment, you will see logs in the terminal similar to the following:
     ____                 ______ __
    / __ \ ____ _ ____ _ / ____// /____  _      __
   / /_/ // __ `// __ `// /_   / // __ \| | /| / /
  / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
               /____/
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9380
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
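Because the image is large, the server may take a while before these lines appear. Rather than watching the log by hand, a script can poll for them. This is a minimal sketch, assuming the container is named ragflow-server as in the command above and that a "Running on http://" line signals readiness; the function names and the 600-second default timeout are illustrative.

```shell
# log_shows_ready: reads log text on stdin; succeeds if a "Running on http://" line is present.
log_shows_ready() {
  grep -q 'Running on http://'
}

# wait_for_ragflow [TIMEOUT_SECONDS]: poll the container logs every 5 seconds
# until the server reports it is running, or give up after the timeout.
wait_for_ragflow() {
  local timeout=${1:-600} waited=0
  until docker logs ragflow-server 2>&1 | log_shows_ready; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
}

# Usage:
#   wait_for_ragflow 900 && echo "RAGFlow is up" || echo "timed out"
```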
You can now open a browser and access the RAGFlow web portal. With the default settings, simply enter http://IP_OF_YOUR_MACHINE (without the port number), as the default HTTP serving port 80 can be omitted. If RAGFlow is deployed on the same machine as your browser, you can also access the web portal at http://127.0.0.1 or http://localhost.
Note
For detailed information about how to use RAGFlow, visit the README in the official RAGFlow repository.
If this is your first time using RAGFlow, you will need to register. After registering, log in with your new account to access the portal.
Access the Ollama settings through Settings -> Model Providers in the menu. Fill out the Base URL, and then click the OK button at the bottom.
If the connection is successful, you will see the model listed under Show more models as illustrated below.
Note
If you want to use an Ollama server hosted at a different URL, simply update the Ollama Base URL to the new URL and press the OK button again to re-confirm the connection to Ollama.
Go to Knowledge Base by clicking on Knowledge Base in the top bar. Click the +Create knowledge base button on the right. You will be prompted to input a name for the knowledge base.
After entering a name, you will be directed to edit the knowledge base. Click on Dataset on the left, then click + Add file -> Local files. Upload your file in the pop-up window and click OK.
After the upload is successful, you will see a new record in the dataset. The Parsing Status column will show UNSTARTED. Click the green start button in the Action column to begin file parsing. Once parsing is finished, the Parsing Status column will change to SUCCESS.
Next, go to Configuration on the left menu and click Save at the bottom to save the changes.
Start new conversations by clicking Chat in the top navbar.
On the left side, create a conversation by clicking Create an Assistant. Under Assistant Setting, give it a name and select your knowledge bases.
Next, go to Model Setting, choose your model added by Ollama, and disable the Max Tokens toggle. Finally, click OK to start.
Tip
Enabling the Max Tokens toggle may result in very short answers.
Input your questions into the Message Resume Assistant textbox at the bottom, and click the button on the right to get responses.
To shut down the RAGFlow server, use Ctrl+C in the terminal where the RAGFlow server is running, then close your browser tab.
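If the server was started in detached mode with docker compose up -d, Ctrl+C in the log terminal may only end the docker logs -f stream while the containers keep running. In that case the services can be stopped through Docker Compose. The sketch below assumes the repository was cloned into ./ragflow; the function name `stop_ragflow` is illustrative.

```shell
# stop_ragflow [DIR]: stop the Compose services defined in DIR
# (default: ragflow/docker). Runs in a subshell so the caller's
# working directory is left unchanged.
stop_ragflow() {
  ( cd "${1:-ragflow/docker}" && docker compose stop )
}

# Usage:
#   stop_ragflow                 # stop containers, keep them for a later `docker compose start`
#   stop_ragflow path/to/docker  # same, for a clone in a different location
```

`docker compose stop` leaves the containers in place; `docker compose down` run from the same directory additionally removes the containers and their network.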