
How to stream LLM response with streamlit? #3

Open
fabmeyer opened this issue May 11, 2023 · 7 comments

@fabmeyer

I am following this script, which uses a RetrievalQA chain.

Code:

import streamlit as st
from streamlit_chat import message
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# docsearch is a vector store built earlier in the script
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
chain = RetrievalQA.from_chain_type(llm=llm, chain_type='refine', retriever=docsearch.as_retriever())

...

if 'user_input' not in st.session_state:
    st.session_state['user_input'] = []
if 'generated_text' not in st.session_state:
    st.session_state['generated_text'] = []

user_input = st.text_area('Enter a question', value=f"What are trends in {st.session_state['thematic']['term']}?")

button_2 = st.button('Get answer')

if user_input and button_2:
    st.session_state.user_input.append(user_input)
    with st.spinner('Running LLM...'):
        # the chain is stored in st.session_state in the elided code above
        st.session_state.generated_text.append(st.session_state['chain'].run(user_input))

if st.session_state['generated_text']:
    for i in range(len(st.session_state['generated_text']) - 1, -1, -1):
        message(st.session_state['user_input'][i], is_user=True, key=str(i) + '_user')
        message(st.session_state['generated_text'][i], key=str(i))

How can I stream the LLM's response into the Streamlit app in real time (the way it already streams to the console)?

@tractorjuice

Take a look at the new Streamlit chat elements; they may help:

https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps
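
For reference, a minimal sketch of the pattern from that tutorial, assuming the openai v1 client and an illustrative model name; the placeholder is re-rendered as each chunk arrives:

import streamlit as st
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

if prompt := st.chat_input("Ask a question"):
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        placeholder = st.empty()
        full_response = ""
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in stream:
            full_response += chunk.choices[0].delta.content or ""
            placeholder.markdown(full_response + "▌")  # typing-cursor effect
        placeholder.markdown(full_response)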

@ucola

ucola commented Feb 4, 2024

@fabmeyer did you solve your problem?

@tractorjuice

Use these instead: https://docs.streamlit.io/library/api-reference/chat

@ucola

ucola commented Feb 5, 2024

@tractorjuice Yes, I read that implementation, but I could not get it running. Streamlit raises:
StreamlitAPIException: st.write_stream expects a generator or stream-like object as input not <class 'str'>. Please use st.write instead for this data type.

My code is below; do you have any idea how to rewrite it?

import streamlit as st
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# IMAGE and description are defined elsewhere in the app

def prepare_llm(prompt):
    st.chat_message(name="user", avatar=IMAGE["user"]).markdown(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})

    msg = [{"role": "assistant", "content": description}]
    msg.extend(st.session_state.messages)

    embeddings = OpenAIEmbeddings()
    docsearch = Pinecone.from_existing_index(
        index_name="langchain-index", embedding=embeddings
    )

    with st.chat_message(name="assistant", avatar=IMAGE["assistant"]):
        llm = ChatOpenAI(
            streaming=True,
            callbacks=[StreamingStdOutCallbackHandler()],
            temperature=0,
        )
        qa = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type='stuff',
            retriever=docsearch.as_retriever(),
            chain_type_kwargs={
                "prompt": prompt,
                "memory": ConversationBufferMemory(
                    memory_key="history",
                    input_key="question"),
            },
        )
        # qa.run() returns the finished answer as a plain str,
        # which is what raises the StreamlitAPIException above
        response = st.write_stream(
            qa.run({"query": "What Atlas Client?"})
        )
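
The exception is expected there: qa.run() blocks until the chain finishes and returns a plain str, so st.write_stream has nothing to iterate over. One workaround (a sketch only; the handler class name is invented for illustration, and docsearch is the retriever from the snippet above) is a LangChain callback handler that renders each token into a Streamlit placeholder:

import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

class StreamToStreamlit(BaseCallbackHandler):
    """Render each new LLM token into a Streamlit placeholder."""

    def __init__(self, container):
        self.container = container
        self.text = ""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        self.container.markdown(self.text)

with st.chat_message("assistant"):
    handler = StreamToStreamlit(st.empty())
    llm = ChatOpenAI(streaming=True, callbacks=[handler], temperature=0)
    qa = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
    )
    answer = qa.run("What Atlas Client?")  # tokens render as they stream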

@Bharathi-A7

@ucola I'm facing the same error. Were you able to resolve it?

@truevis

truevis commented Mar 13, 2024

This is working for me (with groq):

full_response = ""  # accumulated at module level so the generator can update it

def generate_responses(completion):
    # completion is a streaming chat-completions response (see the sketch below)
    global full_response
    for chunk in completion:
        response = chunk.choices[0].delta.content or ""
        if response:
            full_response += response  # append to the full response
            yield response

with st.chat_message("assistant"):
    st.write_stream(generate_responses(completion))

# after streaming, persist the full response in the chat history
if full_response:
    response_message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(response_message)
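
For context, completion in the snippet above is assumed to come from an earlier streaming chat-completions call; with the Groq SDK that would look roughly like this:

from groq import Groq

client = Groq()  # assumes GROQ_API_KEY is set in the environment
completion = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # illustrative model name
    messages=st.session_state.messages,
    stream=True,  # chunks expose .choices[0].delta.content
)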
