Testset generation broken after migrating from 0.1.x to 0.2.4 #1660

malikbrh · 2024-11-12T12:08:01Z

[X] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I get a ZeroDivisionError while generating my testset. The same code structure worked fine in v0.1 and broke after migrating to 0.2. I tried multiple models and finally settled on OpenAI gpt-4o-mini and text-embedding-3-small.

I added two documents to generate the testset, but it always fails at the same place. When I inspect the KnowledgeGraph in the debugger, it looks fine: it contains multiple nodes generated by the previous steps.

Ragas version: v0.2.4
Python version: v3.11.10

Code to Reproduce
Note: documents are llama_index documents
```python
kg = KnowledgeGraph()

for doc in documents:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={"page_content": doc.text, "document_metadata": doc.metadata}
        )
    )

generator = TestsetGenerator(llm=LlamaIndexLLMWrapper(generator_llm),
                             embedding_model=LlamaIndexEmbeddingsWrapper(embeddings),
                             knowledge_graph=kg)
testset = generator.generate_with_llamaindex_docs(
    documents,
    testset_size=8,
    with_debugging_logs=True
)
print("Testset GENERATED")
```

Error trace

Generating personas: 100%|██████████| 3/3 [00:01<00:00,  2.74it/s]
Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 202, in <module>
    asyncio.run(evaluation_test())
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 150, in evaluation_test
    testset = await generate_testset_from_documents(generator, documents)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 91, in generate_testset_from_documents
    testset = generator.generate_with_llamaindex_docs(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 264, in generate_with_llamaindex_docs
    return self.generate(
           ^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 410, in generate
    raise e
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 407, in generate
    scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
                                                         ^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 200, in results
    results = asyncio.run(self._process_jobs())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 140, in _process_jobs
    result = await future
             ^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 615, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 96, in wrapped_callable_async
    raise e
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/multi_hop/abstract.py", line 73, in _generate_scenarios
    num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
                                         ~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

Expected behavior
I would expect the testset to be generated properly, or at least a more self-explanatory error. I do not really know what the next steps to debug this would be.
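For illustration, the failing line in the traceback divides by len(node_clusters), so an empty list raises ZeroDivisionError. A minimal sketch of the kind of guard that would give a clearer message is shown here; samples_per_cluster is a hypothetical helper written for this thread, not ragas code, and it uses math.ceil instead of np.ceil to stay dependency-free.

```python
import math


def samples_per_cluster(n: int, node_clusters: list) -> int:
    """Hypothetical helper (not part of ragas): samples to draw per cluster."""
    if not node_clusters:
        # Fail with an explanation instead of a bare ZeroDivisionError.
        raise ValueError(
            "No node clusters were found in the knowledge graph, "
            "so samples cannot be distributed across clusters."
        )
    return math.ceil(n / len(node_clusters))


# Three clusters and eight requested samples give three samples per cluster.
print(samples_per_cluster(8, ["a", "b", "c"]))  # 3
```

With an empty list, the same call raises ValueError with the message above rather than ZeroDivisionError.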

Additional context
I can provide more details if needed. Just ask in the comments and I'll see what I can post, as my project is supposed to stay confidential. Thanks in advance for your help!

malikbrh · 2024-11-12T16:19:39Z

After a bit more investigation, I ran my code line by line and made the following observations:

The knowledge graph is filled and the transformations are applied, giving 16 nodes (2 DOCUMENT, 14 CHUNK) and 32 relationships.
The persona list is filled correctly, with 3 different personas.

The first unexpected value I see is in the method flagged by the traceback, ragas.testset.synthesizers.multi_hop.abstract.MultiHopAbstractQuerySynthesizer._generate_scenarios: the node_clusters list, assigned on the first line of that method, is empty.

Looking at the Relationship objects in my knowledge graph, some of them have the properties overlapped_items and entities_overlap_score, but none has summary_similarity. That would explain why the check True if rel.get_property("summary_similarity") else False never matches and node_clusters stays empty.
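A quick way to confirm this kind of observation is to count which property names appear across all relationships. The sketch below uses plain dictionaries as stand-ins for Relationship objects, and the sample values are invented for illustration; with real objects, one would read whatever attribute holds their property mapping instead. count_relationship_properties is a hypothetical helper, not a ragas function.

```python
from collections import Counter


def count_relationship_properties(relationships):
    """Count how many relationships carry each property name."""
    counts = Counter()
    for rel in relationships:
        counts.update(rel["properties"].keys())
    return counts


# Stand-in data shaped like the relationships described above (values invented).
relationships = [
    {"properties": {"overlapped_items": [("a", "b")], "entities_overlap_score": 0.4}},
    {"properties": {"overlapped_items": [("c", "d")], "entities_overlap_score": 0.2}},
]

counts = count_relationship_properties(relationships)
print(counts["entities_overlap_score"])  # 2
print(counts["summary_similarity"])      # 0
```

A count of zero for summary_similarity means no relationship can pass a filter that requires that property.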

So my questions are the following:
Why would my relationships not get any summary_similarity property set?
Am I missing documents for my testset to be generated? The documentation is quite light at the moment, so any help would be greatly appreciated!

shahules786 · 2024-11-12T17:41:21Z

Hey, the reason could be that the default summary similarity threshold is too high for your docs. You can do either or both of the following:

Modify the default transforms or add your own.
Skip the MultiHopAbstractQuerySynthesizer query type by removing it from the query_distribution parameter.

We know that the docs for testset generation are lagging, and we are trying our best to improve them.
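The second option can be sketched as follows, assuming query_distribution is a list of (synthesizer, probability) pairs; that shape is an assumption here and is not confirmed in this thread. The classes below are empty stand-ins (only the name MultiHopAbstractQuerySynthesizer comes from the traceback), and drop_synthesizer is a hypothetical helper rather than a ragas function. It removes one synthesizer type and rescales the remaining probabilities so they still sum to 1.

```python
class OtherSynthesizerStandIn:
    """Hypothetical stand-in for any other synthesizer type."""


class MultiHopAbstractQuerySynthesizer:
    """Empty stand-in that only mirrors the class name from the traceback."""


def drop_synthesizer(distribution, unwanted_name):
    """Remove entries whose class name matches, then renormalize the weights."""
    kept = [
        (synth, weight)
        for synth, weight in distribution
        if type(synth).__name__ != unwanted_name
    ]
    total = sum(weight for _, weight in kept)
    if total == 0:
        raise ValueError("No synthesizers left after filtering.")
    return [(synth, weight / total) for synth, weight in kept]


distribution = [
    (OtherSynthesizerStandIn(), 0.5),
    (MultiHopAbstractQuerySynthesizer(), 0.5),
]
filtered = drop_synthesizer(distribution, "MultiHopAbstractQuerySynthesizer")
print([(type(s).__name__, w) for s, w in filtered])
```

The filtered list would then be passed wherever the generator accepts its query distribution.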

malikbrh · 2024-11-12T17:51:17Z

Hey @shahules786,
First of all, thanks for your reply!
I was still debugging my testset generation and ended up with the same conclusion as you: the similarity threshold was too high for my first documents, so I replaced them with new ones that pass this check.
I am relatively new to RAG architectures, my bad.

But I still have a question. For testing purposes, I changed the default_filter in ragas/testset/persona.py.

After changing return random.random() < 0.25 to return True, my testset finally got generated. Before that, this filter was removing my nodes in the generate_personas_from_kg method.

def default_filter(node: Node) -> bool:
    if (
        node.type.name == "DOCUMENT"
        and node.properties.get("summary_embedding") is not None
    ):
        return random.random() < 0.25
    else:
        return False

Is there a reason why default_filter is coded this way? I understand the if conditions but couldn't find any explanation for the random part. Thanks in advance!
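A quick calculation suggests why the random draw can leave nothing to build personas from when there are only a few documents. Assuming each eligible node is kept independently with probability 0.25, as in the filter above, the chance that every node is rejected is 0.75 raised to the number of eligible nodes. prob_all_rejected is a hypothetical helper for illustration only.

```python
def prob_all_rejected(n_eligible: int, keep_prob: float = 0.25) -> float:
    """Probability that a random filter rejects every one of n eligible nodes."""
    return (1.0 - keep_prob) ** n_eligible


# With 2 eligible DOCUMENT nodes, all of them are rejected more than half the time.
print(prob_all_rejected(2))  # 0.5625
```

The probability shrinks quickly as the number of eligible documents grows, so the effect is mostly visible on very small document sets.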

shahules786 · 2024-11-13T03:56:35Z

Hey @malikbrh, great! I'm amazed that you could debug it without much help from the docs. We would love any contributions from you to improve ragas. To answer your question, the idea was to sample random summaries from the given document set, cluster them, and use one summary from each cluster (as a representative of that cluster) to estimate the persona that could interact with it.
This feature is very new, and I'm sure it can be improved further. Feel free to share any thoughts (consider joining our Discord).
I just added some docs for it: https://docs.ragas.io/en/latest/howtos/customizations/testgenerator/_persona_generator/?h=pe#personas-in-testset-generation

malikbrh added the "bug" (Something isn't working) label on Nov 12, 2024

dosubot bot added the "module-testsetgen" (Module testset generation) label on Nov 12, 2024
