Evaluate is not working #420

jackchan0528 · 2024-03-23T09:30:10Z

Following the doc here: https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/eval/data/moderation/README.md

I unzipped the required text files into the eval/data/moderation folder and tried running these commands:
nemoguardrails evaluate moderation --config=config --dataset-path .\eval\data\moderation\anthropic_harmful.txt --split harmful
nemoguardrails evaluate moderation --config=config --dataset-path .\eval\data\moderation\anthropic_helpful.txt --split helpful

For the harmful one, I got this error:

...\Lib\site-packages\langchain_openai\chat_models\base.py", line 165, in _convert_message_to_dict
raise TypeError(f"Got unknown type {message}")
TypeError: Got unknown type Y

and for the helpful one, the error is:

...\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4455: character maps to <undefined>

There seem to be two main issues. First, the "Y" could be the answer from the rails (Yes), but it is not recognized as any of the message types handled by _convert_message_to_dict() in langchain_openai/chat_models/base.py, quoted below:
`def _convert_message_to_dict(message: BaseMessage) -> dict:
    """Convert a LangChain message to a dictionary.

    Args:
        message: The LangChain message.

    Returns:
        The dictionary.
    """
    message_dict: Dict[str, Any]
    if isinstance(message, ChatMessage):
        message_dict = {"role": message.role, "content": message.content}
    elif isinstance(message, HumanMessage):
        message_dict = {"role": "user", "content": message.content}
    elif isinstance(message, AIMessage):
        message_dict = {"role": "assistant", "content": message.content}
        if "function_call" in message.additional_kwargs:
            message_dict["function_call"] = message.additional_kwargs["function_call"]
            # If function call only, content is None not empty string
            if message_dict["content"] == "":
                message_dict["content"] = None
        if "tool_calls" in message.additional_kwargs:
            message_dict["tool_calls"] = message.additional_kwargs["tool_calls"]
            # If tool calls only, content is None not empty string
            if message_dict["content"] == "":
                message_dict["content"] = None
    elif isinstance(message, SystemMessage):
        message_dict = {"role": "system", "content": message.content}
    elif isinstance(message, FunctionMessage):
        message_dict = {
            "role": "function",
            "content": message.content,
            "name": message.name,
        }
    elif isinstance(message, ToolMessage):
        message_dict = {
            "role": "tool",
            "content": message.content,
            "tool_call_id": message.tool_call_id,
        }
    else:
        raise TypeError(f"Got unknown type {message}")
    if "name" in message.additional_kwargs:
        message_dict["name"] = message.additional_kwargs["name"]
    return message_dict`
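
One possible reading of the "Got unknown type Y" message, not confirmed anywhere in this thread, is that a bare string reached a code path expecting a list of message objects. A string is itself a sequence, so it would be iterated one character at a time. The sketch below illustrates that with stand-ins only: HumanMessage, convert_message_to_dict and convert_all here are hypothetical simplifications, not the LangChain classes or functions.

```python
from dataclasses import dataclass


@dataclass
class HumanMessage:
    """Hypothetical stand-in for a chat message object."""
    content: str


def convert_message_to_dict(message):
    """Simplified dispatcher: only message objects are accepted."""
    if isinstance(message, HumanMessage):
        return {"role": "user", "content": message.content}
    raise TypeError(f"Got unknown type {message}")


def convert_all(messages):
    """Convert every item of a message sequence."""
    return [convert_message_to_dict(m) for m in messages]


# A prompt wrapped in a message object converts cleanly.
ok = convert_all([HumanMessage(content="Yes or no?")])

# A bare string is iterated character by character, so the first
# character is what shows up in the error message.
try:
    convert_all("Yes or no?")
    error_text = None
except TypeError as exc:
    error_text = str(exc)

print(ok)
print(error_text)
```

If that reading is right, the "Y" would be the first character of whatever string was passed, which may be a prompt rather than the rails' answer.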

For the second issue, I believe the code that reads the dataset file needs to pass encoding="utf8" when opening it.
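
To illustrate the second error in isolation: byte 0x9d is the last byte of the UTF-8 encoding of the right double quotation mark (U+201D), and Python's cp1252 codec has no mapping for it. On Windows the default text encoding is often cp1252, depending on locale settings. The sketch below is a minimal reproduction under those assumptions; the file name and sample text are made up, and it does not touch the nemoguardrails code.

```python
import os
import tempfile

# Sample text containing a right double quotation mark (U+201D), whose
# UTF-8 encoding ends in byte 0x9d.
sample = "She said \u201chello\u201d\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Forcing cp1252 reproduces the failure on any platform.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc

    # An explicit UTF-8 encoding reads the file back intact.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(type(cp1252_error).__name__)
print(text == sample)
```

Passing the encoding explicitly makes the read independent of the platform default, which could also explain why the error does not show up on every machine.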

@drazvan


drazvan · 2024-03-26T11:10:42Z

Thanks @jackchan0528. @trebedea should be able to help with this early next week. Let me know if this is urgent and I can try to help as well.


trebedea · 2024-04-30T13:21:27Z

Thanks for reporting this, @jackchan0528. Evaluate was not working with chat LLMs from LangChain. The evaluation package was created before LangChain split off BaseChatModel as a separate base class for chat models.

The fix (#478) should solve your main problem. However, I was not able to reproduce the second one, the Unicode error.
Running this works for me with no errors:

python process_anthropic_dataset.py --dataset-path anthropic_helpful.jsonl --split helpful
nemoguardrails evaluate moderation --config=config --dataset-path .\eval\data\moderation\anthropic_helpful.txt --split helpful

I used the test set from Anthropic HH (test.jsonl.gz).

I will close this; just reopen it if the problems persist.


drazvan assigned trebedea Mar 26, 2024

drazvan added the "bug" and "status: in progress" labels Mar 26, 2024

trebedea added a commit that referenced this issue Apr 30, 2024

Fix #420 - evaluate with chat LLMs from Langchain not working for factchecking and moderation

08016c8

trebedea mentioned this issue Apr 30, 2024

Fix #420 - evaluate not working with chat models #478

Merged

trebedea closed this as completed Apr 30, 2024

drazvan pushed a commit that referenced this issue May 8, 2024

Fix #420 - evaluate with chat LLMs from Langchain not working for factchecking and moderation

d109b96

drazvan added a commit that referenced this issue May 8, 2024

Merge pull request #478 from NVIDIA/bugfix/420-evaluate-with-chat-models

06fed16

Fix #420 - evaluate not working with chat models
