feat: CohereGenerator #6034

Closed

Conversation

sunilkumardash9
Contributor

Related Issues

Proposed Changes:

CohereGenerator to generate Cohere LLM responses from user-provided prompts

How did you test it?

Unit tests

Checklist

@sunilkumardash9 sunilkumardash9 requested a review from a team as a code owner October 12, 2023 09:08
@sunilkumardash9 sunilkumardash9 requested review from ZanSara and removed request for a team October 12, 2023 09:08
@CLAassistant

CLAassistant commented Oct 12, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Oct 12, 2023
@ZanSara
Contributor

ZanSara commented Oct 13, 2023

Hello @sunilkumardash9! I'm going to take care of the review soon. In the meantime, let's fix the CI. Have you followed the installation instructions in our contribution guidelines? You need to install the pre-commit hooks for the CI to run your tests properly.

On top of that, after you have installed the pre-commit hooks, you will need to run pre-commit run --all-files. This is needed only once, to fix all the issues that were committed before the hooks were installed.

Let me know if you have issues!

@sunilkumardash9
Contributor Author

Hi @ZanSara, I followed what you suggested. I got two failed checks, for Ruff and Codespell. Is that a problem? Running the hooks individually works fine, though.

@sunilkumardash9 sunilkumardash9 changed the title from "added CohereGenerator with unit tests" to "added CohereGenerator" Oct 13, 2023
@sunilkumardash9 sunilkumardash9 changed the title from "added CohereGenerator" to "feat: CohereGenerator" Oct 13, 2023
sunilkumardash9 and others added 3 commits October 13, 2023 18:05
2. removed commented files in test-cohere_generators
3. removed unused imports

Signed-off-by: sunilkumardash9 <[email protected]>
@sunilkumardash9 sunilkumardash9 requested a review from a team as a code owner October 13, 2023 12:43
@sunilkumardash9 sunilkumardash9 requested review from dfokina and removed request for a team October 13, 2023 12:43
@ZanSara
Contributor

ZanSara commented Oct 13, 2023

Hey @sunilkumardash9, if Ruff and Codespell don't pass, that means there are some small issues with your code. Normally the error messages explain what the issue is and how to fix it, and if that's not clear, they should be easy to Google. If you struggle with some of them, you can share them here and I can help interpret them.

Contributor

@ZanSara ZanSara left a comment

A few notes

haystack/preview/components/generators/cohere/cohere.py (2 review threads, outdated and resolved)
2. remove dict casting of metadata in run

Signed-off-by: sunilkumardash9 <[email protected]>
Signed-off-by: sunilkumardash9 <[email protected]>
@ZanSara
Contributor

ZanSara commented Oct 13, 2023

@sunilkumardash9 I see that the CI is now failing because cohere is not installed. Please add cohere to the dependencies installed at this line:

run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'

It should fix the issue.


You will need to do the same thing here:

run: pip install .[dev,preview] langdetect transformers[torch,sentencepiece]==4.32.1 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'

to also make PyLint pass.

Signed-off-by: sunilkumardash9 <[email protected]>
2. small change in doc string

Signed-off-by: sunilkumardash9 <[email protected]>
@ZanSara
Contributor

ZanSara commented Oct 16, 2023

@sunilkumardash9 I see there's a typo in the dependencies list: cohere should be outside of the quotes, not inside. Fixing that should fix the CI.

@ZanSara ZanSara self-requested a review October 16, 2023 08:57
2. changed api key env var from CO_API_KEY to COHERE_API_KEY

Signed-off-by: sunilkumardash9 <[email protected]>
@ZanSara
Contributor

ZanSara commented Oct 16, 2023

@sunilkumardash9 You can ignore the MyPy issue for now (we're working on it, it's a larger issue). As for the others, I forgot to have you update these lines as well:

The change is the same, you just need to add cohere at the end.

Don't forget to do git pull, by the way: I updated your branch to try to fix the mypy issue, but it didn't work 😅

@sunilkumardash9
Contributor Author

@ZanSara, I should have scrolled down a bit too. 😅

Contributor

@ZanSara ZanSara left a comment

A few small fixes, overall looking good! I believe we will manage to merge this one soon

Comment on lines 42 to 46
Args:
api_key (str): The API key for the Cohere API.
model_name (str): The name of the model to use.
streaming_callback (Callable, optional): A callback function to be called with the streaming response. Defaults to None.
api_base_url (str, optional): The base URL of the Cohere API. Defaults to "https://api.cohere.ai".
Contributor

We use a different style for docstrings: it's quite important for our automated API documentation tooling, so let's fix it.

Here is how it should look:

Suggested change
- Args:
- api_key (str): The API key for the Cohere API.
- model_name (str): The name of the model to use.
- streaming_callback (Callable, optional): A callback function to be called with the streaming response. Defaults to None.
- api_base_url (str, optional): The base URL of the Cohere API. Defaults to "https://api.cohere.ai".
+ Instantiates a `CohereGenerator` component.
+ :param api_key: The API key for the Cohere API.
+ :param model_name: The name of the model to use.
+ :param streaming_callback: A callback function to be called with the streaming response. Defaults to None.
+ :param api_base_url: The base URL of the Cohere API. Defaults to "https://api.cohere.ai".
+ :param kwargs: additional model parameters. These will be used during generation.

In addition:

  • for model_name, let's give some example model names.
  • for kwargs, let's add a link to the Cohere documentation where these arguments are listed.
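
As a rough illustration of the docstring style requested in this comment, here is a minimal sketch. `CohereGeneratorSketch` is a hypothetical stand-in, not the code from this PR; storing the arguments as attributes is an assumption, and the `command` example is taken from the default value shown elsewhere in this review. The link to the Cohere documentation for `kwargs` is left out because the thread does not give one.

```python
from typing import Any, Callable, Optional


class CohereGeneratorSketch:
    """Stand-in class, used here only to illustrate the docstring style."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "command",
        streaming_callback: Optional[Callable] = None,
        api_base_url: str = "https://api.cohere.ai",
        **kwargs: Any,
    ):
        """
        Instantiates a `CohereGenerator` component.

        :param api_key: The API key for the Cohere API.
        :param model_name: The name of the model to use, for example `command`.
        :param streaming_callback: A callback function to be called with the streaming response.
            Defaults to None.
        :param api_base_url: The base URL of the Cohere API. Defaults to "https://api.cohere.ai".
        :param kwargs: Additional model parameters. These will be used during generation.
        """
        # Assumption for this sketch: the arguments are simply stored on the instance.
        self.api_key = api_key
        self.model_name = model_name
        self.streaming_callback = streaming_callback
        self.api_base_url = api_base_url
        self.model_parameters = kwargs
```

The `:param name:` fields replace the `Args:` block, which is the change the reviewer asks for so that the API documentation tooling can parse the docstring.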

Contributor Author

I was thinking the same; I should have stuck with the style used in GptGenerator. For the kwargs, I have added the params to the docstring.

Comment on lines 106 to 107
replies: List[str]
metadata: List[Dict[str, Any]]
Contributor

Are you sure we need these type definitions? Does mypy raise errors if you remove these two lines?

Comment on lines 118 to 119
metadata = [{"finish_reason": response[0].finish_reason}]
replies = [response[0].text]
Contributor

If users want to receive multiple alternative responses from the model, this won't work. Let's leverage the fact that we can return a list:

Suggested change
- metadata = [{"finish_reason": response[0].finish_reason}]
- replies = [response[0].text]
+ metadata = [{"finish_reason": resp.finish_reason} for resp in response]
+ replies = [resp.text for resp in response]
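
To make the difference concrete, here is a small self-contained sketch of the list-based version. `FakeGeneration` and `collect_replies` are hypothetical names introduced for illustration, and the `finish_reason` values are placeholders; the real response object comes from the Cohere client and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class FakeGeneration:
    """Hypothetical stand-in for one generation returned by the Cohere client."""

    text: str
    finish_reason: str


def collect_replies(response: List[FakeGeneration]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Build one reply and one metadata entry per generation, not just the first."""
    metadata = [{"finish_reason": resp.finish_reason} for resp in response]
    replies = [resp.text for resp in response]
    return replies, metadata


# Two alternative generations for the same prompt (placeholder values).
response = [
    FakeGeneration(text="Paris", finish_reason="COMPLETE"),
    FakeGeneration(text="It is Paris.", finish_reason="MAX_TOKENS"),
]
replies, metadata = collect_replies(response)
```

With `response[0]` only the first generation would survive; with the comprehensions, `replies` and `metadata` stay aligned index by index.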

api_key: str,
model: str = "command",
streaming_callback: Optional[Callable] = None,
api_base_url: str = API_BASE_URL,
Contributor

In the tests you're using cohere.COHERE_API_URL. Can we use the same here instead of duplicating its content into a constant? In this way, if cohere updates this value we won't need to do anything and the component will still work.
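
A minimal sketch of this suggestion follows. So that it runs without the `cohere` package installed, the module is replaced by a `SimpleNamespace` stand-in exposing only `COHERE_API_URL`; in the real component this would be a plain `import cohere`. `GeneratorSketch` is a hypothetical name, and the URL value is the one quoted in the docstring earlier in this review.

```python
from types import SimpleNamespace

# Stand-in for the real `cohere` module (assumption made so the sketch is self-contained).
cohere = SimpleNamespace(COHERE_API_URL="https://api.cohere.ai")


class GeneratorSketch:
    """Hypothetical component showing the default taken from the library constant."""

    def __init__(self, api_key: str, api_base_url: str = cohere.COHERE_API_URL):
        self.api_key = api_key
        # No local copy of the URL: the default follows whatever the library defines.
        self.api_base_url = api_base_url


default_generator = GeneratorSketch(api_key="test-key")
custom_generator = GeneratorSketch(api_key="test-key", api_base_url="http://localhost:8000")
```

The design choice is simply to keep a single source of truth for the URL, while callers can still override it explicitly.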

Comment on lines 17 to 22
def default_streaming_callback(chunk):
    """
    Default callback function for streaming responses from Cohere API.
    Prints the tokens of the first completion to stdout as soon as they are received and returns the chunk unchanged.
    """
    print(chunk.text, flush=True, end="")
Contributor

I think we can remove this function, because it's used only in the tests. To test for a streaming callback, you can define it in the tests themselves.
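
One way to follow this advice is sketched below: the callback lives inside the test and records the chunks it receives. `fake_stream` is a hypothetical stand-in for the component's streaming loop, and the chunk objects are simple namespaces with a `text` attribute; neither is taken from the PR.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List


def fake_stream(chunks: Iterable[SimpleNamespace], streaming_callback: Callable) -> str:
    """Hypothetical streaming loop: call the callback per chunk, then join the text."""
    pieces = []
    for chunk in chunks:
        streaming_callback(chunk)
        pieces.append(chunk.text)
    return "".join(pieces)


def test_streaming_callback_receives_every_chunk():
    received: List[str] = []

    # The callback is defined inside the test, so the component module
    # does not need to ship a default one.
    def callback(chunk):
        received.append(chunk.text)
        return chunk

    chunks = [SimpleNamespace(text="Hel"), SimpleNamespace(text="lo")]
    reply = fake_stream(chunks, callback)

    assert received == ["Hel", "lo"]
    assert reply == "Hello"


test_streaming_callback_receives_every_chunk()
```

Recording into a list, instead of printing, lets the test assert on exactly what the callback saw.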

2. Added kwargs doc strings for CohereGenerator
3. removed type hints for metadata and replies
4. use COHERE_API_URL instead of hard coded URL.

Signed-off-by: sunilkumardash9 <[email protected]>
Contributor

@dfokina dfokina left a comment

Hi @sunilkumardash9! Adding some cosmetic docstring suggestions.

haystack/preview/components/generators/cohere/cohere.py (5 review threads, outdated and resolved)
@masci masci added the 2.x Related to Haystack v2.0 label Oct 30, 2023
@ZanSara
Contributor

ZanSara commented Nov 23, 2023

Hello @sunilkumardash9, sorry for the late feedback. I'm going to take over this PR in order to fix the tests and merge it. Thank you for your contribution!

@ZanSara ZanSara mentioned this pull request Nov 23, 2023
@julian-risch
Member

Closing this PR in favor of #6395

Labels
2.x Related to Haystack v2.0 topic:CI topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CohereGenerator
6 participants