Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include workspace contents if any at first step only. #30

Merged
merged 1 commit into from
Aug 10, 2024
Merged

Conversation

SmartManoj
Copy link
Owner

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

Closes All-Hands-AI#2838

Applying the initial idea from All-Hands-AI#2865 (comment)

Generating embeddings for all files is taking time. (~1 minute)
It should be a user preference for time vs cost.

@SmartManoj
Copy link
Owner Author

@CodiumAI-Agent /review

@CodiumAI-Agent
Copy link

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Performance Concern
The use of os.listdir to list all files in the workspace directory could potentially lead to performance issues if the directory contains a large number of files. Consider adding a check or limit to the number of files processed.

Possible Bug
The code checks if workspace_contents is not empty before appending the message, but does not handle the case where os.listdir might raise an exception if the directory does not exist or if there are permissions issues.

@SmartManoj
Copy link
Owner Author

@CodiumAI-Agent /ask

Performance Concern
The use of os.listdir to list all files in the workspace directory could potentially lead to performance issues if the directory contains a large number of files. Consider adding a check or limit to the number of files processed.

It lists files only in the first level.

Possible Bug
The code checks if workspace_contents is not empty before appending the message, but does not handle the case where os.listdir might raise an exception if the directory does not exist or if there are permissions issues.

Workspace will always be there.

@CodiumAI-Agent
Copy link

Ask

[Performance Concern](https://github.com/SmartManoj/Kevin/pull/30/files > The use of os.listdir to list all files in the workspace directory could potentially lead to performance issues if the directory contains a large number of files. Consider adding a check or limit to the number of files processed. It lists files only in the first level. > [Possible Bug](https://github.com/SmartManoj/Kevin/pull/30/files > The code checks if workspace_contents is not empty before appending the message, but does not handle the case where os.listdir might raise an exception if the directory does not exist or if there are permissions issues. Workspace will always be there.

Answer:

Performance Concern

The use of os.listdir to list all files in the workspace directory could indeed lead to performance issues if the directory contains a large number of files. Here are a few suggestions to mitigate this:

  1. Limit the Number of Files: You can limit the number of files listed to a manageable number, for example, the first 10 files.

    workspace_contents = ', '.join(os.listdir(config.workspace_base)[:10])
  2. Asynchronous Listing: If the file listing operation is potentially time-consuming, consider performing it asynchronously to avoid blocking the main thread.

  3. Lazy Loading: Another approach is to load the file list lazily, i.e., only when it is actually needed.

Possible Bug

The code currently does not handle exceptions that might be raised by os.listdir, such as when the directory does not exist or there are permission issues. Even though you mentioned that the workspace will always be there, it's good practice to handle potential exceptions to make the code more robust. Here is an example of how you can handle such exceptions:

try:
    workspace_contents = ', '.join(os.listdir(config.workspace_base))
    if workspace_contents:
        messages.append(
            Message(
                role='user',
                content=[
                    TextContent(text=f'WORKSPACE CONTENTS: {workspace_contents}')
                ],
            )
        )
except FileNotFoundError:
    # Handle the case where the directory does not exist
    messages.append(
        Message(
            role='user',
            content=[
                TextContent(text='WORKSPACE CONTENTS: Directory not found')
            ],
        )
    )
except PermissionError:
    # Handle the case where there are permission issues
    messages.append(
        Message(
            role='user',
            content=[
                TextContent(text='WORKSPACE CONTENTS: Permission denied')
            ],
        )
    )

By handling these exceptions, you ensure that your application can gracefully handle unexpected situations, improving its robustness and reliability.

@SmartManoj
Copy link
Owner Author

@CodiumAI-Agent /ask

  1. for example, the first 10 files.

context will be missed.

@CodiumAI-Agent
Copy link

Ask

  1. for example, the first 10 files. context will be missed.

Answer:

The new code introduced in the PR does not specifically address the selection of the first 10 files or any specific number of files. Instead, it focuses on including the contents of the workspace directory in the initial message if there is only one event in the state history. This is done to provide an overview of the workspace contents early in the process, which can help in understanding the context without needing to generate embeddings for all files immediately.

Here is a breakdown of the relevant new code:

if len(state.history.get_events_as_list()) == 1:
    workspace_contents = ', '.join(os.listdir(config.workspace_base))
    if workspace_contents:
        messages.append(
            Message(
                role='user',
                content=[
                    TextContent(
                        text=f'WORKSPACE CONTENTS: {workspace_contents}'
                    )
                ],
            )
        )

This code checks if the state history contains only one event. If so, it lists the contents of the workspace directory and appends this information as a message to the messages list. This approach ensures that the user gets an overview of the workspace contents early on, potentially reducing the need to process all files immediately and thus saving time.

To address the concern about missing context for the first 10 files, you might consider implementing additional logic to selectively include more detailed information about a subset of files if needed. However, this specific PR does not include such functionality.

@SmartManoj SmartManoj merged commit 4fd7e0a into kevin Aug 10, 2024
16 checks passed
@SmartManoj SmartManoj deleted the ls2 branch August 10, 2024 05:26
@SmartManoj SmartManoj restored the ls2 branch August 10, 2024 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants