feat: repository loader #400
overall good, left some minor comments.
embedchain/loaders/repo_loader.py
Outdated
# TODO: Repo name as metadata, whether it's remote or local.
meta_data = {
    "url": f"repo-{origin}",
I didn't get why we are adding "repo-" to the url.
> didn't get why are we adding "repo-" to the url.

if it's a local file
I can do it only for local directories, or leave it out completely.
Let's use repo-{origin} for local directories and the normal url for everything else.
done, added test for this too
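The rule agreed on in this thread could be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper name `build_repo_url` and the way it decides that an origin is remote (URL scheme or a `git@` prefix) are assumptions.

```python
from urllib.parse import urlparse

# Schemes treated as "remote" in this sketch (an assumption, not taken from the PR).
REMOTE_SCHEMES = {"http", "https", "ssh", "git"}


def build_repo_url(origin: str) -> str:
    """Hypothetical helper: return the value for the "url" metadata field.

    Remote origins keep their normal url; local directories get a "repo-" prefix.
    """
    scheme = urlparse(origin).scheme.lower()
    is_remote = scheme in REMOTE_SCHEMES or origin.startswith("git@")
    if is_remote:
        return origin
    return f"repo-{origin}"
```

With this sketch, a GitHub url is stored unchanged, while a local path such as `/home/user/project` is stored as `repo-/home/user/project`.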
now uses lazy loading
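For readers unfamiliar with the term, lazy loading here presumably means reading files one at a time instead of holding the whole repository in memory. A minimal sketch of that idea, assuming a local checkout; `iter_repo_files` is a hypothetical name and is not the loader from this PR:

```python
import os
from typing import Iterator, Tuple


def iter_repo_files(root: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator: yield (relative_path, text) one file at a time."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                # Skip binary or unreadable files.
                continue
            yield os.path.relpath(path, root), text
```

Because this is a generator, nothing is read until the caller iterates, so each file can be chunked and embedded before the next one is opened.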
…/RepositoryLoader
Language detection demo with this repo:
added
Test script:
from embedchain import OpenSourceApp as App
from embedchain.config import OpenSourceAppConfig as AppConfig
from embedchain.config import QueryConfig
config = AppConfig(log_level="WARNING")
app = App(config=config)
# app.reset()
print("Reset app to run clean test")
app.add("https://github.com/embedchain/embedchain.git")
config = QueryConfig(number_documents=1)
print("Answer:", app.query("What does the query function do?", config=config))
print("Answer:", app.query("Where is the _get_detected_language function located?", config=config))
print("Answer:", app.query("Where are some references of the number_documents variable or attribute?", config=config))
print("Answer:", app.query("Which parts of the code have todo comments?", config=config))
This test runs the open source app. Maybe that's why the results are so bad.
Updated test script:
from embedchain import OpenSourceApp as App
from embedchain.config import OpenSourceAppConfig as AppConfig
from embedchain.config import QueryConfig
config = AppConfig(log_level="WARNING")
app = App(config=config)
# app.reset()
print("Reset app to run clean test")
app.add("https://github.com/embedchain/embedchain.git")
config = QueryConfig(number_documents=1)
questions = [
"What does the query function do?",
"Where is the _get_detected_language function located?",
"Where are some references of the number_documents variable or attribute?",
"Which parts of the code have todo comments?",
]
for question in questions:
    print(f"--------------------------------\n{question}\n--------------------------------")
    print(app.query(question, config=config, dry_run=True))
    print("Answer:", app.query(question, config=config, dry_run=False))
Updated response:
Cool! @cachho
@cachho Hi, I encountered some errors during local testing. Would it be better not to submit everything at once?
Sqlite appears to have some limitations.
@mggger Does the repository you load contain SQL code? It almost seems like an issue with unescaped instructions.
@cachho Hi, I test to add I have created an issue about this chromadb bug.
@cachho I think it's better to add
This will be really useful. I hope the maintainers add this fast to main.
Hey, we are not happy with the quality yet. You can see from my tests that it's not the greatest. When we roll out a data type, we want it to be working. Do you have any suggestions on how we could improve it?
Closing this PR as we have already added support for GitHub repos. Thanks for the contribution.
Description
Adds `repo` as a `data_type` to load repositories from GitHub and local directories. Includes unit test and documentation.
Fixes #51