feat: repository loader #400

Closed
wants to merge 31 commits into from

@cachho cachho commented Aug 3, 2023

Description

Adds repo as a data_type to load repositories from GitHub and from local directories.

Includes unit test and documentation.

Fixes #51

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Please delete options that are not relevant.

  • Unit Test
  • Test Script (please provide)
from embedchain import OpenSourceApp as App
from embedchain.config import OpenSourceAppConfig as AppConfig

config = AppConfig(log_level="DEBUG")
naval_chat_bot = App(config=config)
naval_chat_bot.reset()
print("Reset app to run clean test")
naval_chat_bot = App()
naval_chat_bot.add("repo", "https://github.com/embedchain/embedchain")

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

@cachho cachho added enhancement New feature or request new-data-source New data source labels Aug 3, 2023
Member

@taranjeet taranjeet left a comment

Overall good; left some minor comments.

docs/advanced/data_types.mdx (resolved)
pyproject.toml (outdated, resolved)
embedchain/loaders/repo_loader.py (resolved)

# TODO: Repo name as metadata, whether it's remote or local.
meta_data = {
"url": f"repo-{origin}",
Member

I don't understand why we are adding "repo-" to the URL.

Contributor Author

Re: why "repo-" is added to the URL:

The prefix is there in case the origin is a local file.

Contributor Author

I can either add the prefix only for local directories or leave it out completely.

Member

Let's use repo-{origin} for local directories and the normal URL for everything else.

Contributor Author

Done, and added a test for this too.
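
The rule agreed on in this thread (repo-{origin} for local directories, the plain URL otherwise) can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper name `build_repo_metadata` is hypothetical, and treating "has an http(s) scheme" as "remote" is an assumption about how the two cases are told apart.

```python
from urllib.parse import urlparse


def build_repo_metadata(origin: str) -> dict:
    """Return loader metadata for a repository origin (hypothetical helper)."""
    # Assumption: an http(s) scheme means a remote repository;
    # anything else is treated as a local directory.
    is_remote = urlparse(origin).scheme in ("http", "https")
    # Remote origins keep their normal URL, local ones get the "repo-" prefix.
    url = origin if is_remote else f"repo-{origin}"
    return {"url": url}


print(build_repo_metadata("https://github.com/embedchain/embedchain"))
print(build_repo_metadata("./my_project"))
```

Under this assumption, SSH-style remotes such as git@host:path would be classified as local, so a real implementation would likely need a stricter check.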

cachho commented Aug 12, 2023

Now uses lazy loading.

cachho commented Aug 18, 2023

Language detection demo with this repo:

tests               : 100%|██████████████████████████████████████████████████| 10/10 [00:00<00:00, 930.60it/s]
embedchain          : 100%|██████████████████████████████████████████████████| 54/54 [00:00<00:00, 1016.25it/s]
examples            : 100%|██████████████████████████████████████████████████| 89/89 [00:00<00:00, 805.35it/s]
.git                : 100%|██████████████████████████████████████████████████| 25/25 [00:00<00:00, 1192.91it/s]
docs                : 100%|██████████████████████████████████████████████████| 27/27 [00:00<00:00, 882.46it/s]
.github             : 100%|██████████████████████████████████████████████████| 6/6 [00:00<00:00, 1213.45it/s]
notebooks           : 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 908.35it/s]
2023-08-18 19:25:19,223 [root] [INFO] repository read, 171 files, 19624 lines
2023-08-18 19:25:19,520 [root] [DEBUG] Detected `tests/vectordb/test_chroma_db.py` as `python`
2023-08-18 19:25:19,529 [root] [DEBUG] Detected `tests/vectordb/test_elasticsearch_db.py` as `python`
2023-08-18 19:25:19,535 [root] [DEBUG] Detected `tests/embedchain/test_add.py` as `python`
2023-08-18 19:25:19,558 [root] [DEBUG] Detected `tests/embedchain/test_utils.py` as `python`
2023-08-18 19:25:19,573 [root] [DEBUG] Detected `tests/embedchain/test_generate_prompt.py` as `python`
2023-08-18 19:25:19,585 [root] [DEBUG] Detected `tests/embedchain/test_query.py` as `python`
2023-08-18 19:25:19,595 [root] [DEBUG] Detected `tests/embedchain/test_embedchain.py` as `python`
2023-08-18 19:25:19,606 [root] [DEBUG] Detected `tests/embedchain/test_chat.py` as `python`
2023-08-18 19:25:19,615 [root] [DEBUG] Detected `tests/chunkers/test_text.py` as `python`
2023-08-18 19:25:19,643 [root] [DEBUG] Detected `.env.example` as `gas`
2023-08-18 19:25:19,643 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:19,699 [root] [DEBUG] Detected `embedchain/utils.py` as `python`
2023-08-18 19:25:19,932 [root] [DEBUG] Detected `embedchain/embedchain.py` as `python`
2023-08-18 19:25:19,939 [root] [DEBUG] Detected `embedchain/__init__.py` as `python`
2023-08-18 19:25:19,949 [root] [DEBUG] Detected `embedchain/vectordb/base_vector_db.py` as `python`
2023-08-18 19:25:19,979 [root] [DEBUG] Detected `embedchain/vectordb/elasticsearch_db.py` as `python`
2023-08-18 19:25:20,003 [root] [DEBUG] Detected `embedchain/vectordb/chroma_db.py` as `python`
2023-08-18 19:25:20,009 [root] [DEBUG] Detected `embedchain/config/AddConfig.py` as `python`
2023-08-18 19:25:20,024 [root] [DEBUG] Detected `embedchain/config/ChatConfig.py` as `python`
2023-08-18 19:25:20,038 [root] [DEBUG] Detected `embedchain/config/BaseConfig.py` as `python`
2023-08-18 19:25:20,040 [root] [DEBUG] Detected `embedchain/config/__init__.py` as `python`
2023-08-18 19:25:20,084 [root] [DEBUG] Detected `embedchain/config/QueryConfig.py` as `python`
2023-08-18 19:25:20,088 [root] [DEBUG] Detected `embedchain/config/vectordbs/ElasticsearchDBConfig.py` as `python`
2023-08-18 19:25:20,129 [root] [DEBUG] Detected `embedchain/config/apps/CustomAppConfig.py` as `python`
2023-08-18 19:25:20,138 [root] [DEBUG] Detected `embedchain/config/apps/OpenSourceAppConfig.py` as `python`
2023-08-18 19:25:20,164 [root] [DEBUG] Detected `embedchain/config/apps/BaseAppConfig.py` as `python`
2023-08-18 19:25:20,174 [root] [DEBUG] Detected `embedchain/config/apps/AppConfig.py` as `python`
2023-08-18 19:25:20,185 [root] [DEBUG] Detected `embedchain/apps/OpenSourceApp.py` as `python`
2023-08-18 19:25:20,193 [root] [DEBUG] Detected `embedchain/apps/App.py` as `python`
2023-08-18 19:25:20,208 [root] [DEBUG] Detected `embedchain/apps/PersonApp.py` as `python`
2023-08-18 19:25:20,242 [root] [DEBUG] Detected `embedchain/apps/CustomApp.py` as `python`
2023-08-18 19:25:20,249 [root] [DEBUG] Detected `embedchain/apps/Llama2App.py` as `python`
2023-08-18 19:25:20,264 [root] [DEBUG] Detected `embedchain/data_formatter/data_formatter.py` as `python`
2023-08-18 19:25:20,266 [root] [DEBUG] Detected `embedchain/data_formatter/__init__.py` as `python`
2023-08-18 19:25:20,268 [root] [DEBUG] Detected `embedchain/models/VectorDatabases.py` as `python`
2023-08-18 19:25:20,269 [root] [DEBUG] Detected `embedchain/models/Providers.py` as `python`
2023-08-18 19:25:20,271 [root] [DEBUG] Detected `embedchain/models/data_type.py` as `python`
2023-08-18 19:25:20,272 [root] [DEBUG] Detected `embedchain/models/VectorDimensions.py` as `python`
2023-08-18 19:25:20,274 [root] [DEBUG] Detected `embedchain/models/EmbeddingFunctions.py` as `python`
2023-08-18 19:25:20,275 [root] [DEBUG] Detected `embedchain/models/__init__.py` as `python`
2023-08-18 19:25:20,279 [root] [DEBUG] Detected `embedchain/chunkers/notion.py` as `python`
2023-08-18 19:25:20,288 [root] [DEBUG] Detected `embedchain/chunkers/base_chunker.py` as `python`
2023-08-18 19:25:20,292 [root] [DEBUG] Detected `embedchain/chunkers/text.py` as `python`
2023-08-18 19:25:20,297 [root] [DEBUG] Detected `embedchain/chunkers/docx_file.py` as `python`
2023-08-18 19:25:20,303 [root] [DEBUG] Detected `embedchain/chunkers/youtube_video.py` as `python`
2023-08-18 19:25:20,310 [root] [DEBUG] Detected `embedchain/chunkers/qna_pair.py` as `python`
2023-08-18 19:25:20,316 [root] [DEBUG] Detected `embedchain/chunkers/pdf_file.py` as `python`
2023-08-18 19:25:20,322 [root] [DEBUG] Detected `embedchain/chunkers/docs_site.py` as `python`
2023-08-18 19:25:20,328 [root] [DEBUG] Detected `embedchain/chunkers/web_page.py` as `python`
2023-08-18 19:25:20,336 [root] [DEBUG] Detected `embedchain/loaders/notion.py` as `python`
2023-08-18 19:25:20,338 [root] [DEBUG] Detected `embedchain/loaders/local_text.py` as `python`
2023-08-18 19:25:20,341 [root] [DEBUG] Detected `embedchain/loaders/local_qna_pair.py` as `python`
2023-08-18 19:25:20,345 [root] [DEBUG] Detected `embedchain/loaders/docx_file.py` as `python`
2023-08-18 19:25:20,356 [root] [DEBUG] Detected `embedchain/loaders/base_loader.py` as `python`
2023-08-18 19:25:20,360 [root] [DEBUG] Detected `embedchain/loaders/youtube_video.py` as `python`
2023-08-18 19:25:20,379 [root] [DEBUG] Detected `embedchain/loaders/docs_site_loader.py` as `python`
2023-08-18 19:25:20,385 [root] [DEBUG] Detected `embedchain/loaders/pdf_file.py` as `python`
2023-08-18 19:25:20,392 [root] [DEBUG] Detected `embedchain/loaders/sitemap.py` as `python`
2023-08-18 19:25:20,403 [root] [DEBUG] Detected `embedchain/loaders/web_page.py` as `python`
2023-08-18 19:25:20,439 [root] [DEBUG] Detected `README.md` as `markdown`
2023-08-18 19:25:20,453 [root] [DEBUG] Detected `examples/slack_bot/slack_bot.py` as `python`
2023-08-18 19:25:20,461 [root] [DEBUG] Detected `examples/slack_bot/variables.env` as `None`
2023-08-18 19:25:20,461 [root] [DEBUG] Detected language `None` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,490 [root] [DEBUG] Detected `examples/slack_bot/README.md` as `python`
2023-08-18 19:25:20,498 [root] [DEBUG] Detected `examples/slack_bot/requirements.txt` as `text`
2023-08-18 19:25:20,498 [root] [DEBUG] Detected language `text` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,506 [root] [DEBUG] Detected `examples/discord_bot/variables.env` as `None`
2023-08-18 19:25:20,506 [root] [DEBUG] Detected language `None` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,514 [root] [DEBUG] Detected `examples/discord_bot/docker-compose.yml` as `yaml`
2023-08-18 19:25:20,515 [root] [DEBUG] Detected language `yaml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,523 [root] [DEBUG] Detected `examples/discord_bot/README.md` as `markdown`
2023-08-18 19:25:20,531 [root] [DEBUG] Detected `examples/discord_bot/.dockerignore` as `gas`
2023-08-18 19:25:20,531 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,539 [root] [DEBUG] Detected `examples/discord_bot/discord_bot.py` as `python`
2023-08-18 19:25:20,547 [root] [DEBUG] Detected `examples/discord_bot/Dockerfile` as `php`
2023-08-18 19:25:20,554 [root] [DEBUG] Detected `examples/discord_bot/requirements.txt` as `text`
2023-08-18 19:25:20,554 [root] [DEBUG] Detected language `text` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,563 [root] [DEBUG] Detected `examples/full_stack/docker-compose.yml` as `yaml`
2023-08-18 19:25:20,563 [root] [DEBUG] Detected language `yaml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,574 [root] [DEBUG] Detected `examples/full_stack/README.md` as `markdown`
2023-08-18 19:25:20,580 [root] [DEBUG] Detected `examples/full_stack/.dockerignore` as `gas`
2023-08-18 19:25:20,580 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,583 [root] [DEBUG] Detected `examples/full_stack/backend/models.py` as `python`
2023-08-18 19:25:20,590 [root] [DEBUG] Detected `examples/full_stack/backend/.dockerignore` as `gas`
2023-08-18 19:25:20,590 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:25:20,591 [root] [DEBUG] Detected `examples/full_stack/backend/paths.py` as `python`
2023-08-18 19:25:20,600 [root] [DEBUG] Detected `examples/full_stack/backend/Dockerfile` as `php`
2023-08-18 19:25:20,603 [root] [DEBUG] Detected `examples/full_stack/backend/server.py` as `python`
2023-08-18 19:25:20,608 [root] [DEBUG] Detected `examples/full_stack/backend/routes/sources.py` as `python`
2023-08-18 19:25:20,613 [root] [DEBUG] Detected `examples/full_stack/backend/routes/chat_response.py` as `python`
2023-08-18 19:25:20,625 [root] [DEBUG] Detected `examples/full_stack/backend/routes/dashboard.py` as `python`
2023-08-18 19:26:34,248 [root] [DEBUG] Detected `examples/full_stack/frontend/package-lock.json` as `json`
2023-08-18 19:26:34,248 [root] [DEBUG] Detected language `json` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,406 [root] [DEBUG] Detected `examples/full_stack/frontend/package.json` as `json`
2023-08-18 19:26:34,406 [root] [DEBUG] Detected language `json` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,413 [root] [DEBUG] Detected `examples/full_stack/frontend/.dockerignore` as `gas`
2023-08-18 19:26:34,413 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,423 [root] [DEBUG] Detected `examples/full_stack/frontend/tailwind.config.js` as `javascript`
2023-08-18 19:26:34,431 [root] [DEBUG] Detected `examples/full_stack/frontend/postcss.config.js` as `javascript`
2023-08-18 19:26:34,439 [root] [DEBUG] Detected `examples/full_stack/frontend/jsconfig.json` as `json`
2023-08-18 19:26:34,440 [root] [DEBUG] Detected language `json` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,448 [root] [DEBUG] Detected `examples/full_stack/frontend/Dockerfile` as `php`
2023-08-18 19:26:34,455 [root] [DEBUG] Detected `examples/full_stack/frontend/.eslintrc.json` as `json`
2023-08-18 19:26:34,455 [root] [DEBUG] Detected language `json` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,465 [root] [DEBUG] Detected `examples/full_stack/frontend/next.config.js` as `javascript`
2023-08-18 19:26:34,472 [root] [DEBUG] Detected `examples/full_stack/frontend/src/styles/globals.css` as `css`
2023-08-18 19:26:34,472 [root] [DEBUG] Detected language `css` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,482 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/PageWrapper.js` as `javascript`
2023-08-18 19:26:34,488 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/dashboard/PurgeChats.js` as `javascript`
2023-08-18 19:26:34,496 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/dashboard/CreateBot.js` as `javascript`
2023-08-18 19:26:34,504 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/dashboard/DeleteBot.js` as `javascript`
2023-08-18 19:26:34,512 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/dashboard/SetOpenAIKey.js` as `javascript`
2023-08-18 19:26:34,521 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/chat/BotWrapper.js` as `javascript`
2023-08-18 19:26:34,531 [root] [DEBUG] Detected `examples/full_stack/frontend/src/components/chat/HumanWrapper.js` as `javascript`
2023-08-18 19:26:34,536 [root] [DEBUG] Detected `examples/full_stack/frontend/src/pages/index.js` as `javascript`
2023-08-18 19:26:34,538 [root] [DEBUG] Detected `examples/full_stack/frontend/src/pages/_document.js` as `javascript`
2023-08-18 19:26:34,541 [root] [DEBUG] Detected `examples/full_stack/frontend/src/pages/_app.js` as `javascript`
2023-08-18 19:26:34,544 [root] [DEBUG] Detected `examples/full_stack/frontend/src/pages/[bot_slug]/app.js` as `javascript`
2023-08-18 19:26:34,559 [root] [DEBUG] Detected `examples/full_stack/frontend/src/containers/ChatWindow.js` as `javascript`
2023-08-18 19:26:34,575 [root] [DEBUG] Detected `examples/full_stack/frontend/src/containers/SetSources.js` as `javascript`
2023-08-18 19:26:34,593 [root] [DEBUG] Detected `examples/full_stack/frontend/src/containers/Sidebar.js` as `javascript`
2023-08-18 19:26:34,602 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/plus.svg` as `cpp`
2023-08-18 19:26:34,611 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/dropdown.svg` as `cpp`
2023-08-18 19:26:34,621 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/cross.svg` as `cpp`
2023-08-18 19:26:34,635 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/sitemap.svg` as `cpp`
2023-08-18 19:26:34,650 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/settings.svg` as `cpp`
2023-08-18 19:26:34,659 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/dropup.svg` as `cpp`
2023-08-18 19:26:34,671 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/doc.svg` as `cpp`
2023-08-18 19:26:34,680 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/text.svg` as `cpp`
2023-08-18 19:26:34,696 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/youtube.svg` as `cpp`
2023-08-18 19:26:34,713 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/github.svg` as `cpp`
2023-08-18 19:26:34,725 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/web.svg` as `cpp`
2023-08-18 19:26:34,734 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/drawer.svg` as `cpp`
2023-08-18 19:26:34,746 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/twitter.svg` as `delphi`
2023-08-18 19:26:34,746 [root] [DEBUG] Detected language `delphi` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,760 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/bot.svg` as `cpp`
2023-08-18 19:26:34,771 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/linkedin.svg` as `html`
2023-08-18 19:26:34,781 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/close.svg` as `cpp`
2023-08-18 19:26:34,790 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/dashboard.svg` as `cpp`
2023-08-18 19:26:34,800 [root] [DEBUG] Detected `examples/full_stack/frontend/public/icons/pdf.svg` as `cpp`
2023-08-18 19:26:34,808 [root] [DEBUG] Detected `examples/telegram_bot/variables.env` as `None`
2023-08-18 19:26:34,808 [root] [DEBUG] Detected language `None` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,820 [root] [DEBUG] Detected `examples/telegram_bot/telegram_bot.py` as `python`
2023-08-18 19:26:34,829 [root] [DEBUG] Detected `examples/telegram_bot/README.md` as `python`
2023-08-18 19:26:34,836 [root] [DEBUG] Detected `examples/telegram_bot/requirements.txt` as `text`
2023-08-18 19:26:34,836 [root] [DEBUG] Detected language `text` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,844 [root] [DEBUG] Detected `examples/whatsapp_bot/variables.env` as `None`
2023-08-18 19:26:34,844 [root] [DEBUG] Detected language `None` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,852 [root] [DEBUG] Detected `examples/whatsapp_bot/README.md` as `python`
2023-08-18 19:26:34,859 [root] [DEBUG] Detected `examples/whatsapp_bot/whatsapp_bot.py` as `python`
2023-08-18 19:26:34,867 [root] [DEBUG] Detected `examples/whatsapp_bot/requirements.txt` as `text`
2023-08-18 19:26:34,867 [root] [DEBUG] Detected language `text` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,874 [root] [DEBUG] Detected `examples/api_server/variables.env` as `None`
2023-08-18 19:26:34,874 [root] [DEBUG] Detected language `None` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,882 [root] [DEBUG] Detected `examples/api_server/docker-compose.yml` as `yaml`
2023-08-18 19:26:34,882 [root] [DEBUG] Detected language `yaml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,890 [root] [DEBUG] Detected `examples/api_server/README.md` as `markdown`
2023-08-18 19:26:34,899 [root] [DEBUG] Detected `examples/api_server/api_server.py` as `python`
2023-08-18 19:26:34,907 [root] [DEBUG] Detected `examples/api_server/.dockerignore` as `gas`
2023-08-18 19:26:34,907 [root] [DEBUG] Detected language `gas` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,914 [root] [DEBUG] Detected `examples/api_server/Dockerfile` as `php`
2023-08-18 19:26:34,922 [root] [DEBUG] Detected `examples/api_server/requirements.txt` as `text`
2023-08-18 19:26:34,922 [root] [DEBUG] Detected language `text` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,936 [root] [DEBUG] Detected `pyproject.toml` as `toml`
2023-08-18 19:26:34,936 [root] [DEBUG] Detected language `toml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,946 [root] [DEBUG] Detected `docs/introduction.mdx` as `python`
2023-08-18 19:26:34,956 [root] [DEBUG] Detected `docs/README.md` as `markdown`
2023-08-18 19:26:34,971 [root] [DEBUG] Detected `docs/development.mdx` as `haskell`
2023-08-18 19:26:34,971 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,977 [root] [DEBUG] Detected `docs/quickstart.mdx` as `haskell`
2023-08-18 19:26:34,977 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:34,989 [root] [DEBUG] Detected `docs/mint.json` as `json`
2023-08-18 19:26:34,989 [root] [DEBUG] Detected language `json` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,046 [root] [DEBUG] Detected `docs/advanced/testing.mdx` as `python`
2023-08-18 19:26:35,070 [root] [DEBUG] Detected `docs/advanced/query_configuration.mdx` as `haskell`
2023-08-18 19:26:35,070 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,081 [root] [DEBUG] Detected `docs/advanced/adding_data.mdx` as `python`
2023-08-18 19:26:35,106 [root] [DEBUG] Detected `docs/advanced/data_types.mdx` as `haskell`
2023-08-18 19:26:35,106 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,113 [root] [DEBUG] Detected `docs/advanced/vector_database.mdx` as `haskell`
2023-08-18 19:26:35,113 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,130 [root] [DEBUG] Detected `docs/advanced/app_types.mdx` as `python`
2023-08-18 19:26:35,150 [root] [DEBUG] Detected `docs/advanced/configuration.mdx` as `haskell`
2023-08-18 19:26:35,150 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,167 [root] [DEBUG] Detected `docs/advanced/interface_types.mdx` as `haskell`
2023-08-18 19:26:35,168 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,200 [root] [DEBUG] Detected `docs/advanced/showcase.mdx` as `haskell`
2023-08-18 19:26:35,200 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,219 [root] [DEBUG] Detected `docs/contribution/dev.mdx` as `ruby`
2023-08-18 19:26:35,231 [root] [DEBUG] Detected `docs/contribution/docs.mdx` as `haskell`
2023-08-18 19:26:35,231 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,556 [root] [DEBUG] Detected `docs/logo/light.svg` as `delphi`
2023-08-18 19:26:35,556 [root] [DEBUG] Detected language `delphi` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,909 [root] [DEBUG] Detected `docs/logo/dark.svg` as `delphi`
2023-08-18 19:26:35,910 [root] [DEBUG] Detected language `delphi` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,940 [root] [DEBUG] Detected `docs/examples/whatsapp_bot.mdx` as `haskell`
2023-08-18 19:26:35,940 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,952 [root] [DEBUG] Detected `docs/examples/slack_bot.mdx` as `haskell`
2023-08-18 19:26:35,952 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,966 [root] [DEBUG] Detected `docs/examples/discord_bot.mdx` as `haskell`
2023-08-18 19:26:35,966 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,978 [root] [DEBUG] Detected `docs/examples/telegram_bot.mdx` as `haskell`
2023-08-18 19:26:35,978 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:35,992 [root] [DEBUG] Detected `docs/examples/api_server.mdx` as `haskell`
2023-08-18 19:26:35,992 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:36,002 [root] [DEBUG] Detected `docs/examples/full_stack.mdx` as `haskell`
2023-08-18 19:26:36,002 [root] [DEBUG] Detected language `haskell` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:36,015 [root] [DEBUG] Detected `CITATION.cff` as `cpp`
2023-08-18 19:26:36,025 [root] [DEBUG] Detected `notebooks/embedchain-chromadb-server.ipynb` as `python`
2023-08-18 19:26:36,039 [root] [DEBUG] Detected `notebooks/embedchain-docs-site-example.ipynb` as `ruby`
2023-08-18 19:26:36,047 [root] [DEBUG] Detected `poetry.toml` as `toml`
2023-08-18 19:26:36,047 [root] [DEBUG] Detected language `toml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:36,054 [root] [DEBUG] Detected `setup.py` as `python`
2023-08-18 19:26:36,064 [root] [DEBUG] Detected `Makefile` as `php`
2023-08-18 19:26:36,073 [root] [DEBUG] Detected `.pre-commit-config.yaml` as `yaml`
2023-08-18 19:26:36,073 [root] [DEBUG] Detected language `yaml` is not a compatibe language. Falling back to plaintext text splitter.
2023-08-18 19:26:36,086 [root] [DEBUG] Detected `CONTRIBUTING.md` as `markdown`
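
The fallback behaviour visible in this log can be summarised in a small sketch. It is not the PR's implementation: the function name `choose_splitter` and the returned labels are hypothetical, and the set of compatible languages is only inferred from which labels in the log above did not trigger the "Falling back to plaintext text splitter" message.

```python
from typing import Optional

# Assumption: inferred from the log, where these labels did not trigger the fallback.
COMPATIBLE_LANGUAGES = {"python", "markdown", "javascript", "php", "cpp", "ruby", "html"}


def choose_splitter(detected: Optional[str]) -> str:
    """Return the name of the splitter to use for a detected language label."""
    if detected in COMPATIBLE_LANGUAGES:
        # A language-aware splitter can be used for this file.
        return f"code:{detected}"
    # Unknown, unsupported, or missing (None) labels use the plaintext splitter.
    return "plaintext"


for label in ("python", "gas", None, "yaml"):
    print(f"{label!r:10} -> {choose_splitter(label)}")
```

Note that the sketch only covers the choice of splitter. It does nothing about misdetections such as README.md files reported as python or .svg icons reported as cpp, which still end up on a language-aware splitter.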

cachho commented Aug 18, 2023

Added a .gptignore containing examples/full_stack/frontend/package-lock.json, which halves the number of lines and reduces the runtime by about a minute.
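
As a rough illustration of how an ignore file like this could be applied before reading files, here is a minimal sketch. The .gptignore format is an assumption (one glob pattern per line, with blank lines and # comments skipped), and the helper names are hypothetical. The PR may delegate this to a library instead.

```python
from fnmatch import fnmatch

# Assumption: one pattern per line; blank lines and '#' comments are skipped.
GPTIGNORE_TEXT = """
# large generated files
examples/full_stack/frontend/package-lock.json
"""


def parse_ignore_patterns(text: str) -> list:
    """Extract the non-empty, non-comment patterns from the ignore file text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list) -> bool:
    """Return True if the repository-relative path matches any ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


patterns = parse_ignore_patterns(GPTIGNORE_TEXT)
files = ["setup.py", "examples/full_stack/frontend/package-lock.json"]
kept = [f for f in files if not is_ignored(f, patterns)]
print(kept)
```

Filtering before the files are read is what saves time here: the ignored lock file is never chunked or embedded.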

cachho commented Aug 18, 2023

Test script

from embedchain import OpenSourceApp as App
from embedchain.config import OpenSourceAppConfig as AppConfig
from embedchain.config import QueryConfig

config = AppConfig(log_level="WARNING")
app = App(config=config)
# app.reset()
print("Reset app to run clean test")
app.add("https://github.com/embedchain/embedchain.git")

config = QueryConfig(number_documents=1)
print("Answer:", app.query("What does the query function do?", config=config))
print("Answer:", app.query("Where is the _get_detected_language function located?", config=config))
print("Answer:", app.query("Where are some references of the number_documents variable or attribute?", config=config))
print("Answer:", app.query("Which parts of the code have todo comments?", config=config))

This test runs the open source app. Maybe that's why the results are so bad:

Answer:  The `query` function is used to retrieve data from a database or other data source. It allows you to ask questions of the data and receive back the results of your queries. For example, if you wanted to find all customers who live in a specific city, you could use a SQL query like this: 

 SELECT * FROM customers WHERE city = 'New York';

 This would return a list of all customers who live in New York.
Answer:  The `language_info` object does not contain a specific function called `_get_detected_language`. Instead, it contains information about the detected language, such as its name and code. You can access this information using the appropriate methods provided by the CodeMirror library.
Answer:  - The number_documents variable is a list of documents in a specific order that needs to be sorted by their relevance to a particular topic. It can be accessed using the `number_documents` function, which returns a list of all the documents in the specified order. You can also use the `number_documents` attribute to get or set the value of this variable.
Answer:  To determine which parts of the code have do-to comments, you can use a text editor or IDE that supports code completion and auto-correction. In most cases, do-to comments are used to explain what the code is doing and why it's being written in a certain way. They help developers understand the code better and make it easier to modify and maintain over time. However, if you don't see any do-to comments, that doesn't necessarily mean they're not needed - it just means that the author may have chosen not to use them.

cachho commented Aug 18, 2023

Updated test script:

from embedchain import OpenSourceApp as App
from embedchain.config import OpenSourceAppConfig as AppConfig
from embedchain.config import QueryConfig

config = AppConfig(log_level="WARNING")
app = App(config=config)
# app.reset()
print("Reset app to run clean test")
app.add("https://github.com/embedchain/embedchain.git")

config = QueryConfig(number_documents=1)
questions = [
    "What does the query function do?",
    "Where is the _get_detected_language function located?",
    "Where are some references of the number_documents variable or attribute?",
    "Which parts of the code have todo comments?",
]
for question in questions:
    print(f"--------------------------------\n{question}\n--------------------------------")
    print(app.query(question, config=config, dry_run=True))
    print("Answer:", app.query(question, config=config, dry_run=False))
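
Judging by the response that follows, the dry_run=True call returns the assembled prompt instead of an answer: fixed instructions, the retrieved context, the query, and a trailing "Helpful Answer:". A minimal sketch of assembling a prompt of that shape is shown here. The template wording is copied from the output in this comment, but `build_prompt` and the way multiple contexts are joined are assumptions, not embedchain's actual code.

```python
from string import Template

# Wording taken from the dry-run output; the placeholder names are assumptions.
PROMPT_TEMPLATE = Template(
    "Use the following pieces of context to answer the query at the end.\n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "$context\n\n"
    "Query: $query\n\n"
    "Helpful Answer:"
)


def build_prompt(query: str, contexts: list) -> str:
    """Fill the template with the retrieved chunks and the user's query."""
    # Assumption: multiple retrieved chunks are joined with newlines.
    return PROMPT_TEMPLATE.substitute(context="\n".join(contexts), query=query)


print(build_prompt("What does the query function do?", ["for query in queries:"]))
```

With number_documents=1 only a single chunk reaches the prompt, so the quality of the answer depends entirely on which chunk is retrieved.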

Updated response:

--------------------------------
What does the query function do?
--------------------------------

  Use the following pieces of context to answer the query at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.

  for query in queries:

  Query: What does the query function do?

  Helpful Answer:

Answer:  The `query` function is used to retrieve data from a database or other data source. It allows you to ask questions of the data and receive back the results of your queries. For example, if you wanted to find all customers who live in a specific city, you could use a SQL query like this: 

 SELECT * FROM customers WHERE city = 'New York';

 This would return a list of all customers who live in New York.
--------------------------------
Where is the _get_detected_language function located?
--------------------------------

  Use the following pieces of context to answer the query at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.

  },
  "language_info": {
   "codemirror_mode": {

  Query: Where is the _get_detected_language function located?

  Helpful Answer:

Answer:  The `language_info` object does not contain a specific function called `_get_detected_language`. Instead, it contains information about the detected language, such as its name and code. You can access this information using the appropriate methods provided by the CodeMirror library.
--------------------------------
Where are some references of the number_documents variable or attribute?
--------------------------------

  Use the following pieces of context to answer the query at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.

  number_documents

  Query: Where are some references of the number_documents variable or attribute?

  Helpful Answer:

Answer:  - The number_documents variable is a list of documents in a specific order that needs to be sorted by their relevance to a particular topic. It can be accessed using the `number_documents` function, which returns a list of all the documents in the specified order. You can also use the `number_documents` attribute to get or set the value of this variable.
--------------------------------
Which parts of the code have todo comments?
--------------------------------

  Use the following pieces of context to answer the query at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.

  # Todo: Automatically recreating a

  Query: Which parts of the code have todo comments?

  Helpful Answer:

Answer:  To determine which parts of the code have do-to comments, you can use a text editor or IDE that supports code completion and auto-correction. In most cases, do-to comments are used to explain what the code is doing and why it's being written in a certain way. They help developers understand the code better and make it easier to modify and maintain over time. However, if you don't see any do-to comments, that doesn't necessarily mean they're not needed - it just means that the author may have chosen not to use them.

This was referenced Aug 29, 2023
@mggger
Contributor

mggger commented Aug 30, 2023

Cool! @cachho

@mggger
Contributor

mggger commented Sep 5, 2023

@cachho Hi, I ran into an error during local testing. Would it be better not to submit everything at once?

.git                : 100%|██████████████████████████████████████████████████| 26/26 [00:00<00:00, 654.17it/s]
.github             : 100%|██████████████████████████████████████████████████| 19/19 [00:00<00:00, 671.28it/s]
.devcontainer       : 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 575.17it/s]
docs                : 100%|██████████████████████████████████████████████████| 1174/1174 [00:03<00:00, 355.96it/s]
libs                : 100%|██████████████████████████████████████████████████| 1898/1898 [00:04<00:00, 380.68it/s]
2023-09-05 13:28:10,230 INFO     repo_loader [73] repository read, 2893 files, 479776 lines
2023-09-05 13:52:07,697 ERROR    BaseHandler [63] Error 500: {"error":"OperationalError('too many SQL variables')"}
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/chromadb/api/fastapi.py", line 410, in raise_chroma_error
    resp.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:8000/api/v1/collections/903c0b02-9d5a-4c57-bdd7-d0966ee422d0/get

@mggger
Contributor

mggger commented Sep 5, 2023

SQLite appears to have some limitations.

@cachho
Contributor Author

cachho commented Sep 5, 2023

@mggger Does the repository you loaded contain SQL code? It almost seems like an issue with unescaped instructions.

@mggger
Contributor

mggger commented Sep 6, 2023

@cachho Hi, I tested adding the langchain project, which causes this error.

I have created an issue about this chromadb bug.

chroma-core/chroma#1101 (comment)

@mggger
Contributor

mggger commented Sep 6, 2023

@cachho I think it would be better to add a max_batch_size parameter to the load_and_embed function. In the examples I tested, a query with 271472 ids causes the chromadb error.
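
The batching idea suggested here can be sketched roughly as follows. This is only an illustration under stated assumptions, not embedchain's or chromadb's actual code: the helpers `batched` and `fetch_existing_ids`, the `docs` table, and the batch size of 500 are all hypothetical, and the sketch talks to Python's built-in `sqlite3` directly to show how splitting a large id list keeps each statement well below SQLite's bound-variable limit.

```python
import sqlite3
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], max_batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `max_batch_size` long."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


def fetch_existing_ids(
    conn: sqlite3.Connection, ids: Sequence[str], max_batch_size: int = 500
) -> List[str]:
    """Return the subset of `ids` present in the hypothetical `docs` table.

    One SELECT is issued per batch, so no single statement binds more
    than `max_batch_size` variables.
    """
    found: List[str] = []
    for batch in batched(ids, max_batch_size):
        # Build "?,?,?" with one placeholder per id in this batch.
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT id FROM docs WHERE id IN ({placeholders})", list(batch)
        )
        found.extend(row[0] for row in rows)
    return found


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO docs VALUES (?)", [(f"doc-{i}",) for i in range(5000)]
    )
    wanted = [f"doc-{i}" for i in range(0, 10000, 2)]
    print(len(fetch_existing_ids(conn, wanted)))
```

SQLite caps the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`, 999 by default before version 3.32.0 and 32766 since then, although individual builds may change it), so a conservative batch size is the safer choice when the build in use is unknown.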

@mechanicmuthu

This will be really useful. I hope the maintainers add this fast to main.

@cachho
Contributor Author

cachho commented Sep 13, 2023

This will be really useful. I hope the maintainers add this fast to main.

Hey, we are not happy with the quality yet. You can see from my tests that it's not the greatest. When we roll out a data type, we want it to be working. Do you have any suggestions on how we could improve it?

@taranjeet
Member

Closing this PR as we have already added support for GitHub repos. Thanks for the contribution.

@taranjeet taranjeet closed this Dec 17, 2023
Labels
enhancement New feature or request new-data-source New data source
Development

Successfully merging this pull request may close these issues.

Add support to load codebase
4 participants