
Allow to load Github documents using stream #3331

Closed
Njuelle opened this issue Nov 18, 2023 · 2 comments · Fixed by #3339
Labels
auto:improvement Medium size change to existing code to handle new use-cases

Comments

@Njuelle
Contributor

Njuelle commented Nov 18, 2023

Hey,

When I tried to load documents from a large GitHub repository using GithubRepoLoader, I ran into a "JavaScript heap out of memory" error.
It would be nice to be able to load GitHub documents as a stream.

I've started working on a PR that enables this by streaming documents through generators.
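As an illustration of the generator idea, here is a minimal sketch under simplifying assumptions. `RepoEntry`, `SimpleDocument`, `listDirectory`, `fetchFile` and `streamRepo` are hypothetical stand-ins, not the actual GithubRepoLoader internals or GitHub API calls. Each file becomes a document that is yielded as soon as it is fetched, and at most `maxConcurrency` files are fetched in parallel, so the generator itself only holds one batch of file contents at a time.

```typescript
// Hypothetical shapes: a directory listing entry and a minimal document.
interface RepoEntry {
  path: string;
  type: "file" | "dir";
}

interface SimpleDocument {
  pageContent: string;
  metadata: { source: string };
}

// Hypothetical fetchers standing in for GitHub API requests.
type ListDirectory = (path: string) => Promise<RepoEntry[]>;
type FetchFile = (path: string) => Promise<string>;

// Yields one document per file, walking subdirectories lazily when `recursive`.
async function* streamRepo(
  listDirectory: ListDirectory,
  fetchFile: FetchFile,
  dirPath = "",
  recursive = true,
  maxConcurrency = 2
): AsyncGenerator<SimpleDocument> {
  if (maxConcurrency < 1) throw new RangeError("maxConcurrency must be >= 1");
  const entries = await listDirectory(dirPath);
  const files = entries.filter((e) => e.type === "file");

  // Fetch at most `maxConcurrency` files in parallel, yield that batch,
  // then move on, so only one batch of contents is held here at a time.
  for (let i = 0; i < files.length; i += maxConcurrency) {
    const batch = files.slice(i, i + maxConcurrency);
    const docs = await Promise.all(
      batch.map(async (f) => ({
        pageContent: await fetchFile(f.path),
        metadata: { source: f.path },
      }))
    );
    for (const doc of docs) {
      yield doc;
    }
  }

  if (recursive) {
    for (const dir of entries.filter((e) => e.type === "dir")) {
      // yield* delegates to the nested generator without buffering its output.
      yield* streamRepo(listDirectory, fetchFile, dir.path, recursive, maxConcurrency);
    }
  }
}

// Demo with an in-memory fake repository (hypothetical data).
const fakeTree: Record<string, RepoEntry[]> = {
  "": [
    { path: "README.md", type: "file" },
    { path: "src", type: "dir" },
  ],
  src: [
    { path: "src/index.ts", type: "file" },
    { path: "src/util.ts", type: "file" },
  ],
};
const fakeFiles: Record<string, string> = {
  "README.md": "# Demo",
  "src/index.ts": "export {};",
  "src/util.ts": "export const x = 1;",
};

async function main(): Promise<string[]> {
  const sources: string[] = [];
  for await (const doc of streamRepo(
    async (p) => fakeTree[p] ?? [],
    async (p) => fakeFiles[p] ?? ""
  )) {
    sources.push(doc.metadata.source); // handle each document as it arrives
  }
  return sources;
}

// Logs the paths in order: README.md, src/index.ts, src/util.ts
main().then((sources) => console.log(sources));
```

Files in a directory are emitted before its subdirectories are visited, and `yield*` keeps the recursion lazy: a subdirectory is only listed once the consumer has pulled every document before it.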

Proposed usage:

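import { GithubRepoLoader } from "langchain/document_loaders/web/github";

const loader = new GithubRepoLoader(
  "https://github.com/langchain-ai/langchainjs",
  {
    branch: "main",
    recursive: false,
    unknown: "warn",
    maxConcurrency: 3, // Defaults to 2
  }
);

// Proposed method: yields documents one at a time instead of returning one big array.
const docs = [];
for await (const doc of loader.loadAsStream()) {
  docs.push(doc);
}

console.log({ docs });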
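Collecting every document into `docs`, as in the usage above, still ends up holding the whole repository in memory. A minimal sketch of consuming the stream incrementally instead, in fixed-size batches that can be processed and released one at a time. `batchStream`, `fakeDocs` and `demo` are hypothetical helpers (not part of LangChain), and `fakeDocs` stands in for `loader.loadAsStream()`.

```typescript
// Groups any async stream into arrays of at most `batchSize` items.
async function* batchStream<T>(
  source: AsyncIterable<T>,
  batchSize: number
): AsyncGenerator<T[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = []; // fresh array, so the yielded one can be garbage-collected
    }
  }
  // Flush the final, possibly smaller, batch.
  if (batch.length > 0) yield batch;
}

// Fake document stream standing in for loader.loadAsStream().
async function* fakeDocs(n: number): AsyncGenerator<{ pageContent: string }> {
  for (let i = 0; i < n; i++) yield { pageContent: `doc ${i}` };
}

async function demo(): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of batchStream(fakeDocs(5), 2)) {
    // Real work would go here, e.g. adding the batch to a vector store.
    sizes.push(batch.length);
  }
  return sizes;
}

demo().then((sizes) => console.log(sizes)); // batch sizes: 2, 2, 1
```

The batch size trades memory for throughput: larger batches mean fewer downstream calls but more documents held at once.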

What do you think?


dosubot bot commented Nov 18, 2023

🤖

Hello @Njuelle,

Nice to meet you! I'm Dosu, a bot that's here to assist with your issues, answer questions, and help you contribute to the project. While we wait for a human maintainer, I'll be doing my best to provide you with assistance.

I see you've posted an issue and I'm currently reviewing it. I'll be providing a full response shortly. Hang tight!

@dosubot dosubot bot added the auto:improvement Medium size change to existing code to handle new use-cases label Nov 18, 2023

dosubot bot commented Nov 18, 2023

🤖

That sounds like a great idea! Loading GitHub documents as a stream can help prevent memory issues when dealing with large repositories, and your proposed generator-based usage looks good. It would be really helpful for users who need to load documents from large GitHub repositories. Thank you for taking the initiative to work on a PR for this feature. I encourage you to continue working on it and submit the PR when you're ready. The maintainers will review it and decide whether it should be merged. Keep up the good work!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

