Move from v1 to v2 APIs #137

eskibars · 2024-12-15T22:09:26Z

This isn't perfect, but I think it moves a long way forward.

Still TODO:

- Handle return codes: in v2 we made them much saner, as opposed to the embedded errors in v1
- Testing
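
To illustrate the return-code item, here is a minimal sketch of status-code-based handling. The `classify_status` helper and its category names are hypothetical and are not taken from this PR; it relies only on the standard HTTP status classes and makes no assumption about which codes a given v2 endpoint actually returns.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse outcome (hypothetical helper)."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in (401, 403):
        # Missing or insufficient credentials.
        return "auth_error"
    if status_code == 409:
        # The request conflicts with existing state.
        return "conflict"
    if 400 <= status_code < 500:
        return "client_error"
    if 500 <= status_code < 600:
        return "server_error"
    return "unexpected"
```

With something like this, callers can branch on the HTTP status alone instead of parsing an error embedded in an otherwise successful response body.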

Things I'd like to do but might save for other PRs:

- Add an option to create corpora instead of "just" resetting them, now that we have a corpus_key that makes this possible
  - Turn it on by default for all of the crawlers in the config dir
  - We could set up sensible filters at creation time to make the onboarding experience much smoother
- If OAuth isn't defined, try to use the API key as a personal API key
- There are a number of new parameters in API v2 that we could enable and that would make sense here
- Parts of the code use API keys while others use OAuth, and there are better ways to handle that
- There's some stuff we could do to replace metadata in cases where it's been updated and we detect a conflict
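
As a rough sketch of the "fall back to the API key when OAuth isn't defined" idea, the credential choice could be centralized in one place. The `build_auth_headers` function is hypothetical and does not exist in this PR; the `x-api-key` header name is the one the code already uses, and the `Authorization: Bearer` form is the usual way to send an OAuth access token and is assumed here.

```python
from typing import Dict, Optional


def build_auth_headers(oauth_token: Optional[str], api_key: Optional[str]) -> Dict[str, str]:
    """Pick one credential: prefer OAuth, otherwise fall back to the API key."""
    if oauth_token:
        # Assumption: the OAuth access token is sent as a bearer token.
        return {"Authorization": f"Bearer {oauth_token}"}
    if api_key:
        return {"x-api-key": api_key}
    raise ValueError("No credentials configured: set an OAuth token or an API key")
```

Routing every request through a single helper like this would also address the mix of API-key and OAuth call sites mentioned in the list.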

ofermend · 2024-12-26T05:04:05Z

config/askFeynman.yaml

+vectara:
+  corpus_key: feynman
+  reindex: false
+  create_corpus: true


Why did you choose to make this default true for all examples? I think it's better to make this default false (and in these examples too).

eskibars · 2024-12-27

Just from an ease-of-use perspective for users who are getting started: if I've never created a "feynman" corpus before, it's one less step. In the future, I think it'd be a step further to:

- Detect if the corpus exists and choose accordingly
- Create and/or update filters as part of the creation

In general, it's best to optimize for the user who has less experience with your product in a tool like vectara-ingest.

ofermend · 2024-12-27

Yes, that makes sense, and it's a good starting point, but vectara-ingest is often used repeatedly to update a corpus (as in all the moma crawls and askNews crawls). Also, I think this requires a personal API key.
Let me modify it, then, to pick a few like this and a few like that.

eskibars · 2024-12-30T10:46:41Z

Yes, it does require a personal API key, but in reality parts of vectara-ingest already do.

I'm curious, though: why do you think it's better to have a false value for these? It's pretty innocuous to have a "true" value: if the corpus already exists, it doesn't get removed; it just errors on the corpus creation part (which we can catch/except). It seems safer to me in every case to try to create a corpus.
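
The "try to create and catch the error" behavior described here could look roughly like the sketch below. `CorpusExistsError`, `ensure_corpus`, and the injected `create` callable are all hypothetical names used for illustration; how the real client reports an already-existing corpus is not specified in this thread.

```python
from typing import Callable


class CorpusExistsError(Exception):
    """Hypothetical error raised when the corpus key is already taken."""


def ensure_corpus(corpus_key: str, create: Callable[[str], None]) -> bool:
    """Try to create the corpus.

    Returns True if it was created and False if it already existed.
    An existing corpus is left untouched; any other error propagates.
    """
    try:
        create(corpus_key)
    except CorpusExistsError:
        return False
    return True
```

Because an existing corpus is never modified on this path, calling it on every run is safe for repeated crawls; the only cost is one failed creation request.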

ofermend · 2024-12-31

It's not really a big deal either way, IMO. I just think the news and moma examples lend themselves more to daily updates, which are not good candidates for create_corpus. I'd like to demonstrate that use too.

Which other pieces of vectara-ingest do you think require a personal API key? I'm trying to avoid making that a requirement, although we could if needed.

eskibars · 2024-12-31

Anything that uses x-api-key in the code relies on at least having an API key in addition to OAuth (whether it's a personal API key or not). At first glance, that's: _does_doc_exist, delete_doc, _list_docs, _index_file, and index_document. I think all of those should work with a non-personal API key, but it feels strange to require both an API key and OAuth for this functionality, and I strongly suspect any user would just plug in a personal API key, especially given that the "regular" API keys are scoped to specific corpora and they'd otherwise need lots of them.

ofermend

I tested rather extensively, found many bugs, and fixed them. I think it's now good. @eskibars, please take another look when you get a chance (after my changes), and then I think we can merge. This will become v2.0.0.

eskibars and others added 14 commits December 15, 2024 22:20

Initial commit

8c103e7

Address TODOs

ab0db1d

Update documentId and metadata

3bcde84

Move "section" to "sections"

f001efc

Fix page key logic

cb133ef

Fix logic bug in page key

cdc53bf

Remove customer_id since it's not needed in API v2

e501a51

Fix bugs in these 2 config files

fb9dd6f

Add a create corpus flag and fix a few bugs

a7a89aa

Fix some bugs and add a create corpus flag

a5f489a

Default to creating corpora. Add API key for creating corpus.

7f298fc

Update response HTTP codes based on API v2

724a1dc

Fix inadvertent linefeed change

1d08cd4

Merge branch 'main' into v2-apis

4c04fa8

ofermend reviewed Dec 26, 2024

ofermend added 10 commits December 25, 2024 21:15

updates

ca0288d

Merge branch 'v2-apis' of https://github.com/vectara/vectara-ingest into v2-apis

7521d0f

minor fix to slack crawler docs

2742e6b

minor fix to slack crawler docs

273874a

fix minor issue in bulkuploader

3db0004

upgraded uv version

0e405fb

bugfixes in indexer

updated create_corpus flag

d8b91fe

fixed some lint issues

b7c82a1

bugfixes to notion, github and slack crawlers

5455d09

added better check for reindexing

ecf5ec7

ofermend approved these changes Dec 29, 2024

eskibars commented Dec 30, 2024

update to README

2bd573d

ofermend and others added 3 commits January 1, 2025 19:57

Sign the commit and update Fox News to use HTTPS

1cf9f22

Merge branch 'v2-apis' of https://github.com/vectara/vectara-ingest into v2-apis

45ca567

Merge branch 'main' into v2-apis

408c238

ofermend merged commit f6d4a1d into main Jan 6, 2025
