Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDJSON/CSV methods to add and update documents #191

Closed
4 of 13 tasks
curquiza opened this issue Oct 19, 2021 · 6 comments · Fixed by #494
Closed
4 of 13 tasks

NDJSON/CSV methods to add and update documents #191

curquiza opened this issue Oct 19, 2021 · 6 comments · Fixed by #494
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@curquiza
Copy link
Member

curquiza commented Oct 19, 2021

⚠️ This issue is generated, it means the nameing might be done differently in this package (ex: add_documents_json instead of addDocumentsJson). Keep the already existing way of naming in this package to stay idiomatic with the language and this repository.

📣 We strongly recommend doing multiple PRs to solve all the points of this issue

MeiliSearch v0.23.0 introduces two changes:

  • new valid formats to push data files, additionally to the JSON format: CSV and NDJSON formats.
  • it enforces the Content-type header for every route requiring a payload (POST and PUT routes)

Here are the expected changes to completely close the issue:

  • Currently, the SDKs always send Content-Type: application/json to every request. Only the POST and PUT requests should send the Content-Type: application/json and not the DELETE and GET ones.

  • Add the following methods and 🔥 the associated tests 🔥 to ADD the documents. Depending on the format type (csv or ndjson) the SDK should send Content-Type: application/x-dnjson or Content-Type: text/csv)

    • addDocumentsJson(string docs, string primaryKey)
    • addDocumentsCsv(string docs, string primaryKey)
    • addDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • addDocumentsNdjson(string docs, string primaryKey)
    • addDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)
  • Add the following methods and 🔥 the associated tests 🔥 to UPDATE the documents. Depending on the format type (csv or ndjson) the SDK should send Content-Type: application/x-dnjson or Content-Type: text/csv)

    • updateDocumentsJson(string docs, string primaryKey)
    • updateDocumentsCsv(string docs, string primaryKey)
    • updateDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • updateDocumentsNdjson(string docs, string primaryKey)
    • updateDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)

docs are the documents sent as String
primaryKey is the primary key of the index
batchSize is the size of the batch. Example: you can send 2000 documents in raw String in docs and ask for a batchSize of 1000, so your documents will be sent to MeiliSearch in two batches.

Example of PRs:


Related to: meilisearch/integration-guides#146

If this issue is partially/completely implemented, feel free to let us know.

@carlosb1
Copy link
Contributor

carlosb1 commented Jul 9, 2023

The idea is the encapsulation of the functions for ndjson / json / csv? It seems add_or_replace_unchecked_payload permits these formats

    /// let task = movie_index.add_or_replace_unchecked_payload(
    ///     r#"{ "id": 1, "body": "doggo" }
    ///     { "id": 2, "body": "catto" }"#.as_bytes(),
    ///     "application/x-ndjson",
    ///     Some("id"),
    ///   ).await.unwrap();

Also, about the naming, shouldn't it be snakecase , not camelcase?

@bidoubiwa
Copy link
Contributor

Hey @carlosb1
Indeed add_or_replace_unchecked_payload does the trick. The PR is kept open in case someone want to specifically implement these functions.

@curquiza issue is an issue that was created in our different SDK's, this is why the API design might not exactly follow rust conventions. We expect the contributor to adapt this design to be more in line with rust :)

@carlosb1
Copy link
Contributor

carlosb1 commented Aug 9, 2023

Well. I did a first PR with the first functions... I think it can be a good kickoff.... Furthermore, I think there are an issue with the tests, it saw some random error.

@curquiza
Copy link
Member Author

Not all features are done so I re open

meili-bors bot added a commit that referenced this issue Sep 12, 2023
508: add functions for csv documents r=curquiza a=carlosb1

# Pull Request

## Related issue
Fixes partially #191

## What does this PR do?
Added the first two functions (with their tests) for issue #191. It includes two functions for sending csv payloads 
- `add_documents_csv`
-  `update_documents_csv`


## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: carlosb1 <[email protected]>
@carlosb1
Copy link
Contributor

carlosb1 commented Sep 22, 2023

Checking the code for the implementation of the batches functions: addDocumentsNdjsonInBatches , addDocumentsCsvInBatches , etc. I am not sure it makes sense... How should it split in batches?.... the input param for each function is a string.. you can not decide how to split these strings... In the current implementation works because it uses &[T] where T is an independent object serialized... it can send as each request in parallel

@curquiza
Copy link
Member Author

Indeed, you are right. If it makes no sense for the community, let's close it. This package is for the community, so let's not add useless maintenance 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants