
Clearmash: how to call Document/Folder/Get when there are too many items in the folder? #43

Open
OriHoch opened this issue Jul 11, 2017 · 14 comments

OriHoch commented Jul 11, 2017

reproduction

  • call the ClearMash web content API /Document/Folder/Get for a folder with many items
  • (for example, the photoUnits folder, id 42)

expected

  • should support returning many items
  • perhaps via pagination?

actual

  • takes a long time to load (which makes sense, as it returns many items)
  • sometimes it doesn't return a response at all (probably because there are too many items)
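For reference, a call to this endpoint might look like the sketch below. Only the endpoint path (/Document/Folder/Get) and the folder id 42 come from this issue; the base URL, auth header name, and payload field name are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "https://example.clearmash.com/api"  # hypothetical host

def build_folder_get_request(folder_id, api_token):
    """Build a POST request for the /Document/Folder/Get endpoint."""
    payload = json.dumps({"FolderId": folder_id}).encode("utf-8")  # field name assumed
    return urllib.request.Request(
        BASE_URL + "/Document/Folder/Get",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "ClientToken": api_token,  # auth header name assumed
        },
    )

req = build_folder_get_request(42, "secret-token")
# Large folders are slow, so give the call a generous client-side timeout:
# response = urllib.request.urlopen(req, timeout=180)
```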
@OriHoch OriHoch changed the title [Task] Clearmash: how to call Document/Folder/Get when there are too many items in the folder Clearmash: how to call Document/Folder/Get when there are too many items in the folder? Jul 11, 2017

ghost commented Jul 12, 2017

Hi,
If I understand correctly, you are using the Folder API as some sort of primitive crawling mechanism?
If so, I don't think this is the right API; the Folder API is not designed for such use.


OriHoch commented Jul 12, 2017

Ok, which API would you suggest we use?


ghost commented Jul 13, 2017

Using this API you can create a primitive/naive crawling mechanism, but that means you will need to go over all the documents every time. I'm not sure this is an efficient solution.
It would be better to design a sync framework that gets from ClearMash only the documents that have changed since the last sync. That way it will be faster, more robust, and have less impact on both systems.
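The incremental-sync idea suggested here can be sketched roughly as follows; `fetch_changed_since` is a hypothetical placeholder for whatever changed-documents call ClearMash would provide, not a real API method.

```python
from datetime import datetime, timezone

def sync(state, fetch_changed_since):
    """Pull only documents modified after the last successful sync."""
    last_sync = state.get("last_sync")  # None on first run => full sync
    changed = fetch_changed_since(last_sync)
    state["last_sync"] = datetime.now(timezone.utc)
    return changed

# Toy data source standing in for the real API:
docs = [
    {"id": 1, "modified": datetime(2017, 7, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified": datetime(2017, 7, 12, tzinfo=timezone.utc)},
]

def fake_fetch(since):
    return [d for d in docs if since is None or d["modified"] > since]

state = {}
first = sync(state, fake_fetch)   # first run: full sync, both documents
second = sync(state, fake_fetch)  # nothing changed since then: empty
```

The point is that after the first full pass, each run transfers only the delta instead of all ~5M items.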


OriHoch commented Jul 13, 2017

That's exactly what we want: to get all the documents. That's the best solution for us.

It already works, and we have all the systems in place to support this on our end.


ghost commented Jul 13, 2017

BH manages ~5M logical items and ~30M relations in ClearMash. With this approach, you will load, transfer over the web, and reindex all of them every time.
Wouldn't you prefer to get only the changes since the last indexing? It could be much faster and more efficient.


OriHoch commented Jul 13, 2017

I agree that a sync would be faster and more efficient, but right now we need the /Document/Folder/Get API to work.


ghost commented Jul 13, 2017

As I said, the Folder API is not the right API, since it is not intended to support such a use case.
I think this is the wrong design, but if you insist on working this way, I believe you would be better off using the Queries API (ExecuteAdHocQuery).
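If ExecuteAdHocQuery supports paging, a full crawl could follow a generic loop like this sketch. The page/size parameter names are assumptions for illustration, not the real ClearMash query signature.

```python
def fetch_all(run_query, page_size=100):
    """Page through a query until a short (final) batch is returned."""
    items, page = [], 0
    while True:
        batch = run_query(page=page, size=page_size)
        items.extend(batch)
        if len(batch) < page_size:  # short batch => last page
            return items
        page += 1

# Toy query function standing in for the real API call:
data = list(range(250))

def fake_query(page, size):
    return data[page * size:(page + 1) * size]

all_items = fetch_all(fake_query)  # 3 pages: 100 + 100 + 50 items
```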


OriHoch commented Jul 13, 2017

Ok, thanks. Can you provide documentation about the limitations of the API,
so we know in the future which API methods we can or cannot use?


OriHoch commented Jul 13, 2017

Also, the documentation for the Queries API does not provide details about how to use it; all the parameters are undocumented, making it impossible to use.

I opened an issue regarding that problem: #52


OriHoch commented Jul 13, 2017

Opened an issue for a future improvement to support syncing only newly updated items: #53


ghost commented Aug 2, 2017

Even though this approach is not recommended, we have extended the Folder API timeout to allow you to request folders with many items.
This update will be part of the next version.

nuritgazit commented

@TheGrandVizier


OriHoch commented Aug 3, 2017

Thanks @omrisuissa. Can you estimate the limit on the number of items that can be returned this way,
or say what the timeout duration is?


ghost commented Aug 3, 2017

@OriHoch, yes.
The current timeout is 30 seconds; the new timeout will be 180 seconds (six times longer), so I believe that will be enough. On the chance that I'm wrong, we can make it even longer, but I believe 180 seconds will be sufficient even for the largest folder.
