# Can't load any channel above 10000 - Error 500 #139
Update:

```json
{
  "query": {
    "query_string": {
      "query": "*"
    }
  },
  "size": 1,
  "from": 10000,
  "sort": []
}
```

it responded with:

```json
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"channelfinder","node":"zYuUqMq8QqyvtKmyl8TuDQ","reason":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}},"status":400}
```

how dumb is that? it can store large amounts of data ... but not display more than configured? |
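The error points at Elasticsearch's `index.max_result_window` setting, which defaults to 10,000. A minimal sketch for inspecting the current value, assuming a local Elasticsearch at http://localhost:9200 and the default `channelfinder` index:

```python
import requests

# Inspect the result-window limit on the channelfinder index.
# include_defaults makes Elasticsearch report the implicit 10,000
# default even when the setting was never set explicitly.
resp = requests.get(
    "http://localhost:9200/channelfinder/_settings",
    params={"include_defaults": "true", "filter_path": "**.max_result_window"},
)
print(resp.json())
```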
Ah ok, I understand now. You should use the search_after parameter to page beyond the query size: https://www.elastic.co/guide/en/elasticsearch/reference/8.13/paginate-search-results.html#search-after IMHO this restriction exists because Elastic expects open-ended amounts of data. If you have billions of channels, for example, the default search window should be small so you don't crawl the database for ages. |
Yes, I've found that as well. Clearly a case of RTFM; I've now reached the bottom of ChannelFinder's API documentation and found the "scroll" endpoint.
Do I need CF 4.7.2? Thanks :) |
Actually, it looks like there is a bug in this. I can't figure out how to use the search_after parameter. I'll work on a fix, including updating the documentation. |
thanks :) Not sure if you meant this, but maybe it helps:

```json
{
  "query": {
    "query_string": {
      "query": "*"
    }
  },
  "size": 10,
  "sort": [
    {
      "name": "asc"
    }
  ]
}
```

Elastic will return a set of results:

```json
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": [
{
"_index": "channelfinder",
"_id": "AGRAJAG:pyroArray",
"_score": null,
"_source": {
"name": "AGRAJAG:pyroArray",
"owner": "admin",
"properties": [
{
"name": "hostName",
"owner": "cfstore",
"value": "agrajag"
},
{
"name": "iocName",
"owner": "cfstore",
"value": "iocpyro"
},
{
"name": "iocid",
"owner": "cfstore",
"value": "10.0.0.42:1408"
},
{
"name": "time",
"owner": "cfstore",
"value": "2024-04-23 01:55:43.154832"
},
{
"name": "recceiverID",
"owner": "cfstore",
"value": "fel-recceiver"
},
{
"name": "recordType",
"owner": "cfstore",
"value": "waveform"
},
{
"name": "pvStatus",
"owner": "cfstore",
"value": "Inactive"
}
],
"tags": []
},
"sort": [
"AGRAJAG:pyroArray"
]
},
[...]
{
"_index": "channelfinder",
"_id": "AesPLC:CYCLE:MEAN",
"_score": null,
"_source": {
"name": "AesPLC:CYCLE:MEAN",
"owner": "admin",
"properties": [
{
"name": "hostName",
"owner": "cfstore",
"value": "ioc164"
},
{
"name": "iocName",
"owner": "cfstore",
"value": "iocSPLC"
},
{
"name": "iocid",
"owner": "cfstore",
"value": "10.0.0.164:59950"
},
{
"name": "time",
"owner": "cfstore",
"value": "2024-04-23 01:53:23.696448"
},
{
"name": "recceiverID",
"owner": "cfstore",
"value": "fel-recceiver"
},
{
"name": "recordType",
"owner": "cfstore",
"value": "ai"
},
{
"name": "pvStatus",
"owner": "cfstore",
"value": "Inactive"
}
],
"tags": []
},
"sort": [
"AesPLC:CYCLE:MEAN"
]
}
]
}
}
```

Here, the last hit's `sort` value is `"AesPLC:CYCLE:MEAN"`. You can then issue the search again, but with:

```json
{
"query": {
"query_string": {
"query": "*"
}
},
"size": 10,
"search_after": ["AesPLC:CYCLE:MEAN"],
"sort": [
{
"name": "asc"
}
]
}
```

and it will return the next set. |
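To round this out, a minimal sketch of the full pagination loop in Python, assuming a local Elasticsearch at http://localhost:9200 and the `channelfinder` index (`fetch_all_channels` and the page size are illustrative, not part of ChannelFinder's API):

```python
import requests

ES_URL = "http://localhost:9200/channelfinder/_search"
PAGE_SIZE = 1000  # illustrative; any value <= index.max_result_window works

def fetch_all_channels():
    """Page through every document with search_after, yielding _source dicts."""
    search_after = None
    while True:
        body = {
            "query": {"query_string": {"query": "*"}},
            "size": PAGE_SIZE,
            "sort": [{"name": "asc"}],
        }
        if search_after is not None:
            # Feed the sort value of the last hit back in for the next page.
            body["search_after"] = search_after
        hits = requests.post(ES_URL, json=body).json()["hits"]["hits"]
        if not hits:
            break
        yield from (hit["_source"] for hit in hits)
        search_after = hits[-1]["sort"]

for channel in fetch_all_channels():
    print(channel["name"])
```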
Update:
|
phew, ok. My last error was caused by a javax <-> jakarta problem; I updated my pom.xml to use jakarta.xml.bind instead of javax.xml.bind. After that, the scroll endpoint works as expected, and in a similar fashion to what Elastic does with search_after. I see that you've kept javax for compatibility with Java 11. Thanks! |
So, based on how search_after works, I don't see a quick way to get this working the way it used to (give me page 16, with 1000 channels per page). I understand the performance and usage expectations that elastic establishes, but either we will have to ask users to drive search_after/scroll themselves, or we rework the regular search.

I just merged the JDK17 branch. @jacomago, can you help resolve the error with the publishing of the docker images?

We could also build the regular search around the scroll API... I understand that it is not recommended for deep searches, but search_after and PIT do not seem like they are going to be fun to implement. Another option would be to part ways with elastic and use mongoDB or another noSQL backend. |
@shroffk I agree that pagination isn't working, and we should add some kind of wrapper around the search_after parameter, or write a new API that uses it.

@kingspride With only ~24,000 channels you can probably get away with just upping the Elasticsearch index limit: on Elasticsearch, set the parameter "index.max_result_window" to your expected maximum number of channels, and on ChannelFinder raise "elasticsearch.query.size" to match (a sketch of the Elasticsearch side follows this comment). (@shroffk we can probably remove this setting and pull the parameter from Elasticsearch instead.)

A couple of thoughts:
|
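A minimal sketch of that Elasticsearch-side change, assuming a local instance at http://localhost:9200 and an illustrative limit of 25,000:

```python
import requests

# Raise the result-window limit on the channelfinder index so that
# from + size requests up to 25,000 succeed. The value is illustrative;
# pick something above your expected maximum channel count.
resp = requests.put(
    "http://localhost:9200/channelfinder/_settings",
    json={"index": {"max_result_window": 25000}},
)
resp.raise_for_status()
print(resp.json())  # {"acknowledged": true} on success
```

Remember to raise `elasticsearch.query.size` in ChannelFinder's application.properties to match.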
well, I rewrote my PHP interface to ChannelFinder yesterday to use the scroll endpoint, which in my eyes does exactly what search_after does. It works. I can now fetch everything in pages as big as index.max_result_window / elasticsearch.query.size allow. (A sketch of the underlying Elasticsearch scroll calls follows this comment.)
yes, but I don't like that. It feels like a bad solution that backfires immediately after someone adds another PV that exceeds the new limit.
good question, actually. The request from the users was to have a web interface to search for PVs and link them to the archiver appliances. Regarding the DB, I'm generally not really a fan of noSQL; it's always kind of difficult to administer. |
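For reference, a minimal sketch of the raw Elasticsearch scroll mechanics this maps to (assuming a local instance at http://localhost:9200; ChannelFinder's own scroll endpoint wraps this, so the URLs below are Elasticsearch's, not ChannelFinder's):

```python
import requests

ES = "http://localhost:9200"

# Open a scroll context that Elasticsearch keeps alive for one minute
# between requests; each response carries a _scroll_id for the next page.
resp = requests.post(
    f"{ES}/channelfinder/_search",
    params={"scroll": "1m"},
    json={"query": {"query_string": {"query": "*"}}, "size": 1000},
).json()

while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["name"])
    # Fetch the next page using the scroll id from the previous response.
    resp = requests.post(
        f"{ES}/_search/scroll",
        json={"scroll": "1m", "scroll_id": resp["_scroll_id"]},
    ).json()
```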
I don't think the issue is memory. We have used SQLite for prototyping and quickly building up applications... but in production environments, concerns about redundancy, backup, etc. have discouraged its adoption. Use of ChannelFinder + recsync results in large peaks of write operations, which were seen as not ideal for it.
I agree, this is not a solution... it just kicks the problem down the road.
Yes, versions 1.x to 3.x were all based on MySQL. Indexing was slow, and retrieval was even less performant. Even after a database expert spent a long while fiddling with and optimizing a whole set of InnoDB parameters, the elastic-based implementation outperformed it out of the box. Maybe the actual solution here is:
|
sounds reasonable. Now I just have to figure out a way to go backwards when using scroll as pagination (one possible approach is sketched below)... |
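One possible approach, offered as a sketch rather than a tested recipe: scroll itself is forward-only, but with sorted queries you can emulate a "previous page" via search_after by flipping the sort direction, anchoring on the first entry of the current page, and reversing the hits client-side (`previous_page` and the localhost URL are illustrative assumptions):

```python
import requests

ES_URL = "http://localhost:9200/channelfinder/_search"

def previous_page(first_name_on_current_page, page_size=1000):
    """Emulate a 'previous page' with search_after: sort descending from
    the first entry of the current page, then reverse the hits so they
    come back in ascending order. Illustrative sketch only."""
    body = {
        "query": {"query_string": {"query": "*"}},
        "size": page_size,
        "sort": [{"name": "desc"}],  # walk backwards through the sort order
        "search_after": [first_name_on_current_page],
    }
    hits = requests.post(ES_URL, json=body).json()["hits"]["hits"]
    hits.reverse()  # restore ascending order for display
    return [hit["_source"] for hit in hits]
```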
As mentioned in the recsync issues (ChannelFinder/recsync#86), I see weird behavior with my ChannelFinder 4.7.1.

According to `/resources/channels/count`, I have 24,753 channels in total. My application.properties has a size limit of 10,000.

So I tried to get the remaining channels after the 10,000th:

`/resources/channels?~size=1&~from=10000`

which resulted in an Error 500. The log says:

I also tried to get all channels at once; it fails as well:

`/resources/channels?~size=24753`

Same error. With `size=10001` it fails again. Is this a problem with elastic? Can I reconfigure something there?

thanks guys for the splendid support, as always!
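For anyone reproducing this, a minimal sketch of the failing request in Python (the base URL is a hypothetical local deployment; adjust to yours):

```python
import requests

# Hypothetical base URL; point this at your own ChannelFinder deployment.
CF = "http://localhost:8080/ChannelFinder"

# Total channel count reported by the service (assumed to be plain text).
total = int(requests.get(f"{CF}/resources/channels/count").text)
print("total channels:", total)

# Requesting past Elasticsearch's 10,000-result window triggers the 500.
resp = requests.get(f"{CF}/resources/channels",
                    params={"~size": 1, "~from": 10000})
print(resp.status_code)  # 500 until index.max_result_window is raised
```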