Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alignStepBoundaries edge cases #98

Closed
jacksontj opened this issue Dec 3, 2018 · 13 comments
Closed

alignStepBoundaries edge cases #98

jacksontj opened this issue Dec 3, 2018 · 13 comments

Comments

@jacksontj
Copy link
Contributor

This method is not very large, but does a few things which IMO are questionable.

  1. reversing start/end
    if the user flipped start/end, it is not the responsibility of trickster to flip it. This is additionally confusing as things will work, but if trickster is removed from the call path all queries this fixes will break. IMO it is not the place of the cache to "correct" user queries.

  2. time.Now() checks
    I think its fine to cut off queries from going into the future, but this doesn't handle the case where both start/end are in the future. IMO in that case this should return an error. In the remaining cases I think its fine to leave it truncating end as the query results will remain unaffected, although I'd rather it didn't (the time.Now() constraint is a cache issue, not a query issue).

  3. default step param
    If the user didn't define one, this is not the place of a caching proxy to correct it. Here we should just return an error and make the user correct their query.

@jranson
Copy link
Member

jranson commented Dec 3, 2018

Thomas, totally agree on these things. Thanks for the feedback! Were you hoping to contribute these things? My plan is to use much of the upcoming holiday to work through Trickster issues, so if you are unable, we should be able to get them ripped out soon in any event.

@jacksontj
Copy link
Contributor Author

I can definitely submit a PR for these. If you are okay with all 3 changes I'll get a PR going with those changes.

@jranson
Copy link
Member

jranson commented Dec 3, 2018

Absolutely. thank you!

jacksontj added a commit to jacksontj/trickster that referenced this issue Dec 3, 2018
If end is past `now` we'll simply truncate to `now` for simplicity,
otherwise we end up caching data we can't possibly have. With this
change we also pick up the case where both start/end are past `now`

Fixes trickstercache#98
jacksontj added a commit to jacksontj/trickster that referenced this issue Dec 3, 2018
This way if any query is sent without a step param we simply return an
error (as it is a required query param).

Fixes trickstercache#98
@jacksontj
Copy link
Contributor Author

PR created: #99

@jranson jranson closed this as completed in 51f8c6c Dec 4, 2018
@jranson
Copy link
Member

jranson commented Dec 4, 2018

Hi Thomas, looks like this change has created a defect wherein the initial request for data (e.g., a trickster cache miss) results in an empty set to the user, while subsequent (cache hits) are successful. Are you able to take a look and identify what went wrong? I will keep digging to see if it's something I did wrong in my tests.

@jranson jranson reopened this Dec 4, 2018
@jacksontj
Copy link
Contributor Author

So you are saying that you get different results for cache miss vs cache hit? I definitely didn't see that before, so that is interesting. Do you have a repro case?

@jacksontj
Copy link
Contributor Author

@jranson I did some quick local testing and can't reproduce the error (I can send a query and I get the same result when its a hit or a miss). If you have a repro case (or test case) I can definitely take a look into it.

@jranson
Copy link
Member

jranson commented Dec 4, 2018

Definitely seeing an issue, might be related to #96 instead of #98. Here is how I detected it (using the current master merged with your recent changes):

Start trickster with a 100% clean cache. Request any url

# curl -g 'http://my.trickster.fqdn.com:9090/api/v1/query_range?query=my_query_here&start=1543952782&end=1543953082&step=60'

(empty response)

Make the same request again:

# curl -g 'http://my.trickster.fqdn.com:9090/api/v1/query_range?query=my_query_here&&start=1543952782&end=1543953082&step=60'

Full Response Received:

{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"failure":"0"},"values":[[1543952760,"25.568510766704293"],[1543952820,"25.631668883048594"],[1543952880,"25.326226925519595"],[1543952940,"25.199999999999815"],[1543953000,"25.129824561403314"],[1543953060,"25.29122807017526"]]},{"metric":{"failure":"1"},"values":[[1543952760,"0.03157905817213394"],[1543952820,"0.031579058172133946"],[1543952880,"0.038596355802260326"],[1543952940,"0.031578947368421054"],[1543953000,"0.03859649122807017"],[1543953060,"0.05614035087719298"]]}]}}

@jranson
Copy link
Member

jranson commented Dec 4, 2018

(accompanying debug logs for the requests above)

time=2018-12-04T20:24:46.387372919Z app=trickster caller=trickster/main.go:63 level=info event="application startup" version=0.1.2
time=2018-12-04T20:24:46.387741923Z app=trickster caller=trickster/memory.go:39 level=info event="memorycache setup"
time=2018-12-04T20:24:46.387792546Z app=trickster caller=trickster/metrics.go:47 level=info event="metrics http endpoint starting" address= port=8082
time=2018-12-04T20:24:46.387959086Z app=trickster caller=trickster/main.go:94 level=info event="proxy http endpoint starting" address= port=9091

time=2018-12-04T20:24:52.192511417Z app=trickster caller=trickster/handlers.go:657 level=info event="starting originRangeProxyHandler" cacheKey=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:52.395010098Z app=trickster caller=trickster/handlers.go:312 level=debug event=prometheusOriginHttpRequest url="http://prometheus:9090/api/v1/query_range?end=1543953060&query=my_query_here&start=1543952760&step=60" duration=185.264946ms
time=2018-12-04T20:24:52.401166014Z app=trickster caller=trickster/handlers.go:312 level=debug event=prometheusOriginHttpRequest url="http://prometheus:9090/api/v1/query?query=my_query_here" duration=191.383478ms
time=2018-12-04T20:24:52.40122951Z app=trickster caller=trickster/memory.go:47 level=debug event="memorycache cache store" key=aade0adf5f2ad6da17b2f94d3cf343ed.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:52.401623809Z app=trickster caller=trickster/handlers.go:854 level=debug event="Compressing Cached Data" cacheKey=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:52.401687323Z app=trickster caller=trickster/memory.go:47 level=debug event="memorycache cache store" key=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:52.401712055Z app=trickster caller=trickster/handlers.go:860 level=debug event=setCacheRecord cacheKey=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73 ttl=21600

time=2018-12-04T20:24:57.490149329Z app=trickster caller=trickster/memory.go:56 level=debug event="memorycache cache retrieve" key=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:57.490215365Z app=trickster caller=trickster/handlers.go:522 level=debug event="Decompressing Cached Data" cacheKey=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73
time=2018-12-04T20:24:57.490392862Z app=trickster caller=trickster/handlers.go:591 level=debug event=deltaRoutineCompleted CacheLookupResult=hit cacheKey=8357e505473d9e936694566d8525dda2.6a8564ed8ab42df78d1368ae1e56cc73 cacheStart=1543952760000 cacheEnd=1543953060000 reqStart=1543952760000 reqEnd=1543953060000 OriginLowerExtents.Start=0 OriginLowerExtents.End=0 OriginUpperExtents.Start=0 OriginUpperExtents.End=0 extent=none

@jacksontj
Copy link
Contributor Author

As a more concrete example, I curl'd http://localhost:18082/api/v1/query?query=prometheus_build_info&time=1543957478.813&_=1543957448463 and I got data back both times. Whats interesting is my debug logs don't have any mention of compression, maybe something to do with that (can't repro the issue with compression on or off in config from my side either :/) ?

@jranson
Copy link
Member

jranson commented Dec 5, 2018

Thomas - it looks like your example that is working is a query, rather than a query_range.

The issue appears to be isolated to query_range requests. I am still digging but it looks related to checks around response StatusCode handling (it is being seen as 0 instead of 200, and thus triggering the errorBody instead of the "good" body at or around handlers.go:724)

Try it with curl -g 'http://localhost:18082/api/v1/query_range?query=go_goroutines&start=1543957478&end=1543957778&step=60'

Also, an additional regression has been introduced with your PR that I have found: providing the same value for start and end to a query_range (not an optimal way to do this, but technically it should work, and does work directly to prometheus), trickster now panics with index out of range error.

@jacksontj
Copy link
Contributor Author

@jranson fix is in #101 -- basically 2 issues, (1) I had changed resp -> r incorrectly, and (2) even if that was fixed the check if the resp was still not set was incorrect (as it would never be true). That PR fixes the issue, and I can no longer reproduce the issue.

@jranson
Copy link
Member

jranson commented Dec 5, 2018

@jacksontj Everything checks out now. This is truly awesome, thank you so much for all of these contributions. Trickster is so much better because of it!

@jranson jranson closed this as completed Dec 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants