New unified approach to retrys and rate limiting #3211
I favor the idea of implementing a unified approach if we can turn it ON/OFF, because it is not mandatory for every user. We also have to consider the rate limiter / retry strategy: some users will want to be much more aggressive than others, so we need at least the option to plug in a new strategy. I favor using resilience4j if it doesn't have a huge learning curve.
Sounds good to me. Maybe instead of "writes" use "executions"?
@makarid yes, having an ON/OFF switch is a must. As for customizing the strategy: for read-only operations, I think the RetryConfig builder gives you all the customization options one can think of, so just accepting your own config in ExchangeSpecification should suffice. Customizing retries for operations that change things (place order, withdraw) is harder, because if one times out, for example, then you have to somehow determine whether the operation succeeded on the exchange before you execute a retry. In the case of placing orders you can check whether an order with a unique client id was created, if the exchange allows such an id to be passed. For ultimate flexibility we could add client decorators:
However, anyone using this would have to be aware that any XChange version upgrade can break compatibility with their decorators.
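The check-before-retry idea for write operations described above can be sketched as follows. This is a hedged illustration only, not XChange API: `TradeClient`, `placeOrder`, and `orderExists` are hypothetical names standing in for a proxy client and an order-lookup call keyed by a unique client order id.

```java
import java.io.IOException;

// Sketch: only retry a write operation once we know the previous attempt
// did NOT succeed on the exchange (looked up by unique client order id).
class SafeOrderRetry {

    interface TradeClient {
        String placeOrder(String clientOrderId) throws IOException;
        boolean orderExists(String clientOrderId) throws IOException;
    }

    static String placeWithSafeRetry(TradeClient client, String clientOrderId, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return client.placeOrder(clientOrderId);
            } catch (IOException e) { // e.g. a timeout: the outcome is unknown
                last = e;
                // Only safe to retry once we know the order was NOT created.
                if (client.orderExists(clientOrderId)) {
                    return clientOrderId; // it went through; retrying would duplicate it
                }
            }
        }
        throw last; // all attempts failed and no order was created
    }
}
```

The same shape applies to withdrawals or any other non-idempotent call, given some way to query whether the previous attempt took effect.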
I am sorry, but I don't totally understand the thing with the decorator. Do you mean that the decorator will add new functionality on top of the existing implementation? For example, placeLimitOrder would be overridden by the newOrder method of your example? About placeNewOrder, I have implemented a solution for timeouts or an exception. Here is my solution:
Do you mean that the newOrder method (your method) will have an implementation like this, in order to prevent any errors from the exchange?
@timmolter hmm, that doesn't sound self-explanatory to me; one can execute a read operation
Are you familiar with the decorator design pattern? In my post I proposed to decorate client interfaces that represent direct API calls to the underlying exchange.
Yes, my bad, I was talking about the Raw implementation. I have corrected it now. If you use the decorator approach, do we need to implement the new functionality on every raw call, or will we implement some standard decorators and then apply the one we want to the raw call?
your example is still wrong as
OK, then forget the example. As I saw in the link about the decorator pattern, we need to create a new overriding method for every *Raw class method. Is that right?
My decorator proposal has nothing to do with *Raw classes. I proposed to decorate the proxy clients; *Raw classes invoke methods of those clients, and decorating proxy clients does not impose any changes to *Raw classes.
I understand it now. Thanks a lot for clarifying. I find it a very good idea.
in my example, your decorator for each method in the Binance interface must implement at least a pass-through like:
or you can use Java Proxies if you don't want to write every pass-through by hand. These are common implementation practices for the decorator pattern in Java.
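A minimal shape of such a pass-through decorator might look like this. `BinanceClient` here is a hypothetical two-method cut-down of a real proxy client interface, just to show the structure; the rate-limiting call site is only marked with a comment.

```java
// Hypothetical cut-down client interface; real proxy interfaces have many more methods.
interface BinanceClient {
    String newOrder(String symbol, String side, String quantity);
    String account();
}

// A decorator implements the same interface and, for every method,
// at minimum delegates to the wrapped client (a pass-through).
class RateLimitedBinanceClient implements BinanceClient {
    private final BinanceClient delegate;

    RateLimitedBinanceClient(BinanceClient delegate) {
        this.delegate = delegate;
    }

    @Override
    public String newOrder(String symbol, String side, String quantity) {
        // a real decorator would acquire a rate limiter permit here before delegating
        return delegate.newOrder(symbol, side, quantity);
    }

    @Override
    public String account() {
        return delegate.account();
    }
}
```

Alternatively, `java.lang.reflect.Proxy` can generate such pass-throughs at runtime, so you do not have to write one method per call.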
A unified approach for API rate limiting sounds very good. However, with the above approach we cannot use the information returned by the API for rate limiting, which is much more precise and reliable in some cases. For example, Bitmex gives the "x-ratelimit-limit", "x-ratelimit-remaining", and "x-ratelimit-reset" response headers. It is very clear and accurate to implement API rate limiting based on this information, while a limit implemented only locally may cause problems.
@declan94 thanks for your input. It's true that the direction I'm exploring in #3231 does not cover limiting based on information in response headers; I'll take it into consideration. I think a case like Bitmex's would be better handled by a custom interceptor for all methods, which updates the state of the RateLimiter after each request.
@declan94 I've been thinking about those Bitmex headers and I'm starting to think interpreting them is not trivial and might lead to improper rate limiting in multi-threaded applications. The current implementation is definitely not thread safe, nor does it take network latency into account. Here is an example of a scenario where a response to an earlier request may override rate limit information supplied by a response to a later request: I think that implementing a rate limiter that just allows 60 requests per minute per account (settings can be overridden if you have an individual limit) and ignoring those response headers is a much simpler and less error-prone approach. The only downside is at most a few percent of throughput.
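The simpler local approach described above maps directly onto resilience4j. A sketch, assuming resilience4j 1.x and that 60 requests per minute is the configured account limit; `fetchTicker` stands in for any exchange call:

```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;
import java.util.function.Supplier;

class LocalRateLimitSketch {
    static Supplier<String> guard(Supplier<String> fetchTicker) {
        // Allow 60 calls per one-minute refresh period; a caller blocks
        // up to 30 s waiting for a free permit before failing.
        RateLimiterConfig config = RateLimiterConfig.custom()
                .limitForPeriod(60)
                .limitRefreshPeriod(Duration.ofMinutes(1))
                .timeoutDuration(Duration.ofSeconds(30))
                .build();
        RateLimiter limiter = RateLimiter.of("perAccountLimiter", config);
        return RateLimiter.decorateSupplier(limiter, fetchTicker);
    }
}
```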
After doing some initial implementation I came to a different API than I initially proposed:

Add options to exchange specifications

This way you can make sure a module uses retries or rate limiting only if it was enhanced with resilience4j functionality.

In XChange 4.x.x, if a module already had some retry / rate limit strategy before it was rewritten to resilience4j, then this will be set to true; if not, then it will be false. In XChange 5.x.x I think this should be set to true by default in all modules.

Annotations on client interfaces

This is how we would like to apply retry and rate limiting functions: by just annotating client interfaces.

Customising retry behaviour

We can customize retry strategies by using the resilience4j registry mechanism. You can override a retry configuration for a given call, let's say for the above
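The registry-based customization mentioned above might look roughly like this with resilience4j. This is a sketch; the configuration and retry names ("accountInfo", "getAccountInfo") are assumed for illustration and are not the final XChange API.

```java
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import java.io.IOException;
import java.time.Duration;

class RetryOverrideSketch {
    static Retry customizedRetry() {
        RetryRegistry registry = RetryRegistry.ofDefaults();

        // Register an overriding configuration under a name of our choosing.
        RetryConfig custom = RetryConfig.custom()
                .maxAttempts(5)
                .waitDuration(Duration.ofMillis(500))
                .retryOnException(e -> e instanceof IOException) // retry only I/O failures
                .build();
        registry.addConfiguration("accountInfo", custom);

        // A Retry created with that config name picks up the override.
        return registry.retry("getAccountInfo", "accountInfo");
    }
}
```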
Updated my PR. Resiliency features are now configured in *Raw classes; previously I did it using annotations on proxy interfaces, but my new annotation scheme was not accepted by the resilience4j project, so I removed that way of doing things. Documentation on how this would be implemented in modules in its current shape: https://github.com/knowm/XChange/wiki/Implementing-resiliency-features
Hello, I'm just wondering if there is any path forward on this? My project needs a more robust rate limiting approach than what I have and I'm looking for a proper solution. Is there an ETA on when this would make it into a release? Thanks.
@nibarulabs, @walec51 has added these rate limiting features into Binance. There isn't any immediate plan to add this feature set into all exchanges, so if you are interested in seeing this for a specific exchange the best way is to submit a PR with those changes.
@earce Thanks for the swift reply. My project is looking more and more like it's gonna need this on exchanges other than Binance (specifically Coinbase Pro), so I will play around with a fork today and if it goes well, I will submit a PR. Thanks again for all the work you guys have done. 👏 (@walec51 thank you too for the great work!)
@nibarulabs CoinbasePro has some baked-in rate limiting but it will probably be better to move over to this resilience framework. @walec51 has been the one really pushing this forward :) but I have started the work on applying this to the streaming framework.
@earce I just looked through the CB Pro code and nothing jumped out at me for the baked-in rate limiting. Is there a specific class/spot that you know of that handles this? (If you don't have the time, np, I'll keep looking)
https://github.com/knowm/XChange/blob/develop/xchange-coinbasepro/src/main/java/org/knowm/xchange/coinbasepro/service/CoinbaseProMarketDataServiceRaw.java#L99, I do not recommend replicating this. There are a few issues with it even though it mostly works. We eventually want to port this to the new framework.
Got it, I wasn't looking at the develop branch. Thank you. 👍
@nibarulabs this document describes the new direction we are taking for rate limiting and other resiliency features: https://github.com/knowm/XChange/wiki/Implementing-resiliency-features
@walec51 Yah, I saw that earlier and am following that, thx. For CoinbasePro there is also an endpoint
that looks like it has been removed from their API, possibly a while ago; I did not see anything regarding it in their changelog. Is it OK to just remove it? (No other choice, really.)
Yea, that particular endpoint doesn't seem to be active. I would try to query it, and if it doesn't work you can remove it. Also feel free to join the Discord, it's easier to chat on there.
@walec51 question for you: for the Bitmex-style rate limiting, what do you mean by creating a custom interceptor?
@earce for what do you need a custom interceptor? What do you need that cannot be achieved with plain resilience4j decorators?
@earce do you want to drain the rate limiter based on a certain response / exception? If so, then check these examples:
@walec51 the Bitmex example in this case sends back a header that tells us how long we need to wait before we retry. Two edge cases can present themselves here. First: the rate limiter you are using has, say, an interval of 1 minute and the Bitmex API tells us to back off for 2; we will wait the one minute, start firing requests, and get banned even longer, potentially getting ourselves blocked out. The second scenario is an app restart, followed by the same situation where we are not blocking requests for as long as the API tells us, but instead for whatever the duration of the rate limiter is.
Remember that XChange can be used in a multi-threaded app. Because of this you should rate limit proactively, even before Bitmex says you made too many calls, so configure rate limiters as Bitmex suggests: https://www.bitmex.com/app/restAPI#Limits
If despite this you receive a 429 response, then just drain the rate limiter using this option:
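The drain-on-429 option might be wired up like this. A sketch, assuming resilience4j 1.6+ (where `drainPermissionsOnResult` was added, taking a vavr `Either` of throwable or result) and rescu's `HttpStatusIOException` as the exception carrying the HTTP status:

```java
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;
import si.mazi.rescu.HttpStatusIOException;

class DrainOn429Sketch {
    static RateLimiterConfig config() {
        return RateLimiterConfig.custom()
                .limitForPeriod(60)
                .limitRefreshPeriod(Duration.ofMinutes(1))
                // Empty the current permit cycle whenever the exchange answers 429,
                // so all threads back off until the next refresh period.
                .drainPermissionsOnResult(either -> either.isLeft()
                        && either.getLeft() instanceof HttpStatusIOException
                        && ((HttpStatusIOException) either.getLeft()).getHttpStatusCode() == 429)
                .build();
    }
}
```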
This doesn't really cover all scenarios. While proactive limiting is a must, reactive limiting is important to have implemented. Process restarts won't be protected by this rate limiter, and bans that are longer than whatever your cycle is programmed to won't be covered either. I want to see if there is a good way to dynamically adjust the drain period.
If the process restarts and causes a 429 error, then it will drain the cycle. You will not get banned for this, and afterwards you should not see any 429 errors. Interpreting Bitmex headers in a multi-threaded environment is not an easy task: HTTP requests may have variable durations, and a response to request A may come after a response to request B. Besides having complex code to maintain, you will not get much out of it, in my opinion.
I don't entirely agree with this. I would imagine that if the process restarts after a 429 is in place, you will potentially get further penalized, because you would
You may end up with a permanent or much longer ban this way. I don't think interpreting Bitmex headers would be any more than inspecting the throwable you get when you call
The scenario you mentioned would only happen if both of the conditions below were met:
I don't think that's the case, as Bitmex has a 60 s cycle and a 30 s ban for exceeding it. If Bitmex gives you a long ban, a couple of hours or more, then I don't think this is something a library like XChange should handle; that requires some human investigation. As for a more persistent implementation, as I said: HTTP requests may have variable durations, and a response to request A may come after a response to request B. So just applying the data from the last response (which may not be from the last request but from an earlier, lagging one) will not be adequate in a multi-threaded environment and is prone to race conditions. There is an old Unix saying: worse is better if it results in a much simpler system. I think we should try my simpler approach and go with a more complex implementation only if it does not perform well in production.
But if you want to do a more complex solution, then show some pseudo code that solves this problem:
I only say this because it does: https://www.bitmex.com/app/restAPI#Request-Rate-Limits. I know from using this personally that other exchanges do this type of thing as well (an exponential backoff ban). The reason they include such info in their headers is exactly this: to tell you, the user, how long to back off for. I understand your point about the multi-threaded race conditions, but having nothing in place at all means there is no protection. Personally I think that if an API gives you details about a rate limit ban, a library like XChange, which is now handling rate limiting, should also honor them; it is an exchange-specific modification of behavior that is already part of XChange.

As an example, take threads t1 and t2. If t1 gets a ban for 2 minutes from now, I would drain permissions until now + 2m. If, during the window between t1 submitting a request, receiving a ban, and draining permissions, t2 is able to submit a subsequent request which receives a ban for 3m, then we would update the drain period to now + 3m. If alternatively t2 was actually submitted before t1 and received an older ban for now + 1m, we would compare when the drain periods end before updating, always taking the further-out drain period. This way we would not be subject to anything like network latency, which would risk not having the latest ban in place due to a slow request. Before a thread was allowed to submit any more requests it would always do an additional check to make sure there are available permissions and that nothing was changed under its feet. I understand this grows the app complexity a bit (more so the resilience logic itself), but I think it offers more protection.
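The "always take the further-out drain period" rule above can be captured with a single atomic maximum, regardless of which thread's response arrives last. A sketch with a hypothetical helper class, not XChange code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Tracks the furthest-out ban expiry reported by any response. Concurrent
// reports race safely because accumulateAndGet keeps the maximum, so a slow,
// stale response can never shorten a ban recorded by a later one.
class BanTracker {
    private final AtomicLong banUntilEpochMs = new AtomicLong(0L);

    // Called when a response header reports a ban ending at banUntilMs.
    void reportBan(long banUntilMs) {
        banUntilEpochMs.accumulateAndGet(banUntilMs, Math::max);
    }

    // Threads check this before submitting a request.
    boolean isBanned(long nowMs) {
        return nowMs < banUntilEpochMs.get();
    }
}
```

Persisting the single `banUntilEpochMs` value across restarts would also address the restart scenario discussed above.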
@earce come to think of it, this type of functionality is more of a circuit breaker than a rate limiter. For such long time scales, like a 30 s or 2 hour ban, there is no point in sleeping on a thread; we should throw an exception indicating that the circuit is broken, without calling the exchange's API. Currently resilience4j supports circuit breaking for a fixed amount of time if a certain error (or number of errors) happens within a set window. We would have to extend resilience4j functionality so that the circuit breaker could determine dynamically how long it should remain broken, based on the responses / exceptions it receives. I'll submit a feature proposal on the weekend.
@walec51 this sounds great, appreciate you taking the time to reconsider/revisit this. I think what you are describing does sound like the correct implementation of this protection.
@walec51 have you had any chance to look into this?
@earce sorry for the delay, I've posted a feature request to resilience4j: resilience4j/resilience4j#1387
There are two major features that have been a pain point for many users of our library and that have been handled inconsistently across our modules: API call retries and rate limiting.
Most modules do nothing in this matter; a few have implemented their own custom solutions, which leads to very inconsistent behavior across the library. First things first, a decision needs to be made:
Should those features be in the scope of this library or not?
One could argue that you can implement them in your application code and limit the scope of this library to just unifying exchange APIs. However, there is a major problem with this approach:
To solve those two problems optimally you have to do it on the level of direct API calls done by *Raw classes.
This is because our generic API methods, for example AccountService#getAccountInfo(), can do more than one call to the exchange's API to gather all the data needed to construct the desired DTO. Doing retries around AccountService#getAccountInfo() is suboptimal because if the last underlying call to the exchange fails, then you have to repeat all of the required calls. Doing rate limiting in a generic way is impossible because you do not know how many underlying calls to the exchange are made, nor how they are counted by the call rate limiting policy of a given exchange.

I therefore propose the following changes to our code base to add support for these features in a unified way:
Use a well established and documented library: resilience4j
All the solutions for rate limiting and retries in our modules are hacky or oversimplified. Let's not reinvent the wheel here; there are good small libraries for this in the resilience4j project, namely resilience4j-retry and resilience4j-ratelimiter. This project also offers metrics and circuit breakers, which might be of interest to us in the future.

Add options to exchange specifications
Important: the proposed API has changed, see: #3211 (comment)
These options should preserve backward compatibility for most modules, as most of us that use this library have some suboptimal solutions for rate limiting and retries in our application code. Maybe someday in XChange 5.0.0 we'll enable these features by default.
We can also allow custom retry policies:
As for rate limiting we just add:
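The snippets that originally followed these two points are not preserved in this copy, but the shape of the proposal might look like the sketch below. Every name here (`ResilienceOptions`, `retryEnabled`, `rateLimiterEnabled`, `retryConfig`) is a hypothetical illustration of per-specification options, not the final XChange API; see the linked comment above for the revised design.

```java
import io.github.resilience4j.retry.RetryConfig;
import java.time.Duration;

// Hypothetical options object that an ExchangeSpecification could carry.
class ResilienceOptions {
    boolean retryEnabled = false;        // off by default in 4.x for compatibility
    boolean rateLimiterEnabled = false;
    RetryConfig retryConfig;             // optional custom retry policy

    static ResilienceOptions example() {
        ResilienceOptions options = new ResilienceOptions();
        options.retryEnabled = true;
        options.rateLimiterEnabled = true;
        options.retryConfig = RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofSeconds(1))
                .build();
        return options;
    }
}
```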
Provide best practices for implementers
To be done in the first PR if we reach a consensus on how this feature should look.