Raising of Traffic Threshold NFRs specified in the CDS #541
Comments
@jimbasiq What's the actual customer use case requiring frequent refresh with complete re-load?
Hi @dpostnikov,
It is also worth mentioning that the current Web Scraping Connections we provide to our Partners/Customers are able to support hundreds of thousands of refreshes in a 24-hour period. It is hard to encourage our Partners to move over to CDR Open Banking connections if doing so means a severe degradation of service capacity.
Adatree supports this request. We've detailed a similar experience in #534. Asynchronous collection of data is an often-used pattern with obvious benefits. As CDR grows, more users mean more requests. Competing priorities will emerge if all refreshes must occur during a customer-present session. Consumer-facing apps typically have a high-traffic period, so ADRs and data holders can expect huge spikes in traffic during those periods if customer present is the only real option (which is the case right now). Asynchronous collection avoids this by spreading load across a sensible timeframe. Real-time collection is not required in all cases. It also allows for a cached fallback when a data holder is unavailable during a customer-present session: if the ADH is not available for a real-time call, the latest data presented to the consumer is not stale to the point of being unusable (e.g. the balance or transactions list might have been fetched an hour ago rather than 24 hours ago). All of this results in better consumer outcomes regardless of use case, by providing consumers with a more resilient CDR ecosystem.
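As a rough illustration of the load-spreading point, the sketch below compares the average request rate needed when the same refresh workload is squeezed into a peak window versus spread across a whole day. The consumer count, calls per refresh and two-hour peak window are hypothetical figures chosen for illustration, not numbers from this thread.

```python
import math

SECONDS_PER_DAY = 86_400


def average_tps(consumers: int, calls_per_refresh: int, window_seconds: int) -> int:
    """Average requests per second needed to refresh every consumer once
    within window_seconds, rounded up to a whole request."""
    total_calls = consumers * calls_per_refresh
    return math.ceil(total_calls / window_seconds)


# Hypothetical workload: 100,000 consumers, 10 calls per refresh.
peak_only = average_tps(100_000, 10, 2 * 60 * 60)       # squeezed into a 2-hour peak
spread_out = average_tps(100_000, 10, SECONDS_PER_DAY)  # spread across 24 hours
print(peak_only, spread_out)  # prints: 139 12
```

This only models the average rate within each window; real traffic is bursty, so actual peaks would be higher still.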
There are a number of issues raised in this thread that are worth breaking out.
NFR Suitability
The core focus of the original thread is "raising" the NFR thresholds. While this may seem like the right approach, the reality is that it likely isn't. The method of describing the threshold seems inappropriate and penalises both small data holders and successful ADRs alike. Biza.io raised this in DP208 and instead suggested that the NFR be bound to the number of active arrangements at a particular holder. Biza.io also requested usage data so that an evidence-based decision could be made. Holders would then gain the benefit of being able to correlate real usage with requirements, integrate it into their capacity management planning and therefore design solutions that correlate 1:1. This would, by and large, resolve the upper bound problem because the upper bound would be relative to arrangement count. An ADR would get guaranteed throughput per arrangement; even if the TPS per arrangement was lower, overall parallelisation could compensate. Additionally, Biza.io outlined a number of implementation patterns we had observed Holders implementing, to give the DSB and the broader industry insight into the challenges Holders face when weighing up cost vs. capability. As a nascent ecosystem, the CDR has very low utilisation, which makes it quite difficult to justify huge capital expenditure at the smaller end of town. Despite these suggestions, considerable opposition, and the participation of a number of ADRs (RAB, Xero, Intuit), though not those involved in this thread, the DSB bound the NFRs "as-is" with immediate effect. It would appear that the ADRs involved in this thread are now realising the same challenges others on both the Holder and Recipient side identified.
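A minimal sketch of the arrangement-based idea, assuming the threshold is expressed as a daily call allowance per active arrangement. The proposal is only described in outline here, so `calls_per_arrangement_per_day` and the figures below are hypothetical, not values proposed by Biza.io or the DSB.

```python
import math

SECONDS_PER_DAY = 86_400


def guaranteed_tps(active_arrangements: int, calls_per_arrangement_per_day: int) -> float:
    """Unattended throughput a holder would owe one ADR if the NFR scaled with
    that ADR's active arrangements at the holder (hypothetical model)."""
    return active_arrangements * calls_per_arrangement_per_day / SECONDS_PER_DAY


# The same hypothetical per-arrangement allowance applied to a small holder
# and to a major bank.
small_holder = guaranteed_tps(1_000, 24)
major_bank = guaranteed_tps(1_000_000, 24)
print(f"small holder: {small_holder:.3f} TPS, major bank: {major_bank:.1f} TPS")
```

Because the obligation scales linearly with arrangements, a holder with a thousand times fewer arrangements carries a thousand times smaller obligation, which is the point of binding the NFR to arrangement count rather than to a flat TPS figure.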
As a result of this decision, organisations have now made architectural decisions on this basis, and consequently any alteration of the defined NFRs is now likely to be a long-dated FDO; it would be inappropriate to do otherwise.
Implementation Suitability
There is a reference in the original thread to a "data refresh job". This seems to imply a batch process which essentially resets a complete data set on a daily basis. In essence, a synchronous interface (a non-batch API) is being used to complete asynchronous activity. Not only is this architecturally unsuitable, possibly a hangover from applying existing collection approaches (i.e. screen scraping) to the CDR, but I would also question its appropriateness with respect to recipients' data minimisation obligations. Put another way, why is a full batch run being done on all endpoints rather than requesting (and keeping hot) only the data which has been requested by the Consumer themselves? Nonetheless, even assuming there are justified reasons for obtaining all of the data, it seems inappropriate to be doing this even daily. I believe this is the context for the question @dpostnikov posed. Additionally, the scenario used for comparison was described as "perfect" when that seems like a stretch. The calculations below take the use case given and assume unattended behaviour (i.e. the Consumer isn't waiting around).
Calculations
I'll stick with
First Run
This is the absolute worst-case scenario because it involves a completely new Consumer coming onboard with zero prior data and retrieving every detail.
1 x Access Token
Result: Taking the OP's figure of a 50 TPS limit, there is a total of 4,320,000 API calls available in a day.
n=3: 4,320,000 / (4 + 6) = 432,000 sessions per day
🥳 Huzzah, the numbers align with the OP, but what's important here is that this represents the absolute worst case of doing a full load of all data in the background every day. I disagree with the statement that the "real scenario could be 5-10 times worse", because separate partners should have separate software products, but maybe I'm not following something.
Incremental Detail Calculation
Let's now assume we want to maintain the same level of detail but optimise, and that we have all detail scopes. We don't need to call list accounts because list balances will give us account identifiers and account details has the same detail.
1 x Access Token
Result:
n=3: 4,320,000 / (3 + 6) = 480,000 sessions per day
No Detail Calculation
Let's assume that, after the first run or because we haven't been provided detail scopes, we have no detail at all. This appears to be most aligned with a pure PFM use case, especially if the Recipient has aligned its use of the PRD Data. I've left the list of accounts in here still, but this could be stripped further or called less than once a day, as the list of balances contains account identifiers.
1 x Access Token
Result:
n=3: 4,320,000 / (3 + 3) = 720,000 sessions per day
Eventually Consistent Detail
The reality is that, of the total sample set, very few Consumers will actively engage with an app every day. If they do, it's because of a prompt driven by the value proposition, and possibly this can be enabled by the CDR in a different way (e.g. a shared signal of account changes). On this basis, being eventually consistent, especially in an unattended scenario, seems like it should be appropriate.
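The session arithmetic for these three scenarios can be reproduced in a few lines. The calls-per-session figures are taken directly from the comment for n=3 accounts (the per-endpoint lists behind them did not survive in this copy); the daily budget is 50 TPS sustained for all 86,400 seconds of a day.

```python
DAILY_CALL_BUDGET = 50 * 86_400  # 50 TPS every second of a day = 4,320,000 calls

# Calls per refresh session for n=3 accounts, as given in the comment.
SCENARIOS = {
    "first run": 4 + 6,
    "incremental detail": 3 + 6,
    "no detail": 3 + 3,
}


def sessions_per_day(calls_per_session: float) -> int:
    """Whole refresh sessions that fit in the daily call budget."""
    return int(DAILY_CALL_BUDGET // calls_per_session)


for name, calls in SCENARIOS.items():
    print(f"{name}: {sessions_per_day(calls):,} sessions per day")
```

Floor division keeps the result to whole sessions; a partially completed session does not count.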
On this basis I'll hypothesise, after initial load, the following:
1 x Access Token
Result:
n=3: 4,320,000 / (3 + 3) = 720,000 sessions per day
Eventually Consistent No Detail
Same concept as above, but this time we don't need detail updated continuously. Realistically, updating detail could occur as a Consumer-present call. On this basis I'll hypothesise, after initial load, the following:
1 x Access Token
Result:
n=3: 4,320,000 / (2.3 + 1.98) = 1,009,345 sessions per day
Suitability
Without real usage data from the ecosystem it is difficult to assess what is "not enough", but suffice to say some basic optimisation appears to more than double the upper bound. In 2020 Frollo had 100,000 customers and represented 90% of the utilisation. It's unclear whether demand has grown 10x in 2 years, hence the desire for usage data to inform the decision.
Alternatives
To me the NFR discussion seems to be more symptomatic of a broader set of problems including:
I think overall the concern I have with simply increasing the NFRs is that it patches over features of the CDR that aren't yet present. This also seems to be combining with the need for recipients, many of which have come from a batch-based bank feed or cache-based screen scraping environment, to change mindset and build solutions which align with best practices in a CDR context rather than wedging CDR into existing approaches. Put another way, focusing on feature capability to resolve the underlying problem seems to offer a better power-to-weight ratio than forcing ever-higher performance requirements that will simply be revisited over and over again.
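The claim that basic optimisation roughly doubles the upper bound can be checked against the comment's own figures: the full first-run load (4 + 6 calls at n=3) versus the eventually consistent, no-detail profile (2.3 + 1.98 calls per session). The fractional counts are the commenter's hypothesised average call frequencies; the breakdown behind them did not survive in this copy.

```python
DAILY_CALL_BUDGET = 50 * 86_400  # 4,320,000 calls per day at 50 TPS

FIRST_RUN_CALLS = 4 + 6                # full load, n=3
EVENTUAL_NO_DETAIL_CALLS = 2.3 + 1.98  # averaged calls per session, n=3


def sessions_per_day(calls_per_session: float) -> int:
    """Whole refresh sessions that fit in the daily call budget."""
    return int(DAILY_CALL_BUDGET // calls_per_session)


baseline = sessions_per_day(FIRST_RUN_CALLS)
optimised = sessions_per_day(EVENTUAL_NO_DETAIL_CALLS)
print(baseline, optimised, round(optimised / baseline, 2))
```

The ratio comes out at roughly 2.3, so "double" is, if anything, slightly conservative.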
"PFM" can be designed in many ways, some more efficient than others. Unnecessary calls aside, I agree there is definitely a scalability issue with the current design (of both the CDR framework and, as a result, data recipient designs). The way to solve this problem is not to get a bigger hammer or build a bigger pipe (e.g. replicate a batch design via APIs or increase thresholds). A secure event notification mechanism is missing and should probably be prioritised to solve for these use cases.
@perlboy Absolutely fair point on our lack of participation on this topic before now. An arrangement-based approach makes sense, as it avoids requiring all implementers to provision excess capacity "just in case" when the practical reality is that throughput thresholds will only really be tested with the majors. A reasonable FDO is also not something we'd complain about given this feedback comes after the fact, but it is feedback based on metrics, not theory, so I would hope it is considered valuable even at this stage.
Just to chime in with another dimension: that of the DH customer profile. Not all DHs are equal, even within a single industry or industry vertical. Some banks focus on lending rather than transactional accounts. Loan accounts have low transaction volumes, typically a monthly interest charge and perhaps one or two monthly payments. Sometimes there might be redraws or deposits, but these aren't typical. A transaction account, by contrast, might have 50 or more times this volume. Profiling DHs ahead of imposing NFRs might be beneficial. The activity around non-bank lender participation is a case in point: personal loans would fit into the low-volatility category. When combined with comparatively low customer volumes, it seems inappropriate to impose the same NFR thresholds on both categories of DH.
Hi All, It is great to see we seem to have a general consensus that the current traffic rates are inadequate. I look forward to discussing with you all on Wednesday what would be adequate and how the rate NFRs could differ depending on industry vertical, DH size (members, loan book, other) and other ideas. With my ADR hat on I'd like to see a fair usage rate; with my DH hat on I'd like to not cripple the little guys with unfair obligations. Let's not forget the lack of penalties and the "best efforts" caveat. Both could damage businesses IMO.
A Decision Proposal is required. #92 DSB Item - Reassess Non Functional Requirements has been added to the DSB's future-plan backlog.
Closing as this issue will be considered as a Decision Proposal, see comment above. |
Description
Basiq would like to raise concerns and propose a review and uplift of the currently specified Traffic Thresholds NFRs for CDR Data Holders. We believe the current limits are too low to support a data recipient serving the Australian consumer.
Can I please propose this topic as a priority for Maintenance Iteration 13, starting in a couple of weeks?
Area Affected
To provide you with some detail to hopefully validate this as a worthwhile topic, our primary concern is rate limits for refreshing of data for all consumers for a given institution (Data Holder) for a given software product (Data Recipient):
We have a specified limit for unattended traffic per software product ID, but it only applies outside business hours; during business hours only "best effort" is expected, which means rates could be even lower. A Data Holder we have been working with has already confirmed that we were hitting their limit for private endpoints, which is 50 TPS, and that they will continue to use this limit unless the CDS specifies a higher throughput.
In order to understand the rate limit and current limitations, here is an example of all requests we are sending within one data refresh job for one consumer:
GET access token (sent once - we are not 100% sure if this counts towards the rate limit)
GET the list of accounts
GET the list of balances
GET account details - should be targeted only if account.detail scope is present (number of requests equals the number of accounts)
GET transactions - should be targeted only if transaction.detail scope is present (number of requests is greater or equal to the number of accounts - in order to simplify let’s say that it is equal)
GET customer details (sent once)
This means that if we have all required scopes, we are sending at least 2*(n+2) requests, where n equals the number of accounts.
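The request list above can be expressed as a small function that builds one job's requests from the granted scopes, which makes the 2*(n+2) count easy to check. Scope labels use the issue's shorthand ("account.detail", "transaction.detail"), not full CDS scope identifiers, and the one-transactions-call-per-account simplification is the same one made in the list.

```python
def refresh_requests(n_accounts: int, scopes: set[str]) -> list[str]:
    """Requests one refresh job sends for a consumer with n_accounts,
    following the steps listed above."""
    requests = ["access token", "list accounts", "list balances"]
    if "account.detail" in scopes:
        requests += ["account detail"] * n_accounts
    if "transaction.detail" in scopes:
        # Simplification from the list: one transactions call per account.
        requests += ["transactions"] * n_accounts
    requests.append("customer details")  # sent once per job
    return requests


all_scopes = {"account.detail", "transaction.detail"}
for n in (3, 5):
    count = len(refresh_requests(n, all_scopes))
    assert count == 2 * (n + 2)  # matches the 2*(n+2) formula
    print(f"n={n}: {count} requests")
```

Without either detail scope the job drops to a constant 4 requests regardless of account count.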
Now let’s imagine a perfect scenario where we send 50 requests per second, every second of the day, and see how many jobs we could complete depending on the average number of accounts per job:
n=3; 86400 * 50 / (2 * (3 + 2)) = 432000 jobs in total during a day
n=5; 86400 * 50 / (2 * (5 + 2)) = 308571 jobs in total during a day
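The same perfect-scenario calculation as a function, so other TPS limits or account counts can be plugged in. Integer division rounds down to whole jobs, which gives 432,000 for n=3 and 308,571 for n=5.

```python
SECONDS_PER_DAY = 86_400


def jobs_per_day(tps: int, n_accounts: int) -> int:
    """Refresh jobs per day if every second of the day is used at `tps`,
    with 2*(n+2) requests per job."""
    requests_per_job = 2 * (n_accounts + 2)
    return SECONDS_PER_DAY * tps // requests_per_job


for n in (3, 5):
    print(f"n={n}: {jobs_per_day(50, n):,} jobs per day")
```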
300-430k connections per software product for any of the big 4 banks is not enough, and that is the perfect scenario. Realistically, the real scenario could be 5-10 times worse; we see this as a serious limitation for several of our Partners.
Change Proposed
The largest Data Holder in Australia has just under 18m customers. It is very feasible that a successful Australian Fintech could attract half of Australian consumers, which would mean 9m consumers.
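To put the 9m figure in the same terms as the 50 TPS limit, this inverts the earlier calculation: the sustained rate one software product would need to refresh 9m consumers once a day, under the same perfect-scenario assumption of 2*(n+2) requests per job spread evenly over 24 hours. It is a back-of-envelope figure, not a proposed NFR value.

```python
import math

SECONDS_PER_DAY = 86_400


def required_tps(consumers: int, n_accounts: int) -> int:
    """Sustained TPS needed to refresh every consumer once a day with
    2*(n+2) requests each, assuming perfectly even load over 86,400 s."""
    total_requests = consumers * 2 * (n_accounts + 2)
    return math.ceil(total_requests / SECONDS_PER_DAY)


for n in (3, 5):
    print(f"n={n}: {required_tps(9_000_000, n):,} TPS")
```

That works out to roughly 1,000-1,500 TPS, or about 21 to 29 times the current 50 TPS limit, before allowing for any of the real-world inefficiencies discussed above.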
Rounding up to 10m to allow some growth (about 33 times the roughly 300k jobs per day that the current TPS limit allows), I have 2 proposals:
Hopefully the issue is clear, please let me know if not and I can elaborate further.