Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenation Rollups order maybe none Deterministic - Consider moving order by from LREngine.RollupSummaryField to LREngine.Context #239

Closed
jondavis9898 opened this issue Aug 29, 2015 · 26 comments

Comments

@jondavis9898
Copy link
Contributor

Currently, order by clauses are handled by RollupSummaryFields in a "Then By" approach. However, within DLRS as of PR #223, there would never be a situation where any two RollupSummaryFields within the same context would have a different order by clause except for when the Lookup Rollup Summary itself does not contain an order by field/clause.

The documentation & help text for Order By field states that when no order by is specified, it will default to the FieldToAggregate. However, there are cases where this behavior does not occur and is a result of the "Then By" approach on a shared context. For example, if two rollups are defined and neither specify an order by, the resulting SOQL will have an order by of "FieldToAggregate1, FieldToAggregate2." This results in the second rollup not being ordered by it's field to aggregate as the documentation states it will be. For an example, see jondavis9898@84caee2 and the comments by the asserts on jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R868, jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R874 and jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R880.

In the case where multiple rollups each contain the same specific order by, they will share a context. Currently, within LREngine there is logic that tracks fields included in order by and discards duplicates (https://github.com/afawcett/declarative-lookup-rollup-summaries/blob/master/rolluptool/src/classes/LREngine.cls#L168). Since a context in this situation will never have different order by fields, this processing is unnecessary.

In reviewing the LRE project itself, it appears that the order by enhancements within DLRS have not made it in to that project. Therefore, if we make changes within DLRS, I believe it would only effect DLRS internally and anyone that uses DLRS source instead of DLRS package.

Given the above, I have the following recommendations and would appreciate feedback/thoughts:

  1. Move the "order by" to LREngine.Context and handle in the same manner that relationship criteria (where clause) is handled. Each context has a unique order by and all rollups within the context subscribe to that order by. Making this change would eliminate the extra tracking/processing to detect duplicates, etc. thereby increasing performance of the package.

  2. When no order by is specified, do not generate any order by for the context. This results in a non-deterministic query that is not effected by any rollup fields included in the context. While the current approach is non-deterministic, it leads to undesirable side effects such as described above with "default ordering." This change could also include not including the lookup field in the order by clause (https://github.com/afawcett/declarative-lookup-rollup-summaries/blob/master/rolluptool/src/classes/LREngine.cls#L131) as the only reason this is included currently is to assist in the "merging" of master record id's during the processing of rollup queryResults. ** Edit ** After some additional thinking on this last point, I think it might make sense to leave the lookup field in the order by as it won't effect any ordering outcome and it is likely better performing to order in DB and then loop checking for a different master id (like the code currently does) then continually lookup in the Map on each iteration through queryresults.

  3. Update the documentation to remove the mention of a rollup defaulting to the fieldtoaggregate when no order by is specified.

Appreciate the consideration and any thoughts/feedback.

jondavis9898 added a commit to jondavis9898/declarative-lookup-rollup-summaries that referenced this issue Aug 29, 2015
jondavis9898 added a commit to jondavis9898/declarative-lookup-rollup-summaries that referenced this issue Aug 30, 2015
…lds and move to context

Issue SFDO-Community#216 - Add support for multiple order by fields and optional
ASC/DESC and NULLS FIRST/LAST

Issue SFDO-Community#239 - Move order by clause to LREngine Context
@jondavis9898
Copy link
Contributor Author

Since all of this was fresh in my mind based on other related work, I decided to go ahead and implement the solution I recommended above. I'll leave it to @afawcett and the community to determine which approach makes the most sense. Happy to have a discussion around it and/or provide any additional information on my thinking and/or answer any questions.

I submitted PR #241 not to "push" this approach but only because it was top of mind and I could knock it out rather quickly in case we do choose this path.

Note

@afawcett afawcett added the bug label Aug 30, 2015
@afawcett
Copy link
Collaborator

Thanks for the options here John...

RE: 1, agree.

RE: 2, agree with your edit, lets keep the order by for the lookup

RE: 3, i think the existing documentation would still stand given the revised direction of 2 above, yes?

@afawcett afawcett changed the title Consider moving order by from LREngine.RollupSummaryField to LREngine.Context Concatenation Rollups order maybe none Deterministic - Consider moving order by from LREngine.RollupSummaryField to LREngine.Context Aug 30, 2015
@afawcett
Copy link
Collaborator

I've also marked this as a bug and revised the title a little. I think some users may have got this from time to time, but cannot be sure. Anyway its a good spot and lets get it fixed.

@jondavis9898
Copy link
Contributor Author

Glad you agree @afawcett, I definitely feel like this approach makes the most sense and like you mentioned, addresses some issues that others might have experienced in the past.

Regarding #2, this PR maintains the lookup field in the order by.

Regarding #3, the docs state "Defaults to the field given in Field to Aggregate" which, given this PR, would no longer be applicable. When no order by is specified, the only field that would be specified in the SOQL projection would be the RelationshipField (lookupfield on the LRE side). See jondavis9898@574302d#diff-958fe84b38eacf8b58f949ab104a8504R185 for reference.

@afawcett
Copy link
Collaborator

RE: #3, ah i see, so the reason to not continue to support defaulting order by to Field to Aggregate would be that it basically stops any kind of context sharing, have i got that correct?

@jondavis9898
Copy link
Contributor Author

Exactly. If we default the fieldToAggregate, every rollup that doesn't contain an orderby will end up with a unique context (except for situations where the fieldtoaggregate is the same in which case they would share one). My thinking is that since no orderby was specified, it's correct to assume completely non-deterministic approach and therefore we can share context when all other criteria align (reduces limit consumption and improves perf). Along these lines, it might be a good idea to update the docs "best practices" to suggest specifying orderby when using query based rollups (concat, first, last, etc.). I don't think it would be outrageous to even require an orderby for query based rollups but I could see situations where the user doesn't care what the order is and therefore requiring one might be too restrictive. I think a doc update is likely the right course here.

@afawcett
Copy link
Collaborator

Ok i understand, the only issue then is one of backwards compatibility. Since Order By on the rollup def is not a required field. Need to think of a way provide this without breaking backwards compatibility i think. Tricky one, as in some cases its a bug and in other cases expected behaviour. I guess we can detect this at runtime and invoke the old behaviour when its safe to do so?

@jondavis9898
Copy link
Contributor Author

Sorry @afawcett, not sure I follow. In the code previous to this PR, ordering was non-deterministic when order by wasn't specified due to the ThenBy approach. It was a controlled orderby but non-deterministic nonetheless. With this PR, it remains non-deterministic. The only situation I can think of where a user would see different results than what they "expect" is when they were taking advantage of the fieldtoaggregate being defaulted in to the orderby AND they had no other rollups on the same parent/child/criteria that did not not have an orderby. If they did, they would have experienced the "ThenBy" projection and they wouldn't have been getting the fieldtoaggregate as the default for anything other than the first rollup field in the context. Even in that case, the order of the rollupfields that were added to the context was also non-deterministic so their results could have varied as they added more rollups.

Possibly I'm misunderstanding what you are thinking?

@afawcett
Copy link
Collaborator

"The only situation I can think of where a user would see different results than what they "expect" is when they were taking advantage of the fieldtoaggregate being defaulted in to the orderby AND they had no other rollups on the same parent/child/criteria that did not not have an orderby."

Yeah thats the one i'm thinking, if they used a concatenate operation the behaviour would change if the code stopped using the field to aggregate by?

@jondavis9898
Copy link
Contributor Author

Yeah, the behavior would change in this case for any query based rollup (concat, concat distinct, first, last, etc.). However, if they have two (or more) rollups defined on the same parent/child, each query based and each with no order by, they are currently encountering different results than they would expect when assuming default fieldtoaggregate is added to the orderby.

Here's what I'm thinking, let me know which you think is best:

  1. Remove the fieldtoaggregate in all situations from being defaulted in to the order by (this is what this current PR does). This is truly what an "empty" orderby should be I think. The only catch here is the documentation previously stated that fieldtoaggregate will be defaulted. To counter this and make sure users are aware, the next release of the package could call it out in the release notes and we could update the docs. It's not ideal, but I think it gets us back to baseline for how a non-specified orderby should behave. If we continue to default fieldtoaggregate, we could run in to other issues that we've yet to discover or new ones down the road trying to maintain that compatibility.

  2. The current code base managed the ThenBy projection and defaulting of fieldtoaggregate inside of LREngine which actually works out in our favor here. What I could do is inside of LREngine, if rollupSummaryFields.size() == 1 and detailOrderByClause is blank, I could add in fieldtoaggregate to order by. This would be a simple change to this PR and at least address the backwards compat scenario where there was only one rollup and it didn't have an orderby. Keeping in mind that if there are multiple rollups with no order by in v1.22 (current release) the expected behavior isn't being encountered due to ThenBy. If we do do this, we complicate the documentation and helptext because trying to explain when a default will be used (single context, single rollup field) and when it wouldn't be (single context, multiple rollups) would be very difficult without confusing people. We could remove the helptext as I've done in this PR, implement the size()==1 & detailOrderByClause is blank solution for backwards compat and just not document that part so that moving forward, users will understand that no default is applied if none is specified. If there is 1 rollup only in the context, a default would be applied but it would still be "non-deterministic" from an admin perspective, just quasi-controlled non-deterministic from an implementation perspective.

I hate to do anything that risks breaking backward compat and believe firmly that we should try to maintain backwards compat as much as possible whenever possible. That said, in this case, it would be interesting to know the telemitry/analytics around how many users have only a single rollup field on a parent/child and that rollup does not have an order by. Unfortunately, I don't think there is any way to gather those metrics (maybe a new idea for DLRS future enhancement :)).

You would definitely have a better idea than I would on how common this scenario is but I'm guessing it's not a large number. In most cases, if a user wanted an order by, I think they are more likely to have input one than rely on the default and even then, it would only work when there is only a single rollup. For this reason, I'm kind of partial to option 1 so we have a clean baseline to move forward with. That said, I'm happy to implement 2 if you'd like and can have it done rather quickly. I'm very hopeful to try to get this PR in to the next release as I'm in need of this feature & fixes in our app.

Let me know your thoughts. Thanks!

@afawcett
Copy link
Collaborator

Yeah tough one, i'm also very worried about maintaining backwards compat, especially since this is become more mission critical for folks and often consultants upgrading might not seek to always read or have enough knowledge of existing rollups, so i'm super sensitive to this. I could actually get the Salesforce Customer Metrics feature enabled here, but that will just tell me the number of records in the objects, not the contents of course.

Lets see how we get on maintaining the backwards compat here, happy to take the 'I told you so' in the future.... ;-) So your option 2 is exactly what i had in mind above when i was thinking about a runtime detection.

I'm actually packaging now the currently tagged for next release items. I can hold if you want to get this one into the next release and maybe package it tomorrow.

@afawcett
Copy link
Collaborator

Actually i had just done an upload.... I'll release as is, happy to do another tomorrow if you really need it.

@jondavis9898
Copy link
Contributor Author

Sounds like a plan on 2, definitely appreciate maintaining status quo as much as possible.

I'm working on the change now and should have it uploaded soon.

No worries on release package timing and it's not critical that this PR is included tomorrow. Aside from your normal items, I know your likely swamped with DF stuff, but if you have time tomorrow and/or in the near future and could get another new version out with this PR I'd greatly appreciate it.

@afawcett
Copy link
Collaborator

Cool. Do you have a Twitter and/or Salefsorce Community user btw? I've credited you just now on Twitter and in the Chatter Group for the tool, but cannot @ you in those communities.

jondavis9898 added a commit to jondavis9898/declarative-lookup-rollup-summaries that referenced this issue Aug 30, 2015
@jondavis9898
Copy link
Contributor Author

Thanks for the mention, happy to help improve the tool. I'm @jondavis9898 on twitter.

I just updated PR #241 with option 2. Let me know what you think. Thanks!

@jondavis9898
Copy link
Contributor Author

One thing I thought of that is a gap we could close is the following scenario:

Two rollups defined

  1. Amount__c rolls up to Total__c with order by specified as 'Amount__c'
  2. Amount__c rolls up to Total2__c with no order by

In the current code, this will result in 2 contexts being generated, two SOQL queries, etc but they'll each order by Amount__c (due to default fieldtoaggregate). The reason for this is that DLRS contextKey uses orderBy in key and orderBy for 1 will be 'Amount__c' and for 2 will be blank/emptystring. There is an optimization that would be made here to become more intelligent at the DLRS level when identifying which context to use. This comes in to play in a few situations but especially in the scenario where we default in fieldtoaggregate. To handle this we would have to project the order by within DLRS when order by isn't specified instead of within LRE. Doing this would give us an opportunity to optimize some scenarios and share context (still supporting backwards compat) when the current logic would result in separate contexts. However, this could get tricky, need to think through it some more. If you think this is worth exploring, let me know and I'll create an issue to track.

@jondavis9898
Copy link
Contributor Author

I thought some more about this and I think putting together a solution that reduces overall limit usage while still providing the functionality desired wouldn't be too difficult.

In short, what I'm thinking is that we'd build in a more intelligent "context" mapper. We would do this inside of DLRS and remove the size() == 1 that I added to LREngine.

Instead of choosing a context on each iteration through rollup building, we would build all the rollups and then take that list to create/choose the context. This gives us full visibility to all rollups and all potential contexts that are required before we start creating contexts.

This would provide the following features/benefits:

  1. Each rollup that has a specific order by would be guaranteed to execute in a context that contains only that exact order by clause (this PR handles this already)
  2. Rollups that do not have an order by would be eligible to be included in a context that has a specific order by instead of having to have their own separate context with no order by. Since not having an orderby specified would result in non-deterministic ordering, whether there is an orderby or is not an orderby in the context it uses would still be non-deterministic. Here's an example - If we had two rollups one with an orderby and one without but with all other criteria aligned (e.g. relationship criteria, etc.), we would only need one context instead of two.
  3. To provide backwards compat on rollups that do not have an order by, when searching for a context to use, we could see if a context already exists that contains the fieldtoaggregate as the first field in the orderby. This would avoid having to create a second context in those situations and solve for the gap I mentioned in my previous post.

To do all this, we would first create contexts for all rollups that have an orderby and then create contexts in a second pass for rollups that do not have an orderby.

Seems kind of complex now that I type it all out but I think it would be rather straightforward to implement. The only concern I would have is whether or not this additional logic to "find" a context would outweigh the perf gain and reduced limit consumption that would result from having that logic in place. The only way to know what would be to gather some metrics with perf testing.

Look forward to your thoughts.

@jondavis9898
Copy link
Contributor Author

I decided to go ahead and implement the approach I described in my last comment so we can consider it along with PR #241. I've actually created two different approaches and describe all 3 below.

  1. PR Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241 - This approach includes the size() == 1 within LRE to attempt to maintain some level of backwards compat while still moving orderby to context to resolve some other related issues.

  2. PR Improve context identification improving backwards compatibility and reducing limits #243 - This is the approach I discussed above. It moves the "backwards compat" support to DLRS instead of LRE. This approach has feature parity with the previous code base (something PR Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241 does not fully have). Additionally, it goes beyond the current code base and in some situations, will result in more accurate implementation of the previously documented "FieldToAggregate__c will be the default orderby when none is supplied". In the current code base, if there are more than one rollups that do not have orderby they will share a common context and the orderby will be a "ThenBy" approach starting with the first field in the Context (the field order is non-deterministic). This PR attempts to find a context who's first order by field (ASC NULLS FIRST) matches the loookup fields FieldToAggregate__c. If it cannot find one, then it defers back to the current codes ThenBy approach. Making this improvement in how context is identified results in more reliable results (as it relates to the "default FieldToAggregate__c when no orderby is specified"" and also reduced limit usage in certain situations.

  3. PR Improve context identification reducing limits as much as possible but giving up backwards compat #244 - This approach does not support any backwards compat. However, it results in the lowest limit usage possible. The reason for this is that a "no order by specified" context will no longer be created if there is another context that satisfies all other criteria. Since no orderby specified can be considered non-deterministic, we can piggyback on any context that matches the other criteria. The downside here is that we lose all backwards compat. However, as noted above, the current code did not conform to the documented FieldToAggregate__c will be a default when noorderby statement except for in situations where only one orderby field was being evaluated.

I've added tests to all 3 PRs (RollupServiceTest4.cls) to demonstrate the differences:

  1. Here - Compared to Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241, Improve context identification improving backwards compatibility and reducing limits #243 & Improve context identification reducing limits as much as possible but giving up backwards compat #244 have same benefit, decreasing governor while maintaining predictable result.
  2. Here - Compared to Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241, Improve context identification improving backwards compatibility and reducing limits #243 provides predictable result (same governor usage) while Improve context identification reducing limits as much as possible but giving up backwards compat #244 reduces governors and provides predictable result.
  3. Here - Compared to Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241, Improve context identification improving backwards compatibility and reducing limits #243 provides predictable result (same governor usage) while Improve context identification reducing limits as much as possible but giving up backwards compat #244 reduces governors and provides predictable result.
  4. Here - Compared to Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241, Improve context identification improving backwards compatibility and reducing limits #243 & Improve context identification reducing limits as much as possible but giving up backwards compat #244 are the same.

I'll leave it to you to determine which of the 3 approaches should be chosen. Here's my thoughts:

  1. If backwards compat is critical, I think the best option is Improve context identification improving backwards compatibility and reducing limits #243 as it reduces governors compared to Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241 and also does a better job of trying to order by FieldToAggregate__c than even the current code base does.
  2. if backwards compat can be eliminated, Improve context identification reducing limits as much as possible but giving up backwards compat #244 is the best option as it optimizes for limit usage and does not incur any additional processing overhead that Improve context identification improving backwards compatibility and reducing limits #243 does. Test 2 & 3 is really where Improve context identification reducing limits as much as possible but giving up backwards compat #244 makes a difference in governor usage. For example, if two rollups existed, one with an order by, one without and all other criteria were the same, if the count() rollup children is 25,001, Issue #216, #239 & #240 - add order by support of multiple fields, ASC/DESC and NULLS FIRST/LAST; move orderby to context; fix to ensure recalc when order by field changes #241 & Improve context identification improving backwards compatibility and reducing limits #243 would fail, while Improve context identification reducing limits as much as possible but giving up backwards compat #244 would succeed (and still have room for 24,999 more query rows) because it would only issue one query.

Note that I did run some "non-scientific" performance analysis on #241, #243 & #244 and the difference is unremarkable on cpu, heap and total time. I was testing with smaller data sets but with larger datasets, I would expect #244 to have the best performance of the three (unconfirmed). In short and for our purposes here, all options can be considered equivalent with regards to these 3 factors.

Final thoughts: As mentioned previously, I think we should always strive for backwards compat whenever possible. However, since in some cases the current code was not adhering to documented functionality anyway and since #244 has the important benefit of reducing overall limit usage (which could be substantial on larger resultsets), my recommendation is #244 should be chosen. With that said, if backwards compat is absolutely critical, #243 should be the choice over #241 because of its feature parity with current code plus additional functionality to try to get closer to the previously documented default ordering method.

One last note - There is a hybrid between #243 & #244 (see here) where instead of just using the first context we find that satisfies all other criteria, we try to find one (similar to what is done here in #243) that starts with the same orderby (first field same plus ASC NULLS FIRST). If one is not found, we just grab the first one. The downside here is that we incur a little bit of additional processing to "find" the context and we have to continue to maintain functionality moving forward that we wouldn't otherwise need to if we gave up backwards compat. If this is the preferred approach, I'm happy to make the adjustment and update the PR.

Apologize for all the long comments and multitude of PRs, just wanted to get everything out there for us to be able to make the best decision on.

Look forward to your thoughts.

@afawcett
Copy link
Collaborator

afawcett commented Sep 7, 2015

@jondavis9898 wow what great work, i really appreciate this, just wanted to say i'm sorry for my stop start appearances here, DF prep causes me to black out at times. I've had a good read through this, but want to read a bit more, rest assured i really value this. Do you personally have any desires to see this in a release package sooner rather than later that i should keep in mind if possible?

@jondavis9898
Copy link
Contributor Author

thanks @afawcett, no worries about the stop/start, completely understand especially with DF prep on top of normal juggling of commitments.

I do have a need for the multiple order by enhancement. I don't need it "today" but I'm unable to finish one part of our app without it so as soon as possible would be greatly appreciated.

This is a pretty critical area of DLRS so wanted to make sure I spent the proper time to evaluate all the options and present the pros/cons. I do think that #244 is the best long term option (due to limiting governor usage) but it does come at a "cost" of impacting backwards compat to some degree (not sure how "exposed" it would really be though since the existing code had bugs in that code path anyway).

If you have any questions as you review, please just let me know. Look forward to your thoughts once you've digested.

@jondavis9898
Copy link
Contributor Author

Hello @afawcett, hope you had a good DF and things are returning to normal. Just wondering if you've had a chance to review this in more detail...eager to get this incorporated once you determine appropriate direction. Thanks!

@afawcett
Copy link
Collaborator

Yep, will look on Sunday, thanks for your patience.

@afawcett
Copy link
Collaborator

Just looked through this all again now, going with 244!

@jondavis9898
Copy link
Contributor Author

Awesome, thanks @afawcett, greatly appreciated!

Note that PR #244 contained enhancement/fixes for Issues #216, #239 & #240 so those can all be marked as "ready for next release."

Looking forward to V1.24. Thanks again!

@afawcett
Copy link
Collaborator

Released!

@jondavis9898
Copy link
Contributor Author

Thank you! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants