Prefer FIELD-SYMBOLS inside of loops #115
A first measurement on SAP NW 7.52 suggests that My sample measurement iterated 1000 times over an empty table with 1,000,000 rows that contained a mixture of about 50 character-like and numeric fields. |
Several people confirmed this measurement. We're following up on this with the ABAP language group. Let's see if we can find out more. |
I can confirm that it is faster with field symbols. See also this (ten-year-old) post: http://zevolving.com/2009/12/use-of-field-symbols-vs-work-area/ |
Wait @f4abap, this link compares The fact that the latter is slower is something that should be brought to the attention of whoever is responsible for those "new" features in the ABAP language. I see no reason for one to be faster than the other, both being just an iteration over pointers. |
@nununo Oh, yes, you're right. Thanks for the hint. Then of course the link is wrong; REFERENCE INTO is not that old :-) |
"...both being just an iteration over pointers." |
If field symbols have better performance but the difference is 13% I would dismiss that argument on the grounds that the difference is not significant - the limit for significance being 3 dB (or 50%) as defined by us electronics engineers :-). Field symbols being 13% faster as referenced above means a reduction from 300 milliseconds to 261 milliseconds (quick enough in both cases for a user interface) or from 30 minutes to 26 minutes (slow in both cases, start the report and go for lunch). So I would say the recommendation is well-founded as it aims for code consistency where use of field-symbols is reserved for the special cases where they have a distinct advantage while the rest of the code uses an object-reference oriented rather than memory-pointer oriented approach. |
I looked into this some time ago and also found the same. But for narrow tables it is even faster to use a simple DATA field instead of a FS or ref. From memory the threshold was for rows between 100-200 chars in length, it may even be in the ABAP Docu somewhere. Regarding impact: As described in the performance section, the vast majority of the performance will be taken up by DB activity and various other things. I think it would be an extreme case if even 1% of an average application's runtime is taken up by loop iteration (i.e. just the LOOP-ing, not the stuff inside the loop). But even in this case, reducing this 1% by 13% means shaving 0.13% off the overall runtime. In reality it's more likely to have <0.01% overall impact on most applications. So I see this guideline valid and in line with the performance guidelines that clean code and readability are the first priority, and if a loop is indeed found to be expensive (very large table with very tiny loop body) then it's a matter of minutes to swap the ref for a FS in a component that follows clean code principles. However: Personally I also consider how it is used. I find if I have to use the entire structure more often than the individual fields then it becomes a bit ugly to dereference |
For information, I don't see code here to support the discussion. So here is a short AUnit code to compare the performance between REFERENCE INTO and ASSIGNING, run it and see speed via the AUnit Test Framework. As far as I can see, I confirm that ASSIGNING is a little bit faster than REFERENCE INTO (in my 7.52 SP 1 kernel 753 SP 500). |
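The snippet referred to in the comment above is not preserved on this page. As a stand-in, here is a minimal sketch of what such an ABAP Unit comparison could look like. It is not the commenter's original code: the class name `ltc_loop_speed`, the row type, and the row count are all assumptions chosen for illustration. The ABAP Unit result display shows a runtime per test method, which is what gets compared.

```abap
" Hypothetical local test class; names and sizes are illustrative only.
CLASS ltc_loop_speed DEFINITION FINAL FOR TESTING
  DURATION LONG RISK LEVEL HARMLESS.
  PRIVATE SECTION.
    TYPES: BEGIN OF ty_row,
             id   TYPE i,
             text TYPE c LENGTH 40,
           END OF ty_row,
           ty_rows TYPE STANDARD TABLE OF ty_row WITH EMPTY KEY.
    DATA rows TYPE ty_rows.
    METHODS setup.
    METHODS loop_assigning FOR TESTING.
    METHODS loop_reference_into FOR TESTING.
ENDCLASS.

CLASS ltc_loop_speed IMPLEMENTATION.
  METHOD setup.
    " Same fixture for both tests, so setup cost is identical.
    rows = VALUE #( FOR i = 1 WHILE i <= 100000 ( id = i ) ).
  ENDMETHOD.

  METHOD loop_assigning.
    " Field symbol as loop target: one read and one write per row.
    LOOP AT rows ASSIGNING FIELD-SYMBOL(<row>).
      <row>-id = <row>-id + 1.
    ENDLOOP.
    cl_abap_unit_assert=>assert_equals( exp = 2 act = rows[ 1 ]-id ).
  ENDMETHOD.

  METHOD loop_reference_into.
    " Data reference as loop target: same work through ->.
    LOOP AT rows REFERENCE INTO DATA(row).
      row->id = row->id + 1.
    ENDLOOP.
    cl_abap_unit_assert=>assert_equals( exp = 2 act = rows[ 1 ]-id ).
  ENDMETHOD.
ENDCLASS.
```

Absolute numbers from a sketch like this depend on release, kernel, and row width, so they are only meaningful relative to each other on the same system.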
Thanks for that @sandraros. A really neat idea to use AUnit as a profiling tool, like it! :-) So on my system it made 0.4s difference on just under 20 million iterations. This supports my case that the performance difference is almost always negligible, as most things working with that amount of data would spend way more time talking to the DB and doing other stuff. But I did notice something interesting: It's not just the loop; even the reference accesses inside the loop are slower. If I remove the code inside the loops, the difference between the two drops to 0.1s. It was previously my understanding that the administration overhead of creating the reference is higher than for a FS. I think the docs mention this too, or maybe the ABAP Objects book. |
Can a quote from Donald Knuth help close this issue? I think that quote sums up nicely the arguments for the recommendation to use data references. The speed is in most situations not relevant for the loop iteration. |
A style guide that ignores the recommendation of the ABAP language group and the performance checks in the syntax check misses the point of being a "standard". We are not optimizing or measuring the loop, but the code in the loop. Attribute access via a field symbol is always faster, provided the field symbol is assigned. How much faster the loop is depends on the number of accesses in the loop. And if you want a speedup of 60% instead of 3%, how about this slightly adapted example with 9 accesses instead of 2? Field symbol: 3.6s This also adds up with parameters passed to calls in the loop. |
I'm not sure what recommendation you're referring to? If speed is important and this has a significant impact then by all means use the fastest. But in the vast majority of cases the difference will be negligible. A real application that does 60 million accesses as per your example is probably coupled with a lot of functionality that is likely to make 4s difference tiny in comparison. Here's a real-world example: As it so happens, I'm working on a performance-critical component right now where the target is 20ms for a particular function. Yet I am still using references. Why? Because my bit of code represents <1% of the overall process time. I'm calling other functions and talking to the DB. By the above measurement, the difference between field symbols and references is 0.08µs per access, so in my ... let me check... 100-odd accesses per execution I'm losing <0.01ms from my budget of 20ms. I'm OK with that. Lastly, I see your example program only writes to the table fields. I extended it to repeat the same tests but reading from it. Interestingly, reading a field symbol is slower than writing to it, and on my system I see read times of FS @ 4.13s vs references @ 4.44s. So on my above use case where my accesses are reading, I've had to revise my calculation and am now looking at a 0.0025% slowdown of my overall app. I don't think anyone will notice. And just to test my earlier statement about direct assignment to a variable being fastest for small rows, I also tested reading the table using |
I believe Donald Knuth's point was that 97% of the code does not need speed optimization, and in those 97% the goals of readability and maintainability are more important. The 3% does not refer to speed improvements, but the percentage of programs/methods in which performance becomes an issue. And in those 3% we must sometimes sacrifice some readability and maintainability to achieve the goal of reduced execution time. |
This whole philosophical discussion above is missing my original point: The statement in the styleguide contradicts the recommendation of the ABAP language group and the checks implemented in the extended syntax check. If you do what the styleguide describes, you (correctly) get a syntax warning. Nobody needs that. Therefore the guide should be made more exact, taking the reality of ABAP into account and not the theory of programming languages. |
As I asked earlier, which recommendation are you referring to? The ABAP documentation prefers references, as does the ABAP Objects book, and now the Style Guide. |
ATC is the framework, not the check, and does not replace anything in Code Inspector. 24.04.2018 LOOP ... INTO ... to LOOP ... ASSIGNING ... appears more natural, as field symbols still show value semantics (see ABAP documentation). Reworking a LOOP ... INTO... into a LOOP ... REFERENCE INTO ... Performance of LOOP AT itab ... depends on the size and the type of the table line and the number of lines to be processed. For table lines containing references, LOOP ... INTO ... is likely the slowest option. For ASSIGNING vs REFERENCE INTO, the static check can't make any reasonable decision. REFERENCE INTO is not "surely faster", even in case of a table of references. For reasons of simplicity, the check therefore only suggests LOOP...ASSIGNING." And everything I've implemented in the past 3 years according to this confirms the usage of field symbols in loops. |
I'd like to push this discussion to hopefully come to a conclusion. The section has been challenged for over 1.5 years now. Let's summarize the previous arguments:
Is there any more information given by SAP themselves in the meantime? |
A nice summary, and valid point that it would be nice to have a conclusion. Personally I think both are applicable in different scenarios but prefer references if both are equally usable. I don't get the one point though:
Technically speaking changing to field symbols is more than twice as hard as changing to references as you need to add two angle brackets in two positions instead of just one
The exception is if we repeatedly need to dereference the full structure via |
I agree it is "twice as hard" if you change it manually. However, you can also use ADT's renaming feature to rename the variable. Base code:
Rename the variable using ADT's renaming feature, ignoring errors (the error being that a variable can't be named like that):
Then you only have to replace the definition with the field-symbol, for example when moving this logic into a loop.
|
OK, I see what you mean, neat trick. If it's a handful then I just add the |
I think all the "microperformance discussions" here are meaningless, because
Concerning
I don't think it is that easy. One could copy the reference to the table line outside of the loop and do stuff with it (e.g. store it inside a class instance. Then the table has to live as long as the instance lives), whereas a field symbol cannot be copied, thus I guess it cannot extend the lifetime of a heap allocated object. Also you can reference a reference through a field symbol but you cannot reference a field symbol through a reference. Anyways, this is a styleguide, so this should really be about semantics, and there I think there is a strong argument for preferring field symbols over references:
Now for field-symbols vs. local variables: As far as I can see, there is no difference between reading / writing to field symbols and local variables; however, in the loop header for the field symbol just the reference has to be changed, whereas for a local variable the structure has to be copied. While the former is probably a constant time operation, the latter grows with the table size. Thus for large structures, preferring field symbols is probably a good idea. A maybe weaker version of "prefer" would be "consider":
|
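To illustrate the lifetime point raised in the comment above, here is a minimal sketch, assuming a small table of integers. The report name and variable names are hypothetical. It shows a data reference to a table line being copied to another variable and used after the loop has ended; a field symbol can be reassigned, but it cannot be stored in a class attribute in the same way.

```abap
REPORT zdemo_ref_outlives_loop.  " hypothetical report name

TYPES ty_numbers TYPE STANDARD TABLE OF i WITH EMPTY KEY.

DATA(numbers) = VALUE ty_numbers( ( 1 ) ( 2 ) ( 3 ) ).
DATA kept TYPE REF TO i.

LOOP AT numbers REFERENCE INTO DATA(number).
  IF number->* = 2.
    " A data reference is an ordinary value and can be copied.
    kept = number.
  ENDIF.
ENDLOOP.

" After the loop, the copied reference still points at the second line.
kept->* = 20.
ASSERT numbers[ 2 ] = 20.
```

Whether this is an advantage or a hazard depends on the situation: the copied reference stays tied to the table line, so deleting that line later leaves the reference invalid.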
I personally prefer field symbols for another reason, they stand out more in the code: Regarding performance, during migration projects we could measure considerable differences between the 3 types of loops and field symbol loops were by far the fastest option (and hence our standard). This is most likely only relevant in large data operations such as migration projects, but should at least be considered, as it is in the discussion here. |
So the general trend in ABAP Platform 2021 is away from field symbols, according to this blog. Here too he talks about measuring performance. Sure, there will be the minority of cases where the difference is significant, but in a largely database-focused language the performance impact of loops will be negligible in >99% of the cases. Clean code first, then refactor performance bottlenecks. |
I think it might be important to separate the use cases:
I'm afraid the recommendations being discussed are applicable for case 2 mostly, and both cases might warrant different recommendations. For case 1, I always use simple |
I'm guessing this open discussion is why the recommendation differs between this blog and the book (see at end of chapter 6.1.4). Here in this blog it is still the reference, and in the book, it is the ASSIGN. For the reader of both it is confusing. |
While sticking to the styleguide using REFERENCE INTO as far as possible, we did find another caveat. Example:
The internal solution is to prefer REFERENCE INTO - accepting the discussion and points in this thread, with the exception of instance tables, where we use INTO DATA. |
For LOOP over TYPE TABLE OF REF TO object I tend to use a classic INTO DATA(oref) variable instead of a field symbol or a data reference, since the line type is just an object reference itself. I don't think the field symbol provides any benefit over the normal variable here? Edit: Unless of course you want to modify the current line |
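The approach described in the comment above can be sketched as follows. This is a minimal illustration under assumptions: the report name, the class `lcl_item`, and its `get_name` method are hypothetical and not taken from the thread. Because each table line is itself only an object reference, `INTO DATA(item)` copies a reference, not the object, and the object is used directly through `item->`.

```abap
REPORT zdemo_loop_over_orefs.  " hypothetical report name

CLASS lcl_item DEFINITION FINAL.
  PUBLIC SECTION.
    METHODS constructor IMPORTING name TYPE string.
    METHODS get_name RETURNING VALUE(result) TYPE string.
  PRIVATE SECTION.
    DATA name TYPE string.
ENDCLASS.

CLASS lcl_item IMPLEMENTATION.
  METHOD constructor.
    me->name = name.
  ENDMETHOD.

  METHOD get_name.
    result = name.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  TYPES ty_items TYPE STANDARD TABLE OF REF TO lcl_item WITH EMPTY KEY.

  DATA(items) = VALUE ty_items( ( NEW lcl_item( name = `a` ) )
                                ( NEW lcl_item( name = `b` ) ) ).
  DATA(names) = ``.

  " Each line is an object reference, so INTO copies only the reference.
  LOOP AT items INTO DATA(item).
    names = names && item->get_name( ).
  ENDLOOP.

  ASSERT names = `ab`.
```

As the comment's edit notes, this form does not fit when the loop has to replace the reference stored in the current line, because `item` is a copy of that line.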
We use INTO DATA instead as exception. I updated the snippet to work it out. |
It's also what I do (using |
* Clarify field symbols vs. references: Due to changes in the ABAP language since this section was written, it seems prudent to split it up into a discussion of field symbols for dynamic data access and a discussion of field symbols as loop targets. The loop target discussion is an attempt at synthesizing the primary arguments from SAP#115.
* Fix links & TOC
From an SAP developer: