Prefer FIELD-SYMBOLS inside of loops #115

Closed
HrFlorianHoffmann opened this issue Dec 18, 2019 · 33 comments · Fixed by #307

@HrFlorianHoffmann
Contributor

From an SAP developer:

For LOOPs, FIELD-SYMBOLS are the better and faster choice, unless the table lines are already references themselves.
There is also an ATC check for this.

Therefore, the section https://github.com/SAP/styleguides/blob/master/clean-abap/CleanABAP.md#prefer-ref-to-to-field-symbol should rather be something like:
Prefer REF outside of loops
Prefer FIELD-SYMBOLS inside of loops

See also the example from the ABAP docu.
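A minimal sketch of the two forms the proposal distinguishes. The report name, the types `ty_order`/`ty_orders` and the data are hypothetical and only serve as illustration: a field symbol is the loop target, and a data reference holds on to a single line outside a loop.

```abap
REPORT zdemo_loop_targets.

" Hypothetical line and table types, for illustration only
TYPES: BEGIN OF ty_order,
         id       TYPE i,
         quantity TYPE i,
       END OF ty_order,
       ty_orders TYPE STANDARD TABLE OF ty_order WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(orders) = VALUE ty_orders( ( id = 1 quantity = 10 )
                                  ( id = 2 quantity = 20 ) ).

  " Inside a loop: the field symbol points directly at the current line
  LOOP AT orders ASSIGNING FIELD-SYMBOL(<order>).
    <order>-quantity = <order>-quantity + 1.
  ENDLOOP.

  " Outside a loop: a data reference to one line can be kept, passed on or stored
  DATA(first_order) = REF #( orders[ 1 ] ).
  first_order->quantity = first_order->quantity * 2.
```

Both variants write to the table lines themselves, so afterwards the first line holds 22 and the second holds 21.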

@HrFlorianHoffmann
Contributor Author

A first measurement on SAP NW 7.52 suggests that LOOP AT ... ASSIGNING FIELD-SYMBOL(...) might indeed be about 13% faster than LOOP AT ... REFERENCE INTO DATA(...).

My sample measurement iterated 1000 times over a table of 1,000,000 initial (empty) rows, each containing a mixture of about 50 character-like and numeric fields.
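The code behind this measurement is not part of the thread. One way such a comparison could be set up is sketched below with `GET RUN TIME FIELD`, which measures in microseconds. The line type, the table size and the repetition count are made up and much smaller than in the measurement described above. Absolute numbers will depend heavily on the system and its current load.

```abap
REPORT zdemo_loop_runtime.

" Hypothetical line type; the measurement above used about 50 fields
TYPES: BEGIN OF ty_row,
         description TYPE c LENGTH 50,
         amount      TYPE p LENGTH 10 DECIMALS 2,
         counter     TYPE i,
       END OF ty_row,
       ty_rows TYPE STANDARD TABLE OF ty_row WITH EMPTY KEY.

" Made-up sizes, much smaller than in the measurement above
CONSTANTS row_count   TYPE i VALUE 100000.
CONSTANTS repetitions TYPE i VALUE 10.

START-OF-SELECTION.
  DATA(data_rows) = VALUE ty_rows( FOR i = 1 UNTIL i > row_count
                                   ( description = |Row { i }| amount = i ) ).
  DATA time_start TYPE i.
  DATA time_end   TYPE i.

  " Variant 1: field symbol as loop target
  GET RUN TIME FIELD time_start.
  DO repetitions TIMES.
    LOOP AT data_rows ASSIGNING FIELD-SYMBOL(<row>).
      <row>-counter = <row>-counter + 1.
    ENDLOOP.
  ENDDO.
  GET RUN TIME FIELD time_end.
  DATA(runtime_assigning) = time_end - time_start.

  " Variant 2: data reference as loop target
  GET RUN TIME FIELD time_start.
  DO repetitions TIMES.
    LOOP AT data_rows REFERENCE INTO DATA(row_ref).
      row_ref->counter = row_ref->counter + 1.
    ENDLOOP.
  ENDDO.
  GET RUN TIME FIELD time_end.
  DATA(runtime_reference) = time_end - time_start.

  " GET RUN TIME delivers microseconds
  WRITE: / |ASSIGNING:      { runtime_assigning } microseconds|,
         / |REFERENCE INTO: { runtime_reference } microseconds|.
```

Each variant increments every counter once per repetition, so every line ends up with `counter = 2 * repetitions`.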

@HrFlorianHoffmann
Contributor Author

Several people confirmed this measurement. We're following up on this with the ABAP language group. Let's see if we can find out more.

@f4abap

f4abap commented Jan 10, 2020

I can confirm that it is faster with field symbols. See also this (ten-year-old) article: http://zevolving.com/2009/12/use-of-field-symbols-vs-work-area/

@nununo

nununo commented Jan 10, 2020

I can confirm that it is faster with field symbols. See also this (ten-year-old) article: http://zevolving.com/2009/12/use-of-field-symbols-vs-work-area/

Wait @f4abap, this link compares field symbols with work areas. Here we were comparing field symbols with REFERENCE INTO.

The fact that the latter is slower is something that should be brought to the attention of whoever is responsible for those "new" features in the ABAP language. I see no reason for one to be faster than the other, both being just an iteration over pointers.

@f4abap

f4abap commented Jan 10, 2020

@nununo Oh, yes, you're right. Thanks for the hint. Then of course the link is wrong; REFERENCE INTO is not that old :-)
I copied the wrong link. This is the one I meant:
http://zevolving.com/2014/03/use-of-reference-variable-vs-workarea-vs-field-symbols/

@peterdell

peterdell commented Feb 4, 2020

"...both being just an iteration over pointers."
That is not fully correct. Field symbols are dereferenced during the assignment, not on every access, and they do not support polymorphism the way references do. That is why they are fast in loops.

@tricktresor

@peterdell

Field symbols [...] do not support polymorphism like references.
What does that mean?
Do you have an example of polymorphism with data references for me please?

@kjetil-kilhavn
Contributor

kjetil-kilhavn commented Mar 3, 2020

If field symbols have better performance but the difference is 13% I would dismiss that argument on the grounds that the difference is not significant - the limit for significance being 3 dB (or 50%) as defined by us electronics engineers :-).

Field symbols being 13% faster as referenced above means a reduction from 300 milliseconds to 261 milliseconds (quick enough in both cases for a user interface) or from 30 minutes to 26 minutes (slow in both cases, start the report and go for lunch).

So I would say the recommendation is well-founded as it aims for code consistency where use of field-symbols is reserved for the special cases where they have a distinct advantage while the rest of the code uses an object-reference oriented rather than memory-pointer oriented approach.

@peterdell

If field symbols have better performance but the difference is 13% I would dismiss that argument on the grounds that the difference is not significant - the limit for significance being 3 dB (or 50%) as defined by us electronics engineers :-).
This is a misinterpretation. The difference is on the "per access" level, i.e. if the same attribute is accessed several times in the loop, the advantage adds up. Also, the ABAP language group has a static performance check in place that explicitly asks you to use field symbols in loops (which is the case I am referring to). I don't see the point in a recommendation that contradicts the ABAP language group's own check and guidance.

@peterdell

@peterdell

Field symbols [...] do not support polymorphism like references.
What does that mean?
Do you have an example of polymorphism with data references for me please?
Technically, data references are implemented the same way as object references.
There is of course no polymorphism on data references. I was referring to object references.
The remark was only meant as motivation why there is of course a difference between references and fields-symbols in ABAP. A difference which does not exist in other languages.

@pokrakam
Contributor

I looked into this some time ago and also found the same. But for narrow tables, it is even faster to use a simple DATA work area instead of a FS or ref. From memory, the threshold was a row length of 100-200 chars; it may even be in the ABAP documentation somewhere.

Regarding impact: As described in the performance section, the vast majority of the runtime will be taken up by DB activity and various other things. I think it would be an extreme case if even 1% of an average application's runtime were taken up by loop iteration (i.e. just the LOOP-ing, not the stuff inside the loop). But even in this case, reducing this 1% by 13% means shaving 0.13% off the overall runtime. In reality it's more likely to have <0.01% overall impact on most applications.

So I see this guideline as valid and in line with the performance guidelines, which say that clean code and readability are the first priority. If a loop is indeed found to be expensive (very large table with a very tiny loop body), then it's a matter of minutes to swap the ref for a FS in a component that follows clean code principles.

However: Personally I also consider how it is used. I find if I have to use the entire structure more often than the individual fields then it becomes a bit ugly to dereference thingy->* all over the place, and I prefer to use field symbols in those cases.
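To illustrate the point about using the entire structure, here is a small sketch with a hypothetical type and data. It compares the whole line against another structure. With a data reference, every whole-line access needs `->*`. With a field symbol, the line reads like an ordinary variable.

```abap
REPORT zdemo_whole_line_access.

" Hypothetical line and table types, for illustration only
TYPES: BEGIN OF ty_item,
         id    TYPE i,
         price TYPE i,
       END OF ty_item,
       ty_items TYPE STANDARD TABLE OF ty_item WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(items) = VALUE ty_items( ( id = 1 price = 5 ) ( id = 2 price = 7 ) ).
  DATA(wanted) = VALUE ty_item( id = 2 price = 7 ).
  DATA matches_ref TYPE i.
  DATA matches_fs  TYPE i.

  " Data reference: the whole line has to be dereferenced with ->*
  LOOP AT items REFERENCE INTO DATA(item_ref).
    IF item_ref->* = wanted.
      matches_ref = matches_ref + 1.
    ENDIF.
  ENDLOOP.

  " Field symbol: the whole line is used like a normal structure
  LOOP AT items ASSIGNING FIELD-SYMBOL(<item>).
    IF <item> = wanted.
      matches_fs = matches_fs + 1.
    ENDIF.
  ENDLOOP.
```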

@sandraros

For information, I don't see code here to support the discussion, so here is a short ABAP Unit program to compare the performance of REFERENCE INTO and ASSIGNING. Run it and check the runtimes in the ABAP Unit test framework. As far as I can see, ASSIGNING is a little bit faster than REFERENCE INTO (on my 7.52 SP 1 system with kernel 753 SP 500).
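The test code referred to here is not reproduced in this thread. As a hedged sketch of the idea, an ABAP Unit test class can hold one test method per loop variant, so that the durations the test runner reports for each method (for example in the ABAP Unit view in ADT) can be compared. The report name, class name, constant and line type below are hypothetical. The table is built once in `class_setup`, so the test methods themselves contain little more than the loops.

```abap
REPORT zdemo_aunit_loop_timing.

" Hypothetical table size, large enough for the runtimes to differ visibly
CONSTANTS row_count TYPE i VALUE 1000000.

CLASS ltc_loop_timing DEFINITION FINAL FOR TESTING
  DURATION SHORT RISK LEVEL HARMLESS.

  PRIVATE SECTION.
    TYPES: BEGIN OF ty_row,
             id        TYPE i,
             copied_id TYPE i,
           END OF ty_row,
           ty_rows TYPE STANDARD TABLE OF ty_row WITH EMPTY KEY.

    CLASS-DATA data_rows TYPE ty_rows.

    CLASS-METHODS class_setup.
    METHODS loop_assigning FOR TESTING.
    METHODS loop_reference_into FOR TESTING.
ENDCLASS.

CLASS ltc_loop_timing IMPLEMENTATION.

  METHOD class_setup.
    " Built once for all test methods
    data_rows = VALUE #( FOR i = 1 UNTIL i > row_count ( id = i ) ).
  ENDMETHOD.

  METHOD loop_assigning.
    LOOP AT data_rows ASSIGNING FIELD-SYMBOL(<row>).
      <row>-copied_id = <row>-id.
    ENDLOOP.
    cl_abap_unit_assert=>assert_equals( act = data_rows[ row_count ]-copied_id
                                        exp = row_count ).
  ENDMETHOD.

  METHOD loop_reference_into.
    LOOP AT data_rows REFERENCE INTO DATA(row_ref).
      row_ref->copied_id = row_ref->id.
    ENDLOOP.
    cl_abap_unit_assert=>assert_equals( act = data_rows[ row_count ]-copied_id
                                        exp = row_count ).
  ENDMETHOD.

ENDCLASS.
```

Both test methods write the same values, so their execution order does not affect the assertions.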

@pokrakam
Contributor

pokrakam commented Mar 21, 2020

Thanks for that @sandraros. A really neat idea to use AUnit as a profiling tool, like it! :-)

So on my system it made 0.4s difference on just under 20 million iterations. This supports my case that the performance difference is almost always negligible, as most things working with that amount of data would spend way more time talking to the DB and doing other stuff.

But I did notice something interesting: It's not just the loop; even the reference accesses inside the loop are slower. If I remove the code inside the loops, the difference between the two drops to 0.1s. It was previously my understanding that the administrative overhead of creating the reference is higher than for a FS. I think the docs mention this too, or maybe the ABAP Objects book.

@kjetil-kilhavn
Contributor

Can a quote from Donald Knuth help close this issue?
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Source: https://en.wikiquote.org/wiki/Donald_Knuth

I think that quote sums up nicely the arguments for the recommendation to use data references. The speed is in most situations not relevant for the loop iteration.
The debate about when to use data references, when to use field-symbols, and when to use a variable/structure - which Mike Pokraka (and in Mike we trust) says can be even faster than field-symbols in specific cases - is a debate that applies when optimizing performance.
The style guide is not about optimizing performance; it has different goals. The last sentence of the quote is not to be missed, of course, but this whole debate is about those 3% and shouldn't determine the "default" recommendation.

@peterdell

A style guide that ignores the recommendation of the ABAP language group and the performance checks in the syntax check misses the point of being a "standard". We are not optimizing or measuring the loop, but the code in the loop. Component access through a field symbol is always faster once the field symbol is assigned. How much faster the loop is depends on the number of accesses in the loop.

And if you want a speedup of 60% instead of 3%, how about this slightly adapted example with 9 accesses instead of 2?
https://github.com/peterdell/abap/blob/master/ZDELL.abap

Field symbol: 3.6 s
Reference: 8.9 s

This also adds up for parameters passed to calls inside the loop.

@pokrakam
Contributor

pokrakam commented Mar 31, 2020

A style guide that ignores the recommendation of the ABAP language group

I'm not sure what recommendation you're referring to?
The SAP Help prefers References
https://help.sap.com/doc/abapdocu_754_index_htm/7.54/en-US/index.htm?file=abendyn_access_data_obj_guidl.htm
And my copy of the ABAP Objects book (second edition, 2007) also expresses a preference for References over Field Symbols (ch. 11.1) unless the FS-specific features are needed.

If speed is important and this has a significant impact, then by all means use the fastest. But in the vast majority of cases the difference will be negligible. A real application that does 60 million accesses as per your example is probably coupled with a lot of functionality that is likely to make a 4s difference tiny in comparison.

Here's a real-world example: As it so happens, I'm working on a performance-critical component right now where the target is 20ms for a particular function. Yet I am still using references. Why? Because my bit of code represents <1% of the overall process time. I'm calling other functions and talking to the DB. By the above measurement, the difference between field symbols and references is 0.08µs per access, so in my ... let me check... 100-odd accesses per execution I'm losing <0.01ms from my budget of 20ms. I'm OK with that.

Lastly, I see your example program only writes to the table fields. I extended it to repeat the same tests but reading from it. Interestingly, reading a field symbol is slower than writing to it, and on my system I see read times of FS @ 4.13s vs references @ 4.44s. So on my above use case where my accesses are reading, I've had to revise my calculation and am now looking at a 0.0025% slowdown of my overall app. I don't think anyone will notice.

And just to test my earlier statement about direct assignment to a variable being fastest for small rows, I also tested reading the table using LOOP...INTO data(ls_row).
Initially it was 4.29 (Field Symbol) vs 4.41 ms (variable) on a row size of 305
Then I removed the 300 char text field from the table to make it really small and got 4.0 vs 3.93, so a variable was fastest.
I added a few text fields to get a row size just over 1500 chars, and now FS won at 4.05ms vs 5.89ms. So the break-even point does indeed seem to be around 100-200 chars. But up to that point there's hardly any difference.

@kjetil-kilhavn
Contributor

I believe Donald Knuth's point was that 97% of the code does not need speed optimization, and in those 97% the goals of readability and maintainability are more important. The 3% does not refer to speed improvements, but the percentage of programs/methods in which performance becomes an issue. And in those 3% we must sometimes sacrifice some readability and maintainability to achieve the goal of reduced execution time.

@peterdell

This whole philosophical discussion above is missing my original point:

The statement in the style guide contradicts the recommendation of the ABAP language group and the checks implemented in the extended syntax check. If you do what the style guide describes, you (correctly) get a syntax warning. Nobody needs that. Therefore the guide should be made more precise, taking the reality of ABAP into account and not the theory of programming languages.

@pokrakam
Contributor

As I asked earlier, which recommendation are you referring to? The ABAP documentation prefers references, as does the ABAP Objects book, and now the Style Guide.
The "inconsistent" thing to me looks like the extended syntax check, but that has been superseded by ATC so I wouldn't place much value on that. SAP also ship a range of checks including variable prefix naming and various others that customers may choose to implement or not in accordance with their own standards and conventions.

@peterdell

ATC is the framework, not the check, and it does not replace anything but the Code Inspector.
SLIN is one of many check modules in the framework, and its checks are 100% valid.
The reference is my internal incident answered by the ABAP language group developer personally:
"Here, LOOP AT ... ASSIGNING is very likely to be faster than LOOP AT ... INTO LO_ROLE.
Deactivatable using pragma ##INTO_OK. Message Code UNR 0261"
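A small sketch of what the quoted hint refers to, assuming a hypothetical local class `lcl_role` and table `roles`: a `LOOP ... INTO` over a table of object references, with the pragma `##INTO_OK` (named in the quoted answer) appended to the statement to document that the copy into the work area is intended. Whether the extended program check reports this particular loop at all may depend on the release and the check configuration.

```abap
REPORT zdemo_into_ok_pragma.

" Hypothetical class, for illustration only
CLASS lcl_role DEFINITION FINAL.
  PUBLIC SECTION.
    DATA name TYPE string.
ENDCLASS.

CLASS lcl_role IMPLEMENTATION.
ENDCLASS.

START-OF-SELECTION.
  DATA roles TYPE STANDARD TABLE OF REF TO lcl_role WITH EMPTY KEY.
  DATA names TYPE string_table.

  roles = VALUE #( ( NEW lcl_role( ) ) ( NEW lcl_role( ) ) ).

  " The pragma marks the INTO as deliberate for the extended program check
  LOOP AT roles INTO DATA(lo_role) ##INTO_OK.
    lo_role->name = |Role { sy-tabix }|.
    APPEND lo_role->name TO names.
  ENDLOOP.
```

Because `lo_role` is a copy of an object reference, the change to `name` still reaches the objects stored in the table.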

24.04.2018
"switching from

LOOP ... INTO ...

to

LOOP ... ASSIGNING ...

appears more natural, as field symbols still show value semantics (see ABAP documentation).

Reworking a

LOOP ... INTO...

into a

LOOP ... REFERENCE INTO ...
is more involved, as you then deal with reference semantics.

Performance of LOOP AT itab ... depends on the size and the type of the table line and the number of lines to be processed. For table lines containing references, LOOP ... INTO ... is likely the slowest option. For ASSIGNING vs REFERENCE INTO, the static check can't make any reasonable decision. REFERENCE INTO is not "surely faster", even in case of a table of references.

For reasons of simplicity, the check therefore only suggests LOOP...ASSIGNING."

And everything I've implemented in the past 3 years according to this confirms the usage of field symbols in loops.
And, as written above, it is also the default in all the new value constructors and expressions, because it is the natural choice for value semantics.
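A sketch of the rework described in the quoted answer, using a hypothetical table `entries`. Moving from `INTO` to `ASSIGNING` keeps value semantics, so only the loop target and the name in each access change. `REFERENCE INTO` additionally turns `-` into `->` and needs `->*` wherever the whole line is used.

```abap
REPORT zdemo_rework_loop_into.

" Hypothetical line and table types, for illustration only
TYPES: BEGIN OF ty_entry,
         name    TYPE string,
         counter TYPE i,
       END OF ty_entry,
       ty_entries TYPE STANDARD TABLE OF ty_entry WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(entries) = VALUE ty_entries( ( name = `a` counter = 1 )
                                    ( name = `b` counter = 2 ) ).
  DATA total     TYPE i.
  DATA last_line TYPE ty_entry.

  " Starting point: each line is copied into a work area
  LOOP AT entries INTO DATA(entry).
    total = total + entry-counter.
    last_line = entry.
  ENDLOOP.

  " Value semantics kept: same component selector, same whole-line usage
  LOOP AT entries ASSIGNING FIELD-SYMBOL(<entry>).
    total = total + <entry>-counter.
    last_line = <entry>.
  ENDLOOP.

  " Reference semantics: -> for components, ->* for the whole line
  LOOP AT entries REFERENCE INTO DATA(entry_ref).
    total = total + entry_ref->counter.
    last_line = entry_ref->*.
  ENDLOOP.
```

All three loops read the same data, so `total` ends up as 3 times the sum of the counters.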

@ConjuringCoffee
Contributor

I'd like to push this discussion to hopefully come to a conclusion. The section has been challenged for over 1.5 years now.

Let's summarize the previous arguments:

  • SAP Help shares the view of the current guideline to use references for "consistency and simplicity reasons" in programs based on ABAP Objects.
  • Both field-symbols and references have a different set of functionality and can't replace each other completely: Field-symbols have functionality for dynamic access of components of a structure. References can be used for dynamically typed data structures.
  • It is harder to transition code based on normal variables (not even necessarily field-symbols) to using references because references use different syntax.
  • The ABAP language group allegedly recommends the usage of field-symbols, but no source was provided yet.
  • Field-symbols have been observed to be faster, but the results haven't been conclusive yet.
  • A SLIN developer argues that a static check can't determine when references are faster than field-symbols. For simplicity reasons, field-symbols are recommended by the SLIN check.

Is there any more information given by SAP themselves in the meantime?
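To make the second point of the summary concrete, here is a hedged sketch with a hypothetical structure type. A field symbol gives dynamic access to a component by name (`ASSIGN COMPONENT`). A data reference can hold a data object whose type is only chosen at runtime (`CREATE DATA ... TYPE (name)`). Note that, in this sketch, the components of the dynamically created object are still accessed through field symbols.

```abap
REPORT zdemo_dynamic_access.

" Hypothetical structure type, for illustration only
TYPES: BEGIN OF ty_customer,
         id   TYPE i,
         city TYPE string,
       END OF ty_customer.

START-OF-SELECTION.
  DATA(customer) = VALUE ty_customer( id = 42 city = `Walldorf` ).
  DATA city TYPE string.

  " Field symbol: access a component whose name is only known at runtime
  ASSIGN COMPONENT 'CITY' OF STRUCTURE customer TO FIELD-SYMBOL(<component>).
  IF sy-subrc = 0.
    city = <component>.
  ENDIF.

  " Data reference: create a data object whose type is only known at runtime
  DATA dyn_ref TYPE REF TO data.
  DATA(type_name) = `TY_CUSTOMER`.
  CREATE DATA dyn_ref TYPE (type_name).

  " The dynamically typed object is then processed via field symbols again
  ASSIGN dyn_ref->* TO FIELD-SYMBOL(<dyn_customer>).
  ASSIGN COMPONENT 'ID' OF STRUCTURE <dyn_customer> TO FIELD-SYMBOL(<dyn_id>).
  IF sy-subrc = 0.
    <dyn_id> = 7.
  ENDIF.

  DATA created TYPE ty_customer.
  created = <dyn_customer>.
```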

@pokrakam
Contributor

pokrakam commented Aug 9, 2021

A nice summary, and valid point that it would be nice to have a conclusion. Personally I think both are applicable in different scenarios but prefer references if both are equally usable.

I don't get the one point though:

It is harder to transition code based on normal variables (not even necessarily field-symbols) to using references because references use different syntax.

Technically speaking, changing to field symbols is more than twice as hard as changing to references, as you need to add angle brackets in two positions instead of a single > when addressing loop components:

val = row-field.
val = row->field.
val = <row>-field.

The exception is if we repeatedly need to dereference the full structure via row->* then I prefer to use field symbols for readability.

@ConjuringCoffee
Contributor

I agree it is "twice as hard" if you change it manually. However, you can also use ADT's renaming feature to rename the variable.

Base code:

DATA sales_order TYPE vbap.
sales_order-kwmeng = 1.

Rename the variable using ADT's renaming feature, ignoring errors (the error being that a variable can't be named like that):

DATA <sales_order> TYPE vbap.
<sales_order>-kwmeng = 1.

Then you only have to replace the definition with the field-symbol, for example when moving this logic into a loop.

DATA sales_orders TYPE TABLE OF vbap.

LOOP AT sales_orders ASSIGNING FIELD-SYMBOL(<sales_order>).
  <sales_order>-kwmeng = 1.
ENDLOOP.

@pokrakam
Contributor

OK, I see what you mean, neat trick. If it's a handful then I just add the >, but a select-search-and-replace also works well. But realistically I see this as a really minor point, this is the kind of thing we refactor if we're changing the code anyway.

@Jonasdoubleyou

I think all the "microperformance discussions" here are meaningless, because

  • Advancing the loop is one single instruction per iteration, whereas reading and writing to the data looped over grows with the number of operations in the loop. Thus what matters is what you do inside the loop, just reading once or twice will probably not reflect what the "average loop" in an ABAP program will do
  • Reference Semantics come with a whole different zoo of problems, as they might reference stuff on the heap (thus deciding what can be GC'ed and what cannot). When talking about "performance" one should also consider the impact on the overall system, not just the current execution.
  • Without the absolute execution times, the code used to run the test, and the system environment, the performance test is meaningless. Simply put more load on the system while doing one of the tests, and the test will show whatever result you prefer.

Concerning

The fact that the latter is slower is something that should be brought to the attention of whoever is responsible for those "new" features in the ABAP language. I see no reason for one to be faster than the other, both being just an iteration over pointers.

I don't think it is that easy. One could copy the reference to the table line outside of the loop and do stuff with it (e.g. store it inside a class instance; then the table has to live as long as the instance lives), whereas a field symbol cannot be copied, so I guess it cannot extend the lifetime of a heap-allocated object. Also, you can reference a reference through a field symbol, but you cannot reference a field symbol through a reference.

Anyways, this is a styleguide, so this should really be about semantics, and there I think there is a strong argument for preferring field symbols over references:
Field Symbols have value semantics whereas references have reference semantics.
Structures are values, and as such when working with them, keeping value semantics is generally more consistent (e.g. = vs. ->* =, -> vs. -, comparisons and so on).
Thus I would always use either field-symbols or local variables for looping over structures. That also matches the ABAP documentation:

Use field symbols and data references for the purpose that best matches their semantics:

  • Field symbols for value access (value semantics)
  • Data references for working with the references (reference semantics)

Now for field-symbols vs. local variables: As far as I can see, there is no difference between reading or writing field symbols and local variables. However, in the loop header, for the field symbol just the reference has to be changed, whereas for a local variable the structure has to be copied. While the former is probably a constant-time operation, the latter grows with the size of the line type. Thus for large structures, preferring field symbols is probably a good idea. A maybe weaker version of "prefer" would be "consider":

  • Consider using FIELD-SYMBOLS in LOOPs over structures
    For large structures, using field symbols might yield much better performance while keeping value semantics.

@MW-Phoenix

I personally prefer field symbols for another reason: they stand out more in the code.
A FS <sales_order> lets me know that I have to be careful and check for assignment in the code, whereas a reference sales_order does not tell me the same. If it is a variable declared above, no assignment checks are necessary; however, if it is a reference, I need to check.
Hence I prefer clearly marked field symbols wherever they are possible.
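A small sketch of these checks, with a hypothetical table `names`. A field symbol is tested with `IS ASSIGNED`, a data reference with `IS BOUND`. The first two `READ TABLE` statements ask for a line that does not exist, so neither target is set.

```abap
REPORT zdemo_assignment_checks.

TYPES ty_names TYPE STANDARD TABLE OF string WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(names) = VALUE ty_names( ( `Ada` ) ( `Grace` ) ).
  FIELD-SYMBOLS <name> TYPE string.
  DATA name_ref TYPE REF TO string.

  " Line 5 does not exist: the field symbol keeps its unassigned state ...
  READ TABLE names ASSIGNING <name> INDEX 5.
  DATA(fs_assigned) = xsdbool( <name> IS ASSIGNED ).

  " ... and the data reference stays initial
  READ TABLE names REFERENCE INTO name_ref INDEX 5.
  DATA(ref_bound) = xsdbool( name_ref IS BOUND ).

  " Line 1 exists: after the check, the field symbol can be used safely
  READ TABLE names ASSIGNING <name> INDEX 1.
  DATA first_name TYPE string.
  IF <name> IS ASSIGNED.
    first_name = <name>.
  ENDIF.
```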

Regarding performance, during migration projects we measured considerable differences between the 3 types of loops, and field symbol loops were by far the fastest option (and hence our standard). This is most likely only relevant in large data operations such as migration projects, but it should at least be considered, as it is in the discussion here.

@pokrakam
Contributor

pokrakam commented Oct 30, 2021

So the general trend in ABAP Platform 2021 is away from Field Symbols according to this blog

Here too he talks about measuring performance. Sure there will be the minority of cases where the difference is significant, but in a largely database-focused language the performance impact of loops will be negligible in >99% of the cases. Clean code first, then refactor performance bottlenecks.

@jordao76
Contributor

I think it might be important to separate the use cases:

  1. loop that only reads the loop variable
  2. loop that mutates the loop variable

I'm afraid the recommendations being discussed are applicable for case 2 mostly, and both cases might warrant different recommendations.

For case 1, I always use simple INTO DATA(...) work area variables and have never had any problems. It's also much easier to read and understand. In my experience, case 1 loops are far more common.
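A minimal sketch of the two cases, with a hypothetical table of integers. For a read-only loop, an `INTO DATA(...)` work area is enough. Changes made to that work area only affect a copy and never reach the table. To change the lines, the loop target has to point at them.

```abap
REPORT zdemo_read_vs_modify.

TYPES ty_numbers TYPE STANDARD TABLE OF i WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(numbers) = VALUE ty_numbers( ( 1 ) ( 2 ) ( 3 ) ).
  DATA total TYPE i.

  " Case 1: read-only loop, a work area copy is sufficient
  LOOP AT numbers INTO DATA(number).
    total = total + number.
  ENDLOOP.

  " Writing to the work area only changes the copy, not the table line
  LOOP AT numbers INTO number.
    number = number * 10.
  ENDLOOP.

  " Case 2: modifying loop, the target must refer to the line itself
  LOOP AT numbers ASSIGNING FIELD-SYMBOL(<number>).
    <number> = <number> * 10.
  ENDLOOP.
```

Only the last loop changes the table. Afterwards it holds 10, 20 and 30, while `number` still contains the last copy that was multiplied.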

@JSB-Vienna

JSB-Vienna commented Jul 17, 2022

I'm guessing this open discussion is why the recommendation differs between this blog and the book (see the end of chapter 6.1.4). Here in this blog it is still the reference, and in the book it is ASSIGN. For readers of both, it is confusing.

@m-hesse

m-hesse commented Jul 18, 2022

While sticking to the style guide and using REFERENCE INTO as far as possible, we found another caveat.
When there is a table of instances, REFERENCE INTO gets you a reference to a reference, resulting in cluttered code.

Example:

DATA instancetable TYPE TABLE OF REF TO ltc_main.
"Fill the table...
LOOP AT instancetable REFERENCE INTO DATA(refrefdata).
  refrefdata->*->value = refrefdata->*->value + 1.
ENDLOOP.

"Instead, use:
LOOP AT instancetable INTO DATA(refdata).
  refdata->value = refdata->value + 1.
ENDLOOP.

Our internal solution is to prefer REFERENCE INTO, accepting the discussion and points in this thread, with the exception of instance tables, where we use INTO DATA.

@fabianlupa
Contributor

fabianlupa commented Jul 18, 2022

For LOOP over TYPE TABLE OF REF TO object, I tend to use a classic INTO DATA(oref) variable instead of a field symbol or a data reference, since the line type is just an object reference itself. I don't think the field symbol provides any benefit over the normal variable here?

Edit: Unless, of course, you want to modify the current line.
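A sketch of the distinction in the edit, assuming a hypothetical class `lcl_counter`. To change the objects, a copy of the reference via `INTO DATA(...)` is sufficient. To replace the reference stored in a line, the loop target must refer to the line itself, for example with `ASSIGNING`.

```abap
REPORT zdemo_object_table.

" Hypothetical class, for illustration only
CLASS lcl_counter DEFINITION FINAL.
  PUBLIC SECTION.
    DATA hits TYPE i.
ENDCLASS.

CLASS lcl_counter IMPLEMENTATION.
ENDCLASS.

START-OF-SELECTION.
  DATA counters TYPE STANDARD TABLE OF REF TO lcl_counter WITH EMPTY KEY.
  counters = VALUE #( ( NEW lcl_counter( ) ) ( NEW lcl_counter( ) ) ).

  " Changing the objects: the copied reference points to the same object
  LOOP AT counters INTO DATA(counter).
    counter->hits = counter->hits + 1.
  ENDLOOP.
  DATA(first_hits) = counters[ 1 ]->hits.

  " Replacing the reference stored in the line: the target must be the line
  LOOP AT counters ASSIGNING FIELD-SYMBOL(<counter>).
    <counter> = NEW lcl_counter( ).
  ENDLOOP.
```

After the second loop the table contains fresh objects, while `counter` still points to the last of the original ones.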

@m-hesse

m-hesse commented Jul 18, 2022

For LOOP over TYPE TABLE OF REF TO object I tend to use a classic INTO DATA(oref) variable instead of a field symbol or a data reference since the line type is just an object reference itself. I don't think the field symbol provides any benefit over the normal variable here?

We use INTO DATA as the exception. I updated the snippet accordingly.

@sandraros

It's also what I do (using LOOP AT instancetable INTO DATA(refdata)). For me it's quite obvious to use it instead of REFERENCE INTO DATA(refrefdata), which leads to awful code.

bjoern-jueliger-sap added a commit to bjoern-jueliger-sap/styleguides that referenced this issue Apr 4, 2023
Due to changes in the ABAP language since this section was written,
it seems prudent to split it up into a discussion of field symbols for
dynamic data access and a discussion of field symbols as loop targets.
The loop target discussion is an attempt at synthesizing the primary arguments
from SAP#115.
sautermi0 pushed a commit that referenced this issue Aug 7, 2023
* Clarify field symbols vs. references

Due to changes in the ABAP language since this section was written,
it seems prudent to split it up into a discussion of field symbols for
dynamic data access and a discussion of field symbols as loop targets.
The loop target discussion is an attempt at synthesizing the primary arguments
from #115.

* Fix links & TOC