Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unchecked array access #9117

Closed
nventuro opened this issue Jun 3, 2020 · 30 comments
Closed

Unchecked array access #9117

nventuro opened this issue Jun 3, 2020 · 30 comments
Labels
closed due inactivity The issue/PR was automatically closed due to inactivity. language design :rage4: Any changes to the language, e.g. new features stale The issue/PR was marked as stale because it has been open for too long.

Comments

@nventuro
Copy link
Contributor

nventuro commented Jun 3, 2020

Solidity currently performs bounds checks on array access, asserting that the index is smaller than the lenght of the array. In some scenarios, this check can be skipped by the developer to save gas, given additional guarantees on the range of the indices.

For example, consider the following snippet where a slice of an array is processed:

require(start + size <= array.length);
for (uint i = 0; i < size; ++i) {
  foo(array[start + i]);
}

Discussion around having checked arithmetic, with opt-in unchecked areas (#9054) brought up the idea of using these same areas to have unchecked array access. Given this, the previous snippet might be written as follows:

require(start + size <= array.length);
  for (uint i = 0; i < size; ++i) {
    unchecked {
      foo(array[start + i]);
    }
}

A potential issue with this approach is that the meaning of unchecked would be extended to more than just arithmetic checks, increasing information burden on the users. Some people have proposed giving arguments to unchecked, to signal which checks are being turned off (e.g. unckecked('array-access').

I believe a plain unchecked is good enough for all of these cases, since we'll want to make unchecked areas as small as possible: there shouldn't be more than one 'checked' operation inside them.

@chriseth
Copy link
Contributor

I have a bad feeling allowing this. Unchecked overflow is a very different thing than unchecked array access. My hope is that the new optimizer will find most of these cases and remove the check.

@leonardoalt
Copy link
Member

Agree with @chriseth
I think a much better solution in this direction would be range loops

@nventuro
Copy link
Contributor Author

Indeed, @axic mentioned those as an alternative here as well.

@mudgen
Copy link

mudgen commented Oct 24, 2020

What is the status of this? I don't like to pay gas for array bounds checking when it isn't needed.

@nventuro
Copy link
Contributor Author

What's the recommended way to do unchecked access in 0.7?

@mudgen
Copy link

mudgen commented Oct 25, 2020

I'm moving my comment here from the duplicate issue I made:

When looping through an array using the array length as the max bound, it is unnecessary to pay the gas for array bounds checking.

For example:

uint sum;
for (uint i; i < myarray.length; i++) {
  sum += myarray[i]
}

Is it possible for the optimizer to optimize away the bounds access? A manual way to turn it off? New range loop? What's the plan and timeline for this now?

In loops with many iterations, or many array index accesses, or nested loops this might really matter. I don't like paying gas for unneeded checking, which is most of the time I'm using arrays.

@leonardoalt
Copy link
Member

As far as I know there is no plan/timeline for this.

@mudgen your example is basically the same as the initial one in the issue, so yes we're all aware of that case.

I still think unchecked array access/manual way to turn it off is a really bad idea. Even if the optimizer can detect some cases, the general problem is still undecidable.
In my opinion the only proper solution is range loops.

@axic
Copy link
Member

axic commented Oct 25, 2020

In loops with many iterations, or many array index accesses, or nested loops this might really matter. I don't like paying gas for unneeded checking, which is most of the time I'm using arrays.

Loops with many iterations is a discouraged pattern in the first place. Reason: it very well could be possible that such a piece of code will lock up due to future gas changes.

I think a much better solution in this direction would be range loops

Indeed, @axic mentioned those as an alternative here as well.

Btw, here's a potential syntax for range based loops:

uint[] array;

for (uint value: array) {}

// And with slicing
for (uint value: array[start:end]) {}

Of course for value types this doesn't support references, but references/pointers is probably something we should solve separately in general.

@mudgen
Copy link

mudgen commented Oct 25, 2020

@axic I think that is a very nice syntax for range loops. It would also be nice and useful if there is some way to get the iteration index of each iteration. I suppose it can be done manually be adding a variable and incrementing.

@nventuro
Copy link
Contributor Author

Is there a simple way available today to do an unchecked read? Even if it involves an inline-assembly sload. Short loops (n < 10) are not uncommon, and sload is one of the most expensive operations in the EVM - the added gas overhead is in most cases unacceptable.

@chriseth
Copy link
Contributor

If you know the loop is short and want to optimize sloads radically, what about a fixed-size array?

@nventuro
Copy link
Contributor Author

Sadly they are not fixed-size but dynamic, forcing static arrays with some upper-bound imposes restrictions and extra costs that are also undesirable.

Besides, that's not really related to the point being made here about many code patterns leading to scenarios where the indexes are trivially known to be below the length, making these extra reads and checks wasteful.

@hrkrshnn
Copy link
Member

hrkrshnn commented Oct 26, 2020

scenarios where the indexes are trivially known to be below the length

Can you write more about these scenarios? (other than examples with require.) Do you think the compiler will be able to detect these cases?

@leonardoalt
Copy link
Member

@nventuro I highly disagree that these are trivial. If we're talking about only and exactly your example, sure, there's an AST pattern there. If anything changes, for example, i is incremented differently, there are branches inside the loop, the condition is a bit different, or even the same thing is simply written differently, the AST is different and you have to solve the general case. That can very easily become a very hard problem, since you're asking the compiler to automatically prove the loop invariant start + i < array.length.

@mudgen
Copy link

mudgen commented Oct 26, 2020

In my code if there is a chance that an array index could be out of bounds I will want to make a require statement to check for that and provide a helpful error message explaining the contextual specifics of the error. This makes the built-in array bounds checking redundant and wastes gas. Hence I would like to be able to turn it off.

@axic
Copy link
Member

axic commented Oct 26, 2020

Is there a simple way available today to do an unchecked read? Even if it involves an inline-assembly sload. Short loops (n < 10) are not uncommon, and sload is one of the most expensive operations in the EVM - the added gas overhead is in most cases unacceptable.

@nventuro with EIP-2929 subsequent SLOADs won't be as expensive. This EIP is being discussed for the next hard fork.

@leonardoalt
Copy link
Member

@mudgen so you'd write your own requires all over a sort implementation, for example? Sounds unlikely to me. Or would you just trust your code, and assume all accesses are safe? Sounds quite unsafe to me. Or would you formally verify your sorting function? Sounds not very doable to me, at least automatically.

@mudgen
Copy link

mudgen commented Oct 26, 2020

@leonardoalt I don't know. I'd have to look at the specific implementation. If I'm looping over an array using its length as the max bound then I don't need bound checking.

I'd like to be able to use automatic bound checking where it makes sense and not use it where it doesn't make sense.

@leonardoalt
Copy link
Member

So you want an entirely new completely unsafe feature for this one very specific use case.

@mudgen
Copy link

mudgen commented Oct 26, 2020

It is easy to use safely and it has wide use. For example looping over an array using its length for the max bound is very common.

@leonardoalt
Copy link
Member

Now we're just going in loops, I already voice my opinion.

@nventuro
Copy link
Contributor Author

the AST is different and you have to solve the general case. That can very easily become a very hard problem, since you're asking the compiler to automatically prove the loop invariant

To be clear, I'm not asking the compiler to do this for me. I understand the general problem is very hard (and can be made harder if the contents of the loop can potentially alter the array in question), and even a great solution would not cover all cases.

What I'm saying is, there's concrete scenarios where I as a developer know unchecked access is fine, and would like to be able to opt-in to avoid performing work that is (based on my analysis) wasteful.

with EIP-2929 subsequent SLOADs won't be as expensive

I was not aware of the 'warm' part of that EIP, thanks!

@cameel cameel added feature language design :rage4: Any changes to the language, e.g. new features labels Oct 27, 2020
@nventuro
Copy link
Contributor Author

Are there any plans to do range-based loops in 0.8? I'm able to replace checked storage array access by mimicking arrays with index -> value mappings plus a length field, but for memory arrays that doesn't work as well.

As a crutch, it'd be great to otherwise have some form of unsafe access, leaving it up to the developer. I've managed to reduce gas costs for our most sensitive use case by 20% just with unchecked storage access.

@axic
Copy link
Member

axic commented Oct 30, 2020

@nventuro it is not planned atm. Also created a new issue to discuss range based loop specifically (#10162), given this issue is a different discussion.

@axic
Copy link
Member

axic commented Oct 30, 2020

Also there's this comment from @chriseth on the previous discussion #9054 (comment):

@nventuro are you talking about the check in ++i? In the new code generator, the compiler should be able to get rid of it and for the old generator, if it is not possible to optimize that, then the following should work:

for (uint256 i = 0; i < array.length; unchecked { ++i } ) {
  ...
}

@frangio
Copy link
Contributor

frangio commented Jun 30, 2022

What is the current thinking on this feature? Array-heavy code could definitely benefit from it.

@Amxx
Copy link

Amxx commented Jun 30, 2022

I'd like to add that it is not just about for loops. Some other pieces of code would benefit from that:

In the case of this, checking the bound would cost one sload per iteration which is 100gas each time.

@zemse
Copy link
Contributor

zemse commented Jul 28, 2022

It might be obvious but just to mention, unchecked array access can be deadly if user inputs reach the array index, basically attacker could have the contract read a value from any slot they want and contract would process it. But yeah as long as only dev gets to input the index, it'd be great if the language provides "unsafe" methods. This would also hint devs that they should be careful and prevent user from manipulating the array index.

But for those who are gas golfing/need this right now, they can anytime drop to assembly and do stuff. For e.g.

contract MyContract {
    using Uint256Array for uint[];
    using Uint256Array for uint;

    uint[] public myArray;

    function sum() external  view returns (uint result){
        uint len = myArray.length;
        uint arrayPointer = myArray.pointer();
        for(uint i; i < len; i++) {
            result += arrayPointer.unsafeAccess(i);
        }
    }
}

library Uint256Array {
    function pointer(uint[] storage arr) internal view returns (uint256 result) {
        assembly {
            mstore(0, arr.slot)
            result := keccak256(0, 0x20)
        }
    }

    function unsafeAccess(uint p, uint index) internal view returns (uint result) {
        assembly {
            result := sload(add(p, index))
        }
    }
}

@github-actions
Copy link

github-actions bot commented Mar 6, 2023

This issue has been marked as stale due to inactivity for the last 90 days.
It will be automatically closed in 7 days.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 6, 2023
@github-actions
Copy link

Hi everyone! This issue has been automatically closed due to inactivity.
If you think this issue is still relevant in the latest Solidity version and you have something to contribute, feel free to reopen.
However, unless the issue is a concrete proposal that can be implemented, we recommend starting a language discussion on the forum instead.

@github-actions github-actions bot added the closed due inactivity The issue/PR was automatically closed due to inactivity. label Mar 14, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed due inactivity The issue/PR was automatically closed due to inactivity. language design :rage4: Any changes to the language, e.g. new features stale The issue/PR was marked as stale because it has been open for too long.
Projects
None yet
Development

No branches or pull requests