ASIC Resistant PoW #14

Sc00bz · 2018-05-09T05:34:20Z

About two weeks ago I came up with a proof of work algorithm that is both easy to be currently anti-GPU (see paragraph 4 also second to last paragraph) and fast to verify. This should also be ASIC resistant since most of the cost and power will be for the RAM because ASICs lose a lot of their advantage if they can't have the memory on the same chip.

Most other "ASIC resistant" algorithms directly use a password hashing algorithm like scrypt, Argon2, or yescrypt. Low settings are used so that verification time is fast, but these settings aren't high enough to force ASICs to move the RAM off chip. One thing that gets ignored is that a password hashing algorithms can be a PoW but a PoW doesn't mean it's a password hashing algorithm.

Basically it's a modified Argon2d with a large common block of memory that changes over time. This is the 6th version I've come up with. Like I told myself the last few times I don't think I'll be able to make this better.

To be anti-GPU with all current models we need to use at least 33 GiB. Also AMD released a GPU with 2 TB of flash on it. The flash is too slow to be useful for this, but in the future memristors will end CPU only generation. Also future versions of 3D XPoint memory might be fast and cheap enough to do this.

FirePro W9100 (32 GiB) $2900
FirePro S9170 (32 GiB) $3900
Nvidia Tesla P40 (24 GiB) $6500
Radeon Pro SSG (2 TB of SSD) $7000

This won't be anti-GPU forever even if the memory size increases over time. It might be a good idea to be able to run this on a GPU. Say 7 GiB. Running on a GPU will be much faster since CPUs have ~20 GB/s of bandwidth vs ~480 GB/s for GPUs. Next generation of GPUs will probably be 12-16 GiB. So targeting those might be better. I'm not sure what kind of bandwidth you can get from RAM with an ASIC. I'd guess similar to GPUs but a little higher due to that's the bottleneck and it will get more attention.

In short I'd like to develop this idea further and implement it. Also I heard you're looking into this and thought I'd help.

solardiz · 2018-05-09T14:45:27Z

Thank you for posting this.

I think "ASIC resistance" might be understood differently by different people. It'd be more useful to talk in terms of limiting ASICs, say, to 10x efficiency improvement (energy-efficiency and/or hardware cost) over general-purpose hardware, but then it's hard to estimate those potential improvements with much accuracy. The current ASICs for Zcash are still within 10x, but that's just the start, with my expectation (first stated in the 2016 article) remaining that the 10x to 100x range is likely for Equihash with its current parameters.

So a question may be: is it the concern of growing ASIC advantage from <10x up to 100x that is reason for a possible change, or are we so much unhappy with the <10x already (which would be hard to prevent for any other scheme as well).

Most other "ASIC resistant" algorithms directly use a password hashing algorithm like scrypt, Argon2, or yescrypt. Low settings are used so that verification time is fast, but these settings aren't high enough to force ASICs to move the RAM off chip.

Yes, those settings are not high enough to force ASICs to move the memory off-chip, yet they might find that it's most optimal to do so anyway. For example, yescrypt's memory access speeds are limited by max(MUL time, S-box lookup time) per round anyway (and that's 96 of those per 1 KiB, with current defaults), so an ASIC might opt not to waste unnecessarily faster on-die SRAM on that. WIth 2 MiB to 16 MiB that current yescrypt 0.5'ish based coins use, they will fit quite a number of instances per chip anyway, but it's quite possible they'd find they can fit more with ports to external DRAM chips, or with a mixed approach. They could also opt to fab the whole thing on a DRAM process; the latency of MULs will probably grow, but they could compensate for that by fitting a lot more instances per die. It's hard to predict which solution is the most cost-effective, and this could vary by upfront costs and production volume.

Things change for yescrypt with a ROM, which will force use of more than just on-die SRAM. It'd be about what I call "ROM port hardness".

modified Argon2d with a large common block of memory that changes over time

This sounds similar to yescrypt's ROM, except that the ROM doesn't change over time. Is this how you see it, too? What extra advantages does changing this common block of memory over time provide? As far as I'm aware, multi-gigabyte ROM isn't cheaper than RAM (or other easily re-programmable memory) of same size and speed, although I understand we might want to take this flexibility away just in case, as long as we don't intend to make use of it ourselves. This isn't hard to do with yescrypt either - e.g., base the "ROM" contents on the previous block, although this would make the PoW non-progress-free (a term introduced by Argon2 authors), or alter the "ROM" contents only partially based on each previous block (but in whole over, say, 100 blocks) so that the PoW stays almost progress-free. This is implementable as a wrapper over yescrypt.

Sc00bz · 2018-05-11T21:40:37Z

modified Argon2d with a large common block of memory that changes over time

This sounds similar to yescrypt's ROM, except that the ROM doesn't change over time. Is this how you see it, too?

Yes. It actually use to be a ROM and I was thinking of just using yescrypt.

There is a problem with using a ROM. How do you prove you don't have the seed? I solved this by giving everyone the seed but it costs work to generate the ROM. I was going to do PBKDF2-SHA256 1023 rounds to generate ROM plus an extra 131,031 bytes. Then hash blocks of 131,063 bytes with offsets of a multiple of 32 bytes. "rom2 = rom2 || sha256(substr(rom, 32*i, 131063))". Then interlace bits by grabbing every 1,048,576 bit. Reads into the ROM were 128 bytes so each random read would cost 8,592,031,744 SHA256 blocks ((2048*4096+2048)*1024) if you wanted to generate it on the fly. But I realized I should just use Argon2 to generate the ROM. Which simplified a lot.

What extra advantages does changing this common block of memory over time provide? As far as I'm aware, multi-gigabyte ROM isn't cheaper than RAM (or other easily re-programmable memory) of same size and speed, although I understand we might want to take this flexibility away just in case, as long as we don't intend to make use of it ourselves.

RAM uses a lot more power than ROM. Also you're thinking of standard ROM. Think like how 3D XPoint memory is laid out but instead it's either connected or not. With this data density you'd be able to put the ROM on die. I was already thinking of a way to scrap the ROM because of a maybe patent issue with ROMs. I haven't read it, but I just assume it's as vague as possible. After I saw the Intel Optane thing you Tweeted I decided to drop the ROM because I figured you could get the same data density with just wire traces.

tromer · 2018-06-01T19:00:34Z

The Zcash Foundation Grant Review committee has reviewed your pre-proposal, including the above discussion, to evaluate its potential and competitiveness relative to other proposals. Every pre-proposal was evaluated by at least 4 committee members .

The committee's opinion is that your pre-proposal is not a leading candidate for funding in this round, and the committee therefore does not invite you to submit a full proposal.
This decision is advisory, and you can still choose to submit a full proposal by June 15th, following the detailed structure described in the Call for Proposals. Note that if the full proposal is substantially the same as discussion so far reflects, then it's unlikely to be chosen for funding; and if it isn't, then we encourage you to post a draft (or at least answer any open questions) as early as possible, to allow for community feedback. Regardless of your choice, we thank you for participation thus far.

Sc00bz · 2018-06-02T08:24:22Z

*sad face*

There's probably a zero chance I'll do this for fun so if someone wants to implement, all you need to do is modify Argon2d: remove the "rand*rand" and change the ref block function to anything but previous, current, and not yet generated blocks. Initialize ROM-ish data with constant data and t=4. Then write two functions: update and try. Both mix in a hash of the current blockchain block and a nonce then do N rounds of Argon2d with the ROM-ish data. Make sure your setting are such that the ROM-ish data changes at least once a year to avoid ROM attacks. N should also be at least 128 to avoid attacks small ROM-ish attacks. Also verifying should probably implement some kind of error code check to make sure your RAM didn't become corrupt from random bit flip or whatever. I feel like I'm forgetting something… oh there's an off by one optimization you should use… oh I think it was that you should plan for RAM increases: now do max what a GPU with 8 GiB RAM can; in 2 years do max for 12 GiB; 4 years, 16 GiB; 6, 24; 8, 32… or just try/hope/guess.

I guess if anyone wants to pay me to implement this I'll do it for $3000. I already sunk like 20 hours into the design and partial implementation (to see if I missed something). I probably have like 60-80 hours to finish (CPU and GPU versions), test, optimize, and what not.

(Note I say "ROM-ish" even though it's RAM because it's mostly static because it takes awhile to change)

amiller added the proof-of-work ASIC resistance, etc. label May 10, 2018

solardiz mentioned this issue May 17, 2018

Design and/or help review new PoW scheme #25

Closed

Sc00bz closed this as completed Jun 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ASIC Resistant PoW #14

ASIC Resistant PoW #14

Sc00bz commented May 9, 2018

solardiz commented May 9, 2018

Sc00bz commented May 11, 2018

tromer commented Jun 1, 2018

Sc00bz commented Jun 2, 2018

ASIC Resistant PoW #14

ASIC Resistant PoW #14

Comments

Sc00bz commented May 9, 2018

solardiz commented May 9, 2018

Sc00bz commented May 11, 2018

tromer commented Jun 1, 2018

Sc00bz commented Jun 2, 2018