[PROPOSAL] Encode restore height as 26th word of the mnemonic seed #6639

Open
dEBRUYNE-1 opened this issue Jun 10, 2020 · 114 comments

Comments

@dEBRUYNE-1
Contributor

The restore height is currently a value that has to be entered manually for wallets that are restored from either the keys or the mnemonic seed. The wallet will essentially ignore blocks (only pulling block hashes) before the restore height and start scanning (looking for transactions that belong to the wallet) from the restore height block.

User experience is degraded if the user accidentally sets a restore height that is too 'high' (i.e. after the first transaction to the wallet), as the wallet will 'miss' certain or all transactions, thereby causing an improper balance (as well as transaction history) to be displayed.

In order to improve user experience, we could encode an approximate restore height as an additional word of the mnemonic seed. The restore height would then be set automatically upon restoring the wallet, thereby ensuring users will not inadvertently set an erroneous restore height.

I personally do not see many drawbacks to this proposal. Guides will have to be updated to reflect the new format, and users will need to be informed. Users may also be slightly confused at first by the coexistence of two different seed formats. However, I think the proposal is ultimately net beneficial to user experience.

@fluffypony
Contributor

My suggestion is to encode it as follows: the position of the word in the wordlist * 21915 = starting block height. 21915 blocks is about a month's worth of blocks (half a month when the block time was 1 minute), so it gives us a good 135 years worth of coverage.
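
To make the suggested mapping concrete, here is a minimal sketch of it. The function names are hypothetical, the wordlist size of 1626 is the one mentioned later in this thread, and rounding down (so the wallet never starts scanning after its creation height) is an assumption rather than something stated in the comment.

```javascript
// Sketch only: word index * 21915 = restore height, as suggested above.
const BLOCKS_PER_STEP = 21915; // about one month of 2-minute blocks
const WORD_COUNT = 1626;       // size of the current Monero wordlists

// Round down so that the decoded height is never later than the real one.
function heightToWordIndex(height) {
  const index = Math.floor(height / BLOCKS_PER_STEP);
  if (index < 0 || index >= WORD_COUNT) {
    throw new RangeError('height cannot be represented by a single word');
  }
  return index;
}

// The wallet would start scanning from this (approximate) height.
function wordIndexToHeight(index) {
  return index * BLOCKS_PER_STEP;
}

const exampleIndex = heightToWordIndex(2118243);
console.log(exampleIndex, wordIndexToHeight(exampleIndex));
```

With this rounding, a wallet created at height 2118243 would be restored from a slightly earlier height, at the cost of rescanning at most one step's worth of blocks.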

@SChernykh
Contributor

This is too obscure, just use Jun/2020 or something like this instead of a 26th word in the seed.

@rbrunner7
Contributor

As discussed on IRC, summarizing it here for broader publicity and discussion:

I am in full favor of adding one seed word to encode restore height.

But if we touch the seed system and add a "new" kind of seed encoding the restore height, I vote for taking the chance to add two more worthwhile changes at the same time. (Changing anything with seeds will be a larger endeavor, and IMHO it would be a strategic mistake to come back to this with "new new" seeds a year or so later.)

The "checksum" as implemented with the checksum word being simply a copy of one of the other words is very weak i.e. it does not catch a lot of errors. This can stay a single checksum word, but it should be calculated using a much more robust algorithm going over all words of the seed.

Furthermore, one more word should get added as the first word of the seed, encoding a seed version. The words used for the version should be different from all other seed words so you can reliably detect whether the first word given is such a version word or not.

This will enable a very robust UX. You can for example generate useful error messages if somebody enters only the first 25 words of a "new" seed for whatever reason, be it conviction that "more than 25 words are wrong", or input forms just not allowing for more words because not yet reworked / upgraded for "new" seeds.

Seed versions would also allow for adjustments in the word list, for whatever crazy reasons that may pop up, like some words becoming "politically incorrect", or more or less banned outright e.g. for Chinese seeds.

IMHO we should stick with words for both version and restore height encoding. Why? Because if it is anything else people will recognize it as something special and because of this some people may not treat them with the same care as the other words and e.g. simply not enter them, based on false assumptions like "I thought that's not part of the seed proper".

Maybe we should even go as far as avoiding having exactly the same word added as the version word to each and every "new" Monero seed, possibly for years, because again people might get confused about whether that seemingly constant word really belongs to the seed and is really necessary. People could also fear that Monero seeds are weaker than other coins' seeds because of a word being constant.

This could be solved by using only the first letter of the word as the version and, e.g., randomly choosing from several words starting with that letter.

@nim4

nim4 commented Jun 11, 2020

Maybe instead of adding a new word we can use the first letter of each word to encode the timestamp in days (upper case = 1, lower case = 0).

For example using (unix timestamp / (60 * 60 * 24))

general nomad tail jargon nodes lion scrub juicy palace puffin shipped rift vampire maze axes deity viewpoint timber textbook opened awesome gang object odds object

will be

general nomad tail jargon nodes lion scrub juicy palace puffin Shipped rift vampire maze Axes Deity Viewpoint Timber Textbook Opened Awesome Gang object odds object
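
The example above can be reproduced with a short sketch. The bit order (first word carries the most significant bit) is an assumption inferred from the capitalized example, and the function names are hypothetical.

```javascript
// Sketch only: one bit per word, first word = most significant bit.
function encodeDays(words, days) {
  const n = words.length;
  return words.map((word, i) => {
    const bit = Math.floor(days / 2 ** (n - 1 - i)) % 2;
    return bit ? word[0].toUpperCase() + word.slice(1) : word.toLowerCase();
  });
}

// A word counts as a 1 bit when its first letter is upper case.
function decodeDays(words) {
  return words.reduce(
    (acc, word) => acc * 2 + (word[0] !== word[0].toLowerCase() ? 1 : 0),
    0
  );
}

const plainSeed = ('general nomad tail jargon nodes lion scrub juicy palace puffin ' +
  'shipped rift vampire maze axes deity viewpoint timber textbook opened ' +
  'awesome gang object odds object').split(' ');

// unix timestamp / (60 * 60 * 24) for 2020-06-11 (UTC), the date of the comment
const dayNumber = Math.floor(Date.UTC(2020, 5, 11) / 1000 / (60 * 60 * 24));
const markedSeed = encodeDays(plainSeed, dayNumber);

console.log(dayNumber);
console.log(markedSeed.join(' '));
console.log(decodeDays(markedSeed));
```

Under these assumptions the day number for 2020-06-11 yields exactly the capitalization pattern shown in the comment.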

@fluffypony
Contributor

@nim4 clever idea, but having helped people who have inherited wallets from a deceased spouse you can bet that case sensitivity never factored into it.

@fluffypony
Contributor

Regarding the seed version, why do we want to pick a word that isn't on the wordlist? We could just pick a random word, and use the same offset in other wordlists, which means no additional translation work.

I've also tossed around the idea of using a single word for both the version and the block height offset chunk. We could, for instance, use the first 3 bits for the version and the last 7 bits for the offset (128 possible offsets, so maybe group it per year). Alternatively, if we really want to eke as much out of it as possible, we could divide the wordlist into 5 groups (so a maximum of 5 different versions for this format), and then use the offset in each group, which would give us 325 words per group, so each offset would be ~3.5 months.
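
The grouped variant could look roughly like the sketch below. The names are hypothetical, and the layout (version selects the group, offset is the position inside the group) is an assumption about how the split would be made.

```javascript
// Sketch only: one word carries both a version (0..4) and an offset (0..324).
const LIST_SIZE = 1626;
const VERSION_COUNT = 5;
const GROUP_SIZE = Math.floor(LIST_SIZE / VERSION_COUNT); // 325; the last word is unused

function packVersionAndOffset(version, offset) {
  if (version < 0 || version >= VERSION_COUNT) throw new RangeError('bad version');
  if (offset < 0 || offset >= GROUP_SIZE) throw new RangeError('bad offset');
  return version * GROUP_SIZE + offset; // index into the wordlist
}

function unpackVersionAndOffset(wordIndex) {
  if (wordIndex < 0 || wordIndex >= VERSION_COUNT * GROUP_SIZE) {
    throw new RangeError('word index is outside every group');
  }
  return {
    version: Math.floor(wordIndex / GROUP_SIZE),
    offset: wordIndex % GROUP_SIZE,
  };
}

const packedIndex = packVersionAndOffset(2, 100);
console.log(packedIndex, unpackVersionAndOffset(packedIndex));
```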

@SChernykh
Contributor

Another idea would be to use 27 word seed. 1626^27 ~ 1.008 * 2^288, so we have 256+32=288 bits of storage there. Additional 32 bits could be used for 16-bit checksum (CRC-16 or similar), and 16-bit restore height with 5000 blocks (1 week) precision.
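
The capacity figures quoted in this part of the thread can be checked with a few lines of arithmetic. This is only a sanity check of the numbers, with a hypothetical helper name, not part of any proposal.

```javascript
// Bits of information carried by `wordCount` words from a list of `listSize` words.
function mnemonicBits(wordCount, listSize) {
  return wordCount * Math.log2(listSize);
}

console.log(mnemonicBits(27, 1626).toFixed(2)); // capacity of a 27-word seed
console.log(mnemonicBits(24, 1626).toFixed(2)); // capacity of the 24 data words today
console.log(mnemonicBits(1, 1626).toFixed(2));  // capacity of one extra word

// Exact check with BigInt: 27 words can hold any 288-bit value (256 + 16 + 16).
const capacity27 = 1626n ** 27n;
console.log(capacity27 >= 2n ** 288n);
```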

@knaccc

knaccc commented Jun 11, 2020

  1. How much of this UX improvement could come from simply asking a wallet to scan backwards instead of forwards?

  2. More of a stray thought than a proposal: the first 24 words of the base 1626 seed encode 256.01 bits of information, but a seed only needs to be 252 bits. So we have 3 bits extra there. It's fast and easy to brute force seed selection such that the seed mod (135*12) = restore block height / 21915. Since we have 3 bits extra already, this brute force only loses us 8 bits of entropy on the seed. It's already questionable as to how important it is for Monero to have a 256-bit seed instead of a hashed 128-bit seed.
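
A rough sketch of the seed-selection idea in point 2, assuming the seed is treated as a plain 256-bit integer and drawn from Node's `crypto.randomBytes`. The function names are hypothetical, and a real wallet would still have to turn the value into valid keys.

```javascript
const crypto = require('crypto');

const MONTH_SLOTS = 135n * 12n;  // 1620 possible month values
const BLOCKS_PER_MONTH = 21915;

// Draw a uniformly random 256-bit integer.
function randomSeed256() {
  return BigInt('0x' + crypto.randomBytes(32).toString('hex'));
}

// Keep drawing seeds until the residue matches the month of the restore height.
// On average this takes about 1620 attempts.
function seedForRestoreHeight(height) {
  const month = BigInt(Math.floor(height / BLOCKS_PER_MONTH));
  for (;;) {
    const candidate = randomSeed256();
    if (candidate % MONTH_SLOTS === month) return candidate;
  }
}

// On restore, the approximate height is recovered from the seed itself.
function restoreHeightFromSeed(seedValue) {
  return Number(seedValue % MONTH_SLOTS) * BLOCKS_PER_MONTH;
}

const chosenSeed = seedForRestoreHeight(2118243);
console.log(restoreHeightFromSeed(chosenSeed));
```

No extra word is needed in this scheme because the restore height is implied by the seed value.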

@rbrunner7
Contributor

Regarding the seed version, why do we want to pick a word that isn't on the wordlist?

Because there are many advantages to being able to reliably recognize the word as a version word, or conversely to see that the first given word is definitely not a version word. This allows detecting all kinds of possible confusions, wrongly entered seeds, cut-off seeds, etc.

I think especially with something as critical and sensitive as seeds we want our UX (and the transition from "old" seeds to "new" seeds) to be as robust as possible.

@sumogr
Contributor

sumogr commented Jun 11, 2020

How on earth will the CLI know the top height if I just want to generate a cold wallet without a daemon running? The above discussion requires a CLI wallet already connected to a fully synced daemon (maybe get the date from the system's timestamp? Wouldn't that be dangerous?)

@rbrunner7
Contributor

How on earth will the CLI know the top height if I just want to generate a cold wallet without a daemon running? The above discussion requires a CLI wallet already connected to a fully synced daemon (maybe get the date from the system's timestamp? Wouldn't that be dangerous?)

You are right, I forgot to mention this from the IRC discussion: There are various situations where the restore height is not known. Besides your cold-wallet example, programs generating random seeds offline come to mind.

0 must therefore be a valid value for the encoded restore height, with a meaning of "restore height unknown". This can then be used e.g. to prompt for the restore height when restoring.

@trasherdk
Contributor

Does the restore height have to be part of the check-summed seed?
Couldn't it just be a 32-bit hexadecimal number appended as the 26th word?
If it's there, it's the restore height. If not, ask.

@fluffypony
Contributor

@trasherdk a single word is only 10 bits of entropy, so can't encode the actual restore height, but yes - this proposal is about adding an additional word for the restore height, plus a 27th word for versioning.

@trasherdk
Contributor

The 25 word seed is pretty much set in stone for all eternity, unless you are willing to abandon all those paper wallets out there, hidden in mattresses or something. Right?

@fluffypony
Contributor

@trasherdk I don't understand how this affects paper wallets? It's not like the old seed format would no longer be supported, there'd just be a new, default seed format. We already did this with the old English and new English wordlists, the old English wordlist still exists and you can restore an old paper wallet any time you want.

@rbrunner7
Contributor

The 25 word seed is pretty much set in stone for all eternity, unless you are willing to abandon all those paper wallets out there, hidden in mattresses or something. Right?

Yes. There will be "new" seeds and "old" seeds with us forever. That's one reason why I am so vocal in favor of a system that is able to distinguish in a crystal-clear way between both sorts.

The first 25 words of a "new" seed should better not be a valid "old" seed for a system that, for whatever reason, never learned about "new" seeds. New version words outside the current word lists would nicely take care of this, because they make "new" seeds flat-out invalid for an "old" system. You won't be able to do something that only looks like a correct restore with the first 25 words of a "new" seed on an old system.

@fluffypony
Contributor

@rbrunner7 it's already 2 words longer than the "old" seeds, so I don't think we need to worry about validity. Also if we move the checksum to the end, and make it a checksum valid for the whole of the new seed (and not just the key portion), then it'll fail checksum validation on an older wallet anyway.

I would like to keep the discussion going around versioning, as I've not yet heard an argument for an out-of-band word that makes sense to me, or even an argument for putting the version in an entire word instead of using the extra bits we gain from adding 1 word for both versioning AND initial block offset chunk.

@trasherdk
Contributor

Okay, so far. Is there any reason the 26th word can't be 00205263 for height 2118243?

@knaccc

knaccc commented Jun 11, 2020

Okay, so far. Is there any reason the 26th word can't be 00205263 for height 2118243?

Excellent point. Or as @asymptotically508 wrote on reddit, "I just write the date on the same paper as the seed.".

@rbrunner7
Contributor

I would like to keep the discussion going around versioning

Fair enough.

Just for completeness' sake: The GUI wallet currently does not insist on the 25th / checksum word, it also accepts the "naked" 24 words. Not sure about the CLI wallet.

@rbrunner7
Contributor

Excellent point. Or as @asymptotically508 wrote on reddit, "I just write the date on the same paper as the seed.".

Sure, but this assumes that people know about restore heights and their importance in the first place. Count the people on the Monero subreddit that don't and e.g. fail to correctly restore a wallet. (If they knew, and just did not know the correct restore height, they could easily go back far enough to be safe. It seems they often don't.)

Which is an important part of the motivation to touch the seed system and integrate the restore height, to do away with such problems as best as possible.

@knaccc

knaccc commented Jun 11, 2020

@rbrunner7 I agree, writing the date down and getting it wrong later can create problems.

I'll therefore revert to proposing the much more foolproof solution of changing nothing with the seed and just making the wallet scan from the current block backwards.

@rbrunner7
Contributor

just making the wallet scan from the current block backwards.

Maybe I stupidly overlook something, but I have no idea how you would know when to stop scanning. How can you be sure my first transaction is not in block #1?

@knaccc

knaccc commented Jun 11, 2020

just making the wallet scan from the current block backwards.

Maybe I stupidly overlook something, but I have no idea how you would know when to stop scanning. How can you be sure my first transaction is not in block #1?

What does it matter whether the wallet still has to scan the entire blockchain? If this is about UX, all that matters is that we show people what looks like their balance as quickly as possible.

If Monero has an Eternal September then this solves the waiting problem for most.

@rbrunner7
Contributor

What does it matter whether the wallet still has to scan the entire blockchain? If this is about UX, all that matters is that we show people what looks like their balance as quickly as possible.

Interesting approach which I might be able to agree with, if it were not for the weak checksum problem and the advantages that some sort of versioning brings as additional arguments to improve seeds.

@fluffypony
Contributor

Okay, so far. Is there any reason the 26th word can't be 00205263 for height 2118243?

Yes, that's not a word, and can't be encoded into many physical wallets (eg. Cryptosteel).

@knaccc

knaccc commented Jun 11, 2020

Interesting approach which I might be able to agree with, if it were not for the weak checksum problem and the advantages that some sort of versioning brings as additional arguments to improve seeds.

I agree your proposal is better, if we were starting from scratch. I just don't think that due appreciation has been given to the confusion that will be caused when all of the documentation and tutorials and paper wallets suddenly have to start talking about 25 vs 26 word seeds.

@rbrunner7
Contributor

I just don't think that due appreciation has been given to the confusion

A difficult assessment for sure. I hope for many people voicing their opinions here and on the Monero subreddit. I think Monero might have it easier here than many other coins because users were subjected to frequent changes anyway so far, with all our hardforks ...

@sumogr
Contributor

sumogr commented Jun 11, 2020

Humbly and just to give my two pennies worth

void simple_wallet::print_seed(const epee::wipeable_string &seed)
{
  // Proposed addition: print the wallet creation time next to the seed.
  // Assumes <chrono> and <ctime> are included and namespace std is in scope.
  auto timenow = chrono::system_clock::to_time_t(chrono::system_clock::now());
  success_msg_writer(true) << "\n" << "Seeds generated at: " << ctime(&timenow) << "\n";
  success_msg_writer(true) << "\n" << boost::format(tr("NOTE: the following %s can be used to recover access to your wallet. "
    "Write them down and store them somewhere safe and secure. Please do not store them in "
    "your email or on file storage services outside of your immediate control.\n"
    "When restoring from seeds please use the date above to avoid needlessly scanning the entire chain.\n")) % (m_wallet->multisig() ? tr("string") : tr("25 words"));
  // ... remainder of print_seed() unchanged ...
}

No extra word, no confusion, monero has already too many seed words compared to btc clones.

@fluffypony
Contributor

I don't buy the "let's not add extra words" story - 25, 26, or 27 words makes no difference to the end user. I also don't think that trying to force the user to write down a Unix timestamp is useful either, as that genuinely is an additional piece of out-of-band data that users will not always be able to write down (eg. if they use a CryptoSteel), nor can we communicate to them easily what "needlessly scanning the entire chain" actually means.

I would encourage people to have a non-technical friend try to use the Monero GUI, and you'll quickly see how frightening mnemonic seeds are already. If we can make them easier to use then that's a win. And to be sure, any complexity around figuring out what kind of seed it is will be abstracted away from the user, just like we don't ask them to specify the seed language before entering it. They just type in their seed, and the wallet will figure out everything else.

@Adreik

Adreik commented Jun 19, 2020

If introducing a new seed system anyway, why not also introduce a 49/50 word seed standard and have the private view key generated non-deterministically if using that seed type?

@fluffypony
Contributor

@Adreik there's no real benefit from that, it's not like you can practically crack the spend key if you have the view key. Plus you can always generate the two keys non-deterministically right now using the CLI wallet, and back them up however you want. Someone is welcome to write a Javascript mnemonic encoder for such a task for the 2 people that will use it.

@knaccc

knaccc commented Jun 23, 2020

@tevador It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words. This will make them easier to distinguish from each other, and also has the side effect of not falsely creating the impression that Monero seeds are supposed to always start with a particular word.

@knaccc

knaccc commented Jun 23, 2020

@tevador I've just created a javascript implementation of your code. The test file shows how to use it. Run node test/test to run the tests.

I'm unable to parse your example test mnemonic though, so there must be a small difference between our implementations. It could be because we are using different Reed Solomon implementations.

According to your implementations, are these supposed to be valid RS encodings?
0,0,0,0,0,0,0,0,0,0,0,0,0,0 and 0,1,2,3,4,5,6,7,8,9,10,11,12,52

A quick way to test the RS JS implementation I'm using is to run this code:

const reedSolomon = require('reedsolomon');
const reedSolomonEncoder = new reedSolomon.ReedSolomonEncoder(new reedSolomon.GenericGF(2053, 2048, 1));
const mnemonicWordsLen = 14;
const mnemonicErrorCorrectionWordsLen = 1;
function reedSolomonEncode(unflaggedDataInt32Array) {
  let r = unflaggedDataInt32Array.slice();
  reedSolomonEncoder.encode(r, mnemonicErrorCorrectionWordsLen);
  return r;
}
var a = new Uint8Array(mnemonicWordsLen);
console.log(reedSolomonEncode(a)+'');
for(let i=0; i<a.length-1; i++) a[i] = i;
console.log(reedSolomonEncode(a)+'');

When you're applying the coin flag, you're doing that to the second mnemonic word, right?

@rbrunner7
Contributor

@tevador It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words.

You mean if a program gets an order to produce 10 seeds, has produced 9 already, and then goes on to produce the 10th one, it would throw away candidates that are too similar to any of the previous ones and keep generating randomly until it gets a really "different" seed?

Right now I can't see how, for single seeds produced independently, you can do better than just using a good source of randomness and hope for different first words.

@tevador
Contributor

tevador commented Jun 23, 2020

@knaccc

It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words

If you test my implementation, you will see that the first word changes for different seeds. That's because the first word is actually the checksum. The second word encodes the flags and the high bits of the wallet birthday, so it will stay mostly the same for seeds created in the same year.

My implementation orders the coefficients in ascending order, i.e. the constant term is first, then the linear term etc. This may also explain why you are getting different results. Try reversing the order of the words.

When you're applying the coin flag, you're doing that to the second mnemonic word, right?

Correct.

@knaccc

knaccc commented Jun 23, 2020

Right now I can't see how, for single seeds produced independently, you can do better than just using a good source of randomness and hope for different first words.

I had incorrectly assumed that the seed data would start with the 00000 reserved bits, followed by the 0000000000 birthday bits. This would have meant that all seeds created in the same month would have always started with the same first word.

Now I see that tevador's implementation is ordering the checksum word such that the first word would in fact be evenly distributed.

@knaccc

knaccc commented Jun 23, 2020

@tevador I'm not a C coder, so I'm struggling to test your RS library. Please could you tell me:

If I start with the data 0,1,2,3,4,5,6,7,8,9,10,11,12, then after it has been RS encoded, what should the array look like? Currently mine looks like 0,1,2,3,4,5,6,7,8,9,10,11,12,52.

I've tried altering my code so that it looks like either 52,0,1,2,3,4,5,6,7,8,9,10,11,12 or 52,12,11,10,9,8,7,6,5,4,3,2,1,0 but that does not seem to fix the issue.

@tevador
Contributor

tevador commented Jun 23, 2020

I'm getting {358,0,1,2,3,4,5,6,7,8,9,10,11,12}. My generator polynomial is {2,1}.

Have you tested 12,11,10,9,8,7,6,5,4,3,2,1,0,358?

@knaccc

knaccc commented Jun 23, 2020

@tevador I'm out of my depth here with understanding the Reed Solomon implementation enough to know what my 'generator polynomial' is.

According to https://www.mathworks.com/help/comm/ref/rsgenpoly.html the default primitive polynomial for a Galois Field GF(2^11) is D^11 + D^2 + 1, which has an 'integer representation' of 2053.

I'd initialized my RS encoder with GenericGF(2053, 2048, 1), meaning primitive=2053, size=2048, generatorBase=1. The size appears to affect the size of the expTable and logTable.

I'm not sure if I am passing the correct values to GenericGF(). Can you suggest what values I should be using, please?

This is the implementation I'm using: https://github.com/cho45/reedsolomon.js/blob/master/reedsolomon.js

Note that the implementation goes into an endless loop if I attempt to encode using GenericGF(2053, 11, 2) or GenericGF(7, 11, 2).

@tevador
Contributor

tevador commented Jun 23, 2020

2053 is the primitive of the Galois Field. I was talking about the generator polynomial of the RS code, which is initialized here in the javascript implementation.

I managed to reproduce my result with the following code:

var rs = require('./reedsolomon.js');

var encoder = new rs.ReedSolomonEncoder(new rs.GenericGF(2053, 2048, 1));

const messageLength = 14;
const dataLength = 13;
var message = new Int32Array(messageLength);
for (var i = 0; i < dataLength; i++) message[i] = dataLength - 1 - i;

console.log('original');
console.log(Array.prototype.join.call(message));

encoder.encode(message, messageLength - dataLength);

console.log('rs coded');
console.log(Array.prototype.join.call(message));

output:

original
12,11,10,9,8,7,6,5,4,3,2,1,0,0
rs coded
12,11,10,9,8,7,6,5,4,3,2,1,0,358

Note that the coefficients are reversed compared to my implementation, but the values match.

@knaccc

knaccc commented Jun 23, 2020

@tevador Thanks, this has helped me ensure my RS encoding matches yours.

I have another problem: I'm using your test seed test park taste security oxygen decorate essence ridge ship fish vehicle dream fluid pattern.

The first data word is park, which is word index 1282. I then unflag this word by adding 0x539 mod 2048, which gives me 1282+1337-2048=571. In binary, 571 is 01000111011. This means the reserved bits are 01000... but the reserved bits should all be zero, right? Am I missing something? The bits following the first 5 reserved bits are 111011, which are the first bits of the correct quantized timestamp 1110111101, so it looks like I'm almost getting the correct result, but not quite.

In my code, I get the entire 143 bits of the data (after unflagging) as:
01000111011110111100011100001011010011110010001110010000100110101010111001100110001100000101011110111110001111010000101000101100110110100001010 or 23bde385a791c84d5733182bdf1e85166d0a in hex.

Does that look right to you?

@tevador
Contributor

tevador commented Jun 24, 2020

then unflag this word by adding 0x539 mod 2048

Addition in GF is XOR, so you have to do 1282^1337 = 59.
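
For readers following along, the corrected unflagging step can be written out as below. The split into 5 reserved bits followed by 6 timestamp bits is taken from the question above; the variable names are hypothetical.

```javascript
const COIN_FLAG = 0x539;    // 1337
const parkIndex = 1282;     // index of the word "park" in the example seed

// Addition in GF(2^11) is XOR, so unflagging is XOR with the flag.
const unflaggedWord = parkIndex ^ COIN_FLAG;

// The word is 11 bits wide: 5 reserved bits, then 6 high bits of the birthday.
const reservedBits = unflaggedWord >> 6;
const birthdayHighBits = unflaggedWord & 0x3f;

console.log(unflaggedWord);
console.log(unflaggedWord.toString(2).padStart(11, '0'));
console.log(reservedBits, birthdayHighBits.toString(2));
```

The reserved bits come out as zero, which resolves the discrepancy seen with integer addition modulo 2048.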

@knaccc

knaccc commented Jun 24, 2020

@tevador Thanks, our implementations are perfectly compatible now! https://github.com/knaccc/monero-seed-js

I just thought I'd double check my understanding of RS with you (w.r.t. this scenario with a one word checksum):

  1. If exactly one particular word is entered incorrectly, and all others are correct, and if the incorrect word is still a valid electrum word, then that error will always be detected. Although the error will be detected, it will not be possible to know which word in the mnemonic was incorrect.

  2. Therefore the seed can only be corrected if the incorrect word position is known, and if there is only one incorrect word. In this circumstance, brute forcing the word from the electrum list will always find exactly one word that causes the mnemonic to validate (as an implication of point 1), and a false positive is impossible when brute forcing the word known to be incorrect.

@tevador
Contributor

tevador commented Jun 24, 2020

@knaccc

  1. Correct. Two different phrases differing in just one word can never have the same checksum.

  2. Correct. I should add that I'm using brute-forcing for error correction because we have just one check word. There are (more complex) algorithms that can correct errors efficiently for an arbitrary number of check digits. See, for example, the Berlekamp–Massey algorithm.
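
The two points above can be illustrated with a self-contained sketch of a one-check-word Reed-Solomon code. It assumes the parameters mentioned in this thread (GF(2^11) with primitive polynomial 2053, generator polynomial {2,1}, check word first, coefficients in ascending order). It is not the code from either implementation, and all function names are hypothetical.

```javascript
const FIELD_SIZE = 2048; // GF(2^11): one field element per word of a 2048-word list
const PRIMITIVE = 2053;  // x^11 + x^2 + 1
const ALPHA = 2;         // root of the generator polynomial x + 2, i.e. {2,1}

// Carry-less multiplication, reduced modulo the primitive polynomial.
function gfMul(a, b) {
  let result = 0;
  while (b) {
    if (b & 1) result ^= a;
    b >>= 1;
    a <<= 1;
    if (a & FIELD_SIZE) a ^= PRIMITIVE;
  }
  return result;
}

// Evaluate a polynomial (ascending coefficients) at ALPHA using Horner's rule.
function evalAtAlpha(coeffs) {
  let value = 0;
  for (let i = coeffs.length - 1; i >= 0; i--) value = gfMul(value, ALPHA) ^ coeffs[i];
  return value;
}

// Prepend the check word that makes the whole phrase evaluate to zero at ALPHA.
function addCheckWord(data) {
  return [evalAtAlpha([0, ...data]), ...data];
}

function isValidPhrase(phrase) {
  return evalAtAlpha(phrase) === 0;
}

// Brute-force every possible word at a position known to be wrong.
function candidatesAt(phrase, position) {
  const found = [];
  for (let word = 0; word < FIELD_SIZE; word++) {
    const trial = phrase.slice();
    trial[position] = word;
    if (isValidPhrase(trial)) found.push(word);
  }
  return found;
}

const codedPhrase = addCheckWord([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
console.log(codedPhrase.join(','));

const damagedPhrase = codedPhrase.slice();
damagedPhrase[5] = 999;                        // one wrong (but in-range) word
console.log(isValidPhrase(damagedPhrase));     // the error is detected
console.log(candidatesAt(damagedPhrase, 5));   // exactly one word repairs it
```

Because a single wrong word changes the evaluation by a nonzero multiple of a power of ALPHA, the check can never pass by accident, and only one replacement at a known position restores validity.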

@knaccc

knaccc commented Jun 26, 2020

I cross-checked the electrum words list against an English dictionary, and there is one word that stood out: "satoshi".

I'd imagine that we should be taking that word out and adding in something else... suggestions?

@hyc
Collaborator

hyc commented Jun 26, 2020

If you want to claim electrum-compatible then we should be using their wordlist as-is, no?

@knaccc

knaccc commented Jun 26, 2020

@hyc I'm not sure if anyone was suggesting that it would be beneficial to be "electrum-compatible". The only reason for using the electrum word list is that it contains 2048 words instead of 1626. We've already broken electrum compatibility by having 14 words instead of 12/13, a different type of checksum, and a 'coinflag' applied.

I just checked, and the non-English word lists for BIP39 do not include the word 'satoshi'. So we'd only be messing around with the English wordlist. I'd argue that normal people will not recognize the word 'satoshi', and so the presence of this word slightly hinders the ability of a normal person to write down the seed or communicate it verbally to another.

One could make the argument that someone implementing 14-word Monero seed functionality could easily make a mistake if they did not notice that we had changed the English wordlist slightly. But I think that's a difficult argument to make, since their implementation would not be able to validate seeds generated by other Monero implementations, and it'd be hard to pay enough attention to successfully implement the Reed Solomon algorithm yet miss the implementation notes about the English wordlist being different.

@xiphon
Contributor

xiphon commented Apr 24, 2021

Pinging @tevador and others interested.
Please double check point 3, let me know if i'm missing something.

The current seed scheme is a two-way conversion, i.e. one can obtain the seed from the wallet key at any time.

Given that https://github.com/tevador/monero-seed PoC is a one-way conversion (i.e. you can't generate a seed from wallet keys):

  1. No way to migrate current wallets to the new scheme. All Monero users will have to generate new wallets to use the new seed scheme.

  2. Requires storing the seed on disk and loading it on wallet startup to provide a way to print the wallet seed (seed command).
    Currently we convert the wallet key into the mnemonic seed on the fly.

  3. I'm also somewhat concerned that it might affect security due to using only 128 instead of 256 bits of entropy.

    Attack                                        monero-seed scheme   Current scheme
    Brute-force ALL seeds                         2^128                2^256
    Brute-force ALL keys                          2^256                2^256
    Recover 1 specific key (Pollard's Rho algo)   2^128                2^128

    * omitted guessing a few wallet birthday bits in monero-seed case

Opinions? Should we consider discussing some other seed scheme?

@tobtoht
Collaborator

tobtoht commented Apr 24, 2021

Requires to store the seed on disk and to load it on wallet startup to provide a way to print wallet seed (seed command).

An elegant solution to this problem is to store the mnemonic seed in .keys. This way it will always be available in case the wallet cache is discarded.

omitted guessing a few wallet birthday bits in monero-seed case

Birthday bits are in addition to the 128-bit private key seed.

@xiphon
Contributor

xiphon commented Apr 24, 2021

An elegant solution to this problem is to store the mnemonic seed in .keys. This way it will always be available in case the wallet cache is discarded.

The point is about the additional logic that looks redundant to me. We do store the keys and we will have to also store the seed basically in plaintext.

PS: yes. Obviously, .keys file is the only possible place to store the seed. Using wallet cache would be a mistake.

Birthday bits are in addition to the 128-bit private key seed.

That's exactly what i mean. You can't just count them in because actual range to brute force will be based on the implementation.

If 3) is a valid concern, a few extra bits barely change anything.

@tobtoht
Collaborator

tobtoht commented Apr 24, 2021

basically in plaintext.

As I'm sure you know .keys is encrypted on disk, so I'm not sure what you mean by this. The mnemonic seed would benefit from identical protection as the private spend key, and it can even be encrypted in memory until it is needed similarly to the other keys.

I don't see how storing an additional value in .keys is an issue at all.

If 3) is a valid concern, a few extra bits barely change anything.

Yes, but there are no extra bits. Anyone attempting to brute force all possible spendkeys that can be generated with this seed scheme would just iterate over the 128-bit key seeds directly.

@rbrunner7
Contributor

My thoughts about those issues:

1) is certainly unfortunate from a UX point of view, but I think we should take the long view here. If Monero really is successful as a currency it will probably live on for many years, if not decades, and all the wallets and all the users of the few years that already passed will become a small and therefore more and more unimportant minority over time.

2) does not worry me one bit, frankly. Sometimes UX improvements lead to more effort needed in code, but so what? On other fronts person-months are spent, e.g. to make transactions somewhat smaller and verify somewhat faster to improve UX, so surely extending the things stored in the .keys file somewhat should not matter too much. I sneaked in something there for the MMS, with barely anybody noticing, and it was programmed in half a day.

3) Seems to me we are not talking about a key space of 128 bits versus one of 256 bits in isolation, but we are talking about Monero wallets in particular. And here I really wonder whether there is a viable method to even make, say, 1,000,000 attempts, to brute-force your way into a wallet in a reasonable time. How would that work, in detail? Seems to me brute-forcing stands and falls with a fast method to check whether guesses are correct. Is this given here?

@tevador
Contributor

tevador commented Nov 21, 2021

I revisited my PoC mnemonic seed and reimplemented it as a C-library. Should be pretty much plug and play and ready to be integrated into simplewallet.cpp.

https://github.com/tevador/polyseed

Some of the improvements I made:

  • Added 2 extra words to increase the security margin. The total seed space (including the birthday bits) now exceeds 160 bits, which is similar to the total number of possible Bitcoin addresses.
  • Better encoding to hide the non-random bits (seeds generated in the same month no longer have the same word in the 2nd position).
  • All BIP-39 official wordlists are supported, including some of the special features such as prefix matching and accent-insensitivity.
  • The seed can be serialized in 32 bytes (the library can decode this back into the mnemonic phrase). Should fit easily into the .keys file.

The only concern that remains is the one-way conversion. There is no way around it, it's simply the price to pay for a more compact mnemonic seed. But since we have to keep supporting the 25-word seed anyways, users don't have to generate new wallets. Legacy wallets would still have to input the wallet birthday manually, but eventually (as the blockchain grows), this prompt could be removed as the fraction of outputs created during the old seed scheme becomes negligible (I think this is the point @rbrunner7 was making).

@CryptoGrampy

Hi @tevador -

It's been nearly a year since we've seen any movement on this topic; do you have any concerns with Polyseed, or have there been any issues with its implementation in Feather? Do you think it should be added to the GUI/CLI wallets?

@tevador
Contributor

tevador commented Nov 2, 2022

I'm not aware of any issues with Polyseed. AFAIK it's still on the roadmap for Seraphis.
