[PROPOSAL] Encode restore height as 26th word of the mnemonic seed #6639
Comments
My suggestion is to encode it as follows: the position of the word in the wordlist * 21915 = starting block height. 21915 blocks is about a month's worth of blocks (half a month when the block time was 1 minute), so it gives us a good 135 years worth of coverage. |
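The mapping suggested above can be sketched roughly as follows. The function names are hypothetical; the 21915-block step comes from the comment above, and the 1626-word list size is the one mentioned elsewhere in this thread. Rounding down when encoding is an assumption, chosen so the restore never starts after the wallet's first possible transaction.

```javascript
// Hypothetical helpers; only the two constants come from the thread.
const BLOCKS_PER_STEP = 21915; // 720 blocks/day * 30.4375 days, at 2-minute blocks
const WORDLIST_SIZE = 1626;    // 1626 steps of about a month each, roughly 135 years

// Map a word's position in the wordlist to an approximate restore height.
function wordIndexToRestoreHeight(index) {
  if (!Number.isInteger(index) || index < 0 || index >= WORDLIST_SIZE) {
    throw new RangeError('word index out of range');
  }
  return index * BLOCKS_PER_STEP;
}

// Map the height at wallet creation to a word index, rounding down
// (assumption) so that scanning starts at or before the creation height.
function restoreHeightToWordIndex(height) {
  if (!Number.isInteger(height) || height < 0) {
    throw new RangeError('height must be a non-negative integer');
  }
  const index = Math.floor(height / BLOCKS_PER_STEP);
  if (index >= WORDLIST_SIZE) {
    throw new RangeError('height beyond the encodable range');
  }
  return index;
}
```

For example, a wallet created at height 2191499 would get word index 99, and restoring would start scanning at height 99 * 21915.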
This is too obscure, just use |
As discussed on IRC, summarizing it here for broader publicity and discussion: I am in full favor of adding one seed word to encode restore height. But if we touch the seed system and add a "new" kind of seed encoding the restore height, I vote for taking the chance and adding two more worthwhile changes at the same time. (Changing anything with seeds will be a larger endeavor, and IMHO it would be a strategic mistake to come back to this with "new new" seeds a year or so later.)
The "checksum" as implemented, with the checksum word being simply a copy of one of the other words, is very weak, i.e. it does not catch a lot of errors. This can stay a single checksum word, but it should be calculated using a much more robust algorithm going over all words of the seed.
Furthermore, one more word should get added as the first word of the seed, encoding a seed version. The words used for the version should be different from all other seed words so you can reliably detect whether the first word given is such a version word or not. This will enable a very robust UX. You can for example generate useful error messages if somebody enters only the first 25 words of a "new" seed for whatever reason, be it conviction that "more than 25 words are wrong", or input forms just not allowing for more words because they have not yet been reworked / upgraded for "new" seeds.
Seed versions would also allow for adjustments in the word list, for whatever crazy reasons that may pop up, like some words becoming "politically incorrect", or more or less banned outright e.g. for Chinese seeds.
IMHO we should stick with words for both version and restore height encoding. Why? Because if it is anything else, people will recognize it as something special, and because of this some people may not treat it with the same care as the other words and e.g. simply not enter it, based on false assumptions like "I thought that's not part of the seed proper".
Maybe we should even go as far as avoiding that exactly the same word gets added as the version word to each and every "new" Monero seed, possibly for years, because again people might get confused whether that seemingly constant word really belongs to the seed and is really necessary. People could also fear that Monero seeds are weaker than other coins' seeds because of a word being constant. This could be solved by using only the first letter of the word as the version and e.g. randomly choosing from several words starting with that letter. |
Maybe instead of adding a new word we can use first letter of each word to encode the timestamp in days(upper case=1, lower case=0). For example using (unix timestamp / (60 * 60 * 24))
will be
|
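A rough sketch of what this letter-case idea could look like, purely for illustration. The function names, the most-significant-bit-first ordering and the assumption that every word starts with a cased letter are mine, not part of the suggestion above.

```javascript
const SECONDS_PER_DAY = 60 * 60 * 24;

// Store floor(unixTimestamp / 86400) in the case of each word's first
// letter: upper case = 1, lower case = 0. Bit order (most significant
// bit in the first word) is an assumption for this sketch.
function encodeDaysInCase(words, unixTimestamp) {
  const days = Math.floor(unixTimestamp / SECONDS_PER_DAY);
  if (days < 0 || days >= 2 ** words.length) {
    throw new RangeError('timestamp does not fit into the available words');
  }
  return words.map((word, i) => {
    const bit = Math.floor(days / 2 ** (words.length - 1 - i)) % 2;
    const first = bit === 1 ? word[0].toUpperCase() : word[0].toLowerCase();
    return first + word.slice(1);
  });
}

// Read the day count back from the case of the first letters.
function decodeDaysFromCase(words) {
  return words.reduce((days, word) => {
    const isUpper = word[0] !== word[0].toLowerCase();
    return days * 2 + (isUpper ? 1 : 0);
  }, 0);
}
```

With 25 words this gives 25 bits, far more than a day count needs, but the timestamp is lost as soon as someone copies the seed in all lower case.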
@nim4 clever idea, but having helped people who have inherited wallets from a deceased spouse you can bet that case sensitivity never factored into it. |
Regarding the seed version, why do we want to pick a word that isn't on the wordlist? We could just pick a random word, and use the same offset in other wordlists, which means no additional translation work. I've also tossed around the idea of using a single word for both the version and the block height offset chunk. We could, for instance, use the first 3 bits for the version and the last 7 bits for the offset (128 possible offsets, so maybe group it per year). Alternatively, if we really want to eke as much out of it as possible, we could divide the wordlist into 5 groups (so a maximum of 5 different versions for this format), and then use the offset in each group, which would give us 325 words per group, so each offset would be ~3.5 months. |
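The second variant (five groups of 325 words) could be sketched like this. The function names are hypothetical, and leaving the one leftover word (index 1625) unused is an assumption of this sketch.

```javascript
const WORDLIST_SIZE = 1626;
const VERSION_COUNT = 5;
const GROUP_SIZE = Math.floor(WORDLIST_SIZE / VERSION_COUNT); // 325

// Combine a seed version (0..4) and a height offset (0..324) into a
// single word index: the group selects the version, the position
// inside the group selects the offset.
function packVersionAndOffset(version, offset) {
  if (!Number.isInteger(version) || version < 0 || version >= VERSION_COUNT) {
    throw new RangeError('version out of range');
  }
  if (!Number.isInteger(offset) || offset < 0 || offset >= GROUP_SIZE) {
    throw new RangeError('offset out of range');
  }
  return version * GROUP_SIZE + offset;
}

// Split a word index back into version and offset.
function unpackVersionAndOffset(index) {
  const version = Math.floor(index / GROUP_SIZE);
  if (!Number.isInteger(index) || index < 0 || version >= VERSION_COUNT) {
    throw new RangeError('word index does not encode a version');
  }
  return { version: version, offset: index % GROUP_SIZE };
}
```

For example, version 2 with offset 10 maps to word index 660, and index 660 unpacks back to the same pair.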
Another idea would be to use 27 word seed. |
|
Because there are many advantages to being able to reliably recognize the word as a version word, or conversely to see that the first given word is definitely not a version word. This makes it possible to detect all kinds of confusion: wrongly entered seeds, cut-off seeds etc. I think especially with something as critical and sensitive as seeds we want our UX (and the transition from "old" seeds to "new" seeds) to be as robust as possible. |
How on earth will the CLI know the top height if I just want to generate a cold wallet without a daemon running? The above discussion requires a CLI wallet that is already connected to a fully synced daemon. (Maybe get the date from the system's timestamp? Wouldn't that be dangerous?) |
You are right, I forgot to mention this from the IRC discussion: There are various situations where restore height is not known. Beside your cold-wallet example, programs generating random seeds offline come to mind. 0 must therefore be a valid value for the encoded restore height, with a meaning of "restore height unknown". This can then be used e.g. to prompt for the restore height when restoring. |
Does the restore height have to be part of the check-summed seed? |
@trasherdk a single word is only 10 bits of entropy, so can't encode the actual restore height, but yes - this proposal is about adding an additional word for the restore height, plus a 27th word for versioning. |
The 25-word seed is pretty much set in stone for all eternity, unless you are willing to abandon all those paper wallets out there, hidden in mattresses or something. Right? |
@trasherdk I don't understand how this affects paper wallets? It's not like the old seed format would no longer be supported, there'd just be a new, default seed format. We already did this with the old English and new English wordlists, the old English wordlist still exists and you can restore an old paper wallet any time you want. |
Yes. There will be "new" seeds and "old" seeds with us forever. That's one reason why I am so vocal in favor of a system that is able to distinguish in a crystal-clear way between both sorts. The first 25 words of a "new" seed should better not be a valid "old" seed for a system that, for whatever reason, never learned about "new" seeds. New version words outside the current word lists would nicely take care of this, because they make "new" seeds flat-out invalid for an "old" system. You won't be able to do something that only looks like a correct restore with the first 25 words of a "new" seed on an old system. |
@rbrunner7 it's already 2 words longer than the "old" seeds, so I don't think we need to worry about validity. Also if we move the checksum to the end, and make it a checksum valid for the whole of the new seed (and not just the key portion), then it'll fail checksum validation on an older wallet anyway. I would like to keep the discussion going around versioning, as I've not yet heard an argument for an out-of-band word that makes sense to me, or even an argument for putting the version in an entire word instead of using the extra bits we gain from adding 1 word for both versioning AND initial block offset chunk. |
Okay so far. Is there any reason the 26th word can't be |
Excellent point. Or as @asymptotically508 wrote on reddit, "I just write the date on the same paper as the seed.". |
Fair enough. Just for completeness' sake: The GUI wallet currently does not insist on the 25th / checksum word; it also accepts the "naked" 24 words. Not sure about the CLI wallet. |
Sure, but this assumes that people know about restore heights and their importance in the first place. Count the people on the Monero subreddit that don't and e.g. fail to correctly restore a wallet. (If they knew, and just did not know the correct restore height, they could easily go back far enough to be safe. It seems they often don't.) Which is an important part of the motivation to touch the seed system and integrate the restore height, to do away with such problems as best as possible. |
@rbrunner7 I agree, writing the date down and getting it wrong later can create problems. I'll therefore revert to proposing the much more foolproof solution of changing nothing with the seed and just making the wallet scan from the current block backwards. |
Maybe I stupidly overlook something, but I have no idea how you would know when to stop scanning. How can you be sure my first transaction is not in block #1? |
What does it matter whether the wallet still has to scan the entire blockchain? If this is about UX, all that matters is that we show people what looks like their balance as quickly as possible. If Monero has an Eternal September then this solves the waiting problem for most. |
Interesting approach which I might be able to agree with, if it were not for the weak checksum problem and the advantages that some sort of versioning brings as additional arguments to improve seeds. |
Yes, that's not a word, and can't be encoded into many physical wallets (eg. Cryptosteel). |
I agree your proposal is better, if we were starting from scratch. I just don't think that due appreciation has been given to the confusion that will be caused when all of the documentation and tutorials and paper wallets suddenly have to start talking about 25 vs 26 word seeds. |
A difficult assessment for sure. I hope for many people voicing their opinions here and on the Monero subreddit. I think Monero might have it easier here than many other coins because users were subjected to frequent changes anyway so far, with all our hardforks ... |
Humbly, and just to give my two pennies' worth:
No extra word, no confusion. Monero already has too many seed words compared to BTC clones. |
I don't buy the "let's not add extra words" story - 25, 26, or 27 words makes no difference to the end user. I also don't think that trying to force the user to write down a Unix timestamp is useful, as that genuinely is an additional piece of out-of-band data that users will not always be able to write down (e.g. if they use a CryptoSteel), nor can we easily communicate to them what "needlessly scanning the entire chain" actually means. I would encourage people to have a non-technical friend try to use the Monero GUI, and you'll quickly see how frightening mnemonic seeds already are. If we can make them easier to use then that's a win. And to be sure, any complexity around figuring out what kind of seed it is will be abstracted away from the user, just like we don't ask them to specify the seed language before entering it. They just type in their seed, and the wallet will figure out everything else. |
If introducing a new seed system anyway, why not also introduce a 49/50 word seed standard and have the private view key generated non-deterministically if using that seed type? |
@Adreik there's no real benefit from that, it's not like you can practically crack the spend key if you have the view key. Plus you can always generate the two keys non-deterministically right now using the CLI wallet, and back them up however you want. Someone is welcome to write a Javascript mnemonic encoder for such a task for the 2 people that will use it. |
@tevador It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words. This will make them easier to distinguish from each other, and also has the side effect of not falsely creating the impression that Monero seeds are supposed to always start with a particular word. |
@tevador I've just created a javascript implementation of your code. The test file shows how to use it. Run I'm unable to parse your example test mnemonic though, so there must be a small difference between our implementations. It could be because we are using different Reed Solomon implementations. According to your implementations, are these supposed to be valid RS encodings? A quick way to test the RS JS implementation I'm using is to run this code:
When you're applying the coin flag, you're doing that to the second mnemonic word, right? |
You mean if a program gets an order to produce 10 seeds, has produced 9 already, and then goes on to produce the 10th one, it would throw away candidates that are too similar to any of the previous ones and keep generating randomly until it gets a really "different" seed? Right now I can't see how, for single seeds produced independently, you can do better than just using a good source of randomness and hoping for different first words. |
If you test my implementation, you will see that the first word changes for different seeds. That's because the first word is actually the checksum. The second word encodes the flags and the high bits of the wallet birthday, so it will stay mostly the same for seeds created in the same year. My implementation orders the coefficients in ascending order, i.e. the constant term is first, then the linear term etc. This may also explain why you are getting different results. Try reversing the order of the words.
Correct. |
I had incorrectly assumed that the seed data would start with the 00000 reserved bits, followed by the 0000000000 birthday bits. This would have meant that all seeds created in the same month would have always started with the same first word. Now I see that tevador's implementation is ordering the checksum word such that the first word would in fact be evenly distributed. |
@tevador I'm not a C coder, so I'm struggling to test your RS library. Please could you tell me: If I start with the data I've tried altering my code so that it looks like either |
I'm getting Have you tested |
@tevador I'm out of my depth here with understanding the Reed Solomon implementation enough to know what my 'generator polynomial' is. According to https://www.mathworks.com/help/comm/ref/rsgenpoly.html the default primitive polynomial for a Galois Field I'd initialized my RS encoder with I'm not sure if I am passing the correct values to GenericGF(). Can you suggest values what I should be using please? This is the implementation I'm using: https://github.com/cho45/reedsolomon.js/blob/master/reedsolomon.js Note that the implementation goes into an endless loop if I attempt to encode using |
I managed to reproduce my result with the following code:
var rs = require('./reedsolomon.js');
// 2053 is binary 100000000101, i.e. the polynomial x^11 + x^2 + 1; 2048 is the field size
var encoder = new rs.ReedSolomonEncoder(new rs.GenericGF(2053, 2048, 1));
const messageLength = 14; // 13 data words + 1 check word
const dataLength = 13;
var message = new Int32Array(messageLength);
// fill the data part with 12, 11, ..., 0
for (var i = 0; i < dataLength; i++) message[i] = dataLength - 1 - i;
console.log('original');
console.log(Array.prototype.join.call(message));
// compute the check word in place
encoder.encode(message, messageLength - dataLength);
console.log('rs coded');
console.log(Array.prototype.join.call(message));
output:
Note that the coefficients are reversed compared to my implementation, but the values match. |
@tevador Thanks, this has helped me ensure my RS encoding matches yours. I have another problem: I'm using your test seed The first data word is In my code, I get the entire 143 bits of the data (after unflagging) as: Does that look right to you? |
Addition in GF is XOR, so you have to do |
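To illustrate the point about addition: in a binary extension field such as GF(2^11), which matches the 2048-word list and the GenericGF(2053, 2048, 1) setup used earlier in the thread, adding two field elements is a bitwise XOR rather than integer addition. A minimal sketch, with hypothetical function names:

```javascript
// Elements of GF(2^11) are represented as integers in [0, 2047].
// Addition is coefficient-wise addition modulo 2, i.e. bitwise XOR.
function gfAdd(a, b) {
  return a ^ b;
}

// Subtraction is the same operation, because every element is its
// own additive inverse (a + a = 0).
function gfSub(a, b) {
  return a ^ b;
}
```

Note the difference from integer arithmetic: gfAdd(5, 3) is 6, not 8, and the result never leaves the range 0..2047. Adding the same value twice gives back the original element, so anything applied with gfAdd is undone by applying it again.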
@tevador Thanks, our implementations are perfectly compatible now! https://github.com/knaccc/monero-seed-js I just thought I'd double check my understanding of RS with you (w.r.t. this scenario with a one word checksum):
|
|
I cross-checked the electrum words list against an English dictionary, and there is one word that stood out: "satoshi". I'd imagine that we should be taking that word out and adding in something else... suggestions? |
If you want to claim electrum-compatible then we should be using their wordlist as-is, no? |
@hyc I'm not sure if anyone was suggesting that it would be beneficial to be "electrum-compatible". The only reason for using the electrum word list is that it contains 2048 words instead of 1626. We've already broken electrum compatibility by having 14 words instead of 12/13, a different type of checksum, and a 'coinflag' applied. I just checked, and the non-English word lists for BIP39 do not include the word 'satoshi'. So we'd only be messing around with the English wordlist. I'd argue that normal people will not recognize the word 'satoshi', and so the presence of this word slightly hinders the ability of a normal person to write down the seed or communicate it verbally to another. One could make the argument that someone implementing 14-word Monero seed functionality could easily make a mistake if they did not notice that we had changed the English wordlist slightly. But I think that's a difficult argument to make, since their implementation would not be able to validate seeds generated by other Monero implementations, and it'd be hard to pay enough attention to successfully implement the Reed Solomon algorithm yet miss the implementation notes about the English wordlist being different. |
Pinging @tevador and others interested. The current seed scheme is a two-way conversion, i.e. one can freely derive the seed from the wallet keys at will. Given that the https://github.com/tevador/monero-seed PoC is a one-way conversion (i.e. you can't generate a seed from wallet keys):
Opinions? Should we consider discussing some other seed scheme? |
An elegant solution to this problem is to store the mnemonic seed in
Birthday bits are in addition to the 128-bit private key seed. |
The point is about the additional logic that looks redundant to me. We do store the keys and we will have to also store the seed basically in plaintext. PS: yes. Obviously,
That's exactly what i mean. You can't just count them in because actual range to brute force will be based on the implementation. If 3) is a valid concern, a few extra bits barely change anything. |
As I'm sure you know I don't see how storing an additional value in .keys is an issue at all.
Yes, but there are no extra bits. Anyone attempting to brute force all possible spendkeys that can be generated with this seed scheme would just iterate over the 128-bit key seeds directly. |
My thoughts about those issues:
1) is certainly unfortunate from a UX point of view, but I think we should take the long view here. If Monero really is successful as a currency it will probably live on for many years, if not decades, and all the wallets and all the users of the few years that have already passed will become a small and therefore more and more unimportant minority over time.
2) does not worry me one bit, frankly. Sometimes UX improvements lead to more effort needed in code, but so what? On other fronts person-months are spent, e.g. to make transactions somewhat smaller and verify somewhat faster to improve UX, so surely extending the things stored in the .keys file somewhat should not matter too much. I sneaked in something there for the MMS, with hardly anybody noticing, and it was programmed in half a day.
3) Seems to me we are not talking about a key space of 128 bits versus one of 256 bits in isolation, but about Monero wallets in particular. And here I really wonder whether there is a viable method to even make, say, 1,000,000 attempts to brute-force your way into a wallet in a reasonable time. How would that work, in detail? Seems to me brute-forcing stands and falls with a fast method to check whether guesses are correct. Is this given here? |
I revisited my PoC mnemonic seed and reimplemented it as a C-library. Should be pretty much plug and play and ready to be integrated into simplewallet.cpp. https://github.com/tevador/polyseed Some of the improvements I made:
The only concern that remains is the one-way conversion. There is no way around it, it's simply the price to pay for a more compact mnemonic seed. But since we have to keep supporting the 25-word seed anyways, users don't have to generate new wallets. Legacy wallets would still have to input the wallet birthday manually, but eventually (as the blockchain grows), this prompt could be removed as the fraction of outputs created during the old seed scheme becomes negligible (I think this is the point @rbrunner7 was making). |
Hi @tevador - It's been nearly a year since we've seen any movement on this topic; do you have any concerns with Polyseed, or have there been any issues with its implementation in Feather? Do you think it should be added to the GUI/CLI wallets? |
I'm not aware of any issues with Polyseed. AFAIK it's still on the roadmap for Seraphis. |
The restore height is currently a value that has to be entered manually for wallets that are restored from either the keys or the mnemonic seed. The wallet will essentially ignore blocks (only pulling block hashes) before the restore height and start scanning (looking for transactions that belong to the wallet) from the restore height block.
User experience is degraded if the user accidentally sets a restore height that is too 'high' (i.e. after the first transaction to the wallet), as the wallet will 'miss' certain or all transactions, thereby causing an improper balance (as well as transaction history) to be displayed.
In order to improve user experience, we could encode an approximate restore height as an additional word of the mnemonic seed. The restore height would then be set automatically upon restoring the wallet, thereby ensuring users will not inadvertently set an erroneous restore height.
I personally do not see many drawbacks of this proposal. Guides will have to be updated to reflect the new format and users need to be informed. Users may also initially be slightly confused by two different seed formats being present. However, I think ultimately the proposal is net beneficial to user experience.