Mnemonics Unraveled
Libbitcoin support for mnemonic wallet seed encoding began with Electrum v1. Later came BIP39, driven by the fine folks behind Trezor, which we believed Electrum was adopting. When the fine folks behind Electrum decided against BIP39, we found ourselves with three implementations. We had dropped Electrum v1 in the expectation that BIP39 would become sufficient. Later we added Electrum but found it necessary to also restore Electrum v1. It is not possible to properly implement Electrum mnemonic support without also implementing Electrum v1 and BIP39 mnemonics.
An overhaul of our mnemonic implementations was long overdue. What was anticipated to require one week required over a month of full-time work. Before I forget the various lessons learned, I decided to write them down here. The information is all out there, somewhere. But ultimately it required digging through a lot of Python and C code. Wallet seeds are not something for a developer to take lightly, and code is always authoritative. Eventually I found myself sifting through Python internals, a deeper rabbit hole than I expected.
I will state for the record that I truly appreciate both Electrum and Trezor. Otherwise I would not have spent the time to provide comprehensive support for all three of these encodings. These observations are provided for my own record and to possibly aid others who may at some point find themselves in that same rabbit hole. When one goes this deep into implementation, interesting discoveries abound.
A language is a universally-unique natural language.
Libbitcoin refers to a language by its IANA subtag.
In linguistics a token is an "individual occurrence of a linguistic unit in speech or writing".
Tokens contain no whitespace code points.
Tokens may or may not be normal form.
Electrum allows seed generation from tokens (i.e. non-dictionary words) in normal form.
A dictionary is a standard ordered set of distinct reference tokens of a single language.
There may be more than one dictionary per language.
Dictionaries of the same or distinct languages may intersect.
A dictionary defines its word order, which may or may not be a lexicographic sort.
An interpreter is a set of same length (word-count) dictionaries of distinct languages, each identified by language.
An interpreter maps between entropy and mnemonic forms, given a specified or detected language.
There is no necessary standard defining the set of interpreter dictionaries.
A word is a dictionary token.
A mnemonic is an ordered set of words from a common dictionary, conforming to standard size and checksum constraints.
Electrum v1 does not implement checksum constraints.
A mnemonic may be fully contained by multiple dictionaries.
A mnemonic may be referred to as a recovery seed by some implementations.
A whitespace character is a standard character with a glyph of no visible pixels.
A sentence is a mnemonic serialized as a sinistrodextral (left-to-right) string of its words with whitespace delimiters.
An encoding is a standard bidirectional map between any mnemonic and its numeric representation.
The Electrum v1 encoding is (inadvertently) not fully bidirectional.
A normal form is a standard word, sentence or passphrase character representation.
A single glyph may have multiple distinct code points, and many distinct glyphs may be rendered similarly or identically.
Word containment by a dictionary is determined by normal form equality.
The entropy of a mnemonic is its numeric representation.
Both a mnemonic and its entropy represent the same entropic value.
A passphrase is arbitrary text that may be combined with a mnemonic in the formation of a seed.
Electrum v1 does not implement a passphrase.
A seed is a secret number, derived using a standard one-way hash from a mnemonic.
A master private key is a secp256k1 private key, obtained from a seed in a standard manner, allowing spending.
BIP39 wallets typically derive this (and a chain code) from the seed in accordance with BIP32.
Electrum serializes the seed as a secret and chain code in accordance with BIP32 serialization.
Electrum v1 maintains this as a 32 byte value.
A master public key is a secp256k1 public key, derived in the standard one-way manner from the master private key, allowing receiving.
Electrum and typical BIP39 wallets derive this in accordance with BIP32.
Electrum v1 maintains this as a 64 byte value (uncompressed, without prefix).
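The relation between entropy, checksum and mnemonic length defined above can be made concrete for BIP39. The following is a minimal Python sketch, not Libbitcoin code, and the function name `bip39_indices` is hypothetical. It assumes the BIP39 rules: entropy of 128 to 256 bits in multiples of 32, a checksum of one bit per 32 bits of entropy taken from the leading bits of the SHA-256 hash of the entropy, and 11 bits per word. It returns dictionary indexes rather than words, so no dictionary is needed to run it.

```python
import hashlib

def bip39_indices(entropy: bytes) -> list:
    """Map entropy to BIP39 dictionary indexes (11 bits per word)."""
    bits = len(entropy) * 8
    if bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128 to 256 bits in multiples of 32")

    # The checksum is the leading bits/32 bits of SHA-256(entropy).
    checksum_bits = bits // 32
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - checksum_bits)

    # Append the checksum to the entropy, then split into 11 bit groups.
    value = (int.from_bytes(entropy, "big") << checksum_bits) | checksum
    words = (bits + checksum_bits) // 11
    return [(value >> (11 * (words - 1 - i))) & 0x7FF for i in range(words)]

# 16 zero bytes of entropy yield a 12 word mnemonic.
indices = bip39_indices(bytes(16))
print(len(indices), indices)
```

Each index selects one word from a 2048 word dictionary, so the same entropy produces a different sentence in each language while the indexes remain the same.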
The following standards are implied by the above terminology.
- Language (identification)
- Dictionary (words and order)
- Mnemonic (length and checksum)
- Whitespace (delimiters)
- Normal Form (word, sentence, and passphrase)
- Encoding (entropy mapping)
- Seed (derivation)
- Master Private Key (derivation)
- Master Public Key (derivation)
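Two of the derivation standards listed above can be illustrated for a typical BIP39 wallet. This is a hedged Python sketch, not Libbitcoin code, and the function names `bip39_seed` and `bip32_master` are hypothetical. It assumes the BIP39 seed rule (PBKDF2 with HMAC-SHA512, 2048 iterations, the salt "mnemonic" followed by the passphrase, both inputs in NFKD form) and the BIP32 master key rule (HMAC-SHA512 keyed with "Bitcoin seed"). Electrum and Electrum v1 derive their seeds differently and are not covered here.

```python
import hashlib
import hmac
import unicodedata

def bip39_seed(sentence: str, passphrase: str = "") -> bytes:
    """Derive the 64 byte BIP39 seed from a mnemonic sentence."""
    # BIP39 applies NFKD to both the sentence and the passphrase.
    password = unicodedata.normalize("NFKD", sentence).encode("utf-8")
    salt = ("mnemonic" + unicodedata.normalize("NFKD", passphrase)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)

def bip32_master(seed: bytes) -> tuple:
    """Split HMAC-SHA512(key="Bitcoin seed", seed) into (secret, chain code)."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    # BIP32 also requires the secret to be nonzero and less than the
    # secp256k1 group order. That check is omitted from this sketch.
    return digest[:32], digest[32:]

sentence = "abandon " * 11 + "about"
seed = bip39_seed(sentence, "TREZOR")
secret, chain_code = bip32_master(seed)
print(seed.hex())
```

Note that the seed function is one-way and does not validate the sentence. Checksum and dictionary validation are separate steps that must happen before derivation.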
The reliance of Electrum and BIP39 on Unicode word and passphrase normalization is an inherent risk. Unicode implementations are large and complex. Trivial conversions in ASCII, such as lower-casing, become treacherous in Unicode.
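A few lines of Python show why case conversion and normalization are less innocent in Unicode than in ASCII. This is only an illustration using Python's built-in Unicode tables. It is not how Libbitcoin performs these conversions, and other Unicode implementations or versions may behave differently, which is precisely the risk.

```python
import unicodedata

# In ASCII, lower-casing maps one character to one character.
assert "SEED".lower() == "seed"

# In Unicode, case mapping can change the number of code points.
dotted_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = dotted_i.lower()
print(len(dotted_i), len(lowered))

sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S
uppered = sharp_s.upper()
print(len(sharp_s), len(uppered))

# Normalization maps distinct code points to the same result,
# so the original text cannot be recovered from the normal form.
angstrom = "\u212b"  # ANGSTROM SIGN
a_ring = "\u00c5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
same_after_nfc = (unicodedata.normalize("NFC", angstrom) ==
                  unicodedata.normalize("NFC", a_ring))
print(angstrom == a_ring, same_after_nfc)
```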
"When two applications share Unicode data, but normalize them differently, errors and data loss can result. In one specific instance, OS X normalized Unicode filenames sent from the Samba file and printer sharing software. Samba did not recognize the altered filenames as equivalent to the original, leading to data loss. Resolving such an issue is non-trivial, as normalization is not losslessly invertible."
For this reason we have implemented Libbitcoin mnemonics without a hard dependency on Unicode normalization. The Electrum v1, Electrum, and BIP39 classes do not require Unicode normalization unless a non-ASCII passphrase is provided. If the library is compiled without WITH_ICU defined, all features remain available, except that seed passphrases are limited to ASCII.
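The reason an ASCII passphrase needs no Unicode library is that ASCII text is unchanged by normalization. The following Python sketch illustrates the idea for BIP39-style NFKD normalization. The function `prepare_passphrase` and its `have_unicode` parameter are hypothetical stand-ins for the WITH_ICU build option, not the Libbitcoin API.

```python
import unicodedata

def prepare_passphrase(passphrase: str, have_unicode: bool = True) -> str:
    """Return the passphrase in NFKD form, or reject it if that is not possible."""
    # ASCII code points have no decompositions, so ASCII text is already NFKD.
    if passphrase.isascii():
        return passphrase
    if not have_unicode:
        raise ValueError("non-ASCII passphrase requires Unicode normalization")
    return unicodedata.normalize("NFKD", passphrase)

# Every ASCII code point is invariant under NFKD.
ascii_invariant = all(
    unicodedata.normalize("NFKD", chr(c)) == chr(c) for c in range(128))
print(ascii_invariant)
print(prepare_passphrase("TREZOR", have_unicode=False))
```

Rejecting a non-ASCII passphrase outright, rather than hashing it without normalization, avoids silently producing a seed that another wallet would not reproduce.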
For the same reason Libbitcoin does not support Electrum token-based seeding. All words must correspond to a dictionary. When WITH_ICU is defined, words are Unicode normalized before comparison, to improve the chance of matching. Ideally an implementation provides a dictionary-based word selector, making this unnecessary. If WITH_ICU is undefined then word normalizations are ASCII limited, though pre-normalized non-ASCII words will match the dictionary.
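The effect of normalizing words before comparison can be sketched as follows. This is an illustrative Python example, not the Libbitcoin implementation. The three-word `DICTIONARY` is a stand-in rather than a real word list, the function `find_word` is hypothetical, and its `normalize` flag plays the role of WITH_ICU. The dictionary is assumed to be stored in NFKD form.

```python
import unicodedata

# A tiny stand-in dictionary, stored in NFKD form.
DICTIONARY = [unicodedata.normalize("NFKD", word)
              for word in ("\u00e1baco", "abdomen", "abeja")]

def find_word(word: str, normalize: bool = True) -> int:
    """Return the dictionary index of a word, or -1 if it is not contained."""
    if normalize:
        # Full Unicode lower-casing and normalization of the candidate.
        word = unicodedata.normalize("NFKD", word.lower())
    elif word.isascii():
        # ASCII-limited normalization: lower-case only.
        word = word.lower()
    try:
        return DICTIONARY.index(word)
    except ValueError:
        return -1

composed = "\u00e1baco"     # 'a' with acute as a single code point
decomposed = "a\u0301baco"  # 'a' followed by a combining acute accent

print(find_word(composed), find_word(decomposed))
print(find_word(composed, normalize=False),
      find_word(decomposed, normalize=False))
```

Without normalization only the word that is already in the dictionary's normal form matches, although both candidates render identically. A dictionary-based word selector avoids the problem because the user never types the code points.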
A mnemonic sentence must be parsed into a list of words for dictionary matching and seed generation. Similarly a mnemonic is often emitted in sentence form for portability.
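Parsing and emitting a sentence can be sketched in a few lines of Python. The function names are hypothetical and this is not the Libbitcoin implementation. The sketch relies on Python's `str.split()`, which treats any Unicode whitespace as a delimiter. The delimiter used when joining is a parameter because, to my understanding, BIP39 calls for the ideographic space (U+3000) in Japanese sentences, while an ASCII space is used otherwise.

```python
IDEOGRAPHIC_SPACE = "\u3000"

def split_sentence(sentence: str) -> list:
    """Parse a sentence into its words."""
    # With no arguments, split() separates on runs of any Unicode
    # whitespace and discards leading and trailing whitespace.
    return sentence.split()

def join_words(words, delimiter: str = " ") -> str:
    """Serialize words into a sentence with a single delimiter between words."""
    return delimiter.join(words)

words = split_sentence("  legal\twinner \u3000thank\nyear ")
print(words)
print(join_words(words))
print(join_words(words, IDEOGRAPHIC_SPACE))
```

Splitting is lenient and joining is strict, so a sentence that round-trips through both functions comes out with uniform delimiters regardless of how it was typed.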
Users | Developers | License | Copyright © 2011-2024 libbitcoin developers