Changes to French wordlist #6936
Conversation
IMO this would fix #6652
Re: the discussion in #6652 about getting French words from Wikipedia, I actually, as an experiment, made a separate list by this method as well. I did leave words with accents in the list, though -- that's a usability question I suppose. Happy to find 7,776 words from French Wikipedia without accents if we prefer that. As pointed out in that issue, this process could be used to create word lists in other languages beyond English and French. Though of course it'd be important to have an expert in the given language review the list before use?
Thanks @sts10, this looks pretty good. I skimmed through bitcoin/bips#152 and saw that they put a lot of effort into selecting good words and pruning out words that are too similar or obscure, so I think we probably don't need too intense of a review from a French speaker, but I'm sure we can find someone for that. I am slightly worried about the copyright. https://github.com/bitcoin/bips/blob/master/bip-0039/bip-0039-wordlists.md#french indicates that this isn't a pure mechanical list, it's been pretty well curated (related: https://opensource.stackexchange.com/a/10679). But at the same time I don't think Bitcoin intended to create a proprietary list of words? Surely it's shipped in one of their code repos somewhere that has a clear license statement?
Totally understandable!
You'd think so! Unfortunately it doesn't seem like any of the other code projects under the "Bitcoin" GitHub organization actually contain copies of the BIPS-0039 word lists within them. I don't know enough about Bitcoin to know if these are the only "official" repos, or if there's even such a concept as "official" in the Bitcoin world. For what it's worth, the BIPS repo lists a "reference implementation" for generating passphrases and links to this Python repo, which is MIT-licensed. This project does include copies of the word lists themselves. Maybe that's enough for us? I doubt they'd proudly point to a project that violates their terms? There are also plenty of "other implementations" listed that use the MIT License. I'll keep looking this week, maybe ask in a forum.
I've learned a bit more from BIP-2: It seems that individual BIPS can be and are licensed under different licenses. BIP-2 provides a list of recommended licenses. Interestingly, there are also "Not recommended, but acceptable licenses," implying that individual BIP authors have some leeway in choosing how their work is licensed. Sadly, BIP-39 seems to be one of the BIPS that does NOT (currently) specify a license. Bummer! That said, I still feel hopeful that, given that the word lists are included in their reference implementation, which is MIT-licensed, we're probably safe.
(If we're worried enough, we can also file an issue or contact them asking for explicit permission and/or offering attribution as per their request. It would be great to land these changes, and there's some goodwill around our project so I'd hope it would be a well-received request.)
So just FYI, I'll fess up a bit here and admit that last week I very clumsily tried to slap a permissive license on the BIPS word lists (their repo doesn't allow issues, so I jumped straight to a PR). I should have thought through the legal impossibilities of such a move. Hopefully I didn't antagonize their maintainers!
That's good enough for me IMO, and I'll double-check with the rest of the team that there are no concerns. Could you add a commit updating the README with the new source information (MIT-licensed trezor/python-mnemonic)?
Ah good call. I've done so, in the way I guessed fits. Let me know if that's what you had in mind or not.
The MIT-licenced trezor wordlist is identical to the BIPS one. (And the original PR pulls in all said lists in one go.) I think it's OK to follow their lead in this respect.
* Add words from the BIP-0039 wordlist contained in trezor/python-mnemonic (MIT licensed)
* Remove all words with less than 2 characters
* Remove a handful of words from a list of profane French words found online

This will not affect existing passphrases and only be used for newly generated ones.

Fixes freedomofpress#6652.
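For illustration only, the three steps in the commit message above could be sketched roughly as follows. The function name `build_wordlist`, its parameters, and the sample words are hypothetical and are not taken from the SecureDrop code base; the actual change in this PR was an edit to the word list file itself.

```python
def build_wordlist(existing, additions, blocklist, min_length=2):
    """Merge two word lists, then drop short and blocklisted words.

    All names here are illustrative assumptions, not SecureDrop APIs.
    """
    # Merge both sources and de-duplicate; strip stray whitespace.
    merged = {word.strip() for word in list(existing) + list(additions)}
    # Keep words with at least `min_length` characters that are not blocklisted.
    kept = {w for w in merged if len(w) >= min_length and w not in blocklist}
    # Sort so the output file is stable between runs.
    return sorted(kept)


if __name__ == "__main__":
    sample = build_wordlist(
        existing=["a", "chat", "eau"],
        additions=["chat", "maison", "y"],
        blocklist={"eau"},  # stand-in for a list of words to exclude
    )
    print(sample)
```

Because only the list used for generating new passphrases changes, passphrases that were already issued are not touched by a step like this.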
I squashed the two commits and copied what you wrote in the PR description for the commit message too. LGTM! Will merge once CI gives the green check.
Thanks for the contribution @sts10 :)
Status
Ready for review
Description of Changes
I, a person who doesn't speak French, am proposing the following changes to SecureDrop's French word list:
After these changes, my proposed word list has a healthy 7,886 words, meaning each word adds 12.945 bits of entropy to a passphrase (the existing list has 7,384 words, meaning each word provides 12.85 bits).
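The per-word figures quoted above follow from taking the base-2 logarithm of the list size, assuming each word is drawn uniformly at random. A minimal sketch to reproduce them (the helper name `bits_per_word` is just for this example):

```python
import math


def bits_per_word(list_size: int) -> float:
    """Entropy contributed by one word chosen uniformly from the list."""
    return math.log2(list_size)


if __name__ == "__main__":
    for label, size in [("existing list", 7384), ("proposed list", 7886)]:
        print(f"{label}: {size} words, {bits_per_word(size):.3f} bits per word")
```

For a passphrase of n words, the total is n times the per-word value.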
Note that my proposed list, like the word list it is replacing, is not uniquely decodable. In practical terms, this means that a separator like a hyphen or space must be used between the words of every generated passphrase. I'd be happy to create a uniquely decodable list if that is desired.
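To make "uniquely decodable" concrete: a list fails the property when some string of letters can be split into list words in more than one way, which is why a separator is needed. One standard way to check this is the Sardinas-Patterson test. The sketch below is a plain, unoptimized version of that test (quadratic in the list size per round), not the tool used to prepare this PR, and the sample words are made up for the example.

```python
def dangling_suffixes(prefixes, words):
    """Non-empty w such that p + w == word for some p in prefixes, word in words."""
    out = set()
    for p in prefixes:
        for word in words:
            if len(word) > len(p) and word.startswith(p):
                out.add(word[len(p):])
    return out


def is_uniquely_decodable(wordlist):
    """Sardinas-Patterson test on a list of distinct words."""
    code = set(wordlist)
    # First round: suffixes left over when one word is a prefix of another.
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        # A dangling suffix that is itself a word means two parsings exist.
        if current & code:
            return False
        frozen = frozenset(current)
        if frozen in seen:
            # The suffix sets started repeating without hitting a word.
            return True
        seen.add(frozen)
        current = dangling_suffixes(code, current) | dangling_suffixes(current, code)
    return True


if __name__ == "__main__":
    # "bonjour" can also be read as "bon" + "jour", so a separator is needed.
    print(is_uniquely_decodable(["bon", "jour", "bonjour"]))
    # "chaton" starts with "chat", but the leftover "on" is not a word.
    print(is_uniquely_decodable(["chat", "chaton", "ton"]))
```

With a separator between words the question does not arise, which is the approach the existing list already relies on.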
Changes proposed in this pull request:
One concern might be the licensing of the BIPS French word list. I can't find any licensing information in what I assume is the relevant GitHub repo.
Testing
I don't think this PR needs testing, but I guess you could generate some French passphrases to make sure nothing weird happens. Also, it might be wise to have a French speaker give the added words a look!
Deployment
Any special considerations for deployment?
Changing the passphrase word list shouldn't be too hard to deploy. Existing users, who have words in their passphrase that I'm proposing to remove, shouldn't be affected. New users will get a series of words from the new list to make a hopefully more memorable passphrase for French speakers.