Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rpc wallet protection endpoints #4214

Merged
merged 24 commits into from
May 18, 2020

Conversation

ghubstan
Copy link
Contributor

@ghubstan ghubstan commented Apr 29, 2020

Implemented lockwallet, unlockwallet, removewalletpassword, and setwalletpassword methods with basic error handling.

Also added basic error handling to the existing getbalance method, and removed unused BalancePresentation from CoreAPI.

Resolves #4198

Notes

I didn't spend time second-guessing maintainers about the ordering of rpc service defs & methods in grpc.proto and GrpcServer, so changes might need to be made in that regard. It makes sense to group wallet protection related methods together, but that is inconsistent with other alphabetically ordered service and method names.

There is also a getbalance error handling bug that needs fixing: Error: null is printed to the :cli console when the daemon is losing peers (SOCKET_TIMEOUT).

Bats test cases for these methods are ready to push in a new PR after (if) 4209 is merged.

Implemented lockwallet, unlockwallet, removewalletpassword, and
setwalletpassword methods with basic error handling.

Also added basic error handling to existing getbalance method,
and removed unused BalancePresentation from CoreAPI.

TODO:  update help text
@ghubstan ghubstan marked this pull request as draft April 29, 2020 22:43
@ghubstan
Copy link
Contributor Author

ghubstan commented Apr 29, 2020

There is a bug needing a fix and test case for CliMain at line 174:
if (nonOptionArgs.size() < 2) should be if (nonOptionArgs.size() < 3).
FIXED

@ghubstan ghubstan marked this pull request as ready for review April 30, 2020 15:20
@cbeams
Copy link
Contributor

cbeams commented May 1, 2020

Note that I changed the description from "To close task list in #4198" to "Resolves #4198" so that this PR shows up as "linked" in #4198 and so that it will in fact close #4198 when merged. See https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword.

Copy link
Contributor

@cbeams cbeams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NACK. @ghubstan, I have not comprehensively reviewed this yet, but it's clear so far that error handling will need some rework. You've built in success and error_message fields into the new *Reply definitions, but this is not idiomatic gRPC usage. gRPC provides a first-class Status object, which should be accessed via StatusRuntimeExceptions thrown from the server side and caught on the client. We're doing this already with invalid passwords, but we now need to do it more consistently from our services.

See https://grpc.io/docs/tutorials/basic/java/#calling-service-methods and https://github.com/grpc/grpc/blob/master/doc/statuscodes.md for details.

@cbeams
Copy link
Contributor

cbeams commented May 1, 2020

Also, and it's my bad for not specifying this, we probably should have cut this up into multiple PRs implementing one endpoint at a time, or at least separate commits for each. I'm not suggesting that you re-do everything in separate PRs/commits here, but it does mean more re-work when we find something like the Status issue above. If we'd caught that in a single endpoint implementation, the other's wouldn't have needed rework. Another reason to do one PR per endpoint is from a security review perspective. The less code there is to look at, the more likely it is that we notice subtle problems.

@ghubstan ghubstan marked this pull request as draft May 1, 2020 19:13
This change removes non-idiomatic gRPC *Reply proto message fields.
The client should not receive success/fail values from server methods
with a void return type, nor an optional error_message from any server
method.  This change improves error handling by wrapping an appropriate
gRPC Status with a meaningful error description in a StatusRuntimeException,
and placing it in the server's response StreamObserver.  User error
messages are mapped to general purpose gRPC Status codes in a new ApiStatus
enum class.  (Maybe ApiStatus should be renamed to CoreApiStatus.)
@ghubstan
Copy link
Contributor Author

ghubstan commented May 1, 2020

Notes about commit 2a9d1f6 (WIP)

@cbeams,

I know you will look at all of the changes, but please review my use of the Tuple2 return type used in CoreApi? Wrapping the return value and an ApiStatus code in a single return value -- for void return types too, unfortunately -- seemed like a better idea than throwing Exceptions to be caught by GrpcServer, to then be transformed into gRPC StatusRuntimeExceptions for the client. I don't know what your preference is. I prefer to make the changes after you ask for them.

And there is a lot of redundant error handling code in GrpcServer:

if (!result.second.equals(ApiStatus.OK)) {
	StatusRuntimeException ex = new StatusRuntimeException(result.second.getGrpcStatus()
	        .withDescription(result.second.getDescription()));
	responseObserver.onError(ex);
	throw ex;
}

There are only service classes in GrpcServer, no private methods yet. I don't know if you want these duplicate code blocks factored out.

@cbeams
Copy link
Contributor

cbeams commented May 2, 2020

@ghubstan, in preparation to more fully review the use of Tuple2 and the other things you've asked about above, I've committed 39c868f which refactors the way we're declaring wallet-related grpc services. Have a look, and I'll get back to my review here as soon as I can. In short, I do want to eliminate Tuple2 and go with exceptions, but I need to think about it / try things out a bit further myself.

@cbeams
Copy link
Contributor

cbeams commented May 2, 2020

I doubt, by the way, @ghubstan, that the way we want to implement unlockwallet is by actually unencrypting the wallet with removewalletpassword. If the Bisq node were to be shut down before the timeout expires, it would be without a wallet password at all when it starts up again. I am not sure exactly how unlocking the wallet works when using the GUI, but I would guess it doesn't work this way. Perhaps you could look into that in the meantime, while I continue my review?

@cbeams
Copy link
Contributor

cbeams commented May 2, 2020

@ghubstan, regarding error handling, I've landed on an exception-based solution I'm happy with so far, but not quite ready to commit. Just FYI.

cbeams added 2 commits May 3, 2020 15:46
Previously, each wallet-related method was implemented with its own grpc
service. There is no need to do this, as a grpc service may declare
multiple rpc methods. This commit refactors everything wallet-related
into a single GrpcWalletService and also extracts a CoreWalletService
from CoreApi in order to avoid the latter becoming overly large.
Ideally, there would be no need for an abstraction in bisq.grpc called
CoreWalletService; we would ideally use such a service implemented in
bisq.core. The closest we have is WalletsManager, but it is not designed
to be used the way we are using it here in the grpc context. Rather than
making changes directly to core (which can be very risky), we will
rather make them here in this layer, designing exactly the "core wallet
service" we need, and can then later see about folding it into the
actual core.
@cbeams cbeams force-pushed the rpc-wallet-protection branch from 39c868f to 12d2de5 Compare May 3, 2020 13:53
And throw an IllegalStateException with an appropriate message if the
wallet is not available or still locked.

This change also eliminates the NullPointerException that would
sometimes be thrown when calling #getAvailableBalance after the wallet
has become available but before the balance has become available.
@cbeams cbeams force-pushed the rpc-wallet-protection branch from 12d2de5 to 163061a Compare May 3, 2020 14:16
@cbeams
Copy link
Contributor

cbeams commented May 3, 2020

@ghubstan, have a look at what I've done in 163061a to change the return type of CoreWalletService#getAvailableBalance from Tuple2<Long, ApiStatus> back to long and how throwing IllegalStateException is now favored. The status of all StatusRuntimeExceptions is now set to UNKNOWN, as I don't see a requirement at this point for the client to differentiate between different gRPC status codes. We can introduce that later if need be. For now, this approach keeps CoreWalletService completely unaware of gRPC concepts (as if it were really located in bisq.core where we want it to be) and it its methods are left returning values and throwing exceptions in an idiomatic / plain old java fashion.

Could you refactor the remaining methods in CoreWalletService to follow suit? When finished, ApiStatus should be empty and therefore deleted, and there should no longer be any usage of Tuple2. Preferred is to refactor each method in its own commit. Thanks.

ghubstan and others added 3 commits May 3, 2020 13:42
Like the change in commit 163061a, setWalletPassword now throws an
IllegalStateException with an appropriate message if the wallet password
cannot be set.

Also deletes unused StatusApi enums.
Like the change in commits 163061a and 9164579, removeWalletPassword
now throws an IllegalStateException with an appropriate message if
the wallet password cannot be removed.

Also deletes unused StatusApi enums.
@ghubstan
Copy link
Contributor Author

ghubstan commented May 3, 2020

RE commit ab17b67: I was catching exceptions (indicating a bug or bugs to be fixed?) I've seen during testing this method. Otherwise they will bubble up to client.

RE commit 9164579: I don't think an extra line for an explicit return; is needed for methods that return void.

@cbeams
Copy link
Contributor

cbeams commented May 3, 2020

RE commit ab17b67: I was catching exceptions (indicating a bug or bugs to be fixed?) I've seen during testing this method. Otherwise they will bubble up to client.

I don't see the value in wrapping and rethrowing there. I'd rather see the exception bubble up as-is to the client and then see specific error handling code put wherever it's most appropriate. Wrapping it in an IllegalStateException may not really reflect the nature of the problem, e.g. just a programmer error.

RE commit 9164579: I don't think an extra line for an explicit return; is needed for methods that return void.

If you don't explicitly return there, then the nextif block will return true and throw because you just finished encrypting the wallet. Prior to your refactoring, you did explicitly return from that block with a Tuple2. You either need to return or make the subsequent if an else if block; your call.

cbeams and others added 6 commits May 4, 2020 12:22
This commit factors out a run() method in CliMain that either returns
(void) or throws an exception. This eliminates the need to call
System.exit in so many places as were previously. Indeed, now there is
only one place where System.exit is called.

It also removes the duplication of printing "Error: ..." to stderr when
something goes wrong by doing this once in the global catch clause in
the main method.
This variable no longer needs to be initialized to avoid a compiler error.
Cache the aesKey instead of the password in unlock wallet method,
avoiding extra overhead of deriving it from the temp password in the
lock method.

The KeyCrypterScrypt used in the unlock method must also be cached for
use in the manual lock method.  Creating a new KeyCrypterScrypt instance
in the manual lock method would have a different random salt value,
invalidating the password used in the unlock method.
This is consistent with :desktop's PasswordView#onApplyPassword.
@ghubstan
Copy link
Contributor Author

ghubstan commented May 6, 2020

@cbeams , I am still figuring out how to ensure the wallet is encrypted in the event of a crash before unlockpassword timeout expiration. I think the intermediary commits fbb025a - 24248d4 above were necessary, but CoreWalletService still decrypts/encrypts in the unlock/lock wallet methods. This will be fixed.

I think an intermediate step in that direction is to decrypt and re-encrypt the wallet only inside the methods that need to temporarily unlock the wallet, instead of decrypting it in the unlockwallet method. unlockwallet would just cache the crypter+key, which would be used by getbalance to: decrypt wallet -> getbalance -> encrypt wallet -> return balance. The unlockwallet timeout task or the lockwallet method, whichever happens first, would clear the temp crypter+key variables (see commit 4262f29).

I don't think getbalance even requires wallet decryption to display an updated balance, but it would be the place to see if this is feasible until a better solution to the wallet protection unlock/lock problem is found, and new methods are implemented. (Then the wallet decrypt/encrypt code would be removed from getbalance.)

Tomorrow, I will run :desktop in regtest mode to see how wallet decryption/encryption works while trading and sending btc/bsq to and from other wallets. I realize maybe all that's needed to unlock the wallet is the aesKey, but so far, bitcoin core src tells me it decrypts the in-memory wallet in its walletpassphrasechange method, and Bisq / BitcoinJ src tells me unlockwallet will have to decrypt the wallet and save modified wallet files to disk (I could be and hope I'm wrong).

(On a related matter: the UI does have a removepassword function.)

EDIT: This comment is obsolete. See commit a79ae57.

There is no need to decrypt and re-encrypt wallet files in the unlock
and lock wallet methods.  The temporary aesKey saved in the unlock
method will used for operations on an encrypted wallet.

There is also no need to cache a tempKeyCrypterScrypt;  this class
level variable was removed.
@ghubstan
Copy link
Contributor Author

ghubstan commented May 7, 2020

The correctness of commit a79ae57 needs to be verified by implementing and testing wallet related methods requiring the aesKey cached in unlockwallet.

For example, a method analogous to bitcoin-cli's sendtoaddress "addr" amt, that performs the same operation as the Bisq UI's WithdrawalView #
doWithdraw(amount, fee, callback) -> sendFunds(amount, fee, aesKey, callback)

ghubstan added a commit to ghubstan/bisq that referenced this pull request May 7, 2020
These tests are meant to be run against rpc methods in PR bisq-network#4214.
This change should not be merged before PR bisq-network#4214.
@ghubstan ghubstan marked this pull request as ready for review May 7, 2020 17:50
@cbeams
Copy link
Contributor

cbeams commented May 8, 2020

@ghubstan, I just ran into the following scenario when testing everything locally:

  1. spin up a daemon against a new data directory, e.g: ./bisq-daemon --appDataDir=/tmp/newbisqdatadir --apiPassword=xyz
  2. run ./bisq-cli --password=xyz getbalance until you get a valid response of 0.00000000
  3. set a wallet password with ./bisq-cli --password=xyz setwalletpassword foo
  4. notice that ./bisq-cli --password=xyz getbalance errors out as expected
  5. unlock the wallet with ./bisq-cli --password=xyz unlockwallet foo 60
  6. notice that ./bisq-cli --password=xyz getbalance returns 0.00000000 once again
  7. lock the wallet with ./bisq-cli --password=xyz lockwallet
  8. notice that ./bisq-cli --password=xyz getbalance errors out as expected
  9. unlock the wallet once again with ./bisq-cli --password=xyz unlockwallet foo 60
  10. notice that ./bisq-cli --password=xyz getbalance does NOT return 0.00000000 as expected, but rather fails with a "balance is not yet available message"
  11. notice that no matter how long you wait, the command above always fails with the same message.

It appears there is no way to recover from this state via the cli, and that the only way out is to spin up a fresh data directory. Stopping and starting the server makes no difference either.

@ghubstan
Copy link
Contributor Author

ghubstan commented May 8, 2020

It appears there is no way to recover from this state via the cli

@cbeams , I haven't been able to reproduce the error yet. I don't suppose you still have the server log, and could check if the balance is not yet available msgs on the cli correspond to lost peer connections and Tor layer log.errors?

I've seen this before and should have mentioned it. Fixing tor binaries is out of this PR's scope, but the daemon needs to return a meaningful msg to the cli when this happens.

I've been turning my net connection off & on trying to force the tor errors I think are related to the daemon bug. No success so far, but I'll leave daemon on in hopes tor starts failing.

@cbeams
Copy link
Contributor

cbeams commented May 12, 2020

@ghubstan, I just tried reproducing this on my side and recorded my efforts in the video at keybase://team/bisq/2020-05-12%20repro%20grpc-api%204214%20bug. In the end, I was unable to reproduce it exactly as I reported it. Please have a look at the video, but I'm not sure if it makes sense to focus on this more if I can't reliably reproduce it myself.

@ghubstan
Copy link
Contributor Author

ghubstan commented May 12, 2020

@cbeams,

I will see the keybase video later today. But today I am seeing numerous unrecoverable balance is not yet available errors and am working on it now.

An interval of tor circuit breakdown during startup prevented the wallet balance from being updated.
When I see a series of tor debug statements like these:

CircuitStatus: 92 CLOSED
CircuitStatus: 92 CLOSED
CircuitStatus: 144 FAILED
CircuitStatus: 148 LAUNCHED
CircuitStatus: 94 CLOSED
CircuitStatus: 149 EXTENDED
CircuitStatus: 148 BUILT
CircuitStatus: 153 EXTENDED
.....
CircuitStatus: 153 BUILT
CircuitStatus: 134 CLOSED
...
...
Ending in many CLOSED statements
...
...
CircuitStatus: 134 CLOSED

the initial null wallet balance cannot be updated.

But you saw your balance change from a non-null value to null (then you saw balance is not yet available). I suspect the balance is set to null while the tor circuit is broken, but I haven't reproduced that scenario yet.

When the circuit if finally reconstructed -- however long it takes -- the wallet balance is updated (confirmed by hitting my breakpoint @ Balances#updateBalance), and the cli can show the balance. I think this is the problem you found. If it is, you would have seen your balance after the tor circuit problems resolved themselves. Eventually. (EDIT: But maybe not. Ever. The recovery of the balance after circuit rebuild I saw earlier is not happening now. I have to take back what I said about "eventual" recovery.)

I am discussing with @freimair about how this problem is or might be categorized, and how this state could be communicated to the :cli user.

This problem is related and possibly identical to the "3/4" problem described in issue 2547.

The CoreWalletService should never be running more than one wallet lock
TimerTask at any given time, and the wallet should lock at the correct time
after a user overrides a prior timeout value.

The CoreWalletService now cancels any existing lock TimerTask thread
before creating a new one, to prevent the wallet from re-locking at
unexpected times.

This change prevents error conditions created in scenarios similar to
the following:

  (1)  User unlocks wallet for 60s
  (2)  User overrides unlock timeout by calling unlockwallet "pwd" 600s
  (3)  After 60s, the 1st, uncanceled lock timeout expires, and wallet is
       locked 4 minutes too soon.
@cbeams cbeams merged commit 2e33c2c into bisq-network:master May 18, 2020
@chimp1984
Copy link
Contributor

@ghubstan
Regarding wallet encryption (not sure if already resolved, have not read the full thread...):

If wallet is encrypted you need the aesKey (or generate it from the password) for certain wallet methods. Bisq does not decrypt/encrypt but requests from the user the password to get the aesKey. Encrypting and decrypting is only done when setting or removing the wallet password.
The wallet write operation has a certain risk for disk failure and when setting the password we delete all the rolling backups (as seed would be visible in unencrypted files), so that could be a critical problem if the write operation fails and causes a corrupted wallet (I read on BitcoinJ mailinglist that this happened sometimes). So the encryption/decryption operations should be done only when absolutely needed IMO.

Side note:
If the wallet is encrypted Bisq keeps the aesKey in memory to be able to sign a tx in case a taker takes an offer (which happenes non-interactive). This might be a bit of a security risk and it would be better if we do that only when absolutely needed (e.g. in case a maker has open offers).

I am not sure about your use cases for the API but passing the wallet password if required for the wallet operation seems to me the right way to deal with it.

@ripcurlx ripcurlx added this to the v1.3.5 milestone Jun 4, 2020
@ghubstan ghubstan deleted the rpc-wallet-protection branch July 15, 2020 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement wallet protection rpc methods
4 participants