refusing to activate session errors after upgrading to 0.10.1 #2362

Closed
umglurf opened this issue Aug 13, 2022 · 22 comments · Fixed by #2369

@umglurf

umglurf commented Aug 13, 2022

Describe the bug

In one of my use cases for boundary, I use it to allow terraform to connect to database servers to configure them, using

boundary connect -target-id $target_id -host-id "$host_id" -listen-port=35567

When running terraform, there are many connections, and after a few seconds I start getting errors from the boundary agent

Proxy listening information:
  Address:             127.0.0.1
  Connection Limit:    -1
  Expiration:          Sat, 13 Aug 2022 16:58:49 CEST
  Port:                35567
  Protocol:            tcp
  Session ID:          s_vDI8YXc0xj
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"
error reading handshake result: failed to read protobuf message: failed to get reader: received close frame: status = StatusInternalError and reason = "refusing to activate session"

Running single sessions works fine, and running terraform with parallelism 1 can work together with target.

I've also tried to re-initialize the database, no change.

To Reproduce
Steps to reproduce the behavior:

  1. Run boundary connect -target-id $target_id -host-id "$host_id" -listen-port=XX
  2. Run many simultaneous connections against the listen port
  3. See error
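
Step 2 above can be approximated without Terraform. The following is a minimal sketch, assuming that plain TCP connections opened at the same moment are enough to trigger the behavior; the function name `open_concurrent_connections` is hypothetical and not part of Boundary. It only generates the load: the "refusing to activate session" messages, if any, would show up in the `boundary connect` output rather than in this script.

```python
import socket
import threading
import time


def open_concurrent_connections(host, port, count, hold_seconds=0.5):
    """Open `count` TCP connections at roughly the same time.

    Returns how many of them connected successfully.
    """
    connected = []
    lock = threading.Lock()
    start = threading.Barrier(count)

    def worker():
        start.wait()  # release all threads together so the connects overlap
        try:
            with socket.create_connection((host, port), timeout=5):
                with lock:
                    connected.append(True)
                # Keep the socket open briefly so the connections coexist.
                time.sleep(hold_seconds)
        except OSError:
            pass  # refused or timed out; not counted

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(connected)
```

For example, with `boundary connect` listening on the port from the report, `open_concurrent_connections("127.0.0.1", 35567, 20)` would open twenty connections at once.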

Expected behavior
The boundary agent is able to serve the connections. This worked fine with 0.9.1.

Additional context
worker.log
controller.log

@ataraxia937

I see the same, even if the connections are made one at a time. It happened on 0.10.0 as well. I can't speak to earlier history because I'm very new to Boundary.

The first use of a session works fine. If the consumer of the session closes its connection, I'm then unable to use it again, with the error shown above. Looking at the code, I see that the worker refuses to establish the session again because the state is not "Pending". Indeed, the session remains as "Active" after the first time it's been used. I would expect it to change back to "Pending".

I'm not totally sure this is the same exact bug, but it's certainly the same error message. Let me know if I should open a new issue instead.
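
The state check described in this comment can be illustrated with a toy model. This is only a sketch of the reported symptom, not Boundary's code: `ToySession`, `activate_strict` and `activate_tolerant` are hypothetical names, and the tolerant variant is just one conceivable behavior, not necessarily what the eventual fix does.

```python
from enum import Enum


class SessionState(Enum):
    PENDING = "pending"
    ACTIVE = "active"


class ActivationRefused(Exception):
    pass


class ToySession:
    """Hypothetical stand-in for a session; not a Boundary type."""

    def __init__(self):
        self.state = SessionState.PENDING

    def activate_strict(self):
        # Mirrors the reported behavior: any state other than "pending" is refused.
        if self.state is not SessionState.PENDING:
            raise ActivationRefused("refusing to activate session")
        self.state = SessionState.ACTIVE

    def activate_tolerant(self):
        # Assumed alternative: an already active session is accepted as-is.
        if self.state is SessionState.PENDING:
            self.state = SessionState.ACTIVE


def connect_n_times(activate, n):
    """Call `activate` once per simulated connection and record each outcome."""
    results = []
    for _ in range(n):
        try:
            activate()
            results.append("ok")
        except ActivationRefused as exc:
            results.append(str(exc))
    return results
```

With the strict check, `connect_n_times(ToySession().activate_strict, 3)` succeeds once and is refused twice, which matches "the first use works, later uses fail"; the tolerant variant succeeds every time.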

@irenarindos irenarindos self-assigned this Aug 17, 2022
@irenarindos
Collaborator

Thank you both for reporting this bug! @ataraxia937 It sounds like you're hitting the same bug as @umglurf

We're actively looking into this and will let you know when a fix is available.

@jefferai
Member

Can you both confirm whether or not you have multiple workers?

@umglurf
Author

umglurf commented Aug 17, 2022

> Can you both confirm whether or not you have multiple workers?

I have three controllers and three workers, but I got the same result when running with just one controller and one worker.

@ataraxia937

I likewise can reproduce it with just one worker, or with multiple.

@jefferai
Member

@umglurf If you are using just one worker, do you see this behavior immediately? In the original post you suggested it started happening after some time. If it's what we think it is, it's a regression that would manifest as the first worker to handle a session throwing that error with additional connections...so if you have a single worker, I'd expect that any connection past the first would throw that error.

@umglurf
Author

umglurf commented Aug 17, 2022

@jefferai I tested again just now with one worker: the first connection succeeds and the second one fails.

@jefferai
Member

Perfect. Fix will be coming in 0.10.3, @irenarindos has a PR up (#2369)

@jefferai jefferai added this to the 0.10.3 milestone Aug 18, 2022
@ataraxia937

I was just now testing the code from that PR branch. Works great here!

@umglurf
Author

umglurf commented Aug 19, 2022

I was not able to get the build working locally, but downloaded the binary from https://github.com/hashicorp/boundary/actions/runs/2883181053#artifacts
With this binary I still got the same error

@umglurf
Author

umglurf commented Aug 19, 2022

Sorry, wrong build :)
Tested with binary from https://github.com/hashicorp/boundary/actions/runs/2882448019#artifacts and it works for me as well, both with 1 worker and with 3

@RobertSkawinski

Hi,

I have a similar issue with boundary version 0.11.2

I can start the connection, and it stays pending until I try to connect.
As soon as I connect, it terminates immediately with the error
"no tofu token but not in correct session state"

Log:

{"id":"s9FkQAea7f","source":"https://hashicorp.com/boundary/hostname.local/controller+worker","specversion":"1.0","type":"error","data":{"error":"no tofu token but not in correct session state","error_fields":{},"id":"e_c5Cp1B0UvY","version":"v0.1","op":"worker.(Worker).handleProxy","info":{"session_id":"s_her5XTeHJa"}},"datacontentype":"application/cloudevents","time":"2022-12-22T17:21:17.306181959Z"}
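
When there are many such events, it can help to pull out just the fields that matter for triage. The following is a minimal sketch, assuming each event arrives as one JSON object per line with the field layout shown above; `summarize_error_event` is a hypothetical helper, and `sample` is a shortened copy of the line above.

```python
import json


def summarize_error_event(line):
    """Extract the fields most useful for triage from one JSON event line."""
    event = json.loads(line)
    data = event.get("data", {})
    return {
        "time": event.get("time"),
        "op": data.get("op"),
        "error": data.get("error"),
        "session_id": data.get("info", {}).get("session_id"),
    }


# Shortened copy of the event above (only the fields read by the helper).
sample = (
    '{"id":"s9FkQAea7f","type":"error",'
    '"data":{"error":"no tofu token but not in correct session state",'
    '"op":"worker.(Worker).handleProxy",'
    '"info":{"session_id":"s_her5XTeHJa"}},'
    '"time":"2022-12-22T17:21:17.306181959Z"}'
)

print(summarize_error_event(sample))
```

The resulting dictionary carries the session ID (`s_her5XTeHJa` here), so worker-side errors can be matched to the session shown by the client.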

Setup:
Docker: one Container with Controller & Worker
docker run --restart always -d --name boundary --network host --cap-add IPC_LOCK -v /data/boundary:/boundary -e'BOUNDARY_POSTGRES_URL=postgresql://user:[email protected]:5432/postgres?sslmode=require' -e'BOUNDARY_PUBLIC_DNS=boundary.xxx.at' hashicorp/boundary:0.11.2
Proxy with ssl offloading and port redirection 443 -> 9200
TCP Passthrough 9202 public -> 9202 docker container

cat /data/boundary/config.hcl 
disable_mlock = true

controller {
  name = "xx-controller"
  description = "XX Controller"
  database {
    url = "env://BOUNDARY_POSTGRES_URL"
    max_open_connections = 5
  }
  public_cluster_addr = "env://HOSTNAME"
}

worker {
  name = "demo-worker"
  description = "A default worker created for demonstration"
  public_addr = "env://BOUNDARY_PUBLIC_DNS"
}

listener "tcp" {
  address = "0.0.0.0"
  purpose = "api"
  tls_disable = true 
}

listener "tcp" {
  address = "0.0.0.0"
  purpose = "cluster"
  tls_disable   = true 
}

listener "tcp" {
  address = "0.0.0.0"
  purpose       = "proxy"
  tls_disable   = true 
  public_addr = "env://BOUNDARY_PUBLIC_DNS"
}

# Root KMS configuration block: this is the root key for Boundary
# Use a production KMS such as AWS KMS in production installs
kms "aead" {
  purpose = "root"
  aead_type = "aes-gcm"
  key = "xxx"
  key_id = "global_root"
}

# Worker authorization KMS
# Use a production KMS such as AWS KMS for production installs
# This key is the same key used in the worker configuration
kms "aead" {
  purpose = "worker-auth"
  aead_type = "aes-gcm"
  key = "xxx"
  key_id = "global_worker-auth"
}

# Recovery KMS block: configures the recovery key for Boundary
# Use a production KMS such as AWS KMS for production installs
kms "aead" {
  purpose = "recovery"
  aead_type = "aes-gcm"
  key = "xxx"
  key_id = "global_recovery"
}

@irenarindos
Collaborator

Hi @tritonblaster - I've been trying to replicate this, and was curious how you're initiating your connection? Thanks!

@RobertSkawinski

Hi @irenarindos - I tried with Boundary Desktop version 1.5.
I can select the target, and it switches the status to pending.
But as soon as I connect to the local port, the connection terminates and I see this error message in the output of the docker container.

@RobertSkawinski

RobertSkawinski commented Dec 23, 2022

Hi @irenarindos,
thanks for helping.
I think I was able to find the root cause; it seems to be caused by the Boundary Desktop UI.
When using the boundary.exe binary, everything works as expected.

Should we track this bug in a new issue?

Steps to reproduce (working):

  1. Extract boundary-desktop_1.5.0_windows_amd64
  2. Open PowerShell and switch to boundary-desktop_1.5.0_windows_amd64\Boundary\resources\app\cli
  3. Get a token: .\boundary.exe authenticate password -login-name=user -auth-method-id=ampw_xxx -addr=https://boundary.domain.at/
  4. Connect to the target: .\boundary.exe connect -addr=https://boundary.domain.at -target-id=ttcp_xxx

Steps to reproduce (not working):

  1. Open Boundary-Desktop
  2. Login
  3. Switch to Targets
  4. Click Connect
  5. Enter the connection string (localhost + random port) in your application (for example, a database management tool) and start the application
  6. Boundary terminates the connection and the application gets a timeout
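
To take the database tool out of the picture in step 5, the local port can be probed directly. This is a minimal sketch, assuming that a proxy which drops the session shows up as an immediate close or reset on the local socket; `probe_local_port` is a hypothetical helper, not part of Boundary or Boundary Desktop.

```python
import socket


def probe_local_port(host, port, wait_seconds=2.0):
    """Connect and report whether the peer drops the connection right away.

    Returns "refused", "closed", "data" or "open".
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait_seconds)
    except OSError:
        return "refused"  # nothing is listening, or the connect timed out
    with sock:
        sock.settimeout(wait_seconds)
        try:
            chunk = sock.recv(1)
        except socket.timeout:
            return "open"  # still connected; the peer is waiting for us to speak
        except OSError:
            return "closed"  # connection reset by the peer
        # An empty read means the peer closed the connection in an orderly way.
        return "data" if chunk else "closed"
```

A result of "closed" for the port shown by Boundary Desktop, compared with "open" or "data" for the port opened by the CLI, would be consistent with the difference described in the two sets of steps.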

@irenarindos
Collaborator

@tritonblaster thanks so much! I think we should track this in a new issue.

Thanks again for finding this & the steps to reproduce it!

@jonathan-russo

Hi all. I am experiencing this issue again with Boundary 0.11.1 while using the CLI client. Specifically, it happens when connecting to a MongoDB server. Steps to reproduce:

  1. Add the mongo server as a target
  2. On local machine authenticate to boundary
    boundary authenticate password -addr=https://boundary.example.com:9200 -auth-method-id=ampw_uaHb9lxZD2 -login-name=admin
  3. Attempt to connect to mongo target
    boundary connect -target-id=ttcp_kpeE4kwR1w -addr=https://boundary.example.com:9200

On the client side I receive the same "refusing to activate session" error as the OP (only after attempting to connect with a mongo client), and on the server side I receive the same "no tofu token but not in correct session state" error.

@irenarindos
Collaborator

@jonathan-russo Thanks for reporting this! I'm looking into this issue under #2741

I am curious, what OS are you using, and what mongo client are you using? Thanks so much!

@jonathan-russo

Hi @irenarindos !

I am using Boundary version 0.11.2 on my local machine running Mac OS X Monterey (12.6.2). The client I am using is Studio3T.

@irenarindos
Collaborator

@jonathan-russo I've got a potential fix up in #2795 - I was wondering if you'd be willing to build Boundary from my PR branch and try to replicate your issue to see if it's resolved? Alternatively if you let me know what platform you need I can send a build to you.

Thanks so much!

@jonathan-russo

Hi @irenarindos thanks for the quick fix!

I'm having some issues trying to build Boundary, so if you could send me a build that would be great. We are running the server on Amazon Linux 2 and I am running the boundary client on Mac OS X Monterey (12.6.2).

@irenarindos
Collaborator

Thanks @jonathan-russo !

Can you email me at "irena.rindos at the company I work for dot com" so we can coordinate getting you a build?
