SOCKS5 Stability Improvements for HTTP/mTLS Transports #1807

Open · wants to merge 2 commits into master

Conversation

@navsec commented Oct 31, 2024

Hey Sliver Maintainers,
I've been using Sliver for a while now and love the project. One of the things that has bugged me for a while is the inconsistency of certain protocols (like RDP) over the built-in socks5 implementation. These inconsistencies usually require me to drop in another tool like Chisel to get the functionality I need for certain scenarios.

These protocol inconsistencies have also been documented by others in Issue #728. After some time spent debugging, I believe I've made some progress on mitigating them.

Details

This PR introduces changes on the client and server to improve SOCKS5 stability and to better support high-bandwidth protocols like RDP. These changes only apply to the HTTP/HTTPS and mTLS transports.

  • Added batching on intervals to help control the speed at which data is transferred to the implant and back to the client
  • Added logic to ensure sequence numbers are processed in the correct order, which is important for stateful protocols like RDP (a sketch of this follows the next paragraph)
  • Added atomic operations for managing sequence number updates
  • Added additional error handling so that when a SOCKS connection enters an unexpected state we can attempt to recover without breaking the SOCKS instance entirely
  • Added a rate limiter to the client SOCKS logic to smooth out request bursts from the client that could otherwise overwhelm the implant (also sketched below)

These changes should help mitigate the "protocol instability" issues others have reported. With them in place, I'm unable to reproduce the previously reported issues with evil-winrm and bloodhound-python. Additionally, I'm now able to use RDP through SOCKS successfully, even with multiple clients at the same time.
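
To make the ordering idea concrete, here is a minimal Go sketch of a reorder buffer fed by an atomic sequence counter. All identifiers (`reorderBuffer`, `Push`, `nextSeq`) are hypothetical illustrations, not names from this PR's diff:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reorderBuffer delivers frames strictly in sequence order, holding back
// any frame that arrives ahead of the next expected sequence number.
type reorderBuffer struct {
	mu      sync.Mutex
	next    uint64            // next sequence number eligible for delivery
	pending map[uint64][]byte // out-of-order frames keyed by sequence number
	deliver func(data []byte) // e.g. a write to the client connection
}

func newReorderBuffer(deliver func([]byte)) *reorderBuffer {
	return &reorderBuffer{pending: make(map[uint64][]byte), deliver: deliver}
}

// Push accepts a frame with its sequence number. A frame at the expected
// number is delivered immediately, then any buffered successors are drained.
func (b *reorderBuffer) Push(seq uint64, data []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if seq != b.next {
		b.pending[seq] = data // arrived early; hold until its turn
		return
	}
	b.deliver(data)
	b.next++
	for {
		d, ok := b.pending[b.next]
		if !ok {
			return
		}
		delete(b.pending, b.next)
		b.deliver(d)
		b.next++
	}
}

// On the send side, an atomic counter hands out sequence numbers safely
// across goroutines, which is the "atomic operations" point in the list above.
var sendSeq atomic.Uint64

func nextSeq() uint64 { return sendSeq.Add(1) - 1 } // yields 0, 1, 2, ...

func main() {
	rb := newReorderBuffer(func(d []byte) { fmt.Print(string(d)) })
	rb.Push(1, []byte("world\n")) // early frame is buffered
	rb.Push(0, []byte("hello "))  // delivers frame 0, then drains frame 1
	_ = nextSeq()
}
```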

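Similarly, the client-side rate limiting could be built on `golang.org/x/time/rate`; the rate and burst values below are placeholders, not the values tuned in this PR:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// Permit ~100 sends per second with bursts of up to 20 frames.
	// The PR's real values were tuned empirically and may differ.
	limiter := rate.NewLimiter(rate.Limit(100), 20)
	ctx := context.Background()

	for i := 0; i < 5; i++ {
		// Wait blocks until the limiter allows another send, smoothing
		// out client bursts before frames are forwarded to the implant.
		if err := limiter.Wait(ctx); err != nil {
			return // context cancelled or deadline exceeded
		}
		fmt.Println("send frame", i)
	}
}
```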

Reproducing the Original Issue

Using an mTLS or HTTP/HTTPS session, start an in-band SOCKS proxy and attempt the following scenarios:

RDP through SOCKS to remote host (xfreerdp)
proxychains -f sliver.conf xfreerdp /u:localuser /p:password /v:10.3.30.3 /cert:ignore
Behavior: RDP window starts but is stuck at a black screen.

RDP through SOCKS to remote host (rdesktop)
proxychains -f sliver.conf rdesktop -u localuser -p password 10.3.30.3
Behavior: RDP session loads successfully but freezes after around 5 seconds.

Evil-WinRM through SOCKS to remote host
proxychains4 -f sliver.conf evil-winrm -i 10.3.30.3 -u localuser -p password --ssl
Behavior: Session never fully establishes
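
For completeness, the `sliver.conf` referenced above is just a proxychains configuration pointing at the in-band SOCKS listener. A minimal example, assuming the proxy is listening on 127.0.0.1:1081 (adjust to whatever `socks5 start` actually reports):

```
strict_chain
proxy_dns
[ProxyList]
socks5 127.0.0.1 1081
```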


Testing

The testing environment I used for debugging consists of 3 hosts, with firewall rules preventing HOST1 from directly reaching HOST3 to simulate a situation where proxying is required.

HOST1 (Client/Server) - HOST2 (Implant) - HOST3 (Target)

In addition to the existing unit tests, I did some additional testing for the following scenarios:
✅ Tested with HTTP/HTTPS Transports
✅ Tested with mTLS Transports
✅ Tested under heavy/light load using protocols with previously reported issues
✅ Tested under heavy load with request bursting

✅ Tested (2) simultaneous RDP sessions using 2 different Sliver Sessions
✅ Tested (2) simultaneous RDP sessions across two SOCKS instances using 1 Sliver Session
✅ Tested (2) simultaneous RDP sessions across a single SOCKS instance using 1 Sliver Session
✖️* Tested under heavy sustained load over 1hr streaming videos continuously

*: Very high-bandwidth RDP sessions may still break. This typically happens when streaming video or anything else that bursts window updates very quickly over long sustained intervals. The behavior seems to differ by RDP client, and I've only seen it when attempting to stream video content continuously for over an hour. I suspect the implant cannot keep up under this type of load. There are probably further performance improvements that could be made on the implant side, but I've refrained from making any implant modifications in this PR as I'm not sure what the appetite is for implant changes.


Checklist

✅ Contributions to core code must be GPLv3 (but not libraries)
✖️* If you'd like to work on a feature, please open a ticket and assign it to yourself
✅ Changes should be made in a new branch
✅ Commits must be signed for any PR to master
✅ Please provide meaningful commit messages
✅ Ensure code passes existing unit tests, or provide updated test(s)
✅ gofmt your code
✅ Any changes to vendor/ should be in a distinct commit
✅ Avoid use of CGO (limits cross-platform support)
✅ Avoid use of empty interfaces
✅ Never import anything from the server package in the client package.

*: Related to Issue #728; I was unable to assign myself.


I'd love to see this become available to other Sliver users, as the built-in SOCKS proxy is a very powerful tool. Let me know if there is any feedback or requested changes.
Thanks!

@navsec navsec requested a review from a team as a code owner October 31, 2024 18:50
@rkervella (Member) commented:

Thanks @navsec, I'll need to find some time to properly review that, but at a quick glance it looks good. I'll try to get this merged in the upcoming weeks.

@navsec (Author) commented Nov 7, 2024

After some further testing, I noticed some performance issues with my last commit when using tools that tend to create one-to-many connections (ffuf, nxc).

I've added an additional commit to this PR with the following changes:

  • Added additional logic to track tunnel state and periodically clean up idle connections; this resolves a goroutine leak that eventually causes high CPU utilization server-side (a sketch of the idea follows below)
  • Added much more exception handling and state management across the whole SOCKS lifecycle
  • Adjusted some of the static ticker values to be a little less aggressive, to save CPU cycles and balance performance
  • Adjusted the client rate limit to be a bit more conservative after lengthier testing with the tools mentioned above

This specific piece of code has been a bit tricky :)
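
For illustration, the idle-connection cleanup could look roughly like the following Go sketch. The types and names (`tunnelTable`, `sweep`, `touch`) are hypothetical, not the identifiers in this commit:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tunnelState tracks per-tunnel activity so a background sweeper can reap
// connections that have gone idle instead of leaking their goroutines.
type tunnelState struct {
	lastActive time.Time
	close      func() // tears down the tunnel and its goroutines
}

type tunnelTable struct {
	mu      sync.Mutex
	tunnels map[uint64]*tunnelState
}

// touch records activity for a tunnel; called on every read/write.
func (t *tunnelTable) touch(id uint64) {
	t.mu.Lock()
	if ts, ok := t.tunnels[id]; ok {
		ts.lastActive = time.Now()
	}
	t.mu.Unlock()
}

// sweep runs until stop is closed, reaping tunnels idle longer than maxIdle.
func (t *tunnelTable) sweep(maxIdle, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			t.mu.Lock()
			for id, ts := range t.tunnels {
				if time.Since(ts.lastActive) > maxIdle {
					ts.close() // release the tunnel's goroutines
					delete(t.tunnels, id)
				}
			}
			t.mu.Unlock()
		}
	}
}

func main() {
	tbl := &tunnelTable{tunnels: make(map[uint64]*tunnelState)}
	tbl.tunnels[1] = &tunnelState{
		lastActive: time.Now(),
		close:      func() { fmt.Println("tunnel 1 reaped") },
	}
	stop := make(chan struct{})
	go tbl.sweep(50*time.Millisecond, 25*time.Millisecond, stop)
	time.Sleep(150 * time.Millisecond) // let the sweeper reap the idle tunnel
	close(stop)
}
```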
