Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom agent strings #63

Merged
merged 17 commits into from
Jul 12, 2021
Merged

Conversation

0x00002a
Copy link
Contributor

This implements #37. There are a few issues that cropped up during implementation.

Main Changes

  • malloy::controller has had its m_cfg removed. This was due to collisions with derived versions and it was just a whole mess. I hacked around it rather while doing HTTP client: Automatically follow redirects #35, but I gave up when I realised that init is hiding the inherited version to take the right config. I can reverse it but I'd rather not :p
  • malloy::controller has had parts of its interface made protected since it no longer has m_cfg. If anything relied on the controller start or init it will be broken by this patch
  • server::controller::config and client::controller::config both have new ws_agent_string fields, which default to <beast version> malloy-<server|client>
  • The agent_string is threaded through the server and client side and set as the decorator on the websocket connection
  • client::controller::init now takes a client::controller::config, which will break code currently passing malloy::controller::config to it

The tests don't fail, but an error is logged in on_read in server::http::connection. I'm not really sure if it indicates a problem or not, as far as I can tell from testing the examples everything seems to be OK but I may have missed something.

Closes: #37

@0x00002a
Copy link
Contributor Author

That failure on VS is the out of heap space again. Only seems to happen on PRs for some reason

@Tectu
Copy link
Owner

Tectu commented Jul 12, 2021

I only took a quick glance over it so far.
Some questions/points:

  • The server also needs access to this string. It will be used for the HTTP Server header when the server responds to requests. While agent_string is not the best matching name for this I am not sure whether it's worth to have separate fields for those. Without thinking too much about it, my move would have been to add std::string agent_field (or whatever better matching name) to the malloy::controller so both the client and the server get an instance of that.
  • Why not passing std::string_view for the agent string to the connection instead?
  • The HTTP connection(s) also need the corresponding header fields to be filled. This used to be there, but then got mysteriously removed by some unknown entity ;p

@0x00002a
Copy link
Contributor Author

The server also needs access to this string. It will be used for the HTTP Server header when the server responds to requests.

Ah, I didn't realise that was also needed. I'll add that.

Why not passing std::string_view for the agent string to the connection instead?

I could, but I dislike storing string_view's as fields (unless designed to be constexpr) since its non-owning. A string is still very cheap and gives me peace of mind. But I'll change it if you want.

my move would have been to add std::string agent_field (or whatever better matching name) to the malloy::controller so both the client and the server get an instance of that.

Adding it to the top-level config is an issue because the defaults are different for client and server. Not sure how else that could be done.

@Tectu
Copy link
Owner

Tectu commented Jul 12, 2021

Ah, I didn't realise that was also needed. I'll add that.

If I remember correctly requests sent from client to server use the User-Agent field whereas responses sent from the server to the client use the Server field.

I could, but I dislike storing string_view's as fields (unless designed to be constexpr) since its non-owning. A string is still very cheap and gives me peace of mind. But I'll change it if you want.

I don't mind too much at the moment. There's plenty of other stuff that can be optimized. In this case I'd personally be a bit more "cautious" than with things like the controller as this will impact every single connection. In a scenario where a server might serve thousands of connections at once a view would definitely be preferred in my opinion. After all this can be documented well plus in this particular case a user of this library wouldn't interact with that directly anyway.

Adding it to the top-level config is an issue because the defaults are different for client and server. Not sure how else that could be done.

I might be misunderstanding but why not just initializing it in the config structure's constructor? Creating the controller (and corresponding config) is something that happens rarely (most likely in most applications even just once) so performance penalties shouldn't be the thing to worry about.

@0x00002a
Copy link
Contributor Author

0x00002a commented Jul 12, 2021

I might be misunderstanding but why not just initializing it in the config structure's constructor? [..]

Currently it does just that effectively, but since its not part of the top level config the user cannot set it there where having the same value for server and client may not make sense. It does mean there is effectively code duplication there, but only superficially since they are really independent and seperate options, they are just named the same right now.

I can't see a way to put it in the ctors of the derived without also potentially overwriting a users config or other issues. Since fields cannot be overriden.

I don't mind too much at the moment. There's plenty of other stuff that can be optimized. In this case I'd personally be a bit more "cautious" than with things like the controller as this will impact every single connection.

True, but then as you say there are much bigger fish to fry. We currently copy 3 shared pointers every time a new connection is created, I'm pretty sure thats a lot more expensive than copying a SSO string :p (although I haven't measured so I'm sorta talking out of my arse here).

@0x00002a
Copy link
Contributor Author

If I remember correctly requests sent from client to server use the User-Agent field whereas responses sent from the server to the client use the Server field.

Hmm, I've just done the server side and I'm working on the client but I was thinking, should we set this? The user has full control over it from their end and I've heard it can be a security issue outside of debug environments. For the websockets its different because we set the decorator internally, before the user has access to the connection, but the user can easily pass their requests and responses through some prep code before sending them off

@Tectu
Copy link
Owner

Tectu commented Jul 12, 2021

True, but then as you say there are much bigger fish to fry. We currently copy 3 shared pointers every time a new connection is created, I'm pretty sure thats a lot more expensive than copying a SSO string :p (although I haven't measured so I'm sorta talking out of my arse here).

The difference with the string is mainly memory related: Shared pointers are lightweight. Agents strings might easily be a few dozen characters and each active connection will have a copy.
Anyway, as agreed previously: Optimization can (and should) happen after the initial release.

Hmm, I've just done the server side and I'm working on the client but I was thinking, should we set this? The user has full control over it from their end and I've heard it can be a security issue outside of debug environments. For the websockets its different because we set the decorator internally, before the user has access to the connection, but the user can easily pass their requests and responses through some prep code before sending them off

As far as I know the security issues do not arise from the header value itself but the fact that information is being disclosed which makes looking for (and using) existing vulnerabilities easier.
Personally I think that malloy should provide this capability. The user can always set an empty value in the configuration or customize it if it's necessary on a per-connection basis as you mentioned.
Long term I am planning of introducing security policies to control certain aspects such as disclosure of information - including User-Agent and Server header values.

@0x00002a
Copy link
Contributor Author

The user can always set an empty value in the configuration or customize it if it's necessary on a per-connection basis as you mentioned.

Agreed, I'll add a check for if its currently set already since currently we overwrite any existing Server field with BOOST_BEAST_VERSION_STRING (and the user provided one in my working tree).

@Tectu
Copy link
Owner

Tectu commented Jul 12, 2021

The use of BOOST_BEAST_VERSION_STRING should disappear in my opinion. We'll use malloy-server and malloy-client respectively. I'll add the corresponding version handling once we're there to add malloy's version numbers to the strings (again, just as a default, the user has in my opinion full control over the situation).

@0x00002a
Copy link
Contributor Author

I've added the agent strings to Server and User-Agent for server and client respectively. ws_agent_string has been renamed to user_agent and agent_string for client and server respectively, not sure about those and/or if they should have the same name. I've also added a roundtrip test to verify the agent strings are actually getting set for http

@@ -16,6 +18,7 @@
#endif

#include <boost/asio/strand.hpp>
#include <boost/beast/version.hpp>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This include should not be necessary as we don't use the beast version number anymore?

@@ -66,11 +69,19 @@ namespace malloy::client
public:
struct config :
malloy::controller::config {

/**
* @brief Agent string used for websocket connections
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field value is also used for HTTP connections - not only WS connections, right?

@@ -22,6 +22,12 @@ namespace malloy::http
{
head.target(head.target().substr(resource.size()));
}

template<bool isReq, typename Fields>
auto has_field(const boost::beast::http::header<isReq, Fields>& head, malloy::http::field check) -> bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check field can be const (I think).

@@ -50,6 +52,11 @@ namespace malloy::server
* to the working directory.
*/
std::filesystem::path doc_root = ".";

/**
* @brief Agent string used for websocket connections
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field value is also used for HTTP connections - not only WS connections, right?

@Tectu
Copy link
Owner

Tectu commented Jul 12, 2021

Also CI build errors :p

@Tectu Tectu merged commit 1dab30b into Tectu:main Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set agent strings properly
2 participants