Support accessing Netbox through a Layer 7 proxy #376

Closed
jak3kaj opened this issue Apr 30, 2021 · 9 comments

@jak3kaj

jak3kaj commented Apr 30, 2021

We have a unique configuration where we have Netbox inside a private network and need to add access to the Netbox API from an external network. We configured a Layer 7 Proxy which has a different hostname than our Netbox instance and also prepends a directory to the url before the /api/ directory.

The external base url is in this form:

https://proxy.example.com/internal-netbox-instance/api/

We also need to continue supporting internal Netbox users and don't want to change the path.

Our internal base url looks like this:

https://internal-netbox.example.com/api/

I reviewed all of the possible Netbox server configurations, and none seemed able to support requests from both internal and external clients simultaneously. Using BASE_PATH would allow external requests to work, but requires modifying the internal clients' base URL because there is an additional directory in the external proxy's URL. There are many automations using the Netbox API and it would be better not to have to modify all of their configurations.

In order to support clients that could make requests to either url, enhancing pynetbox seems like the best place to address this. There is already support for proxy access from #323. This use case requires that change as well as support to modify the base of the url path (the part between the hostname and the /api/ directory).

There are also a few more places where the hostname is not updated to the base URL specified when the pynetbox API is instantiated, such as the "url", "next" and "previous" fields in the response data from Netbox.

@zachmoody
Contributor

You might need to go into a little more detail about why your setup isn't working. It's not immediately clear why it works fine through one, but not two.

@jak3kaj
Author

jak3kaj commented Apr 30, 2021

Sure, no problem. Let me show what happens with pynetbox 6.1.2:

import pynetbox

# nb_token is assumed to already hold a valid Netbox API token

# External Proxy URL
nb_url = 'https://proxy.example.com/internal-netbox-instance/'
nb = pynetbox.api(nb_url, token=nb_token)
site_list = nb.dcim.sites.filter(name='example-site', exclude='config_context')

devices = []

for site in site_list:  # Breaks here
    devices.extend(nb.dcim.devices.filter(site_id=site.id, exclude='config_context'))

When it iterates over the values in the generator returned by dcim.sites.filter, the _endpoint_from_url gets confused because the url that is passed to it is the url returned from Netbox, not the base_url specified in the API.

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    for site in site_list:
  File "/home/user/.local/lib/python3.6/site-packages/pynetbox/core/response.py", line 117, in __next__
    next(self.response), self.endpoint.api, self.endpoint
  File "/home/user/.local/lib/python3.6/site-packages/pynetbox/core/response.py", line 232, in __init__
    if values and "url" in values
  File "/home/user/.local/lib/python3.6/site-packages/pynetbox/core/response.py", line 349, in _endpoint_from_url
    app, name = split_url_path[2:4]
ValueError: not enough values to unpack (expected 2, got 1)

This is what is passed into _endpoint_from_url:

_endpoint_from_url - url arg: https://internal_site.example.com/api/dcim/sites/1091/ self.api.base_url: https://proxy.example.com/internal-netbox-instance/api

This is happening during the Record object initialization, where self.endpoint is defined. The values dict holds the values returned by Netbox.

_endpoint_from_url breaks because it expects the path structure that Netbox returns to be the same as the path structure defined in the base_url passed during api() instantiation.

Netbox returns urls with this base url:

https://internal_site.example.com/api/

The base_url that pynetbox made the request from:

https://proxy.example.com/internal-netbox-instance/api/

To summarize, pynetbox can't support two different path structures at the same time. The external HTTP load balancer needs the /internal-netbox-instance/ directory for the Netbox service (because the load balancer is used for many other services as well). We don't want to disrupt the internal Netbox clients by adding /internal-netbox-instance/ to the BASE_PATH, but we need to allow access to Netbox through this HTTP proxy.
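The failure mode described above can be reproduced with a small stand-alone sketch. This is not pynetbox's actual code: `endpoint_from_url` is a hypothetical simplification that assumes the library strips the base URL's extra path prefix by length and then reads the app and endpoint name from fixed positions in the remaining path.

```python
from urllib.parse import urlsplit


def endpoint_from_url(url, base_url):
    """Hypothetical simplification of the lookup, not pynetbox's real code."""
    url_path = urlsplit(url).path
    base_parts = urlsplit(base_url).path.split("/")
    if len(base_parts) > 2:
        # The base URL has directories before /api; assume the returned
        # URL carries the same prefix and cut it off by length.
        extra_path = "/".join(base_parts[:-1])
        url_path = url_path[len(extra_path):]
    # Expect ['', 'api', '<app>', '<endpoint>', ...] at this point.
    app, name = url_path.split("/")[2:4]
    return app, name


returned_url = "https://internal-netbox.example.com/api/dcim/sites/1091/"

# Same path structure on both sides: the lookup succeeds.
print(endpoint_from_url(returned_url, "https://internal-netbox.example.com/api"))

# Proxy base URL with an extra directory: the prefix that gets cut off
# was never in the returned URL, so the unpacking fails.
try:
    endpoint_from_url(
        returned_url, "https://proxy.example.com/internal-netbox-instance/api"
    )
except ValueError as exc:
    print("ValueError:", exc)
```

The point of the sketch is only that any logic keyed to the base URL's path layout will misparse URLs that Netbox generated with a different layout.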

@zachmoody
Contributor

Ok, yeah, it dawned on me a little while after replying that was probably it, but I still don't feel like pynetbox is the right place to implement a fix. It seems more like a reverse-proxy problem to me. Like, you should be able to set X-Forwarded-For based on a condition if the traffic's coming from the external proxy. 🤷‍♂️ Haven't actually tried that before though, but it might be something to look into if you haven't already.

@markkuleinio
Contributor

If I had such a need, I would create two different virtual hosts in the reverse proxy, rewrite the URL path in one of them as needed, and make sure that NetBox configuration accepts both hostnames (or rewrite the hostname as well in the proxy). Wouldn't that be a working solution?

@jak3kaj
Author

jak3kaj commented May 3, 2021

Thanks for the suggestions @markkuleinio and @zachmoody. There's actually no reverse proxy at the moment, but I could set one up like you described. This could address part of the problem, but not all of it.

I can have the internal clients continue to access Netbox directly, like they currently do. I could point the external forward-proxy to the virtual host in a new reverse proxy in order to remove the internal-netbox-instance from the path like this (nginx):

# The trailing slash on the location keeps the stripped path from starting with "//"
location /internal-netbox-instance/ {
  proxy_pass https://internal-netbox.example.com/;
}

That will get the requests to the right place, but there is still a problem, because Netbox returns absolute URLs inside JSON data in the API responses.

For example, here is a response with nested data from dcim/device:

{
    "id": 87589,
    "url": "https://internal-netbox.example.com/api/dcim/devices/87589/",
    "name": "esxhost01.example.com",
    "display_name": "esxhost01.example.com",
    "device_type": {
        "id": 984,
        "url": "https://internal-netbox.example.com/api/dcim/device-types/984/",
        "manufacturer": {
            "id": 158,
            "url": "https://internal-netbox.example.com/api/dcim/manufacturers/158/",
            "name": "Pivot3",
            "slug": "pivot3"
        },
        "model": "VSTAC",
        "slug": "vstac",
        "display_name": "Pivot3 VSTAC"
    }
}

There are 3 different URLs here: this device's URL, the device type's URL, and the manufacturer's URL. Since this data is embedded in the body of the HTTP response, the proxy will not modify it.
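To illustrate what a client-side workaround for those embedded URLs could look like, here is a minimal sketch that walks a decoded JSON response and swaps the internal base URL for the external one. `rewrite_urls` is a hypothetical helper, not part of pynetbox, and the sample data is trimmed from the response above.

```python
def rewrite_urls(data, internal_base, external_base):
    """Recursively replace internal_base at the start of any string value."""
    if isinstance(data, dict):
        return {
            key: rewrite_urls(value, internal_base, external_base)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [rewrite_urls(item, internal_base, external_base) for item in data]
    if isinstance(data, str) and data.startswith(internal_base):
        return external_base + data[len(internal_base):]
    return data


device = {
    "id": 87589,
    "url": "https://internal-netbox.example.com/api/dcim/devices/87589/",
    "name": "esxhost01.example.com",
    "device_type": {
        "id": 984,
        "url": "https://internal-netbox.example.com/api/dcim/device-types/984/",
        "manufacturer": {
            "id": 158,
            "url": "https://internal-netbox.example.com/api/dcim/manufacturers/158/",
        },
    },
}

rewritten = rewrite_urls(
    device,
    "https://internal-netbox.example.com/api/",
    "https://proxy.example.com/internal-netbox-instance/api/",
)
print(rewritten["device_type"]["manufacturer"]["url"])
```

Matching on the full internal base (including /api/) rather than on the hostname alone leaves unrelated strings such as the device name untouched.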

The other place there is still a problem is when using pagination. The 'next' and 'previous' values in the json response also contain absolute URLs with the internal site name.

{
    "count": 2,
    "next": "http://internal-netbox.example.com/api/dcim/devices/?limit=2&offset=6",
    "previous": "http://internal-netbox.example.com/api/dcim/devices/?limit=2&offset=4",
    "results": [
        {"id": 87865, "url": "https://internal-netbox.example.com/api/dcim/devices/87865/", "name": "test87865"},
        {"id": 87866, "url": "https://internal-netbox.example.com/api/dcim/devices/87866/", "name": "test87866"}
    ]
}

I reviewed the pynetbox code again. Assuming the base path can be rewritten by the reverse proxy, it may work: _endpoint_from_url should change the returned host entry to be the base_url for all 'url' values. I'll try to set up the reverse proxy and confirm this.

Looking at the pynetbox/core/query.py module, I don't think pagination will work. Since req['next'] is passed as the url_override option directly to _make_call, there's no logic to change the hostname to be the base_url there.

If _endpoint_from_url works with the reverse proxy config and pagination does not, I'll scale back my PR so that _make_call or query.get behave similarly to _endpoint_from_url. It will be simpler to swap out the host on a URL than to rewrite the base of the path.
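As a rough idea of the "swap out the host" approach for pagination links, the following sketch replaces only the scheme and host of a 'next' URL with those of the base URL, using the standard library. `swap_host` is a hypothetical helper, not an existing pynetbox function, and it assumes the path is already correct once the request passes through the proxy.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, base_url):
    """Return url with its scheme and host replaced by those of base_url."""
    base = urlsplit(base_url)
    parts = urlsplit(url)
    return urlunsplit(
        (base.scheme, base.netloc, parts.path, parts.query, parts.fragment)
    )


next_url = "http://internal-netbox.example.com/api/dcim/devices/?limit=2&offset=6"
print(swap_host(next_url, "https://proxy.example.com/api"))
```

The query string (limit and offset) is carried over unchanged, so the rewritten link still points at the same page of results.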

@zachmoody
Contributor

zachmoody commented May 4, 2021

I think this is where the X-Forwarded-For* header comes in. I want to say that's what NetBox uses to generate the URL on those fields.

edit: That may not be the right header. At any rate, you should be able to find the right one(s) that let you get Django to generate the right URL.

@markkuleinio
Contributor

> I think this is where the X-Forwarded-For* header comes in. I want to say that's what NetBox uses to generate the URL on those fields.
>
> edit: That may not be the right header. At any rate, you should be able to find the right one(s) that let you get Django to generate the right URL.

https://github.com/netbox-community/netbox/blob/develop/contrib/nginx.conf: X-Forwarded-Host

https://stackoverflow.com/questions/32542282/how-do-i-rewrite-urls-in-a-proxy-response-in-nginx gives some idea about the implementation as well.
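For readers unfamiliar with how X-Forwarded-Host affects generated URLs, here is a toy model of the idea. It is not Django or NetBox code: `build_absolute_url` is a hypothetical function that mimics the behaviour of Django's USE_X_FORWARDED_HOST setting, where the forwarded host is preferred over the Host header when the setting is enabled.

```python
def build_absolute_url(headers, path, scheme="https", use_x_forwarded_host=True):
    """Toy model: pick the host from the headers and prepend it to the path."""
    if use_x_forwarded_host and "X-Forwarded-Host" in headers:
        host = headers["X-Forwarded-Host"]
    else:
        host = headers["Host"]
    return f"{scheme}://{host}{path}"


# Request that reached the backend directly.
direct = {"Host": "internal-netbox.example.com"}
print(build_absolute_url(direct, "/api/dcim/devices/87589/"))

# Request that came through a proxy which set X-Forwarded-Host.
proxied = {
    "Host": "internal-netbox.example.com",
    "X-Forwarded-Host": "proxy.example.com",
}
print(build_absolute_url(proxied, "/api/dcim/devices/87589/"))
```

In this model the header changes only the hostname; the path is whatever the backend generates, so an extra directory in front of /api/ would still need separate handling.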

@jak3kaj
Author

jak3kaj commented May 6, 2021

I reviewed our current configuration. Our external proxy, which is acting like a forward-proxy, is rewriting the URL in a similar way to this (note that it is not actually an NGINX proxy, but I'll keep using NGINX config as proxy pseudocode):

# The trailing slash on the location keeps the stripped path from starting with "//"
location /internal-netbox-instance/ {
  proxy_pass https://internal-netbox.example.com/;
}

This is how the requests are making it to Netbox with the correct path.

There is actually a reverse proxy as well: an AWS ALB, which is configured using an Ingress controller in an EKS Kubernetes cluster. One security feature (limitation?) of ALB is that it does not forward HTTP headers, and does not support adding an X-Forwarded-Host header either.

In order to implement a reverse proxy in a way that can resolve this problem, we would need to replace the ALB with NGINX or a similarly capable device. The device would need to accept and forward the X-Forwarded-Host header as well as rewrite the returned content from Netbox similarly to these solutions:

NGINX sub_filter: https://serverfault.com/questions/713148/modify-html-pages-returned-by-nginx-reverse-proxy

location / {
  sub_filter_once off;
  # Netbox's REST API responses are served as application/json
  sub_filter_types application/json;
  sub_filter "/api/" "/internal-netbox-instance/api/";
}

There's a similar solution using an Apache module: https://stackoverflow.com/questions/15902992/replacing-json-encoded-url-with-mod-proxy-html

Note that these solutions are not particularly efficient and may not be acceptable depending on the traffic load. The NGINX sub_filter module is not enabled by default, so it will require a bit more effort to get NGINX going.

I'm also not certain if the forward-proxy is sending an X-Forwarded-Host header, since I don't have visibility into the input traffic to the ALB reverse-proxy. There's a possibility the forward-proxy also doesn't support this header. Due to the complexity of swapping out ALB for NGINX, and needing to use a recompiled NGINX (to get the sub_filter support) on top of that, I'm not planning on changing the infrastructure at this time.

I appreciate you taking the time to help me with this issue. I think it is helpful for others to document the ideal solution.

I understand if you don't want to accept the PR, since most people don't have this problem and there are ways to deal with this in infrastructure, as we've detailed here. I wrote the PR so the user needs to explicitly opt in to the rewriting behavior when instantiating the API, hoping that would make the PR more likely to be accepted. Ironically, that made the code more complex and required changes in many parts of the code. If you decide not to accept the PR, I'll remove all of the conditions and the new external_proxy argument and maintain a fork of pynetbox with just the modified URL rewriting code enabled by default, for my own purposes.

@zachmoody
Contributor

That's fair, it is a tough problem to work around, but I appreciate the sentiment and effort to upstream your changes. If this were a more common problem or didn't touch parts of the code that have given us a bunch of problems in the past, we probably could've accepted them, but with both of those factors working against it, I feel like holding off, for now, would probably be best.


3 participants