
Security of InstrumentationKey in Client Side Code #281

Open
mclark1129 opened this issue Jul 27, 2016 · 27 comments

@mclark1129

Is there a more secure alternative to providing the InstrumentationKey directly in client-side code? I'd be concerned about a malicious user being able to easily add garbage telemetry data to my logs.

@KamilSzostak
Contributor

Currently this is the only supported way. We are reviewing other options, but we don't have any timeline yet.

The problem is that even if the instrumentation key is not present in the client-side code, the JS SDK will still need to get it somehow to send telemetry. There are multiple ways to send it more securely, e.g. the key can be encoded, the backend can issue a one-time token, etc.
If the JS SDK can send genuine telemetry events, there is nothing stopping a malicious user from opening a browser console and sending garbage data.

You may consider:

@ramijarrar

ramijarrar commented Oct 10, 2016

@KamilSzostak This would be much less concerning if the instrumentation key weren't shared with secure environments (e.g. server-side telemetry) that are not limited in this way.

@SergeyKanzhelev

@ramijarrar it is not a requirement to share the same instrumentation key. You can collect telemetry into two different applications and then combine information on a custom dashboard in Azure.

@KLuuKer

KLuuKer commented Mar 31, 2018

You could also use separate client- and server-side keys with no further checking; then at least nobody can put garbage into the server side of the telemetry.

On top of that, you could also add a check that makes sure a matching server-generated correlation ID exists for each client telemetry event, to stop most of the garbage that is an actual issue for people using Google Analytics (and may leak over to App Insights in the future).

For more about that, just bing it: https://www.bing.com/search?q=google+analytics+bad+data
TL;DR: spammers/hackers inject malicious referrals or other URLs to lure site admins and try to hack them.

@activebiz

Any timeline on when this will be done?

@jpiyali
Contributor

jpiyali commented May 29, 2018

Cross-app queries have strong support in the portal today, so splitting the client side out to a separate ikey is the recommendation here. Have you considered dropping data on the client when an authenticated user ID is not set up?

@activebiz

activebiz commented May 30, 2018

Thanks @jpiyali. Any examples/pointers on how to achieve this (i.e. splitting the client side out to a separate ikey)?

@jpiyali
Contributor

jpiyali commented May 30, 2018

What kind of app do you have? You can create a second Application Insights resource in the Azure portal, update the instrumentation key used in the JavaScript, and redeploy your client binaries.

@activebiz

It's a JavaScript React web app. The problem is that the instrumentation key is visible in the browser source.

@infogulch

infogulch commented Sep 2, 2019

Can we get an API for Application Insights that can validate a time-limited and tagged temporary instrumentation key/signature? Like SAS, but for AI instead of blob storage: an Instrumentation Signature instead of an Instrumentation Key. If the Instrumentation Signature (IS) is expired, logging with it fails. All logs emitted by that temporary key are associated with the tag value from the signature. When the client wants to log, it requests an IS from a backend service with access to the "real" key, which generates a signature for the client and returns it, a la SAS.

This would have many benefits:

  • No publicly available instrumentation keys
  • Each client gets a unique 'key', and all logs that they emit are associated with their key, allowing very easy filtering out of logs from abusive clients post-collection
  • Handing out signatures can also be controlled based on your own custom logic like ip-blacklists etc
  • Custom tags allow you to reliably add metadata to all logs emitted by a specific client, and it would be impossible for them to emit things they shouldn't, like uploading spoofed server logs
  • Tags could go even farther: For example, you could have anon/user/admin tags that you add to the IS according to a logged-in user's role when they request the signature, which would give you even more metadata facets to filter on
  • Automatically detect when someone is fiddling with stuff. For example, if they upload two log entries with two completely different user agents with the same key, then you know someone was poking around in the frontend code or mitm-ing network traffic and extracted the key.
  • No centralized list of keys or central coordination needed to create them, just like SAS

All Application Insights would need to do is validate the generated signature and add metadata from valid signatures to the log entries.

The only cost to the client is that it has to request the instrumentation key/sig from a backend service first. (I'm doing that today anyway.)

The backend service can be trivially simple: just tag and sign requests with the fixed instrumentation keys. It could be a 4-line function.
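To make the shape of this proposal concrete, here is a minimal sketch of what the signing and validating halves might look like (everything here is hypothetical -- Application Insights has no such API today):

```typescript
import { createHmac } from "node:crypto";

// Hypothetical sketch of the proposed Instrumentation Signature (IS) scheme.
// The "real" instrumentation key never leaves the server; clients receive a
// short-lived, tagged signature instead, a la SAS.
const SIGNING_SECRET = process.env.IS_SIGNING_SECRET ?? "dev-only-secret";

interface InstrumentationSignature {
  tag: string;     // e.g. "anon" | "user" | "admin"
  expires: number; // unix epoch seconds
  sig: string;     // HMAC over tag + expiry
}

// The backend service: "just tag and sign requests" -- a few lines, as noted above.
function issueSignature(tag: string, ttlSeconds = 900): InstrumentationSignature {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const sig = createHmac("sha256", SIGNING_SECRET)
    .update(`${tag}:${expires}`)
    .digest("base64url");
  return { tag, expires, sig };
}

// The ingestion side (Application Insights, in this proposal): reject expired
// or forged signatures, and stamp accepted logs with the signature's tag.
function validateSignature(is: InstrumentationSignature): boolean {
  if (is.expires < Math.floor(Date.now() / 1000)) return false; // expired
  const expected = createHmac("sha256", SIGNING_SECRET)
    .update(`${is.tag}:${is.expires}`)
    .digest("base64url");
  return expected === is.sig; // a timing-safe compare would be better in practice
}
```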

@ddobric

ddobric commented Mar 5, 2020

@infogulch SAS or something similar seems like an interesting approach, also for other services. However, there should be a mechanism responsible for generating the "SAS" and "injecting" it into the JS code, semantically similar to managed identity.

@infogulch

Yes, you could replace "the client makes a request to the backend service to get the token" with "server-side rendering of index.html (etc.) embeds a unique key in the page that the client JS finds and uses". This capability can just be an addition to your server-side code that uses the same SAS-like mechanism.
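A hedged sketch of that variant, using Express for illustration (the route, global name, and token format are all made up):

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

// Hypothetical sketch: server-side rendering embeds a per-page token directly
// in the HTML, so the client JS never makes a separate token request.
const app = express();

app.get("/", (_req, res) => {
  const token = randomUUID(); // stand-in for a signed, short-lived key
  res.send(`<!doctype html>
<html><head>
  <script>window.__AI_SESSION_TOKEN__ = ${JSON.stringify(token)};</script>
  <script src="/app.js" defer></script>
</head><body>...</body></html>`);
});

app.listen(3000);
```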

@NoelAbrahams

NoelAbrahams commented Apr 28, 2020

Just thinking aloud here: What if AI collection requests can be restricted to specific domains?

E.g., requests using key abc-xyx can only be made from the configured domain example.com.

Together with using a separate server-side key, this would also secure the user-agent (client-side) key.

@rikkiprince

@NoelAbrahams Presumably an attacker would just make calls from within the console while breakpointing into code on your website?

What could be done on the Application Insights server side to ensure a call is made from the configured domain? CORS only works inside a browser, so an attacker could use Postman or cURL. The IP address of the caller is the user's machine, not that of the hosting domain, because it's code running in the browser.

@infogulch

@rikkiprince that's what would make a SAS-like token the best possible solution. The key point is that each client gets a unique short-duration instrumentation key; that's all you can control, since you can't control what happens to the keys once they are off your servers. However, since each unique key leaves its mark on all of its data, if a key gets stolen and starts providing bogus data that you can detect, you can completely and accurately filter out that abusive client key without affecting data from other customers.

You can't stop someone from abusing a key, but a SAS-token-like instrumentation key authorization system would give you a sharp tool to manage what data makes it into reports based on your logs and metrics.

@NoelAbrahams

@rikkiprince, I'm just going on the basis of what's already implemented for OAuth authentication. For example Facebook authentication. Calls to their API only seem to work from domains that you configure on their developer console. Same with Google Maps.

I haven't tried hacking into this by setting a breakpoint and altering the payload, but clearly that will limit hacking to nutcase scenarios rather than industrial scale hacking, which is what we're worried about.

The problem I have with the SAS solution is the added maintenance — having to deal with expiry. I may have misunderstood that.

@PrajwalKhante

If a website's client-side code discloses the InstrumentationKey, is there a risk?

@MSNev
Collaborator

MSNev commented Sep 21, 2020

It's not really a security risk as the iKey itself doesn't provide any permissions of any kind.

The only real "risk" is that if a bad actor grabs and reuses your iKey, which would cause your ingested events (data in Azure Monitor) to contain a mixture of you real user and this "extra" data. Depending on the amount of these events and how this data is constructed would determine what the real level of the risk is for your subscription (this would directly affect your application).

Details: As there is no way to request a per-session key that would provide a CSRF-style equivalent of validating the iKey (this would require your service to fetch it and provide it to every page instead of the iKey), there is currently no easy solution beyond some type of whitelisting (based on the referrer or origin headers) to block the event(s) from getting ingested, and even those headers could be spoofed by a determined actor constructing raw TCP packets.
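For illustration, a rough sketch of the header-based whitelisting mentioned above, written as proxy middleware (the route and allow-list are hypothetical; as noted, these headers can be spoofed, so this is a speed bump rather than a guarantee):

```typescript
import express from "express";

// Drop events whose Origin/Referer is not on an allow-list before ingestion.
const ALLOWED_ORIGINS = ["https://example.com"]; // hypothetical

const app = express();

app.post("/track", (req, res, next) => {
  const origin = (req.headers.origin ?? req.headers.referer ?? "") as string;
  if (!ALLOWED_ORIGINS.some((allowed) => origin.startsWith(allowed))) {
    res.status(403).end(); // blocked before it ever reaches ingestion
    return;
  }
  next(); // hand off to the handler that forwards to the real endpoint
});
```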

@pontusdacke

> It's not really a security risk as the iKey itself doesn't provide any permissions of any kind.

Write permissions are permissions. A malicious user could inject data that the owner is not allowed to store.

@sander1095

Hey everyone :)
I fixed this issue in multiple projects a while ago by using a reverse proxy approach. I wrote a blog post about it so you can learn how to hide the instrumentation key on the client side!

https://stenbrinke.nl/blog/hide-app-insights-key-from-the-browser/

I hope it helps :)
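For readers who want the gist inline, a minimal sketch of the reverse-proxy idea (illustrative only, not the blog post's actual code; it assumes the browser SDK's endpointUrl config points at /track on your backend, targets the classic dc.services.visualstudio.com ingestion endpoint, and ignores compression and beacon edge cases):

```typescript
import express from "express";

// The real iKey lives only on the server; the browser SDK is configured with a
// placeholder key and endpointUrl: "https://yourapp.example/track" (hypothetical).
const INGESTION_ENDPOINT = "https://dc.services.visualstudio.com/v2/track";
const PLACEHOLDER_IKEY = "00000000-0000-0000-0000-000000000000";
const REAL_IKEY = process.env.APPINSIGHTS_IKEY ?? "";

const app = express();
app.use(express.text({ type: "*/*", limit: "1mb" })); // SDK posts NDJSON envelopes

app.post("/track", async (req, res) => {
  // Swap the placeholder for the real key before forwarding upstream.
  const body = (req.body as string).replaceAll(PLACEHOLDER_IKEY, REAL_IKEY);
  const upstream = await fetch(INGESTION_ENDPOINT, {
    method: "POST",
    headers: { "content-type": "application/x-json-stream" },
    body,
  });
  res.status(upstream.status).end();
});

app.listen(3000);
```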

@MSNev
Collaborator

MSNev commented Dec 8, 2023

If you want to "hide" your instrumentation key from any direct usage then yes, the above approach is a possible solution.

However, I want to provide some additional considerations that you need to handle (I only briefly reviewed the above, so forgive me if I missed them in the doc), so that you are addressing the issue and not just "deferring" it from "direct" ingestion (to the Microsoft endpoint) to ingestion via the reverse proxy (which will now also accumulate additional operating costs, etc.).

Simplistically, when using this approach (to avoid replay-style attacks), you MUST also "validate" every request to your proxy endpoint to ensure that a bad actor is not just "pretending" to be you; otherwise you are still exposed to the same issue (albeit that they don't directly have your instrumentation key / connection string).

Currently, as a generic SDK with a generic receiving endpoint, these approaches are not directly possible in the SDK itself. The common approaches, and the potential issues I see with them, are outlined below.

Common approaches

  • Basic CSRF: Ensure that the request to your proxy is coming from a "recently" constructed instance of your validated server
  • Use an application / page "security token" (really just another style of CSRF) that you pass on and validate
  • Use a combination of the above as well as additional "headers" like the "Referrer" or UserAgent. Maybe also encode the client's IP address.
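Before the potential issues below, a rough sketch of the first two bullets (hypothetical names; one possible shape of a page "security token", not an official mechanism):

```typescript
import { randomBytes } from "node:crypto";

// A per-page "security token" (really just another style of CSRF, as above).
const liveTokens = new Map<string, number>(); // token -> expiry (epoch ms)

// Called while rendering a page; embed the token in the HTML (e.g. a meta tag).
function issuePageToken(ttlMs = 15 * 60 * 1000): string {
  const token = randomBytes(16).toString("hex");
  liveTokens.set(token, Date.now() + ttlMs);
  return token;
}

// Called by the proxy on every tracking request, before forwarding anything.
function validatePageToken(token: string | undefined): boolean {
  const expiry = token ? liveTokens.get(token) : undefined;
  return expiry !== undefined && expiry >= Date.now();
}
```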

Potential Issues:

The problem with this group / type of approach is that it's still possible for a bad actor to "pretend" to be you by simply loading your page in a "managed" browser instance (so they can still get access to HTTP-only cookies etc.) and then just spit out loads of "events", either by sending XMLHttpRequests via the managed browser or, more simply, by creating their own HTTP requests (really easy to do in C#). So the creation of "junk" events is still possible (even without your ikey).

Generally, anything coming from a client should inherently be considered untrusted, as even the "standard" headers can be spoofed by simply creating HTTP requests and directly passing anything you like in any header you like.

To solve the above you can take it a step further and require

  • Full authentication (Microsoft Entra ID-based authentication), where the server then cracks open the signed-in user's credentials and validates that they are in your "allowed" collection of users.
  • Other "user" authentication; it doesn't "have" to be Microsoft Entra ID-based authentication, for example it could be your own application-specific, Google, Facebook, etc. (really, any "additional" step that requires the end user to validate their presence first).

Potential Issue:
This generally will work in almost all cases; the main caveat is that you can now no longer collect telemetry for your "unauthenticated" pages (before the user is signed in).

So, depending on your specific use-case(s) you may (or may not) want / need to also address these possible issues (generally you would do a combination of them).

Generally, the only way to avoid "junk" data (even with these approaches) is that all received requests must be validated on your proxy for both "security" (the above) and content.

Now, if you're in a closed-loop environment (like a corporate network -- with NO external inbound (Internet) connections to your proxy), using an IP range to fully lock down all requests to your proxy should work.

@sander1095

Hi @MSNev ! Thanks for your extensive reply.

One could host this reverse proxy on their BFF or back-end, which would not really increase costs.

It's also quite easy to integrate CSRF and authentication by using .NET's authentication system, or even simply using your existing setup for these things. I do touch upon them in my post but don't implement them there to keep things short. However, I have implemented exactly these things in the projects where I used this reverse proxy approach.

I do agree that there is still a remaining problem. An existing user could still use JavaScript on the front-end site and forge telemetry to send to the tracking reverse proxy, which is then sent to App Insights. However, this would take a lot more effort, and I think it is not really worth the effort for most attackers.

I'm mostly already happy with hiding my iKey and ensuring that my tracking endpoint can only be called from the front-end. Hiding the instrumentation key suffices for me, personally.

I haven't really thought of/found a way to separate bogus telemetry crafted by a user from real telemetry. This can be very tricky, and I think it is not very useful for 99% of applications because it's not a real threat.

Someone on Reddit posted that this problem is solved using OTel and OTel collectors, but I don't have enough experience with those to say something useful ;).

@MSNev
Collaborator

MSNev commented Dec 9, 2023

Yeh,

I wasn't trying to discount your work; it's a really good summary of how to go about implementing such a solution. I was just trying to help frame that if anyone wants to consider this approach, they should also make sure to address some of the underlying general issues, so they don't just move the problem.

I'm extremely familiar with the different security techniques for securing requests from web clients, having not only implemented some of them here at Microsoft but also having worked in general web validation for decades.

@tjementum

Could I suggest using different Instrumentation Keys for different telemetry types? E.g., configure one Instrumentation Key to only allow BrowserTimings and PageViews (used by the Browser SDK), and another to only allow Requests, Traces, Dependencies, CustomEvents, etc. (used by the server). This would enhance security without needing two Application Insights resources and losing the ability to cross-join data for insights (e.g., in the Performance View).

Another request to Microsoft: Enable the regeneration of Instrumentation Keys. If one were to get a lot of malicious data and wanted to make a more sophisticated solution (like a reverse proxy), then it is paramount that the key can be invalidated.

I also read @sander1095's article and have the same concerns as @MSNev; it just moves the problem one step further. Bypassing CSRF/CORS protections is trivial to do from a console app, which you would use anyway if you were to spam someone's data. BUT... one could extend this so only BrowserTimings and PageViews are allowed to be tracked this way. One could even allow only anonymous PageViews and BrowserTimings from pages that allow unauthenticated traffic (like login page, 404 page, error page, etc.).

@MSNev
Collaborator

MSNev commented Jan 8, 2024

> Could I suggest using different Instrumentation Keys for different telemetry types?

This is a little problematic, as the instrumentation key is used to route the telemetry to a specific account, so having multiple ikeys would result in multiple Azure Monitor accounts (today). This would require a lot of work by our server team (not just the SDK team 😢). There is another issue for a feature request (which requires server changes), and I'm trying to find the correct location for end users to be able to request (and up-vote) individual requests. If/when this is identified I'll add a link here.

Side Note: If you do want to have multiple Azure Monitor accounts, the SDK actually supports this today. By default, when an event is "received" it is tagged with the instrumentation key from the connectionString / instrumentationKey configuration, but you can actually specify this yourself (before calling track), or you could use a telemetry initializer to "change" this value. I don't believe that's what you "really" want, though -- I assume you still want the single visualization endpoint.
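A small sketch of that side note using the JS SDK's addTelemetryInitializer (the second key and the routing condition are placeholders):

```typescript
import { ApplicationInsights, ITelemetryItem } from "@microsoft/applicationinsights-web";

// Placeholder for the ikey of a second Azure Monitor resource.
const SECOND_IKEY = "<second-resource-instrumentation-key>";

const appInsights = new ApplicationInsights({
  config: { connectionString: "<your-connection-string>" },
});
appInsights.loadAppInsights();

// Telemetry initializer: re-route page views to the second resource by
// overriding the envelope's iKey before it is sent.
appInsights.addTelemetryInitializer((item: ITelemetryItem) => {
  if (item.baseType === "PageviewData") {
    item.iKey = SECOND_IKEY;
  }
});
```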

> Another request to Microsoft: Enable the regeneration of Instrumentation Keys. If one were to get a lot of malicious data and wanted to make a more sophisticated solution (like a reverse proxy), then it is paramount that the key can be invalidated.

Regenerating the instrumentation key would really only be a short-term solution (if implemented by Microsoft), as the newly regenerated key would be prone to the same issue as the original one. You could, however, do something like this in conjunction with your own reverse proxy, where the "instrumentation key" is really some sort of session iKey; but again, because it's exposed, this would just be a minor speed bump for any determined bad actor.

@tjementum

Thanks for the feedback.

> Regenerating the instrumentation key would really only be a short-term solution (if implemented by Microsoft), as the newly regenerated key would be prone to the same issue as the original one. You could, however, do something like this in conjunction with your own reverse proxy, where the "instrumentation key" is really some sort of session iKey; but again, because it's exposed, this would just be a minor speed bump for any determined bad actor.

But that is exactly the point. If you get malicious data, a solution could be to create a reverse proxy endpoint, but then you need to regenerate the instrumentation key.

I'm working on an open-source solution for building enterprise-grade SaaS products (using .NET, React, Azure, Bicep, and Application Insights/OpenTelemetry). With inspiration from @sander1095 (thanks!), we have created a reverse proxy endpoint for Application Insights. It only allows tracking of PageViews, BrowserTimings, Exceptions, and Metrics, so we can guarantee that Requests, Dependencies, CustomEvents, etc., cannot be spammed. We have yet to implement authentication in our solution, but the plan is server-side filtering so that tracking requires authentication by default, effectively allowing spam only on a known list of anonymous endpoints (e.g., Login, SignUp, ResetPassword).

Find the solution here: PlatformPlatform tracking. As of writing, this is still a work in progress; e.g., we are trying to ensure the reported location for browser tracking comes from the client and not the server.
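For anyone wanting to replicate that type filtering without digging through the repo, a hedged sketch (assuming the classic newline-delimited envelope shape, where each line carries data.baseType):

```typescript
// Allow-list of envelope baseTypes; everything else is dropped by the proxy.
const ALLOWED_TYPES = new Set([
  "PageviewData",
  "PageviewPerformanceData", // browser timings
  "ExceptionData",
  "MetricData",
]);

// Filter a newline-delimited ingestion payload down to the allowed types.
function filterEnvelopes(ndjsonBody: string): string {
  return ndjsonBody
    .split("\n")
    .filter((line) => {
      if (!line.trim()) return false;
      try {
        const envelope = JSON.parse(line);
        return ALLOWED_TYPES.has(envelope?.data?.baseType);
      } catch {
        return false; // drop malformed lines outright
      }
    })
    .join("\n");
}
```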

@MSNev
Collaborator

MSNev commented Jan 9, 2024

> But that is exactly the point. If you get malicious data, a solution could be to create a reverse proxy endpoint, but then you need to regenerate the instrumentation key.

Understood. So if you start without a reverse proxy and then want to move behind one, right now the only option would be to create a new AM resource and stop using the old one (maybe setting up 100% throttling, so everything gets dropped), and then only use the new one behind the proxy so that you can "hide" the "real" one being used... rather painful for the point of the flip.

I guess, though, if you start out using the RP, then it could always just hide the "real" iKey. As above, if/when I get hold of the location to submit general server requests (for upvoting etc.), I'll provide it.
