-
Notifications
You must be signed in to change notification settings - Fork 18
Conversation
|
||
public void Start(CancellationToken cancellationToken) | ||
{ | ||
Task.Run(async () => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Make this async, call RefreshAsync
once before returning to ensure cache is primed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
_logger.LogInformation("Refreshed killswitches"); | ||
} | ||
|
||
public bool IsActive(string name) => _active.Contains(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Throw if not started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public Task<KillswitchTestServer> GetTestServerAsync() => _testServer.Value; | ||
|
||
[Fact] | ||
public async Task ReturnsTrueForActiveKillswitches() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Test malformed responses, that StartAsync
refreshes in background once, that IsActive
throws if not started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
return; | ||
} | ||
|
||
Console.WriteLine($"Unexpected error code: {ex.ErrorCode}. Exception: {ex.ToString()}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ex.ToString()
-> ex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
<Reference Include="System.Data" /> | ||
<Reference Include="System.Net.Http" /> | ||
<Reference Include="System.Xml" /> | ||
<Reference Include="xunit.abstractions, Version=2.0.0.0, Culture=neutral, PublicKeyToken=8d05b1bb7a6fdb6c, processorArchitecture=MSIL"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get rid of packages.config references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
try | ||
{ | ||
await RefreshAsync(); | ||
await Task.Delay(_config.RefreshInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the intended RefreshInterval? How often will IsActive be called?
If RefreshInterval is small and IsActive is not frequent this approach is inefficient. A different approach is to refresh when IsActive is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming IsActive
will be called very frequently, specifically for revalidation job.. this could get out of hand. A decent RefreshInterval
should work fine. It could be tweaked as per the calling job via config. Or may be add another config to determine when to refresh? Either via interval or with IsActive
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each job can have its own refresh cadence. I'm thinking 5 minutes for the revalidate job.
The idea is to make IsActive
as cheap as possible so that it can be called anywhere including critical and hot paths. Moving the network request into IsActive
would limit where we can call IsActive
.
_active = new HashSet<string>(); | ||
} | ||
|
||
public void Start(CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not thread safe. Start() can be called multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this thread-safe? As far as I can tell, it is. Reference assignment is atomic, and that's the only thing that the background thread interacts with.
Calling Start
multiple times would cause the cache to be refreshed at a faster cadence. I can add a check to Start
to ensure it's called just once, but I personally prefer simpler code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Start multiple times would cause the cache to be refreshed at a faster cadence. I can add a check to Start to ensure it's called just once, but I personally prefer simpler code.
This is the reason. The code won't work as expected if Start() is called multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check to ensure Start
is called just once.
await RefreshAsync(); | ||
await Task.Delay(_config.RefreshInterval); | ||
} | ||
catch (Exception e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are swallowing the exception and hiding issues. The correct behavior will be to add retries to protect against network issues, but throw on deserialization issues and other unrecoverable exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runs on a background thread, throwing would cause the background thread to end.
That being said, I agree that we should have some "hard fail" behavior when we repeatedly fail to reload the killswitches. I'll make IsActive
throw if the cache's staleness reaches some threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made IsActive
throw if the cache reaches the configured staleness.
cancellationToken); | ||
} | ||
|
||
public async Task RefreshAsync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why public if not part of the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be useful for downstream consumers. Its not on the interface as not all killswitch services that support push model don't need a concept of refresh.
It also enables easy testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to @shishirx34, I will add RefreshAsync
to the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
private HashSet<string> _active; | ||
|
||
public HttpKillswitchService( | ||
HttpClient httpClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using HttpClient you assume the config blob is public, but it's an internal service config and should be private. Use IStorage interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this but decided against it. With blob storage, we'd need to add storage credentials to all jobs, which makes it harder to on-board a new service to killswitches. I don't think there's a need for killswitches to be private and there's precedent for public blobs: search auxiliary data, cursors, etc.. Really, the fact that we will use blob storage on the backend is an implementation detail.
Lastly, an HTTP implementation can support for a push model through long-polling. This requires a backend that supports long polling.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have a SAS URL to a non-public blob with expiration 100 years in the future, can't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agr Yup, that could work.
/// <summary> | ||
/// A killswitch service that queries an HTTP service periodically for the list of activate killswitches. | ||
/// </summary> | ||
public class HttpKillswitchService : IKillswitchService |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be two roles in the killswitch world: reader and writer. If you intend to separate them into two interfaces you should suffix this interface with Reader. You could also combine the two into a single KillSwitchService, however, since there's going to be only one writer (the gallery) I don't think it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. What do you think of HttpKillswitchClient
so that it matches our naming in logging and service bus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
I like the idea of having a system wide killswitch service, however this creates feature duplication in the gallery with the "feature flag" service. Both turn abilities on/off. We should discuss this on a team level. |
namespace NuGet.Services.KillSwitch | ||
{ | ||
/// <summary> | ||
/// A killswitch service that queries an HTTP service periodically for the list of activate killswitches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: active killswitches
try | ||
{ | ||
await RefreshAsync(); | ||
await Task.Delay(_config.RefreshInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming IsActive
will be called very frequently, specifically for revalidation job.. this could get out of hand. A decent RefreshInterval
should work fine. It could be tweaked as per the calling job via config. Or may be add another config to determine when to refresh? Either via interval or with IsActive
call?
62fe9bc
to
e533094
Compare
e533094
to
4c22eb9
Compare
@skofman1 Sure. Let's talk about it after the next standup? |
We would like to implement this using storage APIs. Once services use the same APIs, we can look into a universal killswitch system |
Killswitches allow you to turn off services' features remotely. This will be used to turn off the revalidation job if it causes pressure on the ingestion pipeline. The API is simple:
Like the
ITelemetryClient
, services are expected to wrap theIKillswitchClient
with higher-level abstractions. For example:Killswitch Endpoint
Killswitches work by polling an endpoint that returns a JSON list of activated killswitches:
This endpoint will be a JSON file hosted on Azure Blob Storage. An admin panel will be created on the Gallery to modify this file.
Design Decisions
Client uses HTTP endpoint instead of Azure Blob Storage
The fact that we use blob storage on the backend is an implementation detail. With blob storage, we'd need to add storage credentials to all services, which would increase the cost to on-board. As for privacy concerns, we could make the killswitch blob private but generate a URI with read-only access for services to consume.
Lastly, an HTTP implementation can support a push model through long-polling. As far as I know, this cannot be done using Azure Blob Storage.
Background Refresh
In this implementation,
IsActive
does not refresh the killswitch cache. Instead, a background task refreshes the cache periodically (every 5 minutes by default). This lets consumers of the killswitch client callIsActive
as frequently as they'd like, including in hot paths.Compared to Gallery's
ContentObjectService
The
HttpKillswitchClient
is very similar to theContentObjectService
:The differences are:
ContentObjectService
gives you high-level objects, allowing you to build complex feature flag logic. TheHttpKillswitchClient
can only determine whether a features should be on or offContentObjectService
requires code changes in both the Gallery as well as the service that owns the featureContentObjectService
uses APIs specific to ASP.NET for background refreshes, and cannot be used by jobs (see this)ContentObjectService
fails silently if the latest settings cannot be fetched/parsedContentObjectService
depends on Azure Blob StoragePart of https://github.com/NuGet/Engineering/issues/1440