-
Notifications
You must be signed in to change notification settings - Fork 21
Conversation
private const int _defaultEventEndDelayMinutes = 10; | ||
private const int _defaultEventVisibilityPeriod = 10; | ||
|
||
private static void AddConfiguration(IServiceCollection serviceCollection, IDictionary<string, string> jobArgsDictionary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddConfiguration [](start = 28, length = 16)
Any chance we could use json files like in the validation jobs? There's quite a lot of arguments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with how the JSON configuration works, but I can check it out.
src/StatusAggregator/Program.cs
Outdated
public static void Main(string[] args) | ||
{ | ||
var job = new Job(); | ||
JobRunner.Run(job, args).Wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait [](start = 37, length = 4)
Wait()
should be GetAwaiter().GetResult()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this for our other jobs as well. Filed NuGet/NuGetGallery#6224
? "" | ||
: $" and CreateDate gt datetime'{cursor.ToString("o")}'"; | ||
|
||
return $"$filter=OwningTeamId eq '{_incidentApiTeamId}'{cursorPart}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{cursorPart} [](start = 67, length = 12)
This might read a little bit better if you flip this around. if cursorPart != DateTime.MinValue, append to the query string the CreateDate bit. Feel free to ignore :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private string GetRecentIncidentsQuery(DateTime cursor)
{
var query = $"$filter = OwningTeamId eq '{_incidentApiTeamId}'";
if (cursor != DateTime.MinValue)
{
query += $" and CreateDate gt datetime'{cursor.ToString("o")}'";
}
return query;
}
Looks much better, good catch!
{ | ||
parsedIncident = null; | ||
var title = incident.Title; | ||
var match = Regex.Match(title, _regExPattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match [](start = 34, length = 5)
I forget all the requirements around Regex, but, I know there's a timeout one. I would double check what's necessary to run regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at other uses of Regex in our code, it appears the requirement is that we add a timeout. Will add a timeout of 5 seconds.
{ | ||
public static class ServiceProviderExtensions | ||
{ | ||
public static T GetService<T>(this IServiceProvider provider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetService [](start = 24, length = 10)
This already exists: GetRequiredService<T>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed and replaced the uses of this new extension
src/StatusAggregator/Job.cs
Outdated
public override Task Run() | ||
{ | ||
return _serviceProvider | ||
.GetService<StatusAggregator>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetService [](start = 17, length = 10)
You can use GetRequiredService<StatusAggregator>
here.
src/StatusAggregator/Cursor.cs
Outdated
ITableWrapper table, | ||
ILogger<Cursor> logger) | ||
{ | ||
_table = table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table [](start = 21, length = 5)
I would add null checks to all constructors. From my experience, it makes catching null reference exceptions easier :)
src/StatusAggregator/Cursor.cs
Outdated
using StatusAggregator.Table; | ||
|
||
|
||
namespace StatusAggregator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatusAggregator [](start = 10, length = 16)
There's two empty lines above this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
src/StatusAggregator/Cursor.cs
Outdated
{ | ||
value = DateTime.MinValue; | ||
_logger.LogInformation("Could not fetch cursor."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not super clear from this log that the cursor will be set to the beginning of time. Maybe log the value here? Feel free to ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add that to the log
var shouldDeactivate = !incidentsLinkedToEventQuery | ||
.Where(i => i.IsActive || i.MitigationTime > cursor - _eventEndDelay) | ||
.ToList() | ||
.Any(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ToList()
forces all incidents to be inspected. This LINQ query will do less work if you remove the ToList()
and do just Any()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IQueryable
returned by _table.CreateQuery
doesn't support Any
, so I have to do ToList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using (_logger.Scope("Creating message for start of event.")) | ||
{ | ||
if (cursor > eventEntity.StartTime + _eventStartMessageDelay) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be easier to read if you invert the "error" flow:
if (!condition1)
{
_logger.LogInformation("Reason 1 for not doing anything");
return;
}
if (!condition2)
{
_logger.LogInformation("Reason 2 for not doing anything");
return;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
|
||
var componentNames = path.Split(Constants.ComponentPathDivider); | ||
var name = string.Join(" ", componentNames.Skip(1).Reverse()); | ||
var boldedPart = $"<b>{name} {boldedPartInnerString} {_componentStatusNames[eventEntity.AffectedComponentStatus].ToLowerInvariant()}.</b>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_componentStatusNames[eventEntity.AffectedComponentStatus].ToLowerInvariant() [](start = 66, length = 77)
Why not eventEntity.AffectedComponentStatus.ToString().ToLowerInvariant()
?
The current code will break if we ever modify ComponentStatus
to have values that aren't sequential (ex: Up = 0
, Down = 1
, Degraded = 3
).
} | ||
} | ||
|
||
private bool TryGetContentsForEventStartForEvent(EventEntity eventEntity, out string contents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TryGetContentsForEventStartForEvent [](start = 21, length = 35)
This code is hard to understand. Can we simplify it?
Some ideas:
- Get rid of variables like
_youMayEncounterIssues
and just place the text directly into the message. It's fine to have duplication. - Instead of passing parameters like
boldedPartInnerString
andsuffix
, pass a message template. Something like<b>{ComponentName} is {ComponentStatus}</b>. {EventMessage}.
and<b>{ComponentName} is no longer {ComponentStatus}</b>. {EventMessage}. Thank you for your patience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
But this makes the package publishing degraded message less informative, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review the comments and decide if what makes sense to update.
@@ -126,6 +127,13 @@ public static bool TryGetBoolArgument(IDictionary<string, string> jobArgsDiction | |||
return null; | |||
} | |||
|
|||
public static T TryGetEnumArgument<T>(IDictionary<string, string> jobArgsDictionary, string argName, T defaultValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it initially, must have removed it. Great catch!
|
||
namespace NuGet.Jobs.Extensions | ||
{ | ||
public static class LoggerExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments. Why the current login is not enough? Why this method should be used over other methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BeginScope
on its own doesn't log a message when entering and leaving scope. This method does.
Will add comments.
|
||
namespace StatusAggregator | ||
{ | ||
public static class ComponentFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments. What is a Component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NuGet/ServerCommon#170 for more context. I agree that this class should have more comments however.
using (_logger.Scope("Fetching cursor.")) | ||
{ | ||
var cursor = await _table.Retrieve<CursorEntity>( | ||
CursorEntity.DefaultPartitionKey, CursorEntity.DefaultRowKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to get a null cursor back? Should it throw instead of setting to minvalue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no cursor, it likely means the table is empty and this is the first time the job has been run. So it should start at the beginning of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment to make this behavior clearer.
using (_logger.Scope("Updating cursor to {Cursor}.", value)) | ||
{ | ||
var cursorEntity = new CursorEntity(value); | ||
return _table.InsertOrReplaceAsync(cursorEntity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this table thread safe? Does it need to be thread safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments to this either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's Azure table storage, which is thread safe.
|
||
var parsedIncidents = incidents | ||
.SelectMany(i => _aggregateIncidentParser.ParseIncident(i)) | ||
.ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not include orderby here and not do the ToList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
} | ||
} | ||
|
||
private string GetRecentIncidentsQuery(DateTime cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is private method. Can it be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that this can be tested as a part of tests for FetchNewIncidents
because all this method does is create an argument for an _incidentApiClient
call that can be mocked.
|
||
private static string[] _componentStatusNames = Enum.GetNames(typeof(ComponentStatus)); | ||
private bool TryGetContentsForEventHelper( | ||
EventEntity eventEntity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method testable? Will it be better to have it public for easier testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there would be any benefit unless we abstracted this out to a different class. We still need to test that the other methods that call this method do what this method does.
string regExPattern, | ||
ILogger<IncidentParser> logger) | ||
{ | ||
_regExPattern = regExPattern; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
{ | ||
public class ValidationDurationIncidentParser : EnvironmentPrefixIncidentParser | ||
{ | ||
private const string SubtitleRegEx = "Too many packages are stuck in the \"Validating\" state!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we change the incident title ? Do we have any way to monitor when that will happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change there we will need to rebuild and redeploy. Would it make sense to put it in a config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We shouldn't be changing our incident titles. They should be descriptive enough on their own. If they aren't descriptive enough, we should change them to be more descriptive so that there isn't any desire to change them. Additionally, the job only needs to parse the title on ingestion, so if the name is changed after the job reads the incident (which should be within the first minute or two after it's created) it's fine.
- Jobs deployments are generally pretty quick, so if the incident titles are changed (which should be infrequently), I don't think it would be a big deal to redeploy it.
public bool TryParseIncident(Incident incident, out ParsedIncident parsedIncident) | ||
{ | ||
using (_logger.Scope("Parsing incident with parser {IncidentParserType} using {RegExPattern}", | ||
GetType(), _regExPattern)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this log line would be a better place to log the incident's title than line 71. Something like: Attempting to use parser {IncidentParserType} using {RegExPattern} to parse title {IncidentTitle}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. done
NuGet/NuGetGallery#6107
This job aggregates incidents and exports them in a blob that can be consumed by other services.
It maintains the state of every incident and groups incidents that affect the same component at the same time and reports them as an event.
Incidents are parsed through their titles. We discard incidents that we can't parse, and maintain a cursor that described what incidents have been processed.
The job will create up to two messages for each event. If all incidents attached to an event are mitigated within a short period of time, the job will never create any messages for an event. Otherwise, the job will create a message for the start of an event and the end of an event.
See https://nuget-dev-0-status.azurewebsites.net/ to see an example of how the data exported by this job will be used.