Add User Agent Parser for Azure Sign In Logs #23201
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
jenkins run tests
Thanks for your contribution @nicpenning! Please add a new entry in CHANGELOG.next.asciidoc under the Added section for filebeat indicating your change.
Okay! I will do my best to do that. I haven't done that before :)
Thanks! Let us know if you need any assistance with it.
Is adding an entry to CHANGELOG.next.asciidoc done in a separate pull request, or is there something simpler to do? I ask because I was going to open a PR against the master branch for that file, but it has the same PR questions.
I followed suit on how others did PRs for adding items to the changelog. :) Let me know how I must proceed. Thanks!
Hi @nicpenning, please just add the same entry you added in #23218 in this same PR. Thanks!
Is that better?
run tests
Do you have a log sample that includes the userAgent field? It would be good to have that added to x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log so we can test this change.
Andrew, yes, I can get this added tomorrow.
@andrewkroh At what level do you need the log? Is this after Filebeat has processed it and passed it off to Elastic? Or does it need to be the raw JSON that Filebeat brings in before the module manipulates it?
The tests look like they are using a raw sample from Azure. Here's the format of the existing ones:
$ head -1 x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log | jq .
{
  "Level": 4,
  "callerIpAddress": "81.171.241.231",
  "category": "SignInLogs",
  "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "durationMs": 0,
  "identity": "Test LTest",
  "location": "FR",
  "operationName": "Sign-in activity",
  "operationVersion": "1.0",
  "properties": {
    "appDisplayName": "Office 365",
    "appId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "clientAppUsed": "Browser",
    "conditionalAccessStatus": "notApplied",
    "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "createdDateTime": "2019-10-18T04:45:48.0729893-05:00",
    "deviceDetail": {
      "browser": "Chrome 77.0.3865",
      "deviceId": "",
      "operatingSystem": "MacOs"
    },
    "id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "ipAddress": "81.171.241.231",
    "isInteractive": false,
    "location": {
      "city": "Champs-Sur-Marne",
      "countryOrRegion": "FR",
      "geoCoordinates": {
        "latitude": 48.12341234,
        "longitude": 2.12341234
      },
      "state": "Seine-Et-Marne"
    },
    "originalRequestId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "processingTimeInMilliseconds": 239,
    "riskDetail": "none",
    "riskLevelAggregated": "none",
    "riskLevelDuringSignIn": "none",
    "riskState": "none",
    "servicePrincipalId": "",
    "status": {
      "errorCode": 50140,
      "failureReason": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in."
    },
    "tokenIssuerName": "",
    "tokenIssuerType": "AzureAD",
    "userDisplayName": "Test LTest",
    "userId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "userPrincipalName": "[email protected]"
  },
  "resourceId": "/tenants/8a4de8b5-095c-47d0-a96f-a75130c61d53/providers/Microsoft.aadiam",
  "resultDescription": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in.",
  "resultSignature": "None",
  "resultType": "50140",
  "tenantId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "time": "2019-10-18T09:45:48.0729893Z"
}
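For anyone without jq at hand, the same inspection can be sketched in Python. This assumes the fixture holds one JSON event per line, as the `head -1 | jq .` command above implies. `first_event` and `has_user_agent` are hypothetical helper names for illustration, not part of Beats.

```python
import json


def first_event(path):
    """Return the first line of a newline-delimited JSON log as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.loads(handle.readline())


def has_user_agent(event):
    """Report whether the raw event carries a properties.userAgent key."""
    return "userAgent" in event.get("properties", {})
```

Running `has_user_agent(first_event(...))` against the fixture is a quick way to confirm that a candidate sample would actually exercise the new parser.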
@andrewkroh I went ahead and added an example log. Hopefully that will be sufficient for the test. I had to desensitize it, and I did the best I could to leave the data intact.
@nicpenning So I was trying out the log sample in order to update the "golden file" that contains the expected output, and I think this is not a "SignInLogs" event because it's missing the [...]. Looking at that schema, I see the device type is already parsed, and I didn't see a userAgent field. Perhaps this is a log that would be processed by one of the other filesets in the Azure module?
One way to get a raw sample could be to modify the pipeline to populate the event.original field:
--- a/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
+++ b/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
@@ -3,6 +3,9 @@ processors:
 - set:
     field: event.ingested
     value: '{{_ingest.timestamp}}'
+- set:
+    field: event.original
+    copy_from: message
 - rename:
     field: azure
     target_field: azure-eventhub
Maybe Azure has a way to inspect the data in an Event Hub from the web UI. If so, I think the data looks like this example. The Filebeat input splits the [...]
@narph How would you recommend grabbing samples to use for testing with the log input?
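Once event.original is populated, turning indexed documents back into fixture lines could look roughly like the sketch below. It assumes each document is a `_source` dict whose `event.original` holds the raw JSON string that Filebeat received. `raw_samples` is a hypothetical helper, and redaction of sensitive values is left out.

```python
import json


def raw_samples(documents):
    """Yield event.original from each document as one compact JSON line."""
    for doc in documents:
        original = doc.get("event", {}).get("original")
        if original is None:
            # Documents indexed before the pipeline change have no raw copy.
            continue
        # Re-serialize so every sample occupies exactly one line in the fixture.
        yield json.dumps(json.loads(original), separators=(",", ":"))
```

Each yielded string can then be appended as a line to the test log after any sensitive values have been replaced.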
I will give that a try. That seems to be a great way to obtain the raw log that Filebeat received.
I test with the
@narph I agree. We did that immediately for a security use case. Any user signing in without a browser or on a non-standard operating system is suspect, such as signing in with Python Requests on a Linux OS. It makes for some great detection rules as well :D
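A minimal sketch of the kind of check such a detection rule might encode is shown below. The pattern list and the decision to treat an empty user agent as suspect are assumptions made for illustration; they are not taken from this PR or from any shipped rule.

```python
import re

# Hypothetical patterns for scripted (non-browser) clients; tune per environment.
SCRIPTED_CLIENT_PATTERNS = [
    re.compile(r"python-requests", re.IGNORECASE),
    re.compile(r"^curl/", re.IGNORECASE),
    re.compile(r"powershell", re.IGNORECASE),
]


def is_suspect_user_agent(user_agent):
    """Return True for an empty user agent or one matching a scripted-client pattern."""
    if not user_agent:
        # Assumption: a sign-in with no user agent at all is worth a look.
        return True
    return any(pattern.search(user_agent) for pattern in SCRIPTED_CLIENT_PATTERNS)
```

In practice the same logic would more likely live in a query or rule over user_agent.original than in application code.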
@andrewkroh I added the updated, redacted example log. The event.original pipeline suggestion worked. I used Logstash to enrich the event before it reached Elastic, so I didn't need to touch the Elastic ingest pipelines. It appears the category exists there. Please let me know if this will work for testing.
run tests
  type: keyword
  description: >
    Set of CA policies that apply to this sign-in, each as CA: policy name, and/or MFA: Per-user.
- name: applied_conditional_access_policies
Should this be changed to a nested field with explicit subfields? As of now, the query behavior isn't going to allow you to correlate multiple subfields against a single object in the array. See the note in https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html:
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested data type instead of the object data type.
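The behavior described in that note can be modeled in a few lines of Python. This is only a sketch of the matching semantics, not of how Elasticsearch is implemented, and the subfield names `display_name` and `result` as well as the sample policies are hypothetical.

```python
def flatten_object_array(objects):
    """Model an `object` array: one multi-valued list per subfield."""
    flattened = {}
    for obj in objects:
        for key, value in obj.items():
            flattened.setdefault(key, []).append(value)
    return flattened


def object_match(objects, **criteria):
    """Each criterion is checked against all values of that subfield."""
    flat = flatten_object_array(objects)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def nested_match(objects, **criteria):
    """All criteria must hold within a single object of the array."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


# Hypothetical sample data.
policies = [
    {"display_name": "Require MFA", "result": "success"},
    {"display_name": "Block legacy auth", "result": "failure"},
]
```

With this data, asking for `display_name="Require MFA"` together with `result="failure"` matches under the object model even though no single policy has both values, while the nested model rejects it.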
I changed it to nested. One thing to note is that authentication_processing_details and authentication_details are not new fields for the module. They were missing from fields.yml but were part of the ingest node pipeline. Any idea if the switch to nested will have other consequences, such as conflicts with old data? I can't think of any reason why it would.
So, as far as queries go, you'll have to query them with a nested query (see the note under https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#nested-accessing-documents). I'm not sure how that would change a query across, say, a wildcard of filebeat-* indices that included both a nested and a dynamically mapped set of fields from an older filebeat. If these fields were dynamically mapped previously, we'd probably need to make sure that we didn't break anything.
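For reference, a nested query body for this field might be built as in the sketch below. The full field path and the `display_name` and `result` subfields are assumptions made for illustration. `ignore_unmapped` is a documented option of the nested query that returns no hits instead of an error when the path is unmapped in an index; whether it helps with older indices where the field was dynamically mapped as a plain object would still need to be tested.

```python
def nested_policy_query(path, display_name, result, ignore_unmapped=False):
    """Build a search body that matches both terms within one nested object."""
    return {
        "query": {
            "nested": {
                "path": path,
                "ignore_unmapped": ignore_unmapped,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{path}.display_name": display_name}},
                            {"term": {f"{path}.result": result}},
                        ]
                    }
                },
            }
        }
    }


# Assumed field path; check fields.yml for the real one.
POLICIES_PATH = "azure.signinlogs.properties.applied_conditional_access_policies"
body = nested_policy_query(POLICIES_PATH, "Require MFA", "failure", ignore_unmapped=True)
```

The resulting dict can be sent as the body of a search request against the relevant indices.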
  type: keyword
  description: >
    The resource tenantId for B2B(business-to-business) scenarios.
- name: authentication_details
Same as above: should this be nested?
Done.
(Note: this is not a new field, just was missing from fields.yml)
  type: object
  description: >
    A record of each step of authentication undertaken in the sign-in.
- name: authentication_processing_details
Also similar to above. However, it looks like each entry just has two fields, key and value, which are both strings. I wonder if we can use flattened instead, which is like nested but without the restrictions on the total number of subdocuments. See https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html
Thoughts @andrewkroh? I believe we already use flattened in a couple of integrations?
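As a rough mental model of what a flattened mapping does with such key/value entries, the sketch below collects leaf values by dotted path and treats every leaf as a keyword string. This is a simplification under stated assumptions, not Elasticsearch's implementation, and the sample entries are made up.

```python
def flattened_leaves(value, prefix=""):
    """Collect leaf values by dotted path; arrays do not add a path segment."""
    leaves = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            for leaf_path, values in flattened_leaves(child, path).items():
                leaves.setdefault(leaf_path, []).extend(values)
    elif isinstance(value, list):
        for item in value:
            for leaf_path, values in flattened_leaves(item, prefix).items():
                leaves.setdefault(leaf_path, []).extend(values)
    else:
        # Leaf values are modeled as keyword strings.
        leaves.setdefault(prefix, []).append(str(value))
    return leaves


# Made-up sample entries.
details = [
    {"key": "Example Key A", "value": "True"},
    {"key": "Example Key B", "value": "False"},
]
leaves = flattened_leaves(details, "authentication_processing_details")
```

In this model all `key` strings end up under one path and all `value` strings under another, so term-style queries on either path keep working, which is worth bearing in mind when deciding whether the pairing between a key and its value needs to be queryable.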
Done.
(Note: this is not a new field, just was missing from fields.yml)
Unlike the nested comment above, I don't think that the query characteristics here will change; you can still use terms, match, etc. But I'm not super familiar with how switching types works when wildcarding across indices with different types (same idea as the filebeat-* query above), so we may want to test it out.
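One way to test it out is to compare how a field is typed across indices. The sketch below assumes a response shaped like the Elasticsearch field capabilities API (`GET filebeat-*/_field_caps?fields=...`), where each field maps type names to capability objects. The sample response, index names, and `conflicting_fields` helper are hypothetical.

```python
def conflicting_fields(field_caps):
    """Return {field: sorted type names} for fields reported with more than one type."""
    conflicts = {}
    for name, by_type in field_caps.get("fields", {}).items():
        if len(by_type) > 1:
            conflicts[name] = sorted(by_type)
    return conflicts


# Hypothetical response for two generations of filebeat indices.
sample_response = {
    "indices": ["filebeat-7.12.0", "filebeat-7.13.0"],
    "fields": {
        "azure.signinlogs.properties.authentication_processing_details": {
            "object": {"type": "object", "indices": ["filebeat-7.12.0"]},
            "flattened": {"type": "flattened", "indices": ["filebeat-7.13.0"]},
        },
        "user_agent.original": {
            "keyword": {"type": "keyword", "searchable": True, "aggregatable": True},
        },
    },
}
```

Any field returned by `conflicting_fields` is a candidate for a manual query test across the mixed indices.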
@andrewstucki I made some updates per your feedback.
run tests
This will be a nice addition for parsing the user agent in the Azure sign in logs. This would allow for some great detections on unusual user agents for sign in activity.
Update example log using event.original from filebeat initial message.
The new log sample exposed fields that were missing from the mapping. It also exposed some new fields listed at https://docs.microsoft.com/en-us/azure/azure-monitor/reference/tables/signinlogs that were not yet converted to snake_case. So I added rename processors to convert them to snake_case and added descriptions in fields.yml. Since user_agent is part of ECS I renamed the Azure userAgent field to user_agent.original.
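The renaming described above can be sketched in Python. The real change uses ingest pipeline rename processors; this is only an illustration of the naming convention. The regex-based `to_snake_case` is an assumption that handles simple camelCase names and may not split acronyms the way the pipeline does, and the dotted `user_agent.original` key stands in for the ECS field.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert a simple camelCase field name to snake_case."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def rename_fields(properties):
    """Rename Azure property keys, sending userAgent to the ECS user_agent.original."""
    renamed = {}
    for key, value in properties.items():
        if key == "userAgent":
            renamed["user_agent.original"] = value
        else:
            renamed[to_snake_case(key)] = value
    return renamed
```

For example, `appDisplayName` becomes `app_display_name`, while `userAgent` is special-cased rather than converted mechanically.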
run tests
Woohoo, this made it to 7.13! 🎉
What does this PR do?
Enriches the Azure Sign In logs to parse the user agent.
Why is it important?
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.