Add User Agent Parser for Azure Sign In Logs #23201

nicpenning · 2020-12-17T18:03:39Z

This will be a nice addition for parsing the user agent in the Azure sign in logs. This would allow for some great detections on unusual user agents for sign in activity.

What does this PR do?

Enriches the Azure sign-in logs by parsing the user agent.

Why is it important?

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

elasticmachine · 2020-12-17T18:08:00Z

💚 Build Succeeded

Build stats

Build Cause: Started by user Andrew Kroh
Start Time: 2021-03-17T18:51:02.370+0000
Duration: 50 min 30 sec
Commit: f1db44d

Test stats 🧪

Test	Results
Failed	0
Passed	13160
Skipped	2243
Total	15403

💚 Flaky test report

Tests succeeded.

elasticmachine · 2020-12-18T07:03:28Z

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

marc-gr · 2020-12-18T10:23:35Z

jenkins run tests

marc-gr · 2020-12-18T12:20:26Z

Thanks for your contribution @nicpenning ! Please add a new entry in CHANGELOG.next.asciidoc under the Added section for filebeat indicating your change.

nicpenning · 2020-12-18T13:28:13Z

Okay! I will do my best to do that. I haven't done that before :)

marc-gr · 2020-12-18T15:29:51Z

> Okay! I will do my best to do that. I haven't done that before :)

Thanks! Let us know if you need any assistance with it.

nicpenning · 2020-12-18T19:01:23Z

Is adding an entry to the CHANGELOG.next.asciidoc a Pull Request or is there something simpler to do? I ask because I was just going to do a PR on the master branch for that file but it has the same PR questions.

nicpenning · 2020-12-18T19:28:21Z

I followed suit on how others did PRs for adding items to the change log. :) Let me know how I must proceed. Thanks!

andresrc · 2020-12-18T21:45:48Z

Hi @nicpenning please just add the same entry you added in #23218 in this same PR.

Thanks!

nicpenning · 2020-12-18T23:19:06Z

Is that better?

andrewkroh · 2021-01-13T03:22:11Z

run tests

andrewkroh

Do you have a log sample that includes the userAgent field? It would be good to have that added to x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log so we can test this change.

nicpenning · 2021-01-14T01:24:57Z

Andrew, yes, I can get this added in tomorrow.

nicpenning · 2021-01-14T23:09:54Z

@andrewkroh At what level do you need the log? Is this after Filebeat has processed it and passed it off to Elastic? Or does it need to be the raw JSON that Filebeat brings in before the module manipulates it?

andrewkroh · 2021-01-14T23:41:08Z

The tests look like they are using a raw sample from azure. Here's the format of the existing ones

$ head -1 x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log | jq .
{
  "Level": 4,
  "callerIpAddress": "81.171.241.231",
  "category": "SignInLogs",
  "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "durationMs": 0,
  "identity": "Test LTest",
  "location": "FR",
  "operationName": "Sign-in activity",
  "operationVersion": "1.0",
  "properties": {
    "appDisplayName": "Office 365",
    "appId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "clientAppUsed": "Browser",
    "conditionalAccessStatus": "notApplied",
    "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "createdDateTime": "2019-10-18T04:45:48.0729893-05:00",
    "deviceDetail": {
      "browser": "Chrome 77.0.3865",
      "deviceId": "",
      "operatingSystem": "MacOs"
    },
    "id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "ipAddress": "81.171.241.231",
    "isInteractive": false,
    "location": {
      "city": "Champs-Sur-Marne",
      "countryOrRegion": "FR",
      "geoCoordinates": {
        "latitude": 48.12341234,
        "longitude": 2.12341234
      },
      "state": "Seine-Et-Marne"
    },
    "originalRequestId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "processingTimeInMilliseconds": 239,
    "riskDetail": "none",
    "riskLevelAggregated": "none",
    "riskLevelDuringSignIn": "none",
    "riskState": "none",
    "servicePrincipalId": "",
    "status": {
      "errorCode": 50140,
      "failureReason": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in."
    },
    "tokenIssuerName": "",
    "tokenIssuerType": "AzureAD",
    "userDisplayName": "Test LTest",
    "userId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "userPrincipalName": "[email protected]"
  },
  "resourceId": "/tenants/8a4de8b5-095c-47d0-a96f-a75130c61d53/providers/Microsoft.aadiam",
  "resultDescription": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in.",
  "resultSignature": "None",
  "resultType": "50140",
  "tenantId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "time": "2019-10-18T09:45:48.0729893Z"
}

nicpenning · 2021-01-15T01:41:20Z

@andrewkroh I went ahead and added an example log. Hopefully that will be sufficient for testing. I had to desensitize it, and I did the best I could to leave the data intact.

andrewkroh · 2021-01-24T23:09:50Z

@nicpenning So I was trying out the log sample in order to update the "golden file" that contains the expected output, and I think this is not a "SignInLogs" event because it's missing the category field. The pipeline for azure/signinlogs validates that the category field is SignInLogs since it's specified in this schema: https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/reference-azure-monitor-sign-ins-log-schema. When I ran the tests your event got dropped because of this check.

Looking at that schema, I see the device type is already parsed, and I didn't see a userAgent field. Perhaps this is a log that would be processed by one of the other filesets in the Azure module?
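
A quick pre-check along the following lines could catch a sample that the pipeline would drop before running the tests. This is only a sketch: check_sample is a hypothetical helper, and it assumes the raw event carries userAgent under properties, which the thread does not confirm.

```python
import json


def check_sample(line: str) -> dict:
    """Report whether a raw log line looks like a usable SignInLogs sample.

    Assumption: the user agent, when present, sits at properties.userAgent.
    """
    event = json.loads(line)
    props = event.get("properties") or {}
    return {
        # The signinlogs pipeline drops events whose category is not SignInLogs.
        "is_signin": event.get("category") == "SignInLogs",
        "has_user_agent": "userAgent" in props,
    }
```

Running it over each line of signinlogs.log would show which samples lack the category field or the user agent.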

andrewkroh · 2021-01-26T00:08:36Z

One way to get a raw sample could be to modify the pipeline to populate the event.original field. Then you can grab the raw string value from event.original, and it will be in the format that is used for the tests.

--- a/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
+++ b/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
@@ -3,6 +3,9 @@ processors:
 - set:
     field: event.ingested
     value: '{{_ingest.timestamp}}'
+- set:
+    field: event.original
+    copy_from: message
 - rename:
     field: azure
     target_field: azure-eventhub

Maybe Azure has a way to inspect the data in an Event Hub from the web UI. If so I think the data looks like this example. The Filebeat input splits the records array into individual events.

@narph How would you recommend grabbing samples to use for testing with the log input?
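
The record splitting mentioned above can be illustrated in a few lines of Python. This is a sketch under assumptions, not Filebeat's implementation: split_records is a hypothetical helper, and it assumes the Event Hub message body is a JSON object with a top-level records array.

```python
import json
from typing import List


def split_records(payload: str) -> List[str]:
    """Split an Event Hub style payload into one compact JSON line per record.

    Assumption: the payload looks like {"records": [ {...}, {...} ]}.
    """
    body = json.loads(payload)
    records = body.get("records", [])
    # One line per record matches the line-oriented layout of the test log.
    return [json.dumps(record, separators=(",", ":")) for record in records]
```

Each returned line could then be appended to the test log after redaction.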

nicpenning · 2021-01-26T01:50:26Z

I will give that a try. That seems to be a great way to obtain the raw log that filebeat received.

narph · 2021-01-26T11:05:33Z

> @narph How would you recommend grabbing samples to use for testing with the log input?

I test with the azure-eventhub input only and log the info to file or use debug level logging instead.
Also, can we add this type of information to the current signinlogs dashboards? It looks like it might be useful to users.

nicpenning · 2021-01-26T13:43:53Z

@narph I agree. We did that immediately for a security use case. Any user signing in without a browser or on a non-standard operating system is suspect, such as a sign-in with Python Requests on a Linux OS. Makes for some great detection rules as well :D
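
The detection idea can be sketched as a simple check on the raw user agent string. This is an illustrative assumption, not the rule used in the thread: is_suspicious_user_agent and the prefix list are hypothetical, and a real rule would more likely be built on the parsed user agent fields.

```python
# Hypothetical list of client prefixes that indicate a scripted sign-in.
SCRIPTED_PREFIXES = ("python-requests", "curl", "go-http-client")


def is_suspicious_user_agent(user_agent) -> bool:
    """Flag sign-ins with no user agent or with a known scripting client."""
    if not user_agent or not user_agent.strip():
        return True
    # str.startswith accepts a tuple and matches any of the prefixes.
    return user_agent.strip().lower().startswith(SCRIPTED_PREFIXES)
```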

nicpenning · 2021-01-26T14:58:34Z

@andrewkroh I added the redacted updated example log.

The event.original pipeline suggestion worked. I used Logstash to enrich before it made it to Elastic so I didn't need to mess with the Elastic Ingest Pipelines.

It appears the category exists there.

Please let me know if this will work for testing.

andrewkroh · 2021-01-26T18:06:32Z

run tests

andrewstucki · 2021-01-26T18:11:52Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: keyword
+          description: >
+            Set of CA policies that apply to this sign-in, each as CA: policy name, and/or MFA: Per-user.
+        - name: applied_conditional_access_policies


Should this be changed to a nested field with explicit subfields? As of now, the query behavior isn't going to allow you to correlate multiple subfields against a single object in the array. See note in https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html

> Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested data type instead of the object data type.

I changed it to nested. One thing to note is that authentication_processing_details and authentication_details are not new fields for the module. They were missing from fields.yml, but were part of the ingest node pipeline. Any idea if the switch to nested will have other consequences, such as conflicts with old data? I can't think of any reason why it would.

So, as far as queries go, you'll have to query them with a nested query (see the note under https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#nested-accessing-documents) -- I'm not sure how that would change doing a query across say a wildcard of filebeat-* indices that included both a nested and dynamically mapped set of fields from an older filebeat. If these were fields that were dynamically mapped previously, we'd probably need to make sure that we didn't break anything.
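
The difference between the two mappings can be modeled in plain Python. This is a simplified simulation, not Elasticsearch code: the helper names and the sample policy fields (display_name, result) are assumptions made for illustration.

```python
def flatten_objects(objects):
    """Model the object type: each subfield becomes one flat list of values."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def match_object(objects, **conditions):
    """Each condition is checked against the flattened lists independently."""
    flat = flatten_objects(objects)
    return all(value in flat.get(key, []) for key, value in conditions.items())


def match_nested(objects, **conditions):
    """Model the nested type: all conditions must hold within a single object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


# Hypothetical policy entries for illustration.
policies = [
    {"display_name": "Require MFA", "result": "success"},
    {"display_name": "Block legacy auth", "result": "failure"},
]
```

With this data, match_object(policies, display_name="Require MFA", result="failure") is true even though no single policy has both values, while match_nested with the same arguments is false. That cross-object match is the behavior the nested type avoids.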

andrewstucki · 2021-01-26T18:13:04Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: keyword
+          description: >
+            The resource tenantId for B2B(business-to-business) scenarios.
+        - name: authentication_details


same as above, should this be nested?

Done.

(Note: this is not a new field, just was missing from fields.yml)

andrewstucki · 2021-01-26T18:15:22Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: object
+          description: >
+            A record of each step of authentication undertaken in the sign-in.
+        - name: authentication_processing_details


Also similar to above. However, it looks like each entry just has two fields, key, and value, which are both strings. Wondering if maybe then we can use flattened, which is like nested, but without the total number of subdocument restrictions. See https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html

Thoughts @andrewkroh ? I believe we do use flattened in a couple of integrations already?

Done.

(Note: this is not a new field, just was missing from fields.yml)

Unlike the nested comment above, I don't think that the query characteristics here will change, you can still use terms, match, etc. -- but I'm not super familiar with how switching types works when wildcarding across indices with different types (same idea with the filebeat-* query above) -- we may want to test it out.

andrewkroh

@andrewstucki I made some updates per your feedback.

andrewkroh · 2021-01-26T21:44:10Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: keyword
+          description: >
+            Set of CA policies that apply to this sign-in, each as CA: policy name, and/or MFA: Per-user.
+        - name: applied_conditional_access_policies


I changed it to nested. One thing to note is that authentication_processing_details and authentication_details are not new fields for the module. They were missing from fields.yml, but were part of the ingest node pipeline. Any idea if the switch to nested will have other consequences, such as conflicts with old data? I can't think of any reason why it would.

andrewkroh · 2021-01-26T21:44:57Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: keyword
+          description: >
+            The resource tenantId for B2B(business-to-business) scenarios.
+        - name: authentication_details


Done.

(Note: this is not a new field, just was missing from fields.yml)

andrewkroh · 2021-01-26T21:45:24Z

x-pack/filebeat/module/azure/signinlogs/_meta/fields.yml

+          type: object
+          description: >
+            A record of each step of authentication undertaken in the sign-in.
+        - name: authentication_processing_details


Done.

(Note: this is not a new field, just was missing from fields.yml)

andrewkroh · 2021-01-26T21:46:04Z

run tests

Update example log using event.original from filebeat initial message.

The new log sample exposed fields that were missing from the mapping. It also exposed some new fields listed at https://docs.microsoft.com/en-us/azure/azure-monitor/reference/tables/signinlogs that were not yet converted to snake_case. So I added rename processors to convert them to snake_case and added descriptions in fields.yml. Since user_agent is part of ECS I renamed the Azure userAgent field to user_agent.original.
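
The naming convention behind those renames can be shown with a short Python sketch. It is only an illustration of camelCase to snake_case conversion: the PR itself uses rename processors in the ingest pipeline, and to_snake_case is a hypothetical helper. The ECS mapping of userAgent to user_agent.original is a deliberate choice that no generic conversion would produce.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    # Insert an underscore before a capitalized word that follows any character.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split any remaining lowercase-or-digit to uppercase boundary, then lowercase.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()
```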

andrewkroh · 2021-03-17T17:38:30Z

run tests

nicpenning · 2021-05-25T15:15:27Z

Woohoo, this made it to 7.13! 🎉

botelastic bot added the needs_team label Dec 17, 2020

andresrc added the Team:Security-External Integrations label Dec 18, 2020

botelastic bot removed the needs_team label Dec 18, 2020

nicpenning closed this Dec 18, 2020

nicpenning reopened this Dec 18, 2020

andresrc mentioned this pull request Dec 18, 2020

Add User Agent Parser for Azure Sign In Logs #23218

Closed

andrewkroh added the enhancement and Filebeat labels Jan 13, 2021

andrewkroh reviewed Jan 13, 2021

andrewstucki reviewed Jan 26, 2021

andrewkroh reviewed Jan 26, 2021

nicpenning and others added 8 commits March 17, 2021 13:22

Add User Agent Parser for Azure Sign In Logs

b720310

Update CHANGELOG.next.asciidoc

62573b5

Add example log with UA

4a7b5a8

Update signinlogs.log

7736cec

Update example log using event.original from filebeat initial message.

Use better mappings

b2c1750

Update field docs

e7fda7c

Update generated

f1db44d

andrewkroh requested a review from andrewstucki March 17, 2021 20:11

andrewstucki approved these changes Mar 18, 2021

andrewkroh merged commit a2e8969 into elastic:master Mar 18, 2021

andrewkroh added the needs_backport label Mar 18, 2021

nicpenning deleted the patch-1 branch May 25, 2021 15:15
