
Add User Agent Parser for Azure Sign In Logs #23201

Merged
merged 8 commits into elastic:master from nicpenning:patch-1 on Mar 18, 2021
Conversation

@nicpenning (Contributor) commented Dec 17, 2020

This will be a nice addition for parsing the user agent in the Azure sign-in logs. It would allow for some great detections of unusual user agents in sign-in activity.

What does this PR do?

Enriches the Azure sign-in logs by parsing the user agent.
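As a rough sketch (not the exact processors merged in this PR, and the source field name is an assumption), this kind of enrichment can be done with Elasticsearch's `rename` and `user_agent` ingest processors:

```yaml
# Hypothetical excerpt for x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml.
# Move the raw Azure userAgent string to the ECS field, then parse it.
- rename:
    field: azure.signinlogs.properties.user_agent   # assumed source field name
    target_field: user_agent.original
    ignore_missing: true
- user_agent:
    field: user_agent.original
    ignore_missing: true
```

The `user_agent` processor populates `user_agent.name`, `user_agent.version`, `user_agent.os.*`, and `user_agent.device.name`, which is what makes detections on unusual user agents straightforward.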

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 17, 2020
@elasticmachine (Collaborator) commented Dec 17, 2020

💚 Build Succeeded


Build stats

  • Build Cause: Started by user Andrew Kroh

  • Start Time: 2021-03-17T18:51:02.370+0000

  • Duration: 50 min 30 sec

  • Commit: f1db44d

Test stats 🧪

Test Results: 0 failed, 13160 passed, 2243 skipped, 15403 total

Trends 🧪

[Images: build-time and test trend charts]

💚 Flaky test report

Tests succeeded.


@elasticmachine (Collaborator) commented:

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 18, 2020
@marc-gr (Contributor) commented Dec 18, 2020

jenkins run tests

@marc-gr (Contributor) commented Dec 18, 2020

Thanks for your contribution, @nicpenning! Please add a new entry in CHANGELOG.next.asciidoc under the Added section for Filebeat describing your change.

@nicpenning nicpenning closed this Dec 18, 2020
@nicpenning nicpenning reopened this Dec 18, 2020
@nicpenning (Contributor, Author) commented:

Okay! I will do my best to do that. I haven't done that before :)

@marc-gr (Contributor) commented Dec 18, 2020

> Okay! I will do my best to do that. I haven't done that before :)

Thanks! Let us know if you need any assistance with it.

@nicpenning (Contributor, Author) commented:

Is adding an entry to CHANGELOG.next.asciidoc its own pull request, or is there something simpler to do? I ask because I was going to open a PR against the master branch for that file, but it presents the same PR questions.

@nicpenning (Contributor, Author) commented:

I followed suit with how others did PRs for adding items to the changelog. :) Let me know how I should proceed. Thanks!

@andresrc (Contributor) commented:

Hi @nicpenning, please just add the same entry you added in #23218 to this same PR.

Thanks!

@nicpenning (Contributor, Author) commented:

Is that better?

@andrewkroh (Member) commented:

run tests

@andrewkroh (Member) left a comment:

Do you have a log sample that includes the userAgent field? It would be good to have that added to x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log so we can test this change.

@nicpenning (Contributor, Author) commented:

Andrew, yes, I can get this added tomorrow.

@nicpenning (Contributor, Author) commented:

@andrewkroh At what level do you need the log? Is this after Filebeat has processed it and passed it off to Elastic? Or does it need to be the raw JSON Filebeat brings in, before the module manipulates it?

@andrewkroh (Member) commented:

The tests look like they are using a raw sample from Azure. Here's the format of the existing ones:

$ head -1 x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log | jq .
{
  "Level": 4,
  "callerIpAddress": "81.171.241.231",
  "category": "SignInLogs",
  "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "durationMs": 0,
  "identity": "Test LTest",
  "location": "FR",
  "operationName": "Sign-in activity",
  "operationVersion": "1.0",
  "properties": {
    "appDisplayName": "Office 365",
    "appId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "clientAppUsed": "Browser",
    "conditionalAccessStatus": "notApplied",
    "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "createdDateTime": "2019-10-18T04:45:48.0729893-05:00",
    "deviceDetail": {
      "browser": "Chrome 77.0.3865",
      "deviceId": "",
      "operatingSystem": "MacOs"
    },
    "id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "ipAddress": "81.171.241.231",
    "isInteractive": false,
    "location": {
      "city": "Champs-Sur-Marne",
      "countryOrRegion": "FR",
      "geoCoordinates": {
        "latitude": 48.12341234,
        "longitude": 2.12341234
      },
      "state": "Seine-Et-Marne"
    },
    "originalRequestId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "processingTimeInMilliseconds": 239,
    "riskDetail": "none",
    "riskLevelAggregated": "none",
    "riskLevelDuringSignIn": "none",
    "riskState": "none",
    "servicePrincipalId": "",
    "status": {
      "errorCode": 50140,
      "failureReason": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in."
    },
    "tokenIssuerName": "",
    "tokenIssuerType": "AzureAD",
    "userDisplayName": "Test LTest",
    "userId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
    "userPrincipalName": "[email protected]"
  },
  "resourceId": "/tenants/8a4de8b5-095c-47d0-a96f-a75130c61d53/providers/Microsoft.aadiam",
  "resultDescription": "This error occurred due to 'Keep me signed in' interrupt when the user was signing-in.",
  "resultSignature": "None",
  "resultType": "50140",
  "tenantId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "time": "2019-10-18T09:45:48.0729893Z"
}

@nicpenning (Contributor, Author) commented:

@andrewkroh I went ahead and added an example log. Hopefully that will be sufficient for testing. I had to desensitize it, and I did the best I could to leave the data intact.

@andrewkroh (Member) commented:

@nicpenning So I was trying out the log sample in order to update the "golden file" that contains the expected output, and I think this is not a "SignInLogs" event because it's missing the category field. The pipeline for azure/signinlogs validates that the category field is SignInLogs, since that's specified in this schema: https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/reference-azure-monitor-sign-ins-log-schema. When I ran the tests, your event got dropped because of this check.

Looking at that schema, I see the device type is already parsed, and I didn't see a userAgent field. Perhaps this is a log that would be processed by one of the other filesets in the Azure module?

@andrewkroh (Member) commented:

One way to get a raw sample could be to modify the pipeline to populate the event.original field. Then you can grab the raw string value from event.original, and it will be in the format used for tests.

--- a/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
+++ b/x-pack/filebeat/module/azure/signinlogs/ingest/pipeline.yml
@@ -3,6 +3,9 @@ processors:
 - set:
     field: event.ingested
     value: '{{_ingest.timestamp}}'
+- set:
+    field: event.original
+    copy_from: message
 - rename:
     field: azure
     target_field: azure-eventhub

Maybe Azure has a way to inspect the data in an Event Hub from the web UI. If so I think the data looks like this example. The Filebeat input splits the records array into individual events.

@narph How would you recommend grabbing samples to use for testing with the log input?

@nicpenning (Contributor, Author) commented:

I will give that a try. That seems like a great way to obtain the raw log that Filebeat received.

@narph (Contributor) commented Jan 26, 2021

> @narph How would you recommend grabbing samples to use for testing with the log input?

I test with the azure-eventhub input only, and log the info to a file or use debug-level logging instead.
Also, can we add this type of information to the current signinlogs dashboards? It looks like it might be useful to users.

@nicpenning (Contributor, Author) commented:

@narph I agree. We did that immediately for a security use case. Any user signing in without a browser or on a non-standard operating system is suspect, such as signing in with Python Requests on a Linux OS. Makes for some great detection rules as well :D
[attached image]

@nicpenning (Contributor, Author) commented:

@andrewkroh I added the redacted, updated example log.

The event.original pipeline suggestion worked. I used Logstash to enrich before it made it to Elastic, so I didn't need to mess with the Elasticsearch ingest pipelines.

It appears the category field exists there.

Please let me know if this will work for testing.

@andrewkroh (Member) commented:

run tests

type: keyword
description: >
Set of CA policies that apply to this sign-in, each as CA: policy name, and/or MFA: Per-user.
- name: applied_conditional_access_policies


Should this be changed to a nested field with explicit subfields? As of now, the query behavior isn't going to allow you to correlate multiple subfields against a single object in the array. See the note in https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html:

> Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested data type instead of the object data type.

Reply (Member):

I changed it to nested. One thing to note is that authentication_processing_details and authentication_details are not new fields for the module. They were missing from fields.yml, but were part of the ingest node pipeline. Any idea if the switch to nested will have other consequences, such as conflicts with old data? I can't think of any reason why it would.

Reply:

So, as far as queries go, you'll have to query them with a nested query (see the note under https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#nested-accessing-documents). I'm not sure how that would change a query across, say, a wildcard of filebeat-* indices that included both a nested and a dynamically mapped set of fields from an older Filebeat. If these fields were dynamically mapped previously, we'd probably need to make sure that we didn't break anything.
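For reference, a fields.yml entry switched to nested could look like the sketch below; the subfield names are illustrative assumptions, not the module's exact schema:

```yaml
- name: applied_conditional_access_policies
  type: nested
  description: >
    Conditional Access policies evaluated for this sign-in.
  fields:
    - name: display_name        # assumed subfield
      type: keyword
    - name: result              # assumed subfield
      type: keyword
```

With type: nested, each policy object is indexed as its own hidden subdocument, so a nested query can require display_name and result to match within the same policy object rather than anywhere in the array.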

type: keyword
description: >
The resource tenantId for B2B(business-to-business) scenarios.
- name: authentication_details

Same as above: should this be nested?

Reply (Member):

Done.

(Note: this is not a new field, just was missing from fields.yml)

type: object
description: >
A record of each step of authentication undertaken in the sign-in.
- name: authentication_processing_details


Also similar to above. However, it looks like each entry just has two fields, key and value, which are both strings. I wonder if we could instead use flattened, which is like nested but without the restriction on the total number of subdocuments. See https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html

Thoughts, @andrewkroh? I believe we already use flattened in a couple of integrations.

Reply (Member):

Done.

(Note: this is not a new field, just was missing from fields.yml)

Reply:

Unlike the nested comment above, I don't think the query characteristics here will change; you can still use terms, match, etc. But I'm not super familiar with how switching types works when wildcarding across indices with different types (same idea as the filebeat-* query above), so we may want to test it out.
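A minimal sketch of the flattened variant in fields.yml, reusing the description already quoted above:

```yaml
- name: authentication_processing_details
  type: flattened
  description: >
    A record of each step of authentication undertaken in the sign-in.
```

flattened indexes the object's leaf values as keywords under a single mapping entry, so arbitrary key/value pairs don't grow the mapping, and plain term/match-style queries on the subkeys keep working.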

@andrewkroh (Member) left a comment:

@andrewstucki I made some updates per your feedback.


@andrewkroh (Member) commented:

run tests

nicpenning and others added 8 commits March 17, 2021 13:22

  • This will be a nice addition for parsing the user agent in the Azure sign-in logs. This would allow for some great detections on unusual user agents for sign-in activity.
  • Update example log using event.original from Filebeat's initial message.
  • The new log sample exposed fields that were missing from the mapping. It also exposed some new fields listed at https://docs.microsoft.com/en-us/azure/azure-monitor/reference/tables/signinlogs that were not yet converted to snake_case, so I added rename processors to convert them to snake_case and added descriptions in fields.yml. Since user_agent is part of ECS, I renamed the Azure userAgent field to user_agent.original.
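The snake_case conversion described above amounts to a series of rename processors. A sketch with one example field from the linked schema follows (the exact field list is in the PR diff, not reproduced here):

```yaml
- rename:
    field: azure.signinlogs.properties.authenticationRequirement   # example camelCase field
    target_field: azure.signinlogs.properties.authentication_requirement
    ignore_missing: true
```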
@andrewkroh (Member) commented:

run tests

@andrewkroh andrewkroh requested a review from andrewstucki March 17, 2021 20:11
@andrewkroh andrewkroh merged commit a2e8969 into elastic:master Mar 18, 2021
@andrewkroh andrewkroh added the needs_backport PR is waiting to be backported to other branches. label Mar 18, 2021
@nicpenning (Contributor, Author) commented:

Woohoo, this made it to 7.13! 🎉

@nicpenning nicpenning deleted the patch-1 branch May 25, 2021 15:15
Labels: enhancement, Filebeat, needs_backport (PR is waiting to be backported to other branches)

7 participants