
Cloud Logs Exposure (CLX) Tool Setup

Introduction

This page describes how to use the Cloud Logs Exposure (CLX) tool for Unity Cloud logs and metrics. It covers the following topics:

  • Set up the Azure resources for ingesting the logs and metrics
  • Set up the KargoToolCollector to fetch the logs and metrics and upload them to blob storage
  • Query the logs and metrics from Azure Data Explorer
  • Sample queries

Provision Azure Services for the Diagnostics Pipeline

Overview

The diagnostics pipeline uses Azure Data Factory to read the logs from Azure Blob Storage and ingest them into Azure Data Explorer, where they can be queried with KQL and visualized in a Grafana dashboard.

High-level architecture of the CLX tool

Azure Resource Deployment

Click the link below to set up the diagnostics pipeline for logs and metrics for Unity Cloud components. The deployment may take 25-30 minutes. Note that you must be an owner of the Azure subscription where these resources are created.

Deploy To Azure

Visualize

Instructions

Follow the steps below to set up and verify the pipeline deployment (a scripted verification sketch follows the list):

  1. Deploy the Azure resources using a custom ARM template deployment
  2. Verify the deployment status
  3. Verify the deployed resources in the resource group
  4. Get the Azure Data Explorer URL
  5. Verify the Azure Blob Storage containers for the logs and metrics
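
These checks can also be scripted. Below is a minimal sketch (not part of the tool) that assumes the Azure CLI is installed and logged in, the kusto CLI extension is available, and that the resource group and cluster names are placeholders you replace with your own values:

# verify_deployment_sketch.py - hedged helper; resource group and cluster names are placeholders
import json
import subprocess

RESOURCE_GROUP = "<your-resource-group>"    # placeholder
KUSTO_CLUSTER = "<your-adx-cluster-name>"   # placeholder

def az(*args):
    """Run an az CLI command and return its parsed JSON output."""
    result = subprocess.run(["az", *args, "--output", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# 1. Check that the ARM deployments in the resource group succeeded.
for dep in az("deployment", "group", "list", "--resource-group", RESOURCE_GROUP):
    print(dep["name"], dep["properties"]["provisioningState"])

# 2. List the resources created in the resource group.
for res in az("resource", "list", "--resource-group", RESOURCE_GROUP):
    print(res["type"], res["name"])

# 3. Get the Azure Data Explorer URL (requires the 'kusto' CLI extension).
cluster = az("kusto", "cluster", "show",
             "--resource-group", RESOURCE_GROUP, "--name", KUSTO_CLUSTER)
print("Azure Data Explorer URL:", cluster["uri"])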

Run the Kargo Logs and Metrics collector tool on the jumpbox

Overview

The Kargo periodic collector tool is a simple Python script which:

  1. Periodically collects the Kargo logs based on configurable parameters.
  2. Uploads them to the Azure Blob container for ingestion.

The tool currently supports Windows and Linux environments. A minimal sketch of this collect-and-upload loop is shown below.
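
The following is an illustrative sketch of such a loop, not the actual KargoCollector.py; the use of kubectl cp and az storage blob upload, and all paths and names, are assumptions for illustration:

# periodic_collector_sketch.py - illustrative only; pod path, container, and connection string are placeholders
import datetime
import os
import subprocess
import time

import schedule  # pip install schedule

KUBECONFIG = "group-maint.conf"                          # kubeconfig copied from the Kubernetes master
CONTAINER = "<your-logs-blob-container>"                 # placeholder
CONNECTION_STRING = "<your-storage-connection-string>"   # placeholder

def collect_and_upload():
    """Pull the latest Kargo log archive and push it to blob storage."""
    stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S")
    os.makedirs("data/logging", exist_ok=True)
    local_path = f"data/logging/kargo-{stamp}.tar.gz"
    # Assumption: the archive is copied out of the Kargo pod with kubectl cp.
    subprocess.run(["kubectl", "--kubeconfig", KUBECONFIG, "cp",
                    "<kargo-pod>:/path/to/output.tar.gz", local_path], check=True)
    # Assumption: the upload goes through the Azure CLI using the storage connection string.
    subprocess.run(["az", "storage", "blob", "upload",
                    "--container-name", CONTAINER,
                    "--name", f"kargo-{stamp}.tar.gz",
                    "--file", local_path,
                    "--connection-string", CONNECTION_STRING], check=True)

schedule.every(15).minutes.do(collect_and_upload)  # matches the default -m of 15 minutes

while True:
    schedule.run_pending()
    time.sleep(1)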

Known limitations

  1. The Kargo output file also remains in the pod. It must be deleted manually, or the Kubernetes node may reboot because of disk overflow.
  2. Currently only the "logging" and "prometheus" collection types are supported. Provide an invalid Kibana dashboard in "kargo-log-collection-config.json" so that only fluentd logs are collected.
  3. A large number of parallel writes to the Kusto cluster can cause out-of-memory (OOM) errors.

Pre-requisites

The following are required on both Linux and Windows environments (a quick check sketch follows the list):

  1. Python 3.7+ (set alias python=python3, if required)
  2. pip install schedule
  3. pip install requests
  4. az cli
  5. kubectl 1.18+
  6. Azure Blob Storage account connection string (for the storage account created in the steps above)
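
A quick way to confirm these prerequisites is a small check script; this is a hedged convenience sketch, not part of the tool:

# check_prereqs_sketch.py - hedged convenience sketch for verifying the prerequisites above
import shutil
import sys

assert sys.version_info >= (3, 7), "Python 3.7+ is required"

for module in ("schedule", "requests"):
    try:
        __import__(module)
        print(f"{module}: OK")
    except ImportError:
        print(f"{module}: missing - run 'pip install {module}'")

for tool in ("az", "kubectl"):
    print(f"{tool}: {'OK' if shutil.which(tool) else 'not found on PATH'}")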

Instructions

Please follow the steps mentioned below to run the script:

  1. Set the working path: /{PathToKargoCollector}/KargoCollector
  2. Fetch the kubeconfig file from the Kubernetes master (/etc/kubernetes/) and copy it to the jump server. This file path must be passed as an argument when running the script. Example: /etc/kubernetes/group-maint.conf
  3. Set the Elasticsearch endpoint details in kargo-log-endpoint-config.json:
 {
        "loggingConfig": {
                "elasticPassword": "<your-password>",
                "elasticURL": "<your-elastic-url>",
                "elasticUserName": "<your-username>"
        }
}
  4. Set the Prometheus endpoint in kargo-prometheus-endpoint-config.json:
 {
    "prometheusConfig": {
        "URL" : "http://<Valid-PrometheusIP>:<Valid-PrometheusPort>"
    }
}
  5. Set AccountName, logsBlobContainerName, metricsBlobContainerName, and ConnectionString (as required) in storage-account-info.json:
{
   "Storage": {
               "AccountName": "<Place-Your-storageAccountName-Here>",
               "logsBlobContainerName": "<Place-Your-logsBlobContainerName-Here>",
               "metricsBlobContainerName": "<Place-Your-metricsBlobContainerName-Here>",
               "ConnectionString": "<Place-Your-StorageAccount-ConnectionString-Here>"
       }
}
  6. Execution command (a minimal argument-parsing sketch follows the notes below): python KargoCollector.py -k <kubeconfig-path> -c <collection-type> [-m <minutes>] [-o <output-folder>] [-s <remote-storage>] [-i <identity>]

Note:

  1. k is the path to the kubeconfig file (mandatory).
  2. c specifies the type of collection. Currently supported values are "prometheus" and "logging" (mandatory).
  3. m is the collection interval in minutes. Default is 15 minutes.
  4. o is the folder where the tar.gz is pulled locally from the Kargo server. Default is the "data\logging" or "data\prometheus" folder on the same path.
  5. s is optional; currently only "azblob" is supported as remote storage.
  6. i is the identity of the machine; managed identity is used by default. It can be set to "connectionstring".
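
The following is a minimal argparse sketch that mirrors the documented flags and reads storage-account-info.json; it is illustrative only and is not the actual KargoCollector.py implementation (the default value for -i is an assumption):

# collector_args_sketch.py - illustrative mirror of the documented flags; not the real tool
import argparse
import json

parser = argparse.ArgumentParser(description="Kargo logs/metrics collector (sketch)")
parser.add_argument("-k", required=True, help="path to the kubeconfig file")
parser.add_argument("-c", required=True, choices=["logging", "prometheus"],
                    help="type of collection")
parser.add_argument("-m", type=int, default=15, help="collection interval in minutes")
parser.add_argument("-o", default=None,
                    help="local folder for the tar.gz (default data\\logging or data\\prometheus)")
parser.add_argument("-s", default="azblob", help="remote storage type (only azblob supported)")
parser.add_argument("-i", default="managedidentity",
                    help="identity to use; set to 'connectionstring' to use the storage connection string")
args = parser.parse_args()

output_dir = args.o or f"data/{args.c}"

# Assumption: the connection string is read from storage-account-info.json when -i connectionstring is used.
if args.i == "connectionstring":
    with open("storage-account-info.json") as fh:
        storage = json.load(fh)["Storage"]
    connection_string = storage["ConnectionString"]

print(f"Collecting '{args.c}' every {args.m} minutes into {output_dir} using kubeconfig {args.k}")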

Examples:

A. Fetch logs every 2 minutes on a system using managed identity:

 collector> python KargoCollector.py -k group-maint.conf -c logging -m 2 

B. Fetch logs every 2 minutes on a system using a connection string to the storage blob:

 collector> python KargoCollector.py -k group-maint.conf -c logging -m 2 -i connectionstring

C. Fetch Prometheus metrics every 2 minutes on a system using managed identity:

 collector> python KargoCollector.py -k group-maint.conf -c prometheus -m 2 

D. Fetch Prometheus metrics every 2 minutes on a system using a connection string to the storage blob:

 collector> python KargoCollector.py -k group-maint.conf -c prometheus -m 2 -i connectionstring

Note: by default, the tar.gz file is in the “data\logging” directory for logs collection and “data\prometheus” for prometheus metrics collection.

Query the Logs and Metrics

This section describes a few sample queries; you can construct your own based on your needs. Refer to the KQL quick reference for syntax. A Python sketch for running these queries programmatically follows the sample queries.

Verify the pipeline is running and ingesting the data

View the files in blob containers
Verify the pipeline runs

Query the logs

Get the Azure Data Explorer link from deployment
Query the logs

Sample KQL Queries

  • Query the logs by SUPI id
5GDebugLogs
| where _source_supi == 'imsi-3104102570xyz'
  • Query the logs by time
5GDebugLogs
| where _source_time < datetime('2021-07-05T16:16:26.9529494Z')
  • Query the logs by time range
5GDebugLogs
| where _source_time > ago(5h) and _source_time  < ago(2h)
  • Query the logs by pattern match in log
5GDebugLogs
| where _source.log contains 'WARNING  Duplicate session exists for new create'
  • Query the logs by k8s namespace
5GDebugLogs
| where  _source_kubernetes_namespace_name == 'fed-smf'
  • Find the event time and query the logs for the next 20 seconds
let ErrorTime = toscalar(5GDebugLogs
| where _source.log contains 'watchFileEvents'
| summarize min(_source_time));
5GDebugLogs
| where _source_time >= ErrorTime and _source_time < (ErrorTime + 20s)
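
The sample queries can also be run programmatically against the Azure Data Explorer cluster. Below is a hedged sketch using the azure-kusto-data package (not listed in the prerequisites; install it separately) with Azure CLI authentication; the cluster URL and database name are placeholders:

# run_kql_sketch.py - hedged sketch; requires 'pip install azure-kusto-data' and a prior 'az login'
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER_URL = "https://<your-adx-cluster>.<region>.kusto.windows.net"  # placeholder
DATABASE = "<your-database>"                                           # placeholder

# Reuse the Azure CLI login for authentication.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER_URL)
client = KustoClient(kcsb)

# One of the sample queries above, limited to 10 rows.
query = """
5GDebugLogs
| where _source_time > ago(5h) and _source_time < ago(2h)
| take 10
"""

response = client.execute(DATABASE, query)
for row in response.primary_results[0]:
    print(row.to_dict())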
