Missing AppTraces in Log Analytics Workspace #2129
Comments
did you check App Insights' data cap?
hi @stevendick-swissre, can you try with the latest (3.2.6)? 3.2.4 started logging when telemetry overflowed an internal queue and got dropped, and 3.2.5 fixed an issue that was discovered by this new logging (#2062). (no change in 3.2.6 that I think would impact this, but worth going to it if you're already bumping)
As we had logging from immediately before and after the missing entries, I am assuming no data cap was breached. I have confirmed that the LAW did not have a data cap breach by checking the _LogOperation() function. How can I check if a data cap on the App Insights instance is exceeded? I don't want to share the ikey in a public forum or via email. I have asked the developer to upgrade to v3.2.6.
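For reference, a minimal sketch of that kind of workspace data-cap check, assuming the built-in _LogOperation function and its Ingestion category (the exact Operation text is an assumption):

```kusto
// Look for workspace health records indicating the daily cap was reached.
// "Data collection stopped" entries mean ingestion was halted by the cap.
_LogOperation
| where TimeGenerated > ago(7d)
| where Category == "Ingestion"
| where Operation has "Data collection"
| project TimeGenerated, Level, Operation, Detail
| order by TimeGenerated desc
```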
We updated to 3.2.6 but saw the same problem. We did see some new output from the App Insights agent: 2022-02-21 12:17:21.606Z WARN c.m.a.a.i.t.BatchSpanProcessor - In the last 5 minutes, the following operation has failed 295785 times (out of 571670): Queuing span:
Does this mean that the agent failed to publish to App Insights? Does this suggest a network issue or a problem on the App Insights side?
this means that the single export thread in the JVM can't keep up with the volume of telemetry data being produced. up to 2048 telemetry records are buffered (by default), but once that limit is reached, if it still can't keep up it starts dropping telemetry records.

from the warning message, it looks like you are sending 571670 telemetry records over a 5-minute window, which is ~2000 telemetry records per second, and roughly half of those records are being dropped.

can you check the distribution of records that are being ingested? it may help to know the distribution across requests, dependencies, traces, customEvents, and anything else, at least to confirm that this volume is expected. customMetrics should have their own (larger) queue now in 3.2.6, but your warning message points to the "general" queue, which is the queue for all other telemetry records.

another factor is how long the ingestion service takes to respond, since the single export thread blocks while waiting for the response. if you can email me your instrumentation key, I can check our internal data to see whether this is likely an issue, though I suspect that at a sustained rate of ~2000 telemetry records per second we may just need to bump the number of export threads
Given we're using the agent, is the option to configure the number of export threads exposed? How do we check the distribution of records being ingested?
here's a query to check the distribution of records; if you can run it over a timeframe that represents one of the heavy 5-minute windows, that would be ideal:
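A sketch of such a query, assuming the workspace-based App Insights tables and the built-in Type column that carries each record's table name (the time window below is taken from the warning message above and is illustrative):

```kusto
// Distribution of ingested telemetry records by table over one heavy 5-minute window.
union AppRequests, AppDependencies, AppTraces, AppEvents, AppExceptions, AppMetrics
| where TimeGenerated between (datetime(2022-02-21T12:12:00Z) .. datetime(2022-02-21T12:17:00Z))
| summarize RecordCount = count() by Type
| order by RecordCount desc
```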
if you can email me your instrumentation key ([email protected]), I can check what your ingestion service response times are, which will give us another clue about the best solution here
app-insights-ingested.csv
thanks @stevendick-swissre, can you try with this SNAPSHOT build? https://github.com/microsoft/ApplicationInsights-Java/suites/5427505526/artifacts/171896362 it still only uses a single export thread, but that thread should no longer wait for the response from the ingestion service before proceeding to export the next batch
hey @stevendick-swissre, I think #2181 is a really good improvement that we can include in an upcoming release. In a simple local logging test, I can get ~30x(!) more throughput without dropping any telemetry (still using the single thread, just not blocking on responses from the ingestion service). would you be able to test this out if we release 3.2.9-BETA with this one change?
I think it's likely we can test this, but I'm waiting on feedback from the developer. The potential issue is that the impacted component may not need to load data at volume again, given where we are in the testing cycle.
Hi @trask, I can test this if you can point me to the 3.2.9-BETA release.
asproll is the developer in question on our project who discovered the issue.
great, we will plan to release 3.2.9-BETA tomorrow
@asproll @stevendick-swissre 3.2.9 is released: https://github.com/microsoft/ApplicationInsights-Java/releases
Hi @trask, thanks a lot. We will test this as soon as we can and let you know.
@trask, I was able to successfully test the release. It solved our problem with the missing log statements; all the expected traces are now logged. Thanks a lot. One new thing I noticed: we keep getting the following message in the log, and I'm not sure what it means or whether it indicates a real problem: WARN c.m.a.a.i.t.BatchSpanProcessor - In the last 5 minutes, the following operation has failed 1 times (out of 13): Add async export: * Max number of concurrent exports 1 has been hit, may see some export throttling due to this (1 times)
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.
Closing.
Expected behavior
We have an application that logs the sequential progression of a work package comprising 18819 items.
For example, we log the following at INFO level:
2022-02-17 07:47:24.257 INFO 1 — [-StreamThread-6] c.s.g.i.b.p.k.t.p.PackageDoneTransformer : {"RUN_ID": 1071282}: Status update (17202/18819) received [{"PACKAGE_ID": "1071282|Close|Close|IFA|20220701|300532312|", "RUN_ID": 1071282, "CREINS_ID": null, "TREATY": null, "REPUN_1": null, "REPUN_2": null, "CoverageId": null, "BusinessDate": null, "CB_TO_DATE": null, "PackageCount": 18819, "RecordCountTotal": 83, "RecordCountProcessed": 0, "RecordCountError": 0, "OutputCountAms": 0, "OutputCountStp": 0, "OutputCountTav": 0, "LastUpdated": null, "Errors": []}]
This was scraped from the pod log output in AKS.
Actual behavior
The above example never makes it to the Log Analytics Workspace.
We do see log statements from before and after.
Item 17201 is logged, but we don't see any items logged again until 17217.
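As a cross-check, a query along these lines can confirm the gap in the workspace; it assumes the workspace-based AppTraces table, and the run ID and time window are taken from the example above (the message filter is illustrative):

```kusto
// List the "Status update" trace messages around the gap to see which items arrived.
AppTraces
| where TimeGenerated between (datetime(2022-02-17T07:30:00Z) .. datetime(2022-02-17T08:00:00Z))
| where Message has "Status update" and Message has "1071282"
| project TimeGenerated, Message
| order by TimeGenerated asc
```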
What we've checked
The loss of log messages does not coincide with data cap resets on either the LAW or the App Insights resource.
Sampling is 100% on the App Insights instance (a workspace-side check is sketched after this list).
We don't see any App Insights network/connection errors reported in the pod log.
I have previously seen that there can be extra latency with ingestion that causes the TimeGenerated to be later than expected, but I don't see any evidence of this.
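A combined sketch of the checks mentioned above, assuming the ItemCount column carried by the workspace-based App Insights tables (a total above the row count would indicate ingestion sampling) and the built-in ingestion_time() function (the actual arrival time of each record in the workspace):

```kusto
// For the affected window: did sampling reduce stored rows, and how far did ingestion lag?
AppTraces
| where TimeGenerated between (datetime(2022-02-17T07:00:00Z) .. datetime(2022-02-17T09:00:00Z))
| extend IngestionLag = ingestion_time() - TimeGenerated
| summarize RowsStored = count(),
            RecordsRepresented = sum(ItemCount),
            IngestionLagP95 = percentile(IngestionLag, 95)
```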
Goal
What else can we check that might contribute to the missing logging?
Is this approach to data capture with the App Insights agent not recommended, given that we cannot guarantee 100% log data capture?