Merge #135912
135912: pkg/cli: debug-zip log upload with datadog logs API r=arjunmahishi a=arjunmahishi

Datadog recently started supporting ingestion of logs that are up to
72 hours old. This commit updates the current log upload flow to use
the Logs API for logs that fall within that 72-hour window. The
changes made in this commit are:

* In a given debug zip, log events that occurred within the last 72
  hours (from the time of upload) are uploaded directly to datadog
  using the Logs API.

* Log events older than 72 hours continue to follow the Rehydration
  flow. That is, they are uploaded to a GCS bucket, which is then
  declared as an archive on datadog. Rehydration still has to be
  triggered manually (just like before).

* By design, no logs are left behind in the debug zip; every log event
  makes its way to datadog via one of the two methods above.

* There is no change in the way the log files are read. This commit
  only introduces a new pool of writers responsible for writing to
  datadog using the Logs API. The main thread distributes the log
  upload batches between the two writer pools based on the timestamps
  of the logs.

This commit contains only the functional changes required to use the
Logs API. Some TUI changes are still needed to improve the UX; those
will come in a separate PR.

---

![image](https://github.com/user-attachments/assets/a5d2caa7-764f-4161-abae-ab1b9fb21b95)

Epic: CC-28996
Part of: CC-30567
Release note: None

Co-authored-by: Arjun Mahishi <[email protected]>
craig[bot] and arjunmahishi committed Dec 5, 2024
2 parents a23be6b + 50be3b3 commit 844d763
Showing 5 changed files with 498 additions and 122 deletions.
87 changes: 73 additions & 14 deletions pkg/cli/testdata/upload/logs
@@ -16,12 +16,13 @@ upload-logs
}
}
----
ABC/123/dt=20240716/hour=17/1/cockroach.hostname.username.2024-07-16T17_51_43Z.048498.log:
Upload ID: 123
Create DD Archive: https://api.us5.datadoghq.com/api/v2/logs/config/archives
Create DD Archive: {"data":{"type":"archives","attributes":{"name":"abc-20241114000000","query":"-*","destination":{"type":"gcs","path":"ABC/abc-20241114000000","bucket":"debugzip-archives","integration":{"project_id":"arjun-sandbox-424904","client_email":"[email protected]"}}}}}
GCS Upload: ABC/abc-20241114000000/dt=20240716/hour=17/1/cockroach.hostname.username.2024-07-16T17_51_43Z.048498.log:
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=logs
{"data":{"type":"archives","attributes":{"name":"123","query":"-*","destination":{"type":"gcs","path":"ABC/123","bucket":"debugzip-archives","integration":{"project_id":"arjun-sandbox-424904","client_email":"[email protected]"}}}}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/env_sampler.go","line":125,"counter":33,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"WARNING"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"initialized store s1","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/node.go","line":533,"counter":24,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/env_sampler.go","line":125,"counter":33,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"WARNING"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"initialized store s1","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/node.go","line":533,"counter":24,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}


# single-node with wrong log format
@@ -43,7 +44,7 @@ upload-logs log-format=crdb-v2
}
----
Failed to upload logs: decoding on line 2: malformed log entry
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=logs --log-format=crdb-v2


@@ -76,12 +77,70 @@ upload-logs log-format=crdb-v1
}
}
----
ABC/123/dt=20240716/hour=17/1/cockroach.node1.username.2024-07-16T17_51_43Z.048498.log:
ABC/123/dt=20240716/hour=17/2/cockroach.node2.username.2024-07-16T17_51_43Z.048498.log:
Upload ID: 123
Create DD Archive: https://api.us5.datadoghq.com/api/v2/logs/config/archives
Create DD Archive: {"data":{"type":"archives","attributes":{"name":"abc-20241114000000","query":"-*","destination":{"type":"gcs","path":"ABC/abc-20241114000000","bucket":"debugzip-archives","integration":{"project_id":"arjun-sandbox-424904","client_email":"[email protected]"}}}}}
GCS Upload: ABC/abc-20241114000000/dt=20240716/hour=17/1/cockroach.node1.username.2024-07-16T17_51_43Z.048498.log:
GCS Upload: ABC/abc-20241114000000/dt=20240716/hour=17/2/cockroach.node2.username.2024-07-16T17_51_43Z.048498.log:
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=logs --log-format=crdb-v1
{"data":{"type":"archives","attributes":{"name":"123","query":"-*","destination":{"type":"gcs","path":"ABC/123","bucket":"debugzip-archives","integration":{"project_id":"arjun-sandbox-424904","client_email":"[email protected]"}}}}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"created new SQL liveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:2","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slinstance/slinstance.go","line":258,"counter":44,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/env_sampler.go","line":125,"counter":33,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"WARNING"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"initialized store s1","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/node.go","line":533,"counter":24,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"inserted sqlliveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:2","service:CRDB-SH","source:cockroachdb","upload_id:123"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slstorage/slstorage.go","line":540,"counter":43,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"created new SQL liveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:2","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slinstance/slinstance.go","line":258,"counter":44,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/env_sampler.go","line":125,"counter":33,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"WARNING"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"initialized store s1","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":100,"file":"server/node.go","line":533,"counter":24,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"inserted sqlliveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:2","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slstorage/slstorage.go","line":540,"counter":43,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}


# Single-node - with recent logs that use the logs API
upload-logs
{
"nodes": {
"1": {
"logs": [
{
"name": "cockroach.hostname.username.2024-07-16T17_51_43Z.048498.log",
"lines": [
"I{{now}} 100 server/node.go:533 ⋮ [T1,n1] 24 initialized store s1",
"W{{now}} 100 server/env_sampler.go:125 ⋮ [T1,n1] 33 failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: ‹/proc/self/cgroup›: open ‹/proc/self/cgroup›: no such file or directory"
]
}
]
}
}
}
----
Logs API Hook: https://http-intake.logs.us5.datadoghq.com/api/v2/logs
Logs API Hook: {"goroutine":100,"file":"server/env_sampler.go","line":125,"message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","counter":33,"tenant_id":"1","timestamp":0,"severity":"WARNING","channel":"DEV","ddtags":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000"}
Logs API Hook: {"goroutine":100,"file":"server/node.go","line":533,"message":"initialized store s1","counter":24,"tenant_id":"1","timestamp":0,"severity":"INFO","channel":"DEV","ddtags":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000"}
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=logs


# Single-node - with both recent and old logs
upload-logs
{
"nodes": {
"1": {
"logs": [
{
"name": "cockroach.hostname.username.2024-07-16T17_51_43Z.048498.log",
"lines": [
"I240716 17:51:44.797342 916 sql/sqlliveness/slstorage/slstorage.go:540 ⋮ [T1,n1] 43 inserted sqlliveness session 01018071445fbd54a44ee88e906efb311d7193",
"I240716 17:51:44.797530 916 sql/sqlliveness/slinstance/slinstance.go:258 ⋮ [T1,n1] 44 created new SQL liveness session 01018071445fbd54a44ee88e906efb311d7193",
"I{{now}} 100 server/node.go:533 ⋮ [T1,n1] 24 initialized store s1",
"W{{now}} 100 server/env_sampler.go:125 ⋮ [T1,n1] 33 failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: ‹/proc/self/cgroup›: open ‹/proc/self/cgroup›: no such file or directory"
]
}
]
}
}
}
----
Create DD Archive: https://api.us5.datadoghq.com/api/v2/logs/config/archives
Create DD Archive: {"data":{"type":"archives","attributes":{"name":"abc-20241114000000","query":"-*","destination":{"type":"gcs","path":"ABC/abc-20241114000000","bucket":"debugzip-archives","integration":{"project_id":"arjun-sandbox-424904","client_email":"[email protected]"}}}}}
GCS Upload: ABC/abc-20241114000000/dt=20240716/hour=17/1/cockroach.hostname.username.2024-07-16T17_51_43Z.048498.log:
Logs API Hook: https://http-intake.logs.us5.datadoghq.com/api/v2/logs
Logs API Hook: {"goroutine":100,"file":"server/env_sampler.go","line":125,"message":"failed to start query profiler worker: failed to detect cgroup memory limit: failed to read memory cgroup from cgroups file: /proc/self/cgroup: open /proc/self/cgroup: no such file or directory","counter":33,"tenant_id":"1","timestamp":0,"severity":"WARNING","channel":"DEV","ddtags":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000"}
Logs API Hook: {"goroutine":100,"file":"server/node.go","line":533,"message":"initialized store s1","counter":24,"tenant_id":"1","timestamp":0,"severity":"INFO","channel":"DEV","ddtags":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000"}
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=logs
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"created new SQL liveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slinstance/slinstance.go","line":258,"counter":44,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
{"timestamp":1721152304,"date":"2024-07-16T17:51:44Z","message":"inserted sqlliveness session 01018071445fbd54a44ee88e906efb311d7193","tags":["cluster:ABC","env:debug","node_id:1","service:CRDB-SH","source:cockroachdb","upload_id:abc-20241114000000"],"_id":"a1b2c3","attributes":{"goroutine":916,"file":"sql/sqlliveness/slstorage/slstorage.go","line":540,"counter":43,"tenant_id":"1","date":"2024-07-16T17:51:44Z","timestamp":1721152304,"channel":"DEV","severity":"INFO"}}
22 changes: 11 additions & 11 deletions pkg/cli/testdata/upload/profiles
@@ -11,9 +11,9 @@ upload-profiles
}
}
----
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=profiles
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:123","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000","family":"go","version":"4"}


# Multi-node - both profiles
@@ -35,10 +35,10 @@ upload-profiles tags=foo:bar
}
}
----
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --tags=foo:bar --cluster=ABC --include=profiles
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,foo:bar,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:123","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,foo:bar,node_id:2,service:CRDB-SH,source:cockroachdb,upload_id:123","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,foo:bar,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:debug,foo:bar,node_id:2,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000","family":"go","version":"4"}


# Single-node - only CPU profile
@@ -53,9 +53,9 @@ upload-profiles tags=customer:user-given-name,cluster:XYZ
}
}
----
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --tags=customer:user-given-name,cluster:XYZ --cluster=ABC --include=profiles
{"start":"","end":"","attachments":["cpu.pprof"],"tags_profiler":"cluster:XYZ,customer:user-given-name,env:debug,foo:bar,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:123","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof"],"tags_profiler":"cluster:XYZ,customer:user-given-name,env:debug,foo:bar,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000","family":"go","version":"4"}


# Single-node - no profiles found
@@ -66,7 +66,7 @@ upload-profiles
}
}
----
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --cluster=ABC --include=profiles


@@ -83,9 +83,9 @@ upload-profiles tags=env:SH
}
}
----
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --tags=env:SH --cluster=ABC --include=profiles
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:SH,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:123","family":"go","version":"4"}
{"start":"","end":"","attachments":["cpu.pprof","heap.pprof"],"tags_profiler":"cluster:ABC,env:SH,node_id:1,service:CRDB-SH,source:cockroachdb,upload_id:abc-20241114000000","family":"go","version":"4"}


# Single-node - both profiles
@@ -102,7 +102,7 @@ upload-profiles tags=ERR
}
----
Failed to upload profiles: failed to upload profiles of node 1: status: 400, body: 'runtime' is a required field
Upload ID: 123
Upload ID: abc-20241114000000
debug zip upload debugDir --dd-api-key=dd-api-key --dd-app-key=dd-app-key --tags=ERR --cluster=ABC --include=profiles


14 changes: 7 additions & 7 deletions pkg/cli/tsdump_upload.go
@@ -42,15 +42,15 @@ var (
// each site in datadog has a different host name. ddSiteToHostMap
// holds the mapping of site name to the host name.
ddSiteToHostMap = map[string]string{
"us1": "api.datadoghq.com",
"us3": "api.us3.datadoghq.com",
"us5": "api.us5.datadoghq.com",
"eu1": "api.datadoghq.eu",
"ap1": "api.ap1.datadoghq.com",
"us1-fed": "api.ddog-gov.com",
"us1": "datadoghq.com",
"us3": "us3.datadoghq.com",
"us5": "us5.datadoghq.com",
"eu1": "datadoghq.eu",
"ap1": "ap1.datadoghq.com",
"us1-fed": "ddog-gov.com",
}

targetURLFormat = "https://%s/api/v2/series"
targetURLFormat = "https://api.%s/api/v2/series"
datadogDashboardURLFormat = "https://us5.datadoghq.com/dashboard/bif-kwe-gx2/self-hosted-db-console-tsdump?" +
"tpl_var_cluster=%s&tpl_var_upload_id=%s&tpl_var_upload_day=%d&tpl_var_upload_month=%d&tpl_var_upload_year=%d&from_ts=%d&to_ts=%d"
zipFileSignature = []byte{0x50, 0x4B, 0x03, 0x04}
