Implement reference Unstructured store API to upload TaskRun logs GCS store #107

Closed
tejal29 opened this issue Oct 6, 2018 · 5 comments
Labels
design This task is about creating and discussing a design

Comments

@tejal29
Contributor

tejal29 commented Oct 6, 2018

Expected Behavior

The Pipeline TaskRun logs should be uploaded to an endpoint and available to download later.
In our initial reference implementation we should support uploading to GCS. In the long run we should support other kinds of stores, and provide a default that does not require GCS.

Actual Behavior

As of #167 the logs will be streamed to a PVC. This volume will continue to exist after the TaskRun has completed. Once this task is done, that PVC should no longer be needed. (This functionality was removed in #443)

Since we moved from init containers to containers for steps in #564, logs are available via the pod logs through kube; however, there are still only limited guarantees about how long the logs will be available.

Steps to Reproduce the Problem

  1. Create a Task
  2. Create a TaskRun
  3. Wait for TaskRun to complete
  4. Download the uploaded results.

Additional Info

@tejal29 tejal29 added this to the Mid October Demo milestone Oct 6, 2018
@bobcatfish
Collaborator

@tejal29 this is a stretch goal for the milestone so I'm going to remove it from the required milestone tasks

@bobcatfish bobcatfish removed this from the Mid October Demo milestone Oct 8, 2018
knative-prow-robot pushed a commit that referenced this issue Oct 8, 2018
This PR implements a simple TaskRun controller that creates a knative/build Build and updates the TaskRun status to reflect the Build status. We delegate to the knative/build controller to do the work of actually fulfilling the Build itself - meaning we have a hard dependency on knative/build.

The integration test doesn't actually assert on the logs output by the
build step because the pods disappear immediately after completion, so
we need a better solution here (e.g. writing to a PVC in the test) - in
the long run we need to implement better log support (#107).

Remaining work for #59 is to improve unit test coverage, and add some
docs on running + debugging.
@bobcatfish bobcatfish added the meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given label Oct 12, 2018
@tejal29 tejal29 self-assigned this Oct 15, 2018
@bobcatfish bobcatfish added design This task is about creating and discussing a design and removed meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given labels Oct 16, 2018
@bobcatfish bobcatfish changed the title Implement a Unstructured store API to upload TaskRun logs and results to GCB store. Implement reference Unstructured store API to upload TaskRun logs and results to GCB store. Oct 24, 2018
@bobcatfish bobcatfish assigned tejal29 and unassigned tejal29 Oct 29, 2018
@tejal29
Contributor Author

tejal29 commented Oct 30, 2018

After feedback from the Build working group, we decided to go with the following approach.
The API will provide a Sink interface.
A Sink service definition will contain:

  • Endpoint: the URL where the sink is hosted.
  • Path to upload to: for now we will focus only on uploading logs.
    • Note: We will add support for uploading the Status of a TaskRun or PipelineRun in a subsequent iteration.

When we want to upload, say, TaskRun logs, the reconciler will call the service running at Endpoint and upload all the logs under the path prefix <Path>, i.e. <Endpoint>:<Path>/<taskruns>/<task run id>

This will be an HTTPS service running in our cluster or in any other cluster that your cluster can access.

The Sink Base definition will look like this.

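package sink

// URL is the address at which a sink service is hosted.
type URL string

// Base holds the fields shared by every sink definition.
type Base struct {
   Endpoint URL    // where the sink service is running
   Path     string // prefix under which logs are uploaded
}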

The reason we have Path is that the same service can be used to upload logs for multiple projects. They will reside in different Paths and will not overlap.

e.g.: I have implemented a GCSSink and installed it in my cluster. It is running at "104.198.205.71:8080".
The GCS sink definition will look like this:

package sink

// GCS is a sink that stores logs in Google Cloud Storage.
type GCS struct {
  *Base          // Endpoint and Path are promoted from the embedded Base
  Project string // GCP project.
}

I can define two GCS Sinks which point to 2 buckets "cluster1", "cluster2"

// Fields of the embedded *Base have to be set through Base in a composite literal.
sink1 := sink.GCS{
  Base:    &sink.Base{Endpoint: "104.198.205.71:8080", Path: "cluster1"},
  Project: "test1", // only test1 has write access to gcs bucket gs://cluster1
}

sink2 := sink.GCS{
  Base:    &sink.Base{Endpoint: "104.198.205.71:8080", Path: "cluster2"},
  Project: "test2", // similarly, only test2 has write access to gcs bucket gs://cluster2
}

Along with that, the Sink interface needs to handle 4 URL requests: "upload/taskruns/", "download/taskruns/", "upload/pipelineruns/", "download/pipelineruns/"
(We can also support Upload and Get with a run type. It's up to the implementer.)
Note: We can support partial upload and download of logs later.
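
As a rough illustration of the four request paths listed above, a sink implementation could start by splitting the part of the URL that follows the sink Path into an operation, a run type and a run id. This is only a sketch: the `route` type and the `parseRoute` function are hypothetical names, and it assumes the fourth path is meant to be the PipelineRun download.

```go
package main

import (
	"fmt"
	"strings"
)

// route describes one parsed sink request (hypothetical type, not part of the proposal).
type route struct {
	Op      string // "upload" or "download"
	RunType string // "taskruns" or "pipelineruns"
	ID      string // id of the TaskRun or PipelineRun
}

// parseRoute splits "/<op>/<run type>/<id>", i.e. the part of the request
// path that follows the sink Path prefix, and rejects anything else.
func parseRoute(p string) (route, error) {
	parts := strings.Split(strings.Trim(p, "/"), "/")
	if len(parts) != 3 {
		return route{}, fmt.Errorf("want <op>/<run type>/<id>, got %q", p)
	}
	r := route{Op: parts[0], RunType: parts[1], ID: parts[2]}
	if r.Op != "upload" && r.Op != "download" {
		return route{}, fmt.Errorf("unknown operation %q", r.Op)
	}
	if r.RunType != "taskruns" && r.RunType != "pipelineruns" {
		return route{}, fmt.Errorf("unknown run type %q", r.RunType)
	}
	return r, nil
}

func main() {
	for _, p := range []string{
		"/upload/taskruns/run-42",
		"/download/pipelineruns/run-7",
		"/delete/taskruns/run-42",
	} {
		r, err := parseRoute(p)
		fmt.Println(r, err)
	}
}
```

An HTTP handler in the sink service could call such a parser first and then hand the request body to whatever store it wraps.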

The Task Reconciler will now make an HTTP request to sink.Endpoint/sink.Path/upload/taskruns/<id=x>,<contentstream=>

The design question here is how to define a Sink for a Pipeline or a TaskRun.
Should Sink be defined as a cluster-wide Custom Resource?

  • This would mean we create a new custom resource for Sink.
  • Admins can create multiple Sinks.
  • The Reconciler will fetch all Sinks installed in your cluster and then upload logs to all of them.

Should Sink be defined per Pipeline or Task?

  • This would mean Sink will be added to PipelineParams.
  • This would also mean the Task and TaskRun definitions will list the sinks they want to send results to.
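
To make the first option a little more concrete, here is a sketch of a reconciler fanning the same logs out to every installed sink. Everything in it is hypothetical: the `Uploader` interface, the `uploadToAll` helper and the in-memory `memSink` stand in for real sink clients, and the choice to keep going after one sink fails is an assumption, not something decided in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// Uploader is a hypothetical client-side view of one installed sink.
type Uploader interface {
	Upload(taskRunID, logs string) error
}

// uploadToAll sends the logs to every sink and reports all failures together,
// so that one unavailable sink does not stop uploads to the others.
func uploadToAll(sinks []Uploader, taskRunID, logs string) error {
	var failures []string
	for _, s := range sinks {
		if err := s.Upload(taskRunID, logs); err != nil {
			failures = append(failures, err.Error())
		}
	}
	if len(failures) > 0 {
		return fmt.Errorf("%d of %d sinks failed: %s",
			len(failures), len(sinks), strings.Join(failures, "; "))
	}
	return nil
}

// memSink is an in-memory stand-in for a real sink, used only for illustration.
type memSink struct {
	name string
	fail bool
	got  map[string]string
}

func (m *memSink) Upload(taskRunID, logs string) error {
	if m.fail {
		return fmt.Errorf("sink %s unavailable", m.name)
	}
	m.got[taskRunID] = logs
	return nil
}

func main() {
	a := &memSink{name: "cluster1", got: map[string]string{}}
	b := &memSink{name: "cluster2", fail: true, got: map[string]string{}}
	err := uploadToAll([]Uploader{a, b}, "run-42", "hello world\n")
	fmt.Println(err)
}
```

With the per-Pipeline option, the same loop would run over the sinks named in the Pipeline or Task definition instead of over everything installed in the cluster.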

/cc @imjasonh and @bobcatfish and @aaron-prindle does this all make sense?

@aaron-prindle
Contributor

aaron-prindle commented Oct 30, 2018

Nice! These are some initial thoughts/questions:

  1. For something like a GCS sink, how is authorization done for the log uploading? I think GCS w/ GKE will just work but I'm wondering about other clouds/providers?
  2. Is there a default path: value that we should populate for users if none is supplied? (path is optional?)
  3. For GCS, project should maybe be renamed bucket as I think you can have multiple GCS buckets/project. Also would the endpoint there be the gs:// url or the full url?

@tejal29
Contributor Author

tejal29 commented Oct 31, 2018

Nice! These are some initial thoughts/questions:

  1. For something like a GCS sink, how is authorization done for the log uploading? I think GCS w/ GKE will just work but I'm wondering about other clouds/providers?

Yes, you would need something like a credentials file added to the sink.GCS definition and then pass that along?
Maybe it could be a k8s ConfigMap object, and we define the name of the ConfigMap in the GCS sink definition.

  2. Is there a default path: value that we should populate for users if none is supplied? (path is optional?)

Not sure what would happen if we provided a default path. If we have TaskRuns with the same id running in separate clusters, they might end up writing to the same path. Maybe we could add some validation to make sure path is always specified.
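
A small sketch of the kind of validation meant here, assuming it lives on the Base definition. The `Validate` method is a hypothetical name and the exact rules (non-empty Endpoint, Path that is not empty or just slashes) are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Base mirrors the sink definition from the earlier comment.
type Base struct {
	Endpoint string
	Path     string
}

// Validate rejects sink definitions without an explicit Path, so that two
// clusters cannot silently end up writing to the same default location.
func (b Base) Validate() error {
	if strings.TrimSpace(b.Endpoint) == "" {
		return errors.New("sink endpoint must be set")
	}
	if strings.Trim(strings.TrimSpace(b.Path), "/") == "" {
		return errors.New("sink path must be set; there is no default")
	}
	return nil
}

func main() {
	fmt.Println(Base{Endpoint: "104.198.205.71:8080"}.Validate())
	fmt.Println(Base{Endpoint: "104.198.205.71:8080", Path: "cluster1"}.Validate())
}
```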

  3. For GCS, project should maybe be renamed bucket as I think you can have multiple GCS buckets/project. Also would the endpoint there be the gs:// url or the full url?

Ahh! For GCS I was thinking path represents the bucket, and project was something I saw in the Go storage API.

package main

import (
        "context"
        "fmt"
        "log"

        "cloud.google.com/go/storage"
        "google.golang.org/api/iterator"
)

func main() {
        ctx := context.Background()

        // For API packages whose import path starts with "cloud.google.com/go",
        // such as cloud.google.com/go/storage in this case, if there are no credentials
        // provided, the client library will look for credentials in the environment.
        storageClient, err := storage.NewClient(ctx)
        if err != nil {
                log.Fatal(err)
        }
        defer storageClient.Close()

        // List the names of all buckets that belong to the given project.
        it := storageClient.Buckets(ctx, "project-id")
        for {
                bucketAttrs, err := it.Next()
                if err == iterator.Done {
                        break
                }
                if err != nil {
                        log.Fatal(err)
                }
                fmt.Println(bucketAttrs.Name)
        }
}

The endpoint will actually be the GCS Sink implementation's HTTP service, like "10.x.x.x:8080", which will have all the code to upload and download content from GCS.
We are providing a GCS implementation which others can use.
Users will have to write a Sink implementation and deploy it in their cluster as an HTTP service. They have to make sure it handles the "upload/taskruns" and "download/taskruns" requests.
The TaskController is now agnostic to which service implements the sink.

bobcatfish referenced this issue in bobcatfish/pipeline Dec 1, 2018
When a user kicks off a run, they will provide an endpoint to upload
logs to (initial implementation will be in #107). The corresponding
fields in `status` will indicate where the logs actually got uploaded
to.

Once we actually get to #107, and especially once we start supporting
endpoints other than GCS, we may find this isn't useful and remove it.

Fixes tektoncd#146
bobcatfish referenced this issue in bobcatfish/pipeline Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can add sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + output linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
knative-prow-robot pushed a commit that referenced this issue Jan 31, 2019
Fixes #387
bobcatfish referenced this issue in bobcatfish/pipeline Feb 28, 2019
In tektoncd#549 @hrishin pointed out that it's hard to understand from the step
status exactly which step did what. While looking at this I realized
that we have included a field `logsURL` which we never populate - I
thought this was copied over from Build but it was actually from our
original prototype API and we have never used it. In #107 we should be
revisiting making logs available and we may add in something like this,
but since we're not using it and it's not clear if we ever will, let's
remove it for now.
knative-prow-robot pushed a commit that referenced this issue Mar 1, 2019
@bobcatfish bobcatfish changed the title Implement reference Unstructured store API to upload TaskRun logs and results to GCS store. Implement reference Unstructured store API to upload TaskRun logs GCS store Apr 25, 2019
@bobcatfish bobcatfish assigned bobcatfish and unassigned tejal29 Apr 25, 2019
@bobcatfish bobcatfish assigned ghost and unassigned bobcatfish Apr 26, 2019
bobcatfish referenced this issue in bobcatfish/pipeline May 30, 2019
As @cmoulliard pointed out, it's not obvious how to get to the logs for
a PipelineRun or a TaskRun. If you know how the underlying kubernetes
resources work you can figure it out but it can be hard to know where to
start. Plus, folks may not realize that we are working on better ways of
accessing logs.

And once we work on #107 we can build up these docs with more detail
about how to upload logs too.

Fixes tektoncd#898
tekton-robot pushed a commit that referenced this issue May 30, 2019
Fixes #898
@ghost

ghost commented Aug 5, 2019

I'm closing this issue out as we have now circulated a design doc for logging in Tekton and the utility of information retained in this issue is limited due to its age.

I've opened #1155 to encompass the work of validating and implementing the proposed design and encourage anyone looking to get involved on this topic to add commentary, use cases and counterpoints to the design doc or github issue linked above. Cheers!

@ghost ghost closed this as completed Aug 5, 2019
bobcatfish added a commit that referenced this issue Mar 26, 2020
In #107 and related issues we decided to let tools dedicated to this (e.g. fluentd) take care of it!
tekton-robot pushed a commit that referenced this issue Mar 27, 2020