Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log the commands we execute for better debugging #1619

Open
Madhu-1 opened this issue Oct 21, 2020 · 10 comments
Open

log the commands we execute for better debugging #1619

Madhu-1 opened this issue Oct 21, 2020 · 10 comments
Labels
dependency/go-ceph depends on go-ceph functionality keepalive This label can be used to disable stale bot activiity in the repo logging The Change is only in logging Priority-2 VolunteersRequired Tag for issues where we need a volunteer to pick an issue

Comments

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Oct 21, 2020

As we are moving towards integrating with go-ceph API's. it's good to log the commands which are equivalent to the RBD/ ceph/RADOS CLI which will help us in debugging the issues.

@Madhu-1 Madhu-1 added the logging The Change is only in logging label Oct 21, 2020
@phlogistonjohn
Copy link
Contributor

I don't know if there is a particularly easy way to do this for the "built in" APIs such as the RBD calls. There's some logging built in to the ceph code that you can turn on by loading a ceph config file or using the set-option function calls in rados lib.
Otherwise you might just want to resort to adding custom logging calls around the rados/rbd api invocations.

For the cephfs admin API I purposefully tried to make this easy. See our test suite here for an example. This can be adapted to log the JSON commands and the responses back from ceph. Note that if you want to log all the results of the processing go-ceph does you still need to log them directly after the go-ceph call is made.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Oct 21, 2020

@phlogistonjohn Thanks for the input, yes you are right even am not sure is there an easy way to log the command with "build-in" API's (i was thinking about enabling logging at CSI level not at the ceph level as we (I) cannot analyze it easily :D). even I was thinking the same to introduce a custom logging in cephcsi to keep track of the commands we execute which helps in debugging.

@dillaman @ShyamsundarR @nixpanic @humblec any suggestion on this one?

@phlogistonjohn
Copy link
Contributor

package cephwrap

import (
	// pretend this is filled in
)

// IOContext matches the IOContext function calls from go-ceph's rados package used by CSI.
type IOContext interface {
	Create(oid string, exclusive CreateOption) error
	ListOmapValues(oid string, startAfter string, filterPrefix string, maxReturn int64, listFn OmapListFunc) error
}

// RBD matches the package level function calls from go-ceph's rbd package used by CSI.
type RBD interface {
	CloneImage(ioctx *rados.IOContext, parentName, snapName string, destctx *rados.IOContext, name string, rio *ImageOptions) error
}

// Image matches the Image type function calls from go-ceph's rbd package used by CSI.
type Image interface {
	ListWatchers() ([]ImageWatcher, error)
}

type LogWatchIOContext struct {
	ioctx *rados.IOContext
	log SpecialLogger
	context Context
}

// Create wraps the go-ceph IOContext create call. It logs when the function
// starts and exits as well as setting up a background log to emit warnings
// if the call has blocked for longer than the specified deadline.
func (ioctx *LogWatchIOContext) Create(oid string, exclusive CreateOption) error {
	ioctx.log.Enter(oid, exclusive)
	ctx, cancel := context.WithDeadline(standardDeadline())
	defer cancel()
	ioctx.log.Monitor(ctx)
	err := ioctx.ioctx.Create(oid, exclusive)
	ioctx.log.Exit(err)
	return err
}

func standardDeadline() time.Time {
	return time.Now().Add(someDuration)
}

func examples() {
	// a logging type made for monitoring api calls with
	// whatever convenience methods you choose
	log := SpecialLogger()

        // set up a logged-and-watched wrapper around a go-ceph rados iocontext type. Use it to create
        // an object. It internally logs the args and return and if the call blocks for long time, it will
        // log that (in the background).
	lioctx := NewLogWatchIOContext(ioctx, log, ctx)
	err := lioctx.Create(myOid, rados.CreateExclusive)
	// do stuff with results

        // set up a logged-and-watched wrapper around a go-ceph rbd calls. Use it to clone an rbd
        // image. It internally logs the args and return and if the call blocks for long time, it will
        // log that (in the background).
	lrbd := LogWatchRBD(log, ctx)
	err := lrbd.CloneImage(ioctx, name1, name2, destIoctx, name3, opts)
	// do stuff with results

        // set up a logged-and-watched wrapper around a go-ceph rbd Image type. Use it to list
       // watchers. It internally logs the args and return and if the call blocks for long time, it will
        // log that (in the background).
	limage := LogWatchImage(image, log, ctx)
	w, err := limage.ListWatchers()
	// do stuff with results
}

Above is a very (very!) rough sketch of what I think CSI might want to do to log the things CSI cares about when making use of go-ceph. I don't think all functions need to be wrapped, just those that reach out to the ceph cluster and could block.

I propose defining interfaces for the various "items" ceph-csi uses from go ceph. Then based on needs the csi code defines simple uniform wrappers around go-ceph that matches the interface (go-ceph itself meets those interfaces too!).

There's lots of places to change or tweak this concept. You don't need to do the background logging for all calls, for example. You could also put the go-ceph call into a goroutine (test this please!) rather than the logging. Its all up to what works for go-ceph.

Please feel to pick holes in this suggestion and/or ask questions.

@nixpanic nixpanic added this to the release-3.3.0 milestone Jan 20, 2021
@Madhu-1 Madhu-1 modified the milestones: release-3.3.0, release-3.4.0 Apr 7, 2021
@Yuggupta27 Yuggupta27 removed their assignment Apr 20, 2021
@Yuggupta27
Copy link
Contributor

@yati1998 is working on this issue.

@github-actions
Copy link

github-actions bot commented Sep 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Sep 4, 2021
@Rakshith-R Rakshith-R added keepalive This label can be used to disable stale bot activiity in the repo and removed wontfix This will not be worked on labels Sep 6, 2021
@Madhu-1 Madhu-1 added the VolunteersRequired Tag for issues where we need a volunteer to pick an issue label Sep 17, 2021
@humblec
Copy link
Collaborator

humblec commented Jan 4, 2022

This is not going to make it in the release 3.5, removing the milestone.

@humblec humblec removed this from the release-3.5.0 milestone Jan 4, 2022
@humblec humblec added this to the release-3.6 milestone Jan 27, 2022
@nixpanic
Copy link
Member

The PR ceph/go-ceph#638 in go-ceph might be useful for this.

@nixpanic nixpanic added the dependency/go-ceph depends on go-ceph functionality label Feb 15, 2022
@humblec
Copy link
Collaborator

humblec commented Feb 16, 2022

As discussed in the triage call, it would be difficult to get this done for 3.6 , so moving out of the release. We have to followup on the go ceph Log PR mentioned by Niels above and get our thoughts first then try to consume it in the csi. It is also going to be good amount of refactoring in place to get this done.

@humblec humblec removed this from the release-3.6 milestone Feb 16, 2022
@ansiwen
Copy link
Contributor

ansiwen commented Feb 16, 2022

I think we need to clarify the requirements here.

First of all, just because we are about to introduce some log interface in go-ceph doesn't mean we will produce any logs. 🥲 The main trigger for that work was, that in some cases (like orphan callbacks from ceph) there is nothing we could return an error to, so I thought it would be good to log a warning at least. But as a library go-ceph will not produce any logs by default, but only return errors. These errors can be logged on the client side (ceph-csi in this case), where there is still context. (More about context below.) Of course this could also be used to create extensive debugging/tracing output, but at the moment go-ceph has no such tracing in place and we also don't consume any from the ceph libraries.

Speaking of context: I did some research regarding this, because in go-ceph our API has no context parameters. That means the code inside of go-ceph, where we might add logging/tracing, has only two scopes: local (function) and global. Goroutines are anonymous by design, which means, without context parameter go-ceph code, that is called from several goroutines concurrently, has no chance to identify, in which goroutine it is running. (See https://go.dev/doc/faq#no_goroutine_id and https://blog.sgmansfield.com/2015/12/goroutine-ids/ ), so it can only log information from the go-ceph call itself. However, there are hacks, that we might use for debugging purposes. But they depend on implementation details and could break with new Go versions. Here for example I abused profile labels as goroutine context: https://go.dev/play/p/ABnAhj7nlcR

So I guess the path forward would be:

  1. implement logging of calls and errors on the client side (ceph-csi) with proper context.
  2. enable and forward tracing from ceph to the client, if existing
  3. add additional debug output inside of go-ceph

@ansiwen
Copy link
Contributor

ansiwen commented Feb 16, 2022

golang/go#51223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependency/go-ceph depends on go-ceph functionality keepalive This label can be used to disable stale bot activiity in the repo logging The Change is only in logging Priority-2 VolunteersRequired Tag for issues where we need a volunteer to pick an issue
Projects
None yet
Development

No branches or pull requests

8 participants