log the commands we execute for better debugging #1619
I don't know if there is a particularly easy way to do this for the "built-in" APIs such as the RBD calls. There's some logging built into the ceph code that you can turn on by loading a ceph config file or using the set-option function calls in the rados lib. For the cephfs admin API I purposefully tried to make this easy; see our test suite here for an example. That approach can be adapted to log the JSON commands and the responses back from ceph. Note that if you want to log the results of the processing go-ceph does, you still need to log them directly after the go-ceph call is made.
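For reference, here is a minimal sketch of that set-option route using calls that exist in go-ceph's rados package (NewConn, ReadDefaultConfigFile, SetConfigOption, Connect); the specific option names and values (debug_rados, log_file) are ordinary Ceph client settings chosen for illustration:

package main

import (
	"fmt"

	"github.com/ceph/go-ceph/rados"
)

// connectWithDebugLogging opens a rados connection with librados' own
// debug logging turned up and redirected to a file.
func connectWithDebugLogging() (*rados.Conn, error) {
	conn, err := rados.NewConn()
	if err != nil {
		return nil, err
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		return nil, err
	}
	// Raise the librados debug level and write client-side logs to a file.
	if err := conn.SetConfigOption("debug_rados", "20"); err != nil {
		return nil, err
	}
	if err := conn.SetConfigOption("log_file", "/tmp/go-ceph-client.log"); err != nil {
		return nil, err
	}
	if err := conn.Connect(); err != nil {
		return nil, err
	}
	fmt.Println("connected with librados debug logging enabled")
	return conn, nil
}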
@phlogistonjohn Thanks for the input. Yes, you are right; I am also not sure there is an easy way to log the commands with the "built-in" APIs (I was thinking about enabling logging at the CSI level, not at the ceph level, as we (I) cannot analyze the latter easily :D). I was likewise thinking of introducing custom logging in cephcsi to keep track of the commands we execute, which would help with debugging. @dillaman @ShyamsundarR @nixpanic @humblec any suggestions on this one?
package cephwrap

import (
	"context"
	"time"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

// Aliases for the go-ceph types referenced by the interfaces below.
type (
	CreateOption = rados.CreateOption
	OmapListFunc = rados.OmapListFunc
	ImageOptions = rbd.ImageOptions
	ImageWatcher = rbd.ImageWatcher
)

// IOContext matches the IOContext method calls from go-ceph's rados package used by CSI.
type IOContext interface {
	Create(oid string, exclusive CreateOption) error
	ListOmapValues(oid string, startAfter string, filterPrefix string, maxReturn int64, listFn OmapListFunc) error
}

// RBD matches the package-level function calls from go-ceph's rbd package used by CSI.
type RBD interface {
	CloneImage(ioctx *rados.IOContext, parentName, snapName string, destctx *rados.IOContext, name string, rio *ImageOptions) error
}

// Image matches the Image method calls from go-ceph's rbd package used by CSI.
type Image interface {
	ListWatchers() ([]ImageWatcher, error)
}
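// Not part of the original sketch: a compile-time check that the concrete
// go-ceph type already satisfies the interface (which holds as long as the
// interface methods above mirror go-ceph's signatures exactly).
var _ IOContext = (*rados.IOContext)(nil)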
// LogWatchIOContext wraps a rados.IOContext together with a logger and a
// parent context used to derive per-call deadlines.
type LogWatchIOContext struct {
	ioctx   *rados.IOContext
	log     SpecialLogger
	context context.Context
}

// Create wraps the go-ceph IOContext Create call. It logs when the function
// starts and exits, as well as setting up a background log to emit warnings
// if the call has blocked for longer than the specified deadline.
func (ioctx *LogWatchIOContext) Create(oid string, exclusive CreateOption) error {
	ioctx.log.Enter(oid, exclusive)
	ctx, cancel := context.WithDeadline(ioctx.context, standardDeadline())
	defer cancel()
	ioctx.log.Monitor(ctx)
	err := ioctx.ioctx.Create(oid, exclusive)
	ioctx.log.Exit(err)
	return err
}

func standardDeadline() time.Time {
	return time.Now().Add(someDuration)
}
func examples() {
	// a logging type made for monitoring API calls, with
	// whatever convenience methods you choose
	log := NewSpecialLogger()

	// Set up a logged-and-watched wrapper around a go-ceph rados IOContext. Use it to
	// create an object. It internally logs the args and return value and, if the call
	// blocks for a long time, logs that too (in the background).
	lioctx := NewLogWatchIOContext(ioctx, log, ctx)
	err := lioctx.Create(myOid, rados.CreateExclusive)
	// do stuff with results

	// Set up a logged-and-watched wrapper around the go-ceph rbd package-level calls.
	// Use it to clone an rbd image. It internally logs the args and return value and,
	// if the call blocks for a long time, logs that too (in the background).
	lrbd := NewLogWatchRBD(log, ctx)
	err = lrbd.CloneImage(ioctx, name1, name2, destIoctx, name3, opts)
	// do stuff with results

	// Set up a logged-and-watched wrapper around the go-ceph rbd Image type. Use it to
	// list watchers. It internally logs the args and return value and, if the call
	// blocks for a long time, logs that too (in the background).
	limage := NewLogWatchImage(image, log, ctx)
	w, err := limage.ListWatchers()
	// do stuff with results
}

Above is a very (very!) rough sketch of what I think CSI might want to do to log the things CSI cares about when making use of go-ceph. I don't think all functions need to be wrapped, just those that reach out to the ceph cluster and could block. I propose defining interfaces for the various "items" ceph-csi uses from go-ceph. Then, based on its needs, the csi code defines simple uniform wrappers around go-ceph that match the interfaces (go-ceph itself meets those interfaces too!). There are lots of places to change or tweak this concept. You don't need to do the background logging for all calls, for example. You could also put the go-ceph call into a goroutine (test this please!) rather than the logging; one way that variant could look is sketched below. It's all up to what works for ceph-csi. Please feel free to pick holes in this suggestion and/or ask questions.
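To make the goroutine variant concrete, here is a minimal sketch building on the hypothetical types from the suggestion above (SpecialLogger's Warnf is assumed to be one of its convenience methods); note that the underlying librados call is not cancelled by the timeout, it keeps running until it returns on its own:

// CreateWithWatchdog runs the go-ceph call in a goroutine and logs a warning
// when it has not returned by the deadline, then keeps waiting for the result.
func (ioctx *LogWatchIOContext) CreateWithWatchdog(oid string, exclusive CreateOption) error {
	ioctx.log.Enter(oid, exclusive)
	done := make(chan error, 1)
	go func() {
		// The librados call itself; it cannot be interrupted, only observed.
		done <- ioctx.ioctx.Create(oid, exclusive)
	}()
	timer := time.NewTimer(someDuration)
	defer timer.Stop()
	for {
		select {
		case err := <-done:
			ioctx.log.Exit(err)
			return err
		case <-timer.C:
			// Fires at most once; afterwards we simply keep waiting on done.
			ioctx.log.Warnf("rados create of %q still blocked after %v", oid, someDuration)
		}
	}
}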
@yati1998 is working on this issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This is not going to make it into release 3.5; removing the milestone.
The PR ceph/go-ceph#638 in go-ceph might be useful for this.
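If that PR lands roughly in the shape discussed, wiring it up from ceph-csi might look something like the sketch below; the package path (common/log) and setter names (SetWarnf, SetDebugf) are assumptions based on the PR discussion, so check the merged API before relying on them:

package main

import (
	"log"

	gocephlog "github.com/ceph/go-ceph/common/log"
)

// wireGoCephLogging routes go-ceph's internal warning/debug output into the
// caller's own logger. Setter names are assumed from the PR, not confirmed.
func wireGoCephLogging() {
	gocephlog.SetWarnf(func(format string, v ...interface{}) {
		log.Printf("go-ceph warning: "+format, v...)
	})
	gocephlog.SetDebugf(func(format string, v ...interface{}) {
		log.Printf("go-ceph debug: "+format, v...)
	})
}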
As discussed in the triage call, it would be difficult to get this done for 3.6, so moving it out of the release. We have to follow up on the go-ceph log PR mentioned by Niels above, settle our thoughts first, and then try to consume it in the CSI. A good amount of refactoring is also going to be needed to get this done.
I think we need to clarify the requirements here. First of all, just because we are about to introduce a log interface in go-ceph doesn't mean we will produce any logs. 🥲 The main trigger for that work was that in some cases (like orphan callbacks from ceph) there is nothing we could return an error to, so I thought it would be good to at least log a warning. But as a library, go-ceph will not produce any logs by default; it will only return errors. These errors can be logged on the client side (ceph-csi in this case), where there is still context. (More about context below.) Of course this could also be used to create extensive debugging/tracing output, but at the moment go-ceph has no such tracing in place and we also don't consume any from the ceph libraries.

Speaking of context: I did some research on this, because in go-ceph our API has no context parameters. That means the code inside go-ceph, where we might add logging/tracing, has only two scopes: local (function) and global. Goroutines are anonymous by design, which means that without a context parameter, go-ceph code called from several goroutines concurrently has no way to identify which goroutine it is running in (see https://go.dev/doc/faq#no_goroutine_id and https://blog.sgmansfield.com/2015/12/goroutine-ids/ ), so it can only log information from the go-ceph call itself.

However, there are hacks that we might use for debugging purposes, though they depend on implementation details and could break with new Go versions. Here, for example, I abused profile labels as goroutine context: https://go.dev/play/p/ABnAhj7nlcR (the supported part of that API is sketched below).

So I guess the path forward would be:
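For reference, a minimal sketch of the supported piece behind that profile-label trick, using only the public runtime/pprof API (the playground hack additionally reads labels back through unexported runtime internals, which is exactly the part that can break between Go versions):

package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

func main() {
	ctx := context.Background()
	// pprof.Do attaches the labels to the current goroutine (and any it
	// spawns) for the duration of the callback; pprof.Label reads a label
	// back from the context passed into the callback.
	pprof.Do(ctx, pprof.Labels("request_id", "42"), func(ctx context.Context) {
		if v, ok := pprof.Label(ctx, "request_id"); ok {
			fmt.Println("running with request_id", v)
		}
	})
}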
As we are moving towards integrating with the go-ceph APIs, it would be good to log the commands equivalent to the rbd/ceph/rados CLI, which will help us in debugging issues (a rough illustration follows below).
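As a rough illustration only, with a made-up helper name rather than existing ceph-csi code, such a log line could look like this:

package main

import "log"

// logEquivalentCLI prints the CLI command an operator could run to reproduce
// the effect of the go-ceph call we are about to make (hypothetical helper).
func logEquivalentCLI(pool, parent, snap, clone string) {
	log.Printf("executing equivalent of: rbd clone %s/%s@%s %s/%s",
		pool, parent, snap, pool, clone)
}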