Invalid memory address crash when running a backup #403
So the only change you made to the test file is
or did something else change as well?
I wonder if the problem is related to the use of
Thanks for the quick answer. Without changing anything, the backup container won't start because this spun up the backup container on one of my worker nodes. I then sshd into that worker node, ran the backup command, and got this error:
Hence I also added the placement constraint on the backup container:
Then I got the error from above.
The current user I run commands from is not in the docker group, so I use sudo to execute commands as root when I want to manage my Docker containers. I personally wouldn't want to add the user to the docker group, as that would give that user pretty much root access. However, for testing I did that and got the same error:
Hmm, this is very hard for me to tell, as I can't really reproduce it in any way. To be sure it's not caused by your Docker version, I also created #404, which updates the version used in tests to 26, but everything is passing as expected. Right now, it's very hard for me to tell why you are seeing this (highly unexpected) error. My hunch is that the docker cli client somehow is
docker-volume-backup/cmd/backup/script.go Lines 104 to 116 in 2f7193a
Could you provide the output of
I've added a
It also reaches the last
If you want me to run a patch or add some debug lines, please just tell me!
What you could do is extend this block:
docker-volume-backup/cmd/backup/run_script.go Lines 18 to 27 in bf79c91
to read:

defer func() {
	if derr := recover(); derr != nil {
		asErr, ok := derr.(error)
		if ok {
			fmt.Printf("%s: %s\n", asErr, debug.Stack())
			err = errwrap.Wrap(asErr, "unexpected panic running script")
		} else {
			err = errwrap.Wrap(nil, fmt.Sprintf("%v", derr))
		}
	}
}()

which might tell us where the panic is being caused.
Thanks, that's very helpful, as I can now see where the panic is being caused:
Is a service you are trying to stop / restart not deployed in replicated mode?
Which would still not explain why the test case fails for you as well.
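For context on why that question matters: in the Docker SDK's swarm types, a service deployed with "mode: global" has no Replicated spec at all, so reading its replica count dereferences a nil pointer. Here is a minimal, hypothetical sketch (not code from this repository) illustrating that assumption:

```go
package main

import (
	"fmt"

	"github.com/docker/docker/api/types/swarm"
)

func main() {
	// A service deployed in global mode sets Spec.Mode.Global,
	// while Spec.Mode.Replicated stays nil.
	globalSpec := swarm.ServiceSpec{
		Mode: swarm.ServiceMode{Global: &swarm.GlobalService{}},
	}

	if globalSpec.Mode.Replicated == nil {
		fmt.Println("global service: no replica count to read")
		return
	}

	// For a global service this line would panic with
	// "invalid memory address or nil pointer dereference".
	fmt.Println(*globalSpec.Mode.Replicated.Replicas)
}
```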
There is indeed a bug in there: the error returned from the ServiceList call that queries for labeled services is only checked after its result has already been used. Could you change the block to read

if isDockerSwarm {
	allServices, err = s.cli.ServiceList(context.Background(), types.ServiceListOptions{})
	if err != nil {
		return noop, errwrap.Wrap(err, "error querying for services")
	}
	matchingServices, err := s.cli.ServiceList(context.Background(), types.ServiceListOptions{
		Filters: filters.NewArgs(filters.KeyValuePair{
			Key:   "label",
			Value: filterMatchLabel,
		}),
		Status: true,
	})
	if err != nil {
		return noop, errwrap.Wrap(err, "error querying for services to scale down")
	}
	for _, s := range matchingServices {
		servicesToScaleDown = append(servicesToScaleDown, handledSwarmService{
			serviceID:           s.ID,
			initialReplicaCount: *s.Spec.Mode.Replicated.Replicas,
		})
	}
}

and see what kind of error you receive?
Good catch! I've stopped all stacks except the example one and ran the program; now it works fine. However, having a container anywhere (even outside of the current stack) deployed with "mode: global" and the "backup" flag set via the label still results in this error:
So for example the following setup fails:
After removing the label in the influx.yml file, it works fine.
Is that with the patch suggested in #403 (comment) or without? Labeling globally deployed services is not supported, and tbh I don't know how it could be supported, as they can't be scaled down as easily. Not sure what the container-level approach does in that case, though: https://offen.github.io/docker-volume-backup/how-tos/use-with-docker-swarm.html
Yeah, that's with the patch included:

$ git diff
diff --git a/cmd/backup/run_script.go b/cmd/backup/run_script.go
index b9ada32..d636f3e 100644
--- a/cmd/backup/run_script.go
+++ b/cmd/backup/run_script.go
@@ -6,7 +6,7 @@ package main
import (
"errors"
"fmt"
-
+ "runtime/debug"
"github.com/offen/docker-volume-backup/internal/errwrap"
)
@@ -19,6 +19,7 @@ func runScript(c *Config) (err error) {
if derr := recover(); derr != nil {
asErr, ok := derr.(error)
if ok {
+ fmt.Printf("%s: %s\n", asErr, debug.Stack())
err = errwrap.Wrap(asErr, "unexpected panic running script")
} else {
err = errwrap.Wrap(nil, fmt.Sprintf("%v", derr))
diff --git a/cmd/backup/stop_restart.go b/cmd/backup/stop_restart.go
index 7fc558c..1c0e761 100644
--- a/cmd/backup/stop_restart.go
+++ b/cmd/backup/stop_restart.go
@@ -151,15 +151,15 @@ func (s *script) stopContainersAndServices() (func() error, error) {
}),
Status: true,
})
+ if err != nil {
+ return noop, errwrap.Wrap(err, "error querying for services to scale down")
+ }
for _, s := range matchingServices {
servicesToScaleDown = append(servicesToScaleDown, handledSwarmService{
serviceID: s.ID,
initialReplicaCount: *s.Spec.Mode.Replicated.Replicas,
})
}
- if err != nil {
- return noop, errwrap.Wrap(err, "error querying for services to scale down")
- }
}
if len(containersToStop) == 0 && len(servicesToScaleDown) == 0 {
@@ -359,3 +359,4 @@ func (s *script) stopContainersAndServices() (func() error, error) {
return nil
}, initialErr
}
+

Yeah, I'm fine with it not working in global mode, just wanted to mention it in case the patch should throw a proper error instead of a stack trace. Thanks for your help as well btw! :)
I think I know what's going on (there's no error, but also no replica info); I'll include it in my PR. Thanks for the input.
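To spell that out: for a labeled service deployed in global mode the ServiceList call returns successfully, but Spec.Mode.Replicated is nil, so there is no replica count to read and the dereference panics. A guard along these lines would turn the panic into a readable error; this is only a sketch of the idea, not the change that actually shipped, and it reuses the identifiers (matchingServices, handledSwarmService, noop, errwrap) from the snippets quoted above:

```go
for _, s := range matchingServices {
	// Globally deployed services carry no Replicated spec, so bail out
	// with a proper error instead of dereferencing a nil pointer.
	if s.Spec.Mode.Replicated == nil || s.Spec.Mode.Replicated.Replicas == nil {
		return noop, errwrap.Wrap(
			nil,
			fmt.Sprintf("service %s is not deployed in replicated mode and cannot be scaled down", s.Spec.Name),
		)
	}
	servicesToScaleDown = append(servicesToScaleDown, handledSwarmService{
		serviceID:           s.ID,
		initialReplicaCount: *s.Spec.Mode.Replicated.Replicas,
	})
}
```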
This is fixed in v2.39.1 |
Hey!
I'm trying to set up the project with Docker Swarm, but whenever I run a backup I get a null pointer:
docker stack ps:
docker stack file:
This also happens with the docker file from the tests folder in your repo (slightly adjusted to make sure that the backup container runs on the manager node):
Expected behavior
The backup shouldn't crash.
Version (please complete the following information):
Additional context
I have 3 VMs running in a basic swarm network; they all run Ubuntu Server 22.04: