dvc: release locks when running a command #2584
Graph shouldn't be recollected while the repo lock is off. Ideally we should assert that; it might catch some nasty errors now or in the future.
Looks like we use `@locked_stage` on quite small ops. This raises several questions:
- how do you decide what to wrap and what not to? I.e. why is `Stage.commit()` not locked?
- how will it affect performance?
- what will happen if some locked op calls another locked op? It doesn't look like we provide reentrance, so will we get an `AlreadyLocked` error or something?
Overall this looks fragile. The presupposition, I guess, should be that the dvc process might be in one of two states:
- collecting/updating stage files (repo lock should be on)
- running command/updating stage outputs (stage locked, repo unlocked)
So the structure of the code, and maybe some asserts, should check that:
- we are not collecting stages while the repo lock is off
- some lock is always on; there shouldn't be in-betweens
- we only update (read?) stage outs when the stage lock is on
Not sure how this works with deps.
This probably looks vague because it's nowhere explained what stage lock means. And there are lots of possibilities:
Or any combination of the above. If, say, stage lock means exclusive access to outs, then it explains why we lock the stage and its dep stages (their outs are our deps) on run. This, however, means that This should be stated very clearly. Until it is, questions like "may I dump the stage while I don't have repo lock?" will arise.
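The invariant suggested above (never collect stages while the repo lock is off) could be enforced with a plain assert. The following is only a minimal sketch: `RepoSketch`, `lock()`, `unlocked()` and `collect_stages()` are hypothetical names made up for illustration, not dvc's actual API.

```python
import threading
from contextlib import contextmanager


class RepoSketch:
    """Hypothetical stand-in for a repo that tracks whether its lock is held."""

    def __init__(self):
        self._lock = threading.Lock()
        self.locked = False

    @contextmanager
    def lock(self):
        # Hold the repo lock for the duration of the block.
        with self._lock:
            self.locked = True
            try:
                yield
            finally:
                self.locked = False

    @contextmanager
    def unlocked(self):
        # Temporarily drop the repo lock, e.g. while a stage command runs,
        # and take it back before returning to the caller.
        assert self.locked, "repo lock must be held before it can be released"
        self.locked = False
        self._lock.release()
        try:
            yield
        finally:
            self._lock.acquire()
            self.locked = True

    def collect_stages(self):
        # The invariant from the review: never collect the graph unlocked.
        assert self.locked, "collecting stages while repo lock is off"
        return []
```

With this shape, a call to `collect_stages()` inside `with repo.unlocked():` fails immediately instead of silently reading a graph that another process may be changing.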
Question: Shouldn't `Stage._compute_md5` be wrapped in `@rwlocked(read=["deps","outs"])`?
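For readers unfamiliar with the decorator being discussed, here is a rough sketch of what a decorator with the `rwlocked(read=[...], write=[...])` shape might do. It only records which paths would be locked; the `held` log, `StageSketch` and the method body are assumptions for illustration and do not reflect the PR's real implementation.

```python
from functools import wraps


def rwlocked(read=None, write=None):
    """Sketch: resolve attribute names to paths and bracket the call."""
    read = read or []
    write = write or []

    def decorator(func):
        @wraps(func)
        def wrapper(stage, *args, **kwargs):
            # Collect the paths named by the given stage attributes.
            read_paths = [p for attr in read for p in getattr(stage, attr)]
            write_paths = [p for attr in write for p in getattr(stage, attr)]
            stage.held.append(("acquire", read_paths, write_paths))
            try:
                return func(stage, *args, **kwargs)
            finally:
                # Always release, even if the wrapped method raises.
                stage.held.append(("release", read_paths, write_paths))

        return wrapper

    return decorator


class StageSketch:
    """Hypothetical stage with deps, outs and a log of lock activity."""

    def __init__(self, deps, outs):
        self.deps = deps
        self.outs = outs
        self.held = []

    @rwlocked(read=["deps", "outs"])
    def compute_md5(self):
        return "md5-placeholder"
```

Computing a checksum only reads deps and outs, which is why both appear under `read` and nothing under `write`.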
@pared Good catch! You are right, it should even be that for the
One thing about the
@pared Good point! So looks like we do need some reentrance after all. Taking a look...
Still think we should use inside-out schema, e.g. `{'read': {str: [INFO]}, 'write': {str: INFO}}`. Code samples below become simpler and more efficient. If you worry about paths being duplicated, they won't be, since a given path can only be locked either for read or for write.
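To illustrate why the inside-out schema keeps lookups simple, here is a small sketch of a conflict check over a dict of that shape. The `conflicts` helper and the `pid`/`cmd` keys of the info dicts are assumptions for illustration, and the sketch ignores overlapping paths (a directory versus a file inside it).

```python
def conflicts(lock, path, mode):
    """Return the infos of holders that block taking `path` in `mode`.

    `lock` follows the shape {'read': {path: [info, ...]}, 'write': {path: info}}.
    """
    # A writer blocks both readers and other writers.
    writer = lock["write"].get(path)
    blockers = [writer] if writer is not None else []
    if mode == "write":
        # Existing readers only block a would-be writer.
        blockers += lock["read"].get(path, [])
    return blockers


# Example state: two readers of one path, one writer of another.
example_lock = {
    "read": {"data.csv": [{"pid": 1, "cmd": "dvc run a"}, {"pid": 2, "cmd": "dvc run b"}]},
    "write": {"model.pkl": {"pid": 3, "cmd": "dvc run c"}},
}
```

Each check is a direct dictionary lookup by path, with no scan over all holders.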
Adjusted the schema and made rwlock reentrant, so
So this is basically done.
Two things on tests:
- tmpdir is deprecated, it's tmp_path now
- need to test that reentrance won't work for different pids
Currently only one instance of dvc can run in a particular repo because of the `.dvc/lock`. That gets frustrating when you are running some long-running command and are blocked from doing anything else.

This PR solves that issue for `dvc run` by releasing the repo lock while running the command. To protect against conflicts, we now use rwlock, which keeps the list of readers and writers for particular paths. The process looks like this:

1) acquire read locks for deps;
2) acquire write locks for outs;
3) release repo lock;
4) run cmd;
5) acquire repo lock;
6) release write locks for outs;
7) release read locks for deps;

We also take rwlocks for operations such as `import`, `add` and `checkout` to ensure that we won't hit the same files that are being used by another `dvc run` in the background.

We don't acquire rwlocks for the stage file itself, as we don't really care about anyone overwriting a stage file: there is very little difference here between parallel runs and sequential runs that overwrite the same stage file.

Next steps might be to release the repo lock on `import` (when downloading stuff) and when computing checksums or copying files.

Fixes iterative#755
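The locking sequence in the commit message can be sketched with nested context managers, so that locks are released in the reverse order of acquisition. This is an illustrative sketch only: `held`, `released` and `run_stage` are hypothetical helpers that append to a log instead of touching real lock files.

```python
from contextlib import contextmanager


@contextmanager
def held(log, name):
    # Acquire on entry, release on exit (also on error).
    log.append(f"acquire {name}")
    try:
        yield
    finally:
        log.append(f"release {name}")


@contextmanager
def released(log, name):
    # The inverse: drop a lock for the block, then take it back.
    log.append(f"release {name}")
    try:
        yield
    finally:
        log.append(f"acquire {name}")


def run_stage(cmd):
    """Run `cmd` (a callable) with the repo lock dropped around it."""
    log = []
    with held(log, "repo lock"):  # dvc starts out holding the repo lock
        with held(log, "read locks for deps"), held(log, "write locks for outs"):
            with released(log, "repo lock"):
                cmd()
                log.append("run cmd")
    return log
```

Because the write locks on outs stay held while the repo lock is released, another dvc process can take the repo lock but cannot touch the outputs being produced.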
Fixed.
Not sure I follow. It sounds like a regular rwlock operation to me. The access is info-based (pid + cmd), so pretty much any test like https://github.com/iterative/dvc/pull/2584/files#diff-15338413b9f377c5906d1b1b3498f98cR32 tests that. Could you elaborate?
I see, you use different commands, this should be resolved then.
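A sketch of what info-based reentrance could look like: the same holder (same pid + cmd) may re-enter a write lock it already owns, while any other holder is rejected. `acquire_write`, `LockError` and the info dict layout are assumptions for illustration, not the PR's actual code.

```python
class LockError(Exception):
    """Raised when a path is locked by a different holder."""


def acquire_write(lock, path, info):
    """Return True if a new write lock was taken, False on reentrant access."""
    holder = lock["write"].get(path)
    if holder == info:
        # Same pid + cmd already holds the lock: reentrant, nothing to do.
        return False
    if holder is not None:
        raise LockError(f"{path} is being written by {holder}")
    # Readers other than ourselves also block a writer.
    readers = [r for r in lock["read"].get(path, []) if r != info]
    if readers:
        raise LockError(f"{path} is being read by {readers}")
    lock["write"][path] = info
    return True
```

A test for the different-pid case then only needs two infos that differ in `pid` and an assertion that the second acquisition raises.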
End on the epoch 🎉
Currently only one instance of dvc can run in a particular repo because of the `.dvc/lock`. That gets frustrating when you are running some long-running command and are blocked from doing anything else.
This PR solves that issue for `dvc run` by releasing the repo lock while running the command. To protect against conflicts, we now use rwlock, which keeps the list of readers and writers for particular paths. The process looks like this:
1) acquire read locks for deps;
2) acquire write locks for outs;
3) release repo lock;
4) run cmd;
5) acquire repo lock;
6) release write locks for outs;
7) release read locks for deps;
We also take rwlocks for operations such as `import`, `add` and `checkout` to ensure that we won't hit the same files that are being used by another `dvc run` in the background.
We don't acquire rwlocks for the stage file itself, as we don't really care about anyone overwriting a stage file: there is very little difference here between parallel runs and sequential runs that overwrite the same stage file.
Next steps might be to release the repo lock on `import` (when downloading stuff) and when computing checksums or copying files.
Related to #755
TODO:
- wait indefinitely until we get the repo lock back after running the command. No urgent need: the user can `dvc run --no-exec` + `dvc commit` if it fails to acquire the lock back.