WIP: Start of work on daemon #157

Merged: 63 commits, Sep 10, 2015
Conversation

@petervo (Contributor) commented Jun 13, 2015

Replaces #116

@petervo (Contributor, Author) commented Jun 13, 2015

Made some progress on this. It still needs review and testing. OSStub needs to be renamed since it isn't a stub anymore. On naming: should I change to using a prefix everywhere, something like rpmostreed?

I skipped over the package arguments to rebase as well as any update scheduling functionality since neither of those are supported by current clients. I figured those can be separate PRs after this makes it in.

I changed the on-complete message to properties on the transaction object to avoid the race where a client that doesn't attach a listener to the transaction object in time never learns the result of its operation. Now the transaction stays around for a few minutes after completion with the result info as properties. While at least no results are lost, a client can still miss progress messages that are sent. Clients can work around this by listening to all transactions and only displaying the messages for their own object once they know its path. I'm not sure that's ideal; if we decide to stick with this I'll need to rework the CLI code to do it.

@mbarnes suggested having a Start method on the transaction, though that does make the server side quite a bit more complicated and is slightly awkward API-wise (command -> Start). Another idea might be to move all our methods on OS that return transaction paths to the transaction interface itself. OS would then have a GetTransaction method: you get the transaction object and then call Upgrade or Rollback or whatever the case may be. If you don't use it after a few minutes it goes away. I'm not sure that's much better, though, so I figured I'd get some feedback before implementing.

We also never shut the daemon down. Do we want to keep it like that? Based on my understanding of our last conversation on this, it's probably not possible to be 100% race free, but we can get close, since we can now allow two daemons to run at the same time while one finishes up any processing it might be doing. Is that something we want to implement, or would you rather just leave it running?

@mbarnes (Contributor) commented Jun 13, 2015

I don't have a strong preference either way, but I think calling Start() on a transaction path returned by Upgrade() and calling Upgrade() on a transaction path returned by GetTransaction() are pretty much equivalent in terms of complexity.

Notice the basic protocol is the same:

C: "I wanna do a thing"
S: "Ok, here's a token, lemme know when you're ready"
C: (connects signals, etc.)  "token: all set"
S: "starting..."

@petervo added the WIP label on Jun 15, 2015
@petervo (Contributor, Author) commented Jun 16, 2015

@cgwalters any thoughts on my questions above? I'd like to try to finish the implementation this week.

@cgwalters (Member)

I know this seems like endless delay but I keep being interrupted by other burning issues. I keep thinking that's going to stop but it hasn't =/

My offhand thought is that it would make sense to have an explicit Transaction object (and at most one at a time right?). I think it's close to what PackageKit does.

I don't much like timeouts though - could it be tied to the lifecycle of the client connection? That would work well for the command line. As long as we can print out something like Transaction lock held by :0.132 (/usr/bin/cockpit-ws), an admin can step in. And clearly we could support an "rpm-ostree transaction force-release" or something on the command line.

As far as the daemon auto-exiting, I've given up on that for now. Revisit with kdbus?

@petervo (Contributor, Author) commented Jun 18, 2015

If we go with the GetTransaction -> Rollback approach, then I would think you could have multiple transaction objects, with only one actually executing a command at any given time. That would also work better for legitimately long-running connections like Cockpit. If we do Rollback -> Start then yes, there should only be one at any given time.

I'll leave auto exiting out for now.

@mbarnes (Contributor) commented Aug 21, 2015

I spent a couple weeks reworking transactions.

To summarize:

  • Each transaction now operates on its own transient peer-to-peer bus to avoid spamming the system bus with signals that are only intended for one recipient. This also solves some of the cleanup issues discussed above since the daemon can simply wait for the client to close its bus connection, either explicitly or by terminating.
  • Methods that start a transaction now return a bus address rather than an object path. The client should open a connection to the returned bus address, prepare itself to receive signals on the Transaction interface (on the root object path), and then call the Start method (see the client-side sketch after this list). I believe this handshake should avoid any raciness or missed signals.
  • When the transaction completes, it emits a Finished(success, errormsg) signal. The client can close its peer-to-peer connection at this point.
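
To make the handshake concrete, here is a minimal client-side sketch in C using GDBus. The interface name (org.projectatomic.rpmostree1.Transaction), the Finished signal signature, and the error handling are assumptions based on the description above, not the code on the branch.

/* Sketch only: drive a transaction from the client side.
 * 1. connect to the bus address returned by e.g. OS.Upgrade()
 * 2. subscribe to Transaction signals on "/" before starting
 * 3. call Start(), then wait for Finished(success, errormsg)
 * Interface and signal details are assumptions. */
#include <gio/gio.h>

static void
on_transaction_signal (GDBusConnection *connection,
                       const gchar *sender_name,
                       const gchar *object_path,
                       const gchar *interface_name,
                       const gchar *signal_name,
                       GVariant *parameters,
                       gpointer user_data)
{
  GMainLoop *loop = user_data;

  if (g_strcmp0 (signal_name, "Finished") == 0)
    {
      gboolean success;
      const gchar *error_message;
      g_variant_get (parameters, "(b&s)", &success, &error_message);
      g_print ("Finished: %s (%s)\n", success ? "ok" : "failed", error_message);
      g_main_loop_quit (loop);
    }
  else
    {
      gchar *text = g_variant_print (parameters, TRUE);
      g_print ("%s: %s\n", signal_name, text);   /* progress, messages, ... */
      g_free (text);
    }
}

static gboolean
run_transaction (const gchar *transaction_address, GError **error)
{
  /* 1. Open a peer-to-peer connection to the returned bus address. */
  GDBusConnection *peer =
    g_dbus_connection_new_for_address_sync (transaction_address,
                                            G_DBUS_CONNECTION_FLAGS_AUTHENTICATION_CLIENT,
                                            NULL, NULL, error);
  if (peer == NULL)
    return FALSE;

  GMainLoop *loop = g_main_loop_new (NULL, FALSE);

  /* 2. Subscribe before Start() so no signals can be missed. */
  g_dbus_connection_signal_subscribe (peer, NULL,
                                      "org.projectatomic.rpmostree1.Transaction",
                                      NULL, "/", NULL,
                                      G_DBUS_SIGNAL_FLAGS_NONE,
                                      on_transaction_signal, loop, NULL);

  /* 3. Start the transaction and wait for Finished. */
  GVariant *reply =
    g_dbus_connection_call_sync (peer, NULL, "/",
                                 "org.projectatomic.rpmostree1.Transaction",
                                 "Start", NULL, NULL,
                                 G_DBUS_CALL_FLAGS_NONE, -1, NULL, error);
  if (reply == NULL)
    {
      g_main_loop_unref (loop);
      g_object_unref (peer);
      return FALSE;
    }
  g_variant_unref (reply);

  g_main_loop_run (loop);

  /* 4. Closing the connection is how the daemon learns we're done. */
  g_dbus_connection_close_sync (peer, NULL, NULL);
  g_object_unref (peer);
  g_main_loop_unref (loop);
  return TRUE;
}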

I've done only manual testing so far but it seems to be working, at least for the rpm-ostree tool, and transactions are getting finalized in the daemon when they should.

Some questions for @petervo:

  • A few OS methods -- namely Rebase, GetCachedRebaseRpmDiff, and DownloadRebaseRpmDiff -- take a packages string list that's currently unused. Can you refresh my memory what that was meant for? Some kind of filter?
  • The auth_check_root_or_access_denied authorization handler rejects method calls by anyone but root, but couldn't the same thing be accomplished with a deny-all default policy (a fuller sketch follows this list)? i.e.
  <policy context="default">
    <deny send_destination="org.projectatomic.rpmostree1"/>
  </policy>
  • Was any progress made on the Cockpit side for this reworked D-Bus API we hashed out a while back? I couldn't find it if there was.
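
For illustration, a complete policy of that shape might look like the following. This is only a sketch: the file name and whether root gets a blanket allow are assumptions, not what the branch actually ships.

<!-- Hypothetical /etc/dbus-1/system.d/org.projectatomic.rpmostree1.conf -->
<busconfig>
  <!-- root may own the name and call any method -->
  <policy user="root">
    <allow own="org.projectatomic.rpmostree1"/>
    <allow send_destination="org.projectatomic.rpmostree1"/>
  </policy>
  <!-- everyone else is denied by default -->
  <policy context="default">
    <deny send_destination="org.projectatomic.rpmostree1"/>
  </policy>
</busconfig>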

All my work to date is pushed to the branch. I'm wondering if it would be easier for a first-pass review if we squash the branch and split commits out by class or file (one commit for Sysroot, one commit for OS, etc.). We don't need to review all the intermediate commits.

Also the classes still need a common prefix -- rpmostreed or whatever. OSStub is still a thing. I haven't bothered with that stuff yet just because I'm lazy.

@petervo (Contributor, Author) commented Aug 21, 2015

Is the bus connection peer-to-peer only, or can multiple clients connect to it? In Cockpit right now the designs don't call for detailed progress information (though it was suggested at one point), but we do change the UI when commands are being run via the command line or some other way. Being able to look at the transaction and know what it is doing is important: for some transactions, e.g. upgrade, we'll want the UI to prevent the user from doing anything until it is done, while others, such as the ones that check for updates, won't restrict the UI other than letting the user know it's checking.

  • The packages strings were for a feature that Colin said was coming soon, where you could include layered packages in a rebase. Last I checked, which was a while ago, it wasn't implemented yet. But I'd check with him on whether he wants it in for future-proofing or whether it's OK to leave out for now.
  • The deny-anyone-but-root handler was a temporary solution, with the idea that it would eventually grow into a more fine-grained polkit-based permissions check. But I'm fine with replacing it with a deny-all policy.
  • Squashing and adding prefixes and proper names is good. That will make review easier.
  • The Cockpit code was reworked, but since the API was still not final it isn't up anywhere yet. Once this is stabilized I'll make the needed changes and open a Cockpit PR for it.

@mbarnes (Contributor) commented Aug 21, 2015

The way it's written at present, the daemon only accepts the first connection to each peer-to-peer bus, which is assumed to be the method caller. That was mainly to keep the daemon code simple. I could lift that restriction, but I was initially hoping to keep transactions private to their caller.

There is also the ActiveTransaction property of Sysroot, which now shows a bus address instead of an object path (just to have it set to something), but we could change the value to a tuple and include the currently executing method name, the caller's bus address, and anything else Cockpit needs to know. Would that suffice?
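
For what it's worth, reading such a tuple from Cockpit or another client could look roughly like the following. This is only a sketch: it assumes the value ends up as a simple (method name, sender) string pair, and the Sysroot object path and interface names are guesses, not confirmed API.

/* Sketch: read a hypothetical (method name, sender) ActiveTransaction
 * tuple from the Sysroot object.  Names and paths are assumptions. */
#include <gio/gio.h>

static void
print_active_transaction (void)
{
  GError *error = NULL;
  GDBusProxy *sysroot =
    g_dbus_proxy_new_for_bus_sync (G_BUS_TYPE_SYSTEM,
                                   G_DBUS_PROXY_FLAGS_NONE, NULL,
                                   "org.projectatomic.rpmostree1",
                                   "/org/projectatomic/rpmostree1/Sysroot",
                                   "org.projectatomic.rpmostree1.Sysroot",
                                   NULL, &error);
  if (sysroot == NULL)
    {
      g_printerr ("%s\n", error->message);
      g_clear_error (&error);
      return;
    }

  GVariant *value = g_dbus_proxy_get_cached_property (sysroot, "ActiveTransaction");
  if (value != NULL)
    {
      const gchar *method, *sender;
      g_variant_get (value, "(&s&s)", &method, &sender);
      if (*method != '\0')
        g_print ("%s in progress, started by %s\n", method, sender);
      else
        g_print ("no transaction in progress\n");
      g_variant_unref (value);
    }

  g_object_unref (sysroot);
}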

@petervo (Contributor, Author) commented Aug 21, 2015

I'm not convinced that it's worth hiding the data from other users; I worry that the next project that comes by will want something more, leading to a lot of churn in what's in that tuple and what gets returned by the method calls. The amount of data output to the system bus isn't that crazy compared to, say, what NetworkManager does. But I'll defer to you guys when it comes to future planning; adding the method name to ActiveTransaction does solve the current known use cases.

@cgwalters (Member)

We do need to support starting an upgrade from the command line and seeing the result in Cockpit. Since the connection is p2p, it won't actually work to have Cockpit connect to it, right?

D-Bus is somewhat optimized for emitting signals when no one is listening; that's the whole idea behind AddMatch etc. (However, it still wakes up the bus process currently, which is something kdbus fixes.)

@cgwalters (Member)

Another use case is starting an upgrade, having your ssh connection drop, then sshing back in and typing rpm-ostree upgrade again. Ensuring that use case works should also drive the implementation here.

There are some interesting details, of course... in this case the client should detect that there's already an upgrade in progress and have its SIGINT handler interrupt the existing one.

@cgwalters (Member)

So I think I'm with @petervo here in being uncertain about the peer-to-peer aspect. That doesn't mean we can't do it, but supporting multiple clients is a lot of the rationale for the daemon, right? Or is there anything I'm missing?

@mbarnes (Contributor) commented Aug 25, 2015

See the result in Cockpit or see the progress in Cockpit? Key difference.

@cgwalters (Member)

Both... a design goal of Cockpit is being able to work seamlessly between the traditional command line and the GUI simultaneously. And ignoring Cockpit, in the dropped-ssh case it'd be a lot nicer admin UX to see the ongoing progress with the new client, i.e.:

ssh hostname
rpm-ostree upgrade
Existing upgrade in progress, reattaching.  Control-C to cancel.
Receiving objects: ...

vs

ssh hostname
rpm-ostree upgrade
Existing upgrade in progress, waiting for it to finish...
(no progress)

Right?

@mbarnes (Contributor) commented Aug 25, 2015

Makes sense. I still think that's best accomplished with peer-to-peer connections, but it will require a bit of redesign to eliminate races for clients connecting mid-stream.

@mbarnes (Contributor) commented Aug 27, 2015

@cgwalters I believe this latest batch of commits addresses your use case.

If a transaction -- say, an upgrade -- is in progress and another client issues an identical request, the bus address of the ongoing transaction is shared with the second client, allowing it to "tune in" to the progress messages. I didn't test the exact scenario you described with the dropped ssh connection, but I did observe identical progress messages from two simultaneous rpm-ostree upgrade commands.
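
Roughly, the matching on the daemon side amounts to something like the following sketch; the ExampleTransaction type and helper are stand-ins for illustration, not the actual classes on the branch.

/* Sketch: decide whether a new request can "tune in" to an ongoing
 * transaction.  Type and field names here are hypothetical. */
#include <gio/gio.h>

typedef struct {
  gchar    *method_name;   /* e.g. "Upgrade" */
  GVariant *parameters;    /* the original method arguments */
  gchar    *bus_address;   /* peer-to-peer address clients connect to */
} ExampleTransaction;

/* Returns the existing transaction's bus address when the request is
 * identical (same method name and parameters), otherwise NULL to tell
 * the caller to create a fresh transaction. */
static const gchar *
maybe_reuse_transaction (ExampleTransaction *active,
                         const gchar *method_name,
                         GVariant *parameters)
{
  if (active == NULL)
    return NULL;

  if (g_strcmp0 (active->method_name, method_name) == 0 &&
      g_variant_equal (active->parameters, parameters))
    return active->bus_address;

  return NULL;
}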

Given that we're under a time crunch now, I think code review can begin. Unless there are major objections to this design, most of the remaining work should be polish and getting the test suite working.

mbarnes and others added 26 commits September 9, 2015 22:00
Implementing a template pattern for transactions.

The TransactionClass is now abstract, and transaction_new() is replaced
with various method-specific functions like transaction_new_upgrade().
These custom subclasses live in a new file transaction-types.[ch].

Further, transaction_monitor_new_transaction() is replaced with
transaction_monitor_add().  So the handlers for "OS" interface methods
need only create an appropriate transaction instance and hand it off to
the transaction monitor.
Transaction progress and message signals are really only intended for
one recipient: the client that invoked the method.  Use a peer-to-peer
connection for transactions so we're not spamming the system bus.

This entails returning a bus address rather than an object path in
methods that use transactions.  The client opens a connection to the
bus address, connects handlers to the Transaction interface (on path
"/"), and then invokes the Start() method.

To finish a transaction, the client need only close the connection,
either explicitly or by terminating.  The server will detect this
and clean up resources for that transaction.
Because peer-to-peer endpoints don't get assigned unique names, the
sender == owner check is rendered useless.  But I'm not sure we even
need a check since the transaction *is* peer-to-peer now.

One way to secure the returned bus address from prying eyes would be
to employ GcrSecretExchange, but this would only complicate the hand-
shake further and (imo) necessitate a public client-side function to
implement the handshake correctly.
Since the daemon can detect when the client closes its peer-to-peer
connection, simplify the API by converting the Finish() method to a
Finished signal that is only emitted once.

Internally, add a "closed" signal to transactions (triggered by a
closed GDBusConnection), and have the transaction monitor use that
instead of "finished" to know when to dispose of the transaction.
The current policy is to only allow the root user access to the Sysroot
and OS interfaces, but this can be expressed in the static bus config.

The long-term intention is to integrate with PolicyKit.  Leave comments
in the code stating so but remove the unnecessary authorization handler
for the time being, just so there's less code to review.
Change the ActiveTransaction property from the bus address of the active
transaction to a string tuple: (method name, sender name)

The bus address was only a placeholder, and not very useful since each
transaction only accepts one connection (presumably the method caller).
Use 'rpmostreed' as the symbol prefix.
Only watch the caller's bus name until the transaction is started.
Thereafter the transaction proceeds independently of the calling
process.
Add a boolean return so callers can distinguish between actually starting
the transaction or reattaching to an in-progress transaction.
Few things to note:

 - Cancelling a transaction no longer immediately destroys it.

 - Transaction is destroyed when finished (or cancelled) and has
   no client connections.

 - If a client attaches to a finished transaction and calls Start(),
   the transaction will re-emit the Finished signal to that client.

 - The transaction bus address is not yet shared across multiple
   clients, so multiple connections don't actually work from the
   outside yet.  It's just supported internally.
If a client makes a request that is identical (that is, same method name
and same parameters) to an ongoing transaction, return the bus address of
that transaction.  The client can then "tune in" to the progress messages.
(Remember the Transaction.Start() method returns a boolean to distinguish
a newly-started transaction from an ongoing transaction.)

The driving use case for this is a dropped ssh connection during a long
running transaction -- like "upgrade" -- and being able to reattach to
the transaction's progress messages mid-stream.
This closes a race condition where the objects might not be exported
by the time clients call methods.

Also delete the code in the "on name lost" handler - it's not going to
happen in practice (we don't allow replacement), and causes issues as
it may be run first before we get the notification that the name is
owned.  github.com/cockpit-project/storaged has some better code here
which we could copy later.

This then in turn allows us to delete the "hold"/"release"
infrastructure.  Basically the daemon will live forever in the
process.
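
As a rough illustration of the export-before-name-acquisition ordering described above, a sketch using GDBus follows; the object manager path is an assumption and the real daemon code is structured differently.

/* Sketch: export objects in the bus-acquired callback, which runs
 * before the name is requested, so clients that see the name can
 * immediately call methods.  Paths here are assumptions. */
#include <gio/gio.h>

static void
on_bus_acquired (GDBusConnection *connection,
                 const gchar *name,
                 gpointer user_data)
{
  GDBusObjectManagerServer *manager = user_data;

  /* Export everything *before* the name is acquired. */
  g_dbus_object_manager_server_set_connection (manager, connection);
}

static void
on_name_acquired (GDBusConnection *connection,
                  const gchar *name,
                  gpointer user_data)
{
  g_debug ("acquired %s", name);
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GDBusObjectManagerServer *manager =
    g_dbus_object_manager_server_new ("/org/projectatomic/rpmostree1");

  g_bus_own_name (G_BUS_TYPE_SYSTEM,
                  "org.projectatomic.rpmostree1",
                  G_BUS_NAME_OWNER_FLAGS_NONE,   /* no replacement allowed */
                  on_bus_acquired,
                  on_name_acquired,
                  NULL,                          /* no "name lost" handler */
                  manager, NULL);

  g_main_loop_run (loop);
  return 0;
}
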
Using this flag tells DBus to activate using systemd, which gives
tighter integration.
Wasn't used for anything.
No doubt better ways are possible but this is the least messy thing
I could come up with on a deadline.
So the client side can read it back.

This replaces the GObject "sysroot-path" property in the wrapper class,
which created some additional daemon refactoring.
Some vestigial option parsing cruft revealed the need for the Sysroot
interface to grow a "Path" property.
Having the OS.Upgrade() and OS.Rebase() logic flows conflated in the
daemon had me nervous.  A day's worth of debugging a failing test case
proved that nervousness well-founded.  Split them into distinct backend
operations.
Replaces rpmostreed_load_sysroot_and_repo() with a slightly more
convenient API.
Create and load a new OstreeSysroot and OstreeRepo instance as needed.
This ensures its internal state is up-to-date, since several ostree
commands can alter stored state without the daemon's knowledge.

I would prefer keeping persistent instances if these issues can be
addressed, as it would eliminate some inconvenient error handling.
But this way is safer for now.
Also, renamed the signal from "sysroot-updated" since it originates
from RpmostreedSysroot.
To the best of my knowledge, the daemon does not rely on PATH anywhere.
This chunk of code is uncommented and seems unnecessary so remove it.
Unfortunately RHEL 7 has an older version of dbus, and I use it as a
workstation.  It's not a lot of code and only used for tests.  We can
make it build time conditional down the line or something.
Allows for future extensibility.  Also some of the optional attributes
can actually be optional.  See the XML interface spec for the key names.
Various OS "diff" methods can run concurrently with whatever else is
going on since they don't have to obtain the system root lock.

Just to make sure there's no conflicts when writing deployments or
downloading RPM package details, use an internal reader/writer lock
to protect the critical sections of upgrade, rebase, rollback, etc.
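
A minimal sketch of such a scheme using GLib's GRWLock follows; which operations take the reader versus the writer side is my assumption from the commit message above, and the function names are hypothetical.

/* Sketch: let read-only "diff" methods run concurrently under the
 * reader lock while deployment-writing operations take the writer
 * lock.  Which side each operation takes is an assumption. */
#include <glib.h>

static GRWLock example_sysroot_lock;   /* statically allocated GRWLocks need no init */

static void
example_cached_update_rpm_diff (void)
{
  g_rw_lock_reader_lock (&example_sysroot_lock);
  /* ... read package details, compute the diff ... */
  g_rw_lock_reader_unlock (&example_sysroot_lock);
}

static void
example_upgrade (void)
{
  g_rw_lock_writer_lock (&example_sysroot_lock);
  /* ... download and write the new deployment ... */
  g_rw_lock_writer_unlock (&example_sysroot_lock);
}
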
@cgwalters (Member)

Thanks all!

@jlebon deleted the daemon branch on February 20, 2018