[GAP-17] Offline requestor model #40

Merged
merged 8 commits into from
Jan 29, 2023
145 changes: 145 additions & 0 deletions gaps/gap-17_offline_requestor_model/gap-17_offline_requestor_model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
gap: GAP-17
title: Offline Requestor Model
description: Considerations and features required to implement Golem applications where Requestor is not required to be constantly connected to the network.
author: stranger80 (@stranger80)
status: Draft
type: Feature
---

## Abstract
Golem Network is a space where Requestors and Providers interact to trade computing resources in exchange for crypto-tokens. Such collaboration may not require a constant connection between the Requestor and the Providers of the resources. Connectivity may also be disrupted by events such as machine failures and outages, network issues, etc. This article considers a number of scenarios where the Requestor is not required to remain constantly online, and introduces the features required to support such scenarios in the Golem ecosystem.

## Motivation
The features indicated or referenced in this GAP are intended to meet the following objectives:
- Improve application resilience to network & requestor node failures
- Enable "fire&forget", and self-sustaining application models

## Specification
Three "Offline Requestor" scenarios are considered in this GAP.

#### **A. Requestor partially connected**
Characteristics:
- Requestor daemon remains online until Agreement is signed and Activity started
- Requestor daemon may then disconnect while the Activity is in progress
- Requestor daemon may reconnect at any time, and exercise control over the Activity via ExeScript

**Notes:**
- This scenario is only possible with payment schemes which:
  * either assume upfront payment,
  * or allow long intervals between payments,
  * or are self-sustained and do not depend on the presence of the daemon.

#### **B. Requestor offline ("fire&forget")**
Characteristics:
- Requestor daemon remains online until Agreement is signed and Activity started
- Requestor daemon disconnects permanently, while the Activity continues (probably until agreed computation is complete, or funds run out)

**Notes:**
- This scenario is only possible with payment schemes which:
  * either assume upfront payment,
  * or allow long intervals between payments,
  * or are self-sustained and do not depend on the presence of the daemon.

#### **C. Requestor delegates control**
Characteristics:
- Requestor daemon remains online until Agreement is signed and Activity started
- Requestor daemon grants control rights to another node/identity, which remains online and exercises control over the Activity
Comment on lines +46 to +47
Contributor

I would say that, in the case of delegating permissions, the Requestor doesn't even have to sign Agreements. If another service is responsible for sustaining the execution of tasks, then it needs the ability to sign Agreements on behalf of this Requestor. In such a scenario the sustaining service could initialize everything by itself.

Contributor Author

Sorry for delay on commenting this.

The "Scenario C" is purely about the ability to delegate control to an already signed Agreement (which is paid by whatever payment method/platform negotiated by original Requestor). This is not about delegating rights to "token pool" owned by the original Requestor to another Requestor. For two primary reasons:

  • Delegating rights to create agreements from an "offline Requestor" to another Requestor makes the latter, well, an "online Requestor" :)
  • I don't want to propose a generic way of delegating rights to a "token pool".

The scenario where the original "bootstrap" Requestor only initiates an application with an initial set of Agreements and then empowers other nodes to create new Agreements - I would like to handle in a slightly different way, via a [GAP-18] "drip-feed payment contract" (so allowing other nodes to manage an on-chain payment mechanism), and [GAP-19] "Golem Supervisor concept". Please bear with me.


**Notes:**
- This requires **Agreement Permissions Management** feature

### Proposed features

This GAP introduces the following features, which are aimed at enabling the "offline Requestor" scenarios listed above:
- Activity Attach/Detach
- Self-sustained Payments
- Agreement Permissions Management
@johny-b johny-b Jul 4, 2022

Just a thought:

Agreement Permissions Management is a separate feature that has many use cases that have nothing to do with the "offline requestor". E.g. developer who creates an app that is supposed to run on Golem might prefer to rent (buy?) agreements from some third party, let's call them "broker". This way, from the developer POV:

  • There is no strategy-related complexity
  • There is no payment-related complexity (e.g. broker service could just support Mastercard)
  • Broker can handle some part of the risk (e.g. agreements that were ended by the provider are "free" for the end user, and broker pays-or-not-pays the provider, but from the developer POV this doesn't matter)

Actually, I wouldn't be surprised if this became the main way developers interact with Golem in the future.

-->

  1. I think this topic might deserve a separate GAP ...
  2. ... that should be a little more general - e.g. we might want to rent activities, not agreements?

Contributor Author

There is now a separate GAP for this: #56


The mapping between the scenarios and features required to implement them is indicated below:

| | Activity attach/detach | Self-sustained payments | Agreement Permissions Management |
|-|-|-|-|
| Requestor partially connected | X | X | |
| Requestor offline ("fire&forget") | X | X | |
| Requestor delegates control | X | | X |
| | | | |
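
The same mapping can be written down as data, which is handy when an application needs to check which scenarios a given node setup can support. This is only a sketch: the `Feature` enum, the scenario keys and the `supported_scenarios` helper are illustrative names and not part of any Golem API.

```python
from enum import Enum, auto


class Feature(Enum):
    ACTIVITY_ATTACH_DETACH = auto()
    SELF_SUSTAINED_PAYMENTS = auto()
    AGREEMENT_PERMISSIONS_MANAGEMENT = auto()


# Mirrors the scenario/feature table above (scenario keys are illustrative labels).
REQUIRED_FEATURES = {
    "partially_connected": {
        Feature.ACTIVITY_ATTACH_DETACH,
        Feature.SELF_SUSTAINED_PAYMENTS,
    },
    "fire_and_forget": {
        Feature.ACTIVITY_ATTACH_DETACH,
        Feature.SELF_SUSTAINED_PAYMENTS,
    },
    "delegated_control": {
        Feature.ACTIVITY_ATTACH_DETACH,
        Feature.AGREEMENT_PERMISSIONS_MANAGEMENT,
    },
}


def supported_scenarios(available):
    """Return the names of scenarios whose required features are all available."""
    available = set(available)
    return sorted(
        name for name, needed in REQUIRED_FEATURES.items() if needed <= available
    )
```

For example, a setup offering only attach/detach and self-sustained payments supports scenarios A and B but not C.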

#### **Feature: Activity attach/detach**
We propose to introduce the ability for a Requestor node to disconnect from the network while a controlled Activity remains active, and then gracefully reconnect and take control of the Activity. **Note:** this can happen intentionally, or as a result of e.g. a network failure or software error. The ability to attach a Requestor to resume control over an Activity therefore improves the reliability and robustness of Golem as a platform.

The implementation of this feature requires considerations on two levels:

1. Requestor Agent application goes offline, while corresponding `yagna` daemon remains online
- The HL API implementations shall include the ability to obtain a "live" instance of Activity object and corresponding Agreement object based on `activityID`, and then perform actions on them (ExeScript command execution and results processing, Agreement control, including termination), as in the following pseudo-code example:
```
...
activity = Golem.attach_activity(activityId) # returns an "attached" Activity object, which can then be used to manipulate the Activity
Contributor

actually, it doesn't have to be attach_activity ... maybe, in this scenario it suffices for the activity to be instantiated -> iow:

activity = Activity(activityId)

Attaching would be required if we wanted to resume a current high-level Task/Service worker with the given activity... but then, the engine would need to gather also other data, like agreement ids, etc

Contributor Author

@stranger80 stranger80 Jul 18, 2022

Actually I believe this should be an explicit operation on a Golem object, which is representing the "context" in which the Activity resides.

An alternative 'style' would be to create an Activity object and populate its ID, then 'attach' it to the Golem context, for the engine to 'hydrate' it (reconcile against current state and populate the respective Activity properties):

...
activity = Activity(activityId)
activity = Golem.attach_activity(activity)
...

...but the 'shorthand' syntax I've proposed in the GAP text seems 'lighter', so I prefer that.

@johny-b johny-b Aug 24, 2022

Just FYI, my current version is

golem = GolemNode()
activity = golem.activity(activity_id)

and this only initializes the Activity object (doesn't even validate the activity_id).

Contributor Author

OK, but in case an Activity with a given activity_id exists on the network - will golem "attach" itself to it and allow the subsequent control?

Copy link

I'm not sure if I understand the question.

GolemNode objects corresponds to a (YAGNA_URL, APP_KEY) pair. If this is enough to control the activity, it will allow the subsequent control. It doesn't matter if the activity was created with this GolemNode instance or not (or in this script run, or this yagna instance).

IOW, I think that if you created an activity and shared with me activity_id and your APP_KEY I could do:

async with GolemNode() as golem:
    activity = golem.activity(activity_id)
    activity.execute_command(...)

without additional effort (except for setting yagna identity with the same APP_KEY).
(I might be missing something, but in principle this is how this should work, I think?).

Contributor Author

...OK, cool, and for example if we do this:

async with GolemNode() as golem:
    activity = golem.activity(activity_id)

...will the activity object correctly represent the actual state of the Activity?

@johny-b johny-b Sep 2, 2022

Yes, but you'll need an additional call

async with GolemNode() as golem:
    activity = golem.activity(activity_id)
    state = await activity.get_state()

@johny-b johny-b Sep 6, 2022

After some more consideration, I decided I don't understand what is "attach" and "detach" in your model :/ For me, every entity with activity_id and credentials is "attached" (i.e. it can interact with the Activity).
So - sorry if my answers don't make sense :)


exeScriptResult = activity.exec(exeScript) # Activity can receive ExeScript commands, etc.

activity.agreement.terminate(reason) # ...eventually the corresponding Agreement can be terminated
...
```
- Persisting the information about running Activities to which an HL API caller may "attach" is out of scope for the HL API implementation, i.e. it is the designer of the Requestor Agent application who is responsible for e.g. persisting the obtained `activityIDs`, in order to be able to "attach" to Activities which are "in-flight".
- TBC: decide how to approach the Task and Service objects.
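
Since persisting `activityIDs` is left to the Agent application, a minimal sketch of such persistence is shown here. It assumes a plain JSON file is sufficient; the `ActivityRegistry` class and its methods are hypothetical and not part of any HL API.

```python
import json
import os
import tempfile
from pathlib import Path


class ActivityRegistry:
    """Stores activity IDs in a JSON file so a restarted Agent can find them again."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        """Return the list of persisted activity IDs (empty if nothing was stored yet)."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, activity_id):
        ids = self.load()
        if activity_id not in ids:
            ids.append(activity_id)
            self._write(ids)

    def remove(self, activity_id):
        self._write([i for i in self.load() if i != activity_id])

    def _write(self, ids):
        # Write to a temporary file in the same directory, then rename it over
        # the target, so a crash mid-write does not leave a corrupted registry.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.path.parent))
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(ids, tmp_file)
        os.replace(tmp_name, self.path)
```

After a restart, the Agent would iterate over `registry.load()` and pass each ID to whatever "attach" call the HL API eventually exposes.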

2. Requestor `yagna` daemon goes offline and should be able to reconnect to the network
- The daemon may disconnect from the network, and the current state of the Golem node (including Demands/Offers, Agreements, Activities, etc.) should be persisted.
- The daemon may reconnect to the network, and it should be capable of synchronizing the persisted state with the actual state of relevant entities on the network (e.g. update the actual state of Agreements, Activities, respective Invoices/DebitNotes, events, etc.)
- After reconnecting and successfully synchronizing, the `yagna` APIs should continue to work as if there was no offline period.
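
The synchronization step can be pictured as a reconciliation of two snapshots: what the daemon persisted before going offline, and what it observes on the network after reconnecting. The sketch below assumes both snapshots are simple mappings from entity ID to a state string; the `reconcile` function, the state names and the `"Unknown"` marker are illustrative assumptions, not `yagna` behaviour.

```python
def reconcile(persisted, observed):
    """Merge persisted entity states with states observed on the network.

    Both arguments map an entity ID to a state string. Returns a tuple
    (merged, changes), where changes lists (entity_id, old_state, new_state)
    for every entity whose state differs from the persisted one.
    """
    merged = dict(persisted)
    changes = []
    for entity_id, new_state in observed.items():
        old_state = persisted.get(entity_id)
        if old_state != new_state:
            changes.append((entity_id, old_state, new_state))
            merged[entity_id] = new_state
    # Entities the network no longer reports are marked rather than dropped,
    # so the Agent can still inspect them after the offline period.
    for entity_id in sorted(persisted.keys() - observed.keys()):
        if persisted[entity_id] != "Unknown":
            changes.append((entity_id, persisted[entity_id], "Unknown"))
            merged[entity_id] = "Unknown"
    return merged, changes
```

The `changes` list is what the daemon could surface to the Agent as events, so that the APIs behave as if the state transitions had been observed live.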

**'Detach' and in-flight operations**

It shall be acceptable for 'detach' to happen while there are API calls in-flight.

For long-polling API calls (e.g. `collectDemands`/`collectOffers`, `getExecBatchResults`) the 'detach' should break the running API calls.
- For the scenario where the Agent application goes offline: the daemon implementation shall persist the relevant data structures, so that after a subsequent 'attach' the Agent may query for data received by the daemon while the Agent remained offline.
- For the scenario where the daemon goes offline: the Golem network protocol shall ensure that relevant data objects are persisted on the sender side, and delivered after the daemon subsequently 'attaches' itself.

**Note:** The `getExecBatchResults` API called in `text/event-stream` ('streaming') mode will not attempt to persist the undelivered stream content while calling Agent/daemon remains offline. The same API called in `application/json` ('non-streaming') mode will operate as described above.
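
The difference between the two delivery modes can be sketched with a small buffer. This is an in-memory illustration only (a real daemon would persist the pending events to durable storage), and the `EventBuffer` class with its methods is a hypothetical construct, not a `yagna` component.

```python
from collections import deque


class EventBuffer:
    """Keeps non-streamed events for a detached consumer until they are collected."""

    def __init__(self):
        self._pending = deque()
        self._attached = False
        self.delivered_live = []  # streamed content that reached a connected consumer

    def attach(self):
        self._attached = True

    def detach(self):
        self._attached = False

    def publish(self, event, streaming=False):
        if streaming:
            # Streamed content is delivered only while the consumer is connected;
            # otherwise it is dropped and never replayed.
            if self._attached:
                self.delivered_live.append(event)
            return
        # Non-streamed events are kept until the consumer collects them.
        self._pending.append(event)

    def collect(self):
        """Return and clear all buffered events (a non-blocking stand-in for a long-poll call)."""
        if not self._attached:
            raise ConnectionError("consumer is detached")
        events = list(self._pending)
        self._pending.clear()
        return events
```

An event published while the consumer is detached is returned by the first `collect()` after `attach()`, whereas a streamed chunk published in the same window is lost.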

#### **Feature: Self-sustained payments**
A `yagna` Payment Platform abstraction is proposed which implements the standard Payment API logic (Invoice/DebitNote issuance, Payment processing), but does not require the Requestor to be online to accept Invoices/DebitNotes.

Such a self-sustained Payment Platform shall be implemented as a standard payment platform, where Provider-side Payment API calls are not routed to the Requestor node (and the corresponding Agent application), but instead are handled by a party which provides the "Accept/Reject" logic and is able to launch the respective payments on behalf of the Requestor who signed the Agreement.

The "payment issuing" party can be for example:
- A "payment depositary/broker" service, which accepts prepayment from the Requestor, as well as instructions on how the funds can be released (which can be as simple as "release x GLM per block on the blockchain", or may include more sophisticated Invoice acceptance logic).
- A "payment channel" smart-contract on a blockchain network.
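
The simplest release instruction mentioned above, "release x GLM per block", amounts to a linear accrual capped by the prepaid deposit. The sketch below shows that arithmetic only; the function name and parameters are assumptions made for illustration and do not describe an existing contract or service.

```python
from decimal import Decimal


def releasable_amount(deposit, rate_per_block, start_block, current_block,
                      already_released=Decimal("0")):
    """Amount (in GLM) that may be released to the Provider at current_block.

    Funds accrue linearly at rate_per_block from start_block onwards and are
    capped by the prepaid deposit; amounts released earlier are subtracted.
    """
    if current_block < start_block:
        return Decimal("0")
    accrued = rate_per_block * (current_block - start_block)
    capped = min(accrued, deposit)
    return max(capped - already_released, Decimal("0"))
```

With a deposit of 10 GLM and a rate of 0.5 GLM per block, 5 GLM are releasable after 10 blocks, and the total never exceeds 10 GLM however long the Activity runs.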

**In general,** the self-sustained Payment mechanism shall be expressed in Demand&Offer via dedicated, standardized properties from the `golem.com.payment` property namespace. This is important, as a specific payment platform & scheme are required to ensure compatibility between Requestor and Provider nodes to support a specific "offline Requestor" scenario.
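
To illustrate how such properties could gate Agreements, the sketch below matches a Demand against an Offer using plain dictionaries. The two property names are hypothetical (this GAP does not define them), and the dictionary lookup is a simplified stand-in for the actual property and constraint matching performed on the market.

```python
# Hypothetical property names: the GAP only states that such properties should
# live in the golem.com.payment namespace, it does not define them.
REQUIRED_SCHEME = "golem.com.payment.self-sustained.scheme"
SUPPORTED_SCHEMES = "golem.com.payment.self-sustained.supported-schemes"


def can_enter_agreement(demand, offer):
    """Return True if the Offer satisfies the Demand's self-sustained payment requirement."""
    required = demand.get(REQUIRED_SCHEME)
    if required is None:
        # A legacy Demand asks for nothing special, so any Offer matches.
        return True
    # A legacy Offer lacks the property and therefore never matches.
    return required in offer.get(SUPPORTED_SCHEMES, [])
```

Under these assumptions a Demand requiring a given scheme only matches Offers that advertise it, while Demands without the property keep matching every Offer.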

#### **Feature: Agreement Permissions Management**
In order to support scenarios where control delegation from Requestor to a different Golem node is performed, the Golem APIs must include a concept of permissions and grants.

This feature is described in a dedicated [GAP-24]().
Contributor

Broken link


## Rationale
The proposed behaviour of the 'attach/detach' feature is driven by the intent to increase the robustness of the Golem platform and expand the space of supported usage scenarios, while also taking into account the state of the existing `yagna` implementation. It is assumed that implementing the proposed feature logic would not require a substantial redesign of Golem network protocols & implementations.

## Backwards Compatibility
Backwards compatibility can be considered separately for each feature proposed.
### Activity attach/detach

#### 1. Requestor Agent application goes offline - `yagna` daemon remains online
This scenario is implemented only on Agent application level (so in HL API library), so backwards compatibility is ensured between a "new" HL API library implementation and "legacy" `yagna` daemon implementation (as no `yagna` REST API changes are required).

#### 2. Requestor `yagna` daemon goes offline
**IMPORTANT:** Will current Golem net implementations (e.g. "hybrid net") support a scenario where the `yagna` daemon reconnects to the network after downtime?
Contributor

Yes


### Self-sustained payments
As self-sustained payment mechanisms are to be provided by specific, distinguished payment platform & scheme implementations, the Golem nodes and Agent applications will ensure compatibility by indicating the respective payment conditions via properties & constraints in Demands & Offers. Therefore only nodes which support self-sustained payments will enter such an Agreement, so backwards compatibility will be ensured.

### Agreement Permissions Management
See relevant [GAP-24]() for details.
Contributor

Broken link


## Test Cases
Indicative test scenario suite is available [here](gap-17_test_cases.md).

## Security Considerations
Consideration must be given to all vulnerabilities which may result from the fact that the original Requestor (who signed the Agreement) disconnects from the network, leaving the Provider 'unattended'. A malicious 'usurper' may be tempted to masquerade as the original Requestor to gain control over the in-flight Agreements & Activities.
It seems mandatory to ensure that communication security (i.e. message integrity and authentication) is achieved at the Golem Net level, so that the Golem Net transport layer guarantees the identity of the sender of data over the network.
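
As an illustration of what message integrity and authentication mean for a control message, the sketch below tags a message with an HMAC and rejects it when the content or the key does not match. HMAC with a shared secret is used here purely as a stand-in that runs with the Python standard library; a network built around node identities would more likely rely on asymmetric signatures, and the message fields shown are made up for the example.

```python
import hashlib
import hmac
import json


def sign(message, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the message."""
    payload = json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(message, tag, key):
    """Check the tag in constant time; False means the message was altered or the key differs."""
    return hmac.compare_digest(sign(message, key), tag)
```

A Provider applying such a check would ignore a `terminate` command whose sender field had been altered by a 'usurper', because the recomputed tag would no longer match.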

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
71 changes: 71 additions & 0 deletions gaps/gap-17_offline_requestor_model/gap-17_test_cases.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# Offline Requestor Model - Test Scenarios

## 1. Requestor partially connected

### 1.1. Requestor Agent outage
1. Launch & initialize Requestor `yagna` daemon
2. Launch Requestor Agent to start an interactive service payload on a VM
3. Execute a simple ExeScript batch to ensure successful interaction with payload
4. Shut down the Requestor Agent
5. Restart the Requestor Agent
6. Execute a simple ExeScript batch to ensure the Requestor Agent application has successfully reconnected to the Daemon

### 1.2. Requestor Daemon outage
1. Launch & initialize Requestor `yagna` daemon
2. Launch Requestor Agent to start an interactive service payload on a VM
3. Execute a simple ExeScript batch to ensure successful interaction with payload
4. Shut down the Requestor Daemon
5. Restart the Requestor Daemon
6. Execute a simple ExeScript batch to ensure the Daemon has successfully reconnected

### 1.3. Network disruption
1. Launch & initialize Requestor `yagna` daemon
2. Launch Requestor Agent to start an interactive service payload on a VM
3. Execute a simple ExeScript batch to ensure successful interaction with payload
4. Disconnect the Requestor machine from network
5. Reconnect the Requestor machine to network
6. Execute a simple ExeScript batch to ensure the Requestor Daemon has successfully reconnected

### 1.4. Requestor Daemon network address change
1. Launch & initialize Requestor `yagna` daemon on machine with IP address A
2. Launch Requestor Agent to start an interactive service payload on a VM
3. Execute a simple ExeScript batch to ensure successful interaction with payload
4. Shut down the Requestor Daemon
5. Start the Requestor Daemon with the same node id/identity on a different IP address
6. Execute a simple ExeScript batch to ensure the Daemon has successfully reconnected from a different network address


## 2. Requestor offline ("fire&forget")

### 2.1. Upfront payment
1. Launch & initialize Requestor `yagna` daemon
2. Launch Requestor Agent to start an interactive service, with upfront payment allowing for `t` seconds of operation
3. Ensure the service runs successfully (eg. by observing logs on Provider)
4. Shut down the Requestor Agent & Daemon
5. Ensure the service runs successfully on the Provider
6. Wait `t` seconds
7. Ensure the Provider has terminated the Activity after the budget runs out

### 2.2. Self-sustained payment
1. Launch & initialize Requestor `yagna` daemon
2. Launch Requestor Agent to start an interactive service, with self-sustained payment platform
3. Ensure the service runs successfully (eg. by observing logs on Provider)
4. Shut down the Requestor Agent & Daemon
5. Ensure the service runs successfully on Provider
6. Restart the Requestor Agent & Daemon
7. Terminate the Activity

## 3. Requestor delegates control

### 3.1. Full control delegation
1. Launch & initialize Requestor `yagna` daemons A & B
2. Launch Requestor Agent on Requestor A to start an interactive service, with pay-as-you-go payment scheme
3. Execute a simple ExeScript batch from Requestor A to ensure successful interaction with payload
4. Grant control over the Agreement to Requestor B
5. Launch Requestor Agent on Requestor B
6. Execute a simple ExeScript batch from Requestor B to ensure successful interaction with payload
7. Terminate the Activity from Requestor B

### 3.2. Partial control delegation
Run the scenarios as in 3.1, but grant various atomic permissions and validate that the respective actions are permitted/forbidden accordingly.
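
A partial-delegation test needs some model of "atomic permissions" to assert against. The sketch below is one possible shape for it; the permission names (`EXEC`, `TERMINATE`, `GRANT`) and the `AgreementGrants` class are assumptions made for illustration, since the actual permission set is to be defined by the Agreement Permissions Management feature.

```python
from enum import Flag, auto


class Permission(Flag):
    EXEC = auto()       # run ExeScript batches on the Activity
    TERMINATE = auto()  # terminate the Agreement
    GRANT = auto()      # delegate permissions further


class AgreementGrants:
    """Tracks which identities may perform which actions on one Agreement."""

    def __init__(self, owner):
        self.owner = owner
        self._grants = {}

    def grant(self, grantor, grantee, permissions):
        # Only the owner, or an identity holding GRANT, may delegate.
        if not self.is_allowed(grantor, Permission.GRANT):
            raise PermissionError(f"{grantor} may not grant permissions")
        current = self._grants.get(grantee, Permission(0))
        self._grants[grantee] = current | permissions

    def is_allowed(self, identity, permission):
        if identity == self.owner:
            return True
        return permission in self._grants.get(identity, Permission(0))
```

A test for scenario 3.2 would then grant, say, only `EXEC` to Requestor B and assert that ExeScript execution succeeds while Agreement termination is refused.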