Flexible scheduling #2258

Closed
37 commits
a52d173
Update flexible scheduling to new main
jpbruinsslot Jan 2, 2024
3447649
Continue refactor
jpbruinsslot Jan 3, 2024
99aeeba
Make test work
jpbruinsslot Jan 4, 2024
0a96e21
Continue adding tests
jpbruinsslot Jan 8, 2024
3ead50e
Add job_store tests
jpbruinsslot Jan 9, 2024
e762a79
Add boefje test for 'no ooi' boefjes
jpbruinsslot Jan 9, 2024
e6f4773
Add more job storage tests, and fix big
jpbruinsslot Jan 9, 2024
d111b51
Remove re-enable of job when task pushed
jpbruinsslot Jan 10, 2024
3a9a95d
Remove test-case
jpbruinsslot Jan 10, 2024
9535b70
Start with api endpoints for jobs
jpbruinsslot Jan 10, 2024
21e8f33
Implement api endpoints, tests, and ranker
jpbruinsslot Jan 11, 2024
5123f33
Basic deadline calculation
jpbruinsslot Jan 15, 2024
fa15756
Update
jpbruinsslot Jan 15, 2024
45fb545
Implement cron like functionality
jpbruinsslot Jan 16, 2024
291a73d
Add ValidationError
jpbruinsslot Jan 17, 2024
ed2fd7f
Remove rate limit mentions in favour of feature branch
jpbruinsslot Jan 18, 2024
8e78f9c
Merge branch 'main' into feature/mula/flexible-scheduling
jpbruinsslot Jan 18, 2024
4d020a7
Trying pre-commit
jpbruinsslot Jan 18, 2024
0f193fb
Update documentation
jpbruinsslot Jan 22, 2024
9b34f56
Update docs with schematic
jpbruinsslot Jan 23, 2024
776242d
Update schematic
jpbruinsslot Jan 24, 2024
dcecb2f
Update architecture documentation
jpbruinsslot Jan 24, 2024
cc85e65
Add diagrams
jpbruinsslot Jan 25, 2024
01e01c5
Update documentation
jpbruinsslot Jan 25, 2024
44e415f
Update docs and restructure some code
jpbruinsslot Jan 29, 2024
8968abe
Fix diagram010.svg
jpbruinsslot Jan 29, 2024
86db29b
Fix
jpbruinsslot Jan 29, 2024
04995f7
Merge branch 'main' into chore/mula/update-architecture-doc
jpbruinsslot Jan 29, 2024
e76dc69
Merge branch 'main' into feature/mula/flexible-scheduling
jpbruinsslot Jan 29, 2024
2a7693b
Merge branch 'chore/mula/update-architecture-doc' into feature/mula/f…
jpbruinsslot Jan 29, 2024
a9cd630
Update docs
jpbruinsslot Jan 29, 2024
4416704
Update documentation
jpbruinsslot Jan 30, 2024
e12c0ee
Refactor naming
jpbruinsslot Jan 30, 2024
97b0350
Update docs
jpbruinsslot Jan 31, 2024
85777ea
Merge branch 'main' into feature/mula/flexible-scheduling
jpbruinsslot Mar 5, 2024
245ca61
First round of git pre-commit
jpbruinsslot Mar 5, 2024
2838b45
Fix mypy suggestions
jpbruinsslot Mar 6, 2024
117 changes: 86 additions & 31 deletions mula/docs/architecture.md
@@ -129,13 +129,20 @@ we will check the following:

- check if the same task is already on the priority queue

Important to note is that when a `BoefjeTask` is created and pushed onto the
queue as a `PrioritizedItem` a new unique `TaskRun` is generated.[^1] This
ensures that each task has its own dedicated `TaskRun` throughout its entire
lifecycle. This approach maintains a distinct record for each task, providing
an accurate and independent history of task statuses. This means that each
execution of a `BoefjeTask`, regardless of whether it's the same task being
repeated in the future, is tracked independently with its own unique `TaskRun`.
> [!IMPORTANT]
> When a `PrioritizedItem` is created and pushed onto the queue, a `TaskRun`
> and a `Schedule` are created for this item. Below we explain the role of
> these two entities in more detail.

##### `TaskRun`

When a `BoefjeTask` is created and pushed onto the queue as a `PrioritizedItem`
a new unique `TaskRun` is generated.[^1] This ensures that each task has its
own dedicated `TaskRun` throughout its entire lifecycle. This approach
maintains a distinct record for each task, providing an accurate and
independent history of task statuses. This means that each execution of a
`BoefjeTask`, regardless of whether it's the same task being repeated in the
future, is tracked independently with its own unique `TaskRun`.

This approach ensures that the historical record of each task's execution is
distinct, providing a clear and isolated view of each instance of the task's
@@ -169,6 +176,27 @@ keep track of the status of this task throughout the system we update its
`TaskRun` status by either setting the status to `COMPLETED`, `FAILED` or
`CANCELLED`. (5)
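
The `TaskRun` lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: the `push` helper, the `finish` method, and the `QUEUED` status are hypothetical names introduced for the example. Only the three terminal statuses come from the text.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class TaskStatus(Enum):
    # QUEUED is an assumed non-terminal status for this sketch; the text
    # only names COMPLETED, FAILED and CANCELLED.
    QUEUED = "queued"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}


@dataclass
class TaskRun:
    """One record per execution of a task (hypothetical shape)."""

    p_item: dict
    status: TaskStatus = TaskStatus.QUEUED
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    modified_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def finish(self, status: TaskStatus) -> None:
        """Record the terminal status of this run."""
        if status not in TERMINAL:
            raise ValueError(f"{status} is not a terminal status")
        self.status = status
        self.modified_at = datetime.now(timezone.utc)


def push(p_item: dict) -> TaskRun:
    """Hypothetical push: every pushed item gets its own fresh TaskRun."""
    return TaskRun(p_item=p_item)
```

Pushing the same task twice yields two `TaskRun` records with different ids, so finishing one run leaves the history of the other untouched.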

##### `Schedule`

Since a task within the KAT implementation of the scheduler can generate
findings at a specific moment in time, we want to account for additional
findings or changes for the same task at a later moment in time. This means we
want to be able to reschedule particular tasks.

In order to support this, a `Schedule` is created for every task that is
executed by the `BoefjesScheduler`. This `Schedule` contains the specific task
and the information necessary to reschedule it at a later moment in time.

![diagram006](./img/diagram006.svg)

A `Schedule` supports a cron-like expression as schedule, which makes it
possible to schedule tasks at certain intervals. When such an expression isn't
set, the task will be scheduled at a future calculated date (deadline ranker
calculation).
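
The choice between a cron-like expression and a calculated deadline could look something like the sketch below. Everything here is an assumption made for illustration: the `calculate_deadline` function, the fixed `grace` period standing in for the deadline ranker, and the toy `next_daily_cron` evaluator, which only understands expressions of the form `MINUTE HOUR * * *`. A real implementation would likely delegate to a full cron parser.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Schedule:
    """Subset of the `schedules` columns that matter for this sketch."""

    p_item: dict
    cron_expression: Optional[str] = None
    deadline_at: Optional[datetime] = None
    enabled: bool = True


def next_daily_cron(expression: str, now: datetime) -> datetime:
    """Toy evaluator: only handles 'MINUTE HOUR * * *' (daily at a fixed time)."""
    minute, hour, *rest = expression.split()
    if rest != ["*", "*", "*"]:
        raise ValueError("only 'M H * * *' is supported in this sketch")
    candidate = now.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so take tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def calculate_deadline(
    schedule: Schedule,
    now: datetime,
    grace: timedelta = timedelta(hours=24),  # assumed stand-in for the ranker
) -> datetime:
    """Use the cron expression when set, else fall back to a calculated date."""
    if schedule.cron_expression:
        return next_daily_cron(schedule.cron_expression, now)
    return now + grace


now = datetime(2024, 1, 16, 12, 0, tzinfo=timezone.utc)
with_cron = Schedule(p_item={"boefje": "dns-records"}, cron_expression="30 14 * * *")
without_cron = Schedule(p_item={"boefje": "dns-records"})
with_cron.deadline_at = calculate_deadline(with_cron, now)
without_cron.deadline_at = calculate_deadline(without_cron, now)
```

Keeping the two paths behind one function means the caller only ever deals with a `deadline_at` timestamp, regardless of how it was derived.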

To see how a task will be rescheduled, refer to the 'Processes' section.

#### Processes

In order to create a `BoefjeTask` and trigger the dataflow we described above
@@ -180,12 +208,25 @@ tasks. Namely:
3. rescheduling of prior tasks
4. manual scan job

![diagram006](./img/diagram006.svg)
![diagram007](./img/diagram007.svg)

##### 1. Scan profile mutations
#### Processes

In order to create a `BoefjeTask` and trigger the dataflow we described above
we have 4 different processes within a `BoefjeScheduler` that can create boefje
tasks. Namely:

1. scan profile mutations
2. enabling of boefjes
3. rescheduling of prior tasks
4. manual scan job

![diagram007](./img/diagram007.svg)

##### 1. Scan profile mutations

![diagram008](./img/diagram008.svg)

When a scan level is increased on an OOI
(`schedulers.boefje.push_tasks_for_scan_profile_mutations`) a message is pushed
on the RabbitMQ `{organization_id}__scan_profile_mutations` queue. The scheduler
@@ -212,7 +253,7 @@ The dataflow is as follows:

##### 2. Enabling of boefjes

![diagram008](./img/diagram008.svg)
![diagram009](./img/diagram009.svg)

When a plugin of type `boefje` is enabled or disabled in Rocky, the dataflow is
triggered once the plugin cache of an organisation is flushed.
@@ -236,7 +277,7 @@ The dataflow is as follows:

##### 3. Rescheduling of prior tasks

![diagram009](./img/diagram009.svg)
![diagram010](./img/diagram010.svg)

In order to re-run tasks that have been executed in the past we try to create
new tasks on OOIs. We continuously get a batch of random OOIs from Octopoes
@@ -266,7 +307,7 @@ The dataflow is as follows:

##### 4. Manual scan job

![diagram010](./img/diagram010.svg)
![diagram011](./img/diagram011.svg)

Scan jobs created by the user in Rocky (`server.push_queue`) will
get the highest priority of 1. Note that this will circumvent all the checks
@@ -477,25 +518,39 @@ classDiagram

```mermaid
erDiagram
items {
uuid id PK
character_varying scheduler_id
character_varying hash
integer priority
jsonb data
timestamp_with_time_zone created_at
timestamp_with_time_zone modified_at
}

tasks {
uuid id PK
character_varying scheduler_id
taskstatus status
timestamp_with_time_zone created_at
timestamp_with_time_zone modified_at
jsonb p_item
character_varying type
}
schedules {
uuid id PK
character_varying scheduler_id
boolean enabled
jsonb p_item
character_varying cron_expression
timestamp_with_time_zone deadline_at
timestamp_with_time_zone evaluated_at
timestamp_with_time_zone created_at
timestamp_with_time_zone modified_at
}
task_runs {
uuid id PK
uuid job_id FK
character_varying scheduler_id
taskstatus status
timestamp_with_time_zone created_at
timestamp_with_time_zone modified_at
jsonb p_item
character_varying type
}
items {
uuid id PK
character_varying scheduler_id
character_varying hash
integer priority
jsonb data
timestamp_with_time_zone created_at
timestamp_with_time_zone modified_at
}


tasks }o--|| jobs: ""
```
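
To make the one-to-many relation in the diagram concrete, here is a small in-memory sketch using SQLite. It assumes that `task_runs.job_id` points at `schedules.id`, which the diagram does not state explicitly (the relation line still uses the names `tasks` and `jobs`). The columns are a trimmed subset, and the SQLite types are stand-ins for the column types shown above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE schedules (
        id TEXT PRIMARY KEY,
        scheduler_id TEXT NOT NULL,
        enabled INTEGER NOT NULL DEFAULT 1,
        cron_expression TEXT
    );
    CREATE TABLE task_runs (
        id TEXT PRIMARY KEY,
        -- Assumption: job_id references the schedule that produced the run.
        job_id TEXT NOT NULL REFERENCES schedules(id),
        scheduler_id TEXT NOT NULL,
        status TEXT NOT NULL
    );
    """
)

schedule_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO schedules (id, scheduler_id) VALUES (?, ?)",
    (schedule_id, "boefje-org-1"),
)

# Two executions of the same scheduled task give two task_runs rows.
for status in ("completed", "failed"):
    conn.execute(
        "INSERT INTO task_runs (id, job_id, scheduler_id, status) "
        "VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), schedule_id, "boefje-org-1", status),
    )

run_count = conn.execute(
    "SELECT COUNT(*) FROM task_runs WHERE job_id = ?", (schedule_id,)
).fetchone()[0]
```

Under this reading, one schedule accumulates many runs over time, and a run cannot exist without the schedule it refers to.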

## Project structure
2 changes: 1 addition & 1 deletion mula/docs/img/diagram006.svg
2 changes: 1 addition & 1 deletion mula/docs/img/diagram010.svg
4 changes: 4 additions & 0 deletions mula/docs/img/diagram011.svg
2 changes: 1 addition & 1 deletion mula/docs/img/schematic-drawing.svg
4 changes: 4 additions & 0 deletions mula/docs/schematic.svg