Datadeps handling for algorithm tasks #19

joott · 2024-07-02T12:33:23Z

BEGINRELEASENOTES

Algorithm functions now created to be aware of the graph state at schedule-time
Data objects no longer scheduled
Outputs are now passed as mutable DataObject structs via Dagger.spawn_datadeps()
Test graphml files now have size and runtime properties
DataObject size allocated according to graphml specification
Length of sleep in algorithm now dependent on graph properties

ENDRELEASENOTES

Data objects are no longer scheduled; algorithms take mutable inputs rather than returning outputs in a Thunk

* added size_kb field to store size as a double * adjusted scheduling.jl treatment of size to align with test graphs

Changed size_kb to size_average_B

m-fila

Awesome job @ottjk Thank you for a quick fix so the tests run fine
I put some minor suggestions and questions

The only thing that worries me is that it doesn't work anymore with multiple processes. I'll check if this can be fixed somehow now. If not we will have to focus on that later

The execution graph done by Dagger doesn't look like what we'd expect, but it seems it's on Dagger as they plot both In and Out data dependencies as inputs

src/scheduling.jl

m-fila · 2024-07-03T11:36:02Z

src/scheduling.jl

+function populate_data_object!(object::DataObject, data)
+    proc = Dagger.thunk_processor()
+    scope = Dagger.scope(worker=myid())
+
+    chunk = Dagger.tochunk(data, proc, scope)
+
+    object.data = chunk
+end


How about getting the size of data to create from the data object instead of having it as another argument?

Suggested change

function populate_data_object!(object::DataObject, data)

proc = Dagger.thunk_processor()

scope = Dagger.scope(worker=myid())

chunk = Dagger.tochunk(data, proc, scope)

object.data = chunk

end

function populate_data_object!(object::DataObject)

data = ' '^round(Int, object.size)

proc = Dagger.thunk_processor()

scope = Dagger.scope(worker=myid())

chunk = Dagger.tochunk(data, proc, scope)

object.data = chunk

end

The part with making a chunk seems to be problematic with multiple processes when addprocs is used (running with julia -p strangely seems to be fine)

Is it necessary to have data as a chunk if what the algorithm receives is a whole DataObject? If I understood correctly the motivation was to place the chunk on the local worker, but as is we have to transfer a whole DataObject as it may live somewhere else.

I tried removing explicit chunk creation and relay on Dagger to handle it, and ended up with something like:

mutable struct DataObject data size::Float64 end function populate_data_object!(object::DataObject) object.data = " "^round(Int, object.size) end

which seems to work without problems with addprocs and -p. I think we could aim now for minimal version and optimize transfers later

As I understand it, the Chunk acts as a reference to some data, so moving around a DataObject with a Chunk in it is just moving around a struct that contains a reference rather than moving around the data itself. The actual data should only get transferred when you collect the Chunk in an algorithm.

I agree we can do without it for now.

m-fila · 2024-07-03T12:02:15Z

src/scheduling.jl

+        println("Gaudi algorithm for vertex $vertex_id !")
+
+        for output in outputs
+            bytes = round(Int, output.size * 1e3)


I think the 1e3 is not right since what we put in the size is measured in bytes.
In another comment I propose to move that part inside populate_data_object!

Co-authored-by: Mateusz Jakub Fila <[email protected]>

m-fila · 2024-07-04T11:31:23Z

src/scheduling.jl

+            transform = get_transform(G, vertex_id)
+            N_inputs = length(incoming_data)
+            res = Dagger.@spawn transform(In.(incoming_data)..., Out.(outgoing_data)...; N_inputs)


I feel the whole chain to execute algorithm is a bit too complicated.
In principle all the parameters from DAG could be extracted and "algorithm" created before anything is submitted to Dagger. I think at this point it may by clearer to replace some closures with callable structs
I'd replace get_transform, AVAILABLE_TRANSFORMS and _algorithm with something like this

struct MockupAlgorithm name::String runtime::Float64 input_length::UInt MockupAlgorithm(graph::MetaDiGraph, vertex_id::Int) = begin runtime = get_prop(graph, vertex_id, :runtime_average_s) name = get_prop(graph, vertex_id, :node_id) inputs = length(inneighbors(graph, vertex_id)) new(name, runtime, inputs) end end function (alg::MockupAlgorithm)(args...) function execute!(alg::MockupAlgorithm, inputs, outputs) println("Executing $(alg.name)") populate_data_object!.(outputs) sleep(alg.runtime) end inputs = args[1:alg.input_length] outputs = args[alg.input_length+1:end] execute!(alg, inputs, outputs) end function schedule_algorithm!(graph::MetaDiGraph, vertex_id::Int) incoming_data = get_in_promises(graph, vertex_id) outgoing_data = get_out_promises(graph, vertex_id) algorithm = MockupAlgorithm(graph, vertex_id) Dagger.@spawn algorithm(In.(incoming_data)..., Out.(outgoing_data)...) end

and then

function schedule_graph(G::MetaDiGraph) data_vertices = MetaGraphs.filter_vertices(G, :type, "DataObject") sorted_vertices = MetaGraphs.topological_sort(G) for data_id in data_vertices size = get_prop(G, data_id, :size_average_B) set_prop!(G, data_id, :res_data, DataObject(nothing, size)) end Dagger.spawn_datadeps() do for vertex_id in setdiff(sorted_vertices, data_vertices) res = schedule_algorithm!(G, vertex_id) set_prop!(G, vertex_id, :res_data, res) end end end

Having the algorithms as immutable callable structs might be useful later when we revise the pipeline scheduling

m-fila · 2024-07-04T11:40:23Z

Unfortunately it seems we have another very big issue here. The Dagger.spawn_datadeps() is waiting for all the tasks spawned with it. This is not a problem for scheduling a single graph as we would wait for its completion anyway

But when having multiple independent graphs we'd like to spawn tasks for all of them (then probably spawn some more graphs) and then wait for all of them to finish.

With current mechanism of spawning multiple graphs we call spawn_graph in a loop. And each spawn_graph does spawn_datadeps and then blocks until all the tasks are done. So we end up with a single graph running at the time

I'm not sure how to approach this. Whether it'd be possible to adjust the way we schedule multiple tasks or find a non-blocking way to schedule a single graph with data deps.

PS. I couldn't find anything about this blocking behavior of spawn_datadeps in docs, but there is a docstring in the source code and (wait_all in the function):

https://github.com/JuliaParallel/Dagger.jl/blob/235b9683e6b02f4de344f43a46738067773c7f62/src/datadeps.jl#L918-L920

At the end of executing f, spawn_datadeps will wait for all launched tasks
to complete, rethrowing the first error, if any. The result of f will be
returned from spawn_datadeps.

joott added 3 commits July 1, 2024 16:13

Reworked datadeps for algorithm scheduling

71afdaa

Data objects are no longer scheduled; algorithms take mutable inputs rather than returning outputs in a Thunk

made algorithm sleep for average runtime

eb8a936

Added size and runtime properties to test graphs

dc65be4

* added size_kb field to store size as a double * adjusted scheduling.jl treatment of size to align with test graphs

m-fila self-requested a review July 2, 2024 17:43

Made graph property consistent with tests

234134e

Changed size_kb to size_average_B

m-fila reviewed Jul 3, 2024

View reviewed changes

joott and others added 2 commits July 4, 2024 13:29

Update src/scheduling.jl

f888b36

Co-authored-by: Mateusz Jakub Fila <[email protected]>

remove AVAILABLE_TRANSFORMS Dict

e98763a

Co-authored-by: Mateusz Jakub Fila <[email protected]>

m-fila reviewed Jul 4, 2024

View reviewed changes

joott mentioned this pull request Jul 9, 2024

Configure algorithm runtime, don't schedule data object nodes as algortihms #21

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Datadeps handling for algorithm tasks #19

Datadeps handling for algorithm tasks #19

joott commented Jul 2, 2024

m-fila left a comment •

edited

Loading

m-fila Jul 3, 2024

m-fila Jul 4, 2024

joott Jul 4, 2024

m-fila Jul 3, 2024

m-fila Jul 4, 2024

m-fila commented Jul 4, 2024 •

edited

Loading

Datadeps handling for algorithm tasks #19

Are you sure you want to change the base?

Datadeps handling for algorithm tasks #19

Conversation

joott commented Jul 2, 2024

m-fila left a comment • edited Loading

Choose a reason for hiding this comment

m-fila Jul 3, 2024

Choose a reason for hiding this comment

m-fila Jul 4, 2024

Choose a reason for hiding this comment

joott Jul 4, 2024

Choose a reason for hiding this comment

m-fila Jul 3, 2024

Choose a reason for hiding this comment

m-fila Jul 4, 2024

Choose a reason for hiding this comment

m-fila commented Jul 4, 2024 • edited Loading

m-fila left a comment •

edited

Loading

m-fila commented Jul 4, 2024 •

edited

Loading