When a task returns a list of tasks via getTasks, we currently check for cyclical dependencies on each task independently. If they are all connected in the DAG, then this is really slow! This avoids that by finding the strongly connected components jointly across the new *to be added* tasks.
otherwise linear increase
also change an info to a debug in TaskManager
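To illustrate the idea of checking for cycles once across a whole batch of tasks rather than once per task, here is a minimal sketch using Tarjan's strongly connected components algorithm. It is not the code from this PR: `SccSketch`, the `Graph` alias, and the use of plain strings in place of `Task` objects are all assumptions made for the example.

```scala
import scala.collection.mutable

object SccSketch {
  /** Hypothetical graph encoding: task name -> names of the tasks that depend on it. */
  type Graph = Map[String, Seq[String]]

  /** Tarjan's algorithm, run once over every node in `graph`. */
  def stronglyConnectedComponents(graph: Graph): Seq[Set[String]] = {
    val index      = mutable.Map.empty[String, Int]
    val lowLink    = mutable.Map.empty[String, Int]
    val onStack    = mutable.Set.empty[String]
    val components = mutable.ListBuffer.empty[Set[String]]
    var stack      = List.empty[String]
    var nextIndex  = 0

    def visit(v: String): Unit = {
      index(v) = nextIndex
      lowLink(v) = nextIndex
      nextIndex += 1
      stack = v :: stack
      onStack += v
      graph.getOrElse(v, Seq.empty).foreach { w =>
        if (!index.contains(w)) {
          // w has not been seen yet: recurse, then inherit its low link
          visit(w)
          lowLink(v) = math.min(lowLink(v), lowLink(w))
        } else if (onStack(w)) {
          // w is on the stack, so it belongs to the component being built
          lowLink(v) = math.min(lowLink(v), index(w))
        }
      }
      if (lowLink(v) == index(v)) {
        // v is the root of a component: take everything above it on the stack
        val (above, rest) = stack.span(_ != v)
        stack = rest.tail
        val members = (v :: above).toSet
        onStack --= members
        components += members
      }
    }

    // A single pass covers all nodes, so shared sub-graphs are walked only once
    val nodes = graph.keySet ++ graph.values.flatten
    nodes.foreach(n => if (!index.contains(n)) visit(n))
    components.toList
  }

  /** A cycle exists if any component has more than one node, or a node points at itself. */
  def hasCycle(graph: Graph): Boolean =
    stronglyConnectedComponents(graph).exists(_.size > 1) ||
      graph.exists { case (v, ws) => ws.contains(v) }
}
```

Because each node gets an index the first time it is visited, a batch of connected tasks is traversed once in total instead of once per task. The recursion depth grows with the longest dependency chain, so a very deep DAG might need an iterative variant.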
Codecov Report
@@ Coverage Diff @@
## master #373 +/- ##
==========================================
- Coverage 91.95% 91.53% -0.42%
==========================================
Files 31 31
Lines 1156 1182 +26
Branches 65 73 +8
==========================================
+ Hits 1063 1082 +19
- Misses 93 100 +7
Lots of good stuff here. A few questions/suggestions.
@@ -78,8 +78,10 @@ abstract class Pipeline(val outputDirectory: Option[Path] = None,

/** Recursively navigates dependencies, starting from the supplied task, and add all children to this.tasks. */
private def addChildren(task : Task) : Unit = {
tasks ++= task.tasksDependingOnThisTask
task.tasksDependingOnThisTask.foreach(addChildren)
task.tasksDependingOnThisTask.filterNot(tasks.contains).foreach { child =>
Rather than doing a filterNot here, I think you could write this as:
task.tasksDependingOnThisTask.foreach { child =>
  if (tasks.add(child)) addChildren(child)
}
The bonus being that for tasks that have not been previously added you're not checking the set twice.
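A self-contained sketch of that suggestion, for anyone who wants to try it outside the code base. `Node` and `AddChildrenSketch` are hypothetical stand-ins for `Task` and `Pipeline`; the only library behaviour relied on is that `add` on a `scala.collection.mutable.Set` returns `true` when the element was not already present.

```scala
import scala.collection.mutable

object AddChildrenSketch {
  /** Hypothetical minimal task: a name plus the nodes that depend on it. */
  final class Node(val name: String) {
    val dependents: mutable.ListBuffer[Node] = mutable.ListBuffer.empty
  }

  /** Adds every transitive dependent of `node` to `tasks`, recursing only into new ones. */
  def addChildren(node: Node, tasks: mutable.Set[Node]): Unit =
    node.dependents.foreach { child =>
      // add returns true only if child was not already in the set,
      // so membership is checked and updated in a single operation
      if (tasks.add(child)) addChildren(child, tasks)
    }
}
```

Compared with `filterNot(tasks.contains)` followed by `tasks += child`, each new child costs one set operation instead of two.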
tasks += child
addChildren(child)
// 1. find all tasks connected to this task
val toVisit: mutable.Set[Task] = mutable.HashSet[Task](task)
Why use a Set here instead of a Stack? Are you gaining something from the uniqueness checking? Otherwise it's just overhead.
We are gaining from uniqueness checking (I'll add a comment). Suppose we have A ==> (B :: C) and B ==> C. Even though this could be simplified to A ==> B ==> C, that's up to the caller, and we do no post-processing of the DAG. So when addChildren gets called on A, it recurses on B and C. Since C also depends on B, without the uniqueness check we recurse on C a second time in the addChildren call on B.
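The difference can be sketched with a small worklist traversal over that same example graph. This is an illustration only: `WorklistSketch`, the string node names, and the visit counters are assumptions, and the worklist is merely shaped like the `toVisit` set in the diff.

```scala
import scala.collection.mutable

object WorklistSketch {
  // A ==> (B :: C) and B ==> C, encoded as node -> nodes that depend on it
  val dependents: Map[String, Seq[String]] =
    Map("A" -> Seq("B", "C"), "B" -> Seq("C"))

  /** Stack-like worklist with no uniqueness check: a node may be queued and visited repeatedly. */
  def visitsWithStack(start: String): Int = {
    var toVisit = List(start)
    var visits  = 0
    while (toVisit.nonEmpty) {
      val node = toVisit.head
      toVisit = toVisit.tail
      visits += 1
      toVisit = dependents.getOrElse(node, Seq.empty).toList ++ toVisit
    }
    visits
  }

  /** Set-based worklist: a node queued twice is still only visited once. */
  def visitsWithSet(start: String): Int = {
    val toVisit = mutable.HashSet[String](start)
    val visited = mutable.HashSet.empty[String]
    var visits  = 0
    while (toVisit.nonEmpty) {
      val node = toVisit.head
      toVisit -= node
      visited += node
      visits += 1
      // skip anything already processed; the set collapses duplicates still pending
      toVisit ++= dependents.getOrElse(node, Seq.empty).filterNot(visited.contains)
    }
    visits
  }
}
```

On this graph the stack version visits C twice (four visits in total) while the set version visits each of A, B, and C once. With many tasks sharing dependents, the repeated visits compound.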
Got it, thanks!
@@ -80,10 +81,22 @@ object TaskManagerDefaults extends LazyLogging {
object TaskManager extends LazyLogging {
import dagr.core.execsystem.TaskManagerDefaults._

/** The initial time to wait between scheduling tasks. */
val InitialSleepMillis: Int = 100
Thanks for doing this!
df74f6b to c3dc024
…zations (#374)
* ResourceSet bug fix: If the minimum to subset to is fractional, with a different fractional value than the maximum, it could be missed. Also added a small performance optimization.
* Task Manager optimizations
* A few NaiveScheduler simplifications
I am using the pipeline below to do some stress testing:
Test Pipeline
Place in pipelines/src/main/scala/dagr/pipelines/TestingPipeline.scala
In particular, these options: