[New Scheduler] Implement FunctionPullingContainerPool #5102
Conversation
import scala.util.{Random, Try}
import scala.collection.immutable.Queue

case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
Suggested change:
- case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
+ case class CreateContainer(creationMessage: ContainerCreationMessage, action: WhiskAction)
Updated accordingly.
import scala.collection.immutable.Queue

case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
case class Deletion(deletionMessage: ContainerDeletionMessage)
Suggested change:
- case class Deletion(deletionMessage: ContainerDeletionMessage)
+ case class DeleteContainer(deletionMessage: ContainerDeletionMessage)
Updated accordingly.
  logging.warn(this, message)
  sendAckToScheduler(create.rootSchedulerIndex, ack)
} else {
  logging.info(this, s"received a container creation message: ${create.creationId}")
I still think logs like this should be debug
Yes, normally it should be debug. But I think using info is better for now; after all PRs of the scheduler are merged and it becomes stable, we can submit a separate PR to change all related log levels.
We leave info logs when an activation flows through the system.
Similarly, we can track the container creation flow with this kind of log.
I think we can keep this as info.
  sendAckToScheduler(msg.rootSchedulerIndex, ack)
}

// if a warmed container fails to resume, we should try to use another container or create a new one
Should we attempt to remove the container that failed to resume as well?
Yes, you are right. The subsequent PR FunctionPullContainerProxy will remove the container first.
Codecov Report

@@            Coverage Diff             @@
##           master    #5102      +/-   ##
==========================================
- Coverage   81.51%   75.08%    -6.43%
==========================================
  Files         220      221        +1
  Lines       10731    11178      +447
  Branches      444      473       +29
==========================================
- Hits         8747     8393      -354
- Misses       1984     2785      +801
@@ -91,7 +91,13 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
       .nextInt(v.toSeconds.toInt))
       .getOrElse(0)
       .seconds
-    context.system.scheduler.schedule(2.seconds, interval, self, AdjustPrewarmedContainer)
+    if (prewarmConfig.exists(!_.reactive.isEmpty)) {
There is no need to backfill the prewarm containers periodically if the reactive configuration is not included in runtimes.json, because if reactive doesn't exist in runtimes.json, the prewarm containers will never expire.
FYI, the reactive configuration in runtimes.json looks like below:
...
"stemCells": [
  {
    "initialCount": 2,
    "memory": "256 MB",
    "reactive": {
      "minCount": 1,
      "maxCount": 4,
      "ttl": "2 minutes",
      "threshold": 1,
      "increment": 1
    }
  }
]
...
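As a rough sketch of the guard this thread discusses, the periodic AdjustPrewarmedContainer backfill only needs to be scheduled when at least one stem-cell entry carries a `reactive` section. The types below are simplified, assumed stand-ins for the real stem-cell configuration classes, not the PR's exact code:

```scala
// Illustrative model of the scheduling guard; ReactiveConfig/StemCellConfig
// are simplified stand-ins (ttl kept as a plain string for brevity).
object ReactiveGuard {
  case class ReactiveConfig(minCount: Int, maxCount: Int, ttl: String)
  case class StemCellConfig(initialCount: Int, memoryMb: Int, reactive: Option[ReactiveConfig])

  // Mirrors the `prewarmConfig.exists(!_.reactive.isEmpty)` check in the diff above.
  def shouldSchedulePeriodicBackfill(prewarmConfig: List[StemCellConfig]): Boolean =
    prewarmConfig.exists(_.reactive.nonEmpty)
}
```

With no `reactive` block anywhere, the guard is false and no periodic timer is started, since nothing can ever expire.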
if (expiredPrewarmedContainer.nonEmpty) {
  // emit expired container counter metric with memory + kind
  MetricEmitter.emitCounterMetric(LoggingMarkers.CONTAINER_POOL_PREWARM_EXPIRED(memory.toString, kind))
If expiredPrewarmedContainer is empty, there is no need to emit this counter metric.
private var preWarmScheduler: Option[Cancellable] = None
private var prewarmConfigQueue = Queue.empty[(CodeExec[_], ByteSize, Option[FiniteDuration])]
private val prewarmCreateFailedCount = new AtomicInteger(0)
This is for retry logic. Let's assume the max retry limit is 5:
- If the 1st, 2nd, and 3rd prewarm creations fail but the 4th succeeds, prewarmCreateFailedCount is reset to 0 (the count is reinitialized whenever a creation succeeds).
- If the 1st through 5th prewarm creations all fail, it stops all creation and waits for the next round.
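The two bullets above can be sketched as a small state machine around the counter. This is a hypothetical illustration of the described behavior, not the PR's exact code; `onCreationResult` is an invented helper name:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Sketch of the subsequent-retry counter: any success resets it to 0,
// and creation stops once maxRetryLimit failures happen in a row.
object PrewarmRetry {
  val maxRetryLimit = 5
  val prewarmCreateFailedCount = new AtomicInteger(0)

  // Returns true if another prewarm creation attempt is still allowed.
  def onCreationResult(success: Boolean): Boolean =
    if (success) {
      prewarmCreateFailedCount.set(0) // reinitialized whenever a creation succeeds
      true
    } else {
      // after maxRetryLimit subsequent failures, stop and wait for the next round
      prewarmCreateFailedCount.incrementAndGet() < maxRetryLimit
    }
}
```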
}

// Key is ColdStartKey, value is the number of cold starts in a minute
var coldStartCount = immutable.Map.empty[ColdStartKey, Int]
This coldStartCount logic already existed upstream; I just picked the relevant logic into this PR.
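For readers unfamiliar with the upstream logic, the bookkeeping behind that map can be sketched roughly as follows. The shape is assumed for illustration (the real ColdStartKey and update path differ in detail):

```scala
// Minimal sketch of per-minute cold-start counting: each (kind, memory)
// key accumulates how many cold starts occurred, which reactive prewarm
// sizing can later consult.
object ColdStarts {
  case class ColdStartKey(kind: String, memoryMb: Int)

  var coldStartCount = Map.empty[ColdStartKey, Int]

  def countColdStart(kind: String, memoryMb: Int): Unit = {
    val key = ColdStartKey(kind, memoryMb)
    coldStartCount = coldStartCount.updated(key, coldStartCount.getOrElse(key, 0) + 1)
  }
}
```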
/** Install prewarm containers up to the configured requirements for each kind/memory combination or specified kind/memory */
private def adjustPrewarmedContainer(init: Boolean, scheduled: Boolean): Unit = {
  if (!shuttingDown) {
If the invoker is disabled, there is no need to backfill the prewarm containers.
  case _ => false
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val queuingCount = prewarmQueue.count(p => p._1.kind == kind && p._2 == memory)
This is to avoid creating a lot of prewarm containers in a very short time.
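The throttling arithmetic implied here can be sketched as follows: the number of new prewarms to create subtracts both the containers already starting and those still waiting in the queue, so repeated backfill ticks cannot over-create. This is an illustrative helper, not the PR's exact code:

```scala
// Sketch: desired count minus warm, starting, and queued containers,
// clamped at zero, bounds how many new prewarms one tick may create.
object PrewarmBackfill {
  def containersToCreate(desiredCount: Int,
                         currentCount: Int,
                         startingCount: Int,
                         queuingCount: Int): Int =
    math.max(0, desiredCount - currentCount - startingCount - queuingCount)
}
```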
@bdoyle0182 @style95 it is ready to review again now.
prewarm-expiration-check-interval-variance: 10 seconds # varies expiration across invokers to avoid many concurrent expirations
prewarm-expiration-limit: 100 # number of prewarms to expire in one expiration cycle (remaining expired will be considered for expiration in next cycle)
prewarm-max-retry-limit: 5 # max retry limit for create prewarm
Basically, this max limit is reached when 5 subsequent retries fail. How about changing this to max subsequent retry limit to create prewarm containers? It would also be worth mentioning that the count is reinitialized whenever a creation succeeds.
Already changed the comment to max subsequent retry limit to create prewarm containers
prewarm-expiration-check-interval-variance: 10 seconds # varies expiration across invokers to avoid many concurrent expirations
prewarm-expiration-limit: 100 # number of prewarms to expire in one expiration cycle (remaining expired will be considered for expiration in next cycle)
prewarm-max-retry-limit: 5 # max retry limit for create prewarm
prewarm-promotion: false # if true, action can take prewarm container which has bigger memory
Are these following two configurations used?
Yes, but prewarm-promotion is used in FunctionPullingContainerPool only, and the configuration is false, which means an action doesn't take a prewarm container which has bigger memory.
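The semantics of that flag can be sketched as a simple eligibility check. Names here are assumptions for illustration, not the pool's actual API:

```scala
// Sketch of prewarm-promotion: with promotion disabled, only an exact
// kind + memory match is eligible; with it enabled, a prewarm container
// with larger memory may also be taken.
object PrewarmMatch {
  def eligible(promotion: Boolean,
               wantKind: String, wantMemoryMb: Int,
               haveKind: String, haveMemoryMb: Int): Boolean =
    wantKind == haveKind &&
      (if (promotion) haveMemoryMb >= wantMemoryMb else haveMemoryMb == wantMemoryMb)
}
```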
LGTM
Description
Manage container pool.
Design document: https://cwiki.apache.org/confluence/display/OPENWHISK/FunctionPullingContainerPool