[New Scheduler] Implement FunctionPullingContainerPool #5102
Conversation
import scala.util.{Random, Try}
import scala.collection.immutable.Queue

case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
Suggested change:
- case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
+ case class CreateContainer(creationMessage: ContainerCreationMessage, action: WhiskAction)
Updated accordingly.
import scala.collection.immutable.Queue

case class Creation(creationMessage: ContainerCreationMessage, action: WhiskAction)
case class Deletion(deletionMessage: ContainerDeletionMessage)
Suggested change:
- case class Deletion(deletionMessage: ContainerDeletionMessage)
+ case class DeleteContainer(deletionMessage: ContainerDeletionMessage)
Updated accordingly.
  logging.warn(this, message)
  sendAckToScheduler(create.rootSchedulerIndex, ack)
} else {
  logging.info(this, s"received a container creation message: ${create.creationId}")
I still think logs like this should be debug
Yes, normally it should be debug. But I think using info is better for now; after all PRs of the scheduler are merged and it becomes stable, we can submit a separate PR to change all related log levels.
We leave info logs when an activation flows through the system.
Similarly, we can track the container creation flow with this kind of log.
I think we can keep this as info.
  sendAckToScheduler(msg.rootSchedulerIndex, ack)
}

// if a warmed container fails to resume, we should try to use another container or create a new one
Should we attempt to remove the container that failed to resume as well?
Yes, you are right. The subsequent PR FunctionPullContainerProxy will remove the container first.
Codecov Report

@@            Coverage Diff             @@
##           master    #5102      +/-   ##
==========================================
- Coverage   81.51%   75.08%    -6.43%
==========================================
  Files         220      221        +1
  Lines       10731    11178      +447
  Branches      444      473       +29
==========================================
- Hits         8747     8393      -354
- Misses       1984     2785      +801
@@ -91,7 +91,13 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
       .nextInt(v.toSeconds.toInt))
       .getOrElse(0)
       .seconds
-    context.system.scheduler.schedule(2.seconds, interval, self, AdjustPrewarmedContainer)
+    if (prewarmConfig.exists(!_.reactive.isEmpty)) {
There is no need to backfill the prewarm containers periodically if the reactive configuration is not included in runtimes.json, because if reactive doesn't exist in runtimes.json, the prewarm containers will never expire.
FYI, the reactive configuration in runtimes.json looks like below:
...
"stemCells": [
  {
    "initialCount": 2,
    "memory": "256 MB",
    "reactive": {
      "minCount": 1,
      "maxCount": 4,
      "ttl": "2 minutes",
      "threshold": 1,
      "increment": 1
    }
  }
]
...
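As a rough sketch of the guard this thread discusses, the periodic AdjustPrewarmedContainer backfill only needs to be scheduled when at least one stem-cell entry carries a `reactive` section. The types below are simplified, assumed stand-ins for the real stem-cell configuration classes, not the PR's exact code:

```scala
// Illustrative model of the scheduling guard; ReactiveConfig/StemCellConfig
// are simplified stand-ins (ttl kept as a plain string for brevity).
object ReactiveGuard {
  case class ReactiveConfig(minCount: Int, maxCount: Int, ttl: String)
  case class StemCellConfig(initialCount: Int, memoryMb: Int, reactive: Option[ReactiveConfig])

  // Mirrors the `prewarmConfig.exists(!_.reactive.isEmpty)` check in the diff above.
  def shouldSchedulePeriodicBackfill(prewarmConfig: List[StemCellConfig]): Boolean =
    prewarmConfig.exists(_.reactive.nonEmpty)
}
```

With no `reactive` block anywhere, the guard is false and no periodic timer is started, since nothing can ever expire.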
if (expiredPrewarmedContainer.nonEmpty) {
  // emit expired container counter metric with memory + kind
  MetricEmitter.emitCounterMetric(LoggingMarkers.CONTAINER_POOL_PREWARM_EXPIRED(memory.toString, kind))
If expiredPrewarmedContainer is empty, there is no need to emit this counter metric.
private var preWarmScheduler: Option[Cancellable] = None
private var prewarmConfigQueue = Queue.empty[(CodeExec[_], ByteSize, Option[FiniteDuration])]
private val prewarmCreateFailedCount = new AtomicInteger(0)
This is for retry logic. Let's assume the max retry limit is 5:
- If the 1st, 2nd, and 3rd prewarm creations fail but the 4th succeeds, prewarmCreateFailedCount is reset to 0 (the count is reinitialized whenever a creation succeeds).
- If the 1st through 5th prewarm creations all fail, it stops all creation and waits for the next round.
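The two bullets above can be sketched as a small state machine around the counter. This is a hypothetical illustration of the described behavior, not the PR's exact code; `onCreationResult` is an invented helper name:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Sketch of the subsequent-retry counter: any success resets it to 0,
// and creation stops once maxRetryLimit failures happen in a row.
object PrewarmRetry {
  val maxRetryLimit = 5
  val prewarmCreateFailedCount = new AtomicInteger(0)

  // Returns true if another prewarm creation attempt is still allowed.
  def onCreationResult(success: Boolean): Boolean =
    if (success) {
      prewarmCreateFailedCount.set(0) // reinitialized whenever a creation succeeds
      true
    } else {
      // after maxRetryLimit subsequent failures, stop and wait for the next round
      prewarmCreateFailedCount.incrementAndGet() < maxRetryLimit
    }
}
```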
}

// Key is ColdStartKey, value is the number of cold starts in a minute
var coldStartCount = immutable.Map.empty[ColdStartKey, Int]
This coldStartCount logic already existed upstream; I just picked the relevant logic into this PR.
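For readers unfamiliar with the upstream logic, the bookkeeping behind that map can be sketched roughly as follows. The shape is assumed for illustration (the real ColdStartKey and update path differ in detail):

```scala
// Minimal sketch of per-minute cold-start counting: each (kind, memory)
// key accumulates how many cold starts occurred, which reactive prewarm
// sizing can later consult.
object ColdStarts {
  case class ColdStartKey(kind: String, memoryMb: Int)

  var coldStartCount = Map.empty[ColdStartKey, Int]

  def countColdStart(kind: String, memoryMb: Int): Unit = {
    val key = ColdStartKey(kind, memoryMb)
    coldStartCount = coldStartCount.updated(key, coldStartCount.getOrElse(key, 0) + 1)
  }
}
```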
/** Install prewarm containers up to the configured requirements for each kind/memory combination or specified kind/memory */
private def adjustPrewarmedContainer(init: Boolean, scheduled: Boolean): Unit = {
  if (!shuttingDown) {
If the invoker is disabled, there is no need to backfill the prewarm containers.
  case _ => false
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val queuingCount = prewarmQueue.count(p => p._1.kind == kind && p._2 == memory)
This is to avoid creating a lot of prewarm containers in a very short time.
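The throttling arithmetic implied here can be sketched as follows: the number of new prewarms to create subtracts both the containers already starting and those still waiting in the queue, so repeated backfill ticks cannot over-create. This is an illustrative helper, not the PR's exact code:

```scala
// Sketch: desired count minus warm, starting, and queued containers,
// clamped at zero, bounds how many new prewarms one tick may create.
object PrewarmBackfill {
  def containersToCreate(desiredCount: Int,
                         currentCount: Int,
                         startingCount: Int,
                         queuingCount: Int): Int =
    math.max(0, desiredCount - currentCount - startingCount - queuingCount)
}
```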
@bdoyle0182 @style95 it is ready to review again now.
prewarm-expiration-check-interval-variance: 10 seconds # varies expiration across invokers to avoid many concurrent expirations
prewarm-expiration-limit: 100 # number of prewarms to expire in one expiration cycle (remaining expired will be considered for expiration in next cycle)
prewarm-max-retry-limit: 5 # max retry limit for create prewarm
Basically, this max limit is reached when 5 subsequent retries fail. How about changing this to max subsequent retry limit to create prewarm containers? It would also be worth mentioning that the count is reinitialized whenever a creation succeeds.
Already changed the comment to max subsequent retry limit to create prewarm containers
prewarm-expiration-check-interval-variance: 10 seconds # varies expiration across invokers to avoid many concurrent expirations
prewarm-expiration-limit: 100 # number of prewarms to expire in one expiration cycle (remaining expired will be considered for expiration in next cycle)
prewarm-max-retry-limit: 5 # max retry limit for create prewarm
prewarm-promotion: false # if true, action can take prewarm container which has bigger memory
Are these following two configurations used?
Yes, but prewarm-promotion is used in FunctionPullingContainerPool only, and the configuration is false, which means an action doesn't take a prewarm container which has bigger memory.
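The semantics of that flag can be sketched as a simple eligibility check. Names here are assumptions for illustration, not the pool's actual API:

```scala
// Sketch of prewarm-promotion: with promotion disabled, only an exact
// kind + memory match is eligible; with it enabled, a prewarm container
// with larger memory may also be taken.
object PrewarmMatch {
  def eligible(promotion: Boolean,
               wantKind: String, wantMemoryMb: Int,
               haveKind: String, haveMemoryMb: Int): Boolean =
    wantKind == haveKind &&
      (if (promotion) haveMemoryMb >= wantMemoryMb else haveMemoryMb == wantMemoryMb)
}
```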
LGTM
Description
Manage container pool.
Design document: https://cwiki.apache.org/confluence/display/OPENWHISK/FunctionPullingContainerPool