Memory based loadbalancing #3747

Merged · merged 14 commits into apache:master from the mem branch on Aug 23, 2018

Conversation

@cbickel (Contributor) commented on Jun 11, 2018

This PR implements the idea discussed in the following mail thread:
https://lists.apache.org/thread.html/dfccf972bc1419fe48dbc23119441108c45f85d53625fd6f8fc04fcb@%3Cdev.openwhisk.apache.org%3E

It changes the number of containers an invoker can spawn to be based on the available memory rather than on the CPU count.
In addition, it makes the loadbalancer aware of the amount of available memory on each invoker and lets the invoker create user containers only if there is enough free memory.
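
A minimal, self-contained sketch of the admission idea (simplified types for illustration only, not the PR's actual classes):

// Each invoker gets a memory budget; a new container is admitted only if the
// memory already claimed by existing containers plus the new action's memory
// limit still fits into that budget.
case class PoolState(claimedMB: Long, budgetMB: Long) {
  def canAdmit(actionMemoryMB: Long): Boolean =
    claimedMB + actionMemoryMB <= budgetMB
}

object MemoryAdmissionExample extends App {
  val pool = PoolState(claimedMB = 768, budgetMB = 1024)
  println(pool.canAdmit(256)) // true  -> exactly fills the budget
  println(pool.canAdmit(512)) // false -> would exceed the budget
}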

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@dgrove-oss (Member) commented on Jun 12, 2018

Would love to see this get pushed through. There are some compensating changes we will need in kube-deploy; I will be happy to take care of them when it is time.

@markusthoemmes markusthoemmes self-requested a review June 12, 2018 19:57
@markusthoemmes markusthoemmes self-assigned this Jun 12, 2018
@cbickel cbickel force-pushed the mem branch 2 times, most recently from 0f134ae to d3000b0 on June 14, 2018, 11:45
@codecov-io commented on Jun 14, 2018

Codecov Report

Merging #3747 into master will decrease coverage by 4.49%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #3747     +/-   ##
=========================================
- Coverage   85.41%   80.92%   -4.5%     
=========================================
  Files         147      147             
  Lines        7070     7093     +23     
  Branches      423      408     -15     
=========================================
- Hits         6039     5740    -299     
- Misses       1031     1353    +322
Impacted Files Coverage Δ
...ain/scala/whisk/core/containerpool/Container.scala 80.3% <ø> (ø) ⬆️
.../scala/src/main/scala/whisk/core/WhiskConfig.scala 94.16% <ø> (-0.1%) ⬇️
...la/whisk/core/containerpool/ContainerFactory.scala 100% <100%> (ø) ⬆️
...scala/whisk/core/containerpool/ContainerPool.scala 100% <100%> (+10.58%) ⬆️
...cala/whisk/core/containerpool/ContainerProxy.scala 93.82% <100%> (ø) ⬆️
...ain/scala/whisk/core/invoker/InvokerReactive.scala 74.16% <100%> (ø) ⬆️
...e/loadBalancer/ShardingContainerPoolBalancer.scala 86.01% <100%> (+0.44%) ⬆️
.../scala/src/main/scala/whisk/core/entity/Size.scala 96.49% <100%> (+0.33%) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74ffb4d...72ae32f.

@cbickel cbickel force-pushed the mem branch 11 times, most recently from d01dc16 to 09e569b on June 19, 2018, 07:47
@cbickel cbickel added companion and removed wip labels Jun 19, 2018
@markusthoemmes (Contributor) left a review:

Will need a pass over the tests, great stuff 🎉

@@ -339,6 +345,7 @@ object ShardingContainerPoolBalancer extends LoadBalancerProvider {
@tailrec
def schedule(invokers: IndexedSeq[InvokerHealth],
dispatched: IndexedSeq[ForcableSemaphore],
memory: ByteSize,
(Contributor) commented:

Shall we make this slotsNeeded and keep that as an integer?

*/
case class ShardingContainerPoolBalancerConfig(blackboxFraction: Double, invokerBusyThreshold: Int)
case class ShardingContainerPoolBalancerConfig(blackboxFraction: Double, invokerBusyThreshold: ByteSize)
(Contributor) commented:

Rename invokerBusyThreshold to something more meaningful?

.getOrElse {
(createContainer(), "recreated")
// Only process request, if there are no other requests waiting for free slots, or if the current request is the next request to process
if (runBuffer.size == 0 || runBuffer.headOption.map(_.msg == r.msg).getOrElse(false)) {
(Contributor) commented:

runBuffer.headOption.map(_.msg == r.msg).getOrElse(true)

} else {
r.retryLogDeadline
}
if (!runBuffer.map(_.msg).contains(r.msg)) {
(Contributor) commented:

if (!runBuffer.exists(_.msg == r.msg))

.schedule(r.action, r.msg.user.namespace.name, freePool)
.map(container => {
(container, "warm")
})
(Contributor) commented:

.map(container => (container, "warm"))

Some(ref)
} else None
if (freeContainers.nonEmpty && freeContainers.map(_._2.memoryLimit.toMB).sum >= memory.toMB) {
if (memory > 0.B) {
(Contributor) commented:

You can collapse these ifs to get rid of one level of nesting.

@@ -91,62 +95,89 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
// their requests and send them back to the pool for rescheduling (this may happen if "docker" operations
// fail for example, or a container has aged and was destroying itself when a new request was assigned)
case r: Run =>
val createdContainer = if (busyPool.size < poolConfig.maxActiveContainers) {
// Only process request, if there are no other requests waiting for free slots, or if the current request is the next request to process
if (runBuffer.isEmpty || runBuffer.dequeueOption.exists(_._1.msg == r.msg)) {
(Contributor) commented:

Could be collapsed to only one condition

runBuffer.dequeueOption.map(_._1.msg == r.msg).getOrElse(true)

(Contributor) commented:

Thinking about it, it makes sense to keep it separate for readability!

On another note: we could extract this into a value isResentFromBuffer and branch the execution on it later. There are checks further down which implicitly check for this condition, so that would make it clearer.
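
A sketch of that extraction, using a simplified stand-in for the Run message (the PR's actual types differ):

import scala.collection.immutable.Queue

case class Run(msg: String)

// The message currently being processed equals the head of the buffer exactly
// when it was taken back out of the buffer rather than arriving fresh.
def isResentFromBuffer(runBuffer: Queue[Run], r: Run): Boolean =
  runBuffer.nonEmpty && runBuffer.dequeueOption.exists(_._1.msg == r.msg)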

.orElse {
if (busyPool
.map(_._2.memoryLimit.toMB)
.sum + freePool.map(_._2.memoryLimit.toMB).sum < poolConfig.userMemory.toMB) {
(Contributor) commented:

I think we need to add the action's memory limit to this condition as well.

On a broader note: Should we make this a method?

def hasCapacityFor(pool: Map[ActorRef, ContainerData], memory: ByteSize): Boolean =
  pool.map(_._2.memoryLimit.toMB).sum + memory.toMB <= poolConfig.userMemory.toMB

It would then be usable like hasCapacityFor(busyPool ++ freePool, r.action.limits.memory.megabytes.MB)

})
.getOrElse {
(createContainer(r.action.limits.memory.megabytes.MB), "recreated")
}
(Contributor) commented:

Can we collapse the map, orElse, etc. statements a bit by using () instead of {}?

busyPool = busyPool + (actor -> data)
freePool = freePool - actor
// Remove the action that gets executed now from the buffer and execute the next one afterwards.
runBuffer = runBuffer.dequeueOption.map(_._2).getOrElse(runBuffer)
(Contributor) commented:

Some more information in the comment would be nice, like:

// It is guaranteed that the currently executed message is == the head of the queue, if the queue has any entries

@markusthoemmes (Contributor) left a review:

Love the tests ❤️ ! One should be added to cover an edge case of ContainerPool.remove. Great job!

}

it should "not provide a container from busy pool with non-warm containers" in {
val pool = Map('none -> noData(), 'pre -> preWarmedData())
ContainerPool.remove(pool) shouldBe None
ContainerPool.remove(pool, MemoryLimit.stdMemory) shouldBe List.empty
(Contributor) commented:

Would it make sense to add a test verifying that the returned List is empty if not enough capacity can be freed?
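
One possible shape for that test (a sketch in the style of the surrounding tests; warmedData() is assumed to be the existing warm-container test helper):

it should "return an empty list if not enough memory can be freed" in {
  // only 1 * stdMemory of warm containers is available, but 2 * stdMemory is requested
  val pool = Map('warm -> warmedData())
  ContainerPool.remove(pool, MemoryLimit.stdMemory * 2) shouldBe List.empty
}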

@@ -124,7 +129,7 @@ class ContainerPoolTests
it should "reuse a warm container" in within(timeout) {
val (containers, factory) = testContainers(2)
val feed = TestProbe()
val pool = system.actorOf(ContainerPool.props(factory, ContainerPoolConfig(2, 2), feed.ref))
val pool = system.actorOf(ContainerPool.props(factory, ContainerPoolConfig(MemoryLimit.stdMemory * 4), feed.ref))
(Contributor) commented:

Please add a comment somewhere noting that an action is created with stdMemory by default, so this preserves the previous behavior (4 actions can be scheduled).

val feed = TestProbe()

// a pool with slots for 512MB
val pool = system.actorOf(ContainerPool.props(factory, ContainerPoolConfig(512.MB), feed.ref))
(Contributor) commented:

Should be 2 * stdMemory

pool ! runMessage
containers(0).expectMsg(runMessage)
pool ! runMessageDifferentAction
containers(1).expectMsg(runMessageDifferentAction)
(Contributor) commented:

Comments in here would be nice, like

containers(0).expectMsg(runMessage) // 1 * stdMemory taken
pool ! runMessageDifferentAction
containers(1).expectMsg(runMessageDifferentAction) // 2 * stdMemory taken -> full

...

pool ! runMessageLarge
// need to remove both actions to make space for the large action (needs 2 * stdMemory)
containers(0).expectMsg(Remove)
containers(1).expectMsg(Remove)
containers(2).expectMsg(runMessageLarge)

}
// Action 2 should start immediately as well (without any retries, as there is already enough space in the pool)
containers(1).expectMsg(runMessageDifferentAction)
}
(Contributor) commented:

Very nice test! 🎉

@cbickel cbickel force-pushed the mem branch 7 times, most recently from 57261e8 to 2e334a2 on July 2, 2018, 11:00
@cbickel cbickel force-pushed the mem branch 2 times, most recently from f053a42 to be8ef95 on July 24, 2018, 05:58
@cbickel (Contributor Author) commented on Jul 24, 2018

The Gatling tests ApiV1Simulation, LatencySimulation, BlockingInvokeOneActionSimulation, and ColdBlockingInvokeSimulation do not show any performance regression on this PR.

@vvraskin (Contributor) left a review:

I've checked the code part only; it looks good given my experience in this corner.

@cbickel cbickel force-pushed the mem branch 2 times, most recently from 3c94f0a to ec9fc2c on July 26, 2018, 13:13
@rabbah (Member) left a review:

The description in ShardingContainerPoolBalancer.scala should be updated to describe the algorithm.

}

def /(other: ByteSize): Double = {
// Without throwing the exception the result would be `Infinity` here
(Member) commented:

We could consider making this return a Try instead.

(Contributor Author) replied:

What are the reasons for using a Try here?
When dividing Ints, you also get the result directly rather than a Try, don't you?
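
For comparison, a standalone sketch of the two behaviours being discussed, using plain byte counts instead of ByteSize:

import scala.util.Try

object DivisionExample extends App {
  // approach in the PR: fail loudly instead of silently returning Infinity
  def divide(bytes: Long, otherBytes: Long): Double = {
    if (otherBytes == 0) throw new ArithmeticException("/ by zero")
    bytes.toDouble / otherBytes.toDouble
  }

  // alternative raised in the review: let callers handle the failure explicitly
  def divideSafe(bytes: Long, otherBytes: Long): Try[Double] = Try(divide(bytes, otherBytes))

  println(divideSafe(1024, 512)) // Success(2.0)
  println(divideSafe(1024, 0))   // Failure(java.lang.ArithmeticException: / by zero)
}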

@@ -456,13 +461,15 @@ object ShardingContainerPoolBalancer extends LoadBalancerProvider {
*
* @param invokers a list of available invokers to search in, including their state
* @param dispatched semaphores for each invoker to give the slots away from
* @param slots Number of slots, that need to be aquired (e.g. memory in MB)
(Member) commented:

acquired (typo)

@cbickel (Contributor Author) commented on Aug 21, 2018

PG2#3520 🔵

@markusthoemmes markusthoemmes merged commit 5b3e0b6 into apache:master Aug 23, 2018
@cbickel cbickel deleted the mem branch August 23, 2018 09:13
@@ -6,7 +6,7 @@ whisk {
use-cluster-bootstrap: false
}
loadbalancer {
invoker-busy-threshold: 4
user-memory: 1024 m
(Member) commented:

@cbickel Should this be named invoker-user-memory? I see the following exception on startup, since in my case CONFIG_whisk_loadbalancer_invokerUserMemory was not defined:

Exception in thread "main" pureconfig.error.ConfigReaderException: Cannot convert configuration to a whisk.core.loadBalancer.ShardingContainerPoolBalancerConfig. Failures are:
  at 'whisk.loadbalancer':
    - Key not found: 'invoker-user-memory'.

	at pureconfig.package$.getResultOrThrow(package.scala:138)
	at pureconfig.package$.loadConfigOrThrow(package.scala:160)
	at whisk.core.loadBalancer.ShardingContainerPoolBalancer.<init>(ShardingContainerPoolBalancer.scala:159)
	at whisk.core.loadBalancer.ShardingContainerPoolBalancer$.instance(ShardingContainerPoolBalancer.scala:437)
	at whisk.core.controller.Controller.<init>(Controller.scala:117)
	at whisk.core.controller.Controller$.main(Controller.scala:258)
	at whisk.core.controller.Controller.main(Controller.scala)
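
This is pureconfig's default camelCase-to-kebab-case field mapping at work: a case class field named invokerUserMemory is looked up under the key invoker-user-memory, so the key in reference.conf has to match. A minimal sketch (the field name is inferred from the error message above, not from the final fix):

import whisk.core.entity.ByteSize

// pureconfig resolves this field from whisk.loadbalancer.invoker-user-memory,
// which can also be supplied via CONFIG_whisk_loadbalancer_invokerUserMemory.
case class ShardingContainerPoolBalancerConfig(blackboxFraction: Double, invokerUserMemory: ByteSize)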

(Contributor Author) replied:

You are right.
I'll open a PR to correct this.
Thank you.

(Contributor Author) replied:

#3993 has the fix.

BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019