Schedule messages #2

Merged: 22 commits merged into master from schedule-messages on Aug 9, 2017

Conversation

Contributor

@lacarvalho91 lacarvalho91 commented Aug 7, 2017

I'll also be looking into adding metrics, but this is ready for an initial review.

I didn't realise git pair would amend the commit author outside of using git pair-commit.

  • Fix error that occasionally happens when generating random Schedule
  • Memory leak in tests


object SchedulerApp extends App with AkkaComponents with LazyLogging {
object SchedulerApp extends App with LazyLogging with AkkaComponents {
Contributor

For my own understanding: is there any logical reason why the order of these is swapped? I wasn't aware that this made any difference to the compiler.

Contributor Author

I believe there are circumstances where it would make a difference, but in this case, no.

I spent some time trying to work out how to properly inject the implicits from AkkaComponents, so it probably got moved during that experimentation.
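
For reference, a minimal sketch (hypothetical traits, not from this codebase) of where mixin order does matter: Scala linearises traits right to left, so when traits override the same member, the right-most trait's override wins and super calls chain through the rest. App, LazyLogging and AkkaComponents don't override common members, so swapping them is harmless here.

  trait Greeting { def message: String = "hello" }
  trait Loud extends Greeting { override def message: String = super.message.toUpperCase }
  trait Polite extends Greeting { override def message: String = super.message + ", please" }

  // The right-most trait is applied last in the linearisation:
  object A extends Greeting with Loud with Polite // message == "HELLO, please"
  object B extends Greeting with Polite with Loud // message == "HELLO, PLEASE"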

@lacarvalho91 lacarvalho91 changed the title Schedule messages [DON'T MERGE] - Schedule messages Aug 7, 2017
@lacarvalho91 lacarvalho91 changed the title [DON'T MERGE] - Schedule messages Schedule messages Aug 7, 2017

implicit val system = ActorSystem("kafka-message-scheduler")
implicit val materializer = ActorMaterializer()

val decider: Supervision.Decider = { t =>
Contributor Author

this stuff and the tests for it are a WIP

Contributor

@paoloambrosio-skyuk paoloambrosio-skyuk left a comment

Partial review


sys.addShutdownHook {
logger.info("Kafka Message Scheduler shutting down...")
Await.ready(runningStream.shutdown(), conf.shutdownTimeout.stream)
Await.ready(system.terminate(), conf.shutdownTimeout.system)
Rewriter.stop(app).value
Contributor

All of this is a bit messy :trollface: Grafter stuff mixed with Akka, Kamon, ... all working in a different way. It would be good to create an abstraction so that we can close everything in the same way.

Contributor

Seconded, it's pretty hard to follow everything at the moment.

Contributor Author

Agreed, though I'd have thought it will take some time to come up with that. For this PR or another?
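
One possible shape for such an abstraction, as a minimal sketch with hypothetical names (nothing here is in the codebase): give every resource a common stop interface and close them in a fixed order.

  import scala.concurrent.{Await, Future}
  import scala.concurrent.duration.FiniteDuration

  object Shutdown {
    // Hypothetical helper: a uniform way to stop the stream, the actor system,
    // Kamon, Grafter-managed components, and so on.
    trait Stoppable {
      def name: String
      def stop(): Future[_]
    }

    def shutdownAll(components: Seq[Stoppable], timeout: FiniteDuration): Unit =
      components.foreach { c =>
        // Await each one so the shutdown order stays deterministic.
        Await.ready(c.stop(), timeout)
      }
  }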

log.info(s"Updating schedule $scheduleId")
else
log.info(s"Creating schedule $scheduleId")
val cancellable = scheduler.scheduleOnce(timeFromNow(schedule.time)){
Contributor

I usually prefer to schedule a message to the actor itself instead of executing a function, as it makes it easier to handle concurrency properly.

Contributor Author

good shout
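
A minimal sketch of that suggestion (message and handler names are illustrative, not the PR's actual code): schedule a message back to the actor so the triggered work runs through its mailbox rather than in a closure.

  import akka.actor.{Actor, ActorLogging}
  import scala.concurrent.duration.FiniteDuration

  class ExampleSchedulingActor extends Actor with ActorLogging {
    import context.dispatcher

    private case class Trigger(scheduleId: String)

    override def receive: Receive = {
      case ("schedule", scheduleId: String, delay: FiniteDuration) =>
        // Instead of scheduleOnce(delay)(doWork()), send ourselves a message...
        context.system.scheduler.scheduleOnce(delay, self, Trigger(scheduleId))
      case Trigger(scheduleId) =>
        // ...so the triggered work is handled like any other message, with no
        // closure touching actor state from the scheduler's thread.
        log.info(s"Triggering schedule $scheduleId")
    }
  }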

@@ -22,4 +28,8 @@ object ApplicationError {
case schemaError: InvalidSchemaError => invalidSchemaErrorShow.show(schemaError)
case messageFormatError: AvroMessageFormatError => avroMessageFormatErrorShow.show(messageFormatError)
}

//TODO: is this enough for now?
Contributor

Can we use a Reader? Happy to postpone it to later. It should really be created from the raw configuration (not application config). We should later create a project with all our common code (a la akka-stream-contrib).


case class Schedule(time: OffsetDateTime, topic: String, key: Array[Byte], value: Array[Byte]) extends ScheduleData

case class ScheduleMetadata(scheduleId: ScheduleId, topic: String) extends ScheduleData
Contributor

Something like ScheduledMessage(topic: String, key: Array[Byte], value: Array[Byte]) and ScheduleDeletion(scheduleId: ScheduleId) might be more appropriate (note the topic missing from ScheduleDeletion, as the ...ToProducerRecord should take it from config).


package object scheduler extends LazyLogging {

case class AppConfig(scheduler: SchedulerConfig)
type DecodeResult = Either[ApplicationError, (ScheduleId, Option[Schedule])]
Contributor

Not sure DecodeResult is the right abstraction. It does not communicate what it is. With that name I would have expected type DecodeResult[T] = Either[DecodingError, T], where DecodingError inherits from ApplicationError... but that might not deserve to be a separate type.

Contributor Author

@lacarvalho91 lacarvalho91 Aug 8, 2017

Will remove the type alias.

def reader(implicit system: ActorSystem, materializer: ActorMaterializer): Reader[AppConfig, ScheduleReader] =
for {
config <- SchedulerConfig.reader
schedulingActorRef <- SchedulingActor.reader
Contributor

Do I take it this is all about using Reader for dependency injection, as per http://eed3si9n.com/herding-cats/Reader.html?
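
For anyone new to the pattern, a minimal sketch of Reader-based dependency injection with cats (hypothetical config and component names, for illustration only):

  import cats.data.Reader
  import cats.implicits._

  case class DemoConfig(topic: String, parallelism: Int)
  case class DemoConsumer(topic: String)
  case class DemoPipeline(consumer: DemoConsumer, parallelism: Int)

  object DemoConsumer {
    val reader: Reader[DemoConfig, DemoConsumer] =
      Reader(config => DemoConsumer(config.topic))
  }

  object DemoPipeline {
    // Composing Readers wires the components together; the config is only
    // supplied once, at the end, when the whole graph is built.
    val reader: Reader[DemoConfig, DemoPipeline] =
      for {
        consumer    <- DemoConsumer.reader
        parallelism <- Reader((c: DemoConfig) => c.parallelism)
      } yield DemoPipeline(consumer, parallelism)
  }

  object ReaderWiringExample extends App {
    val pipeline: DemoPipeline = DemoPipeline.reader.run(DemoConfig("schedule-topic", 4))
    println(pipeline)
  }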

import akka.stream.stage.{GraphStage, InHandler, OutHandler}

//Taken from: https://stackoverflow.com/a/38445121/8424807
class EitherFanOut[L, R] extends GraphStage[FanOutShape2[Either[L, R], L, R]] {
Contributor

Isn't PartitionWith basically what this is?

Contributor Author

I looked at that but maybe I misunderstood it - will take another look.

Contributor Author

it works! 🎉

Contributor

Suggest that to the StackOverflow question 😄
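
For reference, a sketch of how the same split could look with akka-stream-contrib's PartitionWith, assuming its apply(In => Either[Out0, Out1]) signature (worth double-checking against the version in use):

  import akka.NotUsed
  import akka.stream.ClosedShape
  import akka.stream.contrib.PartitionWith
  import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}

  object EitherSplit {
    // Route Left values to one sink and Right values to another, with no custom stage.
    def splitEithers[L, R](source: Source[Either[L, R], NotUsed],
                           lefts: Sink[L, NotUsed],
                           rights: Sink[R, NotUsed]): RunnableGraph[NotUsed] =
      RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
        import GraphDSL.Implicits._
        val partition = b.add(PartitionWith[Either[L, R], L, R](identity))
        source ~> partition.in
        partition.out0 ~> lefts
        partition.out1 ~> rights
        ClosedShape
      })
  }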

import com.sky.kafka.message.scheduler.kafka._
import com.typesafe.scalalogging.LazyLogging

case class ScheduleReader(config: SchedulerConfig, scheduleSource: Source[DecodeResult, Control], schedulingSink: Sink[Any, NotUsed])
Contributor

What about adding Scaladoc describing the purpose of these classes? In this case it would be something like...

/**
 * Provides a stream from the schedule source to the scheduling actor.
 */

Contributor

Reading through the previous class PartitionedSink I was thinking exactly the same thing! Possibly worth pairing on the docs with someone?

Contributor

Also, I'm not sure about the name ScheduleReader. Maybe I'm being dense but I got confused between this and the Reader concept that's introduced in this PR.

How about ScheduleInputHandler or something else?

Contributor Author

@lacarvalho91 lacarvalho91 Aug 8, 2017

ScheduleReaderStream?

I'm not a fan of changing it, because ScheduleReader describes it very well IMO. Does the above distinguish it enough?

Contributor Author

That ruins it actually, as it would be ScheduleReaderStream.stream.

I would rather it stayed ScheduleReader; perhaps we should discuss as a team.

class AkkaComponentsSpec extends AkkaBaseSpec {

"decider" should {
"return a 'Restart' supervision strategy" in {
Contributor

Hopefully there is a higher-level test that shows the effect of this error-handling logic. Also, I'm not sure of the value this test is giving; it looks to me like testing accessors (testing something that is declarative).

Contributor Author

This will evolve into a test that checks we report metrics in the supervision strategy; it's a WIP.

verify(mockSourceQueue).offer((scheduleId, updatedSchedule))
}

"send an Ack to the sender when receiving an Init message" in new SchedulingActorTest {
Contributor

@paoloambrosio-skyuk paoloambrosio-skyuk Aug 8, 2017

Shouldn't we send the Ack back on every scheduling message as well as on Init? Yes, I've just seen that createSchedule is masking what is happening. I think in this case we want to see both the message and be told we expect the Ack.

At this point we can also rename this test case to "does nothing on init" or something along those lines, given that we don't talk about messages in the other test names.

Contributor Author

We already see both the message and that we expect the Ack, unless I'm misunderstanding?

I have renamed the test.


object EmbeddedKafka {

val kafkaServer = new KafkaServer

val bootstrapServer = s"localhost:${kafkaServer.kafkaPort}"

implicit class KafkaServerOps(val kafkaServer: KafkaServer) extends AnyVal {
Contributor

We definitely need to add a comment about why this is here, where it comes from, and why we haven't fixed code quality issues.

@oliverlockwood
Contributor

High-level: I'm not convinced by the root package being com.sky.kafka.message.scheduler. It leads to subpackages like com.sky.kafka.message.scheduler.kafka which seems weird to me.

How about the root package being com.sky.kafkascheduler?

object KafkaStream {

def source[T](config: SchedulerConfig)(implicit system: ActorSystem, crDecoder: ConsumerRecordDecoder[T]): Source[T, Control] =
Consumer.plainSource(
Contributor

Do we need to add a GitHub issue to track moving to a committableSource to provide at-least-once semantics?

Contributor

Also, in the meantime, for any offsets to be committed you'll need to change enable.auto.commit to true in application.conf.
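
For reference, a minimal sketch of the at-least-once shape using akka-stream-kafka's committableSource (topic name, settings and the processing stub are illustrative):

  import akka.actor.ActorSystem
  import akka.kafka.scaladsl.Consumer
  import akka.kafka.{ConsumerSettings, Subscriptions}
  import akka.stream.ActorMaterializer
  import akka.stream.scaladsl.Sink
  import org.apache.kafka.clients.consumer.ConsumerRecord
  import org.apache.kafka.common.serialization.ByteArrayDeserializer

  import scala.concurrent.Future

  object CommittableSourceExample extends App {
    implicit val system: ActorSystem = ActorSystem("committable-source-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    import system.dispatcher

    // Placeholder for the real scheduling logic.
    def processSchedule(record: ConsumerRecord[Array[Byte], Array[Byte]]): Future[Unit] =
      Future.successful(())

    val settings = ConsumerSettings(system, new ByteArrayDeserializer, new ByteArrayDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("kafka-message-scheduler")

    // committableSource attaches an offset to every record; committing only after
    // the record is processed gives at-least-once delivery, unlike plainSource.
    Consumer
      .committableSource(settings, Subscriptions.topics("schedule-topic"))
      .mapAsync(parallelism = 1) { msg =>
        processSchedule(msg.record).flatMap(_ => msg.committableOffset.commitScaladsl())
      }
      .runWith(Sink.ignore)
  }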

@oliverlockwood
Contributor

From avro/package.scala (which is not part of this PR, but I've reviewed anyway as part of getting across the project):

  implicit val offsetDateTimeToValue = new ToValue[OffsetDateTime] {
    override def apply(value: OffsetDateTime): String = value.toString
  }

My understanding was that it's not great practice to be using the toString() method for serialization. For long-term compatibility, would it not be better to use format() with an explicit DateTimeFormatter - and then use that same formatter as a second argument to parse(), when deserializing?
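
A minimal sketch of that suggestion as standalone helpers (not the project's actual avro type classes):

  import java.time.OffsetDateTime
  import java.time.format.DateTimeFormatter

  object OffsetDateTimeCodec {
    // One explicit formatter shared by both directions, so the wire format is
    // pinned down rather than depending on whatever toString happens to produce.
    private val formatter: DateTimeFormatter = DateTimeFormatter.ISO_OFFSET_DATE_TIME

    def encode(value: OffsetDateTime): String = value.format(formatter)

    def decode(raw: String): OffsetDateTime = OffsetDateTime.parse(raw, formatter)
  }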


package object config {

case class AppConfig(scheduler: SchedulerConfig)(implicit system: ActorSystem, materialzer: ActorMaterializer)
Contributor

Typo: materialzer is missing an i

}

(receiveCreateOrUpdateMessage orElse receiveCancelMessage andThen updateStateAndAck) orElse {
case Init => sender ! Ack
Contributor

The logic priority of the above line isn't 100% clear to me. Is it:

  ((receiveCreateOrUpdateMessage orElse receiveCancelMessage) andThen updateStateAndAck) orElse

or

  (receiveCreateOrUpdateMessage orElse (receiveCancelMessage andThen updateStateAndAck)) orElse

Maybe this is obvious to masters of partial functions, but I don't think that being explicit would hurt?

Also there's an argument for putting the Init case in a named partial function for clarity. I'm less fussed about this though.

Contributor Author

This should be clearer now.
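
For the record, Scala gives all alphanumeric infix methods the same precedence and groups them left to right, so the original line parsed as the first form; a minimal illustration:

  object PrecedenceExample extends App {
    // `a orElse b andThen c` groups as `(a orElse b) andThen c`.
    val a: PartialFunction[Int, String] = { case 1 => "one" }
    val b: PartialFunction[Int, String] = { case 2 => "two" }
    val c: String => String = _.toUpperCase

    val combined = a orElse b andThen c // parsed as (a orElse b) andThen c
    assert(combined(1) == "ONE")        // the other grouping would leave this as "one"
  }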

import com.sky.kafka.message.scheduler.kafka.ProducerRecordEncoder
import org.apache.kafka.clients.producer.ProducerRecord

sealed trait PublishableMessage
Contributor Author

help me name this pls 😆

Contributor

SchedulerControlMessage?

with extensions ScheduleDefinition and ScheduleDeletion?

Contributor Author

I think ScheduledMessage and ScheduleDeletion are good for the subclasses; I'm just not sure on the name of the trait.

@lacarvalho91 lacarvalho91 force-pushed the schedule-messages branch 3 times, most recently from 8abe011 to 804832a Compare August 8, 2017 15:14

case class Cancel(scheduleId: ScheduleId) extends SchedulingMessage

private case class Trigger(scheduleId: ScheduleId, schedule: Schedule)
Contributor Author

@lacarvalho91 lacarvalho91 Aug 8, 2017

I'm not 100% on this name either; maybe it's OK.

Contributor

I think it's probably fine

.settings
.supervisionDecider(exception) shouldBe Restart

verify(metrics).counter(key)
Contributor Author

@paoloambrosio-skyuk I wasn't sure how we wanted to test these metrics: whether this is OK (using mocks), or whether we should start Kamon, start an MBeanServer and query that.

README.md Outdated

The `schedule-topic` must be configured to use [log compaction](https://kafka.apache.org/documentation/#compaction).
This is to allow the scheduler to delete the schedule after it has been written to its destination topic. This is very
important because the scheduler uses the `schedule-topic` to reconstruct its state.
Contributor

It's not only to allow deletion, it's also to ensure that (longer-term) schedules don't get lost after a restart.

build.sbt Outdated

"io.kamon" %% "kamon-jmx" % kamonVersion,
"io.kamon" %% "kamon-akka-2.5" % kamonVersion,
"io.kamon" %% "kamon-core" % kamonVersion,

"org.scalatest" %% "scalatest" % "3.0.1" % Test,
"com.typesafe.akka" %% "akka-testkit" % akkaVersion % Test,
"com.typesafe.akka" %% "akka-stream-testkit" % akkaVersion % Test,
"net.cakesolutions" %% "scala-kafka-client-testkit" % kafkaVersion % Test,
"org.slf4j" % "log4j-over-slf4j" % "1.7.21" % Test,
Contributor

As we're a new module I see no reason not to use SLF4J 1.7.25


sys.addShutdownHook {
logger.info("Kafka Message Scheduler shutting down...")
Rewriter.stop(app).value
Contributor

Restoring @paoloambrosio-skyuk's comment:

All of this is a bit messy :trollface: Grafter stuff mixed with Akka, Kamon, ... all working in a different way. It would be good to create an abstraction so that we can close everything in the same way.

If we're not addressing that in this PR, perhaps create an issue or at least an issue comment to track, and link to it from here?

Contributor Author

@lacarvalho91 lacarvalho91 Aug 9, 2017

Created #8


/**
* Provides a stream that consumes from the queue of triggered messages,
* writes the messages to Kafka and then deletes the schedules from Kafka
Contributor

How about

writes the scheduled messages to the specified Kafka topics, and then deletes the schedules from the scheduling Kafka topic to mark completion

Contributor Author

Where were you when I was writing reports at uni!

Contributor

@lacarvalho91 I was busy giving grammatical advice to developers elsewhere 😉

implicit val scheduleDeletionProducerRecordEnc: ProducerRecordEncoder[ScheduleDeletion] =
ProducerRecordEncoder.instance(deletion => new ProducerRecord(deletion.scheduleTopic, deletion.scheduleId.getBytes, null))

implicit def scheduleDataToProducerRecord(msg: PublishableMessage): ProducerRecord[Array[Byte], Array[Byte]] =
Contributor

This function naming is a legacy of the old name ScheduleData.

I think it probably needs to be renamed publishableMessageToProducerRecord.

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.FiniteDuration

class SchedulingActor(queue: SourceQueue[(String, ScheduledMessage)], scheduler: Scheduler) extends Actor with ActorLogging {
Contributor

I actually think that the scheduler parameter should be renamed akkaScheduler.

There's so much "schedule" text dotted around this class that I believe it would increase readability if we clearly identify the Akka scheduling component.

Contributor Author

Good call

Contributor

@oliverlockwood oliverlockwood left a comment

I think I'm happy enough with this now 👍

I won't merge yet - I'll give @paoloambrosio-skyuk a chance for a final review first.

if (cancel(scheduleId, schedules))
log.info(s"Cancelled schedule $scheduleId")
else
log.warning(s"Couldn't cancel $scheduleId")
Contributor Author

@lacarvalho91 lacarvalho91 Aug 9, 2017

This log message appears every time a schedule is triggered, because we delete the schedule from the schedule-topic afterwards.

What happens is: we send the null to the schedule-topic, then we process that as a Cancel (to support explicit deletes, and to remove the schedule from the actor's state), but the call to cancel fails since the event has already been triggered.

Not sure how we want to handle this; I guess we would want some kind of warning if we fail to cancel something when it should have actually been cancellable.

Contributor

@oliverlockwood oliverlockwood Aug 9, 2017

I think this will be non-trivial, so I've raised an issue to cover this: Erroneous WARNING message after every schedule firing

@@ -1,4 +1,4 @@
package common
package com.sky.kms.common
Contributor

It might be worth discussing why we put tests in the same package. It has always fascinated me.


Is this not so you can use "package-private" visibility and still be able to test it, at least in Java?

Contributor

Indeed. You should NOT test the internals, and separating packages does that ;-)

@paoloambrosio-skyuk paoloambrosio-skyuk merged commit 934d521 into master Aug 9, 2017
@paoloambrosio-skyuk paoloambrosio-skyuk deleted the schedule-messages branch August 9, 2017 10:21