From b9cfcbdb16fd1c1447587d46fb4fea29bdf8be47 Mon Sep 17 00:00:00 2001
From: MASES Public Developers Team <94312179+masesdevelopers@users.noreply.github.com>
Date: Fri, 20 Oct 2023 15:35:29 +0200
Subject: [PATCH] Added documentation on how the provider works and use cases
 (#121)

---
 README.md                                |  3 +-
 src/documentation/articles/howitworks.md | 93 ++++++++++++++++++++++++
 src/documentation/articles/toc.yml       |  8 +-
 src/documentation/articles/usecases.md   | 70 +++++++++++++++++-
 src/documentation/index.md               |  3 +-
 5 files changed, 171 insertions(+), 6 deletions(-)
 create mode 100644 src/documentation/articles/howitworks.md

diff --git a/README.md b/README.md
index 5738adf4..342c8f91 100644
--- a/README.md
+++ b/README.md
@@ -48,10 +48,11 @@ This project adheres to the Contributor [Covenant code of conduct](CODE_OF_CONDU
 ## Summary

 * [Getting started](src/documentation/articles/gettingstarted.md)
+* [How it works](src/documentation/articles/howitworks.md)
 * [Usage](src/documentation/articles/usage.md)
 * [Use cases](src/documentation/articles/usecases.md)
-* [Serialization](src/documentation/articles/serialization.md)
 * [Templates usage](src/documentation/articles/usageTemplates.md)
+* [Serialization](src/documentation/articles/serialization.md)
 * [External application](src/documentation/articles/externalapplication.md)
 * [Roadmap](src/documentation/articles/roadmap.md)
 * [Current state](src/documentation/articles/currentstate.md)

diff --git a/src/documentation/articles/howitworks.md b/src/documentation/articles/howitworks.md
new file mode 100644
index 00000000..187b7fba
--- /dev/null
+++ b/src/documentation/articles/howitworks.md
@@ -0,0 +1,93 @@
# KEFCore: how it works

[Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) can be used under several operating conditions.

However, it is important to start with a brief description of how it works.

## Basic concepts

The image below, from Wikipedia, illustrates the basic concepts:

![Alt text](https://upload.wikimedia.org/wikipedia/commons/6/64/Overview_of_Apache_Kafka.svg "Kafka basic concepts")

Simplifying, there are three active elements:
- **Topics**: storage for the records (the data); they are hosted in the Apache Kafka cluster and can be partitioned
- **Producers**: entities producing records to be stored in one or more topics
- **Consumers**: entities receiving records from the topics

When a producer sends a record to the Apache Kafka cluster, the record is delivered to the consumers subscribed to the topics the producer is producing on: this is a classic pub-sub pattern.
The Apache Kafka cluster adds the ability to store this information within the topic the producer has produced on; this feature guarantees that:
- an application consuming from the Apache Kafka cluster can listen only for the latest changes, or seek to a specific position in the past and start receiving data from that point
- the standard ways to consume from the Apache Kafka cluster are to start from the end (latest available record) or from the beginning (first available record)

## How [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) works

An application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) is both a producer and a consumer at the same time:
- when an entity is created/updated/deleted (e.g.
calling [SaveChanges](https://learn.microsoft.com/en-us/ef/core/saving/basic)), the provider invokes the appropriate producer to store a new record in the corresponding topic of the Apache Kafka cluster
- the subscribed consumer is then informed about this new record and stores it back locally: this may not seem useful yet, but it will become clearer later

The Apache Kafka cluster becomes:
1. a central router for data changes in applications based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/);
2. a reliable storage, because when the application restarts the data stored in the topics is read back by the consumers, so the state is aligned to the latest available one.

Apache Kafka comes with the [topic compaction](https://kafka.apache.org/documentation/#compaction) feature, thanks to which point 2 is optimized.
[Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) is interested in storing only the latest state of each entity, not its full change history.
Using [topic compaction](https://kafka.apache.org/documentation/#compaction), the combination of producer, consumer and Apache Kafka cluster can apply the CRUD operations on data:
- Create: a producer stores a new record with a unique key
- Read: a consumer retrieves records from the topic
- Update: a producer storing a new record with a previously stored unique key discards the old records
- Delete: a producer storing a new record with a previously stored unique key, and the value set to null, deletes all records with that unique key

Behind the scenes, all CRUD operations are supported by [`KNetCompactedReplicator`](https://github.com/masesgroup/KNet/blob/master/src/net/KNet/Specific/Replicator/KNetCompactedReplicator.cs) and/or [`KNetProducer`](https://github.com/masesgroup/KNet/blob/master/src/net/KNet/Specific/Producer/KNetProducer.cs)/[Apache Kafka Streams](https://kafka.apache.org/documentation/streams/).

### Data storage

Apache Kafka stores information using records, so it is important to convert the entities into something usable by Apache Kafka.
The conversion is done using serializers that convert the entities (the data in the model) into Apache Kafka records and vice versa: see the [serialization chapter](serialization.md) for more information.

## [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) compared to other providers

The previous chapter described how [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) reproduces the CRUD operations.
Starting from the model defined in the code, the data is stored in the topics, and each topic can be seen as a database table filled with the same data.
From the point of view of an application, the use of [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) is similar to the use of the InMemory provider: a minimal sketch follows the note below.

### A note on [migrations](https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations)

The current version of [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) does not support [migrations](https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations).
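
Aside from that limitation, application code looks like standard [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) code. Below is a minimal sketch based on the [getting started](gettingstarted.md) article; the `KafkaDbContext` configuration properties shown (`BootstrapServers`, `ApplicationId`, `DbName`) are the ones documented there, but treat the names and the namespace as assumptions that may change across versions:

```cs
using Microsoft.EntityFrameworkCore;
using MASES.EntityFrameworkCore.KNet.Infrastructure; // assumption: namespace hosting KafkaDbContext

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }
}

// Deriving from KafkaDbContext is enough: the backing topics are derived from the model.
public class BloggingContext : KafkaDbContext
{
    public DbSet<Blog> Blogs { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using var context = new BloggingContext()
        {
            BootstrapServers = "localhost:9092", // the Apache Kafka cluster to use
            ApplicationId = "MyApplicationId",
            DbName = "MyDBName",
        };

        context.Blogs.Add(new Blog { Url = "http://example.com" });
        context.SaveChanges(); // produces a record to the topic backing Blogs

        // Queries are served from the locally aligned state, much like the InMemory provider.
        foreach (var blog in context.Blogs)
            System.Console.WriteLine(blog.Url);
    }
}
```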
## [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) features not available in other providers

Here is a list of features that [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) gives to its users, useful in some use cases.

### Distributed cache

The previous chapter stated that consumers align the application data to the latest information available in the topics.
The alignment is managed by [`KNetCompactedReplicator`](https://github.com/masesgroup/KNet/blob/master/src/net/KNet/Specific/Replicator/KNetCompactedReplicator.cs) and/or [Apache Kafka Streams](https://kafka.apache.org/documentation/streams/); everything is driven by the Apache Kafka back-end.
Considering two, or more, applications sharing the same model and configuration, they always align to the latest state of the topics involved.
This implies that, virtually, there is a distributed cache between the applications and the Apache Kafka back-end:
- Apache Kafka physically stores the cache (shared state) within the topics and routes changes to the subscribed applications
- Applications use the latest cache version (local state) received from the Apache Kafka back-end

If an application restarts, it retrieves the latest data (the latest cache) and aligns to the shared state.

### Events

Generally, an application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) executes queries against the back-end to store, or retrieve, information on demand.
Each alignment (a consumed record) can be considered a change event: any change in the back-end produces an event that can be used in different ways.
These change events are used by [`KNetCompactedReplicator`](https://github.com/masesgroup/KNet/blob/master/src/net/KNet/Specific/Replicator/KNetCompactedReplicator.cs) and/or [Apache Kafka Streams](https://kafka.apache.org/documentation/streams/) to align the local state.
Moreover, [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) can inform the registered application about these events, using callbacks and at zero cost.
The application can then use the reported events to execute some actions, as sketched below:
- execute a query
- write something to disk
- execute a REST call
- and so on
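
The following conceptual sketch shows the shape of this callback-driven flow; it is not the provider's actual API: the `ChangeEvent` and `ChangeEventSource` types are hypothetical stand-ins for the real surface, used only to illustrate how consumed records become application callbacks:

```cs
using System;

// Hypothetical stand-in for a change notification coming from the back-end.
public record ChangeEvent(string EntityType, object Key);

// Hypothetical stand-in for the provider's event dispatch: handlers are
// registered by the application and raised as records are consumed.
public class ChangeEventSource
{
    public event Action<ChangeEvent>? EntityChanged;

    // Called by the consumer loop for every record received from the topics.
    public void OnRecordConsumed(ChangeEvent evt) => EntityChanged?.Invoke(evt);
}

public static class Demo
{
    public static void Main()
    {
        var source = new ChangeEventSource();
        source.EntityChanged += evt =>
        {
            // React: execute a query, write to disk, call a REST endpoint, and so on.
            Console.WriteLine($"{evt.EntityType} with key {evt.Key} changed");
        };

        // Simulate a record arriving from the Apache Kafka back-end.
        source.OnRecordConsumed(new ChangeEvent("Blog", 42));
    }
}
```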
### Applications not based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/)

Until now we have spoken about applications based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/); however, this provider can also be used to feed applications not based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/).
[Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) comes with ready-made helper classes to subscribe to any topic of the Apache Kafka cluster and retrieve the data stored there by an application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/).
Any application can use this feature to:
- read the latest data stored in the topics by the application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/)
- attach to the topics used by the application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) and receive change events whenever something is produced

When a record is received, the ready-made helper classes deserialize it and return the filled entity.

diff --git a/src/documentation/articles/toc.yml b/src/documentation/articles/toc.yml
index 88eb5c34..11b936d7 100644
--- a/src/documentation/articles/toc.yml
+++ b/src/documentation/articles/toc.yml
@@ -2,10 +2,14 @@
   href: intro.md
 - name: Getting started
   href: gettingstarted.md
+- name: How it works
+  href: howitworks.md
 - name: Usage
   href: usage.md
 - name: Use cases
   href: usecases.md
+- name: Templates usage
+  href: usageTemplates.md
 - name: Serialization
   href: serialization.md
 - name: External application
@@ -15,6 +19,4 @@
 - name: Current state
   href: currentstate.md
 - name: KafkaDbContext
-  href: kafkadbcontext.md
-- name: Template usage
-  href: usageTemplates.md
\ No newline at end of file
+  href: kafkadbcontext.md
\ No newline at end of file

diff --git a/src/documentation/articles/usecases.md b/src/documentation/articles/usecases.md
index 5bd5b12d..7b00ad70 100644
--- a/src/documentation/articles/usecases.md
+++ b/src/documentation/articles/usecases.md
@@ -3,4 +3,72 @@

[Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) can be used under several operating conditions.
Here is a possible, non-exhaustive list of use cases.

-TBD
\ No newline at end of file
+Before reading the following chapters, it is important to understand [how it works](howitworks.md).

## [Apache Kafka](https://kafka.apache.org/) as Database

The first use case can be coupled to a standard usage of [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/), the same as when it is used with database providers.
[Getting started](gettingstarted.md) proposes a simple example following the online documentation.
In the example, the data within the model is stored in multiple Apache Kafka topics; each topic is correlated to a `DbSet` described by the `DbContext`.

The constraints are managed using `OnModelCreating` of the `DbContext`.

## [Apache Kafka](https://kafka.apache.org/) as distributed cache

Changing the way a model is written, it is possible to define a set of classes which act as storage for data we want to use as a cache.
It is possible to build a new model like:
```cs
public class CachingContext : KafkaDbContext
{
    public DbSet<Item> Items { get; set; }
}

public class Item
{
    public int ItemId { get; set; }
    public string Data { get; set; }
}
```

Sharing it between multiple applications and allocating the `CachingContext` in each application, the cache is shared and the same data is available to all of them.

## [Apache Kafka](https://kafka.apache.org/) as a triggered distributed cache

Continuing from the previous use case, and using the events reported by [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/), it is possible to write a reactive application.
When a change event is triggered, the application can react to it and take an action.
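
To illustrate the last two use cases, here is a hedged sketch of how two application instances might share the `CachingContext` defined above. The `KafkaDbContext` configuration property names are taken from the [getting started](gettingstarted.md) article and should be treated as assumptions; every instance must point to the same cluster and share the same model and configuration:

```cs
// Instance A: writes an entry to the shared cache.
using (var cache = new CachingContext()
{
    BootstrapServers = "localhost:9092", // assumed property names; same cluster for every instance
    ApplicationId = "CachingApp",
    DbName = "SharedCache",
})
{
    cache.Items.Add(new Item { ItemId = 1, Data = "cached value" });
    cache.SaveChanges(); // the record is produced to the topic backing Items
}

// Instance B, possibly another process or host, with the same model and configuration:
// once its consumer has aligned to the topic, the entry is visible locally.
using (var cache = new CachingContext()
{
    BootstrapServers = "localhost:9092",
    ApplicationId = "CachingApp",
    DbName = "SharedCache",
})
{
    var item = cache.Items.Find(1); // served from the locally aligned state
    System.Console.WriteLine(item?.Data);
}
```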
### SignalR

The triggered distributed cache can be used side-by-side with [SignalR](https://learn.microsoft.com/it-it/aspnet/signalr/overview/getting-started/introduction-to-signalr): combining [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) provider for [Apache Kafka](https://kafka.apache.org/) and [SignalR](https://learn.microsoft.com/it-it/aspnet/signalr/overview/getting-started/introduction-to-signalr) in an application, and subscribing to the change events, it is possible to feed the clients connected to [SignalR](https://learn.microsoft.com/it-it/aspnet/signalr/overview/getting-started/introduction-to-signalr).

### Redis

The triggered distributed cache can be seen as a [Redis](https://redis.io/) back-end.

## Data processing outside the [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) application

The schemas used to write the information in the topics are available, or can be defined by the user, so an external application can use the data in many ways:
- using the feature to extract the entities stored in the topics outside the application based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/)
- using some features of Apache Kafka like Apache Kafka Streams or Apache Kafka Connect

### External application

An application, not based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/), can subscribe to the topics to (see the sketch at the end of this article):
- store all change events to another medium
- analyze the data or the changes
- and so on

### Apache Kafka Streams

Apache Kafka comes with the powerful Streams feature. An application based on Streams can analyze streams of data to extract some information or convert the data into something else.
It is possible to build an application, based on Apache Kafka Streams, which listens to change events and produces something else, or simply stores them in another topic containing all the events, not only the latest ones (much like the transaction log of SQL Server).

### Apache Kafka Connect

Apache Kafka comes with another powerful feature called Connect: it comes with ready-made connectors which connect Apache Kafka with other systems (databases, storage, etc.).
There are sink and source connectors, and each connector has its own specifics:
- Database: the data in the topics can be converted and stored in a database
- File: the data in the topics can be converted and stored in one, or more, files
- Other: there are many other ready-made connectors, or a connector can be built using a [Connect SDK](https://github.com/masesgroup/KNet/blob/master/src/documentation/articles/connectSDK.md)

**NOTE**: while an Apache Kafka Streams application runs on its own, Apache Kafka Connect can allocate the connectors using its distributed feature, which balances the load and automatically restarts operations if something goes wrong.
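
Tying back to the external application section above, here is a hedged sketch of how an application not based on [Entity Framework Core](https://learn.microsoft.com/it-it/ef/core/) might read the stored entities. The `EntityExtractor.FromTopic` helper, its namespace and the topic naming convention below are illustrative assumptions about the ready-made classes, not a confirmed API; see [external application](externalapplication.md) for the actual surface:

```cs
using System;
using System.Threading;
using MASES.EntityFrameworkCore.KNet.Serialization; // assumption: namespace of the helper classes

// The entity shape must match the one used by the EF Core based application.
public class Item
{
    public int ItemId { get; set; }
    public string Data { get; set; }
}

public static class ExternalReader
{
    public static void Main()
    {
        // Hypothetical call: subscribes to the topic backing the Items DbSet and
        // invokes the callback with each deserialized (filled) entity.
        EntityExtractor.FromTopic<Item>(
            "localhost:9092",    // the Apache Kafka cluster used by the EF Core application
            "SharedCache.Items", // assumed topic name derived from DbName and DbSet
            entity => Console.WriteLine($"{entity.ItemId}: {entity.Data}"),
            CancellationToken.None);
    }
}
```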
\ No newline at end of file

diff --git a/src/documentation/index.md b/src/documentation/index.md
index e9c33f9b..ea0f28be 100644
--- a/src/documentation/index.md
+++ b/src/documentation/index.md
@@ -42,10 +42,11 @@ This project adheres to the Contributor [Covenant code of conduct](CODE_OF_CONDU
 ## Summary

 * [Getting started](articles/gettingstarted.md)
+* [How it works](articles/howitworks.md)
 * [Usage](articles/usage.md)
 * [Use cases](articles/usecases.md)
-* [Serialization](articles/serialization.md)
 * [Templates usage](articles/usageTemplates.md)
+* [Serialization](articles/serialization.md)
 * [External application](articles/externalapplication.md)
 * [Roadmap](articles/roadmap.md)
 * [Current state](articles/currentstate.md)