Cleanup README and documentation. Add docs for Prometheus Metrics. #422

Merged (7 commits) on May 14, 2018

mthenw
Contributor

@mthenw mthenw commented May 11, 2018

  • cleaned up README
  • created Reference section in the readme pointing to docs in docs dir
  • added docs for Prometheus metrics
  • minor fixes

@mthenw mthenw requested review from rupakg and alexdebrie May 11, 2018 10:46

codecov bot commented May 11, 2018

Codecov Report

Merging #422 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #422   +/-   ##
=======================================
  Coverage   63.87%   63.87%           
=======================================
  Files          29       29           
  Lines        1650     1650           
=======================================
  Hits         1054     1054           
  Misses        553      553           
  Partials       43       43

Last update 517f1ec...27905b7.

Contributor

@rupakg rupakg left a comment

I left a few comments

cache configuration used to drive low-latency event routing. The instance-local cache is built asynchronously based on
events from the backing DB.

The Event Gateway is a horizontally scalable system. It can be scaled by adding instances to the cluster. A cluster is
Contributor

I feel this paragraph that introduces the notion of clustering should come first.

│ │ │ └─────────────┘ │ │
│ │ ▲ │ │ │
│ │ │ │ │
│ Cloud Region 2───────┐ │ │ │ Cloud Regio│ 3───────┐ │
Contributor

typo on Region 3 at the far right

# Prometheus Metrics

Both the Events API and the Configuration API expose Prometheus metrics. The metrics are accessible via the `/v1/metrics` endpoint.
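As a rough illustration of consuming the `/v1/metrics` endpoint, the sketch below fetches the Prometheus text exposition output and parses simple sample lines into a dictionary. The base URL passed to `fetch_metrics` and the metric names in the usage example are hypothetical placeholders, not names confirmed by these docs, and the parser only covers plain `name{labels} value [timestamp]` lines.

```python
import re
from typing import Dict
from urllib.request import urlopen

# Matches "name{labels} value [timestamp]"; the labels and timestamp are optional.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+(-?\d+))?$'
)


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition output into {series: value}."""
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot parse
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        samples[name + labels] = float(value)
    return samples


def fetch_metrics(base_url: str) -> Dict[str, float]:
    """Fetch and parse metrics; base_url is an assumption supplied by the caller."""
    with urlopen(base_url.rstrip("/") + "/v1/metrics", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, parsing a hand-written payload (the metric names here are made up for illustration):

```python
payload = """# TYPE example_events_total counter
example_events_total{type="user.created"} 3
example_up 1
"""
print(parse_metrics(payload))
```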

Contributor

In the future, writing a Prometheus exporter for EG metrics might be really useful for those who want to integrate with EG.

Contributor

Is it possible to add some screenshots of the charts that we have been doing via Grafana and EG data?

Contributor Author

The next step is to add a Grafana dashboard JSON file. I will do that in a separate PR.

Contributor

👍


## Events are delivered _at most once_

Event Gateway attempts delivery fulfillment for an event only once and consequently any event received successfully by the Event Gateway is guaranteed to be received by the subscriber _at most once_. That said, the nature of the Event Gateway provider implementation could result in retries under specific circumstances, but these should not cause the same event to be delivered multiple times. For example, providers for AWS services that use the AWS SDK are subject to the automatic retry logic that's built into the SDK ([AWS documentation on API retries](https://docs.aws.amazon.com/general/latest/gr/api-retries.html)).
Contributor

"and consequently any event received successfully by the Event Gateway is guaranteed to be received by the subscriber at most once." -> is it guaranteed? We say in the previous section that "Events are not durable".

After reading the two sections, I am really confused. Let's clarify the message.

Contributor Author

It's "at most once", so basically anything guarantees that.


The AWS Lambda provider uses the `RequestResponse` invocation type, which means that the retry logic for asynchronous AWS events doesn't apply here. Among other things, this means that failed deliveries of custom events are not sent to a DLQ. For more information, see the "Synchronous invocation" section of [Understanding Retry Behavior](https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html).
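To make the at-most-once semantics concrete, here is a minimal sketch of a delivery loop that invokes a subscriber a single time and drops the event on failure, with no retry and no dead-letter queue. It is an illustration of the behaviour described in this section, not Event Gateway's actual code; `deliver_at_most_once` and the event shape are hypothetical names chosen for the example.

```python
from typing import Callable, Dict

Event = Dict[str, object]


def deliver_at_most_once(event: Event, subscriber: Callable[[Event], None]) -> bool:
    """Invoke the subscriber one time only and report whether it succeeded."""
    try:
        subscriber(event)
        return True
    except Exception:
        # At-most-once: a failed delivery is not retried and not queued
        # anywhere, so the event is simply lost for this subscriber.
        return False
```

A subscriber that always fails is still called only once, so the event is never delivered twice, although it may not be delivered at all:

```python
attempts = []

def failing_subscriber(event: Event) -> None:
    attempts.append(event)
    raise RuntimeError("subscriber unavailable")

delivered = deliver_at_most_once({"type": "user.created"}, failing_subscriber)
print(delivered, len(attempts))
```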
Contributor

This paragraph seems out of place and only addresses AWS. What about other cloud providers, since our focus is on all of them?
Are we talking about "retry behavior" in EG?

Contributor Author

Right now we natively support only Lambda, which has that behaviour. Other providers exposing an HTTP endpoint don't have that, AFAIK.

Contributor

👍

_The current implementation supports only plugins written in Golang. We plan to support other languages in the future._

The plugin system is based on [go-plugin](https://github.com/hashicorp/go-plugin). A plugin needs to implement the following
interface:
Contributor

To make it easier, I think we should do a post on "Writing a plugin for EG". That way we can show how to write a plugin and also drive some traffic.

Contributor Author

👍

@mthenw mthenw merged commit e5138aa into master May 14, 2018
@mthenw mthenw deleted the readme-update branch May 14, 2018 09:24