RFC: Configurable Build Event Stores #53
base: master
Signed-off-by: Aidan Oldershaw <[email protected]>
Not sure why it's PR concourse#53, when the previous one was concourse#48...perhaps because of discussions? Signed-off-by: Aidan Oldershaw <[email protected]>
Signed-off-by: Aidan Oldershaw <[email protected]>
Signed-off-by: Aidan Oldershaw <[email protected]>
* By default, there will be no change - Concourse will default to using a Postgres `EventStore` using the same main Postgres database.
* Operators will be able to configure a different store
* If they configure a new store, they won't be able to access any existing
I think this feature has to provide a migration solution, otherwise production clusters would be hard to move on external storage.
Yeah I agree. I'd love to hear any thoughts you have on the best way to approach this. I can think of a couple options:
1. Introduce a `concourse migrate-build-events` command that migrates everything over to the configured build event store. It could possibly default to migrating from the Postgres database to whatever event store is configured, but it could also take a separate `from` configuration in case you want to migrate from a non-Postgres backend to a new backend.
2. Migrate out of Postgres "on the fly", i.e. we could have a component like the build reaper that migrates old builds to the new store, a few at a time. In this case, we'd probably still want some way to view build logs from Postgres in case they haven't yet been migrated, for which we could construct a `FallbackEventStore` that does something like this:
```go
type FallbackEventStore struct {
	EventStore
	Fallback EventStore
}

func (f *FallbackEventStore) Get(build db.Build, requested int, cursor *Key) ([]event.Envelope, error) {
	// first, try the main event store:
	events, err := f.EventStore.Get(build, requested, cursor)
	if err == nil && len(events) > 0 {
		return events, nil
	}
	// the build wasn't found (or at least has no events in the main store), so try the Fallback (Postgres).
	// if this also errors, we'd probably want to return the err from the primary EventStore, but you get the idea
	return f.Fallback.Get(build, requested, cursor)
}
```
I suspect an approach like 1) may not be feasible when there's a huge number of build events, since it will likely take a really long time.
Either way, it probably makes sense to make `EventStore.Put` take a list of events so we can do batch puts, rather than needing to `Put` them all one-by-one.
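A toy sketch of what a batched `Put` could look like - the `Build` and `Envelope` types here are simplified stand-ins for Concourse's `db.Build` and `event.Envelope`, and the in-memory store is purely illustrative:

```go
package main

import "fmt"

// Envelope and Build are simplified placeholders for event.Envelope and db.Build.
type Envelope struct{ Event string }
type Build struct{ ID int }

// EventStore with a batched Put, as suggested: one call persists many events.
type EventStore interface {
	Put(build Build, events []Envelope) error
}

// memStore is a toy in-memory implementation, just to show batching.
type memStore struct {
	events map[int][]Envelope
}

func (m *memStore) Put(build Build, events []Envelope) error {
	// append the whole batch in one operation rather than one row at a time
	m.events[build.ID] = append(m.events[build.ID], events...)
	return nil
}

func main() {
	s := &memStore{events: map[int][]Envelope{}}
	batch := []Envelope{{"start"}, {"log"}, {"finish"}}
	_ = s.Put(Build{ID: 1}, batch)
	fmt.Println(len(s.events[1]))
}
```

A real Postgres implementation could turn one batch into a single multi-row `INSERT`, which is where the performance win would come from.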
Maybe just don't migrate. Instead, add a flag to the `build` table to indicate where the build's logs live, PG or ES, so that old build logs are read from PG and new build logs from ES.
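A minimal sketch of this flag-based routing - all names here are invented, and in Concourse the flag would presumably be a new column on the `builds` table:

```go
package main

import "fmt"

// Envelope is a simplified stand-in for event.Envelope.
type Envelope struct{ Event string }

type EventStore interface {
	Get(buildID int) ([]Envelope, error)
}

// mapStore is a toy in-memory EventStore.
type mapStore map[int][]Envelope

func (m mapStore) Get(buildID int) ([]Envelope, error) { return m[buildID], nil }

// RoutingEventStore picks the backing store based on a per-build flag,
// which would live in a new column on the builds table.
type RoutingEventStore struct {
	storeFlag map[int]string // buildID -> "pg" or "es"
	pg, es    EventStore
}

func (r *RoutingEventStore) Get(buildID int) ([]Envelope, error) {
	if r.storeFlag[buildID] == "es" {
		return r.es.Get(buildID)
	}
	// builds without a flag predate the switch, so read from Postgres
	return r.pg.Get(buildID)
}

func main() {
	pg := mapStore{1: {{"old-log"}}}
	es := mapStore{2: {{"new-log"}}}
	r := &RoutingEventStore{storeFlag: map[int]string{2: "es"}, pg: pg, es: es}
	a, _ := r.Get(1)
	b, _ := r.Get(2)
	fmt.Println(a[0].Event, b[0].Event)
}
```

The upside is that no data ever moves; the downside is that the old Postgres tables can never be dropped while any flagged-PG builds remain.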
Just a suggestion for another way to do the migration: I wonder if there could be a way to migrate all the build events over to the new event store without stopping the database? This would mean operators wanting to move to a new build event store don't need to wait for the migration to finish before starting Concourse.
I imagine something like this: the operator continues to have a running Concourse deployment, with build events fetched from the Postgres database as usual. While the deployment is running, some tool can be used to migrate all the build events within the Postgres database (the old build event store) into the new build event store. Once it has finished, the operator can redeploy Concourse and point it at the new build event store. There will need to be some method for discarding the old data (maybe Concourse can do that at startup if you configure a new event store? or maybe wiping out build events can be part of the tool?)
I imagine operators that configure a new build event store might want to benefit from being able to query or search within the build events. If we migrate on the fly, only some build events would have been migrated over at any given time. Taking that into account, and the fact that operators who configure a new build event store most likely want all their builds in the new store, I might be inclined to think that migrating all the data together is preferable? I could totally be wrong about this though.
@clarafu that definitely makes sense - having a separate tool that just points to the running Postgres is very much in line with #53 (comment).
FWIW the idea behind the "on the fly" approach is that it'd eventually migrate all build events over (not just for new events) - it'd essentially be doing what the tool you're suggesting is doing, but it's started by the ATC. I think it's beneficial to keep it separate though, for the reasons mentioned by @jchesterpivotal.
Ohh I see! Sorry, I missed where you clarified it by saying it would be the build reaper that migrates old builds to the new store. And yeah, good call that allowing the operator to migrate before upgrading is definitely a good idea.
Awesome!
Rather than when it's Started, it should be when the build is first created. Signed-off-by: Aidan Oldershaw <[email protected]>
Signed-off-by: Aidan Oldershaw <[email protected]>
iiinteresting! I didn't finish reading it all yet though, will finish later 😁
```go
Get(build db.Build, requested int, cursor *Key) ([]event.Envelope, error)

Delete(builds []db.Build) error
DeletePipeline(pipeline db.Pipeline) error
```
it got me wondering if we should really have these 🤔 from what I understand, the distinction between `Delete` and `Delete(Pipeline|Team)` is simply that the one implementing the interface could do so in a more performant manner. Buut.. the Pipeline and Team concepts seem to be too Concourse-specific, kinda "leaking" into the interface here.
tbh, I don't have a suggestion 😅 just seemed like overloading the interface a bit
Yeah I agree, but couldn't really think of a better way to do it without losing the ability to `DROP` tables...
We could try to be smart with the Postgres implementation of the `EventStore`: if we call `Delete(builds)` where `builds` exactly matches the list of pipeline builds, drop the table instead - though I feel like this could be finicky.
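For illustration, the finicky part might boil down to a set comparison - this is a hypothetical sketch, not actual Concourse code:

```go
package main

import "fmt"

// shouldDropTable reports whether the set of builds being deleted is exactly
// the set of builds in a pipeline's partition table, in which case the
// Postgres EventStore could issue a single DROP TABLE instead of row deletes.
// Build IDs stand in for db.Build values here.
func shouldDropTable(toDelete, pipelineBuilds []int) bool {
	if len(toDelete) != len(pipelineBuilds) {
		return false
	}
	set := make(map[int]bool, len(toDelete))
	for _, id := range toDelete {
		set[id] = true
	}
	for _, id := range pipelineBuilds {
		if !set[id] {
			// at least one pipeline build would survive, so dropping
			// the whole table would lose data
			return false
		}
	}
	return true // safe to DROP TABLE pipeline_build_events_x
}

func main() {
	fmt.Println(shouldDropTable([]int{1, 2, 3}, []int{3, 2, 1}))
	fmt.Println(shouldDropTable([]int{1, 2}, []int{1, 2, 3}))
}
```

The finickiness is in keeping that comparison race-free: a new build could be created between computing the pipeline's build list and dropping the table.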
How about migrating build events to cold-storage after they're completed:
that's a pretty useful use case, especially for those businesses that require keeping logs for.. years.
assuming this would be implemented by, say, AWS S3, one can even place lifecycle policies on their bucket (https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-transition-general-considerations.html) and then get some sweet price reductions automatically for those super-old build events
Signed-off-by: Aidan Oldershaw <[email protected]>
Signed-off-by: Aidan Oldershaw <[email protected]>
1. Storing build events in a system designed for handling text data (like Elasticsearch, for instance) allows you to do some pretty cool querying that isn't feasible in Postgres.
1. Gaining observability into build events isn't great. The syslog drainer
Agreed, I've had to do quite a lot of work when I've wanted to extract and transform this data for analysis.
* Operators will be able to configure a different store
* If they configure a new store, they won't be able to access any existing build events that are currently stored in Postgres without migrating them over (how should we recommend doing this?)
I think it should be a standalone tool, not part of the ATC. Otherwise you're forcing folks to upgrade to get it, or to get bugfixes to such a migration tool.
* Users won't be able to access build events that exist in the main Postgres database if they switch to a new backend store. How can we make the
I think there are two use cases being mixed here.
The first is "tell me what just broke", the thing Concourse UX is optimised for. That data works best in PostgreSQL, close to the ATC.
The second is "I need a complete archive of everything ever". Concourse enables that but isn't really putting it front and centre. The overhead has been historically bad enough that the archival need was at cross-purposes with the diagnostic goal. This is where draining to secondary stores is useful.
If you have a secondary store in addition to the main store, then setting short build histories on the main store, but not the secondary store, is a relatively safe compromise for most installations.
Yeah, that assessment makes sense. I think the proposed `EventStore` interface should be able to support this (similar to the "Cold Storage" example I provided, but it could drain to an arbitrary secondary `EventStore`). Pair that with build log reaping to clear out old Postgres logs, and we can keep recent builds close to the ATC in Postgres while archiving everything in a secondary store (which could be S3, Elasticsearch, etc.)
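One hedged sketch of such draining: a tee store that mirrors every write into a secondary archive store. All names are invented, and real error handling and retries are omitted:

```go
package main

import "fmt"

// Envelope is a simplified stand-in for event.Envelope.
type Envelope struct{ Event string }

type EventStore interface {
	Put(buildID int, events []Envelope) error
}

// mapStore is a toy in-memory EventStore.
type mapStore map[int][]Envelope

func (m mapStore) Put(buildID int, events []Envelope) error {
	m[buildID] = append(m[buildID], events...)
	return nil
}

// TeeEventStore mirrors writes to both stores. Build log reaping could then
// delete only from Primary (Postgres), leaving Archive as the full history.
type TeeEventStore struct {
	Primary, Archive EventStore
}

func (t *TeeEventStore) Put(buildID int, events []Envelope) error {
	if err := t.Primary.Put(buildID, events); err != nil {
		return err
	}
	// in a real design, archive failures would likely be queued and retried
	// asynchronously rather than failing the build's event stream
	return t.Archive.Put(buildID, events)
}

func main() {
	pg, archive := mapStore{}, mapStore{}
	tee := &TeeEventStore{Primary: pg, Archive: archive}
	_ = tee.Put(7, []Envelope{{"start"}, {"finish"}})
	fmt.Println(len(pg[7]), len(archive[7]))
}
```

Reads would go through something like the `FallbackEventStore` above: recent builds served from Postgres, reaped builds falling through to the archive.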
Signed-off-by: Aidan Oldershaw <[email protected]>
This header allows users to query the /api/v1/builds/{id}/events endpoint starting from an offset. Removing this may be a controversial decision. I discussed this as an open question in concourse/rfcs#53. Essentially, supporting this header means that `EventStores` will have to define how to unmarshal `Keys` (but not marshal them??). This header seems to be used nowhere internally, and isn't exposed by Fly or go-concourse, so my guess is it's *probably* no longer used by anyone. Signed-off-by: Aidan Oldershaw <[email protected]>
This commit adds the Postgres implementation for the `EventStore` interface as described in concourse/rfcs#53. While this change shouldn't noticeably change the outward behaviour of Concourse, there are a few changes: * Don't get from the build_events table twice for no reason in Events() * Some DB interactions now run in a separate transaction to accommodate `EventStores` that don't interact with Postgres * The build_events table is now created at runtime (rather than at migration time), and the `pipeline_build_events_x` and `team_build_events_x` tables are created when the first build is initialized (rather than when the pipeline/team is created) Signed-off-by: Aidan Oldershaw <[email protected]>
+ key marshalling Signed-off-by: Aidan Oldershaw <[email protected]>
One thing I haven't really explored is how operators would configure the backend store. I think it could be similar to how credential managers are configured - there are groups of independent configuration options for each possible `EventStore`. If certain flags are configured for a given backend, use that backend.
I think that it would be desirable to design build event stores in such a way that allows them to be built and released independently from the rest of Concourse, similar to how Terraform eventually spun providers out of its main source tree so that they could be built and released independently, typically written and maintained by the commercial entities backing the providers.
For example, a user might wish to export build logs to Coralogix, which is a log management and analysis platform. As Coralogix is a commercial platform, I would fully expect that a Coralogix build event store plugin would be written and maintained by Coralogix and not by the Concourse team / Pivotal.
(full disclosure: I work for Coralogix and we would probably (but I'm intentionally being non-committal here) be interested in building out a plugin for our own platform as part of dogfooding our own product)
That's very interesting - I hadn't put much thought into an architecture like that, mostly because the precedent seems to be to package and release everything from within Concourse (at least in terms of credential managers and metrics emitters). I think it'd be cool for build event stores to be written as plugins external to the Concourse binary, and it'd be really cool to have experts in the domain be the ones to implement/maintain them.
I guess one concern is the added operational complexity, though I imagine it's not too bad. Like Terraform, we could package some plugins directly into Concourse (it could just be the Postgres plugin, so operators don't need to bring their own plugin for the default behaviour).
Actually, my frame of reference wasn't credential managers and metrics emitters (though arguably they should be handled the same way), rather it was Concourse resources. Just as everything in https://resource-types.concourse-ci.org/ is written, packaged, and maintained outside of the main Concourse source tree, shouldn't build event stores (and credential managers and metrics emitters) also be written and maintained outside of the main Concourse source tree?
That the `concourse/registry-image-resource` and `concourse/git-resource` resource types are not part of `concourse/concourse` isn't something that makes me think of additional operational complexity (although I guess that's technically true), but rather resilience, flexibility, etc.
I was thinking about this a bit more. For some use cases of the proposed `EventStore`, having the implementations be invoked directly (as opposed to over something like gRPC) is preferable. For instance, invoking the Postgres `EventStore` in memory allows us to perform queries against other database objects (e.g. also deleting all of a team's pipelines when deleting a team). For certain applications of `EventStore` composition (e.g. for secret redaction), handling the operations in memory is beneficial.
So, what if `EventStore`s are just an in-memory type, but there's a `PluginEventStore` implementation that functions as a gRPC client? Which plugin (gRPC server) to use could be configured via flags to Concourse, and any environment variables such as `CONCOURSE_EVENTSTORE_PLUGIN_*` could be passed along to the plugin on startup to configure e.g. authentication (similar to how we support custom configuration for Guardian).
What do you think of that approach?
I doubt the in-memory EventStore, because that implies every ATC will store build logs in duplicate, which might not fit huge clusters where hundreds of pipelines generate huge amounts of build logs. But I think a plugin is a good idea: at the least, a PG plugin would allow us to store build logs in a separate database, and if the plugin is well designed, it may even support dispatching build logs to multiple databases, which would definitely benefit the scalability of Concourse. Also, users will have the flexibility to store build logs in various backends - ES, and whatever else.
```go
// perhaps run this in the background
return c.migrateToColdStorage(build)
```
I think this comment belies a lot of underlying complexity, where this design should go towards a plugin model per-provider, like Terraform (see comment below about adding providers), which uses an RPC architecture. See for example: https://www.terraform.io/docs/extend/how-terraform-works.html
This kind of multiple-binary architecture is necessary for both a) proper queuing and background processing of events as they are sent to external providers, and b) allowing additional providers to be written and developed independently of the upstream project.
Yeah, the comment was a bit of a half-baked idea. From what I can tell, any sort of background processing we have now is just done through separate goroutines, everything residing in the same process. For instance, metric emitters run on an `emitLoop` that reads from a Go `chan` that receives messages: https://github.com/concourse/concourse/blob/36cbf5ecf628dba92b6b4c08cfa536820638f313/atc/metric/emit.go#L129-L133
All that to say, we could (and currently do) perform some sort of background processing without the use of plugins running in their own process and communicating via RPC. That's definitely not to say we should do things this way (though, admittedly, I don't fully grasp all of the implications of taking one approach over the other).
How about migrating build events to cold-storage after they're completed:
I wonder if the desire to design for cold storage doesn't mean that `EventStore.Get` should be some kind of optional operation. Usually the trade-off with cold storage is that it reduces storage costs while increasing access costs, and we wouldn't want to subject users to a surprisingly high bill just because they accessed some part of Concourse in a way which invoked `EventStore.Get` on cold storage.
Maybe a better design would be one where `EventStore.Get` can return a link to where the build events can be found? Then the user can more intentionally make the decision to access cold storage and incur those costs.
Hmm, good point. I'm hesitant to modify the `EventStore` interface to explicitly make `Get` optional, since that seems pretty specific to this use case (and I think it generally makes sense for there to be `Get` functionality).
I guess this is also modifying the API (albeit in a different way), but we could introduce a special build event type that includes the link to cold storage (e.g. `event.InColdStorage`), and a ColdStorage implementation would return that single build event when called. The event consumers (the UI and Fly) could know to interpret that event type specially, perhaps prompting the user to confirm they want to fetch from cold storage. This prompt could also be bypassed with a special `Key` (passed as the `Last-Event-ID` to the build events API endpoint) to indicate "yes, I know I'm fetching from cold storage, and I'm okay with that".
I am very excited by the idea of offloading build logs to "another place", since as you've identified, we are bearing quite a burden from keeping them in Postgres exclusively.
I'd like to probe a bit on what's envisaged here, because I am uneasy about bringing a lot of fresh complexity into Concourse itself as far as log forwarding. There are existing tools like logstash, fluentd, or NiFi, which can do all sorts of things in terms of fanout, enrichment/transformation, and high-performance streaming. Pointing the syslog emitter at one of those should be more friendly than having Concourse grow the same capabilities on its side of the socket. Perhaps the syslog drainer does need some changes to allow build logs to be more continuously streamed, or that kind of thing, but it does feel like the forwarding part of the picture is more established.
In the same way, I'm not sure that it should be Concourse's job to decide when remote logs should be reaped. Once we've passed the data on, it's the other system's job to implement its own retention policy, according to whatever criteria happen to make sense. Those might not be readily accessible to Concourse (e.g. remote disk usage, or frequency of access). We may also not want to force log deletion just because a pipeline has been deleted - maybe those logs are still useful for analytics.
So I think the remaining non-existing piece is retrieving logs from their remote home, for display through Concourse. I'm imagining a world where logs always go to the local Postgres for at least a few days, but some operators will configure aggressive local reaping combined with forwarding into another store. Then Concourse just needs to know, for a given build whose logs do not exist locally, how to retrieve the logs from the remote place. If that does not work then we could display a tombstone, as we do now. This sidesteps some of the notification concerns since "live" build logs would always be served from local storage.
Then the implementation is that the `builds` DB table grows an extra column for the remote storage URL (maybe with other data as needed for retrieval, such as auth tokens or whatever). When build logs are requested, and not found locally, we fetch them via that URL instead. This could even be set by the log forwarding engine in some fashion, since some backends might not have URLs that are predictable a priori, or they may change over time if logs are further migrated to different storage tiers. (For example, Elasticsearch could be configured to invoke a webhook whenever it receives a `start-build` event, that reaches back into Concourse to set the remote storage URL for that build. Similar setups should be possible for other forwarding and storage choices.) In that case there is no need for Concourse-side configuration of how to construct that URL. This part is a little fuzzy to me since storage backends may have a variety of APIs and access control schemes, though that concern holds for any version of having remote storage.
There is a bunch of stuff to configure here, from an operator point of view, so we'd need to think through and document a few end-to-end scenarios. For example, can we use fluentd + `in_syslog` + `out_s3` in a sensible way, and have the remote-URL details set correctly? How should Concourse be expected to authenticate itself to a remote log store, given that not all users should see all logs, and remote systems have their own idea of how access control works? (That's still the case for the current proposal as well, since a given `EventStore` would have to implement that logic in some fashion.)
Thanks for your feedback @agurney! You bring up some really good points. Going to preface my response by saying I'm not an expert on this stuff, so please bear with me if I say anything that doesn't make sense 😄- if you disagree with anything, please let me know!
I looked briefly into logstash during my investigation into Elasticsearch, but I've never used any log forwarding tools myself so don't fully grasp the value-add for Concourse build events (besides performance, as I imagine they're quite well optimized). To me, it seems like one benefit is turning raw unstructured logs into structured data (I suppose that's the enrichment/transformation you bring up), but Concourse build events already are structured to a certain degree. Another benefit is the decoupling of inputs from the outputs - as you mention, Concourse could just emit to syslog, and logstash/fluentd/NiFi could forward the logs to the correct destination. However, given the fact that Concourse will need to retrieve logs from these destinations as well, there's an implicit coupling that still needs to exist somewhere:
I'm not quite sure what you mean by "storage URL" - do you mean something like a If you mean the former, I'm not convinced this will work in general. For instance, if we emit build events to Elasticsearch, we could store the If you mean the latter, wouldn't it be easier to just have an implementation of an Let me know if I completely misunderstood what you meant here 😄
This sounds a lot like what @jchesterpivotal was saying here: #53 (comment) (keep recent builds as close as possible to the ATC, and optionally configure an external event store for archiving builds). I think that's a good way to look at the problem. I did reply to his comment mentioning that I think this could be modelled with the proposed
That's a really good point, I hadn't considered that! Do you think there's value in Concourse letting the external system know "by the way, this pipeline no longer exists, so you may want to delete the build events", and the system can choose to delete the events or not? EDIT: I think if you want to keep the build events around, it probably makes sense to archive the pipeline rather than delete it (in which case we wouldn't try to delete the build events in the first place). It probably doesn't always make sense to enable the "build log reaper" - but that's opt-in, anyway.
I'm not 100% sure what types of access control remote systems have, so maybe you could speak more to that. I kind of envisioned it working like it does currently with Postgres: the ATC is provided credentials that grant access to all build events, and Concourse handles user access control by team at the API layer. So,
This header allows users to query the /api/v1/builds/{id}/events endpoint starting from an offset. Removing this may be a controversial decision. I discussed this as an open question in concourse/rfcs#53. Essentially, supporting this header means that `EventStores` will have to define how to unmarshal `Keys` (but not marshal them??). This header seems to be used nowhere internally, and isn't exposed by Fly or go-concourse, so my guess is it's *probably* no longer used by anyone. Signed-off-by: Aidan Oldershaw <[email protected]>
This commit adds the Postgres implementation for the `EventStore` interface as described in concourse/rfcs#53. While this change shouldn't noticably change the outward behaviour of Concourse noticably, there are a few changes: * Don't get from the build_events table twice for no reason in Events() * Some DB interactions now run in a separate transaction to accomodate `EventStores` that don't interact with Postgres * The build_events table is now created at runtime (rather than at migration time), and the `pipeline_build_events_x` and `team_build_events_x` tables are created when the first build is initialized (rather than when the pipeline/team is created) Signed-off-by: Aidan Oldershaw <[email protected]>
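To make the discussion below more concrete, here is a rough sketch of what an `EventStore` contract with cursor-based reads could look like, exercised by a toy in-memory implementation. The method names, signatures, and the `Key`/`Event` types are illustrative assumptions for this sketch, not Concourse's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Key is an opaque cursor into a build's event stream (see the Key
// discussion below). Its concrete shape is one of the open questions.
type Key []int64

// Event is a stand-in for a build event envelope.
type Event struct {
	Payload string
}

// EventStore is a hypothetical sketch of the interface from this RFC.
type EventStore interface {
	// Put appends events for a build, returning the key of the last event.
	Put(buildID int, events []Event) (Key, error)
	// Get reads up to `requested` events after the given cursor, advancing
	// the cursor past the last event returned.
	Get(buildID int, requested int, cursor *Key) ([]Event, error)
}

// inMemoryStore is a toy implementation used to illustrate the contract.
type inMemoryStore struct {
	mu     sync.Mutex
	events map[int][]Event
}

func newInMemoryStore() *inMemoryStore {
	return &inMemoryStore{events: map[int][]Event{}}
}

func (s *inMemoryStore) Put(buildID int, events []Event) (Key, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events[buildID] = append(s.events[buildID], events...)
	return Key{int64(len(s.events[buildID]) - 1)}, nil
}

func (s *inMemoryStore) Get(buildID int, requested int, cursor *Key) ([]Event, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	all := s.events[buildID]
	start := 0
	if *cursor != nil {
		start = int((*cursor)[0]) + 1
	}
	end := start + requested
	if end > len(all) {
		end = len(all)
	}
	if end > start {
		*cursor = Key{int64(end - 1)}
	}
	return all[start:end], nil
}

func main() {
	var store EventStore = newInMemoryStore()
	store.Put(1, []Event{{"start"}, {"log"}, {"finish"}})

	var cursor Key
	batch, _ := store.Get(1, 2, &cursor)
	fmt.Println(len(batch), batch[0].Payload) // 2 start

	batch, _ = store.Get(1, 2, &cursor)
	fmt.Println(len(batch), batch[0].Payload) // 1 finish
}
```

The cursor-in/cursor-out shape is what makes the `Key` type load-bearing: whatever the backend, the key must round-trip through the API while preserving "resume after this event" semantics.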
# Open Questions

* Does `Key` need to be `interface{...}`, or can it be more specific like `uint`?
In my experimental PR that introduced an Elasticsearch implementation of an `EventStore`, I found that I needed a composite key of `(timestamp, tiebreaker)` (where both `timestamp` and `tiebreaker` are `int64`s) to uniquely identify events, so a single `uint` isn't good enough (without relying on Postgres to generate auto-incrementing IDs).

I was experimenting with plugin-based `EventStore` implementations at the suggestion of @ari-becker, and realized that if the plugin runs in a separate process (i.e. we use something other than Go plugins, which seem very limiting), we'll need some way of sending the `Key`s over the wire while retaining their ordering information. This isn't so easy with an arbitrary `Key` type, so I got back to thinking about restricting the `Key` type to something more concrete that has an implicit ordering.

From the Postgres and Elasticsearch implementations, I got the idea of using a `[]int64`. Using `timestamp` and `tiebreaker` would work for many backends (the `timestamp` comes from the event, and `tiebreaker` is an in-memory counter - more details in the PR) - but for backends that support a more native keying approach (like Postgres' auto-incrementing `event_id`, or line numbers in file-based storage like S3 or the local FS), we could make use of that instead. I wonder if there are backends that won't fit into the `[]int64` mold - if anyone has a system in mind, I'd love to hear about it. I also wonder if the `Key` could always be a `[2]int64`, or if there's value in having an arbitrary-length slice.

Another approach is to avoid the `Key`/cursor approach altogether, which I'd definitely be open to ideas on.
There's basically no way to establish sane global ordering without the whole nightmare of distributed consensus. I heartily discourage trying. Leave the business of sorting out ordering to consumers, who can choose what interpretations to apply.
But if what you need is uniqueness, that's much more straightforward. You have broadly two options:
- UUIDs, specifically v4. Basically a 122-bit random number, so they can be generated without coordination. They are a widely supported data type: PostgreSQL has a native `uuid` type, the uuid-ossp module can generate them in the database very fast, and they're supported in pretty much everything everywhere.
- Natural keys. That is, you determine what the distinguishing fields are and then you digest them. This is isomorphic to the database normalisation exercise of identifying the natural key for a table: what is the field or combination of fields which absolutely defines uniqueness? In this case it might be something as simple as `(something_identifying_the_database, event_id)`. As long as the database is unique and the event_id is unique within the database, it "should" be unique overall.
I prefer option 1 to option 2. It's easier to implement and more robust to oversights. Natural keys are easy to describe but once you pick one, it's hard to fix if you discover you made a mistake and have collisions. Plus people are liable to start trying to rely on the natural key components, meaning you wouldn't be able to change it.
tl;dr UUIDs errywhere
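For illustration, a v4 UUID can be generated without coordination from 16 random bytes by fixing the version and variant bits per RFC 4122. This is a sketch; a real implementation would more likely use an existing library such as github.com/google/uuid or uuid-ossp in the database:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 builds a random (version 4) UUID: 122 random bits plus
// fixed version and variant bits, formatted as 8-4-4-4-12 hex groups.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant bits (10xx)
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // random, e.g. xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
}
```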
I think that's good advice - in my experimental PR for Elasticsearch, I made the naïve assumption that I could trust the timestamp on build events for ordering, which obviously isn't the case. Builds are tracked by a single ATC at a time, so at least time-based inconsistencies aren't as prevalent, but in general, it's a hacky "solution" (that doesn't really solve the problem). So, I agree that we shouldn't try to achieve distributed consensus.
Leave the business of sorting out ordering to consumers, who can choose what interpretations to apply
This makes sense, but does this mean that Concourse will have to fetch all events for a build (and sort them in memory) before it can serve any events? Or am I misunderstanding what you mean?
Another option that I suggested as a possibility is to use natural keys of `(build_id, event_id)`. For backends that don't support auto-incrementing IDs (e.g. Elasticsearch), we could rely on Postgres sequences to generate these `event_id`s. This way, we can take advantage of the indexing options of tools like Elasticsearch to perform sorting/pagination outside of Concourse.
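The natural-key option above composes into a document ID directly. A minimal sketch (the ID format is a made-up example, and it assumes `event_id` is drawn from a Postgres sequence when the backend can't generate its own):

```go
package main

import "fmt"

// eventDocID builds a natural key from (build_id, event_id) for use as a
// unique document ID in a store like Elasticsearch. Uniqueness holds as
// long as event IDs are unique within a build (e.g. from a sequence).
func eventDocID(buildID, eventID int64) string {
	return fmt.Sprintf("build-%d-event-%d", buildID, eventID)
}

func main() {
	fmt.Println(eventDocID(42, 7)) // build-42-event-7
}
```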
This makes sense, but does this mean that Concourse will have to fetch all events for a build (and sort them in memory) before it can serve any events? Or am I misunderstanding what you mean?
Sorting in the database is fine, using the internal sequential IDs (i.e. the primary key). Those are typically monotonic, so they establish a local order. But they're only an ordinal scale (this-before-that), not an interval scale (this-10-seconds-before-that) or a ratio scale (this-10-percent-longer-than-that).
The motivation for supporting EventStore implementations defined in external binaries was suggested in concourse/rfcs#53. Removing the EventStore implementations from the core Concourse codebase allows for a few things:

1. EventStores can be versioned independent of Concourse
2. The burden of maintaining many EventStores doesn't fall on the Concourse team
3. The barrier to adding support for external tools is much lower, and we can hopefully see the open source community create EventStores for several platforms (like we've seen with Resource Types)
4. Using gRPC, EventStore authors aren't tied to Go (with a bit of extra effort to make it compatible with hashicorp/go-plugin)

If this is an approach we choose to take, it might make sense to introduce an SDK for building plugins like hashicorp/terraform-plugin-sdk. There are two main differences between in-process EventStores (like the Postgres implementation) and external plugin EventStores:

1. Plugin EventStores don't have access to the full DB models (`db.Build`, etc), and can't perform further queries on them. Instead, they have a serialized form that contains many of the fields
2. Plugin EventStores can't use arbitrary `Key`s - instead, they are restricted to `[]int64`. I posed a question on the RFC if using `[]int64` for all EventStores is feasible.

Signed-off-by: Aidan Oldershaw <[email protected]>
Rendered
Experimental PR: concourse/concourse#5651
Signed-off-by: Aidan Oldershaw [email protected]