proposal for reusable Process components #1154
Conversation
@nikhilsaraf I love this direction! I think we should do something close to this.
A few thoughts:
- I think it'd be great if we could use type assertions to ensure that `Processor`s are hooked up properly rather than hand-rolled checking like you're proposing. Do you see any reason that wouldn't be possible? Here's a basic example of how this could work (see the sketch after this list).
- Obviously, right now `Processor`s operate at the level of `io.Reader`/`io.Writer` objects, since full ledgers may not fit in memory. I think it'd really simplify the logic of `Processor` steps if they could operate one transaction at a time, and the `ProcessorPipeline` would do the hard work of streaming data in and out of the per-transaction `Processor`s. That gets us to a place where we can have `process` functions like you're proposing, which take maps rather than streams.
- As far as high-level organization goes, I think we can replace the existing filter step and process step with just this pipeline system. Each `Processor` in the new proposed system can do either filtering or processing. We probably also want to split `Store`s out from the processing package, since I think they're pretty independent.
- I think ratchet is a good example of the kind of system we want to build. What are your thoughts on it? Should we use it directly? Should we build something similar ourselves? Is there some superior alternative out there?
@bartekn @MonsieurNicolas I'd love your thoughts too.
@nikhilsaraf if you want to elaborate this PR based on the ideas above, feel free to go for it!
@tomquisel that's great feedback! Some thoughts:

1. Type Assertions
2. Pipeline: tl;dr: have ledger-level
3. High-Level Organization: Agreed
4. Ratchet
@nikhilsaraf this is great!

Type Assertions

Thinking about it, trying to use Go's type system for checking that `Pipeline`

Simplifying the pipeline

One alternate idea for simplifying the data pipeline lifecycle is to create and tear down a new pipeline with each new `ledger`. Here's the proposed lifecycle for processing when a new `ledger`

The flow for

The flow is the same for processing

Pipeline conclusion

Thinking it through, I'm leaning towards the alternate approach. @nikhilsaraf I'd love your thoughts. We end up with a:

```go
type Data map[string]interface{}

type TransactionProcessor interface {
RequiredFields() []string
UpdatedFields() map[string]FieldAction
Process(input Data, outputChan chan Data, killChan chan error)
Finish() (outputChan chan Data, killChan chan error)
}
```

Ratchet

I think we can learn a lot from Ratchet, but it's probably not suited to our workload. This is a great guide on how to use it. These seem to be the main challenges with using Ratchet:
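Returning to the interface above: `FieldAction` isn't defined anywhere in this thread, so a sketch has to assume one. Here's a minimal, hand-driven example of a single stage; `feeTagger`, the `fee`/`high_fee` fields, and the threshold are all hypothetical, not part of the proposal:

```go
package main

import "fmt"

type Data map[string]interface{}

// FieldAction isn't defined in the thread; assume a simple enum here.
type FieldAction int

const (
	FieldAdded FieldAction = iota
	FieldModified
	FieldRemoved
)

// TransactionProcessor is the interface from the comment above.
type TransactionProcessor interface {
	RequiredFields() []string
	UpdatedFields() map[string]FieldAction
	Process(input Data, outputChan chan Data, killChan chan error)
	Finish() (outputChan chan Data, killChan chan error)
}

// feeTagger is a hypothetical stage that annotates each transaction
// with a high_fee flag derived from its fee field.
type feeTagger struct{}

func (feeTagger) RequiredFields() []string { return []string{"fee"} }

func (feeTagger) UpdatedFields() map[string]FieldAction {
	return map[string]FieldAction{"high_fee": FieldAdded}
}

func (feeTagger) Process(input Data, outputChan chan Data, killChan chan error) {
	fee, ok := input["fee"].(int64)
	if !ok {
		killChan <- fmt.Errorf("fee field missing or not an int64")
		return
	}
	input["high_fee"] = fee > 10000
	outputChan <- input
}

func (feeTagger) Finish() (chan Data, chan error) {
	// No buffered state to flush: return a closed output channel so the
	// pipeline knows this stage has nothing more to emit.
	out := make(chan Data)
	close(out)
	return out, make(chan error)
}

func main() {
	var p TransactionProcessor = feeTagger{}
	out := make(chan Data, 1) // buffered so Process can send without a second goroutine
	kill := make(chan error, 1)

	p.Process(Data{"fee": int64(25000)}, out, kill)
	fmt.Println((<-out)["high_fee"]) // true
}
```

Buffered channels keep the example single-goroutine; a real `ProcessorPipeline` would presumably run each stage in its own goroutine and wire the channels between stages.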
Great proposal and discussion! @nikhilsaraf @bartekn I'm closing it out since I think we've iterated a few more steps since then. Feel free to reopen if I'm wrong.
Added a section on reusable process components. This is a first draft; if it makes sense, I can integrate it better into the doc.
Link to doc, see Outputs section.