Simple persistence #9
Conversation
PR is ready for review. Maybe, now that the service actually persists data in an append-only manner, we should rename it to Kodak
While I like the idea of the append-only log, in this implementation I see significant risks to performance once the user base scales up. Not that it matters too much at this stage, but we will have to benchmark our whole system towards the end of the project, and I feel the complexity is rather high for this trade-off. This service in particular needs to be very robust with the data it's handling, so I'd like to see at least a couple more tests and a benchmark or two before I'd feel comfortable green-lighting this, sorry 😅. Kill Once Destroy All, after all.
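A sketch of the kind of benchmark being asked for here; NewStore and Put are assumed names for the store's constructor and write path, not necessarily the API in this PR:

```go
package logstore

import (
	"fmt"
	"path/filepath"
	"testing"
)

// Measures Get latency as the append-only log grows; the scan cost should
// show up directly in the per-operation numbers.
func BenchmarkStoreGet(b *testing.B) {
	for _, n := range []int{100, 10_000, 100_000} {
		b.Run(fmt.Sprintf("records=%d", n), func(b *testing.B) {
			s, err := NewStore(filepath.Join(b.TempDir(), "db"), 1024)
			if err != nil {
				b.Fatal(err)
			}
			for i := 0; i < n; i++ {
				if err := s.Put(fmt.Sprintf("key-%d", i), []byte("value")); err != nil {
					b.Fatal(err)
				}
			}
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				// A key written early in the log; how costly the lookup is
				// depends on the scan, which is what this should surface.
				if _, err := s.Get("key-0"); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```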
return fmt.Sprintf("no value for key: %v", e.key)
}

func IsNotFoundError(err error) bool {
Why make it this complex? Wouldn't it be enough to only expose something like var ErrNotFound = errors.New("key not found") and let the caller identify the error using errors.As and errors.Is?
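A minimal, runnable sketch of the sentinel-error pattern being suggested (the names ErrNotFound and get are illustrative, not this PR's API):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is the single exported sentinel callers can test for.
var ErrNotFound = errors.New("key not found")

// get stands in for the store's Get; wrapping with %w keeps the key in the
// message while still letting errors.Is match the sentinel.
func get(key string) ([]byte, error) {
	return nil, fmt.Errorf("get %q: %w", key, ErrNotFound)
}

func main() {
	if _, err := get("user:42"); errors.Is(err, ErrNotFound) {
		fmt.Println("treat as a miss:", err)
	}
}
```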
)

const (
	kindValue = iota
It would be good to have the default value be a special case like kindEmpty to make identifying uninitialised records easy.
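A small sketch of what that could look like; only kindValue exists in the PR, the rest is an assumption:

```go
package logstore

// Reordered so the zero value maps to kindEmpty, making an uninitialised
// Record distinguishable from a real value record.
const (
	kindEmpty = iota // zero value: record was never initialised
	kindValue        // a regular key/value record
)
```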
// ErrInsufficientData is returned when the given data is not enough to be
// parsed into a Record.
var ErrInsufficientData = errors.New("insufficient bytes to parse a record")
Is this an error that would ever be exposed to the caller?
readBuf := bytes.NewBuffer(data)

checksum := uint32(binary.BigEndian.Uint32(readBuf.Next(checksumSize)))
kind, _ := readBuf.ReadByte()
Unchecked error
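A sketch of how that error could be surfaced instead of discarded; reusing ErrInsufficientData here is an assumption about the intended semantics:

```go
kind, err := readBuf.ReadByte()
if err != nil {
	// The buffer ran out before the kind byte could be read.
	return nil, ErrInsufficientData
}
```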
return nil, ErrCorruptData
}

return &Record{
Why return a pointer here?
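For comparison, a value return might look like this; the field names are guesses based on the parsing code quoted above:

```go
// Returning Record by value avoids a heap allocation per parsed record and
// removes the nil-pointer case for callers; the signature would change to
// (Record, error).
return Record{
	kind:  kind,
	key:   key,
	value: value,
}, nil
```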
@@ -0,0 +1,46 @@
package logstore
No tests for the scanner? :(
}

func (s *Store) Get(key string) ([]byte, error) {
	f, err := os.Open(s.storagePath)
Each call to Get opens the file anew and creates a new scanner. I am thinking that it would be much more efficient to implement the store using a sync.Pool of custom scanners that use an io.ReadSeeker (an os.File is an io.Seeker). The problem is generally that right now Get is O(n) instead of O(1), which scales... not great 😅
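A rough sketch of that idea, assuming the Store keeps a sync.Pool of open file handles; getFile and scanForKey are illustrative names, while NewScanner is the constructor quoted below:

```go
package logstore

import (
	"io"
	"os"
	"sync"
)

type Store struct {
	storagePath   string
	maxRecordSize int
	files         sync.Pool // idle *os.File handles, each an io.ReadSeeker
}

// getFile reuses a pooled handle when one is available and only falls back
// to os.Open when the pool is empty.
func (s *Store) getFile() (*os.File, error) {
	if f, ok := s.files.Get().(*os.File); ok && f != nil {
		return f, nil
	}
	return os.Open(s.storagePath)
}

func (s *Store) Get(key string) ([]byte, error) {
	f, err := s.getFile()
	if err != nil {
		return nil, err
	}
	defer s.files.Put(f) // hand the open handle back instead of closing it

	// Rewind so every Get scans the log from the beginning, as before.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	scanner, err := NewScanner(f, s.maxRecordSize)
	if err != nil {
		return nil, err
	}
	return s.scanForKey(scanner, key) // hypothetical: the existing scan loop, unchanged
}
```

Reusing the Scanner allocation as well (the concern in the next comment) would additionally need something like a Reset(io.Reader) method on Scanner, which does not exist yet.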
}
defer f.Close()

scanner, err := NewScanner(f, s.maxRecordSize)
This is a huge allocation for every request that enters our system
}

func TestStore(t *testing.T) {
	dir, err := initWorkdir()
Again, it would be good to have some more tests, especially around records with long values and a high number of records.
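A sketch of one such case, building on the same assumed NewStore/Put helpers as the benchmark sketch above (it would also need the bytes, fmt, path/filepath, and testing imports in the test file):

```go
// Writes one large value, buries it under many small records, and checks it
// still round-trips through Get unchanged.
func TestStoreLongValues(t *testing.T) {
	s, err := NewStore(filepath.Join(t.TempDir(), "db"), 1<<20)
	if err != nil {
		t.Fatal(err)
	}

	long := bytes.Repeat([]byte("x"), 512*1024) // 512 KiB value
	if err := s.Put("big", long); err != nil {
		t.Fatal(err)
	}
	for i := 0; i < 10_000; i++ {
		if err := s.Put(fmt.Sprintf("key-%d", i), []byte("v")); err != nil {
			t.Fatal(err)
		}
	}

	got, err := s.Get("big")
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, long) {
		t.Fatal("long value did not round-trip after 10k further records")
	}
}
```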
}

func main() {
	flag.Parse()

	workdir, err := os.Getwd()
The database file should not be stored in the working directory, as this has to be mounted in externally; you would have to change into that directory first, after mounting it. Instead, I'd go for a well-known path like /data/db (what MongoDB uses).
I mean, you are very obviously aware of the issue (after reading #10). I am wondering: do we really need to store every record change as a new record? You mentioned compliance as a reason to know when a user was disabled, for example, but we could simply store all of these things in singular records. I feel like a simple hash map in memory that gets flushed to disk on every change would scale much better than this and be much less risky.
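A minimal sketch of that alternative, assuming a flat in-memory map serialised wholesale on every write; all names here are illustrative, not this PR's API:

```go
package mapstore

import (
	"encoding/json"
	"os"
	"sync"
)

// Store keeps the whole dataset in memory and rewrites the file on each
// change: O(1) reads, at the cost of O(dataset) writes and losing the
// append-only history this PR was built around.
type Store struct {
	mu   sync.RWMutex
	path string
	data map[string][]byte
}

func New(path string) (*Store, error) {
	s := &Store{path: path, data: map[string][]byte{}}
	raw, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return s, nil // fresh database, nothing to load
	}
	if err != nil {
		return nil, err
	}
	return s, json.Unmarshal(raw, &s.data)
}

func (s *Store) Put(key string, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	raw, err := json.Marshal(s.data)
	if err != nil {
		return err
	}
	// Write to a temporary file and rename it into place so a crash mid-flush
	// never leaves a half-written database behind.
	tmp := s.path + ".tmp"
	if err := os.WriteFile(tmp, raw, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, s.path)
}

func (s *Store) Get(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}
```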
This was tested on dev. @jgraeger, are you fine with merging this just to get it out of the review queue?
Sure. Merge #13 and close this PR, as it's basically rejected in its current state. I will open a new PR for the next iteration, as discussed, once it's ready.
Closing as this was fixed by #13. |
Fixes #7
I fucked up a local rebase and realized it too late; that's why I confused you all with multiple (now closed) pull requests: #1, #8. Sorry for that. This PR is now the real deal ☺️