Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple persistence #9

Closed
wants to merge 9 commits into from
Closed

Simple persistence #9

wants to merge 9 commits into from

Conversation

jgraeger
Copy link
Member

@jgraeger jgraeger commented Jun 4, 2022

Fixes #7

I fucked up a local rebase and realized to late, thats why i confused you all with multiple (now closed pull requests; #1, #8). Sorry for that. This PR is now the real deal ☺️

@jgraeger jgraeger changed the title Implements simple persistence Simple persistence Jun 4, 2022
@jgraeger jgraeger mentioned this pull request Jun 4, 2022
2 tasks
@jgraeger jgraeger marked this pull request as ready for review June 4, 2022 18:32
@jgraeger
Copy link
Member Author

jgraeger commented Jun 4, 2022

PR is ready for review. Maybe, now that the services actually persists data in an append only manner, we should possibly rename it to Kodak

Copy link
Contributor

@ldb ldb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the idea of the append-only log, in this implementation I see significant risks to performance once the user base scales up. Not that it matters too much at this stage, but we will have to benchmark our whole system towards the end of the project and I feel like the complexity is rather high for this tradeoff. This service in particular needs to be very robust with the data it's handling, so I'd like to see at least a couple more tests and a benchmark or two before I'd feel comfortable green lighting this, sorry 😅. Kill Once Destroy All, after all.

return fmt.Sprintf("no value for key: %v", e.key)
}

func IsNotFoundError(err error) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make it this complex? Wouldn't it be enough to only expose something like

var ErrNotFound = errors.New("key not found")

and let the caller identify the error using errors.As and errors.Is.

)

const (
kindValue = iota
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have the default value be a special case like kindEmpty to make identifying uninitialised records easy.


// ErrInsufficientData is returned when the given data is not enouch to be
// parsed into a Record
var ErrInsufficientData = errors.New("insufficient bytes to parse a record")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an error that would ever be exposed to the caller?

readBuf := bytes.NewBuffer(data)

checksum := uint32(binary.BigEndian.Uint32(readBuf.Next(checksumSize)))
kind, _ := readBuf.ReadByte()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unchecked error

return nil, ErrCorruptData
}

return &Record{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return a pointer here?

@@ -0,0 +1,46 @@
package logstore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests for the scanner? :(

}

func (s *Store) Get(key string) ([]byte, error) {
f, err := os.Open(s.storagePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each call to Get opens the file anew and creates a new scanner.

I am thinking that it would be much more efficient to implement the store using a sync.Pool of custom scanners that use an io.ReadSeeker (an os.File is an io.Seeker).

The problem is generally that right now Get is O(n) instead of O(1), which scales .. not great 😅

}
defer f.Close()

scanner, err := NewScanner(f, s.maxRecordSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge allocation for every request that enters our system

}

func TestStore(t *testing.T) {
dir, err := initWorkdir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, would be good to have some more tests, especially around records with long values, and a high number of records

}

func main() {
flag.Parse()

workdir, err := os.Getwd()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The database file should not be stored in the working directory, as this has to be mounted in externally. You would have to change directory there first, after mounting it. Instead, I'd go for a well known path, like /data/db (what MongoDB uses).

@ldb
Copy link
Contributor

ldb commented Jun 4, 2022

I mean, you are very obviously aware of the issue (after reading #10 ).

I am thinking do we really need to store every record change as a new record? You mentioned compliance as a reason to know when a user was disabled for example, but we could simply store all these things in singular records. I feel like a simple hash map in memory that gets flushed to disk on every change would scale much better than this and be much less risky.

@ldb
Copy link
Contributor

ldb commented Jun 8, 2022

This was tested on dev @jgraeger are you fine with merging this just so to get it out of the review queue?

@jgraeger
Copy link
Member Author

jgraeger commented Jun 8, 2022

Sure. Merge #13 and close this PR as it's basically rejected in the current state. I will open a new PR for the next iteration as discussed when it's ready.

@ldb
Copy link
Contributor

ldb commented Jun 11, 2022

Closing as this was fixed by #13.
Not deleting to serve as future reference

@ldb ldb closed this Jun 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement persistence
2 participants