-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(compass-import-export): stream import errors to the log file with proper back pressure COMPASS-7820 #6151
Conversation
@@ -415,7 +415,7 @@ export const startImport = (): ImportThunkAction<Promise<void>> => { | |||
track( | |||
'Import Error Log Opened', | |||
{ | |||
errorCount: errors.length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug in our telemetry. We only keep up to 5 errors in the errors
(now firstErrors
) array. There's a numErrors var here for counting them all. (it could be thousands or millions)
|
||
expect(progressCallback.callCount).to.equal(3); | ||
expect(errorCallback.callCount).to.equal(1); // once for the batch | ||
expect(errorCallback.callCount).to.equal(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was this inconsistency where it was one error callback or one error or whatever per batch of errors in some place. It is always one per error now.
this.parsedHeader[name] = parseCSVHeaderName(name); | ||
} | ||
|
||
// TODO(COMPASS-7158): make sure array indexes start at 0 and have no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an existing TODO I just moved with the code.
We were keeping all import errors in memory and only writing them to the log file at the end. This now writes them to the log file during the import and waits for the write to finish so we get proper back pressure. This also means that I finally got to replace collection-stream.ts which is the only bit that survived the rewrite largely intact. Lots of unnecessary code got removed in the process. There is also a lot more code reused between import CSV and import JSON now.
The easiest way to test the most common / worst case scenario is to export a large collection then import the file back to the same collection. Every document's id will trigger a duplicate key error. If you import a large file, that's a large amount of stuff temporarily kept in memory.
It would have been possible to update the existing stream based code to do that but node's been supporting async iterators and generators for some time now plus that "just works" with streams, so I took the opportunity to rewrite most of import to use that. I think this nicely simplifies matters.
As a drive-by I renamed the UI state var from
errors
tofirstErrors
because we only keep up to 5 errors in the UI. There was a bug in the telemetry where it took the length of that array rather than using the numErrors var.In terms of review it is probably easiest to just read through
packages/compass-import-export/src/import/import-csv.ts
orpackages/compass-import-export/src/import/import-json.ts
(they use most of the same code now, so very little code left in each) and thendoImport()
inpackages/compass-import-export/src/import/import-utils.ts
. Also obviouslypackages/compass-import-export/src/import/import-writer.ts
which replacedcollection-stream.ts
.