Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using the batch size from pelias-config. #125

Conversation

mansoor-sajjad
Copy link

@mansoor-sajjad mansoor-sajjad commented Nov 15, 2022

👋
Please read the details on PR in pelias-config#139.
This PR is dependent on pelias-config#139.

This PR needs to be updated with the latest version of pelias-config, after the above referenced PR get merged.


Here's the reason for this change 🚀

Hardcoded batchSize of 500 is not optimal in all circumstances.


Here's what actually got changed 👏

We are using the newly added batchSize configuration option from pelias-config.
Updated the existing and added the new tests.

@missinglink
Copy link
Member

missinglink commented Nov 16, 2022

Hi @mansoor-sajjad, I like the idea of making these hard coded variables configurable 👍

Can you please make some changes to this PR before we merge it...

  1. Prior to this PR, it was possible to pass opts into the constructor methods of the sink.js stream and they would be propagated down to all other places which needed it. This patch would result in a breaking change for any users who elected to pass custom opts when creating the stream. Can you please revert this change.

  2. Prior to this PR the opts had priority over the defaults (this._size = opts && opts.batchSize || defaults.batchSize;), with this PR only the config value is considered and the opts are ignored.

  3. We need to handle the case where the batchSize is unset in the config, we can add it to the config defaults or we can leave a hard-coded default in the source.
    [edit] I see this exists already 👍 Adding the option for configuring the batch size. config#139

  4. This library hasn't been touched in years and is showing some age, maybe including lodash (as we do in other repos) would simplify the code a bit, in particular with the use of _.defaults().

…wards compatibility. Added unit tests for batchSize.
@mansoor-sajjad
Copy link
Author

Hei @missinglink,

Thanks for the review.
I have implemented the suggested improvements and fixes.
Can you please have a look at new changes?

Copy link
Member

@orangejulius orangejulius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, all of @missinglink's suggestions were sound.

One thing we need to make sure to do before this is merged is make sure there is a proper semantic release commit so a new version is released. @missinglink how do we want to do that?

test/Batch.js Outdated Show resolved Hide resolved
@mansoor-sajjad
Copy link
Author

@missinglink
Do you have the opportunity to look into the 'semantic release commit', as mentioned by @orangejulius ?

@missinglink
Copy link
Member

Hi @mansoor-sajjad, can we avoid passing peliasConfig into Batch()?

diff --git a/package.json b/package.json
index 370c252..9d1d8da 100644
--- a/package.json
+++ b/package.json
@@ -38,7 +38,7 @@
   "dependencies": {
     "@hapi/joi": "^16.0.0",
     "elasticsearch": "^16.0.0",
-    "pelias-config": "^4.5.0",
+    "pelias-config": "^5.2.0",
     "pelias-logger": "^1.2.1",
     "through2": "^3.0.0"
   },
diff --git a/src/Batch.js b/src/Batch.js
index 9ea8fbb..fe89eb4 100644
--- a/src/Batch.js
+++ b/src/Batch.js
@@ -1,8 +1,13 @@
 const Task = require('./Task');
 const _ = require('lodash');
+const config = require('pelias-config');
 
-function Batch(opts, peliasConfig) {
-  this._size = _.defaults({}, opts, peliasConfig.dbclient).batchSize;
+function Batch(opts) {
+  this._size = (
+    _.get(opts, 'batchSize') ||
+    _.get(config.generate(), 'dbclient.batchSize') ||
+    500 // maximum records per batch
+  );
   this._slots = [];
   this.retries = 0;
   this.status = 999;
@@ -18,4 +23,4 @@ Batch.prototype.push = function( record ){
   this._slots.push( new Task( record ) );
 };
 
-module.exports = Batch;
\ No newline at end of file
+module.exports = Batch;
diff --git a/src/BatchManager.js b/src/BatchManager.js
index c36fc43..5055fb3 100644
--- a/src/BatchManager.js
+++ b/src/BatchManager.js
@@ -1,7 +1,6 @@
 const Batch = require('./Batch');
 const transaction = require('./transaction');
 const pelias_logger = require('pelias-logger');
-const peliasConfig = require('pelias-config').generate();
 
 const Stats = require('./stats');
 
@@ -20,7 +19,7 @@ function BatchManager( opts ){
   this._stats = new Stats(this._logger);
 
   // internal variables
-  this._current = new Batch( this._opts, peliasConfig );
+  this._current = new Batch( this._opts );
   this._transient = 0;
   this._resumeFunc = undefined;
 
@@ -98,7 +97,7 @@ BatchManager.prototype._dispatch = function( batch, next ){
 
 BatchManager.prototype.flush = function(next){
   this._dispatch( this._current, next );
-  this._current = new Batch( this._opts, peliasConfig );
+  this._current = new Batch( this._opts );
 };
 
 // call this on stream end
diff --git a/test/Batch.js b/test/Batch.js
index 75287bd..3c67ad1 100644
--- a/test/Batch.js
+++ b/test/Batch.js
@@ -13,7 +13,7 @@ module.exports.tests.validate = function (test) {
       batchSize: 200
     };
 
-    const batch = new Batch(opts, peliasConfig.generate());
+    const batch = new Batch(opts);
 
     t.equals(batch.free(), 200, 'opts should be prioritised over config');
     t.end();
@@ -25,7 +25,7 @@ module.exports.tests.validate = function (test) {
 
     process.env.PELIAS_CONFIG = path.resolve(__dirname + '/test-config.json');
 
-    const batch = new Batch(undefined, peliasConfig.generate());
+    const batch = new Batch();
 
     t.equals(batch.free(), 1000, 'batch size from config should be used when opts do not have it');
     t.end();
@@ -35,7 +35,7 @@ module.exports.tests.validate = function (test) {
 
   test('default batch size from config should be used', function (t) {
 
-    const batch = new Batch(undefined, peliasConfig.generate());
+    const batch = new Batch();
 
     t.equals(batch.free(), 500, 'default batch size should be used when no batch size is configured');
     t.end();

@missinglink
Copy link
Member

It's a pity we need to call generate() on every instantiation of Batch(), but otherwise it's impossible to test properly

@mansoor-sajjad
Copy link
Author

Hei @missinglink,

I had pelias-config inside the batch.js in my first commit in this PR, but taken it out to BatchManager.js later on, for exactly the same reason you mentioned. I am unsure on how costly the generate() function is? But I think we don't need to validate the config on every batch.
Maybe we can update the batch.js to just take in the batchSize parameter instead of full pelias-config, this will make it easier to test the batch.js?

@missinglink
Copy link
Member

Hi @mansoor-sajjad how about this version which avoids calling generate() through the introduction of static state in a new module called config.js?

#126

Can you please have a look and check that it does what you need?

@mansoor-sajjad
Copy link
Author

Hi @missinglink,

Thank for implementing #126.
Yes it fulfills our need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants