
Improve transform perf #46

Merged: 5 commits into googleapis:master on Mar 15, 2018

Conversation

@kolodny (Contributor) commented Feb 20, 2018

Blocked on #35

General performance improvements for the chunk transformer.

  • Tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

For reference, this is the file used to measure chunk transformer benchmarks:

const Bigtable = require('./');
const ChunkTransformer = require('./src/chunktransformer');
const mutation = require('./src/mutation.js');
const through = require('through2');
const crypto = require('crypto');

const ProtoBuf = require('protobufjs');
const path = require('path');
const protosRoot = path.resolve(__dirname, './protos');
function applyProtoRoot(filename, root) {
  filename.root = path.resolve(filename.root) + '/';
  root.resolvePath = function(originPath, importPath, alreadyNormalized) {
    return ProtoBuf.util.path.resolve(
      filename.root,
      importPath,
      alreadyNormalized
    );
  };
  return filename.file;
}
const root = new ProtoBuf.Root();
root.loadSync(
  applyProtoRoot(
    {
      root: protosRoot,
      file: 'google/bigtable/v2/bigtable.proto',
    },
    root
  ),
  { keepCase: false }
);
const ReadRowsResponse = root.lookupType('google.bigtable.v2.ReadRowsResponse');
const CellChunk = root.lookupType(
  'google.bigtable.v2.ReadRowsResponse.CellChunk'
);


const allChunks = [];


// NOTE: despite the function name, this is 1,024 hex characters (~1 KiB) per value.
let randomString = crypto.randomBytes(1024 / 2).toString('hex');
const get1MibString = () => {
  return randomString;
};

const makeChunk = ({ rowKey = null, commit = false, index } = {}) => {
  const chunk = {
    qualifier: { value: mutation.convertToBytes(`field${index}`) },
    timestampMicros: 0,
    value: mutation.convertToBytes(get1MibString()),
    // valueSize: 0,
    commitRow: commit,
  };
  if (rowKey) {
    chunk.rowKey = mutation.convertToBytes(rowKey);
    chunk.familyName = { value: 'family' };
  }
  return chunk;
};

function rowResponses(rowKey) {
  allChunks.push(makeChunk({ rowKey, index: 0 }));

  for (let index = 1; index < 9; index++) {
    allChunks.push(makeChunk({ index }));
  }
  allChunks.push(makeChunk({ commit: true, index: 9 }));
}


const bigtable = new Bigtable();

const INSTANCE = bigtable.instance('instance');
const TABLE = INSTANCE.table('table');

let start;
start = new Date();
for (let i = 0; i < 40000; i++) {
  rowResponses(`Row${i}`);
}
const response = { chunks: allChunks };
const readRowsResponse = ReadRowsResponse.fromObject(response, {
  defaults: true,
  longs: String,
  oneofs: true,
});
console.log(`Generated in ${new Date() - start}`);

start = new Date();
const encoded = ReadRowsResponse.encode(readRowsResponse).finish();
console.log(`Encoded in ${new Date() - start}`);

start = new Date();
const dataToEmit = ReadRowsResponse.decode(encoded);
console.log(`Decoded in ${new Date() - start}`);

TABLE.requestStream = () => {
  const emitter = through.obj();
  setImmediate(() => {
    setImmediate(() => {
      // emitter.emit('data', ReadRowsResponse.decode(encoded));
      emitter.emit('data', dataToEmit);
      setImmediate(() => {
        emitter.emit('end');
      });
    })
  })
  return emitter;
};

const transformer = new ChunkTransformer({ decode: false });
let totalRows = 0;

start = new Date();
TABLE.requestStream().pipe(transformer)
  .on('data', row => {
    totalRows++
  })
  .on('end', () => {
    console.log(`Emit ${totalRows} took ${new Date() - start}`);
    clearTimeout(timeout);
  })
;

const timeout = setTimeout(() => {}, 10000); // keep process alive
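One caveat on the measurement itself: `new Date()` deltas only have millisecond resolution and are subject to wall-clock adjustments. A minimal sketch of the same timing pattern using the monotonic high-resolution clock (the `timeIt` helper name is illustrative, not part of the benchmark above):

```javascript
// Sketch: millisecond timings via the monotonic process.hrtime.bigint() clock
// (Node >= 10.7), instead of Date subtraction.
function timeIt(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label} took ${elapsedMs.toFixed(1)}ms`);
  return result;
}
```

Each `start = new Date()` / `console.log` pair in the script could be wrapped this way without changing what is being measured.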

Before the first commit, this logged:

Generated in 2005
Encoded in 1790
Decoded in 1023
Emit 40000 took 1371

After the first commit, the emit/transform time dropped to about 1000ms.
After the second commit, it dropped further to 450ms ± 50ms.

Note that the generation, encoding, and decoding times didn't change. I'm not sure how streaming services work with GAPIC; if responses come in as encoded messages, then decoding is the real bottleneck, and that part we don't technically control.

There's still more work that can be done to increase transform speed, in particular around the fact that a value buffer may be holding a big number, which has to be probed for on decode (the counterpart of Mutation.encodeSetCell).
Removing that check puts transform performance in the mid-300ms range. I'll follow up on this last part in another issue or offline.
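The review diff further down adds an `isPossibleNumber` parameter to `Mutation.convertFromBytes`. A hedged sketch of that idea (the function body and bounds check here are illustrative, using Node's built-in BigInt reads rather than the node-int64 dependency the real code uses):

```javascript
// Sketch: only attempt the int64 conversion when the caller signals the
// bytes may encode a number; otherwise return the raw buffer untouched.
function convertFromBytes(bytes, isPossibleNumber) {
  const buf = Buffer.isBuffer(bytes) ? bytes : Buffer.from(bytes, 'base64');
  if (isPossibleNumber && buf.length === 8) {
    // Read big-endian signed 64-bit; only convert when it fits safely
    // in a JS number, otherwise fall through and return the buffer.
    const num = buf.readBigInt64BE(0);
    if (num <= BigInt(Number.MAX_SAFE_INTEGER) &&
        num >= BigInt(-Number.MAX_SAFE_INTEGER)) {
      return Number(num);
    }
  }
  return buf; // hot path: no per-value number probing
}
```

The point of the flag is that the hot transform path skips the conversion attempt entirely, rather than paying for it on every cell value.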

This gave a 50% increase in performance for baseline chunktransformer transforms.
@ghost ghost assigned kolodny Feb 20, 2018
@ghost ghost added the "cla: yes" label (This human has signed the Contributor License Agreement.) Feb 20, 2018
@codecov bot commented Feb 20, 2018

Codecov Report

Merging #46 into master will not change coverage.
The diff coverage is 100%.


@@          Coverage Diff          @@
##           master    #46   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           9      9           
  Lines        1199   1199           
=====================================
  Hits         1199   1199
Impacted Files Coverage Δ
src/row.js 100% <100%> (ø) ⬆️
src/chunktransformer.js 100% <100%> (ø) ⬆️
src/mutation.js 100% <100%> (ø) ⬆️

Powered by Codecov. Last update 7abb15d...23a10da.

@stephenplusplus (Contributor) commented:
Would you mind putting this against the GAPIC branch? Or should we just put this PR on hold until the conversion is finished?

@kolodny (Contributor, Author) commented Feb 20, 2018

There's no rush. Let's wait for GAPIC

@stephenplusplus stephenplusplus added the "status: blocked" (Resolving the issue is dependent on other work.) and "do not merge" (Indicates a pull request not ready for merge, due to either quality or timing.) labels Feb 20, 2018
@kolodny kolodny removed the "do not merge" and "status: blocked" labels Mar 13, 2018
@kolodny (Contributor, Author) commented Mar 13, 2018

@stephenplusplus This PR, as well as #48, should now be unblocked.

src/mutation.js (Outdated)

-Mutation.convertFromBytes = function(bytes, options) {
   var buf = Buffer.from(bytes, 'base64');
   var num = new Int64(buf).toNumber();
+Mutation.convertFromBytes = function(bytes, options, isPossibleNumber) {

@stephenplusplus stephenplusplus merged commit e0237cc into googleapis:master Mar 15, 2018
@ghost ghost removed the "cla: yes" label Mar 15, 2018