README

Compiling
=========

Test platform: Linux, with g++ 4.8.2.

Running make will create an executable named "cluster".


Usage
=====

usage: ./cluster [options...] input.json output.txt

Options:
  --min-center-size=INTEGER:  set the minimum center size to INTEGER.
                              Controls the speed-vs-accuracy tradeoff.
                              Reasonable values range from 10 to N/10000.
                              (e.g. 10 to 100 when processing 1M antibodies)
                              Lower values are generally more accurate.
                              Higher values are generally faster, but values
                              that are too high run extremely slowly.
  --min-center-size=disabled: disable megaclustering.
                              Warning: very slow and memory hungry!
                              Don't use with large datasets.


Test files
==========
Here you can download a test input file (and its output file) to give a try:

https://copy.com/9NUMjTVtbarFXmU8

to run this program:

    ./cluster 1M.json 1M.out


* Input file, 1M.json, contains 1 million Ab sequence objects

* Output file, 1M.out, is a text file with each row as the assigned cluster number for each sequence from the input file (in the same order).