
Random Lookup Benchmark Instructions

Martin Nettling edited this page Dec 15, 2013 · 18 revisions

This wiki page describes how to run the random-lookup benchmark from the paper "DRUMS: Disk Repository with Update Management, for high throughput sequencing data." Tutorials showing how to use DRUMS with SNP and HERV data, including example files, can be found in the two tutorial packages ("herv/tutorial", "snp/tutorial").

The results of this benchmark are shown in Figure 6 of that paper.

long OUTPUT_AFTER_SELECTS = 10000;
long returnedElements = 0;
long performedSelects = 0;

DRUMS

Open an existing DRUMS table. The stored configuration and the HashFunction it was created with are loaded automatically. Then instantiate a DRUMSReader to select elements from the underlying table.

DRUMSParameterSet<HERV> globalParameters = new DRUMSParameterSet<HERV>(new File("/the/path/to/the/DRUMStable"));
DRUMS<HERV> drums = DRUMSInstantiator.openTable(AccessMode.READ_ONLY, globalParameters);
DRUMSReader<HERV> reader = drums.getReader();

We suggest extracting the desired number of records to request from the input data beforehand, so that the test is repeatable across different configurations.
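
One way to make the set of requested records repeatable is to draw it with a fixed random seed. The following is a minimal sketch, not part of DRUMS: the class `KeySampler`, its method `sample`, and the seed parameter are hypothetical names introduced here, and the keys are assumed to fit in memory as a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of DRUMS: draws a reproducible sample of lookup keys. */
final class KeySampler {

    /**
     * Returns up to sampleSize elements of allKeys in a random order that
     * depends only on the seed, so every benchmark run requests the same keys.
     */
    static <K> List<K> sample(List<K> allKeys, int sampleSize, long seed) {
        List<K> shuffled = new ArrayList<>(allKeys); // copy, so the input list stays untouched
        Collections.shuffle(shuffled, new Random(seed));
        int n = Math.min(sampleSize, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, n));
    }
}
```

Calling `KeySampler.sample(keys, n, seed)` with the same arguments yields the same list, so the identical requests can be replayed against DRUMS and MySQL.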

HERV

long time = System.currentTimeMillis();
// run over all previously generated keys, and generate HERV objects with key only
{
    HERV toRequest = new HERV(chromosome, startPositionChromosome, endPositionChromosome, startHERV, endHERV, idHERV);
    List<HERV> result = reader.get(toRequest.getKey());
    returnedElements += result.size();
    performedSelects++;
    // report the elapsed time after every OUTPUT_AFTER_SELECTS lookups
    if (performedSelects % OUTPUT_AFTER_SELECTS == 0) {
        System.out.println("Performed last " + OUTPUT_AFTER_SELECTS + " random lookups in " + (System.currentTimeMillis() - time) +
        " milliseconds. Returned " + returnedElements + " elements in total.");
        time = System.currentTimeMillis();
    }
}
System.out.println("Performed last lookups in " + (System.currentTimeMillis() - time) + " milliseconds");
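
The measurement loop can also be written independently of the storage backend. The sketch below is an illustration under assumptions, not code from the DRUMS repository: `LookupBenchmark` and `run` are hypothetical names, the `lookup` function stands in for `reader.get(...)` or a SQL query, and `System.nanoTime()` is swapped in for `System.currentTimeMillis()` because it is monotonic.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical harness; the lookup function stands in for reader.get(...) or a SQL query. */
final class LookupBenchmark {

    /** Runs one lookup per key and returns the total number of returned elements. */
    static <K, V> long run(List<K> keys, Function<K, List<V>> lookup, long outputAfterSelects) {
        long returnedElements = 0;
        long performedSelects = 0;
        long time = System.nanoTime();
        for (K key : keys) {
            returnedElements += lookup.apply(key).size();
            performedSelects++;
            // report the elapsed time after every outputAfterSelects lookups
            if (performedSelects % outputAfterSelects == 0) {
                long elapsedMs = (System.nanoTime() - time) / 1_000_000;
                System.out.println("Performed last " + outputAfterSelects + " random lookups in "
                        + elapsedMs + " ms. Returned " + returnedElements + " elements in total.");
                time = System.nanoTime();
            }
        }
        return returnedElements;
    }
}
```

Passing the same key list and a different `lookup` function keeps the timing logic identical for both systems under test.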

SNP

The DRUMS benchmark for SNP data is identical, except that the SNP class is used instead of the HERV class.

...
     SNP toRequest = new SNP(sequenceId, positionOnChromosome, ecotype);
...

MySQL

Don't forget to configure MySQL properly.

To select records from the MySQL tables using Java, open a connection to the database with a JDBC driver (com.mysql.jdbc.Driver). The only difference from the DRUMS benchmark is the way records are requested.
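
As an alternative to concatenating the key values into the SQL string, the query can be prepared once and reused for every lookup. The following is a sketch under assumptions, not the code used for the paper: `MySqlLookup` and its methods are hypothetical names, the JDBC URL and credentials are placeholders supplied by the caller, and all key columns are assumed to hold integer values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper for the MySQL side of the benchmark. */
final class MySqlLookup {

    /** Builds "SELECT * FROM <table> WHERE c1 = ? AND c2 = ? ..." for the given key columns. */
    static String buildSelect(String table, String... keyColumns) {
        // table and column names cannot be bound as parameters; they must be trusted constants
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table).append(" WHERE ");
        for (int i = 0; i < keyColumns.length; i++) {
            if (i > 0) sql.append(" AND ");
            sql.append(keyColumns[i]).append(" = ?");
        }
        return sql.toString();
    }

    /** Opens a connection; URL, user, and password are placeholders supplied by the caller. */
    static Connection connect(String jdbcUrl, String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl, user, password);
    }

    /** Binds the key values (assumed integer-typed), runs the query, and counts the rows. */
    static int countRows(PreparedStatement select, long... keyValues) throws SQLException {
        for (int i = 0; i < keyValues.length; i++) {
            select.setLong(i + 1, keyValues[i]); // JDBC parameter indices are 1-based
        }
        int rows = 0;
        try (ResultSet set = select.executeQuery()) {
            while (set.next()) rows++;
        }
        return rows;
    }
}
```

For the SNP table this would be used as `connection.prepareStatement(MySqlLookup.buildSelect("snp", "sequence_id", "position", "ecotype_id"))`, followed by one `countRows` call per requested key.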

HERV

...
String selectQuery = "SELECT * FROM herv " +
    "WHERE chromosome = " + toRequest.getChromosome() + " AND " +
    " startPositionChromosome = " + toRequest.getStartPositionChromosome() + " AND " +
    " endPositionChromosome = " + toRequest.getEndPositionChromosome() + " AND " +
    " startPositionHERV = " + toRequest.getStartHERV() + " AND " +
    " endPositionHERV = " + toRequest.getEndHERV() + " AND " +
    " idHERV = " + toRequest.getIdHERV();
ResultSet set = statement.executeQuery(selectQuery);
// the cursor starts before the first row, so advance it before reading
while (set.next()) {
    HERV record = new HERV(set.getByte("chromosome"), ...);
    record.setEValue(set.getDouble("eValue"));
    record.setStrandOnChromosome((byte) set.getDouble("strand"));
}
set.close();
statement.close();
...

SNP

To run the same benchmark for SNP data, only the query and the class used need to be adapted.

...
String selectQuery = "SELECT * FROM snp WHERE sequence_id = " + toRequest.getSequenceId() +
    " AND position = " + toRequest.getBasePosition() +
    " AND ecotype_id = " + toRequest.getEcotypeId();
ResultSet set = statement.executeQuery(selectQuery);
// the cursor starts before the first row, so advance it before reading
while (set.next()) {
    SNP record = new SNP(set.getByte("sequence_id"), ...);
    record.setFrom(set.getByte("fromBase"));
    record.setTo(set.getByte("toBase"));
}
set.close();
statement.close();
...