Using the code:
The directory already contains the files and partitions used by our experiments.
To compile, simply type:
$> make
This will generate 3 executables:
dblite - interactive SM database application
genqueries - generates synthetic query workload
benchmark - simulates a multi-user environment on workload queries
====================================================
Running dblite
====================================================
To execute dblite type:
$> dblite
This will take you into interactive mode. Here you can create the database files, initialize the database with various configurations, and query the data.
Creating the Database (at the dblite prompt):
dblite> create createdb
To load an existing database (at dblite prompt):
dblite> load config db.xml
There are three configurations available:
Filename Description
======== ============
config n-ary
configP SM
configPAX PAX
To Execute a Query:
dblite> query [query]
query syntax
=====================================================
projection(child; projection-list)
merge-join(left-child, right-child; join-clause; ) // where left-child and right-child are sscans
sscan(table[:alias]; projection-list; [where-clause])
*projection-list ::= comma separated qualified attribute list.
*example(s):
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.a < 5), sscan(T2:S; S.a, S.b; ); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z'), sscan(T2:S; S.a, S.b; S.b > 350 & S.b <= 1232); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z' & R.a > 200), sscan(T2:S; S.a, S.b; S.b > 850 & S.b <= 1232); S.b = R.a; ); R.a, R.c)
In the above examples, the table and attribute names come from the schema, which can be listed with "describe". The names are defined in the db.xml file.
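Since each operator takes semicolon-separated arguments that may themselves contain nested operator calls, it can help to check a query's structure before handing it to dblite. The following Python sketch is purely illustrative (it is not part of dblite) and splits an operator expression into its name and top-level arguments:

```python
# Illustrative only, not part of dblite: split an expression of the form
# name(arg1; arg2; ...) into its name and top-level semicolon-separated
# arguments, ignoring semicolons inside nested operator calls.
def split_operator(expr):
    name, body = expr.strip().split("(", 1)
    body = body.rstrip()
    assert body.endswith(")"), "unbalanced expression"
    body = body[:-1]                  # drop the outermost closing paren
    args, cur, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:  # top-level separator
            args.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    args.append("".join(cur).strip())
    return name.strip(), args
```

For example, split_operator("sscan(T1:R; R.a, R.b; R.a < 5)") yields the operator name "sscan" and the three argument groups; a nested merge-join stays intact as a single argument of the enclosing projection.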
Type help for a full list of commands:
command options description
======= ======= ===========
help display usage/help
query <query> execute query, results are returned to stdout
profile [count] <query> profiles the query
load <partition-config> <schema> loads the database
create <synthetic-info> creates a populated synthetic database
tables list the tables in the database
describe <table-name> list the table schema for the selected table
layout ?|f|p gets/sets the current materialization
? - get current layout
f - single partition
p - 2-partitions
quit exits the program, but why would you?
Known Issues
=====================================================
The parser is very brittle: if incorrect syntax is encountered, it segfaults.
Running with Cachegrind
=====================================================
In order to run with the simulated cache, run the following:
valgrind --tool=cachegrind dblite
After loading the data and running a query, quit out of dblite, and the results will be shown. You can also simply load the data and quit without running a query to see the total overhead.
====================================================
Running genqueries
====================================================
To execute genqueries type:
$> genqueries num-non-optimized [nQ0 [nQ1 [nQ2 [nQ3 [nQ4 [nQ5 [nQ6 [nQ7 [nQ8 [nQ9 [nQ10 [nQ11 [nQ12 [nQ13 [nQ14]]]]]]]]]]]]]]]
parameters description
num-non-optimized number of non cache optimized queries to generate
nQX number of queries to generate for query template X
====================================================
Running benchmark
====================================================
To execute benchmark type:
$> benchmark config-file is-enabled nthreads < query-workload
parameters description
config-file same as dblite config file (specifies partition initialization)
is-enabled enable or disable partition propagation (0 or 1)
nthreads number of threads
query-workload synthetic workload generated by genqueries
====================================================
Creating the data and the partitions
=====================================================
In order to create an auto-generated file, make sure there is a folder called "Data" within the DBCacheBenchmarking directory.
Open a file in the DBCacheBenchmarking folder called "createdb".
Then, for each relation you want generated, put the following:
<filename>
<#fields>|<#records>|<#bytes per record>
Then for each field put one of the following types:
int|incr|<byte offset>
int|randIncr|<byte offset>
int|range|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|oddRange|<lower bound>|<upper bound>|<byte offset>
int|evenRange|<lower bound>|<upper bound>|<byte offset>
string|<length>|<byte offset>
fK|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|incr means that it is an integer type that increments for each record created
int|randIncr creates an integer field that increments itself by a small random value for each record
int|range creates a random integer that is in the range of <lower bound> to <lower missing bound> and <upper missing bound> to <upper bound>
int|oddRange creates a random odd integer in the range of <lower bound> to <upper bound>
int|evenRange creates a random even integer in the range of <lower bound> to <upper bound>
string creates a string with <length> characters
fK gives unique random values in the given range (skipping the missing range) in sorted order.
<byte offset> is the location of the field in the tuple
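The int|range semantics above, drawing values that skip the band between the missing bounds, can be sketched as follows. This is a hypothetical illustration of the described behavior, not the generator's actual code, and inclusivity at the exact bounds may differ:

```python
# Hypothetical sketch of the int|range semantics described above:
# draw from <lower bound>..<lower missing bound> or
# <upper missing bound>..<upper bound>, skipping the gap in between.
import random

def sample_range(lower, upper, lower_missing, upper_missing):
    v = random.randint(lower, upper)
    while lower_missing < v < upper_missing:   # rejected: falls in the gap
        v = random.randint(lower, upper)
    return v
```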
An example file is:
testTable1.tab
3|200|34
int|incr|0
string|26|4
int|range|0|55|0|0|30
testTable2.tab
4|1000|13
int|randIncr|0
int|evenRange|0|50|4
string|1|8
fK|55|100|70|76|9
This creates two binary data files, "testTable1.tab" and "testTable2.tab", located in the "Data" folder. The first table has 200 records, 3 fields, and a record is 34 bytes long. The first field is an incremented integer, the second a string of 26 characters, and the third a random integer from 0 to 55. The second table has 1000 records, 4 fields, and 13 bytes per record. The first field is a randomly incremented integer, the second a random even integer from 0 to 50, the third a single character, and the fourth a unique sorted random integer from 55 to 70 and 76 to 100.
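A fixed-width record laid out like testTable1.tab's (int at offset 0, 26-byte string at offset 4, int at offset 30, 34 bytes total) can be decoded outside of dblite with a short script. This is a sketch, not part of the toolchain; little-endian integers and null-padded strings are assumptions about the generator's output:

```python
# Sketch: decode one 34-byte record with the testTable1.tab layout above.
# Little-endian ints and null-padded strings are assumptions.
import struct

def decode_t1_record(record):
    assert len(record) == 34
    a, = struct.unpack_from("<i", record, 0)        # int at offset 0
    b = record[4:30].decode("ascii").rstrip("\x00") # 26-char string at offset 4
    c, = struct.unpack_from("<i", record, 30)       # int at offset 30
    return a, b, c
```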
Another file must also be modified, called db.xml. db.xml for the tables above looks like the following:
<database>
<tables>
<table id = "0" name = "T1" path="Data/testTable1.tab">
<schema>
<attribute id="0" name="a" type="INTEGER" length="4" />
<attribute id="1" name="b" type="STRING" length="26" />
<attribute id="2" name="c" type="INTEGER" length="4" />
</schema>
</table>
<table id = "1" name = "T2" path="Data/testTable2.tab">
<schema>
<attribute id="0" name="a" type="INTEGER" length="4" />
<attribute id="1" name="b" type="INTEGER" length="4" />
<attribute id="2" name="c" type="CHAR" length="1" />
<attribute id="3" name="d" type="INTEGER" length="4" />
</schema>
</table>
</tables>
</database>
Now, when running dblite, call create to create the new tables. The program will end after the tables are created and will need to be restarted, but the new tables will be in place.
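The db.xml layout above can be read with any XML parser. The minimal sketch below uses the Python standard library; the tag and attribute names are taken from the example above, not from dblite's own loader:

```python
# Sketch: read table and attribute metadata from a db.xml-style file.
# Tag/attribute names follow the example above.
import xml.etree.ElementTree as ET

def read_schema(xml_text):
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.iter("table"):
        attrs = [(a.get("name"), a.get("type"), int(a.get("length")))
                 for a in table.iter("attribute")]
        tables[table.get("name")] = {"path": table.get("path"),
                                     "attributes": attrs}
    return tables
```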
Creating Partitions:
In order to partition the tables, a config file must be created (though the name can be anything).
The first line must be 4096 (this is the page size; it should be variable, but 4096 is hard-coded in other places, so it needs to be that value).
For each relation, do the following:
<fileLocation>
<#Partitions>|<#Fields>|<#records>|<#bytesPerRecord>
for each partition, do the following:
<#fieldsInPartition>|<#sizeOfPartitionTuple>
for each field in the partition, do the following:
<#fieldNum>|<fieldSize>
So, one config file for the above relations could be:
4096
Data/testTable1.tab
3|3|200|34
1|4
0|4
1|26
1|26
1|4
2|4
Data/testTable2.tab
2|4|1000|13
2|8
0|4
3|4
2|5
1|4
2|1
This splits the first relation into three partitions, each containing one field. The first partition has size 4, the second size 26, and the third size 4. The second relation is in 2 partitions, the first of which has two fields, field 0 and field 3, and a total tuple size of 8. The second contains fields 1 and 2. It should be noted that within a partition, the fields should be ordered from lowest to highest field id.
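The bookkeeping in a config file is easy to get wrong: each partition's field sizes must sum to its tuple size, and the partitions' field counts must sum to the relation's field count. A small checker can catch such mistakes before the file is handed to dblite. This is a hypothetical helper, not part of the code:

```python
# Hypothetical checker for the partition config format described above.
# Verifies: page size line, field sizes sum to each partition's tuple
# size, and partition field counts sum to the relation's field count.
def check_partition_config(lines):
    it = iter(l.strip() for l in lines if l.strip())
    assert next(it) == "4096", "page size is hard-coded to 4096"
    for path in it:                       # each relation starts with its path
        nparts, nfields, nrecs, recsize = map(int, next(it).split("|"))
        seen_fields = 0
        for _ in range(nparts):
            pf, psize = map(int, next(it).split("|"))
            total = 0
            for _ in range(pf):
                fid, fsize = map(int, next(it).split("|"))
                total += fsize
            assert total == psize, "field sizes != tuple size in " + path
            seen_fields += pf
        assert seen_fields == nfields, "field count mismatch in " + path
    return True
```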
Another might be:
4096
Data/testTable1.tab
1|3|200|34
3|34
0|4
1|26
2|4
Data/testTable2.tab
2|4|1000|13
3|12
0|4
1|4
3|4
1|1
2|1
This puts the first relation in one partition (i.e., in NSM format). The second relation has two partitions, with 3 fields in one and 1 field in the other.