Using the code:
The directory already contains the files and partitions used by our experiments.
To compile, simply type:
$> make
This will generate 3 executables:
dblite - interactive SM database application
genqueries - generates synthetic query workload
benchmark - simulates a multi-user environment on workload queries
====================================================
Running dblite
====================================================
To execute dblite type:
$> dblite
This will take you into interactive mode. Here you can create the database files, initialize the database with various configurations, and query the data.
Creating the Database (at the dblite prompt):
dblite> create createdb
To load an existing database (at dblite prompt):
dblite> load config db.xml
There are three configurations available:
Filename Description
======== ============
config n-ary
configP SM
configPAX PAX
To Execute a Query:
dblite> query [query]
query syntax
=====================================================
projection(child; projection-list)
merge-join(left-child, right-child; join-clause; ) // where left-child and right-child are sscans
sscan(table[:alias]; projection-list; [where-clause])
*projection-list ::= comma separated qualified attribute list.
*example(s):
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.a < 5), sscan(T2:S; S.a, S.b; ); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z'), sscan(T2:S; S.a, S.b; S.b > 350 & S.b <= 1232); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z' & R.a > 200), sscan(T2:S; S.a, S.b; S.b > 850 & S.b <= 1232); S.b = R.a; ); R.a, R.c)
In the above examples, the table and attribute names come from the schema, which can be listed with "describe". The names are defined in the db.xml file.
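Since each operator takes semicolon-separated arguments that may themselves contain nested operator calls, it can help to check a query's structure before handing it to dblite. The following Python sketch is purely illustrative (it is not part of dblite) and splits an operator expression into its name and top-level arguments:

```python
# Illustrative only, not part of dblite: split an expression of the form
# name(arg1; arg2; ...) into its name and top-level semicolon-separated
# arguments, ignoring semicolons inside nested operator calls.
def split_operator(expr):
    name, body = expr.strip().split("(", 1)
    body = body.rstrip()
    assert body.endswith(")"), "unbalanced expression"
    body = body[:-1]                  # drop the outermost closing paren
    args, cur, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:  # top-level separator
            args.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    args.append("".join(cur).strip())
    return name.strip(), args
```

For example, split_operator("sscan(T1:R; R.a, R.b; R.a < 5)") yields the operator name "sscan" and the three argument groups; a nested merge-join stays intact as a single argument of the enclosing projection.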
Type help for a full list of commands:
command options description
======= ======= ===========
help display usage/help
query <query> execute query, results are returned to stdout
profile [count] <query> profiles the query
load <partition-config> <schema> loads the database
create <synthetic-info> creates a populated synthetic database
tables list the tables in the database
describe <table-name> list the table schema for the selected table
layout ?|f|p gets/sets the current materialization
? - get current layout
f - single partition
p - 2-partitions
quit exits the program, but why would you?
Known Issues
=====================================================
The parser is very brittle: if incorrect syntax is encountered, it segfaults.
Running with Cachegrind
=====================================================
In order to run with the simulated cache, run the following:
valgrind --tool=cachegrind dblite
After loading the data and running a query, quit out of dblite, and the results will be shown. You can also simply load the data and quit without running a query to see the total overhead.
====================================================
Running genqueries
====================================================
To execute genqueries type:
$> genqueries num-non-optimized [nQ0 [nQ1 [nQ2 [nQ3 [nQ4 [nQ5 [nQ6 [nQ7 [nQ8 [nQ9 [nQ10 [nQ11 [nQ12 [nQ13 [nQ14]]]]]]]]]]]]]]]
parameters description
num-non-optimized number of non cache optimized queries to generate
nQX number of queries to generate for query template X
====================================================
Running benchmark
====================================================
To execute benchmark type:
$> benchmark config-file is-enabled nthreads < query-workload
parameters description
config-file same as dblite config file (specifies partition initialization)
is-enabled enable or disable partition propagation (0 or 1)
nthreads number of threads
query-workload synthetic workload generated by genqueries
====================================================
Creating the data and the partitions
=====================================================
In order to create an auto-generated file, make sure there is a folder called "Data" within the DBCacheBenchmarking directory.
Open a file in the DBCacheBenchmarking folder called "createdb".
Then, for each relation you want generated, put the following:
<filename>
<#fields>|<#records>|<#bytes per record>
Then for each field put one of the following types:
int|incr|<byte offset>
int|randIncr|<byte offset>
int|range|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|oddRange|<lower bound>|<upper bound>|<byte offset>
int|evenRange|<lower bound>|<upper bound>|<byte offset>
string|<length>|<byte offset>
fK|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|incr means that it is an integer type that increments for each record created
int|randIncr creates an integer field that increments itself by a small random value for each record
int|range creates a random integer that is in the range of <lower bound> to <lower missing bound> and <upper missing bound> to <upper bound>
int|oddRange creates a random odd integer in the range of <lower bound> to <upper bound>
int|evenRange creates a random even integer in the range of <lower bound> to <upper bound>
string creates a string with <length> characters
fK gives unique random values in the given range (skipping the missing range) in sorted order.
<byte offset> is the location of the field in the tuple
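The int|range semantics above, drawing values that skip the band between the missing bounds, can be sketched as follows. This is a hypothetical illustration of the described behavior, not the generator's actual code, and inclusivity at the exact bounds may differ:

```python
# Hypothetical sketch of the int|range semantics described above:
# draw from <lower bound>..<lower missing bound> or
# <upper missing bound>..<upper bound>, skipping the gap in between.
import random

def sample_range(lower, upper, lower_missing, upper_missing):
    v = random.randint(lower, upper)
    while lower_missing < v < upper_missing:   # rejected: falls in the gap
        v = random.randint(lower, upper)
    return v
```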
An example file is:
testTable1.tab
3|200|34
int|incr|0
string|26|4
int|range|0|55|0|0|30
testTable2.tab
4|1000|13
int|randIncr|0
int|evenRange|0|50|4
string|1|8
fK|55|100|70|76|9
This creates two binary data files, "testTable1.tab" and "testTable2.tab", located in the "Data" folder. The first table has 200 records, 3 fields, and a record is 34 bytes long. The first field is an incremented integer, the second a string of 26 characters, and the third a random integer from 0 to 55. The second table has 1000 records, 4 fields, and 13 bytes per record. The first field is a randomly incremented integer, the second a random even integer from 0 to 50, the third a single character, and the fourth a unique sorted random integer from 55 to 70 and 76 to 100.
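A fixed-width record laid out like testTable1.tab's (int at offset 0, 26-byte string at offset 4, int at offset 30, 34 bytes total) can be decoded outside of dblite with a short script. This is a sketch, not part of the toolchain; little-endian integers and null-padded strings are assumptions about the generator's output:

```python
# Sketch: decode one 34-byte record with the testTable1.tab layout above.
# Little-endian ints and null-padded strings are assumptions.
import struct

def decode_t1_record(record):
    assert len(record) == 34
    a, = struct.unpack_from("<i", record, 0)        # int at offset 0
    b = record[4:30].decode("ascii").rstrip("\x00") # 26-char string at offset 4
    c, = struct.unpack_from("<i", record, 30)       # int at offset 30
    return a, b, c
```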
Another file must also be modified, called db.xml. db.xml for the tables above looks like the following:
<database>
<tables>
<table id = "0" name = "T1" path="Data/testTable1.tab">
<schema>
<attribute id="0" name="a" type="INTEGER" length="4" />
<attribute id="1" name="b" type="STRING" length="26" />
<attribute id="2" name="c" type="INTEGER" length="4" />
</schema>
</table>
<table id = "1" name = "T2" path="Data/testTable2.tab">
<schema>
<attribute id="0" name="a" type="INTEGER" length="4" />
<attribute id="1" name="b" type="INTEGER" length="4" />
<attribute id="2" name="c" type="CHAR" length="1" />
<attribute id="3" name="d" type="INTEGER" length="4" />
</schema>
</table>
</tables>
</database>
Now, when running dblite, call create to create the new tables. The program will end after the tables are created and will need to be restarted, but the new tables will be in place.
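The db.xml layout above can be read with any XML parser. The minimal sketch below uses the Python standard library; the tag and attribute names are taken from the example above, not from dblite's own loader:

```python
# Sketch: read table and attribute metadata from a db.xml-style file.
# Tag/attribute names follow the example above.
import xml.etree.ElementTree as ET

def read_schema(xml_text):
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.iter("table"):
        attrs = [(a.get("name"), a.get("type"), int(a.get("length")))
                 for a in table.iter("attribute")]
        tables[table.get("name")] = {"path": table.get("path"),
                                     "attributes": attrs}
    return tables
```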
Creating Partitions:
In order to partition the tables, a config file must be created (though the name can be anything).
The first line must be 4096 (this is the page size; it should be variable, but 4096 is hard-coded in other places, so it needs to be that value).
For each relation, do the following:
<fileLocation>
<#Partitions>|<#Fields>|<#records>|<#bytesPerRecord>
for each partition, do the following:
<#fieldsInPartition>|<#sizeOfPartitionTuple>
for each field in the partition, do the following:
<#fieldNum>|<fieldSize>
So, one config file for the above relations could be:
4096
Data/testTable1.tab
3|3|200|34
1|4
0|4
1|26
1|26
1|4
2|4
Data/testTable2.tab
2|4|1000|13
2|8
0|4
3|4
2|5
1|4
2|1
This splits the first relation into three partitions, each containing one field. The first partition has size 4, the second size 26, and the third size 4. The second relation is in 2 partitions, the first of which has two fields, field 0 and field 3, and a total tuple size of 8. The second contains fields 1 and 2. It should be noted that within a partition, the fields should be ordered from lowest to highest field id.
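The bookkeeping in a config file is easy to get wrong: each partition's field sizes must sum to its tuple size, and the partitions' field counts must sum to the relation's field count. A small checker can catch such mistakes before the file is handed to dblite. This is a hypothetical helper, not part of the code:

```python
# Hypothetical checker for the partition config format described above.
# Verifies: page size line, field sizes sum to each partition's tuple
# size, and partition field counts sum to the relation's field count.
def check_partition_config(lines):
    it = iter(l.strip() for l in lines if l.strip())
    assert next(it) == "4096", "page size is hard-coded to 4096"
    for path in it:                       # each relation starts with its path
        nparts, nfields, nrecs, recsize = map(int, next(it).split("|"))
        seen_fields = 0
        for _ in range(nparts):
            pf, psize = map(int, next(it).split("|"))
            total = 0
            for _ in range(pf):
                fid, fsize = map(int, next(it).split("|"))
                total += fsize
            assert total == psize, "field sizes != tuple size in " + path
            seen_fields += pf
        assert seen_fields == nfields, "field count mismatch in " + path
    return True
```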
Another might be:
4096
Data/testTable1.tab
1|3|200|34
3|34
0|4
1|26
2|4
Data/testTable2.tab
2|4|1000|13
3|12
0|4
1|4
3|4
1|1
2|1
This puts the first relation in one partition (i.e., in NSM format). The second relation has two partitions, with 3 fields in one and 1 field in the other.