Implementing a Driver
The following are instructions for how to implement a new driver for a system. The driver interface is defined by the AbstractDriver class.
The source code for a "System X" driver must be put in a single Python source file inside of the drivers directory. The file name must contain the name of the system in lowercase with no spaces, followed by the word "driver". For example, the "System X" driver file name should be "systemxdriver.py". This file must contain a class that extends the AbstractDriver class and implements the required functions listed below. The name of this class may contain only alphanumeric characters: the system name, all lowercase except for the first character, followed by the word "Driver". For the example "System X", the name of the class that implements the required functions would be SystemxDriver.
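The naming rules above can be sketched as a small helper. This is only an illustration: `driver_names` is a hypothetical function that does not exist in the benchmark code, and it assumes that non-alphanumeric characters are simply dropped from the system name.

```python
import re


def driver_names(system_name):
    """Derive the driver file name and class name for a system name.

    Hypothetical helper for illustration only.
    """
    # Drop spaces and any other non-alphanumeric characters, then lowercase.
    base = re.sub(r"[^0-9a-zA-Z]", "", system_name).lower()
    if not base:
        raise ValueError("system name must contain an alphanumeric character")
    # File name: lowercase system name followed by "driver".
    file_name = base + "driver.py"
    # Class name: only the first character is uppercase, followed by "Driver".
    class_name = base[0].upper() + base[1:] + "Driver"
    return file_name, class_name


print(driver_names("System X"))  # ('systemxdriver.py', 'SystemxDriver')
```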
The driver must be stateless. That is, there cannot be any stored state from one transaction to the next. The driver must also support multiple instances (i.e., it cannot be a singleton object). Each instance of the driver will represent a unique connection to the backend database system.
These functions are invoked to perform various operations to configure the driver prior to loading data and executing transactions.
The makeDefaultConfig() function returns a dict that represents the default configuration of the driver. The key of the dict is the name of the parameter and the value is a two-element tuple containing (1) the description of the parameter and (2) the default value of the parameter. This is what will be printed out when the --print-config command-line argument is used.
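A minimal sketch of such a function is shown below. The `AbstractDriver` class here is a stand-in so that the example runs on its own, and the `host` and `port` parameters are made up for the hypothetical "System X"; a real driver would list whatever its backend needs.

```python
class AbstractDriver(object):
    """Stand-in for the real base class so that this sketch is self-contained."""

    def makeDefaultConfig(self):
        raise NotImplementedError


class SystemxDriver(AbstractDriver):
    # Parameter name -> (description, default value).
    # Both parameters are hypothetical examples.
    DEFAULT_CONFIG = {
        "host": ("Hostname of the System X server", "localhost"),
        "port": ("Port of the System X server", 9999),
    }

    def makeDefaultConfig(self):
        # Return a copy so callers cannot modify the class-level defaults.
        return dict(SystemxDriver.DEFAULT_CONFIG)


def default_values(driver):
    """Collapse the (description, default) tuples into plain default values."""
    config = driver.makeDefaultConfig()
    return {name: default for name, (_description, default) in config.items()}


print(default_values(SystemxDriver()))
```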
This method is invoked with a dict containing the configuration parameters for the driver. The dict that is passed to this function will come from either (1) the file passed in with the --config command-line argument or (2) the output of the driver's makeDefaultConfig() function. The dict will also always include an additional reset parameter; if this is true, then the driver should clear out all of the records in the database prior to returning. The dict will also include additional load and execute parameters that will be set to true when the driver is being used to load tuples or execute the transactional workload, respectively. If the driver is being used just for initialization, then both parameters will be set to false.
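The handling of the reset, load, and execute parameters might look roughly like the sketch below. Several things here are assumptions: the method name `loadConfig` is not given in the text above, the `host` parameter is made up, and the "database" is just an in-memory dict standing in for a real backend connection.

```python
class SystemxDriver(object):
    """Illustrative only: the 'database' is a dict of table name -> rows."""

    def __init__(self, database=None):
        # Stand-in for a connection to the backend system.
        self.database = {} if database is None else database
        self.config = {}
        self.loading = False
        self.executing = False

    def loadConfig(self, config):  # method name is an assumption
        # "host" is a hypothetical required parameter for this sketch.
        if "host" not in config:
            raise KeyError("missing configuration parameter: host")
        self.config = dict(config)
        # Clear out all existing records before returning when reset is true.
        if config.get("reset", False):
            self.database.clear()
        # Both flags are false when the driver is only being initialized.
        self.loading = bool(config.get("load", False))
        self.executing = bool(config.get("execute", False))


driver = SystemxDriver({"ITEM": [(1, "widget")]})
driver.loadConfig({"host": "localhost", "reset": True, "load": True, "execute": False})
print(driver.database, driver.loading, driver.executing)
```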
The following functions are used by the controller to load data into the target system using the driver. Before the loading begins, the controller will first execute the loadStart() function to allow the driver to prepare anything that it may need. Next, the controller will generate tuples for the tables and pass them to the driver through the loadTuples() function. The driver should not assume that the tables are generated in any set order (e.g., ORDER before ORDER_LINE), nor should it assume that all of the tuples for a particular table will be passed in a single invocation of loadTuples() (i.e., there could be multiple calls to load tuples for a single table). The controller notifies the driver that the data loading phase is complete by executing the loadFinish() function. None of these functions will be invoked if the --no-load command-line argument is given.
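The loading sequence can be illustrated with a toy driver that accumulates tuples in memory. This is a sketch under several assumptions: the argument names of `loadTuples()` are guesses, the in-memory `tables` dict stands in for the backend database, and the tuples are shortened placeholders rather than full TPC-C rows.

```python
from collections import defaultdict


class SystemxDriver(object):
    def __init__(self):
        # Stand-in for the backend database: table name -> list of tuples.
        self.tables = defaultdict(list)
        self.loaded = False

    def loadStart(self):
        # Prepare anything needed for the loading phase.
        self.tables.clear()
        self.loaded = False

    def loadTuples(self, tableName, tuples):
        # May be called several times for one table, and in any table order,
        # so always append instead of assuming a single complete batch.
        self.tables[tableName].extend(tuples)

    def loadFinish(self):
        # A real driver might commit changes or flush buffers here.
        self.loaded = True


# Roughly how the controller drives the loading phase.
driver = SystemxDriver()
driver.loadStart()
driver.loadTuples("ORDER_LINE", [(1, 1, 1, 1)])  # ORDER_LINE arrives before ORDER
driver.loadTuples("ORDER", [(1, 1, 1)])
driver.loadTuples("ORDER_LINE", [(2, 1, 1, 1)])  # second batch for the same table
driver.loadFinish()
print(len(driver.tables["ORDER_LINE"]))  # 2
```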
Optional callback to indicate to the driver that the data loading phase is about to begin. Possible uses for this function include setting up temporary data structures or initializing connections to the database that are specific to the loading phase. The default implementation is to do nothing.
For the given table name, the driver must load the list of tuples into the underlying database. For each tuple in the list, the values of that tuple will be in the same order as the column specification in the TPC-C DDL.
Optional callback to indicate to the driver that the data loading phase is finished. Possible uses for this function include committing all changes to the tables and flushing any buffers. The default implementation is to do nothing.
Optional callback to indicate to the driver that all of the data for a given logical WAREHOUSE has been given to the driver. The driver will be passed the WAREHOUSE id of the data set. The default implementation is to do nothing.
Optional callback to indicate to the driver that all of the data for a given logical DISTRICT has been given to the driver. The driver will be passed the WAREHOUSE id and DISTRICT id of the data set. The default implementation is to do nothing.
Optional callback to indicate to the driver that all of the ITEM data has been given to the driver. When running with multiple loading clients, only one client will be given ITEM tuples to load. The default implementation is to do nothing.
Optional callback to indicate to the driver that the transaction execution phase is about to begin. The default implementation is to do nothing.
Optional callback to indicate to the driver that the transaction execution phase is finished. The default implementation is to do nothing.
The following functions are used to implement the actual logic of the TPC-C transactions. The invocations of these functions are provided with a dict that contains the input parameters to the transaction according to the TPC-C specification.
Implements the DELIVERY transaction according to the TPC-C Benchmark Specification (§2.7).
params = {
'w_id': <int>, # The target WAREHOUSE id
'o_carrier_id': <int>, # The carrier id for the ORDER table
'ol_delivery_d': <datetime> # The delivery timestamp for the ORDER_LINE records
}
Implements the NEW_ORDER transaction according to the TPC-C Benchmark Specification (§2.4). The length of the i_ids, i_w_ids, and i_qtys lists will all be the same. Each element in i_w_ids has a 1% chance of being a "remote" warehouse (i.e., i_w_ids[i] != w_id). Each NEW_ORDER invocation has on average 10 items to be inserted into the ORDER_LINE table, so that means there is a ~10% chance that a transaction will need to touch multiple logical warehouses.
params = {
'w_id': <int>, # The target WAREHOUSE id
'd_id': <int>, # The target DISTRICT id
'c_id': <int>, # The target CUSTOMER id
'o_entry_d': <datetime>, # The timestamp for the new ORDER record
'i_ids': <int>*, # List of ITEM ids
'i_w_ids': <int>*, # List of ORDER_LINE supply WAREHOUSE ids
'i_qtys': <int>*, # List of ORDER_LINE item quantities
}
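As a rough illustration of the NEW_ORDER properties described above, the sketch below checks that the three item lists line up, reports whether an invocation touches a remote warehouse, and works out the probability quoted in the text. The helper names are hypothetical, and the probability calculation assumes exactly 10 items, each chosen independently.

```python
from datetime import datetime


def touches_remote_warehouse(params):
    """Validate the item lists and report whether any supply warehouse is remote."""
    n = len(params["i_ids"])
    if len(params["i_w_ids"]) != n or len(params["i_qtys"]) != n:
        raise ValueError("i_ids, i_w_ids, and i_qtys must have the same length")
    # An item is remote when its supply warehouse differs from the target one.
    return any(i_w_id != params["w_id"] for i_w_id in params["i_w_ids"])


def remote_probability(items=10, per_item=0.01):
    """Chance that at least one of `items` independent picks is remote."""
    return 1.0 - (1.0 - per_item) ** items


params = {
    "w_id": 1,
    "d_id": 3,
    "c_id": 42,
    "o_entry_d": datetime(2024, 1, 1, 12, 0, 0),
    "i_ids": [10, 11, 12],
    "i_w_ids": [1, 2, 1],  # the second item comes from warehouse 2
    "i_qtys": [5, 1, 7],
}

print(touches_remote_warehouse(params))  # True
print(round(remote_probability(), 4))    # 0.0956, i.e. roughly 10%
```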
Implements the ORDER_STATUS transaction according to the TPC-C Benchmark Specification (§2.6). The CUSTOMER id (c_id) and CUSTOMER's last name (c_last) are mutually exclusive; that is, if c_id is not NULL, then the c_last will be NULL, and vice versa.
params = {
'w_id': <int>, # The target WAREHOUSE id
'd_id': <int>, # The target DISTRICT id
'c_id': <int>, # The target CUSTOMER's id. Can be NULL.
'c_last': <str>, # The target CUSTOMER's last name. Can be NULL.
}
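One way to handle the c_id / c_last choice is sketched below over an in-memory list of customers. The sketch assumes NULL is represented as Python's None, the helper name and the dict-based customer records are hypothetical, and the rule of picking the middle match when searching by last name reflects a reading of the TPC-C specification that should be checked against the specification itself.

```python
def select_customer(customers, c_id=None, c_last=None):
    """Pick a customer from one district's records by id or by last name.

    customers: list of dicts with "c_id", "c_first", and "c_last" keys.
    """
    # Exactly one of the two lookup keys should be set.
    if (c_id is None) == (c_last is None):
        raise ValueError("exactly one of c_id and c_last must be given")

    if c_id is not None:
        matches = [c for c in customers if c["c_id"] == c_id]
        return matches[0] if matches else None

    # Lookup by last name: sort the matches by first name and take the
    # middle one (position n/2 rounded up, counting from 1).
    matches = sorted(
        (c for c in customers if c["c_last"] == c_last),
        key=lambda c: c["c_first"],
    )
    if not matches:
        return None
    return matches[(len(matches) - 1) // 2]


customers = [
    {"c_id": 1, "c_first": "Carol", "c_last": "BARBARBAR"},
    {"c_id": 2, "c_first": "Alice", "c_last": "BARBARBAR"},
    {"c_id": 3, "c_first": "Bob", "c_last": "BARBARBAR"},
    {"c_id": 4, "c_first": "Dave", "c_last": "OUGHTABLE"},
]

print(select_customer(customers, c_last="BARBARBAR")["c_first"])  # Bob
print(select_customer(customers, c_id=4)["c_first"])              # Dave
```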
Implements the PAYMENT transaction according to the TPC-C Benchmark Specification (§2.5). The CUSTOMER id (c_id) and CUSTOMER's last name (c_last) are mutually exclusive; that is, if c_id is not NULL, then the c_last will be NULL, and vice versa. For 85% of the transactions, the CUSTOMER will be paying through the same WAREHOUSE (i.e., w_id == c_w_id); the other 15% of the transactions will be for payments to remote WAREHOUSEs (i.e., w_id != c_w_id).
params = {
'w_id': <int>, # The target WAREHOUSE id
'd_id': <int>, # The target DISTRICT id
'h_amount': <float>, # Payment amount
'c_w_id': <int>, # The WAREHOUSE that the CUSTOMER is paying through
'c_d_id': <int>, # The DISTRICT that the CUSTOMER is paying through
'c_id': <int>, # The target CUSTOMER's id. Can be NULL.
'c_last': <str>, # The target CUSTOMER's last name. Can be NULL.
'h_date': <datetime>, # The new HISTORY record timestamp
}
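A driver that partitions data by warehouse may want to detect remote payments up front. The sketch below shows one way to do that; `is_remote_payment` is a hypothetical helper, and NULL is assumed to be represented as Python's None.

```python
from datetime import datetime


def is_remote_payment(params):
    """True when the customer pays through a warehouse other than the target one."""
    return params["w_id"] != params["c_w_id"]


params = {
    "w_id": 1,
    "d_id": 2,
    "h_amount": 125.50,
    "c_w_id": 3,  # differs from w_id, so this is a remote payment
    "c_d_id": 2,
    "c_id": None,  # customer is looked up by last name instead
    "c_last": "BARBARBAR",
    "h_date": datetime(2024, 1, 1, 12, 0, 0),
}

print(is_remote_payment(params))  # True
```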
Implements the STOCK_LEVEL transaction according to the TPC-C Benchmark Specification (§2.8).
params = {
'w_id': <int>, # The target WAREHOUSE id
'd_id': <int>, # The target DISTRICT id
'threshold': <int>, # The STOCK quantity amount threshold
}
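The core of STOCK_LEVEL can be sketched over in-memory data as follows. This is an illustration rather than a reference implementation: the function name, the flat data layout, and the `next_o_id` argument are assumptions, and the rule of examining the 20 most recent orders of the district follows a reading of the TPC-C specification that should be verified against it.

```python
def stock_level(order_lines, stock, params, next_o_id, recent_orders=20):
    """Count distinct recently ordered items whose stock is below the threshold.

    order_lines: iterable of (w_id, d_id, o_id, i_id) tuples.
    stock: dict mapping (w_id, i_id) to the quantity on hand.
    next_o_id: the next order id that the district will assign.
    """
    w_id, d_id = params["w_id"], params["d_id"]
    low_items = set()
    for ol_w_id, ol_d_id, ol_o_id, ol_i_id in order_lines:
        # Only order lines of the target district are considered.
        if ol_w_id != w_id or ol_d_id != d_id:
            continue
        # Only the most recent orders are examined.
        if not (next_o_id - recent_orders <= ol_o_id < next_o_id):
            continue
        if stock[(w_id, ol_i_id)] < params["threshold"]:
            low_items.add(ol_i_id)  # a set, so each item is counted once
    return len(low_items)


order_lines = [
    (1, 1, 5, 100),
    (1, 1, 5, 101),
    (1, 1, 6, 100),  # item 100 is ordered again
    (1, 2, 6, 102),  # belongs to a different district
]
stock = {(1, 100): 5, (1, 101): 50, (1, 102): 1}
params = {"w_id": 1, "d_id": 1, "threshold": 10}

print(stock_level(order_lines, stock, params, next_o_id=7))  # 1
```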