apavlo edited this page May 30, 2011 · 2 revisions

Authors

Sunil Mallya

Silvia Zuffi

Implementation

We implemented the TPC-C client using a Python driver for Memcached.

Membase supports several kinds of driver. Because Membase is based on Memcached, one can use a standard Memcached driver that talks to the server on a specific port. A Membase proxy listens on that port and performs the Membase-specific operations that the driver does not handle, namely vBucket support. A native Membase driver does not need the proxy and can be faster.

We initially considered the Java driver, because it is the only driver available that is written specifically for Membase, but we chose the Python driver to keep the implementation simple.

Membase is a key-value store. To store the tables, we defined a mapping from the primary and secondary keys of each TPC-C table to a string that serves as the Membase key. The Membase key contains the table name and the TPC-C keys. For example, the records in the WAREHOUSE table are stored with the key:

"WAREHOUSE_" + str(w_id)

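The key mapping can be sketched as a small helper. This is only an illustration: the name `make_key` is hypothetical, and joining several key parts with underscores is an assumption, since only the single-key WAREHOUSE case is shown above.

```python
def make_key(table, *key_parts):
    """Build a Membase key string from a table name and TPC-C key values.

    Assumption: multi-part keys are joined with "_" in the same way as the
    single-part WAREHOUSE key shown in the text.
    """
    return "_".join([table] + [str(part) for part in key_parts])


# Single-part key, matching the WAREHOUSE example.
warehouse_key = make_key("WAREHOUSE", 1)

# Multi-part key (assumed layout: table, warehouse id, district id).
district_key = make_key("DISTRICT", 1, 7)
```
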
The Membase value is a list of the field values of the table record. With this simple representation it is easy to retrieve a single record when all of its TPC-C keys are provided. For range queries, we retrieved all the records of a table and selected the required ones based on field values. For example, in the ORDER_STATUS transaction we have to select a customer by last name: we read all the possible customers and then filter the retrieved data on the value of the field C_LAST.
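
A minimal sketch of this read-everything-then-filter pattern follows. It assumes a store object with a `get(key)` method that returns `None` for a missing key (a plain `dict` is used here as a stand-in). The key layout, the position of C_LAST in the value list, and the function names are all hypothetical.

```python
# TPC-C populates 3000 customers per district.
CUSTOMERS_PER_DISTRICT = 3000

# Assumed position of C_LAST inside the stored value list (hypothetical layout).
C_LAST_INDEX = 2


def customer_key(w_id, d_id, c_id):
    """Assumed key layout for CUSTOMER records."""
    return "CUSTOMER_%d_%d_%d" % (w_id, d_id, c_id)


def customers_by_last_name(store, w_id, d_id, c_last,
                           customers_per_district=CUSTOMERS_PER_DISTRICT):
    """Read every possible customer of a district and keep those whose
    C_LAST field matches; the filtering happens on the client side."""
    matches = []
    for c_id in range(1, customers_per_district + 1):
        record = store.get(customer_key(w_id, d_id, c_id))
        if record is not None and record[C_LAST_INDEX] == c_last:
            matches.append(record)
    return matches
```

The cost grows with the number of possible key values, which is why this pattern is only practical when that range is small.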

This approach works well if the range of values that an unspecified key can take is limited. Otherwise, it is more convenient to create additional key-value pairs in which the key is built from all but one of the table keys, and the value is the list of values of the remaining key over all the matching records in the table.

We used this approach in the ORDER_STATUS transaction, where we had to retrieve the orders of a customer. We defined an additional table, ORDERS_IDS, whose keys are the warehouse, district and customer identifiers, and whose only field is the list of that customer's orders. When we have to read all the orders for a warehouse and district, instead of generating all the possible keys for the ORDERS table, we generate all the possible customer keys and read the orders from the ORDERS_IDS table, retrieving the record for each given warehouse, district and customer.
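
The ORDERS_IDS idea can be sketched as a hand-maintained secondary index. This is an illustration under stated assumptions, not our actual code: a plain `dict` stands in for the store, the key layouts and function names are hypothetical, and the list is assumed to hold order identifiers.

```python
def orders_key(w_id, d_id, o_id):
    """Assumed key layout for ORDERS records."""
    return "ORDERS_%d_%d_%d" % (w_id, d_id, o_id)


def orders_ids_key(w_id, d_id, c_id):
    """Assumed key layout for the ORDERS_IDS index records."""
    return "ORDERS_IDS_%d_%d_%d" % (w_id, d_id, c_id)


def add_order(store, w_id, d_id, c_id, o_id, order_record):
    """Store an order and append its id to the customer's ORDERS_IDS list."""
    store[orders_key(w_id, d_id, o_id)] = order_record
    index_key = orders_ids_key(w_id, d_id, c_id)
    order_ids = store.get(index_key) or []
    # Read-modify-write: not atomic if several clients update the same list.
    store[index_key] = order_ids + [o_id]


def orders_for_customer(store, w_id, d_id, c_id):
    """Follow the index to fetch only the orders of one customer."""
    order_ids = store.get(orders_ids_key(w_id, d_id, c_id)) or []
    return [store[orders_key(w_id, d_id, o_id)] for o_id in order_ids]
```

With the index in place, the number of lookups depends on how many orders the customer has rather than on the number of possible order identifiers.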

We loaded all the records onto a server. We did not need to manage data partitioning because the system handles it. When a server is added to a Membase cluster, a button in the web GUI triggers a rebalance of the data onto the new server.

Driver Dependencies

Our client requires the Memcached Python library.

Known Issues

The most time-consuming transaction in our implementation is DELIVERY. In our experiment it seems that we did not saturate the worker. We did not encounter any specific technical problems. Membase has a very nice web interface for cluster configuration. The only point that needs care is that a configuration file must be edited manually when deploying on EC2.

Future Work

We think that our code could be improved by making more use of the multiple-GET command, which could reduce the execution time, particularly in our setting based on the Membase proxy. We used this command to load the data, and it gave a huge speedup in our loading phase.
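
The potential gain can be illustrated by counting requests. The sketch below uses an in-memory stand-in, `CountingStore` (a hypothetical class), whose `get` and `get_multi` methods are assumed to mirror the shape of the corresponding calls in the Memcached Python library: `get_multi` takes a list of keys and returns a dictionary of the keys that were found.

```python
class CountingStore:
    """In-memory stand-in for a Memcached client that counts requests."""

    def __init__(self, data):
        self.data = dict(data)
        self.requests = 0

    def get(self, key):
        self.requests += 1  # one request per key
        return self.data.get(key)

    def get_multi(self, keys):
        self.requests += 1  # one request for the whole batch
        return {k: self.data[k] for k in keys if k in self.data}


def fetch_one_by_one(store, keys):
    """Issue a separate GET for every key."""
    found = {}
    for key in keys:
        value = store.get(key)
        if value is not None:
            found[key] = value
    return found


def fetch_batched(store, keys):
    """Issue a single multiple-GET for all keys."""
    return store.get_multi(keys)
```

Both functions return the same records, but the batched version issues one request where the other issues one per key, which should matter most when every request passes through the proxy.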