This compact utility library is an annotation-based object mapper for HBase (written in Java) that helps you:
- convert objects of your bean-like classes to HBase rows and vice-versa
  - for use in Hadoop MapReduce jobs that read from and/or write to HBase tables
  - and write efficient unit-tests for `Mapper` and `Reducer` classes
- define data access objects for entities that map to HBase rows
  - for single/range/bulk access of rows of an HBase table
Let's say you have an HBase table `citizens` with a row key of the format `country_code#UID`. Now, let's say your table is created with three column families `main`, `optional` and `tracked`, which may have columns `uid`, `name`, `salary` etc.
This library enables you to represent your HBase table as a bean-like class, as below:
```java
@HBTable(name = "citizens", families = {
        @Family(name = "main"),
        @Family(name = "optional", versions = 3),
        @Family(name = "tracked", versions = 10)
})
public class Citizen implements HBRecord<String> {

    @HBRowKey
    private String countryCode;
    @HBRowKey
    private Integer uid;
    @HBColumn(family = "main", column = "name")
    private String name;
    @HBColumn(family = "optional", column = "age")
    private Short age;
    @HBColumn(family = "optional", column = "salary")
    private Integer sal;
    @HBColumn(family = "optional", column = "custom_details")
    private Map<String, Integer> customDetails;
    @HBColumn(family = "optional", column = "dependents")
    private Dependents dependents;
    @HBColumnMultiVersion(family = "tracked", column = "phone_number")
    private NavigableMap<Long, Integer> phoneNumber;
    @HBColumn(family = "optional", column = "pincode",
            codecFlags = {@Flag(name = BestSuitCodec.SERIALIZE_AS_STRING, value = "true")})
    private Integer pincode;

    @Override
    public String composeRowKey() {
        return String.format("%s#%d", countryCode, uid);
    }

    @Override
    public void parseRowKey(String rowKey) {
        String[] pieces = rowKey.split("#");
        this.countryCode = pieces[0];
        this.uid = Integer.parseInt(pieces[1]);
    }

    // Constructors, getters and setters
}
```
That is,

- The above class `Citizen` represents the HBase table `citizens`, using the `@HBTable` annotation.
- Logic for converting the HBase row key to member variables of `Citizen` objects and vice-versa is implemented using the `parseRowKey` and `composeRowKey` methods respectively.
- The data type representing the row key is the type parameter to the `HBRecord` generic interface (in the above case, `String`). Fields that form the row key are annotated with `@HBRowKey`.
- Names of columns and their column families are specified using the `@HBColumn` or `@HBColumnMultiVersion` annotations.
- The class may contain fields of simple data types (e.g. `String`, `Integer`), generic data types (e.g. `Map`, `List`), custom classes (e.g. `Dependents`) or even generics of custom classes (e.g. `List<Dependent>`) - a hypothetical sketch of such a custom class is shown below.
- The `@HBColumnMultiVersion` annotation allows you to map multiple versions of a column in a `NavigableMap<Long, ?>`. In the above example, the field `phoneNumber` is mapped to the column `phone_number` within the column family `tracked` (which is configured for multiple versions).
See source files Citizen.java and Employee.java for detailed examples.
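For illustration, a custom field type like `Dependents` can simply be a plain bean. The class below is a hypothetical sketch, not the actual class from the repository; making it `Serializable` and bean-like (no-arg constructor, getters/setters) is an assumption here so that the default Jackson-based codec can (de)serialize it as JSON:

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical sketch of a custom class used as a field type in Citizen.
// With the default codec (BestSuitCodec), such objects are serialized to/from JSON via Jackson.
public class Dependents implements Serializable {
    private Integer uidOfSpouse;
    private List<Integer> uidsOfChildren;

    public Dependents() {
    }

    public Dependents(Integer uidOfSpouse, List<Integer> uidsOfChildren) {
        this.uidOfSpouse = uidOfSpouse;
        this.uidsOfChildren = uidsOfChildren;
    }

    public Integer getUidOfSpouse() {
        return uidOfSpouse;
    }

    public void setUidOfSpouse(Integer uidOfSpouse) {
        this.uidOfSpouse = uidOfSpouse;
    }

    public List<Integer> getUidsOfChildren() {
        return uidsOfChildren;
    }

    public void setUidsOfChildren(List<Integer> uidsOfChildren) {
        this.uidsOfChildren = uidsOfChildren;
    }
}
```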
- The default codec of this library has the following behavior:
  - uses HBase's native methods to serialize objects of data types `Boolean`, `Short`, `Integer`, `Long`, `Float`, `Double`, `String` and `BigDecimal`
  - uses Jackson's JSON serializer for all other data types
  - serializes `null` as `null`
- To control/modify serialization/deserialization behavior, you may define your own codec (by implementing the `Codec` interface) or you may extend the default codec (`BestSuitCodec`).
- The optional parameter `codecFlags` (supported by both the `@HBColumn` and `@HBColumnMultiVersion` annotations) can be used to pass custom flags to the underlying codec, as in the sketch after this list. (For example, you may write your codec to serialize the field `Integer id` in the `Citizen` class differently from the field `Integer id` in the `Employee` class.)
- The default codec class `BestSuitCodec` takes a flag `BestSuitCodec.SERIALIZE_AS_STRING`, whose value is "serializeAsString" (as in the above `Citizen` class example). When this flag is set to `true` on a field, the default codec serializes that field (even numerical fields) as a `String`.
- Your custom codec may take other such flags to customize serialization/deserialization behavior at a class-field level.
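For instance, a field-level flag intended for your own codec might be declared as below. The flag name `"serializeAsCsv"` and the field are purely hypothetical assumptions for illustration; only `BestSuitCodec.SERIALIZE_AS_STRING` is documented above as being understood by the default codec:

```java
// Hypothetical custom codec flag (the name "serializeAsCsv" is an assumption, not part of the library);
// a custom Codec implementation could look for this flag and serialize the field as comma-separated values.
@HBColumn(family = "optional", column = "nicknames",
        codecFlags = {@Flag(name = "serializeAsCsv", value = "true")})
private List<String> nicknames;
```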
If your MapReduce job is reading from an HBase table, in your `map()` method, HBase's `Result` object can be converted to an object of your bean-like class using the below method:
```java
T readValue(ImmutableBytesWritable rowKey, Result result, Class<T> clazz)
```
For example:
```java
Citizen e = hbObjectMapper.readValue(key, value, Citizen.class);
```
See file CitizenMapper.java for full sample code.
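For orientation, here is a minimal sketch of what such a mapper might look like. The output key/value types, the getters on `Citizen` and the import path of `HBObjectMapper` are assumptions for illustration; the actual CitizenMapper.java in the repository may differ:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import com.flipkart.hbaseobjectmapper.HBObjectMapper; // package name assumed

// Illustrative sketch only; output types and mapping logic are assumptions
public class CitizenMapper extends TableMapper<Text, IntWritable> {

    private final HBObjectMapper hbObjectMapper = new HBObjectMapper();

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Convert the HBase row (Result) into a Citizen object:
        Citizen citizen = hbObjectMapper.readValue(key, value, Citizen.class);
        // Emit country code -> salary (hypothetical map output):
        context.write(new Text(citizen.getCountryCode()), new IntWritable(citizen.getSal()));
    }
}
```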
If your MapReduce job is writing to an HBase table, in your `reduce()` method, an object of your bean-like class can be converted to HBase's `Put` (for row contents) and `ImmutableBytesWritable` (for row key) using the below methods:
```java
ImmutableBytesWritable getRowKey(HBRecord<R> obj)
Put writeValueAsPut(HBRecord<R> obj)
```
For example, the below code in a `Reducer` writes your object as one HBase row with appropriate column families and columns:
```java
Citizen citizen = new Citizen(/*details*/);
context.write(hbObjectMapper.getRowKey(citizen), hbObjectMapper.writeValueAsPut(citizen));
```
See file CitizenReducer.java for full sample code.
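Similarly, a minimal sketch of such a reducer is below. The input key/value types and the way the `Citizen` object is built from the grouped values are assumptions for illustration; the actual CitizenReducer.java in the repository may differ:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import com.flipkart.hbaseobjectmapper.HBObjectMapper; // package name assumed

// Illustrative sketch only; input types and reduce logic are assumptions
public class CitizenReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

    private final HBObjectMapper hbObjectMapper = new HBObjectMapper();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        Citizen citizen = new Citizen(/*details derived from key and values*/);
        // Write the object out as one HBase row (row key + Put):
        context.write(hbObjectMapper.getRowKey(citizen), hbObjectMapper.writeValueAsPut(citizen));
    }
}
```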
If your MapReduce job is reading from an HBase table, you would want to unit-test your `map()` method as below.
An object of your bean-like class can be converted to HBase's `Result` (for row contents) and `ImmutableBytesWritable` (for row key) using the below methods:
```java
ImmutableBytesWritable getRowKey(HBRecord<R> obj)
Result writeValueAsResult(HBRecord<R> obj)
```
Below is an example of a unit test of a `Mapper` using MRUnit:
```java
Citizen citizen = new Citizen(/*params*/);
citizenMapDriver
        .withInput(
                hbObjectMapper.getRowKey(citizen),
                hbObjectMapper.writeValueAsResult(citizen)
        )
        .withOutput(
                hbObjectMapper.toIbw("key"),
                new IntWritable(citizen.getAge())
        )
        .runTest();
```
See file TestCitizenMR.java for full sample code.
If your MapReduce job is writing to an HBase table, you would want to unit-test your `reduce()` method as below.
HBase's `Put` object can be converted to an object of your bean-like class using the below method:
```java
T readValue(ImmutableBytesWritable rowKey, Put put, Class<T> clazz)
```
Below is an example of a unit test of a `Reducer` using MRUnit:
```java
Pair<ImmutableBytesWritable, Mutation> reducerResult = citizenReduceDriver
        .withInput(
                hbObjectMapper.toIbw("key"),
                inputList
        )
        .run()
        .get(0);
CitizenSummary citizenSummary = hbObjectMapper.readValue(
        reducerResult.getFirst(),
        (Put) reducerResult.getSecond(),
        CitizenSummary.class
);
```
Again, see file TestCitizenMR.java for full sample code.
This library provides an abstract class to define your own data access object. For example, you can create a data access object for the `Citizen` class in the above example as follows:
```java
import org.apache.hadoop.conf.Configuration;

import java.io.IOException;

public class CitizenDAO extends AbstractHBDAO<String, Citizen> {

    public CitizenDAO(Configuration conf) throws IOException {
        super(conf); // if you need to customize your codec, you may use super(conf, codec)
    }
}
```
(see CitizenDAO.java)
Once defined, you can access, manipulate and persist rows of the `citizens` HBase table as below:
```java
Configuration configuration = getConf(); // this is org.apache.hadoop.conf.Configuration

// Create a data access object:
CitizenDAO citizenDao = new CitizenDAO(configuration);

// Fetch a row from the "citizens" HBase table with row key "IND#1":
Citizen pe = citizenDao.get("IND#1");

Citizen[] ape = citizenDao.get(new String[] {"IND#1", "IND#2"}); // bulk get

// In the below, note that "IND#1" is inclusive and "IND#5" is exclusive:
List<Citizen> lpe = citizenDao.get("IND#1", "IND#5"); // range get
// (a 'versioned' variant of the above method is available)

// For row keys in the range ["IND#1", "IND#5"), fetch 3 versions of the field 'phoneNumber'
// as a NavigableMap<row key, NavigableMap<timestamp, column value>>:
NavigableMap<String, NavigableMap<Long, Object>> phoneNumberHistory
        = citizenDao.fetchFieldValues("IND#1", "IND#5", "phoneNumber", 3);
// (bulk variants of the above range method are also available)

pe.setPincode(560034); // change a field
citizenDao.persist(pe); // save it back to HBase

citizenDao.delete(pe); // delete a row by its object reference
citizenDao.delete(Arrays.asList(pe1, pe2)); // delete multiple rows by a list of object references
citizenDao.delete("IND#2"); // delete a row by its row key
citizenDao.delete(new String[] {"IND#3", "IND#4"}); // delete a bunch of rows by their row keys

citizenDao.getHBaseTable(); // returns the HTable instance (in case you want to directly play around)
```
(see TestsAbstractHBDAO.java for more detailed examples)
Please note: Since we're dealing with HBase (and not an OLTP data store), fitting a classical ORM paradigm may not make sense. So this library doesn't intend to evolve as a full-fledged ORM. However, if you do intend to use HBase via an ORM library, I suggest you use Apache Phoenix.
- Being an object mapper, this library works for pre-defined columns only. For example, this library doesn't provide ways to fetch:
  - columns matching a pattern or a regular expression
  - unmapped columns of a column family
- This library doesn't provide a way to selectively fetch and populate fields of your bean-like class when you `get` a row by its key. (However, you can still fetch column values selectively for one or more rows by using the `fetchFieldValue` and `fetchFieldValues` methods, as sketched below.)
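For example, a single field of a single row can be fetched roughly as below. The overload shown is an assumption inferred from the range example earlier in this document; check TestsAbstractHBDAO.java for the actual method signatures:

```java
// Assumed single-row overload of fetchFieldValue (verify against the library's AbstractHBDAO):
// fetch just the mapped field 'name' for the row with key "IND#1"
Object name = citizenDao.fetchFieldValue("IND#1", "name");
```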
Add the below entry within the `dependencies` section of your `pom.xml`:
```xml
<dependency>
    <groupId>com.flipkart</groupId>
    <artifactId>hbase-object-mapper</artifactId>
    <version>1.8</version>
</dependency>
```
See artifact details: com.flipkart:hbase-object-mapper on Maven Central or com.flipkart:hbase-object-mapper on MVN Repository.
To build this project, follow the below steps:

- Do a `git clone` of this repository
- Check out the latest stable version: `git checkout v1.8`
- Execute `mvn clean install` from shell
Currently, projects that use this library are running on Hortonworks Data Platform v2.4 (which corresponds to Hadoop 2.7 and HBase 1.1). However, if you're using a different distribution of Hadoop (like Cloudera) or a different version of Hadoop, you may change the versions in `pom.xml` to the desired ones and build the project.
Please note: Test cases are very comprehensive - they even spin up an in-memory HBase test cluster to run data access related test cases (a near-real-world scenario). So, build times can sometimes be longer, depending on your machine configuration.
The change log can be found in the releases section.
If you intend to request a feature or report a bug, you may use GitHub Issues for hbase-object-mapper.
Copyright 2017 Flipkart Internet Pvt Ltd.
Licensed under the Apache License, version 2.0 (the "License"). You may not use this product or its source code except in compliance with the License.