Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition.
This project currently packages the pairsDevTrain / pairsDevTest image sets into a Fuel-compatible dataset, along with targets that indicate whether each pair shows the same person or two different people. In addition to the original LFW images, conversion is supported for both the funneled and deepfunneled versions.
This project uses Kerosene to produce a Fuel-compatible HDF5 file that can be used with Blocks or Keras.
From the included example:

```python
from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

# (build the perfect model here)
model = Sequential()
# ... add layers and call model.compile(...) ...

# show_accuracy is a Keras 0.x argument; Keras 1.0 and later instead take
# metrics=['accuracy'] in model.compile()
model.fit(X_train, y_train, show_accuracy=True, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy=True, verbose=0)
```
The features are currently stored in six channels: three for each of the two RGB images being compared.
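If you want to work with the two images of a pair separately, the six channels can be split back into two RGB images. The sketch below uses plain NumPy and rests on two assumptions not stated above: channels 0-2 hold the first image and 3-5 the second, and the array is channels-first, shaped `(N, 6, H, W)`. The helper names `split_pair` and `join_pair` are hypothetical, not part of lfw_fuel.

```python
import numpy as np


def split_pair(X, channel_axis=1):
    """Split six-channel pair tensors into two three-channel RGB batches.

    Assumes channels 0-2 hold the first image and 3-5 the second, and that
    the channel axis is 1 (channels-first); both are assumptions.
    """
    first, second = np.split(X, 2, axis=channel_axis)
    return first, second


def join_pair(first, second, channel_axis=1):
    """Inverse of split_pair: stack two RGB batches back into six channels."""
    return np.concatenate([first, second], axis=channel_axis)


if __name__ == "__main__":
    X = np.zeros((4, 6, 250, 250), dtype=np.float32)
    a, b = split_pair(X)
    print(a.shape, b.shape)  # (4, 3, 250, 250) (4, 3, 250, 250)
```

If the loaded arrays turn out to be channels-last, passing `channel_axis=-1` should give the same split.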
Note that the images are 250x250 pixels, which is quite large by most CNN standards. They can be cropped and scaled before being passed to the network, as shown in the example.
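One simple way to do that cropping and scaling is a center crop followed by block averaging. This NumPy sketch is not necessarily what the included example does. It assumes a channels-first `(N, C, H, W)` layout, and the default crop size of 120 and downscale factor of 2 are illustrative values only. The function name is hypothetical.

```python
import numpy as np


def center_crop_and_downscale(X, crop=120, factor=2):
    """Center-crop a batch of images, then shrink it by averaging blocks.

    Assumes a channels-first layout (N, C, H, W); the default crop size and
    downscale factor are illustrative choices, not values from this project.
    """
    n, c, h, w = X.shape
    if crop > min(h, w):
        raise ValueError("crop is larger than the image")
    if crop % factor != 0:
        raise ValueError("crop must be divisible by factor")
    top, left = (h - crop) // 2, (w - crop) // 2
    cropped = X[:, :, top:top + crop, left:left + crop]
    side = crop // factor
    # Give each factor x factor block its own pair of axes, then average them.
    blocks = cropped.reshape(n, c, side, factor, side, factor)
    return blocks.mean(axis=(3, 5))


if __name__ == "__main__":
    X = np.zeros((4, 6, 250, 250))
    print(center_crop_and_downscale(X).shape)  # (4, 6, 60, 60)
```

Block averaging is used here instead of plain striding (`X[..., ::2, ::2]`) because it uses every pixel and aliases less, at the cost of a temporary copy.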
The primary task of Labeled Faces in the Wild is to learn whether the faces in two pictures are of the same person, or two different people. There are 2200 training pairs and 1000 test pairs in the predefined split.
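As a concrete picture of the same/different task, here is a deliberately naive baseline: compute the raw pixel distance between the two images of each pair, and predict "same" when it falls below a threshold fitted on the training pairs. This is not the project's model and would likely perform poorly. It assumes a channels-first `(N, 6, H, W)` array with the first image in channels 0-2, and assumes targets of 1 for matching pairs and 0 otherwise. Both function names are hypothetical.

```python
import numpy as np


def pair_distances(X):
    """Euclidean pixel distance between the two images of each pair.

    Assumes a channels-first (N, 6, H, W) array with image 1 in channels 0-2.
    """
    X = np.asarray(X, dtype=np.float64)  # avoid uint8 wrap-around on subtraction
    diff = (X[:, :3] - X[:, 3:]).reshape(len(X), -1)
    return np.sqrt((diff ** 2).sum(axis=1))


def fit_threshold(distances, same):
    """Choose the cutoff that maximizes accuracy on labeled pairs.

    Pairs with distance <= threshold are predicted to match. `same` is
    assumed to be 1 for matching pairs and 0 otherwise. Candidate
    thresholds are the observed distances.
    """
    distances = np.asarray(distances).ravel()
    same = np.asarray(same).ravel() == 1  # flatten (N, 1) targets to (N,)
    best_t, best_acc = None, -1.0
    for t in np.unique(distances):
        acc = np.mean((distances <= t) == same)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

The threshold would be fitted on the training pairs and then applied unchanged to the test pairs, so that the test split is used only for evaluation.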
Here are three matching training pairs:

Image 1 | Image 2 | Status |
---|---|---|
(image) | (image) | MATCH |
(image) | (image) | MATCH |
(image) | (image) | MATCH |
And here are three non-matching training pairs:

Image 1 | Image 2 | Status |
---|---|---|
(image) | (image) | DIFFERENT |
(image) | (image) | DIFFERENT |
(image) | (image) | DIFFERENT |
In addition to this raw format, the dataset is provided in at least two "preprocessed" versions, called funneled and deepfunneled.
Often these are very similar, but here is an example of how they can differ.
Original | Funneled | Deep Funneled |
---|---|---|
(image) | (image) | (image) |
On the LFW page you can browse the complete training set or the complete test set and see all three versions of all images.
The repo includes an example of training a network with Keras for this task. To run it from the repo:
```
$ python example/run-lfw.py
```
This should run the example, downloading the dataset if necessary.
Note that the example currently runs, but its performance is poor. Suggestions or merge requests that improve it are certainly welcome.
Installation is optional: if Kerosene is installed, you can simply clone the repo and run the example script. Installing the package puts lfw_fuel on your Python path, which is useful if you'd like to use this dataset in your own Blocks or Keras project.
```
python setup.py install
```
You can also rebuild the HDF5 files from scratch by running `fuel-download` and `fuel-convert` with updated settings for `EXTRA_DOWNLOADERS` and `EXTRA_CONVERTERS`:
```
FUEL_EXTRA_DOWNLOADERS="lfw_fuel" fuel-download lfw
FUEL_EXTRA_CONVERTERS="lfw_fuel" fuel-convert lfw
```
This downloads and converts the original version of LFW; the funneled and deepfunneled formats are also supported:
```
FUEL_EXTRA_DOWNLOADERS="lfw_fuel" fuel-download lfw --format deepfunneled
FUEL_EXTRA_CONVERTERS="lfw_fuel" fuel-convert lfw --format deepfunneled
```
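If you would rather script the rebuild from Python, the two shell commands can be wrapped with `subprocess`. In this minimal sketch, the helper names `fuel_commands` and `rebuild` are hypothetical. It uses only the commands, environment variables, and `--format` flag shown above, and omits `--format` for the original images. Passing `--format funneled` is inferred from the statement that the funneled format is supported.

```python
import os
import subprocess


def fuel_commands(fmt=None):
    """Build the fuel-download / fuel-convert invocations for one format.

    fmt=None converts the original images; "funneled" or "deepfunneled"
    is passed through as --format.
    """
    extra = [] if fmt is None else ["--format", fmt]
    # Inherit the current environment and register lfw_fuel with Fuel.
    env = dict(os.environ,
               FUEL_EXTRA_DOWNLOADERS="lfw_fuel",
               FUEL_EXTRA_CONVERTERS="lfw_fuel")
    return env, [["fuel-download", "lfw"] + extra,
                 ["fuel-convert", "lfw"] + extra]


def rebuild(fmt=None):
    """Run the download step, then the convert step, in the current directory."""
    env, commands = fuel_commands(fmt)
    for argv in commands:
        # check=True raises CalledProcessError and stops at the first failure
        subprocess.run(argv, env=env, check=True)
```

Calling `rebuild("deepfunneled")` would then be equivalent to the two shell commands above, assuming Fuel and lfw_fuel are installed.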
These settings can also be placed in the `~/.fuelrc` file:
```yaml
extra_downloaders: ['lfw_fuel']
extra_converters: ['lfw_fuel']
```
License: MIT