1.4 Install Nolearn
Nolearn is a collection of utility modules for machine learning tasks, built on top of Lasagne. Most of the modules work together with scikit-learn; others are more generally useful.
Installation takes only a few steps. More information is available on the Nolearn website.
Installation
To install the latest version of Nolearn, run the following commands:
$ pip install -r https://raw.githubusercontent.com/dnouri/nolearn/master/requirements.txt
$ pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn==0.7.git
Alternatively, to install the latest released version (which is somewhat old at this point), run:
$ pip install nolearn
- NOTICE: We recommend installing version 0.5b1, which is stable and covered by many tutorial articles. To install it, run:
$ pip install nolearn==0.5b1
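After any of the install commands above finishes, it can be handy to confirm that the package is importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name is_importable is hypothetical and is not part of Nolearn.

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # 'nolearn' is the package installed above; 'sklearn' is the import
    # name of scikit-learn.
    for name in ("nolearn", "sklearn"):
        status = "OK" if is_importable(name) else "missing"
        print("%s: %s" % (name, status))
```

A "missing" result usually means pip installed into a different Python environment than the one running the check.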
Usage
The following example uses ConvNetFeatures and scikit-learn to classify images from the Kaggle Dogs vs. Cats challenge. (You may need to install DeCAF for ConvNetFeatures to work.)
Before you start, download the images from the Kaggle competition page. The train/ folder is referred to below as TRAIN_DATA_DIR.
import os
from nolearn.decaf import ConvNetFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.utils import shuffle
DECAF_IMAGENET_DIR = '/path/to/imagenet-files/'
TRAIN_DATA_DIR = '/path/to/dogs-vs-cats-training-images/'
The get_dataset function returns a list of all image filenames and their labels, shuffled for our convenience:
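def get_dataset():
    # Expects cat/ and dog/ subfolders inside TRAIN_DATA_DIR.
    cat_dir = TRAIN_DATA_DIR + 'cat/'
    cat_filenames = [cat_dir + fn for fn in os.listdir(cat_dir)]
    dog_dir = TRAIN_DATA_DIR + 'dog/'
    dog_filenames = [dog_dir + fn for fn in os.listdir(dog_dir)]

    # Label cats as 0 and dogs as 1, in the same order as the filenames.
    labels = [0] * len(cat_filenames) + [1] * len(dog_filenames)
    filenames = cat_filenames + dog_filenames
    # Shuffle both lists with the same permutation; the fixed seed makes it repeatable.
    return shuffle(filenames, labels, random_state=0)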
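To make the expected directory layout concrete, the sketch below rebuilds the same filename and label lists against a throwaway directory. It assumes the images have been sorted into cat/ and dog/ subfolders, as get_dataset does, and it uses random.Random from the standard library as a stand-in for sklearn.utils.shuffle so that it runs without scikit-learn. The helper names list_labeled_files, shuffle_together and make_demo_dir are hypothetical.

```python
import os
import random
import tempfile


def list_labeled_files(root, class_dirs):
    """Collect file paths under root/<class_dir>/, one integer label per class."""
    filenames, labels = [], []
    for label, class_dir in enumerate(class_dirs):
        directory = os.path.join(root, class_dir)
        for fn in sorted(os.listdir(directory)):
            filenames.append(os.path.join(directory, fn))
            labels.append(label)
    return filenames, labels


def shuffle_together(filenames, labels, seed=0):
    """Shuffle both lists with the same permutation so pairs stay aligned."""
    pairs = list(zip(filenames, labels))
    random.Random(seed).shuffle(pairs)
    return [fn for fn, _ in pairs], [label for _, label in pairs]


def make_demo_dir():
    """Create a temporary cat/ and dog/ layout with empty placeholder files."""
    root = tempfile.mkdtemp()
    for class_dir, count in (("cat", 3), ("dog", 2)):
        os.mkdir(os.path.join(root, class_dir))
        for i in range(count):
            open(os.path.join(root, class_dir, "%d.jpg" % i), "w").close()
    return root


if __name__ == "__main__":
    demo_root = make_demo_dir()
    names, labels = list_labeled_files(demo_root, ["cat", "dog"])
    names, labels = shuffle_together(names, labels)
    for name, label in zip(names, labels):
        # Each label should match the folder its file came from.
        print("%d %s" % (label, os.path.basename(os.path.dirname(name))))
```

Shuffling the (filename, label) pairs as a unit is what keeps each label attached to its image; shuffling the two lists separately would scramble the labels.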
We can now define our sklearn.pipeline.Pipeline, which merely consists of ConvNetFeatures and a sklearn.linear_model.LogisticRegression classifier.
def main():
    # ConvNetFeatures loads the pretrained network files from DECAF_IMAGENET_DIR.
    convnet = ConvNetFeatures(
        pretrained_params=DECAF_IMAGENET_DIR + 'imagenet.decafnet.epoch90',
        pretrained_meta=DECAF_IMAGENET_DIR + 'imagenet.decafnet.meta',
        )
    clf = LogisticRegression()
    pl = Pipeline([
        ('convnet', convnet),
        ('clf', clf),
        ])

    X, y = get_dataset()
    # Train on the first 100 images and test on the next 200.
    X_train, y_train = X[:100], y[:100]
    X_test, y_test = X[100:300], y[100:300]

    # print(...) with a single argument behaves the same on Python 2 and 3.
    print("Fitting...")
    pl.fit(X_train, y_train)
    print("Predicting...")
    y_pred = pl.predict(X_test)
    print("Accuracy: %.3f" % accuracy_score(y_test, y_pred))

main()
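For reference, the slicing and scoring steps in main can be reproduced in plain Python. The sketch below is not how scikit-learn implements accuracy_score; it only mirrors the quantity reported here (the fraction of predictions equal to the true label) and the 100/200 split from the example. The function names and the toy labels are made up for illustration.

```python
def split_train_test(items, n_train=100, n_test=200):
    """Take the first n_train items for training and the next n_test for testing."""
    return items[:n_train], items[n_train:n_train + n_test]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / float(len(y_true))


if __name__ == "__main__":
    labels = [i % 2 for i in range(300)]   # toy labels, not real data
    y_train, y_test = split_train_test(labels)
    y_pred = [0] * len(y_test)             # a dummy predictor that always says "cat"
    print("Accuracy: %.3f" % accuracy(y_test, y_pred))
```

With these toy labels the dummy predictor is right exactly half the time, so the script prints Accuracy: 0.500. On a balanced two-class problem that is the baseline any real classifier should beat.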