Have a question? Ask us on Gitter! We encourage asking the dev team questions
Documentation: https://github.com/nextml/NEXT/wiki
Website: http://nextml.org
NEXT is a system that makes it easy to develop, evaluate, and apply active learning.
Talks give a good brief introduction to NEXT at the highest level. For scientists and developers, we recommend the PyData Ann Arbor talk, which is an enhanced and refined version of the SciPy talk.
Venue | Audience | Length | Link |
---|---|---|---|
PyData Ann Arbor | Scientists and developers | 1 hour | https://www.youtube.com/watch?v=rTyu4QTXZTc |
SciPy 2017 | Scientific Python developers | 30 minutes | https://www.youtube.com/watch?v=blPjDYCvppY |
Simons Institute conference on Interactive Learning | Machine learning researchers | 30 minutes | https://youtu.be/ESXgbZQ1ZTk?t=1732 |
We give more detail on launching experiments and getting set up in the SciPy 2017 proceedings: http://conference.scipy.org/proceedings/scipy2017/pdfs/scott_sievert.pdf.
This readme contains a quick start to launch the NEXT system on EC2, and to replicate and launch the experiments from the NEXT paper. There are more detailed launch instructions here.
For more information, in-depth tutorials, and API docs, we recommend visiting our GitHub wiki here. You can contact us at [email protected]
We have an experimental AMI that can be used to run NEXT in a purely application based rather than development environment. Included in the AMI is a basic version of our frontend. The AMI is still highly experimental and we give no guarantees on it being up to date with the current code. For more info please visit here.
Run `py.test` from `NEXT/next`. Tests will be run from your local machine but will ping an EC2 server to simulate a client.
Individual files can also be run with `py.test`. Running `py.test test_api.py` will only run `test_api.py` and allow relative imports (which allows `from next.utils import timeit`).
`stdout` can be shown (rather than captured and hidden) with the `-s` flag for `py.test`.
pytest is installable with `pip install pytest` and has a strict backwards compatibility policy.
You can download the latest version of NEXT from github with the following clone command:
$ git clone https://github.com/nextml/NEXT.git
We are actively working to develop and improve NEXT, but users should be aware of the following caveats:
- NEXT currently supports only UNIX-based operating systems (i.e., Windows compatibility is not yet available).
- An Amazon Web Services account is needed to launch NEXT on EC2; we have worked hard to make this process as simple as possible, at the cost of ease of running the full NEXT stack on a local machine. We plan to make NEXT usable on a personal computer in the future.
First, you must set your Amazon Web Services (AWS) account credentials as environment variables. If you don't already have an AWS account, you can follow our AWS account quickstart here or the official AWS account set-up guide here for an in-depth introduction. Make sure to have access to:
- AWS access key id
- AWS secret access key
- Key Pair (pem file)
Make sure to note down the region that your key pair was made in. By default, the script assumes the region is Oregon (us-west-2). If you choose to use a different region, make sure to specify it with `--region=<region>` (e.g., `--region=us-west-2`) every time you use the `next_ec2.py` script. For example, after selecting the region "Oregon," the region `us-west-2` is shown on the EC2 dashboard. If another region is used, an `--ami` option has to be included. For ease, we recommend using the Oregon region.
Export your AWS credentials as environment variables using:
$ export AWS_SECRET_ACCESS_KEY=[your_secret_aws_access_key_here]
$ export AWS_ACCESS_KEY_ID=[your_aws_access_key_id_here]
Note that you'll need to use your `AWS_SECRET_ACCESS_KEY` and `AWS_ACCESS_KEY_ID` again later, so save them in a secure place for convenient reference.
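Before moving on, it can help to confirm that both variables are actually visible to child processes. The following is a minimal sketch, not part of NEXT; the helper name `missing_credentials` is hypothetical, and only the two variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; NEXT does not ship this function.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("AWS credentials are set.")
```

Run it in the same terminal where you exported the variables; a new terminal starts without them.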
Install the local python packages needed for NEXT:
$ cd NEXT
$ sudo pip install -r local_requirements.txt
Throughout the rest of this tutorial, we will be using the `next_ec2.py` startup script heavily. For more options and instructions, run `python next_ec2.py` without any arguments. Additionally, `python next_ec2.py -h` will list the available options.
For persistent data storage, we first need to create a bucket in AWS S3 using:
$ cd ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] createbucket [cluster-name]
where:
- `[keypair]` is the name of your EC2 key pair
- `[key-file]` is the private key file for your key pair
- `[cluster-name]` is the custom name you create and assign to your cluster
This will print out another environment variable command, `export AWS_BUCKET_NAME=[bucket_uid]`. Copy and paste this command into your terminal.
You will also need to use your `bucket_uid` later, so save it in a file alongside your `AWS_SECRET_ACCESS_KEY` and `AWS_ACCESS_KEY_ID` for later reference.
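Every `next_ec2.py` call in this guide has the same shape: the two key options, an action, and the cluster name. As a rough sketch, the argument list could be assembled as follows. The helper `next_ec2_command` is hypothetical and only uses flags mentioned in this README; all values in the example call are placeholders.

```python
def next_ec2_command(action, cluster_name, key_pair, key_file, region=None):
    """Build the argument list for one next_ec2.py call.

    Hypothetical helper for illustration; it only formats the command
    and does not run it.
    """
    cmd = [
        "python", "next_ec2.py",
        "--key-pair=" + key_pair,
        "--identity-file=" + key_file,
    ]
    if region is not None:
        # Only needed when the key pair is outside the default us-west-2
        # region. This README notes that a non-default region also needs
        # an --ami option, which this sketch leaves out.
        cmd.append("--region=" + region)
    cmd += [action, cluster_name]
    return cmd


# Placeholder values; substitute your own key pair, key file, and cluster name.
print(" ".join(next_ec2_command("createbucket", "my-cluster", "my-keypair", "my-key.pem")))
```

The returned list can be passed to `subprocess.run` from inside the `ec2` directory if you prefer scripting the steps over typing them.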
Now you are ready to fire up the NEXT system using our `launch` command. This command will create a new EC2 instance, pull the NEXT repository to that instance, install all of the relevant Docker images, and finally run all Docker containers.
WARNING: Users should note that this script launches a single `m3.large` machine, the current default NEXT EC2 instance type. This instance type costs $0.14 per hour to run. For more detailed EC2 pricing information, refer to this AWS page. You can specify the instance type you want with the `--instance-type` option.
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] launch [cluster-name]
Once your terminal shows a stream of many multi-colored docker appliances, you are successfully running the NEXT system!
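To get a feel for what leaving the instance running will cost, the hourly price quoted in the warning above can be turned into a quick estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, the rate is the $0.14 figure from this README (which may be out of date), and storage and data-transfer charges are ignored.

```python
HOURLY_RATE_USD = 0.14  # m3.large price quoted above; check current AWS pricing


def estimated_cost(hours, instances=1, hourly_rate=HOURLY_RATE_USD):
    """Estimate the EC2 instance bill in US dollars for a run of the given length."""
    return round(hours * instances * hourly_rate, 2)


# A single instance left running for a full day.
print(estimated_cost(24))  # prints 3.36
```

Remember to destroy the cluster when you are done, since the instance is billed for as long as it runs.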
Because NEXT aims to make it easy to reproduce empirical active learning results, we provide a simple command to initialize the experiments performed in this study.
First, in a new terminal, export your AWS credentials and use `get-master` to obtain your public EC2 DNS.
$ export AWS_BUCKET_NAME=[your_aws_bucket_name_here]
$ cd NEXT/ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] get-master [cluster-name]
Then export this public EC2 DNS.
$ export NEXT_BACKEND_GLOBAL_HOST=[your_public_ec2_DNS_here]
$ export NEXT_BACKEND_GLOBAL_PORT=8000
Now you can execute `run_examples.py` to initialize and launch the NEXT experiments.
$ cd ../examples
$ python run_examples.py
Once initialized, this script will return a link that you can distribute yourself or post as a HIT on Mechanical Turk. Visit:
http://your_public_ec2_DNS_here:8000/query/query_page/query_page/[exp_uid]/[exp_key]
where `[exp_uid]` and `[exp_key]` are unique identifiers for each of the Dueling Bandits Pure Exploration, Active Non-Metric Multidimensional Scaling (MDS), and Tuple Bandits Pure Exploration experiments, respectively. See this wiki page for a little more information.
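If you need to hand out links for several experiments, the URL pattern above can be filled in programmatically. The sketch below assumes the `NEXT_BACKEND_GLOBAL_HOST` and `NEXT_BACKEND_GLOBAL_PORT` variables exported earlier; `query_url` is a hypothetical helper, and the identifiers in the example are placeholders, not real experiment IDs.

```python
import os


def query_url(exp_uid, exp_key, host=None, port=None):
    """Build the participant-facing query page URL for one experiment.

    Falls back to the environment variables exported earlier when host
    or port is not given. Hypothetical helper for illustration.
    """
    host = host or os.environ["NEXT_BACKEND_GLOBAL_HOST"]
    port = port or os.environ.get("NEXT_BACKEND_GLOBAL_PORT", "8000")
    return "http://{}:{}/query/query_page/query_page/{}/{}".format(
        host, port, exp_uid, exp_key
    )


# Placeholder host and identifiers; use the values printed by run_examples.py.
print(query_url("my_exp_uid", "my_exp_key", host="your_public_ec2_DNS_here", port=8000))
```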
Navigate to the `strange_fruit_triplet` query link (the last one that printed out to your terminal) and answer some questions! Doing so will provide the system with data you can view and interact with in the next step.
You can access interactive experiment dashboards and data visualizations by clicking experiments at:
http://your_public_ec2_DNS:8000/dashboard/experiment_list
To obtain all logs for an experiment through our RESTful API, visit:
http://your_public_ec2_DNS:8000/api/experiment/[exp_uid]/[exp_key]/logs
where, again, `[exp_uid]` corresponds to the unique Experiment ID shown on the experiment dashboard pages.
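The logs endpoint can also be called from a script rather than a browser. The following is a minimal sketch using only the Python standard library (Python 3). It assumes the endpoint returns a JSON body, which this README does not state; both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def logs_url(host, exp_uid, exp_key, port=8000):
    """Build the logs URL following the pattern shown above."""
    return "http://{}:{}/api/experiment/{}/{}/logs".format(host, port, exp_uid, exp_key)


def fetch_logs(host, exp_uid, exp_key, port=8000, timeout=30):
    """Download the logs for one experiment.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the raw bytes should be inspected instead.
    """
    with urlopen(logs_url(host, exp_uid, exp_key, port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Call `fetch_logs` with your public EC2 DNS and the identifiers from the dashboard; it needs a running NEXT instance to return anything.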
If you'd like to back up your database to access your data later, refer to this wiki for detailed steps.
Finally, you can terminate your EC2 instance and shut down NEXT using:
$ cd ../ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] destroy [cluster-name]