tesseract is a library that enables the remote execution of Python code on systems implementing the GA4GH Task Execution API.
It is available on PyPI:
pip install py-tesseract
from __future__ import print_function
from tesseract import Tesseract, FileStore
def identity(n):
    return n

def say_hello(a, b):
    return "hello " + identity(a) + b

fs = FileStore("./test_store/")
r = Tesseract(fs, "http://localhost:8000")
r.with_resources(
    cpu_cores=1, ram_gb=4, disk_gb=None,
    docker="python:2.7", libraries=["cloudpickle"]
)
future = r.run(say_hello, "world", b="!")
result = future.result()
print(result)
r2 = r.clone().with_resources(cpu_cores=4)
f2 = r2.run(say_hello, "more", b="cpus!")
# Use a separate name so the r2 runner is not overwritten by its result.
result2 = f2.result()
print(result2)
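
The call pattern above (run returns a future, and result() blocks until the value is ready) can be tried locally without a task server. The sketch below is only an analogy: it uses the standard library's concurrent.futures in place of tesseract, so nothing is sent to a remote system, and it assumes Python 3.

```python
from concurrent.futures import ThreadPoolExecutor


def identity(n):
    return n


def say_hello(a, b):
    return "hello " + identity(a) + b


# Local stand-in for r.run(...): submit() also returns a future.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(say_hello, "world", b="!")
    result = future.result()  # blocks until say_hello has finished

print(result)  # prints "hello world!"
```
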
If you provide a Swift, S3, or GS bucket URL to your FileStore, tesseract will attempt to automatically detect your credentials for each of these systems.
To set up your environment for this, run the following commands:
- Google Storage -
gcloud auth application-default login
- Amazon S3 -
aws configure
- Swift -
source openrc.sh
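
As a quick reference, the list above can be expressed as a lookup keyed on the URL scheme. This is a minimal sketch, not tesseract's own detection logic: the names SETUP_COMMANDS and setup_command_for are hypothetical, and the scheme prefixes (gs, s3, swift) are assumed from the bucket types named above.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the setup command listed above.
SETUP_COMMANDS = {
    "gs": "gcloud auth application-default login",
    "s3": "aws configure",
    "swift": "source openrc.sh",
}


def setup_command_for(url):
    """Return the credential setup command for a bucket URL, or None if unknown."""
    scheme = urlparse(url).scheme
    return SETUP_COMMANDS.get(scheme)


print(setup_command_for("s3://your-bucket/path/to/yourfile.txt"))
print(setup_command_for("./test_store/"))  # local path: no setup needed
```
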
If your function expects input files to be available at a given path, add:
r.with_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")
The first argument specifies where the file is currently stored; the second specifies the path where your function expects to find it.
If your function generates files during its run, declare them as shown below and tesseract will upload them to the designated bucket.
r.with_output("./relative/path/to/outputfile.txt")
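
To make the input and output conventions concrete, here is a sketch of the kind of function they are meant for: it reads one file and writes another. The function count_lines and the temporary paths are hypothetical and used only for a local dry run; under tesseract, the paths would presumably be the ones passed to with_input and with_output.

```python
import os
import tempfile


def count_lines(in_path, out_path):
    """Count the lines in in_path and write the count to out_path."""
    with open(in_path) as src:
        n = sum(1 for _ in src)
    with open(out_path, "w") as dst:
        dst.write(str(n) + "\n")
    return n


# Local dry run: temporary files stand in for the staged input and output.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "yourfile.txt")
out_path = os.path.join(workdir, "outputfile.txt")
with open(in_path, "w") as f:
    f.write("a\nb\nc\n")

line_count = count_lines(in_path, out_path)
print(line_count)
```

Checking a function against local files like this before submitting it can catch path mistakes early, since the remote task only sees the paths you declared.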