Welcome to the spannerlib project.
spannerlib is a framework for building programming languages that combine imperative and declarative styles. This combination is based on derivations of the document spanner model.
Currently, we implement a language called spannerlog on top of Python. spannerlog is an extension of statically typed Datalog that lets users define their own information extraction (IE) functions, which can be used to derive new structured information from relations.
The spannerlog REPL, shown below, is served through Jupyter magic commands.
Below, we show how to install and use spannerlog through spannerlib.
For more comprehensive walkthroughs, see our tutorials section.
To download and install spannerlib, run the following commands in your terminal:
git clone https://github.com/DeanLight/spannerlib
cd spannerlib
pip install -e .
download corenlp to spannerlib/rgxlog/
from this link
# verify everything worked
# first time might take a couple of minutes since run time assets are being configured
python nbdev_test.py
git clone https://github.com/DeanLight/spannerlib
cd spannerlib
download corenlp to spannerlib/rgxlog/
from this link
docker build . -t spannerlib_image
# on Windows, replace `pwd` with the current working directory
# to get a bash terminal to the container
docker run --name swc --rm -it \
-v `pwd`:/spannerlib:Z \
spannerlib_image bash
# to run an interactive notebook on host port 8891
docker run --name swc --rm -it \
-v `pwd`:/spannerlib:Z \
-p8891:8888 \
spannerlib_image jupyter notebook --no-browser --allow-root
# verify tests inside the container
python /spannerlib/nbdev_test.py
Here is a TL;DR intro. For a more comprehensive tutorial, please see the introduction section of the tutorials.
import spannerlib
import pandas as pd
# get dynamic access to the session running through the jupyter magic system
from spannerlib import get_magic_session
session = get_magic_session()
Get a dataframe
lecturer_df = pd.DataFrame(
    [
        ["walter", "chemistry"],
        ["linus", "operating_systems"],
        ["rick", "physics"],
    ],
    columns=["name", "course"],
)
lecturer_df
 | name | course |
---|---|---|
0 | walter | chemistry |
1 | linus | operating_systems |
2 | rick | physics |
Or a CSV
pd.read_csv('sample_data/example_students.csv',names=["name","course"])
 | name | course |
---|---|---|
0 | abigail | chemistry |
1 | abigail | operation systems |
2 | jordan | chemistry |
3 | gale | operation systems |
4 | howard | chemistry |
5 | howard | physics |
Import them to the session
session.import_rel("lecturer",lecturer_df)
session.import_rel("enrolled","sample_data/enrolled.csv",delim=",")
They can even be documents
documents = pd.DataFrame([
    ["abigail is happy, but walter did not approve"],
    ["howard is happy, gale is happy, but jordan is sad"],
])
session.import_rel("documents",documents)
%%spannerlog
?documents(X)
'?documents(X)'
X |
---|
abigail is happy, but walter did not approve |
howard is happy, gale is happy, but jordan is sad |
Define your own IE functions to extract information from relations
# the function itself; writing it as a Python generator makes your data processing lazy
def get_happy(text):
    """
    get the names of people who are happy in `text`
    """
    import re
    # raw string, so that \w reaches the regex engine as written
    compiled_rgx = re.compile(r"(\w+) is happy")
    num_groups = compiled_rgx.groups
    for match in re.finditer(compiled_rgx, text):
        if num_groups == 0:
            matched_strings = [match.group()]
        else:
            matched_strings = [group for group in match.groups()]
        yield matched_strings
# register the IE function with the session
session.register(
    "get_happy",  # name of the function
    get_happy,    # the function itself
    [str],        # input types
    [str]         # output types
)
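To see which rows an IE function like this hands back, the generator can be exercised on its own in plain Python, without a session. The sketch below restates `get_happy` in condensed form (an illustrative rewrite, not the registered code) and runs it on the two sample documents from above. Each yielded list is one output row whose length matches the declared output types.

```python
import re


def get_happy(text):
    """Yield one single-column row per person described as happy in `text`."""
    # condensed restatement of the function above, for illustration only
    for match in re.finditer(r"(\w+) is happy", text):
        yield list(match.groups())


docs = [
    "abigail is happy, but walter did not approve",
    "howard is happy, gale is happy, but jordan is sad",
]

# the generator is lazy: nothing is matched until rows are requested
rows = [row for doc in docs for row in get_happy(doc)]
print(rows)  # [['abigail'], ['howard'], ['gale']]
```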
spannerlog supports relations over the following primitive types:
* strings
* spans
* integers
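For readers new to the term, a span can be thought of as a pair of offsets into a source string, rather than a copy of the matched text. The sketch below illustrates that idea with Python's standard `re` module only. It is an assumption-laden illustration and does not show spannerlib's own span type, whose representation may differ.

```python
import re

text = "abigail is happy, but walter did not approve"

# locate the first happy person and keep only the offsets of group 1
match = re.search(r"(\w+) is happy", text)
start, end = match.span(1)
print((start, end))  # (0, 7)

# slicing the source with the offsets recovers the string
print(text[start:end])  # abigail
```

Keeping offsets instead of strings preserves where in the document a value came from, and a string can be recovered from them when a rule needs one.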
Write a spannerlog program (like Datalog, but you can use your own IE functions)
session.remove_all_rules()
%%spannerlog
# you can also define data inline via a statically typed variant of datalog syntax
new sad_lecturers(str)
sad_lecturers("walter")
sad_lecturers("linus")
# and define primitive variables
gpa_doc = "abigail 100 jordan 80 gale 79 howard 60"
# define datalog rules
enrolled_in_chemistry(X) <- enrolled(X, "chemistry").
enrolled_in_physics_and_chemistry(X) <- enrolled_in_chemistry(X), enrolled(X, "physics").
# and query them inline (to print to screen)
# ?enrolled_in_chemistry("jordan") # returns empty tuple ()
# ?enrolled_in_chemistry("gale") # returns nothing
# ?enrolled_in_chemistry(X) # returns "abigail", "jordan" and "howard"
# ?enrolled_in_physics_and_chemistry(X) # returns "howard"
lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y).
# use ie functions in body clauses to extract structured data from unstructured data
# standard ie functions like regex are already registered
student_gpas(Student, Grade) <-
    rgx("(\w+).*?(\d+)",$gpa_doc)->(StudentSpan, GradeSpan),
    as_str(StudentSpan)->(Student), as_str(GradeSpan)->(Grade).
# and you can use your defined functions as well
happy_students_with_sad_lecturers_and_their_gpas(Student, Grade, Lecturer) <-
    documents(Doc),
    get_happy(Doc)->(Student),
    sad_lecturers(Lecturer),
    lecturer_of(Lecturer,Student),
    student_gpas(Student, Grade).
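As a rough mental model of what some of these rules compute, the same logic can be written in plain Python with sets and the standard `re` module. This is only a sketch of the semantics, not how spannerlib evaluates a program. It assumes the `enrolled` relation holds the rows shown in the CSV preview earlier, and it extracts plain strings where the `rgx` rule above produces spans.

```python
import re

# facts copied from the CSV preview above (assumed to match `enrolled`)
enrolled = {
    ("abigail", "chemistry"),
    ("abigail", "operation systems"),
    ("jordan", "chemistry"),
    ("gale", "operation systems"),
    ("howard", "chemistry"),
    ("howard", "physics"),
}

# enrolled_in_chemistry(X) <- enrolled(X, "chemistry").
enrolled_in_chemistry = {x for (x, course) in enrolled if course == "chemistry"}

# enrolled_in_physics_and_chemistry(X) <-
#     enrolled_in_chemistry(X), enrolled(X, "physics").
enrolled_in_physics_and_chemistry = {
    x for x in enrolled_in_chemistry if (x, "physics") in enrolled
}

# student_gpas(Student, Grade): one (name, grade) pair per regex match
gpa_doc = "abigail 100 jordan 80 gale 79 howard 60"
student_gpas = {m.groups() for m in re.finditer(r"(\w+).*?(\d+)", gpa_doc)}

print(sorted(enrolled_in_chemistry))  # ['abigail', 'howard', 'jordan']
print(sorted(enrolled_in_physics_and_chemistry))  # ['howard']
print(sorted(student_gpas))
```

Each rule body becomes a filter or join over sets of tuples, and each rule head names the resulting set.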
And query it
%%spannerlog
?happy_students_with_sad_lecturers_and_their_gpas(Stu,Gpa,Lec)
'?happy_students_with_sad_lecturers_and_their_gpas(Stu,Gpa,Lec)'
Stu | Gpa | Lec |
---|---|---|
abigail | 100 | linus |
gale | 79 | linus |
howard | 60 | walter |
You can also get query results as DataFrames for downstream processing
df = session.export(
    "?happy_students_with_sad_lecturers_and_their_gpas(Stu,Gpa,Lec)")
df
 | Stu | Gpa | Lec |
---|---|---|---|
0 | abigail | 100 | linus |
1 | gale | 79 | linus |
2 | howard | 60 | walter |
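Once the result is a DataFrame, ordinary pandas operations apply. The sketch below rebuilds the table shown above by hand, so it runs without a session, and computes the mean grade per lecturer. It assumes the `Gpa` column arrives as strings (it was produced with `as_str`), so it is cast to integers first. The cast is harmless if the values are already integers.

```python
import pandas as pd

# hand-built copy of the exported result shown above (no session needed)
df = pd.DataFrame(
    [
        ["abigail", "100", "linus"],
        ["gale", "79", "linus"],
        ["howard", "60", "walter"],
    ],
    columns=["Stu", "Gpa", "Lec"],
)

# assumption: grades are strings after as_str, so convert before aggregating
df["Gpa"] = df["Gpa"].astype(int)

# mean grade of the happy students, grouped by their sad lecturer
mean_gpa = df.groupby("Lec")["Gpa"].mean()
print(mean_gpa)
```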