Skip to content

Commit

Permalink
Merge pull request #134 from DeanLight/edit_intro_based_on_reviews
Browse files Browse the repository at this point in the history
Edit intro based on reviews
  • Loading branch information
loayshaqir1 authored Oct 31, 2023
2 parents 8786e72 + 6ba0983 commit 1a6588e
Show file tree
Hide file tree
Showing 8 changed files with 529 additions and 338 deletions.
4 changes: 4 additions & 0 deletions nbdev_test.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
import subprocess
import os
from nbdev.doclinks import nbglob

files_to_test = nbglob()
files_to_skip = ['extended_version.ipynb']
processes = []

for file in files_to_test:
if os.path.basename(file) in files_to_skip:
continue
command = f"nbdev_test --path {file} --do_print"
try:
# Redirect stderr to stdout
Expand Down
231 changes: 231 additions & 0 deletions nbs/extended_version.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,231 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advanced IE functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rgxlog has been enhanced with additional advanced IE functions. \n",
"To utilize these functions, specific installations are required prior to usage. <br>\n",
"\n",
"Rust: To download and utilize the Rust-based ie functions, execute the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"enum-spanner-rs was not found on your system\n",
"installing package. this might take up to 10 minutes...\n",
"info: syncing channel updates for '1.34-x86_64-unknown-linux-gnu'\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" 1.34-x86_64-unknown-linux-gnu unchanged - rustc 1.34.2 (6c2484dc3 2019-05-13)\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"info: checking for self-update\n",
" Updating git repository `https://github.com/NNRepos/enum-spanner-rs`\n",
" Installing enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)\n",
"error: binary `enum-spanner-rs` already exists in destination as part of `enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)`\n",
"Add --force to overwrite\n",
"installation completed\n"
]
}
],
"source": [
"#| output: false\n",
"from rgxlog.ie_func.rust_spanner_regex import download_and_install_rust_regex\n",
"download_and_install_rust_regex()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Wrapping shell-based functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rgxlog's `rgx_string` ie function is a good example of running an external shell as part of rgxlog code, <br>\n",
"`rgx_string` is a rust-based ie function, we can use it only after we installed the rust package. <br>\n",
"This time we won't remove the built-in function - we'll just show the implementation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"def rgx(text, regex_pattern, out_type: str):\n",
" \"\"\"\n",
" An IE function which runs regex using rust's `enum-spanner-rs` and yields tuples of strings/spans (not both).\n",
"\n",
" @param text: the string on which regex is run.\n",
" @param regex_pattern: the pattern to run.\n",
" @param out_type: string/span - decides which one will be returned.\n",
" @return: a tuple of strings/spans.\n",
" \"\"\"\n",
" with tempfile.TemporaryDirectory() as temp_dir:\n",
" rgx_temp_file_name = os.path.join(temp_dir, TEMP_FILE_NAME)\n",
" with open(rgx_temp_file_name, \"w+\") as f:\n",
" f.write(text)\n",
"\n",
" if out_type == \"string\":\n",
" rust_regex_args = rf\"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name}\"\n",
" format_function = _format_spanner_string_output\n",
" elif out_type == \"span\":\n",
" rust_regex_args = rf\"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name} --bytes-offset\"\n",
" format_function = _format_spanner_span_output\n",
" else:\n",
" assert False, \"illegal out_type\"\n",
"\n",
" regex_output = format_function(run_cli_command(rust_regex_args, stderr=True))\n",
"\n",
" for out in regex_output:\n",
" yield out\n",
"\n",
"def rgx_string(text, regex_pattern):\n",
" \"\"\"\n",
" @param text: The input text for the regex operation.\n",
" @param regex_pattern: the pattern of the regex operation.\n",
" @return: tuples of strings that represents the results.\n",
" \"\"\"\n",
" return rgx(text, regex_pattern, \"string\")\n",
"\n",
"RGX_STRING = dict(ie_function=rgx_string,\n",
" ie_function_name='rgx_string',\n",
" in_rel=RUST_RGX_IN_TYPES,\n",
" out_rel=rgx_string_out_type)\n",
"\n",
"# another version of these functions exists (rgx_from_file), it can be seen in the source code\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`run_cli_command` is an STDLIB function used in rgxlog, which basically runs a command using python's `Popen`.\n",
"\n",
"in order to denote regex groups, use `(?P<name>pattern)`. the output is in alphabetical order.\n",
"Let's run the ie function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import rgxlog"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"printing results for query 'string_rel(X, Y)':\n",
" X | Y\n",
"-----+-----\n",
" a | cc\n",
" a | c\n",
" z | c\n",
"\n"
]
}
],
"source": [
"%%rgxlog\n",
"text = \"zcacc\"\n",
"pattern = \"(?P<group_not_c>[^c]+)(?P<group_c>[c]+)\"\n",
"string_rel(X,Y) <- rgx_string(text, pattern) -> (X,Y)\n",
"?string_rel(X,Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, to use nlp-based ie functions you need to first install nlp:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rgxlog.ie_func.nlp import download_and_install_nlp\n",
"download_and_install_nlp()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"printing results for query 'tokens(Token, Span)':\n",
" Token | Span\n",
"---------+----------\n",
" Hello | [0, 5)\n",
" world | [6, 11)\n",
" . | [11, 12)\n",
" Hello | [13, 18)\n",
" world | [19, 24)\n",
" again | [25, 30)\n",
" . | [30, 31)\n",
"\n"
]
}
],
"source": [
"%%rgxlog\n",
"sentence = \"Hello world. Hello world again.\"\n",
"tokens(X, Y) <- Tokenize(sentence) -> (X, Y)\n",
"?tokens(Token, Span)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
25 changes: 9 additions & 16 deletions nbs/ie_func/04b_nlp.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -59,16 +59,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/nbdev_spanner_workbench/spanner_workbench/src/rgxlog_interpreter/src/rgxlog/stdlib/stanford-corenlp-4.1.0\n",
"/nbdev_spanner_workbench/spanner_workbench/src/rgxlog_interpreter/src/rgxlog/stdlib/stanford-corenlp-4.1.0\n"
]
}
],
"outputs": [],
"source": [
"#| export\n",
"JAVA_MIN_VERSION = 1.8\n",
Expand Down Expand Up @@ -171,12 +162,14 @@
"outputs": [],
"source": [
"#| export\n",
"#| eval: false\n",
"try:\n",
" _run_installation()\n",
" CoreNLPEngine = StanfordCoreNLP(NLP_DIR_PATH)\n",
"except:\n",
" logger.error(\"Installation NLP failed\")"
"CoreNLPEngine = None\n",
"def download_and_install_nlp():\n",
" global CoreNLPEngine\n",
" try:\n",
" _run_installation()\n",
" CoreNLPEngine = StanfordCoreNLP(NLP_DIR_PATH)\n",
" except:\n",
" logger.error(\"Installation NLP failed\")"
]
},
{
Expand Down
43 changes: 13 additions & 30 deletions nbs/ie_func/04d_rust_spanner_regex.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -224,7 +224,19 @@
"source": [
"#| export\n",
"#| hide\n",
"def _download_and_install_rust_regex() -> None:\n",
"def _is_installed_package() -> bool:\n",
" return Path(REGEX_EXE_PATH).is_file()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| export\n",
"#| hide\n",
"def download_and_install_rust_regex() -> None:\n",
" # don't use \"cargo -V\" because it starts downloading stuff sometimes\n",
" with Popen([WHICH_WORD, \"cargo\"], stdout=PIPE, stderr=PIPE) as cargo:\n",
" errcode = cargo.wait(SHORT_TIMEOUT)\n",
Expand All @@ -251,18 +263,6 @@
" logger.warning(\"installation completed\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| export\n",
"#| hide\n",
"def _is_installed_package() -> bool:\n",
" return Path(REGEX_EXE_PATH).is_file()"
]
},
{
"cell_type": "code",
"execution_count": null,
Expand Down Expand Up @@ -478,23 +478,6 @@
" in_rel=RUST_RGX_IN_TYPES,\n",
" out_rel=rgx_string_out_type)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"#| export\n",
"#| eval: false\n",
"# the package is installed when this module is imported, exclude the scenraio of github runner\n",
"try:\n",
" if not _is_installed_package() and 'runner' not in os.getcwd():\n",
" _download_and_install_rust_regex()\n",
"except:\n",
" logger.error(f\"cargo or rustup are not installed in $PATH. please install rust: {DOWNLOAD_RUST_URL}\")"
]
}
],
"metadata": {
Expand Down
Loading

0 comments on commit 1a6588e

Please sign in to comment.