-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #134 from DeanLight/edit_intro_based_on_reviews
Edit intro based on reviews
- Loading branch information
Showing
8 changed files
with
529 additions
and
338 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,231 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Advanced IE functions" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Rgxlog has been enhanced with additional advanced IE functions. \n", | ||
"To utilize these functions, specific installations are required prior to usage. <br>\n", | ||
"\n", | ||
"Rust: To download and utilize the Rust-based ie functions, execute the following code:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"enum-spanner-rs was not found on your system\n", | ||
"installing package. this might take up to 10 minutes...\n", | ||
"info: syncing channel updates for '1.34-x86_64-unknown-linux-gnu'\n" | ||
] | ||
}, | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"\n", | ||
" 1.34-x86_64-unknown-linux-gnu unchanged - rustc 1.34.2 (6c2484dc3 2019-05-13)\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"info: checking for self-update\n", | ||
" Updating git repository `https://github.com/NNRepos/enum-spanner-rs`\n", | ||
" Installing enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)\n", | ||
"error: binary `enum-spanner-rs` already exists in destination as part of `enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)`\n", | ||
"Add --force to overwrite\n", | ||
"installation completed\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"#| output: false\n", | ||
"from rgxlog.ie_func.rust_spanner_regex import download_and_install_rust_regex\n", | ||
"download_and_install_rust_regex()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Wrapping shell-based functions" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Rgxlog's `rgx_string` ie function is a good example of running an external shell as part of rgxlog code, <br>\n", | ||
"`rgx_string` is a rust-based ie function, we can use it only after we installed the rust package. <br>\n", | ||
"This time we won't remove the built-in function - we'll just show the implementation:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"```python\n", | ||
"def rgx(text, regex_pattern, out_type: str):\n", | ||
" \"\"\"\n", | ||
" An IE function which runs regex using rust's `enum-spanner-rs` and yields tuples of strings/spans (not both).\n", | ||
"\n", | ||
" @param text: the string on which regex is run.\n", | ||
" @param regex_pattern: the pattern to run.\n", | ||
" @param out_type: string/span - decides which one will be returned.\n", | ||
" @return: a tuple of strings/spans.\n", | ||
" \"\"\"\n", | ||
" with tempfile.TemporaryDirectory() as temp_dir:\n", | ||
" rgx_temp_file_name = os.path.join(temp_dir, TEMP_FILE_NAME)\n", | ||
" with open(rgx_temp_file_name, \"w+\") as f:\n", | ||
" f.write(text)\n", | ||
"\n", | ||
" if out_type == \"string\":\n", | ||
" rust_regex_args = rf\"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name}\"\n", | ||
" format_function = _format_spanner_string_output\n", | ||
" elif out_type == \"span\":\n", | ||
" rust_regex_args = rf\"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name} --bytes-offset\"\n", | ||
" format_function = _format_spanner_span_output\n", | ||
" else:\n", | ||
" assert False, \"illegal out_type\"\n", | ||
"\n", | ||
" regex_output = format_function(run_cli_command(rust_regex_args, stderr=True))\n", | ||
"\n", | ||
" for out in regex_output:\n", | ||
" yield out\n", | ||
"\n", | ||
"def rgx_string(text, regex_pattern):\n", | ||
" \"\"\"\n", | ||
" @param text: The input text for the regex operation.\n", | ||
" @param regex_pattern: the pattern of the regex operation.\n", | ||
" @return: tuples of strings that represents the results.\n", | ||
" \"\"\"\n", | ||
" return rgx(text, regex_pattern, \"string\")\n", | ||
"\n", | ||
"RGX_STRING = dict(ie_function=rgx_string,\n", | ||
" ie_function_name='rgx_string',\n", | ||
" in_rel=RUST_RGX_IN_TYPES,\n", | ||
" out_rel=rgx_string_out_type)\n", | ||
"\n", | ||
"# another version of these functions exists (rgx_from_file), it can be seen in the source code\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"`run_cli_command` is an STDLIB function used in rgxlog, which basically runs a command using python's `Popen`.\n", | ||
"\n", | ||
"in order to denote regex groups, use `(?P<name>pattern)`. the output is in alphabetical order.\n", | ||
"Let's run the ie function:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import rgxlog" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"printing results for query 'string_rel(X, Y)':\n", | ||
" X | Y\n", | ||
"-----+-----\n", | ||
" a | cc\n", | ||
" a | c\n", | ||
" z | c\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%%rgxlog\n", | ||
"text = \"zcacc\"\n", | ||
"pattern = \"(?P<group_not_c>[^c]+)(?P<group_c>[c]+)\"\n", | ||
"string_rel(X,Y) <- rgx_string(text, pattern) -> (X,Y)\n", | ||
"?string_rel(X,Y)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Similarly, to use nlp-based ie functions you need to first install nlp:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from rgxlog.ie_func.nlp import download_and_install_nlp\n", | ||
"download_and_install_nlp()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"printing results for query 'tokens(Token, Span)':\n", | ||
" Token | Span\n", | ||
"---------+----------\n", | ||
" Hello | [0, 5)\n", | ||
" world | [6, 11)\n", | ||
" . | [11, 12)\n", | ||
" Hello | [13, 18)\n", | ||
" world | [19, 24)\n", | ||
" again | [25, 30)\n", | ||
" . | [30, 31)\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%%rgxlog\n", | ||
"sentence = \"Hello world. Hello world again.\"\n", | ||
"tokens(X, Y) <- Tokenize(sentence) -> (X, Y)\n", | ||
"?tokens(Token, Span)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "python3", | ||
"language": "python", | ||
"name": "python3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.