This is a public repo containing libraries, utilities, and other resources created by Sales Engineering and others to support and enhance ongoing and future RAI projects. These resources are not client-specific and can be freely shared, distributed, and updated in the spirit of OSS.
Free License is pending.
Bash, Python, Julia, etc., tools for command line usage.
- envcli_template.sh
  Copy this template to create your own envcli.sh scripts, customized to each RAI project you work on. The bash scripts execute source envcli.sh to get your preferences for:
  - RAI_CLI_PROFILE - the ~/.rai/config profile with your OAuth credentials (which match a specific RAI account)
  - RAI_CLI_ENGINE - your default Rel engine name in that RAI account
  - RAI_CLI_DATABASE - your default database in that RAI account
  - RAI_BENCH_DIR - your directory with the Basic Workload Benchmarks framework code
- bin/...
  Bash scripts to simplify use of the CLI for RAI account management tasks.
  - clone_database.sh - create clone of an existing database in account RAI_CLI_PROFILE. Syntax follows Linux command conventions (cp, mv, etc.): clone_database.sh source-db clone-db.
  - create_engine.sh - spin up new Rel engine. The default name is specified by RAI_CLI_ENGINE variable, but a different name can be specified on the command line.
  - create_project_skel.sh - create the directory structure for a new customer project
  - delete_engine.sh - spin down an existing Rel engine. The default name is specified by RAI_CLI_ENGINE variable, but a different name can be specified on the command line.
  - list_databases.sh - list the databases in account RAI_CLI_PROFILE.
  - list_edb_names.sh - list the EDBs in database RAI_CLI_DATABASE, in account RAI_CLI_PROFILE.
  - list_engines.sh - list the active engines in account RAI_CLI_PROFILE.
  - load_source.sh - load specified Rel source file into RAI_CLI_DATABASE, using RAI_CLI_ENGINE, in account RAI_CLI_PROFILE. The relative path to the Rel source is preserved in the RAI model unless old/new reparenting directories are specified.
  - rai_bench_results_summary.sh - generate human-readable summary results from the JSON Lines (*.jsonl) files in a Basic Workloads Benchmark framework ("RAI bench") output directory. The location of the Basic Workloads directory is specified in RAI_BENCH_DIR. The most recent output directory is used by default, but a different name can be specified on the command line.
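The default-directory selection and JSON Lines reading described for rai_bench_results_summary.sh can be sketched in Python as follows. This is only an illustration and not the script's actual logic: the function names are hypothetical, "most recent" is assumed to mean the latest modification time, and because the record layout of the benchmark output is not documented here, the sketch only counts records and collects field names.

```python
import json
from pathlib import Path


def latest_output_dir(bench_dir):
    """Return the most recently modified subdirectory of bench_dir, or None.

    Assumption: "most recent" means latest modification time.
    """
    subdirs = [p for p in Path(bench_dir).iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)


def summarize_jsonl_dir(output_dir):
    """Map each *.jsonl file name to (record count, sorted field names)."""
    summary = {}
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        count, keys = 0, set()
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)  # one JSON value per line
                count += 1
                if isinstance(record, dict):
                    keys.update(record)
        summary[path.name] = (count, sorted(keys))
    return summary
```

A caller would pass the directory named by RAI_BENCH_DIR (or a subdirectory of it, depending on where the framework writes its output) to latest_output_dir and then hand the result to summarize_jsonl_dir.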
Python tools for testing.
- parse_csv.py - use Python's csv reader to explore customer-provided CSV files (if RAI's load_csv doesn't behave as expected). Use parse_csv.py --file foo.csv --top 5 --full to get started. Use parse_csv.py --help for full help.
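For readers who want to see the kind of inspection Python's csv reader makes possible, here is a minimal sketch. It is not the code of parse_csv.py: the function names top_rows and ragged_rows and the sample data are made up for illustration.

```python
import csv
import io


def top_rows(text, top=5):
    """Return the header and the first `top` data rows of CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = []
    for row in reader:
        if len(rows) >= top:
            break
        rows.append(row)
    return header, rows


def ragged_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    return [(reader.line_num, len(row)) for row in reader if len(row) != len(header)]


# Made-up sample: the second data row has one field too many.
sample = (
    "tconst,primaryTitle,startYear\n"
    'tt001,"Roman Holiday",1953\n'
    'tt002,"Broken, row",1953,extra\n'
)
header, rows = top_rows(sample, top=1)
print(header, rows)
print(ragged_rows(sample))
```

When reading from a file instead of a string, open it with newline="" as the csv module documentation recommends, so that quoted fields containing line breaks are handled correctly.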
Folder se_lib contains Rel models with various sets of utilities:
- csv.rel: CSV file parsing and loading
- query.rel: Tools for querying and poking around RAI database relations
- kg.rel: functions to construct, manipulate, operate on and visualize knowledge graphs based on standard data model
- graph.rel: functions to operate on Rel graph objects
- util.rel: collection of useful general purpose functions supplementing standard library functions
- viz.rel: helper functions for graphviz, vega/vega-lite, and other visualization libraries
- visual.rel (DEPRECATED: do not use or stop using): graphviz-based visualization functions for knowledge graphs, ontology, etc.
- debug.rel: TBD
Example of options module (OPTS) passed to knowledge graph functions:
module kg_options
  module graphviz
    def title = "Knowledge Graph" // Graph Title
    def layout = "dot"
    def direction = "TD"
    def entity_shape = {(:Customer, "oval");
                        (:Bank, "box");}
    def label_edges = boolean_false
  end
end
Knowledge graph visualization functions take ...
To parse and map a CSV file into the standard model, use the utility function parse_attributes defined in csv.rel.
Below is an example from the IMDB demo (see the imdb_model notebook for the full code).
Suppose we have a CSV file containing IMDB titles that has been loaded from cloud storage (an S3 path in this example) like this:
// Title CSV
@no_diagnostics(:UNDEFINED_IDENTIFIER)
def delete[:title_csv] = title_csv
def title_config:path = "s3://psilabs-public-files/imdb/title_basics_1953_votes_30.csv"
def insert[:title_csv] = lined_csv[load_csv[title_config]]
The data will be used to create and populate the entity Title. For this purpose we define several auxiliary modules. First, module create_entity defines the entity Title and its constructor function title_from_id:
entity type Title = String
entity type Name = String
module create_entity
  def Title[x] = ^Title[x]
  def title_from_id(id, e) = create_entity:Title[id](e) and
    title_csv(imdb_meta:title:key, _, id)
  def Name[x] = ^Name[x]
  def name_from_id(id, e) = create_entity:Name[id](e) and
    name_csv(imdb_meta:name:key, _, id)
end
Note that we already used an element from another auxiliary module, imdb_meta, which defines all the metadata necessary to load, parse, and define the Title entity from CSV:
module imdb_meta
  module title
    def entity_name = :Title
    def key = :tconst
    def as_is_attr = {
      :primaryTitle;
      :titleType;
    }
    def int_attr = {
      :startYear; :endYear; :numVotes; :runtimeMinutes;
    }
    def float_attr = {
      :averageRating
    }
    def attr_alias_map = {
      (:tconst, :id);
    }
  end
  module name
    def entity_name = :Name
    def key = :nconst
    def as_is_attr = {
      :primaryName;
      :primaryProfession;
    }
    def int_attr = {
      :birthYear; :deathYear;
    }
    def attr_alias_map = {
      (:nconst, :id);
    }
  end
end
The meta module may define more elements depending on the CSV file content; for example, it could also define datetime_attr and date_attr.
Let's review what the meta module does.
First, we always define entity_name (usually by capitalizing the first letter) and key (only single-value keys are currently supported) like this:
def entity_name = :Title
def key = :tconst
Next, we define fields according to their types. If a field's type doesn't change from the one parsed/recognized by load_csv, then it belongs to as_is_attr:
def as_is_attr = {
  :primaryTitle;
  :titleType;
}
For integer fields loaded as strings use int_attr:
def int_attr = {
  :startYear; :endYear; :numVotes; :runtimeMinutes;
}
For float (decimal) fields use float_attr:
def float_attr = {
  :averageRating
}
For parsing date and datetime fields, use date_attr and datetime_attr respectively (the example is not applicable to IMDB):
def datetime_attr = {
  (:CreationDate, "y-m-dTH:M:S.sss");
  (:LastAccessDate, "y-m-dTH:M:S.sss");
}
More types could be supported in the future.
Use attr_alias_map to rename attributes (if necessary):
def attr_alias_map = {
  (:nconst, :id);
}
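To make the roles of these meta elements concrete, here is a rough Python analogy of what such a description asks a loader to do with one CSV row. It is not how parse_attributes is implemented in Rel: the function apply_meta, the dictionary layout, and the sample row are all hypothetical, and treating empty strings as missing values is an assumption.

```python
def apply_meta(row, meta):
    """Convert one CSV row (a dict of strings) according to a meta description."""
    alias = dict(meta.get("attr_alias_map", []))  # old name -> new name
    out = {}
    for name in meta.get("as_is_attr", []):
        if name in row:
            out[alias.get(name, name)] = row[name]  # keep the loaded value
    for name in meta.get("int_attr", []):
        if row.get(name):  # skip missing or empty values (assumption)
            out[alias.get(name, name)] = int(row[name])
    for name in meta.get("float_attr", []):
        if row.get(name):
            out[alias.get(name, name)] = float(row[name])
    key = meta["key"]  # single-column key, exposed under its alias
    out[alias.get(key, key)] = row[key]
    return out


# Mirrors the imdb_meta:title module; the row values are made up.
title_meta = {
    "key": "tconst",
    "as_is_attr": ["primaryTitle", "titleType"],
    "int_attr": ["startYear", "endYear", "numVotes", "runtimeMinutes"],
    "float_attr": ["averageRating"],
    "attr_alias_map": [("tconst", "id")],
}
row = {
    "tconst": "tt001", "primaryTitle": "Example", "titleType": "movie",
    "startYear": "1953", "endYear": "", "numVotes": "30",
    "runtimeMinutes": "118", "averageRating": "8.0",
}
print(apply_meta(row, title_meta))
```

The point of the analogy is only that the meta module is declarative: it lists which columns to keep, which to convert, and which to rename, and leaves the actual work to a generic routine.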
Finally, we can create the data model by mapping the CSV file:
with se_csv use parse_attributes
module imdb_data
  // Title entity data
  def title:id = transpose[create_entity:title_from_id]
  def title(attr, e, val) = parse_attributes[title:id, title_csv, imdb_meta:title](attr, e, val)
end
TBD...
Research upstream that results in product downstream, with no exceptions, identified and planned from the beginning:
- Teams like the DS team should be "research"-focused upstream and "product"-bound downstream. This means that they start with research/dev that should always result in identified and defined products or product enhancements.