Load CSV tables into AtomSpace #2989

Merged 57 commits on Aug 21, 2022
3c180e8
Start work on a CSV loader.
linas Aug 2, 2022
f50a203
initial scaffolding for csv tables
linas Aug 2, 2022
b23a69e
Copy code from asmoses
linas Aug 2, 2022
0c75df4
Cut down the original code to only the readers
linas Aug 2, 2022
31aa3f7
Merge branch 'master' into load-csv-tables
linas Aug 20, 2022
345de2b
Add Makefile.
linas Aug 20, 2022
ca52b37
Include AtomSpace
linas Aug 20, 2022
7a3e3cf
Convert bool and contin types to Values
linas Aug 20, 2022
c1e7824
Define what string_seq is
linas Aug 20, 2022
1e311f7
std namespace conversion for strings
linas Aug 20, 2022
15a338e
More std namespace and atomese conversions
linas Aug 20, 2022
4c5aac8
More namespace conversions
linas Aug 20, 2022
a8e1705
More conversions
linas Aug 20, 2022
cf8743b
White-space conversion
linas Aug 20, 2022
2556dbd
Ongoing conversion efforts
linas Aug 20, 2022
4709381
More conversions
linas Aug 20, 2022
4064853
Remove cruft
linas Aug 20, 2022
7f22f38
Whitespace rework
linas Aug 20, 2022
277fec8
Convert and simplify table reading
linas Aug 20, 2022
e45aa9f
More cleanup
linas Aug 20, 2022
2524133
Reorder order of teh code
linas Aug 20, 2022
c7b6ca9
Code that compiles.
linas Aug 20, 2022
70b2eaa
Remove unused code
linas Aug 20, 2022
b4789ca
Remove more dead code
linas Aug 20, 2022
41a43a1
Prepare columns that will be filled in.
linas Aug 20, 2022
635a479
Read boolean columns in the table
linas Aug 20, 2022
02bb9bd
Handle the remaining column types
linas Aug 20, 2022
9c16a75
Stub out or remove dead code
linas Aug 20, 2022
601a226
More cleanup
linas Aug 20, 2022
aab0dd5
Start passing column names in
linas Aug 20, 2022
ae3db28
Start placing the values on the anchor
linas Aug 20, 2022
d867b1e
Handle the other kinds of columns
linas Aug 20, 2022
5bc0176
Add the list of keys to a well-known location
linas Aug 20, 2022
20edb09
More header cleanup
linas Aug 20, 2022
36a4ccb
Move stuff from header to c file
linas Aug 20, 2022
e980338
Add documentation
linas Aug 20, 2022
44869c1
Remove un-needed files
linas Aug 20, 2022
829d934
Move documentation around
linas Aug 20, 2022
c15a912
Add a README to explain what is going on
linas Aug 20, 2022
7c8256c
Start work on a unit test for CSV
linas Aug 20, 2022
8171c14
Fix typo in the name
linas Aug 20, 2022
a9c5b23
Bug fix, failed to pass types along
linas Aug 20, 2022
0a0f8a6
nother bug fix
linas Aug 20, 2022
eff2d58
Another bugfix
linas Aug 20, 2022
f9272b8
Expand teh unit test some more
linas Aug 20, 2022
be92867
Add scheme bindings to the table loader
linas Aug 20, 2022
2b491d5
Add the scm side of the csv-table module
linas Aug 20, 2022
b862c41
Bug fix cut-n-paste error
linas Aug 20, 2022
6bd2075
Specify file path correctly
linas Aug 21, 2022
dbc24f0
Mkae the AtoSpace an explicit argument
linas Aug 21, 2022
eb8adb0
Start work on a table demo.
linas Aug 21, 2022
e56a4af
Announce the demo
linas Aug 21, 2022
577aa9b
Must use FloatValueOf not ValueOf
linas Aug 21, 2022
9e61cc0
Update unit test to use the new API.
linas Aug 21, 2022
f6df994
Provide a scoring function example.
linas Aug 21, 2022
1aec920
Add explanation of the demo
linas Aug 21, 2022
54f05f4
List additional modules.
linas Aug 21, 2022
4 changes: 4 additions & 0 deletions examples/atomspace/README.md
@@ -53,6 +53,7 @@ first).
* `formulas.scm` -- Representing arithmetic and computing Values.
* `flows.scm` -- Flowing Values around.
* `flow-formulas.scm` -- Dynamically updating value flows.
* `table.scm` -- Fetching Values from a CSV/TSV table.
* `multi-space.scm` -- Using multiple AtomSpaces at once.

After going through the above, go to the demos in the
@@ -192,6 +193,7 @@ everything else depends on.
```
(use-modules (opencog))
(use-modules (opencog atom-types))
(use-modules (opencog csv-table))
(use-modules (opencog exec))
(use-modules (opencog logger))
(use-modules (opencog matrix))
@@ -201,9 +203,11 @@ everything else depends on.
(use-modules (opencog persist-rocks))
(use-modules (opencog persist-sql))
(use-modules (opencog python))
(use-modules (opencog randgen))
(use-modules (opencog sheaf))
(use-modules (opencog test-runner))
(use-modules (opencog type-utils))
(use-modules (opencog uuid))
```

There are other modules provided in other projects and repos. Here is
4 changes: 2 additions & 2 deletions examples/atomspace/flows.scm
@@ -140,7 +140,7 @@

; Try out some math
(cog-execute! (SetValue bar kee
-	(Times (ValueOf foo key) (ValueOf foo key))))
+	(Times (FloatValueOf foo key) (FloatValueOf foo key))))

; Verify
(cog-execute! (ValueOf bar kee))
@@ -162,6 +162,6 @@
(cog-execute!
(SetValue bar kee
(DefinedSchema "triangle numbers")
-	(ValueOf foo key)))
+	(FloatValueOf foo key)))
;
; -------- THE END -----------
22 changes: 22 additions & 0 deletions examples/atomspace/table.csv
@@ -0,0 +1,22 @@
#
# This is a simple demo CSV file.
# It contains a table of data, in comma-separated-value format.
# You can also use tab-separated values.
#
# This table contains a text column header.
# The column labels can be anything.
# If the header is absent, default labels will be generated.
#
b1, b2, b3, flt1, flt2, lbl

# Now for some data. Three columns of binary numbers,
# Two floats, and one column of strings.
0, 0, 1, 3.3, 4.4, "one"
0, 0, 1, 4.4, 5.5, "one"
0, 1, 1, 3.4, 6.5, "three"
1, 0, 1, 2.4, 7.5, "five"

# T and F are maybe better for binary ...
T, F, T, 4, 9, "five"
T, T, F, 5, 11, "six"
T, T, T, 2, 8.9, "seven"
110 changes: 110 additions & 0 deletions examples/atomspace/table.scm
@@ -0,0 +1,110 @@
;
; table.scm -- Formulas applied to Values from a CSV/TSV table.
;
; This is similar to the `flows.scm` demo, except that the values
; are fetched from a conventional DSV (delimiter-separated-value)
; table. The demo is in two parts. The first part reads the table
; (a one-liner) and explores how it is represented in the AtomSpace.
; The second part applies some formulas to the table columns.
;
; The second part of the demo is interesting, because it shows how
; functions, written in Atomese, can be applied to tables, and how
; a "utility function" or a "scoring function" can be written.
; Utility functions are commonly used in machine learning; they
; provide a grand-total score that can be maximized or minimized during
; training. The interesting point here is that the scoring function
; is represented in Atomese: it is some tree, some DAG of inputs.
; These trees can be randomly generated and mutated, thus allowing
; genetic-programming algorithms to be implemented in the AtomSpace.
;
; This is, of course, exactly what AS-MOSES does. This is effectively
; a demo of a sub-component of the AS-MOSES subsystem.
;
(use-modules (opencog) (opencog exec))
(use-modules (opencog csv-table))

; Create an Atom on which the table will be located.
(define tab (Concept "My foo Table"))

; Load the table (located in this directory.)
(load-table tab "table.csv")

; Verify that the table loaded. First, take a look at all of the keys:
(cog-keys tab)

; The ordered list of all the columns will be located at the
; "well-known predicate". All tables will have this; it is an
; ordered list of the columns in the table (in the same order
; as the file.)
(define colkeys (Predicate "*-column-keys-*"))
(cog-value tab colkeys)

; Verify that the data for each column is present.
; Loop over the columns, and print the keys and values on them.
(for-each
	(lambda (KEY)
		(format #t "The key ~A holds data ~A\n" KEY (cog-value tab KEY)))
	(cog-value->list (cog-value tab colkeys)))
;
; -------------------------------------------------------------------
; Part two: apply some formulas to the columns.
;
; Note that `cog-value` and `cog-execute! ValueOf` return the same thing:
(cog-value tab (PredicateNode "flt1"))
(cog-execute! (ValueOf tab (PredicateNode "flt1")))

; Take the difference of two columns. Note that `FloatValueOf` is
; used instead of `ValueOf`, so that the type-checking subsystem
; is happy about the types passed to the operator.
(cog-execute!
	(Minus
		(FloatValueOf tab (PredicateNode "flt2"))
		(FloatValueOf tab (PredicateNode "flt1"))))

; The above can be wrapped into a function. Several examples follow,
; below. First, a function that takes the table as an argument,
; subtracts two columns, and places the result in a third column.
; The column names are hard-coded in the function.

(DefineLink
	(DefinedSchema "col diffs")
	(Lambda
		(Variable "$tbl-name")
		(SetValue
			(Variable "$tbl-name") (Predicate "f2 minus f1")
			(Minus
				(FloatValueOf (Variable "$tbl-name") (PredicateNode "flt2"))
				(FloatValueOf (Variable "$tbl-name") (PredicateNode "flt1"))))))

; Apply the function to the table.
(cog-execute! (Put (DefinedSchema "col diffs") tab))

; Verify that the new column showed up.
(cog-keys tab)

; .. and that it contains the expected data.
(cog-value tab (Predicate "f2 minus f1"))

;--------
; The AccumulateLink can be used to sum up all of the rows in a column.
(cog-execute!
	(Accumulate (FloatValueOf tab (Predicate "f2 minus f1"))))

; This can be turned into a simple scoring function. It computes the
; sum-total of the difference of two columns. This is a score, in that
; it is a single number that can be used as a utility function in
; conventional machine-learning algos.
(DefineLink
	(DefinedSchema "compute score")
	(Lambda
		(Variable "$tbl-name")
		(Accumulate
			(Minus
				(FloatValueOf (Variable "$tbl-name") (PredicateNode "flt2"))
				(FloatValueOf (Variable "$tbl-name") (PredicateNode "flt1"))))))

; Apply the function to the table.
(cog-execute! (Put (DefinedSchema "compute score") tab))

; That's all, folks.
; -------------------------------------------------------------------
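
For readers more at home in C++ than in Atomese, the arithmetic performed by the "compute score" schema can be mirrored in a few lines of plain C++. This is only a sketch under stated assumptions, not part of the PR: `std::vector<double>` stands in for a `FloatValue`, the function names `column_minus`, `accumulate_column` and `compute_score` are hypothetical, and it assumes (as the demo comments describe) that `Minus` acts row by row on two equal-length columns and that `Accumulate` sums the rows of a column. It does not use the AtomSpace API.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Row-by-row difference of two equal-length columns.
// Assumed to mirror what Minus does with two FloatValues.
std::vector<double> column_minus(const std::vector<double>& a,
                                 const std::vector<double>& b)
{
	assert(a.size() == b.size());
	std::vector<double> out(a.size());
	for (std::size_t i = 0; i < a.size(); i++)
		out[i] = a[i] - b[i];
	return out;
}

// Sum of all rows in a column, as the demo describes for Accumulate.
double accumulate_column(const std::vector<double>& col)
{
	return std::accumulate(col.begin(), col.end(), 0.0);
}

// Hypothetical counterpart of the "compute score" schema:
// the grand total of (flt2 - flt1) over all rows.
double compute_score(const std::vector<double>& flt2,
                     const std::vector<double>& flt1)
{
	return accumulate_column(column_minus(flt2, flt1));
}
```

With the `flt1` and `flt2` columns from `table.csv`, the row differences are 1.1, 1.1, 3.1, 5.1, 5, 6 and 6.9, so this sketch returns approximately 28.3 (up to floating-point rounding).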
6 changes: 6 additions & 0 deletions opencog/atoms/value/README.md
@@ -94,4 +94,10 @@ Adding New Atom and Value Types
Please see the
[README-Adding-New-Atom-Types.md](../atom_types/README-Adding-New-Atom-Types.md) file.

See also the [Custom Types Example](../../../examples/type-system/README.md)

TODO
----
* Perhaps add a TypeValue, which would be a vector of Types. It could
be useful as a kind-of table signature (for the csv table handling
code).
1 change: 1 addition & 0 deletions opencog/persist/CMakeLists.txt
@@ -1,5 +1,6 @@
ADD_SUBDIRECTORY (storage)
ADD_SUBDIRECTORY (api)
ADD_SUBDIRECTORY (csv)

IF (HAVE_GEARMAN AND HAVE_GUILE)
ADD_SUBDIRECTORY (gearman)
8 changes: 8 additions & 0 deletions opencog/persist/README.md
@@ -17,6 +17,14 @@ Local subdirectories include:
for RocksDB and one that allows AtomSpaces to trade
Atoms over the network.)

* csv -- Load Values from CSV/TSV files. These are "delimiter
separated values" -- ordinary tables. Each column in the
table is loaded into an appropriate Value (`FloatValue`,
`BoolValue` or `StringValue`). The values are placed
under keys (named after the column) on the provided Atom.
This is intended for the ASMOSES subsystem, which
naturally operates on tables or streams of data.

* file -- Read and write files containing Atomese s-expressions.
Provides both a `FileStorageNode`, and also some utilities
to read files, and dump Atomspace contents to files or
42 changes: 42 additions & 0 deletions opencog/persist/csv/CMakeLists.txt
@@ -0,0 +1,42 @@

# Generic CSV/TSV table reading.
ADD_LIBRARY (csv
table_read.cc
)

ADD_DEPENDENCIES(csv opencog_atom_types)

TARGET_LINK_LIBRARIES(csv
atomspace
atombase
${COGUTIL_LIBRARY}
)

INSTALL (TARGETS csv EXPORT AtomSpaceTargets
DESTINATION "lib${LIB_DIR_SUFFIX}/opencog"
)

INSTALL (FILES
table_read.h
DESTINATION "include/opencog/persist/csv"
)

# -------------------------------

ADD_LIBRARY (csv-table
TableSCM.cc
)

TARGET_LINK_LIBRARIES(csv-table
csv
atomspace
smob
)

ADD_GUILE_EXTENSION(SCM_CONFIG csv-table "opencog-ext-path-csv-table")

INSTALL (TARGETS csv-table EXPORT AtomSpaceTargets
DESTINATION "lib${LIB_DIR_SUFFIX}/opencog"
)

# -------------------------------
50 changes: 50 additions & 0 deletions opencog/persist/csv/README.md
@@ -0,0 +1,50 @@

Load Ordinary CSV Tables
========================
The code here is able to load "delimiter-separated values" (DSV,
or CSV, TSV for comma and tab separators) from a file. These are
just very conventional tables.

Each column from a DSV file is read in and placed into an Atomese
Value on an indicated Atom. Atomese Values are vectors (of floats,
bools, strings). Each Value holds one column from the dataset.

Basically, this just gets CSV data into the AtomSpace, where it
becomes easy for Atomese programs to act on it, i.e. to use it
as input for some kind of data stream processing.

The features (columns) specified in ignore_features will be omitted
from the representation.

Example
-------
For example, a CSV dataset like this:
```
o, i1, i2, i3, i4
1, 0, 0, 3.3, "foo"
0, 1, 0, 4.4, "bar"
```
will be loaded as key-value pairs on the `anchor` Atom.

The column names will be loaded under a "well known key":
```
(Predicate "*-column-keys-*")
```
This key will point at a value holding a list of all of the
column-keys in the table:
```
(LinkValue
(Predicate "o")
(Predicate "i1")
(Predicate "i2")
(Predicate "i3")
(Predicate "i4"))
```
Then, under each key, there will be a column of values:
```
(Predicate "o") (BoolValue 1 0)
(Predicate "i1") (BoolValue 0 1)
(Predicate "i2") (BoolValue 0 0)
(Predicate "i3") (FloatValue 3.3 4.4)
(Predicate "i4") (StringValue "foo" "bar")
```
88 changes: 88 additions & 0 deletions opencog/persist/csv/TableSCM.cc
@@ -0,0 +1,88 @@
/*
* opencog/persist/csv/TableSCM.cc
*
* Copyright (c) 2008 by OpenCog Foundation
* Copyright (c) 2008, 2009, 2013, 2015, 2022 Linas Vepstas <[email protected]>
* All Rights Reserved
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License v3 as
* published by the Free Software Foundation and including the exceptions
* at http://opencog.org/wiki/Licenses
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program; if not, write to:
* Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/

#ifndef _OPENCOG_CSV_TABLE_SCM_H
#define _OPENCOG_CSV_TABLE_SCM_H

#include <opencog/guile/SchemeModule.h>

namespace opencog
{
/** \addtogroup grp_persist
* @{
*/

class TableSCM : public ModuleWrap
{
private:
	void init(void);

	void load_table(const Handle&, const std::string&);
public:
	TableSCM(void);
}; // class

/** @}*/
} // namespace

extern "C" {
void opencog_csv_table_init(void);
};

#endif // _OPENCOG_CSV_TABLE_SCM_H

#include <opencog/atomspace/AtomSpace.h>
#include <opencog/guile/SchemePrimitive.h>

#include "table_read.h"

using namespace opencog;

TableSCM::TableSCM(void)
	: ModuleWrap("opencog csv-table")
{
	static bool is_init = false;
	if (is_init) return;
	is_init = true;
	module_init();
}

// Temporary(?) Hacky experimental API. Subject to change.
void TableSCM::init(void)
{
	define_scheme_primitive("load-table",
		&TableSCM::load_table, this, "csv-table");
}

// =====================================================================

void TableSCM::load_table(const Handle& h, const std::string& path)
{
	const AtomSpacePtr& as = SchemeSmob::ss_get_env_as("load-table");
	opencog::load_csv_table(as, h, path);
}

void opencog_csv_table_init(void)
{
	static TableSCM patty;
}