
Compositionality of Atomese processing stages (was "Stop using SetLinks!") #2911

Open · linas opened this issue Dec 4, 2021 · 2 comments

linas commented Dec 4, 2021

Issue #1502 "Stop using SetLink for search results!" highlights a problem with the compositionality of the search and processing primitives in Atomese. Search results were (and still are) delivered wrapped in a SetLink. This presents challenges for the cleanup of search results, and raises various other core issues (see those described on the SetLink wiki page).

Basically, it is difficult to create long processing pipelines, such as (cog-evaluate! (SequentialAnd ... (Put ... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ..., with the top-level SequentialAnd written to be tail-recursive. Such long pipelines have been implemented (in the robot code, circa 2017) and they do work; see, for example, the tail-recursion demo and older copies of the https://github.com/opencog/ros-behavior-scripting repo. However, there are problems (a minimal sketch of the wrap/unwrap issue appears after this list):

  • It is difficult to delete the wrapper SetLinks once they are no longer needed.
  • Writing long chained pipelines seems more difficult than it should be.
  • The internal handling of SetLinks is awkward when they are passed to PutLink and assorted other functions (e.g. PlusLink, when given a set of numbers to add).
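
Below is a minimal sketch of the wrap/unwrap issue, using hypothetical "likes" and "offer-dessert" atoms that are not part of any real dataset. The GetLink delivers its matches wrapped in a SetLink; the PutLink then has to unwrap that set, rewrite each member, and wrap the rewrites in yet another SetLink that some later stage must clean up.

    (use-modules (opencog) (opencog exec))

    ; Hypothetical example data, for illustration only.
    (Evaluation (Predicate "likes")
       (List (Concept "Alice") (Concept "ice-cream")))
    (Evaluation (Predicate "likes")
       (List (Concept "Bob") (Concept "ice-cream")))

    ; The GetLink returns its matches wrapped in a SetLink.
    (define who-likes
       (Get (Variable "$who")
          (Evaluation (Predicate "likes")
             (List (Variable "$who") (Concept "ice-cream")))))

    ; Chaining: PutLink executes the GetLink, unwraps the resulting
    ; SetLink, rewrites each member, and wraps everything in yet
    ; another SetLink that someone must eventually delete.
    (cog-execute!
       (Put
          (Lambda (Variable "$who")
             (Evaluation (Predicate "offer-dessert")
                (List (Variable "$who"))))
          who-likes))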

A replacement solution is needed. Desirable properties:

  • Ability to get search results incrementally, as they come in, rather than getting one big blob at the end.
  • Ability to run one query in parallel mode.
  • Ability to handle processing pipelines on streaming data, e.g. from the LinkStream Value.

So basically, there are two issues being explored here, in tandem:

  • How to make the query subsystem stream? (Well, it already does, but how should it be used effectively?)
  • How to build a general streaming subsystem?

A wholly unexplored idea:

  • Monads -- The PutLink and other misc links currently accept either individual atoms, or sets of atoms, as input. In the case of sets, the contained atoms are automatically unwrapped and processed. This suggests that the current use of SetLink is as a kind of sloppy monad, and thus what we really need is a cleaned-up monad to hold multiple-value results. The precise way to do this is unclear. (A small illustration of the current unwrapping behaviour follows.)
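
The unwrapping behaviour mentioned above can be seen directly. A sketch, with hypothetical atoms: PutLink beta-reduces once per member of a SetLink, much like mapping over a list monad.

    (cog-execute!
       (Put
          (Lambda (Variable "$x")
             (Inheritance (Variable "$x") (Concept "number")))
          (Set (Concept "one") (Concept "two") (Concept "three"))))
    ; The result comes back wrapped in a SetLink again:
    ;    (Set (Inheritance (Concept "one") (Concept "number")) ...)

The wrapper survives the map, but nothing else about it is principled; that is the sense in which it is a "sloppy" monad.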

The best idea seems to be:

  • Futures/Promises subsystem -- Compositionality can be achieved by creating a subsystem that supports futures and streams. Some partial work in this direction has already been done, with the FormulaStream and FutureStream, and also with the QueueValue. Currently, there is no way to concatenate QueueValues.
  • Generic publish-subscribe system -- Streams/futures are one-to-one: one producer, one consumer. It seems that multiplexers should be provided as well, allowing multiple consumers/producers. The QueueValue already multiplexes, in a way. This should be provided as an add-on to the above. (Some old, obsolete ideas were discussed in Design a Publish/Subscribe System (aka Futures) #1750.)

The following progress has been made:

The cogutils library provides five thread-safe tools for building these things:

  • concurrent_queue.h -- a thread-safe FIFO.
  • concurrent_set.h -- a thread-safe version of std::set. Note that it provides deduplication (just as std::set does).
  • concurrent_stack.h -- a thread-safe LIFO.
  • async_method_caller.h -- an asynchronous method caller. It manages a collection of threads that call some method on some data, at a later time, in a thread other than the current one. Useful if the method is slow or might block. Avoids overflow and guarantees forward progress. Data is placed on a queue (FIFO) and is processed in order of arrival.
  • async_buffer.h -- same as above, except data is placed in a set. This provides deduplication, if the same request is made multiple times. It loses the ability to guarantee that really old data eventually gets handled. Data is processed in the order that std::set provides, i.e. std::less order.

Note that QueueValue is built on top of concurrent_queue.h.

Implementing compositionality requires finishing work on the $vau-ization (fexpr-ization) of Atomese functions. See $vau (aka fexpr) on Wikipedia. Many or most functions now work like this; the grand exceptions include the PutLink and most of the TruthValue subsystem.
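
To make the fexpr-like behaviour concrete: Atomese expressions are inert data in the AtomSpace until something explicitly executes them, as in this small sketch.

    ; Just an atom sitting in the AtomSpace; nothing is computed yet.
    (define sum (Plus (Number 2) (Number 2)))

    ; Execution is explicit, and reduces the expression.
    (cog-execute! sum)   ; => (Number 4)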

Possible building blocks for monads, or related ideas:

  • AtomSpaceNode -- This suggests treating the AtomSpace as a kind of mutable Link. Thus, the query would return an AtomSpace, layered on the main space, holding the search results. Partly implemented: see AtomSpaces are now a kind-of Atom #2865. The AtomSpacePtr is now a kind of ValuePtr. No one is using this for anything just yet. There is no way to automatically project/collapse the contents of derived AtomSpaces back into the main AtomSpace (you'd have to do it by hand). This could generalize: AtomSpaces could be layered arbitrarily deep, passed around, and hold temporary results that disappear when all references to the AtomSpace disappear. AtomSpaces could be stored in ordinary Links.
  • QueueValue -- This is currently used to hold search results from QueryLink and MeetLink. The whole value-flow subsystem is envisioned as being able to handle flows of ... values ... and not flows of Atoms. (A small sketch of consuming such results follows.)
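
A minimal sketch, assuming (as the bullet above describes) that MeetLink delivers its matches in a QueueValue; the "likes" atoms are hypothetical, as before.

    (define matches
       (cog-execute!
          (Meet (Variable "$who")
             (Evaluation (Predicate "likes")
                (List (Variable "$who") (Concept "ice-cream"))))))

    ; Unpack the container into an ordinary Scheme list of atoms.
    (cog-value->list matches)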

To explore these issues, and possible solutions, some demos are suggested. So far, we have the following demos and related issues:

linas commented Dec 16, 2022

At this time, building on top of QueueValue seems the most sensible thing to do. This could be formalized by providing an API within QueueValue itself, instead of inheriting from what's in cogutils.

linas added a commit to linas/atomspace that referenced this issue Dec 16, 2022
Linking results to the AnchorNode idea now seems like a bad idea,
in retrospect.  The ideas in opencog#2911 seem superior. So trash the
AnchorNode support in the query subsystem.

FWIW, if this kind of thing was wanted, a better solution would
be a new kind of Value, that dequeued from a QueueValue, and
plopped the results onto an AnchorNode. ***This*** is the real
reason for stripping away this code: its not generic enough.

This reverts commit 46dea8e, reversing
changes made to 5ec84bb.
linas added a commit that referenced this issue Dec 16, 2022
Linking results to the AnchorNode idea now seems like a bad idea,
in retrospect.  The ideas in #2911 seem superior. So trash the
AnchorNode support in the query subsystem.

FWIW, if this kind of thing was wanted, a better solution would
be a new kind of Value, that dequeued from a QueueValue, and
plopped the results onto an AnchorNode. ***This*** is the real
reason for stripping away this code: its not generic enough.

Merge branch 'revert-anchor'

linas commented Dec 16, 2022

The following attempt was made:

  • Anchor proposal -- This suggests that an AnchorLink can be specified in the query, and results are chained onto it with MemberLinks as they show up. This has been implemented, documented, and unit-tested; see Post search results to an AnchorLink #2500. No one uses it. It was reverted earlier today, in eee7a61.

The reason this was reverted is that it did not seem to be needed, and there seems to be a more generic solution: if one really needs stuff stuck onto an AnchorNode, then create a new kind of Value that dequeues from the QueueValue as results come in, and sticks them onto the AnchorNode. This could be done for any kind of data stream, not just for queries.

There has been some minor exploration of how compositionality works; it is demoed in the (still-existing) example query.scm. This example uses AnchorNodes, but without needing pull request #2500 to do it. It "works". It's even multi-threaded, so it's "naturally" parallel. Is it clunky? I dunno. It posts results to the AtomSpace, ... but why? Was that really needed? (A rough sketch of the idiom is below.)
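
A sketch of the AnchorNode idiom, not necessarily what query.scm actually does: the rewrite term of an ordinary QueryLink attaches each result to an anchor with a MemberLink, so downstream stages can poll the anchor as results appear; no special query support is needed. The atoms are hypothetical.

    (cog-execute!
       (Query
          (Variable "$who")
          (Evaluation (Predicate "likes")
             (List (Variable "$who") (Concept "ice-cream")))
          (Member (Variable "$who") (Anchor "*-results-*"))))

    ; Downstream stages can then look for
    ;    (Member (Variable "$x") (Anchor "*-results-*"))
    ; in the AtomSpace to pick up results.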
