
Compositionality of Atomese processing stages (was "Stop using SetLinks!") #2911

Open · linas opened this issue Dec 4, 2021 · 2 comments

linas commented Dec 4, 2021

Issue #1502 "Stop using SetLink for search results!" highlights a problem with the compositionality of the search and processing primitives in Atomese. Search results were (and still are) delivered wrapped in a SetLink. This presents challenges for the cleanup of search results, and raises various other core issues (see those described on the SetLink wiki page).

Basically, it is difficult to create long processing pipelines, such as (cog-evaluate! (SequentialAnd ... (Put ... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ..., with the top-level SequentialAnd written to be tail-recursive. Such long pipelines have been implemented (in the robot code, circa 2017) and they do work; see, for example, the tail-recursion demo and older copies of the https://github.com/opencog/ros-behavior-scripting repo. However, there are problems (a minimal sketch of the wrap/unwrap issue appears after this list):

  • It is difficult to delete the wrapper SetLinks once they are no longer needed.
  • Writing long chained pipelines seems more difficult than it should be.
  • The internal handling of SetLinks is awkward when they are passed to PutLink and assorted other functions (e.g. PlusLink, when given a set of numbers to add).
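
Below is a minimal sketch of the wrap/unwrap issue, using hypothetical "likes" and "offer-dessert" atoms that are not part of any real dataset. The GetLink delivers its matches wrapped in a SetLink; the PutLink then has to unwrap that set, rewrite each member, and wrap the rewrites in yet another SetLink that some later stage must clean up.

    (use-modules (opencog) (opencog exec))

    ; Hypothetical example data, for illustration only.
    (Evaluation (Predicate "likes")
       (List (Concept "Alice") (Concept "ice-cream")))
    (Evaluation (Predicate "likes")
       (List (Concept "Bob") (Concept "ice-cream")))

    ; The GetLink returns its matches wrapped in a SetLink.
    (define who-likes
       (Get (Variable "$who")
          (Evaluation (Predicate "likes")
             (List (Variable "$who") (Concept "ice-cream")))))

    ; Chaining: PutLink executes the GetLink, unwraps the resulting
    ; SetLink, rewrites each member, and wraps everything in yet
    ; another SetLink that someone must eventually delete.
    (cog-execute!
       (Put
          (Lambda (Variable "$who")
             (Evaluation (Predicate "offer-dessert")
                (List (Variable "$who"))))
          who-likes))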

A replacement solution is needed. Desirable properties:

  • Ability to get search results incrementally, as they come in, rather than getting one big blob at the end.
  • Ability to run one query in parallel mode.
  • Ability to handle processing pipelines on streaming data, e.g. from the LinkStream Value.

So basically, there are two issues being explored here, in tandem:

  • How to make the query subsystem stream? (Well, it already does, but how should it be used effectively?)
  • How to build a general streaming subsystem?

A wholly unexplored idea:

  • Monads -- The PutLink and other misc links currently accept either individual atoms, or sets of atoms, as input. In the case of sets, the contained atoms are automatically unwrapped and processed. This suggests that the current use of SetLink is as a kind of sloppy monad, and thus what we really need is a cleaned-up monad to hold multiple-value results. The precise way to do this is unclear. (A small illustration of the current unwrapping behaviour follows.)
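
The unwrapping behaviour mentioned above can be seen directly. A sketch, with hypothetical atoms: PutLink beta-reduces once per member of a SetLink, much like mapping over a list monad.

    (cog-execute!
       (Put
          (Lambda (Variable "$x")
             (Inheritance (Variable "$x") (Concept "number")))
          (Set (Concept "one") (Concept "two") (Concept "three"))))
    ; The result comes back wrapped in a SetLink again:
    ;    (Set (Inheritance (Concept "one") (Concept "number")) ...)

The wrapper survives the map, but nothing else about it is principled; that is the sense in which it is a "sloppy" monad.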

The best idea seems to be:

  • Futures/Promises subsystem -- Compositionality can be achieved by creating a subsystem that supports futures and streams. Some partial work in this direction has already been done, with the FormulaStream and FutureStream, and also with the QueueValue. Currently, there is no way to concatenate QueueValues.
  • Generic publish-subscribe system -- Streams/futures are one-to-one: one producer, one consumer. It seems that multiplexers should be provided as well, allowing multiple consumers/producers. The QueueValue already multiplexes, in a way. This should be provided as an add-on to the above. (Some old, obsolete ideas were discussed in Design a Publish/Subscribe System (aka Futures) #1750.)

The following progress has been made:

The cogutils library provides five thread-safe tools for building these things:

  • concurrent_queue.h -- a thread-safe FIFO.
  • concurrent_set.h -- a thread-safe version of std::set. Note that it provides deduplication (just as std::set does).
  • concurrent_stack.h -- a thread-safe LIFO.
  • async_method_caller.h -- an asynchronous method caller. It manages a collection of threads that call some method on some data, at a later time, in a thread other than the current one. Useful if the method is slow or might block. Avoids overflow and guarantees forward progress. Data is placed on a queue (FIFO) and is processed in order of arrival.
  • async_buffer.h -- same as above, except data is placed in a set. This provides deduplication, if the same request is made multiple times. It loses the ability to guarantee that really old data eventually gets handled. Data is processed in the order that std::set provides, i.e. std::less order.

Note that QueueValue is built on top of concurrent_queue.h.

Implementing compositionality requires finishing work on the $vau-ization (fexpr-ization) of Atomese functions. See $vau (aka fexpr) on Wikipedia. Many or most functions now work like this; the grand exceptions include the PutLink and most of the TruthValue subsystem.
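
To make the fexpr-like behaviour concrete: Atomese expressions are inert data in the AtomSpace until something explicitly executes them, as in this small sketch.

    ; Just an atom sitting in the AtomSpace; nothing is computed yet.
    (define sum (Plus (Number 2) (Number 2)))

    ; Execution is explicit, and reduces the expression.
    (cog-execute! sum)   ; => (Number 4)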

Possible building blocks for monads, or related ideas:

  • AtomSpaceNode -- This suggests treating the AtomSpace as a kind of mutable Link. Thus, the query would return an AtomSpace, layered on the main space, holding the search results. Partly implemented: see AtomSpaces are now a kind-of Atom #2865. The AtomSpacePtr is now a kind of ValuePtr. No one is using this for anything just yet. There is no way to automatically project/collapse the contents of derived AtomSpaces back into the main AtomSpace (you'd have to do it by hand). This could generalize: AtomSpaces could be layered arbitrarily deep, passed around, and hold temporary results that disappear when all references to the AtomSpace disappear. AtomSpaces could be stored in ordinary Links.
  • QueueValue -- This is currently used to hold search results from QueryLink and MeetLink. The whole value-flow subsystem is envisioned as being able to handle flows of ... values ... and not flows of Atoms. (A small sketch of consuming such results follows.)
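
A minimal sketch, assuming (as the bullet above describes) that MeetLink delivers its matches in a QueueValue; the "likes" atoms are hypothetical, as before.

    (define matches
       (cog-execute!
          (Meet (Variable "$who")
             (Evaluation (Predicate "likes")
                (List (Variable "$who") (Concept "ice-cream"))))))

    ; Unpack the container into an ordinary Scheme list of atoms.
    (cog-value->list matches)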

To explore these issues, and possible solutions, some demos are suggested. So far, we have the following demos and related issues:

linas commented Dec 16, 2022

At this time, building on top of QueueValue seems the most sensible thing to do. This could be formalized by providing an API within QueueValue itself, instead of inheriting from what's in cogutils.

linas added a commit to linas/atomspace that referenced this issue Dec 16, 2022
Linking results to the AnchorNode idea now seems like a bad idea,
in retrospect.  The ideas in opencog#2911 seem superior. So trash the
AnchorNode support in the query subsystem.

FWIW, if this kind of thing was wanted, a better solution would
be a new kind of Value, that dequeued from a QueueValue, and
plopped the results onto an AnchorNode. ***This*** is the real
reason for stripping away this code: its not generic enough.

This reverts commit 46dea8e, reversing
changes made to 5ec84bb.
linas added a commit that referenced this issue Dec 16, 2022
Linking results to the AnchorNode idea now seems like a bad idea,
in retrospect.  The ideas in #2911 seem superior. So trash the
AnchorNode support in the query subsystem.

FWIW, if this kind of thing was wanted, a better solution would
be a new kind of Value, that dequeued from a QueueValue, and
plopped the results onto an AnchorNode. ***This*** is the real
reason for stripping away this code: its not generic enough.

Merge branch 'revert-anchor'

linas commented Dec 16, 2022

The following attempt was made:

  • Anchor proposal -- This suggests that an AnchorLink can be specified in the query, and results are chained onto it with MemberLinks as they show up. This has been implemented, documented, and unit-tested; see Post search results to an AnchorLink #2500. No one uses it. It was reverted earlier today, in eee7a61.

The reason this was reverted is that it did not seem to be needed, and there seems to be a more generic solution: if one really needs stuff stuck onto an AnchorNode, then create a new kind of Value that dequeues from the QueueValue as results come in, and sticks them onto the AnchorNode. This could be done for any kind of data stream, not just for queries.

There has been some minor exploration of how compositionality works; it is demoed in the (still-existing) example query.scm. This example uses AnchorNodes, but without needing pull request #2500 to do it. It "works". It's even multi-threaded, so it's "naturally" parallel. Is it clunky? I dunno. It posts results to the AtomSpace, ... but why? Was that really needed? (A rough sketch of the idiom is below.)
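
A sketch of the AnchorNode idiom, not necessarily what query.scm actually does: the rewrite term of an ordinary QueryLink attaches each result to an anchor with a MemberLink, so downstream stages can poll the anchor as results appear; no special query support is needed. The atoms are hypothetical.

    (cog-execute!
       (Query
          (Variable "$who")
          (Evaluation (Predicate "likes")
             (List (Variable "$who") (Concept "ice-cream")))
          (Member (Variable "$who") (Anchor "*-results-*"))))

    ; Downstream stages can then look for
    ;    (Member (Variable "$x") (Anchor "*-results-*"))
    ; in the AtomSpace to pick up results.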
