Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: TFValueOf #2004

Closed
linas opened this issue Jan 22, 2019 · 23 comments
Closed

Proposal: TFValueOf #2004

linas opened this issue Jan 22, 2019 · 23 comments

Comments

@linas
Copy link
Member

linas commented Jan 22, 2019

Proposal for a significant change in inner workings of pattern matcher and related subsystems.

Proposal: introduce a special binary true/false value on all atoms.

Why?

  • Using a binary 0/1 value for those algorithms that work naturally with T/F will improve performance, by avoiding the clunkier SimpleTruthValue.
  • Keeping this value distinct from ordinary TV's will liberate ordinary TV's for other purposes (i.e. probabilities, as always intended).

How?

  • Every Atom already has a bitflag in it. Carve out a single bit from it. Thus, sizeof(Atom) is unaffected.
  • Give class:Atom a new method with signature of bool TFValueOf(void) const to access this bit.
  • The AndLink, OrLink, NotLink etc. will call TFValueOf() when needed, to compute a crisp T/F for their outgoing set.
  • Change pattern matcher to access this flag.

OK, that's real easy, so far. Now the hard part. The stuff below is more controversial, and might have un-intended consequences:

Since changing cog-evaluate! will screw up some existing code, propose two new atom types.

Maybe-bonus that maybe I should NOT mention here, but ah what the heck:

  • Allow TruthValueOfLink to work with PlusLink, TimesLink, etc. This will enable arithmetic in the AtomSpace. Since all PLN TV formulas are simple arithmetic, this will allow PLN tv formulas to be stored in the atomspace, This has one short-term benefit, and one long-term benefit:
  1. Short-term: Performance. There is a HUGE overhead to get in and out of scheme/python; its about 50 microseconds, which is crazy for arithmetic. Using TimesLink, etc will give a big performance boost.

  2. Long-term: By placing the PLN formulas in the atomspace itself, the formulas could be reasoned about. Or even learned. For example, MOSES could try to find optimal formulas for PLN to use. Some deep-learning net could do the same. It will be a while before we could actually do this, but at least it now would be possible. Its not possible, as long as the formulas are buried in black-box GroundedPredicateNodes (or wherever PL:N is keeping them).

@linas
Copy link
Member Author

linas commented Jan 22, 2019

Off-topic, but there's another bonus side-effect: If we have a NewGroundedPredicateNode that takes the EvaluationLink as the first argument, then would could introduce a PythonEvaluationLink, as a kind-of EvaluationLink, but it would have a C++ class behind it, that looks like this:

class PythonEvaluationLink : EvaluationLink {
private:
    PyObject* pyDict;
    PyObject* pyUserFunc;
public:
    PythonEvaluationLink() {
           pyDict = PyModule_GetDict(pyModule);
           pyUserFunc = PyDict_GetItem(pyDict, "whatever");
    }

and then the python function could save local state into the pyDict, instead of storing it in ValuePtr. Basically, it would store whatever you wanted to stick into ValuePtr. And also store any other python stuff you might ever want to store.

You'd use it like so:

 (PythonEvaluationLink
      (NewGroundedpredicateNode "py:foobar")
      (ListLink (...args))

In particular, large parts, or most of, or even all of today's class PythonEval would be moved to the new class PythonEvaluationLink and so PythonEval would no longer be this wart on the side, but a first-class real-life atom.

So, then to make it easy for foobar to save state, it would get either the C++ PythonEvaluationLink atom as first argument, or maybe we would also have some python/cython code (in addition to the C++ code):

class PythonEvaluationLink:
    def __init__(self):
          _stuff = 0
    def save_user_state(self, user_stuff):
         _stuff = user_stuff
    def get_user_state(self):
        return _stuff

so that foobar just calls these two methods to set and restore state.

@bgoertzel
Copy link
Contributor

bgoertzel commented Jan 22, 2019 via email

@ngeiswei
Copy link
Member

@linas

Change GroundedPredicateNode to expect a bool return value, instead of a TV. This will hit a lot of code and examples. (It will simplify them, and make them easier to understand!)

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

@ngeiswei
Copy link
Member

So, if understand correctly you want to use a bit of Atom::_flags to store the boolean return value of a virtual link during pattern matching. But what if 2 pattern matchers are working over the same virtual link? Also why would you need to store it anyway, in most cases you just pass it around?

@ngeiswei
Copy link
Member

Allow TruthValueOfLink to work with PlusLink, TimesLink, etc. This will enable arithmetic in the AtomSpace.

Do you mean that PlusLink, TimesLink, etc, could accept Value, not just NumberNode. So for instance one may write

(TimesLink (TruthValueOf A) (TruthValueOf B))

Yeah, that + short-hands like StrengthOf and ConfidenceOf would make this possible.

Since all PLN TV formulas are simple arithmetic, this will allow PLN tv formulas to be stored in the atomspace.

When confidence isn't taken into account, yes there are simple arithmetic. When confidence is taken into account, it's another story, it ideally requires a form of numerical integration, but sure, with adequate operators, we could (and ultimately should) do that.

@ngeiswei
Copy link
Member

Also, as much as I like the idea of using boolean value for pattern matcher, we already have TruthValue::TRUE_TV() and TruthValue::FALSE_TV(). These are static so passing them around amounts to passing pointers. To take advantage of them we'd just need to add scheme/python/haskell bindings for them, and then have the pattern matcher assume their usage, so that comparing them would just amount to comparing pointers.

But I would also be perfectly happy to have a BoolValue (that would inherit from Value) or a BoolTruthValue (that would inherit from TruthValue) that would just hold true or false.

I'm not against storing some TFValue in Atom::_flags either, though I have to admit I feel a bit weird about it, but the devil, or the divine, is in the details.

@ngeiswei
Copy link
Member

ngeiswei commented Jan 22, 2019

Change cog-evalutate! to return binary 0/1 as hinted in #1996

Currently cog-evaluate! over any most atom type returns its TV. For instance

(cog-evaluate! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B")))

returns

(stv 0.5 0.5)

It also looks like (cog-evaluate! x) could be just turned into a shorthand for (cog-tv (cog-execute! x)) but that creates hairy cases.

For instance

(cog-tv (cog-execute! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B"))))

is equivalent to

(cog-evaluate! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B")))

However

(cog-tv (cog-execute! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg"))))

is not equivalent to

(cog-evaluate! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg")))

as it returns (stv 1 0) instead of the TV returned by my-gpn.

Maybe that is why you, @linas , suggest to store the resulting TFValue in the EvaluationLink, right?

@noskill
Copy link
Contributor

noskill commented Jan 22, 2019

It seems that there two separate usecases for logical operators in Atomese:

  1. pattern matching on graph e.g. "or" means match such subgraph or such.
  2. Computation of TruthValue, like it is done by URE.

It makes sense to separate this usecases by using different names e.g. PMOrLink which would expect boolean values and OrLink AndLink, IfThenElse etc. links which would be interpreted by URE or other interpreter.

@vsbogd
Copy link
Contributor

vsbogd commented Jan 22, 2019

For me main question of the issue is: PatternMatcher uses boolean logic, but EvaluationLinks (OrLink, AndLink, ...) return TruthValue which are converted into boolean by PatternMatcher code. How can we improve this conversion or eliminate it at all?

@linas, as I see you have suggested adding boolean values into atomspace but why not introduce BoolValue as regular atomspace value and modify PatternMatcher to work with it? Then we can change evaluation links (AndLink, OrLink, ...) implementation to return BoolValue. TruthValue for these links will be calculated by URE using specific logic rules. Or as Anatoly said add PatternMatcher specific PMAndLink, PMOrLink, ... to work with BoolValue.

If PatternMatcher query includes calculation of some atom which returns TruthValue it can be wrapped by EvaluationLink or ExecutionOutputLink which converts TruthValue to BoolValue. So author of the query can specify the conversion logic himself. To adapt already written PatternMatcher queries to new API we can introduce BoolValueOfTVLink which will convert TV to boolean using current PatternMatcher style.

@ngeiswei
Copy link
Member

ngeiswei commented Jan 23, 2019

I think AndLink, OrLink and NotLink could be overloaded to work with BoolValue/TFValue, however we should stop using AndLink to wrap groundable non-virtual clauses and instead use PresentLink, and only put virtual clauses under an AndLink. So for instance

instead of

(Get
  (And
    (Inheritance (Variable "$X") (Variable "$Y"))
    (Inheritance (Variable "$Y") (Variable "$Z"))
    (Equal (Variable "$X") (Variable "$Z"))))

one should write

(Get
  (And
    (Present
      (Inheritance (Variable "$X") (Variable "$Y"))
      (Inheritance (Variable "$Y") (Variable "$Z")))
    (Equal (Variable "$X") (Variable "$Z"))))

PresentLink would have the effect of checking if a given pattern is present in the AtomSpace, return T if it is, F if it isn't. And EqualLink would return a BoolValue/TFValue too.

I know that it adds an extra PresentLink but it makes the semantics of the pattern matcher more transparent and in bonus has the potential to decrease the usage of quotations. For instance today if you want to fetch AndLink from the atomspace, you may write

(Get (LocalQuote (And (Variable "$X") (Variable "$Y"))))

but you can also write

(Get (Present (And (Variable "$X") (Variable "$Y"))))

The beauty is that PresentLink is already implemented and the examples above work. Since PresentLink is underused it probably has a few bugs here and there but they shouldn't be hard to fix. I have intended to convert all my code (URE, pattern miner, etc) to use PresentLink, which I'll do right after I'm done with my current tasks. IMO, it should become the norm rather than the exception, and eventually using AndLink to check the presence of groundable clauses should be phased out.

@vsbogd
Copy link
Contributor

vsbogd commented Jan 23, 2019

@ngeiswei, it is an excellent idea, we have discussed same thing with @noskill yesterday, but I didn't remember about PresentLink. It will make PatternMatcher queries more explicit.

I would add that in case if you need to calculate some predicate on grounded variable during pattern matching then you need some way to convert TruthValue to BoolValue/TFValue like BoolValueOfTVLink I have described above.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

Responding in chronological order. @bgoertzel asks:

So if I understand this one point correctly: if we have a GPN that is naturally thought of as returning some sort of complex TruthValue object, this TruthValue would be attached to the EvaluationLink?

Yes. If two different EvalLinks call the same GPN, then the return value is kept with the EvalLink that called it. And more: the EvalLink can keep track of all of the hidden state needed to make that GPN work efficiently.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

@ngeiswei asks

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

No, I mean get rid of TV entirely. It is useless and completely unused; when you red through the code, all of it does some version of this:

TruthValuePtr tv = evaluate(...);
if (tv->getMean() > 0.5) {...} else {...}

which is dorky and pointless and a waste of CPU cycles. Instead, I propose this:

bool do_it= evaluate(...);
if (do_it) {...} else {...}

No waste, no fuss, no memory-reference counting, no pointless creation of TV values which are then immediately discarded.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

So, if understand correctly you want to use a bit of Atom::_flags to store the boolean return value of a virtual link during pattern matching. But what if 2 pattern matchers are working over the same virtual link? Also why would you need to store it anyway, in most cases you just pass it around?

I do not recall saying "virtual link" anywhere. I meant ordinary Atom. Perhaps this can be done without storing, but this is not obvious. Having a cached value you can always get, instead of always being force to recompute it is convenient. But perhaps it does not need to be stored. I'd have to think about that more.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

Remaining questions later; not ignoring you just busy.

@ngeiswei
Copy link
Member

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

No, I mean get rid of TV entirely

Well, as long as one can create grounded function to return TruthValue (or any Value type) I'm OK with that. It's just that in OpenCog/PLN literature a predicate node outputs TV. That isn't to say we can't change it. Or maybe we'd want to create a subtype GroundedBoolPredicateNode.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

@ngeiswei asks:

Do you mean that PlusLink, TimesLink, etc, could accept Value, not just NumberNode. So for instance one may write (TimesLink (TruthValueOf A) (TruthValueOf B))
Yeah, that + short-hands like StrengthOf and ConfidenceOf would make this possible.

You asked three questions, the answers are: No, yes, and yes.

First question: "Do you mean that PlusLink, TimesLink, etc, could accept Value," No. Links are still links, same as they ever were, and can only have atoms in their outgoing set; never values. However, many atoms are evaluatable. Such as EvaluationLink and ValueOfLink. If evaluatable atoms return values that can be interpreted as numbers, then PlusLink, TimesLink, etc, can add and multiply those numbers. '''This already works today''', it was implemented last summer, so that the neural net guys (@Necr0x0Der and crew) could use it for their neural nets. The working demo is here: https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm

Third question/remark: "StrengthOf and ConfidenceOf " Yes. It would take under an hour to implement these two, and everything else will just work automatically, already.

By the way, why does approximately half the code call it "mean", and the other half call it "strength"? Maybe we can pick one, or the other?

Finally, you made a comment:

simple arithmetic.

I mean "normal arithmetic" and not a simplified version of it. (there is a thing called "simple arithmetic" and I don't mean that) By arithmetic, it is meant any expression involving plus, minus, times, divide. Arithmetic excludes the possibility of infinite series, infinite sequences, min and max over infinite sets, limits, etc. so no sine, cosine, lim sup lim inf, integrals or analysis. Basically, arithmetic is of the same strength as first-order logic, while analysis requires second-order logic (i.e. requires the algebra of infinity).

@linas
Copy link
Member Author

linas commented Jan 26, 2019

Regarding this comment:

However
(cog-tv (cog-execute! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg"))))
is not equivalent to
(cog-evaluate! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg")))

The meta-goal is to fix this, so that it would be equivalent. The only reason that it is not equivalent is, as far as I can tell, is because cog-execute! does not know what to do with a GPN, and ignores it. Are there cases where these should not be equivalent?

@linas
Copy link
Member Author

linas commented Jan 26, 2019

@vsbogd writes:

For me main question of the issue is: PatternMatcher uses boolean logic, but EvaluationLinks (OrLink, AndLink, ...) return TruthValue which are converted into boolean by PatternMatcher code. How can we improve this conversion or eliminate it at all?

Bingo! Yes, exactly!

@linas, as I see you have suggested adding boolean values into atomspace but why not introduce BoolValue as regular atomspace value and modify PatternMatcher to work with it? Then we can change evaluation links (AndLink, OrLink, ...) implementation to return BoolValue. TruthValue for these links will be calculated by URE using specific logic rules. Or as Anatoly said add PatternMatcher specific PMAndLink, PMOrLink, ... to work with BoolValue.

Because this doesn't solve the main issue. Instead, it introduces a bunch of new atom types that are almost the same as some other atom types, but just slightly different. This is bad for multiple reasons:

  • Users will have a hard time understanding the difference between OrLink and PMOrLink. They will make mistakes about which one to use, when.
  • We will have to write complicated documentation explaining the differences. Which the user will have to read. The documentation will contain bugs.
  • We need to write brand-new code for PMOrLink, etc. which causes code-bloat: its very nearly identical to existing code, but different. And likely to contain tiny bugs in it.
  • We will double the the number of required unit tests to test the combinatorial explosion of different ways of combining things. The unit tests will contain bugs.
  • We will have to maintain backwards-compatibility with the current link types until everything is converted. Potentially keep backwards compat for more than a few years.
  • None of this actually improves performance, nor does it enable integration of SAT solvers, constraint solvers, or any other crisp-logic solvers.

Those are the reasons I dislike BoolValue.

@linas
Copy link
Member Author

linas commented Jan 26, 2019

@ngeiswei writes:

we should stop using AndLink to wrap groundable non-virtual clauses and instead use PresentLink,

Yes, we should. However, this is extremely difficult, and requires a major deep dive and restructuring of the pattern matcher. I tried doing this maybe a year ago or two, and after a long week, I failed. I kept the branch, pushed it to github somewhere in my repo, in case I wanted to try again.(here: https://github.com/linas/atomspace/tree/absent-link-rework ) There are multiple existing issues to cover this already. #215 -- #217 -- #1596 -- #1882

linas added a commit to linas/atomspace that referenced this issue Jan 26, 2019
Per discussion in opencog#2004 this should help PLN.
@linas
Copy link
Member Author

linas commented Jan 26, 2019

@ngeiswei

short-hands like StrengthOf and ConfidenceOf

This particular subtask seemed to be so straight-forward, that I simply just coded it up. It's here: pull req #2015 -- please review. I think it is more-or-less "complete". it can probably be extended to distributional TV's as well, or more generic values, but for now, its just for ordinary TV's,

@ngeiswei
Copy link
Member

By the way, why does approximately half the code call it "mean", and the other half call it "strength"? Maybe we can pick one, or the other?

The problem is that in the Simple TV code "strength" may either correspond to a probability or a fuzzy value.

But I would argue that even in the case of a probability the term "mean" is misleading. Does it mean the mean frequency of an occurrence of some positive event? Does it is mean the actually probability estimate of that positive event? Current it is implemented as the former, which is different than the later, as the actual probability estimate requires a prior (typically a Beta distribution with a Jeffrey or Bayesian prior), and the mean as currently implemented is actually the mode of the second order distribution.

Hopefully the Distributional Value will clarify the terminology, first by removing the confusion between fuzzy and probability, and second by presenting the estimates, mean, mode, standard deviation, etc, as obtained from second order distributions.

@linas
Copy link
Member Author

linas commented May 31, 2020

Closing. This was an interesting bu indeterminate conversation. The meta-issue was how to optimize the pattern engine for crsip true-false values, and pull req #2663 takes a step in that direction.

@linas linas closed this as completed May 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants