Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize and complete Value #2013

Closed
ngeiswei opened this issue Jan 24, 2019 · 18 comments
Closed

Generalize and complete Value #2013

ngeiswei opened this issue Jan 24, 2019 · 18 comments

Comments

@ngeiswei
Copy link
Member

ngeiswei commented Jan 24, 2019

I suggest a few changes to generalize and complete Value (also known before as Proto Atom).

  1. Implement MapValue, to represent any mapping Value -> Value, with MapValue::get_value, etc, to return the associate value of a given value.
  2. Move Atom::compute_hash() and Atom::get_hash() to Value, but not Atom::_content_hash in order to have Value as light weight as possible. Of course Value::get_hash would directly call compute_hash while Atom::get_hash would first check if Atom::_content_hash is defined. Since values tends to be ephemeral, not caching their hash seems sensible.
  3. Replace Atom::_values by Atom::_value. Where the C++ type of _value would be mutable ValuePtr instead of mutable std::map<const Handle, ValuePtr>. Atom::getValue can still be kept for backward compatibility but would only work when _value is a MapValue by calling MapValue::get_value, and raise an exception otherwise.
  4. Implement more Value types, IntValue, BoolValue, etc. (letting aside the sequence or not debate).

I know that adding more Value types requires to update the persistent storage code, but that sort of elementary types and constructs are few and once it is done, it is done. And this offers a massive freedom to the user. Even the freedom to only attach a boolean and nothing else to an atom (just use a BoolValue for Atom::_value), or a probability (just use a FloatValue for Atom::_value). This is as lightweight and flexible as it gets.

@vsbogd
Copy link
Contributor

vsbogd commented Jan 24, 2019

Nil, how do you see the Atom::setValue(key, value) implementation? Should we (1) have two different signatures: setValue(key, value) and setValue(value) or (2) setValue(key, value) should check if it is only set and then keep it as single value and replace it by ValueMap when second value is added?

@vsbogd
Copy link
Contributor

vsbogd commented Jan 24, 2019

We could make a storage API more extendible if allow adding specific storage procedure for each Value (or Atom) into ValueFactory (ClassServer). It is similar to way of storing factory methods which is used now.

@ngeiswei
Copy link
Member Author

@vsbogd , I wonder about (3), which would be a mixture of option (1) and option (2)?

@ngeiswei
Copy link
Member Author

ngeiswei commented Jan 24, 2019

Regarding the RAM footprint of my proposal:

  1. Lower if the atom doesn't hold any value, as an empty ValuePtr is 8 bytes I believe, while an empty std::map is at least 16 bytes.
  2. Lower if the atom holds only one value, clearly justified by the lack of std::map overhead.
  3. Higher if the atom holds multiple values, +12 bytes I believe, 8 for the ValuePtr and 4 for Value::_type of MapValue.

So whether it is better or worth depends on the usage. It seems to me the majority of atoms in an atomspace have no value so it would be a pro in that case. But of course it ultimately depends on the usage.

@vsbogd
Copy link
Contributor

vsbogd commented Jan 24, 2019

@ngeiswei, excuse me, I didn't get your answer here #2013 (comment) May be my question #2013 (comment) was confusing, let me rephrase it. If Atom::_values is replaced by ValuePtr then how Atom::setValue(key, value) will work? And how user will be able to set single value instead of ValueMap?

@ngeiswei
Copy link
Member Author

@vsbogd , my tentative suggestion was to overload setValue(value) as you suggested in (1), which should in fact be the primary way to set Atom::_value, and change the behavior of setValue(key, value) so that one doesn't need to create a MapValue if there is only one value, as you suggested in (2). So it is a mixture of (1) and (2). I don't know if that's the best but it seems reasonable enough to me.

@vsbogd
Copy link
Contributor

vsbogd commented Jan 24, 2019

Ah, ok, I've got it. The only disadvantage of having both setValue(value) and setValue(key, value) in API I see is that it is not obvious that they overrides results of each other; i.e. setValue(value)deletes the map andsetValue(key, value)` deletes the value. But unfortunately I don't know what can be suggested instead.

What is the main motivation of this issue? Is it adding MapValue and reusing it to implement value storage in Atom? Or is it decreasing amount of memory which Atom consumes?

@ngeiswei
Copy link
Member Author

My primary motivation is to add MapValue, then I used the memory aspect to try to sell it to @linas :-)

Of course we don't need to replace Atom::_values by Atom::_value to justify adding MapValue but I thought it was a nice side effect.

@ngeiswei
Copy link
Member Author

@vsbogd, I'm realizing a problem with (2) is that there is no way to know whether the single value is being updated or if a new value is being inserted, unless the key is somehow stored somewhere, which requires an extra member. So, unless I misunderstood (2) I would now bend towards (1).

@ngeiswei
Copy link
Member Author

ngeiswei commented Jan 24, 2019

setValue(value) deletes the map and setValue(key, value) deletes the value

Yes, I suppose instead setValue(key, value) should raise an exception if _value is not a MapValue (and not an null ValuePtr).

@linas
Copy link
Member

linas commented Jan 26, 2019

OK, so .. regarding 1+3 Yes, during the original value design effort, I very explicitly thought a very long time about having a MapValue in the original design, and I explicitly rejected that, because it seemed to be asking for trouble in a bunch of different ways. For example:

  • User A creates a MapValue, sticks it on an Atom, puts a bunch of stuff there. User B doesn't bother to check first, and just inserts a TV, completley erasing the MapValue. Ouch.

  • So you can say "OK, user B, you are a bad person! You should always check to see if there is a MapValue there first! If it is not there then you should create it!" Triple Ouch. First, you call User B an idiot (you've seen what happens when I do that, right?) and then you demand that user B write some complicated code that needlessly wastes CPU cycles. This adds complexity, increases the chance of a bug, and it also wastes CPU time.

  • Where are TV and AV stored? In the current design, they have unique, well-known keys, and can always be found. In the 1+3 design, its no longer clear how to find them, where to store them.

  • If you have to always create a MapValue first, before adding a TV, then why not just ... hard-code it so that it is there by default?

  • User C, who is one of those very enthusiastic and creative kind of people who loves inventing new things, decides to invent a new key-to-a-key-to-a-key-to-a-value system, using MapValue, creating some compilcated, impossible-to-debug network that is not maintainable. I have known plenty enough enthusiastic and creative people: one of them "invented a new musical instrument" with my guitar. Unfortunately, my guitar ceased to exist in the process.

  • Encoding MapValue in the storage backend requires a bunch of new work, cause its a whack format. It doesn't work like the rest.

  • Everything you want to do with MapValue, you can already do with LinkValue. Mostly, we need to improve the LinkValue demo to show how this is done.

For all of the above reasons, which I really did think about long and hard during many walks morning and evening in many different cities, I decided MapValue was a bad idea, and to instead hard-code one single map into the atom itself.

@linas
Copy link
Member

linas commented Jan 26, 2019

The users who really want a hierarchical systems can use LinkValue to implement it. By "hierarchical" I mean the industry-standard named-path concept: /key-a/key-b/key-c/my-value -- the key concept of unix file systems and of URL's are really just concepts of hierarchical directories, and the LinkValue was invented to solve that. That's why LinkValue exists: so that MapValue would not have to.

Also, to be clear: take a look at how Plan 9 treats files and directories. If you haven't looked at Plan 9, its work a look. Roughly speaking, it allows any part of the file system to behave like the linux /proc and the linux /sys file systems. I've been using Plan 9 as a kind-of inspiration for how StreamValue might work. StreamValue is effectively a virtual "file", instead of a real one.

So - again -- everything you want to do with MapValue can be done with LinkValue, and some elbow grease. That is why LinkValue was invented. (And why LinkValue is a sequential list, and not an unordered set or other structure. I thought this stuff out. It's not random code.)

@linas
Copy link
Member

linas commented Jan 26, 2019

To be super-clear about this: instead of a MapValue, and all the headache that causes, instead, just write some C++ code that parses slash-separated path-names. So, for example:

Atom::SetDirValue(char* path, ValuePtr v)
{
   ValuePtr dir = v;
   char * p = strrchr(path, '/');  // find last separator
   while (p) {
      dir = createLinkValue( createStringValue(p), dir);
      *p = 0;
      p--;
      p = strrchr(p, '/');
   }
   static Handle mainkey = createNode(PREDICATE_NODE, "*-key-for-directory-*");
   setValue(mainkey, dir);
}

Right? This uses plain-old ordinary LinkValues to implement what looks like a Unix file directory path. You could even make it parse things like key://some/path/to/my/value and you can do that without having to invent a new MapValue.

@linas
Copy link
Member

linas commented Jan 26, 2019

Re: 4) IntValue, BoolValue -- I thought long and hard about that too, way back when I was designing the whole thing. I realized that sooner or later, users would want these. I also realized that

  • Most users who say they want these probably have terrible reasons for wanting them.
  • The users who want that have an existing work-around: FloatValue really can store ints and bools, if you try hard enough.
  • If the users still cannot figure it out, we can code them a C++ wrapper that converts ints and bools to floats, under the covers, and the user will never know.
  • But mostly try to talk them out of it, because they really almost surely don't need it.
  • If some day, someone who is really sharp, smart, capable shows up, and they are willing to write all of the code needed to support InvValue, including the unit tests, and the needed additions to the SQL backend, the scheme and the python bindings, then let them do that.
  • But if someone that smart shows up, they will probably realize that there is not much use to IntValue, anyway. So they probably won't do the work.

Yes, I really thought about these things a lot. So my plan was to resist code-bloat for as long as possible.

I'm very very serious about code-bloat. When you create code that no one actually needs:

  • you now have a lot of code. Its probably buggy.
  • you have to write unit tests for it.
  • if there is a design flaw, and you want to fix that flaw, you have to find a backwards-compat solution.. But if there are no users of that code, its a waste of time to try to be backwards compat.
  • If only one person used the code once, three years ago, you have Zombie code.
  • So now you are maintaining zombie-code, half-alive, that is old and rotten and has flesh falling off of it.
  • newcomers like anatoly or vitaly see this zombie code and get the wrong idea. Either they don't realize its a zombie that should be killed, or they do realize its a zombie and say "I'm leaving, this place stinks".

So I really want to resist adding new features until they are really really needed.

@ngeiswei
Copy link
Member Author

The good thing about MapValue is that it offers logarithmic retrieval. Yes it can be implemented on top of LinkValue, and FloatValue can be used to store other types than doubles, but having dedicated value types makes it a lot easier to use. As of right now if one wants to store an int in FloatValue one has to write extra casting code and stay consistent with how the value is being treated. I cannot comment on the SQL backend because I've never touched that code, it just seems to me we'll want that eventually. I do agree about postponing its implementation till it's really needed though.

@noskill
Copy link
Contributor

noskill commented Mar 6, 2019

We can store all the data in some atomspace owned Map[Atom] -> Map[Value] -> value. Then atom without values will require 0 bytes instead of 8 or 16

@linas
Copy link
Member

linas commented Mar 6, 2019

I thought I gave some clear and cogent arguments about why this should not be done, and how existing alternatives are adequate. I'd like to close this issue for further discussion. If there is an actual technical problem that needs an actual technical solution, a new, different issue should be opened.

Regarding @noskill comment about where values are stored: yes, the implementation of issue #1967 would alter where, and how values are stored.

As to sizes of atoms: there is always a space-time tradeoff. I would like to have access to values be "as fast as possible", and if this means fatter atoms, so what. Currently, atoms average out to 1.5KBytes in size, almost all of that being the incoming set.16 extra bytes is a "drop in the bucket".

Can I close this issue?

@ngeiswei
Copy link
Member Author

ngeiswei commented Mar 7, 2019

I'm neutral on whether this issue can be closed. The best way to tell what's best would be to try but that takes a lot of work, and it's certainly not an urgent matter.

@linas linas closed this as completed May 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants