Making vocabulary subsets reinterpret automatically #183

Closed · Seanny123 opened this issue Jun 13, 2018 · 15 comments

@Seanny123
Collaborator

If I have one vocab which I:

  • split into two sub-vocabs
  • assign one module to each sub-vocab
  • feed both sub-vocab modules into a combined module that uses the original vocab

Ideally, I shouldn't have to do spa.reinterpret for each module. Here's an example of what I mean in code:

import nengo_spa as spa
import numpy as np

ss = spa.sym
D = 64  # the dimensionality of the vectors
rng = np.random.RandomState(0)

num_keys = {'ONE', 'TWO', 'THREE', 'FOUR'}
color_keys = {'BLACK', 'RED', 'WHITE'}

all_vocab = num_keys | color_keys | {'NUM', 'COLOR'}

top_vocab = spa.Vocabulary(D, rng=rng, name="top")
top_vocab.populate(';'.join(all_vocab))

num_vocab = top_vocab.create_subset(num_keys)
color_vocab = top_vocab.create_subset(color_keys)

with spa.Network() as model:
    num_vision = spa.State(num_vocab)
    color_vision = spa.State(color_vocab)
    combined = spa.State(top_vocab)

    num_vision * ss.NUM + color_vision * ss.COLOR >> combined

Here's how I need to modify the code to make it work:

import nengo_spa as spa
import numpy as np

ss = spa.sym
D = 64  # the dimensionality of the vectors
rng = np.random.RandomState(0)

num_keys = {'ONE', 'TWO', 'THREE', 'FOUR'}
color_keys = {'BLACK', 'RED', 'WHITE'}

all_vocab = num_keys | color_keys | {'NUM', 'COLOR'}

top_vocab = spa.Vocabulary(D, rng=rng, name="top")
top_vocab.populate(';'.join(all_vocab))

num_vocab = top_vocab.create_subset(num_keys)
num_vocab.name = "num"
color_vocab = top_vocab.create_subset(color_keys)
color_vocab.name = "color"

with spa.Network() as model:
    num_vision = spa.State(num_vocab)
    color_vision = spa.State(color_vocab)
    combined = spa.State(top_vocab)

    (spa.reinterpret(num_vision) * top_vocab['NUM'] +
     spa.reinterpret(color_vision) * top_vocab['COLOR']) >> combined

@tcstewar said this should happen automatically, but I might have just described it incorrectly to him.

@jgosmann
Collaborator

ss.NUM will use num_vision's vocab, which does not contain NUM (and likewise for color), because type inference for symbols is done from the other operand in a binary operation. Thus, you need to use either spa.reinterpret(num_vision) * top_vocab['NUM'], spa.reinterpret(num_vision, top_vocab) * ss.NUM, or spa.reinterpret(num_vision * top_vocab['NUM'], top_vocab).
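
For concreteness, a minimal sketch of those three variants, reusing ss, num_vocab, and top_vocab from the first post (only one variant should be active at a time):

with spa.Network() as model:
    num_vision = spa.State(num_vocab)
    combined = spa.State(top_vocab)

    # Variant 1: reinterpret the state, then bind with a pointer taken from top_vocab.
    spa.reinterpret(num_vision) * top_vocab['NUM'] >> combined

    # Variant 2: reinterpret into top_vocab explicitly so ss.NUM is inferred from it.
    # spa.reinterpret(num_vision, top_vocab) * ss.NUM >> combined

    # Variant 3: reinterpret the whole bound expression into top_vocab.
    # spa.reinterpret(num_vision * top_vocab['NUM'], top_vocab) >> combined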

@jgosmann
Collaborator

In other words, I don't see how this could happen automatically.

@Seanny123
Collaborator Author

Seanny123 commented Jun 14, 2018

Alternatively, would it be worth making this syntax more elegant by allowing a vocab to be used as a context manager? Like, if I wrapped a whole statement in with top_vocab as a way of forcing reinterpretation?
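
A rough sketch of what that hypothetical syntax could look like (with top_vocab as a context manager is not implemented; this only illustrates the suggestion, reusing the vocabularies from the first post):

with spa.Network() as model:
    num_vision = spa.State(num_vocab)
    color_vision = spa.State(color_vocab)
    combined = spa.State(top_vocab)

    # Hypothetical: everything inside is reinterpreted into top_vocab.
    with top_vocab:
        num_vision * ss.NUM + color_vision * ss.COLOR >> combined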

Upon further consideration, this is a really bad idea, because although it works in my ideal use case, it could lead to a lot of confusion if someone creates a network inside of the vocab context.

@jgosmann
Collaborator

I think I had some insight into the underlying problem that makes it hard to handle this nicely: we are using the vocab for two different things. First, it is used for the type system (ensuring that we only connect modules with matching vocabs and allowing type inference for the symbols). Second, we use it to denote which Semantic Pointers have meaning in a module (which are those we want to visualize in a plot). Unfortunately, the appropriate vocabularies in these two cases are not the same.

In the example above, all states should use top_vocab for a convenient formulation of the action rule:

num_vision * ss.NUM + color_vision * ss.COLOR >> combined

However, for visualizing the content of num_vision and color_vision, we want num_vocab and color_vocab respectively.

Not sure what the possible options are to improve on this. One suggestion would be to use the vocab that is appropriate according to the type system (top_vocab for all states here) and then use the restricted vocabs for plotting purposes. However, there is no way to set the vocab accordingly in the GUI at the moment.

@tcstewar
Collaborator

We are using the vocab for two different things. First, it is used for the type system (ensuring that we only connect modules with matching vocabs and allowing type inference for the symbols). Second, we use it to denote which Semantic Pointers have meaning in a module (which are those we want to visualize in a plot).

Hmm... I'd actually thought that the reason we have vocab subsets was to deal with this. I'd thought that vocab subsets were the same "type" as far as connecting modules goes (i.e. no transformation added), but they'd be treated differently in the GUI. What are the other use cases of vocab subsets? I'm sure we've talked about them before, but it's not coming to mind for me... The only other thing I can think of is that cleanup memories could use a subset to only clean up to a subset of the items, but that can also be handled in other ways...

@jgosmann
Collaborator

Not much thought went into subsets so far, as they were basically copied from the legacy SPA system. One of the main use cases to me was for clean-up memories, but #177 enforces the use of the mapping argument and adds some syntactic sugar, so that use case should fall away.

So far, subset vocabularies are completely separate vocabularies and require explicit reinterpret or translate. Technically, it is possible to add additional Semantic Pointers to a subset, in which case it ceases to be a subset, and it is unclear whether an automatic reinterpret would still be appropriate.

Note that making the subset type compatible with the superset vocabulary still requires explicitly getting the NUM and COLOR vectors from top_vocab and thus only partially solves the problem.

@jgosmann added this to the 0.6 milestone Jun 27, 2018
@jgosmann
Collaborator

jgosmann commented Jul 6, 2018

I spent some more thought on this and #69 (how to add different binding methods to Nengo SPA) and came to the conclusion that vocabularies might have originally been intended for plotting in the GUI, but we should move away from that. The GUI might base the default displayed Semantic Pointer expressions on the vocabulary, but finer-grained control over the displayed pointers should be provided by the GUI itself. It feels somewhat wrong to me to make special allowances for Nengo GUI in a project not dependent on Nengo GUI. Also, vocabularies now have another important task in Nengo SPA by ensuring type safety, and I am thinking about tying the specific instance of the employed binding operation to a vocabulary. That would allow different binding operations to be used in the same model while ensuring that they are not accidentally mixed (due to type safety). So far I couldn't think of any case where there is not a 1:1 mapping between the vocabulary and the binding operation to use. Also, some special vocabularies (like axis-aligned vectors, #129) might require a special binding operation (or, the other way around, a specific binding operation might require vectors created in a specific way).

For the original issue this means: Use a single vocab and create subsets for the plotting. If you want to use Nengo GUI, you might have to do what you are doing now unfortunately, until the SPA plots get improved.
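
To spell out that recommendation with the example from the first post (a sketch; the subsets are kept only for analysis and plotting, not used as module vocabs):

import nengo_spa as spa
import numpy as np

ss = spa.sym
D = 64
rng = np.random.RandomState(0)

num_keys = {'ONE', 'TWO', 'THREE', 'FOUR'}
color_keys = {'BLACK', 'RED', 'WHITE'}

top_vocab = spa.Vocabulary(D, rng=rng, name="top")
top_vocab.populate(';'.join(num_keys | color_keys | {'NUM', 'COLOR'}))

# Subsets only for analysis/plotting, not as module vocabs.
num_vocab = top_vocab.create_subset(num_keys)
color_vocab = top_vocab.create_subset(color_keys)

with spa.Network() as model:
    # Every state uses top_vocab, so the action rule needs no reinterpret.
    num_vision = spa.State(top_vocab)
    color_vision = spa.State(top_vocab)
    combined = spa.State(top_vocab)

    num_vision * ss.NUM + color_vision * ss.COLOR >> combined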

@Seanny123
Collaborator Author

For the original issue this means: Use a single vocab and create subsets for the plotting. If you want to use Nengo GUI, you might have to do what you are doing now unfortunately, until the SPA plots get improved.

Does this also imply that if the SPA plots were improved, Nengo SPA would then add better support for type safety? Specifically, "making the subset type compatible with the superset vocabulary"? Alternatively, how did you imagine this would proceed?

@jgosmann
Collaborator

jgosmann commented Jul 9, 2018

Everything in Nengo SPA would stay as it is right now.

@jgosmann
Collaborator

jgosmann commented Jul 11, 2018

I thought a bit more about this and think there are in principle three options:

  1. Different vocabularies are incompatible. This is the current state and what I am advocating, mostly because it keeps the Nengo SPA code simpler and does not have to deal with a bunch of conceptual problems (which I explain in more detail when discussing the other options). For the example originally proposed, this means using top_vocab everywhere, and the GUI should provide a possibility to restrict the SPs displayed. If you want, you can still use different vocabs to achieve this effect, but you then have to accept dealing with some reinterprets. I could see Nengo SPA providing some way to set the set of relevant SP expressions for visualization on a module, separate from the actual vocab, if someone wants to come up with a way (syntax-wise) of doing this.
  2. Make subsets of vocabularies compatible with the superset in an ad-hoc fashion. But should subsets be compatible with each other? What if you add new pointers to a subset after creating it? I don't want to add such an ad-hoc feature because such things can quickly make an API inconsistent and full of corner cases.
  3. Exactly track the vocabulary types across operations with a general case solution.
    • c = a + b with vocabularies Va and Vb would give Vc = Va ∪ Vb. Feeding from a subset to a superset would be allowed (not sure about the other direction). But what about keys that exist in both vocabularies? Maybe keys must be globally unique, but this is quite a deviation from what is implemented now.
    • c = a * b would give Vc = Va ⨯ Vb. Note that Vb ⨯ Va ≠ Va ⨯ Vb in general for binding operations other than circular convolution. Thus, it might be easy to get confusing errors from stating things in the wrong order (though when using circular convolution, both orders should be treated the same).
    • c = a * b * ~b should give Va. Not sure how this would be implemented, but it is probably possible. What would be the type of a sole ~b?
    Doing all of this requires tracking how vocabularies are constituted in a tree structure, with methods for simplification and comparison, so it would add quite some complexity. Also, the assumption that vocabularies combine in these ways might not always hold. For example, when adding elements into a stack-like or serial memory buffer, one might have the equation m_{i+1} = m_i * t + a, which would have the vocab Vm ⨯ Vt + Va; this changes depending on the number of elements, but we need something static. Another problem is that this requires the vocabularies to be known in advance, whereas for symbolic expressions we infer vocabs from the other operand in operations. This would no longer be possible, which exemplifies that the current SPA fundamentally assumes that the things being operated on are in the same vocabulary.

I think it might be possible to find consistent answers to the open questions in option 3 and implement a working SPA system based on that, but it is a major job with what seems to me minimal benefit.

Finally, I just realized that the original problem might be partially addressed by allowing the type checking to be disabled (which would only require dimensions to match up and would basically do reinterprets). Something like:

with spa.unsafe:
    num_vision * top_vocab['NUM'] + color_vision * top_vocab['COLOR'] >> combined

@jgosmann
Collaborator

Another reason against option 3: I was thinking about using vocabs to track which binding operation to use, but this only makes sense when both operands are always from the same vocab.

@pblouw
Contributor

pblouw commented Jul 26, 2018

One benefit of having an option to disable the type checking is that it would allow Vocabulary and SemanticPointer objects to be more easily used for HRR operations independently of Nengo (i.e., as an HRR library of sorts). Right now, you cannot perform operations involving SPs from different vocabularies without reinterpreting everything into a single vocabulary (e.g. vocab_a['KEY'] * vocab_b['VALUE'] gives a type error).

One common use case of vocabularies involves defining some SPs in terms of others, not necessarily from the same vocabulary. For example, if I have one vocabulary for goals, and another for objects, I might want to make a new vocabulary that includes SPs for actions that are related to these goals and objects in some way, e.g. ACTION_A = GOALS * (GOAL_1 + GOAL_2) + OBJECTS * (OBJECT_1 + OBJECT_2). If I want to match these goals and objects to appropriate actions, it is important that OBJECT_1 has the same underlying value in both my object vocab and in my action vocab.

Is there a recommended approach to doing this sort of thing in nengo_spa? I currently see two options: either define everything in a single vocab, or call reinterpret when needed to ensure effectively the same result when defining a complex SP made up of simple SPs from different vocabs.

I might be missing something, but reinterpret seems to basically act as a flag for ignoring the type system (since once you reinterpret an SP, there are no restrictions on how it can be used symbolically with other SPs as far as I can tell). But if so, I think it would make sense to have options for ignoring type checking more universally, either with a context manager as above, or maybe by allowing vocabs to be set as type-free somehow, so that the SPs in them can be freely combined with SPs in any other vocab.

@jgosmann
Collaborator

My viewpoint is that the recommended approach should be to define a single vocabulary. You can keep lists or sets of the keys of the subsets around and use their union to easily define the full vocabulary, as done in @Seanny123's example in the first post.
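
As a sketch of that approach for the goal/object/action example above (all names hypothetical), the structured action pointer can be defined inside a single vocabulary via populate:

import nengo_spa as spa
import numpy as np

D = 64
goal_keys = {'GOAL_1', 'GOAL_2'}
object_keys = {'OBJECT_1', 'OBJECT_2'}

vocab = spa.Vocabulary(D, rng=np.random.RandomState(0))
vocab.populate(';'.join(goal_keys | object_keys | {'GOALS', 'OBJECTS'}))

# Because everything lives in one vocab, OBJECT_1 etc. have the same underlying
# vectors wherever they are used, including inside the definition of ACTION_A.
vocab.populate(
    'ACTION_A = GOALS * (GOAL_1 + GOAL_2) + OBJECTS * (OBJECT_1 + OBJECT_2)')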

The main situation where subset vocabularies are actually still needed seems to me to be when comparing a semantic pointer to the vocab for plotting, as some elements in the vocab might be irrelevant. In that case it is easy to create the desired subset vocabulary, and spa.similarity does not do any type checking, so usually no reinterpret will be necessary.
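
For example, a minimal sketch of comparing probed data against only the number pointers, assuming the model and vocabularies from the first post (the probe is added here just for illustration):

import nengo

with model:
    # Probe the state output for offline analysis.
    p_num = nengo.Probe(num_vision.output, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(0.2)

# spa.similarity performs no type checking, so the subset vocab can be used
# directly to compare against only the number pointers.
sims = spa.similarity(sim.data[p_num], num_vocab)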

One additional situation where multiple vocabs might be desired is when the max. similarity constraint should not be enforced across, for example, the goal and object vectors. But this seems to be problematic when combining goals and objects in a single vector, and thus I'm not convinced that this needs to be any easier than it already is. It is also slightly different from the issue of how subvocabs should be treated relative to their parent vocabulary, because instead of creating subvocabs, this requires creating supersets.

@pblouw
Contributor

pblouw commented Jul 26, 2018

OK, that sounds reasonable - the idea being to use sets of keys to keep track of different items of interest rather than grouping them into separate vocabs.

@jgosmann
Collaborator

Closing as it seems everyone is more or less convinced that my proposal is the way to go (and the algebra PR #198 has been merged).
