# arXiv '22 | How to build a cognitive map: insights from models of the hippocampal formation. #20
## Abstract
Learning and interpreting the structure of the environment is integral to guiding flexible behaviors for evolutionary viability, and the concept of a cognitive map has emerged as one of the leading metaphors for these capacities.
In this perspective, we …
## Introduction
Cognitive maps were originally proposed as internal neural representations affording flexible behavior, such as planning routes or taking never-before-seen shortcuts. More recent descriptions formalized this view with the key concept of generalization. Here, the fundamental role of cognitive maps is to organize knowledge, facilitating generalization of this knowledge to novel experiences, and thus enabling the rapid inference from sparse observations that characterizes biological intelligence. So here we characterize this role as few-shot learning, which is already implemented by James #16 .
Entorhinal cortex is known to support spatial cognition, and the characteristic hexagonal firing pattern of entorhinal grid cells is also found when animals navigate abstract spaces, e.g. in human ERC & mPFC, and monkey mPFC. These parallels in representation suggest the mechanism underlying the spatial cognitive map might, in fact, be an instance of a more general coding mechanism capable of building abstract cognitive maps covering any domain. Looks like we need to refer to the whimsy from Numenta!
In recent years, many models of the hippocampal formation have attempted to do this, providing explanations of neural data and offering falsifiable predictions. While greatly informative, these models differ in their focus and the language of their formalism, obscuring the overall direction and vast potential of this work. The aim of this Perspective is to clarify the common theory underlying these models, while providing novel results offering normative explanations for a range of old and new neural phenomena, just like #19 does.
## The cognitive mapping problem
Cognitive maps organize knowledge to afford flexible behavior.
In order to achieve this, there are some requirements and desiderata for the neural representations of the cognitive map. Here, we describe these computational considerations and explain the models relevant to each. We aim to provide a clear conceptual understanding of the interlinked ideas.
## Reinforcement Learning and planning
To afford successful behavior, cognitive maps must represent state (a particular configuration of the world). Reinforcement Learning (RL) is a formalism of this concept: actions are taken based on the current world state. Representing the entire world state is often infeasible, due to the "curse of dimensionality", so we need an appropriate state abstraction; learning, or attending to, the appropriate abstraction is a central issue of the cognitive mapping problem. We have to note that RL typically assumes the underlying state-space is fixed, and that the state-space in RL is tightly linked to behavior (through rewards, values, and policies). Classic (model-free) RL slowly learns the value of states (value-based), or which actions are good in which states (policy-based), and requires no knowledge of how states relate to each other. While this is provably optimal in the long term, value-based learning is often inflexible and slow to learn, especially in situations where the dynamics of the environment keep changing. Knowing the relationships allows you to flexibly plan routes between any start and any goal state. Unfortunately, traditional planning mechanisms (such as tree-search, which only has information about local transitions) are computationally costly, but alternatives do exist, e.g. Silver et al. 2018, Botvinick et al. 2012, Bush et al. 2015. More broadly, with a clever representation of the state-space (see next subsection), the cost of planning can be reduced, or even completely avoided (e.g. the successor representation, though it is passive and policy-dependent, so not ideal). This is a powerful way to formalize the central goal of cognitive maps. A minimal sketch of the slow, value-based learning described here is given below.
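As a concrete illustration of slow, model-free value learning, here is a minimal sketch of tabular TD(0) on a toy chain environment. This is my own toy example, not from the paper; the chain, learning rate, and discount are assumptions for illustration.

```python
import numpy as np

# Toy 5-state chain; the agent steps right until the terminal rewarded state.
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)  # value estimates, learned slowly over many episodes

for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                                # fixed policy: step right
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward on reaching the end
        # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V)  # values fall off with distance to reward (~gamma^k), no map needed
```

Note how nothing here represents how states relate to one another; that is exactly the flexibility the map-based approaches below try to recover.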
## Space as a state-space
The abstract location can be represented in a variety of ways; for example, as a discrete code with one element per location, or as a coordinate system such as (x, y) positions (see "Path integration and compression" below).
The choice of which representation to use has major consequences. For example, a discrete code must learn every relationship between locations anew, whereas coordinates imply relationships between locations that were never experienced together.
These two representation types are analogous to place and grid cells in the hippocampal formation, respectively.
By a clever choice of representation, grid cells prevent the need for costly online computation.
## Non-spatial state-spaces
While it is easy to intuit good state-spaces in physical space, this problem becomes less clear in non-space. One promising approach, derived from RL, is to cast spatial learning as understanding relationships on a graph, with the previously mentioned coordinates built on such a graph (instead of on continuous space). This is a re-conceptualization of a map in terms of its connections (topology), as opposed to distances (geometry). Graphs afford planning via the transition matrix T (just like tree-search). The problem of building graphs for cognitive maps is the same problem as building state-spaces in reinforcement learning. However, once the state-space is defined, there is a further choice of how each state is actually represented. This is fine as long as there is no state coupling, but the representation of states can vary widely, with different representations suitable for different functions. A clever choice of representation can reduce online value/policy computations. This has allowed normative mathematical theories to predict neural representations.

Reinforcement learning is concerned with taking appropriate actions at specific states. RL state-spaces define graphs with a transition matrix, which means that, whatever the graph representation is, we represent connections between states in terms of transition-distance. If we train a model (e.g. TEM) to predict the next state (only one step away from the current state), then this graph representation will emerge. Another graph representation, the successor representation (SR) #11, is particularly relevant to cognitive maps. Critically, if we represent connections between states in the world in terms of SR-distance, then computing value is easy, since the SR is one half of the value computation (V = MR, where M is the SR matrix and R the reward vector; see the sketch below). Is the OVC a kind of successor representation? But the OVC often contains vector information; it resembles …

One prominent issue with the SR, however, is its policy-dependence. This means that when rewards move - or, worse, when obstacles appear - value calculations using the SR are no longer optimal. Piray et al. 2020 address this problem using linear RL. The required default representation (DR) resembles the SR. The model further provides **a novel account of how to build world representations compositionally out of component cell representations** (e.g. how grid and border cells interact to represent the insertion of a barrier; see Mark et al. 2020 for more details).
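To make the SR idea concrete, here is a minimal sketch (my own illustration, not code from the paper): compute the successor representation M = (I - γT)⁻¹ of a random-walk policy on a small ring graph, then read out value with a single matrix-vector product.

```python
import numpy as np

n, gamma = 6, 0.9
# Random-walk transition matrix on a 6-state ring (note: policy-dependent!)
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

# Successor representation: discounted expected future occupancy of each state
M = np.linalg.inv(np.eye(n) - gamma * T)   # M = I + gamma*T + gamma^2*T^2 + ...

R = np.zeros(n)
R[3] = 1.0                 # reward appears at state 3
V = M @ R                  # value is "one half of the computation" away
print(V)                   # values decay smoothly with ring distance to state 3
```

If the reward vector R changes, V is recomputed instantly; but if the policy or the obstacles change, M itself is stale - exactly the policy-dependence problem that the DR / linear RL work addresses.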
## Latent states and sequence learning
How do we know which graphs to build? Our world is not "fully observable"; instead, we face "partially observable" problems and must infer latent state representations. We have to infer latent states from sensory sequences, e.g. the clone-structured cognitive graph (CSCG); building a latent state-space map can be used to afford different behaviors in sensorially identical situations. The CSCG model is an elegant approach for building de-aliased state-spaces. Here, hippocampus contains multiple "clone cells" for each sensory observation. This model uses Bayesian inference to learn the transition weights between clones.
These transition weights are analogous to the transition matrix for graphs, but critically the state-space is learned, rather than provided by the modeller, which is the critical difference between CSCG and the following models. CSCG infers the whole latent space within the hippocampus (as opposed to the cortical input (maybe designed by the modeller) to the hippocampus). This enables learning rules to be local, biologically plausible, and fast. It looks like the proposal I mentioned in #16 . The hippocampus may have its own dynamics, which supports the graph-learning process of entorhinal cortex. By contrast, CSCG has to learn each map de novo and cannot benefit from having learnt similar maps before. It is exciting to think how these benefits may be combined, e.g. complementary maps in hippocampus and cortex. CSCG is closely related to hidden Markov models. From a sequence of sensory observations x_1, ..., x_N, the model infers a sequence of latent clone states z_1, ..., z_N. Modelling the full sequence of observations is then (in standard HMM form):

P(x_{1:N}) = \sum_{z_{1:N}} P(z_1) \prod_{n=2}^{N} P(z_n \mid z_{n-1}) \prod_{n=1}^{N} P(x_n \mid z_n)
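A hedged sketch of that likelihood computation via the standard HMM forward algorithm (this stands in for, but is not, the CSCG code; the toy transition and emission matrices are made up):

```python
import numpy as np

# Toy HMM standing in for a CSCG: n_z latent (clone) states, n_x observations.
n_z, n_x = 4, 2
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_z), size=n_z)   # T[i, j] = P(z_next = j | z = i)
E = rng.dirichlet(np.ones(n_x), size=n_z)   # E[i, k] = P(x = k | z = i)
pi = np.ones(n_z) / n_z                     # uniform prior over the first state
# In a true CSCG, E is deterministic: each clone emits exactly one observation.

def sequence_likelihood(xs):
    """P(x_1..x_N): forward algorithm sums over all latent clone paths."""
    alpha = pi * E[:, xs[0]]
    for x in xs[1:]:
        alpha = (alpha @ T) * E[:, x]
    return alpha.sum()

print(sequence_likelihood([0, 1, 1, 0]))
```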
## Path integration and compression
Inferring latent states is really a problem of understanding where you are in an abstract space. Entorhinal grid cells are considered an attractive substrate for path integration of two-dimensional spaces.
Using (x, y)-coordinates to organize graphs offers a benefit compared to representing every individual connection between nodes: adding a new node immediately implies all other connections without needing to observe those relationships explicitly (even without off-line planning). Path integration doesn't require knowledge of the entire world; it treats all nodes equally, and relationships are structured (it does not care about the specific meanings of relationships, only about how to integrate among them). As such, only the few rules of path integration need to be known, not every possible relationship; that's why TEM can transfer among environments where the same rules apply.
Not all graphs, however, can be path-integrated, since consistent actions do not always exist across graphs (for instance, social networks merely describe generic relationships). To do path integration, continuous attractor neural networks (CANNs) receive velocity input. With an appropriate set of weights, CANNs path integrate, with different cell classes (head-direction cells, place cells, grid cells) modeled with different weights (cannot be unified in one single framework?). Remarkably, CANNs really exist in nature; attractor manifolds are found in rodents. Other path-integrating models exist: velocity-coupled oscillators (VCOs) suggest path integration (along an axis) via interference between theta oscillations and velocity-dependent dendritic oscillations, with their phase difference indicating path-integrated distance along an axis (this looks like a plane wave!). Here, grid cells are the sum of three such neurons with preferred axes at π/3 relative angles. One major limitation of CANNs and VCOs, however, is that the weights of the recurrent weight matrix, W, are carefully selected and not learned from sensory experience. However, it is easy enough to set up path integration as a learning problem via predicting observations. Neural units in these models form periodic representations, but they are often amorphous or 4-fold symmetric grids. Sorscher et al. 2019, however, demonstrated that the 4- to 6-fold symmetry transition is governed by a single property: a third-order regularization term on grid-cell activity, like the following regularization loss term?

```python
import tensorflow as tf

# g_inf: list of per-timestep inferred grid activities, each (batch, n_grid)
L_reg_g3 = tf.reduce_sum(
    tf.stack([tf.reduce_sum(g ** 3, axis=1) for g in g_inf], axis=0), axis=0)
```

Indeed, this is easily implemented by the biological constraint of ensuring neural activity is positive.
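As an aside, the "sum of three plane waves at π/3 relative angles" claim is easy to check numerically; a minimal sketch (my own toy, with an arbitrary spatial scale):

```python
import numpy as np

# Three plane-wave (VCO-like) components with preferred axes at 0, pi/3, 2*pi/3.
k = 2 * np.pi / 0.5                          # spatial frequency (arbitrary scale)
angles = [0, np.pi / 3, 2 * np.pi / 3]
ks = [k * np.array([np.cos(a), np.sin(a)]) for a in angles]

xs = np.linspace(0, 2, 200)
X, Y = np.meshgrid(xs, xs)
pos = np.stack([X, Y], axis=-1)              # (200, 200, 2) grid of locations

grid = sum(np.cos(pos @ ki) for ki in ks)    # peaks form a hexagonal lattice

# Path integration, VCO-style: each component's phase advances by k_i . v * dt,
# so the summed pattern translates with the integrated velocity.
v, dt = np.array([0.1, 0.0]), 1.0
phases = [ki @ v * dt for ki in ks]
shifted = sum(np.cos(pos @ ki - ph) for ki, ph in zip(ks, phases))
```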
## Generalization
Generalization, or the transfer of knowledge from one situation to another, is the substrate of the profound behavioral flexibility exhibited by animals.
Generalizing with graphs, however, is hard, as they require perfect alignment, which is NP-hard and thus impractical in most situations. Generalizing with periodic path-integration representations, on the other hand, is easy, since all positions are treated equally; e.g. we bind the rules, rather than the entire graph, to each part. This is generalization of relational knowledge. What kinds of cells support generalization?
Spatial generalization, at least, seems to exist in entorhinal cortex and is consistent with path integration. To actually make sensory predictions, however, you need to know more than just abstract knowledge: you need to know how it interacts with real-world representations. One influential proposal is that hippocampal cells reflect this interaction, with abstract knowledge from MEC and sensory knowledge from LEC combined rapidly (fast-mapped) in hippocampus. We have seen models that build latent state representations, and models that path integrate. If these principles could be combined, we could build a powerful system that …
Recall the probabilistic interpretation of path integration. While TEM and SMP are conceptually the same model, they have different implementations; a loose caricature of their shared structure is sketched below.
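The caricature: path-integrate a latent code, then bind it with sensory input into conjunctive hippocampus-like memories. All sizes, nonlinearities, and the memory stand-in are my own assumptions, not the published TEM/SMP architectures:

```python
import numpy as np

rng = np.random.default_rng(1)
n_g, n_x = 16, 10
# Action-specific recurrent weights: the "rules" of path integration
W_a = {a: np.eye(n_g) + rng.normal(scale=0.1, size=(n_g, n_g))
       for a in ["N", "S", "E", "W"]}

memory = []  # crude stand-in for Hebbian memory of hippocampal conjunctions

def step(g_prev, action, x_obs):
    g = np.tanh(W_a[action] @ g_prev)   # path integrate in the latent space
    p = np.outer(g, x_obs).ravel()      # bind latent code with sensory input
    memory.append(p)                    # store the conjunction as a memory
    return g, p

g = rng.normal(size=n_g)
x = np.eye(n_x)[3]                      # one-hot sensory observation
g, p = step(g, "E", x)
```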
## Novel interpretations, integrations, and predictions
These models often explain neural phenomena in seemingly divergent ways, and there are many neural phenomena that remain perplexing. Here we consider how these ideas can be integrated in order to model and understand cognitive maps at a deeper level, and offer novel accounts of several neural phenomena through a formal lens.

### Non-spatial hippocampal cells are latent state representations for generalization
We have argued latent state representations serve two purposes: de-aliasing sensorially identical situations, and generalizing structural knowledge to novel experiences.
These arguments suggest two things:
As a didactic example, a spatial alternation task can be "un-rolled" into a "big-loop" state-space, which is the latent space for the task, and de-aliases the common "trunk" section. This "big-loop", however, ignores spatial knowledge - understanding the big-loop alone does not let you know you are back in the same place - so to generalize spatial knowledge you additionally need a spatial representation. Hippocampal cells in this task indeed code for both space (place cells) and the big-loop (splitter cells).
### Complementary maps in hippocampus and cortex
Does hippocampus map space, or is its role one of memory?
The observation above is a distinction that offers a potential unification of the hippocampus' role in mapping and memories: it is easier to learn how to generalize if each (latent) state-space is already built - just as with prediction errors, once the association has been built, the prediction error can be backpropagated (maybe GFlowNet will help?). More precisely, should all states of the world be appropriately separated, and the relationships between states known, cortex can receive high-fidelity training signals (since predictions, i.e. the generative process of cortex, can be compared to a de-aliased state-space), thereby significantly reducing the burden of learning. This means entirely novel sets of relationships can be efficiently learned as follows:
This proposal follows complementary learning systems theory, where cortex slowly learns the statistics of hippocampal episodes. We take note of an interesting model proposed by Evans et al. 2019 that, while not involving structural learning or generalization, leverages two independent systems for self-localization:
This integrated approach is realizable within the existing models. Since both TEM and CSCG utilize multiple "clone" hippocampal cells for each sensory observation, it is particularly easy to combine these models. This would be formulated as a TEM-like model, but where the hippocampus is predictive of future hippocampal states. Such an approach combines the best of both models - learning novel maps fast (CSCG), but also leveraging past knowledge to understand similarities between maps (TEM/SMP).
### Cognitive maps and behavior
The models discussed here interact with behavior in different ways (including using eigenspaces for various behaviors).
The observation that grid cells resemble eigenvectors of place cells (or of the spatial transition matrix; if place cells use the successor representation, then of the SR matrix) has led to interesting suggestions about mechanisms for planning and exploration. To plan the future, you need to look across multiple transitions. Eigenvectors simplify this problem because all multi-step transition matrices (like the successor representation, which already aggregates over numbers of steps) share the same eigenvectors. Intuitively, this means these eigenvectors can be used for exploration, planning, sampling in replay, or any other type of multi-step navigation. Different sampling patterns differ only in the eigenvalue weighting matrix; it seems an upstream brain area could modify the spectrum in MEC to generate hippocampal representations suitable for the current task. In fact, with another choice of weighting matrix, sampling can even become superdiffusive. So far, we have been considering diffusive transition matrices, i.e. matrices without actions. However, by making transition matrices action-dependent (remember path integration has action-dependent matrices too) we can treat planning just like path integration. In space, at least, the transition matrices needed for different actions all have exactly the same eigenvectors, but different eigenvalues. Maybe this is suitable for all graphs? After all, the transition matrix is not limited. Hence, path integration can be reduced to successively adding the eigenvalues associated with each action, which is also like a Bellman equation conditioned on the policy used in the environment (it bootstraps!). This way of thinking unifies path integration with SR-like planning, e.g. learning so that we don't have to plan. Interestingly, it also brings different models of path integration into a common framework since, in this case, the eigenvectors are plane waves (not grids, as the transitions are unidirectional!) just like those required for VCOs, and the transition matrix is just like the weight matrices required for CANNs. Maybe we generate superdiffusion because the distance between the current state and the target state has already been updated in the probability representation, which could explain the reverse replay after encountering a shiny object, as demonstrated in Eldar et al. 2020, and the superdiffusive behavior, as demonstrated in McNamee et al. 2021.
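A small sketch of the shared-eigenvector point (my own toy, not from the paper): for a diffusive walk on a ring, every multi-step operator and the SR reuse T's eigenvectors and only reweight the eigenvalues.

```python
import numpy as np

n, gamma = 6, 0.9
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5   # diffusive walk on a ring

lam, U = np.linalg.eigh(T)            # symmetric T: T = U diag(lam) U^T

# Multi-step transitions and the SR: same eigenvectors, reweighted eigenvalues
T3 = U @ np.diag(lam ** 3) @ U.T                   # 3-step transition matrix
SR = U @ np.diag(1 / (1 - gamma * lam)) @ U.T      # (I - gamma*T)^-1

assert np.allclose(T3, np.linalg.matrix_power(T, 3))
assert np.allclose(SR, np.linalg.inv(np.eye(n) - gamma * T))
```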
### Credit assignment through generalization and the interplay with striatal RL
RL typically assumes that the underlying state-space is fixed, and values are slowly assigned to these states. There is no requirement for state representations to be fixed, however (classic RL just cares about the algorithm, not the representation-learning problem); they can change to better represent value. For example, after encountering a goal, GVCs (goal-vector cells) form - cells that are active at certain distances and directions from goals (is that the object vector cell?). This can be interpreted as a state-representation augmentation (not only …). Pre-learned goal-vector representations can be immediately composed with spatial representations to generate an accurate and flexible representation of any goal state; this is driven by a reward signal - how about composition of subgraphs, driven by prediction error? The only online role of the cognitive map is inferring which pre-learned and pre-credit-assigned representations to compose. This is credit assignment through generalization, and is akin to meta-RL, since prior statistical knowledge (e.g. GVCs) can be integrated on-the-fly to solve novel tasks. A toy sketch of this composition follows.
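A toy sketch of composition-without-relearning (entirely my own illustration): a pre-learned, goal-agnostic vector response is indexed by displacement-to-goal, so when a novel goal appears, a value-like map over all states is available immediately, with no further credit assignment.

```python
import numpy as np

size = 7
xs, ys = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
states = np.stack([xs, ys], axis=-1)       # (7, 7, 2) grid of state locations

def goal_vector_response(displacement):
    """Pre-learned, goal-agnostic tuning: falls off with distance to goal."""
    return np.exp(-np.linalg.norm(displacement, axis=-1))

goal = np.array([5, 2])                    # a goal is encountered here
value_map = goal_vector_response(states - goal)   # composed instantly
print(value_map.round(2))                  # no per-state learning required
```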
Where do these representations come from in the first place? The cognitive map models suggest that such representations can be learned from the statistics of behavior:
In general, to train these "pre-credit-assigned" compositional representations, cortex must learn from sequences of behavior (which are generated via classical RL, perhaps in the striatum). Understandably, initial striatal actions will be bad (when encountering entirely novel tasks), but as RL learns good policies, actions will be directed towards goals (which provide high-fidelity training signals). The cortico-hippocampal system can then learn compositional representations of these policies (e.g. GVCs) from the statistics of these sequences - an action-version of complementary learning systems - which relates to recent machine learning methods in offline RL. Here, sequence models learn the statistics of behavioral sequences generated by conventional RL algorithms, after which the sequence model can be used for planning in a manner analogous to planning by inference, as in Chen et al. 2021 and Janner et al. 2021.
### Replay: offline state-space construction
If behavioral control in a new world is reduced to a state-space composition problem, it becomes important to construct state-spaces rapidly and accurately (offline seems good!), and to store them in memory so they can inform future decisions. An appealing substrate for this composition is replay. For example, when an animal receives reward, it is important that all other states in the environment are aware of their relative location to the reward. Replay can path integrate away from the reward (that is exactly what TEM-OVC does - OVC-replay!), successively tying (composing) each new goal-vector cell to its respective hippocampal/cortical location (perhaps building landmark cells in hippocampus; this is a similar mechanism to the simultaneous grid and place cell replay, but now used to instantiate rewarding policies, instead of ensuring consistency between place and grid representations). After encountering a goal, we want the goal-vector representations to exist across all of space, and especially at any start locations. Replay trajectories provide an offline solution: path integrate (offline) GVCs and bind them (via memory) to important locations such as the start state. Now, should the animal return to a state, that state representation already "knows" about its relation to the reward. It is no longer necessary to hold all goal locations in mind, as the state-space composition is stored in memory. This idea relates to previous ideas from RL that cast replay as …
However, in a generalization framework (outlined in the section above, e.g. TEM, SMP), these two computational processes are subsumed by the single process of composing state-spaces from pre-learnt bases (looks like TD-learning can be used widely; maybe GFlowNet will help?). To test this framework against data, it will be interesting to build a formal understanding of optimal replay patterns under these assumptions. Notably, it will make predictions about …
### When neural representations factorize
Spatial representations found in entorhinal cortex, such as grid cells, OVCs, and BVCs, are seemingly factorized, since they compositionally augment the entorhinal grid representation to represent different environment configurations. Recent evidence, however, has shown that grid cells warp towards consistently rewarded locations, as demonstrated in Boccara et al. 2018 and Butler et al. 2019. Factorized representations do not warp, since warping is an environment-specific phenomenon; warping around rewards does not transfer to different spatial configurations of rewards (a trade-off between generalization and maximization of reward - people are not rational!). Specifically, there is a computational trade-off between using factorized compositional bases and using bespoke warped representations, i.e. a pressure to generalize versus precisely representing a single task.
## Open questions

### The role of time in memory and cognitive maps
The discussion of cognitive map models so far assumes that learned representations remain stable over time. This clearly cannot be the case, due to representational drift, as mentioned in #5 . But how can hippocampus maintain a stable representation of space, if the cellular basis of this representation is drifting over time? Generalization models offer a natural solution as, here, hippocampal cells bind multiple factors of the input. Only one factor needs to change for the entire hippocampal representation to change. Representational drift, in this view, is just hippocampal remapping, but now it is not sensory observations or space that has changed, but time instead (this is weird - why time? space evolving with time makes more sense!). Hippocampus represents time through more than just drift. Pure "time cells", for example, emerge when rodents are required to stay still, or run on a wheel, for a particular duration in a task; maybe we can see this representation as a sensory input, like a clock interrupt in a computer.
### Interacting levels of abstraction
The real power of abstraction comes when this process can happen repeatedly, so that abstractions can themselves lead to further abstractions, e.g. memory consolidation. The latent space and the corresponding transition rule (due to the configuration of actions) would not have generalized if the T-maze became a W-maze; thus we need something fundamentally new in the models to account for this.
One intriguing possibility is that the different representations observed in fronto-temporal cortices might reflect such a factorization, with entorhinal representations grounded in interactions with the physical environment, while neurons in PFC represent abstract, task-related invariances, such as "location in task" (e.g. PFC modulates MEC to form grid characteristics). Interestingly, though, the very same vectors can be reused whether it be the oven or the chopping board. This makes a prediction: vector cells that are contextually modulated depending on "location in task".
### From sequences to other domains of cognition
The models we have described translate the problem of building maps into problems of understanding the structure of possible sequences. This raises two interesting points:
Whittington, James C.R., et al. "How to build a cognitive map: insights from models of the hippocampal formation." arXiv, 2022.