SDR: Implement ways to measure quality of produced SDR #155

Open · 4 of 11 tasks
breznak opened this issue Dec 10, 2018 · 19 comments
Labels: enhancement (New feature or request), research (new functionality of HTM theory, research idea), SP

breznak commented Dec 10, 2018

Relevant classes:

  • SpatialPooler
  • SDR
  • Topology

Why?

  • when making changes to SP, we don't have ways to measure the quality of its outputs: SDRs.

Functionality:
SDR = sparse distributed representation

  • sparse:
    • min, max active bits in SDR, compared to % size (see the sparsity sketch after this list)
    • avg distance between active bits in SDR (should be similar for all SDRs), uses Topology
  • distributed:
    • uses activeDutyCycles of the active bits (=columns) to check that all columns are used equally
    • information/entropy of the bit/SDR
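
A minimal sketch of the instant, per-SDR sparsity checks above (the names and the flat numpy-array representation are illustrative, not a final API):

import numpy as np

def sparsity(sdr):
    # Fraction of active bits in a single binary SDR (0/1 numpy array).
    return np.count_nonzero(sdr) / sdr.size

def sparsity_stats(sdrs):
    # Min / mean / max sparsity over a stream of SDRs, to compare against the target % size.
    s = [sparsity(x) for x in sdrs]
    return min(s), float(np.mean(s)), max(s)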

Implementation:

  • helper method to SP (SDR?)

Hypothesis:

  • what is a "quality" SDR?
  • does higher quality SDR translate to better (how?) results? (in what?)

EDIT: latest update 14/01/2019

Update: Implemented in PR #184

  • SDR Sparsity Metrics
  • SDR Activation Frequency Metrics
  • SDR Average Overlap Metrics
  • SDR All Metrics Convenience Class

Summary: Ideas which are discussed here but not yet implemented:

  • Cell death (via method SDR.killCells)
  • SDR topology
  • SP noise resistance (via method SDR.addNoise, also the example sp_tutorial will demonstrate this)
  • SP long term stability
  • TM estimate false positive & negative rates
  • Test hypotheses
  • Write about how to measure HTM's using these metrics

dkeeney commented Dec 10, 2018

when making changes to SP, we don't have ways to measure the quality of its outputs: SDRs.

Yes, I agree. Having some sort of measure would be very useful. 👍

@ctrl-z-9000-times

Great idea!

I would add stats: min/mean/std-dev/max for activeDutyCycles, and then binary entropy, which is a single fraction (in range 0-1) that describes utilization.

The SDR class has a hook which is called every time its value is updated; could that be useful for this task?


breznak commented Dec 11, 2018

I would add stats: min/mean/std-dev/max for activeDutyCycles

So maybe two types of implementation; for the first type, I'd like only metrics that can be computed instantly, just from the SDR. That keeps it simpler (no logic needs to be added to SP) and faster.

entropy .. which describes utilization.

For a single bit, whole SDR, or the SP?

@ctrl-z-9000-times

We could split these metrics into different methods, and then have a print method which calls all four. Then the min/max can be computed fast and separately, but the print method (which is typically only called once at end of program) can display all of the stats.

class SDR_ActivationFrequency {
public:
    SDR_ActivationFrequency( SDR &dataSource );
    Real min();
    Real max();
    Real mean();
    Real std();
    Real entropy();
    String pretty_print(); // Uses all the metrics.
};

entropy .. which describes utilization.

For a single bit, whole SDR, or the SP?

Entropy is for the activation frequency of the SDR as a whole. Here is my python function for it:

import numpy as np

def _binary_entropy(p):  # p is an array of floats in range [0, 1]
    p_ = (1 - p)
    s  = -p * np.log2(p) - p_ * np.log2(p_)
    return np.mean(np.nan_to_num(s))

Then to scale entropy into range [0, 1] simply divide by the theoretical maximum entropy which is: entropy(mean(activationFrequency)).
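
For example, a possible normalization following that recipe (a sketch only, not part of any existing API; it reuses the _binary_entropy function above):

# Assumes the _binary_entropy() function defined above.
def normalized_entropy(activation_frequency):
    # Entropy of per-bit activation frequencies, scaled into [0, 1].
    max_entropy = _binary_entropy(np.full(1, np.mean(activation_frequency)))
    return _binary_entropy(activation_frequency) / max_entropy

# A perfectly uniform SP, every column active 2% of the time, scores 1.0:
print(normalized_entropy(np.full(2048, 0.02)))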

@ctrl-z-9000-times

min, max active bits in SDR, compared to % size

Another good idea, to which I would add mean & std. Min & max tell you about the extremes & outliers, which can be helpful for spotting bugs. Mean & std tell you about its normal operating behaviour.

Yet another interesting metric to track is: Average overlap between consecutive assignments to an SDR. This measures how quickly an SDR changes, sort of like a derivative. I have in past experiments used this to measure the quality of encoders, w/ regards to semantic similarity property. I've also used this metric in experiments with Layer 2/3 cell stability / view-point invariance.


breznak commented Dec 11, 2018

split these metrics into different methods,

Yes, I'd like the metric to provide a mapping to [0, 1], but (also) return the separate stats.

Mean & std tell you about its normal operating behaviour.

At first I thought of "quality" as a one-shot measure of an SDR; you're suggesting adding statistics over the run of the program on the dataset (which is a good thing!). The only question is whether these should be separate: quality of the SDR, and stats of the SP. Or keep it together in one.

def _binary_entropy(p):  # p is an array of floats in range [0, 1]

And the p here is? activation freq for each column(bit) after N runs?

Yet another interesting metric to track is: Average overlap between consecutive assignments to an SDR

What do you mean by this? If it's the overlap between 2 consecutive (any) SDR values produced by the SP, that imho has no meaning, as these do not have to be correlated in any way...?

@ctrl-z-9000-times

I think a good way to organize all of these would be to give each metric its own class. Then create a class named SDR_Metrics which would gather up all of the metrics into a single easy to use package.

Each metric could follow a common design pattern, such as:

class SDR_MetricName {
public:
    SDR_MetricName( SDR &dataSource, ... );
    Real statistics(); // Min Mean Std Max
    String print();
    void save( ... );
    void load( ... );
};


breznak commented Dec 11, 2018

  • I'm looking into measuring the following property of an SDR:
    "Distributed = each bit can be reused in several different contexts (SDRs), and a collection of multiple bits is unique (an SDR)"

We can evaluate sparsity quite well, but what about this distributed-ness? Would information/entropy over column activations over the run over the dataset (too many over-s :D) be enough? In SP we can use (active)DutyCycles as well...


breznak commented Dec 11, 2018

I would like to turn this into a paper. The main ideas are:

  • we can & should measure the quality of the encoding (SDRs) - how? What features?
  • (How) does the quality correlate with good algorithm results? (prediction & anomaly)
  • Compare quality of (output) encodings of other ML algorithms. (Which? Only sparse representations?)
    • sparse auto-encoders
    • cortical.io retina (SDRs for NLP)
    • biological (BCI data from which regions, retina, ...)

@ctrl-z-9000-times

And the p here is? activation freq for each column(bit) after N runs?

Yes.

What do you mean by [average overlap]? If it's the overlap between 2 consecutive (any) SDR values produced by the SP, that imho has no meaning, as these do not have to be correlated in any way...?

This is only relevant for time-series datasets. The encoder output should have an overlap when the input value is slowly and smoothly moving, which indicates semantic similarity between encoded values. The SP should have very little overlap because it should map similar inputs to distinctive outputs. The column-pooler should have a significant average overlap because it is supposed to do view-point invariance.

@ctrl-z-9000-times

For reference: I got a lot of ideas for statistics by reading numenta's papers. In their SP paper they describe several ways to measure the quality of their results. IIRC the SDR paper was also useful.

"Distributed = each bit can be REused in several different contexts(SDRs), and a collection of multiple bits is unique (a SDR)"

In this context I think that "distributed" means "decorrelated". You can measure the correlation between two SDRs, and between every pair of SDRs in a set, and then average those correlations together into a single result describing overall quality. In past experiments I've measured correlations between & within labelled categories, which I found useful.
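
A rough sketch of that averaging (illustrative only; here "correlation" is approximated by the normalized overlap between the active-bit sets of binary numpy arrays):

import itertools
import numpy as np

def overlap(a, b):
    # Shared active bits, normalized by the smaller active-bit count.
    shared = np.count_nonzero(np.logical_and(a, b))
    return shared / max(1, min(np.count_nonzero(a), np.count_nonzero(b)))

def mean_pairwise_overlap(sdrs):
    # Average overlap over every pair of SDRs in a set; lower means more decorrelated.
    pairs = itertools.combinations(sdrs, 2)
    return float(np.mean([overlap(a, b) for a, b in pairs]))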

I would like to turn this into a paper

Alternatively, this info would be great for our wiki too. It would be helpful for other people to understand how to build & debug HTM systems. I have been meaning to write on the htm-community wiki. I've started writing a wiki in my fork of nupic.cpp but it's not done yet. I am hoping to turn the wiki into a practical guide for using HTMs. The numenta wiki already has a lot of good material & docs which we should copy into this wiki at some point.


breznak commented Dec 20, 2018

Average overlap between consecutive assignments to an SDR. This measures how quickly an SDR changes, sort of like a derivative. I have in past experiments used this to measure the quality of encoders, w/ regards to semantic similarity property. I've also used this metric in experiments with Layer 2/3 cell stability / view-point invariance.

What would be good datasets to test this?

You can measure the correlation between two SDRs

I'm trying to figure out how to eliminate the (error) caused by encoders, which are written by hand. We could use a set of SDRs and just modify them (to get semantically similar data with a known difference); MNIST would be a good example from a practical domain.

Also, would c++ or py be the better repo to start this research in?

@ctrl-z-9000-times

What would be good datasets to test [SP-AverageOverlap]?

This would be useful in conjunction with any encoder. Use artificial data as input so that you can control the rate it changes at, and check that the resulting SDR has a reasonable average overlap. The SP-AverageOverlap class should use an exponential rolling average, so it is possible to get the exact overlap (rather than an average) for testing purposes by setting its parameter to 1.
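
A minimal sketch of that idea (class and method names are hypothetical, not the PR #184 API): an exponential rolling average of the overlap between consecutive SDR values, where alpha = 1 reduces to the exact overlap of the last two values:

import numpy as np

class AverageOverlap:
    # Exponential rolling average of overlap between consecutive binary SDRs.
    def __init__(self, alpha=0.005):
        self.alpha = alpha      # alpha = 1.0 -> exact overlap of the last two SDRs
        self.prev  = None
        self.value = 0.0

    def add_data(self, sdr):
        if self.prev is not None:
            active  = max(1, np.count_nonzero(sdr))
            current = np.count_nonzero(np.logical_and(sdr, self.prev)) / active
            self.value += self.alpha * (current - self.value)
        self.prev = np.copy(sdr)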

What would be good datasets to test [Layer 2/3 cell stability / view-point invariance]?

An artificial dataset. Numenta created 3D objects to test this.

In my experiments I used words: I encoded each letter of the alphabet as a random SDR, and fed the two layer network a sequence of words (with whitespace removed). I judged the quality of layers 2/3 by the average overlap, as well as a more detailed analysis of the actual overlaps within & between categories (where each word is a category).

Also, would c++ or py be the better repo to start this research in?

IMO C++. I would rather make this repo really good, and then have python bindings.

@ctrl-z-9000-times

From the SP paper: Two more metrics for the SP, not generic for all SDRs. These metrics depend on an input dataset and prior training, so there is some work required from the user.

  • Noise resistance
    • We could have a method SP.computeNoisy(inputs, outputs, percentNoise) -> noiseResistance which would calculate this metric and return the percent overlap between the clean & noisy results (see the sketch after this list).
  • Long term stability of inputs -> outputs
    • Method SDR.overlap() can help with this.
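
A sketch of the noise-resistance measurement mentioned in the list above (sp_compute is a hypothetical wrapper around the SP's compute call, not an existing API): flip percentNoise of the input bits and report the average overlap between the clean and noisy outputs.

import numpy as np

def noise_resistance(sp_compute, inputs, percent_noise, seed=0):
    # sp_compute(input_bits) -> binary output array (hypothetical wrapper around the SP).
    rng = np.random.default_rng(seed)
    overlaps = []
    for x in inputs:
        clean_out = sp_compute(x)
        noisy = np.copy(x)
        flip  = rng.choice(noisy.size, int(percent_noise * noisy.size), replace=False)
        noisy[flip] = 1 - noisy[flip]        # flip the chosen input bits
        noisy_out = sp_compute(noisy)
        shared = np.count_nonzero(np.logical_and(clean_out, noisy_out))
        overlaps.append(shared / max(1, np.count_nonzero(clean_out)))
    return float(np.mean(overlaps))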

From "Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory": Both of the following metrics could be methods of TM class.

  • False positive rate (estimate): needs input SDR-Sparsity (see the sketch after this list)
  • False negative rate (estimate): needs input SDR-Sparsity & percentNoise
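
A sketch of the false-positive estimate from that paper, referenced in the list above (assuming both SDRs have w of n bits active and an exact-match threshold theta):

from math import comb

def false_positive_rate(n, w, theta):
    # P(a random SDR with w of n bits active overlaps a fixed SDR in >= theta bits).
    matches = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return matches / comb(n, w)

# e.g. n=2048, w=40, theta=20 gives an astronomically small false-match chance:
print(false_positive_rate(2048, 40, 20))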

Cell death experiments: We could make an SDR subclass which kills a fraction of cells in an SDR and filters them out of its value.


breznak commented Dec 22, 2018

These metrics depend on an input dataset and prior training, so there is some work required from the user.

I figured most of the interesting metrics would be task (dataset) dependent, in the form of a sliding window, as HTM is doing online learning.

Noise resistance

I'd add this under the autoassociative memory experiment, with dropout:

Cell death experiments: We could make an SDR subclass which kills a fraction of cells in an SDR and filters them out of its value.

Also, about this

Cell death experiments:

I would not add a subclass, but a constructor param float dropoutRatio that kills (=flips) each bit randomly with the given probability.
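
A minimal sketch of that parameter's effect (dropoutRatio is the name suggested above; the function itself is hypothetical, not the SDR.killCells API from the summary):

import numpy as np

def apply_dropout(sdr, dropout_ratio, seed=0):
    # Flip each bit of a binary SDR (0/1 array) independently with probability dropout_ratio.
    rng   = np.random.default_rng(seed)
    flips = rng.random(sdr.shape) < dropout_ratio
    return np.logical_xor(sdr, flips).astype(np.uint8)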

False positive rate (estimate): needs input SDR-Sparsity
False negative rates (estimate): needs input SDR-Sparsity & percentNoise

FP, FN rates: 👍


breznak commented Dec 22, 2018

Some other hypotheses to verify:

  • H3: low accumulated SDR quality -> a hint to change the (running) params of the network (SP, TM, ... params); anomalies should be ignored (as it implies "I don't understand the problem").

  • H4: Quality acts as a "confidence measure", orthogonal to anomaly score. Allows us to say: "I'm highly confident this is a contextual anomaly" (=high quality, high anomaly) vs. "anomaly && low quality" = "I'm new to the problem, don't take predictions too seriously" (= we may filter out the anomaly) vs (high quality & low anomaly) -> "don't filter out, just small anomaly, but I'm confident about that"

  • H5: a cumulative quality drop indicates a domain change -> could trigger auto-reset(), param tuning, or just hint at the domain change (e.g. a sine wave switches to a stairs pattern); find datasets for this.


ctrl-z-9000-times commented Jan 6, 2019

Update: Implemented in PR #184

  • SDR Sparsity Metrics
  • SDR Activation Frequency Metrics
  • SDR Average Overlap Metrics
  • SDR All Metrics Convenience Class

TODO: This is not critical, but maybe useful? I'd like all the SDR Metrics to have another constructor which does not accept an SDR; instead the user must call Metric.addData( SDR ). This lets users manage their own data and is a more flexible solution.

UPDATE: Metric.addData( SDR ) implemented.

Summary: Ideas which are discussed here but not yet implemented:

  • Cell death (via SDR subclass)
  • SDR topology
  • SP noise resistance
  • SP long term stability
  • TM estimate false positive & negative rates
  • Test hypotheses
  • Write about how to measure HTM's using these metrics

@ctrl-z-9000-times

does higher quality SDR translate to better (how?) results? (in what?)

I accidentally introduced a bug in the mnist branch, which resulted in a 2% decrease in accuracy, from 95% to 93%. This bug also caused the entropy to drop from ~95% to less than 75%!
