Ports a minimal (non-optimized) implementation of Mamba.
In short, Mamba is an alternative, with trade-offs, to the attention mechanism. Like an RNN, Mamba can step over a single sequence point at a time (instead of needing to observe multiple sequence points at once), but it must carry the previous state forward, so its memory and time requirements are fixed for each sequence point.
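The constant per-step cost can be illustrated with a toy recurrence. This is only a sketch of the recurrent view (a scalar discretized state-space step, not the actual dfdx-mamba kernel, and `ssm_step` is a hypothetical name): each step folds one input into a fixed-size carried state, so processing a longer sequence never grows the state.

```rust
/// Toy state-space recurrence (illustrative only):
/// h[t] = a * h[t-1] + b * x[t],  y[t] = c * h[t].
/// The state `h` has a fixed size, so each step costs the same
/// regardless of how long the sequence is.
fn ssm_step(h: f32, x: f32, a: f32, b: f32, c: f32) -> (f32, f32) {
    let h_next = a * h + b * x;
    let y = c * h_next;
    (h_next, y)
}

fn main() {
    let (a, b, c) = (0.9, 0.5, 1.0);
    let inputs = [1.0f32, 2.0, 3.0];
    // Fixed-size state carried across sequence points.
    let mut h = 0.0f32;
    for &x in &inputs {
        let (h_next, y) = ssm_step(h, x, a, b, c);
        h = h_next;
        println!("y = {y}");
    }
}
```

In contrast, attention over the same sequence would need to keep (and re-read) all previous points at each step.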
```toml
[dependencies.dfdx-mamba]
git = 'https://github.com/swfsql/dfdx-mamba.git'
branch = "main"
# instead of using a branch, you can pin to a specific commit:
# rev = ""
features = ["nightly", "safetensors"]
```
Note that this depends on a fork of dfdx that has some draft PRs merged into it:
```toml
[dependencies.dfdx]
git = 'https://github.com/swfsql/dfdx.git'
rev = "c4a2995"
# branch = "this-main"
default-features = false
features = ["nightly", "safetensors"]
```
You can check an example that uses this Mamba block for inference here (you can also try it in the browser via WebAssembly).
- state-spaces/mamba.
- huggingface/candle-examples/mamba-minimal.
- johnma2006/mamba-minimal.
- kroggen/mamba.c.
- kroggen/mamba-cpu.
- Stanford MLSys Seminars - Efficiently Modeling Long Sequences with Structured State Spaces - Albert Gu | Stanford MLSys #46.
- Stanford MedAI - MedAI #41: Efficiently Modeling Long Sequences with Structured State Spaces | Albert Gu.
- Yingzhen Li - Structured State Space Models for Deep Sequence Modeling (Albert Gu, CMU).