diff --git a/metrics/perplexity/README.md b/metrics/perplexity/README.md
new file mode 100644
index 00000000000..3aa42bc4cec
--- /dev/null
+++ b/metrics/perplexity/README.md
@@ -0,0 +1,87 @@
# Metric Card for Perplexity

## Metric Description
Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence. This can be used in two main ways:
1. to evaluate how well the model has learned the distribution of the text it was trained on
   - In this case, the model input should be the trained model to be evaluated, and the input texts should be the text that the model was trained on.
2. to evaluate how well a selection of text matches the distribution of text that the input model was trained on
   - In this case, the model input should be a trained model, and the input texts should be the text to be evaluated.

## Intended Uses
Any language generation task.

## How to Use

The metric takes a list of texts as input, as well as the name of the model used to compute the metric:

```python
from datasets import load_metric

perplexity = load_metric("perplexity")
results = perplexity.compute(input_texts=input_texts, model_id='gpt2')
```

### Inputs
- **model_id** (str): model used for calculating perplexity. NOTE: Perplexity can only be calculated for causal language models.
    - This includes models such as gpt2, causal variants of bert, causal versions of t5, and more (the full list can be found in the AutoModelForCausalLM documentation here: https://huggingface.co/docs/transformers/master/en/model_doc/auto#transformers.AutoModelForCausalLM )
- **input_texts** (list of str): input texts, with each text snippet as one list entry. The returned perplexity is the average of the per-entry perplexities.
- **stride** (int): stride size; defaults to 512.
- **device** (str): device to run on; defaults to 'cuda' when available.

### Output Values
This metric outputs a dictionary with a single key, `perplexity`, containing the average perplexity score for the texts in the input list.

```
{'perplexity': 117.9}
```

The lowest possible perplexity is 1, and there is no upper bound; a lower score is better.

#### Values from Popular Papers


### Examples
Calculating perplexity on input_texts defined here:
```python
import datasets

perplexity = datasets.load_metric("perplexity")
input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
results = perplexity.compute(model_id='gpt2',
                             input_texts=input_texts,
                             stride=1)
print(round(results["perplexity"], 1))
# 78.2
```
Calculating perplexity on input_texts loaded in from a dataset:
```python
import datasets

perplexity = datasets.load_metric("perplexity")
input_texts = datasets.load_dataset("wikitext",
                                    "wikitext-2-raw-v1",
                                    split="test")["text"][:10]

results = perplexity.compute(model_id='gpt2',
                             input_texts=input_texts,
                             stride=256)
print(round(results["perplexity"], 1))
# 117.9
```

## Limitations and Bias
Note that the output value depends heavily on what text the model was trained on. This means that perplexity scores are not comparable between models or datasets.
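
To make this limitation concrete, the sketch below scores the same texts with two different causal models; `distilgpt2` is used here purely as a second illustrative model, and the exact numbers will vary. Because each model has its own tokenizer and training data, the two perplexities sit on different scales and should not be compared with one another:

```python
import datasets

# Illustrative sketch: the same texts scored with two different causal LMs.
perplexity = datasets.load_metric("perplexity")
input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]

results_gpt2 = perplexity.compute(model_id='gpt2',
                                  input_texts=input_texts,
                                  stride=1)
results_distilgpt2 = perplexity.compute(model_id='distilgpt2',
                                        input_texts=input_texts,
                                        stride=1)

# Each score only reflects how surprising the texts are to that particular
# model; the gap between the two numbers is not a meaningful model comparison.
print(results_gpt2["perplexity"], results_distilgpt2["perplexity"])
```

The same caveat applies across datasets: a perplexity computed on one corpus does not transfer to a corpus drawn from a different distribution.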

## Citation

```bibtex
@article{jelinek1977perplexity,
title={Perplexity—a measure of the difficulty of speech recognition tasks},
author={Jelinek, Fred and Mercer, Robert L and Bahl, Lalit R and Baker, James K},
journal={The Journal of the Acoustical Society of America},
volume={62},
number={S1},
pages={S63--S63},
year={1977},
publisher={Acoustical Society of America}
}
```

## Further References
- [Hugging Face Perplexity Blog Post](https://huggingface.co/docs/transformers/perplexity)