diff --git a/docs/assets/feast_model_inference_architecture.png b/docs/assets/feast_model_inference_architecture.png
new file mode 100644
index 0000000000..3ea4fba4d0
Binary files /dev/null and b/docs/assets/feast_model_inference_architecture.png differ
diff --git a/docs/getting-started/architecture/model-inference.md b/docs/getting-started/architecture/model-inference.md
index 3a061603c1..582657dbc4 100644
--- a/docs/getting-started/architecture/model-inference.md
+++ b/docs/getting-started/architecture/model-inference.md
@@ -1,5 +1,13 @@
 # Feature Serving and Model Inference
 
+![](../../assets/feast_model_inference_architecture.png)
+
+
+{% hint style="info" %}
+**Note:** This ML infrastructure diagram highlights an orchestration pattern driven by a client application.
+This is not the only possible approach, and different patterns result in different trade-offs.
+{% endhint %}
+
 Production machine learning systems can choose from four approaches to serving machine learning predictions (the output of model inference):
 
 1. Online model inference with online features
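As a companion to the hint added in this hunk, here is a minimal sketch (not part of the diff itself) of the client-driven orchestration pattern it describes: the client application fetches online features from Feast and forwards them to a separately hosted model server. The feature view `driver_hourly_stats`, the entity key `driver_id`, and the `MODEL_ENDPOINT` URL are illustrative assumptions, not names defined in the Feast docs.

```python
# Sketch of the client-driven orchestration pattern: the client fetches online
# features from Feast, then calls a separate model server for the prediction.
# The feature view, entity key, and endpoint below are hypothetical.
import requests

from feast import FeatureStore

MODEL_ENDPOINT = "http://model-server.internal/predict"  # hypothetical model server

store = FeatureStore(repo_path=".")  # points at a local Feast feature repository


def predict(driver_id: int) -> float:
    # 1. The client retrieves fresh feature values from the Feast online store.
    feature_vector = store.get_online_features(
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
        ],
        entity_rows=[{"driver_id": driver_id}],
    ).to_dict()

    # 2. The client forwards the feature vector to the model server and returns
    #    the prediction, keeping orchestration logic in the client application.
    response = requests.post(MODEL_ENDPOINT, json=feature_vector)
    response.raise_for_status()
    return response.json()["prediction"]
```

Under this pattern the model server stays feature-agnostic, while feature-retrieval latency and failure handling live in the calling application; the other serving approaches listed on the updated page shift those responsibilities elsewhere.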