From 2094a00c0527c707022f901e08a7adc4ec39c939 Mon Sep 17 00:00:00 2001
From: Maya Raman
Date: Mon, 14 Oct 2024 16:57:54 -0400
Subject: [PATCH] DOCSP-43242: Improve UTF-8 validation documentation to clarify validation occurs on decoded data only (#908)

* DOCSP-43242: updating paragraph
* committed incorrect change
* updating note
* changing to documents
* remove note altogether
* adding original note back
* flow + bson strings
* more specific string distinction
* generalizing data
* added warning and lone surrogate information
* removing
* Update source/fundamentals/bson/utf8-validation.txt

Co-authored-by: Jordan Smith <45415425+jordan-smith721@users.noreply.github.com>

---------

Co-authored-by: Jordan Smith <45415425+jordan-smith721@users.noreply.github.com>
---
 source/fundamentals/bson/utf8-validation.txt | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/source/fundamentals/bson/utf8-validation.txt b/source/fundamentals/bson/utf8-validation.txt
index 54fb04c1c..1e2adc7a8 100644
--- a/source/fundamentals/bson/utf8-validation.txt
+++ b/source/fundamentals/bson/utf8-validation.txt
@@ -25,15 +25,16 @@ processing overhead since it needs to check the data.
 If you *disable* validation, your application avoids the validation processing
 overhead, but cannot guarantee consistent presentation of invalid UTF-8 data.

-The driver enables UTF-8 validation by default. It checks documents for any
-characters that are not encoded in a valid UTF-8 format when it transfers data
-between your application and MongoDB.
+By default, the driver enables UTF-8 validation on data from MongoDB.
+It checks incoming documents for any characters that are not encoded in a
+valid UTF-8 format when it parses data sent from MongoDB to your application.

 .. note::

-   The current version of the {+driver-short+} automatically substitutes
-   invalid UTF-8 characters with alternate valid UTF-8 ones before
-   validation when you send data to MongoDB. Therefore, the validation
+   This version of the {+driver-short+} automatically substitutes invalid
+   `lone surrogates `__
+   with the `replacement character `__
+   before validation when you send data to MongoDB. Therefore, the validation
    only throws an error when the setting is enabled and the driver receives
    invalid UTF-8 document data from MongoDB.
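
Review note: the asymmetry described in the updated note (substitution when sending, validation only when receiving) can be reproduced without the driver. The sketch below is a minimal TypeScript illustration that uses only the standard `TextEncoder` and `TextDecoder` APIs. It assumes, but does not confirm, that the driver's string handling behaves the same way. The helper names `encodeForSend` and `decodeWithValidation` are hypothetical and are not part of any driver API.

```typescript
// Hypothetical helpers for illustration only; not part of any driver API.

// Encoding: TextEncoder replaces a lone surrogate with U+FFFD (bytes EF BF BD),
// so the bytes that leave the application are always valid UTF-8.
function encodeForSend(value: string): Uint8Array {
  return new TextEncoder().encode(value);
}

// Decoding: with `fatal: true`, TextDecoder throws a TypeError on invalid UTF-8.
// With `fatal: false`, it silently substitutes U+FFFD instead.
function decodeWithValidation(bytes: Uint8Array, validate: boolean): string {
  return new TextDecoder("utf-8", { fatal: validate }).decode(bytes);
}

// "\uD83D" is a high surrogate with no matching low surrogate.
const sent = encodeForSend("a\uD83Db");
console.log(Array.from(sent)); // [97, 239, 191, 189, 98]

// 0xFF can never appear in well-formed UTF-8, so this simulates bad incoming data.
const invalid = new Uint8Array([0x61, 0xff, 0x62]);

let threw = false;
try {
  decodeWithValidation(invalid, true);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true

// With validation off, the invalid byte is presented as U+FFFD.
console.log(decodeWithValidation(invalid, false) === "a\uFFFDb"); // true
```

Because encoding never produces invalid bytes, only the decoding path can raise an error, which matches the note's statement that validation throws only on data received from MongoDB.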