DOCSP-43242: Improve UTF-8 validation documentation to clarify validation occurs on decoded data only (#908)

* DOCSP-43242: updating paragraph

* committed incorrect change

* updating note

* changing to documents

* remove note altogether

* adding original note back

* flow + bson strings

* more specific string distinction

* generalizing data

* added warning and lone surrogate information

* removing

* Update source/fundamentals/bson/utf8-validation.txt

Co-authored-by: Jordan Smith <[email protected]>

---------

Co-authored-by: Jordan Smith <[email protected]>
mayaraman19 and jordan-smith721 authored Oct 14, 2024
1 parent 1cb8caf commit 2094a00
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions source/fundamentals/bson/utf8-validation.txt
@@ -25,15 +25,16 @@ processing overhead since it needs to check the data.
 If you *disable* validation, your application avoids the validation processing
 overhead, but cannot guarantee consistent presentation of invalid UTF-8 data.

-The driver enables UTF-8 validation by default. It checks documents for any
-characters that are not encoded in a valid UTF-8 format when it transfers data
-between your application and MongoDB.
+By default, the driver enables UTF-8 validation on data from MongoDB.
+It checks incoming documents for any characters that are not encoded in a
+valid UTF-8 format when it parses data sent from MongoDB to your application.

 .. note::

-   The current version of the {+driver-short+} automatically substitutes
-   invalid UTF-8 characters with alternate valid UTF-8 ones before
-   validation when you send data to MongoDB. Therefore, the validation
+   This version of the {+driver-short+} automatically substitutes invalid
+   `lone surrogates <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters>`__
+   with the `replacement character <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toWellFormed>`__
+   before validation when you send data to MongoDB. Therefore, the validation
    only throws an error when the setting is enabled and the driver
    receives invalid UTF-8 document data from MongoDB.
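
The two directions described in the updated note can be illustrated with JavaScript built-ins alone. The sketch below is not the driver's implementation and makes no driver calls. It assumes a Node.js runtime where `TextEncoder` and `TextDecoder` are available as globals, and `isValidUtf8` is a hypothetical helper name introduced only for this example. It shows that encoding a string with a lone surrogate yields valid UTF-8 containing the replacement character, while strict decoding of invalid bytes fails.

```javascript
// Illustration with Node.js built-ins only; this does not call the driver.

// Outgoing direction: a JavaScript string holding a lone high surrogate.
const illFormed = "ab\uD800cd";

// Encoding to UTF-8 substitutes U+FFFD (the replacement character)
// for the lone surrogate, so the resulting bytes are valid UTF-8.
const encoded = new TextEncoder().encode(illFormed);
const roundTripped = new TextDecoder("utf-8", { fatal: true }).decode(encoded);

// Incoming direction: bytes that are not valid UTF-8 (0xFF never occurs in UTF-8).
const invalidBytes = Uint8Array.from([0x61, 0xff, 0x62]);

// Hypothetical helper: returns true when the bytes decode as strict UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(roundTripped === "ab\uFFFDcd");
console.log(isValidUtf8(encoded));
console.log(isValidUtf8(invalidBytes));
```

Because the substitution happens when the string is encoded, a validity check on outgoing bytes has nothing left to reject. This matches the note's statement that errors arise only for invalid data received from MongoDB.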
