Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDRIVER-4448] Fix interaction of bson_visit with bson_validate #1090

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/libbson/doc/bson_iter_t.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ The :symbol:`bson_t` *MUST* be valid for the lifetime of the iter and it is an e
bson_iter_utf8
bson_iter_value
bson_iter_visit_all
bson_iter_visit_all_v2

Examples
--------
Expand Down
6 changes: 5 additions & 1 deletion src/libbson/doc/bson_iter_visit_all.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,11 @@ Parameters
Description
-----------

A convenience function to iterate all remaining fields of ``iter`` using the callback vtable provided by ``visitor``.
A convenience function to iterate all remaining fields of ``iter`` using the
callback vtable provided by ``visitor``.

This function is equivalent to :symbol:`bson_iter_visit_all_v2` with
``BSON_ITER_VISIT_DEFAULT`` given for its ``flags`` argument.

Returns
-------
Expand Down
118 changes: 118 additions & 0 deletions src/libbson/doc/bson_iter_visit_all_v2.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
:man_page: bson_iter_visit_all_v2

bson_iter_visit_all_v2()
========================

Synopsis
--------

.. code-block:: c

enum bson_iter_visit_flags {
BSON_ITER_VISIT_NOFLAGS = 0,
BSON_ITER_VISIT_VALIDATE_KEYS = 1 << 0,
BSON_ITER_VISIT_VALIDATE_UTF8 = 1 << 1,
BSON_ITER_VISIT_VALIDATE_REGEX = 1 << 2,
BSON_ITER_VISIT_VALIDATE_CODE = 1 << 3,
BSON_ITER_VISIT_VALIDATE_DBPOINTER = 1 << 4,
BSON_ITER_VISIT_VALIDATE_SYMBOL = 1 << 5,
BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8 = 1 << 6,
BSON_ITER_VISIT_VALIDATE_STRINGS =
BSON_ITER_VISIT_VALIDATE_KEYS | BSON_ITER_VISIT_VALIDATE_UTF8 |
BSON_ITER_VISIT_VALIDATE_REGEX | BSON_ITER_VISIT_VALIDATE_CODE |
BSON_ITER_VISIT_VALIDATE_DBPOINTER | BSON_ITER_VISIT_VALIDATE_SYMBOL,
BSON_ITER_VISIT_VALIDATE_VALUES =
BSON_ITER_VISIT_VALIDATE_UTF8 | BSON_ITER_VISIT_VALIDATE_REGEX |
BSON_ITER_VISIT_VALIDATE_CODE | BSON_ITER_VISIT_VALIDATE_DBPOINTER |
BSON_ITER_VISIT_VALIDATE_SYMBOL,
BSON_ITER_VISIT_DEFAULT = BSON_ITER_VISIT_VALIDATE_KEYS |
BSON_ITER_VISIT_VALIDATE_VALUES |
BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8,
};

bool
bson_iter_visit_all_v2 (bson_iter_t *iter,
const bson_visitor_t *visitor,
const bson_iter_visit_flags flags.
void *data);

Parameters
----------

* ``iter``: A :symbol:`bson_iter_t`.
* ``visitor``: A :symbol:`bson_visitor_t`.
* ``flags``: A set of bit flags from ``bson_iter_visit_flags`` to control
visitation (See below).
* ``data``: Optional data for ``visitor``.

Description
-----------

A convenience function to iterate all remaining fields of ``iter`` using the
callback vtable provided by ``visitor``.

This function is an extension of :symbol:`bson_iter_visit_all`, with the
additional ``flags`` parameter that can be used to control the behavior of the
visit iteration. The following flags are defined:

``BSON_ITER_VISIT_NOFLAGS``

None of the behaviors requested by any other options are used.

``BSON_ITER_VISIT_VALIDATE_KEYS``

Validate that element keys are valid UTF-8-encoded strings.

``BSON_ITER_VISIT_VALIDATE_UTF8``

Validate that UTF-8 text elements are valid UTF-8-encoded strings.

``BSON_ITER_VISIT_VALIDATE_REGEX``

Validate regular expression elements to contain valid UTF-8 encoded strings.

``BSON_ITER_VISIT_VALIDATE_CODE``

Validate code elements that the code component is a valid UTF-8-encoded
string.

``BSON_ITER_VISIT_VALIDATE_DBPOINTER``

Validate that DBPointer element names are valid UTF-8-encoded strings.

``BSON_ITER_VISIT_VALIDATE_SYMBOL``

Validate that symbol elements are valid UTF-8-encoded strings.

``BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8``

When validating any UTF-8 strings, permit a zero ``0x00`` code unit as a valid
UTF-8 code unit.

``BSON_ITER_VISIT_VALIDATE_STRINGS``

Validate all strings in all components in top-level elements of the document.

``BSON_ITER_VISIT_VALIDATE_VALUES``

Validate all values in all top-level elements of the document.

``BSON_ITER_VISIT_DEFAULT``

Validate all keys and values. Permits a zero ``0x00`` UTF-8 code unit in
strings.

This flag has the same behavior as :symbol:`bson_iter_visit_all`.

.. note::

If a requested element fails validation, the ``bson_iter_visit_all_v2`` call
will return and indicate the position of the erring element via ``iter``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
will return and indicate the position of the erring element via ``iter``.
will return and indicate the position of the erring element via ``iter->err_off``.


Returns
-------

Returns true if visitation was prematurely stopped by a callback function. Returns false either because all elements were visited *or* due to corrupt BSON.

See :symbol:`bson_visitor_t` for examples of how to set your own callbacks to provide information about the location of corrupt or unsupported BSON document entries.

71 changes: 47 additions & 24 deletions src/libbson/src/bson/bson-iter.c
Original file line number Diff line number Diff line change
Expand Up @@ -1888,23 +1888,34 @@ bson_iter_array (const bson_iter_t *iter, /* IN */
#define VISIT_MAXKEY VISIT_FIELD (maxkey)
#define VISIT_MINKEY VISIT_FIELD (minkey)

bool
bson_iter_visit_all (bson_iter_t *iter,
const bson_visitor_t *visitor,
void *data)
{
return bson_iter_visit_all_v2 (iter, visitor, BSON_ITER_VISIT_DEFAULT, data);
}

bool
bson_iter_visit_all (bson_iter_t *iter, /* INOUT */
const bson_visitor_t *visitor, /* IN */
void *data) /* IN */
bson_iter_visit_all_v2 (bson_iter_t *iter,
const bson_visitor_t *visitor,
enum bson_iter_visit_flags flags,
void *data)
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider also checking the return values of bson_init_static for the BSON_TYPE_ARRAY and BSON_TYPE_DOCUMENT cases, like the following:

         if (!bson_init_static (&b, docbuf, doclen)) {
            iter->err_off = iter->off;
            break;
         }
         if (VISIT_DOCUMENT (iter, key, &b, data)) {
            return true;
         }

Otherwise, an invalid BSON document or array length may be ignored.

uint32_t bson_type = 0;
const char *key = NULL;
bool unsupported;
const bool allow_nul = flags & BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8;

BSON_ASSERT (iter);
BSON_ASSERT (visitor);

while (_bson_iter_next_internal (iter, 0, &key, &bson_type, &unsupported)) {
if (*key && !bson_utf8_validate (key, strlen (key), false)) {
iter->err_off = iter->off;
break;
if (flags & BSON_ITER_VISIT_VALIDATE_KEYS) {
if (*key && !bson_utf8_validate (key, strlen (key), false)) {
iter->err_off = iter->off;
break;
}
}

if (VISIT_BEFORE (iter, key, data)) {
Expand All @@ -1925,9 +1936,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

utf8 = bson_iter_utf8 (iter, &utf8_len);

if (!bson_utf8_validate (utf8, utf8_len, true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_UTF8) {
if (!bson_utf8_validate (utf8, utf8_len, allow_nul)) {
iter->err_off = iter->off;
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return true;
break;

If UTF-8 validation fails, I expect false to be returned (indicating an error) and the visit_corrupt function to be called.

Suggest changing this, and other cases below.

}
}

if (VISIT_UTF8 (iter, key, utf8_len, utf8, data)) {
Expand Down Expand Up @@ -2009,9 +2022,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */
const char *options = NULL;
regex = bson_iter_regex (iter, &options);

if (!bson_utf8_validate (regex, strlen (regex), true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_REGEX) {
if (!bson_utf8_validate (regex, strlen (regex), allow_nul)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!bson_utf8_validate (regex, strlen (regex), allow_nul)) {
if (!bson_utf8_validate (regex, strlen (regex), allow_nul) || !bson_utf8_validate (options, strlen (options), allow_nul)) {

iter->err_off = iter->off;
return true;
}
}

if (VISIT_REGEX (iter, key, regex, options, data)) {
Expand All @@ -2025,9 +2040,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

bson_iter_dbpointer (iter, &collection_len, &collection, &oid);

if (!bson_utf8_validate (collection, collection_len, true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_DBPOINTER) {
if (!bson_utf8_validate (collection, collection_len, allow_nul)) {
iter->err_off = iter->off;
return true;
}
}

if (VISIT_DBPOINTER (
Expand All @@ -2041,9 +2058,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

code = bson_iter_code (iter, &code_len);

if (!bson_utf8_validate (code, code_len, true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_CODE) {
if (!bson_utf8_validate (code, code_len, allow_nul)) {
iter->err_off = iter->off;
return true;
}
}

if (VISIT_CODE (iter, key, code_len, code, data)) {
Expand All @@ -2056,9 +2075,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

symbol = bson_iter_symbol (iter, &symbol_len);

if (!bson_utf8_validate (symbol, symbol_len, true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_SYMBOL) {
if (!bson_utf8_validate (symbol, symbol_len, allow_nul)) {
iter->err_off = iter->off;
return true;
}
}

if (VISIT_SYMBOL (iter, key, symbol_len, symbol, data)) {
Expand All @@ -2074,9 +2095,11 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */

code = bson_iter_codewscope (iter, &length, &doclen, &docbuf);

if (!bson_utf8_validate (code, length, true)) {
iter->err_off = iter->off;
return true;
if (flags & BSON_ITER_VISIT_VALIDATE_CODE) {
if (!bson_utf8_validate (code, length, allow_nul)) {
iter->err_off = iter->off;
return true;
}
}

if (bson_init_static (&b, docbuf, doclen) &&
Expand Down
45 changes: 44 additions & 1 deletion src/libbson/src/bson/bson-iter.h
Original file line number Diff line number Diff line change
Expand Up @@ -511,7 +511,8 @@ bson_iter_overwrite_double (bson_iter_t *iter, double value);


BSON_EXPORT (void)
bson_iter_overwrite_decimal128 (bson_iter_t *iter, const bson_decimal128_t *value);
bson_iter_overwrite_decimal128 (bson_iter_t *iter,
const bson_decimal128_t *value);


BSON_EXPORT (void)
Expand All @@ -537,6 +538,48 @@ bson_iter_visit_all (bson_iter_t *iter,
const bson_visitor_t *visitor,
void *data);

/**
* @brief Flag options to control bson_iter_visit APIs
*/
enum bson_iter_visit_flags {
BSON_ITER_VISIT_NOFLAGS = 0,
/// Validate element keys to be valid UTF-8
BSON_ITER_VISIT_VALIDATE_KEYS = 1 << 0,
/// Validate UTF-8 elements to be valid UTF-8
BSON_ITER_VISIT_VALIDATE_UTF8 = 1 << 1,
/// Validate regular expression elements to contain valid UTF-8
BSON_ITER_VISIT_VALIDATE_REGEX = 1 << 2,
/// Validate code and code_w_scope elements to contain valid UTF-8
BSON_ITER_VISIT_VALIDATE_CODE = 1 << 3,
/// Validate dbPointer elements to contain valid UTF-8
BSON_ITER_VISIT_VALIDATE_DBPOINTER = 1 << 4,
/// Validate symbol elements to contain valid UTF-8
BSON_ITER_VISIT_VALIDATE_SYMBOL = 1 << 5,
/// When checking UTF-8 strings, allow a NUL as a valid codepoint
BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8 = 1 << 6,
/// Validate all strings in the visited BSON to be valid UTF-8 strings
BSON_ITER_VISIT_VALIDATE_STRINGS =
BSON_ITER_VISIT_VALIDATE_KEYS | BSON_ITER_VISIT_VALIDATE_UTF8 |
BSON_ITER_VISIT_VALIDATE_REGEX | BSON_ITER_VISIT_VALIDATE_CODE |
BSON_ITER_VISIT_VALIDATE_DBPOINTER | BSON_ITER_VISIT_VALIDATE_SYMBOL,
/// Validate all element values to be valid (does not validate keys)
BSON_ITER_VISIT_VALIDATE_VALUES =
BSON_ITER_VISIT_VALIDATE_UTF8 | BSON_ITER_VISIT_VALIDATE_REGEX |
BSON_ITER_VISIT_VALIDATE_CODE | BSON_ITER_VISIT_VALIDATE_DBPOINTER |
BSON_ITER_VISIT_VALIDATE_SYMBOL,
/// The default option for bson_iter_visit_all_v2 (validates all keys and
/// values, allows NUL in UTF-8 strings)
BSON_ITER_VISIT_DEFAULT = BSON_ITER_VISIT_VALIDATE_KEYS |
BSON_ITER_VISIT_VALIDATE_VALUES |
BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8,
};

BSON_EXPORT (bool)
bson_iter_visit_all_v2 (bson_iter_t *iter,
const bson_visitor_t *visitor,
enum bson_iter_visit_flags flags,
void *data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider an alternative API with options:

typedef struct _bson_visitor_opts_t bson_visitor_opts_t;

BSON_EXPORT (bson_visitor_opts_t *)
bson_visitor_opts_new (bson_iter_visit_flags flags);

BSON_EXPORT (void)
bson_visitor_opts_destroy (bson_visitor_opts_t *opts);

BSON_EXPORT (bool)
bson_iter_visit_all_with_opts (bson_iter_t *iter,
                               const bson_visitor_t *visitor,
                               const bson_visitor_opts_t *opts,
                               void *data);

The with_opts name is consistent with other cases where functions required extension. For example: bson_as_json was extended with options as bson_as_json_with_opts.

An options struct allows extension without creating a new function.


BSON_EXPORT (uint32_t)
bson_iter_offset (bson_iter_t *iter);

Expand Down
10 changes: 9 additions & 1 deletion src/libbson/src/bson/bson.c
Original file line number Diff line number Diff line change
Expand Up @@ -3584,7 +3584,15 @@ _bson_iter_validate_document (const bson_iter_t *iter,
state->phase = BSON_VALIDATE_PHASE_LF_REF_KEY;
}

(void) bson_iter_visit_all (&child, &bson_validate_funcs, state);
bson_iter_visit_all_v2 (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the result of this call be part of the condition below? e.g.

   if (!bson_iter_visit_all_v2 (
          &child, &bson_validate_funcs, BSON_ITER_VISIT_NOFLAGS, state) &&
       child.err_off > 0 && state->err_offset < 0) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By careful reading of the source, the return value from this function is nearly meaningless, because it can return true or false both for success and for failure. It's just quite unfortunate that we're trying to cram too much information into a single bool.

&child, &bson_validate_funcs, BSON_ITER_VISIT_VALIDATE_KEYS, state);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If state.flags has BSON_VALIDATE_UTF8, should BSON_ITER_VISIT_VALIDATE_STRINGS be passed instead?

Same question for BSON_ITER_VISIT_ALLOW_NUL_IN_UTF8.


// Assert: bson_iter_visit_all_v2 will never generate an error and not give
// that error to our visitor as a "corrupt" BSON element. This implies that
// if child.err_off is greater than zero, our own error offset MUST ALSO be
// greater than zero, because the state will have been told about that
// invalid element.
BSON_ASSERT (child.err_off <= 0 || state->err_offset > 0);

if (state->phase == BSON_VALIDATE_PHASE_LF_ID_KEY ||
state->phase == BSON_VALIDATE_PHASE_LF_REF_UTF8 ||
Expand Down
Binary file added src/libbson/tests/binary/invalid-key.bson
Binary file not shown.
Binary file added src/libbson/tests/binary/invalid-utf8.bson
Binary file not shown.
10 changes: 10 additions & 0 deletions src/libbson/tests/test-bson.c
Original file line number Diff line number Diff line change
Expand Up @@ -1198,6 +1198,16 @@ test_bson_validate (void)
9,
BSON_VALIDATE_NONE,
"corrupt BSON");
VALIDATE_TEST ("invalid-utf8.bson",
BSON_VALIDATE_UTF8,
4,
BSON_VALIDATE_UTF8,
"Invalid utf8 string for key \"a\"");
VALIDATE_TEST ("invalid-key.bson",
BSON_VALIDATE_UTF8,
4,
BSON_VALIDATE_NONE,
"corrupt BSON");
VALIDATE_TEST ("overflow4.bson",
BSON_VALIDATE_NONE,
9,
Expand Down