-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hotfix(mashape-analytics) improve ALF buffer under load #757
Conversation
As reported by #750, the buffer is vulnerable to a heavy load because of its `sending_queue` of batches pending for sending. It also does not handle the new 207 HTTP status code returned by the collector in case of invalid ALFs. Changes: - handle 207 HTTP status code by discarding the batch. Some ALFs in it will have been saved, and the invalid one(s) should not be retried. - implement a maximum size (in MB) for the sending_queue, as suggested by #750. Originally, I was about to implement such a size limitation by "number of batches pending", but it would not be intuitive for users to know what fits best their use case, because ALF sizes varies from one API to another, and one endpoint to another. By defining it in MB it is easier for users to chose a value. The default value of 10MB has been chosen after performing some benchmarking, and should handle from 300 to 500 req/s depending on the ALFs sizes. When the `sending_queue` has reached its limit, the current ALFs in the buffer will be **discarded**. - implement a retry policy. Instead of insisting on retrying to send batches when the collector cannot be reached, a delay is computed an exponentially increases on each failure to connect to the collector. That delay is shared by all workers. This avoids to load the collector when it is having difficulties and saves up bandwidth on Kong's side. As soon as the collector can be reached again, the delay is reset. Currently, the minimum retry delay is 1s and the maximum is 60s. Those values cannot be configured. - no more line jumps in logs printing responses from the collector.
0f9e375
to
87e15dd
Compare
local setmetatable = setmetatable | ||
|
||
local MB = 1024 * 1024 | ||
local MAX_BUFFER_SIZE = 1 * MB | ||
local MAX_BUFFER_SIZE = 500 * MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change from 1MB to 500MB correct? Seems awfully large. This is the maximum payload size that the socket server will accept in a single request, right? Given that max_queue_size
defaults to 10MB, and the queue holds multiple batches, I would expect this value to be < 10MB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is correct. This constant will be renamed, and its actual purpose is to mirror a constant present in the Galileo collector. It is the maximum payload size accepted by an HTTP request by the Galileo collector. It used to be 1MB, and is now 500. I believe this value should mirror the Galileo restrictions in the case someone really knows what he is doing and wish to send extremely large batches of ALFs (possibly containing request bodies that are very large). This value is hard coded and cannot be bypassed. Any ALF beyond that limit will be dropped as it is too big by itself, and if a batch reaches that limit, it will be put in the sending_queue directly, wether it has reached the configured batch_size
or not.
batch_size
(n of ALFs) and sending_queue_size
(Mbs) are the value one should tweak to enforce a low memory footprint. I could see a third value, being the batch_size
but in terms of bytes too, being useful. In case one's ALFs are much much bigger than he intended to. It would make the plugin more complicated to understand and configure (it already does with the sending_queue_size
so I am not sure about that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cb372 yes it's very large, but it isn't meant as a way to reject very large batches, it's mostly just a failsafe in case a client gets stuck and forgets to close the tcp socket or something like that. The collector handles it just fine even with extremely large batches. If it reaches a size that big without being a bad socket, it's often due to huge bodies and we don't want to drop those.
### Summary #### 2.6.0 ``` Release 2.6.0 Tue February 6 2024 Security fixes: #789 #814 CVE-2023-52425 -- Fix quadratic runtime issues with big tokens that can cause denial of service, in partial where dealing with compressed XML input. Applications that parsed a document in one go -- a single call to functions XML_Parse or XML_ParseBuffer -- were not affected. The smaller the chunks/buffers you use for parsing previously, the bigger the problem prior to the fix. Backporters should be careful to no omit parts of pull request #789 and to include earlier pull request #771, in order to not break the fix. #777 CVE-2023-52426 -- Fix billion laughs attacks for users compiling *without* XML_DTD defined (which is not common). Users with XML_DTD defined have been protected since Expat >=2.4.0 (and that was CVE-2013-0340 back then). Bug fixes: #753 Fix parse-size-dependent "invalid token" error for external entities that start with a byte order mark #780 Fix NULL pointer dereference in setContext via XML_ExternalEntityParserCreate for compilation with XML_DTD undefined #812 #813 Protect against closing entities out of order Other changes: #723 Improve support for arc4random/arc4random_buf #771 #788 Improve buffer growth in XML_GetBuffer and XML_Parse #761 #770 xmlwf: Support --help and --version #759 #770 xmlwf: Support custom buffer size for XML_GetBuffer and read #744 xmlwf: Improve language and URL clickability in help output #673 examples: Add new example "element_declarations.c" #764 Be stricter about macro XML_CONTEXT_BYTES at build time #765 Make inclusion to expat_config.h consistent #726 #727 Autotools: configure.ac: Support --disable-maintainer-mode #678 #705 .. #706 #733 #792 Autotools: Sync CMake templates with CMake 3.26 #795 Autotools: Make installation of shipped man page doc/xmlwf.1 independent of docbook2man availability #815 Autotools|CMake: Add missing -DXML_STATIC to pkg-config file section "Cflags.private" in order to fix compilation against static libexpat using pkg-config on Windows #724 #751 Autotools|CMake: Require a C99 compiler (a de-facto requirement already since Expat 2.2.2 of 2017) #793 Autotools|CMake: Fix PACKAGE_BUGREPORT variable #750 #786 Autotools|CMake: Make test suite require a C++11 compiler #749 CMake: Require CMake >=3.5.0 #672 CMake: Lowercase off_t and size_t to help a bug in Meson #746 CMake: Sort xmlwf sources alphabetically #785 CMake|Windows: Fix generation of DLL file version info #790 CMake: Build tests/benchmark/benchmark.c as well for a build with -DEXPAT_BUILD_TESTS=ON #745 #757 docs: Document the importance of isFinal + adjust tests accordingly #736 docs: Improve use of "NULL" and "null" #713 docs: Be specific about version of XML (XML 1.0r4) and version of C (C99); (XML 1.0r5 will need a sponsor.) #762 docs: reference.html: Promote function XML_ParseBuffer more #779 docs: reference.html: Add HTML anchors to XML_* macros #760 docs: reference.html: Upgrade to OK.css 1.2.0 #763 #739 docs: Fix typos #696 docs|CI: Use HTTPS URLs instead of HTTP at various places #669 #670 .. #692 #703 .. #733 #772 Address compiler warnings #798 #800 Address clang-tidy warnings #775 #776 Version info bumped from 9:10:8 (libexpat*.so.1.8.10) to 10:0:9 (libexpat*.so.1.9.0); see https://verbump.de/ for what these numbers do Infrastructure: #700 #701 docs: Document security policy in file SECURITY.md #766 docs: Improve parse buffer variables in-code documentation #674 #738 .. #740 #747 .. #748 #781 #782 Refactor coverage and conformance tests #714 #716 Refactor debug level variables to unsigned long #671 Improve handling of empty environment variable value in function getDebugLevel (without visible user effect) #755 #774 .. #758 #783 .. #784 #787 tests: Improve test coverage with regard to parse chunk size #660 #797 #801 Fuzzing: Improve fuzzing coverage #367 #799 Fuzzing|CI: Start running OSS-Fuzz fuzzing regression tests #698 #721 CI: Resolve some Travis CI leftovers #669 CI: Be robust towards absence of Git tags #693 #694 CI: Set permissions to "contents: read" for security #709 CI: Pin all GitHub Actions to specific commits for security #739 CI: Reject spelling errors using codespell #798 CI: Enforce clang-tidy clean code #773 #808 .. #809 #810 CI: Upgrade Clang from 15 to 18 #796 CI: Start using Clang's Control Flow Integrity sanitizer #675 #720 #722 CI: Adapt to breaking changes in GitHub Actions Ubuntu images #689 CI: Adapt to breaking changes in Clang/LLVM Debian packaging #763 CI: Adapt to breaking changes in codespell #803 CI: Adapt to breaking changes in Cppcheck Special thanks to: Ivan Galkin Joyce Brum Philippe Antoine Rhodri James Snild Dolkow spookyahell Steven Garske and Clang AddressSanitizer Clang UndefinedBehaviorSanitizer codespell GCC Farm Project OSS-Fuzz Sony Mobile ``` #### 2.6.1 ``` Release 2.6.1 Thu February 29 2024 Bug fixes: #817 Make tests independent of CPU speed, and thus more robust #828 #836 Expose billion laughs API with XML_DTD defined and XML_GE undefined, regression from 2.6.0 Other changes: #829 Hide test-only code behind new internal macro #833 Autotools: Reject expat_config.h.in defining SIZEOF_VOID_P #819 Address compiler warnings #832 #834 Version info bumped from 10:0:9 (libexpat*.so.1.9.0) to 10:1:9 (libexpat*.so.1.9.1); see https://verbump.de/ for what these numbers do Infrastructure: #818 CI: Adapt to breaking changes in clang-format Special thanks to: David Hall Snild Dolkow ``` #### 2.6.2 ``` Release 2.6.2 Wed March 13 2024 Security fixes: #839 #842 CVE-2024-28757 -- Prevent billion laughs attacks with isolated use of external parsers. Please see the commit message of commit 1d50b80cf31de87750103656f6eb693746854aa8 for details. Bug fixes: #839 #841 Reject direct parameter entity recursion and avoid the related undefined behavior Other changes: #847 Autotools: Fix build for DOCBOOK_TO_MAN containing spaces #837 Add missing #821 and #824 to 2.6.1 change log #838 #843 Version info bumped from 10:1:9 (libexpat*.so.1.9.1) to 10:2:9 (libexpat*.so.1.9.2); see https://verbump.de/ for what these numbers do Special thanks to: Philippe Antoine Tomas Korbar and Clang UndefinedBehaviorSanitizer OSS-Fuzz / ClusterFuzz ``` Signed-off-by: Aapo Talvensaari <[email protected]>
### Summary #### 2.6.0 ``` Release 2.6.0 Tue February 6 2024 Security fixes: #789 #814 CVE-2023-52425 -- Fix quadratic runtime issues with big tokens that can cause denial of service, in partial where dealing with compressed XML input. Applications that parsed a document in one go -- a single call to functions XML_Parse or XML_ParseBuffer -- were not affected. The smaller the chunks/buffers you use for parsing previously, the bigger the problem prior to the fix. Backporters should be careful to no omit parts of pull request #789 and to include earlier pull request #771, in order to not break the fix. #777 CVE-2023-52426 -- Fix billion laughs attacks for users compiling *without* XML_DTD defined (which is not common). Users with XML_DTD defined have been protected since Expat >=2.4.0 (and that was CVE-2013-0340 back then). Bug fixes: #753 Fix parse-size-dependent "invalid token" error for external entities that start with a byte order mark #780 Fix NULL pointer dereference in setContext via XML_ExternalEntityParserCreate for compilation with XML_DTD undefined #812 #813 Protect against closing entities out of order Other changes: #723 Improve support for arc4random/arc4random_buf #771 #788 Improve buffer growth in XML_GetBuffer and XML_Parse #761 #770 xmlwf: Support --help and --version #759 #770 xmlwf: Support custom buffer size for XML_GetBuffer and read #744 xmlwf: Improve language and URL clickability in help output #673 examples: Add new example "element_declarations.c" #764 Be stricter about macro XML_CONTEXT_BYTES at build time #765 Make inclusion to expat_config.h consistent #726 #727 Autotools: configure.ac: Support --disable-maintainer-mode #678 #705 .. #706 #733 #792 Autotools: Sync CMake templates with CMake 3.26 #795 Autotools: Make installation of shipped man page doc/xmlwf.1 independent of docbook2man availability #815 Autotools|CMake: Add missing -DXML_STATIC to pkg-config file section "Cflags.private" in order to fix compilation against static libexpat using pkg-config on Windows #724 #751 Autotools|CMake: Require a C99 compiler (a de-facto requirement already since Expat 2.2.2 of 2017) #793 Autotools|CMake: Fix PACKAGE_BUGREPORT variable #750 #786 Autotools|CMake: Make test suite require a C++11 compiler #749 CMake: Require CMake >=3.5.0 #672 CMake: Lowercase off_t and size_t to help a bug in Meson #746 CMake: Sort xmlwf sources alphabetically #785 CMake|Windows: Fix generation of DLL file version info #790 CMake: Build tests/benchmark/benchmark.c as well for a build with -DEXPAT_BUILD_TESTS=ON #745 #757 docs: Document the importance of isFinal + adjust tests accordingly #736 docs: Improve use of "NULL" and "null" #713 docs: Be specific about version of XML (XML 1.0r4) and version of C (C99); (XML 1.0r5 will need a sponsor.) #762 docs: reference.html: Promote function XML_ParseBuffer more #779 docs: reference.html: Add HTML anchors to XML_* macros #760 docs: reference.html: Upgrade to OK.css 1.2.0 #763 #739 docs: Fix typos #696 docs|CI: Use HTTPS URLs instead of HTTP at various places #669 #670 .. #692 #703 .. #733 #772 Address compiler warnings #798 #800 Address clang-tidy warnings #775 #776 Version info bumped from 9:10:8 (libexpat*.so.1.8.10) to 10:0:9 (libexpat*.so.1.9.0); see https://verbump.de/ for what these numbers do Infrastructure: #700 #701 docs: Document security policy in file SECURITY.md #766 docs: Improve parse buffer variables in-code documentation #674 #738 .. #740 #747 .. #748 #781 #782 Refactor coverage and conformance tests #714 #716 Refactor debug level variables to unsigned long #671 Improve handling of empty environment variable value in function getDebugLevel (without visible user effect) #755 #774 .. #758 #783 .. #784 #787 tests: Improve test coverage with regard to parse chunk size #660 #797 #801 Fuzzing: Improve fuzzing coverage #367 #799 Fuzzing|CI: Start running OSS-Fuzz fuzzing regression tests #698 #721 CI: Resolve some Travis CI leftovers #669 CI: Be robust towards absence of Git tags #693 #694 CI: Set permissions to "contents: read" for security #709 CI: Pin all GitHub Actions to specific commits for security #739 CI: Reject spelling errors using codespell #798 CI: Enforce clang-tidy clean code #773 #808 .. #809 #810 CI: Upgrade Clang from 15 to 18 #796 CI: Start using Clang's Control Flow Integrity sanitizer #675 #720 #722 CI: Adapt to breaking changes in GitHub Actions Ubuntu images #689 CI: Adapt to breaking changes in Clang/LLVM Debian packaging #763 CI: Adapt to breaking changes in codespell #803 CI: Adapt to breaking changes in Cppcheck Special thanks to: Ivan Galkin Joyce Brum Philippe Antoine Rhodri James Snild Dolkow spookyahell Steven Garske and Clang AddressSanitizer Clang UndefinedBehaviorSanitizer codespell GCC Farm Project OSS-Fuzz Sony Mobile ``` #### 2.6.1 ``` Release 2.6.1 Thu February 29 2024 Bug fixes: #817 Make tests independent of CPU speed, and thus more robust #828 #836 Expose billion laughs API with XML_DTD defined and XML_GE undefined, regression from 2.6.0 Other changes: #829 Hide test-only code behind new internal macro #833 Autotools: Reject expat_config.h.in defining SIZEOF_VOID_P #819 Address compiler warnings #832 #834 Version info bumped from 10:0:9 (libexpat*.so.1.9.0) to 10:1:9 (libexpat*.so.1.9.1); see https://verbump.de/ for what these numbers do Infrastructure: #818 CI: Adapt to breaking changes in clang-format Special thanks to: David Hall Snild Dolkow ``` #### 2.6.2 ``` Release 2.6.2 Wed March 13 2024 Security fixes: #839 #842 CVE-2024-28757 -- Prevent billion laughs attacks with isolated use of external parsers. Please see the commit message of commit 1d50b80cf31de87750103656f6eb693746854aa8 for details. Bug fixes: #839 #841 Reject direct parameter entity recursion and avoid the related undefined behavior Other changes: #847 Autotools: Fix build for DOCBOOK_TO_MAN containing spaces #837 Add missing #821 and #824 to 2.6.1 change log #838 #843 Version info bumped from 10:1:9 (libexpat*.so.1.9.1) to 10:2:9 (libexpat*.so.1.9.2); see https://verbump.de/ for what these numbers do Special thanks to: Philippe Antoine Tomas Korbar and Clang UndefinedBehaviorSanitizer OSS-Fuzz / ClusterFuzz ``` Signed-off-by: Aapo Talvensaari <[email protected]>
``` Release 2.6.0 Tue February 6 2024 Security fixes: #789 #814 CVE-2023-52425 -- Fix quadratic runtime issues with big tokens that can cause denial of service, in partial where dealing with compressed XML input. Applications that parsed a document in one go -- a single call to functions XML_Parse or XML_ParseBuffer -- were not affected. The smaller the chunks/buffers you use for parsing previously, the bigger the problem prior to the fix. Backporters should be careful to no omit parts of pull request #789 and to include earlier pull request #771, in order to not break the fix. #777 CVE-2023-52426 -- Fix billion laughs attacks for users compiling *without* XML_DTD defined (which is not common). Users with XML_DTD defined have been protected since Expat >=2.4.0 (and that was CVE-2013-0340 back then). Bug fixes: #753 Fix parse-size-dependent "invalid token" error for external entities that start with a byte order mark #780 Fix NULL pointer dereference in setContext via XML_ExternalEntityParserCreate for compilation with XML_DTD undefined #812 #813 Protect against closing entities out of order Other changes: #723 Improve support for arc4random/arc4random_buf #771 #788 Improve buffer growth in XML_GetBuffer and XML_Parse #761 #770 xmlwf: Support --help and --version #759 #770 xmlwf: Support custom buffer size for XML_GetBuffer and read #744 xmlwf: Improve language and URL clickability in help output #673 examples: Add new example "element_declarations.c" #764 Be stricter about macro XML_CONTEXT_BYTES at build time #765 Make inclusion to expat_config.h consistent #726 #727 Autotools: configure.ac: Support --disable-maintainer-mode #678 #705 .. #706 #733 #792 Autotools: Sync CMake templates with CMake 3.26 #795 Autotools: Make installation of shipped man page doc/xmlwf.1 independent of docbook2man availability #815 Autotools|CMake: Add missing -DXML_STATIC to pkg-config file section "Cflags.private" in order to fix compilation against static libexpat using pkg-config on Windows #724 #751 Autotools|CMake: Require a C99 compiler (a de-facto requirement already since Expat 2.2.2 of 2017) #793 Autotools|CMake: Fix PACKAGE_BUGREPORT variable #750 #786 Autotools|CMake: Make test suite require a C++11 compiler #749 CMake: Require CMake >=3.5.0 #672 CMake: Lowercase off_t and size_t to help a bug in Meson #746 CMake: Sort xmlwf sources alphabetically #785 CMake|Windows: Fix generation of DLL file version info #790 CMake: Build tests/benchmark/benchmark.c as well for a build with -DEXPAT_BUILD_TESTS=ON #745 #757 docs: Document the importance of isFinal + adjust tests accordingly #736 docs: Improve use of "NULL" and "null" #713 docs: Be specific about version of XML (XML 1.0r4) and version of C (C99); (XML 1.0r5 will need a sponsor.) #762 docs: reference.html: Promote function XML_ParseBuffer more #779 docs: reference.html: Add HTML anchors to XML_* macros #760 docs: reference.html: Upgrade to OK.css 1.2.0 #763 #739 docs: Fix typos #696 docs|CI: Use HTTPS URLs instead of HTTP at various places #669 #670 .. #692 #703 .. #733 #772 Address compiler warnings #798 #800 Address clang-tidy warnings #775 #776 Version info bumped from 9:10:8 (libexpat*.so.1.8.10) to 10:0:9 (libexpat*.so.1.9.0); see https://verbump.de/ for what these numbers do Infrastructure: #700 #701 docs: Document security policy in file SECURITY.md #766 docs: Improve parse buffer variables in-code documentation #674 #738 .. #740 #747 .. #748 #781 #782 Refactor coverage and conformance tests #714 #716 Refactor debug level variables to unsigned long #671 Improve handling of empty environment variable value in function getDebugLevel (without visible user effect) #755 #774 .. #758 #783 .. #784 #787 tests: Improve test coverage with regard to parse chunk size #660 #797 #801 Fuzzing: Improve fuzzing coverage #367 #799 Fuzzing|CI: Start running OSS-Fuzz fuzzing regression tests #698 #721 CI: Resolve some Travis CI leftovers #669 CI: Be robust towards absence of Git tags #693 #694 CI: Set permissions to "contents: read" for security #709 CI: Pin all GitHub Actions to specific commits for security #739 CI: Reject spelling errors using codespell #798 CI: Enforce clang-tidy clean code #773 #808 .. #809 #810 CI: Upgrade Clang from 15 to 18 #796 CI: Start using Clang's Control Flow Integrity sanitizer #675 #720 #722 CI: Adapt to breaking changes in GitHub Actions Ubuntu images #689 CI: Adapt to breaking changes in Clang/LLVM Debian packaging #763 CI: Adapt to breaking changes in codespell #803 CI: Adapt to breaking changes in Cppcheck Special thanks to: Ivan Galkin Joyce Brum Philippe Antoine Rhodri James Snild Dolkow spookyahell Steven Garske and Clang AddressSanitizer Clang UndefinedBehaviorSanitizer codespell GCC Farm Project OSS-Fuzz Sony Mobile ``` ``` Release 2.6.1 Thu February 29 2024 Bug fixes: #817 Make tests independent of CPU speed, and thus more robust #828 #836 Expose billion laughs API with XML_DTD defined and XML_GE undefined, regression from 2.6.0 Other changes: #829 Hide test-only code behind new internal macro #833 Autotools: Reject expat_config.h.in defining SIZEOF_VOID_P #819 Address compiler warnings #832 #834 Version info bumped from 10:0:9 (libexpat*.so.1.9.0) to 10:1:9 (libexpat*.so.1.9.1); see https://verbump.de/ for what these numbers do Infrastructure: #818 CI: Adapt to breaking changes in clang-format Special thanks to: David Hall Snild Dolkow ``` ``` Release 2.6.2 Wed March 13 2024 Security fixes: #839 #842 CVE-2024-28757 -- Prevent billion laughs attacks with isolated use of external parsers. Please see the commit message of commit 1d50b80cf31de87750103656f6eb693746854aa8 for details. Bug fixes: #839 #841 Reject direct parameter entity recursion and avoid the related undefined behavior Other changes: #847 Autotools: Fix build for DOCBOOK_TO_MAN containing spaces #837 Add missing #821 and #824 to 2.6.1 change log #838 #843 Version info bumped from 10:1:9 (libexpat*.so.1.9.1) to 10:2:9 (libexpat*.so.1.9.2); see https://verbump.de/ for what these numbers do Special thanks to: Philippe Antoine Tomas Korbar and Clang UndefinedBehaviorSanitizer OSS-Fuzz / ClusterFuzz ``` KAG-4331 Signed-off-by: Aapo Talvensaari <[email protected]>
As reported by #750, the buffer is vulnerable to a heavy load because of
its
sending_queue
of batches pending for sending which might grow too big and eventually crash the Lua VM. It also does nothandle the new 207 HTTP status code returned by the collector in case of
invalid ALFs.
Changes:
will have been saved, and the invalid one(s) should not be retried.
by Lua VM crashed, reason: not enough memory #750. Originally, I was about to implement such a size limitation
by "number of batches pending", but it would not be intuitive for
users to know what fits best their use case, because ALF sizes varies
from one API to another, and one endpoint to another. By defining it
in MB it is easier for users to chose a value. The default value of
10MB has been chosen after performing some benchmarking, and should
handle from 300 to 500 req/s depending on the ALFs sizes.
When the
sending_queue
has reached its limit, the current ALFs inthe buffer will be discarded.
when the collector cannot be reached, a delay is computed and
exponentially increases on each failure to connect to the collector.
That delay is shared by all workers. This avoids to load the collector
when it is having difficulties and saves up bandwidth on Kong's side.
As soon as the collector can be reached again, the delay is reset.
Currently, the minimum retry delay is 1s and the maximum is 60s. Those
values cannot be configured.
This still needs testing and add a few more tests.