Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML: Use RAII wrappers instead of manual memory management #5767

Merged
merged 11 commits into from
Jul 30, 2024

Conversation

rschu1ze
Copy link
Contributor

@rschu1ze rschu1ze commented Jul 7, 2024

We (ClickHouse) use azure-sdk-for-cpp as a submodule to connect to Azure Blob storage.

Our comprehensive suite of tests is run under thread/undefined/memory/address sanitizers to detect issues before they hit production. One particular issue detected was a memory leak found and fixed in January 2023 which this PR upstreams.

Notes:

The problem was first addressed in azure 1.6 by these PRs: ClickHouse#4, ClickHouse#5, ClickHouse#6.

Every time azure-sdk-for-cpp was upgraded, the fixes have been picked into the current release. ClickHouse currently pulls in azure-sdk-for-cpp 1.12, also with the fix. This is to say that the fix has been part of our production code for a long while.

There was a previous attempt to upstream the fix (#4647) but 1. that other PR was sort of different from this fix and 2. the contributor unfortunately disappeared and the PR was abandoned.

I cannot determine which exact memory leak the fix originally addressed. The original fixes in ClickHouse's fork of azure-sdk-for-cpp only state that they fix a memory leak but they don't link to a defect report (sorry). I could spend some time doing code and Git archeology but it is probably not worth the trouble. The idea of this PR is to manage memory using RAII - that is generally a good idea as makes the code safer and less prone to errors. I could not find any obvious places where non-use of RAII would lead to errors.

The PR also makes setup/teardown functions XmlGlobalInitialize and XmlGlobalDeinitialize public. This is required for another fix that frees memory in TLS allocated by libxml when azure/libxml is used by multiple threads: https://github.com/ClickHouse/ClickHouse/blob/907a3659262e9029fd2c257913c62fadfc3bc433/programs/server/Server.cpp#L929 (see ClickHouse/ClickHouse#45796).

Two more notes:

  • For some weird reason, the XML wrapper code exists twice:

    • sdk/tables/azure-data-tables/inc/azure/data/tables/internal/xml_wrapper.hpp / sdk/tables/azure-data-tables/inc/azure/data/tables/internal/xml_wrapper.hpp
    • sdk/storage/azure-storage-common/inc/azure/storage/common/internal/xml_wrapper.hpp / sdk/storage/azure-storage-common/src/xml_wrapper.cpp

    This PR only addresses the second file(s). If you think we should rather apply the same fixes in both sets of files, please push another commit into this PR. My personal preference is to kill one of the copies (the other one, ideally) - feel free to do so.

  • I am a Linux user and did not check or compile the code in the AZ_PLATFORM_WINDOWS ifdef. Perhaps the Windows code is good but kindly double-check.

@github-actions github-actions bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Storage Storage Service (Queues, Blobs, Files) labels Jul 7, 2024
Copy link

github-actions bot commented Jul 7, 2024

Thank you for your contribution @rschu1ze! We will review the pull request and get back to you soon.

@Jinming-Hu
Copy link
Member

@rschu1ze I'd like to confirm, you didn't observe any memory leak in the original implementation. This PR is just to adopt a good practice of memory management.

@rschu1ze
Copy link
Contributor Author

Sorry for the delay, I was on a business trip.

@rschu1ze I'd like to confirm, you didn't observe any memory leak in the original implementation. This PR is just to adopt a good practice of memory management.

I touched on this a bit in #5767 (comment). In retrospective, I can't tell which exact memory leak was addressed, but the title of ClickHouse#4 says "Fix memory leak", so I assume there was some memory leak existed somewhere. In any case, I would argue that using RAII for resource management is a good idea makes generally sense.

@rschu1ze
Copy link
Contributor Author

Thanks for approving. I am good but as I mentioned in the commit message, the XML wrapper class exists as copypasta in another place. We ideally get rid of the other copy or apply the fix in the other location.

@Jinming-Hu
Copy link
Member

@rschu1ze can you fix the CI failures? There are some format issues.

cc @gearama for the copypasta in table package.

@rschu1ze

This comment was marked as resolved.

@Jinming-Hu
Copy link
Member

Hi @rschu1ze , please apply below patch to fix build on Windows

diff --git a/sdk/storage/azure-storage-common/src/xml_wrapper.cpp b/sdk/storage/azure-storage-common/src/xml_wrapper.cpp
index 40f5ce10..ede70454 100644
--- a/sdk/storage/azure-storage-common/src/xml_wrapper.cpp
+++ b/sdk/storage/azure-storage-common/src/xml_wrapper.cpp
@@ -31,7 +31,7 @@ namespace Azure { namespace Storage { namespace _internal {
   void XmlGlobalInitialize() {}
   void XmlGlobalDeinitialize() {}
 
-  struct XmlReaderContext
+  struct XmlReader::XmlReaderContext
   {
     XmlReaderContext()
     {
@@ -99,20 +99,14 @@ namespace Azure { namespace Storage { namespace _internal {
       throw std::runtime_error("Unsupported xml encoding.");
     }
 
-    m_context = context.release();
+    m_context = std::move(context);
   }
 
-  XmlReader::~XmlReader()
-  {
-    if (m_context)
-    {
-      delete static_cast<XmlReaderContext*>(m_context);
-    }
-  }
+  XmlReader::~XmlReader() {}
 
   XmlNode XmlReader::Read()
   {
-    auto context = static_cast<XmlReaderContext*>(m_context);
+    auto& context = m_context;
 
     auto moveToNext = [&]() {
       HRESULT ret = WsReadNode(context->reader, context->error);
@@ -222,7 +216,7 @@ namespace Azure { namespace Storage { namespace _internal {
     }
   }
 
-  struct XmlWriterContext
+  struct XmlWriter::XmlWriterContext
   {
     XmlWriterContext()
     {
@@ -276,20 +270,14 @@ namespace Azure { namespace Storage { namespace _internal {
       throw std::runtime_error("Failed to initialize xml writer.");
     }
 
-    m_context = context.release();
+    m_context = std::move(context);
   }
 
-  XmlWriter::~XmlWriter()
-  {
-    if (m_context)
-    {
-      delete static_cast<XmlWriterContext*>(m_context);
-    }
-  }
+  XmlWriter::~XmlWriter() {}
 
   void XmlWriter::Write(XmlNode node)
   {
-    auto context = static_cast<XmlWriterContext*>(m_context);
+    auto& context = m_context;
     if (node.Type == XmlNodeType::StartTag)
     {
       if (node.HasValue)
@@ -363,7 +351,7 @@ namespace Azure { namespace Storage { namespace _internal {
 
   std::string XmlWriter::GetDocument()
   {
-    auto context = static_cast<XmlWriterContext*>(m_context);
+    auto& context = m_context;
 
     BOOL boolValueTrue = TRUE;
     WS_XML_WRITER_PROPERTY writerProperty[2];

@rschu1ze

This comment was marked as resolved.

@Jinming-Hu
Copy link
Member

@rschu1ze This PR now looks good to me. We can merge it after fixing the clang format issue.

@rschu1ze
Copy link
Contributor Author

@rschu1ze This PR now looks good to me. We can merge it after fixing the clang format issue.

Done, thanks.

@rschu1ze
Copy link
Contributor Author

(Note to myself: bump to v1.12)

@Jinming-Hu Jinming-Hu merged commit 6960ad1 into Azure:main Jul 30, 2024
41 checks passed
rschu1ze pushed a commit to ClickHouse/azure-sdk-for-cpp that referenced this pull request Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants