Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-9753: Solve PDX serialization coredump #891

Closed
wants to merge 1 commit into from

Conversation

gaussianrecurrence
Copy link
Contributor

@gaussianrecurrence gaussianrecurrence commented Nov 5, 2021

  • While serializing a PdxSerializable object, there is a possible race
    condition which might cause the client to crash.
    This race-condition happens whenever the cluster is restarted during
    the serialization process and if
    on-client-disconnect-clear-pdxType-Ids is set to true, meaning the
    PdxTypeRegistry will be cleaned up if the connection towards the
    cluster is lost.
  • This issue has been solved by using the previously fetched local PDX
    type.
  • In order to properly test this solution, PdxRemoteWriterFactory has been added,
    so the race-condition can be force at test-time.
  • A new IT has been added to test that the solution is working fine.
  • make_unique was needed inside cppcache/src, so it was moved to
    utils/cxx_extensions.hpp

@gaussianrecurrence gaussianrecurrence force-pushed the feature/GEODE-9753 branch 4 times, most recently from af45bbf to 007e822 Compare November 6, 2021 03:42
 - While serializing a PdxSerializable object, there is a possible race
   condition which might cause the client to crash.
   This race-condition happens whenever the cluster is restarted during
   the serialization process and if
   on-client-disconnect-clear-pdxType-Ids is set to true, meaning the
   PdxTypeRegistry will be cleaned up if the connection towards the
   cluster is lost.
 - This issue has been solved by using the previously fetched local PDX
   type.
 - In order to properly test this solution, PdxRemoteWriterFactory has been added,
   so the race-condition can be force at test-time.
 - A new IT has been added to test that the solution is working fine.
 - make_unique was needed inside cppcache/src, so it was moved to
   utils/cxx_extensions.hpp
@gaussianrecurrence gaussianrecurrence marked this pull request as ready for review November 9, 2021 07:06
@gaussianrecurrence
Copy link
Contributor Author

I am reviewing all of my pending PRs, and this has been awating review for quite some time. Maybe @pdxcodemonkey, @pivotal-jbarrett @albertogpz @mkevo @jvarenina or anyone else could take a look at it and provide some feedback?
Thanks!

return PdxRemoteWriter(output, className, pdxTypeRegistry);
}
};
auto prw = cacheImpl->getPdxRemoteWriterFactory().create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a slightly longer, meaningful name? like remoteWriter for instance?

namespace apache {
namespace geode {
namespace client {

class PdxRemoteWriter : public PdxLocalWriter {
class APACHE_GEODE_EXPORT PdxRemoteWriter : public PdxLocalWriter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't export this, we can't change the ABI and anyway we don't want any more publicly-accessible things than we already have(!).

Copy link
Contributor Author

@gaussianrecurrence gaussianrecurrence Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exported for testing purposes. Will try to test it the way @pivotal-jbarrett, which would avoid exporting symbols.

namespace geode {
namespace client {

class APACHE_GEODE_EXPORT PdxRemoteWriterFactoryImpl : public virtual PdxRemoteWriterFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please don't export.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exported for testing purposes. Will try to test it the way @pivotal-jbarrett, which would avoid exporting symbols.

return PdxRemoteWriter(output, className, pdxTypeRegistry);
}
};
auto prw = cacheImpl->getPdxRemoteWriterFactory().create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this extracted into an pure virtual interface for the sole purpose of integration testing?

Copy link
Contributor Author

@gaussianrecurrence gaussianrecurrence Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I've seen you proposed an alternative way of testing this. I'll look into it.

@jake-at-work
Copy link
Contributor

  • This issue has been solved by using the previously fetched local PDX
    type.

Do you mean this issue is solved in this PR or in a previous and this PR just adds more testing?

cachePerfStats.incPdxSerialization(
pdxLen + 1 + 2 * 4); // pdxLen + 93 DSID + len + typeID
}
uint8_t* stPos =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto

cachePerfStats.incPdxSerialization(
pdxLen + 1 + 2 * 4); // pdxLen + 93 DSID + len + typeID
}
uint8_t* stPos = const_cast<uint8_t*>(output.getBuffer()) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto

cppcache/src/PdxHelper.cpp Show resolved Hide resolved
}
initialize();

m_pdxClassName = pdxType->getPdxClassName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pdxType is nullable then this line will throw.

m_preserveDataIdx(0),
m_currentDataIdx(-1),
m_remoteTolocalMapLength(0) {
m_preserveData = nullptr;
if (m_pdxType != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pdxType is not nullable then this check is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thing is I did leave that because I thought what if pdxType passed were nullptr, but that's not the way it's intended to be used, so I guess it would be best to force a coredump if pdxType is nullptr, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if it is nullptr then line 63 is going to fault anyway. Also, if passed here as nulltpr but later use excepted m_pdxType to not be nullptr... boom.

std::shared_ptr<PdxTypeRegistry> pdxTypeRegistry);

PdxRemoteWriter(DataOutput& output, std::string pdxClassName,
PdxRemoteWriter(DataOutput& output, std::shared_ptr<PdxType> pdxType,
std::shared_ptr<PdxRemotePreservedData> preservedData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this really feels like an extension of the other constructor with the optional preservedData parameters to maybe re-order the parameters to look like that.


PdxRemoteWriterFactoryImpl::~PdxRemoteWriterFactoryImpl() = default;

std::unique_ptr<PdxRemoteWriter> PdxRemoteWriterFactoryImpl::create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really seems like you could test this at unit level by mocking PdxTypeRegistery and PdxType. Make this a member function back from where it came and test it a the unit level. This avoids the creation of this factory, the exporting of internal symbols, and the interdiction of artificial behavior into the production implementation, via overriding this implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your approach seems promising. I'll look into it and see if it's feasible.

@jake-at-work
Copy link
Contributor

  • This issue has been solved by using the previously fetched local PDX
    type.

Do you mean this issue is solved in this PR or in a previous and this PR just adds more testing?

Ok, I think I found the fix but I a not sure I understand it. How does preserving the PdxType in all cases of the RemotePdxWriter construction fix the race condition?

@gaussianrecurrence
Copy link
Contributor Author

  • This issue has been solved by using the previously fetched local PDX
    type.

Do you mean this issue is solved in this PR or in a previous and this PR just adds more testing?

The issue I am referring to here is the one I described in the bullet above.

@gaussianrecurrence
Copy link
Contributor Author

Closing this PR as this issue is already addressed in GEODE-10276

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants