Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Denial Constraint structures that are necessary to implement DC verification #417

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 55 additions & 0 deletions src/core/algorithms/dc/base_provider.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#pragma once

#include <memory>
#include <stdexcept>

namespace model {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all these dc-specific classes should be placed in algos::dc or algos::dc::model namespace, but certainly not in global ::model


class PredicateBuilder;

template <typename Derived>
class BaseProvider {
protected:
static std::shared_ptr<Derived> instance_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Singleton shared pointer is a bad design. (we discussed this extensively in private messages)


BaseProvider() = default;

// FIXME: this is wrong, there should be one class representing the algorithm,
// which will create/clear singletons and call PredicateBuilder, PliShardBuiler,
// SomeOtherBuilder in order..
friend PredicateBuilder;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is declared friend though? This class doesn't have private fields at all


static void CreateInstance() {
if (instance_) {
throw std::runtime_error(Derived::ClassName() +
" instance is already created. "
"Perhaps PredicateBuilder has not been deleted");
}
instance_ = std::shared_ptr<Derived>(new Derived());
}

static void ClearInstance() {
if (instance_) {
Derived::Clear();
instance_.reset();
}
}

public:
BaseProvider(BaseProvider const &) = delete;
BaseProvider &operator=(BaseProvider const &) = delete;

static std::shared_ptr<Derived> GetInstance() {
if (!instance_) {
throw std::runtime_error(Derived::ClassName() +
" instance is not created. "
"Perhaps PredicateBuilder has not been created");
}
return instance_;
}
};

template <typename Derived>
std::shared_ptr<Derived> BaseProvider<Derived>::instance_ = nullptr;

} // namespace model
9 changes: 9 additions & 0 deletions src/core/algorithms/dc/column_operand.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#include "column_operand.h"

namespace model {

size_t hash_value(ColumnOperand const& col_op) noexcept {
return std::hash<ColumnOperand>()(col_op);
}

} // namespace model
75 changes: 75 additions & 0 deletions src/core/algorithms/dc/column_operand.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
#pragma once

#include <boost/functional/hash.hpp>

#include "table/column.h"

namespace model {

/**
* @brief Represents a column operand within a predicate for FastADC.
*
* FastADC processes Denial Constraints (DCs) that involve comparisons between
* pairs of rows within a dataset. A typical DC example, derived from a Functional
* Dependency such as A -> B, is expressed as: `forall t, s in r, not (t.A = s.A and t.B != s.B)`
* This denotes that for any pair of rows in the relation, it should not be the case
* that while the values in column "A" are equal, the values in column "B" are unequal.
*
* A predicate in this context (e.g., t.A == s.A) comprises three elements to be fully
* represented: the column operand from the first tuple ("t.A"), the comparison operator
* ("="), and the column operand from the second tuple ("s.A"). The `ColumnOperand` class
* encapsulates the column operand part of a predicate, such as "t.A" or "s.A".
*
* The class distinguishes between operands derived from the first tuple (t) and those
* from the second tuple (s) using a boolean flag `tuple_`, where `true` indicates an
* operand from the first tuple (t), and `false` indicates an operand from the second
* tuple (s).
*/
class ColumnOperand {
private:
Column const* column_;
bool tuple_;

public:
ColumnOperand(Column const* column, bool tuple) : column_(column), tuple_(tuple) {}

bool operator==(ColumnOperand const& rhs) const {
return column_ == rhs.column_ && tuple_ == rhs.tuple_;
}
Comment on lines +36 to +38
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class has operator==, but doesn't have operator!=, which is weird. Define both or neither


Column const* GetColumn() const {
return column_;
}

bool GetTuple() const {
return tuple_;
}

// here TS means (t, s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these t and s?

ColumnOperand GetInvTS() const {
return ColumnOperand(column_, !tuple_);
}

std::string ToString() const {
return (tuple_ ? "t." : "s.") + column_->GetName();
}
};

// NOLINTBEGIN(readability-identifier-naming)
size_t hash_value(model::ColumnOperand const& k) noexcept;
// NOLINTEND(readability-identifier-naming)

} // namespace model

namespace std {
template <>
struct hash<model::ColumnOperand> {
size_t operator()(model::ColumnOperand const& k) const noexcept {
size_t seed = 0;
boost::hash_combine(seed, k.GetColumn()->GetIndex());
boost::hash_combine(seed, k.GetTuple());
return seed;
}
};

} // namespace std
92 changes: 92 additions & 0 deletions src/core/algorithms/dc/operator.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
#include "algorithms/dc/operator.h"

namespace model {

bool Operator::Eval(std::byte const* v1, std::byte const* v2, Type const& type) const {
CompareResult cr = type.Compare(v1, v2);

switch (op_) {
case OperatorType::kEqual:
return cr == CompareResult::kEqual;
case OperatorType::kUnequal:
return cr != CompareResult::kEqual;
case OperatorType::kGreater:
return cr == CompareResult::kGreater;
case OperatorType::kLess:
return cr == CompareResult::kLess;
case OperatorType::kGreaterEqual:
return cr == CompareResult::kGreater || cr == CompareResult::kEqual;
case OperatorType::kLessEqual:
return cr == CompareResult::kLess || cr == CompareResult::kEqual;
default:
return false;
}
}

std::unordered_map<OperatorType, OperatorType> const Operator::InitializeInverseMap() {
return {{OperatorType::kEqual, OperatorType::kUnequal},
{OperatorType::kUnequal, OperatorType::kEqual},
{OperatorType::kGreater, OperatorType::kLessEqual},
{OperatorType::kLess, OperatorType::kGreaterEqual},
{OperatorType::kGreaterEqual, OperatorType::kLess},
{OperatorType::kLessEqual, OperatorType::kGreater}};
}

std::unordered_map<OperatorType, OperatorType> const Operator::InitializeSymmetricMap() {
return {{OperatorType::kEqual, OperatorType::kEqual},
{OperatorType::kUnequal, OperatorType::kUnequal},
{OperatorType::kGreater, OperatorType::kLess},
{OperatorType::kLess, OperatorType::kGreater},
{OperatorType::kGreaterEqual, OperatorType::kLessEqual},
{OperatorType::kLessEqual, OperatorType::kGreaterEqual}};
}

std::unordered_map<OperatorType, std::vector<OperatorType>> const
Operator::InitializeImplicationsMap() {
return {{OperatorType::kEqual,
{OperatorType::kEqual, OperatorType::kGreaterEqual, OperatorType::kLessEqual}},
{OperatorType::kUnequal, {OperatorType::kUnequal}},
{OperatorType::kGreater,
{OperatorType::kGreater, OperatorType::kGreaterEqual, OperatorType::kUnequal}},
{OperatorType::kLess,
{OperatorType::kLess, OperatorType::kLessEqual, OperatorType::kUnequal}},
{OperatorType::kGreaterEqual, {OperatorType::kGreaterEqual}},
{OperatorType::kLessEqual, {OperatorType::kLessEqual}}};
}

std::unordered_map<OperatorType, std::vector<OperatorType>> const
Operator::InitializeTransitivesMap() {
return {{OperatorType::kEqual, {OperatorType::kEqual}},
{OperatorType::kUnequal, {OperatorType::kEqual}},
{OperatorType::kGreater,
{OperatorType::kGreater, OperatorType::kGreaterEqual, OperatorType::kEqual}},
{OperatorType::kLess,
{OperatorType::kLess, OperatorType::kLessEqual, OperatorType::kEqual}},
{OperatorType::kGreaterEqual,
{OperatorType::kGreater, OperatorType::kGreaterEqual, OperatorType::kEqual}},
{OperatorType::kLessEqual,
{OperatorType::kLess, OperatorType::kLessEqual, OperatorType::kEqual}}};
}

std::unordered_map<OperatorType, std::string> const Operator::InitializeShortStringMap() {
return {{OperatorType::kEqual, "=="}, {OperatorType::kUnequal, "!="},
{OperatorType::kGreater, ">"}, {OperatorType::kLess, "<"},
{OperatorType::kGreaterEqual, ">="}, {OperatorType::kLessEqual, "<="}};
}

std::unordered_map<OperatorType, OperatorType> const Operator::kInverseMap =
Operator::InitializeInverseMap();
std::unordered_map<OperatorType, OperatorType> const Operator::kSymmetricMap =
Operator::InitializeSymmetricMap();
std::unordered_map<OperatorType, std::vector<OperatorType>> const Operator::kImplicationsMap =
Operator::InitializeImplicationsMap();
std::unordered_map<OperatorType, std::vector<OperatorType>> const Operator::kTransitivesMap =
Operator::InitializeTransitivesMap();
std::unordered_map<OperatorType, std::string> const Operator::kShortStringMap =
Operator::InitializeShortStringMap();

size_t hash_value(model::Operator const& k) noexcept {
return std::hash<model::Operator>()(k);
}

} // namespace model
95 changes: 95 additions & 0 deletions src/core/algorithms/dc/operator.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
#pragma once

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include what you use, that is all standard headers from which types/functions are used here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies to all files

#include <boost/functional/hash.hpp>

#include "type.h"

namespace model {

enum class OperatorType { kEqual, kUnequal, kGreater, kLess, kGreaterEqual, kLessEqual };

constexpr std::array<OperatorType, 6> kAllOperatorTypes = {
OperatorType::kEqual, OperatorType::kUnequal, OperatorType::kGreater,
OperatorType::kLess, OperatorType::kGreaterEqual, OperatorType::kLessEqual};

class Operator {
private:
OperatorType op_;
static std::unordered_map<OperatorType, OperatorType> const kInverseMap;
static std::unordered_map<OperatorType, OperatorType> const kSymmetricMap;
static std::unordered_map<OperatorType, std::vector<OperatorType>> const kImplicationsMap;
static std::unordered_map<OperatorType, std::vector<OperatorType>> const kTransitivesMap;
static std::unordered_map<OperatorType, std::string> const kShortStringMap;

public:
Operator(OperatorType type) : op_(type) {}

bool operator==(Operator const& rhs) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same thing about operator!=

return op_ == rhs.op_;
}

bool Eval(std::byte const* v1, std::byte const* v2, Type const& type) const;

// 'a op b' <=> !'a op.inverse b'
Operator GetInverse() const {
return Operator(kInverseMap.at(op_));
}

// 'a op b' <=> 'b op.symmetric a'
Operator GetSymmetric() const {
return Operator(kSymmetricMap.at(op_));
}

// If 'a op b', then 'a op.implications[i] b'
std::vector<Operator> GetImplications() const {
std::vector<Operator> implications;
for (auto type : kImplicationsMap.at(op_)) {
implications.push_back(Operator(type));
}
return implications;
}

// If 'a op b' and 'b op.transitives[i] c', then 'a op c'
std::vector<Operator> GetTransitives() const {
std::vector<Operator> transitives;
for (auto type : kTransitivesMap.at(op_)) {
transitives.push_back(Operator(type));
}
return transitives;
}
Comment on lines +34 to +59
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fyi: All these methods return the same value for the same op_, basically they're constructing at runtime vector which is known at compile time. All these functions can be made constexpr, but this of course requires constexpr maps.


std::string ToString() const {
return kShortStringMap.at(op_);
}

OperatorType GetType() const {
return op_;
}

static std::unordered_map<OperatorType, OperatorType> const InitializeInverseMap();

static std::unordered_map<OperatorType, OperatorType> const InitializeSymmetricMap();
Comment on lines +69 to +71
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const in return type does nothing


static std::unordered_map<OperatorType, std::vector<OperatorType>> const
InitializeImplicationsMap();

static std::unordered_map<OperatorType, std::vector<OperatorType>> const
InitializeTransitivesMap();

static std::unordered_map<OperatorType, std::string> const InitializeShortStringMap();
Comment on lines +69 to +79
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all these functions are part of the public interface? Why do we need these functions at all?
They're used only to initialize static maps and the initialization itself is just an assignment. So why not simply make maps internally linked in src/core/algorithms/dc/operator.cpp and assign correct values in the point of declaration without any additional functions?

Copy link
Collaborator

@polyntsov polyntsov Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, strictly speaking google style guide forbids static variables of type with throwing constructor (because it's impossible to catch the exception). So it's a good place to use frozen containers

};

// NOLINTBEGIN(readability-identifier-naming)
size_t hash_value(model::Operator const& k) noexcept;
// NOLINTEND(readability-identifier-naming)

} // namespace model

namespace std {
template <>
struct hash<model::Operator> {
size_t operator()(model::Operator const& k) const noexcept {
return static_cast<size_t>(k.GetType());
}
};
}; // namespace std
Comment on lines +88 to +95
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ; is redundant in the end

62 changes: 62 additions & 0 deletions src/core/algorithms/dc/predicate.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
#include "dc/predicate.h"

#include "dc/predicate_provider.h"

namespace model {

PredicatePtr GetPredicate(Operator const& op, ColumnOperand const& l, ColumnOperand const& r) {
return PredicateProvider::GetInstance()->GetPredicate(op, l, r);
}

PredicatePtr GetPredicateByType(PredicatesSpan predicates, OperatorType type) {
auto it = std::find_if(predicates.begin(), predicates.end(),
[type](PredicatePtr p) { return p->GetOperator() == type; });

return it == predicates.end() ? nullptr : *it;
}

bool Predicate::Satisfies(std::vector<model::TypedColumnData>& col_data, int t, int s) const {
TypedColumnData const& lhs = col_data[l_.GetColumn()->GetIndex()];
TypedColumnData const& rhs = col_data[r_.GetColumn()->GetIndex()];

// Assumes that types in both columns are the same (and they should)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert then if it's expected

Type const& type = lhs.GetType();

std::byte const* l_val = lhs.GetValue(l_.GetTuple() ? t : s);
std::byte const* r_val = rhs.GetValue(r_.GetTuple() ? t : s);

return op_.Eval(l_val, r_val, type);
}

PredicatePtr Predicate::GetSymmetric() const {
if (!symmetric_) {
symmetric_ = GetPredicate(op_.GetSymmetric(), r_, l_);
}
Comment on lines +31 to +34
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem with this caching strategy: when the method is const, it is implicitly assumed by other developers that it's thread safe, which is not the case for these methods and may cause subtle data race bugs in the future.
I'd think about more robust design of caching here (also these mutable fields just look bad)

return symmetric_;
}

PredicatePtr Predicate::GetInvTS() const {
if (!inv_TS_) {
inv_TS_ = GetPredicate(op_, r_.GetInvTS(), l_.GetInvTS());
}
return inv_TS_;
}

PredicatePtr Predicate::GetInverse() const {
if (!inverse_) {
inverse_ = GetPredicate(op_.GetInverse(), l_, r_);
}
return inverse_;
}

std::vector<PredicatePtr> Predicate::GetImplications() const {
if (implications_.empty()) {
auto op_implications = op_.GetImplications();
for (auto const& op_implication : op_implications) {
implications_.push_back(GetPredicate(op_implication, l_, r_));
}
}
return implications_;
}

} // namespace model
Loading
Loading