Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Augment coiteration algorithm to handle hashed level during conjunctive merge #19

Closed
wants to merge 32 commits into from

Conversation

adam2392
Copy link
Contributor

@adam2392 adam2392 commented Mar 23, 2023

Closes: #20
Towards: #17

The CoIteration algorithm currently assumes all inputs are ordered. A hashed level format is unordered by definition, but we would like for it to work as is when one of the input levels is the "hashed" level.

Specifically, the coiteration works exactly as is, but if one of the levels is unordered, then will iterate through the other levels, and use locate into the hashed level. This will only work as expected during a conjunctive merge.

Details

The unit-test currently highlights my current understanding of how the algorithm should be checked. Let me know if there are any issues you see?

@adam2392 adam2392 changed the title [WIP] of compareVector [ENH] Augment coiteration algorithm to handle hashed level during conjunctive merge Apr 3, 2023
@adam2392 adam2392 marked this pull request as ready for review April 3, 2023 04:07
Copy link
Owner

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of changes.

test/source/coiteration_test.cpp Outdated Show resolved Hide resolved
test/source/coiteration_test.cpp Outdated Show resolved Hide resolved
test/source/coiteration_test.cpp Outdated Show resolved Hide resolved
test/source/coiteration_test.cpp Outdated Show resolved Hide resolved
@adam2392
Copy link
Contributor Author

Addressed changes in the unit-test.

Now, I will re-visit the logic we discussed about implementation.

Signed-off-by: Adam Li <[email protected]>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains my initial attempt at: adding a constexpr check at compile time to determine if the coiteration is valid (this assumes that the user converts non-ordered, non-locatable formats to a correct format).

What I'm not sure on now is how to best implement the locate logic. That is, we want to somehow given a tuple of levels, determine which of these have locate and instead of advancing the iterator on those levels, we locate into them given a index and pointer.

@adam2392
Copy link
Contributor Author

adam2392 commented May 26, 2023

May 26th 2023

I had a few clarifying questions as well:

  1. If we only get unordered formats, is this supported? I would say… no?
  2. If we get an arbitrary combination of ordered and unordered formats, that is fine as long as the unordered formats all have “locate” function right?
  3. Is it safe to assume that the function, f, passed to Coiterate is always a "conjunction", or a "disjunction"? That is, we would also want to store a bool indicating which type of function it is? Or are there others to take into account? The reason I ask is that my understanding is the co-iteration logic will be different depending on which case we are in. I.e. if conjunction, we iterate over ordered levels until they all reached end. If disjunction, we iterate over all levels until they all reached end regardless of whether or not they are ordered, or unordered with locate.
  4. My intuition for what code needs to change are the following parts of iteration:
  • advance_iter to account for the fact that not all levels are ordered
  • operator++ for same reason as above

Am I missing anything?

…t implement the altered co-iteration logic assuming that some of the levels are unordered.

Signed-off-by: Adam Li <[email protected]>
@hameerabbasi
Copy link
Owner

  1. If we only get unordered formats, is this supported? I would say… no?

We would iterate through the first one and locate into all the other ones in this case; but the output will be marked unordered as well.

This means only a level with an insert capability can be the output.

  1. If we get an arbitrary combination of ordered and unordered formats, that is fine as long as the unordered formats all have “locate” function right?

This is correct.

  1. Is it safe to assume that the function, f, passed to Coiterate is always a "conjunction", or a "disjunction"? That is, we would also want to store a bool indicating which type of function it is? Or are there others to take into account? The reason I ask is that my understanding is the co-iteration logic will be different depending on which case we are in. I.e. if conjunction, we iterate over ordered levels until they all reached end. If disjunction, we iterate over all levels until they all reached end regardless of whether or not they are ordered, or unordered with locate.

You'd have to check if it's a conjunction wrt the unordered levels. It can be somewhere in the middle, i.e. (a & b) | c is totally fine as long as a or b are unordered, but not both. In contrast, a & (b | c) only works if a is unordered, but not anything else.

A simple algorithm to identify which functions are valid is as follows:

  • Feed into f a function which is true only for the unordered levels and false for the ordered ones. The result of f must be false in this case. Repeat the test with every combination of inputs for unordered levels, the result must always be false. Ordered levels remain at false all the time.
  • The first point also implies that if there is NO unordered level, feeding in all-false must result in false.

You wouldn't need to store anything, just test the function itself.

  1. My intuition for what code needs to change are the following parts of iteration:
  • advance_iter to account for the fact that not all levels are ordered

  • operator++ for same reason as above

That sounds good to me, but one other place I can think of is the dereferencing operator, it must return the right ik and pk.

@adam2392
Copy link
Contributor Author

  1. If we only get unordered formats, is this supported? I would say… no?

We would iterate through the first one and locate into all the other ones in this case; but the output will be marked unordered as well.

This means only a level with an insert capability can be the output.

Okay, this sounds like it is not handled by co-iteration, but at the merge lattice level, mapping inputs to outputs. Right?

  1. Is it safe to assume that the function, f, passed to Coiterate is always a "conjunction", or a "disjunction"? That is, we would also want to store a bool indicating which type of function it is? Or are there others to take into account? The reason I ask is that my understanding is the co-iteration logic will be different depending on which case we are in. I.e. if conjunction, we iterate over ordered levels until they all reached end. If disjunction, we iterate over all levels until they all reached end regardless of whether or not they are ordered, or unordered with locate.

You'd have to check if it's a conjunction wrt the unordered levels. It can be somewhere in the middle, i.e. (a & b) | c is totally fine as long as a or b are unordered, but not both. In contrast, a & (b | c) only works if a is unordered, but not anything else.

A simple algorithm to identify which functions are valid is as follows:

  • Feed into f a function which is true only for the unordered levels and false for the ordered ones. The result of f must be false in this case. Repeat the test with every combination of inputs for unordered levels, the result must always be false. Ordered levels remain at false all the time.
  • The first point also implies that if there is NO unordered level, feeding in all-false must result in false.

You wouldn't need to store anything, just test the function itself.

So to clarify, this means that the function f(levels_converted_to_bool) returning True means there is an error because it is a disjunction wrt unordered levels. If f returns False, it means co-iteration should work. So, would the initialization code look something like this?

        explicit inline Coiterate(F f, Levels&... levels)
            : m_levelsTuple(std::tie(levels...))
            , m_comparisonHelper(f)
            , m_orderedIndices(meet_criteria(f, levels...))
        {
            if (![](auto const& first, auto const&... rest)
                { return ((first == rest) && ...); }(levels.size()...))
            {
                throw std::invalid_argument("level sizes should be same");
            }
            
            // this loops through the levels passed in and assigns True/False to each level based on ordered vs unordered
            levels_bool = convert_levels_to_true_or_false(levels)
            if f(levels_bool) == True:
            {
                throw std::invalid_argument("levels have disjunction wrt unordered levels. Please convert at least one of the levels involved in disjunction to a ordered level first.");
            }
        }

Then, this means that we just assume that f is set correctly based on what the user calling co-iteration wants. We convert the levels passed into a list of true/false statements, which is passed to the user-defined f, which will give us True, or False. This now handles the error-checking for initialization of the co-iterator.

The next part is how to handle actually co-iterating. If we don't store which levels are unordered, then we just use convert_levels_to_true_or_false constexpr function to determine during compile time, which levels are ordered (will have bool false), vs unordered (bool true) and the way to iterate now is through the ones that are false, and locate into the ones that are true? Thus, we call convert_levels_to_true_or_false inside the operator++, advance_iter and operator* functions?

@hameerabbasi
Copy link
Owner

Okay, this sounds like it is not handled by co-iteration, but at the merge lattice level, mapping inputs to outputs. Right?

Both have a component to them -- we'd handle it here by signalling to merge lattices whether the output is ordered.

        explicit inline Coiterate(F f, Levels&... levels)
            : m_levelsTuple(std::tie(levels...))
            , m_comparisonHelper(f)
            , m_orderedIndices(meet_criteria(f, levels...))
        {
            if (![](auto const& first, auto const&... rest)
                { return ((first == rest) && ...); }(levels.size()...))
            {
                throw std::invalid_argument("level sizes should be same");
            }
            
            // this loops through the levels passed in and assigns True/False to each level based on ordered vs unordered
            levels_bool = convert_levels_to_true_or_false(levels)
            if f(levels_bool) == True:
            {
                throw std::invalid_argument("levels have disjunction wrt unordered levels. Please convert at least one of the levels involved in disjunction to a ordered level first.");
            }
        }

This code looks roughly correct, except for two things:

  1. For unordered levels, we'd have to test ALL combinations of true/false, but false only for ordered ones.
  2. The exception would be replaced by a static_assert since all of this can be done at compile-time.

Copy link
Owner

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments I hope will be helpful on how to proceed.

Comment on lines 13 to 42
/*
The class template for Coiteration of level formats.

Uses a generic function object F to compare elements
from different sequences at the same position and returns a tuple of the
minimum index and the corresponding elements from each sequence.

Parameters
----------
F : class
A function object that is used to compare two elements from different ranges.
IK : class
The type of the first element of each range.
PK : class
The type of the second element of each range.
Levels : Tuple of class
A tuple of level formats, where each level is itself a tuple of elements to be iterated.
Is : Tuple of class
A tuple of indices that is used to keep track of the current position in each level.

Notes
-----
Coiteration is only allowed through tuples of levels if the following criterion is met:
If:
1. the levels are all ordered (i.e. has the `is_ordered == True` property)
2. if any of the level are do not have the is_ordered property, it must have the locate
function, else return False. Then do a check that `m_comparisonHelper` defines
a conjunctive merge (i.e. AND operation).
Otherwise, coiteration is not allowed.
*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! However, I'd recommend Doxygen-style docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I changed the docstring to this style: https://www.doxygen.nl/manual/docblocks.html. Not as familiar w/ Doxygen, so lmk if you think I did anything wrong here.

test/source/coiteration_test.cpp Outdated Show resolved Hide resolved
@adam2392
Copy link
Contributor Author

This code looks roughly correct, except for two things:

  1. For unordered levels, we'd have to test ALL combinations of true/false, but false only for ordered ones.

I don't quite understand what you mean here, so want to clarify by walking thru an example.

Notes from today's brief call: Let's take your example for instance: F(a, b, c) = a & (b | c), that is F is a function that takes in boolean values representing a, b, c and then does some boolean expression on them.

During initialization of the Coiterate, we would be able to know what levels are ordered during compile-time since these are properties of the levels. So say a and b are ordered, then we would just need to test "F(a, b, c)", which we would run the function F(false, false, true), which evaluates to false.

Say c is unordered, then check F(false, false, true) so we would check F(false, false, false), so for all combinations of unordered levels, check True/False for those.

For storing what levels are formatted: have a constexpr function with tuple of levels input that spits out a tuple of true/false indicating ordered/unordered elements in levels.

For dereferencing: Have to modify the algorithm to only return the PKs for the ordered ones and then locate into all unordered levels.

test/CMakeLists.txt Outdated Show resolved Hide resolved
@adam2392
Copy link
Contributor Author

adam2392 commented May 30, 2023

@hameerabbasi quick few questions:

  1. should level_properties be part of hashed.cpp? I don't see it in there.
  2. how would I access the is_ordered property of the dense level for example? Something like:
 // Function to get the is_ordered properties of the levels as a tuple of booleans
        constexpr auto ordered_levels() const noexcept {
            return std::apply([](auto&... levels) {
                return std::tuple{levels.is_ordered...};
            }, m_levelsTuple);
        }

seems sensible, but upon further inspection does not work as I would want.

@adam2392
Copy link
Contributor Author

This is the first time they're actually being used, so changing the design is something I'd be open to if that's useful. @adam2392, would you please make that change?

Starting PR #22 to sort that out separately and then will come back here.

@adam2392
Copy link
Contributor Author

adam2392 commented Jun 2, 2023

Another issue besides access to level properties is the initialization of the coiteration_helper which assumes the levels passed in have the level capabilities iteration_helper. This is not available for hashed levels and thus results in a compilation error.

How should we handle this?

Xref: co_iteration.hpp file

std::tuple<typename Levels::LevelCapabilities::iteration_helper...> m_iterHelpers;

Nvm, we would just setup the iteration helpers for the m_levelsTuple[ordered_idx]

@@ -30,6 +59,26 @@ namespace xsparse::level_capabilities
}
}

constexpr auto ordered_levels() const noexcept
Copy link
Contributor Author

@adam2392 adam2392 Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hameerabbasi and @bharath2438 does this look fine in your opinion to get tuple of booleans corresponding to ordered levels?

If so, my next step is to implement a function for checking whether or not an instantiation of Coiterate satisfies the co-iteration criterion:

  1. all levels are ordered, or has_locate_v
  2. for levels that are unordered, check all True/False combinations as input to function F

There is a total of 2**N combinations, where N is the number of unordered levels.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks fine. Is there any way to do away with decay_t by passing in the right argument to std::apply?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so? What would "the right argument" to std::apply be? We only have access to m_levelsTuple and m_comparisonHelper in this setting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, decay_t is used for stripping off the reference (or const) from the type, so I was wondering if it was really necessary, or would std::make_tuple(decltype(levels)::LevelProperties::is_ordered...) be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, yeah I tried that before (and just now again), but the issue is that:

In file included from /Users/adam2392/Documents/xsparse/test/source/coiteration_test.cpp:17:
/Users/adam2392/Documents/xsparse/include/xsparse/level_capabilities/co_iteration.hpp:77:25: error: type 'decltype(levels)' (aka 'const xsparse::levels::dense<std::tuple<>, unsigned long, unsigned long> &') cannot be used prior to '::' because it has no members
                        decltype(levels)::LevelProperties::is_ordered...);

@adam2392
Copy link
Contributor Author

adam2392 commented Jun 6, 2023

Some notes for discussion tomorrow:

Perhaps it is my lack of knowledge/experience writing "compiler-time" functions vs "runtime" functions, but I think the next step is to essentially "filter" the m_levelsTuple to create a constexpr tuple of levels that are ordered. IIUC, it is only this "sub-tuple" that should be used in the existing downstream iterator code.

So I would presume we need something like template recursion, kinda like this: https://godbolt.org.

@adam2392
Copy link
Contributor Author

adam2392 commented Jun 7, 2023

Ah so the issue I was discussing today about hashed not having the level_capabilities is a bit deeper.

Currently for dense, compressed, offset, range and singleton, the iteration_helper class is implemented in the coordinate_iterate.hpp file: https://github.com/hameerabbasi/xsparse/blob/main/include/xsparse/level_capabilities/coordinate_iterate.hpp. This is then passed to each of the levels using something like this inside the code for each level:

using LevelCapabilities
                = level_capabilities::locate_position_iterate<hashed,
                                                                 std::tuple<LowerLevels...>,
                                                                 IK,
                                                                 PK,
                                                                 ContainerTraits,
                                                                 _LevelProperties>;

We want a similar design for hashed I think. I'm thinking of thus migrating the class iteration_helper code currently in hashed.hpp (

class iteration_helper
) into coordinate_iterate.hpp. How does that sound @hameerabbasi and @bharath2438?

Signed-off-by: Adam Li <[email protected]>
@bharath2438
Copy link
Collaborator

How does that sound @hameerabbasi and @bharath2438?

Actually, the main reason why iteration_helper is different for hashed is because it's implementation is not consistent with the others. Hashed quite does not fit into either of coordinate_position_iterate and coordinate_value_iterate and also uses a bidirectional iterator, that is why it's helper is written in it's own file.

We could however, move the helper code to coordinate_iterate.hpp as coordinate_hashed_iterate, unless there is a better way of doing it @hameerabbasi @adam2392

@adam2392
Copy link
Contributor Author

Closed by #25

@adam2392 adam2392 closed this Aug 31, 2023
@adam2392 adam2392 deleted the comparevec branch August 31, 2023 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Augment coiteration to work for conjunctive merges when one of the levels is unordered (i.e. hashed level)
3 participants