Merge branch 'dev'
lczech committed Aug 5, 2024
2 parents 7a91d0c + 6011e38 commit 1a220c2
Showing 120 changed files with 15,151 additions and 3,748 deletions.
469 changes: 352 additions & 117 deletions .github/workflows/ci.yaml

Large diffs are not rendered by default.

16 changes: 13 additions & 3 deletions CMakeLists.txt
@@ -15,9 +15,9 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Contact:
-# Lucas Czech <[email protected]>
-# Department of Plant Biology, Carnegie Institution For Science
-# 260 Panama Street, Stanford, CA 94305, USA
+# Lucas Czech <[email protected]>
+# University of Copenhagen, Globe Institute, Section for GeoGenetics
+# Oster Voldgade 5-7, 1350 Copenhagen K, Denmark

# --------------------------------------------------------------------------------------------------
# CMake Init
@@ -192,6 +192,7 @@ option (GENESIS_BUILD_TESTS "Build the Genesis test suites."
# If available, use external dependencies.
option (GENESIS_USE_ZLIB "Use zlib." ON)
option (GENESIS_USE_OPENMP "Use OpenMP." ON)
+option (GENESIS_USE_AVX "Use AVX/AVX2." ON)

# Additional dependencies: htslib
# We use a locally installed (in the build directory) version of htslib,
@@ -352,6 +353,15 @@ IF(GENESIS_USE_OPENMP)
include( IncludeOpenMP )
ENDIF()

+# ----------------------------------------------------------
+# AVX/AVX2
+# ----------------------------------------------------------
+
+# IF(GENESIS_USE_AVX)
+# # Included from modules dir tools/cmake
+# include( IncludeAVX )
+# ENDIF()
+
# ----------------------------------------------------------
# htslib
# ----------------------------------------------------------
82 changes: 82 additions & 0 deletions doc/manual/supplement/acknowledgements.md
@@ -568,6 +568,88 @@ for our needs.
> DEALINGS IN THE SOFTWARE.
@htmlonly </details> @endhtmlonly
## Concurrent Queue @anchor supplement_acknowledgements_code_reuse_concurrent_queue

Genesis contains an implementation of a Concurrent Queue and related classes:

* @link genesis::utils::ConcurrentQueue ConcurrentQueue@endlink
* @link genesis::utils::BlockingConcurrentQueue BlockingConcurrentQueue@endlink
* @link genesis::utils::LightweightSemaphore LightweightSemaphore@endlink

This implementation is from the excellent moodycamel::ConcurrentQueue
(https://github.com/cameron314/concurrentqueue), using version v1.0.4,
which was published under a simplified BSD license, and also dual-licensed under the Boost
Software License. The full [original license](https://github.com/cameron314/concurrentqueue/blob/master/LICENSE.md), as of 2024-07-04, is copied below.

We adapted the original code by (roughly) reformatting it to our coding standard, and by moving it
from the moodycamel namespace into our own namespace, to keep our documentation and usage
consistent. Other than that, all functionality is kept as-is.

@htmlonly <details><summary>License</summary> @endhtmlonly
> This license file applies to everything in this repository except that which
> is explicitly annotated as being written by other authors, i.e. the Boost
> queue (included in the benchmarks for comparison), Intel's TBB library (ditto),
> dlib::pipe (ditto),
> the CDSChecker tool (used for verification), the Relacy model checker (ditto),
> and Jeff Preshing's semaphore implementation (used in the blocking queue) which
> has a zlib license (embedded in lightweightsempahore.h).
>
> ---
>
> Simplified BSD License:
>
> Copyright (c) 2013-2016, Cameron Desrochers.
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without modification,
> are permitted provided that the following conditions are met:
>
> - Redistributions of source code must retain the above copyright notice, this list of
> conditions and the following disclaimer.
> - Redistributions in binary form must reproduce the above copyright notice, this list of
> conditions and the following disclaimer in the documentation and/or other materials
> provided with the distribution.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
> EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
> OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
> EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>
> ---
>
> I have also chosen to dual-license under the Boost Software License as an alternative to
> the Simplified BSD license above:
>
> Boost Software License - Version 1.0 - August 17th, 2003
>
> Permission is hereby granted, free of charge, to any person or organization
> obtaining a copy of the software and accompanying documentation covered by
> this license (the "Software") to use, reproduce, display, distribute,
> execute, and transmit the Software, and to prepare derivative works of the
> Software, and to permit third-parties to whom the Software is furnished to
> do so, all subject to the following:
>
> The copyright notices in the Software and this entire statement, including
> the above license grant, this restriction and the following disclaimer,
> must be included in all copies of the Software, in whole or in part, and
> all derivative works of the Software, unless such copies or derivative
> works are solely in the form of machine-executable object code generated by
> a source language processor.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
> SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
> FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
> ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN THE SOFTWARE.
@htmlonly </details> @endhtmlonly
## Succinct Range Minimum Query @anchor supplement_acknowledgements_code_reuse_succinct_rmq

The implementation of our @link genesis::utils::RangeMinimumQuery RangeMinimumQuery@endlink data
81 changes: 81 additions & 0 deletions doc/manual/supplement/acknowledgements/c_07_concurrent_queue.inc
@@ -0,0 +1,81 @@
## Concurrent Queue @anchor supplement_acknowledgements_code_reuse_concurrent_queue

Genesis contains an implementation of a Concurrent Queue and related classes:

* @link genesis::utils::ConcurrentQueue ConcurrentQueue@endlink
* @link genesis::utils::BlockingConcurrentQueue BlockingConcurrentQueue@endlink
* @link genesis::utils::LightweightSemaphore LightweightSemaphore@endlink

This implementation is from the excellent moodycamel::ConcurrentQueue
(https://github.com/cameron314/concurrentqueue), using version v1.0.4,
which was published under a simplified BSD license, and also dual-licensed under the Boost
Software License. The full [original license](https://github.com/cameron314/concurrentqueue/blob/master/LICENSE.md), as of 2024-07-04, is copied below.

We adapted the original code by (roughly) reformatting it to our coding standard, and by moving it
from the moodycamel namespace into our own namespace, to keep our documentation and usage
consistent. Other than that, all functionality is kept as-is.

@htmlonly <details><summary>License</summary> @endhtmlonly
> This license file applies to everything in this repository except that which
> is explicitly annotated as being written by other authors, i.e. the Boost
> queue (included in the benchmarks for comparison), Intel's TBB library (ditto),
> dlib::pipe (ditto),
> the CDSChecker tool (used for verification), the Relacy model checker (ditto),
> and Jeff Preshing's semaphore implementation (used in the blocking queue) which
> has a zlib license (embedded in lightweightsempahore.h).
>
> ---
>
> Simplified BSD License:
>
> Copyright (c) 2013-2016, Cameron Desrochers.
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without modification,
> are permitted provided that the following conditions are met:
>
> - Redistributions of source code must retain the above copyright notice, this list of
> conditions and the following disclaimer.
> - Redistributions in binary form must reproduce the above copyright notice, this list of
> conditions and the following disclaimer in the documentation and/or other materials
> provided with the distribution.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
> EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
> OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
> EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>
> ---
>
> I have also chosen to dual-license under the Boost Software License as an alternative to
> the Simplified BSD license above:
>
> Boost Software License - Version 1.0 - August 17th, 2003
>
> Permission is hereby granted, free of charge, to any person or organization
> obtaining a copy of the software and accompanying documentation covered by
> this license (the "Software") to use, reproduce, display, distribute,
> execute, and transmit the Software, and to prepare derivative works of the
> Software, and to permit third-parties to whom the Software is furnished to
> do so, all subject to the following:
>
> The copyright notices in the Software and this entire statement, including
> the above license grant, this restriction and the following disclaimer,
> must be included in all copies of the Software, in whole or in part, and
> all derivative works of the Software, unless such copies or derivative
> works are solely in the form of machine-executable object code generated by
> a source language processor.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
> SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
> FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
> ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN THE SOFTWARE.
@htmlonly </details> @endhtmlonly
6 changes: 3 additions & 3 deletions lib/genesis/genesis.hpp
@@ -19,9 +19,9 @@
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
-Lucas Czech <[email protected]>
-Department of Plant Biology, Carnegie Institution For Science
-260 Panama Street, Stanford, CA 94305, USA
+Lucas Czech <[email protected]>
+University of Copenhagen, Globe Institute, Section for GeoGenetics
+Oster Voldgade 5-7, 1350 Copenhagen K, Denmark
*/

/**
6 changes: 3 additions & 3 deletions lib/genesis/placement.hpp
@@ -19,9 +19,9 @@
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
-Lucas Czech <[email protected]>
-Department of Plant Biology, Carnegie Institution For Science
-260 Panama Street, Stanford, CA 94305, USA
+Lucas Czech <[email protected]>
+University of Copenhagen, Globe Institute, Section for GeoGenetics
+Oster Voldgade 5-7, 1350 Copenhagen K, Denmark
*/

/**
7 changes: 4 additions & 3 deletions lib/genesis/population.hpp
@@ -19,9 +19,9 @@
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
-Lucas Czech <[email protected]>
-Department of Plant Biology, Carnegie Institution For Science
-260 Panama Street, Stanford, CA 94305, USA
+Lucas Czech <[email protected]>
+University of Copenhagen, Globe Institute, Section for GeoGenetics
+Oster Voldgade 5-7, 1350 Copenhagen K, Denmark
*/

/**
@@ -37,6 +37,7 @@
#include "genesis/population/filter/filter_status.hpp"
#include "genesis/population/filter/sample_counts_filter.hpp"
#include "genesis/population/filter/sample_counts_filter_numerical.hpp"
+#include "genesis/population/filter/sample_counts_filter_positional.hpp"
#include "genesis/population/filter/variant_filter.hpp"
#include "genesis/population/filter/variant_filter_numerical.hpp"
#include "genesis/population/filter/variant_filter_positional.hpp"
27 changes: 23 additions & 4 deletions lib/genesis/population/filter/sample_counts_filter.cpp
@@ -47,12 +47,12 @@ namespace population {
// add other values to that enum, we want to know here, in order to adapt all below functions
// accordingly.
static_assert(
-static_cast<FilterStatus::IntType>( SampleCountsFilterTag::kEnd ) == 10,
-"SampleCountsFilterTag::kEnd != 10. The enum has values that are not accounted for."
+static_cast<FilterStatus::IntType>( SampleCountsFilterTag::kEnd ) == 12,
+"SampleCountsFilterTag::kEnd != 12. The enum has values that are not accounted for."
);
static_assert(
-static_cast<FilterStatus::IntType>( SampleCountsFilterTagCategory::kEnd ) == 3,
-"SampleCountsFilterTagCategory::kEnd != 3. The enum has values that are not accounted for."
+static_cast<FilterStatus::IntType>( SampleCountsFilterTagCategory::kEnd ) == 4,
+"SampleCountsFilterTagCategory::kEnd != 4. The enum has values that are not accounted for."
);

// =================================================================================================
@@ -65,6 +65,9 @@ SampleCountsFilterTagCategory sample_counts_filter_tag_to_category( SampleCounts
switch( tag ) {
case SampleCountsFilterTag::kPassed:
return SampleCountsFilterTagCategory::kPassed;
+case SampleCountsFilterTag::kMaskedPosition:
+case SampleCountsFilterTag::kMaskedRegion:
+return SampleCountsFilterTagCategory::kMasked;
case SampleCountsFilterTag::kMissing:
case SampleCountsFilterTag::kNotPassed:
case SampleCountsFilterTag::kInvalid:
@@ -98,6 +101,8 @@ SampleCountsFilterCategoryStats sample_counts_filter_stats_category_counts(
// Build our result, by simply adding up the values to our simple categories / classes.
SampleCountsFilterCategoryStats result;
result[SampleCountsFilterTagCategory::kPassed] += stats[ SampleCountsFilterTag::kPassed ];
+result[SampleCountsFilterTagCategory::kMasked] += stats[ SampleCountsFilterTag::kMaskedPosition ];
+result[SampleCountsFilterTagCategory::kMasked] += stats[ SampleCountsFilterTag::kMaskedRegion ];
result[SampleCountsFilterTagCategory::kMissingInvalid] += stats[ SampleCountsFilterTag::kMissing ];
result[SampleCountsFilterTagCategory::kMissingInvalid] += stats[ SampleCountsFilterTag::kNotPassed ];
result[SampleCountsFilterTagCategory::kMissingInvalid] += stats[ SampleCountsFilterTag::kInvalid ];
@@ -123,6 +128,11 @@ size_t sample_counts_filter_stats_category_counts(
result += stats[ SampleCountsFilterTag::kPassed ];
break;
}
+case SampleCountsFilterTagCategory::kMasked: {
+result += stats[ SampleCountsFilterTag::kMaskedPosition ];
+result += stats[ SampleCountsFilterTag::kMaskedRegion ];
+break;
+}
case SampleCountsFilterTagCategory::kMissingInvalid: {
result += stats[ SampleCountsFilterTag::kMissing ];
result += stats[ SampleCountsFilterTag::kNotPassed ];
@@ -165,6 +175,12 @@ std::ostream& print_sample_counts_filter_stats(
assert( stats.data.size() == static_cast<size_t>( SampleCountsFilterTag::kEnd ) );

// Go through all possible enum values and print them
+if( stats[SampleCountsFilterTag::kMaskedPosition] > 0 || verbose ) {
+os << "Masked position: " << stats[SampleCountsFilterTag::kMaskedPosition] << "\n";
+}
+if( stats[SampleCountsFilterTag::kMaskedRegion] > 0 || verbose ) {
+os << "Masked region: " << stats[SampleCountsFilterTag::kMaskedRegion] << "\n";
+}
if( stats[SampleCountsFilterTag::kMissing] > 0 || verbose ) {
os << "Missing: " << stats[SampleCountsFilterTag::kMissing] << "\n";
}
@@ -220,6 +236,9 @@ std::ostream& print_sample_counts_filter_category_stats(
assert( stats.data.size() == static_cast<size_t>( SampleCountsFilterTagCategory::kEnd ) );

// Go through all possible enum values and print them
+if( stats[SampleCountsFilterTagCategory::kMasked] > 0 || verbose ) {
+os << "Masked: " << stats[SampleCountsFilterTagCategory::kMasked] << "\n";
+}
if( stats[SampleCountsFilterTagCategory::kMissingInvalid] > 0 || verbose ) {
os << "Missing: " << stats[SampleCountsFilterTagCategory::kMissingInvalid] << "\n";
}
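The `sample_counts_filter.cpp` changes above follow a common C++ pattern: a scoped enum ending in a `kEnd` sentinel, a `static_assert` that breaks the build when enumerators are added without updating dependent code, and a `switch` that maps each fine-grained tag to a coarser category. Below is a minimal, self-contained sketch of that pattern. The enums `Tag` and `Category` and the function `tag_to_category()` are hypothetical, simplified stand-ins that mirror only a subset of `SampleCountsFilterTag` and `SampleCountsFilterTagCategory`; this is not the genesis code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-ins for SampleCountsFilterTag and
// SampleCountsFilterTagCategory; only a subset of the real enumerators is mirrored.
enum class Tag : std::uint32_t {
    kPassed = 0,
    kMaskedPosition,
    kMaskedRegion,
    kMissing,
    kEnd // sentinel: equals the number of real enumerators
};

enum class Category : std::uint32_t {
    kPassed = 0,
    kMasked,
    kMissingInvalid,
    kEnd // sentinel
};

// Compile-time guards: adding an enumerator without updating the expected
// count (and thus revisiting the code below) makes the build fail.
static_assert(
    static_cast<std::uint32_t>( Tag::kEnd ) == 4,
    "Tag::kEnd != 4. The enum has values that are not accounted for."
);
static_assert(
    static_cast<std::uint32_t>( Category::kEnd ) == 3,
    "Category::kEnd != 3. The enum has values that are not accounted for."
);

// Map a fine-grained tag to its coarse category.
// Both masking tags collapse into the single kMasked category.
Category tag_to_category( Tag tag )
{
    switch( tag ) {
        case Tag::kPassed:
            return Category::kPassed;
        case Tag::kMaskedPosition:
        case Tag::kMaskedRegion:
            return Category::kMasked;
        case Tag::kMissing:
            return Category::kMissingInvalid;
        default:
            // kEnd (or a corrupted value) is not a valid tag.
            throw std::invalid_argument( "Invalid Tag value" );
    }
}
```

Bumping the expected counts in such `static_assert`s (in this commit, 10 to 12 for the tags and 3 to 4 for the categories) is what forces every function that switches over the enum to be revisited.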
28 changes: 28 additions & 0 deletions lib/genesis/population/filter/sample_counts_filter.hpp
@@ -58,6 +58,29 @@ enum class SampleCountsFilterTag : FilterStatus::IntType
*/
kPassed = 0,

+// -------------------------------------------
+// Position
+// -------------------------------------------
+
+/**
+* @brief Position has been masked out from processing.
+*
+* This can be due to, e.g., a RegionLocus set from a fasta file, see read_mask_fasta().
+* We distinguish this from kMaskedRegion purely for semantic reasons. Both tags result from
+* some user-specified position-based filter, and are created by similar functions. However, we
+* generally mean to indicate that a masked position is due to some fine-grained positional
+* filter, while masked regions are meant to indicate filters for larger regions such as
+* chromosomes or genes.
+*/
+kMaskedPosition,
+
+/**
+* @brief Position is part of a masked region.
+*
+* See kMaskedPosition for details on the distinction between the two.
+*/
+kMaskedRegion,
+
// -------------------------------------------
// Missing and Invalid
// -------------------------------------------
@@ -152,6 +175,11 @@ enum class SampleCountsFilterTagCategory : FilterStatus::IntType
*/
kPassed = 0,

+/**
+* @brief Position is masked.
+*/
+kMasked,
+
/**
* @brief Position is missing or otherwise invalid.
*/
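With the new `kMasked` category, per-tag counters are rolled up into per-category counters, and the print functions only emit non-zero entries unless `verbose` is set. The sketch below illustrates that roll-up, assuming that counters are kept in a fixed-size array indexed by the enum value (the `stats.data.size() == kEnd` asserts in the diff suggest this, but the exact container is an assumption). `EnumCounts`, `to_category_counts()`, `print_category_counts()`, and `category_counts_to_string()` are hypothetical names, not the genesis API, and the enums are simplified stand-ins.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-ins mirroring a subset of the genesis enums.
enum class Tag : std::uint32_t { kPassed = 0, kMaskedPosition, kMaskedRegion, kMissing, kEnd };
enum class Category : std::uint32_t { kPassed = 0, kMasked, kMissingInvalid, kEnd };

// One counter per enumerator (kEnd excluded), indexed by the enum value, zero-initialized.
template<typename E>
struct EnumCounts
{
    std::array<std::size_t, static_cast<std::size_t>( E::kEnd )> data{};

    std::size_t& operator[]( E e )
    {
        assert( static_cast<std::size_t>( e ) < data.size() );
        return data[ static_cast<std::size_t>( e ) ];
    }
    std::size_t operator[]( E e ) const
    {
        assert( static_cast<std::size_t>( e ) < data.size() );
        return data[ static_cast<std::size_t>( e ) ];
    }
};

// Roll per-tag counts up into per-category counts; both masking tags feed kMasked.
EnumCounts<Category> to_category_counts( EnumCounts<Tag> const& stats )
{
    EnumCounts<Category> result;
    result[ Category::kPassed ]         += stats[ Tag::kPassed ];
    result[ Category::kMasked ]         += stats[ Tag::kMaskedPosition ];
    result[ Category::kMasked ]         += stats[ Tag::kMaskedRegion ];
    result[ Category::kMissingInvalid ] += stats[ Tag::kMissing ];
    return result;
}

// Print only non-zero counters, unless verbose output is requested.
std::ostream& print_category_counts(
    std::ostream& os, EnumCounts<Category> const& stats, bool verbose = false
) {
    if( stats[ Category::kMasked ] > 0 || verbose ) {
        os << "Masked: " << stats[ Category::kMasked ] << "\n";
    }
    if( stats[ Category::kMissingInvalid ] > 0 || verbose ) {
        os << "Missing: " << stats[ Category::kMissingInvalid ] << "\n";
    }
    if( stats[ Category::kPassed ] > 0 || verbose ) {
        os << "Passed: " << stats[ Category::kPassed ] << "\n";
    }
    return os;
}

// Convenience wrapper that returns the printed output as a string.
std::string category_counts_to_string( EnumCounts<Category> const& stats, bool verbose = false )
{
    std::ostringstream os;
    print_category_counts( os, stats, verbose );
    return os.str();
}
```

In the commit itself, the same roll-up is written out explicitly per tag in `sample_counts_filter_stats_category_counts()`. That keeps the mapping easy to read, but it has to be updated by hand whenever a tag is added, which is exactly what the `static_assert` guards on `kEnd` enforce.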