Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-treading for action #219

Merged
merged 91 commits into from
May 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
91 commits
Select commit Hold shift + click to select a range
dc2d904
Fix optimization flags.
bredelings Feb 14, 2024
0e1ab9c
Define HAVE_CPUID_H if we have <cpuid.h>
bredelings Feb 26, 2024
e6ac866
Run an apt-get update.
bredelings Feb 26, 2024
9b6614b
Indentation fix.
bredelings Feb 28, 2024
850e716
Ask for C++11 threading with --threading
bredelings Feb 28, 2024
e177aea
Initial wiring for threading in action...
bredelings Feb 28, 2024
5060072
Do the action by category.
bredelings Mar 4, 2024
df532c7
Use inline declaration.
bredelings Mar 4, 2024
360bb67
Add inline functions to construct MapType objects.
bredelings Mar 4, 2024
de656f3
Construct MapType for destination on the spot.
bredelings Mar 4, 2024
ce37107
Construct MapType for input partials on the spot.
bredelings Mar 4, 2024
edeef5c
Construct maps on the spot.
bredelings Mar 4, 2024
38a3722
Make rescalePartials take an integer index.
bredelings Mar 4, 2024
faaa8a9
Make calcPrePartialsPartials2( ) also take an index.
bredelings Mar 4, 2024
0e3b8fb
Remove unused variables.
bredelings Mar 4, 2024
f80d61a
Eliminate gMappedPartials and gMappedPartialsCache
bredelings Mar 4, 2024
a22ca29
Remove no-op virtual overrides.
bredelings Mar 4, 2024
072304c
Remove unused method declarations.
bredelings Mar 4, 2024
1c78eac
Add functions to get MapType for a subset of columns.
bredelings Mar 4, 2024
d34f118
Make argument orders match CPU Impl
bredelings Mar 6, 2024
49ae3b6
Defer to CPU impl for updatePrePartials and implement upPrePartials i…
bredelings Mar 6, 2024
054b0a1
Add startPattern and endPattern argument to calc functions.
bredelings Mar 6, 2024
7ff21fc
Remove command to error out when threading.
bredelings Mar 6, 2024
67bde48
Use start and end partition
bredelings Mar 6, 2024
dd986a9
The number of columns may not be kPatternCount anymore.
bredelings Mar 6, 2024
29a6d2c
Add simpleAction3( )
bredelings Mar 6, 2024
b830f0a
Remove call to factorial and condense code.
bredelings Mar 11, 2024
dec62c3
Eliminate call to factorial.
bredelings Mar 11, 2024
0cb70a7
Add a tag to make the message more clear.
bredelings Mar 11, 2024
2ba56eb
Add space.
bredelings Mar 11, 2024
336645e
Move assignment to beginning.
bredelings Mar 11, 2024
b318868
Move the normP1 and normPInf outside the class.
bredelings Mar 11, 2024
fee55ad
Use normP1( )
bredelings Mar 11, 2024
2b061fd
Use break instead of a boolean variable f
bredelings Mar 11, 2024
b924d9c
Avoid using temporary vector.
bredelings Mar 11, 2024
b7ef314
The loops cannot be merged.
bredelings Mar 11, 2024
0fb49d1
Small speedup by factoring out diagonal...
bredelings Mar 11, 2024
37fd9c4
Add note about the error.
bredelings Mar 11, 2024
29be16c
Don't crash.
bredelings Mar 11, 2024
fb6eabe
Don't assume that ceil( ) fits inside int.
bredelings Mar 11, 2024
09fc070
Handle Q*t - mu*I == 0
bredelings Mar 11, 2024
0d49e65
Also handle NaNs
bredelings Mar 11, 2024
18cb28c
Fix rescaling with more than 1 thread.
bredelings Mar 12, 2024
14a07c6
Test rescaling in actiontest
bredelings Mar 12, 2024
7a75bbc
Use the Ibanez action.
bredelings Mar 12, 2024
aa196dd
Replace ? with min/max.
bredelings Mar 18, 2024
48de146
Simplify getDValue2
bredelings Mar 19, 2024
210d035
Throw away the old value of the ds.
bredelings Mar 19, 2024
a342fd7
Further simplify getDValue2( )
bredelings Mar 19, 2024
543ae9a
Also clear powerMatrices
bredelings Mar 19, 2024
d80e7a9
Eliminate gHighestPowers
bredelings Mar 19, 2024
e096105
Remove v1 of simpleAction( ), getStatistics( ), getDValue( ).
bredelings Mar 19, 2024
d658bcb
Ensure that s >= 1.
bredelings Mar 19, 2024
95bb0f6
Shift index by 1.
bredelings Mar 19, 2024
457e0ce
Handle p==0 inside getDValue2( ).
bredelings Mar 19, 2024
ee8c6bc
Use vector<T> instead of map<int,T>
bredelings Mar 19, 2024
657ffbf
Add very drafty version of normest1( ).
bredelings Mar 19, 2024
331c3c6
Add a special case for getting the exact answer.
bredelings Mar 19, 2024
8c92327
Switch gMuBs from pointer to vector.
bredelings Mar 19, 2024
098278d
Use vector instead of pointers for more things.
bredelings Mar 19, 2024
ede6b33
Further small cleanup of createInstance( ).
bredelings Mar 19, 2024
d98a1b4
Clean up code slightly.
bredelings Mar 19, 2024
e716c59
Use random bits for columns after first.
bredelings Mar 20, 2024
691b627
Simplify code.
bredelings Mar 20, 2024
17b77d4
First rewrite of normest1( )
bredelings Mar 20, 2024
834af66
Fix
bredelings Mar 21, 2024
23b9bc6
Ensure that we do at least two iterations.
bredelings Mar 21, 2024
3494c45
Make itmax a parameter.
bredelings Mar 21, 2024
e1d723c
Fix outdated comment.
bredelings Mar 21, 2024
0227c3b
Add assert.
bredelings Mar 25, 2024
170c429
Make k start at 1
bredelings Mar 25, 2024
a594e70
Improve comments.
bredelings Mar 25, 2024
695f4d5
Fixes for normest1( ).
bredelings Mar 25, 2024
91cf953
Rename idx_hist back to ind_hist, following paper.
bredelings Mar 25, 2024
2103d64
Add some more commented-out debugging lines.
bredelings Mar 25, 2024
8647f1e
Handle the case where tmax < t
bredelings Mar 26, 2024
e7fdf2a
Add missing break statement.
bredelings Mar 26, 2024
94b26e9
Also print out est / est_old.
bredelings Mar 26, 2024
46010e4
Add some asserts.
bredelings Mar 26, 2024
eaa44b6
Don't exist if h.maxcoeff() == h[ind_best]
bredelings Mar 26, 2024
ea15fdd
Take the component-wise absolute value.
bredelings Mar 26, 2024
85fd084
Remove now-unneeded abs.
bredelings Mar 26, 2024
e1a6e37
Update comments.
bredelings Mar 26, 2024
a1cef75
Switch to using normest1 and stop computing power matrices.
bredelings Mar 26, 2024
e38fe22
Add function for pmax.
bredelings Mar 26, 2024
8d00b47
Precompute ds[eigenIndex][p] and make getDValue2( ) const.
bredelings Mar 26, 2024
aa60a02
Make simpleAction{2,3} const.
bredelings Mar 26, 2024
11d38e5
Switch back to simpleAction2
bredelings Mar 26, 2024
c871327
Update notes.
bredelings Apr 23, 2024
63985a7
Complain and quit if setTipStates( ) is called with action.
bredelings Apr 23, 2024
ca3fb7c
Automatically download Eigen.
bredelings Apr 23, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ jobs:
- name: Install (Linux)
if: runner.os == 'Linux' && matrix.name != 'windows'
run: |
sudo apt-get update
sudo apt-get install -y cmake ninja-build ccache pkg-config
sudo apt-get install -y openjdk-17-jdk libeigen3-dev ocl-icd-opencl-dev

Expand Down
29 changes: 24 additions & 5 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,8 @@ else ()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
endif ()

if (NOT ${CMAKE_BUILD_TYPE} MATCHES Debug)

if (NOT "${CMAKE_BUILD_TYPE}" MATCHES Debug)
if(MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Ox /Ot")
else()
Expand Down Expand Up @@ -109,11 +110,22 @@ if(BUILD_OPENMP)
endif(BUILD_OPENMP)

if(BUILD_ACTION)
find_package (Eigen3 3.3 REQUIRED NO_MODULE)
include(FetchContent)
set(EIGEN_VERSION 3.4)
find_package(Eigen3 ${EIGEN_VERSION} QUIET)
if(NOT EIGEN3_FOUND)
message(" Eigen3: library not found - downloading.")
set(BUILD_TESTING OFF CACHE INTERNAL "")
FetchContent_Declare(eigen
GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
GIT_TAG ${EIGEN_VERSION}
GIT_SHALLOW ON)
FetchContent_MakeAvailable(eigen)
unset(BUILD_TESTING CACHE)
message(" Eigen3: downloaded.")
endif()

set(CMAKE_CXX_STANDARD 17)
if (NOT TARGET Eigen3::Eigen)
message(FATAL_ERROR "No Eigen3 Found")
endif (NOT TARGET Eigen3::Eigen)
add_definitions(-DBEAGLE_ACTION)
# add_definitions(-DBEAGLE_DEBUG_FLOW)
endif(BUILD_ACTION)
Expand Down Expand Up @@ -180,6 +192,13 @@ if (BEAGLE_OPTIMIZE_FOR_NATIVE_ARCH)
endif()
endif()


include(CheckIncludeFileCXX)
check_include_file_cxx("cpuid.h" HAVE_CPUID_H)
if (HAVE_CPUID_H)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DHAVE_CPUID_H")
endif()

add_subdirectory(libhmsbeagle)

link_libraries(hmsbeagle)
Expand Down
20 changes: 18 additions & 2 deletions examples/actiontest/actiontest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ void printFlags(long inFlags) {
if (inFlags & BEAGLE_FLAG_VECTOR_SSE) fprintf(stdout, " VECTOR_SSE");
if (inFlags & BEAGLE_FLAG_VECTOR_AVX) fprintf(stdout, " VECTOR_AVX");
if (inFlags & BEAGLE_FLAG_THREADING_NONE) fprintf(stdout, " THREADING_NONE");
if (inFlags & BEAGLE_FLAG_THREADING_CPP) fprintf(stdout, " THREADING_CPP");
if (inFlags & BEAGLE_FLAG_THREADING_OPENMP) fprintf(stdout, " THREADING_OPENMP");
if (inFlags & BEAGLE_FLAG_FRAMEWORK_CPU) fprintf(stdout, " FRAMEWORK_CPU");
if (inFlags & BEAGLE_FLAG_FRAMEWORK_CUDA) fprintf(stdout, " FRAMEWORK_CUDA");
Expand All @@ -147,8 +148,7 @@ int main( int argc, const char* argv[] )
}
fprintf(stdout, "\n");

// bool scaling = true;
bool scaling = false; // disable scaling for now
bool scaling = true;

bool doJC = true;

Expand All @@ -171,6 +171,19 @@ int main( int argc, const char* argv[] )

bool useGpu = argc > 1 && strcmp(argv[1] , "--gpu") == 0;

bool useThreading = false;
for(int i=1;i<argc;i++)
if (!strcmp(argv[i],"--threading"))
useThreading = true;

for(int i=1;i<argc;i++)
if (!strcmp(argv[i],"--help"))
{
std::cerr<<"Flag: --gpu\n";
std::cerr<<"Flag: --threading\n";
std::exit(1);
}

bool useTipStates = false;

int whichDevice = -1;
Expand Down Expand Up @@ -207,6 +220,9 @@ int main( int argc, const char* argv[] )
requirementFlags |= BEAGLE_FLAG_VECTOR_NONE;
}

if (useThreading)
preferenceFlags |= BEAGLE_FLAG_THREADING_CPP;

// create an instance of the BEAGLE library
int instance = beagleCreateInstance(
3, /**< Number of tip data elements (input) */
Expand Down
122 changes: 47 additions & 75 deletions libhmsbeagle/CPU/BeagleCPUActionImpl.h
Original file line number Diff line number Diff line change
Expand Up @@ -82,29 +82,23 @@ namespace beagle {
using BeagleCPUImpl<BEAGLE_CPU_ACTION_DOUBLE>::kFlags;
// SpMatrix** gScaledQs;
int kPartialsCacheOffset;
MapType** gMappedPartials;
MapType** gMappedPartialCache;
// using BeagleCPUImpl<BEAGLE_CPU_ACTION_DOUBLE>::gStateFrequencies;
// using BeagleCPUImpl<BEAGLE_CPU_ACTION_DOUBLE>::gTipStates;
// using BeagleCPUImpl<BEAGLE_CPU_ACTION_DOUBLE>::gTransitionMatrices;
double* gIntegrationTmp;
// double* gLeftPartialTmp;
// double* gRightPartialTmp;
SpMatrix* gInstantaneousMatrices;
SpMatrix* gBs;
double* gMuBs;
double* gB1Norms;
int* gEigenMaps;
double* gEdgeMultipliers;
std::map<int, SpMatrix>* powerMatrices;
std::map<int, double>* ds;
int* gHighestPowers;
std::vector<SpMatrix> gInstantaneousMatrices;
std::vector<SpMatrix> gBs;
std::vector<double> gMuBs;
std::vector<double> gB1Norms;
std::vector<int> gEigenMaps;
std::vector<double> gEdgeMultipliers;
std::vector<std::vector<double>> ds;
SpMatrix identity;
SpMatrix* gScaledQTransposeTmp;
MapType* gMappedIntegrationTmp;
// MapType* gMappedLeftPartialTmp;
// MapType* gMappedRightPartialTmp;
double* gRescaleTmp;
const int mMax = 55;
std::map<int, double> thetaConstants = {
//The first 30 values are from table A.3 of Computing Matrix Functions.
Expand Down Expand Up @@ -169,31 +163,32 @@ namespace beagle {
long long preferenceFlags,
long long requirementFlags);

virtual int setPartials(int bufferIndex,
const double* inPartials);

// virtual int setCategoryRates(const double* inCategoryRates);

virtual int setTipPartials(int tipIndex,
const double* inPartials);
int setTipStates(int tipIndex, const int* inStates);

virtual int updatePartials(const int *operations,
int operationCount,
int cumulativeScalingIndex);
protected:
virtual int upPartials(bool byPartition,
const int *operations,
int operationCount,
int cumulativeScalingIndex);

virtual int updatePrePartials(const int *operations,
int operationCount,
int cumulativeScalingIndex);
// protected:
// virtual int getPaddedPatternsModulus();
virtual int upPrePartials(bool byPartition,
const int *operations,
int operationCount,
int cumulativeScalingIndex);

private:
virtual void rescalePartials(MapType *destP,
double *scaleFactors,
double *cumulativeScaleFactors,
const int fillWithOnes);
inline MapType partialsMap(int index, int category, int startPattern, int endPattern);

inline MapType partialsMap(int index, int category);

inline MapType partialsCacheMap(int index, int category, int startPattern, int endPattern);

inline MapType partialsCacheMap(int index, int category);

// virtual int getPaddedPatternsModulus();

private:
virtual int setEigenDecomposition(int eigenIndex,
const double *inEigenVectors,
const double *inInverseEigenVectors,
Expand All @@ -206,58 +201,35 @@ namespace beagle {
const double* edgeLengths,
int count);

void simpleAction2(MapType *destP, MapType *partials, int edgeIndex,
bool transpose);
void simpleAction2(MapType& destP, MapType& partials, int edgeIndex,
int category, bool transpose) const;

void simpleAction3(MapType& destP, MapType& partials, int edgeIndex,
int category, bool transpose) const;

void simpleAction(MapType* destP,
MapType* partials,
SpMatrix* matrix);
void calcPartialsPartials2(int destPIndex,
int partials1Index,
int edgeIndex1,
int partials2Index,
int edgeIndex2,
int startPattern,
int endPattern);

void calcPartialsPartials(MapType* destP,
MapType* partials1,
SpMatrix* matrices1,
MapType* partials2,
SpMatrix* matrices2);

void calcPartialsPartials2(MapType *destP, MapType *partials1,
MapType *partials2, int edgeIndex1,
int edgeIndex2, MapType *partialCache1,
MapType *partialCache2);

void calcPrePartialsPartials(MapType *destP,
MapType *partials1,
SpMatrix *matrices1,
MapType *partials2,
SpMatrix *matrices2);

void calcPrePartialsPartials2(MapType *destP, MapType *partials1,
MapType *partials2, int edgeIndex1,
int edgeIndex2, MapType *partialCache2);

// Return (m,s)
std::tuple<int,int> getStatistics(double A1Norm,
SpMatrix * matrix,
double t,
int nCol);
void calcPrePartialsPartials2(int destPIndex,
int partials1Index,
int edgeIndex1,
int partials2Index,
int edgeIndex2,
int startPattern,
int endPattern);

// Return (m,s)
std::tuple<int,int> getStatistics2(double t, int nCol, double edgeMultiplier,
int eigenIndex);

double getDValue(int p,
std::map<int, double> &d,
std::map<int, SpMatrix> &powerMatrices);

double getDValue2(int p, int eigenIndex);

double normP1(SpMatrix * matrix);

double normPInf(SpMatrix* matrix);
double normPInf(MapType matrix);
double normPInf(MatrixXd * matrix);
int eigenIndex) const;

double getDValue(int p, int eigenIndex) const;

double getPMax() const;
};


Expand Down
Loading
Loading