Various parallelization frameworks are used to implement the algorithms of HLR. Where possible (and reasonable), the implementation of a specific algorithm is also optimized for a particular framework. Please note that not all algorithms are implemented for all frameworks.
Special functions are stored in the subdirectories of the corresponding modules, e.g., tbb/arith.hh with arithmetical functions implemented using TBB.
Uses non-parallel, sequential algorithms, usually with corresponding optimizations, e.g., without mutexes. It also serves as the reference implementation for correctness and for measuring the speedup of parallelized implementations.
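The role of a sequential version as reference can be sketched as follows. This is only an illustration under assumptions: the workload (summing a vector), the functions sum_seq, sum_par, time_of and speedup, and the use of plain std::thread are all hypothetical and not taken from HLR or any of its frameworks.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// sequential reference (hypothetical example workload): plain loop, no locks
long sum_seq ( const std::vector< long > &  v )
{
    long  s = 0;

    for ( auto  x : v )
        s += x;

    return s;
}

// parallel variant (hypothetical, based on std::thread): each thread sums
// its own chunk and writes to a private slot, so no mutex is needed
long sum_par ( const std::vector< long > &  v, const std::size_t  nthreads )
{
    assert( nthreads > 0 );

    std::vector< long >         partial( nthreads, 0 );
    std::vector< std::thread >  threads;
    const std::size_t           chunk = ( v.size() + nthreads - 1 ) / nthreads;

    for ( std::size_t  t = 0; t < nthreads; ++t )
    {
        threads.emplace_back( [&v,&partial,chunk,t] ()
        {
            const std::size_t  lb = std::min( t * chunk, v.size() );
            const std::size_t  ub = std::min( lb + chunk, v.size() );
            long               s  = 0;

            for ( std::size_t  i = lb; i < ub; ++i )
                s += v[i];

            partial[t] = s;
        } );
    }

    for ( auto &  th : threads )
        th.join();

    return std::accumulate( partial.begin(), partial.end(), 0L );
}

// wall-clock time of a callable in seconds
template < typename F >
double time_of ( F &&  f )
{
    const auto  tic = std::chrono::steady_clock::now();

    f();

    const auto  toc = std::chrono::steady_clock::now();

    return std::chrono::duration< double >( toc - tic ).count();
}

// speedup of the parallel version relative to the sequential reference
double speedup ( const double  t_seq, const double  t_par )
{
    return t_seq / t_par;
}
```

In such a setup, the parallel result is first compared against sum_seq for correctness, and the two runtimes obtained with time_of are then passed to speedup.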
Distributed Memory Frameworks
As the standard for distributed memory implementations, MPI is also used in HLR.
The implementations are mainly based on collective communication but also use point-to-point communication where needed or for less important tasks.
Uses blocking, synchronous broadcasts (MPI_Bcast, etc.) from MPI-1.
Uses non-blocking, asynchronous broadcasts (MPI_Ibcast, etc.) from MPI-3.
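Uses non-blocking, one-sided communication (MPI_Rget, etc.). One-sided communication was introduced with MPI-2; the request-based variants such as MPI_Rget were added in MPI-3.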
GPI-2 is another attempt to provide an alternative to MPI with a special focus on one-sided communication, simplicity and thread safety.
HPX has its own set of command line arguments. Arguments for the user program have to be provided after --, e.g.,
tlr-hpx -- -n 1024
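The separator convention can be illustrated with a small, self-contained sketch that splits an argument list at the first --. The type arg_list and the function split_args are hypothetical and belong to neither HPX nor HLR; HPX performs its own command line processing, which may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using arg_list = std::vector< std::string >;

// split arguments at the first "--" (hypothetical helper, not the HPX parser):
// everything before the separator is meant for the runtime,
// everything after it for the user program
std::pair< arg_list, arg_list >
split_args ( const arg_list &  args )
{
    arg_list  runtime_args;
    arg_list  program_args;
    bool      after_sep = false;

    for ( const auto &  arg : args )
    {
        if ( ! after_sep && ( arg == "--" ))
        {
            // the separator itself is dropped
            after_sep = true;
            continue;
        }

        if ( after_sep ) program_args.push_back( arg );
        else             runtime_args.push_back( arg );
    }

    return { runtime_args, program_args };
}
```

Applied to the arguments -t 4 -- -n 1024, this yields -t 4 for the runtime and -n 1024 for the user program.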
HPX performs thread affinity setting. To set the number of threads, add -t <n> to the command line flags, where n is the number of threads, e.g.,
tlr-hpx -t 4 -- -n 1024
CPU core binding can be performed with HPX:
dag-hpx -t 64 --hpx:bind="thread:0-63=core:0-63.pu:0"
Alternatively, HPX binding can be disabled and the binding left to numactl:
numactl -C 0-63 dag-hpx -t 64 --hpx:bind=none
To increase the flexibility of handling data in GPI-2, the following values are set in
#define GASPI_MAX_GROUPS (128)
#define GASPI_MAX_MSEGS (16384)