Various parallelization frameworks are used to implement the algorithms of HLR. Where possible and reasonable, the implementation of a specific algorithm is also optimized for the particular framework. Note that not all algorithms are implemented for all frameworks.

Functions specific to a framework are stored in subdirectories of the corresponding modules, e.g., tbb/arith.hh contains arithmetic functions implemented using TBB.

Sequential (seq)

Uses non-parallel, sequential algorithms, usually with corresponding optimizations, e.g., without mutexes. It also serves as the reference implementation for correctness checks and for measuring the speedup of parallelized implementations.
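The role of a sequential reference can be illustrated with a few small helpers. This is only a sketch: the names time_seconds, max_abs_diff and speedup are hypothetical and not part of HLR. The result of a parallel run is compared against the sequential result, and the speedup is the ratio of the two run times.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HLR): wall-clock time of a callable in seconds.
template <typename F>
double time_seconds(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Hypothetical helper: largest entry-wise deviation between the sequential
// reference result and the result of a parallel implementation.
double max_abs_diff(const std::vector<double>& ref, const std::vector<double>& res)
{
    assert(ref.size() == res.size());
    double d = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i)
        d = std::max(d, std::fabs(ref[i] - res[i]));
    return d;
}

// Speedup of a parallel run relative to the sequential reference run.
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}
```

A parallel implementation would typically be accepted if max_abs_diff stays below a tolerance chosen for the problem, and its speedup would be reported as speedup(t_seq, t_par) with both times obtained from time_seconds.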

Shared Memory Frameworks

OpenMP (omp)

Uses OpenMP to parallelize algorithms, e.g., via omp parallel or omp task.
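The two OpenMP constructs mentioned above can be sketched on a simple example. The functions below are hypothetical and not taken from HLR: dot uses a parallel loop with a reduction, while sum_parallel uses recursive tasks. The sequential cutoff of 1024 entries is an arbitrary assumption. If the code is compiled without OpenMP support, the pragmas are ignored and both functions run sequentially with the same result.

```cpp
#include <cassert>
#include <vector>

// Loop parallelism: "omp parallel for" with a reduction over the partial sums.
double dot(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}

// Task parallelism: each half of the range is summed in its own task.
double sum_tasks(const double* v, long n)
{
    if (n <= 1024) // assumed cutoff: small ranges are summed sequentially
    {
        double s = 0.0;
        for (long i = 0; i < n; ++i)
            s += v[i];
        return s;
    }

    double left = 0.0, right = 0.0;

    #pragma omp task shared(left)
    left = sum_tasks(v, n / 2);
    #pragma omp task shared(right)
    right = sum_tasks(v + n / 2, n - n / 2);
    #pragma omp taskwait // wait for both child tasks before combining

    return left + right;
}

// Opens the parallel region; a single thread creates the root of the task tree.
double sum_parallel(const std::vector<double>& v)
{
    double result = 0.0;

    #pragma omp parallel
    #pragma omp single
    result = sum_tasks(v.data(), static_cast<long>(v.size()));

    return result;
}
```

The task-based variant mirrors the recursive structure common in hierarchical algorithms, whereas the loop-based variant fits flat, regular iteration spaces.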

Threading Building Blocks (tbb)

Also used in HLIBpro, TBB is usually the first framework for which a new algorithm is implemented. Therefore, it also provides the largest set of implemented algorithms.

C++-Taskflow (tf)

C++-Taskflow is a new library for task-based programming, competing with TBB.

HPX (hpx)

HPX is a framework for shared and distributed memory programming, forming a unified approach for both cases. As of now, only shared memory algorithms are used.

Distributed Memory Frameworks


MPI is the standard for distributed memory implementations and is therefore also used in HLR.

The implementations are mainly based on collective communication but also use point-to-point communication where needed or for less important tasks.

Synchronous (mpi-bcast)

Uses blocking, synchronous broadcasts (MPI_Bcast, etc.) from MPI-1.

Asynchronous (mpi-ibcast)

Uses non-blocking, asynchronous broadcasts (MPI_Ibcast, etc.) from MPI-3.

One-Sided Communication (mpi-rdma)

Uses non-blocking, one-sided communication (MPI_Rget, etc.). One-sided communication was introduced with MPI-2; the request-based MPI_Rget was added in MPI-3.

GASPI/GPI-2 (gpi2)

GPI-2 is another attempt to provide an alternative to MPI with a special focus on one-sided communication, simplicity and thread safety.



HPX has its own set of command line arguments. Arguments for the user program have to be provided after --, e.g.,

tlr-hpx -- -n 1024
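The separator convention can be illustrated with a small sketch. The helper split_args below is hypothetical and is not how HPX parses its command line; it only shows the idea that everything before the first -- belongs to the framework and everything after it to the user program.

```cpp
#include <string>
#include <utility>
#include <vector>

using arg_list = std::vector<std::string>;

// Hypothetical helper: splits a list of arguments at the first "--".
// first  = arguments before the separator (framework options)
// second = arguments after the separator  (user program options)
std::pair<arg_list, arg_list> split_args(const arg_list& args)
{
    arg_list framework, program;
    bool after_separator = false;

    for (const auto& a : args)
    {
        if (!after_separator && a == "--")
        {
            after_separator = true; // the separator itself is dropped
            continue;
        }
        (after_separator ? program : framework).push_back(a);
    }

    return {framework, program};
}
```

For the arguments -t 4 -- -n 1024 this yields -t 4 as framework options and -n 1024 as options of the user program.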

HPX sets the thread affinity itself. To choose the number of threads, add -t <n> to the command line flags, where n is the number of threads, e.g.,

tlr-hpx -t 4 -- -n 1024

CPU core binding can be performed by HPX itself:

dag-hpx -t 64 --hpx:bind="thread:0-63=core:0-63.pu:0"

or externally, e.g., with numactl, with the HPX binding disabled:

numactl -C 0-63 dag-hpx -t 64 --hpx:bind=none


To increase the flexibility of handling data in GPI-2, the following values are set in GPI2_Types.h:

#define GASPI_MAX_GROUPS  (128)
#define GASPI_MAX_MSEGS   (16384)