# Frameworks

Various parallelization frameworks are used to implement the algorithms in HLR. If possible (and reasonable), the implementation of a specific algorithm is also optimized for a particular framework. Please note that not all algorithms are implemented for all frameworks.

Framework-specific functions are stored in subdirectories of the corresponding modules, e.g., tbb/arith.hh with arithmetic functions implemented using TBB.

## Sequential (seq)

Uses non-parallel, sequential algorithms, usually with corresponding optimizations, e.g., without mutexes. It also serves as the reference implementation for verifying correctness and for measuring the speedup of the parallelized implementations.

## Shared Memory Frameworks

### OpenMP (omp)

Uses OpenMP to parallelize algorithms, e.g., via omp parallel or omp task.

### Threading Building Blocks (tbb)

Also used in HLIBpro, TBB is usually the first framework for which a new algorithm is implemented. Therefore, it also provides the largest set of implemented algorithms.

### HPX (hpx)

HPX is a framework for shared and distributed memory programming, forming a unified approach for both cases. As of now, only shared memory algorithms are used.

## Distributed Memory Frameworks

### MPI

Being the standard for distributed memory programming, MPI is also used in HLR.

The implementations are mainly based on collective communication but also use point-to-point communication where needed or for less important tasks.

#### Synchronous (mpi-bcast)

Uses blocking, synchronous broadcasts (MPI_Bcast, etc.) from MPI-1.

#### Asynchronous (mpi-ibcast)

Uses non-blocking, asynchronous broadcasts (MPI_Ibcast, etc.) from MPI-3.

#### One-Sided Communication (mpi-rdma)

Uses non-blocking, one-sided communication (MPI_Rget, etc.); the one-sided model was introduced with MPI-2, the request-based functions like MPI_Rget with MPI-3.

### GASPI/GPI-2 (gpi2)

GPI-2 is an implementation of the GASPI standard, an alternative to MPI with a special focus on one-sided communication, simplicity, and thread safety.

## Remarks

### HPX

HPX has its own set of command line arguments. Arguments for the user program have to be provided after --, e.g.,

```
tlr-hpx -- -n 1024
```

HPX performs thread affinity setting. Add -t <n> to the command line flags, where <n> is the number of threads, e.g.,

```
tlr-hpx -t 4 -- -n 1024
```

CPU core binding is either performed by HPX itself:

```
dag-hpx -t 64 --hpx:bind="thread:0-63=core:0-63.pu:0"
```

or delegated to an external tool, e.g., numactl:

```
numactl -C 0-63 dag-hpx -t 64 --hpx:bind=none
```

### GPI-2

To increase the flexibility of handling data in GPI-2, the following values are set in GPI2_Types.h:

```
#define GASPI_MAX_GROUPS  (128)
#define GASPI_MAX_MSEGS   (16384)
```