Mat-Vec with Compression¶
This extends the compression approach by implementing matrix-vector multiplication with on-the-fly decompression as in [1]. All computations are still performed in double precision (FP64).
See also https://arxiv.org/abs/2405.03456.
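To make the idea concrete, the following is a minimal C++ sketch of the mat-vec of a single low-rank block with on-the-fly decompression. The `compressed_lowrank` type, the FP32 storage used as a stand-in for a real compressed format, and the name `mul_vec_decompress` are illustrative assumptions, not the actual interface of the program; a full H-mat-vec would apply the same step to every (dense or low-rank) block of the matrix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compressed low-rank block: the factors U (nrows x rank) and
// V (ncols x rank) are stored in FP32 as a stand-in for a real compressed
// format. Neither the type nor its layout is the actual mpmvm interface.
struct compressed_lowrank
{
    std::size_t          nrows = 0, ncols = 0, rank = 0;
    std::vector<float>   U;   // column-major, nrows x rank
    std::vector<float>   V;   // column-major, ncols x rank
};

// y := y + alpha * (U V^T) x with on-the-fly decompression: every factor
// entry is expanded to FP64 right before it is used, so the block stays
// compressed in memory while all arithmetic runs in double precision.
void mul_vec_decompress ( const double                 alpha,
                          const compressed_lowrank &   M,
                          const std::vector<double> &  x,
                          std::vector<double> &        y )
{
    std::vector<double>  t( M.rank, 0.0 );          // t := V^T x

    for ( std::size_t  l = 0; l < M.rank; ++l )
        for ( std::size_t  j = 0; j < M.ncols; ++j )
            t[l] += double( M.V[ l*M.ncols + j ] ) * x[j];

    for ( std::size_t  l = 0; l < M.rank; ++l )     // y := y + alpha * U t
        for ( std::size_t  i = 0; i < M.nrows; ++i )
            y[i] += alpha * double( M.U[ l*M.nrows + i ] ) * t[l];
}
```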
Program¶
Please compile with
> scons program=mpmvm frameworks=seq,tbb compressor=<...> aplr=default zblas=1
Only aflp, bfl, dfl, mp3 and mp2 (see the compress program) support compressed arithmetic. The mixed-precision approaches perform their computations in FP64 and FP32 using BLAS.
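As a rough illustration of the mixed-precision idea (a sketch under assumptions, not the actual implementation): a low-rank block may keep its first k64 factor columns in FP64 and the remaining k32 columns in FP32 (e.g., chosen from the size of the singular values), apply each part with the BLAS gemv routine of the matching precision, and accumulate the result in double precision. The `mp_lowrank` type and `mp_mul_vec` function below are hypothetical names.

```cpp
#include <vector>
#include <cblas.h>

// Hypothetical mixed-precision low-rank block: the first k64 factor columns
// are kept in FP64, the remaining k32 columns in FP32 (names and layout are
// assumptions, not the data structures of the actual program).
struct mp_lowrank
{
    int                  nrows = 0, ncols = 0, k64 = 0, k32 = 0;
    std::vector<double>  U64, V64;   // column-major, nrows x k64 / ncols x k64
    std::vector<float>   U32, V32;   // column-major, nrows x k32 / ncols x k32
};

// y := y + (U64 V64^T + U32 V32^T) x, applying each part with the BLAS gemv
// of the matching precision and accumulating the final result in FP64.
void mp_mul_vec ( const mp_lowrank &           M,
                  const std::vector<double> &  x,
                  std::vector<double> &        y )
{
    // FP64 part via dgemv: t64 := V64^T x, then y += U64 t64
    std::vector<double>  t64( M.k64, 0.0 );

    cblas_dgemv( CblasColMajor, CblasTrans,   M.ncols, M.k64, 1.0, M.V64.data(), M.ncols, x.data(),   1, 0.0, t64.data(), 1 );
    cblas_dgemv( CblasColMajor, CblasNoTrans, M.nrows, M.k64, 1.0, M.U64.data(), M.nrows, t64.data(), 1, 1.0, y.data(),   1 );

    // FP32 part via sgemv: convert x once, compute in single precision
    std::vector<float>   xf( x.begin(), x.end() );
    std::vector<float>   t32( M.k32, 0.f ), yf( M.nrows, 0.f );

    cblas_sgemv( CblasColMajor, CblasTrans,   M.ncols, M.k32, 1.f, M.V32.data(), M.ncols, xf.data(),  1, 0.f, t32.data(), 1 );
    cblas_sgemv( CblasColMajor, CblasNoTrans, M.nrows, M.k32, 1.f, M.U32.data(), M.nrows, t32.data(), 1, 0.f, yf.data(),  1 );

    // add the FP32 update to the FP64 result
    for ( int  i = 0; i < M.nrows; ++i )
        y[i] += double( yf[i] );
}
```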
H-Matrix-Vector Multiplication¶
The results were obtained on an AMD Epyc 9554 with 64 CPU cores and twelve 32 GB DDR5-4800 DIMMs.
Due to its low arithmetic intensity, H-mat-vec is bandwidth-limited, as can be seen in the following (empirical) roofline plot.
It also demonstrates that the uncompressed H-mat-vec achieves near-optimal performance on the system.
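A rough estimate explains the bandwidth limit (generic reasoning, not taken from the measurements): for a dense \(n \times m\) block stored in FP64, the mat-vec performs \(2 n m\) flops while it has to read \(8 n m\) bytes of matrix data, i.e., an arithmetic intensity of

\[ I = \frac{2 n m \,\text{flops}}{8 n m \,\text{bytes}} = 0.25 \ \text{flops/byte}, \]

which is far below the machine balance of current CPUs. Compression raises the effective intensity, since fewer bytes have to be loaded per floating-point operation.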
Before presenting runtime results, the compression ratios achieved by the different methods are shown. Besides the H-matrix format, BLR was also tested.
(Plots of the compression ratio for H and for BLR.)
The following results show the speedup compared to the multiplication with an uncompressed H-matrix. The low-rank accuracy is fixed at \(\varepsilon = 10^{-6}\).
(Plots of the speedup over the uncompressed H-matrix for H and for BLR.)
A clear correlation between better compression and higher performance is visible: AFLP achieves the best performance together with the best compression.
References¶
[1] Anzt, Flegar, Grützmacher, Quintana-Ortí: “Toward a modular precision ecosystem for high-performance computing”, Int. J. of HPC Applications, 33(6), 1069–1078, 2019.