Mat-Vec with Compression

This extends the compression approach by implementing matrix-vector multiplication with on-the-fly decompression as in [1]. All computations are still performed in double precision (FP64).
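The idea of on-the-fly decompression can be illustrated with a minimal sketch. It is not the implementation used by the program: the names `compress` and `matvec_compressed` are hypothetical, and plain FP32 storage (Python's `array('f')`) stands in for the actual compressors. Matrix data is kept in the smaller format in memory, and each value is widened to FP64 only at the moment it enters the arithmetic.

```python
from array import array


def compress(values):
    """Hypothetical stand-in compressor: store FP64 values as FP32."""
    return array("f", values)


def matvec_compressed(rows, x):
    """Compute y = A x for A stored row-wise in compressed form.

    Reading an element of array('f') yields a Python float (FP64),
    so decompression happens on the fly and all arithmetic is FP64.
    """
    y = []
    for row in rows:
        acc = 0.0  # FP64 accumulator
        for a_ij, x_j in zip(row, x):
            acc += a_ij * x_j
        y.append(acc)
    return y


# Small example with values that are exactly representable in FP32.
A = [[1.0, 2.0], [0.5, -1.0]]
A_c = [compress(row) for row in A]
x = [3.0, 4.0]
y = matvec_compressed(A_c, x)
print(y)  # [11.0, -2.5]
```

Only the stored matrix shrinks; the vectors and the accumulation stay in FP64, which matches the statement above that all computations remain in double precision.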

See also https://arxiv.org/abs/2405.03456.


Program

Please compile with

> scons program=mpmvm frameworks=seq,tbb compressor=<...> aplr=default zblas=1

Only aflp, bfl, dfl, mp3 and mp2 (see the compress program) support compressed arithmetic. The mixed precision approaches perform computations in FP64 and FP32 using BLAS.

H-Matrix-Vector Multiplication

The results were obtained on an AMD Epyc 9554 with 64 CPU cores and 12 × 32 GB DDR5-4800 DIMMs.

Due to its low arithmetic intensity, H-mat-vec is bandwidth limited, as can be seen in the following (empirical) roofline plot.

It also demonstrates that the uncompressed H-mat-vec achieves near optimal performance on the system.
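The bandwidth limit can be made concrete with a back-of-the-envelope roofline estimate. The sketch below rests on assumptions that are not taken from the measurements: each of the 12 DIMMs occupies its own memory channel, each channel moves 8 bytes per transfer, every matrix entry is read once and used for one multiply and one add, and vector traffic is neglected. The function names are hypothetical. It gives a theoretical upper bound only; sustained bandwidth on a real system is lower.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def matvec_intensity(bytes_per_value=8):
    """Flops per byte for mat-vec: 2 flops per matrix entry read once."""
    return 2.0 / bytes_per_value


bw = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)
ai = matvec_intensity()          # FP64 storage: 8 bytes per value
bound_gflops = bw * ai           # bandwidth-limited performance bound

print(f"peak bandwidth   : {bw:.1f} GB/s")
print(f"intensity (FP64) : {ai:.2f} flop/byte")
print(f"mat-vec bound    : {bound_gflops:.1f} GFlop/s")
```

Under these assumptions the bound scales inversely with the bytes stored per matrix value, which is why reducing storage can raise mat-vec performance even though the arithmetic itself is unchanged.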

Before the runtime results, the compression ratios achieved by the different methods are shown. In addition to the H-matrix format, BLR was also tested.

(Figure panels: compression ratio for H and BLR)

The following results show the speedup compared to the multiplication with an uncompressed H-matrix. The low-rank accuracy is fixed at \(\varepsilon = 10^{-6}\).

(Figure panels: speedup for H and BLR)

Better compression clearly correlates with higher performance: AFLP achieves both the best compression and the best performance.
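Why compression and speedup go together in a bandwidth-limited kernel can be sketched with a simple time model. This is an idealized illustration, not a fit to the measured results: the function `modeled_speedup`, its parameters, and the numbers used below are hypothetical. Runtime is taken to be proportional to the bytes read, plus an optional decompression overhead expressed as a fraction of the uncompressed runtime.

```python
def modeled_speedup(compression_ratio, decompress_overhead=0.0):
    """Speedup over the uncompressed mat-vec in a bandwidth-limited model.

    Times are normalized so that the uncompressed runtime is 1.
    compression_ratio   : uncompressed bytes / compressed bytes
    decompress_overhead : extra time for decompression (fraction of 1)
    """
    return 1.0 / (1.0 / compression_ratio + decompress_overhead)


# Hypothetical values, for illustration only.
print(modeled_speedup(2.0))        # 2.0: speedup equals the compression ratio
print(modeled_speedup(4.0, 0.25))  # 2.0: overhead eats part of the gain
```

In this model the compression ratio is an upper bound on the speedup, and any cost of decompression reduces it.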

References

  1. Anzt, Flegar, Grützmacher, Quintana-Ortí: “Toward a modular precision ecosystem for high-performance computing”, Int. J. of HPC Applications, 33(6), 1069–1078, 2019