# Mat-Vec with Compression

This extends the compression approach by implementing matrix-vector multiplication with on-the-fly decompression, as in [1]. All computations are still performed in double precision (FP64).

See also https://arxiv.org/abs/2405.03456.
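
To illustrate on-the-fly decompression, here is a minimal C++ sketch. It assumes a hypothetical toy format that stores a dense block in FP32 and is not one of the program's compressors. Columns are decompressed panel by panel into a small FP64 buffer and then multiplied in double precision, so only the stored data is reduced, not the arithmetic precision. The names `CompressedBlock`, `compress` and `mulvec` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy format: a dense block whose entries are stored as FP32
// (column-major). Stands in for a real compressor for illustration only.
struct CompressedBlock {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<float> data;  // rows * cols entries, column-major
};

// "Compress" by rounding every FP64 entry to FP32 (lossy).
CompressedBlock compress(const std::vector<double>& A, std::size_t rows, std::size_t cols) {
    assert(A.size() == rows * cols);
    CompressedBlock C;
    C.rows = rows;
    C.cols = cols;
    C.data.reserve(A.size());
    for (double a : A)
        C.data.push_back(static_cast<float>(a));
    return C;
}

// y += A * x with on-the-fly decompression: `panel` columns at a time are
// expanded into an FP64 buffer, then multiplied in double precision.
void mulvec(const CompressedBlock& C, const std::vector<double>& x,
            std::vector<double>& y, std::size_t panel = 64) {
    assert(panel > 0 && x.size() == C.cols && y.size() == C.rows);
    std::vector<double> buf(C.rows * panel);
    for (std::size_t j0 = 0; j0 < C.cols; j0 += panel) {
        const std::size_t nc = std::min(panel, C.cols - j0);
        // decompress columns j0 .. j0+nc-1 (contiguous in column-major order)
        for (std::size_t k = 0; k < C.rows * nc; ++k)
            buf[k] = static_cast<double>(C.data[j0 * C.rows + k]);
        // FP64 mat-vec on the decompressed panel
        for (std::size_t j = 0; j < nc; ++j)
            for (std::size_t i = 0; i < C.rows; ++i)
                y[i] += buf[j * C.rows + i] * x[j0 + j];
    }
}
```

The idea behind decompressing in panels is that the FP64 buffer stays small enough to remain in cache, so main-memory traffic is governed by the compressed data.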

## Program

Please compile with

```
> scons program=mpmvm frameworks=seq,tbb compressor=<...> aplr=default zblas=1
```

Only *aflp*, *bfl*, *dfl*, *mp3* and *mp2* (see the *compress* program) support compressed arithmetic. The mixed-precision
approaches perform computations in FP64 and FP32 using BLAS.
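
For the mixed-precision variants, a low-rank block \(U V^T\) can be applied as two products. The sketch below assumes, for illustration, FP32 factors with both products carried out in FP32 and the result added to the FP64 output vector; how the program actually splits work between FP32 and FP64 may differ. Plain loops stand in for the BLAS calls (e.g. `sgemv`) a real implementation would use, and `LowRankFP32` and `mulvec_lr` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical low-rank block A = U * V^T with FP32 factors (column-major):
// U is rows x rank, V is cols x rank.
struct LowRankFP32 {
    std::size_t rows = 0, cols = 0, rank = 0;
    std::vector<float> U, V;
};

// y += U * (V^T * x): both products in FP32 (where BLAS sgemv would be
// used), final update accumulated into the FP64 vector y.
void mulvec_lr(const LowRankFP32& A, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == A.cols && y.size() == A.rows);
    assert(A.U.size() == A.rows * A.rank && A.V.size() == A.cols * A.rank);
    std::vector<float> xs(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        xs[j] = static_cast<float>(x[j]);          // FP64 -> FP32
    std::vector<float> t(A.rank, 0.0f);            // t = V^T x
    for (std::size_t k = 0; k < A.rank; ++k)
        for (std::size_t j = 0; j < A.cols; ++j)
            t[k] += A.V[k * A.cols + j] * xs[j];
    std::vector<float> ys(A.rows, 0.0f);           // ys = U t
    for (std::size_t k = 0; k < A.rank; ++k)
        for (std::size_t i = 0; i < A.rows; ++i)
            ys[i] += A.U[k * A.rows + i] * t[k];
    for (std::size_t i = 0; i < A.rows; ++i)
        y[i] += static_cast<double>(ys[i]);        // FP32 result -> FP64 update
}
```

Applying the factors as \(U (V^T x)\) instead of forming \(U V^T\) keeps the cost proportional to the rank.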

## H-Matrix-Vector Multiplication

The results were obtained on an AMD Epyc 9554 with 64 CPU cores and 12 × 32 GB DDR5-4800 DIMMs.

Due to its low arithmetic intensity, H-mat-vec is bandwidth-limited, as the following (empirical) roofline plot shows.

The plot also demonstrates that the uncompressed H-mat-vec achieves near-optimal performance on this system.
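
The bandwidth limit can be estimated with a simple roofline model. This sketch assumes one DIMM per memory channel, so 12 DDR5-4800 channels with 8 bytes per transfer give a theoretical peak of 460.8 GB/s (measured bandwidth is typically lower). A dense FP64 mat-vec performs 2 flops per matrix entry while loading 8 bytes, i.e. an arithmetic intensity of 0.25 flop/byte. The function names are illustrative, and any peak-compute value passed in is a hypothetical placeholder, not a measured figure for the Epyc 9554.

```cpp
#include <algorithm>
#include <cassert>

// Theoretical peak DRAM bandwidth in GB/s: channels * MT/s * bytes per transfer.
double peak_bandwidth_gbs(int channels, double mega_transfers, double bytes_per_transfer) {
    assert(channels > 0 && mega_transfers > 0.0 && bytes_per_transfer > 0.0);
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9;
}

// Arithmetic intensity of y += A x: 2 flops (mul + add) per matrix entry,
// assuming matrix data dominates memory traffic (vectors ignored).
double matvec_intensity(double bytes_per_entry) {
    assert(bytes_per_entry > 0.0);
    return 2.0 / bytes_per_entry;
}

// Roofline: attainable GFLOP/s = min(peak compute, intensity * bandwidth).
double roofline_gflops(double peak_gflops, double intensity, double bandwidth_gbs) {
    return std::min(peak_gflops, intensity * bandwidth_gbs);
}
```

With `matvec_intensity(8.0)` and the theoretical bandwidth, the bound is about 115 GFLOP/s for any realistic peak-compute value; halving the bytes per entry doubles this bound, which is why compressed storage can speed up a bandwidth-limited mat-vec.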

Before the runtime results, the compression ratios achieved by the different methods are shown. Besides an H-matrix, a BLR matrix was also tested.

*Figure: compression ratio per method; panels: H, BLR.*
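
As a rough illustration of how accuracy translates into a compression ratio, consider a hypothetical format that keeps the sign and the full 11-bit FP64 exponent but truncates the mantissa to the smallest \(m\) with \(2^{-m} \le \varepsilon\). This is a toy estimate, not one of the compressors listed above, and the function names are illustrative. For \(\varepsilon = 10^{-6}\) it needs 20 mantissa bits, i.e. 32 bits per entry and a ratio of 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical toy format: sign (1 bit) + full FP64 exponent (11 bits)
// + m mantissa bits. Truncating to m bits keeps the relative error below
// 2^-m, so m is the smallest value with 2^-m <= eps.
int mantissa_bits(double eps) {
    assert(eps > 0.0 && eps < 1.0);
    const int m = static_cast<int>(std::ceil(-std::log2(eps)));
    return std::min(m, 52);  // FP64 stores at most 52 mantissa bits
}

// Compression ratio relative to 64-bit FP64 storage.
double compression_ratio(double eps) {
    const int bits = 1 + 11 + mantissa_bits(eps);
    return 64.0 / bits;
}
```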

The following results show the speedup compared to multiplication with the uncompressed H-matrix. The low-rank accuracy is fixed at \(\varepsilon = 10^{-6}\).

*Figure: speedup of the compressed mat-vec; panels: H, BLR.*

A clear correlation between better compression and higher performance is visible, with AFLP achieving both the best compression and the best performance.
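
This correlation is consistent with a simple bandwidth-bound model. Assuming the mat-vec remains bandwidth-bound after compression, reading \(c\) times fewer bytes cuts memory time by the factor \(c\), while decompression adds extra time \(d\). The sketch below encodes this model; the function name and the additive overhead term are assumptions for illustration. With free decompression the predicted speedup equals the compression ratio, and any decompression cost pushes it below that.

```cpp
#include <cassert>

// Simple bandwidth-bound model (assumption): the uncompressed mat-vec takes
// bytes / bandwidth; the compressed one reads bytes / ratio and spends an
// extra decomp_time on decompression.
double predicted_speedup(double bytes, double bandwidth, double ratio, double decomp_time) {
    assert(bytes > 0.0 && bandwidth > 0.0 && ratio > 0.0 && decomp_time >= 0.0);
    const double t_plain = bytes / bandwidth;
    const double t_comp = bytes / (ratio * bandwidth) + decomp_time;
    return t_plain / t_comp;
}
```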

## References

[1] Anzt, Flegar, Grützmacher, Quintana-Ortí: *“Toward a modular precision ecosystem for high-performance computing”*, Int. J. of HPC Applications, 33(6), 1069–1078, 2019