Assembling boundary integral operators is expensive since the matrices are generally dense. If $N$ is the number of elements in the grid, the overall complexity of the assembly grows like $O(N^2)$ and the cost of a matrix-vector product grows like $O(N^2)$. This makes large problems infeasible on standard hardware.

In recent years two techniques have been established to speed up the assembly of boundary integral operators: fast multipole methods (FMM) and hierarchical matrix (H-matrix) techniques. Bempp currently implements the latter. It brings down the cost of assembly and matrix-vector products for integral operators to $O(N \log N)$, making even larger problems with hundreds of thousands of degrees of freedom possible on a single workstation.

## Introduction to hierarchical matrices

Hierarchical matrices are a complex topic. For a good introduction we recommend the MPI lecture notes by Börm, Grasedyck & Hackbusch and the book *Hierarchical Matrices* by Hackbusch.

## Using hierarchical matrix assembly

Fast H-matrix assembly of boundary and potential operators is enabled by default in Bempp. The corresponding options are `bempp.api.global_parameters.assembly.boundary_operator_assembly_type` and `bempp.api.global_parameters.assembly.potential_operator_assembly_type`. By default both are set to `hmat`, which enables H-matrix assembly. For very small problems it is advisable to change the boundary operator assembly type to `dense`, since H-matrix techniques carry a larger overhead for small problems.
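For example, switching a small problem to dense assembly is a one-line change using the parameter named above (a minimal sketch, assuming `bempp.api` is importable):

```python
import bempp.api

# Use dense assembly for boundary operators on small problems;
# potential operators keep the default 'hmat' assembly.
bempp.api.global_parameters.assembly.boundary_operator_assembly_type = "dense"
```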

A range of parameters controls the H-matrix assembly itself; these are summarised in the following.

The most important parameter is the accuracy parameter `bempp.api.global_parameters.hmat.eps`. This specifies the accuracy of the H-matrix approximation and is set to `1E-3` by default. The best value is problem dependent.

For very large problems, it is advisable to increase the value of `bempp.api.global_parameters.hmat.max_block_size`. By default it is set to 2048. It restricts the largest size that an admissible matrix block to be low-rank approximated can have. Choosing a very large value can have a negative effect on the load balancing between the different cores during assembly. Choosing too small a value can lead to a significant increase in the overall computational complexity.

By default, Bempp uses a coarsening strategy to post-process the hierarchical matrix assembly and further reduce the amount of memory an H-matrix requires. This post-processing is based on randomised low-rank approximations. In most of our experiments, it increased the assembly time by around 10% and often reduced the memory consumption by close to 50%. However, if it is not desired, the parameter `bempp.api.global_parameters.hmat.coarsening` should be set to `False`.

The accuracy of the coarsening strategy is by default the same as that of the assembly. Other values can be set by modifying `bempp.api.global_parameters.hmat.coarsening_accuracy`.
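The H-matrix parameters above can all be set in one place before any operator is assembled. A sketch using exactly the parameter names given in this section (the chosen values are illustrative, not recommendations):

```python
import bempp.api

params = bempp.api.global_parameters

# Tighten the H-matrix approximation accuracy (default is 1E-3).
params.hmat.eps = 1E-4

# Allow larger admissible blocks for very large problems (default is 2048).
params.hmat.max_block_size = 4096

# Keep coarsening enabled, but give it its own accuracy
# (by default it inherits the assembly accuracy).
params.hmat.coarsening = True
params.hmat.coarsening_accuracy = 1E-4
```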

## Querying hierarchical matrices

Operators assembled via H-matrices mostly behave in the same way as operators assembled using dense matrix techniques and the interface is identical. However, it is often desirable to query advanced information from H-matrices. This is implemented in the module `bempp.api.hmatrix_interface`.

Let us assemble a simple H-matrix operator over a regular sphere.

```python
import bempp.api

grid = bempp.api.shapes.regular_sphere(4)
space = bempp.api.function_space(grid, "DP", 0)
discrete_operator = bempp.api.operators.boundary.laplace.single_layer(
    space, space, space).weak_form()
```

Since H-matrix assembly is active by default, we do not have to do anything else. We can now collect some information on the H-matrix.

To find out the memory consumption in kB we can use the following command.
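A sketch of such a query, assuming a `mem_size` helper in `bempp.api.hmatrix_interface` (the helper name is an assumption) and the `discrete_operator` assembled above:

```python
# Memory consumption (in kB) of the H-matrix data of the operator.
# Note: 'mem_size' is an assumed helper name in bempp.api.hmatrix_interface.
print(bempp.api.hmatrix_interface.mem_size(discrete_operator))
```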

This will show the amount of storage that the H-matrix data needs (note that it does not count storage of administrative data such as the underlying tree structure).

The total number of blocks in the H-matrix is given by
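A sketch of this query, assuming a `number_of_blocks` helper in `bempp.api.hmatrix_interface` (the helper name is an assumption):

```python
# Total number of blocks in the H-matrix approximation of the operator.
# Note: 'number_of_blocks' is an assumed helper name.
print(bempp.api.hmatrix_interface.number_of_blocks(discrete_operator))
```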

Similar commands exist for the number of dense blocks and the number of low-rank blocks.

Bempp also gives access to the complete underlying tree structure. To obtain the block cluster tree the following command can be used
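A sketch of this step, assuming a `block_cluster_tree` helper in `bempp.api.hmatrix_interface` (the helper name is an assumption; the `tree` variable is used in the examples below):

```python
# Obtain the block cluster tree underlying the assembled operator.
# Note: 'block_cluster_tree' is an assumed helper name.
tree = bempp.api.hmatrix_interface.block_cluster_tree(discrete_operator)
```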

To plot the tree call:
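Assuming the tree object exposes a `plot` method (the method name is an assumption):

```python
# Visualise the block cluster tree (requires PyQt4, see below).
tree.plot()
```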

This command requires that PyQt4 is installed.

An iterator over all leaf nodes of the tree is given by `tree.leaf_nodes`. The root node of the tree is obtained by:
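Assuming the root is exposed as an attribute, analogously to `tree.leaf_nodes` (the attribute name is an assumption):

```python
# Root node of the block cluster tree.
root = tree.root
```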

From the root node, the tree can be traversed hierarchically. The children of a node are available through an iterator. Hence, to iterate through the four children of root use:

```python
for child in root.children:
    print((child.row_cluster_range, child.column_cluster_range))
```

```
((0, 1024), (0, 1024))
((0, 1024), (1024, 2048))
((1024, 2048), (0, 1024))
((1024, 2048), (1024, 2048))
```

This prints out the ranges of the four subnodes of root. Subnodes are traversed in C-style order. Hence, if at the first level the H-matrix has the form

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

the children iterator returns the nodes in the order $A_{11}$, $A_{12}$, $A_{21}$, $A_{22}$.

To get from the leaves of the block cluster tree to the data associated with the leaves, the function `bempp.api.hmatrix_interface.data_block` can be used.

For example, the following code makes a histogram plot of the ranks of the data for all admissible nodes in the above created H-matrix.

```python
import matplotlib.pyplot as plt

admissible_nodes = [node for node in tree.leaf_nodes if node.admissible]
ranks = [bempp.api.hmatrix_interface.data_block(discrete_operator, node).rank
         for node in admissible_nodes]

plt.hist(ranks)
plt.xlabel('Rank')
plt.ylabel('Number of blocks')
plt.show()
```

Most low-rank data blocks are compressed down to a rank of around six.