IFGF.jl
A pure Julia implementation of the Interpolated Factored Green Function Method on adaptive trees
Introduction
This package implements an algorithm for approximating the forward map (i.e. the matrix-vector product) of certain kernel matrices, where kernel matrix will be used to refer to any matrix $A \in \mathbb{F}^{m\times n}$ whose entries can be computed from the following three components:
- a vector of target elements $X = \left\{ \boldsymbol{x}_i \right\}_{i=1}^m$
- a vector of source elements $Y = \left\{ \boldsymbol{y}_i \right\}_{i=1}^n$
- a kernel function $K(x,y) : X \times Y \to \mathbb{F}$, where $\mathbb{F}$ is the return type of the kernel (and of the underlying matrix)
The entries of a kernel matrix are given by the explicit formula $A_{i,j} = K(\boldsymbol{x}_i,\boldsymbol{y}_j)$. Typically, $X$ and $Y$ are vectors of points in $\mathbb{R}^d$, and $K$ is a function mapping pairs of points into $\mathbb{R}$ or $\mathbb{C}$.
Provided $K$ has sufficient structure, an approximation of the product $Ax$ can be computed in linear or log-linear complexity by various well known algorithms such as the Fast Multipole Method or hierarchical matrices; this package implements another acceleration algorithm named Interpolated Factored Green Function method.
The main structure exported by this package is the IFGFOp{T}
, which inherits from AbstractMatrix{T}
and supports some basic linear algebra operations. To construct an IFGFOp
you will need to specify a kernel K
, the target elements X
, and the source elements Y
. You may also need to pass some problem-specific information regarding K
.
To illustrate how to set up the IFGFOp
, let $X$ and $Y$ be a set of random points on the unit cube, and take $K$ to be the Helmholtz Green function in three dimensions:
\[ K(\boldsymbol{x},\boldsymbol{y}) = \frac{e^{ik|\boldsymbol{x}-\boldsymbol{y}|}}{4\pi |\boldsymbol{x}-\boldsymbol{y}|}\]
Setting up the aforementioned kernel matrix can be done as follows:
using IFGF, LinearAlgebra, StaticArrays
import IFGF: wavenumber
# random points on a cube
const Point3D = SVector{3,Float64}
m,n = 100_000, 100_000
X,Y = rand(Point3D,m), rand(Point3D,n)
# define a the kernel matrix
struct HelmholtzMatrix <: AbstractMatrix{ComplexF64}
X::Vector{Point3D}
Y::Vector{Point3D}
k::Float64
end
# indicate that this is an ocillatory kernel with wavenumber `k`
wavenumber(A::HelmholtzMatrix) = A.k
# functor interface
function (K::HelmholtzMatrix)(x,y)
k = wavenumber(K)
d = norm(x-y)
exp(im*k*d)*inv(4*pi*d)
end
# abstract matrix interface
Base.size(A::HelmholtzMatrix) = length(X), length(Y)
Base.getindex(A::HelmholtzMatrix,i::Int,j::Int) = A(A.X[i],A.Y[j])
# create the abstract matrix
k = 8π
A = HelmholtzMatrix(X,Y,k)
100000×100000 Main.HelmholtzMatrix:
-0.00685517-0.0668924im … -0.0569299-0.0664648im
-0.103151-0.0666959im -0.0759449-0.0459245im
-0.0464392-0.0735239im -0.127411+0.00151087im
-0.0907876+0.0109399im 0.0271012-0.110902im
0.158783+0.0040284im 0.0968854+0.0379255im
0.0586389-0.0574775im … -0.0571985-0.0662535im
0.0130289-0.0652858im 0.321177-0.0822165im
-0.0595217-0.0643535im -0.000843506+0.0748601im
-0.0859715-0.0257891im -0.116034-0.0448555im
0.0356727-0.107905im 0.0401748-0.106015im
⋮ ⋱
0.0122234+0.141811im -0.127569+0.0100555im
-0.0355077+0.0893647im 0.258135+0.142173im
-0.0907412+0.0115377im 0.0675232+0.0389552im
-0.0232255-0.0637052im 0.157951+0.0106334im
0.106124-0.0109096im … -0.0585052+0.0425543im
0.0528788-0.0377006im 0.0404561+0.0649188im
-0.0897826-0.0109418im -0.166481-0.110453im
0.321578-0.0283171im 0.106179-0.00168166im
0.0783546-0.0174679im -0.00132668+0.0978675im
Note that A
is a lazy $100000 \times 100000$ matrix which computes its entries on demand; as such, it has a light memory footprint. Trying to instantiate the underlying Matrix{ComplexF64}
would most likely result in you running out of memory.
To build an approximation of A
using the IFGFOp
, you may use the high-level constructor assemble_ifgf
as follows:
L = assemble_ifgf(A,X,Y;tol=1e-3)
IFGFOp of ComplexF64 with range 1:100000 × 1:100000
number of nodes: 2047
number of leaves: 1024
number of active cones: 179452
maximum number of active cones: 503
Check the documentation of assemble_ifgf
for more information on the available options. Note that L
, while not as compact as A
, still has a relatively light memory footprint:
mbytes = Base.summarysize(L) / 1e6
print("Size of L: $mbytes megabytes")
Size of L: 36.661984 megabytes
To approximate oscillatory kernels, the IFGF
algorithm adapts to the acoustic size of each of the cells in the source tree in order to keep the interpolation error constant across all levels of the tree. In practice, this means that memory and runtime will depend on both the total number of points n
and on the wavenumber k
, with smaller k
being "easier" to approximate.
We can use L
in lieu of A
to approximate the matrix vector product, as illustrated below:
x = randn(ComplexF64,n)
y = L*x
100000-element Vector{ComplexF64}:
-40.30770724779544 - 36.282838547078065im
19.55007680935604 - 62.17510636326251im
-15.434860717752908 - 0.3922166010035699im
-132.08342008016334 + 7.775392737438748im
-3.7121400129125712 - 12.093674333836026im
40.37325895191082 + 40.613244399387334im
-16.643864801075857 + 32.31369120960893im
7.097950035634436 - 69.58244714551807im
-146.8602137034202 - 107.36372474221069im
1.4281423547714946 - 39.361662833543335im
⋮
22.689612577941247 - 0.019501769494676413im
17.435713313716455 - 23.668910846592716im
20.040540277275923 + 46.2513165558736im
-35.77449751181919 - 8.017614459440509im
29.943129193458645 - 37.08607267738165im
-80.34283012208415 - 45.9711057298677im
29.267602900318707 + 9.13110119314889im
-20.5277027873903 - 8.205616670978678im
88.58374856785778 - 5.789445177128644im
The quality of the approximation may be verified by computing the exact value at a few randomly selected set of indices:
I = rand(1:m,100)
exact = [sum(A[i,j]*x[j] for j in 1:n) for i in I] # matrix-vector product
er = norm(y[I]-exact) / norm(exact) # relative error
print("approximate relative error: $er")
approximate relative error: 0.00026267647601079577
To keep things simple in this example, we neglected some important optimizations that can be performed when defining your own kernel. In particular, you can overload some default (generic) methods for your own kernel to provide e.g. a vectorized evaluation using SIMD
instructions. We also avoided the possible issue of kernel blowing up at $\boldsymbol{x} = \boldsymbol{y}$. See the the test/testutils.jl
file for an optimized definition of some classical kernels, or the Custom kernels section for a discussion on what methods can be overloaded.
Custom kernels
When defining your own kernel matrix, there is one required as well as some optional methods to overload in order to provide kernel-specific information to the IFGFOp
constructor. The table below summarizes the interface methods, where Y
is a SourceTree
, x
is an SVector
representing a point, and K
is your custom kernel:
Method's name | Required | Brief description |
---|---|---|
wavenumber(K) | Yes | Return the wavenumber of your kernel |
return_type(K) | No | Type of element returned by K(x,y) |
centered_factor(K,x,Y) | No | A representative value of K(x,y) for y ∈ Y |
inv_centered_factor(K,x,Y) | No | A representative value of inv(K(x,y)) for y ∈ Y |
transfer_factor(K,x,Y) | No | Transfer function given by inv_centered_factor(K,x,parent(Y))*centered_factor(K,x,Y) |
near_interaction!(C,K,X,Y,σ,I,J) | No | Compute C[i] <- C[i] + ∑ⱼ K(X[i],Y[j])*σ[j] for i ∈ I , j ∈ J |
Because the IFGF
algorithms must adapt its interpolation scheme depending on the frequency of oscillations to control the interpolation error, you must extend the IFGF.wavenumber
method. If the kernel is not oscillatory (e.g. exponential kernel, Newtonian potential), simply return 0
.
The return_type
method will be used to infer the type of the underlying abstract matrix. By default, it will use Base.promote_op
to try and infer a return type, but you may force that type manually if needed by extending this method.
The centered_factor
, inv_centered_factor
, and transfer_factor
have reasonable defaults, and typically are overloaded only for performance reasons (e.g. if some analytic simplifications are available).
Finally, the near_interaction
method is called at the leaves of the source tree to compute, well, the near interactions. Extending this function to make use of e.g. SIMD instructions can significantly speed up some parts of the code.
To illustrate how to define a new kernel with the aforementioned hooks, we will first extend the definition of the HelmholtzMatrix
(done in the introduction) by implementing a vectorized near_interaction
method using the LoopVectorization
package. The code below, though more complex, implements a vectorized evaluation of the dense forward map:
using LoopVectorization
function IFGF.near_interaction!(C,K::HelmholtzMatrix,X,Y,σ,I,J)
Tx = eltype(X)
Ty = eltype(Y)
@assert Tx <: SVector && Ty <: SVector
Xm = reshape(reinterpret(eltype(Tx),X),3,:)
Ym = reshape(reinterpret(eltype(Ty),Y),3,:)
@views helmholtz3d_sl_vec!(C[I],Xm[:,I],Ym[:,J],σ[J],K.k)
end
function helmholtz3d_sl_vec!(C::AbstractVector{Complex{T}},X,Y,σ,k) where {T}
@assert eltype(σ) == eltype(C)
m,n = size(X,2), size(Y,2)
C_T = reinterpret(T, C)
C_r = @views C_T[1:2:end,:]
C_i = @views C_T[2:2:end,:]
σ_T = reinterpret(T, σ)
σ_r = @views σ_T[1:2:end,:]
σ_i = @views σ_T[2:2:end,:]
@turbo for j in 1:n # LoopVectorization magic
for i in 1:m
d2 = (X[1,i] - Y[1,j])^2
d2 += (X[2,i] - Y[2,j])^2
d2 += (X[3,i] - Y[3,j])^2
d = sqrt(d2)
s, c = sincos(k * d)
zr = inv(4π*d) * c
zi = inv(4π*d) * s
C_r[i] += (!iszero(d))*(zr*σ_r[j] - zi*σ_i[j])
C_i[i] += (!iszero(d))*(zi*σ_r[j] + zr*σ_i[j])
end
end
return C
end
The other optimization that we can perform for the HelmhotlzMatrix
is to rewrite the transfer_factor
so that it uses only one exponential. The default definition of transfer_factor(K,x,Y)
simply divides K(x,yc)
by K(x,yp)
, where yc
is the center of the source box Y
and yp
is the center of the parent of Y
. This division can be simplified to avoid computing two complex exponentials as follows:
function IFGF.transfer_factor(K::HelmholtzMatrix,x,Y)
yc = IFGF.center(Y)
yp = IFGF.center(IFGF.parent(Y))
d = norm(x-yc)
dp = norm(x-yp)
exp(im*K.k*(d-dp))*dp/d
end
We can check that the result is still correct with the code below:
y = L*x
er = norm(y[I]-exact) / norm(exact) # relative error
print("approximate relative error: $er")
approximate relative error: 0.00026267647601081084
Of course, you should benchmark your code (with your specific kernel) to see if and when it may be useful to provide faster versions of these methods.
Everything said so far applies if K
returns a tensor instead of a scalar. This is the case e.g. for the dyadic Green function for time-harmonic Maxwell's equations, given by
\[ K(\boldsymbol{x},\boldsymbol{y}) = \mathbb{G}(\boldsymbol{x}, \boldsymbol{y}) := \left(\mathbb{I}+\frac{\nabla_{\boldsymbol{x}} \nabla_{\boldsymbol{x}}}{k^{2}}\right) G(\boldsymbol{x}, \boldsymbol{y}),\]
where $k$ is a constant which depends on the electric permittivity, the magnetic permeability, and the angular frequency, and where
\[G(\boldsymbol{x},\boldsymbol{y}) = \frac{e^{ik|\boldsymbol{x}-\boldsymbol{y}|}}{4\pi |\boldsymbol{x}-\boldsymbol{y}|},\]
is the Helmholtz Green's function.
Defining for example a MaxwellKernel
as done below, and calling the assemble_ifgf
constructor, should work as expected. Note that, for performance reasons, tensor-valued kernels will typically return a StaticArray
.
using IFGF, StaticArrays, LinearAlgebra
struct MaxwellKernel
k::Float64
end
function (K::MaxwellKernel)(x,y)
r = x - y
d = norm(r)
# helmholtz greens function
g = exp(im*K.k*d)/(4π*d)
gp = im*K.k*g - g/d
gpp = im*K.k*gp - gp/d + g/d^2
RRT = r*transpose(r) # rvec ⊗ rvecᵗ
G = g*LinearAlgebra.I + 1/K.k^2*(gp/d*LinearAlgebra.I + (gpp/d^2 - gp/d^3)*RRT)
return (!iszero(d))*G
end
import IFGF: wavenumber
wavenumber(K::MaxwellKernel) = K.k
Advanced usage
The assemble_ifgf
provides a high-level constructor for the IFGFOp
structure by making various default choices of clustering algorithms, admisibility condition, how the interpolation cones should be created, etc. To obtain a more granular control over these parameters, you have to create them independently and pass them to the (default) IFGF
constructor
Resuming the example for the Helmholtz kernel, we will go over all the steps to manually construct the fields of the IFGFOp
.