BLIT — Block Iterative Linear Solvers
High-performance sparse solvers for symmetric FEM matrices
- Multiple RHS solved simultaneously, sharing one Krylov subspace for up to 4× fewer iterations
- Quasi inner product ⟨x,y⟩ = Σxₖyₖ — natural for complex symmetric (A = Aᵀ) systems
- Built-in ILU, Jacobi, and split preconditioners
- Outperforms SuperLU and PARDISO on large sparse systems (see benchmarks below)
- Short three-term recurrences instead of storing a full Krylov basis
- Multi-threaded RHS solving with automatic BLAS thread control
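The quasi inner product above sums plain products with no complex conjugation, which is what makes it the natural pairing for complex symmetric (A = Aᵀ, not Hermitian) matrices. A minimal sketch of the difference from the usual Hermitian inner product (illustration only, not BLIT's internal implementation):

```python
import numpy as np

def quasi_inner(x, y):
    """Quasi inner product: sum of plain products, no conjugation.
    For a symmetric matrix A, <A x, y> == <x, A y> under this pairing."""
    return np.sum(x * y)

x = np.array([1 + 2j, 3 - 1j])
y = np.array([2 - 1j, 1j])

q = quasi_inner(x, y)   # no conjugation: (1+2j)(2-1j) + (3-1j)(1j) = 5+6j
h = np.vdot(x, y)       # Hermitian inner product conjugates x: -1-2j
```

Note that the quasi inner product of a nonzero complex vector with itself can vanish, which is why QMR-type methods built on it need look-ahead or breakdown handling.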
```python
# pip install blocksolver
import numpy as np
from blocksolver import blqmr
from scipy.sparse import diags

# Complex symmetric FEM matrix
n = 5000
A = diags([-1, 4 + 0.1j, -1], [-1, 0, 1], shape=(n, n), dtype=complex)

# Block solve: 16 RHS at once
B = np.random.randn(n, 16) + 0j

result = blqmr(A, B, tol=1e-10, precond_type='diag')
print(f"Converged: {result.converged}")
```
```matlab
% Complex symmetric FEM matrix
n = 5000;
K = spdiags([-ones(n,1) 4*ones(n,1) -ones(n,1)], -1:1, n, n);
A = K + 0.1i * speye(n);

% Block solve: 16 RHS at once
B = rand(n, 16) + 1i*rand(n, 16);
opt.precond = 'diag';
[X, flag] = blqmr(A, B, 1e-10, 1000, [], [], [], opt);
```
```fortran
program solve_fem
  use blit_blqmr_real
  implicit none
  type(BLQMRSolver) :: qmr
  integer :: n, nnz, nrhs
  integer, allocatable :: Ap(:), Ai(:)
  real(8), allocatable :: Ax(:), B(:,:), X(:,:)

  n = 5000; nnz = 14998; nrhs = 16
  ! ... allocate and fill CSC arrays ...

  call BLQMRCreate(qmr, n)
  qmr%maxit = 1000
  qmr%qtol = 1.0d-10
  qmr%pcond_type = 3   ! Jacobi
  call BLQMRPrep(qmr, Ap, Ai, Ax, nnz)
  call BLQMRSolve(qmr, Ap, Ai, Ax, nnz, X, B, nrhs)
  call BLQMRDestroy(qmr)
end program
```
```cpp
#include "blit_solvers.h"
#include <vector>

int main() {
    const int n = 5000, nnz = 14998, nrhs = 16;
    std::vector<int> Ap(n + 1), Ai(nnz);
    std::vector<double> Ax(nnz), b(n * nrhs), x(n * nrhs);

    // ... fill CSC arrays and right-hand sides ...

    BlitBLQMR<double> solver(n, nrhs, 1000, 1e-3);
    solver.Prepare(Ap.data(), Ai.data(), Ax.data(), nnz);
    solver.Solve(x.data(), b.data(), nrhs);
    return 0;
}
```
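The Fortran and C++ interfaces take the matrix in compressed sparse column (CSC) form: `Ap` column pointers, `Ai` row indices, `Ax` nonzero values. A sketch of producing those three arrays with SciPy for the tridiagonal test stencil used above (the variable names mirror the examples; the BLIT APIs themselves are not called here):

```python
import numpy as np
from scipy.sparse import diags

n = 5000
# Real tridiagonal test matrix in CSC storage
A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

Ap = A.indptr    # column pointers, length n + 1
Ai = A.indices   # row indices, length nnz
Ax = A.data      # nonzero values, length nnz

nnz = int(Ap[-1])  # 3n - 2 = 14998 for this tridiagonal stencil
```

The `nnz = 14998` constant in the Fortran and C++ listings is exactly this `3n - 2` count for n = 5000.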
Complex symmetric Helmholtz FEM matrix (tetrahedral mesh), Jacobi split preconditioner, OpenMP with nblock=1. Tested on 20-core workstation.
Speedup = SuperLU_time / BLQMR_time; values > 1.0 mean BLQMR is faster. Columns are the number of simultaneous RHS.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 4.1 | 0.65 | 0.57 | 0.83 | 0.64 | 0.50 | 0.88 | 0.62 |
| 15³ (3K) | 6.6 | 3.3 | 3.2 | 3.3 | 3.1 | 1.7 | 1.4 | 0.83 |
| 20³ (8K) | 9.4 | 8.0 | 7.7 | 8.7 | 4.4 | 3.2 | 1.9 | 1.3 |
| 25³ (16K) | 25.6 | 19.5 | 19.2 | 18.9 | 13.5 | 6.1 | 4.2 | 2.8 |
| 30³ (27K) | 38.1 | 36.2 | 35.5 | 34.0 | 17.9 | 11.4 | 6.6 | 4.3 |
| 35³ (43K) | 42.1 | 60.5 | 58.8 | 51.8 | 29.9 | 16.1 | 9.5 | 5.9 |
| 40³ (64K) | 69.5 | 83.7 | 79.8 | 70.1 | 39.3 | 20.4 | 11.2 | 6.8 |
Speedup = PARDISO_time / BLQMR_time; values > 1.0 mean BLQMR is faster. Columns are the number of simultaneous RHS. PARDISO uses the real 2×2 block formulation for complex matrices.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 4.3 | 0.68 | 0.67 | 0.66 | 0.60 | 0.43 | 0.26 | 0.17 |
| 15³ (3K) | 3.9 | 1.2 | 1.1 | 0.97 | 0.61 | 0.32 | 0.26 | 0.18 |
| 20³ (8K) | 2.2 | 1.5 | 1.4 | 1.0 | 0.82 | 0.58 | 0.29 | 0.16 |
| 25³ (16K) | 2.6 | 2.2 | 2.1 | 1.9 | 1.2 | 0.58 | 0.32 | 0.18 |
| 30³ (27K) | 3.0 | 2.9 | 2.9 | 2.7 | 1.5 | 0.77 | 0.42 | 0.25 |
| 35³ (43K) | 3.0 | 3.9 | 3.7 | 3.6 | 1.9 | 0.96 | 0.51 | 0.30 |
| 40³ (64K) | 3.9 | 4.9 | 4.7 | 4.4 | 2.2 | 1.1 | 0.58 | 0.32 |
Real symmetric 3D Laplacian (7-point stencil), 4 RHS, MATLAB R2024a.
Speedup = Direct_time / BLQMR_time; values > 1.0 mean BLQMR is faster.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 0.65 | 0.45 | 0.44 | 0.29 | 0.19 | 0.16 | 0.15 | 0.13 |
| 15³ (3K) | 4.4 | 1.3 | 0.96 | 0.61 | 0.34 | 0.26 | 0.22 | 0.20 |
| 20³ (8K) | 6.5 | 3.3 | 2.2 | 1.3 | 0.69 | 0.53 | 0.42 | 0.35 |
| 25³ (16K) | 10.3 | 4.4 | 2.7 | 1.5 | 0.94 | 0.73 | 0.57 | 0.51 |
| 30³ (27K) | 16.3 | 9.1 | 5.4 | 3.2 | 1.9 | 1.3 | 1.0 | 0.77 |
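The real symmetric 7-point 3D Laplacian behind the table above can be assembled from Kronecker products of 1D stencils. A sketch of the benchmark matrix only (the timing harness and the MATLAB direct solver are not reproduced):

```python
import numpy as np
import scipy.sparse as sp

def laplacian_3d(m):
    """Real symmetric 7-point Laplacian on an m x m x m grid."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    I = sp.identity(m)
    return (sp.kron(sp.kron(T, I), I)
            + sp.kron(sp.kron(I, T), I)
            + sp.kron(sp.kron(I, I), T)).tocsc()

A = laplacian_3d(10)   # the 10^3 (1K unknowns) row of the table
```

Each grid row in the table corresponds to `laplacian_3d(m)` for m = 10 through 40; interior diagonal entries are 6, matching the 7-point stencil.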
```shell
pip install blocksolver
```

Linux prerequisites:

```shell
apt install gfortran libsuitesparse-dev
```

MATLAB setup:

```matlab
addpath('/path/to/blit/matlab')
```