BLIT — Block Iterative Linear Solvers
High-performance sparse solvers for symmetric FEM matrices
- Multiple RHS solved simultaneously, sharing one Krylov subspace for up to 4× fewer iterations
- Quasi inner product ⟨x,y⟩ = Σxₖyₖ — natural for complex symmetric (A = Aᵀ) systems
- Built-in ILU, Jacobi, and split preconditioners
- Outperforms SuperLU and PARDISO on large sparse systems (see benchmarks below)
- Short three-term recurrences instead of storing a full Krylov basis
- Multi-threaded RHS solving with automatic BLAS thread control
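The quasi inner product above sums plain products with no complex conjugation, which is what makes it the natural pairing for complex symmetric (A = Aᵀ, not Hermitian) matrices. A minimal sketch of the difference from the usual Hermitian inner product (illustration only, not BLIT's internal implementation):

```python
import numpy as np

def quasi_inner(x, y):
    """Quasi inner product: sum of plain products, no conjugation.
    For a symmetric matrix A, <A x, y> == <x, A y> under this pairing."""
    return np.sum(x * y)

x = np.array([1 + 2j, 3 - 1j])
y = np.array([2 - 1j, 1j])

q = quasi_inner(x, y)   # no conjugation: (1+2j)(2-1j) + (3-1j)(1j) = 5+6j
h = np.vdot(x, y)       # Hermitian inner product conjugates x: -1-2j
```

Note that the quasi inner product of a nonzero complex vector with itself can vanish, which is why QMR-type methods built on it need look-ahead or breakdown handling.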
```python
# pip install blocksolver
import numpy as np
from blocksolver import blqmr
from scipy.sparse import diags

# Complex symmetric FEM matrix
n = 5000
A = diags([-1, 4 + 0.1j, -1], [-1, 0, 1], shape=(n, n), dtype=complex)

# Block solve: 16 RHS at once
B = np.random.randn(n, 16) + 0j

result = blqmr(A, B, tol=1e-10, precond_type='diag')
print(f"Converged: {result.converged}")
```
```matlab
% Complex symmetric FEM matrix
n = 5000;
K = spdiags([-ones(n,1) 4*ones(n,1) -ones(n,1)], -1:1, n, n);
A = K + 0.1i * speye(n);

% Block solve: 16 RHS at once
B = rand(n, 16) + 1i*rand(n, 16);
opt.precond = 'diag';
[X, flag] = blqmr(A, B, 1e-10, 1000, [], [], [], opt);
```
```fortran
program solve_fem
  use blit_blqmr_real
  implicit none
  type(BLQMRSolver) :: qmr
  integer :: n, nnz, nrhs
  integer, allocatable :: Ap(:), Ai(:)
  real(8), allocatable :: Ax(:), B(:,:), X(:,:)

  n = 5000; nnz = 14998; nrhs = 16
  ! ... allocate and fill CSC arrays ...

  call BLQMRCreate(qmr, n)
  qmr%maxit = 1000
  qmr%qtol = 1.0d-10
  qmr%pcond_type = 3   ! Jacobi
  call BLQMRPrep(qmr, Ap, Ai, Ax, nnz)
  call BLQMRSolve(qmr, Ap, Ai, Ax, nnz, X, B, nrhs)
  call BLQMRDestroy(qmr)
end program
```
```cpp
#include "blit_solvers.h"
#include <vector>

int main() {
    const int n = 5000, nnz = 14998, nrhs = 16;
    std::vector<int> Ap(n + 1), Ai(nnz);
    std::vector<double> Ax(nnz), b(n * nrhs), x(n * nrhs);

    // ... fill CSC arrays and right-hand sides ...

    BlitBLQMR<double> solver(n, nrhs, 1000, 1e-3);
    solver.Prepare(Ap.data(), Ai.data(), Ax.data(), nnz);
    solver.Solve(x.data(), b.data(), nrhs);
    return 0;
}
```
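The Fortran and C++ interfaces take the matrix in compressed sparse column (CSC) form: `Ap` column pointers, `Ai` row indices, `Ax` nonzero values. A sketch of producing those three arrays with SciPy for the tridiagonal test stencil used above (the variable names mirror the examples; the BLIT APIs themselves are not called here):

```python
import numpy as np
from scipy.sparse import diags

n = 5000
# Real tridiagonal test matrix in CSC storage
A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

Ap = A.indptr    # column pointers, length n + 1
Ai = A.indices   # row indices, length nnz
Ax = A.data      # nonzero values, length nnz

nnz = int(Ap[-1])  # 3n - 2 = 14998 for this tridiagonal stencil
```

The `nnz = 14998` constant in the Fortran and C++ listings is exactly this `3n - 2` count for n = 5000.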
Complex symmetric Helmholtz FEM matrix (tetrahedral mesh), Jacobi split preconditioner, OpenMP with nblock=1. Tested on 20-core workstation.
Speedup = SuperLU_time / BLQMR_time; values > 1.0 mean BLQMR is faster. Columns are the number of simultaneous RHS.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 4.1 | 0.65 | 0.57 | 0.83 | 0.64 | 0.50 | 0.88 | 0.62 |
| 15³ (3K) | 6.6 | 3.3 | 3.2 | 3.3 | 3.1 | 1.7 | 1.4 | 0.83 |
| 20³ (8K) | 9.4 | 8.0 | 7.7 | 8.7 | 4.4 | 3.2 | 1.9 | 1.3 |
| 25³ (16K) | 25.6 | 19.5 | 19.2 | 18.9 | 13.5 | 6.1 | 4.2 | 2.8 |
| 30³ (27K) | 38.1 | 36.2 | 35.5 | 34.0 | 17.9 | 11.4 | 6.6 | 4.3 |
| 35³ (43K) | 42.1 | 60.5 | 58.8 | 51.8 | 29.9 | 16.1 | 9.5 | 5.9 |
| 40³ (64K) | 69.5 | 83.7 | 79.8 | 70.1 | 39.3 | 20.4 | 11.2 | 6.8 |
Speedup = PARDISO_time / BLQMR_time; values > 1.0 mean BLQMR is faster. Columns are the number of simultaneous RHS. PARDISO uses the real 2×2 block formulation for complex matrices.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 4.3 | 0.68 | 0.67 | 0.66 | 0.60 | 0.43 | 0.26 | 0.17 |
| 15³ (3K) | 3.9 | 1.2 | 1.1 | 0.97 | 0.61 | 0.32 | 0.26 | 0.18 |
| 20³ (8K) | 2.2 | 1.5 | 1.4 | 1.0 | 0.82 | 0.58 | 0.29 | 0.16 |
| 25³ (16K) | 2.6 | 2.2 | 2.1 | 1.9 | 1.2 | 0.58 | 0.32 | 0.18 |
| 30³ (27K) | 3.0 | 2.9 | 2.9 | 2.7 | 1.5 | 0.77 | 0.42 | 0.25 |
| 35³ (43K) | 3.0 | 3.9 | 3.7 | 3.6 | 1.9 | 0.96 | 0.51 | 0.30 |
| 40³ (64K) | 3.9 | 4.9 | 4.7 | 4.4 | 2.2 | 1.1 | 0.58 | 0.32 |
Real symmetric 3D Laplacian (7-point stencil), 4 RHS, MATLAB R2024a.
Speedup = Direct_time / BLQMR_time; values > 1.0 mean BLQMR is faster.
| Grid | 1 RHS | 2 RHS | 4 RHS | 8 RHS | 16 RHS | 32 RHS | 64 RHS | 128 RHS |
|---|---|---|---|---|---|---|---|---|
| 10³ (1K) | 0.65 | 0.45 | 0.44 | 0.29 | 0.19 | 0.16 | 0.15 | 0.13 |
| 15³ (3K) | 4.4 | 1.3 | 0.96 | 0.61 | 0.34 | 0.26 | 0.22 | 0.20 |
| 20³ (8K) | 6.5 | 3.3 | 2.2 | 1.3 | 0.69 | 0.53 | 0.42 | 0.35 |
| 25³ (16K) | 10.3 | 4.4 | 2.7 | 1.5 | 0.94 | 0.73 | 0.57 | 0.51 |
| 30³ (27K) | 16.3 | 9.1 | 5.4 | 3.2 | 1.9 | 1.3 | 1.0 | 0.77 |
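The real symmetric 7-point 3D Laplacian behind the table above can be assembled from Kronecker products of 1D stencils. A sketch of the benchmark matrix only (the timing harness and the MATLAB direct solver are not reproduced):

```python
import numpy as np
import scipy.sparse as sp

def laplacian_3d(m):
    """Real symmetric 7-point Laplacian on an m x m x m grid."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    I = sp.identity(m)
    return (sp.kron(sp.kron(T, I), I)
            + sp.kron(sp.kron(I, T), I)
            + sp.kron(sp.kron(I, I), T)).tocsc()

A = laplacian_3d(10)   # the 10^3 (1K unknowns) row of the table
```

Each grid row in the table corresponds to `laplacian_3d(m)` for m = 10 through 40; interior diagonal entries are 6, matching the 7-point stencil.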
```shell
pip install blocksolver
```

Linux prerequisites:

```shell
apt install gfortran libsuitesparse-dev
```

MATLAB setup:

```matlab
addpath('/path/to/blit/matlab')
```