Methods and Algorithms¶
This page provides mathematical details and algorithmic descriptions of the methods implemented in sparse-kappa.
1-Norm Methods¶
Hager-Higham Algorithm¶
Mathematical Background
The 1-norm of a matrix \(A \in \mathbb{R}^{n \times n}\) is defined as:
The 1-norm condition number is:
Algorithm
The Hager-Higham algorithm estimates \(\|A^{-1}\|_1\) via iterative refinement:
Start with initial vector \(x = \frac{1}{n} \mathbf{1}\)
For \(k = 1, 2, \ldots\) until convergence:
Solve \(A^T z = \text{sign}(x)\)
Find \(j = \arg\max_i |z_i|\)
If converged, stop
Solve \(A x = e_j\) where \(e_j\) is the \(j\)-th unit vector
Compute \(\|x\|_1\)
Return \(\|A^{-1}\|_1 \approx \|x\|_1\)
Complexity: \(O(k \cdot \text{nnz}(A))\) where \(k\) is the number of iterations (typically 3-5)
References:
Hager, W. W. (1984). “Condition estimates.” SIAM J. Sci. Stat. Comput., 5(2), 311-316.
Higham, N. J., & Tisseur, F. (2000). “A block algorithm for matrix 1-norm estimation.” SIAM J. Matrix Anal. Appl., 21(4), 1185-1201.
2-Norm Methods¶
Power Method¶
Mathematical Background
The power method finds the largest singular value \(\sigma_{\max}(A)\) by iterating:
For the smallest singular value, use inverse iteration on \(A^T A\).
Algorithm
For \(\sigma_{\max}\):
Initialize random \(v_0\), normalize
Iterate \(v_{k+1} = \frac{A^T A v_k}{\|A^T A v_k\|}\)
Rayleigh quotient: \(\sigma^2 = v_k^T (A^T A) v_k\)
For \(\sigma_{\min}\):
Use shifted inverse iteration
Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)
Complexity: \(O(k \cdot \text{nnz}(A))\) per iteration
Convergence: Linear, rate depends on ratio \(\frac{\sigma_2}{\sigma_1}\)
Lanczos Method¶
Mathematical Background
The Lanczos algorithm tridiagonalizes a symmetric matrix \(B = A^T A\):
where \(T\) is tridiagonal and \(Q\) is orthogonal.
Eigenvalues of \(T\) approximate eigenvalues of \(B\), and \(\sigma(A) = \sqrt{\lambda(A^T A)}\).
Algorithm
Initialize \(q_1\) random, \(\beta_0 = 0\), \(q_0 = 0\)
For \(j = 1, \ldots, m\):
\(v = A^T (A q_j)\)
\(\alpha_j = q_j^T v\)
\(v = v - \alpha_j q_j - \beta_{j-1} q_{j-1}\)
\(\beta_j = \|v\|\)
\(q_{j+1} = v / \beta_j\)
Form tridiagonal matrix \(T\) with diagonals \(\alpha\) and off-diagonals \(\beta\)
Compute eigenvalues of \(T\)
Return \(\kappa_2(A) = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}\)
Complexity: \(O(m^2 \cdot \text{nnz}(A))\) for \(m\) iterations
Advantages:
Fast convergence for well-separated eigenvalues
Numerically stable with reorthogonalization
References:
Lanczos, C. (1950). “An iteration method for the solution of the eigenvalue problem.”
Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Chapter 10.
Arnoldi Method¶
Mathematical Background
The Arnoldi iteration is a generalization of Lanczos for non-symmetric matrices. It builds an orthonormal basis for the Krylov subspace:
Algorithm
Initialize \(q_1 = v / \|v\|\)
For \(j = 1, \ldots, m\):
\(v = A^T (A q_j)\)
For \(i = 1, \ldots, j\):
\(h_{ij} = q_i^T v\)
\(v = v - h_{ij} q_i\)
\(h_{j+1,j} = \|v\|\)
\(q_{j+1} = v / h_{j+1,j}\)
Compute eigenvalues of Hessenberg matrix \(H = [h_{ij}]\)
Return condition number from extreme eigenvalues
Complexity: \(O(m^2 n + m \cdot \text{nnz}(A))\) for \(m\) iterations
References:
Arnoldi, W. E. (1951). “The principle of minimized iterations.”
Saad, Y. (2011). “Numerical Methods for Large Eigenvalue Problems” (2nd ed.).
Golub-Kahan Bidiagonalization¶
Mathematical Background
Instead of forming \(A^T A\), the Golub-Kahan algorithm directly bidiagonalizes \(A\):
where \(B\) is bidiagonal and \(U, V\) are orthogonal.
Singular values of \(B\) equal singular values of \(A\) (numerically more stable).
Algorithm
Initialize \(u_1 = 0\), \(\beta_1 v_1 = b\) (random \(b\))
For \(j = 1, \ldots, m\):
\(u_j = A v_j - \beta_j u_{j-1}\)
\(\alpha_j = \|u_j\|\), \(u_j = u_j / \alpha_j\)
\(v_{j+1} = A^T u_j - \alpha_j v_j\)
\(\beta_{j+1} = \|v_{j+1}\|\), \(v_{j+1} = v_{j+1} / \beta_{j+1}\)
Form bidiagonal matrix \(B\) with diagonals \(\alpha\) and superdiagonals \(\beta\)
Compute singular values of \(B\) (via QR iteration)
Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)
Complexity: \(O(m \cdot \text{nnz}(A))\) for \(m\) iterations
Advantages:
More numerically stable than forming \(A^T A\)
Direct computation of singular values
Better for ill-conditioned matrices
References:
Golub, G. H., & Kahan, W. (1965). “Calculating the singular values and pseudo-inverse of a matrix.”
Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Section 8.6.
PyTorch Wrapper Methods¶
SVDS¶
Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.svds, which implements:
Implicitly restarted Lanczos method
Computes \(k\) largest or smallest singular values
Most accurate method for moderate-sized problems
Usage: Best for matrices with \(n < 10{,}000\) when high accuracy is required.
EIGSH¶
Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.eigsh for symmetric matrices:
Thick-restart Lanczos method (ARPACK-style)
Computes \(k\) largest/smallest eigenvalues
Optimized for Hermitian/symmetric matrices
Usage: Symmetric or Hermitian matrices, faster than general methods.
LOBPCG¶
Locally Optimal Block Preconditioned Conjugate Gradient:
Block algorithm: computes multiple eigenvalues simultaneously
Can incorporate preconditioners
Efficient for very large matrices
Algorithm:
Start with block of random vectors \(X\)
Iterate:
Compute \(W = A X\)
Rayleigh-Ritz procedure on subspace
Update \(X\) with refined eigenvectors
Usage: Large matrices (\(n > 50{,}000\)), especially with good preconditioners.
References:
Knyazev, A. V. (2001). “Toward the optimal preconditioned eigensolver.”
Algorithm Comparison¶
Method |
Complexity |
Memory |
Best For |
Notes |
|---|---|---|---|---|
Hager-Higham |
\(O(k \cdot \text{nnz})\) |
Low |
1-norm estimation |
Fast, 3-5 iterations |
Power |
\(O(k \cdot \text{nnz})\) |
Low |
Quick estimates |
Simple, robust |
Lanczos |
\(O(m^2 \cdot \text{nnz})\) |
Medium |
General purpose |
Good balance |
Arnoldi |
\(O(m^2 n)\) |
High |
Non-symmetric |
More memory |
Golub-Kahan |
\(O(m \cdot \text{nnz})\) |
Low |
Ill-conditioned |
Most stable |
SVDS |
\(O(k \cdot \text{nnz})\) |
Medium |
High accuracy |
Most accurate |
EIGSH |
\(O(k \cdot \text{nnz})\) |
Medium |
Symmetric matrices |
Optimized |
LOBPCG |
\(O(k \cdot \text{nnz})\) |
Medium |
Very large |
Preconditionable |
Convergence Theory¶
For power iteration methods, convergence rate depends on the eigenvalue gap:
For Krylov methods (Lanczos, Arnoldi), convergence is faster and depends on polynomial approximation properties.
Numerical Stability¶
Loss of Orthogonality
In Lanczos/Arnoldi, vectors can lose orthogonality due to rounding errors. Solutions:
Full reorthogonalization (expensive)
Selective reorthogonalization (adaptive)
Partial reorthogonalization (sparse-kappa default)
Condition Number Effects
For \(\kappa(A) \approx 10^d\), expect \(d\) digits of accuracy loss in IEEE double precision.
Practical Recommendations¶
Start with ``auto`` method selection
For symmetric matrices: Use
eigshFor high accuracy: Use
svds(if \(n < 10{,}000\))For large sparse: Use
golub-kahanorlobpcgFor ill-conditioned: Use
golub-kahan(more stable)For quick estimates: Use
powermethod