Methods and Algorithms¶

This page provides mathematical details and algorithmic descriptions of the methods implemented in sparse-kappa.

1-Norm Methods¶

Hager-Higham Algorithm¶

Mathematical Background

The 1-norm of a matrix \(A \in \mathbb{R}^{n \times n}\) is defined as:

\[\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|\]

The 1-norm condition number is:

\[\kappa_1(A) = \|A\|_1 \cdot \|A^{-1}\|_1\]

Algorithm

The Hager-Higham algorithm estimates \(\|A^{-1}\|_1\) via iterative refinement:

Start with initial vector \(x = \frac{1}{n} \mathbf{1}\)
For \(k = 1, 2, \ldots\) until convergence:
1. Solve \(A^T z = \text{sign}(x)\)
2. Find \(j = \arg\max_i |z_i|\)
3. If converged, stop
4. Solve \(A x = e_j\) where \(e_j\) is the \(j\)-th unit vector
5. Compute \(\|x\|_1\)
Return \(\|A^{-1}\|_1 \approx \|x\|_1\)

Complexity: \(O(k \cdot \text{nnz}(A))\) where \(k\) is the number of iterations (typically 3-5)

References:

Hager, W. W. (1984). “Condition estimates.” SIAM J. Sci. Stat. Comput., 5(2), 311-316.
Higham, N. J., & Tisseur, F. (2000). “A block algorithm for matrix 1-norm estimation.” SIAM J. Matrix Anal. Appl., 21(4), 1185-1201.

2-Norm Methods¶

Power Method¶

Mathematical Background

The power method finds the largest singular value \(\sigma_{\max}(A)\) by iterating:

\[v_{k+1} = \frac{A^T A v_k}{\|A^T A v_k\|}\]

For the smallest singular value, use inverse iteration on \(A^T A\).

Algorithm

For \(\sigma_{\max}\):
1. Initialize random \(v_0\), normalize
2. Iterate \(v_{k+1} = \frac{A^T A v_k}{\|A^T A v_k\|}\)
3. Rayleigh quotient: \(\sigma^2 = v_k^T (A^T A) v_k\)
For \(\sigma_{\min}\):

Use shifted inverse iteration
Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)

Complexity: \(O(k \cdot \text{nnz}(A))\) per iteration

Convergence: Linear, rate depends on ratio \(\frac{\sigma_2}{\sigma_1}\)

Lanczos Method¶

Mathematical Background

The Lanczos algorithm tridiagonalizes a symmetric matrix \(B = A^T A\):

\[Q^T B Q = T\]

where \(T\) is tridiagonal and \(Q\) is orthogonal.

Eigenvalues of \(T\) approximate eigenvalues of \(B\), and \(\sigma(A) = \sqrt{\lambda(A^T A)}\).

Algorithm

Initialize \(q_1\) random, \(\beta_0 = 0\), \(q_0 = 0\)
For \(j = 1, \ldots, m\):
1. \(v = A^T (A q_j)\)
2. \(\alpha_j = q_j^T v\)
3. \(v = v - \alpha_j q_j - \beta_{j-1} q_{j-1}\)
4. \(\beta_j = \|v\|\)
5. \(q_{j+1} = v / \beta_j\)
Form tridiagonal matrix \(T\) with diagonals \(\alpha\) and off-diagonals \(\beta\)
Compute eigenvalues of \(T\)
Return \(\kappa_2(A) = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}\)

Complexity: \(O(m^2 \cdot \text{nnz}(A))\) for \(m\) iterations

Advantages:

Fast convergence for well-separated eigenvalues
Numerically stable with reorthogonalization

References:

Lanczos, C. (1950). “An iteration method for the solution of the eigenvalue problem.”
Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Chapter 10.

Arnoldi Method¶

Mathematical Background

The Arnoldi iteration is a generalization of Lanczos for non-symmetric matrices. It builds an orthonormal basis for the Krylov subspace:

\[\mathcal{K}_m(A^T A, v) = \text{span}\{v, A^T A v, (A^T A)^2 v, \ldots, (A^T A)^{m-1} v\}\]

Algorithm

Initialize \(q_1 = v / \|v\|\)
For \(j = 1, \ldots, m\):
1. \(v = A^T (A q_j)\)
2. For \(i = 1, \ldots, j\):
  - \(h_{ij} = q_i^T v\)
  - \(v = v - h_{ij} q_i\)
3. \(h_{j+1,j} = \|v\|\)
4. \(q_{j+1} = v / h_{j+1,j}\)
Compute eigenvalues of Hessenberg matrix \(H = [h_{ij}]\)
Return condition number from extreme eigenvalues

Complexity: \(O(m^2 n + m \cdot \text{nnz}(A))\) for \(m\) iterations

References:

Arnoldi, W. E. (1951). “The principle of minimized iterations.”
Saad, Y. (2011). “Numerical Methods for Large Eigenvalue Problems” (2nd ed.).

Golub-Kahan Bidiagonalization¶

Mathematical Background

Instead of forming \(A^T A\), the Golub-Kahan algorithm directly bidiagonalizes \(A\):

\[U^T A V = B\]

where \(B\) is bidiagonal and \(U, V\) are orthogonal.

Singular values of \(B\) equal singular values of \(A\) (numerically more stable).

Algorithm

Initialize \(u_1 = 0\), \(\beta_1 v_1 = b\) (random \(b\))
For \(j = 1, \ldots, m\):
1. \(u_j = A v_j - \beta_j u_{j-1}\)
2. \(\alpha_j = \|u_j\|\), \(u_j = u_j / \alpha_j\)
3. \(v_{j+1} = A^T u_j - \alpha_j v_j\)
4. \(\beta_{j+1} = \|v_{j+1}\|\), \(v_{j+1} = v_{j+1} / \beta_{j+1}\)
Form bidiagonal matrix \(B\) with diagonals \(\alpha\) and superdiagonals \(\beta\)
Compute singular values of \(B\) (via QR iteration)
Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)

Complexity: \(O(m \cdot \text{nnz}(A))\) for \(m\) iterations

Advantages:

More numerically stable than forming \(A^T A\)
Direct computation of singular values
Better for ill-conditioned matrices

References:

Golub, G. H., & Kahan, W. (1965). “Calculating the singular values and pseudo-inverse of a matrix.”
Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Section 8.6.

PyTorch Wrapper Methods¶

SVDS¶

Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.svds, which implements:

Implicitly restarted Lanczos method
Computes \(k\) largest or smallest singular values
Most accurate method for moderate-sized problems

Usage: Best for matrices with \(n < 10{,}000\) when high accuracy is required.

EIGSH¶

Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.eigsh for symmetric matrices:

Thick-restart Lanczos method (ARPACK-style)
Computes \(k\) largest/smallest eigenvalues
Optimized for Hermitian/symmetric matrices

Usage: Symmetric or Hermitian matrices, faster than general methods.

LOBPCG¶

Locally Optimal Block Preconditioned Conjugate Gradient:

Block algorithm: computes multiple eigenvalues simultaneously
Can incorporate preconditioners
Efficient for very large matrices

Algorithm:

Start with block of random vectors \(X\)
Iterate:
1. Compute \(W = A X\)
2. Rayleigh-Ritz procedure on subspace
3. Update \(X\) with refined eigenvectors

Usage: Large matrices (\(n > 50{,}000\)), especially with good preconditioners.

References:

Knyazev, A. V. (2001). “Toward the optimal preconditioned eigensolver.”

Algorithm Comparison¶

Method	Complexity	Memory	Best For	Notes
Hager-Higham	\(O(k \cdot \text{nnz})\)	Low	1-norm estimation	Fast, 3-5 iterations
Power	\(O(k \cdot \text{nnz})\)	Low	Quick estimates	Simple, robust
Lanczos	\(O(m^2 \cdot \text{nnz})\)	Medium	General purpose	Good balance
Arnoldi	\(O(m^2 n)\)	High	Non-symmetric	More memory
Golub-Kahan	\(O(m \cdot \text{nnz})\)	Low	Ill-conditioned	Most stable
SVDS	\(O(k \cdot \text{nnz})\)	Medium	High accuracy	Most accurate
EIGSH	\(O(k \cdot \text{nnz})\)	Medium	Symmetric matrices	Optimized
LOBPCG	\(O(k \cdot \text{nnz})\)	Medium	Very large	Preconditionable

Convergence Theory¶

For power iteration methods, convergence rate depends on the eigenvalue gap:

\[\|v_k - v_*\| \leq C \left(\frac{|\lambda_2|}{|\lambda_1|}\right)^k\]

For Krylov methods (Lanczos, Arnoldi), convergence is faster and depends on polynomial approximation properties.

Numerical Stability¶

Loss of Orthogonality

In Lanczos/Arnoldi, vectors can lose orthogonality due to rounding errors. Solutions:

Full reorthogonalization (expensive)
Selective reorthogonalization (adaptive)
Partial reorthogonalization (sparse-kappa default)

Condition Number Effects

For \(\kappa(A) \approx 10^d\), expect \(d\) digits of accuracy loss in IEEE double precision.

Practical Recommendations¶

Start with ``auto`` method selection
For symmetric matrices: Use eigsh
For high accuracy: Use svds (if \(n < 10{,}000\))
For large sparse: Use golub-kahan or lobpcg
For ill-conditioned: Use golub-kahan (more stable)
For quick estimates: Use power method