Methods and Algorithms

This page provides mathematical details and algorithmic descriptions of the methods implemented in sparse-kappa.

1-Norm Methods

Hager-Higham Algorithm

Mathematical Background

The 1-norm of a matrix \(A \in \mathbb{R}^{n \times n}\) is defined as:

\[\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|\]

The 1-norm condition number is:

\[\kappa_1(A) = \|A\|_1 \cdot \|A^{-1}\|_1\]

Algorithm

The Hager-Higham algorithm estimates \(\|A^{-1}\|_1\) via iterative refinement:

  1. Start with initial vector \(x = \frac{1}{n} \mathbf{1}\)

  2. For \(k = 1, 2, \ldots\) until convergence:

    1. Solve \(A^T z = \text{sign}(x)\)

    2. Find \(j = \arg\max_i |z_i|\)

    3. If converged, stop

    4. Solve \(A x = e_j\) where \(e_j\) is the \(j\)-th unit vector

    5. Compute \(\|x\|_1\)

  3. Return \(\|A^{-1}\|_1 \approx \|x\|_1\)

Complexity: \(O(k \cdot \text{nnz}(A))\) where \(k\) is the number of iterations (typically 3-5)

References:

  • Hager, W. W. (1984). “Condition estimates.” SIAM J. Sci. Stat. Comput., 5(2), 311-316.

  • Higham, N. J., & Tisseur, F. (2000). “A block algorithm for matrix 1-norm estimation.” SIAM J. Matrix Anal. Appl., 21(4), 1185-1201.

2-Norm Methods

Power Method

Mathematical Background

The power method finds the largest singular value \(\sigma_{\max}(A)\) by iterating:

\[v_{k+1} = \frac{A^T A v_k}{\|A^T A v_k\|}\]

For the smallest singular value, use inverse iteration on \(A^T A\).

Algorithm

  1. For \(\sigma_{\max}\):

    1. Initialize random \(v_0\), normalize

    2. Iterate \(v_{k+1} = \frac{A^T A v_k}{\|A^T A v_k\|}\)

    3. Rayleigh quotient: \(\sigma^2 = v_k^T (A^T A) v_k\)

  2. For \(\sigma_{\min}\):

    Use shifted inverse iteration

  3. Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)

Complexity: \(O(k \cdot \text{nnz}(A))\) per iteration

Convergence: Linear, rate depends on ratio \(\frac{\sigma_2}{\sigma_1}\)

Lanczos Method

Mathematical Background

The Lanczos algorithm tridiagonalizes a symmetric matrix \(B = A^T A\):

\[Q^T B Q = T\]

where \(T\) is tridiagonal and \(Q\) is orthogonal.

Eigenvalues of \(T\) approximate eigenvalues of \(B\), and \(\sigma(A) = \sqrt{\lambda(A^T A)}\).

Algorithm

  1. Initialize \(q_1\) random, \(\beta_0 = 0\), \(q_0 = 0\)

  2. For \(j = 1, \ldots, m\):

    1. \(v = A^T (A q_j)\)

    2. \(\alpha_j = q_j^T v\)

    3. \(v = v - \alpha_j q_j - \beta_{j-1} q_{j-1}\)

    4. \(\beta_j = \|v\|\)

    5. \(q_{j+1} = v / \beta_j\)

  3. Form tridiagonal matrix \(T\) with diagonals \(\alpha\) and off-diagonals \(\beta\)

  4. Compute eigenvalues of \(T\)

  5. Return \(\kappa_2(A) = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}\)

Complexity: \(O(m^2 \cdot \text{nnz}(A))\) for \(m\) iterations

Advantages:

  • Fast convergence for well-separated eigenvalues

  • Numerically stable with reorthogonalization

References:

  • Lanczos, C. (1950). “An iteration method for the solution of the eigenvalue problem.”

  • Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Chapter 10.

Arnoldi Method

Mathematical Background

The Arnoldi iteration is a generalization of Lanczos for non-symmetric matrices. It builds an orthonormal basis for the Krylov subspace:

\[\mathcal{K}_m(A^T A, v) = \text{span}\{v, A^T A v, (A^T A)^2 v, \ldots, (A^T A)^{m-1} v\}\]

Algorithm

  1. Initialize \(q_1 = v / \|v\|\)

  2. For \(j = 1, \ldots, m\):

    1. \(v = A^T (A q_j)\)

    2. For \(i = 1, \ldots, j\):

      • \(h_{ij} = q_i^T v\)

      • \(v = v - h_{ij} q_i\)

    3. \(h_{j+1,j} = \|v\|\)

    4. \(q_{j+1} = v / h_{j+1,j}\)

  3. Compute eigenvalues of Hessenberg matrix \(H = [h_{ij}]\)

  4. Return condition number from extreme eigenvalues

Complexity: \(O(m^2 n + m \cdot \text{nnz}(A))\) for \(m\) iterations

References:

  • Arnoldi, W. E. (1951). “The principle of minimized iterations.”

  • Saad, Y. (2011). “Numerical Methods for Large Eigenvalue Problems” (2nd ed.).

Golub-Kahan Bidiagonalization

Mathematical Background

Instead of forming \(A^T A\), the Golub-Kahan algorithm directly bidiagonalizes \(A\):

\[U^T A V = B\]

where \(B\) is bidiagonal and \(U, V\) are orthogonal.

Singular values of \(B\) equal singular values of \(A\) (numerically more stable).

Algorithm

  1. Initialize \(u_1 = 0\), \(\beta_1 v_1 = b\) (random \(b\))

  2. For \(j = 1, \ldots, m\):

    1. \(u_j = A v_j - \beta_j u_{j-1}\)

    2. \(\alpha_j = \|u_j\|\), \(u_j = u_j / \alpha_j\)

    3. \(v_{j+1} = A^T u_j - \alpha_j v_j\)

    4. \(\beta_{j+1} = \|v_{j+1}\|\), \(v_{j+1} = v_{j+1} / \beta_{j+1}\)

  3. Form bidiagonal matrix \(B\) with diagonals \(\alpha\) and superdiagonals \(\beta\)

  4. Compute singular values of \(B\) (via QR iteration)

  5. Return \(\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}\)

Complexity: \(O(m \cdot \text{nnz}(A))\) for \(m\) iterations

Advantages:

  • More numerically stable than forming \(A^T A\)

  • Direct computation of singular values

  • Better for ill-conditioned matrices

References:

  • Golub, G. H., & Kahan, W. (1965). “Calculating the singular values and pseudo-inverse of a matrix.”

  • Golub, G. H., & Van Loan, C. F. (2013). “Matrix Computations” (4th ed.), Section 8.6.

PyTorch Wrapper Methods

SVDS

Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.svds, which implements:

  • Implicitly restarted Lanczos method

  • Computes \(k\) largest or smallest singular values

  • Most accurate method for moderate-sized problems

Usage: Best for matrices with \(n < 10{,}000\) when high accuracy is required.

EIGSH

Wraps PyTorch’s sparse_kappa.backend.sparse.linalg.eigsh for symmetric matrices:

  • Thick-restart Lanczos method (ARPACK-style)

  • Computes \(k\) largest/smallest eigenvalues

  • Optimized for Hermitian/symmetric matrices

Usage: Symmetric or Hermitian matrices, faster than general methods.

LOBPCG

Locally Optimal Block Preconditioned Conjugate Gradient:

  • Block algorithm: computes multiple eigenvalues simultaneously

  • Can incorporate preconditioners

  • Efficient for very large matrices

Algorithm:

  1. Start with block of random vectors \(X\)

  2. Iterate:

    1. Compute \(W = A X\)

    2. Rayleigh-Ritz procedure on subspace

    3. Update \(X\) with refined eigenvectors

Usage: Large matrices (\(n > 50{,}000\)), especially with good preconditioners.

References:

  • Knyazev, A. V. (2001). “Toward the optimal preconditioned eigensolver.”

Algorithm Comparison

Method

Complexity

Memory

Best For

Notes

Hager-Higham

\(O(k \cdot \text{nnz})\)

Low

1-norm estimation

Fast, 3-5 iterations

Power

\(O(k \cdot \text{nnz})\)

Low

Quick estimates

Simple, robust

Lanczos

\(O(m^2 \cdot \text{nnz})\)

Medium

General purpose

Good balance

Arnoldi

\(O(m^2 n)\)

High

Non-symmetric

More memory

Golub-Kahan

\(O(m \cdot \text{nnz})\)

Low

Ill-conditioned

Most stable

SVDS

\(O(k \cdot \text{nnz})\)

Medium

High accuracy

Most accurate

EIGSH

\(O(k \cdot \text{nnz})\)

Medium

Symmetric matrices

Optimized

LOBPCG

\(O(k \cdot \text{nnz})\)

Medium

Very large

Preconditionable

Convergence Theory

For power iteration methods, convergence rate depends on the eigenvalue gap:

\[\|v_k - v_*\| \leq C \left(\frac{|\lambda_2|}{|\lambda_1|}\right)^k\]

For Krylov methods (Lanczos, Arnoldi), convergence is faster and depends on polynomial approximation properties.

Numerical Stability

Loss of Orthogonality

In Lanczos/Arnoldi, vectors can lose orthogonality due to rounding errors. Solutions:

  1. Full reorthogonalization (expensive)

  2. Selective reorthogonalization (adaptive)

  3. Partial reorthogonalization (sparse-kappa default)

Condition Number Effects

For \(\kappa(A) \approx 10^d\), expect \(d\) digits of accuracy loss in IEEE double precision.

Practical Recommendations

  1. Start with ``auto`` method selection

  2. For symmetric matrices: Use eigsh

  3. For high accuracy: Use svds (if \(n < 10{,}000\))

  4. For large sparse: Use golub-kahan or lobpcg

  5. For ill-conditioned: Use golub-kahan (more stable)

  6. For quick estimates: Use power method