Motivation
When you think about random walks, what shape do you think about?
Is it like this? Or this?

[Figure: example random walk trajectories in two and three dimensions.]
These are good examples of random walks in two or three dimensions. But what about random walks in higher dimensions?
Formulation
We first consider the simplest case: a random walk in $d$-dimensional space $\mathbb{R}^d$, i.e. continuous space, discrete time. At each time step, the walker takes a step $\eta_i$, sampled i.i.d. from a zero-mean, finite-covariance distribution $P$, with $x_i, \eta_i \in \mathbb{R}^d$:
$$x_i = x_{i-1} + \eta_i, \qquad \eta_i \sim P$$

Now consider the trajectory $x_1, x_2, \dots, x_T$ formed by the random walk over $T$ steps. We can compute the PCA of this trajectory from its covariance matrix $C_{i,j} = \mathrm{cov}(x_i, x_j)$.
We represent $x_i, \eta_i$ as row vectors and form the $T \times d$ data matrix $X = [x_1; x_2; x_3; \dots; x_T]$. Further, we set $x_0 = [0, 0, \dots, 0]$, so that $x_1 = \eta_1$. Similarly, we form the $T \times d$ matrix of steps $H = [\eta_1; \eta_2; \eta_3; \dots; \eta_T]$.
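As a concrete illustration, here is a minimal numpy sketch of this setup (the Gaussian choice for $P$ and the values of $T$ and $d$ are just for illustration):

```python
import numpy as np

T, d = 1000, 500  # number of steps, ambient dimension (arbitrary choices)
rng = np.random.default_rng(0)

# steps eta_i drawn i.i.d. from P; here an isotropic Gaussian, rescaled
# so that tr[Sigma(P)] = 1 (the normalization we adopt below)
H = rng.normal(size=(T, d)) / np.sqrt(d)

# positions x_i = x_{i-1} + eta_i with x_0 = 0, stacked as rows of X
X = np.cumsum(H, axis=0)
```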
Reduce Data Matrix to Steps
How do we compute this covariance matrix? The paper [1] noticed that it is easier to tackle $H$ than $X$. First, note the relationship between the data matrix $X$ and the step matrix $H$, where $S$ is the $T \times T$ matrix of "differentiation" between rows:
$$S X = H, \qquad S = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$$

Then the inverse, $S^{-1}$, is the $T \times T$ matrix of "accumulating" rows, i.e. the lower-triangular matrix filled with 1:
$$X = S^{-1} H, \qquad S^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 1 & 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$$

Next, after centering (mean-subtracting) each column of $X$, the covariance matrix is the expected inner product of the centered data matrix $\hat{X}$:
$$\hat{X} = C X, \quad \text{with } C = I - \frac{1}{T}\mathbf{1}\mathbf{1}^T$$

$$\Sigma[x] = \mathbb{E}[\hat{X}\hat{X}^T] = \mathbb{E}[C S^{-1} H H^T S^{-T} C]$$

Note that $C$ and $S$ are both constant $T \times T$ matrices, and $C$ is symmetric; the only random variable is $H$. Thus we only need to compute the expectation of $H H^T$.
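Continuing the sketch above, the constant matrices $S$, $S^{-1}$, and $C$ and the identities just derived can be checked directly in numpy:

```python
# "differentiation" matrix S and its inverse, the "accumulation" matrix
S = np.eye(T) - np.eye(T, k=-1)      # 1 on the diagonal, -1 below it
S_inv = np.tril(np.ones((T, T)))     # lower triangle filled with ones

# centering projector C = I - (1/T) 1 1^T
C = np.eye(T) - np.ones((T, T)) / T

assert np.allclose(S @ X, H)                   # S X = H
assert np.allclose(S_inv @ H, X)               # X = S^{-1} H
assert np.allclose(C @ X, X - X.mean(axis=0))  # C X centers the columns
```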
Covariance of Steps H
Since we assumed the steps $\eta_i$ are drawn i.i.d. from the same distribution $P$, it is simple to compute the elements of $H H^T$ using the linearity of the inner product and independence. (Here $\eta_i^{(k)}$ denotes the $k$-th component of the row vector $\eta_i$.) The off-diagonal entries vanish in expectation, since we assume the distribution $P$ has zero mean:
$$\mathbb{E}[\eta_i \eta_j^T] = \mathbb{E}\left[\sum_k^d \eta_i^{(k)} \eta_j^{(k)}\right] = \sum_k^d \mathbb{E}[\eta_i^{(k)}]\,\mathbb{E}[\eta_j^{(k)}] = 0, \quad i \neq j$$

Similarly, we can evaluate the diagonal values. Since the distribution $P$ has finite covariance, each diagonal value equals the trace of its covariance matrix:
$$\mathbb{E}[\eta_i \eta_i^T] = \mathbb{E}\left[\sum_k \eta_i^{(k)2}\right] = \sum_k^d \mathbb{E}[\eta_i^{(k)2}] = \sum_k^d \mathrm{Var}[\eta^{(k)}] = \mathrm{tr}[\Sigma(P)]$$

Without loss of generality, we can assume $\mathrm{tr}[\Sigma(P)] = 1$ by simply rescaling $x$. Thus we have
$$\mathbb{E}[H H^T] = \mathrm{tr}[\Sigma(P)]\, I = I$$

What does this tell us? Geometrically, it means that when we sample from a zero-mean, finite-variance distribution, the vectors are expected to be orthogonal to each other, with the expected squared norm of each vector equal to the trace of the covariance matrix. This is true for every dimension $d$. The magic, however, is that as $d$ gets large, the empirical covariance $H H^T$ converges arbitrarily close to its expected value $I$. Thus, in high dimensions, randomly sampled step vectors are orthogonal to each other with high probability:
$$H H^T \to I, \quad \text{as } d \to \infty$$

With this approximation, the covariance matrix becomes a constant $T \times T$ matrix:
$$\Sigma[x] = C S^{-1} S^{-T} C$$

From this we can already see that the covariance structure of a high-dimensional random walk is universal: it is independent of the distribution $P$ from which the steps are sampled [2], and even independent of the dimension $d$. All random walks of $T$ steps in high dimension have the same covariance structure up to scaling, and thus they all share the same PCA structure.
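We can watch this concentration happen numerically; a quick sketch (again with Gaussian steps, and arbitrary sizes):

```python
# deviation of the empirical Gram matrix H H^T from its expectation I
T = 100
for d in (10, 100, 1000, 10000):
    H = rng.normal(size=(T, d)) / np.sqrt(d)
    err = np.abs(H @ H.T - np.eye(T)).max()
    print(f"d = {d:5d}   max |HH^T - I| = {err:.3f}")
# the deviation shrinks roughly like 1/sqrt(d):
# sampled steps become nearly orthonormal in high dimension
```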
We notice that

$$S^{-1} S^{-T} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & 2 & \cdots & 2 & 2 \\ 1 & 2 & 3 & \cdots & 3 & 3 \\ \vdots & & & \ddots & & \vdots \\ 1 & 2 & 3 & \cdots & T-1 & T-1 \\ 1 & 2 & 3 & \cdots & T-1 & T \end{bmatrix}$$

i.e. $(S^{-1} S^{-T})_{ij} = \min(i, j)$. Then the total variance, i.e. the trace of the covariance matrix, is (using $C^2 = C$, since $C$ is a projection, together with the cyclic property of the trace)

$$\mathrm{tr}[\Sigma[x]] = \mathrm{tr}[C S^{-1} S^{-T} C] = \mathrm{tr}[C C S^{-1} S^{-T}] = \mathrm{tr}[C S^{-1} S^{-T}]$$

$$= \mathrm{tr}\left[\left(I - \frac{1}{T}\mathbf{1}\mathbf{1}^T\right) S^{-1} S^{-T}\right] = \mathrm{tr}[S^{-1} S^{-T}] - \frac{1}{T}\,\mathbf{1}^T S^{-1} S^{-T} \mathbf{1}$$

The second term is the sum of all entries of the $T \times T$ matrix $S^{-1} S^{-T}$. This matrix can be viewed as the sum of $T$ matrices, the $k$-th of which contains a $k \times k$ submatrix full of 1 (in the lower-right corner) and zeros elsewhere, so

$$\mathbf{1}^T S^{-1} S^{-T} \mathbf{1} = \sum_{k=1}^{T} k^2 = \frac{T(T+1)(2T+1)}{6}$$

Thus the trace of the covariance matrix reads

$$\mathrm{tr}[\Sigma[x]] = \mathrm{tr}[S^{-1} S^{-T}] - \frac{1}{T}\,\mathbf{1}^T S^{-1} S^{-T} \mathbf{1} = \sum_{k=1}^{T} k - \frac{T(T+1)(2T+1)}{6T} = \frac{T(T+1)}{2} - \frac{(T+1)(2T+1)}{6} = \frac{(T+1)(T-1)}{6} = \frac{T^2 - 1}{6}$$

This is the sum of the variances across all dimensions of the random walk (and should be the correct denominator in Eq. 12 of [1]).
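A quick numerical check of this trace formula (continuing in numpy):

```python
# total variance: tr[C S^{-1} S^{-T} C] should equal (T^2 - 1) / 6
T = 50
S_inv = np.tril(np.ones((T, T)))
C = np.eye(T) - np.ones((T, T)) / T
Sigma = C @ S_inv @ S_inv.T @ C
print(np.trace(Sigma), (T**2 - 1) / 6)   # both give 416.5 for T = 50
```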
Eigen Structures
Finally, let us analyze the eigenstructure of $C S^{-1} S^{-T} C$. Note that $S^{-1} S^{-T} = (S^T S)^{-1}$, so we can study $S^T S$ instead. Since $S^T S$ is a real symmetric matrix, its eigenvalues and eigenvectors are real, and $(S^T S)^{-1}$ shares the same eigenvectors with inverted eigenvalues.
By calculation, or by intuition ($S$ acts as a row-differentiation operator), $S^T S$ manifests itself as a central-difference (2nd-order differentiation) matrix on a grid of $T$ points, except that the lower-right corner entry is 1 instead of 2:
$$S^T S = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1 \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$$

This matrix differs from a circulant matrix by only three elements ($-1$ at the upper-right and lower-left corners, and $1$ instead of $2$ at the bottom-right), and it is provable that its eigenstructure converges to that of the circulant matrix [3] as $T \to \infty$. We denote the exact circulant matrix as $Q$, and the tridiagonal matrix without the upper-right and lower-left $-1$ as $P$; notice the single-element difference between $P$ and $S^T S$:
$$Q = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & -1 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ -1 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix} \qquad P = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}$$

Digression: Relation to diffusion operator
From physics to engineering, these matrices appear everywhere. What do we know about their eigenstructure? As we know from the theory of 2nd-order partial differential equations (e.g. the diffusion/heat equation and the wave equation below), the 2nd-order derivative, or Laplacian operator, has a distinctive set of eigenfunctions: the Fourier modes. As a side note, analyzing heat conduction was the exact reason Fourier discovered the Fourier decomposition. These eigenfunctions of the Laplacian are the reason we see sinusoidal waves on strings and on the surface of water!
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} = \Delta u \qquad\qquad \frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} = \Delta u$$

Recall from quantum mechanics and the wave equation: an important lesson we learn from PDEs is that the specific Fourier modes entering the solution depend on the boundary condition, e.g. $u(t, 0) = 0$, $\partial_x u = 0$, or $\partial_t u = 0$. For example, when one end of a rope is fixed while you perturb it, only sine modes can appear in the solution. This subtle point is important, as we shall see below, since the elements by which $S^T S$ differs from the circulant matrix $Q$ are tightly related to the boundary condition. The exact circulant matrix $Q$ is the 2nd-order differentiation operator with a circular (periodic) boundary condition.
Eigenstructure of Circulant Q
Back to our matrices: let us translate this intuition from continuous space to discrete space. We expect the eigenvectors of $S^T S$ to also be cosine and sine waves. Indeed, it is known that the eigenvectors of circulant matrices are the Fourier modes. For a length-$T$ sequence, the normalized complex eigenvectors and eigenvalues are as follows (numbering starts from 0):
$$v_k = \frac{1}{\sqrt{T}}\left[1, \omega^k, \omega^{2k}, \omega^{3k}, \dots, \omega^{(T-1)k}\right], \quad \text{with } \omega = \exp\left(\frac{2\pi i}{T}\right)$$

$$\lambda_k = c_0 + c_{T-1}\omega^k + c_{T-2}\omega^{2k} + \dots + c_1\omega^{(T-1)k}, \quad k = 0, \dots, T-1$$

In our case, there are only three non-zero elements in each column: $c_0 = 2$, $c_1 = c_{T-1} = -1$; the other terms are 0. Thus it is easy to obtain the real eigenvalues:
$$\lambda_k(Q) = 2 - \exp\left(\frac{2\pi i k}{T}\right) - \exp\left(\frac{2\pi i (T-1) k}{T}\right) = 2 - 2\cos\left(\frac{2\pi k}{T}\right), \quad k = 0, 1, \dots, T-1$$

We note that $\lambda_k = \lambda_{T-k}$ for $k \neq 0$, so each such eigenvalue has a 2-dimensional eigenspace. This is intuitive: since the matrix $Q$ is the Laplacian on a circular domain, we expect the eigenmodes to be rotationally symmetric; thus each eigenvalue has two eigenvectors corresponding to sine and cosine, and weighted combinations of these two eigenvectors yield eigenfunctions of any phase.
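As a sanity check, we can compare these analytical eigenvalues of $Q$ against a direct numerical eigendecomposition (a sketch using scipy's circulant constructor):

```python
import numpy as np
from scipy.linalg import circulant

T = 8
# first column of Q: c_0 = 2, c_1 = c_{T-1} = -1, zeros elsewhere
Q = circulant([2, -1] + [0] * (T - 3) + [-1])
k = np.arange(T)
analytic = 2 - 2 * np.cos(2 * np.pi * k / T)
assert np.allclose(np.sort(np.linalg.eigvalsh(Q)), np.sort(analytic))
```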
Eigenstructure of Tridiagonal P
Next, let us consider the eigenstructure of the tridiagonal matrix $P$. Its eigenvalues are less straightforward to calculate; we would need to solve the characteristic polynomial directly.
However, we are still saved: mathematicians have derived a closed-form solution for the eigenvalues of a tridiagonal Toeplitz matrix [4]. For a derivation of this solution, see my note!
$$\lambda_k(P) = 2 - 2\cos\frac{k\pi}{T+1}, \quad k = 1, \dots, T$$

Notice the difference from the eigenvalues of $Q$ above? There are no more repeated eigenvalues, and $\lambda = 0$ is no longer an eigenvalue, so this matrix is invertible.
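The same kind of check works for $P$ (a numpy sketch):

```python
import numpy as np

T = 8
# tridiagonal Toeplitz matrix: 2 on the diagonal, -1 on both off-diagonals
P = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
k = np.arange(1, T + 1)
analytic = 2 - 2 * np.cos(k * np.pi / (T + 1))
assert np.allclose(np.linalg.eigvalsh(P), np.sort(analytic))
```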
Eigenstructure of $S^T S$ and $C S^{-1} S^{-T} C$
Per the paper [1] and our numerical simulations, the eigenvalues of the matrix $C S^{-1} S^{-T} C$ also admit an analytical solution at finite $T$:
$$\lambda_k(C S^{-1} S^{-T} C) = \left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}, \quad k = 1, \dots, T-1$$

$$\lambda_T(C S^{-1} S^{-T} C) = 0$$

Relating Back to PCA
So, we have talked enough about eigenvalues; what does all this have to do with the PCA of a random walk?
Just to recap: performing PCA on a high-dimensional random walk amounts to performing an eigendecomposition of the covariance matrix $\Sigma[x] = C S^{-1} S^{-T} C$. The eigenvectors give the projection coefficients onto each PC, and the eigenvalues correspond to the variance explained by each PC.
From the section above, we know the eigenvalues of the matrix $C S^{-1} S^{-T} C$ are
$$\lambda_k = \left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}, \quad k = 1, \dots, T-1; \qquad \lambda_T = 0$$

Thus the explained variance ratio of each PC is the following, which corrects the erroneous version in Eq. 12 of the paper [1]:
$$\rho(k) = \frac{\lambda_k}{\mathrm{tr}[\Sigma[x]]} = \frac{\left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}}{\frac{1}{6}(T^2 - 1)} = \frac{\left(1 - \cos\frac{k\pi}{T}\right)^{-1}}{\frac{1}{3}(T^2 - 1)}$$
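To close the loop, here is an end-to-end sketch: simulate a high-dimensional random walk, run PCA via the SVD of the centered data matrix, and compare the empirical explained-variance ratios against the analytical $\rho(k)$ (all sizes here are arbitrary, with Gaussian steps standing in for $P$):

```python
import numpy as np

T, d = 200, 20000                       # long walk, very high dimension
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d)) / np.sqrt(d)
X = np.cumsum(H, axis=0)                # the random walk trajectory
X_hat = X - X.mean(axis=0)              # center each column

# empirical explained-variance ratios from the singular values
sv = np.linalg.svd(X_hat, compute_uv=False)
rho_emp = sv**2 / np.sum(sv**2)

# analytical prediction rho(k), k = 1 .. T-1
k = np.arange(1, T)
rho_theory = (2 - 2 * np.cos(k * np.pi / T))**-1 / ((T**2 - 1) / 6)

print("empirical:", np.round(rho_emp[:5], 4))
print("theory   :", np.round(rho_theory[:5], 4))
# for large d the two agree closely; both decay roughly as 6 / (pi^2 k^2)
```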
References

1. Antognini, J., & Sohl-Dickstein, J. (2018). PCA of high dimensional random walks with comparison to neural network training. *Advances in Neural Information Processing Systems*, 31.
2. Note that this is the $T \times T$ covariance matrix; its structure is independent of $P$, and thus the projected trajectories are independent of $P$. However, this does not mean the PC axes, i.e. the right singular vectors of $\hat{X}$ in $d$-dimensional space, are independent of $P$.
3. Physical intuition: $S^T S$ represents the 2nd-derivative operator of a length-$T$ signal with zero-padding or a constant boundary condition, while the circulant matrix represents the 2nd-derivative operator of a cyclic signal of length $T$. This difference in boundary condition vanishes for long enough $T$.
4. Borowska, J., & Łacińska, L. (2015). Eigenvalues of 2-tridiagonal Toeplitz matrix. *Journal of Applied Mathematics and Computational Mechanics*, 14(4). See Eq. 17.