Motivation

When you think about random walks, what shape do you think about?

Is it like this? Or this?

These are good examples of random walks in two or three dimensions. But what about random walks in higher dimensions?

Formulation

We first consider the simplest case: a random walk in a $d$ dimensional space $\mathbb{R}^d$, i.e. continuous space, discrete time. At each time step, the walker takes a step $\eta_i$ sampled i.i.d. from a zero-mean, finite-covariance distribution $P$, with $x_i, \eta_i \in \mathbb{R}^d$:

$$x_i = x_{i-1} + \eta_i, \qquad \eta_i \sim P$$

Now we consider the trajectory $x_1, x_2, \dots, x_T$ formed by the random walk over $T$ steps. We can compute the PCA of this trajectory based on its covariance matrix $C_{i,j} = \mathrm{cov}(x_i, x_j)$.

We represent $x_i, \eta_i$ as row vectors and form the T-by-d data matrix $X = [x_1; x_2; x_3; \dots; x_T]$; further we set $x_0 = [0, 0, \dots, 0]$, s.t. $x_1 = \eta_1$. Similarly we can form a T-by-d matrix of steps $H = [\eta_1; \eta_2; \eta_3; \dots; \eta_T]$.
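As a minimal sketch of this setup (the Gaussian choice of $P$ and the values of $T$ and $d$ are illustrative assumptions, not from the paper), the matrices $X$ and $H$ can be simulated with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 100, 1000                 # number of steps and ambient dimension (illustrative)

# steps eta_i ~ P: here an isotropic Gaussian, scaled so that tr[Sigma(P)] = 1
H = rng.normal(size=(T, d)) / np.sqrt(d)

# positions x_i = x_{i-1} + eta_i with x_0 = 0, i.e. a cumulative sum over time
X = np.cumsum(H, axis=0)         # T-by-d data matrix [x_1; x_2; ...; x_T]
```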

Reduce Data Matrix to Steps

How do we compute this covariance matrix? This paper 1 noticed that it is easier to tackle $H$ than $X$. First, they noted the relationship between the data matrix $X$ and the step matrix $H$, where $S$ is the T-by-T matrix of “differentiation” between rows.

$$S X = H, \qquad S = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & 0 \\ -1 & 1 & 0 & \dots & 0 & 0 \\ 0 & -1 & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & -1 & 1 \end{bmatrix}$$

Then we know the inverse of this, $S^{-1}$, is the T-by-T matrix of “accumulating” rows, i.e. the lower-triangular matrix filled with 1.

$$X = S^{-1} H, \qquad S^{-1} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & 0 \\ 1 & 1 & 0 & \dots & 0 & 0 \\ 1 & 1 & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 1 & 1 & 1 & \dots & 1 & 0 \\ 1 & 1 & 1 & \dots & 1 & 1 \end{bmatrix}$$
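A quick numerical sanity check of these two relations (a sketch assuming numpy; the matrix sizes are arbitrary):

```python
import numpy as np

T = 6
# S: "differentiation" matrix with 1 on the diagonal and -1 on the subdiagonal
S = np.eye(T) - np.eye(T, k=-1)

# its inverse is the lower-triangular matrix of ones ("accumulation" of rows)
S_inv = np.linalg.inv(S)
assert np.allclose(S_inv, np.tril(np.ones((T, T))))

# on a random walk: S X = H and X = S^{-1} H
rng = np.random.default_rng(1)
H = rng.normal(size=(T, 3))
X = np.cumsum(H, axis=0)
assert np.allclose(S @ X, H)
assert np.allclose(S_inv @ H, X)
```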

Next, after centering (mean-subtracting) each column of $X$, the covariance matrix is the expected inner product of the centered data matrix $\hat{X}$.

$$\hat{X} = C X, \quad \text{with } C = I - \frac{1}{T}\mathbf{1}\mathbf{1}^T$$

$$\Sigma[x] = \mathbb{E}[\hat{X}\hat{X}^T] = \mathbb{E}[C S^{-1} H H^T S^{-T} C]$$

Note that $C$ and $S$ are both constant matrices of shape T-by-T, and $C$ is symmetric; the only random variable is $H$. Thus we only need to compute the expectation of $H H^T$.

Covariance of Steps H

Since we assumed the steps $\eta_i$ are drawn i.i.d. from the same distribution $P$, it is simple to compute the entries of $H H^T$ using the linearity of the inner product and independence. ($\eta_i^{(k)}$ denotes the $k$-th component of the row vector $\eta_i$.) The off-diagonal entries vanish in expectation, since we assume the distribution $P$ has zero mean,

$$\mathbb{E}[\eta_i \eta_j^T] = \mathbb{E}\left[\sum_k^d \eta_i^{(k)} \eta_j^{(k)}\right] = \sum_k^d \mathbb{E}\left[\eta_i^{(k)}\right]\mathbb{E}\left[\eta_j^{(k)}\right] = 0, \qquad i \neq j$$

Similarly, we can evaluate the diagonal entries. Since the distribution $P$ has finite covariance, the diagonal entries equal the trace of its covariance matrix.

$$\mathbb{E}[\eta_i \eta_i^T] = \mathbb{E}\left[\sum_k \left(\eta_i^{(k)}\right)^2\right] = \sum_k^d \mathbb{E}\left[\left(\eta_i^{(k)}\right)^2\right] = \sum_k^d \mathrm{Var}\left[\eta^{(k)}\right] = \mathrm{tr}[\Sigma(P)]$$

Without loss of generality, we can assume $\mathrm{tr}[\Sigma(P)] = 1$, by just rescaling $x$. Thus we have

$$\mathbb{E}[H H^T] = \mathrm{tr}[\Sigma(P)]\, I = I$$

What does this tell us? Geometrically it means that when we sample steps from a zero-mean, finite-variance distribution, the vectors are orthogonal to each other in expectation, with the expected squared norm of each vector equal to the trace of the covariance matrix. This holds for any dimension $d$. The magic, however, is that as $d$ gets large, the empirical Gram matrix $H H^T$ converges arbitrarily close to its expected value $I$. Thus, in high dimension, randomly sampled step vectors are orthogonal to each other with high probability.

$$H H^T \approx I, \quad \text{for large } d$$
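This concentration is easy to see numerically; a small sketch (the Gaussian steps and the listed dimensions are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50
for d in (10, 1_000, 100_000):
    # zero-mean steps scaled so that tr[Sigma(P)] = 1
    H = rng.normal(size=(T, d)) / np.sqrt(d)
    err = np.abs(H @ H.T - np.eye(T)).max()
    print(f"d = {d:>6}: max |HH^T - I| = {err:.3f}")
# the deviation from the identity shrinks roughly like 1/sqrt(d)
```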

With this approximation, the covariance matrix becomes a constant T-by-T matrix.

$$\Sigma[x] = C S^{-1} S^{-T} C$$

From this we can already see that the covariance structure of a high dimensional random walk is universal: it is independent of the distribution $P$ the steps are sampled from 2, and even independent of the dimension $d$. All random walks of $T$ steps in high dimension have the same covariance structure up to scaling, so they all share the same PCA structure.
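To illustrate this universality, the sketch below compares the empirical T-by-T covariance of centered trajectories against the analytic $C S^{-1} S^{-T} C$ for two very different step distributions (Gaussian and random-sign steps; both choices and all sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
T, d = 20, 20_000
S_inv = np.tril(np.ones((T, T)))
C = np.eye(T) - np.ones((T, T)) / T            # centering matrix
Sigma_theory = C @ S_inv @ S_inv.T @ C

def empirical_cov(H):
    """T-by-T Gram matrix of the centered trajectory built from steps H."""
    X_hat = C @ np.cumsum(H, axis=0)
    return X_hat @ X_hat.T

# two step distributions, both zero-mean with tr[Sigma(P)] = 1
H_gauss = rng.normal(size=(T, d)) / np.sqrt(d)
H_sign = rng.choice([-1.0, 1.0], size=(T, d)) / np.sqrt(d)

for name, H in [("gaussian", H_gauss), ("sign", H_sign)]:
    err = np.abs(empirical_cov(H) - Sigma_theory).max()
    print(f"{name:>8}: max deviation from theory = {err:.3f}")
# both deviations are small (and shrink as d grows), regardless of P
```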

We noticed that

$$S^{-1} S^{-T} = \begin{bmatrix} 1 & 1 & 1 & \dots & 1 & 1 \\ 1 & 2 & 2 & \dots & 2 & 2 \\ 1 & 2 & 3 & \dots & 3 & 3 \\ \vdots & & & \ddots & & \vdots \\ 1 & 2 & 3 & \dots & T-1 & T-1 \\ 1 & 2 & 3 & \dots & T-1 & T \end{bmatrix}$$

Then the total variance, i.e. the trace of the covariance matrix, is

$$\mathrm{tr}[\Sigma[x]] = \mathrm{tr}[C S^{-1} S^{-T} C] = \mathrm{tr}[C C S^{-1} S^{-T}] = \mathrm{tr}[C S^{-1} S^{-T}]$$

using the cyclic property of the trace and the idempotence of the centering matrix ($C^2 = C$). Expanding $C$,

$$\mathrm{tr}[C S^{-1} S^{-T}] = \mathrm{tr}\left[\left(I - \frac{1}{T}\mathbf{1}\mathbf{1}^T\right) S^{-1} S^{-T}\right] = \mathrm{tr}[S^{-1} S^{-T}] - \frac{1}{T}\mathrm{tr}[\mathbf{1}^T S^{-1} S^{-T} \mathbf{1}]$$

The second term is the summation of all entries in the T-by-T matrix $S^{-1} S^{-T}$. This matrix can be viewed as the sum of $T$ matrices, each containing a $k$-by-$k$ block filled with 1 ($k = 1, \dots, T$) and zeros elsewhere, so

$$\mathrm{tr}[\mathbf{1}^T S^{-1} S^{-T} \mathbf{1}] = \sum_{k=1}^{T} k^2 = \frac{T(T+1)(2T+1)}{6}$$

Thus the trace of the covariance matrix reads

$$\mathrm{tr}[\Sigma[x]] = \mathrm{tr}[S^{-1} S^{-T}] - \frac{1}{T}\mathrm{tr}[\mathbf{1}^T S^{-1} S^{-T} \mathbf{1}] = \sum_{i=1}^{T} i - \frac{T(T+1)(2T+1)}{6T} = \frac{T(T+1)}{2} - \frac{(T+1)(2T+1)}{6} = \frac{(T+1)(T-1)}{6} = \frac{T^2 - 1}{6}$$

This is the sum of the variance over all dimensions of the random walk (and should be the correct denominator in Eq. 12 of 1).
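A quick numerical check of this trace formula (a sketch assuming numpy):

```python
import numpy as np

for T in (5, 20, 100):
    S_inv = np.tril(np.ones((T, T)))
    C = np.eye(T) - np.ones((T, T)) / T
    total_var = np.trace(C @ S_inv @ S_inv.T @ C)
    print(T, total_var, (T**2 - 1) / 6)   # the last two numbers agree
```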

Eigen Structures

Finally let's analyze the eigenstructure of $C S^{-1} S^{-T} C$. Since $S^T S$ is a real symmetric matrix, its eigenvalues and eigenvectors are real, and $(S^T S)^{-1} = S^{-1} S^{-T}$ shares the same eigenvectors with inverted eigenvalues.

By calculation or by intuition ($S$ acts as a row-wise differentiation operator), $S^T S$ manifests itself as the central-difference (second-order differentiation, up to sign) matrix on a grid of $T$ points, except that the entry in the lower-right corner is 1 instead of 2.

$$S^T S = \begin{bmatrix} 1 & -1 & 0 & \dots & 0 & 0 \\ 0 & 1 & -1 & \dots & 0 & 0 \\ 0 & 0 & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ 0 & 0 & 0 & \dots & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & 0 \\ -1 & 1 & 0 & \dots & 0 & 0 \\ 0 & -1 & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 1 & 0 \\ 0 & 0 & 0 & \dots & -1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & 0 & \dots & 0 & 0 \\ -1 & 2 & -1 & \dots & 0 & 0 \\ 0 & -1 & 2 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 2 & -1 \\ 0 & 0 & 0 & \dots & -1 & 1 \end{bmatrix}$$

This matrix differs from a circulant matrix by only three elements ($-1$ at the upper-right and lower-left corners, and the entry in the bottom-right corner), and it is provable that its eigenstructure converges to the eigenstructure of the circulant matrix3 as $T \to \infty$. We denote the exact circulant matrix as $Q$, and the tridiagonal matrix without the upper-right and lower-left $-1$ as $P$; notice the single-element difference between $P$ and $S^T S$.

$$Q = \begin{bmatrix} 2 & -1 & 0 & \dots & 0 & -1 \\ -1 & 2 & -1 & \dots & 0 & 0 \\ 0 & -1 & 2 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 2 & -1 \\ -1 & 0 & 0 & \dots & -1 & 2 \end{bmatrix} \qquad P = \begin{bmatrix} 2 & -1 & 0 & \dots & 0 & 0 \\ -1 & 2 & -1 & \dots & 0 & 0 \\ 0 & -1 & 2 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 2 & -1 \\ 0 & 0 & 0 & \dots & -1 & 2 \end{bmatrix}$$
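The element-wise differences described above are easy to verify numerically; the sketch below also compares the sorted spectra, which come close for large $T$ (matrix size is illustrative):

```python
import numpy as np

T = 200
S = np.eye(T) - np.eye(T, k=-1)
StS = S.T @ S                                            # bottom-right entry is 1
P = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)     # tridiagonal Toeplitz
Q = P.copy()
Q[0, -1] = Q[-1, 0] = -1                                 # circulant wrap-around

print(np.argwhere(P != StS))   # only [[T-1, T-1]]: the single-element difference
print(np.argwhere(Q != StS))   # the two corners plus the bottom-right entry

# the sorted spectra of the matrices differ only by O(1/T)
print(np.abs(np.linalg.eigvalsh(Q) - np.linalg.eigvalsh(StS)).max())
```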

Digression: Relation to diffusion operator

From physics to engineering, these matrices appear everywhere. What do we know about their eigenstructure? As we know from the theory of second-order partial differential equations (e.g. the diffusion / heat equation and the wave equation below), the second-order derivative, or Laplacian operator, has its distinct eigenfunctions: the Fourier modes. As a side note, analyzing heat conduction was the exact reason Fourier discovered the Fourier decomposition. These eigenfunctions of the Laplacian are the reason we see sinusoidal waves on strings and on the surface of water!

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} = \Delta u \qquad \frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} = \Delta u$$

Recall quantum mechanics and the wave equation: an important lesson from PDEs is that which Fourier modes enter the solution depends on the boundary condition, e.g. $u(t,0) = 0$, $\partial_x u = 0$, or $\partial_t u = 0$. For example, when one end of a rope is fixed while you perturb it, only sine modes can appear in the solution. This subtle point is important as we shall see below, since the elements by which $S^T S$ differs from the circulant matrix $Q$ are tightly related to the boundary condition. The exact circulant matrix $Q$ is the second-order differentiation operator with a circular (periodic) boundary condition.

Eigenstructure of Circulant Q

Back to our matrices; let's translate this intuition from continuous space to discrete space. We expect the eigenvectors of $S^T S$ to also be cosine and sine waves. Indeed, it is known that the eigenvectors of circulant matrices are the Fourier modes. For a length-$T$ sequence, the normalized complex eigenvectors and eigenvalues are the following (numbering starts from 0):

$$v_k = \frac{1}{\sqrt{T}}\left[1, \omega^{k}, \omega^{2k}, \omega^{3k}, \dots, \omega^{(T-1)k}\right], \quad \text{with } \omega = \exp\left(\frac{2\pi i}{T}\right)$$

$$\lambda_k = c_0 + c_{T-1}\,\omega^{k} + c_{T-2}\,\omega^{2k} + \dots + c_1\,\omega^{(T-1)k}, \quad \text{with } k = 0, \dots, T-1$$

In our case, there are only three non-zero coefficients, $c_0 = 2$, $c_1 = c_{T-1} = -1$; the other terms are 0. Thus it is easy to obtain the real eigenvalues

$$\lambda_k(Q) = 2 - \exp\left(\frac{2\pi i k}{T}\right) - \exp\left(\frac{2\pi i (T-1)k}{T}\right) = 2 - 2\cos\left(\frac{2\pi k}{T}\right), \quad k = 1, 2, \dots, T$$

We note that $\lambda_k = \lambda_{T-k}$ for $k \neq 0$, so each such eigenvalue has a two-dimensional eigenspace. This is intuitive: since the matrix $Q$ is the Laplacian on a circular domain, we expect the eigenmodes to be rotationally symmetric, so each eigenvalue has two eigenvectors corresponding to sine and cosine, and weighted combinations of these two eigenvectors give eigenfunctions of any phase.
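A quick check of this eigenvalue formula for the circulant matrix (a numpy sketch; the size is arbitrary):

```python
import numpy as np

T = 128
Q = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
Q[0, -1] = Q[-1, 0] = -1                        # circulant wrap-around entries

k = np.arange(T)
analytic = 2 - 2 * np.cos(2 * np.pi * k / T)    # lambda_k(Q), degenerate in pairs
numeric = np.linalg.eigvalsh(Q)                 # ascending order
print(np.allclose(np.sort(analytic), numeric))  # True
```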

Eigenstructure of Tridiagonal P

Next, let us consider the eigenstructure of the tridiagonal matrix $P$; its eigenvalues are less straightforward to calculate, since we would need to solve the characteristic polynomial directly.

However, we are still saved: mathematicians have derived a closed-form solution for the eigenvalues of a tridiagonal Toeplitz matrix 4. For a derivation of this solution, see my note!

$$\lambda_k(P) = 2 - 2\cos\left(\frac{k\pi}{T+1}\right), \quad k = 1, \dots, T$$

Notice the difference from the eigenvalues of $Q$ above? There are no more repeated eigenvalues, and $\lambda = 0$ is no longer an eigenvalue, so this matrix is full rank.
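And the analogous check for the tridiagonal Toeplitz matrix $P$ (again a numpy sketch):

```python
import numpy as np

T = 128
P = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)   # tridiagonal Toeplitz

k = np.arange(1, T + 1)
analytic = 2 - 2 * np.cos(k * np.pi / (T + 1))          # lambda_k(P), all distinct
numeric = np.linalg.eigvalsh(P)
print(np.allclose(np.sort(analytic), numeric))          # True
print(numeric.min() > 0)                                # no zero eigenvalue: full rank
```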

Eigenstructure of $S^T S$ and $C S^{-1} S^{-T} C$

Per the paper1 and numerical simulation, we found that the eigenvalues of the matrix $C S^{-1} S^{-T} C$ also have an analytical solution, even for finite $T$:

$$\lambda_k(C S^{-1} S^{-T} C) = \left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}, \quad k = 1, \dots, T-1$$

$$\lambda_T(C S^{-1} S^{-T} C) = 0$$
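These eigenvalues can be checked directly against a numerical eigendecomposition (a sketch assuming numpy; $T$ is arbitrary):

```python
import numpy as np

T = 64
S_inv = np.tril(np.ones((T, T)))
C = np.eye(T) - np.ones((T, T)) / T
M = C @ S_inv @ S_inv.T @ C

k = np.arange(1, T)
analytic = np.concatenate([1.0 / (2 - 2 * np.cos(k * np.pi / T)), [0.0]])  # descending
numeric = np.linalg.eigvalsh(M)[::-1]                                      # descending
print(np.allclose(analytic, numeric, atol=1e-8))                           # True
```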

Relating Back to PCA

So, we have talked enough about eigenvalues; what does this have to do with the PCA of a random walk?

Just to recap, performing PCA on a random walk in high dimension is just performing an eigendecomposition of the covariance matrix $\Sigma[x] = C S^{-1} S^{-T} C$. The eigenvectors are the projection coefficients onto each PC, and the eigenvalues correspond to the explained variance of each PC.

From the section above we know the eigenvalues of the matrix $C S^{-1} S^{-T} C$ are

$$\lambda_k = \left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}, \quad k = 1, \dots, T-1, \qquad \lambda_T = 0$$

Thus the explained variance ratio of each PC is the following, which corrects the erroneous version in Eq. 12 of the paper1.

$$\rho(k) = \frac{\lambda_k}{\mathrm{tr}[\Sigma[x]]} = \frac{\left(2 - 2\cos\frac{k\pi}{T}\right)^{-1}}{\frac{1}{6}(T^2 - 1)} = \frac{\left(1 - \cos\frac{k\pi}{T}\right)^{-1}}{\frac{1}{3}(T^2 - 1)}$$
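As a closing sketch, we can compare this prediction with the explained-variance ratios obtained by actually running PCA on a simulated high-dimensional walk (the Gaussian choice of $P$ and the values of $T$, $d$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
T, d = 100, 20_000
H = rng.normal(size=(T, d)) / np.sqrt(d)      # steps with tr[Sigma(P)] = 1
X = np.cumsum(H, axis=0)                      # the random-walk trajectory

# PCA: explained-variance ratios from the centered trajectory
X_hat = X - X.mean(axis=0, keepdims=True)
svals = np.linalg.svd(X_hat, compute_uv=False)
rho_empirical = svals**2 / np.sum(svals**2)

# analytic prediction rho(k)
k = np.arange(1, T)
rho_theory = (1.0 / (2 - 2 * np.cos(k * np.pi / T))) / ((T**2 - 1) / 6)

print(rho_empirical[:4])   # close to rho_theory; leading ratios approach 6 / (pi^2 k^2)
print(rho_theory[:4])
```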
  1. Antognini, J., & Sohl-Dickstein, J. (2018). PCA of high dimensional random walks with comparison to neural network training. Advances in Neural Information Processing Systems, 31.

  2. Note that this is the T-by-T covariance matrix; its structure is independent of $P$, thus the projected trajectories are independent of $P$. However, this does not mean the PC axes, i.e. the right singular vectors of $\hat{X}$ in the $d$-dimensional space, are independent of $P$.

  3. The physical intuition is that $S^T S$ represents the second-derivative operator of a length-$T$ signal with zero-padding or constant boundary condition, while the circulant matrix represents the second-derivative operator of a cyclic signal of length $T$. This difference in boundary conditions vanishes for long enough $T$.

  4. Borowska, J., & Łacińska, L. (2015). Eigenvalues of 2-tridiagonal Toeplitz matrix. Journal of Applied Mathematics and Computational Mechanics, 14(4). See Eq. 17.