Markov Chains
Making Matrices With MATLAB
Making matrices with MATLAB is rather straightforward: you separate elements in the same row with a space ' ' or a comma ',', and you separate rows with a semicolon ';'.
For example, if we wanted to make the matrix
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0.1 & 0.4 & 0.5 \end{pmatrix}$$
we would use the following code:
A = [1,0,0; 0,0.5,0.5; 0.1,0.4,0.5]
A =
1 0 0
0 1/2 1/2
1/10 2/5 1/2
In fact, there is no requirement to have a square matrix, so we could make the following matrix:
B = [0.3,0.3,0.4; 0.6,0.0,0.4]
B =
3/10 3/10 2/5
3/5 0 2/5
This works for a matrix of any shape, but if the matrix has some special structure, then it might be easier to input it differently to save time. For example, we can use the shortcut
I = eye(3)
This allows us to make the identity matrix.
Using Matrices to Represent Markov Chains
Markov chains are a collection of states with probabilities of moving to the next state. Since these are probabilities, we need to make sure that every entry of the transition matrix is between 0 and 1, and that each row of transition probabilities sums to exactly 1. A matrix with these properties is called 'stochastic'. This is what motivates the function isStochastic(A): it checks whether a given matrix is stochastic, and hence whether it can represent a Markov chain.
boolean = isStochastic(A)
doubleBool = isDoublyStochastic(A)
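For readers following along without MATLAB, the same check can be sketched in pure Python (standard library only; the helper name `is_stochastic` is ours, mirroring the MATLAB function above):

```python
def is_stochastic(A, tol=1e-10):
    """Return True if every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= x <= 1.0 for x in row) and abs(sum(row) - 1.0) < tol
        for row in A
    )

A = [[1, 0, 0], [0, 0.5, 0.5], [0.1, 0.4, 0.5]]
B = [[0.3, 0.3, 0.4], [0.6, 0.0, 0.4]]
print(is_stochastic(A))  # True
print(is_stochastic(B))  # True: B need not be square to pass the row checks
```

The tolerance parameter guards against floating-point rounding in the row sums.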
Visualizing Markov Chains
To visualize a Markov chain, we convert its transition matrix into a dtmc (discrete-time Markov chain) object and plot its transition graph:
mc = dtmc(socialMobility);
graphplot(mc, 'LabelEdges',true, 'ColorNodes',true)
Multistep Transitions
socialMobility = [0.7,0.2,0.1;0.3,0.5,0.2;0.2,0.4,0.4]
socialMobility =
7/10 1/5 1/10
3/10 1/2 1/5
1/5 2/5 2/5
The probability of going from state $i$ to state $j$ in $n$ steps is given by the $(i,j)$ entry of the $n$th power of the transition matrix. For example, squaring our earlier matrix $A$ gives the two-step transition probabilities:
A2 = A^2
A2 =
1 0 0
1/20 9/20 1/2
3/20 2/5 9/20
A2m = mpower(A,2)
A2m =
1 0 0
1/20 9/20 1/2
3/20 2/5 9/20
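The same two-step computation can be sketched in pure Python (standard library only, with a hand-rolled matrix multiply, since no external libraries are assumed):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matpow(A, n):
    """Compute A^n by repeated multiplication; assumes n >= 1."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result

A = [[1, 0, 0], [0, 0.5, 0.5], [0.1, 0.4, 0.5]]
A2 = matpow(A, 2)
print(A2[1])  # row 2 of A^2: [1/20, 9/20, 1/2], up to floating-point rounding
```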
Classification of States
Stationary Distribution
A stationary distribution is a distribution $\pi$ (where $\pi_i \ge 0$ and $\sum_i \pi_i = 1$, after all it is a distribution) such that taking one more step through the transition matrix $p$, we end up with the same distribution as before, namely $\pi$. This can be represented by the equation $\pi p = \pi$. We will explain this concept through 2- and 3-state Markov chains, and then generalize to a method that works for a Markov chain of any size.
Stationary Distribution for 2-state Markov Chains
Recall the weather chain example. We would like to write its stationary distribution equation $\pi p = \pi$ out in full.
This means we get a system of two equations in the two unknowns $\pi_1$ and $\pi_2$.
From high school algebra, we might expect to solve this directly, as there are 2 equations and 2 unknowns. However, we observe that the two equations are redundant: each rearranges to the same relation between $\pi_1$ and $\pi_2$, so substituting one into the other only yields $0 = 0$, and this is not the way to solve it. Instead, since both equations carry the same information, we must add the condition $\pi_1 + \pi_2 = 1$, which together with either equation determines the solution. Quickly checking our answer, we can verify that it satisfies $\pi p = \pi$.
Generalization to the Stationary Distribution for 2-state Markov Chains
Now let's apply this method to a general 2-state Markov chain, where the two states are "1" and "2". Let $a$ be the probability of going from 1 to 2, and $b$ be the probability of going from 2 to 1. Since each row has to sum to 1, the general chain looks like
$$p = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$$
Now to find the stationary distribution, we form the equations
$$\pi_1(1-a) + \pi_2 b = \pi_1, \qquad \pi_1 a + \pi_2(1-b) = \pi_2.$$
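As a sanity check, a small pure-Python sketch (with illustrative values $a = 0.4$, $b = 0.2$, which are not taken from the weather chain) can confirm that $\pi = (b/(a+b),\, a/(a+b))$ satisfies these equations:

```python
a, b = 0.4, 0.2  # hypothetical transition probabilities, not from the text
p = [[1 - a, a], [b, 1 - b]]     # general 2-state transition matrix
pi = [b / (a + b), a / (a + b)]  # candidate stationary distribution
# One step of the chain: (pi p)_j = sum_i pi_i * p[i][j]
pi_next = [sum(pi[i] * p[i][j] for i in range(2)) for j in range(2)]
print(pi, pi_next)  # the two vectors agree up to floating-point rounding
```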
Since both equations simplify to $\pi_1 a = \pi_2 b$, we need to additionally enforce the condition $\pi_1 + \pi_2 = 1$ to solve the system. Using substitution, we arrive at $\pi_1 = b/(a+b)$, so $\pi_2 = a/(a+b)$. Lastly, we check that this formula is consistent with the answer that we found for the weather chain problem: plugging that chain's values of $a$ and $b$ into the formula reproduces the distribution found above.
Stationary Distribution for 3-state Markov Chains
Now we turn our attention to 3-state Markov chains, such as the social mobility problem. Writing $\pi p = \pi$ out in full gives three equations in the unknowns $\pi_1, \pi_2, \pi_3$.
But observe that if we add the three equations together, then, since the rows of a stochastic matrix sum to 1, we get the redundant identity $\pi_1 + \pi_2 + \pi_3 = \pi_1 + \pi_2 + \pi_3$, which leaves the system with infinitely many solutions. So we need to replace one of the equations with $\pi_1 + \pi_2 + \pi_3 = 1$, since $\pi$ is a probability distribution and must sum to 1. Solving the resulting system with the stationaryDistribution function (see the appendix) gives:
statSocialMob = stationaryDistribution(socialMobility)
statSocialMob =
22/47 16/47 9/47
brandPref = [0.8,0.1,0.1;0.2,0.6,0.2;0.3,0.3,0.4];
statBrandPref = stationaryDistribution(brandPref)
statBrandPref =
6/11 3/11 2/11
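For readers without MATLAB, the same computation can be sketched in pure Python: replace the last column of $(P - I)$ with ones, then solve the resulting linear system (here by a small hand-rolled Gauss-Jordan elimination, since no external libraries are assumed):

```python
def stationary_distribution(P):
    n = len(P)
    # M = P - I, with the last column replaced by ones
    M = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][n - 1] = 1.0
    # Solve pi * M = (0, ..., 0, 1), i.e. M^T x = e_n, by Gauss-Jordan elimination
    aug = [[M[j][i] for j in range(n)] + [1.0 if i == n - 1 else 0.0]
           for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[col][col] != 0.0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

social_mobility = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.4, 0.4]]
pi = stationary_distribution(social_mobility)
print(pi)  # approximately [22/47, 16/47, 9/47], matching statSocialMob above
```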
General Stationary Distribution for n-state Markov Chains (advanced topic)
This code works for stochastic matrices of any size, but why? One of the main topics in linear algebra is that of an eigenvalue/eigenvector pair: $Av = \lambda v$, where $\lambda$ is the eigenvalue and $v$ is the eigenvector. Our equation $\pi p = \pi$ does not quite look like this; however, upon transposing the equation, we have $p^T \pi^T = \pi^T$. Setting $v = \pi^T$, we arrive at $p^T v = 1 \cdot v$, so finding the stationary distribution of a stochastic matrix $p$ is exactly finding an eigenvector of $p^T$ with eigenvalue 1. This means that we can use techniques from linear algebra to solve for the stationary distribution.
[v,d]=eig(socialMobility')
v =
787/1025 2544/3131 364/1565
2309/4135 -1396/2933 -1778/2239
5036/16033 -174/517 785/1398
d =
1 0 0
0 795/1801 0
0 0 598/3771
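To recover the stationary distribution from this output, we take the eigenvector column for eigenvalue 1 and rescale it so its entries sum to 1. A quick Python sketch, reusing the rational approximations printed above:

```python
# First column of v above: the eigenvector of socialMobility' for eigenvalue 1,
# as MATLAB's rational approximations (normalized there to unit 2-norm)
v1 = [787 / 1025, 2309 / 4135, 5036 / 16033]
pi = [x / sum(v1) for x in v1]  # rescale so the entries sum to 1
print(pi)  # approximately [22/47, 16/47, 9/47], the stationary distribution
```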
Note that MATLAB chose different values for the eigenvectors than the ones we chose. However, the ratios between the entries of each eigenvector are the same as in our solution; the eigenvectors of a system are not unique, but the ratios of their elements are. (MATLAB chooses the values such that the sum of the squares of the elements of each eigenvector equals unity.) In our case, however, we choose them so that the sum of the elements equals unity. This is the difference between normalizing so that $\sum_i v_i^2 = 1$ (what MATLAB does) and $\sum_i v_i = 1$, which is what we need.
Convergence to the Stationary Distribution
Why is it that sometimes we have $p^n$ converging to a matrix whose every row is the stationary distribution $\pi$? Intuitively, as we apply more and more steps $n$, the chain forgets its starting state: its distribution is averaged over all possible length-$n$ paths. For example, if there is a set of recurrent states, then after many steps the chain has almost certainly entered that set, and the fraction of time it spends in each state settles toward the stationary probabilities.
Convergence to largest eigenvalue
We notice that power iteration always converges to the largest eigenvalue, but we need it to converge to the eigenvector corresponding to the eigenvalue 1. First it is important to show that 1 is always an eigenvalue of a stochastic matrix.
Let $P$ be a stochastic matrix, so its rows sum to 1. By definition, an eigenvalue/eigenvector pair for a matrix $A$ is a pair $(\lambda, v)$ with $v \ne 0$ such that $Av = \lambda v$. Observe that $v = (1, 1, \ldots, 1)^T$ satisfies this for $P$: multiplying a row of $P$ by $v$ just adds that row's elements together, and since each row sums to one, this gives 1 in every entry. This means $Pv = v$, which implies that $\lambda = 1$ is an eigenvalue of $P$. Now that we know that 1 is an eigenvalue, why do we know that power iteration will always converge to it? The answer is quite simple once we know this one theorem.
Theorem 1.1 (Perron–Frobenius). Let $B$ be an $n \times n$ matrix with nonnegative real entries (i.e. $b_{ij} \ge 0$). Then we have the following:
- $B$ has a nonnegative real eigenvalue. The largest such eigenvalue, $\lambda_{\max}$, dominates the absolute values of all other eigenvalues of $B$. The domination is strict if the entries of $B$ are strictly positive.
- If $B$ has strictly positive entries, then $\lambda_{\max}$ is a simple positive eigenvalue, and the corresponding eigenvector can be normalized to have strictly positive entries.
- If $B$ has an eigenvector $v$ with strictly positive entries, then the corresponding eigenvalue is $\lambda_{\max}$.
From this theorem, we know that there exists a nonnegative real eigenvalue that dominates all the others, and for a matrix with strictly positive entries the domination is strict. This is exactly the situation in which power iteration converges: it converges to the dominant eigenvalue and its eigenvector.
The Perron–Frobenius eigenvalue satisfies the inequalities
$$\min_i \sum_j b_{ij} \;\le\; \lambda_{\max} \;\le\; \max_i \sum_j b_{ij}.$$
In our setting, every row of the stochastic matrix $p$ sums to exactly 1, so both bounds equal 1 and $\lambda_{\max} = 1$; since $p$ and $p^T$ have the same eigenvalues, the same holds for $p^T$, whose columns sum to 1. The dominant eigenvalue is therefore 1, its eigenvector (rescaled to sum to 1) is the stationary distribution, and the power method converges to it:
[ev, evec] = powerIter(brandPref', 1,100)
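A pure-Python sketch of this power iteration (renormalizing with the 1-norm at each step, so the iterate stays a probability distribution; variable names are ours):

```python
brand_pref = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
n = len(brand_pref)
Pt = [[brand_pref[j][i] for j in range(n)] for i in range(n)]  # transpose
v = [1.0 / n] * n  # any positive starting vector works
for _ in range(100):
    v = [sum(Pt[i][j] * v[j] for j in range(n)) for i in range(n)]  # apply P^T
    s = sum(v)
    v = [x / s for x in v]  # rescale so the entries sum to 1
print(v)  # converges to approximately [6/11, 3/11, 2/11], matching statBrandPref
```

Because brandPref has strictly positive entries, the Perron–Frobenius theorem guarantees that eigenvalue 1 strictly dominates, so the iteration converges.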
Appendix 1: Function Codes for above
Checks if a matrix is a stochastic matrix
function bool = isStochastic(A)
num_rows = size(A,1); % Number of rows
num_cols = size(A,2); % Number of columns
bool = true;
for i = 1:num_rows
    row_sum = 0;
    for j = 1:num_cols % Loop over columns of ith row
        bool = bool && (A(i,j) >= 0) && (A(i,j) <= 1); % Checks entries lie in [0,1]
        row_sum = row_sum + A(i,j);
    end
    bool = bool && (abs(row_sum - 1) < 1e-10); % Checks row sum adds to 1
end
end
Checks if a matrix is a doubly stochastic matrix
function bool = isDoublyStochastic(A)
% Doubly stochastic: both the matrix and its transpose are stochastic
bool = (isStochastic(A) && isStochastic(A'));
end
Solves for the stationary distribution of a matrix
function statDist = stationaryDistribution(A)
n = size(A,1);
A = A - eye(n); % pi P = pi I so pi (P - I) = 0
A(:,n) = ones(n,1); % rewriting the last equation as sum(pi) = 1
b = [zeros(1,n-1), 1];
statDist = b / A; % Solves (0,0,...,0,1) A^{-1} in a more efficient way
end
Power Iteration Method
function [eval, evec] = powerIter(A, rank, niter)
Av = rand(size(A,1),1); % random positive starting vector
for k = 1:niter, Av = A*Av; Av = Av/norm(Av); end % repeatedly apply A and renormalize
eval = Av' * (A* Av) / (Av' * Av); % Rayleigh quotient estimate of the eigenvalue
evec = Av / sum(Av); % rescale so the entries sum to 1
end