Markov Chains
Making Matrices With MATLAB
Making matrices with MATLAB is rather straightforward: you separate elements in the same row with a space ' ' or a comma ',', and you separate rows with a semicolon ';'.
For example, if we wanted to make the matrix
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0.1 & 0.4 & 0.5 \end{pmatrix}$$
we would use the following code:
A = [1,0,0; 0,0.5,0.5; 0.1,0.4,0.5]
A =
1 0 0
0 1/2 1/2
1/10 2/5 1/2
In fact, there is no requirement to have a square matrix, so we could make the following matrix:
B = [0.3,0.3,0.4; 0.6,0.0,0.4]
B =
3/10 3/10 2/5
3/5 0 2/5
This works for a matrix of any shape, but if the matrix has some special structure, then it might be easier to input it differently to save time. For example, we can use the shortcut
I = eye(3)
This allows us to make the identity matrix.
Using Matrices to Represent Markov Chains
Markov chains are a collection of states with probabilities of moving to the next state. Since these are probabilities, we need to make sure that every entry of the transition matrix is between 0 and 1, and that each row of transition probabilities sums to exactly 1. A matrix with these properties is called 'stochastic'. This is what motivates the function isStochastic(A): it checks whether a given matrix is stochastic, and hence whether it can represent a Markov chain.
boolean = isStochastic(A)
doubleBool = isDoublyStochastic(A)
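For readers following along without MATLAB, the same check can be sketched in pure Python (standard library only; the helper name `is_stochastic` is ours, mirroring the MATLAB function above):

```python
def is_stochastic(A, tol=1e-10):
    """Return True if every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= x <= 1.0 for x in row) and abs(sum(row) - 1.0) < tol
        for row in A
    )

A = [[1, 0, 0], [0, 0.5, 0.5], [0.1, 0.4, 0.5]]
B = [[0.3, 0.3, 0.4], [0.6, 0.0, 0.4]]
print(is_stochastic(A))  # True
print(is_stochastic(B))  # True: B need not be square to pass the row checks
```

The tolerance parameter guards against floating-point rounding in the row sums.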
Visualizing Markov Chains
To visualize a Markov chain, we convert its transition matrix into a dtmc (discrete-time Markov chain) object and plot its transition graph:
mc = dtmc(socialMobility);
graphplot(mc, 'LabelEdges',true, 'ColorNodes',true)
Multistep Transitions
socialMobility = [0.7,0.2,0.1;0.3,0.5,0.2;0.2,0.4,0.4]
socialMobility =
7/10 1/5 1/10
3/10 1/2 1/5
1/5 2/5 2/5
The probability of going from state $i$ to state $j$ in $n$ steps is given by the $(i,j)$ entry of the $n$th power of the transition matrix. For example, squaring our earlier matrix $A$ gives the two-step transition probabilities:
A2 = A^2
A2 =
1 0 0
1/20 9/20 1/2
3/20 2/5 9/20
A2m = mpower(A,2)
A2m =
1 0 0
1/20 9/20 1/2
3/20 2/5 9/20
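The same two-step computation can be sketched in pure Python (standard library only, with a hand-rolled matrix multiply, since no external libraries are assumed):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matpow(A, n):
    """Compute A^n by repeated multiplication; assumes n >= 1."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result

A = [[1, 0, 0], [0, 0.5, 0.5], [0.1, 0.4, 0.5]]
A2 = matpow(A, 2)
print(A2[1])  # row 2 of A^2: [1/20, 9/20, 1/2], up to floating-point rounding
```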
Classification of States
Stationary Distribution
A stationary distribution is a distribution $\pi$ (where $\pi_i \ge 0$ and $\sum_i \pi_i = 1$, after all it is a distribution) such that taking one more step through the transition matrix $p$, we end up with the same distribution as before, namely $\pi$. This can be represented by the equation $\pi p = \pi$. We will explain this concept through 2- and 3-state Markov chains, and then generalize to a method that works for a Markov chain of any size.
Stationary Distribution for 2-state Markov Chains
Recall the weather chain example. We would like to write its stationary distribution equation $\pi p = \pi$ out in full.
This means we get a system of two equations in the two unknowns $\pi_1$ and $\pi_2$.
From high school algebra, we might expect to solve this directly, as there are 2 equations and 2 unknowns. However, we observe that the two equations are redundant: each rearranges to the same relation between $\pi_1$ and $\pi_2$, so substituting one into the other only yields $0 = 0$, and this is not the way to solve it. Instead, since both equations carry the same information, we must add the condition $\pi_1 + \pi_2 = 1$, which together with either equation determines the solution. Quickly checking our answer, we can verify that it satisfies $\pi p = \pi$.
Generalization to the Stationary Distribution for 2-state Markov Chains
Now let's apply this method to a general 2-state Markov chain, where the two states are "1" and "2". Let $a$ be the probability of going from 1 to 2, and $b$ be the probability of going from 2 to 1. Since each row has to sum to 1, the general chain looks like
$$p = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$$
Now to find the stationary distribution, we form the equations
$$\pi_1(1-a) + \pi_2 b = \pi_1, \qquad \pi_1 a + \pi_2(1-b) = \pi_2.$$
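As a sanity check, a small pure-Python sketch (with illustrative values $a = 0.4$, $b = 0.2$, which are not taken from the weather chain) can confirm that $\pi = (b/(a+b),\, a/(a+b))$ satisfies these equations:

```python
a, b = 0.4, 0.2  # hypothetical transition probabilities, not from the text
p = [[1 - a, a], [b, 1 - b]]     # general 2-state transition matrix
pi = [b / (a + b), a / (a + b)]  # candidate stationary distribution
# One step of the chain: (pi p)_j = sum_i pi_i * p[i][j]
pi_next = [sum(pi[i] * p[i][j] for i in range(2)) for j in range(2)]
print(pi, pi_next)  # the two vectors agree up to floating-point rounding
```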
Since both equations simplify to $\pi_1 a = \pi_2 b$, we need to additionally enforce the condition $\pi_1 + \pi_2 = 1$ to solve the system. Using substitution, we arrive at $\pi_1 = b/(a+b)$, so $\pi_2 = a/(a+b)$. Lastly, we check that this formula is consistent with the answer that we found for the weather chain problem: plugging that chain's values of $a$ and $b$ into the formula reproduces the distribution found above.
Stationary Distribution for 3-state Markov Chains
Now we turn our attention to 3-state Markov chains, such as the social mobility problem. Writing $\pi p = \pi$ out in full gives three equations in the unknowns $\pi_1, \pi_2, \pi_3$.
But observe that if we add the three equations together, then, since the rows of a stochastic matrix sum to 1, we get the redundant identity $\pi_1 + \pi_2 + \pi_3 = \pi_1 + \pi_2 + \pi_3$, which leaves the system with infinitely many solutions. So we need to replace one of the equations with $\pi_1 + \pi_2 + \pi_3 = 1$, since $\pi$ is a probability distribution and must sum to 1. Solving the resulting system with the stationaryDistribution function (see the appendix) gives:
statSocialMob = stationaryDistribution(socialMobility)
statSocialMob =
22/47 16/47 9/47
brandPref = [0.8,0.1,0.1;0.2,0.6,0.2;0.3,0.3,0.4];
statBrandPref = stationaryDistribution(brandPref)
statBrandPref =
6/11 3/11 2/11
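For readers without MATLAB, the same computation can be sketched in pure Python: replace the last column of $(P - I)$ with ones, then solve the resulting linear system (here by a small hand-rolled Gauss-Jordan elimination, since no external libraries are assumed):

```python
def stationary_distribution(P):
    n = len(P)
    # M = P - I, with the last column replaced by ones
    M = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][n - 1] = 1.0
    # Solve pi * M = (0, ..., 0, 1), i.e. M^T x = e_n, by Gauss-Jordan elimination
    aug = [[M[j][i] for j in range(n)] + [1.0 if i == n - 1 else 0.0]
           for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[col][col] != 0.0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

social_mobility = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.4, 0.4]]
pi = stationary_distribution(social_mobility)
print(pi)  # approximately [22/47, 16/47, 9/47], matching statSocialMob above
```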
General Stationary Distribution for n-state Markov Chains (advanced topic)
This code works for stochastic matrices of any size, but why? One of the main topics in linear algebra is that of an eigenvalue/eigenvector pair: $Av = \lambda v$, where $\lambda$ is the eigenvalue and $v$ is the eigenvector. Our equation $\pi p = \pi$ does not quite look like this; however, upon transposing the equation, we have $p^T \pi^T = \pi^T$. Setting $v = \pi^T$, we arrive at $p^T v = 1 \cdot v$, so finding the stationary distribution of a stochastic matrix $p$ is exactly finding an eigenvector of $p^T$ with eigenvalue 1. This means that we can use techniques from linear algebra to solve for the stationary distribution.
[v,d]=eig(socialMobility')
v =
787/1025 2544/3131 364/1565
2309/4135 -1396/2933 -1778/2239
5036/16033 -174/517 785/1398
d =
1 0 0
0 795/1801 0
0 0 598/3771
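To recover the stationary distribution from this output, we take the eigenvector column for eigenvalue 1 and rescale it so its entries sum to 1. A quick Python sketch, reusing the rational approximations printed above:

```python
# First column of v above: the eigenvector of socialMobility' for eigenvalue 1,
# as MATLAB's rational approximations (normalized there to unit 2-norm)
v1 = [787 / 1025, 2309 / 4135, 5036 / 16033]
pi = [x / sum(v1) for x in v1]  # rescale so the entries sum to 1
print(pi)  # approximately [22/47, 16/47, 9/47], the stationary distribution
```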
Note that MATLAB chose different values for the eigenvectors than the ones we chose. However, the ratios between the entries of each eigenvector are the same as in our solution; the eigenvectors of a system are not unique, but the ratios of their elements are. (MATLAB chooses the values such that the sum of the squares of the elements of each eigenvector equals unity.) In our case, however, we choose them so that the sum of the elements equals unity. This is the difference between normalizing so that $\sum_i v_i^2 = 1$ (what MATLAB does) and $\sum_i v_i = 1$, which is what we need.
Convergence to the Stationary Distribution
Why is it that sometimes we have $p^n$ converging to a matrix whose every row is the stationary distribution $\pi$? Intuitively, as we apply more and more steps $n$, the chain forgets its starting state: its distribution is averaged over all possible length-$n$ paths. For example, if there is a set of recurrent states, then after many steps the chain has almost certainly entered that set, and the fraction of time it spends in each state settles toward the stationary probabilities.
Convergence to largest eigenvalue
We notice that power iteration always converges to the largest eigenvalue, but we need it to converge to the eigenvector corresponding to the eigenvalue 1. First it is important to show that 1 is always an eigenvalue of a stochastic matrix.
Let $P$ be a stochastic matrix, so its rows sum to 1. By definition, an eigenvalue/eigenvector pair for a matrix $A$ is a pair $(\lambda, v)$ with $v \ne 0$ such that $Av = \lambda v$. Observe that $v = (1, 1, \ldots, 1)^T$ satisfies this for $P$: multiplying a row of $P$ by $v$ just adds that row's elements together, and since each row sums to one, this gives 1 in every entry. This means $Pv = v$, which implies that $\lambda = 1$ is an eigenvalue of $P$. Now that we know that 1 is an eigenvalue, why do we know that power iteration will always converge to it? The answer is quite simple once we know this one theorem.
Theorem 1.1 (Perron–Frobenius). Let $B$ be an $n \times n$ matrix with nonnegative real entries (i.e. $b_{ij} \ge 0$). Then we have the following:
- $B$ has a nonnegative real eigenvalue. The largest such eigenvalue, $\lambda_{\max}$, dominates the absolute values of all other eigenvalues of $B$. The domination is strict if the entries of $B$ are strictly positive.
- If $B$ has strictly positive entries, then $\lambda_{\max}$ is a simple positive eigenvalue, and the corresponding eigenvector can be normalized to have strictly positive entries.
- If $B$ has an eigenvector $v$ with strictly positive entries, then the corresponding eigenvalue is $\lambda_{\max}$.
From this theorem, we know that there exists a nonnegative real eigenvalue that dominates all the others, and for a matrix with strictly positive entries the domination is strict. This is exactly the situation in which power iteration converges: it converges to the dominant eigenvalue and its eigenvector.
The Perron–Frobenius eigenvalue satisfies the inequalities
$$\min_i \sum_j b_{ij} \;\le\; \lambda_{\max} \;\le\; \max_i \sum_j b_{ij}.$$
In our setting, every row of the stochastic matrix $p$ sums to exactly 1, so both bounds equal 1 and $\lambda_{\max} = 1$; since $p$ and $p^T$ have the same eigenvalues, the same holds for $p^T$, whose columns sum to 1. The dominant eigenvalue is therefore 1, its eigenvector (rescaled to sum to 1) is the stationary distribution, and the power method converges to it:
[ev, evec] = powerIter(brandPref', 1,100)
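A pure-Python sketch of this power iteration (renormalizing with the 1-norm at each step, so the iterate stays a probability distribution; variable names are ours):

```python
brand_pref = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
n = len(brand_pref)
Pt = [[brand_pref[j][i] for j in range(n)] for i in range(n)]  # transpose
v = [1.0 / n] * n  # any positive starting vector works
for _ in range(100):
    v = [sum(Pt[i][j] * v[j] for j in range(n)) for i in range(n)]  # apply P^T
    s = sum(v)
    v = [x / s for x in v]  # rescale so the entries sum to 1
print(v)  # converges to approximately [6/11, 3/11, 2/11], matching statBrandPref
```

Because brandPref has strictly positive entries, the Perron–Frobenius theorem guarantees that eigenvalue 1 strictly dominates, so the iteration converges.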
Appendix 1: Function Codes for above
Checks if a matrix is a stochastic matrix
function bool = isStochastic(A)
num_rows = size(A,1); % Number of rows
num_cols = size(A,2); % Number of columns
bool = true;
for i = 1:num_rows
    row_sum = 0;
    for j = 1:num_cols % Loop over columns of ith row
        bool = bool && (A(i,j) >= 0) && (A(i,j) <= 1); % Checks entries lie in [0,1]
        row_sum = row_sum + A(i,j);
    end
    bool = bool && (abs(row_sum - 1) < 1e-10); % Checks row sum adds to 1
end
end
Checks if a matrix is a doubly stochastic matrix
function bool = isDoublyStochastic(A)
% Doubly stochastic: both the matrix and its transpose are stochastic
bool = (isStochastic(A) && isStochastic(A'));
end
Solves for the stationary distribution of a matrix
function statDist = stationaryDistribution(A)
n = size(A,1);
A = A - eye(n); % pi P = pi I so pi (P - I) = 0
A(:,n) = ones(n,1); % rewriting the last equation as sum(pi) = 1
b = [zeros(1,n-1), 1];
statDist = b / A; % Solves (0,0,...,0,1) A^{-1} in a more efficient way
end
Power Iteration Method
function [eval, evec] = powerIter(A, rank, niter)
Av = rand(size(A,1),1); % random positive starting vector
for k = 1:niter, Av = A*Av; Av = Av/norm(Av); end % repeatedly apply A and renormalize
eval = Av' * (A* Av) / (Av' * Av); % Rayleigh quotient estimate of the eigenvalue
evec = Av / sum(Av); % rescale so the entries sum to 1
end