# Kolmogorov-Sinai entropy for Markov processes

I stumbled upon this interesting paper, which is already quite old:

V. Lecomte, C. Appert-Rolland and F. van Wijland
Thermodynamic formalism for systems with Markov dynamics, J. Stat. Phys. 2007

which naturally follows

P. Gaspard, Time-reversed dynamical entropy and irreversibility in Markovian random processes, J. Stat. Phys. 2004

The paper continues the effort to bridge irreversibility in deterministic dynamical systems and in stochastic Markov processes, to which Pierre Gaspard made important contributions. A broad review is Altaner’s Ph.D. thesis, Foundations of Stochastic Thermodynamics [arXiv:1410.3983]. On the one hand, irreversibility in dynamical systems occurs because of the contraction of phase space, which is quantified in terms of Lyapunov exponents and the Kolmogorov-Sinai entropy. The two concepts are related by Pesin’s theorem, which establishes that the KS entropy $h_{KS}$ is the sum of the positive Lyapunov exponents. A notion of reversed KS entropy $h^R_{KS}$, which is minus the sum of the negative Lyapunov exponents, can also be introduced. On the other hand, an analogue of the KS entropy, and of its time-reversed counterpart, can be constructed for Markov chains, and the difference $\sigma_t = h^R_{KS} - h_{KS}$, which is non-negative, yields the thermodynamic entropy production that we know and love.
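To make the Markov-chain side of the analogy concrete, here is a minimal sketch of my own, not taken from either paper. It assumes the usual stationary expressions $h = -\sum_{ij} \pi_i T_{ij} \log T_{ij}$ and $h^R = -\sum_{ij} \pi_i T_{ij} \log T_{ji}$, and the sign convention in which the entropy production per step is $h^R - h \geq 0$. The transition matrix `T` and all function names are hypothetical.

```python
import math

# Hypothetical 3-state chain: T[i][j] is the probability of a jump from i to j.
T = [
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.5, 0.3, 0.2],
]

def stationary(T, sweeps=2000):
    """Stationary distribution obtained by iterating pi <- pi T."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rates(T):
    """Return (h, h_rev): forward and time-reversed entropy per step."""
    n = len(T)
    pi = stationary(T)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    h = -sum(pi[i] * T[i][j] * math.log(T[i][j]) for i, j in pairs)
    h_rev = -sum(pi[i] * T[i][j] * math.log(T[j][i]) for i, j in pairs)
    return h, h_rev

h, h_rev = entropy_rates(T)
sigma = h_rev - h  # entropy production per step in this convention
print(f"h = {h:.4f}, h_rev = {h_rev:.4f}, sigma = {sigma:.4f}")
```

For a chain with a symmetric transition matrix the two entropies coincide and the difference vanishes, as one expects at equilibrium.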

Until today I always felt this analogy was incomplete, since I was not aware of a notion of Lyapunov exponent or of phase-space contraction for Markov processes, and hence of an analogue of Pesin’s theorem, so I was a bit dissatisfied with this parallel. In fact the authors comment that “[…] therefore $\sigma_f$ is indeed the phase-space contraction rate (the sum of all Lyapunov exponents). We have of course no such a microscopic interpretation within the Markovian framework”. Later on, though, they write that “It can also be noted that, if one defines a Lyapunov exponent for the random walk through an equivalent one-dimensional map, as described in Refs. … , we recover Pesin’s theorem”, so now I have more literature to go through. This I leave for next time…

Let us now delve into more technical considerations.

## 1 – KS entropy for continuous-time Markov jump processes

The theory works well for Markov chains, but things are slightly less straightforward for continuous-time Markov jump processes, as already noticed by Gaspard. Let me try to formulate it my own way. The reason can be traced to the very definition of KS entropy:

$h_{KS} = - \lim_{t \to \infty} \frac{1}{t} \sum_{\mathrm{histories}} P_t(\mathrm{history}) \log P_t(\mathrm{history})$

where $P_t$ is a probability measure over the space of histories up to time $t$. The problem with this formula is common to all entropies: taking the logarithm of a probability measure does not make much sense. What does make sense is taking the logarithm of a probability density, that is, of the Radon-Nikodym derivative of a probability measure with respect to another:

$\log \frac{dP_t(\mathrm{history})}{dQ_t(\mathrm{history})}$
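A small illustration of why the reference measure matters, using a single exponential waiting time. This is a sketch of my own, with hypothetical rates. The differential entropy of an exponential density with rate $w$ is $1 - \log w$, so it shifts when the unit of time changes, whereas the relative entropy between two exponential densities, $\log(a/b) + b/a - 1$, does not.

```python
import math

def differential_entropy_exponential(rate):
    """Differential entropy of the density rate * exp(-rate * t)."""
    return 1.0 - math.log(rate)

def relative_entropy_exponential(rate_p, rate_q):
    """Relative entropy of Exp(rate_p) with respect to Exp(rate_q)."""
    return math.log(rate_p / rate_q) + rate_q / rate_p - 1.0

# Hypothetical rates, first in 1/second, then converted to 1/millisecond.
rate_p, rate_q = 2.0, 0.5
per_ms = 1e-3

shift = (differential_entropy_exponential(rate_p * per_ms)
         - differential_entropy_exponential(rate_p))
print("entropy shift under change of units:", shift)  # equals log(1000)
print("relative entropy, seconds:     ",
      relative_entropy_exponential(rate_p, rate_q))
print("relative entropy, milliseconds:",
      relative_entropy_exponential(rate_p * per_ms, rate_q * per_ms))
```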

I have considered this problem in [EPL 2012, arXiv] and used it to construct a gauge theory of nonequilibrium processes. It is no surprise that it emerges in continuous-time Markov jump processes, since the probability of a sequence of states $\boldsymbol{x}$ and of waiting times in the interval $\boldsymbol{t}, \boldsymbol{t}+d\boldsymbol{t}$ between instant $0$ and $t$ is given by

$dP_t(\mathrm{history}) = dP_t(\boldsymbol{x},\boldsymbol{t}) = \delta\left(t - \sum t_i \right) \,e^{- t_n w_{x_n} } \prod_i w_{x_{i+1}\gets x_i} e^{- t_i w_{x_i} } ~ dt_1 \ldots dt_n$         (1)

where $w_{x \gets y}$ is the probability rate (with physical units of an inverse time) of a jump from $y$ to $x$, and $w_x = \sum_y w_{y \gets x}$ is the total probability rate out of $x$. Now, from this probability measure we cannot just strip away the Lebesgue measure $dt_1 \ldots dt_n$, because the probability density so obtained would be dimensionful, and moreover this operation does not have any reasonable probabilistic meaning. In fact, the waiting times are stochastic variables themselves, since it is stipulated that the Markov jump process is a collection of Poissonian processes on top of a Markov chain.
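To fix ideas, here is a minimal Gillespie-style sketch, my own and not from the paper, that samples a history up to time $t$ and evaluates the logarithm of the weight in Eq. (1) with the Lebesgue measure and the delta function left out. The rate table `RATES` (where `RATES[x][y]` stands for $w_{y \gets x}$) and the function names are hypothetical. The resulting number is exactly the dimensionful object discussed above: its value depends on the unit of time in which the rates are expressed.

```python
import math
import random

# Hypothetical rates: RATES[x][y] is the rate w_{y<-x} of a jump from x to y.
RATES = {
    0: {1: 1.0, 2: 0.5},
    1: {0: 0.2, 2: 2.0},
    2: {0: 1.5, 1: 0.3},
}

def escape_rate(x):
    """Total rate w_x out of state x."""
    return sum(RATES[x].values())

def sample_history(x0, t_final, rng):
    """Return (states, times); the last waiting time is cut off at t_final."""
    states, times = [x0], []
    elapsed = 0.0
    while True:
        x = states[-1]
        wait = rng.expovariate(escape_rate(x))
        if elapsed + wait >= t_final:
            times.append(t_final - elapsed)
            return states, times
        times.append(wait)
        elapsed += wait
        targets = list(RATES[x])
        weights = [RATES[x][y] for y in targets]
        states.append(rng.choices(targets, weights=weights)[0])

def log_density(states, times):
    """Log of Eq. (1) without the delta function and the dt's."""
    logp = -times[-1] * escape_rate(states[-1])  # survival in the last state
    for i in range(len(states) - 1):
        x, y = states[i], states[i + 1]
        logp += math.log(RATES[x][y]) - times[i] * escape_rate(x)
    return logp

rng = random.Random(0)
states, times = sample_history(0, 5.0, rng)
print(states, log_density(states, times))
```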

The authors then propose to take as the argument of the logarithm

$P(\boldsymbol{x}) = \prod_i \frac{w_{x_{i+1}\gets x_i}}{w_{x_i}}$

This of course solves the problem, because it is a good dimensionless probability, positive and normalized. The interpretation of this as a probability, though, is a bit subtle. One might expect it to be the marginal of Eq. (1) with respect to the waiting times, but it’s not:

$P(\boldsymbol{x}) \neq \int_{\boldsymbol{t}} dP_t(\boldsymbol{x},\boldsymbol{t})$

If it were, it would still depend on time, since the probability of a sequence of jumps up to time $t$ or up to time $t'$ must be different. Instead it doesn’t. In fact, one can see by direct computation that

$P(\boldsymbol{x}) = w_{x_n} \int dt \int_{\boldsymbol{t}} dP_t(\boldsymbol{x},\boldsymbol{t})$

where $t$ is also integrated away. The prefactor $w_{x_n}$ appears because the last factor in Eq. (1), $e^{- t_n w_{x_n}}$, is a survival probability rather than a normalized density, so integrating it over $t_n$ leaves a factor $1/w_{x_n}$. Then, one can interpret $P(\boldsymbol{x})$ as the probability that the symbolic sequence $\boldsymbol{x}$ occurs at all, independently of how long it takes to produce it. Now let us stretch this a little further. Given the symbolic sequence $\boldsymbol{x}$, what is the probability that the sequence of waiting times $\boldsymbol{t}$ is produced? Each time the system finds itself in a state $x$, jumping out of it is a Poisson process with rate $w_x$. Hence this probability is

$dQ_t(\boldsymbol{t}|\boldsymbol{x}) = \delta\left(t - \sum t_i \right) e^{- t_n w_{x_n} }\prod_{i < n} w_{x_i} e^{- t_i w_{x_i} } ~ dt_1 \ldots dt_n$

and of course

$dP_t (\boldsymbol{x},\boldsymbol{t}) = P(\boldsymbol{x}) dQ_t(\boldsymbol{t}|\boldsymbol{x})$
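The reading of $P(\boldsymbol{x})$ as the probability that a symbolic sequence occurs at all can be probed by simulation. The sketch below is my own, with a hypothetical rate table `RATES` (`RATES[x][y]` stands for $w_{y \gets x}$) and hypothetical function names. It samples the jump process from the first state of a target sequence and estimates two things: how often the first visited states coincide with the target, regardless of timing, and how often the history observed at a fixed time $t$ is exactly the target. The first should match $\prod_i w_{x_{i+1}\gets x_i}/w_{x_i}$, while the second depends on $t$.

```python
import random

# Hypothetical rates: RATES[x][y] is the rate w_{y<-x} of a jump from x to y.
RATES = {
    0: {1: 1.0, 2: 0.5},
    1: {0: 0.2, 2: 2.0},
    2: {0: 1.5, 1: 0.3},
}

def escape_rate(x):
    """Total rate w_x out of state x."""
    return sum(RATES[x].values())

def sequence_probability(states):
    """P(x): product of w_{x_{i+1}<-x_i} / w_{x_i} along the sequence."""
    p = 1.0
    for x, y in zip(states, states[1:]):
        p *= RATES[x][y] / escape_rate(x)
    return p

def sample_jumps(x0, n_states, rng):
    """First n_states visited states, each with its full waiting time."""
    states, waits = [x0], []
    for _ in range(n_states):
        x = states[-1]
        waits.append(rng.expovariate(escape_rate(x)))
        targets = list(RATES[x])
        weights = [RATES[x][y] for y in targets]
        states.append(rng.choices(targets, weights=weights)[0])
    return states[:n_states], waits

def estimate(target, t_values, n_samples=100_000, seed=1):
    """Frequency of the target as a prefix, and as the exact history at time t."""
    rng = random.Random(seed)
    prefix_hits = 0
    exact_hits = {t: 0 for t in t_values}
    for _ in range(n_samples):
        states, waits = sample_jumps(target[0], len(target), rng)
        if states != list(target):
            continue
        prefix_hits += 1
        arrival = sum(waits[:-1])        # time at which the last state is entered
        departure = arrival + waits[-1]  # time at which it is left again
        for t in t_values:
            if arrival <= t < departure:
                exact_hits[t] += 1
    return prefix_hits / n_samples, {t: c / n_samples for t, c in exact_hits.items()}

target = (0, 1, 2)
freq, at_time = estimate(target, t_values=(0.1, 1.0))
print("P(x) from the product formula:", sequence_probability(target))
print("frequency of x as a prefix:   ", freq)
print("probability of exactly x at time t:", at_time)
```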

This fits well with my understanding that entropy is always relative entropy. In this case the KS entropy of a continuous-time Markov jump process is the relative entropy of the probability of a path relative to the probability of the waiting times, conditioned on the sequence of states. I also like this interpretation because interpreting $P(\boldsymbol{x})$ as the probability of a mere sequence of symbols makes the KS entropy closer to information-theoretic concepts like the Kolmogorov complexity of a string.
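As a sanity check of this reading, one can estimate $-\frac{1}{t} \log P(\boldsymbol{x})$ along a long simulated trajectory and compare it with the stationary expression $\sum_x \pi_x \sum_y w_{y \gets x} \log (w_x / w_{y \gets x})$. That expression is my own rewriting, obtained by counting each jump $x \to y$ at rate $\pi_x w_{y \gets x}$, so treat it as an assumption rather than as a formula quoted from the paper. The rate table `RATES` and the function names are hypothetical, and the stationary distribution $\pi$ is obtained by uniformization followed by power iteration.

```python
import math
import random

# Hypothetical rates: RATES[x][y] is the rate w_{y<-x} of a jump from x to y.
RATES = {
    0: {1: 1.0, 2: 0.5},
    1: {0: 0.2, 2: 2.0},
    2: {0: 1.5, 1: 0.3},
}

def escape_rate(x):
    """Total rate w_x out of state x."""
    return sum(RATES[x].values())

def stationary(sweeps=5000):
    """Stationary distribution of the jump process via uniformization."""
    states = sorted(RATES)
    lam = 2.0 * max(escape_rate(x) for x in states)  # uniformization rate
    pi = {x: 1.0 / len(states) for x in states}
    for _ in range(sweeps):
        new = {x: pi[x] * (1.0 - escape_rate(x) / lam) for x in states}
        for x in states:
            for y, w in RATES[x].items():
                new[y] += pi[x] * w / lam
        pi = new
    return pi

def entropy_rate_exact():
    """Sum over jumps x -> y of pi_x * w_{y<-x} * log(w_x / w_{y<-x})."""
    pi = stationary()
    return sum(
        pi[x] * w * math.log(escape_rate(x) / w)
        for x in RATES
        for w in RATES[x].values()
    )

def entropy_rate_sampled(t_final=20000.0, seed=2):
    """Estimate -(1/t) log P(x) along one long trajectory."""
    rng = random.Random(seed)
    x, elapsed, minus_log_p = 0, 0.0, 0.0
    while True:
        elapsed += rng.expovariate(escape_rate(x))
        if elapsed >= t_final:
            return minus_log_p / t_final
        targets = list(RATES[x])
        weights = [RATES[x][y] for y in targets]
        y = rng.choices(targets, weights=weights)[0]
        minus_log_p -= math.log(RATES[x][y] / escape_rate(x))
        x = y

exact = entropy_rate_exact()
sampled = entropy_rate_sampled()
print(f"stationary formula: {exact:.4f}   trajectory estimate: {sampled:.4f}")
```

Note that for a two-state system every jump has only one possible target, so $P(\boldsymbol{x}) = 1$ and this entropy vanishes. At least three states are needed for anything interesting to happen.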

[follows?]