Coding Fractals by trees

Recently I’ve been editing a set of notes taken by Professor Kra during Furstenberg’s course last year. (Well…I should have done this last year >.<) One of the main ideas in the course was to build a correspondence between trees and fractal sets, which enables one to prove statements about dimensions and projections of fractals by looking at the corresponding trees. I want to sketch some highlights of the basic idea here.

Let Q=Q^{(n)} denote the unit cube in \mathbb{R}^{n}.

Let A\subset\mathbb{R}^{n} and x\in A. For t\in\mathbb{R}, consider the family
t(A-x)\cap Q.

Question: Does the limit exist as t\to\infty? (Here we consider the Hausdorff limit on compact subsets of Q.)

i.e. we ‘zoom in’ on the set around the point x, always cropping the set to the cube Q, and ask what the set ‘looks like’ as the strength of the magnifying glass approaches infinity.

Example:
If A is smooth (a curve or surface), then this limit exists and is a subspace intersected with Q.

Generally one should not expect the above limit to exist for fractal sets; however, if we weaken the question and only ask for the limit along some sequence t_{n_k}\rightarrow\infty, then it’s not hard to see that self-similar sets have non-trivial limits. Hence in some sense fractal behavior is characterized by having non-trivial limit sets.

Definition: Let A\subset Q^{(n)},
A' is a mini-set of A if for some \lambda\geq 1 and u\in\mathbb{R}^{n}, A' =(\lambda A+u)\cap Q

A'' is a micro-set of A if there is a sequence of mini-sets A'_{n} of A with A'' = \lim_{n\to\infty}A'_{n} (in the Hausdorff metric).

A is homogeneous if all of its micro-sets are mini-sets. As we shall see below, homogeneous fractals are ‘well-behaved’ in terms of dimensions.

Here comes a version of our familiar definition:

Definition: A set A\subset X has Hausdorff \alpha-measure 0 if for every \varepsilon > 0, there exists a covering A\subset \bigcup_{n=1}^{\infty}B_{\rho_{n}} with \sum_{n}\rho_{n}^{\alpha} < \varepsilon. Note that if A has Hausdorff \alpha-measure 0, then it has Hausdorff \beta-measure 0 for every \beta > \alpha.

Thus it makes sense to define:

Definition: The Hausdorff dimension of A is

\dim(A) = \inf\{\alpha>0\colon \text{Hausdorff } \alpha \text{ measure is } 0 \}.
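To make the definition concrete, here is a quick numerical illustration (my own sketch, not from the notes): sample points of the middle-thirds Cantor set by choosing base-3 digits in {0, 2}, count which triadic intervals of a given level are hit, and read off the covering exponent, which for this self-similar set agrees with the Hausdorff dimension log 2 / log 3.

```python
from math import log
import random

# Monte Carlo covering-count sketch for the middle-thirds Cantor set:
# points of the set are exactly those with base-3 digits in {0, 2}.
random.seed(1)
points = [sum(random.choice((0, 2)) * 3 ** -(i + 1) for i in range(30))
          for _ in range(20000)]

n = 8                                             # look at triadic intervals of size 3^-8
boxes = {int(x * 3 ** n) for x in points}         # occupied triadic intervals
estimate = log(len(boxes)) / log(3 ** n)          # covering exponent at this scale
assert abs(estimate - log(2) / log(3)) < 0.01
```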

Now come the trees~ For A\subset[0,1] closed, consider the expansion in base 3 of numbers in A. In the expansion of each number in A, certain digits may appear in the first place; following each such digit, certain digits may appear next, and so on. This defines a tree: the tree of shrinking triadic intervals that intersect the set A.

Definition: Let \Lambda be a finite set of symbols (which we will refer to as the alphabet and elements of \Lambda are letters of the alphabet).

A word w is any concatenation of finitely many symbols in \Lambda; the number of symbols in w is called the length of w, denoted by \ell(w).

A tree \tau in the alphabet \Lambda is a set of words satisfying
1. If uv\in\tau then u\in\tau.
2. If u\in\tau, then there exists a letter a\in\Lambda such that ua\in\tau.

Notation: if w\in\tau, denote \tau^{w} = \{v\colon wv\in\tau\}.

Definition: A section \Sigma of \tau is a finite set of words for which there exists N such that if s\in\tau and \ell(s) \geq N, then there exists r\in\Sigma and a word w such that s = rw.

Definition: The Hausdorff dimension of a tree is defined by \dim(\tau)=\inf\{ \beta \ | \ \inf_{\Sigma}\{\sum_{r\in\Sigma} |\Lambda|^{-\beta\ell(r)}\} = 0 \} where \Sigma ranges over sections of \tau. (Normalizing by the alphabet size |\Lambda| makes the theorem below hold on the nose; writing e^{-\beta\ell(r)} instead merely rescales the dimension by a factor of \log|\Lambda|.)

Theorem: The Hausdorff dimension of a tree equals the Hausdorff dimension of the set it corresponds to.

The proof merely goes through the definitions, hence we won’t present it here. It suffices to check that the dimension is unchanged if we only consider covers by balls of radius p^{-n} for a fixed p \in \mathbb{N}, which is indeed true.
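As a sanity check on the tree picture, here is a small numerical sketch (mine, using the alphabet size as the base of the exponential weights, so that the tree dimension matches the set’s Hausdorff dimension exactly): for the middle-thirds Cantor set coded in base 3, the level-N words form a section with 2^N words of length N, and bisecting for the critical β recovers log 2 / log 3.

```python
from math import log

def section_sum(beta, branches, base, N):
    """Sum over the level-N section of a self-similar tree:
    branches**N words, each of length N, each weighted base**(-beta*N)."""
    return branches ** N * base ** (-beta * N)

def tree_dimension(branches, base, tol=1e-9):
    """Bisect for the critical beta: the section sums at deeper and
    deeper levels tend to 0 exactly when beta exceeds the dimension."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # sums shrink from level to level iff branches * base**(-mid) < 1
        if section_sum(mid, branches, base, 50) < section_sum(mid, branches, base, 49):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# middle-thirds Cantor set: 2 branches at every level, coded in base 3
assert abs(tree_dimension(2, 3) - log(2) / log(3)) < 1e-6
```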

Now let’s give a simple example to illustrate how to make use of this correspondence:

It is easy to prove \dim(X \times Y) \geq \dim(X) + \dim(Y); here we produce an example where the inequality is strict.

Fix an increasing sequence of integers (k_n), and let A \subseteq [0,1] be the set of numbers whose decimal expansion vanishes from the k_{2n}-th to the k_{2n+1}-th digits to the right of the decimal point, for every n.

To compute \dim(A), look at the levels at which a small number of intervals covers the set A: the number of intervals of size 10^{-k_{2n+1}} needed is at most 10^{k_{2n}}. Hence for any \alpha > 0 the covering sum is at most 10^{k_{2n} - \alpha k_{2n+1}}, which goes to 0 whenever k_{2n}/k_{2n+1} \to 0, giving Hausdorff dimension 0. So we just need k_{2n}/k_{2n+1}\to 0.

Let B be the set of numbers whose decimal expansion vanishes from the k_{2n-1}-th to the k_{2n}-th digits. By the same computation (now requiring k_{2n-1}/k_{2n} \to 0), \dim(B) = 0.

What about A\times B? The tree that corresponds to it is a regular tree in which every vertex has 10 branches: at each digit position exactly one of the two factors is free. So the Hausdorff dimension is 1 > 0 + 0 = \dim(A) + \dim(B).
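The branching bookkeeping can be checked mechanically; here is an illustrative script (mine; the sequence k_n = n! is one hypothetical choice with k_n/k_{n+1} \to 0, and the alternating-block convention is one possible reading of the construction) verifying that at every digit position exactly one of A, B is free, so the tree of A \times B branches 10 ways at every level:

```python
def forced_zero(position, k, which):
    """True if the decimal digit at `position` (0-indexed) is forced to be 0.
    Positions split into blocks [k[i], k[i+1]); following the text, A's
    digits vanish on alternate blocks and B's on the complementary ones."""
    for i in range(len(k) - 1):
        if k[i] <= position < k[i + 1]:
            return i % 2 == 1 if which == 'A' else i % 2 == 0
    return False

k = [0, 1, 2, 6, 24, 120]          # k_n = n!, so k_n / k_{n+1} -> 0
depth = 24
branch_A = [1 if forced_zero(p, k, 'A') else 10 for p in range(depth)]
branch_B = [1 if forced_zero(p, k, 'B') else 10 for p in range(depth)]
# at each level exactly one factor is free: the product tree is 10-regular
assert all(a * b == 10 for a, b in zip(branch_A, branch_B))
```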

Note: Here the structure of the tree gives you easy information on the set, but it is hard to look directly at the set and see this information.

Many theorems regarding dimensions of projections and intersections of fractal sets turn out to be simple to prove once we code the sets by trees.

Remarks from the Oxtoby Centennial Conference

A few weeks ago, I received a mysterious e-mail invitation to the ‘Oxtoby Centennial Conference’ in Philadelphia. I had no idea how they found me since I don’t seem to know any of the organizers, but as someone who loves conference-going, of course I went. (Later I figured out it was due to Mike Hockman, thanks Mike~ ^^) The conference was fun! Here I want to sketch a few cool items I picked up in the past two days:

Definition: A Borel measure \mu on [0,1]^n is said to be an Oxtoby-Ulam measure (OU for short) if it satisfies the following conditions:
i) \mu([0,1]^n) = 1
ii) \mu is positive on open sets
iii) \mu is non-atomic
iv) \mu(\partial [0,1]^n) = 0

Oxtoby-Ulam theorem:
Any Oxtoby-Ulam measure is the pull-back of the Lebesgue measure by some homeomorphism \phi: [0,1]^n \rightarrow [0,1]^n.

i.e. For any Borel set A \subseteq [0,1]^n, we have \mu(A) = \lambda(\phi(A)).

It’s surprising that I didn’t know this theorem before. One should note that conditions ii)-iv) are clearly necessary: a homeomorphism has to send open sets to open sets, points to points and the boundary to the boundary; Lebesgue measure is positive on open sets, 0 at points and 0 on the boundary of the cube, hence any pull-back of it must also have those properties.

Since I came across this so late, my first reaction was: this is like Moser’s theorem, in the continuous setting! But much cooler, because measures are a lot worse-behaved than differential forms: many weird things can happen in the continuous setting that cannot in the smooth setting.

For example, we can choose a countable dense set of smooth Jordan curves in the cube and assign each curve a positive measure (we are free to choose those values as long as they sum to 1). Now we can define a measure supported on the union of the curves that satisfies conditions i)-iv) (the measure restricted to each curve is just a multiple of arc length). Applying the theorem, we get a homeomorphism that sends each Jordan curve to a Jordan curve with positive n-dimensional measure, and the n-dimensional measure of each curve equals our assigned value! (Back in undergrad days, it took me a whole weekend to come up with one Jordan curve of positive measure, and this way you get a dense set of them, occupying a full-measure set in the cube, for free! Oh, well…>.<)

Question: (posed by Albert Fathi, 1970)
Does the homeomorphism \phi sending \mu to Lebesgue measure depend continuously on \mu?

My first thought was to approximate the measure by smooth volume forms: for smooth volume forms, Moser’s theorem gives diffeomorphisms depending continuously on the form (I think this is true just by the nature of the construction of the Moser diffeos); the question is then how large the closure of smooth forms is in the space of OU-measures. So I raised a little discussion immediately after the talk. As pointed out by Tim Austin, under the weak topology on measures this closure should be the whole space, with some extra points where the diffeos converge to something that’s not a homeomorphism. Hence perhaps one can get the homeomorphism depending weakly continuously on \mu.

Lifted surface flows:

Nelson Markley gave a talk about studying flows on surfaces by lifting them to the universal cover. i.e. Let \phi_t be a flow on some orientable surface S, put the standard constant curvature metric on S and lift the flow to \bar{\phi}_t on the universal cover of S.

There is an early result:

Theorem: (Weil) Let \phi_t be a flow on \mathbb{T}^2, \bar{\phi}_t acts on the universal cover \mathbb{R}^2, then for any p \in \mathbb{R}^2, if \displaystyle \lim_{t\rightarrow \infty} ||\bar{\phi}_t(p)|| = \infty then \lim_{t\rightarrow \infty} \frac{\bar{\phi}_t(p)}{||\bar{\phi}_t(p)||} exists.

i.e. for lifted flows, if an orbit escapes to infinity, then it must escape along some direction. (No spiral-ish or wildly oscillating behavior.) This comes from the fact that the flow is the same on each unit square.

We can find its analogue for surfaces with genus larger than one:

Theorem: Let \phi_t be a flow on S with g \geq 2, \bar{\phi}_t: \mathbb{D} \rightarrow \mathbb{D}, then for any p \in \mathbb{D}, if \displaystyle \lim_{t\rightarrow \infty} ||\bar{\phi}_t(p)|| = \infty then \lim_{t\rightarrow \infty} \bar{\phi}_t(p) is a point on the boundary of \mathbb{D}.

Using those facts, they were able to prove results about the structure of the \omega-limit sets of such orbits (those that escape to infinity in the universal cover) using the geometric structure of the cover.

I was curious about what kind of orbits (or just non-self-intersecting curves) would ‘escape’, so here are some very simple observations: on the torus, escaping essentially means that the curve does not wind back and forth infinitely often with comparable magnitudes along both generators, i.e. the curve ‘eventually’ winds mainly in one direction along each generating circle. Very loosely speaking, if something similar holds for higher genus surfaces, i.e. the curve eventually winds around the generators in one direction (and is non-self-intersecting), then it would not be able to have a very complicated \omega-limit set.

Measures on Cantor sets

In contrast to the Oxtoby-Ulam theorem, one could ask: Given two measures on the standard middle-third Cantor set, can we always find a self homeomorphism of the Cantor set, pushing one measure to the other?

Given that there are so many homeomorphisms of the Cantor set, this sounds easy. But in fact it’s false! There are only countably many clopen subsets of the Cantor set (note that every clopen subset is a FINITE union of triadic copies of the Cantor set: a countable union would necessarily have a limit point that’s not in the union), and a homeomorphism must send clopen sets to clopen sets; hence for a homeomorphism to exist, the countable sets of values the two measures take on clopen sets must coincide.

So a class of ‘good measures’ on Cantor sets was defined in the talk and proved to be realizable as pull-backs of the standard Hausdorff measure via homeomorphisms.

I was randomly thinking about this: given a non-atomic measure \mu on the Cantor set, when can it be realized as the restriction of the Lebesgue measure to an embedded copy of the Cantor set? After a little thinking, this can always be done. (One can simply start with an interval, break it into two pieces according to the \mu-measures of the clopen sets before and after the largest gap, then slightly translate the two pieces apart so that there is a gap between them; iterate the process.)
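Here is a minimal sketch of that iteration (my own, for a measure specified by cylinder weights; the Bernoulli(1/3, 2/3) measure below is just an example): each cylinder receives an interval of length exactly its measure, and sibling intervals are separated by geometrically shrinking gaps, so in the limit one gets an embedded Cantor set carrying the prescribed measure as restricted Lebesgue measure.

```python
def place(mu, word, left, depth, gap=0.25):
    """Assign to every cylinder `word` of length `depth` an interval of
    length exactly mu(word); the two children of a cylinder sit side by
    side, separated by a small gap, as in the iteration described above."""
    if len(word) == depth:
        return [(word, left, left + mu(word))]
    out = place(mu, word + '0', left, depth, gap)
    out += place(mu, word + '1', out[-1][2] + gap ** (len(word) + 1), depth, gap)
    return out

def mu(word):                      # example: the (1/3, 2/3) Bernoulli measure
    p = 1.0
    for c in word:
        p *= 1 / 3 if c == '0' else 2 / 3
    return p

intervals = place(mu, '', 0.0, 4)
# total length equals total mass, and consecutive intervals are separated
assert abs(sum(r - l for _, l, r in intervals) - 1.0) < 1e-9
assert all(intervals[i][2] < intervals[i + 1][1] for i in range(len(intervals) - 1))
```

Since the gaps inserted below level d have total length on the order of gap^d, the positions converge and the limiting set is a Cantor set of total Lebesgue measure 1.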

In any case, it’s been a fun weekend! ^^

On compact extensions

This is again a note on my talk in the Szemerédi’s theorem seminar, going through Furstenberg’s book. In this round, my part is to introduce compact extensions.
Let \Gamma be an abelian group of measure preserving transformations on (X, \mathcal{B}, \mu), and let \alpha: (X, \mathcal{B}, \mu, \Gamma) \rightarrow ( Y, \mathcal{D}, \nu, \Gamma') be an extension map.
i.e. \alpha: X \rightarrow Y is such that \alpha^{-1} sends \nu-null sets to \mu-null sets, and for each \gamma \in \Gamma there is a corresponding \gamma' \in \Gamma' with

\gamma'\circ \alpha (x) = \alpha \circ \gamma (x)

Definition: A sequence of finite subsets (I_k) of \Gamma is a Følner sequence if |I_k| \rightarrow \infty and for any \gamma \in \Gamma,

\frac{| \gamma I_k \Delta I_k|}{|I_k|} \rightarrow 0

Proposition: For any Følner sequence I = (I_k) of \Gamma and any f \in L^2(X), \displaystyle \frac{1}{|I_k|} \sum_{\gamma \in I_k} \gamma f converges weakly to the orthogonal projection of f onto the subspace of \Gamma-invariant functions. (Denoted P(f), where P: L^2(X) \rightarrow L^2_{inv}(X).)

Proof: Let \mathcal{H}_0 = P^{-1}(\bar{0}) = (L^2_{inv}(X))^\bot
For all \gamma \in \Gamma,

\gamma (L^2_{inv}(X)) \subseteq L^2_{inv}(X)

Since \Gamma is \mu-preserving, \gamma is unitary on L^2(X). Therefore we also have \gamma( \mathcal{H}_0) \subseteq \mathcal{H}_0.

For f \in \mathcal{H}_0, suppose there is a subsequence (n_k) along which \displaystyle \frac{1}{|I_{n_k}|} \sum_{\gamma \in I_{n_k}} \gamma (f) converges weakly to some g \in L^2(X). (Since the averages are bounded, every subsequence has a further weakly convergent subsequence, so it suffices to show any such weak limit is \bar{0}.)

By the property that \frac{| \gamma I_k \Delta I_k|}{|I_k|} \rightarrow 0, we have \gamma(g) = g for each \gamma \in \Gamma, i.e. g is \Gamma-invariant, so g \in (\mathcal{H}_0)^\bot.

However, since f \in \mathcal{H}_0, all \gamma(f) are in \mathcal{H}_0, hence g \in \mathcal{H}_0. Therefore g \in \mathcal{H}_0 \cap (\mathcal{H}_0)^\bot, i.e. g=\bar{0}.
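A finite toy model of the proposition (mine, not in the notes): for the cyclic shift on \mathbb{Z}/12\mathbb{Z}, the invariant functions are the constants, and averaging over the initial intervals [0, N) (the analogue of a Følner sequence) lands on the projection, i.e. the constant function with value mean(f).

```python
m = 12
f = [float(i * i % 7) for i in range(m)]          # an arbitrary function on Z/12Z
def shift(v):                                     # the measure preserving shift
    return v[1:] + v[:1]

N = 6000                                          # average over the interval [0, N)
avg = [0.0] * m
v = f[:]
for _ in range(N):
    avg = [a + x for a, x in zip(avg, v)]
    v = shift(v)
avg = [a / N for a in avg]

# invariant functions of the shift are constants; the ergodic average
# lands on the orthogonal projection: the constant function mean(f)
mean = sum(f) / m
assert all(abs(a - mean) < 1e-9 for a in avg)
```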

Recall: 1) X \times_Y X := \{ (x_1, x_2) \ | \ \alpha(x_1) = \alpha(x_2) \}.

i.e. fibred product w.r.t. the extension map \alpha: X \rightarrow Y.

2) For H \in L^2(X \times_Y X), \ f \in L^2(X),

(H \ast f)(x_1) = \int H(x_1, x_2) f(x_2) \ d \mu_{\alpha(x_1)}(x_2)

Definition: A function f \in L^2(X) is said to be almost periodic if for all \varepsilon > 0, there exist g_1, \cdots, g_k \in L^2(X) s.t. for all \gamma \in \Gamma and almost every y \in Y (where || \cdot ||_y denotes the norm of L^2 of the fiber measure \mu_y),

\displaystyle \min_{1 \leq i \leq k} || \gamma (f) - g_i||_y < \varepsilon

Proposition: Linear combinations of almost periodic functions are almost periodic.

Proof: Immediate: apply the definition to each almost periodic function in the combination with a suitably smaller \varepsilon, and take all possible linear combinations of tuples of the resulting g_i.

Definition: \alpha: (X, \mathcal{B}, \mu, \Gamma) \rightarrow ( Y, \mathcal{D}, \nu, \Gamma') is a compact extension if:

C1: \{ H \ast f \ | \ H \in L^\infty (X \times_Y X) \cap \Gamma_{inv} (X \times_Y X), f \in L^2(X) \} contains a basis of L^2(X).

C2: The set of almost periodic functions is dense in L^2(X)

C3: For all f \in L^2(X), \ \varepsilon, \delta > 0, there exists D \subseteq Y, \ \nu(D) > 1- \delta, \  g_1, \cdots, g_k \in L^2(X) s.t. for any \gamma \in \Gamma and almost every y \in Y, we have

\displaystyle \min_{1 \leq i \leq k} || \gamma (f)|_{\alpha^{-1}(D)} - g_i||_y < \varepsilon

C4: For all f \in L^2(X), \ \varepsilon, \delta > 0, there exists g_1, \cdots, g_k \in L^2(X) s.t. for any \gamma \in \Gamma, there is a set D \subseteq Y, \ \nu(D) > 1- \delta, for all y \in D

\displaystyle \min_{1 \leq i \leq k} || \gamma (f) - g_i||_y < \varepsilon

C5: For all f \in L^2(X), let \bar{f} \in L^1(X \times_Y X) where

\bar{f}: (x_1, x_2) \mapsto f(x_1) \cdot f(x_2)

Let I=(I_k) be a Følner sequence and P the corresponding projection; then f=\bar{0} iff P \bar{f} = \bar{0}.

Theorem: All five definitions are equivalent.

Proof: “C1 \Rightarrow C2”

Since almost periodic functions are closed under linear combination, it suffices to show that each element of some basis can be approximated arbitrarily well by almost periodic functions.

Let our basis be as given in C1.

For all H \in L^\infty (X \times_Y X) \cap \Gamma_{inv} (X \times_Y X), the associated linear operator \varphi_H: L^2(X) \rightarrow L^2(X), \varphi_H: f \mapsto H \ast f, is bounded. Hence it suffices to check H \ast f for a dense set of f \in L^2(X). We consider the set of all fiberwise bounded f, i.e. those with ||f||_y \leq M for a.e. y \in Y and some constant M.

For all \delta > 0, we can perturb H \ast f by multiplying it by the characteristic function of a suitable set of measure at least 1- \delta to get an almost periodic function.

“C2 \Rightarrow C3”:

For any f \in L^2(X), there exists f' almost periodic with ||f-f'||< \frac{\varepsilon \sqrt{\delta}}{2}. Let \{ g_1, g_2, \cdots, g_{k-1} \} be the functions obtained from the almost periodicity of f' with constant \varepsilon/2, and set g_k = \bar{0}.

Let D = \{ y \ | \ ||f-f'||_y < \varepsilon/2 \}. Since

|| f - f'||^2 = \int ||f-f'||_y^2 \ d \nu(y),

||f-f'||< \frac{\varepsilon \sqrt{\delta}}{2} \ \Rightarrow \ ||f-f'||^2 < \frac{\varepsilon^2 \delta}{4}, so by Chebyshev \{ y \ | \ ||f-f'||_y \geq \varepsilon/2 \} has measure less than \delta, therefore \nu(D) > 1- \delta.

For all \gamma \in \Gamma, if y \in \gamma'^{-1}(D) then

|| \gamma f|_{\alpha^{-1}(D)} - \gamma f'||_y  = ||f|_{\alpha^{-1}(D)} - f'||_{\gamma'(y)} < \varepsilon /2

Hence \displaystyle \min_{1 \leq i \leq k-1} ||\gamma f|_{\alpha^{-1}(D)} - g_i||_y < \varepsilon /2 + \varepsilon /2 = \varepsilon

If y \notin \gamma'^{-1}(D), then \gamma f|_{\alpha^{-1}(D)} vanishes on the fiber \alpha^{-1}(y), so that || \gamma f|_{\alpha^{-1}(D)} - g_k||_y = 0 < \varepsilon.

Hence \alpha satisfies C3.

“C3 \Rightarrow C4”:

This is immediate: taking D and g_1, \cdots, g_k as in C3, for all y \in \gamma'^{-1}(D) we have \gamma f = \gamma f|_{\alpha^{-1}(D)} on \alpha^{-1}(y), hence

\displaystyle \min_{1 \leq i \leq k} ||\gamma f - g_i||_y \leq \min_{1 \leq i \leq k-1} ||\gamma f|_{\alpha^{-1}(D)} - g_i||_y < \varepsilon

and \nu(\gamma'^{-1}(D)) = \nu(D) > 1-\delta. Hence \alpha satisfies C4.

“C4 \Rightarrow C5”:

For all f \in L^2(X), \ \varepsilon, \delta > 0, by C4, there exists g_1, \cdots, g_k \in L^2(X) s.t. for any \gamma \in \Gamma, there is a set D \subseteq Y, \ \nu(D) > 1- \delta, for all y \in D

\displaystyle \min_{1 \leq i \leq k} || \gamma (f) - g_i||_y < \varepsilon

W.L.O.G. we may suppose all g_i are bounded, since by making \delta slightly larger we can truncate the unbounded parts.

Note that \bar{g_j} \otimes g_j \in L^\infty(X \times_Y X). Suppose P(\bar{f}) = \bar{0}.

Recall in C5 we have \bar{f}: (x_1, x_2) \mapsto f(x_1) \cdot f(x_2), and \displaystyle P \bar{f}(x_1, x_2) = \lim_{k \rightarrow \infty} \frac{1}{|I_k|} \sum_{\gamma \in I_k} f(\gamma x_1) \bar{ f(\gamma x_2)}.

For each 1 \leq j \leq k, we have \int (\bar{g_j} \otimes g_j) \cdot P \bar{f} d(\mu \times_Y \mu) = 0

Hence we have \displaystyle \lim_{i \rightarrow \infty} \frac{1}{|I_i|} \sum_{\gamma \in I_i} \int (\bar{g_j(x_1)} g_j(x_2)) \cdot \gamma f(x_1) \bar{\gamma f(x_2)} d\mu \times_Y \mu(x_1, x_2) = 0

\Rightarrow \displaystyle \lim_{i \rightarrow \infty} \frac{1}{|I_i|} \sum_{\gamma \in I_i} \int | \int \bar{g_j(x)} \gamma f(x) d\mu_y(x)|^2 d \nu(y) = 0

\Rightarrow \displaystyle \lim_{i \rightarrow \infty} \frac{1}{|I_i|} \sum_{\gamma \in I_i} \{ \sum_{j=1}^k \int | \int \bar{g_j(x)} \gamma f(x) d\mu_y(x)|^2 d \nu(y) \} = 0

Hence for large enough i, there exists \gamma \in I_i s.t. \sum_{j=1}^k \int | \int \bar{g_j(x)} \gamma f(x) d\mu_y(x)|^2 d \nu(y) is as small as we want.

We may find D' \subseteq Y with \nu(D') > 1-\delta s.t. for all y \in D' and for all j, we have

| \int \bar{g_j(x)} \gamma f(x) d\mu_y(x)|^2 < \varepsilon^2

On the other hand, by construction there is j with || \gamma f - g_j||^2_y < \varepsilon^2 for all y \in D, with \nu(D) > 1-\delta.

Hence for y \in D \cap D', \ || \gamma f||_y^2 = \langle \gamma f - g_j, \gamma f \rangle_y + \langle g_j, \gamma f \rangle_y \leq \varepsilon || \gamma f||_y + \varepsilon, so || \gamma f||_y = ||f||_{\gamma'(y)} is small when \varepsilon is.

Letting \varepsilon \rightarrow 0, \ \delta \rightarrow 0, we get f = \bar{0}. Hence C5 holds.

“C5 \Rightarrow C1”

Suppose f \in L^2(X) is orthogonal to all such functions; we want to show f = \bar{0}. Let (I_k) be a Følner sequence.

Define \displaystyle H(x_1, x_2) := \lim_{i \rightarrow \infty} \frac{1}{|I_i|}\sum_{\gamma \in I_i} \gamma f(x_1) \cdot \gamma f(x_2) = P \bar{f}(x_1, x_2)

Let H_M(x_1, x_2) be equal to H wherever |H(x_1, x_2)| \leq M and 0 otherwise.

H is \Gamma-invariant \Rightarrow \ H_M is \Gamma-invariant and bounded.

Therefore f \bot H_M \ast f, i.e.

\int \bar{f(x_1)} \{ \int H_M(x_1, x_2) f(x_2) \ d \mu_{\alpha(x_1)}(x_2) \} d \mu(x_1) = 0

Since \mu = \int \mu_y d \nu(y), we get

\int (\bar{f} \otimes f) \cdot H_M \ d (\mu \times_Y \mu) = 0

Hence H_M \bot (\bar{f} \otimes f). For all \gamma \in \Gamma, \ \gamma (\bar{f} \otimes f) \bot \gamma H_M = H_M.

Since H = P \bar{f} is a limit of averages of \gamma (\bar{f} \otimes f), we get H \bot H_M.

0 = \int \bar{H} \cdot H_M = \int |H_M|^2 \ \Rightarrow \ H_M = \bar{0} for all M.

Hence H = \bar{0}. By C5, we obtain f = \bar{0}. Hence \{ H \ast f \ | \ H \in L^\infty (X \times_Y X) \cap \Gamma_{inv} (X \times_Y X), f \in L^2(X) \} contains a basis for L^2(X).

Definition: Let H be a subgroup of \Gamma, \alpha: (X, \mathcal{B}, \mu, \Gamma) \rightarrow ( Y, \mathcal{D}, \nu, \Gamma') is said to be compact relative to H if the extension \alpha: (X, \mathcal{B}, \mu, H) \rightarrow ( Y, \mathcal{D}, \nu, H') is compact.

Notes for my lecture on multiple recurrence theorem for weakly mixing systems – Part 2

Now we can start to prove the multiple recurrence theorem in the weak mixing case. Again the material is from Furstenberg’s book ‘Recurrence in Ergodic Theory and Combinatorial Number Theory’ which, very unfortunately, is hard to find a copy of.

Definition: A sequence (x_i) \subseteq X converges in density to x if there exists Z \subseteq \mathbb{N} of density 0 s.t. for every neighborhood U of x, \exists N \in \mathbb{N} with \{ x_n \ | \ n \geq N, \ n \notin Z \} \subseteq U.

We denote (x_n) \rightarrow_D \ x.

Theorem: Let (X, \mathcal{B}, \mu, T) be weakly mixing.

Then for all f_0, f_1, \cdots, f_k \in L^\infty(X), we have

\int f_0(x) f_1(T^n(x)) f_2(T^{2n}(x)) \cdots f_k(T^{kn}(x)) \ d \mu

\rightarrow_D \int f_0 \ d \mu \int f_1 \ d \mu \cdots \int f_k \ d \mu as n \rightarrow \infty

In particular, this implies that for any A \in \mathcal{B} with \mu(A)>0, by taking f_0 = f_1 = \cdots = f_k = \chi_A we have
\mu(A \cap T^{-n}(A) \cap \cdots \cap T^{-kn}(A)) \rightarrow_D \mu(A)^{k+1} > 0.
Hence we may pick N \in \mathbb{N} for which
\mu(A \cap T^{-N}(A) \cap \cdots \cap T^{-kN}(A)) > 0.

This establishes the multiple recurrence theorem.
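As a sanity check (my own, not in Furstenberg’s book): for the doubling map T x = 2x mod 1, which is mixing and hence weakly mixing, and A = [0, 1/2), membership of x, T^n x, T^{2n} x in A is decided by three distinct binary digits of x, so \mu(A \cap T^{-n}A \cap T^{-2n}A) = \mu(A)^3 = 1/8 exactly; a Monte Carlo estimate agrees.

```python
import random

random.seed(0)

def bit(x, j):
    """j-th binary digit of x in [0,1), 1-indexed."""
    return int(x * 2 ** j) % 2

N, n, hits = 200_000, 5, 0
for _ in range(N):
    x = random.random()
    # x in A ∩ T^{-n}A ∩ T^{-2n}A for A = [0,1/2) and T the doubling map:
    # equivalently, binary digits 1, n+1 and 2n+1 of x are all 0
    if bit(x, 1) == 0 and bit(x, n + 1) == 0 and bit(x, 2 * n + 1) == 0:
        hits += 1
assert abs(hits / N - 0.125) < 0.01               # mu(A)^3 = 1/8
```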

To prove the theorem we need the following:

Lemma 1: Let (f_n) be a bounded sequence in a Hilbert space \mathcal{H}. If \langle f_{n+m}, f_n \rangle \rightarrow_D a_m as n \rightarrow \infty and a_m \rightarrow_D 0 as m \rightarrow \infty, then (f_n) converges weakly in density to \overline{0}.

In order to prove lemma 1, we need:

Lemma 2: Given \{ R_q \ | \ q \in Q \} a family of density-1 subsets of \mathbb{N}, indexed by a density-1 set Q \subseteq \mathbb{N}. Then for all S \subseteq \mathbb{N} of positive upper density and all k \geq 1, there exist \{ n_1, n_2, \cdots, n_k \} \subseteq S, n_1 < n_2 < \cdots < n_k, such that whenever i<j, n_j - n_i \in Q and n_i \in R_{n_j-n_i}.

Proof of lemma 2: We’ll show this by induction. For k=1 there is no pair i<j, so the statement is vacuous.

We’ll proceed by induction: suppose for some k \geq 1, there exist S_k \subseteq S of positive upper density and integers m_1< \cdots < m_k such that (S_k+m_1) \cup (S_k+m_2) \cup \cdots \cup (S_k+m_k) \subseteq S and for all j>i, m_j - m_i \in Q and S_k+m_i \subseteq R_{m_j-m_i}.

For k+1, we shall find S_{k+1} \subseteq S_k with positive upper density and m_{k+1}>m_k where S_{k+1}+m_{k+1} \subseteq S and for all 1 \leq i \leq k, m_{k+1} - m_i \in Q and S_{k+1}+m_i \subseteq R_{m_{k+1}-m_i}.

Let S_k^* = \{n \ | \ (S_k+n) \cap S_k has positive upper density \}.

Claim: S_k^* has positive upper density.

Since \overline{D}(S_k) = \epsilon >0, let N = \lceil 1/ \epsilon \rceil.

Then there can be at most N translates of S_k that pairwise intersect in sets of density 0.

Let M \leq N be the largest number of such translates, and let p_1, \cdots, p_M be numbers for which (S_k+p_i) \cap (S_k+p_j) has density 0,
i.e. (S_k+(p_j-p_i)) \cap S_k has density 0.

Therefore for any p>p_M, (S_k+p-p_i) \cap S_k has positive upper density for some i, hence p-p_i \in S_k^*. Thus S_k^* is syndetic with gaps bounded by 2 \cdot p_M, and in particular has positive upper density.

Pick \displaystyle m_{k+1} \in S_k^* \cap \bigcap_{i=1}^k(Q+m_i).

(Hence m_{k+1}-m_i \in Q for each 1 \leq i \leq k)

Let \displaystyle S_{k+1} = (S_k - m_{k+1})  \cap S_k \cap \bigcap_{i=1}^k (R_{m_{k+1} - m_i}-m_i).

(S_k - m_{k+1}) \cap S_k has positive upper density and \bigcap_{i=1}^k (R_{m_{k+1} - m_i}-m_i) has density 1, hence S_{k+1} has positive upper density.

S_{k+1}, \ m_{k+1} satisfy the desired properties by construction. This finishes the induction.

Proof of lemma 1:

Suppose not. Then there exist \epsilon > 0 and f \neq \overline{0} such that

S = \{ n \ | \ \langle f_n, f \rangle > \epsilon \} has positive upper density.

Let \delta = \frac{\epsilon^2}{2||f||^2}; the set Q = \{m \ | \ a_m < \delta/2 \} has density 1.

\forall q \in Q, the set R_q = \{ n \ | \ \langle f_{n+q}, f_n \rangle < \delta \} has density 1.

Apply lemma 2 to Q, \{R_q \}, S, we get:

For all k \geq 1. There exists \{ n_1, n_2, \cdots, n_k \} \subseteq S, n_1 < n_2 \cdots < n_k such that whenever i<j , n_j - n_i \in Q and n_i \in R_{n_j-n_i}.

i) n_i \in S \ \Leftrightarrow \ \langle f_{n_i}, f \rangle > \epsilon

ii) n_i \in R_{n_j-n_i} \ \Leftrightarrow \ \langle f_{n_i}, f_{n_j} \rangle < \delta

Set g_i = f_{n_i} - \epsilon \cdot \frac{f}{||f||^2}. Hence for all 1 \leq i < j \leq k,

\langle g_i, g_j \rangle = \langle f_{n_i} - \epsilon \frac{f}{||f||^2}, f_{n_j} - \epsilon \frac{f}{||f||^2}\rangle < \delta - 2\cdot \frac{\epsilon^2}{||f||^2} + \frac{\epsilon^2}{||f||^2} =  \delta - \frac{\epsilon^2}{||f||^2}= -\delta.

On the other hand, since (f_n) is bounded in \mathcal{H}, \ (g_n) is also bounded, say ||g_n||< M for all n (independent of k).
Then we have
\displaystyle 0 \leq || \sum_{i=1}^k g_i ||^2 = \sum_{i=1}^k ||g_i ||^2 + 2 \cdot \sum_{i < j} \langle g_i, g_j \rangle \leq kM^2 - k(k-1) \delta

For large k, \ kM^2 - k(k-1) \delta<0, a contradiction.
Hence S must have density 0.

Proof of the theorem:
By corollary 2 of the theorem in part 1, since T is weakly mixing, T^m is weakly mixing for all m \neq 0.
We proceed by induction on l. For l=1, the statement is implied by our lemma 2 in part 1.

Suppose the theorem holds for l \in \mathbb{N}, let f_0, f_1, \cdots, f_{l+1} \in L^\infty(X),

Let C = \int f_{l+1} \ d \mu, \ f'_{l+1}(x) = f_{l+1}(x) - C.

By induction hypothesis, \int f_0(x) f_1(T^n(x)) f_2(T^{2n}(x)) \cdots f_l(T^{ln}(x)) \cdot C \ d \mu

\rightarrow_D \int f_0 \ d \mu \int f_1 \ d \mu \cdots \int f_l \ d \mu \cdot C as n \rightarrow \infty

Hence it suffices to show \int f_0(x) f_1(T^n(x)) f_2(T^{2n}(x)) \cdots f_l(T^{ln}(x)) \cdot f'_{l+1}(T^{(l+1)n}(x)) \ d \mu \rightarrow_D 0.

So W.L.O.G. we may assume \int f_{l+1} \ d\mu = 0.

For all n \in \mathbb{N}, set g_n (x)= f_1 \circ T^n(x) \cdot f_2 \circ T^{2n}(x) \cdots f_{l+1} \circ T^{(l+1)n}(x)

For each m \in \mathbb{N}, \ \forall \ 1 \leq i \leq l+1, let F^{(m)}_i (x)= f_i(x) \cdot f_i(T^{im}(x))

\langle g_{n+m}, g_n \rangle = \int (f_1(T^{n+m} (x) \cdots f_{l+1}(T^{(l+1)(n+m)} (x))) \cdot (f_1(T^n(x)) \cdots f_{l+1}(T^{(l+1)n} (x))) \ d\mu
= \int F^{(m)}_1(T^n(x)) \cdots F^{(m)}_{l+1}(T^{(l+1)n}(x)) \ d \mu

Since T^{(l+1)n} is measure preserving, we substitute y = T^{(l+1)n}(x),

= \int F^{(m)}_{l+1}(y) \cdot F^{(m)}_1(T^{-ln}(y)) \cdots F^{(m)}_l(T^{-n}(y)) \ d \mu

Applying the induction hypothesis to the weakly mixing transformation T^{-1} (re-enumerating the F^{(m)}_i accordingly):

\langle g_n, g_{n+m} \rangle \rightarrow_D ( \int F^{(m)}_1 \ d\mu) \cdots (\int F^{(m)}_{l+1} \ d\mu) as n \rightarrow \infty.

\int F^{(m)}_{l+1} \ d\mu = \int f_{l+1} \cdot f_{l+1} \circ T^{(l+1)m} \ d\mu

By lemma 2 in part 1, we have \int F^{(m)}_{l+1} \ d\mu \rightarrow_D 0 as m \rightarrow \infty.

We are now able to apply lemma 1 to (g_n), which gives (g_n) \rightarrow_D \overline{0} in the weak topology.

i.e. for any f_0, we have \int f_0(x) g_n(x) \ d \mu \rightarrow_D 0.

This establishes the induction step.

Remark: This works for any commutative group of weakly mixing transformations. i.e. if G is a commutative group of measure preserving transformations whose non-identity elements are all weakly mixing, and T_1, \cdots, T_k are distinct elements of G, then \int f_0(x) f_1(T_1^n(x)) f_2(T_2^n(x)) \cdots f_k(T_k^n(x)) \ d \mu \rightarrow_D \int f_0 \ d \mu \int f_1 \ d \mu \cdots \int f_k \ d \mu as n \rightarrow \infty.

Probability of leading N digits of 2^n

Okay, so there was this puzzle which pops out from the ergodic seminar a while ago:

What’s the probability for the leading digit of 2^N being k \in \{1,2, \cdots, 9 \} as N \rightarrow \infty?

It’s a cute classical question in ergodic theory, the answer is \log_{10}(k+1) - \log_{10}(k).

Proof: (all logs are taken in base 10)
Given a natural number N, let \log(N) = k+\alpha where k \in \mathbb{Z}, \ \alpha \in [0, 1). Since N = 10^{\log(N)} = 10^{k+\alpha} = 10^k \cdot 10^\alpha and 1 \leq 10^\alpha < 10, we see that the first digit of N is the integer part of 10^\alpha.

The first digit of 2^n is the integer part of 10^{ \log(2^n) \mod{1} } = 10^{ n \cdot \log(2) \mod{1}}.

For k \in \{1, 2, \cdots, 9 \}, leading digit of 2^n is k iff k \leq 10^{ n \cdot \log(2) \mod{1}} < k+1 iff \log(k) \leq n \cdot \log(2) \mod{1} < \log(k+1).

Let \alpha = \log(2), which is irrational, and let \varphi: S^1 \rightarrow S^1 be the rotation by \alpha (S^1 considered as \mathbb{R} / \mathbb{Z}, \varphi(x) = x+\alpha). All orbits of \varphi are uniformly distributed, i.e. for any arc A \subseteq S^1 and all x \in S^1, \displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \chi_A(\varphi^n(x)) = m(A)

In particular we have \displaystyle \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \chi_{[\log(k), \log(k+1))}(\varphi^n(0)) = m([\log(k), \log(k+1))) = \log(k+1) - \log(k)

Therefore the limiting probability of first digit of 2^n being k is \log(k+1) - \log(k).
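A quick numerical check of this (assuming nothing beyond the argument above): track the orbit of 0 under the rotation by \log_{10}(2) and read off the leading digit from the fractional part.

```python
from math import log10

N = 100_000
counts = [0] * 11                      # index 10 guards against boundary rounding
for n in range(1, N + 1):
    frac = (n * log10(2)) % 1.0        # orbit of 0 under rotation by log10(2)
    counts[int(10 ** frac)] += 1       # leading digit of 2^n
for k in range(1, 10):
    # empirical frequency vs. log10(k+1) - log10(k)
    assert abs(counts[k] / N - (log10(k + 1) - log10(k))) < 0.01
```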

To generalize, Pengfei asked: given some two-digit number K, what’s the probability of the first two digits being K?

The natural thing to do is to work in base 100; however, one soon figures out there is a problem, since we don’t really want to count “0x” as the first two digits when the number of digits is odd.

I found the following trick being handy:

When the number of digits is odd, we may consider the orbit of \log_{100}(10) = 1/2 under the rotation by \log_{100}(2). This gives the first base-100 digit of 2^n \cdot 10, which has an even number of digits precisely when 2^n has an odd number of digits, and whose first two digits are the same as those of 2^n. Since this orbit is also uniformly distributed, the probability of 2^n having an odd number of digits and first two digits K \ (10 \leq K < 100) is \log_{100}(K+1) - \log_{100}(K).

Applying the usual procedure to the orbit of 0 in base 100 gives the probability of 2^n having an even number of digits and first two digits K, which is also \log_{100}(K+1) - \log_{100}(K).

Hence the actual probability of 2^n starting with K is the sum of the two, i.e. 2 \cdot (\log_{100}(K+1) - \log_{100}(K)) = \log_{10}(K+1) - \log_{10}(K).
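This combined answer can be checked numerically the same way; note that 2 \cdot (\log_{100}(K+1) - \log_{100}(K)) = \log_{10}((K+1)/K). Taking K = 37 as an illustrative value:

```python
from math import log10

N, K = 100_000, 37
hits = 0
for n in range(4, N + 4):              # start at n = 4 so that 2^n has >= 2 digits
    frac = (n * log10(2)) % 1.0
    if int(10 ** (1 + frac)) == K:     # first two digits of 2^n
        hits += 1
# frequency of leading digits 37 vs. log10(38/37) = 2*(log100(38) - log100(37))
assert abs(hits / N - log10((K + 1) / K)) < 0.005
```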

The same works for the distribution of the first n digits: taking the number of digits mod n, we sum the probability \log_{10^n}(K+1) - \log_{10^n}(K) exactly n times for each 10^{n-1} \leq K < 10^n, so the limiting probability is n \cdot (\log_{10^n}(K+1) - \log_{10^n}(K)).

Remark: One can first calculate the probability of 2^n having an odd number of digits. This corresponds to the orbit of 0 under the rotation by \log_{100}(2) landing inside the interval [0, \log_{100}(10)), which is [0, \frac{1}{2}). The limiting probability is 1/2 (makes sense, since this says about half of the time the number of digits is odd).

In general, the probability of the number of digits being \equiv k \mod{n} is 1/n for each k.

For some reason, Professor Kra was interested in figuring out the distribution of the ‘middle’ digit… which I’m not exactly sure how one would define.