Ultrametrics and the nonlinear Dvoretzky problem

Hi guys~ The school year here at Princeton is finally (gradually) starting. So I’m back to this blog :-P

In this past week before anything has started, Assaf Naor came here and gave a couple rather interesting talks, which I decided to make a note about. For the complete content of the talk please refer to this paper by Naor and Mendel. As usual, I make no attempt on covering what was written on the paper nor what was presented in the talk, just some random pieces and bits which I found interesting.

A long time ago, Grothendieck conjectured what is now known as Dvoretzky’s theorem:

Theorem: For any D>1, for any k \in \mathbb{N}, there exists N depending only on k, D such that any normed vector space of dimension N or larger has a k-dimensional linear subspace that embeds (via a linear transformation) with distorsion at most D into the Hilbert space.

i.e. this says in the case where D is close to 1 (let’s write D = 1+\varepsilon) for any given k, *any* norm on a vector space of sufficiently high dimension will have a k dimensional subspace that looks almost Eculidean (meaning unit ball is round up to a multiple 1+\varepsilon).

Well, that’s pretty cool, but linear. As we all know, general metric spaces can be much worse than normed vector spaces. So here comes a nonlinear version of the above, posted by Terrence Tao:

Nonlinear Dvoretzky problem: Given \kappa>0, D>1, does there exist a \alpha so that every metric space of Hausdorff dimension \geq \alpha has a subset S of Hausdorff dimension \kappa that embeds into the Hilbert space with distorsion D?

Indeed, we should expect this to be a lot more general and much harder than the linear version, since we know literally nothing about the space except for having large Hausdorff dimension and as we know metric spaces can be pretty bizarre. That why we should find the this new result of theirs amazing:

Theorem 1 (Mendel-Naor): There is a constant C such that for any \lambda \in (0,1), every compact metric space with dimension \alpha has a closed subset S of dimension at least (1-\lambda) \alpha that admits an embedding into Hilbert space with distorsion C/ \lambda.

Note that, in the original linear problem, N = N(k, D) of course blows up to infinity whenever k \rightarrow \infty or D \rightarrow 1. (whenever we need a huge dimensional space with fixed ‘flatness’ or a fixed dimension but ‘super-flat’ subspace). That is, when we are looking for subspaces inside a random space with fixed (large) dimension N, the larger dimension we need, the less flat (more distorsion) we can expect it to be. i.e.

k \rightarrow N necessarily forces D \rightarrow \infty and
D \rightarrow 1 forces k \rightarrow 0.

We can see that this nonlinear theorem is great when we need large dimensional subspaces (when \lambda is close to 0), but not so great when we want small distorsion (it does not apply when distorsion is smaller than C, and blows up when it gets close to C).

In the original statement of the problem this gives not only that a large \alpha exists, but \alpha(\kappa, D) \leq \frac{D}{D-C} \kappa! In fact, when we are allowing relatively large distorsion (compare to constant C) this theorem is sharp:

Theorem 2 (Mendel-Naor): There exists constant C' and a family of metric spaces \{ X_\alpha \ | \ \alpha \in \mathbb{R}^+ \} such that for any small \varepsilon, no subset of X_\alpha with dimension \geq (1-\varepsilon)\alpha embeds into Hilbert space with distorsion \leq C'/\varepsilon.

This construction makes use of our all-time favorite: expander graphs! (see pervious post for definitions)

So what if D is small? Turns out, unlike in the linear case when D<2, there is no \alpha(\kappa, D), in the paper they also produced spaces X_\alpha for each \alpha where the only subset that embeds with distorsion < 2 are of Hausdorff dimension 0!

In his words, the proof of theorem 1 (i.e. the paper) takes five hours to explain, hence I will not do that. But among many neat things appeared in the proof, instead of embedding into the Hilbert space directly they embedded it into an ultrametric space with distorsion D, and then make use of the fact that any such space sits in the Hilbert space isometrically.

Definition: An ultrametric on space X is a metric with the additional property that for any three points x, y, z \in X, we have the so-called strong triangle inequality, i.e.

d(x, y) \leq \max \{ d(x, z), d(y,z) \}

If one pause and think a bit, this is a quite weird condition: for example, all triangles in an ultrametric space are isosceles with the two same length edge both longer than the third edge; every point is the center of the ball, etc.

Exercise: Come up with an example of an ultrametric that’s not the discrete metric, on an infinite space.

Not so easy, hum? My guess: the first one you came up with is the space of all words in 2-element alphabet \Sigma_2, with the dictionary metric (two points are distance 2^{-n} apart where n is the first digit they differ), right?

In fact in some sense all ultrametrics look like that. (i.e. they are represented by an infinite tree with weights assigned to vertices, in the above case a 2-regular tree with equal weights on same level) Topologically our \Sigma_2 is a Cantor set and sits isometrically in l_2. It’s not hard to see in fact all such space embeds isometrically in l_2.

I just want to say that this operation of first construct an ultrametric space according to our given metric space, embed, and then embed the ultrametric metric space into the Hilbert space somehow reminds me of when we want to construct a measure satisfying a certain property, we first construct it on a Cantor set, then divide up the space and use the values on parts of the Cantor set for parts of the space…On this vein the whole problem gets translated to producing a tree that gives the ultrametric and work with the tree (by no means simple, though).

Finally, let’s see a cute statement which can be proved by applying this result:

Urbanski’s conjecture: Any compact metric space of Hausdorff dimension > n can be mapped surjectively [thanks to Tushar Das for pointing out the typo] onto the unit ball B_1^n by a Lipschitz map.

(note strictly larger is important for our theorem to apply…since we need to subtract an \varepsilon) and hence another cute statement that’s still not known:

Question: Does any compact metric space with positive Hausdorff n-dimensional measure maps Lipschitzly onto B_1^n?

The Carnot-Carathéodory metric

I remember looking at Gromov’s notes “Carnot-Carathéodory spaces seen from within” a long time ago and didn’t get anywhere. Recently I encountered it again through professor Guth. This time, with his effort in explaining, I did get some ideas. This thing is indeed pretty cool~ So I decided to write an elementary introduction about it here. We will construct a such metric in \mathbb{R}^3.

In general, if we have a Riemannian manifold, the Riemannian distance between two given points p, q is defined as

\inf_\Gamma(p,q) \int_0^1||\gamma'(t)||dt

where \Gamma is the collection of all differentiable curves \gamma connecting the two points.

However, if we have a lower dimensional sub-bundle E(M) of the tangent bundle (depending continuously on the base point). We may attempt to define the metric

d(p,q) = \inf_{\Gamma'} \int_0^1||\gamma'(t)||dt

where \Gamma' is the collection of curves connecting p, q with \gamma'(t) \in E(M) for all t. (i.e. we are only allowed to go along directions in the sub-bundle.

Now if we attempt to do this in \mathbb{R}^3, the first thing we may try is let the sub-bundle be the say, xy-plane at all points. It’s easy to realize that now we are ‘stuck’ in the same height: any two points with different z coordinate will have no curve connecting them (hence the distance is infinite). The resulting metric space is real number many discrete copies of \mathbb{R}^2. Of course that’s no longer homeomorphic to \mathbb{R}^3.

Hence for the metric to be finite, we have to require accessibility of the sub-bundle: Any point is connected to any other point by a curve with derivatives in the E(M).

For the metric to be equivalent to our original Riemannian metric (meaning generate the same topology), we need E(M) to be locally accessible: Any point less than \delta away from the original point p can be connected to p by a curve of length < \varepsilon going along E(M).

At the first glance the existence of a (non-trivial) such metric may not seem obvious. Let’s construct one on \mathbb{R}^3 that generates the same topology:

To start, we first identify our \mathbb{R}^3 with the 3 \times 3 real entry Heisenberg group H^3 (all 3 \times 3 upper triangular matrices with “1”s on the diagonal). i.e. we have homeomorphism

h(x,y,z) \mapsto \left( \begin{array}{ccc}  1 & x & z \\  0 & 1 & y \\  0 & 0 & 1 \end{array} \right)

Let g be a left-invariant metric on H_3.

In the Lie algebra T_e(H_3) (tangent space of the identity element), the elements X = \left( \begin{array}{ccc}  1 & 1 & 0 \\  0 & 1 & 0 \\  0 & 0 & 1 \end{array} \right) , Y = \left( \begin{array}{ccc}  1 & 0 & 0 \\  0 & 1 & 1 \\  0 & 0 & 1 \end{array} \right) and Z = \left( \begin{array}{ccc}  1 & 0 & 1 \\  0 & 1 & 0 \\  0 & 0 & 1 \end{array} \right) form a basis.

At each point, we take the two dimensional sub-bundle E(H_3) of the tangent bundle generated by infinitesimal left translations by X, Y. Since the metric g is left invariant, we are free to restrict the metric to E(M) i.e. we have ||X_p|| = ||Y_p|| = 1 for each p \in M.

The interesting thing about H_3 is that all points are accessible from the origin via curves everywhere tangent to E(H_3). In other words, any points can be obtained by left translating any other point by multiples of elements X and Y.

The “unit grid” in \mathbb{R}^3 under this sub-Riemannian metric looks something like:

Since we have

\left( \begin{array}{ccc}  1 & x & z \\  0 & 1 & y \\  0 & 0 & 1 \end{array} \right) \left( \begin{array}{ccc}  1 & 1 & 0 \\  0 & 1 & 0 \\  0 & 0 & 1 \end{array} \right) = \left( \begin{array}{ccc}  1 & x+1 & z \\  0 & 1 & y \\  0 & 0 & 1 \end{array} \right) ,

the original x-direction stay the same, i.e. a bunch of horizontal lines connecting the original yz planes orthogonally.

However, if we look at a translation by Y, we have

\left( \begin{array}{ccc}  1 & x & z \\  0 & 1 & y \\  0 & 0 & 1 \end{array} \right) \left( \begin{array}{ccc}  1 & 0 & 0 \\  0 & 1 & 1 \\  0 & 0 & 1 \end{array} \right) = \left( \begin{array}{ccc}  1 & x & z+x \\  0 & 1 & y+1 \\  0 & 0 & 1 \end{array} \right)

i.e. a unit length Y-vector not only add a 1 to the y-direction but also adds a height x to z, hence the grid of unit Y vectors in the above three yz planes look like:

We can now try to see the rough shape of balls by only allowing ourselves to go along the unit grid formed by X and Y lines constructed above. This corresponds to accessing all matrices with integer entry by words in X and Y.

The first question to ask is perhaps how to go from (0,0,0) to (0,0,1). –since going along the z axis is disabled. Observe that going through the following loop works:

We conclude that d_C((0,0,0), (0,0,1)) \leq 4 in fact up to a constant going along such loop gives the actual distance.

At this point one might feel that going along z axis in the C-C metric is always takes longer than the ordinary distance. Giving it a bit more thought, we will find this is NOT the case: Imagine what happens if we want to go from (0,0,0) to (0,0,10000)?

One way to do this is to go along X for 100 steps, then along Y for 100 steps (at this point each step in Y will raise 100 in z-coordinate, then Y^{-100} X^{-100}. This gives d_C((0,0,0), (0,0,10000)) \leq 400.

To illustrate, let’s first see the loop from (0,0,0) to (0,0,4):

The loop has length 8. (A lot shorter compare to length 4 for going 1 unit in z-direction)

i.e. for large Z, it’s much more efficient to travel in the C-C metric. d_C( (0,0,0), (0,0,N^2)) = 4N

In fact, we can see the ball of radius N is roughly an rectangle with dimension R \times R \times R^2 (meaning bounded from both inside and outside with a constant factor). Hence the volume of balls grow like R^4.

Balls are very “flat” when they are small and very “long” when they are large.

A convergence theorem for Riemann maps

So~ After 2.5 weeks of wonderful math discussions with Amie and Charles, I finished my winter vacation and got back to Princeton! (and back to my normal blogging Sundays ^^)

One thing I would like to shear here is that we (me and Charles) finally got an answer to the following question that’s been haunting me for a while:

Question: Given Jordan curve C \subseteq \mathbb{C} containing a neighborhood of \bar{0} in its interior. Given parametrizations \gamma_1:S^1 \rightarrow C.

Is it true that for all \varepsilon >0, there exists \delta >0 s.t. any Jordan curve C' with a parametrization \gamma_2:S^1 \rightarrow C_2 so that ||\gamma_1-\gamma_2||<\delta in the uniform norm implies the Riemann maps R, R' from \mathbb{D} to the interiors of C, C' that fixes the origin and have positive real derivatives at \bar{0} would be at most \varepsilon apart?

i.e. Is the projection map from the space of parametrized Jordan curves (with the uniform metric) to the space of unparametrized Jordan curves (with metric given by taking uniform distance between the canonical Riemann maps) continuous?

First, I think the development and problem-solving process for this one is quite interesting and worth mentioning (skip this if you just want to see math):

—Begin story—

The problem was initially of interest because I stated a lemma on our Jordan curves paper which asserts the above projection map is continuous at smooth curves. To my surprise, I was unable to prove this seemingly-direct lemma. I turned to Charles for help, after a day or so of thinking he proved it for smooth curves (via a very clever usage of cross-cuts as in the proof of Carathedory’s theorem) and asked back whether the map is actually continuous at all points.

This seemed to be such a natural question but we couldn’t find it in the literature. For a day or so we were both feeling negative about this since the cross-cut method fails when the Jordan curve has positive measure, which “should” happen a lot. In any case, I posted a question on mathoverflow to see if there is a standard theorem out there implying this. Almost right after I posted the question, during a wonderful lunch-conversation with Charles, I got this wonderful idea of applying extremal length techniques not to the semi-circular crosscut but only to the ‘feet’ of it. Which later that day turned out to be a proof of the continuity.

The next morning, after confirming the steps of the proof and made sure it works, I was thrilled to find that Thurston responded to the post and explained his intuition that the answer is positive. Although having solved the problem already, I am still amazed by his insights ^^ (It’s the second question I asked there, he left an comment again! It just feels great to have your idol giving you ideas, isn’t it? :-P)

Later on, McMullen pointed out to us that in fact a book by Pommerenke contains the result. Nevertheless, it was great fun proving this, hence I decided to sketch the proof here ^^

—End story—

Ingredients of the proof: We quote the following well-known but weaker theorem (one can find this in, for example Goluzin’s classical book, p228)

Theorem: If the Jordan domains converge (in the sense that parametrizations of the boundaries converge uniformly) then the Riemann maps converge uniformly on compact sets.

We also use the following topological lemma:

Lemma: Given Jordan curve C \subseteq \hat{\mathbb{C}}, \gamma: S^1 \rightarrow C be a parametrization. For all \varepsilon > 0, there exists \mu >0 s.t. for all \gamma' : S^1 \rightarrow \hat{\mathbb{C}} with || \gamma - \gamma'|| < \mu ( denote C' = \gamma'(S^1)$) , for all p, q \in C', d(p,q) < \mu \Rightarrow \mbox{diam}(A(p,q)) < \varepsilon

where A(p,q) is the short arc in C' connecting p, q.

The proof of the lemma is left as an exercise

Proof of the Theorem:

Given C and \varepsilon as in the theorem, apply the lemma to (C, \varepsilon/6), we obtain a \mu < \varepsilon / 6 so that all curves \mu-close to C has the property that the arc connecting any two points less than \mu-apart has diameter no more than \varepsilon/100.

By compactness of \partial \mathbb{D}, we can choose finitely many crosscut neignbourhoods \{ H_1, H_2, \cdots, H_N \}, H_i \subseteq \bar{\mathbb{D}} are "semi-discs" around points in \partial \mathbb{D} as shown:

By extremal length, we can choose the cross-cuts C_i bounding H_i with length \ell(R(C_i)) < \mu/4 where R: \bar{\mathbb{D}} \rightarrow \hat{\mathbb{C}} is the canonical Riemann map corresponding to C. Hence by lemma, we also get \mbox{diam}(R(H_i) < \varepsilon/3.

Let \{ f_1, f_2, \cdots, f_{2N} \} be endpoints of \{C_1, \cdots, C_N \}.

Let d = \min \{ d(f_i, f_j) \ | \ 1 \leq i < j \leq 2N \}.

Choose \sigma < \mu d / 40 and \{ B( \bar{0}, 1-2\sigma), H_1, \cdots, H_N \} covers \bar{\mathbb{D}}. Let R = 1-\sigma:

By the above theorem in Goluzin, since B_R = \bar{B(0, R)} is compact, there exists a 0 < \delta < \min \{\mu/4, d/10 \} s.t.

|| \gamma' - \gamma || < \delta \Rightarrow ||R|_{B_R} - R'|_{B_R}|| < \mu/4.

Fix a (C', \gamma') with || \gamma - \gamma'|| < \delta. Let R' be the canonical Riemann map corresponding to C'.

Claim: ||R-R'|| < \varepsilon.

First note that assuming the theorem in Goluzin, it suffice to show ||R|_{\partial \mathbb{D}} - R'|_{\partial \mathbb{D}}|| < \varepsilon.

For any 1 \leq i \leq N, let f_1, f_2 be endpoints of C_i. Apply the extremal length to the set of radial segments in the almost-rectangle [f_1, f_1+d/10] \times [0,\sigma].

We conclude there exists e_1 \in [f_1, f_1+d/10] s.t. the segment s_1 = \{e_1\} \times [0, \sigma] has length

\ell(R'(s_1)) \leq 2 \sigma (d/10) m_2(R'([f_1, f_1+d/10] \times [0,\sigma])).

Since \sigma < \mu d / 40 and m_2(R'([f_1, f_1+d/10] \times [0,\sigma])) \leq 1, we have

\ell(R'(s_1)) \leq \mu/4.

Similarly, find e_2 \in [f_2 - d/10, f_2] where \ell(R'(s_2)\leq \mu/4.

Connect e'_1, e'_2 by a semicircle contained in H_i, denote the enclosed region by V_i \subseteq H_i.

By construction, \{ B_R, V_1, \cdots, V_N \} still covers \bar{\mathbb{D}}.

Hence for all p \in \partial \mathbb{D}, there exists i where p \in latex V_i$.

Since inside V_i \cap B_R the two maps R, R' are less than d/10 apart, we have R(V_i) \cap R'(V_i) \neq \phi.

Hence d(R(p), R'(p)) \leq \mbox{diam}(R(H_i)) + \mbox{diam}(R'(V_i)).

By construction, \mbox{diam}(R(H_i)) < \varepsilon/2.

\mbox{diam}(R'(V_i)) = \mbox{diam}(\partial V_i), we will break \partial V_i into three parts and estimate diameter of each part separately.

Since ||\gamma-\gamma'|| < \delta, \tau = \gamma' \circ \gamma^{-1} \circ R|_{\partial \mathbb{D}} is another parametrization of C' with || \tau - R|_{\partial \mathbb{D}}|| < \delta.

The arc connecting e'_1 to e'_2 is contained in B_R \cap V_i, the arc in C' connecting \tau(e_1), \tau(e_2) is \delta away from R(H_i) hence the union of the two has diameter at most \mbox{diam}(R(V_i)) + \delta < \varepsilon/6 + \varepsilon/6 = \varepsilon/3

Length of the arcs R'(s_1), R'(s_2) are less than \mu/4 < \varepsilon/12.

Hence d(\tau(e_1), R'(e_1)) < \ell(R'(s_1)) + \delta < \mu. By lemma, this implies the arc in C' connecting \tau(e_1), R'(e_1) has length at most \varepsilon/12.

Hence altogether the we have \mbox{diam}(R'(V_i)) \leq \varepsilon/3+\varepsilon/12+\varepsilon/12 = \epsilon/2.

We deduce d(R(p), R'(p)) \leq \mbox{diam}(R(H_i)) + \mbox{diam}(R'(V_i)) < \varepsilon.

Q.E.D.