The travelling salesman problem

This is one of those items I should have written about long ago: I first heard about it over a lunch chat with professor Guth; then I was in not one, but two different talks on it, both by Peter Jones; and now, finally, after it appeared in this algorithms lecture by Sanjeev Arora I happen to be in, I decided to actually write the post. Anyways, it seem to live everywhere around my world, hence it’s probably a good idea for me to look more into it.

Has everyone experienced those annoy salesman who keeps knocking on you and your neighbors’ doors? One of their wonderful properties is that they won’t stop before they have reached every single household in the area. When you think about it, in fact this is not so straight foreword to do; i.e. one might need to travel a long way to make sure each house is reached.

Problem: Given N points in \mathbb{R}^2, what’s the shortest path that goes through each point?

Since this started as an computational complexity problem (although in fact I learned the analysts’s version first), I will mainly focus on the CS version.

Trivial observations:

In total there are about N! paths, hence the naive approach of computing all their lengths and find the minimum takes more than N! \sim (N/e)^N time. (which is a long time)

This can be easily improved by a standard divide and concur:

Let V \subseteq \mathbb{R}^2 be our set of points. For each subset S \subseteq V, for each p_1, p_2 \in S, let F(S, p_1, p_2) = length of shortest path from p_1 to p_2 going through all points in S.

Now the value of this F can be computed recursively:

\forall |S| = 2, F(S, p_1, p_2) = d(p_1, p_2);

Otherwise F(S, p_1, p_2) =

\min \{ F(S \backslash \{p_2\}, p_1, q) + d(q, p_2) \ | \ q \in S, q \neq p_1, p_2 \}

What we need is minimum of F(V, p_1, p_2) where p_1, p_2 \in V. There are 2^N subsets of V, for any subset there are \leq N choices for each of p_1, p_2. F(S, p_1, p_2) is a minimum of N numbers, hence O(N) time (as was mentioned in this pervious post for selection algorithm). Summing that up, the running time is 2^N\times N^2\times N \sim O(2^N), slightly better than the most naive way.

Can we make it polynomial time? No. It’s well known that this problem is NP-hard, this is explained well in the wikipedia page for the problem.

Well, what can we do now? Thanks to Arora (2003), we can do an approximate version in polynomial time. I will try to point out a few interesting ideas from that paper. The process involved in this reminded me of the earlier post on nonlinear Dvoretzky problem (it’s a little embracing that I didn’t realize Sanjeev Arora was one of the co-authors of the Dvoretzky paper until I checked back on that post today! >.< ) it turns out they have this whole program about ‘softening’ classic problems and produce approximate versions.

Approximate version: Given N points in \mathbb{R}^2, \forall \varepsilon > 0, find a path \gamma through each point such that length l(\gamma) < (1+\varepsilon)l(\mbox{Opt}).

Of course we shall expect the running time T to be a function of \varepsilon and N, as \varepsilon \rightarrow 0 it shall blow up (to at least exponential in N, in fact as we shall see below, it will blow up to infinity).

The above is what I would hope is proved to be polynomial. In reality, what Arora did was one step more relaxed, namely a polynomial time randomized approximate algorithm. i.e. Given V and \varepsilon, the algorithm produces a path \gamma such that E(l(\gamma)-l(\mbox{Opt}) < \varepsilon). In particular this means more than half the time the route is within (1+\varepsilon) to the optimum.

Theorem (Arora ’03): T(N, \varepsilon) \sim O(N^{1/\varepsilon}) for the randomized approximate algorithm.

Later in that paper he improved the bound to O(N \varepsilon^{C/\varepsilon}+N\log{N}), which remains the best know bound to date.

Selected highlights of proof:

One of the great features in the approximating world is that, we don’t care if there are a million points that’s extremely close together — we can simply merge them to one point!

More precisely, since we are allowing a multiplicative error of \varepsilon, we also have trivial bound l(\mbox{Opt}) > \mbox{    diam}(V), Hence the length can be increased by at least \varepsilon \mbox{diam}(V) which means if we move each point by a distance no more than \varepsilon \mbox{diam}(V) / (4N) and produce a path \gamma' connecting the new points with l(\gamma')< (1+\varepsilon/2)l(\mbox{Opt}), then we can simply get our desired \gamma from \gamma', as shown:

i.e. the problem is “pixelated”: we may bound V in a square box with side length \mbox{diam}(V), divide each side into 8N/\varepsilon equal pieces and assume all points are in the center of the gird cell it lies in (for convenience later in the proof we will assume 8N/\varepsilon = 2^k is a power of 2, rescale the structure so that each cell has side length 1. Now the side length of the box is 8N/\varepsilon = 2^k):

Now we do this so-called quadtree construction to separate the points (reminds me of Whitney’s original proof of his extension theorem, or the diatic squares proof of open sets being countable) i.e. bound V in a square box and keep dividing squares into four smaller ones until no cell contains more than one point.

In our case, we need to randomize the quad tree: First we bound V in a box that’s 4 times as large as our grid box (i.e. of side length 2^{k+1}), shift the larger box by a random vector (-i/2^k,-j/2^k) and then apply the quad tree construction to the larger box:

At this point you may wonder (at least I did) why do we need to pass to a larger square and randomize? From what I can see, doing this is to get

Fact: Now when we pick a grid line at random, the probability of it being an ith level dividing line is 2^i/2^k = 2^{i-k}.

Keep this in mind.

Note that each site point is now uniquely defined as an intersection of no more than k nesting squares, hence the total number of squares (in all levels) in this quad tree cannot exceed N \times k \sim N \times \log{N/\varepsilon}.

Moving on, the idea for the next step is to perturb any path to a path that cross the sides of the square at some specified finite set of possible “crossing points”. Let m be the unique number such that 2^m \in [(\log N)/\varepsilon, 2 (\log N)/ \varepsilon ] (will see this is the best m to choose). Divide sides of each square in our quad tree into 2^m equal segments:

Note: When two squares of different sizes meet, since the number of equally squares points is a power of 2, the portals of the larger square are also portals of the smaller one.

With some simple topology (! finally something within my comfort zone :-P) we may assume the shortest portal-respecting path crosses each portal at most twice:

In each square, we run through all possible crossing portals and evaluate the shortest possible path that passes through all sites inside the square and enters and exists at the specified nodes. There are (2^{4 \times 2^m})^2 \sim (side length)^2 \sim (N/\varepsilon)^2 possible entering-exiting configurations, each taking polynomial time in N (in fact \sim N^{O(1/\varepsilon)} time) to figure out the minimum.

Once all subsquares has their all paths evaluated, we may move to the one-level larger square and spend another \log(N/\varepsilon) \times (N/\varepsilon)^2 operations. In total we have

N \times \log{N/\varepsilon} \times (N/\varepsilon)^2 \times N^{O(1/\varepsilon)}

\sim N^{O(1/\varepsilon)}

which is indeed polynomial in N/\varepsilon many operations.

The randomization comes in because the route produced by the above polynomial time algorithm is not always approximately the optimum path; it turns out that sometimes it can be a lot longer.

Expectation of the difference between our random portal respecting minimum path \mbox{OPT}_p and the actual minimum \mbox{OPT} is bounded simply by the fact that minimum path cannot cross the grid lines more that \mbox{OPT} times. At each crossing, the edge it crosses is at level i with probability 2^{i-k}. to perturb a level i intersection to a portal respecting one requires adding an extra length of no more than 2 \times 2^{k-i}/2^m \sim 2^{k+1-i}/(\log N / \varepsilon):

\displaystyle \mathbb{E}_{a,b}(\mbox{OPT}_p - \mbox{OPT})

\leq \mbox{OPT} \times \sum_{i=1}^k 2^{i-k} \times 2^{k+1-i} / (\log N / \varepsilon)

= \mbox{OPT} \times 2 \varepsilon / \log N < \varepsilon \mbox{OPT}

P.S. You may find images for this post being a little different from pervious ones, that’s because I recently got myself a new iPad and all images above are done using iDraw, still getting used to it, so far it’s quite pleasant!

Bonus: I also started to paint on iPad~

–Firestone library, Princeton. (Beautiful spring weather to sit outside and paint from life!)

Adding faulted data to produce accurate results, faster

In computer algorithms, there is this thing called ‘soft heap‘, due to Princeton computer scientist Bernard Chazelle (2000), which is basically corrupting a certain percentage of data when building the data structure. When first heard of that, I can’t imagine how on earth could such a thing be anywhere useful.

Indeed it is, and the reason it works is quite interesting to me. Hence I’m writing this post about finding the (real) k-th smallest element in a set via this corrupted data structure. By the way, this was presented to me in a very interesting lecture by Robert Tarjan. For you computer scientists: be warned, I really don’t know much about the subject.

Part 1: Heap

Naively sorting a set (i.e. compare the first two elements, put the smaller first; compare the 2nd and 3rd, move the third past the 2nd if it’s smaller; then compare the 3rd and the first, etc.) could take \sum_{i=1}^N \sim O(N^2) many comparisons. One can of course do better, the current known fastest sort takes O(N \log N) operations.

One way to achieve O(N \log N) is the so-called heap sort:

A heap is a binary tree with a number (key) in each node and the child of a node always has key larger than that of the node. (i.e. the minimum element is at the root)

Now the problem of sorting breaks into two parts:

i) build a heap with our given keys
ii) sort elements in a heap

As we can see, if we manage to build a heap with all branches having roughly the same length (computer scientist call that a balanced binary tree), there will be 2^n elements at level n. The height of the heap is \log_2 (N+1) \sim O(\log N). Main idea of the sort is to go through all elements and aim for doing a constant times the height many operations at each element (then we would get our N \log N).

Building the heap: Suppose we already have the first n element put in a heap, occupying all nodes in the first k levels and some nodes in the k+1-th level. We put the n+1-th element at the first empty node on the k+1-th level; compare it to its parent, swap if it’s smaller than it’s parent; compare to its parent again… Do this until it’s larger than it’s parent. Note that 1) swap does not change the shape of the tree. 2) when the process stops, we end up with a new heap. Hence inserting this new element takes at most twice the height many operations (compare and swap once at each level).

In total, building the heap takes no more than N \times 2 \log N \sim O(N \log N) operations. (in fact one can do this in O(N) time, but for our purpose it’s not needed)

Sort from the heap: Take the root as the minimum element, delete it from the heap; now compare the two elements in level 1, put the smaller into the root, now compare the two children of the empty node, etc. (if at some stage the empty node has only one child then just move it, stop if the empty node has no child.) At the end, delete the leaf node that’s empty. Again this process takes no more than
twice the height of the tree many operations.

Combining the above two steps, we have N \log N operations.

As one can guess heaps does much more than sorting stuff, once we spent our time to build data into a heap, we can save such structure and update it once the data set changes:

Inserting element: \sim O(\log N) (put the element at the next available leaf and move upwards)
Find minimum: \sim O(1) (well, the min is at the root)
Delete minimum: \sim O(\log N) (Delete, move up the smaller level-2 element, etc.)

OK~ so much for that, let’s move to the more interesting business:

Part 2: Error

The idea is: given a set of data with keys, instead of putting them into a heap, we put them into another structure (i.e. ‘soft heap’) such that the structure assigns a fake key every element and treat it as the real key. It does so in a way that:

i) All elements have fake keys larger than or equal to their real key.
ii) Only a certain percentage (\varepsilon) of elements have fake key different (larger than) real key.

The advantage of doing that is, unlike heap, it now takes only \sim O(1) (amortized) time to delete the minimum element, while building it still takes O(N) time. (Explaining exactly how this is done requires another post which I might write…for now, check out Chazelle ’00.)

Meanwhile, I guess the most curious thing is, how can we get any uncorrupted results with a corrupted data structure? As an example, let’s see how to find the (genuine) medium with data structure having \varepsilon precent of corrupted keys:

The mean observation is that, since all fake keys are larger than real keys, by deleting the smallest 1/2-\varepsilon of elements (according to fake keys), the genuine medium will be larger than all deleted elements (in real key) and hence surely remains in the set.

Now we have reduced the problem to finding the N\varepsilon/2-th smallest element in a set of cardinality (1/2+\varepsilon)N. Of course we can solve this by putting the elements in a reversed soft heap (i.e. all real keys are changed to their negative value), delete the smallest (1/2+\varepsilon) (fake key) elements, etc.

At each step, we reduce the size of the data set by at least a factor of (1/2+\varepsilon), and the amortized running time at each step is O(n) (building the soft heap) + O(n) (deleting about half of the elements) which is, in total \sim O(n).

Since the size shrinks in constant factor, the total running time is simply a geometric series, hence \sim O(N).


1. In the above argument, \varepsilon can be anything less than 1/2 hence in fact quite a lot of data are allowed to be corrupted!

2. It doesn’t matter whether we are finding the medium or the k-th smallest element, indeed, after the first step in the above argument we are literally finding the k-th smallest element. The general rule being simply taking the negative of the keys before building the soft heap whenever k<n/2.

3. The problem of finding the k-th smallest element is very well studied and algorithms that do this are called selection algorithms. Obviously our O(N) running time is optimum since it takes that much time to read off the data set! However, there are many other ways to achieve this linear running time, some are not even amortized, but as far as I know, one cannot do this with a real heap.