How is Seam Carving like Latent Semantic Indexing?
In this post I’m going to do a quick overview of seam carving and how it looks just like dimensional reduction, an operation particularly useful in applications like search engines. If you are interested in Latent Semantic Analysis (or Latent Semantic Indexing), information retrieval, search engine optimisation or other related topics then you want to read on. If you don’t know what any of the things I’ve mentioned above are, you definitely will by the end of this post, especially after seeing the video. So take a look at the video and read on afterwards to find out how I think this is a good analogy to dimensional reduction.
So I’m comparing dimensional reduction on datasets like document stores with the resizing of images… it’s a stretch I know, but listen in and we’ll go over some of the similarities. Dimensionality reduction works differently to seam carving, but the aim is the same. To reduce a large space or set of data into a smaller one that best represents the original space. So this means the smaller data set is an approximation of the larger dataset. For example like we can use an average to summarise many, many points in a dataset, dimensional reduction and seam carving are basically like an average of a very large dataset.
In the seam carving video the least important “dimensions” are taken out of the picture incrementally, this leaves the features of the image in tact. A noise function is used to find the seams in the image that will be missed least when removed. The way dimensional reduction happens in many statistical operations is to find the features in the dataset that stand out the most. Tough maths is often used to find these features in a dataset. These two operations seem to be at opposites, but generally we end up with a remaining set of features that best describe the original dataset. This means we can store, manipulate, load and compare the smaller dataset a lot more effectively than the original larger dataset.
So we see that the principles between dimensionality reduction in datasets and seam carving are very similar, read on to find out some thing that dimsionality reduction in data does that seam carving doesn’t…
The Magic of Latent Semantic Indexing
The magic is actually apparent in many different dimensional reduction algorithms, not just LSI, but LSI is the one that most people are familiar with. What actually happens in very large datatsets such as those found in information retrieval systems is that dimensional reduction actually brings out hidden latent information that is in the data. This may be an association between two people, or words that was not identified before. Naturally this latent information doesn’t make a whole lot of sense in image resizing…
To see some other interesting uses of dimensionality reduction in image recognition check out the Eigenface page at wikipedia. There are lots of other resources for interesting dimensionality reduction applications out in the wild, if there is some interest I might write about some of them.
Worked Example with Code
If there is enough interest in this post I might do up an example of doing dimensional reduction on a dataset with some running code. I’ve got the code sitting amongst all my research code, I’d just have to pretty it up etc.
I should note here before someone else mentions it, seam carving is used on reducing a space of between 1 and 2 thousand dimensions to something like 400-500 dimensions. Typical datasets in information stores can have hundreds of thousands of dimensions, so dealing with information dimensional reduction can be a lot more processor heavy.

I wonder, have you heard of anyone applying seam carving to problems where LSA is typically used? It seems like there is an analog between expanding images and predicting missing values in the matrix. And I wouldn’t mind seeing that code, if it’s not too much work.
@Jason, actually while I was writing this I thought of that too. My intuition about calculating missing values as you expand the matrix is that it would be prohibitively expensive to calculate. My reasoning for this is that with the image resize there really are only two dimensions and the seam travels along one of them, but when you have 200,000 dimensions how do you efficiently calculate the seam? Or more generally how do you even define what a seam is?
As for the code, I’ll try and put something together. When I do I’ll post it here. My thesis is due in July, so I’m starting to feel the pressure.
Hi David,
I had the same idea when I saw the video a while back. My stumbling block is that the seam is not straight. Finding the least information containing none straight line in an image makes for the great result. But how do you do it in a HAL or LSA matrix? One could easily identify the least ‘useful’ columns but that appears to be a bit of a rough, low-tech SVDish approach
Hi Christian,
The other consideration is that a table of HAL/LSA data is actually normally very sparse, so there is no way the points would be able to join the whole way through, if at all! An image is a 2D representation, so if we had a table of data (how we normally see hal/lsa data) it’d only be two columns (x,y) so visualising removing columns and rows doesn’t quite fit.
In fact the closest thing I’ve found to seam carving is a group of theories known as diffusion geometries, I asked Keith van Rijsbergen about them when i saw him in Glasgow last year and he said they had piqued his interest too. They certainly look very intersting and powerful. The maths is quite hard to grasp though…. perhaps someone could shed more light on them.
my understanding is that there are a combination of adjacency information (ie graphs) and markov chains in high dimensional non-linear space. Some very very interesting stuff.