I have been reading a bit about neural nets and – I must admit – most of the stuff you find in books or on the web is not easy reading for me. Kevin Gurney’s book (An Introduction to Neural Networks; its web version is available here) is an exception to a certain degree — at least I could follow it pretty well until I got kind of tired and confused when he started discussing Hopfield nets. Kohonen maps, or self-organizing maps, were a bit easier to swallow, but I actually had to start playing with them myself before they started to make sense. And then I realized that thinking of Kohonen maps in terms of neural nets is much more difficult than their geometric interpretation — that is, a two-dimensional lattice with a certain number of nodes that moves and stretches in the data space until it fits, or describes relatively well, the data cloud. In other words, it is quite like nonlinear principal component analysis, which is still easier to grasp than input layers, winning neurons, and so on.
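If you want to play with that geometric picture yourself, the whole algorithm fits in a few lines. Here is a minimal NumPy sketch (the function name and all the parameter defaults are my own choices, not anything canonical): pick a random data point, find the closest node (the "winning neuron"), and pull it and its lattice neighbors a little toward the point, with both the pull and the neighborhood radius shrinking over time.

```python
import numpy as np

def train_som(data, rows=10, cols=10, iters=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal self-organizing map: a rows x cols lattice of nodes
    that moves and stretches until it fits the data cloud."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # node weights start as random points inside the data's bounding box
    weights = rng.uniform(data.min(0), data.max(0), size=(rows, cols, dim))
    # lattice coordinates of each node, used for the neighborhood function
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]              # random sample
        d = np.linalg.norm(weights - x, axis=-1)       # distance to every node
        winner = np.unravel_index(d.argmin(), d.shape)  # best-matching unit
        # learning rate and neighborhood radius both decay over time
        frac = t / iters
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 0.5
        # pull the winner and its lattice neighbors toward the sample
        lattice_dist = np.linalg.norm(grid - np.array(winner), axis=-1)
        h = np.exp(-(lattice_dist ** 2) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights
```

Note that the neighborhood is measured on the lattice, not in the data space — that is what makes the map unfold smoothly instead of collapsing into a tangle.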
I have stolen the animated gif above from a superb website on neural nets that has several java applets showing how these things work. I especially recommend the demo of the 3D Kohonen map.
It took me a while to realize that it is wrong to assume that a Kohonen map is picking out real clusters in the data. That could be true if, on the one hand, there is good clustering in the data in a statistical sense and, on the other hand, there are only a few nodes in the map – that is, about as many nodes as there are clusters in the data. However, often there are no real, well-defined clusters in the data, but Kohonen’s classification method is still applied – and should be applied. What Kohonen’s group recommends is a two-step classification: start out with a large number of nodes in the SOM (self-organizing map) and reduce the number of nodes or clusters in a second step, with k-means clustering (see details here) applied to the map itself.
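The two-step recipe looks roughly like this in code — a sketch only, where the array of trained node weights and the little hand-rolled k-means helper are stand-ins, not anything from Kohonen's own software. The point is that in step two you cluster the map's nodes, not the raw data.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on a set of points (here: the SOM node weights)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.linalg.norm(points[:, None] - centers[None],
                                axis=-1).argmin(1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(0)
    return labels, centers

# step 1 would be training a large SOM; here random stand-in weights
rows, cols, dim = 20, 20, 5
node_weights = np.random.default_rng(2).normal(size=(rows, cols, dim))

# step 2: cluster the nodes themselves into a handful of coarse classes
labels, centers = kmeans(node_weights.reshape(-1, dim), k=4)
cluster_map = labels.reshape(rows, cols)  # coarse class of each map node
```

Because every data point already has a best-matching node on the map, the coarse class of a data point is simply the class of its node.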
I think, however, that the second step is not very useful if the data does not have clusters. It is still definitely worth applying the Kohonen classifier to reduce dimensionality and visualize multidimensional data, but applying the k-means clustering as well only results in an image with less resolution and more arbitrary boundaries. It is a little bit like posterizing a color photograph, that is, reducing the number of colors to only a few, although there was a lot more information and there were no well-defined classes in the original image (I know this latter assumption is usually not valid, but put that aside for now). Why would one do that?
This type of classification (probably real nerds would say that ‘quantization’ is a better word), when there are no well-defined classes, is comparable in many ways to the classification systems used in the more descriptive natural sciences. For example, in sedimentary geology, sediments or rocks are often divided into facies A, B, C, and so on; and, in most cases, the boundaries between these facies are not very clear and it would be difficult to show any statistical significance for the existence of clustering. So, strictly speaking, this classification is incorrect, but it still can be useful (e.g., you can write long journal articles 🙂 ).
Of course, once you leave these somewhat subjective territories of science and think about how ‘clustering’ is done in everyday life, you realize that this simple-minded ‘quantization’ is even more questionable. People like to and tend to think dichotomously: us vs. them, black vs. white, liberal vs. conservative, Christian vs. Muslim, and so on — this is not news. The result is that, to use the Kohonen terminology, the quantization errors are huge, and the cluster boundaries are essentially arbitrary. The sad part is that sometimes – many times – people kill each other because of these errors.
In light of this, isn’t it reassuring that two-dimensional Kohonen maps can use lots of nodes or clusters, thus describe reality better, and still be useful in making things more visible and intelligible?