I always like to capture the joyful moment when something finally 'works'. It makes all the labour seem worthwhile, even if, to an untrained eye, the result still appears to suffer from many deficiencies. So to cut to the chase: the demo site now links the page image, one word at a time, to the text on the right, and vice versa. Just select an example and click on "upload", then "link". The best example is the Harpur Sonnets. There are still problems with splitting shapes that belong to several words (try splitting a polygon some time), and there are other deficiencies too: for example, I don't like the shape of the polygons. They're ugly convex ones, and I want concave ones that surround each word elegantly. I am also painfully aware that my word- and line-recognition modules still need some work. But all these improvements and others can be comfortably consigned to 'future work'.
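To illustrate why convex polygons fit words so loosely, here is a minimal sketch in Python. It is not the demo's actual code: the function name `convex_hull` and the sample points are made up for illustration, and the method is the standard monotone-chain convex hull. Given the corner points of an L-shaped blob, the hull drops the inner corner, so the polygon covers empty space that a concave outline would have hugged.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull by the monotone-chain method (hypothetical helper).

    Returns the hull vertices in counter-clockwise order for the usual
    mathematical axes (y pointing up), starting from the leftmost point.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower chain, discarding points that make a non-left turn.
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper chain the same way, walking right to left.
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Corner points of an L-shaped blob; (1, 1) is the concave inner corner.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
hull = convex_hull(l_shape)
```

The inner corner `(1, 1)` is absent from `hull`, which is exactly the ugliness complained about above: the convex outline cuts straight across the notch instead of following the shape.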
The total development time from start to this point has been around six and a half weeks of part-time work for one programmer. When you consider how much other projects that worked on this same problem cost, and that they didn't get as far, that's pretty damn fast.
Addendum: There is now a better version, but it is much slower. The problem is that words get recognised on the wrong lines. Once that is fixed, it should work OK on all the examples. But I won't upload a new version until I'm happy with the speed.