TILT continues to improve

Thursday, May 7, 2015

TILT continues to improve

TILT has now been thoroughly revised for each stage except the final linking. The result is better separation of text from the image, better line and word-recognition. It is almost at the point now where for many manuscripts the automatic output will be good enough for a first pass set of text-to-image links. Here's the automatic results for the difficult De Roberto V2 manuscript of The Viceroys. Note the good automatic word-identification. This will be refined further in the linking step. By reference to the real text these shapes represents it will be possible to split some of the merged words, though not those where there is overwriting, of course. The main difference is seen in the way that words are restricted to their own line-region, which is now polygonal rather than rectangular. The latter strategy was too simple, and led to words being recognised that crossed line-boundaries. Joined ascenders and descenders are all too common in manuscripts.

No comments:

Post a Comment

Desmond

Desmond Schmidt has degrees in classical Greek papyrology from the University of Cambridge, UK, and in Information Technology from the University of Queensland, Australia. He has worked in the software industry, in information security, on the Vienna Edition of Ludwig Wittgenstein, on Leximancer, a concept-mining tool, and on the AustESE (Australian electronic scholarly editing) project at the University of Queensland. He is currently a Research Scientist at the Institute for Future Environments, Queensland University of Technology.