TILT finally has an editing GUI of sorts. Although it doesn't do much yet, over the next few days it should acquire all the new parts that have been created in the past two weeks, and incorporate the TILT back-end service to recognise pages dynamically. It will also allow the user to edit and save alignments, which have mostly been produced automatically. So far all it can do is switch between justified view of the text and line-by-line mode, and zoom the image. But as can be seen from the buttons on the left, there is plenty to come.
Wednesday, February 25, 2015
Friday, February 20, 2015
Slicing and having multiple polygons on a page
One of the problems I described in the last entry was slicing a polygon in two. Doing this required a pretty good understanding of high-school geometry. But it now works in the demo on this page.
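As a rough illustration of the geometry involved, here is a minimal Python sketch of slicing a polygon along a dragged line. It is not the code behind the demo: the names `edge_hit` and `slice_polygon` are hypothetical. It assumes the dragged line crosses exactly two edges and does not pass exactly through a corner-point.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def edge_hit(p1: Point, p2: Point, a: Point, b: Point) -> Optional[Point]:
    """Return the point where polygon edge p1-p2 meets the dragged line a-b.

    Returns None if the two segments are parallel or do not meet.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # direction of the edge
    sx, sy = b[0] - a[0], b[1] - a[1]       # direction of the drag
    denom = rx * sy - ry * sx               # zero when parallel
    if denom == 0:
        return None
    # Solve p1 + t*r == a + u*s for t (along the edge) and u (along the drag).
    t = ((a[0] - p1[0]) * sy - (a[1] - p1[1]) * sx) / denom
    u = ((a[0] - p1[0]) * ry - (a[1] - p1[1]) * rx) / denom
    if 0 <= t < 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def slice_polygon(polygon: List[Point], a: Point, b: Point):
    """Split polygon in two along the line a-b.

    Returns a pair of new polygons, or None unless exactly two edges are hit.
    """
    n = len(polygon)
    hits = []  # (index of the edge's first vertex, intersection point)
    for i in range(n):
        pt = edge_hit(polygon[i], polygon[(i + 1) % n], a, b)
        if pt is not None:
            hits.append((i, pt))
    if len(hits) != 2:
        return None
    (i, p), (j, q) = hits  # i < j because edges are visited in order
    # One half runs from p round to q; the other from q back round to p.
    first = [p] + polygon[i + 1:j + 1] + [q]
    second = [q] + polygon[j + 1:] + polygon[:i + 1] + [p]
    return first, second


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
halves = slice_polygon(square, (2, -1), (2, 5))
```

Dragging a vertical line through the middle of the square gives two rectangles that share the new edge from (2, 0) to (2, 4). A drag that misses the polygon, or clips only one edge, returns None and leaves the polygon untouched.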
But having more than one polygon creates a new problem. As you move your mouse over the page you have to decide which polygon you are moving over, or which corner-point you are clicking on. There might be hundreds of polygons or points on a page, and you have only a few milliseconds to decide between them. Imagine that you have some way to test "is the point where the mouse is inside this polygon?" or "is the mouse over a corner-point of this polygon?" If you have, say, 100 polygons, you will have to call those two tests 100 times whenever the mouse moves even a small distance. And that will be much too slow if you want the interface to be responsive.
So I decided to divide the page-image up into four rectangles (NW, NE, SW, SE), each containing either at most four points, or nothing. If there is nothing then the rectangle acts as a container for four smaller rectangles that are inside it. And as the mouse can only be in one place at a time, it can only ever be over one rectangle that has any points. As well as the points, each such rectangle also contains a list of the polygons that overlap with some part of it. Deciding which rectangle you are in is easy because they are nested. So now it is a simple matter to test "is the mouse currently over a polygon or a corner-point?" because there will only be a few of them in each rectangle. The only problem with this is that as you edit the polygons and the points, the rectangles must be kept up to date. But that is a solvable problem. Click on the image below to see it in action:
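The nested rectangles described above amount to what is usually called a quadtree. Here is a minimal Python sketch of the idea, not the demo's actual code: the class name `Quad`, its methods and the one-pixel minimum size are assumptions, and the per-rectangle list of overlapping polygons is left out to keep it short.

```python
class Quad:
    """A rectangle that holds at most MAX_POINTS points or four child rectangles."""

    MAX_POINTS = 4

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.points = []      # (px, py, polygon_id) tuples, leaves only
        self.children = None  # [NW, NE, SW, SE] once this rectangle is split

    def _child_for(self, px, py):
        """Pick the child rectangle that covers (px, py)."""
        east = px >= self.x + self.w / 2
        south = py >= self.y + self.h / 2
        return self.children[(2 if south else 0) + (1 if east else 0)]

    def _split(self):
        """Turn this leaf into a container and push its points down."""
        hw, hh = self.w / 2, self.h / 2
        self.children = [
            Quad(self.x, self.y, hw, hh),            # NW
            Quad(self.x + hw, self.y, hw, hh),       # NE
            Quad(self.x, self.y + hh, hw, hh),       # SW
            Quad(self.x + hw, self.y + hh, hw, hh),  # SE
        ]
        points, self.points = self.points, []
        for px, py, polygon_id in points:
            self._child_for(px, py).insert(px, py, polygon_id)

    def insert(self, px, py, polygon_id):
        if self.children is not None:
            self._child_for(px, py).insert(px, py, polygon_id)
            return
        self.points.append((px, py, polygon_id))
        # Stop splitting at one pixel so coincident points cannot recurse forever.
        if len(self.points) > self.MAX_POINTS and self.w > 1 and self.h > 1:
            self._split()

    def leaf_at(self, px, py):
        """Walk down the nesting to the one leaf under the mouse."""
        node = self
        while node.children is not None:
            node = node._child_for(px, py)
        return node


page = Quad(0, 0, 100, 100)
for polygon_id, (px, py) in enumerate([(10, 10), (20, 20), (30, 30), (40, 40), (80, 80)]):
    page.insert(px, py, polygon_id)
nearby = page.leaf_at(85, 85).points  # only the points in the SE rectangle
```

With the mouse at (85, 85) only the single point in the SE rectangle needs testing, however many points the rest of the page holds. Deleting or dragging a point would mean removing it from its leaf and re-inserting it, which is the bookkeeping mentioned above.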
So now what I have is point-delete, point-add, point-drag, and polygon-anchor (freeze and highlight points), polygon-highlight and polygon-split. That's enough to try to create a usable GUI for editing the output of the TILT recognition process. And that, of course, is the next step.
Sunday, February 15, 2015
Second steps with user interface
I have refined the test program in the previous post to add points. So now you can add and delete points as well as move them. Some would argue that this is enough. But there are some tools that would greatly speed up editing that no one else seems to have thought about yet:
- Slicing a polygon in two. Imagine that you have a polygon that covers several words. You need to cut it quickly into two. With just the ability to delete and add points to existing polygons (no new polygons) how else can you do that? With the mouse all you would need to do would be to drag a line over an anchored polygon, then release the mouse to slice it along that line.
- Merge two or more polygons. If you have fragmented polygons it would be great to just shift-click them and then merge them in a single stroke of the mouse. This could be done by dragging from inside one of the polygons across the ones to be merged, ending inside another polygon. Then all dragged-over polygons could be merged.
- Create blobs. By clicking on a region that has no polygon you could send a message to the service to try to recognise a word in one go.
I've nearly got the first of these, slicing, to work. Slicing and merging are both a bit counter-intuitive because dragging in drawing programs is supposed to draw a square marquee. But marquees are just not very useful in this case, so I think overriding the default is a good idea. We need to facilitate the operations that the editor of a set of polygons will use all the time, or it will quickly become tedious.