Sunday, March 14, 2010

The "Trail-Making" test and 2D layout

A while back I put together a version of the "Trail-making" test.  This is a test that is apparently used for detecting different kinds of brain injury, but has also traditionally been used as a test of eye-hand coordination and probably procedural skill.  The most common version is by Reitan, as part of the Halstad-Reitan Battery, but it apparently goes back at least 60 years to tests like the Taylor Number Series and the Partington Pathways test, and it became a part of the  Army testing batteries during WWII.

But it is also used in occupational screening, and versions of it appear in many different cognitive-style batteries, and is apparently widely used to judge older driver's abilities. What is it?  Basically, connect-the-dots. Here is a depiction of PEBL's version using one of the classic layouts

There are two conditions in this test (A and B), one of which involves switching between letters and numbers, and one which is pure numbers.  I wanted to implement a version of this, but my main concern is that the original had just a few layouts.  This is nice for old-style paper-and-pencil tests, and you can develop good norms of those particular layouts, but this doesn't make too much sense if you want repeated measurements over time, because a person could quickly learn the pattern.  In fact, recent research by Buck and Ryan and Atkinson (see references) suggest it is not appropriate for repeated testing, because people learn the pattern and get better at it.

So for PEBL Test Battery, it was important to be able to auto-generate interesting problems, to at least avoid learning from the particular form.  There may still be learning of course, but with the potential for auto-generation of tests, there is even the possibility of running several practice rounds to get people used to the task (assuming that the novelty isn't what drives the interesting aspects of the task).  This turns out to have some complicated layout issues that turn out to be nice solutions for common problems, which is what I'm writing about here.

There are two things to notice about the points.  First, they are not laid out completely randomly.  In fact, the points do not overlap, and are reasonably-well spaced.  Second, the path that is followed is also not random.  It is not exactly a nearest-neighbor path, but it seems carefully designed so that you don't completely jump around the screen, and the 'trails' don't overlap.

These are two distinct layout issues that I want to be able to solve quickly in PEBL. First: random 2D output without overlap.  Second: rough-but-efficient paths through the points.

2D Point Layout
There are many visual search tasks that require distributed points in a 2D field.  You frequently don't want things to overlap, or overlap with at most some level of tolerance.  The simplest thing to do would be to just randomly assign [x,y] locations, and hope for the best.  This almost never works.  With enough targets, and a small enough field, you will almost always get at least one pair of points that are too close.  For example:

You can't rely on luck, because occasionally a point will be hidden, and that will cause all types of problems.  One strategy to avoid this would be to make a set of regular cells or gridpoints, place the points there, but jitter them so they aren't too regular.  But this usually doesn't work well either., because it looks a lot like a grid.  Below is an example, as you add uniform noise to a 10x10 grid that are originally spaced at 10-pixel intervals. It is hard to add enough noise to get rid of the regularity, but still ensure no overlap, so it isn't ideal.  But it is simple, and could be very effective if you were selecting a subset of the 100 gridpoints.

So, what did I do?  The heuristic approach I took was to first let the points be placed at random locations anywhere within the field, regardless of overlap.  Then, I looked for the two points that were closest to one another (as long as they were closer than an overlap threshold), and moved one to a new random position, as long as it improved the minimum difference.   One could probably come up with a movement strategy that moves the current point away from the closest point, but this simplest thing to do is just pick a new location randomly.  I allow this to happen repeatedly, not necessarily until a perfect solution is found, but for a fixed number of cycles.  Thus, even if we don't find a perfect solution, the one we get will probably be OK, because it continuously improves.  Here is a video that shows the layout in-process.

If you don't plot it in real time as I did for the video, the process is actually very fast.  And for the trails task, the points are small enough that you rarely have more than a couple overlaps, so it works within a few hundred ms. This basic layout algorithm is very useful, and I've reused it a number of times in other tests.  I expect to include it in a future release of PEBL, but it can always be copied from the Trails test for use.

Now, the second problem is that in the original test, the trail generally moved on a deliberate path, sometimes to adjacent points, sometimes not, but it is not random movement through the points.  It seems like a random layout would not be ideal: here is what a solution with random layout looks like:

It is sort of a mess.  It can be difficult to find the next target, because it gets lost in the 'sea'.   This could be important for test sensitivity, because searching through the entire field can add a lot of time to the test performance.  If that process is not the aspect of the test that is sensitive to whatever clinical condition or stressor one is interested in, you essentially are adding unexplained variance.  I'll do a brief test of this below, to see if it really matters.

So, I want to make the path move from nearby to nearby points. This is basically a 'traveling salesman problem'.  This is a really hard problem, and I only need an approximate solution, so we can try a few things.  This simplest dumb strategy is 'nearest neighbor.'  Pick a point, move to its nearest unvisited neighbor, then repeat.  Nearest neighbor is really easy to implement, and fairly efficient for small lists.  The PEBL code is:
Here, tour is the current tour (a set of point ids), pts is the xy coordinates, and mat is a distance matrix between points. 

define ShortenPathNN(tour,pts,mat)
##nearest-neighbor method.
  cur <- First(tour)
  toura <- [cur]
  rest <- Rest(tour)
     distances <- ExtractListItems(Nth(mat,cur),rest)
    ##find closest neighbor.
     order <- Sortby(rest, distances)
     cur <- First(order)  ##the first item is the closest
     rest<- Rest(order)   ##the rest of the list remains to be searched
     toura <- Append(toura,  cur)

   return Merge(toura,rest)

This is pretty simple, but it doesn't work very well sometimes.  Here is a sample:

It creates a problem where you might miss a few points on a tour, and they will get picked up at the end of the solution, but that means you have to make a big jump across the whole field.  This tends to happen near the end.  The algorithm is nice, because it is short, and executes in fairly constant time because it is deterministic, and it solves the tour really quickly even in PEBL, so you don't have to wait a long time either at initialization or between trials.  But maybe we can do better.

I tried another approach I won't go into that is akin to the Lin-Kernighan method.  It does a pretty good job, but it involves randomly choosing pairs of edges and seeing if there is an improvement.  The randomness is not great, because it means you don't know when you are finished, and it can take several thousand iterations/several seconds to get a good solution.  I wanted a simple solution that is deterministic and doesn't take too long. 

The last approach I tried was one that I don't have an explicit reference to, but think I have heard it referred to as a 'rubber-band' solution.  It is a path insertion routine, in which you start with a small tour, and then add nodes one at a time so that it forms the shortest path.  This again won't find the shortest tour, but it tends to do pretty well, with few path crossings.

Compare the above to the solution of this algorithm:

Here, we didn't get any crossovers (although it does happen sometimes), and we don't miss any 'islands' that get picked up at the end.  The algorithm does not find the shortest path all the time, (it would probably have been better if the 16 point were placed between the 4 and 5), but it is pretty good.   The code is:

define ShortenPathInsert(tour,pts,mat)
  toura <- SubList(tour,1,2)
  rest <- SubList(tour,3,Length(tour))
   loop(next, rest)
       ##For the next item, look at the cost incurred by adding it to
       ##each of the parts.
      links <- Transpose([toura, Rotate(toura,1)])
      dists <- []

        ## compute the cost to inserting between the two elements of i
        cost <- M(mat,First(i),next) + M(mat,next,Second(i)) - M(mat,First(i),Second(i))
        dists <- Append(dists, cost)


       ##we went through all the links
       d2 <- SortBy(Sequence(1,Length(toura),1),dists)
      ##The first element of d2 gives the location to insert
       toura <- Insert(toura, next,First(d2))
       #toura now is the growing tour

  return   toura

This routine is pretty similar to one that can be used to make Attneave figures, and so in the future I may update it to ensure no path crossings occur. 

Here is the final version


So, I implemented the path shortening code because I suspect it will matter when comparing the alternating versus pure numbers condition, and the original task this is modeled after was not random, but rather it had a consistent path through the points.   Will it matter?  Here are the alternative hypotheses:

1. random configuration makes search harder.  In a switching context, this places more load on the solver, and makes them forget where they are at more, and so will not only INCREASE the absolute magnitude of the difference between solutions, it could increase the relative magnitude of the difference.

2a. Random configuration makes search harder, but has little or no impact on the switching.  The greater difficulty will make the solutions more variable, and the greater times will make the relative impairment less, both with respect to the absolute solution times and the variability.

2b.  The slower search times that are a consequence of random points will allow for more slack time to remember alternating sets of points, and will reduce both absolute and relative costs of the alternating.

Obviously I'm predicting 2a, and maybe dreaming about 2b.  I tested it on myself I know the answer, which I'll blog about here soon. What happened? Stay tuned.  Or better yet, download the test and find out for yourself.

Atkinson, T. M., & Ryan, J. P. (2008). The Use of Variants of the Trail Making Test in Serial Assessment: A Construct Validity Study. Journal of Psychoeducational Assessment, 26(1), 42-53. doi:10.1177/0734282907301592

Buck, K. K., Atkinson, T. M., & Ryan, J. P. (2008). Evidence of practice effects in variants of the Trail Making Test during serial assessment. Journal of Clinical & Experimental Neuropsychology, 30(3), 312-318. doi:Article

Post a Comment