Sunday, March 21, 2010

Design and Randomization I

Proper randomization is critical for the causal inference that experiments provide. This is true across subjects, but also within a subject, in the way he or she is exposed to a set of trials or conditions. Many standard programming languages permit randomization, sampling, and design to be performed, but do not make it easy. In PEBL, I strive to provide flexible functions that make standard designs simple, and complex designs possible. As a consequence, there are probably a dozen sampling, shuffling, and design functions available. The downside is that it can be hard to find exactly what you want (and there are many ways you might want to do a design that differ in subtle ways).



These include shuffling, sampling, and design functions such as:

Shuffle()
ShuffleRepeat()
ShuffleWithoutAdjacents()
ChooseN()
SampleN()
SampleNWithReplacement()
DesignLatinSquare()
LatinSquare()
DesignGrecoLatinSquare()
DesignBalancedSampling()
DesignFullCounterBalance()
CrossFactorWithoutDuplicates()

I also have a few templates for Taguchi designs, so-called partial orthogonal designs. These aren't available in PEBL directly, but appear in a library you can copy into your own experiment if you need it.

Shuffling
The basic Shuffle() is the workhorse, and just about all you really need for many designs. It rearranges a list into a new random order.

By the way, you have to be careful when building a shuffling algorithm. It is really easy to make mistakes, so much so that the topic warrants a whole section in a Wikipedia article. In PEBL, I use a slightly inefficient method that is essentially foolproof. How does it work? It creates a new list the same size as the argument list, filled with random numbers between 0 and 1, and then sorts the given list by that random list. Because the random keys are drawn independently of the items, the resulting order does not depend on the original configuration.

Sorting takes O(n log n) time, whereas an efficient shuffle can run in O(n), but this hardly matters on modern hardware, at least for typical experiments. In a little test I ran on my computer, shuffling a 1,000-item list took about 3 ms, and shuffling a 10,000-item list about 280 ms. Even 100,000 items is bearable (28 seconds), because a shuffle that large would probably only happen once, during experiment startup.
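To make the trade-off concrete, here is a minimal Python sketch (not PEBL code) of two shuffles. The first is the sort-by-random-keys idea described above. The second is the O(n) Fisher-Yates shuffle, the usual "efficient" alternative. Both function names are hypothetical and exist only for this illustration.

```python
import random


def shuffle_by_random_keys(items, rng=random):
    """Shuffle by attaching a uniform random key to each position and sorting on the keys.

    The sort makes this O(n log n)."""
    keys = [rng.random() for _ in items]
    # Sort positions by their keys; the items themselves are never compared.
    order = sorted(range(len(items)), key=lambda i: keys[i])
    return [items[i] for i in order]


def fisher_yates(items, rng=random):
    """Fisher-Yates shuffle: O(n), returns a new list and leaves the input untouched."""
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        # j must be drawn from 0..i inclusive; swapping with any of 0..n-1
        # on every step is a classic bug that biases the result.
        j = rng.randint(0, i)
        result[i], result[j] = result[j], result[i]
    return result
```

The sort-based version is harder to get subtly wrong. Fisher-Yates is faster, but it is correct only if the index range is exactly right.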


How can you use Shuffle() in an experiment? Let's say you have two experimental conditions ("primed" and "unprimed") and a control, and you want a trial block to have ten trials of each, completely mixed. How would you do it?

trialsControl <- Repeat("control",10)
trialsPrimed <- Repeat("primed",10)
trialsUnprimed <- Repeat("unprimed",10)
trials <- Shuffle(Flatten([trialsPrimed,trialsUnprimed,trialsControl]))
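For readers more at home in a general-purpose language, the same design-then-shuffle pattern looks like this in Python. It is an analogue for illustration, not PEBL. List multiplication stands in for Repeat(), and concatenation stands in for Flatten().

```python
import random

# Build the complete design first: ten trials of each condition.
trials_control = ["control"] * 10
trials_primed = ["primed"] * 10
trials_unprimed = ["unprimed"] * 10
trials = trials_primed + trials_unprimed + trials_control  # concatenation plays the role of Flatten

# Then shuffle all 30 trials in place so the conditions are completely mixed.
random.shuffle(trials)
```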

This is simple, but it seems like more effort than it needs to be. ShuffleRepeat() can make it simpler: it takes a base list, repeats it multiple times, and shuffles each repetition individually. So the following does almost the same thing:

trials <- ShuffleRepeat(["control","primed","unprimed"], 10)

The main difference is that each group of three gets shuffled individually. Within every group of three trials, the subject is guaranteed to see each of the three trial types, so after seeing two trials of a group, they could predict the third. This may be fine, but in our case it might not be. I could have provided a specific function to do a full mixture, but all one really needs to do is add a Shuffle() on the outside:


trials <- Shuffle(ShuffleRepeat(["control","primed","unprimed"], 10))

This is a little less efficient, because you are shuffling the list twice, but these operations are so fast on modern computers that it is hard to imagine it would really matter.
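The difference between block-wise and fully mixed shuffling can be seen in a small Python sketch. shuffle_repeat() is a hypothetical stand-in for the behaviour of ShuffleRepeat() described above, not its actual source. every_block_complete() checks the property that makes the third trial of each block predictable.

```python
import random


def shuffle_repeat(base, n, rng=random):
    """Repeat `base` n times, shuffling each copy independently (block-wise shuffling)."""
    result = []
    for _ in range(n):
        block = list(base)
        rng.shuffle(block)
        result.extend(block)
    return result


def every_block_complete(trials, base):
    """True if each consecutive group of len(base) trials contains exactly the items of base."""
    k = len(base)
    return all(sorted(trials[i:i + k]) == sorted(base) for i in range(0, len(trials), k))


conds = ["control", "primed", "unprimed"]
blocked = shuffle_repeat(conds, 10)  # like ShuffleRepeat(conds, 10)
mixed = list(blocked)
random.shuffle(mixed)  # extra outer shuffle, like Shuffle(ShuffleRepeat(conds, 10))
```

every_block_complete(blocked, conds) is always true. After the outer shuffle it will usually be false, which is the point of adding it.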

Note that when thinking about designing a mixture of trials within a block, I typically default to a design-and-shuffle, rather than a sample-on-demand strategy.

But let's say I wanted to shuffle a set while making sure certain conditions never appear back to back. Maybe you are showing stimuli from a small set of classes, and you never want two from the same class in succession. You can do this with the ShuffleWithoutAdjacents() function. It takes a nested list and tries its best to make sure that items within the same sublist do not appear adjacently. It cannot always succeed (some input lists make this impossible); in that case it still returns a shuffled list, but some of the items will be adjacent. So, doing something like:


trialsControl <- Repeat("control",5)
trialsPrimed <- Repeat("primed",5)
trialsUnprimed <- Repeat("unprimed",5)
trials <- ShuffleWithoutAdjacents([trialsPrimed,trialsUnprimed,trialsControl])



will produce something like this:
[primed, unprimed, primed, unprimed, control, unprimed, primed, control, primed, control, unprimed, control, unprimed, primed, control]

No condition appears twice in a row. However, if it gets unlucky, it might produce something like:

[unprimed, primed, unprimed, primed, unprimed, control, primed, control, primed, unprimed, primed, unprimed, control, control, control]
This is not too bad, but in this case the runs tend to occur near the end of the list.
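One way to attack the no-adjacent-repeats problem is a greedy heuristic. The Python sketch below is that swapped-in technique, not necessarily what ShuffleWithoutAdjacents() does internally, and the function name is hypothetical. At each step it draws from whichever group has the most items left, excluding the group used on the previous step, and breaks ties at random. Keeping the remaining counts balanced this way means that for inputs like the 5/5/5 example it never ends with a forced run. The price is that it does not pick uniformly among all valid orders.

```python
import random


def shuffle_without_adjacents(groups, rng=random):
    """Greedy sketch: repeatedly draw from the group with the most items left,
    skipping the group used on the previous step; ties are broken at random.
    If only the previous group has items left, adjacency is unavoidable and we
    take from it anyway."""
    pools = [list(g) for g in groups]
    for pool in pools:
        rng.shuffle(pool)  # randomize order within each group
    result, last = [], None
    while any(pools):
        eligible = [i for i, p in enumerate(pools) if p and i != last]
        if not eligible:  # only the previous group remains
            eligible = [last]
        most = max(len(pools[i]) for i in eligible)
        choice = rng.choice([i for i in eligible if len(pools[i]) == most])
        result.append(pools[choice].pop())
        last = choice
    return result
```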


Sampling
Typically, I favor designing a condition set and shuffling over choosing a number of trials and randomly choosing conditions for each trial.  But sampling can be very useful as well, such as in cases where there is a large pool of stimuli to choose from repeatedly. PEBL has three functions to support this:

ChooseN()
SampleN()
SampleNWithReplacement()
ChooseN() picks N items without replacement while maintaining their original order. SampleN() is similar, but returns them in a random order; essentially, SampleN() is equivalent to Shuffle(ChooseN()). Neither of these samples with replacement, which is what SampleNWithReplacement() is for. Sometimes, I want to randomly pick a single item (say, to choose the condition a subject will be in). In this case, I use:

First(SampleN(list, 1))
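The three sampling behaviours map onto Python's standard random module roughly as follows. The helper names are hypothetical and only mirror the descriptions of ChooseN(), SampleN(), and SampleNWithReplacement() above; they are not PEBL's implementations.

```python
import random


def choose_n(items, n, rng=random):
    """Pick n items without replacement, keeping them in their original order."""
    picked = sorted(rng.sample(range(len(items)), n))  # random positions, then sorted
    return [items[i] for i in picked]


def sample_n(items, n, rng=random):
    """Pick n items without replacement, in random order (like shuffling choose_n)."""
    return rng.sample(list(items), n)


def sample_n_with_replacement(items, n, rng=random):
    """Pick n items independently; the same item may be picked more than once."""
    return rng.choices(items, k=n)


# One random item, analogous to First(SampleN(list, 1)).
condition = sample_n(["A", "B", "C"], 1)[0]
```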


Factorial Design
Many experiments require varying multiple factors. For many simple cases, you can use the routines described above, along with some simple list-combination routines, to create the design. For example, suppose I have three factors: one with 2 levels, one with 3 levels, and one with 4 levels. Fully crossed, this requires 24 combinations, which we want presented in a random order.

#Merge combines two lists into one:
factorA <- Merge(Repeat(1,12),Repeat(2,12))
#Flatten gets rid of sublists; RepeatList repeats the elements of a list into a single new list
factorB <- RepeatList(Flatten([Repeat(1,4), Repeat(2,4), Repeat(3,4)]),2)

#Flatten outside repeat is similar to RepeatList
#Sequence creates numbers in a sequence
factorC <- Flatten(Repeat(Sequence(1,4,1),6) )

Now factorA, factorB, and factorC are all the right length, and their elements at each position give the levels we want on that trial. We want the trials in a random order, however, so in essence we need to shuffle all three lists with the same permutation. The easiest way to do this is to create a transposed composite list of the three factors, then shuffle that composite:

factors <- Transpose([factorA, factorB, factorC])
This creates a nested list whose first element is a list containing the first elements of A, B, and C, and so on. Note that Transpose only works if all of the components are the same length. You can think of the input as a matrix; Transpose simply turns its columns into rows and its rows into columns. All you need to do now is shuffle the factors list to put the trials in a random order:

trials <- Shuffle(factors)
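Here is the same construction as a Python sketch, for comparison only. zip() plays the role of Transpose on the three factor columns. The sketch also shows that itertools.product() enumerates the same 24 combinations directly, in the same order. Shuffling whole rows keeps each trial's three levels together.

```python
import itertools
import random

levels_a, levels_b, levels_c = [1, 2], [1, 2, 3], [1, 2, 3, 4]

# Approach 1: build one column per factor, then zip (transpose) them into rows.
factor_a = [1] * 12 + [2] * 12
factor_b = ([1] * 4 + [2] * 4 + [3] * 4) * 2
factor_c = [1, 2, 3, 4] * 6
rows = [list(t) for t in zip(factor_a, factor_b, factor_c)]

# Approach 2: let itertools.product enumerate every combination directly.
combos = [list(t) for t in itertools.product(levels_a, levels_b, levels_c)]

# Shuffle whole rows, so each trial keeps its own combination of levels.
trials = list(rows)
random.shuffle(trials)
```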

There are several canned factorial design functions available in PEBL. DesignFullCounterbalance is the simplest: it takes two lists as arguments and returns a nested list in which each sublist contains one item from each list. It does this for every possible pair, so for input lists of length M and N the result is M x N items long.

x <- DesignFullCounterBalance( [1,2,3],[10,11,12,13])
produces:

[[1, 10]
, [1, 11]
, [1, 12]
, [1, 13]
, [2, 10]
, [2, 11]
, [2, 12]
, [2, 13]
, [3, 10]
, [3, 11]
, [3, 12]
, [3, 13]
]

Notice that it is not shuffled.  A similar function is:

CrossFactorWithoutDuplicates()

This is useful if you want to consider all pairs of items from a set, but not pairs of identical items. So CrossFactorWithoutDuplicates([1,2,3]) returns:

[[1, 2]
, [1, 3]
, [2, 1]
, [2, 3]
, [3, 1]
, [3, 2]
]

which does not include [1,1], [2,2], or [3,3].  This can be useful for collecting similarity matrices, because you will sometimes only be concerned with non-identical stimuli.
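Both pairings have direct equivalents in Python's itertools, sketched below with hypothetical names. product() gives every (x, y) pair, as in the DesignFullCounterBalance output. permutations(xs, 2) gives every ordered pair of distinct positions, which for a list of distinct items reproduces the CrossFactorWithoutDuplicates output above.

```python
import itertools


def full_counterbalance(xs, ys):
    """Every (x, y) pair with x varying slowest: len(xs) * len(ys) pairs, unshuffled."""
    return [[x, y] for x, y in itertools.product(xs, ys)]


def cross_without_duplicates(xs):
    """Every ordered pair of distinct positions in xs, so [1, 1]-style self-pairs are excluded
    (as long as xs itself contains no repeated values)."""
    return [list(p) for p in itertools.permutations(xs, 2)]
```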

Fully crossing factors gets big fast, so many partial factorial design schemes are available. Several are related to Latin squares. LatinSquare() is the simplest: it takes a single list of conditions and assumes that order is the secondary factor that needs to be counterbalanced:

Print(LatinSquare(["a","b","c","d"]))
 [[a, b, c, d]
, [b, c, d, a]
, [c, d, a, b]
, [d, a, b, c]
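]

DesignLatinSquare() instead takes two lists and pairs them, so that across rows each item of the first list is paired with each item of the second exactly once:
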
Print(DesignLatinSquare(["a","b","c","d"],[1,2,3,4]))
[[[a, 1]
, [b, 2]
, [c, 3]
, [d, 4]
]
, [[a, 2]
, [b, 3]
, [c, 4]
, [d, 1]
]
, [[a, 3]
, [b, 4]
, [c, 1]
, [d, 2]
]
, [[a, 4]
, [b, 1]
, [c, 2]
, [d, 3]
]
]
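A cyclic construction reproduces both printed squares above. The Python sketch below uses hypothetical names and is not taken from PEBL's source. Row i rotates the list left by i positions. The two-factor version offsets the second factor by i.

```python
def latin_square(conds):
    """Cyclic Latin square: row i is the condition list rotated left by i positions."""
    n = len(conds)
    return [[conds[(i + j) % n] for j in range(n)] for i in range(n)]


def design_latin_square(xs, ys):
    """Pair xs[j] with ys[(i + j) % n] in row i, so each x meets each y exactly once across rows."""
    n = len(xs)
    if len(ys) != n:
        raise ValueError("both factors need the same number of levels")
    return [[[xs[j], ys[(i + j) % n]] for j in range(n)] for i in range(n)]
```

In both versions, each condition appears exactly once in every row and in every column position, which is the defining Latin square property.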



DesignGrecoLatinSquare() is a similar function that takes three factors.

The final function, DesignBalancedSampling(), is more heuristic. It is sort of like ShuffleRepeat(), but it (1) makes sure that no item appears twice in a row at the boundaries between repetitions (epochs) of the list, and (2) creates a list of a specified length, rather than a specified number of repetitions. If the length of the base list does not divide evenly into the length of the final list, it chops off the last few items, so that the numbers of presentations of the most and least frequent conditions differ by at most 1.
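The behaviour described above can be approximated with a short Python sketch. The name and the rejection strategy are assumptions, not PEBL's implementation. The sketch concatenates independently shuffled copies of the base list and reshuffles a copy whenever its first item would repeat the previous item. It then truncates to the requested length. Every full copy contains each condition once and the partial last copy adds at most one more, so the counts differ by at most 1.

```python
import random


def balanced_sampling(base, length, rng=random):
    """Concatenate shuffled copies of `base`, avoiding a repeat at each copy boundary,
    then truncate to `length`. Assumes `base` holds at least two distinct items."""
    result = []
    while len(result) < length:
        block = list(base)
        rng.shuffle(block)
        # Reshuffle until this epoch does not start with the item that ended the last one.
        while result and block[0] == result[-1]:
            rng.shuffle(block)
        result.extend(block)
    return result[:length]  # chop off the surplus from the final, partial epoch
```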


Using a condition list

These randomization functions are typically used to create a list of stimuli, conditions, trial types, and so on. Once you have such a list, you can use the loop() primitive to carry out the experiment. For example:

trials <- ShuffleRepeat([1,2],10)

loop(i,trials)
{
   resp <- Trial(i)
}

Together, these shuffling and randomization routines should handle many standard experimental designs. In the near future, I'll show a new shuffling method inspired by a user's question, so stay tuned.
