Saturday, September 22, 2012

New in PEBL 0.13: List Access and List Editing

Traditionally, PEBL has used a list as its basis for sets or arrays of data.  This was done because of the flexibility of lists, and other programming languages (LISP and scheme) were built around lists as the primary array format.

Using a list enforces a little bit of discipline in how you create and access data, and provides flexibility that vectors typically cannot offer, but it also has some limitations, especially when one wants to access arbitrary elements or edit particular elements.    However, this is set to change in the next release (0.13).




Some examples of the limitations include that in a list, you only really have direct access to its first (and sometimes last) element(s).  This means that if you want to add something to a list, it can be simple.  You don't need to know how long the list needs to be before you create it, and you don't have to think about its size when you add new elements.  But if you want to access the 37th (or 3000th) element of a 5000-element list, the inner software actually needs to walk along the entire list until it finds that element.
So, the commands:

a <-   Nth(list, 3000)
b <-   Nth(list, 6000)

took differing amounts of time.

Nevertheless, this was rarely an issue, because even walking lists that are thousands of elements long can be done very very quickly.  However, there are occasions  when this can become problematic.  Another situation is when you want to add an element to the list, the typical practice to do something like:

  list <- [1,2,3]
  list <- Append(list,4)

 For small lists, this takes almost no time, but what PEBL actually does is make an entire new copy of the list when this happens, and discards the old one.  Again, for short lists, even ones that take hundreds or even thousands of elements, this hardly matters.  But it can provide time delays at times.


I always planned on  adding another format that allowed direct (constant-time) memory access,  but as PEBL grew, these limitations rarely seemed to matter, so I never bothered.  List access times don't often matter, because the Nth() is often not needed--the simplest and most efficient way to iterate across a list of items is to do this:

stim <- Shuffle(Sequence(1,10,1))
 loop(i,stim)
 {
    Print(i)
 }

And not this:

 
 stim <- Shuffle(Sequence(1,10,1))
 indices <- sequence(1,10,1)
 loop(i,indices)
 {
    Print(Nth(stim,i))
 }

But in developing the 0.13 release, I finally had a good reason to add a true vector format, and some related functions that will improve the ability to work with lists.  So I sat down and considered how to do it, and realized that many of the truly powerful things that you can do with lists (making arbitrary data structures and dynamically manipulating them) are not as simple to use in PEBL as in LISP. In fact, I realized that by moving the internal list data structure to a vector, we could only lose efficiency in a couple edge cases, while making access easier in many others.

So, as of version 0.13, when you use a 'list' data structure, it will internally actually be a vector.  This means that using
Nth(list,index)

won't depend on the length of the list or the index you are accessing.  Furthermore, it enables changing particular elements.  A new function is available:

SetElement(list,index,value)

Which provide the ability to change an item in constant time.  Finally, regardless of whether the internal list data structure was used, I felt it was important to provide a new way to put things on the end of a list that was more efficient.  Thus,

PushOnEnd()

Was born.  It can typically work as a drop-in replacement to Append, but it will modify the list you are working with.  It also return the resulting list, but this is very low-overhead.  So, previously, to construct a list of random numbers you might do:
   i <-0
   list <- []
  while(i < 10)
  {
     list <- Append(list, Random())
    i <- i + 1
  }

Now you can do:
   i <-0
   list <- []
  while(i < 10)
  {
     PushOnEnd(list, Random())
    i <- i + 1
  }

But you can also make the new line as follows, with only a tiny (under 5%) efficiency hit:
     list <- PushOnEnd(list, Random())


In future releases, I'll probably try to update the syntax to support accessing elements using something like [] accessors, but right now you can still Nth() to do this.

There is currently a beta release available for 0.13 on the sourceforge project page, and the final 0.13 should be out soon so you can try out this and other new features.

No comments: