Iterating Sparse Arrays in Lua 5.1

The Lua programming language is a scripting language that is light-weight (in terms of both syntax and size), fast, and easily embedded. I won’t spend a lot of time introducing Lua here (though I may do that in another article). If you haven’t used it, but you’re interested in getting started, a good place to look is the well-written Lua documentation.

I’m currently writing a library of higher-order functions in Lua called lua-functional. One of the early challenges was how to allow iteration over sparse arrays. Out of the box, Lua doesn’t stops iteration after reaching the first nil value. This article explores a two possible solutions to this problem and their consequences.

Tables

Tables, generally known as associative arrays, are the basis for all data structures in Lua. A table in Lua looks like:

my_table = { a = 1, b = 2, b = 3 }

Arrays

Lua also provides support for treating tables as non-associative arrays, as in:

my_array = { 'value1', 'value2', 'value3' }

Internally Lua makes a table with this structure:

my_array = { [1] = ‘value1′, [2] = ‘value2′, [3] = ‘value3′ }

such that

for i, v in ipairs(my_array) do
    print(i .. " " .. v)
end

will print

1 value1
2 value2
3 value3

The ipairs function allows a table to be iterated over as an array. It returns the index (i above) and the value (v above). In Lua .. is the string concatenation operator.

You may have noticed that the indices started at 1. That was intentional - it’s just the way Lua works.

Sparse Arrays

A sparse array is easy to create in Lua:

my_array = { 'value1', [1000] = ‘value1000′ }

Unfortunately, iteration doesn’t work with it. Using ipairs like this:

for i, v in ipairs(my_array) do
    print(k .. " " .. v)
end

will print

1 value1

As soon as ipairs reaches a nil value (at index 2 in this case) it assumes it has reached the end of the array and stops.

Solution 1: Creating a Custom Array Constructor with Metatables

One obvious approach to solving this problem would be to write a custom array constructor using Lua’s metatables. The following code uses a constructor, new_array, to return a custom array with a metatable that intercepts every table update:

local array_mt = {
    __newindex = function(t, k, v)
        if type(k) == "number" and k > t.n then rawset(t, 'n', k) end
        rawset(t, k, v)
    end
}

function new_array()
    arr = { n = 0 }
    setmetatable(arr, array_mt)
    return arr
end

__newindex is called each time code attempts to set a value on the table to which it is applied. Here it checks to see if the index, k, is greater than the largest index it knows, n, and if so it uses rawset to update the value of n.

The new_array function creates the new array, sets n to 0, and applies the custom metatable.

Next, we need a custom iterator that uses array.n to determine the end of the array, not the first nil value. Creating iterators in Lua is simple:

function nipairs_custom_array(arr)
    if not arr.n then
        error("The given array was not creaated with new_array()", 2)
    end

    return function(arr, i)
        i = i + 1
        if i <= arr.n then
            return i, a[i]
        end
    end, arr, 0
end

This code first checks to see if the given array was created by the new_array constructor. If it was, then it returns a closure which iterates over each index while i <= arr.n, the array to iterator over (which is passed to the function on each iteration), and the starting index (which is also passed to the function on each iteration). See generic for for more information about how iterators work.

Finally, we need some code to test the performance of this iterator. I used the following code, which creates a new array with two values, one at index 1 and one at index 50,000,000. It prints the first and last value and then terminates.

function test_mt(size)
    a = new_array()
    a[1] = ‘beginning’
    a[size] = ‘ending’

    for k, v in nipairs_custom_array(a) do
        if v then print(k, v) end
    end
end

test_my(50 * 1000 * 1000)

On my iMac with a Core2 Duo processor, I get the following results:

$ time lua test1.lua
1       beginning
50000000        ending

real    0m17.039s
user    0m16.567s
sys     0m0.436s

I don’t like this solution. As an API developer, I don’t want to force users to use a custom array implementation. What do they do if they get an array from another library?

Solution 2: Use the Iterator from Lua-functional

Yes, I’m plugging a custom iterator I wrote for lua-functional. First, I’ll show you the performance results to compare with those above:

$ time lua test2.lua
1       beginning
50000000        ending

real    0m12.809s
user    0m12.436s
sys     0m0.345s

The above results came from using the nipairs iterator, which is included in lua-functional. Here’s the current implementation:

-- Defined in 'functional' module
function nipairs(arr)
    require_type(arr, "table")
    local upper = table.maxn(arr)
    return function(a, i)
        i = i + 1
        if i <= upper then
            return i, a[i]
        end
    end, arr, 0
end

And here’s the code to run the test:

function test_functional(size)
    a = { 'beginning', [size] = ‘ending’ }

    for k, v in functional.nipairs(a) do
        if v then print(k, v) end
    end
end

test_functional(50 * 1000 * 1000)

The first thing to note is that client code can use an ordinary array with this iterator. This is really key for me in designing the API. Clients do not need to use special arrays with this iterator.

The second thing to note is that instead of use a field on the table (arr.n in the last example), I use a local variable upper and access it through the function, which is acting as a closure. table.maxn(arr), part of the standard Lua library, does a linear search through the table of the maximum index. This is unfortunate, but at least the performance of the whole operation is still O(n).

You may have noticed that the performance of this iterator is counter-intuitively much better than that of the previous implementation. In Lua, access to a field from a closure is less expensive than accessing a field from a table. Likely, caching the value of arr.n above would have yielded better performance.

Comparison to Explicit Iteration

Just for the sake of comparison, I thought I would see how Lua performed if we were some how able to know the upper bound of a sparse array. Here is the code:

function test_explicit(size)
    a = { 'beginning', [size] = ‘ending’ }

    for k = 1, size do
        v = a[k]
        if v then print(k, v) end
    end
end

test_explicit(50 * 1000 * 1000)

And the result:

$ time lua test3.lua
1       beginning
50000000        ending

real    0m11.752s
user    0m11.420s
sys     0m0.303s

The result demonstrates that there is a cost associated with my solution. However, it is the best I’ve yet found as we do not know the upper bound of the array.

Ruby Comparison

One key factor in my decision to learn Lua was performance. I’d written a genetical algorithm framework in Ruby, but as it is computationally expensive, I thought Lua might be better suited to the challenge. I have yet to write the framework in Lua, but I thought I would test Ruby’s performance on a sparse array for comparison.

Here’s the Ruby code:

size = 50 * 1000 * 1000
a = { 1 => 'beginning', size => 'ending' }
1.upto(size) { |i| v = a[i]; puts(i.to_s + ” ” + v) if v }

And here are the results:

$ time ruby test4.rb
1 beginning
50000000 ending

real    0m28.177s
user    0m27.373s
sys     0m0.740s

Lua is 2.5 times faster in an apples-to-apples comparison and my implementation of nipairs is well over 2 times faster than the more optimal Ruby implementation.

Final Notes

I hope this article has given you some tools to consider when working with sparse arrays in Lua. It has touched on a number of interesting features of Lua, including iterators, closures, and metatables. Given enough interest, I will spend more time talking about these topics in upcoming articles.

3 Comments

  1. Adam Wolff Says:

    Nice article Chris.

    I’m curious about how you iterate over the keys in an associative array in Lua, since it seems to me that the halting behavior in built-in iteration is just a bug.

    In JavaScript, if you define a sparse array, only the defined keys appear in iteration, even though the .length property of the array reflects 1+the last index in the array.

  2. Chris Pettitt Says:

    Thanks Adam.

    In Lua you can use pairs to iterate over an associative array. Here’s an example:

    a = { foo = 123, bar = 456 }
    for k, v in pairs(a) do
        print(k, v)
    end

    The output is:

    bar     456
    foo     123

    In my local environment pairs seems to be consistent in how it iterates through the array, but there is no guarantee.

    Lua also has a length operator (#), but it’s definition is loose enough to allow it to stop at any nil (”hole”) it encounters during iteration. It sounds like JavaScript has better guarantees in this regard.

  3. Chris Pettitt Says:

    There are a number of interesting solutions to this problem (with benchmarks) presented here: http://lua-users.org/wiki/VarargTheSecondClassCitizen.

Leave a Reply