Inspirated

 
 

May 14, 2009

User-defined iterators in Python

Filed under: Blog — krkhan @ 5:19 am

Iterable classes are one of the features which make Python code more readable. Simply put, they let you iterate over a container a la:

1
2
for s in ("Spam", "Eggs"):
	print s

Here, s iterates over the tuple printing the words one by one.:

Spam
Eggs

Now comes the interesting part: How do I make my own classes iterable? The official Python Tutorial gives a working example for how to do it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Reverse:
	"Iterator for looping over a sequence backwards"
	def __init__(self, data):
		self.data = data
		self.index = len(data)
 
	def __iter__(self):
		return self
 
	def next(self):
		if self.index == 0:
			raise StopIteration
		self.index = self.index - 1
		return self.data[self.index]
 
value = Reverse('spam')
for char in value:
		print char

Output:

m
a
p
s

The example appeared perfectly fine to a beginner like me. However, since I’m just kinda twisted in the head, I added a new line in the for loop:

16
17
18
19
value = Reverse('spam')
for char in value:
	if char in value:
		print char

Which resulted in the (quite unexpected) output:

 

That’s it. Nothing. Even though the code should make perfect sense and does work in case of built-in types. For example:

1
2
3
4
tup = ("Spam", "Eggs")
for s in tup:
	if s in tup: 
		print s

So daisy-ly gives:

Spam
Eggs

The culprit in case of tutorial’s example for user-defined iterators? After toying around the code sample a little, here’s what I pinned down:

  • On the nested lines where another iterator is required, the Reverse class is supposed to return instance of an iterator which would define the next() method for returning successive values.
  • Since the Reverse class returns only itself in this scenario, the self.index variable is shared among iterators of the Reverse('spam').
  • As a result, Reverse.next() raises the StopIteration in the nested condition.

Once I understood the underlying problem, some further head-scratching and a can of malted drink resulted in the solutions:

  • Return a copy for the iterative functions instead of the instance itself:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    
    import copy
     
    class Reverse:
    	"Iterator for looping over a sequence backwards"
    	def __init__(self, data):
    		self.data = data
    		self.index = len(data)
     
    	def __iter__(self):
    		return copy.copy(self)
     
    	def next(self):
    		if self.index == 0:
    			raise StopIteration
    		self.index = self.index - 1
    		return self.data[self.index]
     
    value = Reverse('spam')
    for char in value:
    	if char in value:
    		print char

    Pro: Less strain on the programmer, only a couple of extra lines of code are needed.
    Con: copying the instance can be expensive in case of larger containers.

  • Use another class:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    
    class Reverse:
    	"Iterator for looping over a sequence backwards"
    	def __init__(self, data):
    		self.data = data
     
    	def __iter__(self):
    		return ReverseIter(self)
     
    class ReverseIter:
    	def __init__(self, inst):
    		self.inst = inst
    		self.index = len(self.inst.data)
     
    	def next(self):
    		if self.index == 0:
    			raise StopIteration
    		self.index = self.index - 1
    		return self.inst.data[self.index]
     
    value = Reverse('spam')
    for char in value:
    	if char in value:
    		print char

    Pro: Since the whole container is not copied, only the index is unique among iterators — less burden on the memory.
    Con: Not everyone likes defining new classes.

Both solutions worked equally well and resulted in the same output (the expected one this time):

m
a
p
s

The choice of either solution is solely dependent on the programmer’s preference. As a side note, after equating Python programming with carnal activities in few of my previous posts, I’m gonna take it to the next step and finally tag this post accordingly.

Tags: , , , , , , , , , ,

2 Comments »

  1. I was able to use this info to make one of my user defined classes iterable. Which is to say that I was able to iterate thru it with a “for x in foobar” statement.

    But, apparently, this approach does not allow you to use the ‘[]’ index selection syntax as is foobar[2]. Is there some additional method you have to add to your defined class for this, or it is that strictly for Python built-in types?

    Comment by Ray Wood — December 14, 2010 @ 2:36 am

  2. Ray, you have to use the __setitem__ method to override [] operator. Here’s an example:

    >>> class A:
    ... def __setitem__(self, index, value):
    ... print index, value
    ...
    >>> a = A()
    >>> a[23] = 45
    23 45

    Comment by krkhan — December 14, 2010 @ 4:48 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

One small verification for man, one giant PITA for bots: