# Inspirated

June 19, 2010

## Using Boyer-Moore-Horspool algorithm on file streams in Python

Filed under: Blog — krkhan @ 4:53 am

Horspool’s algorithm is a simple and efficient string-searching algorithm which trades space for time and performs better as length of search string is increased. Another (perhaps overlooked) advantage of this algorithm is its ability to search through stream files without requiring random access. As I was working on Launchpad for my SoC project I required this particular stream-handling attribute as the file descriptors opened by urllib2 didn’t support seek()ing. Modifying the example code from Wiki page a little, I was able to read() only the required bytes sequentially:

horspool.py

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 #!/usr/bin/env python   import locale import os import sys import urllib2   def boyermoore_horspool(fd, needle): nlen = len(needle) nlast = nlen - 1   skip = [] for k in range(256): skip.append(nlen) for k in range(nlast): skip[ord(needle[k])] = nlast - k skip = tuple(skip)   pos = 0 consumed = 0 haystack = bytes() while True: more = nlen - (consumed - pos) morebytes = fd.read(more) haystack = haystack[more:] + morebytes   if len(morebytes) < more: return -1 consumed = consumed + more   i = nlast while i >= 0 and haystack[i] == needle[i]: i = i - 1 if i == -1: return pos   pos = pos + skip[ord(haystack[nlast])]   return -1   if __name__ == "__main__": if len(sys.argv) < 3: print "Usage: horspool.py <url> <search text>" sys.exit(-1)   url = sys.argv[1] needle = sys.argv[2] needle = needle.decode('string_escape')   fd = urllib2.urlopen(url) offset = boyermoore_horspool(fd, needle) print hex(offset), '::', offset fd.close()

Now comes the fun part:

• The code can search through any URL without downloading it completely, stopping at the first match. For example, the following command will download only the first few bytes of the provided URL:
$./horspool.py http://www.gutenberg.org/files/132/132.txt "The Art of War" 0x1d :: 29 • Unicode searches work perfectly as well. Although the matching takes place according to the character encoding of the terminal used. That’s to say, since I’m using a UTF-8 terminal the “bytes” searched were assumed to be UTF-8 encoded as well: $ ./horspool.py http://www.gutenberg.org/files/29011/29011-0.txt "Σημείωση: Ο Πίνακας περιεχομένων"

0x44f :: 1103

• Same goes for multi-line searches:
$./horspool.py http://www.gutenberg.org/files/29011/29011-0.txt "διευκόλυνση\r\nτου αναγνώστη" 0x4b5 :: 1205 Tags: , , , , , , , , , May 16, 2010 ## HOWTO: Query WordPress posts in CMS Made Simple Filed under: Blog — krkhan @ 3:53 pm While I run my blog at inspirated.com, I aggregate posts related to coding at the subdomain code.inspirated.com. The websites are run through WordPress and CMS Made Simple respectively. For the latter, I needed to find a way of fetching blog posts from the main site for linking. Initially, I used tag feeds for this purpose. That is, I used the RSS module for CMS-MS and fetched the feed for a particular tag (e.g., inspirated.com/tag/code/feed). This worked well for a while until the feeds became large and I noticed that only the most recent 10 posts were showing up in the listings. Digging around, I found this piece of documentation which explains how one can use custom queries for collecting WordPress posts from a blog. There was a catch however as the code for doing so could only be run globally. In other words, if I tried running the code mentioned on the page inside a User Defined Tag in CMS-MS I would get strange errors. The solution was to run the code in a separate PHP file. Here’s how: 1. Create a file named wp.php in your CMS-MS folder with the following code: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 <?php require_once('/path/to/wordpress/wp-blog-header.php'); // edit the path in the line above to point to your wp-blog-header.php$tag = isset($_REQUEST['--tag']) ?$_REQUEST['--tag'] : 'code'; $count = isset($_REQUEST['--count']) ? $_REQUEST['--count'] : '-1';$after = isset($_REQUEST['--after']) ?$_REQUEST['--after'] : '1970-01-01';   function filter_where($where = '') { global$after; $where .= " AND post_date >= '".$after."'"; return $where; } add_filter('posts_where', 'filter_where'); query_posts('tag='.$tag.'&posts_per_page='.$count); echo "<ul>"; if ( have_posts() ) : while ( have_posts() ) : the_post(); echo "<li>"; echo "<a href=\""; echo the_permalink(); echo "\">"; echo the_title(); echo "</a>"; echo " ("; echo the_time('F jS, Y'); echo ")"; echo "</li>\n"; endwhile; else: echo "<li>No posts found</li>\n"; endif; echo "</ul>"; wp_reset_query(); ?> 2. Add a User Defined Tag in CMS-MS with the following code: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20$path = 'whoami | php -q /path/to/wp.php'; // edit the path in the line above to point to the wp.php created in previous step   // whoami command piped for no reason because my script wasn't // producing any output without it   if(isset($params['tag'])) {$path .= ' --tag='.$params['tag']; } if(isset($params['count'])) { $path .= ' --count='.$params['count']; }   if(isset($params['after'])) {$path .= ' --after='.$params['after']; } echo $path;
3. Use the tag in any CMS-MS page with any of the following combinations:
• List all posts with the tag code:
{wp_posts_with_tag tag="code"}
• List 10 posts with the tag code:
{wp_posts_with_tag tag="code" count="10"}
• List all posts with the tag code after May 1st, 2009:
{wp_posts_with_tag tag="code" after="2009-05-01"}

Custom queries are very powerful once you get them working. Anyone planning on using them should take a look at the function reference for getting to grips with the flexibility they offer.

Tags: , , , , ,

February 7, 2010

## Faking User-Agent with PyS60

Filed under: Blog — krkhan @ 12:00 am

“Anyone who slaps a ‘this page is best viewed with Browser X’ label on a Web page appears to be yearning for the bad old days, before the Web, when you had very little chance of reading a document written on another computer, another word processor, or another network.” — Tim Berners-Lee in Technology Review, July 1996

People never learn. Slapping such labels is one thing, they even go as far as adopting brain-dead practices of checking user-agent strings and refusing service to any browser not originating from Redmond. For example, Opera Mobile — the sexiest mobile application on planet — works astonishingly well for Javascript websites. Nevertheless, when trying to browse my university’s academic management portal on it I am presented with a big ugly “We’re dumb, you need to open this page with Internet Explorer 6.0 or later because it uses JAVASCRIPTXX0RZ.” Even though Opera does allow spoofing of user-agent strings, the fake strings still contained “Symbian” as the operating system which still resulted in incompatibility errors.

As ever, Python came to the rescue. Firing up the Twisted framework, I created a simple HTTP proxy which modifies the user-agent string on the fly. Peaches:

The script is still very quirky and is the farthest thing from what you’d call a stable solution. You can download the inital release here. The zip file contains the tiny proxy server script as well as Twisted and Zope dependencies. Good luck with trying to counter retards who’re doing everything they can to avoid compatibility. Yes, even 14 years after Sir Tim’s veracious proclamation.

Tags: , , , , , , , , , , ,

January 6, 2010

## (GUI, Mathematical Equations, Scientific Plotting) = (GTK+, LaTeX, Matplotlib)

Filed under: Blog — krkhan @ 10:16 am

GTK+ needs no introduction. LaTeX is the first thing that pops in anyone’s mind if mathematical equations’ typesetting is under consideration. Matplotlib — while not as well-known as the former two — is the super easy and elegant solution for scientific plotting on *nix platforms.

For an application demo, I required all three. Past experience has taught me that the most straightforward way of “gluing” things together is Python. GTK+ therefore = PyGTK. Next up was LaTeX, and a previous solution of mine for embedding LaTeX in PyGTK came to the rescue. The final requirement of Matplotlib was fulfilled without any hassle since the library was already written in Python.

The collective result was pretty:

radareq-0.1.tar.gz

(Click on the image for larger version.)

The linked tarball contains the Python scripts for the application. For everything to run smoothly, LaTeX and Matplotlib packages need to be installed on your system. If you encounter any issues running the code, feel free to flame your distribution for the apparent lack of sanity regarding package management.

Tags: , , , , , , , , , , ,

October 31, 2009

## HOWTO: Use PyS60’s Bluetooth Console on Fedora/Ubuntu/Debian Linux

Filed under: Blog — krkhan @ 11:03 pm

While developing PyS60 apps is one of the most fun things you could do with your Nokia phone, debugging them isn’t as zippy as one would hope for in a Py development environment. To make up for that, PyS60 gives the developers an option for directly connecting to the interpreter through Bluetooth. Doesn’t sound very appealing? How about this: You connect your laptop with the cellphone, jump in at some place in the code while your app is executing and then use lappy’s big keyboard for exploiting different code and values in the interpreter. Sounds better?

To accomplish this on a Linux distro, you will need the following packages installed on your system:

Name Links
gnome-bluetooth
uucp/cu

After making sure that both are present on your system, install PyS60 on your phone if you haven’t already done so.

Now the fun part:

1. Switch on Bluetooth in the cellphone.

2. Launch bluetooth-properties and click on “Setup New Device”.

3. Select your cellphone.

4. You will be shown a pin.

5. Enter the pin when queried on the cellphone.

6. The phone should be successfully paired.

7. Authorize your Linux system to make automatic connections to the phone.

8. As root, run this shell script:
[root@orthanc ~]# ./rfcomm-listen.sh

Serial Port service registered
Waiting for connection on channel 2

9. Launch PyS60 interpreter and select “Bluetooth Console” from the application menu.

10. Select your Linux machine.

The command you ran in previous step should have new output:

[root@orthanc ~]# ./rfcomm-listen.sh

Serial Port service registered
Waiting for connection on channel 2
Connection from 00:17:4B:B6:35:31 to /dev/rfcomm0
Press CTRL-C for hangup

The cellphone screen should be showing something like this:

11. As root again, open a new terminal and run:
[root@orthanc ~]# cu -l /dev/rfcomm0

Connected.

12. Hit Enter till prompt (>>>) appears, then type:

>>> import appuifw
>>> appuifw.query(u'Hello World', 'text')

13. Viola, you should have an input box on the mobile screen:

14. Enter any text and press the OK key. It should be show up in the terminal you were using to type in code:

>>> import appuifw
>>> appuifw.query(u'Hello World', 'text')
u’Finally’

15. Exit the interpreter by typing CTRL+D on an empty line:

>>> import appuifw
>>> appuifw.query(u'Hello World', 'text')
u’Finally’
>>>
Interactive interpreter finished.
cu: Got hangup signal

Disconnected.

Pat yourself on the back. Now, you can use your Bluetooth console to import your modules, execute some stuff and then jump in the middle to test some extra lines or values. In fact, I found it to be a pretty darned good way of learning about PyS60’s API. Res secundae!

Tags: , , , , , , , , , , , , , , , ,

September 10, 2009

## HOWTO: Use LaTeX mathematical expressions in PyGTK

Filed under: Blog — krkhan @ 10:04 pm

I had never really laid my hands on LaTeX until I required it in one of the helper applications for my graduation project. Unfortunately, the requirement wasn’t as simple as producing some documents as I had to embed mathematical expressions on the fly in my PyGTK apps. Googling around for the solution, I found GtkMathView which accomplished something similar to this albeit using MathML. However, my luck ran out on me again as the widget lacked Python bindings. The other solution was to generate transparent PNGs on the fly and include them as GtkImages. This worked rather well, as the final code allowed easy modifications to the generated expressions.

Requirements for the code were:

Final results:

And the simple code behind it:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 #!/usr/bin/env python """An example demonstrating usage of latexmath2png module for embedding math equations in PyGTK   Author: Kamran Riaz Khan <krkhan@inspirated.com> """   import gtk import os import latexmath2png   pre = 'gtktex_' eqs = [ r'$\alpha_i > \beta_i$', r'$\sum_{i=0}^\infty x_i$', r'$\left(\frac{5 - \frac{1}{x}}{4}\right)$', r'$s(t) = \mathcal{A}\sin(2 \omega t)$', r'$\sum_{n=1}^\infty\frac{-e^{i\pi}}{2^n}$' ] latexmath2png.math2png(eqs, os.getcwd(), prefix = pre)   def window_destroy(widget): for i in range(0, len(eqs)): os.unlink(os.path.join(os.getcwd(), '%s%d.png' % (pre, i + 1))) gtk.main_quit()   window = gtk.Window() window.set_border_width(10) window.set_title('LaTeX Equations in GTK') window.connect('destroy', window_destroy) vbox = gtk.VBox(spacing = 10) window.add(vbox)   images = [None] * len(eqs) for i in range(len(eqs)): images[i] = gtk.image_new_from_file('%s%d.png' % (pre, i + 1)) vbox.pack_start(images[i])   window.show_all() gtk.main()
Tags: , , , , , , , , , ,

August 30, 2009

## HOWTO: Add grid lines to a GtkDrawingArea in PyGTK

Filed under: Blog — krkhan @ 6:29 pm

Recently, I needed to create some visualizations in a drawing area for one of my PyGTK apps. The PyGTK tutorial had an excellent writeup complete with a working example for immediate help. Unfortunately, my requirements were of a “grid” view on which other objects would be drawn. Naturally, I hoped for some variable on the GtkDrawingArea which I would set to true and have the grid lines drawn automatically. But since I couldn’t find any such magic setting in the API, I had to draw the lines manually.

Consequently, I could use either of the following approaches:

• Loop horizontally and vertically while draw_line()ing.
• Draw a 10×10 box as a tile and repeat it over the background.

The first approach would’ve worked faster but involved writing more code. The second was uglier — especially for larger images — but resulted in me having to add only a few lines to the example:

72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 xpm_data = ["10 10 2 1", " c #EEEEEE", ". c #FFFFFF", " ", " ........ ", " ........ ", " ........ ", " ........ ", " ........ ", " ........ ", " ........ ", " ........ ", " "] tile_pixmap, tile_mask = gtk.gdk.pixmap_create_from_xpm_d( area.window, None, xpm_data) tile_gc = area.window.new_gc(fill = gtk.gdk.TILED, tile = tile_pixmap) width = int(self.hruler.get_range()[3]) height = int(self.vruler.get_range()[3]) area.window.draw_rectangle(tile_gc, True, 0, 0, width, height)

Turning this:

Into this:

Note that the 10×10 tiles’ borders match prettily with the markers on top and left rulers. There will definitely be some more efficient method for achieving this but for the time being tiling serves my needs perfectly.

Tags: , , , , , , ,

August 1, 2009

## Writing a minimal shell in Python

Filed under: Blog — krkhan @ 1:48 am

Being a Unix geek has its own little benefits. For example, when a friend wants you to code a shell for his Operating Systems assignment, you can extort a lunch out of him for the services. Moreover, if you can somehow get him to convince his teacher for accepting submissions in Python, you can have sex with the code itself.

The bare-essentials shell was required to have the following:

• Support for internal commands such as cd or ls which would be implemented using system calls.
• Support for launching external programs with arguments provided to the shell — optionally putting them to background with an ampersand at the end of the arguments.

That is it. No fancy intput/output stream redirection, no jobs, no pipes. Just a simple launcher. A quick doze of reflective programming resulted in a class that checked the given commands against member functions (e.g., Shell.on_cd is queried for “cd path“) and called them to process the input. In case of unrecognized commands, the default handler is called which in turn uses the subprocess module and attempts to launch the command as an external program. Support for escaped or quoted arguments wasn’t required (the assignment recommended tokenizing strings at spaces), but the godsent module … :

import shlex # ... args = shlex.split(command)

… allowed me to support proper argument parsing with only half a line of code anyway. Similarly, connecting to the standard input/output of the child program wasn’t in the requirements, but the simple call … :

import subprocess # ... proc = subprocess.Popen(args, executable = args[0], stdin = sys.stdin, stdout = sys.stdout, stderr = sys.stderr)

… did the trick perfectly; allowing proper executions of commands such as “cat -“.

The complete code is downloadable here. Unfortunately, coding was the easy part. Getting the guy to pay for the extorted lunch wasn’t and as of this writing, I’m still deadlocked on that front.

Tags: , , , , , ,

May 14, 2009

## User-defined iterators in Python

Filed under: Blog — krkhan @ 5:19 am

Iterable classes are one of the features which make Python code more readable. Simply put, they let you iterate over a container a la:

1 2 for s in ("Spam", "Eggs"): print s

Here, s iterates over the tuple printing the words one by one.:

Spam
Eggs

Now comes the interesting part: How do I make my own classes iterable? The official Python Tutorial gives a working example for how to do it:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 class Reverse: "Iterator for looping over a sequence backwards" def __init__(self, data): self.data = data self.index = len(data)   def __iter__(self): return self   def next(self): if self.index == 0: raise StopIteration self.index = self.index - 1 return self.data[self.index]   value = Reverse('spam') for char in value: print char

Output:

m
a
p
s

The example appeared perfectly fine to a beginner like me. However, since I’m just kinda twisted in the head, I added a new line in the for loop:

16 17 18 19 value = Reverse('spam') for char in value: if char in value: print char

Which resulted in the (quite unexpected) output:

That’s it. Nothing. Even though the code should make perfect sense and does work in case of built-in types. For example:

1 2 3 4 tup = ("Spam", "Eggs") for s in tup: if s in tup: print s

So daisy-ly gives:

Spam
Eggs

The culprit in case of tutorial’s example for user-defined iterators? After toying around the code sample a little, here’s what I pinned down:

• On the nested lines where another iterator is required, the Reverse class is supposed to return instance of an iterator which would define the next() method for returning successive values.
• Since the Reverse class returns only itself in this scenario, the self.index variable is shared among iterators of the Reverse('spam').
• As a result, Reverse.next() raises the StopIteration in the nested condition.

Once I understood the underlying problem, some further head-scratching and a can of malted drink resulted in the solutions:

• Return a copy for the iterative functions instead of the instance itself:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 import copy   class Reverse: "Iterator for looping over a sequence backwards" def __init__(self, data): self.data = data self.index = len(data)   def __iter__(self): return copy.copy(self)   def next(self): if self.index == 0: raise StopIteration self.index = self.index - 1 return self.data[self.index]   value = Reverse('spam') for char in value: if char in value: print char

Pro: Less strain on the programmer, only a couple of extra lines of code are needed.
Con: copying the instance can be expensive in case of larger containers.

• Use another class:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 class Reverse: "Iterator for looping over a sequence backwards" def __init__(self, data): self.data = data   def __iter__(self): return ReverseIter(self)   class ReverseIter: def __init__(self, inst): self.inst = inst self.index = len(self.inst.data)   def next(self): if self.index == 0: raise StopIteration self.index = self.index - 1 return self.inst.data[self.index]   value = Reverse('spam') for char in value: if char in value: print char

Pro: Since the whole container is not copied, only the index is unique among iterators — less burden on the memory.
Con: Not everyone likes defining new classes.

Both solutions worked equally well and resulted in the same output (the expected one this time):

m
a
p
s

The choice of either solution is solely dependent on the programmer’s preference. As a side note, after equating Python programming with carnal activities in few of my previous posts, I’m gonna take it to the next step and finally tag this post accordingly.

Tags: , , , , , , , , , ,

May 3, 2009

## “All methods in Python are effectively virtual”

Filed under: Blog — krkhan @ 8:07 pm

Dive Into Python really is one of the best programming books I have ever laid my hands on. Short, concise and to-the-point. The somewhat unorthodox approach of presenting an alien-looking program at the start of each chapter and then gradually building towards making it comprehensible is extraordinarily captivating. With that said, here’s an excerpt from the chapter introducing Python’s object orientation framework:

Guido, the original author of Python, explains method overriding this way: “Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class, may in fact end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)” If that doesn’t make sense to you (it confuses the hell out of me), feel free to ignore it. I just thought I’d pass it along.

If you were able to comprehend the full meaning of that paragraph in a single go, you most definitely are one of the following:

• Guido van Rossum himself
• Donald Ervin Knuth
• Pinocchio

Neither of which happens to be my identity, so it took me around three rereads to grasp the idea. It brought back memories of an interesting question that I used to ask students while I was working as a teacher’s assistant for the C++ course: “What is a virtual function?” The answer always involved pointers and polymorphism; completely ignoring any impact virtual functions would be having on inheritance in referential/non-pointer scenarios. (Considering that most of the C++ books never attempt to portray the difference either, I didn’t blame the students much.) Confused again? Here’s some more food for thought: Python does not even have pointers, so what do these perpetually virtual functions really entail in its universe? Let’s make everything peachy with a nice example.

Consider a Base class in C++ which defines three functions:

• hello()
• hello_non_virtual()
• hello_virtual()

The first function, i.e., hello() calls the latter two (hello_non_virtual() and hello_virtual()). Now, we inherit a Derived class from the Base, and override the functions:

• hello_non_virtual()
• hello_virtual()

Note that the hello() function is not defined in the Derived class. Now, what happens when someone calls Derived::hello()? The answer:

Since Derived::hello() does not exist, Base::hello() is called instead. Which, in turn, calls hello_non_virtual() and hello_virtual(). For the non-virtual function call, the Base::hello_non_virtual() function is executed. For the virtual function call, the overridden Derived::hello_virtual() is called instead.

Here’s the test code for C++:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 #include <iostream>   using namespace std;   class Base { public: void hello() { cout<<"Hello called from Base"<<endl;   hello_non_virtual(); hello_virtual(); }   void hello_non_virtual() { cout<<"Hello called from non-virtual Base function"<<endl; }   virtual void hello_virtual() { cout<<"Hello called from virtual Base function"<<endl; } };   class Derived : public Base { public: void hello_non_virtual() { cout<<"Hello called from non-virtual Derived function"<<endl; }   void hello_virtual() { cout<<"Hello called from virtual Derived function"<<endl; } };   int main() { Derived d;   d.hello();   return 0; }

And its output:

Hello called from Base
Hello called from non-virtual Base function
Hello called from virtual Derived function

Similarly, a Python program to illustrate the statement “all methods in Python are effectively virtual”:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 class Base: def hello(self): print "Hello called from Base"   self.hello_virtual()   def hello_virtual(self): print "Hello called from virtual Base function"   class Derived(Base): def hello_virtual(self): print "Hello called from virtual Derived function"   d = Derived() d.hello()

Output:

Hello called from Base
Hello called from virtual Derived function

I hope this clears up the always-virtual concept for other Python newcomers as well. As far as my experience with the language itself is concerned, Python is sex; simple as that. Mere two days after picking up my first Python book for reading, I have fallen in love with its elegance, simplicity and overall highly addictive nature.

Tags: , , , , , , , , , , , , ,