Statistics | Inspirated

Find your most used words in Pidgin logs with Python

Filed under: Blog — krkhan @ 12:08 am

Here’s a quick little script I wrote to tabulate word frequencies in Pidgin logs. Usage is simple; just point it at a contact’s log directory:

$ ./purple-stats.py /home/krkhan/.purple/logs/msn/krkhan\@inspirated.com/some.friend\@some.gmail.com

And it gives you the words in descending order of usage:

         0: you        (38)
         1: it         (30)
         2: to         (24)
         3: the        (22)
         4: in         (22)
         5: lol        (22)
         6: of         (22)
         7: so         (18)
         8: is         (18)
         9: what       (16)
         ...

As usual, Python was used for the dirty work:

purple-stats.py

#!/usr/bin/env python
 
from operator import itemgetter
from string import punctuation
 
import locale
import os
import sys
 
from BeautifulSoup import BeautifulSoup
 
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "usage:", sys.argv[0], "<logs directory>"
        # Bail out here; otherwise sys.argv[1] below raises IndexError.
        sys.exit(1)
    dir = sys.argv[1]
 
    contents = filter(lambda x: x[-5:] == '.html', os.listdir(dir))
    stats = {}
    for entry in contents:
        path = os.path.join(dir, entry)
        with open(path, 'r') as fd:
            data = fd.read()
        soup = BeautifulSoup(data,
            convertEntities=BeautifulSoup.ALL_ENTITIES)
        spans = soup.findAll('span')
        for span in spans:
            for word in span.text.split():
                word = word.strip(punctuation).lower()
                if len(word) < 2:
                    continue
                stats[word] = stats.get(word, 0) + 1
 
    sorted_stats = sorted(stats.iteritems(), key=itemgetter(1))
    sorted_stats.reverse()
    for num, (word, count) in enumerate(sorted_stats):
        line = "%10d: %-10s (%d)" % (num, word, count)
        line = line.encode(locale.getpreferredencoding())
        print line
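
For anyone running Python 3, here is a minimal sketch of the same counting idea. It is not the script above: it swaps BeautifulSoup for the standard library's `html.parser`, and the names `SpanText`, `count_words` and the sample string are made up for illustration. It also works on a string rather than a log directory.

```python
from collections import Counter
from html.parser import HTMLParser
from string import punctuation


class SpanText(HTMLParser):
    """Collect the text found inside <span> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of <span> tags currently open
        self.chunks = []  # text fragments seen while inside a span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def count_words(html):
    """Return a Counter of lowercased words (2+ characters) inside spans."""
    parser = SpanText()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for chunk in parser.chunks:
        for word in chunk.split():
            word = word.strip(punctuation).lower()
            if len(word) >= 2:
                counts[word] += 1
    return counts


# Hypothetical log fragment; real Pidgin logs are whole HTML files.
sample = '<span>LOL, you said it!</span> <b>ignored</b> <span>You did.</span>'
for rank, (word, count) in enumerate(count_words(sample).most_common()):
    print("%10d: %-10s (%d)" % (rank, word, count))
```

`Counter.most_common()` already returns the words in descending order of count, so the separate sort-and-reverse step is not needed in this version.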

Tags: Beautiful Soup, Code, IM, Logs, Open Source, Pidgin, Python, Statistics, Technology

Comments Off

Facebook Friends Graph v0.2 — Deb and RPM packages for Ubuntu and Fedora

Filed under: Blog — krkhan @ 2:36 am

Thanks to Christoph Korn, Ubuntu users can now install the package with a single click from the GetDeb repository. The Deb file itself is available on the release page here, along with an RPM for Fedora users.

The looks:

Facebook Friends Graph v0.2 Screenshot

And the hooks:

Changelog:

Fixed:

Bug #522735: Facebook: Application Request Limit Reached

Bug #523378: Connection reset by peer

Bug #522487: Facebook Friends Graph fails when friends have a dash in their name [patch by Little Jawa]

Tags: Code, Deb, Facebook, Facebook Friends Graph, Friends, Graph, Graphics, Internet, Open Source, PyFacebook, Python, Social Networking, Statistics, Technology, Ubuntu, Web 2.0

Comments (3)

Facebook Friends Graph on Ubuntu

Filed under: Blog — krkhan @ 12:04 am

I never really thought anyone other than me would be interested in seeing gargantuan graphs of their friends’ connections until I found out through this post on the OMG! Ubuntu! blog that my application was included in the GetDeb repository for Ubuntu users. I have not used Ubuntu myself since about never, but apparently you can now install the application on Karmic Koala with just a few clicks.

Edit: I have now tested the installation on Karmic myself and can guarantee that it indeed works without any fuss. Gotta love Launchpad/Ubuntu.

The application itself was in a pretty much skeletal state of being so I was a little taken aback by the exposure. Nevertheless, I was reminded of the famous aphorism apropos of open source development:

“Release early, release often.” — Linus Torvalds

And indeed, the bug reports that came from users were a valuable byproduct of the Ubuntu push as I had stopped development on the script after it started working fine for me.

Tags: Code, Deb, Facebook, Facebook Friends Graph, Friends, Graph, Graphics, Internet, Open Source, PyFacebook, Python, Social Networking, Statistics, Technology, Ubuntu, Web 2.0

Comments Off

Inbox Stats v1.1 — S60 3rd Edition Compatibility

Filed under: Blog — krkhan @ 3:16 pm

Continuing the migration to E71, here’s the new release for Inbox Stats which works on Python 2.5 releases:

inboxstats-1.1.zip
Inbox Stats v1.1 Screenshot

Tags: Code, Graphics, Inbox, Inbox Stats, Nokia, Open Source, PyS60, Python, Series 60, SMS, Statistics, Symbian, Technology

Comments (1)

Facebook Friends Graph v0.1

Filed under: Blog — krkhan @ 12:01 am

The motivation behind my obsession with these manically large graphs is explained in this previous post of mine. The current post’s purpose is instead to link to the (finally) working code for generating the graphs. I have created a project page at Launchpad for this little application. The trunk contains the most recent code, which is still in its nascent form but works pretty well given an installation of PyGTK, Python GtkMozembed and pydot. I might port the application to Windoze in the future, provided I get the time for it.

Starting a tradition of linking to the Inspirated Code subsection, the latest updates about the application will be posted on this page.

And as is the custom, the image itself:

(Click on the thumbnail for larger version.)
(Warning: The larger version is a gigantic 32 MB PNG image with a resolution of 16517x13808 (even larger than the last time). If you want to view it, I recommend downloading it to your hard-disk first and then opening it outside your web browser.)

Tags: Code, Facebook, Facebook Friends Graph, Friends, Graph, Graphics, Internet, Open Source, PyFacebook, Python, Social Networking, Statistics, Technology, Web 2.0

Comments (3)

Facebook Friends Graph — Plotting your social network together

Filed under: Blog — krkhan @ 3:58 pm

Update: New version

While visiting a friend’s profile, I noticed that he and I had about 70 mutual friends. It immediately gave me the idea to plot the common friend connections and see what interesting patterns emerge in the larger picture. Something like this:

Facebook Friends' Graph Sample

In the sample above, the names in circles (“nodes”) are people in my friend list, and the connecting lines show which of them are friends with each other. For example, Saad Jasra and Hassan Ahmad are friends with each other as well as with me. Similarly, Ali Zeeshan Ijaz is a friend of both Saad Jasra and Abdullah Afaq Ali.

Luckily, the Facebook API had Python bindings available, which considerably simplified my task. Those, coupled with pydot, resulted in a dot file with all the required connections. Graphviz did the remaining work:

(Click on the thumbnail for larger version.)
(Warning: The larger version is a 7329x5953 PNG image with a humongous file size of 13 MB. If your hardware specs are squeamish, don’t blame me if it brings your machine to its knees; this is not a DoS attack.)
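
To give a rough idea of what the dot file step looks like, here is a minimal sketch that writes the DOT text by hand. It is not the application’s code: it skips both the Facebook bindings and pydot to stay dependency-free, and the `friends_to_dot` helper and the sample dictionary (built from the names in the small example above) are assumptions for illustration.

```python
def friends_to_dot(friendships, name="friends"):
    """Render undirected friendships as Graphviz DOT text.

    `friendships` maps each friend to the set of my other friends they know.
    Names are wrapped in double quotes so spaces and dashes are safe; names
    that themselves contain double quotes are not handled in this sketch.
    """
    edges = set()
    for person, others in friendships.items():
        for other in others:
            if person != other:
                # A frozenset makes (a, b) and (b, a) the same edge.
                edges.add(frozenset((person, other)))

    lines = ["graph %s {" % name]
    for person in sorted(friendships):
        lines.append('    "%s";' % person)
    for a, b in sorted(tuple(sorted(edge)) for edge in edges):
        lines.append('    "%s" -- "%s";' % (a, b))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical input mirroring the small sample graph.
sample = {
    "Saad Jasra": {"Hassan Ahmad", "Ali Zeeshan Ijaz"},
    "Hassan Ahmad": {"Saad Jasra"},
    "Ali Zeeshan Ijaz": {"Saad Jasra", "Abdullah Afaq Ali"},
    "Abdullah Afaq Ali": {"Ali Zeeshan Ijaz"},
}
dot_text = friends_to_dot(sample)
print(dot_text)
```

Saving the output as `friends.dot` and running `dot -Tpng friends.dot -o friends.png` should then produce an image, assuming Graphviz is installed.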

Now came the intriguing part. The resulting graph was visibly split into two large portions. This resulted from the fact that I had spent a major portion of my life (15 years to be exact) in Saudi Arabia before moving to Pakistan. More interestingly, an old friend of mine from Saudi Arabia, Atif Sheikh, was also enrolled at my university in Pakistan. When I zoomed into the graph, I spotted him at the nexus of the two networks. Similarly, the names in the middle of a network were the most connected people in that network. That is, the names congested in the middle of the Pakistan network were friends from university, and the names at the edges of that network were friends outside the university who didn’t share my academic connections.
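
The “most connected” observation can also be checked without squinting at a giant PNG. The sketch below simply counts how many connections each name has; the `degree_ranking` helper and the edge list are hypothetical, and the count is only a rough proxy for where Graphviz ends up placing a node.

```python
from collections import Counter


def degree_ranking(edges):
    """Return (name, degree) pairs, most connected first."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree descending, then by name so ties come out deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edges: a small university cluster plus one outside friend.
edges = [
    ("uni1", "uni2"), ("uni1", "uni3"), ("uni1", "uni4"),
    ("uni2", "uni3"), ("outside", "uni4"),
]
ranking = degree_ranking(edges)
for name, degree in ranking:
    print("%-8s %d" % (name, degree))
```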

I haven’t polished the code for a stable release yet as I doubt that other people would be interested in having gigantic plots of their social lives. Nevertheless, I’ll try to package it in the form of a proper Facebook application in the near future. After all, as I quoted in a previous post of mine:

“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.”

Tags: Facebook, Facebook Friends Graph, Friends, Graph, Graphics, Internet, Open Source, PyFacebook, Python, Social Networking, Statistics, Technology, Web 2.0

Comments (6)

Semi-annual blog conscience report

Filed under: Blog — krkhan @ 1:17 am

Inspirated Browser Stats (January -- July 2009)

If there ever was an insanely staggering year in terms of unexpected geekological developments, it has to be 2009. Since January I have regularly been taken aback by news such as record labels dropping DRM, Duke Nukem Forever finally bowing out, Microsoft confessing that ActiveX is hopeless from a security point of view, Google Apps moving out of beta, VLC reaching 1.0, Chrome OS’ announcement, XHTML Part Deux’s quiet death, HTML 5 and CSS 3’s adoption in major browsers and, well, defying all expectations, Inspirated’s browser hit stats managing to keep their head high even in the half-yearly round-up. The blog has received about 136,000 hits from Firefox alone, markedly more than twice the IE hits. The first time I noticed the vulpine victory I did dedicate a post to the stats. Nevertheless, consistency achieved over six months just gives me another chance to gloat about it.

I don’t know if this is at long last the year of Linux on the desktop, but one thing is for sure: only a final release of GNU Hurd now stands between our planet and the apocalypse. If that does happen, however, please make sure that you refer to the calamity by its correct technical term “GNU/Apocalypse” and not just the ignorant layman’s phrase which totally undermines the FSF’s impact on the universe’s evolution.

Tags: Firefox, Internet Explorer, Rants, Statistics

Comments (2)

Inbox Stats v1.0 — Because graphs speak louder than numbers

Filed under: Blog — krkhan @ 4:00 am

Inbox Stats v1.0

Changelog:

As can be seen from the screenshot above, graphs can be turned on through the options menu. Implementing them resulted in two useful modules:
- scrolledcanvas: Provides a derived Canvas class which has built-in support for scrolling oversized images.
- roundedrectangle: Provides functions for drawing rectangles with rounded corners on a Canvas or Image. Optionally, text can be given which will be prettily centered (and truncated upon requirement) in the drawn shapes.
More minor code enhancements and bugfixes.

You can find both the modules and the application itself here. All code is released under the PSF license so feel free to use it any way you want. Oh, and if you still haven’t figured out why anyone would be interested in the SMS stats in the first place, here’s a little quote for you:

“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.”

Tags: Code, Graphics, Inbox, Inbox Stats, Nokia, Open Source, PyS60, Python, Series 60, SMS, Statistics, Symbian, Technology

Comments (2)

SMS Inbox statistics for Series 60 mobile phones v0.2

Filed under: Blog — krkhan @ 7:04 pm

Update: New version

Improvements in the new version:

The previous version hung while calculating the statistics. The new version dispatches a thread for the dirty work and keeps the user interface responsive with a “Processing” notification.
Contact stats are sorted in descending order by the number of messages per contact.
Code improvements for making it more “Pythonic”.
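
For readers without a Series 60 phone at hand, the two main changes can be sketched in ordinary desktop Python. This is not the PyS60 script below: it uses the standard `threading` module instead of PyS60’s `thread` and `appuifw`, and the `count_senders` helper and the sender list are made up for illustration.

```python
import threading
from operator import itemgetter


def count_senders(senders, stats):
    """Tally messages per sender and store the results in `stats`."""
    contacts = {}
    for sender in senders:
        contacts[sender] = contacts.get(sender, 0) + 1
    stats["sms-count"] = len(senders)
    # Highest message count first; the inner sort breaks ties by name,
    # and the outer sort is stable so that order is preserved.
    stats["sms-contacts"] = sorted(
        sorted(contacts.items()), key=itemgetter(1), reverse=True)


# Hypothetical sender list standing in for the phone's inbox.
senders = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]
stats = {}
worker = threading.Thread(target=count_senders, args=(senders, stats))
worker.start()
print("Processing text messages...")  # the main thread stays free for the UI
worker.join()                          # wait for the worker before reading stats
print("Done!")
for name, count in stats["sms-contacts"]:
    print("%s: %d" % (name, count))
```

Passing the `stats` dictionary into the worker mirrors the approach in the script: the thread fills it in, and the main thread reads it only after the worker has finished.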

inboxstats.py

# -*- coding: utf-8 -*-
"""Script for printing trivial statistics about inbox, such as:
	Number of texts
	Number of unique contacts who sent the texts
	Number of texts sent by respective contacts
"""
 
__author__ = "Kamran Riaz Khan"
__email__ = "krkhan@inspirated.com"
__version__ = "0.2"
__copyright__ = "Copyright (c) 2009 Kamran Riaz Khan"
__license__ = "Python"
__status__ = "Production"
 
import appuifw, e32, inbox, thread
 
def exit_key_handler():
	"Release app_lock."
	app_lock.signal()
 
def parse_inbox_stats(stats):
	"""Parse the inbox statistics,
	Updates the stats dictionary with:
		sms-count : Number of texts
		sms-contacts: List of tuples with following pairs:
			Name of contact, Number of corresponding
			(ordered according to decreasing number of texts)"""
	curr_inbox = inbox.Inbox()
	messages = curr_inbox.sms_messages()
	contacts = {}
 
	for i in messages:
		address = curr_inbox.address(i)
		if contacts.has_key(address):
			contacts[address] = contacts[address] + 1
		else:
			contacts[address] = 1
 
	contacts = contacts.items()
	contacts.sort(lambda x, y: cmp(x[1], y[1]))
	contacts.reverse()
 
	stats["sms-count"] = len(messages)
	stats["sms-contacts"] = contacts
 
def print_inbox_stats(content, stats):
	"""Print inbox stats in the content Text field,
	Remembers the cursor position of Text before the call
	and points at it again after updating the content."""
	pos = content.get_pos()
 
	statsmap = [
		(u"SMS Count", unicode(stats["sms-count"])),
		(u"Unique Contacts", unicode(len(stats["sms-contacts"]))),
		(u"", u"")
		]
 
	statsmap += [(k, unicode(v)) for k, v in stats["sms-contacts"]]
 
	for i in statsmap:
		content.style = appuifw.STYLE_BOLD
		content.add(i[0] + (i[0] and u": " or u""))
		content.style = 0
		content.add(i[1] + u"\n")
 
	content.set_pos(pos)
 
if __name__ == "__main__":
	content = appuifw.Text()
	appuifw.app.title = u'Inbox Stats'
	appuifw.app.body = content
	appuifw.app.exit_key_handler = exit_key_handler
 
	stats = {}
	t = thread.start_new_thread(parse_inbox_stats, (stats,))
 
	content.style = appuifw.STYLE_ITALIC
	content.add(u"Processing text messages...\n")
	thread.ao_waittid(t)
	content.add(u"Done!\n\n")
	content.style = 0
 
	print_inbox_stats(content, stats)
 
	app_lock = e32.Ao_lock()
	app_lock.wait()


Inbox Stats v0.2 Screenshot

Tags: Code, Inbox, Inbox Stats, Nokia, Open Source, PyS60, Python, Series 60, SMS, Statistics, Symbian, Technology

Comments (2)

SMS Inbox statistics for Series 60 mobile phones

Filed under: Blog — krkhan @ 8:03 pm

Update: New version

Self-indulgence is what I do best. It usually results in me trying to figure out random statistics about my personal life; e.g., graphs about which hours of the day I’m mostly awake and pie-charts about my bathroom habits. Such stuff doesn’t only make me feel more important than I actually am, but also polishes my fundamental math skills which were lost while trying to calculate the average number of viruses a Windows user is hit by on a yearly basis.

Texting is what I do second best. Combine my two most productive practices and the need emerges for a way to produce useless statistics about my cell phone’s inbox. This is where PyS60 comes to the rescue. In my previous post I praised Python’s s** appeal. Here’s the demonstration:

Total time spent with Python: Less than a week
Total time spent with PyS60: Less than a minute
Total time spent with Symbian development: Less than never

And still, even a total n00b like me could easily accomplish what he wanted to, using only the library reference manual and 70 lines of understandable code:

inboxstats.py

"""Script for printing trivial statistics about inbox, such as:
	Number of texts
	Number of unique contacts who sent the texts
	Number of texts sent by respective contacts
"""
 
__author__ = "Kamran Riaz Khan <krkhan@inspirated.com>"
__version__ = "$Revision: 0.1 $"
__date__ = "$Date: 2009/05/10 15:30:00 $"
__copyright__ = "Copyright (c) 2009 Kamran Riaz Khan"
__license__ = "Python"
 
import appuifw
import e32
import inbox
 
def exit_key_handler():
	"Release app_lock."
	app_lock.signal()
 
def inbox_stats():
	"""Parse the inbox statistics,
	Returns the dictionary:
		sms-count : Number of texts
		sms-contacts: Dictionary with the pairs:
			contact-name : Number of texts from contact"""
	cur_inbox = inbox.Inbox()
	messages = cur_inbox.sms_messages()
	contacts = {}
 
	for i in messages:
		address = cur_inbox.address(i)
		if contacts.has_key(address):
			contacts[address] = contacts[address] + 1
		else:
			contacts[address] = 1
 
	return {
		"sms-count" :  len(messages),
		"sms-contacts" :  contacts
		}
 
if __name__ == "__main__":
	content = appuifw.Text()
	appuifw.app.title = u'Inbox Stats'
	appuifw.app.body = content
	appuifw.app.exit_key_handler = exit_key_handler
 
	stats = inbox_stats()
	statsmap = (
		(u"SMS Count", unicode(stats["sms-count"])),
		(u"Unique Contacts", unicode(len(stats["sms-contacts"]))),
		)
 
	for i in statsmap:
		content.style = appuifw.STYLE_BOLD
		content.add(i[0] + u": ")
		content.style = 0
		content.add(i[1] + u"\n")
 
	content.add(u"\n")
	for k, v in stats["sms-contacts"].iteritems():
		content.style = appuifw.STYLE_BOLD
		content.add(k + u": ")
		content.style = 0
		content.add(unicode(v) + u"\n")
 
	app_lock = e32.Ao_lock()
	app_lock.wait()

Which gives me:

Inbox Stats Screenshot

Tags: Code, Inbox, Inbox Stats, Nokia, Open Source, PyS60, Python, Series 60, SMS, Statistics, Symbian, Technology

Comments (4)