Inspirated

 
 

August 17, 2010

Summer of Code Progress: Wrapping up

Filed under: Blog — krkhan @ 5:44 am

As eleven weeks of the-best-summer-ever draw to an end, here’s the final coding report for GSoC 2010.

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

60 hours.

Highlights

The week was spent mostly cleaning up and packaging the code accumulated over the summer. To demonstrate some aspects of the Arsenal library, I also created a proof-of-concept CGI script which upstreams a Launchpad bug’s attachments to a remote Bugzilla. The task was fun: the effort put into refactoring things into launchpadlib-toolkit and BugzillaAdapter finally paid off, and it took only a few hours to get the script working (with most of that time spent learning AJAX).

Concerns

None.

Waiting Items

None.

Stalled Items

None.

Accomplishments

  • Branch, Merge Revision:
    • Revision: Added support for quilt.
    • Revision: Added support for using patch utility for quilt packages where the diff files update debian/* stuff themselves.
    • Revision: Cleaned up the library to provide LaunchpadApplication and LaunchpadBugzillaApplication.
    • Revision: Fixed BugPatcher to use LaunchpadApplication as base class.
    • Revision: Cleaned up LaunchpadBugzillaApplication to take username and password as arguments instead of modifiers.
  • Branch, Merge Revision: Fixed packaging issues to release debs for Karmic and Lucid.
  • Branch: Implemented a CGI script demonstrating the upstreaming capabilities of Arsenal library. An example run can be seen in this screencast.

Minor Tasks

  • Revision: Some more code cleanup.
  • Revision: Check launchpadlib version before appending ‘/beta’ during API URL detection.

Actions for the Following Report

  • Fill the final evaluation.
  • Write a summary of the overall GSoC experience.
  • Start waiting for the t-shirt.

August 10, 2010

Summer of Code Progress: Refactoring, Matching and Patching

Filed under: Blog — krkhan @ 3:53 am

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

90 hours.

Highlights

  • Refactored Attachment Upstreamer in order to migrate Launchpad and Bugzilla chores to launchpadlib-toolkit and BugzillaAdapter.
  • Implemented match-upstream.py for matchmaking Launchpad bugs in remote trackers.
  • Implemented bug-patcher.py for generating patched Debian packages for Launchpad bugs.

Concerns

None.

Waiting Items

None.

Stalled Items

None.

Accomplishments

  • Branch, Merge Revision: Migrated Attachment Upstreamer to use launchpadlib-toolkit.
  • Branch, Merge Revision: Migrated Attachment Upstreamer to use BugzillaAdapter.
  • Revision: Updated launchpadlib-toolkit to serve scripts such as Attachment Upstreamer through attachment filters and wrappers.
  • Branch, Merge Revision: Implemented match-upstream.py with support for multiple level searches for finding a bug’s attributes in a remote tracker. Supports searching titles, git commit ids and attachment filenames.
  • Branch: Implemented bug-patcher.py with support for modifying Debian packages which use cdbs patch system to generate a patched version using a bug’s attachments.

Minor Tasks

  • Created a LaunchpadBugzillaApp class which shares the majority of the initialization code for send-attachments-upstream.py, match-upstream.py and bug-patcher.py.
  • Fixed various bugs in LaunchpadBugzillaApp for dealing with Gnome Keyring and Launchpad authentication.
  • Added various filters to launchpadlib-toolkit.

Actions for the Following Report

  • LaunchpadBugzillaApp should be derived from LaunchpadApp which would allow using the latter for scripts such as bug-patcher.py where Bugzilla portions aren’t required.
  • Update Bug Patcher to include support for remaining Debian patch systems.
  • Clean up the code accumulated over GSoC development and write tests.

July 24, 2010

Summer of Code Progress: Attachment Upstreamer improvements

Filed under: Blog — krkhan @ 9:15 pm

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

60 hours.

Highlights

New features were added to Attachment Upstreamer in order to make it more suitable for issues encountered by Ubuntu maintainers (as suggested by Bryce from his experience as the X.org maintainer).

Concerns

None.

Waiting Items

None.

Stalled Items

None.

Accomplishments

  • Branch, Merge Revision: Implemented caching of Bugzilla credentials using Gnome Keyring and ConfigParser.
  • Merge Revision:
    • Branch: Added support for excluding attachments based on filename matching using glob patterns.
    • Branch: Added support for extracting Tar and Zip archives when the number of files in them is below a specified limit.
    • Branch: Added support for excluding attachments based on their sizes, optionally Gzipping them in an effort to make the size acceptable.
    • Branch: Added support for enforcing content-types of attachments based on their filenames.

Minor Tasks

  • Various bugfixes and code-cleanup for previously merged GSoC code.

Actions for the Following Report

The Launchpad and Bugzilla sides of the Upstreamer are to be cleaned up and made dependent on launchpadlib-toolkit and BugzillaAdapter respectively. This will help future scripts which rely on Bugzilla communication as well as make such things agnostic to the implementation lying beneath the adapter (e.g., whether we’re using Curl/XML-RPC/REST to talk to the server).


July 18, 2010

Beta Repository for making Firefox 4 coexist peacefully with 3.6 on Fedora 13

Filed under: Blog — krkhan @ 9:45 pm

Firefox 4 offers some compelling features such as HTML 5 improvements and a new add-on manager. Since it’s quite painstaking to compile the beta from source and quite messy to place pre-compiled binaries in system default folders (not to mention the compatibility checks and upgrade chores that would interrupt at each launch if you go back and forth between different versions), I created a repository at repo.inspirated.com which can be used to test the beta version without touching any 3.6 stable release already installed on the system:

Firefox 4 Beta 1 Menu Shortcut

To use the repository, issue the following commands:

$ su -c 'wget http://repo.inspirated.com/inspirated.repo -O /etc/yum.repos.d/inspirated.repo'
$ su -c 'yum install firefox-beta'

The beta refuses to run if an instance of the old Firefox is already active. Therefore, close the older Firefox and then launch the 4.0b1 version using the firefox-beta command or the “Firefox Beta” shortcut in the applications menu. A new profile will be created at ~/.mozilla/firefox/beta/ so that your older profile’s settings, bookmarks and extensions stay intact.

Firefox 4 Beta 1 Screenshot


July 12, 2010

Summer of Code Progress: Curl It Unlike Beckham

Filed under: Blog — krkhan @ 11:10 pm

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

50 hours.

Highlights

The send-attachments-upstream.py script was migrated from XML-RPC to Curl for communicating with Bugzilla. The script was then refactored to provide capabilities such as attachment filtering. Various bugfixes and improvements were made along the way.

Concerns

None.

Waiting Items

None.

Stalled Items

None.

Accomplishments

  • Branch, Merge Revision: Reimplemented attachment sending using pycurl.
  • Branch, Merge Revision: Refactored the script in order to provide options such as -o (copy only attachments uploaded by bug owner).

Minor Tasks

  • Branch, Merge Revision: Fixed regular expressions for parsing results and handling of Unicode attachment titles.

Actions for the Following Report

Implement the following improvements in send-attachments-upstream.py:

  • Caching Bugzilla credentials.
  • Filename exclusion for attachments.
  • Archive extraction.
  • File size limits.
  • A command-line switch to enforce MIME content-types based on file extensions.

July 3, 2010

Summer of Code Progress: Attachment Upstreamer

Filed under: Blog — krkhan @ 5:04 pm

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

10 hours.

Highlights

Communicating with Bugzilla is done through the python-bugzilla wrapper library. This could have been achieved by using xmlrpclib directly but doing that would require reinventing a whole lot of wheels by handling Bugzilla specific XML-RPC eccentricities.

Concerns

None.

Waiting Items

None.

Stalled Items

None.

Accomplishments

  • Branch: Added support for copying attachments to a remote bugzilla:
    $ ./send-attachments-upstream.py --user=krkhan@inspirated.com --pass=xxx https://bugs.launchpad.net/ubuntu/+bug/223435  https://partner-bugzilla.redhat.com/show_bug.cgi?id=593603
    Logging in Launchpad [Success <Logged in as Kamran Riaz Khan>]
    Logging in Bugzilla [Success <Logged in as krkhan@inspirated.com>]
    Uploading: Dependencies.txt [Success]
    Uploading: Disassembly.txt [Success]
    Uploading: ProcMaps.txt [Success]
    Uploading: ProcStatus.txt [Success]
    Uploading: Registers.txt [Success]
    Uploading: Stacktrace.txt [Success]
    Uploading: ThreadStacktrace.txt [Success]
    Uploading: Stacktrace.txt (retraced) [Success]
    Uploading: ThreadStacktrace.txt (retraced) [Success]

Minor Tasks

  • Patch: Added python-bugzilla in lib and modified setup.py accordingly.
  • Patch: Initial commit for sending attachments to a remote Bugzilla.
  • Patch: Added error handling for API calls.

Actions for the Following Report

Add support for creating new bugs in a remote Bugzilla based on data from a Launchpad bug.


July 1, 2010

dd: The Ultimate Backup Solution

Filed under: Blog — krkhan @ 7:27 am

Over the 8 years of my acquaintance with computers, I have lost valuable data at an average of twice per annum. I have tried all kinds of solutions to help my situation, only to fail miserably by forgetting to back up some important bits and pieces of information before upgrading my distro.

Backup solutions can mostly be divided into two approaches: archiving and cloning. If space is limited, you can archive your important data using utilities such as tar. This in fact was the approach I had been using until now. The downside was that files inside the backup were less accessible. Say I needed a small text file from a 200 GB archive: it’d take me around 20 minutes to “get” to its location in the archive.

Which is why I decided to shift to a newer approach. My laptop has a 320 GB hard disk and I own another 320 GB Western Digital Passport for extra data. To take advantage of the matching sizes, I bought a 500 GB Passport, transferred the “extra” data to it and then cloned the entire laptop hard disk to its 320 GB external cousin.

$ dd if=/dev/sda of=/dev/sdb

That is all. dd’s performance was questionable, as it took around 15 hours to clone the entire 320 GB. Nevertheless, this time around I was satisfied with the final backup. Not only was it a bit-by-bit replica of my original data, but also an accessible repository which I could reach easily by plugging in the USB drive.
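
One way to check the “bit-by-bit replica” claim is to hash both devices block by block and compare the digests. The following is a minimal Python 3 sketch, not something used for the backup above: the helper names `sha256_of` and `is_identical` are hypothetical, reading /dev/sda and /dev/sdb would need root, and the comparison only makes sense if the source disk has not changed since it was cloned.

```python
import hashlib


def sha256_of(path, block_size=4 * 1024 * 1024, limit=None):
    """Hash a file or block device in fixed-size blocks.

    If `limit` is given, only the first `limit` bytes are hashed, which
    helps when the target device is larger than the source.
    """
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as fd:
        while remaining is None or remaining > 0:
            want = block_size if remaining is None else min(block_size, remaining)
            block = fd.read(want)
            if not block:
                break  # end of file or device
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()


def is_identical(source, target, limit=None):
    """Compare two files or devices by their SHA-256 digests."""
    return sha256_of(source, limit=limit) == sha256_of(target, limit=limit)
```

With the devices from the command above, the check would be `is_identical("/dev/sda", "/dev/sdb")`. The sketch reads in 4 MB blocks; dd itself defaults to 512-byte blocks, and passing a larger `bs=` value is a commonly suggested way to speed up a copy like this one.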


June 27, 2010

Find your most used words in Pidgin logs with Python

Filed under: Blog — krkhan @ 12:08 am

Here’s a quick little script which I wrote to tabulate the word frequencies in Pidgin logs. Usage is simple. Point it at a contact’s log directory:

$ ./purple-stats.py /home/krkhan/.purple/logs/msn/krkhan\@inspirated.com/some.friend\@some.gmail.com

And it gives you the words in their descending order of usage:

         0: you        (38)
         1: it         (30)
         2: to         (24)
         3: the        (22)
         4: in         (22)
         5: lol        (22)
         6: of         (22)
         7: so         (18)
         8: is         (18)
         9: what       (16)
         ...

As usual, Python was used for the dirty work:

purple-stats.py

#!/usr/bin/env python
 
from operator import itemgetter
from string import punctuation
 
import locale
import os
import sys
 
from BeautifulSoup import BeautifulSoup
 
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "usage:", sys.argv[0], "<logs directory>"
        sys.exit(1)
    dir = sys.argv[1]
 
    contents = filter(lambda x: x[-5:] == '.html', os.listdir(dir))
    stats = {}
    for entry in contents:
        path = os.path.join(dir, entry)
        with open(path, 'r') as fd:
            data = fd.read()
        soup = BeautifulSoup(data,
            convertEntities=BeautifulSoup.ALL_ENTITIES)
        spans = soup.findAll('span')
        for span in spans:
            for word in span.text.split():
                word = word.strip(punctuation).lower()
                if len(word) < 2:
                    continue
                stats[word] = stats.get(word, 0) + 1
 
    sorted_stats = sorted(stats.iteritems(), key=itemgetter(1))
    sorted_stats.reverse()
    for num, (word, count) in enumerate(sorted_stats):
        line = "%10d: %-10s (%d)" % (num, word, count)
        line = line.encode(locale.getpreferredencoding())
        print line
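
The counting step on its own can be sketched more compactly with `collections.Counter`. This is a Python 3 illustration rather than a drop-in replacement for the Python 2 script above: it works on plain strings instead of parsed Pidgin HTML logs, and the function name `word_frequencies` is made up for the example.

```python
from collections import Counter
from string import punctuation


def word_frequencies(lines, min_length=2):
    """Count lower-cased words, ignoring surrounding punctuation and very short words."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            word = word.strip(punctuation).lower()
            if len(word) >= min_length:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    messages = ["So what is it?", "It is what it is, lol."]
    # most_common() yields (word, count) pairs in descending order of count.
    for rank, (word, count) in enumerate(word_frequencies(messages).most_common()):
        print("%10d: %-10s (%d)" % (rank, word, count))
```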

June 22, 2010

Summer of Code Progress: Merging Launchpad branches

Filed under: Blog — krkhan @ 5:21 pm

Related Links

Summer of Code Archive Inspirated Code
Report Guidelines Ubuntu Wiki
Original Proposal Ubuntu Wiki

Time Spent

40 hours.

Highlights

This week was spent on trying to get my Launchpad branches merged upstream. During the process many concerns were raised which resulted in a number of patches and discussions.

Concerns

Quoting Stuart Bishop’s response from Launchpad-dev:

I’m really not sure of the best way to tackle this problem. The
Librarian data is not stored in the database because there are
multiple TB of files. The team membership information is in the
relational database. There are no indexes anywhere to the contents of
the Librarian files. I think we need some sort of external search
engine (I don’t think we don’t want to integrate this into the
Librarian core). Ideally we could feed it subscriber information
allowing it to determine the set of 32000 attachments that ubuntu-bugs
has access to rather than having to calculate this information from
the relational db and then feed the ids to the search engine.

Whatever approach certainly needs signoff from the LP team leads, as
the resource requirements are non trivial and someone needs to pay for
the hardware.

Waiting Items

None.

Stalled Items

  • Implementation of Bug.findAttachments().

Accomplishments

  • Merge Proposal: Got the export-Person-getBugSubscriberPackages branch approved after fixing various tests and bugs.
  • Merge Proposal: Implemented Horspool’s algorithm and fixed various bugs in the implement-Bug-findAttachments branch. The branch itself didn’t get approved because of its design approach for searching the attachments:

    I’m going to mark this review as ‘disapproved,’ not because the code is
    bad (it isn’t) but because I don’t think this is the right solution to
    the problem. I’m sorry to say that I don’t know what the right solution
    to the problem actually is at this point, but I’d guess that something
    involving FTIs would be a start, or some kind of asynchronous processing
    of searches (though then you get into all kinds of knotty stuff with
    callbacks).

Minor Tasks

To read the file in chunks, I took the Wiki code for Horspool’s algorithm, converted it to Python and modified it a little so that it would work with file streams.

Actions for the Following Report

There doesn’t appear to be a straightforward, efficient way of searching bug attachments. I’ll discuss the course of my future development with Bryce tonight and decide whether I should head over to Arsenal development or focus on the proposed (albeit germinal) solutions from IRC and the mailing list.


June 19, 2010

Using Boyer-Moore-Horspool algorithm on file streams in Python

Filed under: Blog — krkhan @ 4:53 am

Horspool’s algorithm is a simple and efficient string-searching algorithm which trades space for time and performs better as the length of the search string increases. Another (perhaps overlooked) advantage of this algorithm is its ability to search through file streams without requiring random access. As I was working on Launchpad for my SoC project, I needed this particular stream-handling attribute because the file descriptors opened by urllib2 didn’t support seek()ing. Modifying the example code from the Wiki page a little, I was able to read() only the required bytes sequentially:

horspool.py

#!/usr/bin/env python
 
import locale
import os
import sys
import urllib2
 
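def boyermoore_horspool(fd, needle):
    nlen = len(needle)
    nlast = nlen - 1
 
    # Bad-character table: shift distance keyed by the byte found under
    # the last position of the search window.
    skip = []
    for k in range(256):
        skip.append(nlen)
    for k in range(nlast):
        skip[ord(needle[k])] = nlast - k
    skip = tuple(skip)
 
    pos = 0       # offset of the current window in the stream
    consumed = 0  # total number of bytes read so far
    haystack = bytes()
    while True:
        # Read only the bytes needed to refill the window after a shift.
        more = nlen - (consumed - pos)
        morebytes = fd.read(more)
        haystack = haystack[more:] + morebytes
 
        # A short read means the stream ended before a match was found.
        if len(morebytes) < more:
            return -1
        consumed = consumed + more
 
        # Compare the window against the needle from right to left.
        i = nlast
        while i >= 0 and haystack[i] == needle[i]:
            i = i - 1
        if i == -1:
            return pos
 
        pos = pos + skip[ord(haystack[nlast])]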
 
if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Usage: horspool.py <url> <search text>"
        sys.exit(-1)
 
    url = sys.argv[1]
    needle = sys.argv[2]
    needle = needle.decode('string_escape')
 
    fd = urllib2.urlopen(url)
    offset = boyermoore_horspool(fd, needle)
    print hex(offset), '::', offset
    fd.close()

Now comes the fun part:

  • The code can search through any URL without downloading it completely, stopping at the first match. For example, the following command will download only the first few bytes of the provided URL:
    $ ./horspool.py http://www.gutenberg.org/files/132/132.txt "The Art of War"

    0x1d :: 29

  • Unicode searches work as well, although the matching takes place according to the character encoding of the terminal used. That is to say, since I’m using a UTF-8 terminal, the “bytes” searched for were assumed to be UTF-8 encoded as well:
    $ ./horspool.py http://www.gutenberg.org/files/29011/29011-0.txt "Σημείωση: Ο Πίνακας περιεχομένων"

    0x44f :: 1103

  • Same goes for multi-line searches:
    $ ./horspool.py http://www.gutenberg.org/files/29011/29011-0.txt "διευκόλυνση\r\nτου αναγνώστη"

    0x4b5 :: 1205
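
For readers on Python 3, the same idea can be sketched against any binary file-like object. This is an illustrative port under a few assumptions, not the script above: the name `horspool_stream` is made up, the needle must already be `bytes` (for example `"text".encode("utf-8")`), and `io.BytesIO` stands in for the network stream.

```python
import io


def horspool_stream(fd, needle):
    """Return the byte offset of the first match of `needle` in `fd`, or -1.

    `fd` only needs a read(n) method; seek() is never called.
    """
    nlen = len(needle)
    if nlen == 0:
        return 0
    nlast = nlen - 1

    # Bad-character table: how far the window may shift, keyed by its last byte.
    skip = [nlen] * 256
    for k in range(nlast):
        skip[needle[k]] = nlast - k

    pos = 0        # offset of the current window in the stream
    window = b""
    while True:
        # Top the window up to nlen bytes, reading only what is missing.
        while len(window) < nlen:
            chunk = fd.read(nlen - len(window))
            if not chunk:
                return -1  # stream ended without a match
            window += chunk

        if window == needle:
            return pos

        # Shift by the distance recorded for the window's last byte.
        shift = skip[window[nlast]]
        window = window[shift:]
        pos += shift


if __name__ == "__main__":
    stream = io.BytesIO(b"Project Gutenberg's The Art of War, by Sun Tzu")
    offset = horspool_stream(stream, b"The Art of War")
    print(hex(offset), "::", offset)
```

Because the window is refilled only by the number of bytes it was shifted, the stream is never read beyond the end of the first match.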
