<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paul Butler &#187; Python</title>
	<atom:link href="http://paulbutler.org/archives/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://paulbutler.org</link>
	<description></description>
	<lastBuildDate>Tue, 28 Feb 2012 14:45:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Why R doesn&#8217;t suck</title>
		<link>http://paulbutler.org/archives/why-r-doesnt-suck/</link>
		<comments>http://paulbutler.org/archives/why-r-doesnt-suck/#comments</comments>
		<pubDate>Sat, 19 Jun 2010 13:45:01 +0000</pubDate>
		<dc:creator>Paul Butler</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://paulbutler.org/?p=336</guid>
		<description><![CDATA[I first encountered the R programming language a few years ago when I needed to make some plots. Although I&#8217;ve used it occasionally since, I always considered it a sort of &#8220;Perl for statisticians&#8221; &#8212; a useful swiss-army knife with &#8230; <a href="http://paulbutler.org/archives/why-r-doesnt-suck/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I first encountered the <a href="http://www.r-project.org/">R programming language</a> a few years ago when I needed to make some plots. Although I&#8217;ve used it occasionally since, I always considered it a sort of &#8220;Perl for statisticians&#8221; &mdash; a useful swiss-army knife with ugly syntax and inconsistent semantics. My workflow generally involved manipulating the data in Python and using R to make a simple plot, minimizing the amount of R code I wrote as much as possible.</p>
<p>When I recently decided to sit down and properly learn the language, I was pleasantly surprised that underneath the line noise was an interesting and unique language. R is a descendant of LISP and, deep down, maintains some of the beauty of its ancestor. It also borrows some unique and interesting features from other functional and dynamic languages.</p>
<h3>Code is Data</h3>
<p>R is true to its LISP roots in that you can create, modify, and evaluate parse trees from the code itself. One way to do so is with the <samp>quote()</samp> special-function, which returns its argument, unevaluated, as an expression object that can be traversed, modified and evaluated.</p>
<p>A fun (though not especially useful) consequence of this is that you can write an <a href="http://en.wikipedia.org/wiki/Quine_(computing)">expression which returns itself</a> as a quote:<br />
<code><br />
> (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))<br />
(function(x) substitute((x)(x)))(function(x) substitute((x)(x)))<br />
> expression <- (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))<br />
> expression == eval(expression)<br />
[1] TRUE<br />
</code></p>
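<p>For readers who know Python better than R, the same idea of parsing code into a tree, changing the tree and then evaluating it can be sketched with Python's standard <samp>ast</samp> module. This is only a rough analogue of <samp>quote()</samp> and <samp>eval()</samp>, not a claim that the two languages work the same way; the file name <samp>"quoted"</samp> is an arbitrary label.</p>
<pre lang="python">
import ast

# Parse source text into a tree without evaluating it,
# roughly what quote() does for an R expression.
tree = ast.parse("3 + 4", mode="eval")

# The tree is ordinary data: it can be inspected...
print(ast.dump(tree.body))

# ...modified (here the addition becomes a multiplication)...
tree.body.op = ast.Mult()
ast.fix_missing_locations(tree)

# ...and then compiled and evaluated.
result = eval(compile(tree, "quoted", "eval"))
print(result)  # 12</pre>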
<h3>Optional Laziness</h3>
<p>By default, R uses <a href="http://en.wikipedia.org/wiki/Eager_evaluation">eager evaluation</a>, so expressions are evaluated as soon as they are assigned. However, R takes after functional languages like Haskell and OCaml in that it allows lazy evaluation, where expressions are only evaluated at the time they are first used.</p>
<p>For example, consider the Haskell code:<br />
<code><br />
m = sum [1..]<br />
</code></p>
<p>Here, <samp>sum</samp> returns the sum of a list and <samp>[1..]</samp> is the (infinite) list of all natural numbers. In most languages, the assignment would cause the program to loop forever trying to sum all the natural numbers so it can assign that value to <samp>m</samp>. In Haskell, the assignment does complete; it simply assigns the expression <samp>sum [1..]</samp> to <samp>m</samp> so that it can be evaluated when the value of <samp>m</samp> is first used.</p>
<p>In R we can accomplish something similar with the <samp>delayedAssign()</samp> function:<br />
<code><br />
delayedAssign("m", sum(1:Inf))<br />
</code></p>
<p>Note that, as in OCaml, a variable in R has to be made lazy explicitly (here with <samp>delayedAssign()</samp>). Unlike in OCaml, though, it is then evaluated automatically the first time it is used.</p>
<p>Unfortunately, R evaluates lazy variables when they are pointed to by a data structure, even if their value is not needed at the time. This means that infinite data structures, one common application of laziness in Haskell, are not possible in R.</p>
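<p>The mechanics of a delayed assignment can be sketched in Python with a small wrapper that stores an unevaluated function and runs it on first use. The class name <samp>Lazy</samp>, its <samp>force()</samp> method and the <samp>expensive()</samp> function are made up for this illustration. Unlike an R promise, the value here has to be forced explicitly.</p>
<pre lang="python">
class Lazy:
    """A value computed on first use and cached afterwards."""

    _UNSET = object()

    def __init__(self, thunk):
        self._thunk = thunk          # zero-argument function producing the value
        self._value = Lazy._UNSET

    def force(self):
        if self._value is Lazy._UNSET:
            self._value = self._thunk()
            self._thunk = None       # the function is no longer needed
        return self._value


evaluations = []

def expensive():
    evaluations.append("ran")        # record that the computation happened
    return sum(range(1, 101))

m = Lazy(expensive)                  # nothing is computed here
print(evaluations)                   # []
print(m.force())                     # 5050, computed now
print(m.force())                     # 5050 again, from the cache</pre>
<p>The cached value means <samp>expensive()</samp> runs at most once, which is also how a delayed assignment behaves after its first use.</p>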
<h3>Operators are functions</h3>
<p>When using higher-order functions, it&#8217;s sometimes useful to be able to treat operators as functions. Python accomplishes this in a clunky way: there is an <samp>operator</samp> module which redefines the built-in operators as functions. R takes a more functional approach. As in Haskell and OCaml, operators are just syntactic sugar for ordinary functions. Enclosing any operator in backticks lets you use it as if it were an ordinary function. For example, calling <samp>`+`(2, 3)</samp> returns <samp>5</samp>.</p>
<p>In fact, the infix and prefix forms are indistinguishable once they are parsed.<br />
<code><br />
> quote(3 + 4) == quote(`+`(3, 4))<br />
[1] TRUE<br />
</code></p>
<p>One surprising fact in R is that the assignment operators (<samp>&lt;-</samp>, <samp>&lt;&lt;-</samp> and <samp>=</samp>) are functions like any other. As a result, they can be overwritten or passed around as desired, though neither strikes me as a particularly good idea.</p>
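<p>For comparison, this is roughly what the Python <samp>operator</samp> module approach mentioned earlier looks like in practice. The sample data is invented for the example.</p>
<pre lang="python">
import operator
from functools import reduce

# operator.add is the function form of the + operator.
print(operator.add(2, 3))                         # 5

# Function forms are handy as arguments to higher-order functions.
product = reduce(operator.mul, [1, 2, 3, 4])
print(product)                                    # 24

# itemgetter(1) is the function form of indexing with [1].
pairs = [(1, "b"), (3, "a"), (2, "c")]
by_letter = sorted(pairs, key=operator.itemgetter(1))
print(by_letter)                                  # [(3, 'a'), (1, 'b'), (2, 'c')]</pre>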
<h3>Continuations</h3>
<p><a href="http://en.wikipedia.org/wiki/Continuation">Continuations</a> in R are a way of &#8220;breaking out&#8221; of a computation and jumping down the call stack to return early. The R function <samp>callCC()</samp> (<strong>call</strong> with <strong>c</strong>urrent <strong>c</strong>ontinuation) takes one argument, a function. It then evaluates that function, passing in a special function as an argument. <samp>callCC()</samp> then returns the first value that the special function is called with, or the return value of evaluating its argument if the special function is not called before the function returns.</p>
<p>To give you a better idea of what that looks like, consider this example:<br />
<code><br />
> callCC(function(m) {return(4)})<br />
[1] 4<br />
> callCC(function(m) {m(2); return(4)})<br />
[1] 2<br />
</code></p>
<p>Calling the function <samp>m(2)</samp> essentially cuts the computation short, drops down in the call stack to <samp>callCC</samp>, and returns <samp>2</samp>.</p>
<p>If you&#8217;ve used continuations in another language, note that in R the exit function can only be called before <samp>callCC()</samp> returns. This makes R&#8217;s continuation semantics less powerful than those of languages like Scheme, Smalltalk, and Ruby.</p>
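<p>This escape-only behaviour can be imitated in Python with an exception that unwinds the stack back to the caller. The sketch below is an analogy, not a description of how R implements <samp>callCC()</samp>; the names <samp>call_cc</samp>, <samp>_Escape</samp> and <samp>early</samp> are hypothetical.</p>
<pre lang="python">
class _Escape(Exception):
    """Carries the value passed to the exit function up to call_cc."""

    def __init__(self, token, value):
        super().__init__()
        self.token = token
        self.value = value


def call_cc(fn):
    """Call fn(exit). Return the value given to exit(), or fn's own result."""
    token = object()                  # identifies this particular call_cc

    def exit_(value):
        raise _Escape(token, value)

    try:
        return fn(exit_)
    except _Escape as esc:
        if esc.token is token:
            return esc.value
        raise                         # belongs to an enclosing call_cc


def early(m):
    m(2)        # unwinds straight back to call_cc
    return 4    # never reached

print(call_cc(lambda m: 4))   # 4
print(call_cc(early))         # 2</pre>
<p>As with the R version, the exit function is only useful while <samp>call_cc</samp> is still on the stack. Calling it afterwards would simply raise an exception that nothing catches.</p>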
<p>R is not without its flaws and legacy baggage (you can trace its roots back to the <a href="http://en.wikipedia.org/wiki/S_(programming_language)">S programming language</a> 35 years ago), but once you learn to use it right, it&#8217;s a very powerful and indispensable language.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulbutler.org/archives/why-r-doesnt-suck/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Python Debugging with Decorators</title>
		<link>http://paulbutler.org/archives/python-debugging-with-decorators/</link>
		<comments>http://paulbutler.org/archives/python-debugging-with-decorators/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 00:20:40 +0000</pubDate>
		<dc:creator>Paul Butler</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.paulbutler.org/archives/python-debugging-with-decorators/</guid>
		<description><![CDATA[I&#8217;ve written a little Python function which I have found to be very helpful for debugging. It takes a function, and returns a function which is identical to the original except that it prints a message to the console with &#8230; <a href="http://paulbutler.org/archives/python-debugging-with-decorators/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve written a little Python function which I have found to be very helpful for debugging. It takes a function, and returns a function which is identical to the original except that it prints a message to the console with useful information every time the function is called or returns.</p>
<p>Here is the function:</p>
<pre lang="python">
# Number of times to indent output
# A list is used to force access by reference
__report_indent = [0]

def report(fn):
    """Decorator to print information about a function
    call for use while debugging.
    Prints function name, arguments, and call number
    when the function is called. Prints this information
    again along with the return value when the function
    returns.
    """

    def wrap(*params,**kwargs):
        call = wrap.callcount = wrap.callcount + 1

        indent = ' ' * __report_indent[0]
        fc = "%s(%s)" % (fn.__name__, ', '.join(
            [repr(a) for a in params] +
            ["%s = %s" % (a, repr(b)) for a,b in kwargs.items()]
        ))

        # Parentheses keep each print on one logical line
        # (and work under both Python 2 and 3).
        print("%s%s called [#%s]"
            % (indent, fc, call))
        __report_indent[0] += 1
        ret = fn(*params,**kwargs)
        __report_indent[0] -= 1
        print("%s%s returned %s [#%s]"
            % (indent, fc, repr(ret), call))

        return ret
    wrap.callcount = 0
    return wrap</pre>
<p>The function can be used as a decorator. For example, in this simple (and inefficient) recursive Fibonacci sequence function:</p>
<pre lang="python">
@report
def fibonacci(n):
    if n in [0,1]:
        return n
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)</pre>
<p>The result:</p>
<pre lang="text">
>>> fibonacci(4)
fibonacci(4) called [#1]
 fibonacci(3) called [#2]
  fibonacci(2) called [#3]
   fibonacci(1) called [#4]
   fibonacci(1) returned 1 [#4]
   fibonacci(0) called [#5]
   fibonacci(0) returned 0 [#5]
  fibonacci(2) returned 1 [#3]
  fibonacci(1) called [#6]
  fibonacci(1) returned 1 [#6]
 fibonacci(3) returned 2 [#2]
 fibonacci(2) called [#7]
  fibonacci(1) called [#8]
  fibonacci(1) returned 1 [#8]
  fibonacci(0) called [#9]
  fibonacci(0) returned 0 [#9]
 fibonacci(2) returned 1 [#7]
fibonacci(4) returned 3 [#1]
3</pre>
<p>The level of indent reflects the level of recursion, and the [#...] at the end of each line is the number of times the function has been called.</p>
<p>The level of indent is independent of the function being called, so it is helpful with mutual recursion as well. For example, when used with the functions <samp>even</samp> and <samp>odd</samp> from my earlier <a href="http://www.paulbutler.org/archives/tail-recursion-in-python/">post on tail recursion</a>, the result looks like this:</p>
<pre lang="text">
>>> even(5)
even(5) called [#1]
 odd(4) called [#1]
  even(3) called [#2]
   odd(2) called [#2]
    even(1) called [#3]
     odd(0) called [#3]
     odd(0) returned False [#3]
    even(1) returned False [#3]
   odd(2) returned False [#2]
  even(3) returned False [#2]
 odd(4) returned False [#1]
even(5) returned False [#1]
False</pre>
<p>I find it useful to stick <samp>@report</samp> before the function I am having trouble with, and use comments to turn it on and off while I&#8217;m debugging that function. It can also be used at times other than function declaration, for example: <samp>report(base64.encodestring)('test')</samp>.</p>
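<p>For anyone on Python 3, a port might look something like the sketch below. It is an adaptation rather than the original code: <samp>functools.wraps</samp> and the <samp>try</samp>/<samp>finally</samp> around the call are additions, and the names <samp>_indent</samp> and <samp>reported_len</samp> are chosen here for illustration. The last two lines show the decorator applied to an existing function without the <samp>@</samp> syntax.</p>
<pre lang="python">
import functools

_indent = [0]  # shared nesting depth; a list so wrappers can mutate it


def report(fn):
    """Print each call to fn and its return value, indented by call depth."""
    @functools.wraps(fn)
    def wrap(*args, **kwargs):
        wrap.callcount += 1
        call = wrap.callcount
        pad = " " * _indent[0]
        shown = [repr(a) for a in args]
        shown += ["%s = %r" % item for item in kwargs.items()]
        fc = "%s(%s)" % (fn.__name__, ", ".join(shown))

        print("%s%s called [#%s]" % (pad, fc, call))
        _indent[0] += 1
        try:
            ret = fn(*args, **kwargs)
        finally:
            _indent[0] -= 1           # restore depth even if fn raises
        print("%s%s returned %r [#%s]" % (pad, fc, ret, call))
        return ret

    wrap.callcount = 0
    return wrap


# Wrapping an existing function without decorator syntax:
reported_len = report(len)
reported_len("test")</pre>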
<p><strong>Update (July 6, 2008)</strong>: Fixed so that keyword arguments are printed as well.</p>
<p><strong>Update (August 16, 2008)</strong>: Changed .__repr__() to the more proper repr().</p>
]]></content:encoded>
			<wfw:commentRss>http://paulbutler.org/archives/python-debugging-with-decorators/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SimpleDiff in Python</title>
		<link>http://paulbutler.org/archives/simplediff-in-python/</link>
		<comments>http://paulbutler.org/archives/simplediff-in-python/#comments</comments>
		<pubDate>Wed, 13 Feb 2008 21:22:32 +0000</pubDate>
		<dc:creator>Paul Butler</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.paulbutler.org/archives/simplediff-in-python/</guid>
		<description><![CDATA[A while ago I posted a PHP implementation of a diff algorithm I came up with. Since it was well received, and it&#8217;s a useful little algorithm to have, I created a Python version as well. There are a few &#8230; <a href="http://paulbutler.org/archives/simplediff-in-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A while ago I posted a <a href="http://www.paulbutler.org/archives/a-simple-diff-algorithm-in-php/">PHP implementation</a> of a diff algorithm I came up with<sup>1</sup>. Since it was well received, and it&#8217;s a useful little algorithm to have, I created a Python version as well.</p>
<p>The Python version also brings a few performance improvements. The PHP version creates an array in memory proportional to the square of the size of the input, while the Python version&#8217;s array is directly proportional to the size of the input. I also sped up how the algorithm finds the indexes of the &#8220;new&#8221; elements in the &#8220;old&#8221; array.</p>
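<p>To make those two improvements concrete, here is a minimal sketch of a diff built on the longest common run of elements. It is written for this explanation and is not necessarily what <samp>simplediff.py</samp> contains; all names are illustrative, and it assumes the list elements are hashable. The <samp>positions</samp> dictionary gives fast lookup of where each element occurs in the old list, and only one row of the overlap table is kept at a time, so memory stays proportional to the input size.</p>
<pre lang="python">
def diff(old, new):
    """Return a list of (op, items) pairs, op being '=', '-' or '+'."""
    # Map each value to the positions where it occurs in `old`, so matches
    # for an element of `new` are found by lookup rather than by scanning.
    positions = {}
    for i, value in enumerate(old):
        positions.setdefault(value, []).append(i)

    # `runs` holds only the previous row of the overlap table:
    # runs[i] is the length of the common run ending at old[i]
    # and at the previously visited element of `new`.
    runs = {}
    best_old = best_new = best_len = 0
    for j, value in enumerate(new):
        current = {}
        for i in positions.get(value, []):
            length = runs.get(i - 1, 0) + 1
            current[i] = length
            if length > best_len:
                best_len = length
                best_old = i - length + 1
                best_new = j - length + 1
        runs = current

    if best_len == 0:
        # Nothing in common: everything old was removed, everything new added.
        removed = [('-', old)] if old else []
        added = [('+', new)] if new else []
        return removed + added

    # Recurse on the parts before and after the longest common run.
    return (diff(old[:best_old], new[:best_new])
            + [('=', new[best_new:best_new + best_len])]
            + diff(old[best_old + best_len:], new[best_new + best_len:]))


print(diff([1, 2, 3, 4], [1, 3, 4]))
# [('=', [1]), ('-', [2]), ('=', [3, 4])]</pre>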
<p><a href="http://github.com/paulgb/simplediff/blob/5bfe1d2a8f967c7901ace50f04ac2d9308ed3169/simplediff.py">Download simplediff.py</a></p>
<p><sup>1</sup> It is probably the same algorithm that others use, but I haven&#8217;t gotten around to getting an ACM membership to access the related papers.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulbutler.org/archives/simplediff-in-python/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Tail recursion in Python</title>
		<link>http://paulbutler.org/archives/tail-recursion-in-python/</link>
		<comments>http://paulbutler.org/archives/tail-recursion-in-python/#comments</comments>
		<pubDate>Fri, 14 Dec 2007 03:58:32 +0000</pubDate>
		<dc:creator>Paul Butler</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.paulbutler.org/archives/tail-recursion-in-python/</guid>
		<description><![CDATA[After spending a lot of time in Scheme, it&#8217;s hard not to think in recursion from time to time. When I recently started to improve my Python skills, I missed having Scheme optimize my tail recursive calls. For example, consider &#8230; <a href="http://paulbutler.org/archives/tail-recursion-in-python/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After spending a lot of time in Scheme, it&#8217;s hard not to think in recursion from time to time. When I recently started to improve my Python skills, I missed having Scheme optimize my tail recursive calls.</p>
<p>For example, consider the mutually recursive functions <samp>even</samp> and <samp>odd</samp>. You know a number, <em>n</em>, is even if it is 0, or if <em>n</em> &#8211; 1 is odd. Similarly, you know a number is not odd if it is 0, and that it is odd if <em>n</em> &#8211; 1 is even. This translates to the Python code:</p>
<pre lang="python">
def even(x):
  if x == 0:
    return True
  else:
    return odd(x - 1)

def odd(x):
  if x == 0:
    return False
  else:
    return even(x - 1)
</pre>
<p>This code works, but only for <em>x</em> below roughly 1000, because Python&#8217;s default recursion limit is 1000 stack frames. As it turns out, it is easy to get around this limitation. Included below is a generic <samp>tail_rec</samp> function that could be used for most cases where you need tail recursion, and an example of it used for the odd/even problem.</p>
<pre lang="python">
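def tail_rec(fun):
   # Trampoline: keep calling whatever comes back until it is
   # no longer callable, then return that final value.
   def tail(thunk):
      a = thunk
      while callable(a):
         a = a()
      return a
   # Note: the wrapped function takes exactly one argument.
   return (lambda x: tail(fun(x)))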

def tail_even(x):
  if x == 0:
    return True
  else:
    return (lambda: tail_odd(x - 1))

def tail_odd(x):
  if x == 0:
    return False
  else:
    return (lambda: tail_even(x - 1))

even = tail_rec(tail_even)
odd = tail_rec(tail_odd)</pre>
<p>It&#8217;s not as pretty as the Scheme version, but it does the trick. Of course, the odd/even functions are just for the sake of a simple example and have no real-world use, but the <samp>tail_rec</samp> function could be used in practice.</p>
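<p>As a quick check that this really gets past the recursion limit, here is a self-contained Python 3 variant. Accepting arbitrary arguments through <samp>*args</samp> and <samp>**kwargs</samp> is an extension made for this sketch, as are the names <samp>run</samp> and <samp>n</samp>.</p>
<pre lang="python">
import sys


def tail_rec(fun):
    """Wrap fun so that any zero-argument callable it returns is called
    in a loop until a non-callable result comes back."""
    def run(*args, **kwargs):
        result = fun(*args, **kwargs)
        while callable(result):
            result = result()
        return result
    return run


def tail_even(x):
    return True if x == 0 else (lambda: tail_odd(x - 1))


def tail_odd(x):
    return False if x == 0 else (lambda: tail_even(x - 1))


even = tail_rec(tail_even)
odd = tail_rec(tail_odd)

n = sys.getrecursionlimit() * 10      # far deeper than plain recursion allows
print(even(n))   # True
print(odd(n))    # False</pre>
<p>One caveat of this style: because every callable result is treated as a pending call, a function wrapped this way cannot return a callable as its final value.</p>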
<p><strong>April 2009 Update</strong>: This article has recently had some popularity. One of the more common comments is that <samp>tail_rec</samp> could be used as a decorator. In fact, this isn&#8217;t true, because <samp>even</samp> and <samp>odd</samp> need access to the raw, undecorated versions of each other in the creation of the lambda.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulbutler.org/archives/tail-recursion-in-python/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Garden Path Sentences</title>
		<link>http://paulbutler.org/archives/garden-path-sentences/</link>
		<comments>http://paulbutler.org/archives/garden-path-sentences/#comments</comments>
		<pubDate>Wed, 27 Jun 2007 17:05:27 +0000</pubDate>
		<dc:creator>Paul Butler</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.paulbutler.org/archives/garden-path-sentances/</guid>
		<description><![CDATA[I recently came across an interesting post on the Powerset Blog about garden path sentences. Garden path sentences are sentences that lead you down the wrong path through a string of words with multiple meanings. For example, The complex &#8230; <a href="http://paulbutler.org/archives/garden-path-sentences/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently came across an interesting post on the Powerset Blog about <a href="http://blog.powerset.com/2007/6/18/search-engines-leaking-oil-for-holes">garden path sentences</a>. Garden path sentences are sentences that lead you down the wrong path through a string of words with multiple meanings. For example,</p>
<blockquote><p>The complex houses married and single students and their families</p></blockquote>
<p>In this case, most readers would probably think <em>complex</em> was an adjective that modified the plural noun <em>houses</em>. The post ended with a challenge &#8211; how easy would it be to create a program to automatically generate these sentences? Since school is out and I have some free time, I tried it myself. I found a decent <a href="http://www.ibiblio.org/webster/">free XML dictionary</a>, and wrote a Ruby script to parse the important bits (the type of word and alternate forms) into an SQL database. I cross-checked all the words against a word frequency table to make sure there were no obscure words. I then wrote a Python script to put the words together into a sentence (hopefully a meaningful one, though often not). <strike>I put the Python script onto my server so you can play with it <em>here</em></strike> <em><strong>April 2009 Update</strong>: I removed the live demo as part of a server move</em>.</p>
<p><a href="http://test.paulbutler.org/wp-content/uploads/2009/04/gardenpath.png"><img src="http://test.paulbutler.org/wp-content/uploads/2009/04/gardenpath.png" alt="His concrete spheres foster complexities" title="gardenpath" width="300" height="70" class="size-full wp-image-110" /></a></p>
<p>As you can see, the sentences that it comes up with are far from meaningful. However, in most cases you can at least see how a reader could be taken down the wrong path (at least in the cases where there is a right path). In the above example, concrete could be an adjective or a noun, and spheres could be a noun or a verb (to form a sphere). Foster could be an adjective or a verb depending on the context, but I couldn&#8217;t see the reader seeing it as an adjective here. Certainly the sentence generator leaves a lot to be desired (especially considering that this was one of the better sentences), but I got about as far with it as I expected to. I think it could be improved further with a few modifications:</p>
<ul>
<li>Words in the database are already cross-checked to make sure they aren&#8217;t obscure, but often a word will be common as a noun and uncommon as a verb, or vice versa. I didn&#8217;t have a dataset that allowed me to determine if this was the case for a particular word.</li>
<li>The valency of verbs is ignored. All verbs are assumed to be transitive, even though valency information is available in the database.</li>
<li>I underestimated the difficulty of having a computer generate a meaningful sentence. It is difficult to determine what verbs are compatible with what nouns; I guess you would need to parse a large amount of English text (perhaps some of <a href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a> &#8211; I think Wikipedia would not be varied enough but I could be wrong).</li>
</ul>
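<p>The general approach of combining words that can play more than one grammatical role can be sketched as follows. This is not the script described above, which is not included in this post. The tiny lexicon, the four-word template and the function names are all invented for illustration, and the sketch ignores word frequency and verb valency just as the list above describes.</p>
<pre lang="python">
# Hypothetical mini-lexicon: word -> parts of speech it can take.
LEXICON = {
    "complex": {"adj", "noun"},
    "concrete": {"adj", "noun"},
    "houses": {"noun", "verb"},
    "spheres": {"noun", "verb"},
    "students": {"noun"},
    "complexities": {"noun"},
}


def words_with(*tags):
    """Words that can act as every one of the given parts of speech."""
    wanted = set(tags)
    return sorted(w for w, pos in LEXICON.items() if wanted.issubset(pos))


def garden_paths(determiner="The"):
    """Build 'det X Y Z' sentences where X can be adjective or noun and
    Y can be noun or verb, so 'X Y' first reads as adjective + noun
    although the intended reading has X as subject and Y as verb."""
    sentences = []
    for x in words_with("adj", "noun"):
        for y in words_with("noun", "verb"):
            for z in words_with("noun"):
                if z not in (x, y):
                    sentences.append(" ".join([determiner, x, y, z]))
    return sentences


for sentence in garden_paths():
    print(sentence)</pre>
<p>With this lexicon the output includes &#8220;The complex houses students&#8221;, alongside plenty of sentences that are grammatical in shape but meaningless, which is the same problem described above.</p>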
<p>I noticed later that <a href="http://blog.dkbza.org/2007/06/powerset-and-garden-path.html">Ero Carrera</a> had taken a similar approach to what I did, but with his linguistics experience he better anticipated the problems I ran into. He has some good ideas, and his post is an interesting read.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulbutler.org/archives/garden-path-sentences/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

