directed procrastination: Efficiency of Pseudo Random Numbers in Lisp

I came across Alasdair's posts regarding his exploration of Pseudo Random Number Generators, or PRNG, in high and low level languages. These posts show timings of a hypothetical PRNG for different implementation languages and "big integer" libraries. This PRNG is defined as:

\[ x_{0} = b = 7,~~ p=2^{31} - 1 \]

\[ x_{i} = b^{x_{i-1}}\pmod{p} \]

His python version follows (you can go to his site to see the C versions):

def printpowers(n):
  a,p = 7,2^31-1
  for i in range(10^n):
    a = power_mod(7,a,p)
    if mod(i,10^(n-1))==0:
       print i,a

When I read this post, I wondered how SBCL would stack up, so I took literally one minute to port the C version into a Lisp version (the python version might have went faster) and evaluated it in SBCL. I spent absolutely no time at all thinking about optimization. Afterward, when writing this post, I made a few cosmetic changes to make it a little more pleasing to the eye but the execution is identical. My Lisp version looks like this:

(defun print-powers (n)
  (let ((b 7)
        (p (- (expt 2 31) 1)))
    (iter (for i below (expt 10 n))
      (for x
           initially (cl-primality::expt-mod b 7 p)
           then (cl-primality::expt-mod b x p))
      (when (zerop (mod i 100000))
        (collect x)))))

Here are the times taken to calculate one million PRNs in the various implementations. Because I am comparing implementation across different system, I cannot compare directly, so I compiled the GMP version and used that to give a scaling factor (I tried to also test how the Python version compared, but that wouldn't run with the information Alasdair provided). This means that only the bold values in the "calibrated time" column are actually measured. The others are approximated by assuming the performance differences between my machine and Alasdair's can be summed up as a simple numeric constant. Results are in seconds.


Implementation	Reported time	Calibrated time
Python	74.	110
C GMP	.90	1.3
C no big ints	1.5	2.2
C MPIR	.78	1.1
C PARI	.52	.75
C TomMath	13.	18.
Lisp (SBCL)	—	1.8

Judging from the fact that Alasdair describes himself as a high level language programmer and this is his first foray into the world of low level optimization, it is safe to assume that some of these numbers could be made significantly lower. In particular it is surprising that the C calculation without any big integer math at all does so poorly. Also, the extremely poor performance of TomMath is something to be investigated. Be that as it may, the "one minute to write" Lisp version holds up pretty well against all competitors in the timing tests and absolutely shines compared to the Python version. It maintains Python's readability and is nearly two orders of magnitude faster.

Though I am quite fond of several aspects of Common Lisp, I am not really in the business of promoting Common Lisp as an alternative to everything. That said, I absolutely do think it has a place in the scientific and mathematical computing, quite possibly more than Python does. I think that Common Lisp gives a clear path from tinkering at the REPL, to quickly prototyping an algorithm, to optimizing it if necessary, then porting to a lower level if absolutely necessary. Much the same can probably be said of Python, the crucial difference is that the initial algorithm prototype in Common Lisp will likely be within a factor of two of your initial C implementation, rather than a factor of 100. The fact that most software in academia never goes past the prototyping phase (it usually isn't necessary to optimize in order to get a publishable result), makes this all the more important. These results reinforce my opinion that Python should not be used outside of a glue language for low level libraries and, of course, I/O bounded programming such as user interfaces.

That difference between Python and Lisp (SBCL in this case) is partly in the compiler, to be sure, but also in the community and its opinions of what is important. If you are willing to wait, that gap in the quality of the compiler will certainly become smaller but that might not be a priority for the Python community. While you wait, some other language may come into vogue, perhaps improving or hurting performance. I think that in the mean time, however, Common Lisp has a very large advantage in this area that should not be ignored.

directed procrastination

Wednesday, July 18, 2012

Efficiency of Pseudo Random Numbers in Lisp

No comments :

Post a Comment