Monthly Archives: June 2013

Learning From Project Euler

If you’re just starting out with programming, like me, and haven’t yet checked out Project Euler and GitHub, please do. They’re not hard to get into, and I’ve found them really useful!

Project Euler is a particularly powerful impetus for you to improve on your programming skills. Like Codecademy, you start out simple and work your way up through challenges – unlike Codecademy, the challenge is actually worthwhile.

Github, meanwhile, is really useful for storing the code you’re working on and encouraging commentary/interaction with your fellow programmers. It’s also an introduction to version control, which is a useful thing to understand.

I’d urge you to listen to the people at Project Euler and avoid committing Problem answers to Github, as it kind of takes away from the challenge.

Project Euler

In the in-between, a problem with Project Euler forced me to re-design the function I built here, and I’m pretty happy with the results.

My original prime-finding function was taking well over a minute, so I had no choice but to refactor.

It’s a great demonstration of how Project Euler forces you out of your comfort zone – in my case, into Racket vectors. Even though they’re essentially the same as arrays in most other languages, something in me balked at using vectors in Racket. Something about their mutability struck me as not-right in a language so defined by its immutability.

Here’s the first draft of what I wrote, working with vectors to create a static sieve of eratosthenes of length (n):

(define (primes-to x)
  (letrec ([f (lambda (count vs n)
                (cond
                  [(= n 0) (begin (vector-set! vs 0 #f) (f (+ count 1) vs (+ n 1)))]
                  [(= n 1) (begin (vector-set! vs 1 #f) (f (+ count 1) vs (+ n 1)))]
                  [#t
                   (if (= n x)
                       vs
                       (if (< count x)
                           (if (= n count)
                               (f (+ count n) vs n)
                               (begin (vector-set! vs count #f)  (f (+ count n) vs n)))
                           (f (+ n 1) vs (+ n 1))))])

                )])
    (f 0 (make-vector x #t) 0)))

Since you can access vector members using vector-ref, and this program currently runs all the way to (= n x) – which means that we check almost every item on the list – there’s a lot of optimisation which could be done, and done profitably.

The important thing is (if (= n count), which takes away progressively larger chunks of things to check as the formula grows – I believe this puts it in O(n log n) space, or close to it. With this formula, I can check two million digits in ~6 seconds and I could cut that down substantially with some quick tweaks.

Eventually, I’ll hit a wall with the sieve of eratosthenes, and be forced to use the sieve of atkin – which I do not understand and actually dread facing. When I do have to face it, though, it will magically transform into a fun and interesting challenge. That’s the magic of Project Euler!

Last day, Capricorn-15s

Tagged , , ,

Merge Sort In Racket

I talked briefly about merge sort only yesterday, in my post about quicksort.

I also mentioned Lisp, which in the form of “Common Lisp” is a really nice practical language I really intend to learn properly one day. The more I learn about it, the more startled I am at how far ahead of its time it was – or more accurately, at how primitive popular ‘modern’ programming languages such as C, Java and C++ can be. Note that primitive doesn’t mean easy to grasp – if anything, the lower levels of abstraction possible with OOP and procedural languages mean that it’s harder to build the same kinds of things and more difficult to understand what’s going on once you have assembled a simulacrum of the abstraction you intended.

The things people build in these languages constantly astound me, and they have the added bonus that they’re often backed by powerful enough compilers that they’ll run much faster than their FP brethren. I’ve kind of got away from the topic though.

Merge Sort

So yes. Merge sort. Off we go.

I had a naive view that merge sort was quicksort’s less glamorous older brother, until watching the videos on the preview of my Coursera course in algorithm design, I realised that algorithm design and speed were incredibly complex issues subject to a whole host of factors.

I also realised, on trying to implement a quick and ugly merge sort in Racket, that it was harder to do than I had anticipated.

Merge sort pretty much forces you to create at least two different functions:

  • A zipping-up function, which merges ordered arrays, lists or vectors into bigger ordered listy things
  • An unzipping function, which splits the initial lists until they are small enough that ordering them is a trivial operation

Generally, either the zipping-up function or the unzipping function can also take care of ordering the tiny little lists.

For the tiny little lists, we either order them at length 2 by a simple conditional check, reversing the list if they’re “out of order”, at length 7 by doing an insertion sort (I’ve chosen this length because it’s the same length Java apparently use in the default Java sort function (source: Wikipedia) or at length 1 by simply having the zipping-up function handle them.

Length 1 is the easiest for me, so I go with that. I’d probably go with the length 7 option if I was actually building a sort function of my own for god-knows-what reason.

Let’s build our zippy function then – it’s going to be really simple. As before, comments are marked out with a semi-colon.

(define (merge-lists xs ys)
  (cond
    [(null? xs) ys]
;if xs is empty, return ys [(null? ys) xs]
;if ys is empty, return xs [(< (car xs) (car ys))
;if the head of list "xs" is bigger than head of list "ys" (cons (car xs) (merge-lists (cdr xs) ys))]
;cons head xs to (recurse) [#t ;I use true as an else cond here - possibly bad style? (cons (car ys) (merge-lists xs (cdr ys)))]))
;cons head ys to (recurse)

Sure, it only sorts ordered lists – but we’re gonna make sure these are ordered in the sort function.

(define (merge-sort xs)
  (cond
    [(or (null? xs) (null? (cdr xs))) xs]
    [(null? (cddr xs))
     (merge-lists (list (car xs)) (cdr xs))]
    [#t
     (let ([x (ceiling (/ (length xs) 2))])
       (merge-lists (merge-sort (take xs x))
                    (merge-sort (drop xs x))))]))

Wow, we’re done. That was SO easy. I love Racket.

All we do is check if the list is empty (null?) or has a length of 1 (if its tail is null? it must have a length of 1). If either of these is the case, we return the list.

The next bit uses (cddr), which checks if the tail of the tail is null. Similarly, (cdddr) would return the tail of the tail of the tail of the list, or (cadr) would return the head of the tail of the list – it’s simply a matter of combining (car) – head – and (cdr) – tail – in the proportions you desire. If this returns true, we merge-lists on a list of the head of list and tial of list.

But why do we specifically turn (car xs) into a list, and keep (cdr xs) as-is?

Well, that’s due to how linked lists work.

A pair in Racket, and other “tuples” in other languages, are constructed by (cons)ing values together. So (cons 1 2) results in a pair ‘(1 2). A linked list can be constructed by (cons)ing another value to this pair (e.g. (cons 1 (cons 2 3))).

However, a list is only officially a list in Racket if it ends in a null value, for many reasons – we’ve seen some of them here! When we find where the null value is, we know to end the recursive function for sure. So (cons 1 (cons 2 3)) – not a list. (cons 1 (cons 2 null)) – a list.

This affects us here because the head of our list cannot end in null – it is by definition a single value. Although the tail looks like a single value, ‘(x), it’s actually a list ‘(cons x null). So the head goes in a brand new list, while the tail remains as it is.

Finally, we define a local variable equal to half the list’s length (rounded up with the (ceiling) function) and we call (merge-list) on the first half and second half of our list, divided up using the (take) and (drop) functions from the Racket library. They do more-or-less what you’d expect – take the list up to position x, or drop the list up to position x. We keep merging and merging our merge-sort function until we finally end up with a 2-length list, at which point we order and merge them – and merge this with the other list that has been merged – and so on, all the way back until the function returns.

It’s nice and fast, we separate our concerns prettily, and it’s not even as complex as quicksort – merge sort is awesome.

As always, if you’ve seen something egregious in this post, please correct me. I make these posts primarily to learn, so anything you can let me know is gratefully accepted!

Tagged , , , ,

In Which I Decide That Ruby Is A Silly Place

I have nothing against Ruby’s image, despite the hipster-bashing that its name tends to invoke.

I think that the “Rails coding ninja” archetype the start-up world has conjured up is obnoxious, but I also think it’s strictly a product of the tech start-up world. Outside that limited circle, I doubt that anyone who uses Rails is any different from the people who were using PHP fifteen-odd years ago.

However, I cannot stand the language itself.

Ruby seems to offer roughly the same package as PHP (easy enough to code things up even for non-computer-science types like me, particularly well-suited to web development, slow, something of a mish-mash from a language-design perspective), just improved. Its niche, as far as I can tell, is the same. It’s avoided the mistakes PHP made, it’s a little slower, and it has the ability to do some clever cross-paradigm things (functional programming is a different paradigm because Ruby is object-oriented-turtles all the way down). It saves developer time, I guess, used right.

Meh. It doesn’t excite me, and I don’t learn anything new from it. I’m sure once I’m experienced enough with programming to be jaded I will find it utterly thrilling – but right now, all I see is a heap of syntactic sugar on top of a big ol’ mess of blah.

Let’s not go to Ruby, it is a silly place. Here is some more messing around with Lisp’s grandchild, Racket instead, along with explanations.

Quicksort

Quicksort is a decently fast sorting algorithm, as the name would imply. Here’s a simple way to implement a rough version in Racket, with explanations of what I’m doing as I go.

Comments are marked out with “;”, in case anyone reading doesn’t do Racket.

(define (quicksort xs)
    (if (null? xs)
    xs ;if the list we pass to quicksort is empty, return the list.
    (let* ;let* allows you to define multiple local bindings
        ([hd (car xs)] ;variable "hd" = the head of our list.
         [tail (cdr xs)] ;tail = list tail
         [smaller (filter (lambda (x) (< x hd)) tail)]
         [bigger (filter (lambda (x) (>= x hd)) tail)])
      (append (quicksort smaller) (list hd) (quicksort bigger))))

This is pretty concise, but it’s not really that simple of course. I’ve used the simplest way of writing things without really examining why it works or how it will work in practice.

The “smaller” and “bigger” bindings are made by passing our anonymous function (the bit with the keyword “lambda”) to the library function “filter”. That’s a bit of a cop-out, although filter is fairly simple it is a pretty cool and useful function to know how to write. Let’s have a look at my understanding of how it works:

(define (filter-local fn xs) ;defined locally to avoid shadowing
    (letrec ([f (lambda (xs acc) ;introducing an accumulator
        (if (null? xs)
            (reverse acc) ;give back the consed list at end (in right order)
            (if (fn (car xs)) ;if our function is true for head of list
                (f (cdr xs) (cons (car xs) acc));recurse and cons head to accumulator
                (f (cdr xs) acc))))]);else recurse and do nothing
    (f xs null))) ;starting the recursive function

This is probably not how it’s done exactly, but it gives you the general idea of what we’re doing with it – grabbing a list of numbers smaller than the head of our list, and a list of numbers bigger than or equal to the head of our list. This illustrates that this seemingly simple function actually is doing more work than it looks like, filtering the same list twice for every call. We could definitely make this more efficient, if we needed to, with a more specific single local function:

(letrec ([f (lambda (x ys sml big)
    (if (null? ys)
        (append (quicksort sml) (list x) (quicksort big))
        (if (> x (car ys))
            (f x (cdr ys) (cons (car ys) sml) big)
            (f x (cdr ys) sml (cons (car ys) big)))))])
    (f hd tail null null))

Feel free to check this function, it may be a little off as I’ve just written it in WordPress now. The idea, however, is that we do less work with each call to quicksort. This is not traditionally that important in algorithms (the Big O notation is usually more relevant, especially as if we’re not using the sort on large lists we’ll probably choose, say, an insertion-sort method instead) but it could probably knock at least some time off its run-time.

The way it works is by recursively calling quicksort on the lists of smaller and larger numbers, gradually sifting the big list blocks so that eventually we’re sorting (1) (2) and (3) rather than (213) (5) and (6559).

Now I run quicksort on a boring random list of 10000 numbers.

It’s super-fast! Awesome.

Just for kicks, I’ll run it again on the ordered list and -huh?

On ordered lists, this quicksort algorithm is slooooow. If we start with a pre-ordered list and use head/car as our “pivot”, quicksort runs at its worst-case performance, O(n^2). Usually, worst-case performance is very rare, but since ordered lists are pretty common we’ve just upped the chance that we’ll get a worst-case performance. Now, we could simply check to see if a list is ordered before we run the algorithm (boo!) or we could refactor the algorithm (yay!).

We can choose the middle-value by dividing the length of the list by two and using list-ref to choose the appropriate element – that’s the simplest way.

Further optimisations are a little more complicated (honestly, not much more complicated) and can be seen here: http://en.wikipedia.org/wiki/Quicksort

I’ve also written an implementation of merge sort[1] in Racket to make sure I understand that. It’s possible I’m a little over-excited about my upcoming Coursera course on algorithm design and analysis, and it’s definitely the case that I need to go write in some less pretty languages in the near future.

As always, if you’ve seen something egregious in this post, please correct me. I make these posts primarily to learn, so anything you can let me know is gratefully accepted!

[1] You can see how I created my merge sort here if you’re interested.

Tagged , ,
%d bloggers like this: