Wednesday, February 15, 2012

Project Euler and Refactoring

I started doing the problems on Project Euler the other day. For anyone who doesn't know, Project Euler is a website that has a collection of math problems for you to solve. You can solve each problem any way you want. The site keeps track of which problems you have submitted correct answers to.

Just finding the right answer is a good programming challenge, but refactoring makes for much deeper learning.

Another benefit from doing the problems is that you are building samples of your work that you can show to prospective employers. If you are going to show the world your code, you want it to be clean.

Here is an example of my process solving one of the problems in Clojure. I can't show what I have learned without displaying my ignorance, but I am less ignorant now because of this process.

Every line of code you see here represents at least 5 lines of experimentation preceding it.

Problem 3 asks, "What is the largest prime factor of the number 600851475143 ?"

Solution 1

Find all of the divisors of n, filter the ones that are prime and then take the largest. To find all of the divisors, iterate through all of the numbers i from 1 to n, and test whether the remainder of n divided by i is zero.
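The code for this first attempt is not shown in the post, so the following is only a guess at its shape, assuming a 'for' comprehension over every candidate divisor and a prime test that counts divisors. The names divisors, slow-prime? and largest-prime-factor are made up for illustration.

```clojure
;; Hypothetical reconstruction of Solution 1; all names are illustrative.
(defn divisors [n]
  ;; every i from 1 to n that divides n evenly
  (for [i (range 1 (+ 1 n))
        :when (= 0 (rem n i))]
    i))

(defn slow-prime? [n]
  ;; a prime has exactly two divisors: 1 and itself
  (= 2 (count (divisors n))))

(defn largest-prime-factor [n]
  (apply max (filter slow-prime? (divisors n))))

(largest-prime-factor 42) ; => 7
```

Every call walks all n candidates, which is harmless for 42 and hopeless for a 12-digit number.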

Works just fine with 42. Doesn't show any signs of returning with 600851475143. Restart the REPL.

  • Learned a bit more about 'for' like the difference between :while and :when.
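A small sketch of that difference, using the divisors of 10 as the example (the names with-when and with-while are only for illustration):

```clojure
;; :when skips values that fail the test and keeps going
(def with-when
  (for [x (range 1 11) :when (= 0 (rem 10 x))] x))
;; => (1 2 5 10)

;; :while stops the whole iteration at the first failure (x = 3 here)
(def with-while
  (for [x (range 1 11) :while (= 0 (rem 10 x))] x))
;; => (1 2)
```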

Solution 2

Create a function that finds a single factor, called afactor. In Clojure you don't use loops with breaks, you use sequences and take the items you want. Lazy sequences don't calculate numbers until you ask for them, so you can create an expression which calculates all of the divisors, but only actually calculate one or two.

(defn afactor [base]
  (nth (for [x (range 1 (+ 1 base))
             :when (= 0 (rem base x))] x ) 1))

  • Learned how to use 'for' in a lazy way.
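One way to see the laziness is to count how many candidates actually get tested. This is a sketch only: afactor-counted and the tested atom are hypothetical additions, and the exact count may run somewhat past 71 because Clojure can realize lazy sequences in chunks.

```clojure
;; Counter for how many candidate divisors were actually tested.
(def tested (atom 0))

;; Same body as afactor, with a side effect added purely for counting.
(defn afactor-counted [base]
  (nth (for [x (range 1 (+ 1 base))
             :when (do (swap! tested inc)
                       (= 0 (rem base x)))]
         x)
       1))

(afactor-counted 600851475143) ; => 71
@tested                        ; a tiny fraction of 600851475143
```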

Now to create a function that returns a pair of factors.

(defn factor [base]
  (let [first (afactor base) ]
    (list first (/ base first))))

  • Got more confidence with 'let' binding.

The next step is a test of whether a number is prime. A prime number has exactly 2 divisors, but counting every divisor of our number would leave us restarting the REPL again. That is just a menu item in Clooj, but it is painful to admit that I was dumb enough to think it would work, so we will skip that step. Since a prime number is only divisible by 1 and itself, its 2nd smallest divisor will be itself, which is exactly what afactor returns.

(defn prime? [base]
  (= base (afactor base) ))

  • I assumed (= base (afactor base)) would allow me to return a boolean, glad to see that it actually does.

The first call to factor with our number yields a result large enough that I don't care to retype it. Let's set up a manual recursion of 1 level.

(defn prob3 [base]
  ;; factor base once, then factor each non-prime result one more time
  (map factor
       (filter #((complement prime?) %)
               (factor base))))

  • Playing around with this function and others like it really got me confident with the #(%) syntax of anonymous functions. Not long ago they were a mystery, but with repetition they are comfortable.
  • Playing with the ugly (map factor (filter ... construction was good practice with maps and lists even if the resulting code is terrible.

It is not hard to see a way to the right answer from this. Either you take the output of prob3 and call prob3 again, typing in the first result, or you could nest the map factor filter inside of another map factor filter. This isn't anything you are going to show off to your friends.

Solution 3

Solving a problem earlier on 4clojure, I found out that Clojure has a function called tree-seq. We always drew trees when we were factoring numbers in school, so that will probably yield a more satisfying result. tree-seq takes 3 parameters: branch?, children and root. branch? is a function that tests whether a node has children, children is a function that returns the children of a node, and root is the initial value.

(defn factor-tree [base]
  (tree-seq (complement prime?) factor base))

  • I got to learn how to use tree-seq. Also, prime? was returning true when a node didn't have children, exactly opposite the desired result. After (not prime?) failed, I now understand when to use not and when to use complement.
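A toy illustration of both points, using a nested vector instead of factors (the name nodes is only for this sketch):

```clojure
;; tree-seq on a nested vector: a node is a branch if it is sequential,
;; and its children are simply its elements.
(def nodes (tree-seq sequential? seq [1 [2 3] [4 [5]]]))

;; nodes holds every node, branches included, in depth-first order
(count nodes)          ; => 9
(filter number? nodes) ; => (1 2 3 4 5)

;; complement takes a function and returns a new function
((complement even?) 3) ; => true

;; not takes a value; a function is a truthy value, so this is just false
(not even?)            ; => false
```

Since (not prime?) evaluates to false rather than to a function, tree-seq has nothing it can call as branch?, which is why complement is the right tool there.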

factor-tree returns a list of all of the factors in the tree, prime or not. That is easy enough to handle. We may want all of the prime factors on another problem, so let's give that its own function.

(defn prime-factors [base]
  (filter #(prime? %) (factor-tree base)))

For this problem, we just want the largest prime factor.

(defn problem3 []
  (apply max (prime-factors 600851475143)))       

  • I was surprised that I had to type (apply max (...)) instead of just (max (...)). In the imperative world you are so often calling methods on parameters that it is not hard to accept that LISP syntax works differently. Every language I have ever used, though, has had a function called max that operates on a list of data, which is why it is hard for my brain to see that max(list) isn't correct. In Clojure, max compares its arguments, so a sequence has to be spread into separate arguments with apply.
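A short sketch of what happens in each case, with a literal vector standing in for the prime factors:

```clojure
;; max compares its arguments; given a single argument it returns it untouched
(max 3 7 2)          ; => 7
(max [3 7 2])        ; => [3 7 2]

;; apply spreads the collection into separate arguments
(apply max [3 7 2])  ; => 7

;; reduce is another way to fold a collection with a two-argument function
(reduce max [3 7 2]) ; => 7
```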

Solution 4

Using the tree-seq makes for a much nicer solution, and I was tempted to stop there. One thing still bothered me though. In one video I saw, Stuart Halloway said that with lazy evaluation, there seldom was a reason to return just one of anything. It is much more flexible to return a sequence and let the caller decide which elements they need.

(defn afactor [base]
  (nth (for [x (range 1 (+ 1 base))
             :when (= 0 (rem base x))] x ) 1))

Needs to be rewritten.

(defn lazy-factor [base]
  (for [x (range 1 (+ base 1))
        :when (= 0 (rem base x))]
    x))

  • The idea of returning a lazy sequence instead of a single value is a different philosophy than I have used in the past.

As you can see, the change was very minor. In fact, it is easier to do it the right way than the wrong way. If I hadn't rewritten it, I never would have found that out!

Because I am now returning a lazy sequence instead of a single element, I do have to modify the callers to choose the element they want.

(defn prime? [base]
  (= base (nth (lazy-factor base) 1)))

(defn factor [base]
  (let [first (nth (lazy-factor base) 1) ]
    (list first (/ base first))))
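
Assembled in one place, the Solution 3 and Solution 4 functions might look like the sketch below. The function names are the ones from the post; the only liberties taken are renaming the local to smallest, so that it does not shadow clojure.core/first, and passing prime? to filter directly.

```clojure
(defn lazy-factor [base]
  ;; all divisors of base, smallest first, computed on demand
  (for [x (range 1 (+ base 1))
        :when (= 0 (rem base x))]
    x))

(defn prime? [base]
  ;; the 2nd smallest divisor of a prime is the prime itself
  (= base (nth (lazy-factor base) 1)))

(defn factor [base]
  ;; split base into its smallest divisor above 1 and the matching cofactor
  (let [smallest (nth (lazy-factor base) 1)]
    (list smallest (/ base smallest))))

(defn factor-tree [base]
  ;; keep splitting every node that is not prime
  (tree-seq (complement prime?) factor base))

(defn prime-factors [base]
  (filter prime? (factor-tree base)))

(prime-factors 42)                       ; => (2 3 7)
(apply max (prime-factors 600851475143)) ; => 6857
```

Note that prime? and factor are only safe for numbers greater than 1, since 1 has no 2nd divisor to ask for.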

Solution 5

I thought I was done, but when I moved on to other problems, I kept finding myself typing things like (range 1 (+ 1 base)), as I did in the lazy-factor function. Range is built to give you a C-style loop, so (range 5) gives you (0 1 2 3 4). Usually this is fine, but for the Project Euler problems, I constantly find myself wanting a BASIC-style loop, (1 2 3 4 5) instead. This was enough motivation for me to attempt my first macro.

(defmacro rangeb [last]
  (list 'range 1 (list '+ 1 last)))
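
To check what the macro actually writes, macroexpand-1 can be handed a quoted call. A quick sketch, repeating the definition so it runs on its own:

```clojure
(defmacro rangeb [last]
  (list 'range 1 (list '+ 1 last)))

;; macroexpand-1 shows the code the macro produces before it is evaluated
(macroexpand-1 '(rangeb 5)) ; => (range 1 (+ 1 5))

;; evaluating the call runs that generated code
(rangeb 5)                  ; => (1 2 3 4 5)
```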

With this change, lazy-factor becomes:

(defn lazy-factor [base]
  (for [x (rangeb base)
        :when (= 0 (rem base x))]
    x))

Conclusion

I expect that in the not too distant future I will be able to look back at this post and cringe. I am sure that I will find other situations where I find a simpler way to do things and will be tempted to refactor again.

I find myself using 'for' quite a bit, though here it only appears in the factor function (afactor and then its replacement lazy-factor). 'for' feels imperative, and I wonder if I am overusing it.

My rangeb macro is overly specific. Right now you can only pass an ending number; it always starts at one and increments by one. When the spirit moves me, or when a situation demands it, I will update it. In the meantime I will pretend that YAGNI applies here.

One other thing I think my subconscious is learning is how easy it is to write and test things in Clojure. Next time I use C# I am sure I am going to miss the REPL. Even now, I can ponder having to write methods to call and output my code, compiling every change. /shudder

1 comment:

  1. What I like about the Euler problems is that they force you to think mathematically. Some of the later solutions can't even be brute forced by a typical laptop. Like trying to find the smallest sum of the path up a triangle of numbers.
