On Beyond Lambda

2016 Year In Review

2016-01-03T08:08:00.000-08:00

I got to participate in the RentPath engineering department planning session for 2016. We tried to develop a common vision for how we would like the department to operate at the start of 2017, and use that vision to evaluate our current processes and proposed changes.

I spent the last week of 2015 in bed with the flu. We all have things in our lives that we would like to do better, but the week I wasted, unable to do much of anything, all because I hadn’t even taken the time to consider getting a flu shot, was really frustrating. It also reminded me that there are a lot of ways that I have been neglecting my health, and the diseases my current course will lead to won’t go away in a week.

Recognizing the need for change, I am going to try the same planning approach and start by asking who do I want to be at the beginning of 2017? And what kinds of things do I need to do, and what habits do I need to form, in 2016 to become that person?

At the start of 2017 protecting my health is a priority, because I recognize that being healthy leads to more energy and more enjoyment of life. Poor health not only means feeling bad, it also makes everything else in life more difficult.

I spend more time around other people. It is easy for me to get wrapped up in my own projects and pursuits, but life is more fun with others. I explore new music, listening to artists I haven’t heard before, going out to hear music live, and making horrible noises on the piano in my basement.

In the past, I have given up a lot in exchange for convenience and never realized what it was costing me. In 2016, I am without excuse. For the first time since 2002, I start the year with no thought about how to prepare for my next job. In 2016 my only career goals are to get better at the job I already have, and to help the people around me succeed.

I am looking forward to 2017 and who I am going to be. And I am going to enjoy getting there.

Happy New Year.

Trying Out Property-Based Testing in Clojure

2014-11-02T21:45:00.000-08:00

I am taking Erik Meijer’s Introduction to Functional Programming course on edX. In the homework for the lesson on list comprehensions, the Caesar Cipher is introduced. The way the cipher works is that each character in a message is shifted a number of characters. If the number is 3, then a becomes d, b becomes e, and it wraps around so z becomes c.

Implementing the cipher in Clojure seems like a good project to try out test.check, a property-based testing library. With traditional unit testing it is up to the programmer to come up with example cases and their expected results. In property-based testing, the programmer describes the relationship between the inputs and outputs of a function and specifies the valid inputs, and the testing tool generates the examples itself.

It was really easy to get started. I just added test.check to my dependencies, and then referred the appropriate namespaces into the core_test.clj file that Leiningen creates by default. Tests can be run either from the REPL or with lein test.
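For reference, the setup might look roughly like this. It is a sketch rather than a copy of the project: the version numbers are assumptions (use whatever release is current), and the namespace names are taken from the caesar-cipher.core-test name that shows up in the test output.

```clojure
;; project.clj (fragment). The version numbers are assumptions.
;; :dependencies [[org.clojure/clojure "1.6.0"]
;;                [org.clojure/test.check "0.5.9"]]

;; test/caesar_cipher/core_test.clj
(ns caesar-cipher.core-test
  (:require [clojure.test :refer :all]
            [clojure.test.check.clojure-test :refer [defspec]]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]
            [caesar-cipher.core :refer :all]))
```

The gen and prop aliases are the ones used by the tests in the rest of this post.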

Phase One

In the project, the code is in core.clj, and the tests are in core_test.clj.

My implementation of the Caesar Cipher is only going to work with lowercase letters. The first thing I need to do is translate a character to a number between 0 and 25.

(defn let-to-num [l]
  (- (int l) (int \a)))

To test the function, I am going to tell test.check to generate random letters. Because I only want to test lowercase letters, I am going to use a lower-case? function as a filter:

(defn lower-case? [c]
  (and (>= (int c) (int \a))
       (<= (int c) (int \z))))

(defspec let-to-num-returns-0-to-25
  100
  (prop/for-all [c (gen/such-that lower-case? gen/char-alpha 100)]
    (let [v (let-to-num c)]
      (and (>= v 0)
           (< v 26)))))

defspec is used to hook the test into the clojure.test runner. I name the test, then specify that I want it to be run 100 times each time the tests are run. prop/for-all specifies that for all of the values in the binding, I want the expression after the binding to evaluate to true. For each iteration of the tests, c is going to be bound to a value generated by test.check.

The base generator I am using is char-alpha, which generates a random letter that can be either uppercase or lowercase. gen/such-that takes a predicate that is used to filter the values from a generator. The optional third parameter specifies the number of times the generator should attempt to generate a qualifying value before giving up and throwing a runtime exception. By specifying 100, I am asking for up to 100 attempts at a lowercase letter. 100 is a ridiculous number of times to try a 50-50 proposition, but I did have one run where I got a runtime exception using the default value of 10.
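That one exception is less surprising than it sounds. If each try really is an independent 50-50 proposition, ten misses in a row happen about once in 1024 attempts, and a run of 100 tests makes at least 100 attempts, which works out to roughly a one-in-ten chance of an exception per run. One way to avoid the filter altogether, sketched here as an alternative and not what the code in this post does, is to build the letter directly from its character code. The name gen-lower-case-direct is made up for this example.

```clojure
(require '[clojure.test.check.generators :as gen])

(defn lower-case? [c]
  (and (>= (int c) (int \a))
       (<= (int c) (int \z))))

;; Hypothetical alternative to the such-that filter: pick a code between
;; \a and \z and convert it to a character, so no candidate is discarded.
(def gen-lower-case-direct
  (gen/fmap char (gen/choose (int \a) (int \z))))

;; Five random lowercase letters
(gen/sample gen-lower-case-direct 5)
```

Because every generated value already qualifies, there is no retry limit to tune.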

lein test produces the following output:

lein test caesar-cipher.core-test

{:test-var let-to-num-returns-0-to-25, :result true, :num-tests 100, :seed 1414982695288}

Ran 1 tests containing 1 assertions.

0 failures, 0 errors.

For each test, we can see the result and how many times each test was run. The :seed value is provided so if there is a failing test we can supply that seed value to the test function and it will run the tests against the same test cases that were generated when the tests failed. This allows us to ensure that tests are passing because the problem was actually fixed, not because we got lucky and didn’t hit the failing test case the next time.
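As a rough sketch of what supplying the seed looks like, the same property can be run through quick-check directly. The var names let-to-num-prop and rerun are made up for this example; the seed is the one from the output above.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

(defn let-to-num [l]
  (- (int l) (int \a)))

(defn lower-case? [c]
  (and (>= (int c) (int \a))
       (<= (int c) (int \z))))

;; The same property as let-to-num-returns-0-to-25, held in a plain var
(def let-to-num-prop
  (prop/for-all [c (gen/such-that lower-case? gen/char-alpha 100)]
    (let [v (let-to-num c)]
      (and (>= v 0)
           (< v 26)))))

;; Re-run 100 trials, starting from the reported seed
(def rerun (tc/quick-check 100 let-to-num-prop :seed 1414982695288))

(:seed rerun) ; the seed that was passed in
```

As far as I can tell, the function that defspec defines accepts the same options, so (let-to-num-returns-0-to-25 100 :seed 1414982695288) should work from the REPL as well.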

Adding a number-to-letter function provides a new relationship to test. Translating from a letter to a number and back again should yield the original letter. Because I am again using the lower-case letter generator, I have refactored it into its own def.

(defn num-to-let [l]
  (char (+ l (int \a))))

(def gen-lower-case (gen/such-that lower-case? gen/char-alpha 100))

(defspec let-to-num-to-let
  100
  (prop/for-all [c gen-lower-case]
    (= c (num-to-let (let-to-num c)))))

Adding a shift function gives another relationship that should yield the initial value after a round trip. Shifting a letter up 3 letters and then shifting the result back down 3 letters should yield the original value.

(defn shift [n i]
  (+ i n))

(defspec shift-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
    (= (num-to-let (shift (* -1 n) (shift n (let-to-num c))))
       c)))

Testing shift requires a new generator. gen/elements selects a random element from a collection, in this case (range 26), which is the full range of translations we want to allow.
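A quick way to see what a generator produces is gen/sample. The sketch below pulls a few values from the shift generator; the name gen-shift is made up for this example.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Picks one element at random from the collection 0..25
(def gen-shift (gen/elements (range 26)))

;; Ten random shift amounts, each between 0 and 25
(gen/sample gen-shift 10)
```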

The encoding is a 3 step process: convert to int, shift, convert to letter. While not strictly necessary, it is nice to have a function that aggregates these steps. The test just duplicates the logic of the translate (xlate) function, but because we have the test, if later we find a more efficient way to perform the translation we can still rely on the known good algorithm to validate it.

(defn xlate-let [n let]
  (->> (let-to-num let)
       (shift n)
       (num-to-let)))

(defspec xlate-let-combines-the-steps
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
    (= (num-to-let (shift n (let-to-num c)))
       (xlate-let n c))))

All that remains is to translate a whole phrase:

(defn translate [n phrase]
  (for [letter phrase]
    (xlate-let n letter)))

(defspec translate-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 v (gen/vector gen-lower-case)]
    (= (translate (* -1 n) (translate n v))
       v)))

We have added another generator here. gen/vector generates a vector containing 0 or more elements from the provided generator, in this case gen-lower-case.

Running all 5 tests yields this result:

lein test caesar-cipher.core-test

{:test-var xlate-let-combines-the-steps, :result true, :num-tests 100, :seed 1414985079696}

{:test-var shift-round-trip, :result true, :num-tests 100, :seed 1414985079723}

{:test-var translate-round-trip, :result true, :num-tests 100, :seed 1414985079734}

{:test-var let-to-num-to-let, :result true, :num-tests 100, :seed 1414985079784}

{:test-var let-to-num-returns-0-to-25, :result true, :num-tests 100, :seed 1414985079790}

Ran 5 tests containing 5 assertions.

Phase Two

The tests are passing, but I am not happy with the results. The translate function is passed a string but it returns a sequence of characters:

(translate -13 (translate 13 "Property-based testing is fun."))

(\P \r \o \p \e \r \t \y \- \b \a \s \e \d \space \t \e \s \t \i \n \g \space \i \s \space \f \u \n \.)

Also, translate will yield unprintable characters:

(translate 13 "Property-based testing is fun.")

(\] \ \| \} \r \ \ \ \: \o \n \ \r \q \- \ \r \ \ \v \{ \t \- \v \ \- \s \ \{ \;)

For phase two of this project, I want to translate only lowercase letters, I want translating a lowercase character to always yield a lowercase character, and I want the result of translation to be a string, not a sequence of characters. Each of these is a testable property.

Adding the test that shift returns a number between 0 and 25 (which we translate to lowercase letters) lets us see what a failing test looks like.

(defspec shift-returns-0-to-25
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
    (let [v (shift n (let-to-num c))]
      (and (>= v 0)
           (< v 26)))))

{:test-var shift-returns-0-to-25, :result false, :seed 1414986620265, :failing-size 2, :num-tests 3, :fail [23 l], :shrunk {:total-nodes-visited 19, :depth 3, :result false, :smallest [15 l]}}

Here we see that the test failed when we tried shifting the character ‘l’ by 23 characters. test.check has a feature called shrinking, which tries to find the smallest input that will cause the test to fail. Here test.check has determined that 15 is the smallest shift that pushes the letter l past 25. The letter l, as the 12th letter, has a value of 11, since we start counting at 0. In this case, it is not an important feature, but if we were testing complicated data structures it would be really nice to have our failing cases simplified for us.
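The arithmetic behind the shrunk case can be checked by hand. This sketch repeats the original definitions of let-to-num and shift so that it stands alone.

```clojure
(defn let-to-num [l]
  (- (int l) (int \a)))

;; The original shift, before any fix
(defn shift [n i]
  (+ i n))

(let-to-num \l)            ; => 11
(shift 15 (let-to-num \l)) ; => 26, the first value outside 0..25
(shift 14 (let-to-num \l)) ; => 25, still in range
```

So for the letter l, 15 really is the smallest failing shift.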

This problem is easily solved:

(defn shift [n i]
  (mod (+ i n) 26))
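It is worth noting why mod is the right choice here and rem is not. The round-trip tests shift by a negative amount, and Clojure's mod always returns a result with the sign of the divisor, while rem keeps the sign of the dividend. A small sketch:

```clojure
(defn shift [n i]
  (mod (+ i n) 26))

(shift 3 25)      ; => 2, z wraps around to c
(shift -3 2)      ; => 25, c wraps back around to z
(rem (+ 2 -3) 26) ; => -1, which is not a valid letter index
```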

To test that we are translating only lower-case letters, we can generate a string and compare the non-lowercase letters to the non-lowercase letters of the translated string.

(defspec translate-only-lowercase
  100
  (prop/for-all [n (gen/elements (range 26))
                 s gen/string-ascii]
    (= (remove lower-case? s)
       (remove lower-case? (translate n s)))))

For this test to work, instead of generating lowercase letters, we will use the built-in gen/string-ascii.

To implement this, I will move the lower-case? function to the core namespace (since I am doing :refer :all in the test namespace, it will still be available in my tests).

(defn translate [n phrase]
  (for [letter phrase]
    (if (lower-case? letter)
      (xlate-let n letter)
      letter)))

For the last step the translate-round-trip test needs to be modified to generate strings instead of vectors so that the translate function will only pass the tests if it returns a string.

(defspec translate-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 v gen/string-ascii]
    (= (translate (* -1 n) (translate n v))
       v)))

And modify the function to return a string:

(defn translate [n phrase]
  (apply str
         (for [letter phrase]
           (if (lower-case? letter)
             (xlate-let n letter)
             letter))))

Now we have 7 passing tests and if we call our translate function, the results are more satisfying:

(translate -13 (translate 13 "Property-based testing is fun."))

"Property-based testing is fun."

(translate 13 "Property-based testing is fun.")

"Pebcregl-onfrq grfgvat vf sha."

A Few More Tests

For the sake of completeness, there are 3 more tests I want to write. I want to test the lower-case? function. I want to test that if translate is called with 0, the translated string equals the original string. And I want to test that if translate is called with a number other than 0, the translated string does not equal the original string.

(defspec lower-case?-works
  100
  (prop/for-all [c gen/char-ascii]
    (= (lower-case? c)
       (and (>= (int c) (int \a))
            (<= (int c) (int \z))))))

(defspec translate-0-change-nothing
  100
  (prop/for-all [s gen/string-ascii]
    (= s (translate 0 s))))

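(defspec translate-changes-things
  100
  (prop/for-all [n (gen/elements (range 1 26))
                 v (gen/not-empty (gen/vector gen-lower-case))]
    ;; Compare string to string: translate returns a string, and a vector
    ;; never equals a string, so comparing v directly would always pass.
    (let [s (apply str v)]
      (not= s (translate n s)))))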

Notice, for the test that translate really does change its inputs, I had to go back to the lower-case generator. If I had used the string generator, sometimes the generator would have produced strings that had no lowercase values, and the tests would have failed because there was nothing to translate.

Conclusion

My core namespace has 6 functions in it. The 10 tests in the test namespace test the return values for all legal inputs to these 6 functions. How many unit tests would it take to do that? How would you know if you had tested all of the possible inputs?

If you did manage to write unit tests to cover every possible case, how many tests would you have to change when refactoring rendered the tests obsolete? I don’t know if property-based tests will be more resilient against such changes, but I do know that when I have to change my tests, I will have a lot fewer tests to change.

Today is my first day writing property-based tests. And right away I am a believer. I am completely confident in my code. It is a nice feeling.

You can find the code for this project on github.

Trying Out Reference Cursors

2014-10-19T06:44:00.001-07:00

On Thursday David Nolen teased that “the next Om release is gonna to be a doozy”. On Saturday, he revealed “Reference Cursors”. As he explains and demonstrates in this tutorial, reference cursors let your UI components and your data have different hierarchies.

As an example of where this might be useful, imagine your data looks like this:

(defonce app-state (atom {:messages [{:sender "Paul" :text "Let It Be"}]
                          :members [{:name "John" :instrument :guitar}
                                    {:name "George" :instrument :guitar}
                                    {:name "Paul" :instrument :bass}
                                    {:name "Ringo" :instrument :drums}
                                    {:name "Clarence" :instrument :saxophone}]}))

And your UI is structured like this:

(defn main-panel [app owner]
  (reify
    om/IRender
    (render [this]
      (dom/div nil
               (dom/div #js {:className "row"}
                        (om/build msgs-panel (:messages app)))
               (dom/div #js {:className "row"}
                        (om/build user-panel (nth (:members app) 0))
                        (om/build user-panel (nth (:members app) 1))
                        (om/build user-panel (nth (:members app) 2))
                        (om/build user-panel (nth (:members app) 3)))))))

The messages panel knows about the messages, and each user panel knows only about the user it represents.

What happens if you want to display each user’s messages in the user panels? You could pass the entire app-state to each panel, but then you have to find another way to indicate which particular user each panel represents.

Reference cursors allow exposing a data hierarchy independently from the ui structure. In this application, we expose the messages:

(defn messages []
  (om/ref-cursor (:messages (om/root-cursor app-state))))

We can then observe this cursor in the user panels, as in the let binding here:

(defn user-panel [app owner]
  (reify
    om/IRender
    (render [_]
      (let [xs (->> (om/observe owner (messages))
                    (filter #(= (:sender %) (:name app)))
                    reverse)]
        (dom/div #js {:className "col-xs-3 panel panel-warning msg-log"}
                 (dom/h4 #js {:className "panel-heading"}
                         (str (:name app) " " (:instrument app)))
                 (dom/div #js {:className "panel-body"}
                          (apply dom/ul #js {:className "list-group"}
                                 (map #(dom/li #js {:className "list-group-item"}
                                               (:text %))
                                      (take 3 xs)))))))))

Even in this small application, reference cursors make for a much cleaner solution. For larger applications, they are going to make a huge difference.

You can find the code for this application on github and you can see the application running here.

Machine Learning in Clojure - part 2

2014-07-21T06:15:00.000-07:00

I am trying to implement the material from the Machine Learning course on Coursera in Clojure.

My last post was about doing linear regression with 1 variable. This post will show that the same process works for multiple variables, and then explain why we represent the problem with matrices.

The only code in this post is calling the functions introduced in the last one. I also use the same examples, so this post will make a lot more sense if you read that one first.

For reference, here is the linear regression function:

(defn linear-regression [x Y a i]
  (let [m (first (cl/size Y))
        X (add-ones x)]
    (loop [Theta (cl/zeros 1 (second (cl/size X))) i i]
      (if (zero? i)
        Theta
        (let [ans (cl/* X (cl/t Theta))
              diffs (cl/- ans Y)
              dx (cl/* (cl/t diffs) X)
              adjust-x (cl/* dx (/ a m))]
          (recur (cl/- Theta adjust-x)
                 (dec i)))))))

Because the regression function works with matrices, it does not need any changes to run a regression over multiple variables.

Some Examples

In the English Premier League, a team gets 3 points for a win, and 1 point for a draw. Trying to find a relationship between wins and points gets close to the answer.

(->> (get-matrices [:win] :pts)
    reg-epl
    (print-results "wins->points"))

** wins->points **
 A 1x2 matrix
 -------------
 1.24e+01  2.82e+00

When we add a second variable, the number of draws, we get close enough to ascribe the difference to rounding error.

(->> (get-matrices [:win :draw] :pts)
     reg-epl
     (print-results "wins+draws->points"))

** wins+draws->points **
 A 1x3 matrix
 -------------
-2.72e-01  3.01e+00  1.01e+00

In the last post, I asserted that scoring goals was the key to success in soccer.

(->> (get-matrices [:for] :pts)
     reg-epl
     (print-results "for->points"))


** for->points **
 A 1x2 matrix
 -------------
 2.73e+00  9.81e-01

If you saw Costa Rica in the World Cup, you know that defense counts for a lot too. Looking at both goals for and against can give a broader picture.

(->> (get-matrices [:for :against] :pts)
     reg-epl
     (print-results "for-against->pts"))


** for-against->pts **
 A 1x3 matrix
 -------------
 3.83e+01  7.66e-01 -4.97e-01

The league tables contain 20 fields of data, and the code works for any number of variables. Will adding more features (variables) make for a better model?

We can expand the model to include whether the goals were scored at home or away.

(->> (get-matrices [:for-h :for-a :against-h :against-a] :pts)
     reg-epl
     (print-results "forh-fora-againsth-againsta->pts")) 


** forh-fora-againsth-againsta->pts **
 A 1x5 matrix
 -------------
 3.81e+01  7.22e-01  8.26e-01 -5.99e-01 -4.17e-01

The statistical relationship we have found suggests that goals scored on the road are worth .1 points more than those scored at home. The difference in goals allowed is even greater; they cost .6 points at home and only .4 on the road.

Wins and draws are worth the same number of points, no matter where the game takes place, so what is going on?

In many sports there is a “home field advantage”, and this is certainly true in soccer. A team that is strong on the road is probably a really strong team, so the relationship we have found may indeed be accurate.

Adding more features indiscriminately can lead to confusion.

(->> (get-matrices [:for :against :played :gd :for-h :for-a] :pts)
     reg-epl
     (map *)
     (print-results "kitchen sink"))

** kitchen sink **
(0.03515239958218979 0.17500425607459014 -0.22696465757628984 1.3357911841232217 0.4019689136508527 0.014497060396707949 0.1605071956778842)

When I printed out this result the first time, the parameter representing the number of games played displayed as a decimal point with no digit before or after. Multiplying each term by 1 got the numbers to appear. Weird.

The :gd stands for “goal difference”: the difference between the number of goals that a team scores and the number they give up. Because we are also pulling for and against, this is a redundant piece of information. Pulling home and away goals for makes the combined goals-for column redundant as well.

All of the teams in the sample played the same number of games, so that variable should not have influenced the model. Looking at the values, our model says that playing a game is worth 1.3 points, and this is more important than all of the other factors combined. Adding that piece of data removed information.

Let’s look at one more model with redundant data: goals for, goals against and the goal difference, which is just the difference of the two.

(->> (get-matrices [:for :against :gd] :pts)
     reg-epl
     (print-results "for-against-gd->pts"))

** for-against-gd->pts **
 A 1x4 matrix
 -------------
 3.83e+01  3.45e-01 -7.57e-02  4.21e-01

points = 38.3 + 0.345 * goals-for - 0.0757 * goals-against + 0.421 * goal-difference

The first term, Theta[0] is right around 38. If a team neither scores nor allows any goals during a season, they will draw all of their matches, earning 38 points. I didn’t notice that the leading term was 38 in all of the cases that included both goals for and against until I wrote this model without the exponents.

Is this model better or worse than the one that looks at goals for and goals against, without goal difference? I can’t decide.
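One way to compare the two models is to substitute goal-difference = goals-for - goals-against into the equation with :gd. The coefficients then collapse to 0.345 + 0.421 = 0.766 for goals for and -0.0757 - 0.421 = -0.4967 for goals against, which is almost exactly the earlier for-against model. The sketch below checks this for a made-up team with 60 goals for and 40 against. The function names with-gd and without-gd are made up, and the coefficients are the rounded values printed above.

```clojure
;; Model fitted on [:for :against :gd]
(defn with-gd [gf ga]
  (+ 38.3
     (* 0.345 gf)
     (* -0.0757 ga)
     (* 0.421 (- gf ga))))

;; Model fitted on [:for :against]
(defn without-gd [gf ga]
  (+ 38.3
     (* 0.766 gf)
     (* -0.497 ga)))

(with-gd 60 40)    ; roughly 64.39
(without-gd 60 40) ; roughly 64.38
```

If that reading is right, the two models make essentially the same predictions, and the redundant column just splits the same weight across more parameters.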

Why Matrices?

Each of our training examples has a series of X values and one corresponding Y value. Our dataset contains 380 examples (20 teams * 19 seasons).
Our process is to make a guess as to the proper value for each parameter to multiply the X values by and compare the results in each case to the Y value. We use the differences between the product of our guesses, and the real life values to improve our guesses.

This could be done with a loop. With m examples and n features we could do something like

for i = 1 to m
     guess = 0
     for j = 1 to n
          guess = guess + X[i, j] * Theta[j]
     end for j
     difference[i] = guess - Y[i]
end for i

We would need another loop to calculate the new values for Theta.

Matrices have operations defined that replace the above loops. When we multiply the X matrix by the Theta vector, for each row of X, we multiply each element by the corresponding element in Theta, and add the products together to get the corresponding element of the result.

Matrix subtraction requires two matrices that are the same size. The result of subtraction is a new matrix that is the same size, where each element is the difference of the corresponding elements in the original matrices.

Using these two operations, we can replace the loops above with

Guess = X * Theta
Difference = Guess - Y

Clearly the notation is shorter. The other advantage is that there are matrix libraries that are able to do these operations much more efficiently than can be done with loops.
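To make the two operations concrete, here is a small sketch that uses plain Clojure vectors in place of Clatrix matrices. The data values and the helper names dot, guesses and differences are made up for illustration.

```clojure
;; Three training examples; the first column is the X[0] column of ones
(def X [[1 2]
        [1 3]
        [1 5]])
(def Theta [0.5 2])
(def Y [4 7 10])

;; One row of X times Theta: multiply element by element, then add
(defn dot [a b]
  (reduce + (map * a b)))

;; Guess = X * Theta, one dot product per row
(defn guesses [X Theta]
  (mapv #(dot % Theta) X))

;; Difference = Guess - Y, element by element
(defn differences [X Theta Y]
  (mapv - (guesses X Theta) Y))

(guesses X Theta)       ; => [4.5 6.5 10.5]
(differences X Theta Y) ; => [0.5 -0.5 0.5]
```

A matrix library does the same arithmetic, but with optimized native code instead of Clojure sequences.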

There are two more operations that are needed in the linear regression calculations. One is multiplying matrices by a single number, called a scalar. When multiplying a matrix by a number, multiply each element by that number. [1 2 3] * 3 = [3 6 9].

The other operation we perform is called a transpose. Transposing a matrix turns all of its rows into columns, and its columns into rows. In our examples, the size of X is m by n, and the size of Theta is 1 x n. We don’t have any way to multiply an m by n matrix and a 1 by n matrix, but we can multiply a m by n matrix and an n by 1 matrix. The product will be an m by 1 matrix.
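Both operations are easy to sketch with nested Clojure vectors standing in for matrices. The names scale and transpose are made up for this example; Clatrix has its own versions.

```clojure
;; Scalar multiplication: multiply every element by k
(defn scale [k row]
  (mapv #(* k %) row))

;; Transpose: rows become columns and columns become rows
(defn transpose [m]
  (apply mapv vector m))

(scale 3 [1 2 3])               ; => [3 6 9]
(transpose [[1 2 3]])           ; => [[1] [2] [3]]
(transpose [[1 2] [3 4] [5 6]]) ; => [[1 3 5] [2 4 6]]
```

The second call is the Theta case: a 1 by 3 matrix turns into a 3 by 1 matrix.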

In the regression function there are a couple of transposes to make the dimensions line up. That is the meaning of the cl/t expression. cl is an alias for the Clatrix matrix library.

Even though we replaced a couple of calculations that could have been done in loops with matrix calculations, we are still performing these calculations in a series of iterations. There is a technique for calculating linear regression without the iterative process called the Normal Equation.

I am not going to discuss the normal equation for two reasons. First, I don’t understand the mathematics. Second, the process we use, Gradient Descent, can be used with other types of machine learning techniques, and the normal equation cannot.

Linear Regression in Clojure, Part I

2014-07-05T07:18:00.000-07:00

Several months ago I recommended the Machine Learning course from Coursera. At the time, I intended to retake the course and try to implement the solutions to the homework in Clojure. Unfortunately, I got involved in some other things, and wasn’t able to spend time on the class.

Recently, a new book has come out, Clojure for Machine Learning. I am only a couple of chapters in, but it has already been a good help to me. I do agree with this review that the book is neither a good first Clojure book nor a good first machine learning resource, but it does join the two topics well.

Linear Regression
The place to start with machine learning is Linear Regression with one variable. The goal is to come up with an equation in the familiar form of y = mx + b, where x is the value you know and y is the value you are trying to predict.

Linear regression is a supervised learning technique. This means that for each of the examples used to create the model the correct answer is known.

We will use slightly different notation to represent the function we are trying to find. In place of b we will put Theta[0] and in place of m we will put Theta[1]. The reason for this, is that we are going to be using a generalized technique that will work for any number of variables, and the result of our model will be a vector called Theta.

Even though our technique will work for multiple variables, we will focus on predicting based on a single variable. This is conceptually a little simpler, but more importantly it allows us to plot the input data and our results, so we can see what we are doing.

The Question
A number of years ago I read the book Moneyball, which is about the application of statistics to baseball. One of the claims in the book is that the best predictor for the number of games a baseball team wins in a season is the number of runs they score that season. To improve their results, teams should focus on strategies that maximize runs.

The question I want to answer is whether the same is true in soccer: Is the number of points a team earns in a season correlated with the number of goals that they score? For any that don’t know, a soccer team is awarded 3 points for a win and 1 point for a tie.

The importance of goals is a relevant question for a Manchester United fan. At the end of the 2012-13 season, head coach Sir Alex Ferguson retired after winning his 13th Premier League title. He was replaced by David Moyes. Under Moyes the offense which had been so potent the year before looked clumsy. Also, the team seemed unlucky, giving up goals late in games, turning wins into draws and draws into defeats. The team that finished 1st the year before finished 7th in 2013-14. Was the problem a bad strategy, or bad luck?

The Data
I have downloaded the league tables for the last 19 years of the English Premier League from stato.com. There have actually been 22 seasons in the Premier League, but in the first 3 seasons each team played 42 games, vs 38 games for the last 19 seasons, and I opted for consistency over quantity.

I actually want to run 3 regressions, first one on a case where I am sure there is a correlation, then on a case where I am sure there is not, and then finally to determine whether a correlation exists between goals and points.

There should be a high correlation between the number of wins a team has and their number of points. Since every team plays the same number of games, there should be no correlation between the number of games played and a team’s position in the standings.

The Process
We will use a technique called gradient descent to find the equation we want to use for our predictions. We will start with an arbitrary value for Theta[0] and Theta[1], setting both to 0. We will multiply each x value by Theta[1] and add Theta[0], and compare that result to the corresponding value of Y. We will use the differences between Y and the results of Theta * X to calculate new values for Theta, and repeat the process.

One way of measuring the quality of the prediction is with a cost function that measures the mean square error of the predictions.

1/(2m) * sum((h(x[i]) - y[i])^2)

Where m is the number of test cases we are evaluating, and h(x[i]) is the predicted value for a test case i. We will not use the cost function directly, but its derivative is used in improving our predictions of Theta as follows:

Theta[0] = Theta[0] - alpha * 1/m * sum(h(x[i]) - y[i])
Theta[1] = Theta[1] - alpha * 1/m * sum((h(x[i]) - y[i]) * x[i])

We have added one more symbol here. alpha is called the learning rate. The learning rate determines how much we modify Theta each iteration. If alpha is set too high, the process will oscillate between Thetas that are too low and too high, and the process will never converge. When alpha is set lower than necessary, extra iterations are needed to converge.
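Before moving to matrices, the two update rules can be written out directly for the single-variable case. This is a minimal sketch with plain Clojure vectors, not the Clatrix code used in the rest of this post. The function names, the tiny data set (points on the line y = 2x + 1) and the learning rate of 0.1 are all made up for illustration.

```clojure
;; h(x) = Theta[0] + Theta[1] * x
(defn h [[t0 t1] x]
  (+ t0 (* t1 x)))

;; Cost: 1/(2m) * sum((h(x[i]) - y[i])^2)
(defn cost [theta xs ys]
  (/ (reduce + (map (fn [x y]
                      (let [e (- (h theta x) y)]
                        (* e e)))
                    xs ys))
     (* 2.0 (count xs))))

;; One pass of the two update rules, applied simultaneously
(defn step [alpha xs ys theta]
  (let [m (count xs)
        errs (map (fn [x y] (- (h theta x) y)) xs ys)
        [t0 t1] theta]
    [(- t0 (* alpha (/ 1.0 m) (reduce + errs)))
     (- t1 (* alpha (/ 1.0 m) (reduce + (map * errs xs))))]))

;; Start from Theta = [0 0] and apply the update n times
(defn gradient-descent [alpha n xs ys]
  (nth (iterate (partial step alpha xs ys) [0.0 0.0]) n))

;; Points on the line y = 2x + 1
(def xs [0 1 2 3])
(def ys [1 3 5 7])

(gradient-descent 0.1 1000 xs ys) ; Theta close to [1.0 2.0]
```

With this data the cost shrinks toward 0 as the iterations proceed, which is a handy sanity check when choosing alpha.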

I need to mention again that this methodology and these equations come directly from Professor Ng’s machine learning course on Coursera that I linked above. He spends over an hour on linear regression with one variable, and if you want more information that is the place to go.

The Code
The actual calculations we are going to do are operations on matrices. When we multiply the matrix X by the matrix Theta, we obtain a matrix of predictions that can be compared element by element with the matrix Y. The same results could be obtained by looping over each test case, but expressing the computations as matrix operations yields simpler equations, shorter code and better performance.

I used the clatrix matrix library for the calculations.

One other thing to note, in the equations above, Theta[0] is treated differently than Theta[1], it is not multiplied by any x terms, either in the predictions or in the adjustments after the predictions. If we add an additional column to our X matrix, an X[0], and make all of the values in this column 1, we then no longer have to make a distinction between Theta[0] and Theta[1].

(defn add-ones "Add an X[0] column of all 1's to use with Theta[0]"
  [x]
  (let [width (first (cl/size x))
        new-row (vec (repeat width 1))
        new-mat (cl/matrix new-row)]
    (cl/hstack new-mat x)))

(defn linear-regression [x Y a i]
  (let [m (first (cl/size Y))
        X (add-ones x)]
    (loop [Theta (cl/zeros 1 (second (cl/size X))) i i]
      (if (zero? i)
        Theta
        (let [ans (cl/* X (cl/t Theta))
              diffs (cl/- ans Y)
              dx (cl/* (cl/t diffs) X)
              adjust-x (cl/* dx (/ a m))]
          (recur (cl/- Theta adjust-x)
                 (dec i)))))))

The linear-regression function takes as parameters the X and Y values that we use for training, the learning rate and the number of iterations to perform. We add a column of ones to the passed in X values. We initialize the Theta vector, setting all the values to 0.

At this point X is a matrix of 380 rows and 2 columns. Theta is a matrix of 1 row and 2 columns. If we take the transpose of Theta (turn the rows into columns, and columns into rows) we get a new matrix, Theta’, which has 2 rows and 1 column. Multiplying the matrix X with Theta’ yields a 380x1 matrix containing all of the predictions, the same size as Y.

Taking the difference between the calculated answers and our known values yields a 380x1 matrix. We transpose this matrix, making it 1x380, and multiply it by our 380x2 X matrix, yielding a 1x2 matrix. We multiply each element in this matrix by a and divide by m, ending up with a 1x2 matrix which has the amounts we want to subtract from Theta, which is also a 1x2 matrix. All that is left to do is recur with the new values for Theta.
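
The same update can be sketched without clatrix, using plain Clojure vectors. This is only an illustration of the arithmetic described above, not the code that produced the results; add-ones-rows, predict, gradient-step and descend are names made up for this sketch.

```clojure
;; Plain-Clojure sketch of the gradient descent update (no clatrix).
;; xs is a vector of feature rows, ys a vector of known values.

(defn add-ones-rows
  "Prepend the constant 1 that pairs with Theta[0] to every row."
  [xs]
  (mapv #(into [1] %) xs))

(defn predict
  "Dot product of Theta with one row of X."
  [theta row]
  (reduce + (map * theta row)))

(defn gradient-step
  "One update of Theta with learning rate a."
  [X ys theta a]
  (let [m     (count ys)
        diffs (map (fn [row y] (- (predict theta row) y)) X ys)
        ;; dx[j] = sum over rows of diff * row[j], the product diffs' * X
        dx    (apply map + (map (fn [d row] (map #(* d %) row)) diffs X))]
    (mapv (fn [t g] (- t (* g (/ a m)))) theta dx)))

(defn descend
  "Run the update for the given number of iterations, starting from zeros."
  [xs ys a iterations]
  (let [X     (add-ones-rows xs)
        theta (vec (repeat (count (first X)) 0.0))]
    (nth (iterate #(gradient-step X ys % a) theta) iterations)))
```

On a toy data set that lies exactly on y = 1 + 2x, (descend [[0] [1] [2] [3]] [1 3 5 7] 0.1 5000) should return a Theta very close to [1.0 2.0].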

The Results
Since I am going to be performing the same operations on three different data sets, I wrote a couple of helper functions. plot-it uses Incanter to display a scatter plot of the data. reg-epl calls the linear-regression function specifying a learning rate of .0001 and 1000000 iterations. I also have a get-matrices function, which downloads the data and creates the X and Y matrices for the specified fields.

(def wins (get-matrices [:win] :pts))
(plot-it wins)
(def win-theta (reg-epl wins))
(println "Wins-points: " win-theta)

Yields this graph

and these results

Wins-points: A 1x2 matrix
-------------
1.24e+01 2.82e+00

The relationship between wins and points is obvious in the graph. The equation we developed estimates a win as being worth 2.82 points instead of the correct 3. This is because it had no way to account for draws, so it uses a high intercept to get those extra points in there.

A team with 0 wins would be expected to have 12.4 points. A team with 10 wins would have 12.4 + 2.82 * 10 = 40.6 points. A team with 25 wins would have 12.4 + 2.82 * 25 = 82.9 points.
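
As a quick check of that arithmetic, here is a small sketch that plugs the rounded intercept and slope from the printed matrix into the line. The names win-intercept, win-slope and expected-points are only for this illustration.

```clojure
;; Intercept and slope copied (rounded) from the Wins-points output above.
(def win-intercept 12.4)
(def win-slope 2.82)

(defn expected-points
  "Points the fitted line predicts for a given number of wins."
  [wins]
  (+ win-intercept (* win-slope wins)))

;; Roughly 12.4, 40.6 and 82.9 points for 0, 10 and 25 wins.
(map expected-points [0 10 25])
```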

(def played (get-matrices [:played] :rank))
(plot-it played)
(def played-theta (reg-epl played))
(println "played-rank: " played-theta)
(println "expected finish:" (+ (first played-theta)
                               (* 38 (second played-theta))))

Playing 38 games gives you an equal chance of having a finishing position anywhere between 1 and 20. The graph gives a good illustration of what no correlation looks like.

If we use the terms in Theta to find the expected finishing position for a team playing 38 games, we find exactly what we expect, 10.5.

played-rank: A 1x2 matrix
-------------
7.27e-03 2.76e-01

expected finish: 10.499999999999996

Ok, now that we have seen what it looks like when we have a strong correlation, and no correlation, is there a correlation between goals and points?

(def goals (get-matrices [:for] :pts))
(plot-it goals)
(def goal-theta (reg-epl goals))
(def goal-lm (find-lm goals))
(println "goals-points: " goal-theta)
(println "goals-points (incanter): " goal-lm)

Looking at the graph, while not quite as sharp as the wins-points graph, it definitely looks like scoring more goals earns you more points.

To double check my function, I also used Incanter’s linear-model function to generate an intercept and slope. (And yes, I am relieved that they match.)

goals-points: A 1x2 matrix
-------------
2.73e+00 9.81e-01

goals-points (incanter): [2.7320304686089685 0.9806635460888629]

We can superimpose the line from our regression formula on the graph, to see how they fit together.

(def goal-plot (scatter-plot (first goals) (second goals)))
(defn plot-fn [x]
  (+ (* (second goal-theta) x) (first goal-theta)))
(def plot-with-regression (add-function goal-plot plot-fn 0 100))

(view plot-with-regression)

The Answer
We can calculate how many points we would expect the team to earn based on their 86 goals in 2012-13 and 64 goals in 2013-14.

(println "86 goals = " (+ (first goal-theta)
                          (* (second goal-theta) 86)))

(println "64 goals = " (+ (first goal-theta)
                          (* (second goal-theta) 64)))

86 goals = 87.07011197597255
64 goals = 65.49481001604704

In the last year under Sir Alex, Manchester United earned 89 points, 2 more than the formula predicts. In their year under David Moyes, they earned 64 points, 1.5 less than the formula predicts.

Of the 25 point decline in Manchester United’s results, 21.5 points can be attributed to the failure of the offense under Moyes, and 3.5 points can be attributed to bad luck or other factors.
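
The split can be reproduced from the numbers already printed. This is just the subtraction spelled out, with names invented for the sketch; the figures in the paragraph above come from rounding the two gaps to 2 and 1.5 points, so the unrounded values differ slightly.

```clojure
;; Predictions copied from the output above; actual totals from the text.
(def predicted-2013 87.07011197597255) ; formula's estimate for 86 goals
(def predicted-2014 65.49481001604704) ; formula's estimate for 64 goals
(def actual-2013 89)
(def actual-2014 64)

;; Total drop in points between the two seasons: 25
(def total-decline (- actual-2013 actual-2014))

;; Part of the drop the formula attributes to scoring fewer goals: about 21.6
(def goal-decline (- predicted-2013 predicted-2014))

;; Whatever is left over, from luck or other factors: about 3.4
(def other-decline (- total-decline goal-decline))
```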

Manchester United’s attacking style isn’t just fun to watch, it is also the reason they win so much. Hopefully the team’s owners have learned that lesson, and will stick to attack-minded managers in the future.

You can find all of the code for the project on github.

Chess Clocks with ClojureScript

2014-05-22T15:12:00.000-07:00

Recently I watched a talk by David Nolen about Clojure's core.async library. For anyone wanting to learn about core.async, this talk is a great place to start.

In the talk David demonstrated a process function that used one channel as a control, to turn off and on output on another channel. He did not have time to go into detail about how it worked, so I wanted to build something with it to make sure I understood it.

A pair of chess clocks is a system that has two processes that are turned off and on by pushing buttons. I used Om for the rendering. You can see the complete code on github.

The operation of each clock is represented by a function called counter. It takes an om cursor with the time it will be counting and a control channel which turns the counter off and on.

(defn counter [app control]
  (go
   (loop [clock-state (cycle [:off :on])]
     (do
       (let [t (timeout 1000)
             [v c] (alts! [t control])]
         (cond
          (and (= (first clock-state) :on)
               (= c t))
          (do
            (om/transact! app :time minus-sec)
            (recur clock-state))
          (and (= (first clock-state) :off)
               (= c control)
               (= v :start))
          (recur (next clock-state))
          (and (= (first clock-state) :on)
               (= c control)
               (= v :stop))
          (recur (next clock-state))
          (and (= c control)
               (= v :end))
          (.log js/console "game over")
          :else
          (recur clock-state)))))))

Each time through the loop the function listens on the control channel and also sets a timeout for 1 second. If there is a message on the control channel, the clock state is adjusted and the loop repeats. If a timeout occurs and the clock is on, om/transact! is called to subtract a second from the cursor. If the timeout occurs when the clock is not on, then it repeats the loop without subtracting the second. If there is an :end message in the control channel, then the loop exits.
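
The branching in counter can also be read as a small state machine. Here is a minimal sketch in plain Clojure, with no core.async or Om, that models each pass through the loop as an event. step and run-events are names invented for this illustration, and :timeout stands in for the timeout channel winning the alts!.

```clojure
;; Each event is [source msg]: [:timeout] or [:control :start/:stop/:end].
(defn step
  "Apply one event to a clock map of the form {:state s :ticks n}."
  [{:keys [state] :as clock} [source msg]]
  (cond
    ;; after :end the loop has exited, so nothing changes any more
    (= state :done) clock
    ;; a timeout while running costs the player one second
    (and (= state :on) (= source :timeout)) (update clock :ticks inc)
    (and (= state :off) (= source :control) (= msg :start)) (assoc clock :state :on)
    (and (= state :on) (= source :control) (= msg :stop)) (assoc clock :state :off)
    (and (= source :control) (= msg :end)) (assoc clock :state :done)
    ;; anything else (a timeout while stopped, a repeated :start) is ignored
    :else clock))

(defn run-events
  "Fold a sequence of events over a clock that starts off, like counter does."
  [events]
  (reduce step {:state :off :ticks 0} events))
```

For example, (run-events [[:timeout] [:control :start] [:timeout] [:timeout] [:control :stop] [:timeout] [:control :end]]) counts only the two timeouts that arrive while the clock is on.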

The UI for each clock is defined in the clock-view function. As an om component it gets its state from the cursor passed in as the first argument, but it can also have state passed in an option map by its parent function.

(defn clock-view [app owner]
  (reify
    om/IWillMount
    (will-mount [_]
      (let [input (om/get-state owner :input)
            ctrl (counter app input)]))
    om/IRenderState
    (render-state [this {:keys [master]}]
      (let [tag (:tag app)]
        (dom/div #js {:className "clock"}
                 (dom/h3 nil (str "Player " (name tag)))
                 (dom/h3 #js {:className "clockface"} (time->string (:time app)))
                 (dom/button #js {:onClick
                                  (fn [e] (put! master tag))} "Move"))))))

The local state for the clock view contains two channels. The input channel is used by the counter function that controls the clock. In om the appropriate place for controls is the IWillMount protocol. The master channel is needed only at rendering time, where it is used in an event handler for the clock’s Move button.

Messages are put onto the control channel for the clocks by the switch-clock function.

(defn switch-clock [tag wc bc msgchan]
  (go
   (cond
    (= tag :white)
    (do
      (>! wc :stop)
      (>! bc :start)
      (>! msgchan "Black to move"))
    (= tag :black)
    (do
      (>! wc :start)
      (>! bc :stop)
      (>! msgchan "White to move"))
    (= tag :end)
    (do
      (>! wc :end)
      (>! bc :end)
      (>! msgchan "Game over")))))

The possible tags are :white for the button on the white clock being pressed, :black for the button on the black clock and :end. The game can end when either the End Game button is pressed, or when either clock runs out of time.

The parent of the clock views is the board-view function. The board view tells the clock-view function where to draw itself, passes each clock its portion of the cursor, and creates the component local state with the channels that each component needs for communication.

(defn board-view [app owner]
  (reify
    om/IInitState
    (init-state [_]
      {:white-control (chan)
       :black-control (chan)
       :message (chan)})
    om/IWillMount
    (will-mount [_]
      (let [main (om/get-state owner :main-control)
            message (om/get-state owner :message)
            wc (om/get-state owner :white-control)
            bc (om/get-state owner :black-control)]
        (go (loop []
              (let [tag (<! main)]
                (switch-clock tag wc bc message)
                (recur))))
        (go (loop []
              (let [msg (<! message)]
                (om/update! app [:msg] msg)
                (recur))))))
    om/IRenderState
    (render-state [this {:keys [white-control black-control main-control message]}]
      (dom/div nil
               (dom/h2 #js {:className "header"} "Chess Clocks")
               (dom/h3 #js {:className "message"} (:msg app))
               (dom/button #js {:onClick
                                (fn [e] (put! main-control :end))}
                           "End Game")
               (dom/div #js {:className "clocks"}
                        (om/build clock-view (:white-clock app)
                                  {:init-state {:master main-control
                                                :input white-control}})
                        (om/build clock-view (:black-clock app)
                                  {:init-state {:master main-control
                                                :input black-control}}))))))

The control channels for the white clock and black clock are created in the IInitState implementation and will be passed to the individual clocks and to the switch-clock function. switch-clock also needs a channel to write messages that will be displayed to the user.

The IWillMount function uses a fourth channel, :main-control that is passed in from the root function. The main-control channel is where messages are written when the buttons are clicked or when time runs out. The first go loop listens for messages on the main control channel and sends them to the switch-clock function. The second go loop listens for messages to be displayed to the user.

IRenderState gets a reference to each of the channels it needs. Whether they were created in IInitState or passed in from the root function, they are contained in the state map.

The main channel gets written to in the event handler for the End Game button. This channel is also passed to the clock-view function, because each clock view contains a button that writes to the channel.

The main-control channel is created outside of the board-view function, because in addition to the buttons on the UI, there is one other event that needs to write to it. When either clock runs out the game needs to end. This is accomplished using the :tx-listen option in the om/root function.

(om/root
  board-view
  app-state
  (let [main (chan)]
    {:target (. js/document (getElementById "app"))
     :init-state {:main-control main}
     :tx-listen (fn [update new-state]
                  (when (= (:new-value update) {:min 0 :sec 0})
                   (put! main :end)))}))

:tx-listen allows you to specify a function that will be called whenever the root app-state changes. The function takes two arguments. The first contains information about the update, with tags :path, :old-value, :new-value, :old-state and :new-state. The second parameter is the cursor in its new state.

In this case, I check the new value in the update to see if the time has run out, so that I can end the game.
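
To make that check concrete, here is a hypothetical sketch of the two pieces involved. The real minus-sec is in the project on github and is not shown in this post, so the version below is only an assumption based on the {:min 0 :sec 0} shape used in the :tx-listen handler; time-up? is a name made up for the illustration.

```clojure
;; Hypothetical stand-in for minus-sec; the project's own version may differ.
(defn minus-sec
  "Subtract one second from a {:min m :sec s} map, stopping at zero."
  [{m :min s :sec}]
  (cond
    (pos? s) {:min m :sec (dec s)}
    (pos? m) {:min (dec m) :sec 59}
    :else    {:min 0 :sec 0}))

(defn time-up?
  "The same comparison the :tx-listen handler makes on :new-value."
  [t]
  (= t {:min 0 :sec 0}))
```

With these definitions, the update that takes a clock from one second to zero is the one whose new value satisfies time-up?.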

One problem I had that I want to call attention to is the difference between a map and a cursor. Within the render phase of a function, app is effectively a map, and the data elements can be accessed with the appropriate key. Outside of the render phase, the values inside of the cursors are available only after dereferencing the cursor.

Om's introductory tutorial explicitly calls attention to the fact that outside of the render phase you have to deref the cursor, but in an earlier version of my code I had a go loop, inside of a let, inside of a function, that was called by a component; and I had trouble keeping straight whether I was using a cursor or its data on any given line.

I did not play with it enough to give a detailed explanation, but hopefully just being aware of the issue can save you some debugging time.

Asynchronous Naiveté

2014-04-09T20:06:00.001-07:00

This post has been heavily edited. Initially it was describing some behavior I didn't expect using core.async in both ClojureScript and Clojure. It turns out the behavior I observed in ClojureScript was a bug, that has now been fixed. The Clojure and ClojureScript versions can still behave differently than each other, depending on the lifetime of the main thread in Clojure. I have rewritten this to focus on that.

I created a buffered channel in ClojureScript, and tried to put 3 values into it:

(let [c (chan 1)]
  (go
   (.log js/console (<! c)))
  (go
   (>! c 1)
   (.log js/console "put one")
   (>! c 2)
   (.log js/console "put two")
   (>! c 3)
   (.log js/console "Put all my values on the channel")))

The result:

put one
put two
1

Running the equivalent code in a Clojure REPL

  (let [c (chan 1)]
    (go
     (println (<! c)))
    (go
     (>! c 1)
     (println "put one")
     (>! c 2)
     (println "put two")
     (>! c 3)
     (println "put all of my values on the channel")))

Gives the same 3 lines printed out, though which line has the number 1 can change from invocation to invocation.

put one
1
put two

When I put this in a -main method and ran it with lein run, I got no results at all.

Adding Thread/sleep to the -main function causes the expected results to return, though again the position of the 1 in the output can vary.

(defn -main [& args]
  (let [c (chan 1)]
    (go
     (println (<! c)))
    (go
     (>! c 1)
     (println "put one")
     (>! c 2)
     (println "put two")
     (>! c 3)
     (println "put all of my values on the channel"))
    (Thread/sleep 1000)))

put one
put two  
1

Putting the main thread to sleep gives the go blocks time to execute. For confirmation of what is happening, we can check what thread each operation is happening on.

(defn -main [& args]
  (let [c (chan 1)]
    (go
     (println (<! c))
     (println  (str "out " (.getId (Thread/currentThread)))))
    (go
     (>! c 1)
     (println "put one")
     (println (str "in1 " (.getId (Thread/currentThread))))
     (>! c 2)
     (println "put two")
     (println (str "in2 " (.getId (Thread/currentThread))))
     (>! c 3)
     (println "put all of my values on the channel"))
    (Thread/sleep 1000)
    (println (str "main  " (.getId (Thread/currentThread))))))

When I ran it I had the first go block on thread 10, the second on 11 and the main thread of execution on thread 1.

put one
in1 11
put two
in2 11
1
out 10
main  1

If we remove the Thread/sleep expression, but leave the calls to println, our output is:

main  1

This is just demo code. I don't know when in production systems your main thread would be likely to exit before asynchronous tasks are complete.

I do think it is interesting that the blocks of code I have run have executed in the same order every time in ClojureScript, but that the order can vary when running on the JVM. I do not know if that would be true of more complicated examples or on different browsers. I have done all of my tests with Google Chrome.

Simulated stock ticker with core.async

2014-03-28T15:10:00.000-07:00

I created a simulated stock ticker in Clojure using core.async.

(ns ticker.core
  (:require [clojure.core.async
             :refer [chan <! >! timeout go]
             :as async]))

In core.async, functions in your program communicate over channels. There are several ways to put items into a channel and take them off, but this project only uses >! and <!. These functions must be called within a 'go block'.

<! takes a value off of a channel when one is available. If no value is available, execution within the go block is suspended until a value becomes available, at which time execution resumes.

>! does just the opposite. It puts a value onto a channel when a channel can accept a value, and suspends if the channel is not ready to accept a value. An unbuffered channel can only accept a new value when another block is trying to take a value off of the channel. Channels can use buffers to accept values before readers are ready for them, but this project doesn't use them.

The other core.async function that I refer to is timeout. timeout creates a channel that closes after a specified number of milliseconds. Reading from a closed channel immediately returns nil. As the name suggests, this can be used to implement timeouts, but it can also be used to create delays.

I have a couple of utility functions for adding some randomness to the price and timing of transactions. I also have a function that creates a map to represent each transaction.

(defn adjust-price [old-price]
  (let  [numerator (- (rand-int 30) 15)
         adjustment (* numerator 0.01M)]
    (+ old-price adjustment)))

(defn random-time [t]
  (* t (+ 1 (rand-int 5))))

(defn new-transaction [symbol price]
  {:symbol symbol
   :time (java.util.Date.) 
   :price price})
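
To see what ranges those helpers can produce, here is a small sketch that enumerates every possible outcome instead of sampling. possible-adjustments and possible-delays are names invented for this illustration, and 1400 is the AAPL interval from the stocks table further down.

```clojure
;; (rand-int 30) yields 0..29, so the numerator in adjust-price is -15..14.
(def possible-adjustments
  (map #(* (- % 15) 0.01M) (range 30)))

;; Smallest and largest price changes: -0.15 and +0.14
(def smallest-adjustment (apply min possible-adjustments))
(def largest-adjustment (apply max possible-adjustments))

;; (rand-int 5) yields 0..4, so random-time multiplies t by 1..5.
(def possible-delays
  (map #(* 1400 (+ 1 %)) (range 5)))
```

So a price moves by at most 15 cents per transaction, and a ticker created with t = 1400 waits between 1400 and 7000 milliseconds, which is why t acts as the minimum interval.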

The make-ticker function takes a stock symbol, a minimum number of milliseconds between transactions and a starting price. It returns a channel that will have new transactions placed on it after a random interval as long as there is a listener to take the transactions off of the channel.

(defn make-ticker [symbol t start-price]
  (let [c (chan)]
    (go
     (loop [price start-price]
       (let [new-price (adjust-price price)]
         (<! (timeout (random-time t)))
         (>! c (new-transaction symbol new-price))
         (recur new-price))))
    c))

This function creates a channel, and then sets up an infinite loop that puts messages on to that channel. In terms of execution order what really happens is the channel is created and returned, a new price is calculated and the timeout is encountered. After the timeout has expired, a new value is put on the channel and we repeat the loop.

One thing that can get tricky with core.async is the lifetime of the channels. It is important to create the main channel 'c' outside of the loop because it needs to exist for the entire lifetime of the ticker. If it is created inside of the loop, each message will be on a new channel.

The timeout channel must be created inside of the loop. Each timeout is for a single use. If the timeout were created outside of the go loop, we would wait for the timeout channel to close during the first iteration. Later attempts to read the same channel would return immediately.

I created a collection of stock symbols along with arbitrary values to use for the time interval and starting price.

(def stocks [ ;; symbol min-interval starting-price
             ["AAPL" 1400 537 ]
             ["AMZN" 4200 345]
             ["CSCO" 400  22]
             ["EBAY" 1200 55]
             ["GOOG" 8200 1127]
             ["IBM" 2200  192]
             ["MSFT" 500 40]
             ["ORCL" 1000 39]
             ["RHT" 10200  53]
             ["T" 600 35]])

Each stock symbol will have its transactions created on its own channel. For the ticker, we want to create a single channel that combines the outputs of each stock's channel. The merge function does exactly that: it takes several channels as inputs, and combines their outputs into a single channel.

(defn run-sim []
  (let [ticker (async/merge
                (map #(apply make-ticker %) stocks))]
    (go
     (loop [x 0]
       (when (< x 1000)
         (do (println (str x "-" (<! ticker)))
             (recur (inc x))))))))

This function creates channels for each of the stock symbols and combines their outputs into a channel called ticker. It then creates a loop within a go block that will run until 1000 transactions have been printed out.

I chose to call the merge function with async/merge instead of bringing merge in with :refer. The other core.async functions have names that make it clear that they pertain to core.async. Merge is a name that could apply to lots of different things, so I wanted to be explicit.

Ohms Law Calculator in Om

2014-03-14T12:28:00.000-07:00

Reading David Nolen's tweets and blog posts about his new ClojureScript UI library, Om, got me excited to try it out. If you haven't read about it yet, The Future of JavaScript MVC Frameworks is a good place to start.

David has also written a couple of tutorials. I was working through the introductory tutorial. I stopped short of the section on "Higher Order Components" to see if I could apply what I had learned to that point.

I created a very simple Ohm's Law calculator. I borrowed liberally from David's code but there are enough differences that a comparison may be informative. I have put the code out on github.

I hope this is a useful supplement to David's tutorial, but it may make little sense on its own.

With Om you build your user interface in components. Each component is defined with om/root. root takes a component constructor function, an associative data structure that contains application data, and a map of options for the rendering functions, one of which must be the target dom element to be rendered.

(def app-state (atom {:result ""}))

(om/root
  calculator-view
  app-state
  {:target (. js/document (getElementById "app"))})

The function passed to om/root takes as arguments application state and what the documentation calls the "owning pure node" and returns a reified object that must implement either IRender or IRenderState, and may implement other life cycle protocols. You will see that the second parameter is called 'owner', which is created by Om. It is this owner that contains component local state, as you will find in the entry-pane function later on.

Components built with om/root may be composed of smaller components. These subcomponents are created with the om/build function. Like root, build takes a two argument function that returns an object that implements either IRender or IRenderState and optionally other protocols. Build also needs to be passed in the state the component requires, and may be passed a map of options.

In David's contact list example, the display for each contact was rendered by its own subcomponent. The state for each was a map containing details for a single contact. He constructed these components with om/build-all, passing it his constructor function and a vector of contacts. om/build-all maps om/build over the vector of individual contacts.

In my example, I am constructing two subcomponents, neither of which displays a sequence of data, so I call om/build twice.

(defn calculator-view [app owner]
  (reify
    om/IRender
    (render  [this]
      (dom/div nil
               (om/build entry-pane app)
               (om/build result-pane app)))))

(defn result-pane [app owner]
  (reify
    om/IRender
    (render [this]
      (dom/div #js {:className "result-pane"}
               (dom/label #js {:className "result"} (:result app))))))

In David's example, his components both implemented IRenderState. That was because his contact-view created a core.async channel that was placed in each component's state to allow for delete messages to be sent from the child components to the parent. My calculator-view and result-pane share nothing except the app-state which is available in IRender.

My entry-pane function does rely on component state. I copied David's implementation of handling button clicks with a channel and the keyboard events with an anonymous function that calls a handle-change function. The channel and the values of the text box are all state that is needed only within the component.

(defn entry-pane [app owner]
  (reify
    om/IInitState
    (init-state [_]
      {:calculate (chan)
       :ohms ""
       :amps ""
       :volts ""})
    om/IWillMount
    (will-mount [_]
      (let [calculate (om/get-state owner :calculate)]
        (go (loop []
              (let [inputs (<! calculate)]
                (om/transact! app
                              (fn [xs]
                                (assoc xs :result (do-calc inputs))))
                (recur))))))
    om/IRenderState
    (render-state [this state]
      (dom/div #js {:className "entry-pane"}
       (dom/h3 nil "Ohm's Law Calculator")
       (dom/label nil "Resistance")
       (dom/input #js {:type "text" :ref "ohms"
                       :value (:ohms state)
                       :onChange #(handle-change % owner :ohms state)})
       (dom/label nil "Current")
       (dom/input #js {:type "text" :ref "amps"
                       :value (:amps state)
                       :onChange #(handle-change % owner :amps state)})
       (dom/label nil "Voltage")
       (dom/input #js {:type "text" :ref "volts"
                       :value (:volts state)
                       :onChange #(handle-change % owner :volts state)})
       (dom/button #js {:className "button"
                        :onClick
                        (fn [e] (put! (:calculate state) state))}
                   "Calculate")))))

When the user clicks the calculate button the click handler puts the state map onto the calculate channel. The go block in the IWillMount implementation takes the state map from the channel and passes it to the do-calc function, which returns a string. The :result is set in the app state using om/transact!. This mutates the state and triggers a re-render.
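
The do-calc function itself is not shown in this post. As a rough idea of the calculation behind it, here is a hypothetical sketch of Ohm's Law (V = I * R) in plain Clojure. It works on numbers rather than the strings held in component state, and ohms-law is a name invented for this illustration, not the function in the project.

```clojure
;; Hypothetical helper, not the project's do-calc.
(defn ohms-law
  "Given a map where exactly one of :volts, :amps, :ohms is missing,
   return [key value] for the missing quantity, or nil otherwise.
   Dividing by a zero resistance or current is not guarded against here."
  [{:keys [volts amps ohms]}]
  (cond
    (and amps ohms (nil? volts)) [:volts (* amps ohms)]
    (and volts ohms (nil? amps)) [:amps (/ volts ohms)]
    (and volts amps (nil? ohms)) [:ohms (/ volts amps)]
    :else nil))
```

For example, (ohms-law {:amps 2 :ohms 5}) returns [:volts 10]. A real calculator would still have to parse the text box strings and format the result for display.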

The handle-change function calls om/set-state! to update the local state with the data entered by the user. set-state! mutates the data and triggers a re-render just like transact! does. The difference is set-state! operates on state contained within a component.

(defn handle-change [e owner key state]
  (let [value (.. e -target -value)
        text (key state)
        ;; digits only; (str (range 10)) would also admit spaces and parentheses
        allowed (set "0123456789")]
    (if (every? allowed (seq value))
      (om/set-state! owner key value)
      (om/set-state! owner key text))))

The concepts in Om are still new to me. I have explained my rationale for the code I have written. If you are aware of any mistakes, please leave a comment letting me know where I have gone wrong.

ML Class Notes: Lesson 1 - Introduction

2014-03-08T15:06:00.000-08:00

I am taking the Machine Learning class at Coursera. These are my notes on the material presented by Professor Ng.

The first lesson introduces a number of concepts in machine learning. There is no code to show until the first algorithm is introduced in the next lesson.

Machine learning grew out of AI research. It is a field of study that gives computers the ability to learn algorithms and processes that can not be explicitly programmed. Computers could be programmed to do simple things, but doing more complicated things required that the computer learn by itself. A well posed learning program is said to learn some task if its performance improves with experience.

Machine Learning is used for a lot of things including data mining in business, biology and engineering; performing tasks that can't be programmed by hand like piloting helicopters or computer vision; self-customizing programs like product recommendations; and as a model to try to understand human learning.

Two of the more common categories of machine learning algorithms are supervised and unsupervised learning. Other categories include reinforcement learning and recommender systems, but they were not described in this lesson.

Supervised Learning

In supervised learning the computer is taught to make predictions using a set of examples where the historical result is already known. One type of supervised learning task is regression, where the predicted value is in a continuous range (the example given was predicting home prices). Other supervised learning algorithms perform classification, where examples are sorted into two or more buckets (the examples given were of email, which can be spam or not spam; and tumor diagnosis, which could be malignant or benign).

Unsupervised Learning

In unsupervised learning, the computer must teach itself to perform a task because the "correct" answer is not known. A common unsupervised learning task is clustering. Clustering is used to group data points into different categories based on their similarity to each other. Professor Ng gave the example of Google News, which groups related news articles, allowing you to select accounts of the same event from different news sources.

The unsupervised learning discussion ended with a demonstration of an algorithm that had been used to solve the "cocktail party problem", where two people were speaking at the same time in the same room, and were recorded by two microphones in different parts of the room. The clustering algorithm was used to determine which sound signals were from each speaker. In the initial recordings, both speakers could be heard on both microphones. In the sound files produced by the learning algorithm, each output has the sound from one speaker, with the other speaker almost entirely absent.

Take the Machine Learning Class at Coursera

2014-02-23T07:47:00.000-08:00

Coursera is offering its Machine Learning course again, beginning March 8, and I highly recommend it. You already know the obvious, that it is a course on an incredibly timely career skill and it is free, but until you take the course you can't know just how good the course really is.

You will learn how to write algorithms to perform linear regression, logistic regression, neural networks, clustering and dimensionality reduction. Throughout the course Professor Ng explains the techniques that are used to prepare data for analysis, why particular techniques are used, and how to determine which techniques are most useful for a particular problem.

In addition to the explanation of what and why, there is an equal amount of explaining how. The 'how' is math, specifically linear algebra. From the first week to the last, Ng clearly explains the mathematical techniques and equations that apply to each problem, how the equations are represented with linear algebra, and how to implement each calculation in Octave or Matlab.

The course has homework. Each week, there is a zip file that contains a number of incomplete matlab files that provide the structure for the problem to be solved, and you need to implement the techniques from the week's lessons. Each assignment includes a submission script that is run from the command line. You submit your solution, and it either congratulates you for getting the right answer, or informs you if your solution was incorrect.

It is possible to view all of the lectures without signing up for the class. Don't do that. Sign up for the class. Actually signing up for the class gives you a schedule to keep to. It also allows you to get your homework checked. When you watch the lectures, you will think you understand the material; until you have done the homework you really don't. As good as the teaching is, the material is still rigorous enough that it will be hard to complete if you are not trying to keep to a schedule. Also, if you complete the course successfully, you will be able to put it on your resume and LinkedIn profile.

You have the time. When I took the class, there was extra time built in to the schedule to allow people who started the course late to stay on pace. Even if you fall behind, the penalty for late submission is low enough that it is possible to complete every assignment late and still get a passing grade in the course.

I am going to take the course again. I want to review the material. I also want to try to implement the homework solutions in Clojure, in addition to Octave. I will be posting regularly about my progress.

You may also be able to find a study group in your area. I decided to retake the course when I found out that there was going to be a meetup group in my area. Even without a local group, the discussion forums are a great source of help throughout the class. The teaching assistants and your classmates provide a lot of guidance when you need it.

Refactoring Blob Store Access

2014-01-31T09:05:00.001-08:00

My last post was a translation of Microsoft's examples of accessing the Windows Azure blob storage from Java to Clojure. The post consisted of a series of interop calls without any context.

I thought it would be interesting to see how they looked in an application. The project I am building is a file backup program.

In the Microsoft examples, each example began with creating a connection string and a reference to the blob store container. In my translation, I just set up the reference to the container once at the top of the file.

blobstore.clj

I decided to hold the reference to the container in a closure. The container function takes a connection string and container name, uses the Azure SDK classes to build the reference to the container, creates the container in Azure if it doesn't already exist, and returns a map of functions that can be executed on the container.

(defn container [conn-str container-name]
  (let [ctr
        (-> conn-str
            CloudStorageAccount/parse
            .createCloudBlobClient
            (.getContainerReference container-name))]
    (.createIfNotExist ctr)

    {:upload (fn [{:keys [file target]}]
               (let [blob-ref (.getBlockBlobReference ctr target)]
                 (with-open [r (FileInputStream. file)]
                   (.upload blob-ref r (.length file)))))

     :download (fn [{:keys [blob target]}]
                 (.mkdirs (.getParentFile (File. target)))
                 (with-open [w (FileOutputStream. target)]
                   (.download blob w)))

     :find-blob (fn [blobname]
                  (.getBlockBlobReference ctr blobname))

     :delete (fn [blob]
               (.delete blob))

     :remove-container (fn []
                         (.delete ctr))

     :blob-seq (fn []
                 (filter #(instance? CloudBlockBlob %)
                  (tree-seq
                   (fn [f] (not (instance? CloudBlockBlob f)))
                   (fn [f] (.listBlobs f))
                   ctr)))

     :delete-container (fn []
                         (.delete ctr))
     }))
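
The :blob-seq entry leans on tree-seq, which takes a branch predicate, a children function and a root node. As a rough sketch of the same shape without Azure, here is a nested map standing in for the container. The names tree, leaf? and leaf-names are made up for this illustration and are not part of the backup program.

```clojure
;; Hypothetical stand-in for a container: maps with :children play the
;; role of virtual directories, maps without it play the role of blobs.
(def tree
  {:name "root"
   :children [{:name "a.txt"}
              {:name "sub" :children [{:name "b.txt"}]}]})

(defn leaf? [node]
  (not (contains? node :children)))

(defn leaf-names [root]
  (->> (tree-seq (complement leaf?) :children root) ; visit every node, depth first
       (filter leaf?)                               ; keep only the "blobs"
       (map :name)))

(leaf-names tree)
;; => ("a.txt" "b.txt")
```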

There is one other change I want to call attention to. Testing the code in the REPL, I discovered that the FileOutputStream in the download function was keeping a connection to the file on the file system. I assume the FileInputStream in the upload function works the same way. To fix this, I used the with-open macro to clean up the streams when I was done using them.

I created a new code file to hold this function. I wanted my core.clj file to make the decisions about what needed to be done, but to know nothing about how anything would be done.

core.clj

My ns declaration looks like:

(ns filer.core
  (:gen-class)
  (:require [filer.config :as config]
            [filer.blobstore :as store]
            [filer.filestore :as files]))

Jumping to the bottom of the file, the main function causes one of three actions to be taken: the default is for file system folders specified in a configuration file to be backed up to the blob store; a specified blob container can be downloaded to a restore folder specified in the config file; or a blob container can be deleted.

(defn -main [& args]
  (cond
    (= "delete" (first args))
    (delete-blobs (store/container config/conn-str (second args)))

    (= "restore" (first args))
    (restore-folder config/restore-folder
                    (store/container config/conn-str (second args)))

    :default
    (doseq [p config/back-folders]
      (backup-folder p (make-container p)))))

The backup function operates on a collection of folders to back up. Each root folder is stored in a separate container in the blob store. To create a naming system for my backup containers, I added a function to my config.clj file that returns a container name to use based on the file system folder and the date.

The call to subs in the upload-settings function strips off the part of the file name that pertains to the root folder, which is already represented by the name of the container the file is being put into. Looking at it now, this definitely violates my goal of separating what to do from how to do it. I may want to move my whole upload-settings function into the blobstore.clj file, but certainly the details of translating file system names to blob store names do not belong here.

(defn make-container [root-folder]
  (store/container config/conn-str
    (config/container-name root-folder)))

(defn upload-settings [f root-folder]
  {:file f
   :target (subs (.toString f) (inc (count root-folder)))})

(defn upload-file [file container root-folder]
  ((:upload container) (upload-settings file root-folder)))

(defn backup-folder [folder container]
  (doseq [f (files/all-files folder)]
    (upload-file f container folder)))
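
To make the subs call in upload-settings concrete, here is a small sketch of the same arithmetic on plain strings. blob-name is a hypothetical helper, and the paths are invented; it assumes the root folder is given without a trailing separator.

```clojure
;; blob-name is a hypothetical helper. It takes the path as a string so the
;; example does not depend on the platform's File separator.
(defn blob-name [path root-folder]
  ;; skip the root folder plus the one separator character that follows it
  (subs path (inc (count root-folder))))

(blob-name "/home/me/docs/notes/todo.txt" "/home/me/docs")
;; => "notes/todo.txt"
```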

The restore function is similar to the backup function, in that both walk through a sequence of files on a source system, determine their name on the destination system and then call the appropriate function on the container.

The primary difference is that backing up files is done for a collection of root folders, which each get their own container, so I need a function to execute for each folder. The program is set up to only restore a single container specified as a command line argument. The -main function gets the single reference to the container, and passes it to restore-folder.

(defn get-destination [blob folder]
  (files/ms-name
   (str folder "/" (.getName blob))))

(defn download-settings [blob folder]
  {:blob blob
   :target (get-destination blob folder)})

(defn restore-folder [folder container]
  (doseq [f ((:blob-seq container))]
    ((:download container) (download-settings f folder))))

The delete function is the simplest of all. Deleting a container also deletes all of the files in it. The delete container function could be called straight from -main, but for now it is its own function.

(defn delete-blobs [ctr]
  ((:delete-container ctr)))

Thoughts about this design

Creating the container in one place and then returning a map of functions that reference the container works pretty well. The one bit of awkwardness is that it means that all of the functions have to be invoked with double parentheses. The inner set is for the lookup on the map, the outer set invokes the returned function.
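
The pattern is easier to see with the Azure calls taken out. This is a minimal sketch that closes over an atom instead of a container reference; memory-container and its keys are assumptions made up for the example, not the real blobstore.clj.

```clojure
;; Hypothetical in-memory stand-in for the Azure container.
(defn memory-container []
  (let [blobs (atom {})]                       ; the state every function closes over
    {:upload   (fn [{:keys [target content]}]
                 (swap! blobs assoc target content))
     :download (fn [target]
                 (get @blobs target))
     :blob-seq (fn []
                 (sort (keys @blobs)))}))

(def ctr (memory-container))

;; inner parentheses look the function up, outer parentheses call it
((:upload ctr) {:target "a.txt" :content "hello"})
((:download ctr) "a.txt")
;; => "hello"
((:blob-seq ctr))
;; => ("a.txt")
```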

I can make the code look better by binding the function to a symbol in a let, and then using that symbol in the function call. For the restore-folder function it should also help performance some.

(defn restore-folder [folder container]
  (let [downfn (:download container)]
    (doseq [f ((:blob-seq container))]
      (downfn (download-settings f folder)))))

Now I look up the function only once, and then use the same function for every file I download. The cost of looking up a function compared to the cost of downloading a file is minimal, so I will think about it for a while, and keep the version I decide looks better.

Using a .clj file for my configuration file was a pretty obvious choice. Clojure is a superset of edn, so I could probably make use of tagged elements, but I just used def statements. The functions to provide standardized folder names seemed right at home here.
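
For comparison, this is roughly what the edn route might look like. The keys and values are invented for the sketch; only clojure.edn/read-string and the built-in #inst tag are real.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical config contents; the keys are invented for this sketch.
(def config-text
  "{:restore-folder \"/tmp/restore\"
    :back-folders [\"/home/me/docs\" \"/home/me/photos\"]
    :last-run #inst \"2014-01-31T00:00:00Z\"}")

;; edn/read-string only reads data, it never evaluates code
(def config (edn/read-string config-text))

(:back-folders config)
;; => ["/home/me/docs" "/home/me/photos"]
(class (:last-run config))
;; => java.util.Date
```

The trade-off is that an edn file cannot hold functions, so helpers like the container naming function would have to live somewhere else.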

All of the calls in this program are synchronous calls. In many applications it makes sense to make calls out to the file system or the cloud asynchronous. For this application, however, I don't think it would add much. This is a program that is meant to be called from the command line, with no user interface to block. At one point, I did have an asynchronous version of my upload function but I didn't think it added much besides complexity.

Conclusion

Writing this post I found several errors in my code, and a couple of ways that I could have expressed things differently. Adding let bindings for the function lookups seems obvious now, but I hadn't yet thought of it an hour ago.

Thank you to anyone who reads this post. I hope you have gotten some benefit from seeing my thought process. I will be doing more posts like this in the future. And if you don't benefit from these posts, sorry about the noise. :)

Azure Blob Storage from Clojure

2014-01-20T20:23:00.000-08:00

I need to write some files to Windows Azure Blob Storage. Windows Azure provides libraries in several languages and there is also a REST API.

Each version of the library comes with very good documentation. It explains how blob storage works, explains how to create an account and walks you through adding, downloading and deleting the blobs. The Java version is at http://www.windowsazure.com/en-us/documentation/articles/storage-java-how-to-use-blob-storage/

The project I am working on is in Clojure. Translating the examples is pretty straightforward.

The Java libraries are on Maven Central, so I added a dependency to my project.clj.

[com.microsoft.windowsazure/microsoft-windowsazure-api "0.4.6"]

We need to import classes from the windowsazure services.core.storage and services.blob.client packages. The examples also use several classes from java.io. I also created a config.clj file in my src directory to hold my account name and key. So, my core.clj ns macro is:

(ns azure-blob-test.core
  (:require [azure-blob-test.config :as config])
  (:import [com.microsoft.windowsazure.services.core.storage CloudStorageAccount]
           [com.microsoft.windowsazure.services.blob.client CloudBlobClient CloudBlobContainer CloudBlockBlob CloudBlob BlobContainerPermissions BlobContainerPublicAccessType]
           [java.io File FileInputStream FileOutputStream]))

Each of the examples begins with creating a reference to a blob container. I think they do that so that if someone just looks at how to add a blob, they have a complete example. Really, you just create it once:

(def conn-str
  (str "DefaultEndpointsProtocol=http;"
       "AccountName=" config/account-name
       ";AccountKey=" config/account-key ";"))

(def container
  (-> conn-str
      CloudStorageAccount/parse
      .createCloudBlobClient
      (.getContainerReference "mycontainer")))

(.createIfNotExist container)

Setting the container permissions looks clunky to me in both languages. Setting public access means that people do not need an account key to access the folder. Actually, not what I want for my application, but for the sake of completeness:

(let [container-permissions (BlobContainerPermissions.)]
  (.setPublicAccess container-permissions BlobContainerPublicAccessType/CONTAINER)
  (.uploadPermissions container container-permissions))

For testing the upload, I decided to use project.clj, since I know it will be in the root folder when I am running in the REPL.

(let [blob-ref (.getBlockBlobReference container "project.clj")
      source-file (File. "project.clj")]
  (.upload blob-ref (FileInputStream. source-file) (.length source-file)))

Rather than print out a list of blobs in a container, I decided to write a function that returns a sequence of blobs. The filter checks for instances of CloudBlob to exclude virtual directories. And then I use that function to download all of the blobs. I append .bak to the filenames because I don't want to overwrite my project.clj, even if it is the same (superstitious?).

(defn blob-list [container]
  (filter #(instance? CloudBlob %) (.listBlobs container)))

(doseq [blob (blob-list container)]
  (.download blob (FileOutputStream. (str (.getName blob) ".bak"))))

And finally to clean up:

;; Delete the blob
(.delete (.getBlockBlobReference container "project.clj"))

;; Delete container
(.delete container)

Learning functional programming at Coursera

2013-10-18T19:22:00.000-07:00

I am currently taking Martin Odersky's course Functional Programming Principles in Scala on Coursera. This is my first time taking a course from Coursera. At the same time I signed up for this course, I also signed up for a course on Reactive Programming that Odersky will be teaching with Erik Meijer and Roland Kuhn beginning November 4.

There are hundreds of courses available on all sorts of subjects like humanities, science, engineering, and of course computer science, and all are free. In addition to the Scala course, I have started taking a machine learning course. Its format is the same as the Scala course, so I am going to assume the format is standard. (The machine learning course was the class that launched Coursera, which is another reason to think it is the standard.)

Each week new video lectures are posted. Lectures are typically 10 to 15 minutes long, and the total amount of material each week is 1.5 to 2 hours. There has been a programming assignment each of the first 4 weeks. An extra week was provided for the 4th assignment, and after watching the week 5 lectures, it was clear that the assignment covered material from both weeks.

After completing each assignment, it is submitted by using a 'submit' command in Scala's Simple Build Tool. After a couple of minutes, you can go to the assignment page on the course website and see your grade. 80% of each grade comes from passing automated tests, and 20% comes from a style checker, which will take points for using mutable state or nulls. You can submit each assignment up to 5 times with only the highest score being counted. (After that you can continue to submit the assignment, and you will receive the same feedback on your work, but you will not get credit for it.) You need to achieve an average of 70% on the homework to receive a certificate of completion for the course.

I really enjoy the format of the lectures. Some of the time Odersky is talking in front of the camera, but most of the time there are slides up on the screen. He is able to write on the slide. The translucent image of his head as he leans over a slide, or his hand and pen as he writes, is a really minor feature that somehow makes the video more interesting to watch. From time to time, the video is paused while a question appears on the screen. Some questions are multiple choice and you submit an answer before moving on. Others are open ended (how would you write a function that…) and you are left to try it on your own, but there is nothing to submit before you hit continue. Odersky then proceeds to provide a complete explanation of the solution.

The quality of the teaching is excellent. The course builds a foundation by teaching the substitution method of function evaluation (which, if I had learned it before, I have forgotten), then moves on to recursion, higher order functions and currying. Because Scala is a hybrid functional/object oriented language, there has also been a lot of discussion of object hierarchies and Scala's type system. Pattern matching, tuples and lists have also been covered.

I have found all of the assignments to be challenging. The format is great. You download a zip file that contains a skeleton for the solution and a series of test cases. The tests don't cover the whole assignment but they provide a good start, and give guidance on how to write additional tests. The first week I spent a lot of time, because I decided to read Scala for the Impatient until I knew enough syntax to solve the problem. (It would have been faster if lists had been covered before chapter 13.) After that, I would estimate that I have spent 6 or 7 hours per week on the assignments.

I believe that I am learning the material better through the course than I would reading a book. I have a tendency when reading a book to skim parts that don't interest me as much, or that I somehow think aren't relevant to things I am likely to do. Also, the graded homework means that I have to stick to a problem until I get it right, rather than until I think I know what I am doing.

I did have a little apprehension at first because the course assumes that you are going to be working with Eclipse, which I have just never really gotten the feel for. I remembered setting up Scala, SBT and Eclipse to be challenging. The course provided clear written instructions and video instructions for installing all of the necessary tools, with all of the appropriate download links.

The workload is not trivial, but I highly recommend taking classes at Coursera. The teaching is excellent. The variety of courses is amazing. I am very grateful to them for making such wonderful resources available for free.

On Lisp in Clojure chapter 11 (11.3 - 11.6)

2012-06-25T11:01:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

Section 11.3 Conditional Evaluation

Graham makes such a great point in this section that I wanted to start with it: "When faced with a choice between a clean idiom and an efficient one, we go between the horns of the dilemma by transforming the former into the latter."

I decided to write Graham's if3 macro using true, false and nil for 3 value truthiness. Normally in an if statement nil and false are evaluated the same. The nif macro is very similar. In the Clojure form, the test is evaluated in a let binding, so that it only has to be evaluated once.

(defmacro if3 [test t-case nil-case f-case]
  `(let [expr# ~test]
     (cond
      (nil? expr#) ~nil-case
      (false? expr#) ~f-case
      :default ~t-case)))

(defmacro nif [expr pos zero neg]
  `(let [expr# ~expr]
     (cond
       (pos? expr#) ~pos
       (zero? expr#) ~zero
       :default ~neg)))
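
A quick way to check the "evaluated once" claim is to count how often the test form runs. This sketch repeats if3 so it can be pasted into a REPL by itself; calls, counted and results are names invented for the check.

```clojure
;; if3 repeated from above so the snippet runs on its own
(defmacro if3 [test t-case nil-case f-case]
  `(let [expr# ~test]
     (cond
       (nil? expr#) ~nil-case
       (false? expr#) ~f-case
       :default ~t-case)))

(def calls (atom 0))

;; returns its argument and records that it was called
(defn counted [v]
  (swap! calls inc)
  v)

(def results
  [(if3 (counted 42)    :truthy :nil :false)
   (if3 (counted nil)   :truthy :nil :false)
   (if3 (counted false) :truthy :nil :false)])

results
;; => [:truthy :nil :false]
@calls
;; => 3, one evaluation of the test per if3 call
```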

Graham presents us with an `in` macro which tests for membership using a series of tests of equality joined in an or expression.

(defmacro in? [needle & haystack]
  (let [target# needle]
    `(or ~@(map (fn [x#] `(= ~x# ~target#)) haystack))))

(macroexpand-1
 '(in? 1 1 2 3))

;;(clojure.core/or (clojure.core/= 1 1) (clojure.core/= 2 1) (clojure.core/= 3 1))


;; Just to make sure it is working the way we hope
(in? 1 1 (do (println "called") 2) 3)

The Clojure function `some` uses `or` recursively to find the first match.
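
A few REPL-style calls show how some behaves, including the common idiom of using a set as the predicate. The inputs are arbitrary examples.

```clojure
(some #(= 2 %) [1 2 3])
;; => true, the predicate's own return value

(some #{2} [1 2 3])
;; => 2, a set used as a predicate returns the matching element

(some #{9} [1 2 3])
;; => nil, nothing matched

(some #{5} (iterate inc 1))
;; => 5, it stops at the first match even on an infinite sequence
```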

Clojure's lazy sequences provide another way to get the same functionality. The member? function below returns a lazy sequence holding the first match, or an empty sequence if there is none. The argument list is different in this implementation because I wanted the caller to pass a collection, rather than several elements that I wrap into a collection. This allows me to work with an infinite collection, such as all positive integers.

;; lazy function
(defn member? [needle haystack]
  (take 1 (filter (partial = needle) haystack)))

(member? 2 (iterate inc 1) )

in-if is almost the same as in?, except that it allows us to pass the function to use for the comparison.

(defmacro in-if [func needle & haystack]
  (let [target# needle]
    `(or ~@(map (fn [x#] `(~func ~x# ~target#)) haystack))))

Graham creates a >case macro that applies a case statement with expressions as keys instead of constants. Each key will be evaluated until a match is found. Once a match is found, no more keys will be evaluated. Clojure's cond statement already behaves like that.

(cond
 (do (println "First cond") false) (println "one")
 (do (println "Second cond") true) (println "two")
 (do (println "Third cond") true) (println "three")
 :default (println "Nothing matched"))

Section 11.4 iteration

Clojure's partition function makes breaking the source parameter into chunks pretty easy. Of course, while partition makes the macro easier to write, it also makes for a short inline invocation. Here is do-tuples-o, followed by an example call to the macro, and an example of writing the map expression directly.

(defmacro do-tuples-o [parms source & body]
  (let [src# source]
    `(map (fn [~parms] ~@body)
          (partition (count (quote ~parms)) 1 ~src#))))

(do-tuples-o [x y] [1 2 3 4 5] (+ x y))

(map
     (fn [[x y]] (+ x y))
     (partition 2 1 [1 2 3 4 5]))

If we use partition in conjunction with cycle, we can create our parameter list that wraps around. The only change we have to make for do-tuples-c is to change partition to partition-round. I also changed my sample call, to show it can work with a function of any arity.

(defn partition-round [size step source]
  (partition size step
             (take (- (+ size (count source)) step)
                   (cycle source))))

(defmacro do-tuples-c [parms source & body]
  (let [src# source]
    `(map (fn [~parms] ~@body)
          (partition-round (count (quote ~parms)) 1 ~src#))))

(do-tuples-c [x y z] [1 2 3 4 5] (+ x y z))

(map
     (fn [[x y z]] (+ x y z))
     (partition-round 3 1 [1 2 3 4 5]))
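
It helps to look at the intermediate tuples. The sketch below repeats partition-round so it runs by itself and shows what partition and partition-round hand to the mapped function.

```clojure
;; partition-round repeated from above so the snippet runs on its own
(defn partition-round [size step source]
  (partition size step
             (take (- (+ size (count source)) step)
                   (cycle source))))

(partition 2 1 [1 2 3 4 5])
;; => ((1 2) (2 3) (3 4) (4 5))

(partition-round 3 1 [1 2 3 4 5])
;; => ((1 2 3) (2 3 4) (3 4 5) (4 5 1) (5 1 2))

(map (fn [[x y z]] (+ x y z))
     (partition-round 3 1 [1 2 3 4 5]))
;; => (6 9 12 10 8)
```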

Section 11.5 Iteration with Multiple Values

In this section, Graham shows us a macro that executes a do loop that increments several variables in parallel and allows for multiple return values. He then shows us an example of a game that might use this sort of construct to track moving objects.

Multiple return values in a list based language really seems like a non-issue to me.
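
As a small illustration of why it feels like a non-issue: a function can return a vector and the caller can destructure it, and loop/recur can step several values in parallel. min-max and fib-pair are invented examples, not translations of Graham's code.

```clojure
;; "multiple return values" as a vector
(defn min-max [xs]
  [(apply min xs) (apply max xs)])

(let [[lo hi] (min-max [3 1 4 1 5])]
  (- hi lo))
;; => 4

;; several loop variables stepped in parallel, returned together
(defn fib-pair [n]
  (loop [i 0, a 0, b 1]
    (if (= i n)
      [a b]
      (recur (inc i) b (+ a b)))))

(fib-pair 10)
;; => [55 89]
```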

The sample game that Graham describes looks interesting. I hope to do a Clojure implementation of it soon, and then I will have a better context to evaluate the need for a multi-variable do.

Section 11.6 Need for Macros

In earlier sections, Graham described using macros for conditional evaluation and iteration. Here he shows us how some of the same things can be done with functions.

(defn fnif [test then else]
  (if test (then) (else)))

(fnif true
      (fn [] (println "true") (+ 1 2))
      (fn [] (println "false") (- 2 1)))

(defn forever [fn]
  (if true
    (do
      (fn)
      (recur fn))
    'done))

#_(forever (fn [] (println "this is dumb")))

We have to wrap the code we want executed within an anonymous function which we invoke when we want the code evaluated. Graham points out that in these situations, the macro solution is much cleaner, if not strictly necessary. He also says simple binding situations can be handled with map, but that more complicated situations are only possible with macros.

On Lisp in Clojure chapter 11 (section 11.2)

2012-06-05T11:19:00.001-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

I am only covering one section in this post, but this one section includes file I/O, exception handling and locking. This is a post about the examples from On Lisp, and so isn't a tutorial on any of these topics. Hopefully, it provides a gentle introduction to each topic.

Section 11.2 The with- macro

The with-open-file macro Graham describes is just with-open in Clojure. It can be used with any resource that implements a close method.

(with-open [writer (clojure.java.io/writer "output-file" :append true)]
  (.write writer "99"))
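
To see the cleanup happen, here is a sketch with a made-up resource that records when it is closed. It assumes nothing beyond java.io.Closeable; resource and log are names invented for the example.

```clojure
(def log (atom []))

;; hypothetical resource that only records its own close call
(defn resource [label]
  (reify java.io.Closeable
    (close [_] (swap! log conj (str "closed " label)))))

(try
  (with-open [a (resource "a")
              b (resource "b")]
    (throw (Exception. "boom")))
  (catch Exception e
    (swap! log conj (.getMessage e))))

@log
;; => ["closed b" "closed a" "boom"]
```

The resources are closed in reverse order of binding, and they are closed before the exception reaches the outer catch.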

Clojure has a pair of functions for simple file I/O: slurp for reading and spit for writing. Both use with-open to manage their streams.

(spit "output.txt" "test" :append true )
(slurp "output.txt")

Graham's unwind-protect becomes a try-catch-finally block, which works just like you would expect.

(try
  (do (println "What error?")
      (throw (Exception. "This Error."))
      (println "This won't run"))
  (catch Exception e (.getMessage e))
  (finally (println "this runs regardless")))

Graham's with-db example combines mutations, locks and exception handling. In his first example, he rebinds *db* to a new value, locks it, uses the new value, releases the lock and resets the value. In Clojure, you can create dynamic variables, but changes to their values only appear in the current thread. For Clojure datatypes locks are unnecessary.

(def ^:dynamic *db* "some connection")

(binding [*db* "other connection"]
  (println *db*))

Because the value assigned to a var with binding is only visible on the current thread, this will not work with code that you want to execute on a different thread. If we are using mutable Java objects across different threads, locking can come into play. Clojure has a locking macro which accepts the object to lock and the code to be executed.
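
A small experiment makes the thread behavior visible. It starts two plain java.lang.Thread instances inside a binding form: one runs an ordinary fn, the other runs a fn wrapped with bound-fn, which captures the caller's bindings. The promise names are invented for the sketch.

```clojure
(def ^:dynamic *db* "some connection")

(def seen-plain (promise))
(def seen-bound (promise))

(binding [*db* "other connection"]
  ;; a plain thread starts without the caller's bindings
  (doto (Thread. (fn [] (deliver seen-plain *db*)))
    .start
    .join)
  ;; bound-fn captures the current bindings and reinstalls them on the new thread
  (doto (Thread. (bound-fn [] (deliver seen-bound *db*)))
    .start
    .join))

@seen-plain
;; => "some connection"
@seen-bound
;; => "other connection"
```

So the rebinding is not shared automatically, but it can be carried to another thread on purpose.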

Strings in Java are immutable, so I am going to use a StringBuilder. Graham's let form becomes something like this:

(def db2 (StringBuilder. "connection"))

(let [old-val (.toString db2)]
  (.replace db2 0 (.length db2) "new connection")
  (locking db2
    (println (.toString db2)))
  (.replace db2 0 (.length db2) old-val))

Clearly Graham's call to with-db is preferable to writing the let form over and over. And, as he points out, it is easy enough to add a try-finally block to make a safer implementation.

(defmacro with-db [db & body]
  `(let [temp# (.toString db2)]
    (try
      (.replace db2 0 (.length db2) ~db)
      (locking db2
        ~@body)
      (finally
       (.replace db2 0 (.length db2) temp#)))))

(with-db "new connection"
   (println (.toString db2)))

Graham also gives an example that uses both a macro and a function, which has most of the work being done in the function inside of a context created by the macro. I am not going to claim to understand how Common Lisp manages memory, or how that is different from Clojure. Instead, I will simply acknowledge his point, that perhaps a macro can be used to create a context, and the rest of the work can be done in a function.

On Lisp in Clojure chapter 11 (section 11.1)

2012-05-25T19:13:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

Normally I would cover more than just 1 section in a post, but I thought material in this section could stand to be in its own discussion. In addition, the next section has so much in it that it will need its own post.

Section 11.1 Creating Context

The call to let is included for completeness. I thought the definition of our-let was pretty neat. I think it shows pretty clearly what Graham means when he talks about using a macro to create a context.

(let [x 'b] (list x))

(defmacro our-let [binds & body]
  `((fn [~@(take-nth 2 binds)]
       (do ~@body)) ~@(take-nth 2 (rest binds))))

(macroexpand-1 '(our-let [x 1 y 2] (+ x y)))
;; ((clojure.core/fn [x y] (do (+ x y))) 1 2)

(our-let [x 1 y 2] (+ x y))
;; 3

It seems like we get to see the when macro every chapter. It looked to me like when-bind* was dropping all its variables, so clearly I don't understand it well enough to translate it. In Clojure, gensym is done by appending # to a symbol inside a syntax-quoted form in a macro.

(defmacro when-bind [[var expr] & body]
  `(let [~var ~expr]
     (when ~var
       ~@body)))
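
Here is a short usage sketch for when-bind, plus a tiny macro that shows the auto-gensym at work. when-bind is repeated so the snippet runs by itself; twice is a made-up macro for illustration.

```clojure
;; when-bind repeated from above so the snippet runs on its own
(defmacro when-bind [[var expr] & body]
  `(let [~var ~expr]
     (when ~var
       ~@body)))

(when-bind [n (first [3 4 5])] (* n 2))
;; => 6
(when-bind [n (first [])] (* n 2))
;; => nil, the body is skipped because n is nil

;; v# expands to a unique symbol, so the caller's own v is not captured
(defmacro twice [expr]
  `(let [v# ~expr]
     [v# v#]))

(let [v 10]
  (twice (inc v)))
;; => [11 11]
```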

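Graham showed us a condlet macro that accepted a sequence of pairs containing a boolean expression followed by a let binding. In Graham's implementation he had two helper functions and one macro. I wrote it with one helper function that recursively walks through the boolean expressions until it finds one that is true. Note that the helper runs at macroexpansion time, so it sees the test forms as unevaluated data: it works for the literals true, false and nil, but any other test form counts as truthy.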

(defn condlet-bind [bindings]
  (loop [binds bindings]
    (cond (empty? binds) []
          (first binds) (first (rest binds))
          :default (recur (rest (rest binds))))))

(defmacro condlet [clauses & body]
  `(let ~(condlet-bind clauses)
     ~@body))
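
Because condlet-bind runs while the macro is being expanded, it only ever sees the test forms as data. One possible way to evaluate the tests at run time is to expand into a cond with one let per clause. This is a sketch under that assumption, not Graham's design; condlet-rt is an invented name. Like Graham's version, it binds every variable mentioned in any clause, using nil for the ones the chosen clause does not set.

```clojure
(defmacro condlet-rt [clauses & body]
  (let [pairs    (partition 2 clauses)
        ;; every symbol bound by any clause, each defaulted to nil
        defaults (vec (mapcat (fn [sym] [sym nil])
                              (distinct (mapcat #(take-nth 2 (second %)) pairs))))]
    `(cond
       ~@(mapcat (fn [[test-form binds]]
                   ;; the test stays a form, so it is evaluated at run time
                   [test-form `(let ~defaults (let ~binds ~@body))])
                 pairs))))

(condlet-rt [(= 1 2) [x :a y :b]
             (= 1 1) [y :c]]
  [x y])
;; => [nil :c]

(condlet-rt [false [x 1]] x)
;; => nil, no test passed so the body never runs
```

The body is copied into every branch, which is fine for a sketch but would bloat the expansion for large bodies.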

I want to point out the difference between this `condlet` macro and clojure.core's `if-let` macro. condlet takes a list containing booleans and bindings, and will apply the first binding for which the boolean is true. if-let says if an expression is true, bind it to a symbol and execute one branch of code; if it is not true don't bind anything, and take a different branch.

In the following pseudo code, if some-expression returns a value other than false or nil, x gets bound to the result and do-something is called. If some-expression evaluates to false or nil, x does not get bound to anything and do-something-else gets called, just like a normal if.

#_(if-let [x (some-expression)]
  (do-something x)
  (do-something-else))
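
A runnable version of the same shape, with a made-up greeting function standing in for the pseudo code:

```clojure
(defn greeting [user]
  (if-let [n (:name user)]
    (str "Hello, " n)      ; n is bound only in this branch
    "Hello, stranger"))    ; taken when (:name user) is nil or false

(greeting {:name "Ada"})
;; => "Hello, Ada"
(greeting {})
;; => "Hello, stranger"
```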

Functional Tic-Tac-Toe

2012-05-22T08:09:00.000-07:00

Recently in the #Clojure channel on Freenode I heard someone say that you could write a tic-tac-toe game without any mutable variables. I had trouble conceiving how that could be possible. I decided to write one to find out.

The full running code is on github. This post will just look at a few of the functions that show how what seems like such an imperative problem can be solved in a functional way.

When I first created the project, I called it stateless-tic-tac-toe, but this was a misnomer. There is state everywhere. Each of the functions gets passed the state it needs to operate on, and usually the return is an updated version of the state.

Writing individual functions was pretty straightforward. I knew I was going to need a function that took no parameters and returned an empty board. It didn't take me long to realize that I would also need to keep track of whose turn it was.

(defn new-game [] 
  [(into [] (map str (range 1 10))) "X"])

The first big difference between a functional implementation and an imperative one is in the move function. In an imperative version, the board would be stored in a variable or property, and move would be a method that accepted a move as a parameter and modified the board variable.

In the functional version, move will not modify anything. The state of the game is passed in to move, along with the move to be applied. If the move is legal, move returns a new version of the game. If the move is illegal, the calling function gets the unmodified game back.

(defn move [[board player :as game] move]
  (if (legal-input? move)
    (let [board-index (- (Integer/parseInt move) 1)]
      (if (= (nth board board-index) move)
        [(assoc board board-index player)
         (if (= player "X") "O" "X")]
        game))
    game))
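
To watch the values flow, here is a self-contained sketch. new-game and move are repeated from the post; legal-input? is not shown in the post, so the version here is an assumption that accepts a single digit from 1 to 9.

```clojure
(defn new-game []
  [(into [] (map str (range 1 10))) "X"])

;; assumed implementation: the real one lives in the project on github
(defn legal-input? [move]
  (boolean (re-matches #"[1-9]" move)))

(defn move [[board player :as game] move]
  (if (legal-input? move)
    (let [board-index (- (Integer/parseInt move) 1)]
      (if (= (nth board board-index) move)
        [(assoc board board-index player)
         (if (= player "X") "O" "X")]
        game))
    game))

(def g0 (new-game))
(def g1 (move g0 "5"))   ; X takes the center
(def g2 (move g1 "5"))   ; O tries the same square

g1
;; => [["1" "2" "3" "4" "X" "6" "7" "8" "9"] "O"]
(= g1 g2)
;; => true, the illegal move returned the game unchanged
(= g0 (new-game))
;; => true, the original game value was never modified
```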

Functions like check-winner again show the different style of functional versus imperative. check-winner is passed the current game state. It passes the current board configuration to functions that check for horizontal, vertical and diagonal lines. Finally, it returns the winner if there is one, or nil, which is evaluated as false, if there is no winner.

(defn check-winner [[board _]]
  (first (concat (check-horizontal board)
                 (check-vertical board) 
                 (check-diaganal board))))
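
The line-checking helpers are not shown in this post. One possible shape for them, offered purely as a guess at the idea and not as the code on github, is to look up each line's cells and keep the lines whose three cells are equal. All names below are hypothetical.

```clojure
;; Hypothetical helpers: one possible shape, not the author's code.
(def rows  [[0 1 2] [3 4 5] [6 7 8]])
(def cols  [[0 3 6] [1 4 7] [2 5 8]])
(def diags [[0 4 8] [2 4 6]])

(defn line-winners [board lines]
  ;; empty cells hold distinct digits, so a line whose three cells are
  ;; equal must be all "X" or all "O"
  (for [line lines
        :let [cells (map board line)]
        :when (apply = cells)]
    (first cells)))

(defn find-winner [[board _]]
  (first (concat (line-winners board rows)
                 (line-winners board cols)
                 (line-winners board diags))))

(find-winner [["X" "X" "X" "4" "O" "O" "7" "8" "9"] "O"])
;; => "X"
(find-winner [(mapv str (range 1 10)) "X"])
;; => nil
```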

I used a text based UI, which is pretty straightforward. Drawing the board is done with a function that takes in the board vector and returns strings that can be printed out to show the board. Everything was going great until I was ready for the actual event loop.

Using a while loop seemed like such a natural way to structure the game. While the game is not over, redraw the board and ask for a new move. I couldn't find a good way to write that without mutation. Instead of a while loop, I needed to use recursion.

Once I realized that I needed recursion, the function came pretty quickly. I have a function that takes in the game state and draws the board. After that, it checks to see if the game is over. If someone has won the game, or there are no more legal moves, the function prints out a message, and then exits. If the game is not over yet, the play-game function calls the move function and recursively calls itself with the result.

(defn play-game [the-game]
  (loop [game the-game]
    (println (render-board game))
    (if (game-over? game)
      (if-let [winner (check-winner game)]
        (println (str "Congrats to " winner))
        (println "tie game"))
      (recur (move game (get-move))))))

On Lisp in Clojure, chapter 10

2012-05-16T18:25:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

Chapter 10 presented some interesting problems. Some of them were the result of rampant mutability, and I have skipped those, but much of the rest applies to Clojure.

Stuart Halloway also has written a post on this chapter on his blog.

Section 10.1 Number of Evaluations

The difference between the correct version of the for loop and the multiple evaluation version is pretty straightforward. By binding ~stop to gstop#, stop only gets evaluated once.

;; correct version
(defmacro for' [[var start stop] & body]
  `(let [~var (atom ~start) gstop# ~stop]
     (while (< (deref ~var) gstop#)
         ~@body
         (swap! ~var inc))))

;; subject to multiple evaluations
(defmacro for' [[var start stop] & body]
  `(let [~var (atom ~start)]
     (while (< (deref ~var) ~stop)
       ~@body
       (swap! ~var inc))))
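
To make the difference visible, here is a small sketch that counts how often the stop expression runs. The macros are renamed copies of the two versions above (for-once and for-many are names made up for this example), and limit is a hypothetical helper that records each call.

```clojure
;; Renamed copy of the correct version: stop is bound once.
(defmacro for-once [[var start stop] & body]
  `(let [~var (atom ~start) gstop# ~stop]
     (while (< (deref ~var) gstop#)
       ~@body
       (swap! ~var inc))))

;; Renamed copy of the version that re-evaluates stop on every test.
(defmacro for-many [[var start stop] & body]
  `(let [~var (atom ~start)]
     (while (< (deref ~var) ~stop)
       ~@body
       (swap! ~var inc))))

;; Hypothetical helper: counts its own calls and returns 3.
(def calls (atom 0))
(defn limit [] (swap! calls inc) 3)

(for-once [i 0 (limit)] nil)
(def once-count @calls)   ; limit was called 1 time

(reset! calls 0)
(for-many [i 0 (limit)] nil)
(def many-count @calls)   ; limit was called 4 times, once per loop test
```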

The problem with the last version of the for loop really belongs in the next section.

;; incorrect order of evaluation
(defmacro for' [[var start stop] & body]
  `(let [gstop# ~stop ~var (atom ~start)]
     ;; test against gstop# so that stop is still evaluated only once
     (while (< (deref ~var) gstop#)
       (swap! ~var inc)
       ~@body)))

Section 10.2 Incorrect Order of Evaluation

When I read the for macro labeled as having the incorrect order of evaluation, I thought Graham meant that the counter was being incremented at the top of the loop, and so I wrote my loop that way, and thought it was kind of a silly example. Then I tried the call to for in this section. First, Graham shows us an example where the order of evaluation gives an interesting result.

(def x (atom 10))
(+ (reset! x 3) @x)
;; => 6

In the version of the for loop with the incorrect order of evaluation, the stop variable appears first, so that gets evaluated, which sets x to 13. Then, start gets bound to the value of x, which is now 13, so the loop never actually runs. In the correct version of the for loop, in the let expression, the start is bound first, to 1, and then stop is bound to 13. I think Graham is right when he says that a caller has a right to expect start to be evaluated before stop because they appear left to right in the argument list. He is definitely right when he says this is a pathological way to call for'.

(let [x (atom 1)]
  (for' [i @x (reset! x 13)]
        (println @i)))
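
The effect can be checked by counting iterations instead of printing. This is a sketch under a few assumptions: the two macros are renamed copies that differ only in binding order (the increment stays at the bottom in both), and the counter n is added just for the example.

```clojure
;; start is bound before stop, matching the argument order
(defmacro for-in-order [[var start stop] & body]
  `(let [~var (atom ~start) gstop# ~stop]
     (while (< (deref ~var) gstop#)
       ~@body
       (swap! ~var inc))))

;; stop is bound before start
(defmacro for-out-of-order [[var start stop] & body]
  `(let [gstop# ~stop ~var (atom ~start)]
     (while (< (deref ~var) gstop#)
       ~@body
       (swap! ~var inc))))

(def in-order-count
  (let [x (atom 1) n (atom 0)]
    (for-in-order [i @x (reset! x 13)] (swap! n inc))
    @n))   ; start is 1, stop is 13: 12 iterations

(def out-of-order-count
  (let [x (atom 1) n (atom 0)]
    (for-out-of-order [i @x (reset! x 13)] (swap! n inc))
    @n))   ; x is already 13 when start is read: 0 iterations
```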

Section 10.3 Non-functional Expanders

This section shows a lot of awful things that can happen when you mix macros and mutation. But since we are in Clojure, we already avoid mutation when possible. Moving on...

Section 10.4 Recursion

Graham shows us how to write a function with recursion and then shows us how to rewrite the same function in a more imperative manner. He does this because in this section he shows a potential pitfall of recursion in macros, and the imperative loop is an alternative.

The imperative version of our-length is a little extra painful in Clojure. Rather than mutating the list and the counter in a while loop, I am going to stick with the recursive function. We will just be careful when we recurse in macros.


(defn our-length [x]
  (loop [lst x acc 0]
    (if (empty? lst) acc
        (recur (rest lst) (inc acc)))))

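Graham's ntha function works just fine. Rewriting it as the macro nthb causes an infinite loop in the macro expansion: every expansion contains another call to nthb, so the compiler never finishes expanding it. macroexpand-1 performs only a single step, so it is safe to call.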

 
(defn ntha [n lst]
  (if (= n 0)
    (first lst)
    (ntha (- n 1) (rest lst))))

(defmacro nthb [n lst]
  `(if (= ~n 0)
     (first ~lst)
     (nthb (- ~n 1) (rest ~lst))))

(macroexpand-1 '(nthb 2 [1 2 3 4 5]))
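
A small sketch of what that single expansion step looks like. The names one-step and else-branch are made up for the example; the point is only that the else branch of the expanded if is again a call to nthb.

```clojure
(defmacro nthb [n lst]
  `(if (= ~n 0)
     (first ~lst)
     (nthb (- ~n 1) (rest ~lst))))

;; One step of expansion is safe; it does not expand the inner call.
(def one-step (macroexpand-1 '(nthb 2 [1 2 3 4 5])))

;; The expansion is an (if test then else) form, so the else
;; branch sits at index 3, and it starts with nthb again.
(def else-branch (nth one-step 3))
(def inner-call-name (name (first else-branch)))
```

Since (- 2 1) is only a form at expansion time, not the number 1, the test (= n 0) can never stop the expansion.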

Graham shows a couple of ways to rewrite nth as a macro that doesn't lead to an infinite loop. I have just rewritten the version with the recursion in the macro.

(defmacro nthe [n lst]
  `(loop [n# ~n lst# ~lst]
     (if (= n# 0)
       (first lst#)
       (recur (dec n#) (rest lst#)))))
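
A short usage sketch for nthe; the sample vector is the same one used with nthb above.

```clojure
(defmacro nthe [n lst]
  `(loop [n# ~n lst# ~lst]
     (if (= n# 0)
       (first lst#)
       (recur (dec n#) (rest lst#)))))

;; The recursion now happens at run time inside loop/recur,
;; so the expansion itself is finite.
(def third-item (nthe 2 [1 2 3 4 5]))   ; 3
```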

Graham also presents a pair of examples of writing an `or` macro that sidestep the pitfalls of a recursive macro. In the first, the macro calls a recursive function. The second macro does its own recursion. This seems to be less difficult to do safely in Clojure, because recursion is done with `recur` rather than a function calling itself by name.

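;; Both versions build the expansion at macro-expansion time. Each
;; argument is evaluated at run time, at most once, and only until
;; one of them is truthy.
(defn or-expand [args]
  (if (empty? args)
    nil
    `(let [sym# ~(first args)]
       (if sym#
         sym#
         ~(or-expand (rest args))))))

(defmacro ora [& args]
  (or-expand args))

(defmacro orb [& args]
  (loop [lst (reverse args) expansion nil]
    (if (empty? lst)
      expansion
      (recur (rest lst)
             `(let [sym# ~(first lst)]
                (if sym# sym# ~expansion))))))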

On Lisp in Clojure chapter 9

2012-05-13T05:31:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

Chapter 9 is presented for the sake of completeness. Clojure's let bindings solve most of the problems Graham describes in this chapter. You can solve the rest of the problems by putting a # at the end of any variable name you don't want to be subject to capture, e.g. x#.

Stuart Halloway also has written a post on this chapter on his blog. He gives a good description of how these capture issues relate to Clojure.

Section 9.1 Macro Argument Capture

We can write Clojure macros that run into some of the same argument capture problems.

(defmacro for' [[var start stop] & body]
  `(do
     (def ~var (atom ~start))
     (def limit ~stop)
     (while (< (deref ~var) limit)
       ~@body
       (swap! ~var inc))))

;; seems to work
(for' [ x 1 10] (println (str "current value is " @x)))

;; fails with error
(for' [ limit 1 10] (println (str "current value is" @limit)))

The version that is supposed to fail silently actually works just fine.

(let [limit 5]
  (for' [i 1 10]
        (when (> @i limit)
          (println (str @i)))))

Section 9.2 Free Symbol Capture

Translating the examples from this section does not lead to the same errors. The w referred to in simple-ratio is different from the w referred to in gripe. The gripe macro doesn't get fooled by the parameter in simple-ratio because the syntax quote expands w to a namespace-qualified symbol, such as user/w, and a function parameter cannot shadow that.

(def w (atom []))

(defmacro gripe [warning]
  `(do (swap! w #(conj % (list ~warning)))
       nil))

(defn simple-ratio [v w]
  (let [vn (count v) wn (count w)]
    (if (or (< vn 2) (< wn 2))
      (gripe "sample < 2")
      (/ vn wn))))
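
One way to see this is to pull the symbol out of the expansion and look at its namespace. The sketch below repeats the definitions above; w-in-expansion and ratio-result are names added only for the example.

```clojure
(def w (atom []))

(defmacro gripe [warning]
  `(do (swap! w #(conj % (list ~warning)))
       nil))

(defn simple-ratio [v w]
  (let [vn (count v) wn (count w)]
    (if (or (< vn 2) (< wn 2))
      (gripe "sample < 2")
      (/ vn wn))))

;; The expansion has the shape (do (swap! <w> ...) nil);
;; this picks out the symbol standing in for w.
(def w-in-expansion
  (second (second (macroexpand-1 '(gripe "sample < 2")))))

;; The warning lands in the global atom, not in the parameter w.
(def ratio-result (simple-ratio [1] [1 2]))
```

Calling (namespace w-in-expansion) returns the name of the namespace where gripe was defined, which is why the local parameter never gets involved.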

Section 9.3 When Capture Occurs

The first couple of code snippets just show let binding to describe the free variables. The first capture example is almost identical in Clojure.

(defmacro cap1 []
  `(+ x 1))

The next several capture examples don't apply in Clojure. Even if you do have an x defined in your environment, a call to this macro won't compile:

(defmacro cap2 [var]
  `(let [x 2 ~var 3]
     (+ x ~var)))

When the macro expands, x gets expanded to a namespace qualified symbol, such as user/x, but you can't use let to bind a value to a namespace qualified symbol. In the first example from this chapter, the for loop, I used def to bind my variables, because I wanted to go out of my way to get the error.

A version of the cap2 macro that does compile in Clojure looks like this:

(defmacro cap2 [var]
  `(let [x# 2 ~var 3]
     (+ x# ~var)))

The # symbol after the variable name causes the macro expansion to do a gensym, which will give a unique name to x#, which no doubt is how Graham will have us solve the problem, after he finishes warning us about it.
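
A quick sketch of the gensym version in use. The result names are made up for the example; the second call passes x itself as the variable, which is the case that would cause capture without the gensym.

```clojure
(defmacro cap2 [var]
  `(let [x# 2 ~var 3]
     (+ x# ~var)))

(def result-y (cap2 y))   ; 2 + 3 = 5

;; The caller's x does not collide with the generated x#.
(def result-x (cap2 x))   ; still 5
```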

Section 9.4 Avoiding Capture with Better Names

This section doesn't give any examples.

Section 9.5 Avoiding Capture by Prior Evaluation

The failing call to before doesn't fail in Clojure, even with the caller using def from within a do block, because the macro binds the sequence to the generated symbol seq#, which the caller's def of seq2 cannot affect. Oh, and don't call my position function on a value that doesn't exist in the sequence: it returns nil, and the comparison in before will throw.

(defn position [needle haystack]
  (loop [acc 0 lst haystack]
    (cond (empty? lst) nil
          (= needle (first lst)) acc
          :default (recur (inc acc) (rest lst)))))


(defmacro before [x y seq2]
  `(let [seq# ~seq2]
     (< (position ~x seq#)
        (position ~y seq#))))

(before (do (def seq2 '(b a)) 'a) 'b '(a b))

Here is the Clojure version of the new improved for loop. Graham defined the code to be executed as a lambda, and then looped over the invocation of that function. I am just going to bind my end value as limit# and move on.

(defmacro for' [[var start stop] & body]
  `(let [~var (atom ~start) limit# ~stop]
     (while (< (deref ~var) limit#)
       ~@body
       (swap! ~var inc))))

Sections 9.6 and 9.7

We have seen the for loop several times, and we have already mentioned that gensym in Clojure is done by appending a # to the var name. Clojure is broken up into namespaces, not packages. Namespaces make name collisions less common, but not impossible, just as Graham describes for Lisp.

Section 9.8 Capture in Other Name Spaces

I wasn't able to reproduce the error in Clojure. What mac thinks is fun is different from what my let binding does for fun: the syntax quote resolves fun to the namespace-qualified function, so both calls return 11.

(defn fun [x] (+ x 1))

(defmacro mac [x] `(fun ~x))

(mac 10)

(let [fun  (fn [y] (- y 1))]
  (mac 10))

Section 9.9 Why Bother

Graham's answer to the question is a good one. And Rich Hickey's implementation of Lisp solves a lot of the problems for us.

On Lisp in Clojure chapter 8

2012-05-07T06:23:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

The first two sections of chapter 8 contain a lot of discussion and only a couple of small examples. Section 8.3 is a lot more involved, and I think we do get the first suggestion that maybe macros really are as powerful as those who already know Lisp would have us believe.

Section 8.1 When Nothing Else Will Do

Graham describes 7 situations in which macros can do things that functions cannot. The text is very informative. The examples aren't significant, but for the sake of completeness:

(defn addf [x]
  (+ x 1))

(defmacro addm [x]
  `(+ 1 ~x))

(defmacro our-while [test & body]
  `(if ~test
     (do
       (swap! ~test not)
       ~@body)))

(defmacro foo [x]
  `(+ ~x y))

Section 8.2 Macro or Function

Graham has a list of 7 points in this section too. This time it is the pros and cons of using a macro instead of a function in a situation when either will do. Interestingly, there are 3 pros and 4 cons.

(defn avg [& args]
  (/ (apply + args) (count args)))

(defmacro avgm [& args]
  `(/ (+ ~@args) ~(count args)))

((fn [x y] (avgm x y)) 1 3)
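
A small sketch of one of the trade-offs, using the two definitions above. The names from-fn, from-macro and expansion are added for the example. The function can be handed to apply, while the macro does its counting at expansion time and cannot be passed around as a value.

```clojure
(defn avg [& args]
  (/ (apply + args) (count args)))

(defmacro avgm [& args]
  `(/ (+ ~@args) ~(count args)))

;; A function can be applied to a collection built at run time.
(def from-fn (apply avg [1 3]))      ; 2

;; The macro needs its arguments written out in the call.
(def from-macro (avgm 1 3))          ; 2

;; The argument count is already a literal 2 in the expansion.
(def expansion (macroexpand-1 '(avgm 1 3)))
(def count-in-expansion (last expansion))
```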

Section 8.3 Applications for Macros

nil! is different in Clojure, because most values are immutable. Mutable values may be one of several types, each of which has its own semantics. Last chapter, we did one macro to set a ref to nil and another to set an atom to nil. There may indeed be situations where you want to have the same command to change either a ref or an atom.

(defmacro nil! [x]
  `(cond (instance? clojure.lang.Atom ~x ) (reset! ~x nil)
         (instance? clojure.lang.Ref ~x) (dosync (ref-set ~x nil))))
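
A short usage sketch, with an-atom and a-ref as made-up names for the two kinds of mutable reference.

```clojure
(defmacro nil! [x]
  `(cond (instance? clojure.lang.Atom ~x) (reset! ~x nil)
         (instance? clojure.lang.Ref ~x) (dosync (ref-set ~x nil))))

(def an-atom (atom 1))
(def a-ref (ref 2))

;; The same call works on either type.
(nil! an-atom)
(nil! a-ref)
```

After these calls both @an-atom and @a-ref are nil. For any other type the cond has no matching branch, so the macro quietly returns nil without changing anything.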

These two equivalent definitions of foo show how the defn macro works in Clojure.

(defn foo [x] (* x 2))
(def foo (fn [x] (* x 2)))

And of course, we can write a simplistic defn macro.

(defmacro our-defn [name params & body]
  `(def ~name
     (fn ~params  ~@body)))
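
A quick check that the simplified macro behaves like defn for the basic case. double-it is a made-up function name; the real defn also handles docstrings, metadata and multiple arities, which this sketch ignores.

```clojure
(defmacro our-defn [name params & body]
  `(def ~name
     (fn ~params ~@body)))

(our-defn double-it [x] (* x 2))

(def doubled (double-it 4))   ; 8
```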

The last set of examples is much more involved. Graham describes a CAD system and shows how a move function and a scale function might be written.

Graham did not provide an implementation for redraw or bounds, but we need both, if our code is going to compile and run.

(defn redraw [from-x from-y to-x to-y]
  (println (str "Redrawing from: " from-x "," from-y " to "
                to-x "," to-y)))

(defn bounds [objs]
  (list
   (apply min (for [o (:objects objs)]
                (deref (:obj-x o))))
   (apply min (for [o (:objects objs)]
                (deref (:obj-y o))))
   (apply max (for [o (:objects objs)]
                (+ (deref (:obj-x o)) (deref (:obj-dx o)))))
   (apply max (for [o (:objects objs)]
                (+ (deref (:obj-y o)) (deref (:obj-dy o)))))))

The move-objs and scale-objs functions take in a collection of objects that contain their x and y coordinates and their sizes. Each of the objects keeps its properties in a map, because I prefer named parameters to positional ones. Each of the functions walks through the objects and transforms them. Then the redraw function is called, to redraw the affected portion of the screen.

(defn move-objs [objs dx dy]
  (let [[x0 y0 x1 y1] (bounds objs)]
    (doseq [o (:objects objs)]
      (swap! (:obj-x o) + dx)
      (swap! (:obj-y o) + dy))
    (let [[xa ya xb yb] (bounds objs)]
      (redraw (min x0 xa) (min y0 ya)
              (max x1 xb) (max y1 yb)))))

(defn scale-objs [objs factor]
  (let [[x0 y0 x1 y1] (bounds objs)]
    (doseq [o (:objects objs)]
      (swap! (:obj-dx o) * factor)
      (swap! (:obj-dy o) * factor))
    (let [[xa ya xb yb] (bounds objs)]
      (redraw (min x0 xa) (min y0 ya)
              (max x1 xb) (max y1 yb)))))

I wrote a sample collection of objects that could be passed in as objs to either function. The collection is actually a map, with all of the objects mapped to :objects. Originally, I had a keyword :bounds that stored the starting bounds of the objects, but the bounds need to be recalculated after the transformation, so it didn't make sense to store it in the collection. In the real world, the collection may have other properties aside from just the objects it contains, so I decided to leave it as a map.

(def sample-object-collection
  {:objects [{:name "Object 1"
              :obj-x (atom 0) :obj-y (atom 0)
              :obj-dx (atom 5) :obj-dy (atom 5)}
             {:name "Object 2"
              :obj-x (atom 10) :obj-y (atom 20)
              :obj-dx (atom 20) :obj-dy (atom 20)}]})

(move-objs sample-object-collection 5 5)

Both functions apply their transformations and then call the redraw function in the same verbose way. If we added a flip method and a rotate method, again we would have a unique transformation followed by the same call to redraw. To battle this repetition, Graham provides the with-redraw macro.

(defmacro with-redraw [varname objs & body]
  `(let [[x0# y0# x1# y1#] (bounds ~objs)]
     (doseq [~varname (:objects ~objs)] ~@body)
     (let [[xa# ya# xb# yb#] (bounds ~objs)]
       (redraw (min x0# xa#) (min y0# ya#)
               (max x1# xb#) (max y1# yb#)))))

Because of this macro, the new versions of move-objs and scale-objs are much nicer. Each function has gone from 8 lines to 4, and all of the code that was taken out was noisy and distracting. Now it is easy to see how each function performs its transformation.

(defn move-objs [objs dx dy]
  (with-redraw o objs
    (swap! (:obj-x o) + dx)
    (swap! (:obj-y o) + dy)))

(defn scale-objs [objs factor]
  (with-redraw o objs
    (swap! (:obj-dx o) * factor)
    (swap! (:obj-dy o) * factor)))
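
To show how a further transformation would slot in, here is a self-contained sketch. grow-objs and the shapes collection are hypothetical, added only for this example, and bounds is written more compactly than above but is meant to compute the same four values.

```clojure
(defn redraw [from-x from-y to-x to-y]
  (println (str "Redrawing from: " from-x "," from-y " to "
                to-x "," to-y)))

;; Compact variant of bounds: min x, min y, max x+dx, max y+dy.
(defn bounds [objs]
  (let [objects (:objects objs)]
    (list (apply min (map #(deref (:obj-x %)) objects))
          (apply min (map #(deref (:obj-y %)) objects))
          (apply max (map #(+ (deref (:obj-x %)) (deref (:obj-dx %))) objects))
          (apply max (map #(+ (deref (:obj-y %)) (deref (:obj-dy %))) objects)))))

(defmacro with-redraw [varname objs & body]
  `(let [[x0# y0# x1# y1#] (bounds ~objs)]
     (doseq [~varname (:objects ~objs)] ~@body)
     (let [[xa# ya# xb# yb#] (bounds ~objs)]
       (redraw (min x0# xa#) (min y0# ya#)
               (max x1# xb#) (max y1# yb#)))))

;; Hypothetical new transformation: grow every object by a fixed amount.
(defn grow-objs [objs amount]
  (with-redraw o objs
    (swap! (:obj-dx o) + amount)
    (swap! (:obj-dy o) + amount)))

(def shapes
  {:objects [{:obj-x (atom 0) :obj-y (atom 0)
              :obj-dx (atom 5) :obj-dy (atom 5)}
             {:obj-x (atom 10) :obj-y (atom 20)
              :obj-dx (atom 20) :obj-dy (atom 20)}]})

;; Capture what redraw prints so it can be inspected.
(def redraw-message (with-out-str (grow-objs shapes 5)))
```

With these sample values the captured message reports a redraw from 0,0 to 35,45, and the new function only had to state its own transformation.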

On Lisp in Clojure chapter 7 (7.5 - 7.11)

2012-05-03T09:34:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. The examples and links to the rest of the series can be found on github.

This post covers the second half of chapter 7. Stuart Halloway also has a post on this chapter on his blog.

Section 7.5 Destructuring in Parameter Lists

Clojure also has destructuring in the same form as Graham describes in Common Lisp. Clojure also supports destructuring with its map collection type. The book Clojure Programming shows how to combine vector destructuring and map destructuring in a parameter list or let binding. But back to the example...

(let [[x [y] & z] ['a ['b] 'c 'd ]]
  (list x y z))
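
For reference, a sketch of what the form above returns, followed by a combined vector and map destructuring. The :label key and the names positional and combined are made up for the example.

```clojure
(def positional
  (let [[x [y] & z] ['a ['b] 'c 'd]]
    (list x y z)))
;; positional => (a b (c d))

;; The first element is destructured as a map, the rest as a sequence.
(def combined
  (let [[{:keys [label]} & others] [{:label "first"} 2 3]]
    [label others]))
;; combined => ["first" (2 3)]
```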

In the next example, Graham shows a function called dolist which executes a particular function against each member of a list in succession. This may sound like map, but map builds a new list from the return values generated by applying a function to the members of a list. dolist executes a function against each member of a list and disregards the return values. It is used to execute a function for its side effects. Clojure's version is called doseq.

(doseq [x '(1 2 3)]
  (println x))

Graham then shows a way to implement a version of dolist. He builds a macro that takes in a list, a return value and the body of commands to be executed. I like the example, especially because it shows how to incorporate an optional parameter (return) and a variadic parameter (body) in the same parameter list.

The Clojure example doesn't work quite the same though. map in Clojure is lazy; the terms will only be evaluated when they are used. So if you don't pass a return value the map executes, because the REPL wants to print out the return values. If you do pass a return value, that becomes the only value the REPL needs to display, so the mapped function is never executed.

(defmacro our-dolist [[lst & result] & body]
  `(do  (map ~@body ~lst)
        ~@result))

(macroexpand-1 (our-dolist [[1 2 3] ] #(println %)))
(macroexpand-1 (our-dolist [[1 2 3] 4] #(println %)))
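
The laziness can be seen without a REPL by recording side effects in an atom. This is only a sketch: seen and the other names are made up, and dorun is used to force the sequence, which is one of the ways (along with doseq) to run code purely for its side effects.

```clojure
(def seen (atom []))

;; Nothing runs yet: map only describes the work.
(def lazy-result (map #(swap! seen conj %) [1 2 3]))
(def before-realizing @seen)   ; []

;; dorun walks the sequence and discards the values.
(dorun lazy-result)
(def after-realizing @seen)    ; [1 2 3]
```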

Section 7.6 A Model of Macros

Graham's our-defmacro, in addition to writing the desired function, also added a property called 'expander and attached it to the created function. I thought Clojure's metadata could serve the same purpose, but I was not able to make it work. Defmacro seems to work, and macroexpand-1 works the same with it.

(defmacro our-defmacro [name params & body]
  `(defn ~name [~@params]
     (do
       ~@body)))

(macroexpand-1 '(our-defmacro test [x] (println x)(+ x 2)))

Section 7.7 Macros as Programs

In this section, Graham shows how lists can be turned into programs by using macros. The expression we would want to use in Clojure though would have the parameters in a map, instead of a list where position matters.

While the named parameters are nicer than the Common Lisp version, at the same time I did cut a couple of corners. I wrote some of the values so that I didn't have to translate them, such as the let binding, which I wrote as one long vector and stated explicitly that z was nil.

;; our desired call
(our-looper {:initial-vals [w 3 x 1 y 2 z nil]
             :body ((println x) (println y))
             :loop-params [x x y y]
             :recursion-expr ((inc x) (inc y))
             :exit-cond (> x 10)
             :exit-code (println z)
             :return-val y})

;; our desired result
(let [w 3 x 1 y 2 z nil]
  (loop [x x y y]
    (if (> x 10)
      (do (println z) y)
      (do
        (println x)
        (println y)
        (recur (inc x) (inc y))))))

;; the macro
(defmacro our-looper [{:keys [initial-vals
                              body
                              loop-params
                              recursion-expr
                              exit-cond
                              exit-code
                              return-val]}]
  `(let [~@initial-vals]
     (loop [~@loop-params]
       (if ~exit-cond
         (do ~exit-code
             ~return-val)
         (do ~@body
             (recur ~@recursion-expr))))))

Section 7.8 Macro Style

I just translated the first implementation of and; as Graham says, it is the more readable.

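;; the case analysis happens at expansion time; the arguments
;; themselves are only evaluated at run time
(defmacro our-and [& args]
  (cond
   (= (count args) 0) true
   (= (count args) 1) (first args)
   :else `(if ~(first args)
            (our-and ~@(rest args))
            false)))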

Section 7.9 Dependence on Macros

Just as Graham describes for Common Lisp, in Clojure if a function-b depends on function-a, when function-a is updated, function-b will reflect the change. If function-d depends on macro-c, function-d will not be updated when macro-c is updated.

(defn func-a [input]
  (+ input 1))
(defn func-b []
  (func-a 3))
(func-b)
;; 4
(defn func-a [input]
  (+ input 10))
(func-b)
;; 13

(defmacro macro-c [input]
  `(+ ~input 1))
(defn func-d []
  (macro-c 3))
(func-d)
;; 4
(defmacro macro-c [input]
  `(+ ~input 10))
(func-d)
;; 4

Section 7.10 Macros from Functions

The examples from this section are all pretty straightforward.

(defn second-f [x]
  (first (rest x)))

(defmacro second-m [x]
  `(first (rest ~x)))

(defn noisy-second-f [x]
  (println "Someone is taking a cadr")
  (first (rest x)))

(defmacro noisy-second-m [x]
  `(do
     (println "Someone is taking a cadr")
     (first (rest ~x))))

(defn sum-f [& args]
  (apply + args))

(defmacro sum-m [& args]
  `(apply + (list ~@args)))

(defmacro sum2-m [& args]
  `(+ ~@args))

(defn foo [x y z]
  (list x (let [x y]
            (list x z))))

(defmacro foo-m [x y z]
  `(list ~x
         (let [x# ~y]
           (list x# ~z))))

;; quote the form so that macroexpand-1 sees the call, not its result
(macroexpand-1
 '(foo-m 1 2 3))

Section 7.11 Symbol Macros

Symbol macros do not exist in core Clojure. Konrad Hinsen has a library that adds symbol macros and other useful macro functions.

On Lisp in Clojure chapter 7 (7.1 - 7.4)

2012-04-24T16:54:00.000-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure. You can find links to the other posts in this series and the code on the github repository for this project.

Stuart Halloway has also translated this chapter into Clojure on his blog. I definitely recommend reading that.

In Chapter 7 we finally get to macros.

Section 7.1 How Macros Work

The nil! macro Graham defines sets the variable passed in to nil. Values in Clojure are immutable by default, but there are special constructs for doing mutation. One is the atom.

;; set atom to nil
(def x (atom 0))
(reset! x 5)

(defmacro atomnil! [var]
  (list 'reset! var nil))

(atomnil! x)

Refs must be mutated within a transaction, which is done with dosync.

;; set ref to nil
(def y (ref 0))
(dosync
 (ref-set y 5))

(defmacro refnil! [var]
  (list 'dosync (list 'ref-set var nil)))

(refnil! y)

Section 7.2 Backquote

Clojure also uses the backquote, usually referred to as the syntax quote, which can be unquoted. In Clojure, unquoting is done with the tilde ~ instead of the comma used in Common Lisp.

(defmacro atomnil! [var]
  `(reset! ~var nil))

(defmacro refnil! [var]
  `(dosync (ref-set ~var nil)))
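
The two styles do not expand to exactly the same form, which a small sketch can show. atomnil-list and atomnil-quoted are renamed copies of the two atomnil! definitions, so that both can exist side by side.

```clojure
;; Built by hand with list: symbols stay exactly as written.
(defmacro atomnil-list [v]
  (list 'reset! v nil))

;; Built with the syntax quote: reset! is namespace-qualified.
(defmacro atomnil-quoted [v]
  `(reset! ~v nil))

(def list-expansion (macroexpand-1 '(atomnil-list x)))
;; list-expansion => (reset! x nil)

(def quoted-expansion (macroexpand-1 '(atomnil-quoted x)))
;; quoted-expansion => (clojure.core/reset! x nil)
```

The qualified name is what protects the syntax-quoted version if a caller happens to have a local named reset!.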


;; 3 way numerical if
(defmacro nif [expr pos zero neg]
  `(cond
    (< ~expr 0) ~neg
    (> ~expr 0) ~pos
    :else ~zero ))

(nif -1 "pos" "zero" "neg")
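
A sketch of nif across all three cases; signs is a name made up for the example.

```clojure
(defmacro nif [expr pos zero neg]
  `(cond
    (< ~expr 0) ~neg
    (> ~expr 0) ~pos
    :else ~zero))

;; One negative, one zero and one positive input.
(def signs (map #(nif % "pos" "zero" "neg") [-1 0 1]))
;; signs => ("neg" "zero" "pos")
```

Note that this translation places expr in two tests, so an expression with side effects could be evaluated twice.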

Just as the , was replaced by ~, the ,@ becomes ~@.

(def b '(1 2 3))

`(a ~b c)
;; => (OnLisp.ch7/a (1 2 3) OnLisp.ch7/c)

`(a ~@b c)
;; => (OnLisp.ch7/a 1 2 3 OnLisp.ch7/c)

Clojure's do is the equivalent of progn in Common Lisp. It executes a series of statements and returns the value of the last expression.

(defmacro our-when [test & body]
  `(if ~test
     (do ~@body)))

(our-when (< 1 2)
          (println "This is a side effect")
          (println "This is another side effect")
          "this is a value")

Section 7.3 Defining Simple Macros

Clojure does not contain a member function, but we can define one, and also define the memq macro which does the same thing.

(defn member [obj lst]
  (some (partial = obj) lst))

(defmacro memq [obj lst]
  `(some (partial = ~obj) ~lst))
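
A brief usage sketch. Unlike the Common Lisp member, which returns the rest of the list starting at the match, this version built on some returns true or nil.

```clojure
(defn member [obj lst]
  (some (partial = obj) lst))

(defmacro memq [obj lst]
  `(some (partial = ~obj) ~lst))

(def found (member 2 [1 2 3]))          ; true
(def missing (member 5 [1 2 3]))        ; nil
(def found-by-macro (memq 2 [1 2 3]))   ; true
```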

Clojure already has a while loop, which is good, since this implementation isn't very durable.

(defmacro our-while [test & body]
  `(if ~test
    (do
      (swap! ~test not)
      ~@body)))

(our-while (atom true) (println "side effect") "Value")

Section 7.4 Testing Macro Expansion

I like the pretty print macro expansion.

(defmacro mac [expr]
  `(clojure.pprint/pprint (macroexpand-1 '~expr)))

(mac (our-while (atom true) (println "side effect") "Value"))

That seems like a good amount for one post. Will pick up the second half of chapter 7 next time.

Hello World in ClojureScript

2012-04-21T10:18:00.003-07:00

Hello CLJS

Edit October 18, 2013

If you want to get started with ClojureScript I highly recommend that you read Mimmo Cosenza's series of tutorials at https://github.com/magomimmo/modern-cljs . This is the best resource I have found on learning ClojureScript. Please check it out, and don't waste a second reading my old post.

There is probably no reason to keep the original text, but in case there is some value in it I have overlooked, I will leave it unchanged.

I have wanted to learn about ClojureScript, but I haven't known where to start. Like so much in Clojure it turns out to be much simpler than I expected. The instructions for the various projects are each very clear, but I still found it overwhelming at first. Rather than go into detail, I will try to give an overview and link to the related projects.

The appeal of JavaScript, and thus ClojureScript, is that it can be used in so many ways. For example, you can use cljs to do client-side programming in a traditional web application, to do all of the dynamic code in a single page web app, or to do server side scripting in node.js.

Overview

ClojureScript, wherever you use it, is compiled to JavaScript. It is the .js file generated by the compiler that you will refer to in your application.

While it is possible to use the ClojureScript compiler directly, the recommended method is to use lein-cljsbuild. If you are creating a new noir web application that you want to use ClojureScript with you can use the cljs-template which will install and configure lein-cljsbuild for you. For anything but a brand new noir project, you will need to set up lein-cljsbuild yourself.

Again, following the instructions on lein-cljsbuild is all you need to do, but I like to see the big picture before I get to the details.

To install lein-cljsbuild you add a reference to it with a :plugins tag on an existing project that you created with Leiningen. Configure the plugin by adding a :cljsbuild tag, also in the project definition, specifying a path to your ClojureScript source files. If you are writing scripts to use in node.js you will need to specify :target :nodejs in the compiler options. Leiningen will actually install the plugin the first time you try to compile your script files.

Example

I will do a quick walkthrough, to make sure everything is working. I am going to create a very basic script that simply displays an alert when a page is loaded. I will then create a simple html file that will host that script.

My instructions will be very similar to the instructions in the lein-cljsbuild README, except that I structure my source and destination folders a little differently. Assuming you have Leiningen 1.7 or later installed, create a new project at the terminal:

lein new hello-cljs
cd hello-cljs

Edit the project.clj file, adding a :plugins tag and setting the build options:

(defproject hello-cljs "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.3.0"]]
  :plugins [[lein-cljsbuild "0.1.8"]]
  :cljsbuild {
             :builds [{
                       :source-path "src-cljs"
                       :compiler {
                                  :output-to "web/js/main.js"
                                  :optimizations :whitespace
                                  :pretty-print true}}]})

Create a src-cljs directory, and in it create a file called main.cljs. Your main.cljs should contain the following:

(ns hello-cljs.main)

(js/alert "Hello from ClojureScript.")

From your project's root directory, compile the ClojureScript to JavaScript by typing:

lein cljsbuild once

Compiling will create the destination folders, if they do not already exist.

Go to your web directory, and create a file hello.html:

<html>
 <head>
    <title>Hello CLJS</title>
  </head>
  <body>
    You should have seen an alert if everything is working.
    <script type="text/javascript" src="js/main.js"></script>
  </body>
</html>

Not a project that will win any awards, but it is nice to know you are set up correctly.

I wanted to mention one additional thing. In this example we compiled the cljs file to js using the command "lein cljsbuild once". Instead, you can use the command "lein cljsbuild auto" and cljsbuild will monitor your cljs source code directory and recompile any file that changes. Another good option to know is "lein cljsbuild clean" which will delete the compiled js file.

On Lisp in Clojure (chapter 6)

2012-04-16T06:24:00.001-07:00

I am continuing to translate the examples from On Lisp by Paul Graham into Clojure.

I have placed the examples from the first 6 chapters on GitHub. The readme links to all of the posts in this series. (Except this one... a fact without a time is incomplete).

Section 6.1 Networks

I have represented the nodes as a map of maps.

(def nodes
  {:people {:question "Is the person a man?" :yes :male :no :female}
   :male {:question "Is he dead?" :yes :deadman :no :liveman }
   :deadman {:question "Was he American?" :yes :us :no :them}
   :us {:question "Is he on a coin?" :yes :coin :no :cidence}
   :coin {:question "Is the coin a penny?" :yes :penny :no :coins}
   :penny {:answer "Lincoln"}})

(defn ask-question [question]
  true)

Since the network is incomplete, I decided not to implement the IO. Making ask-question always return true did require one change from Graham's example. Instead of asking if the person is living, I ask if he is dead, since I only go down the true line.

(defn run-node [name nodes]
  (let [n (name nodes)]
    (if-let [result (get n :answer)]
      result
      (if (ask-question (:question n))
        (recur (:yes n) nodes)
        (recur (:no n) nodes)))))

(run-node :people nodes)

Of course, we want to be able to add nodes programmatically. Instead of optional parameters, in the Clojure implementation we can define a multiple arity function to add both branches and leaves.

(defn add-node
  ([nodes tag answer]
     (conj nodes {tag {:answer answer}}))
  ([nodes tag question yes no]
     (conj nodes {tag {:question question :yes yes :no no}})))

Because nodes is immutable, the following two calls each return a new map that is the original map, plus their one node.

(add-node nodes :liveman "Is he a former president" :texas :california)
(add-node nodes :texas "George W Bush")

The Clojure threading macro, ->, makes it easy to insert the result of one function as a parameter of a second function. The following block creates a new set of nodes with the :liveman tag and passes this to the function that adds the :texas tag. In the end, we get a new map that has both tags added.

(-> 
    (add-node nodes :liveman "Is he a former president" 
         :texas :california)
    (add-node :texas "George W Bush"))
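
A self-contained sketch of the same threading, using a one-node starting map as a stand-in for the full nodes map so the result is easy to inspect. The name extended is made up for the example.

```clojure
(defn add-node
  ([nodes tag answer]
     (conj nodes {tag {:answer answer}}))
  ([nodes tag question yes no]
     (conj nodes {tag {:question question :yes yes :no no}})))

;; Stand-in for the full network defined earlier.
(def nodes {:penny {:answer "Lincoln"}})

;; Each step receives the map returned by the previous one.
(def extended
  (-> nodes
      (add-node :liveman "Is he a former president" :texas :california)
      (add-node :texas "George W Bush")))
```

The result holds :penny, :liveman and :texas, while nodes itself still contains only :penny.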

Section 6.2 Compiling Networks

In this section, Graham rewrote the network, adding the function calls to the nodes themselves.

The add-node function becomes

(defn add-node2
  ([nodes tag answer]
     (conj nodes {tag answer}))
  ([nodes tag question yes no]
     (conj nodes {tag (if (ask-question question) 
                           (yes nodes) 
                           (no nodes))})))

I added a couple of nodes, and was surprised by the results:

(def node2
  (-> 
      (add-node2 {} :people "Is the person a man?" :male :female)
      (add-node2 :male "Is he dead?" :deadman :liveman)))
node2
;; => {:male nil, :people nil}

I decided to start adding from the bottom up:

(def node2
  (-> (add-node2 {} :penny "Lincoln")
      (add-node2 :coin "is the coin a penny?" :penny :coins)
      (add-node2 :us "Is he on a coin" :coin :cindence)))
node2
;; =>  {:us "Lincoln", :coin "Lincoln", :penny "Lincoln"}

I tried rewriting my add-node2 function.

(defn add-node2
  ([nodes tag answer]
     (conj nodes {tag answer}))
  ([nodes tag question yes no]
     (conj nodes
           {tag
            (if ((fn [x] (ask-question x)) question )
              (yes nodes) (no nodes) )})))

I still got the same results.

I tried declaring, but not defining ask-question. When I called add-node2 I got an error that ask-question had not been defined. I tried referring to node2 from another namespace, and still every node evaluated to "Lincoln".

I rewrote the ask-question function to actually ask a question:

(defn prompt [text]
  (do
    (println text)
    (read-line)))

(defn ask-question [question]
  (prompt question))

Now, I get prompted with the question for each node I add. Again, I tried this from a different namespace, and again I was prompted.
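
One possible reading of these results is that add-node2 runs its if at the moment a node is added, so the question is asked (and the child looked up in a map that may not contain it yet) right away. The sketch below is an assumption about how the question could be delayed instead, by storing a function in each node, which is closer to the closures in Graham's version. add-node3, node3 and the all-nodes parameter are made up for this example.

```clojure
;; Stand-in that always answers yes, as earlier in the post.
(defn ask-question [question]
  true)

(defn add-node3
  ([nodes tag answer]
     ;; a leaf ignores the network and returns its answer
     (conj nodes {tag (fn [_] answer)}))
  ([nodes tag question yes no]
     ;; a branch asks only when called, then looks up the next node
     (conj nodes {tag (fn [all-nodes]
                        (if (ask-question question)
                          ((yes all-nodes) all-nodes)
                          ((no all-nodes) all-nodes)))})))

;; Nodes can now be added top down, because lookups happen later.
(def node3
  (-> {}
      (add-node3 :people "Is the person a man?" :male :female)
      (add-node3 :male "Is he dead?" :deadman :liveman)
      (add-node3 :deadman "Was he American?" :us :them)
      (add-node3 :us "Is he on a coin?" :coin :cidence)
      (add-node3 :coin "Is the coin a penny?" :penny :coins)
      (add-node3 :penny "Lincoln")))

;; Running the network means calling the root node with the whole map.
(def answer ((:people node3) node3))   ; "Lincoln"
```

Passing the finished map to each node at call time is a design choice forced by immutability here; Graham's version can close over a global table that is mutated as nodes are added.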

I wonder if we have reached the limit of functions. Stay tuned, Chapter 7 begins our journey into the world of macros.