sw1nn

Adventures. Stories. Code.

Clojure STM - What? Why? How?

| Comments

Clojure has a variety of features that help in the creation of programs that deal with concurrency. This post is going to focus on Software Transactional Memory but the nature of these problems is that many parts act together to provide the solution, so expect some of the other concepts to leak in…

What is STM?

Software Transactional Memory (STM) is a concurrency control technique
analogous to database transactions for controlling access to shared
memory in concurrent computing. It is an alternative to lock based synchronization.

STM allows updates to multiple values in memory and for those changes to become visible atomically to other threads at the same logical moment. Similar to database transactions, if for some reason all the updates can’t be made then none of the updates are made.

The canonical example of a transaction is in banking - where you’d like to move amount X from your account to my account.. We’ll use that as our example, because conceptually everyone is familiar with it 1.

Implementation of STM is not a solved problem in computer science at present. There is active research on how best to achieve it. Clojure opts for an approach based on Multiversion concurrency control (MVCC). MVCC maintains multiple (logical) versions of data referenced in a transaction. For the duration of your transaction you see a snapshot of what the data looked like at the start of the transaction. MVCC is an obvious choice here when you consider that the pervasive use of persistent data structures in clojure get you most of the way to this without much further work.

Why do we need STM?

The main constructs for STM are ref and dosync. Lets have a look at an example…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
(def account1 (ref 100))
(def account2 (ref 0))

; to read the current value of a ref, use (deref refname):
;=> (deref account1)
100
;=> @account1 ; @refname is equivalent to (deref refname)
100

(defn transfer [amount from to]
    (dosync
       (alter from - amount)   ; alter from => (- @from amount)
       (alter to   + amount))) ; alter to   => (+ @to amount)

;=> @account1 
100
;=> @account2
0
;=> (transfer 100 account1 account2)
100
;=> @account1
0
;=> @account2
100

You can see here that we’ve declared two accounts and seeded account1 with 100. The (transfer) function takes an amount, a from account ref and a to account ref and then uses alter on both refs to deduct amount from the from account and adds the same amount to the to account. This code runs in a single thread, but consider what could happen if another thread ran between the two alter statements and made changes to the values stored in the two acccounts. Without the protection of STM, it would be easy to lose an update to either or both accounts.

Another example; Suppose that at the end of day, the bank wanted to run a report on the balances in all accounts. Potentially such a report could take a long time to run, but during the time the report runs, transactions should still be possible on the accounts and for the purposes of our report we should see a consistent view of the data.

1
2
3
4
5
6
7
8
9
10
11
12
13
; 50k entries, each a ref to a value 0-100
(def *all-accounts* (take 50000 (repeatedly #(ref (rand-int 100)))))

(defn total
   "Sum all accounts"
   [accounts]
    (dosync 
        (reduce + (map deref accounts))))

;=> (total *all-accounts*) ; confident that this had a 
                           ; consistent view  over 
                           ; *all-accounts*
2478427

How does STM in clojure work

Consider the diagram below. You’ll see three transactions shown in pink boxes. You’ll also see the various versions of the ref in the left hand column. When a transaction is started via (dosync) the versions of the refs are captured. These captured values are the value that (deref) will return for the duration of the transaction. (i.e. reads of refs are consistent within a transaction).

STM Transactions

Lets look a the two leftmost transactions. The transactions start at the same time, both access the same ref, so both get a copy of version 0 of that ref. Within the transaction some processing is performed that results in the value of the ref being updated. The first (leftmost) transaction finishes first, so wins the race to update the ref with it’s new value. When the second transaction finishes it tries to write it’s value, but the write fails (red arrow in the diagram) because the version of the ref is not what was expected. In this case the transaction is re-tried. Note that when the transaction re-tries, it first gets a copy of the (new) version of the ref. ie. it sees the changes made by transaction 1. Since no other transaction attempts to update the ref, this time transaction 2 completes.

You’ll also see that while all that was going on a third transaction was running, but the processing in that transaction didn’t update the ref, so no retry was required and the transaction ran to completion.

If the values stored in the refs are persistent data structures holding multiple logical versions of those structures in memory can be efficient, because of the inherent structural sharing in those structures. However, extra resources are being used and in some scenarios that may cause problems. When declaring refs you have the option to pass a couple of options to tune how the runtime will manage the history (i.e. the number of versions as discussed above) associated with the refs.

1
2
3
4
5
; pre-allocate history and limit max-history
(def myref (ref 1 :min-history 5 :max-history: 10))

; supply a function to validate the ref before commit.
(def myvalidref (ref 1 :validator pos?))

This allows you to improve marginal performance in the presence of read-faults by taking the pre-allocation hit up front and limit how resources are allocated to history

Relaxing consistency and side-effects

There are some concurrency use cases where you can afford to be a bit more relaxed to gain some performance. For example, imagine you were keeping a transaction log through the day. You might be relaxed about the order of transactions in the log if you knew that the final balance was always correct. Basically if you receive two deposits of $100 and $50 you won’t mind too much if they are logged as $100 then $50 or $50 then $100. The deposit of the two transaction is commutative and clojure provides a concurrency operation (commute) for just such cases….

1
2
3
4
5
6
7
8
9
10
11
12
13
14
(defn log-deposit [account amount]
     (dosync
        (println "Depositing $" amount " into account, balance now: "
            (commute account + amount))))

(def myaccount (ref 0))

(log-deposit myaccount 100)
(log-deposit myaccount 50)

; (as good as) equivalent to 

(log-deposit myaccount 50)
(log-deposit myaccount 100)

The thing to note about (commute) is that when the function is called it sets the in-transaction value of the ref, but the actual change is applied at commit time, by running the function passed to commute again with the latest ref value. This means that the value you calculate in your transaction might not be the value that is finally committed to the ref. This needs careful thought. You need to be sure that your processing doesn’t rely on seeing the latest value of the ref.

Finally, you’ll be wondering about that (println) in the example above. What happens to side-effects like that in the event of a transaction re-try? Very simply the side-effects will happen again. In the above case this probably doesn’t matter, the log will be inconsistent but the real source of data (the ref) will be correct.

Clojure provides a macro to help here, namelyio!. This allows you to flag code that musn’t run inside a transaction. You can use this to protect yourself when you inadvertently call side-effecting code in a transaction.

For example:

1
2
3
4
5
6
7
(defn log [s]
   (io!
      (println s)))

(log "Hello World") ; succeeds

(dosync (log "Hello World!")) ; throws IllegalStateException

To properly acheive IO from within a transaction you’d fire messages to agents from within the transaction. Agents within clojure are integrated with STM such that messages are sent if transactions succeed and discarded on transaction failure. This will be the subject of a future blog post

UPDATE: Check out the follow on post here to discuss possible issues with ref consistency in clojure

1 Interesting sidenote here: This canonical example is not how financial services firms actually work, for small amounts they wave the amounts through and sort out discrepancies in reconcilliation at the end of day. The cost of imposing strict transactions outweighs the losses through errors/fraud. But fixing financial services is beyond the scope of this article…

Comments