That’s a bit complex

I’ve been doing some interviewing at work, and during these sessions, the notion of the complexity of an algorithm always comes up. It’s always been a bit of an annoyance to me, that when people state things about complexity, it isn’t always clear what that complexity is measuring. For example, sorting is O(n log n) in the number of comparisons, though for some kinds of data it is possible to sort more efficiently by not actually using comparisons at all.

The Using Your Head site had a good series of puzzles based on this theme, with a number of really clever algorithms for doing a sort of n integers in O(n), in a complexity model where the integer values are unbounded and all of the operations such as adding and multiplying are assumed to be O(1) in cost. Here we’ll re-implement one of the given solutions in Clojure.

We’ll assume all of the integers are positive – if they aren’t then we can adjust the array by mapping the smallest value to 0 by adding a constant to all elements. We’ll then represent the array using blocks of N bits for an appropriate N. To allow for overflow, we’ll have a bit between the elements. This will act as a sign bit when we do a subtraction in the ge function which simultaneously compares two arrays element by element by doing a subtraction.

Given an input sequence, we’ll encode it in the representation described above. We’ll return a map containing the encoding together with a lot of additional information we’ll need from time to time.

(defn generate-vectors[input]
  (let [size (count input)
        width (+ (max size (reduce max 0 input)) 1)]
    (let [flagmask (bit-shift-left 1 (- width 1))
          datamask (- flagmask 1)
          items-and-indices (zipmap (iterate inc 0) input)]
      {
       :width width
       :size size
       :flagmask flagmask
       :datamask (- flagmask 1)
       :encoded-input
       (reduce (fn [sofar [index item]]
         (+ sofar (bit-shift-left item (* index width))))
           0
           items-and-indices)
       :ones-vector
       (reduce (fn [sofar [index item]]
         (+ sofar (bit-shift-left 1 (* index width))))
           0
           items-and-indices)
       :flagmask-vector
       (reduce (fn [sofar [index item]]
         (+ sofar (bit-shift-left flagmask (* index width))))
           0
           items-and-indices)
       :datamask-vector
       (reduce (fn [sofar [index item]]
         (+ sofar (bit-shift-left datamask (* index width))))
           0
           items-and-indices)
       })))

For debugging purposes, we’ll define a function that prints the contents of the encoded vector.

(defn extract-index [vector position setup-data]
  (bit-and (bit-shift-right vector (* position (:width setup-data))) (:datamask setup-data)))

(defn print-vector [vector setup-data]
  (dotimes [ i (:size setup-data)]
    (print (extract-index vector i setup-data) " "))
  (println))

We’ll quickly test the code on a small example:

user> (def vector1-setup (generate-vectors [1 3 2]))
#’user/vector1-setup
user> (print-vector (:encoded-input vector1-setup) vector1-setup)
1  3  2 
nil

We can now define the function which compares two vectors. We subtract the two integer representations, and then use the sign bits to see which subtractions gave a negative result.

(defn ge [a b setup-data]
  (bit-shift-right
   (bit-and (+ a (bit-and (bit-not b) (:datamask-vector setup-data)) (:ones-vector setup-data))
        (:flagmask-vector setup-data))
   (- (:width setup-data) 1)))

user> (print-vector (:encoded-input vector2-setup) vector2-setup)
2  2  2 
nil
user> (print-vector (:encoded-input vector3-setup) vector3-setup)
2  1  3 
nil
user> (print-vector (ge (:encoded-input vector2-setup) (:encoded-input vector3-setup) vector2-setup)
          vector2-setup)
1  1  0 
nil

We can now put this into a function which compares a start vector will all cyclic shifts of the same vector, adding the result vectors as we go along.

(defn compare-counts [vector-plus-data]
  (let [width (:width vector-plus-data)
        size (:size vector-plus-data)
        datamask (:datamask vector-plus-data)
        vector (:encoded-input vector-plus-data)]
    (loop [data vector less-than-counts 0 shifts size]
      (if (= shifts 0)
         less-than-counts
       (let [shifted-data (+ (bit-shift-right data width)
                                    (bit-shift-left (bit-and data datamask) (* width (- size 1))))]
           (recur shifted-data
               (+ less-than-counts (ge vector shifted-data vector-plus-data))
               (dec shifts)))))))

We can test this:

user> (def vector4-setup (generate-vectors [17 18 12 11 13 15 16]))
#’user/vector4-setup
user> (print-vector (compare-counts vector4-setup) vector4-setup)
6  7  2  1  3  4  5 
nil

We now have the indices. We then simply need to unpack the array, taking care to handle the case where there are duplicate values.

The is a follow-up puzzle on the mentioned site which shows how various other functions can be implemented efficiently in this complexity model.

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s