To compute f(10^8), the OP says his/her code squares the matrix about 26 times (2^26 is 67 million and change), then finishes the job with a few more multiplications. That's where the speedup comes from. Are you also doing that?
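For concreteness, here is the repeated-squaring idea sketched in Python rather than the OP's Haskell (the names `mat_pow` and `fib` are mine, not from the original code):

```python
def mat_mult(a, b):
    # Product of two 2x2 matrices.
    return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
            [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

def mat_pow(m, n):
    # Exponentiation by squaring: O(log n) matrix multiplications
    # instead of n-1 for the naive power.
    result = [[1, 0], [0, 1]]  # identity
    while n > 0:
        if n & 1:          # this bit of n is set: fold current power in
            result = mat_mult(result, m)
        m = mat_mult(m, m)  # square
        n >>= 1
    return result

def fib(n):
    # [[1,1],[1,0]]^n holds F(n) in the off-diagonal entry.
    return mat_pow([[1, 1], [1, 0]], n)[0][1]
```

For n = 10^8 this performs only a few dozen 2x2 matrix multiplications, though each one involves enormous integers.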
(The mechanism for the speedup is described in passing, in a little note about the Haskell library function he's using to calculate a power of the matrix. For whatever it's worth this trick is excellent, but it dates from before computers were invented.)
Yes, as you can see in the code, the loop executes once for each bit to the right of the leftmost set bit, or equivalently ⌊log₂(n)⌋ times. Naively this suggests a runtime of O(log(n)), but the numbers become huge pretty quickly, and then the arithmetic operations are far from O(1). The actual runtime is probably more like O(n²) or O(n² log(n)) once one accounts for the cost of the big-integer operations.
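A quick sanity check on that iteration count (this snippet is mine, not part of the original code):

```python
n = 10**8
# Number of squarings = index of the leftmost set bit = floor(log2(n)).
squarings = n.bit_length() - 1
print(squarings)  # -> 26, matching "square it about 26 times"
```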