Optimize string_repeat.
Christian Tismer pointed out the high cost of the loop overhead and function call overhead for 'c' * n where n is large. Accordingly, the new code only makes lg2(n) loops. Interestingly, 'c' * 1000 * 1000 ran a bit faster with old code. At some point, the loop and function call overhead became cheaper than invalidating the cache with lengthy memcpys. But for more typical sizes of n, the new code runs much faster and for larger values of n it runs only a bit slower.
Showing
Please
register
or
sign in
to comment