2023-09-03 18:15 | IT / Coding / How-To | rust, benchmark, time-stamp-counter, counter-timer-register, counter, clock, timer, rdtsc, rdtscp, cntvct_el0, linux, apple, intel

Hardware-based tick counters for high-precision benchmarks in Rust

One of the aspects I love about Rust is its ability to inline low-level assembly code snippets, giving direct access to the capabilities of the hardware platform. These snippets can be conditionally compiled for the underlying platform and wrapped in high-level methods for easy use. It's great to be close to the metal and get the most out of the hardware.

Rust is fast, but code is only performant when it is written the right way. Benchmarking plays a crucial role in measuring and optimizing code performance.

While the Rust ecosystem offers excellent libraries for sophisticated benchmarks, there are situations during development and testing when I need a more basic tool for benchmarking specific lines or blocks of code. Rust's standard library provides a user-friendly option with std::time::Instant, which is well-suited for this purpose.
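A minimal sketch of that kind of ad-hoc measurement, using only the standard library. The helper name `time_it` is hypothetical, introduced here for illustration; it is not part of std or of tick_counter.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` once and returns its result with the elapsed time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

fn main() {
    // black_box hides the upper bound from the optimizer so the loop is not folded away.
    let (sum, elapsed) = time_it(|| (0..black_box(1_000_000u64)).sum::<u64>());
    println!("sum = {sum}, elapsed = {} ns", elapsed.as_nanos());
}
```

Returning the closure's result alongside the duration keeps the measured value in use, which makes it harder for the compiler to discard the work being timed.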

I experimented with std::time::Instant and observed that its precision is approximately 40 nanoseconds on both x86_64 (Intel® Core™ i7 running Linux) and AArch64 (Apple M1 Pro) platforms.
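One way to probe this is to take many back-to-back readings and look at the smallest delta. The sketch below assumes that approach; the function name `instant_deltas` is hypothetical, and the exact procedure behind the 40-nanosecond figure may differ.

```rust
use std::time::Instant;

/// Collects `n` back-to-back `Instant` deltas in nanoseconds.
fn instant_deltas(n: usize) -> Vec<u128> {
    (0..n)
        .map(|_| {
            let start = Instant::now();
            // Nothing is measured here: the delta is the cost and
            // granularity of the clock itself.
            start.elapsed().as_nanos()
        })
        .collect()
}

fn main() {
    let samples = instant_deltas(100);
    let min = samples.iter().min().unwrap();
    let max = samples.iter().max().unwrap();
    println!("min = {min} ns, max = {max} ns over {} samples", samples.len());
}
```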

However, Intel CPUs (since Pentium) provide access to the hardware time stamp counter through the RDTSC instruction, and using it could, in theory, offer better precision. I was also curious to explore what the Apple platform has to offer for this purpose.

I enjoy delving into low-level details and experimenting with this kind of thing.

On the Apple platform, reading the CNTVCT_EL0 counter-timer register gave results consistent with std::time::Instant, at around 40 nanoseconds.
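A minimal sketch of reading that register with inline assembly, conditionally compiled for AArch64. The function names are hypothetical, and taking the frequency from the CNTFRQ_EL0 register is my assumption about where a "hardware provided" value would come from. User-space access to both registers is typically permitted on macOS and Linux.

```rust
/// Reads the virtual counter register CNTVCT_EL0 (AArch64 only).
#[cfg(target_arch = "aarch64")]
fn read_cntvct() -> u64 {
    let ticks: u64;
    // SAFETY: MRS from CNTVCT_EL0 only reads a counter into a register.
    unsafe { std::arch::asm!("mrs {}, cntvct_el0", out(reg) ticks, options(nomem, nostack)) };
    ticks
}

/// Reads the counter frequency in Hz from CNTFRQ_EL0 (AArch64 only).
#[cfg(target_arch = "aarch64")]
fn read_cntfrq() -> u64 {
    let freq: u64;
    // SAFETY: MRS from CNTFRQ_EL0 only reads a system register.
    unsafe { std::arch::asm!("mrs {}, cntfrq_el0", out(reg) freq, options(nomem, nostack)) };
    freq
}

/// Duration of one tick in nanoseconds for a counter running at `freq_hz`.
fn tick_period_ns(freq_hz: u64) -> f64 {
    1e9 / freq_hz as f64
}

fn main() {
    // 24 MHz is the tick frequency reported for the Apple M1 Pro in this post.
    println!("{:.2} ns per tick at 24 MHz", tick_period_ns(24_000_000)); // 41.67

    #[cfg(target_arch = "aarch64")]
    {
        let freq = read_cntfrq();
        let start = read_cntvct();
        let stop = read_cntvct();
        let ticks = stop.wrapping_sub(start);
        println!("frequency = {freq} Hz, back-to-back delta = {ticks} ticks");
    }
}
```

At 24 MHz one tick lasts about 41.67 nanoseconds, which is why this counter cannot resolve anything finer than the roughly 40 nanoseconds seen with std::time::Instant.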

Environment: macos/unix aarch64
Tick frequency, MHZ: 24.00 (hardware provided)
Tick accuracy, nanoseconds: 41.67
Tick counter start: 103684134140
Tick counter stop: 103708255194
Elapsed ticks count in 1 seconds: 24121054
Elapsed nanoseconds according to elapsed ticks: 1005043916.67

Comparing the measurement methods using 100 samples:
Elapsed time in nanoseconds, using std::time::Instant
    Mean = 60.34
    Min  = 41.00
    Max  = 167.00
    Standard deviation = 23.92 (39.64 %)
Elapsed time in nanoseconds, using tick_counter
    Mean = 42.41
    Min  = 42.00
    Max  = 83.00
    Standard deviation = 4.08 (9.62 %)

On the Intel i7 platform running Linux, however, the results were more interesting.

First, I learned that Rust's standard library already includes _rdtsc() and __rdtscp() wrapper functions for the x86/x86_64-based platforms, which gave approximately the same results as using inline assembly instructions.
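The sketch below puts the three options side by side: the `_rdtsc()` and `__rdtscp()` intrinsics from `std::arch::x86_64`, and a hand-written inline assembly version. The module and function names (`tsc`, `via_intrinsic`, `via_rdtscp`, `via_asm`) are hypothetical and are not taken from tick_counter.

```rust
#[cfg(target_arch = "x86_64")]
mod tsc {
    use std::arch::asm;
    use std::arch::x86_64::{__rdtscp, _rdtsc};

    /// Reads the time stamp counter through the standard library intrinsic.
    pub fn via_intrinsic() -> u64 {
        unsafe { _rdtsc() }
    }

    /// Reads the counter with RDTSCP, which also returns the IA32_TSC_AUX
    /// value through `aux` (discarded here).
    pub fn via_rdtscp() -> u64 {
        let mut aux = 0u32;
        unsafe { __rdtscp(&mut aux) }
    }

    /// Reads the counter with inline assembly: RDTSC returns the low 32 bits
    /// in EAX and the high 32 bits in EDX.
    pub fn via_asm() -> u64 {
        let lo: u32;
        let hi: u32;
        unsafe { asm!("rdtsc", out("eax") lo, out("edx") hi, options(nomem, nostack)) };
        (u64::from(hi) << 32) | u64::from(lo)
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let a = tsc::via_intrinsic();
        let b = tsc::via_asm();
        let c = tsc::via_rdtscp();
        println!("intrinsic = {a}, asm = {b}, rdtscp = {c}");
        println!("intrinsic -> asm delta: {} ticks", b.wrapping_sub(a));
    }
    #[cfg(not(target_arch = "x86_64"))]
    println!("RDTSC is only available on x86/x86_64");
}
```

The intrinsic and the inline assembly version both execute the same RDTSC instruction, so similar numbers from the two are to be expected.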

Comparing results, using 100 samples:
Elapsed ticks count, using std::arch::x86_64::_rdtsc()
    Mean = 59.50
    Min  = 58.00
    Max  = 60.00
    Standard deviation = 0.87 (1.46 %)
Elapsed ticks count, using tick_counter
    Mean = 59.52
    Min  = 58.00
    Max  = 60.00
    Standard deviation = 0.85 (1.44 %)

Furthermore, my tests indicate that when using inline assembly on the Intel i7 platform (Linux), a single tick corresponds to approximately 0.29 nanoseconds. However, I observed an additional overhead of around 60 ticks (about 17 nanoseconds) in my benchmarks, probably due to the CPU cycles consumed by the RDTSC instruction itself and the wrapper functions.
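The arithmetic behind these two figures can be checked in a few lines. The function names are hypothetical; the 3430.87 MHz frequency is the estimate reported for this machine in the run shown in this post.

```rust
/// Duration of a single tick in nanoseconds for a counter running at `freq_mhz`.
fn tick_period_ns(freq_mhz: f64) -> f64 {
    1_000.0 / freq_mhz
}

/// Converts a tick count to nanoseconds.
fn ticks_to_ns(ticks: f64, freq_mhz: f64) -> f64 {
    ticks * tick_period_ns(freq_mhz)
}

fn main() {
    let freq_mhz = 3430.87; // estimated tick frequency reported in this post
    println!("tick period: {:.2} ns", tick_period_ns(freq_mhz)); // 0.29
    println!("60 ticks: {:.2} ns", ticks_to_ns(60.0, freq_mhz)); // 17.49
}
```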

I also calculated some statistics and standard deviations using both std::time::Instant and the tick_counter crate.
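A sketch of how such a summary can be computed. It assumes the population standard deviation (dividing by n), which may differ from what the actual runs used. The percentage is the standard deviation relative to the mean, which matches the reported figures (for example, 0.95 / 17.40 ≈ 5.45 %). The `Stats` type and `stats` function are hypothetical.

```rust
/// Summary statistics for a set of samples.
#[derive(Debug)]
struct Stats {
    mean: f64,
    min: f64,
    max: f64,
    std_dev: f64,
    /// Standard deviation as a percentage of the mean.
    rel_std_dev_pct: f64,
}

/// Computes the summary, or `None` for an empty slice.
fn stats(samples: &[f64]) -> Option<Stats> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Population variance: mean of squared deviations from the mean.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    Some(Stats { mean, min, max, std_dev, rel_std_dev_pct: 100.0 * std_dev / mean })
}

fn main() {
    // Made-up tick deltas, for illustration only.
    let samples = [58.0, 59.0, 60.0, 60.0, 59.0, 60.0];
    if let Some(s) = stats(&samples) {
        println!("Mean = {:.2}", s.mean);
        println!("Min  = {:.2}", s.min);
        println!("Max  = {:.2}", s.max);
        println!("Standard deviation = {:.2} ({:.2} %)", s.std_dev, s.rel_std_dev_pct);
    }
}
```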

Environment: linux/unix x86_64
Tick frequency, MHZ: 3430.87 (software estimated in 1s)
Tick accuracy, nanoseconds: 0.29
Tick counter start: 14959335168548
Tick counter stop: 14962766085156
Elapsed ticks count in 1 seconds: 3430916608
Elapsed nanoseconds according to elapsed ticks: 1000013467.72
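The "(software estimated in 1s)" note suggests the frequency was derived by counting ticks over a one-second wall-clock window. The sketch below illustrates that idea under this assumption; it is not a description of how tick_counter actually does it. The function name is hypothetical, and a stand-in counter built on `Instant` is used so the example runs on any platform.

```rust
use std::time::{Duration, Instant};

/// Estimates the tick frequency in Hz by counting ticks over a wall-clock window.
fn estimate_frequency_hz(read_ticks: impl Fn() -> u64, window: Duration) -> f64 {
    let wall_start = Instant::now();
    let tick_start = read_ticks();
    while wall_start.elapsed() < window {
        std::hint::spin_loop();
    }
    let tick_stop = read_ticks();
    let elapsed = wall_start.elapsed();
    tick_stop.wrapping_sub(tick_start) as f64 / elapsed.as_secs_f64()
}

fn main() {
    // Stand-in counter: nanoseconds since `origin`, i.e. a 1 GHz tick source.
    // On real hardware this closure would read the hardware tick counter.
    let origin = Instant::now();
    let fake_counter = move || origin.elapsed().as_nanos() as u64;

    let hz = estimate_frequency_hz(fake_counter, Duration::from_millis(100));
    println!("estimated frequency: {:.2} MHz", hz / 1e6);
}
```

A longer window reduces the relative error introduced by the time spent reading the two clocks, which is presumably why a full second is used.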

Comparing results, using 100 samples:
Elapsed time in nanoseconds, using std::time::Instant
    Mean = 46.35
    Min  = 42.00
    Max  = 262.00
    Standard deviation = 21.70 (46.82 %)
Elapsed time in nanoseconds, using tick_counter
    Mean = 17.40
    Min  = 15.00
    Max  = 18.00
    Standard deviation = 0.95 (5.45 %)

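These results are intriguing: on the Intel platform, the tick counter showed both a lower mean overhead (17.40 vs. 46.35 nanoseconds) and a much smaller standard deviation (0.95 vs. 21.70 nanoseconds) than std::time::Instant, a clear benefit in terms of benchmarking accuracy and stability.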

Overall, it has been a rewarding journey.

The source code is available on GitHub: https://github.com/sheroz/tick_counter
