Results from Release:
g++-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0:
-O3 -DNDEBUG -std=gnu++20

| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 8.97 | 111,505,243.88 | 3.2% | 2.53 | `base->Call()`
| 5.35 | 187,073,255.20 | 4.4% | 1.52 | `a->Call()`
| 4.85 | 206,146,018.92 | 2.9% | 1.36 | `b->Call()`
| 11.91 | 83,951,922.46 | 2.5% | 3.34 | `function &Base::Call, base`
| 9.40 | 106,408,670.05 | 3.0% | 2.64 | `function &A::Call, a`
| 9.29 | 107,610,932.40 | 2.8% | 2.62 | `function &B::Call, b`
| 11.10 | 90,091,261.95 | 2.2% | 3.12 | `boost function &Base::Call, base`
| 7.88 | 126,858,949.14 | 3.2% | 2.23 | `boost function &A::Call, a`
| 7.36 | 135,790,316.78 | 2.7% | 2.06 | `boost function &B::Call, b`
| 10.48 | 95,408,295.17 | 2.1% | 2.92 | `function pointer direct base, &Base::Call `
| 6.71 | 149,076,013.32 | 2.8% | 1.89 | `function pointer direct a, &A::Call`
| 6.92 | 144,541,102.14 | 3.0% | 1.96 | `function pointer direct b, &B::Call`
| 5.89 | 169,756,950.49 | 3.5% | 1.65 | `Sean::function a|b, &A|B::Call`
| 2.81 | 356,323,824.69 | 3.3% | 1.09 | `Sean::function a, &A::Call`
| 2.72 | 366,998,189.34 | 3.1% | 1.10 | `Sean::function b, &B::Call`
| 5.71 | 175,196,322.94 | 3.1% | 1.61 | `Sean::function direct a|b, &A|B::Call`
| 2.81 | 355,563,444.66 | 3.8% | 1.10 | `Sean::function direct a, &A::Call`
| 2.71 | 368,476,521.26 | 3.2% | 1.10 | `Sean::function direct b, &B::Call`
| 6.95 | 143,792,394.89 | 3.1% | 1.96 | `Sean::function a|b, &A|B::CallParams`
| 2.87 | 347,939,831.24 | 3.7% | 1.09 | `Sean::function a, &A::CallParams`
| 2.87 | 348,645,958.79 | 3.0% | 1.10 | `Sean::function b, &B::CallParams`
| 7.01 | 142,639,204.54 | 2.9% | 1.96 | `Sean::function direct a|b, &A|B::CallParams`
| 2.88 | 347,443,044.31 | 2.9% | 1.10 | `Sean::function direct a, &A::CallParams`
| 2.89 | 346,163,931.96 | 3.5% | 1.09 | `Sean::function direct b, &B::CallParams`
| 7.04 | 142,023,712.12 | 3.1% | 1.97 | `template sean::function a|b, &A|B::CallParams`
| 2.86 | 349,374,179.85 | 2.9% | 1.10 | `template sean::function a, &A::CallParams`
| 2.87 | 348,555,217.87 | 3.8% | 1.10 | `template sean::function b, &B::CallParams`
| 7.36 | 135,807,392.68 | 2.8% | 2.07 | `template check sean::function a|b, &A|B::CallParams`
| 3.07 | 325,381,962.27 | 3.1% | 1.09 | `template check sean::function a, &A::CallParams`
| 2.98 | 336,103,318.78 | 3.6% | 1.09 | `template check sean::function b, &B::CallParams`

MinSizeRel
-Os -DNDEBUG -std=gnu++20

| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 10.60 | 94,344,931.82 | 2.8% | 2.97 | `base->Call()`
| 5.97 | 167,477,592.03 | 3.4% | 1.68 | `a->Call()`
| 5.41 | 184,901,294.35 | 3.0% | 1.53 | `b->Call()`
| 13.68 | 73,124,846.95 | 2.7% | 3.82 | `function &Base::Call, base`
| 9.46 | 105,669,404.37 | 3.2% | 2.64 | `function &A::Call, a`
| 9.70 | 103,085,126.35 | 3.1% | 2.67 | `function &B::Call, b`
| 14.48 | 69,048,782.30 | 3.1% | 4.05 | `boost function &Base::Call, base`
| 8.19 | 122,141,120.01 | 3.2% | 2.33 | `boost function &A::Call, a`
| 8.29 | 120,667,180.75 | 3.7% | 2.35 | `boost function &B::Call, b`
| 11.60 | 86,224,941.90 | 3.6% | 3.26 | `function pointer direct base, &Base::Call `
| 6.92 | 144,414,111.03 | 2.6% | 1.94 | `function pointer direct a, &A::Call`
| 6.68 | 149,690,679.93 | 2.2% | 1.84 | `function pointer direct b, &B::Call`
| 8.03 | 124,592,437.34 | 2.5% | 2.24 | `Sean::function a|b, &A|B::Call`
| 5.30 | 188,772,307.34 | 3.1% | 1.48 | `Sean::function a, &A::Call`
| 5.27 | 189,694,989.33 | 4.4% | 1.47 | `Sean::function b, &B::Call`
| 7.28 | 137,307,480.61 | 3.4% | 2.03 | `Sean::function direct a|b, &A|B::Call`
| 5.10 | 195,962,106.63 | 3.2% | 1.43 | `Sean::function direct a, &A::Call`
| 5.19 | 192,672,407.85 | 3.1% | 1.45 | `Sean::function direct b, &B::Call`
| 7.34 | 136,232,012.08 | 3.0% | 2.04 | `Sean::function a|b, &A|B::CallParams`
| 5.21 | 191,782,099.69 | 2.7% | 1.45 | `Sean::function a, &A::CallParams`
| 5.27 | 189,619,729.88 | 2.7% | 1.48 | `Sean::function b, &B::CallParams`
| 7.33 | 136,398,281.55 | 3.9% | 2.05 | `Sean::function direct a|b, &A|B::CallParams`
| 5.21 | 192,095,062.80 | 2.7% | 1.45 | `Sean::function direct a, &A::CallParams`
| 5.32 | 188,061,229.03 | 2.6% | 1.47 | `Sean::function direct b, &B::CallParams`
| 7.77 | 128,682,149.45 | 2.6% | 2.17 | `template sean::function a|b, &A|B::CallParams`
| 5.21 | 191,782,392.03 | 2.7% | 1.47 | `template sean::function a, &A::CallParams`
| 5.33 | 187,496,921.86 | 3.6% | 1.51 | `template sean::function b, &B::CallParams`
| 8.90 | 112,357,224.57 | 2.7% | 2.50 | `template check sean::function a|b, &A|B::CallParams`
| 6.01 | 166,347,689.09 | 2.8% | 1.69 | `template check sean::function a, &A::CallParams`
| 5.80 | 172,298,426.51 | 3.3% | 1.64 | `template check sean::function b, &B::CallParams`

RelWithDebInfo
-O2 -DNDEBUG -std=gnu++20

| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 9.03 | 110,738,754.20 | 3.0% | 2.54 | `base->Call()`
| 5.28 | 189,366,926.98 | 3.7% | 1.50 | `a->Call()`
| 4.83 | 207,105,530.39 | 2.4% | 1.36 | `b->Call()`
| 11.85 | 84,358,606.01 | 2.2% | 3.33 | `function &Base::Call, base`
| 9.00 | 111,051,549.44 | 2.7% | 2.53 | `function &A::Call, a`
| 9.00 | 111,074,910.28 | 2.3% | 2.53 | `function &B::Call, b`
| 11.37 | 87,932,375.07 | 2.4% | 3.19 | `boost function &Base::Call, base`
| 7.31 | 136,868,309.00 | 2.3% | 2.05 | `boost function &A::Call, a`
| 7.33 | 136,417,043.66 | 2.9% | 2.08 | `boost function &B::Call, b`
| 10.21 | 97,977,654.63 | 2.3% | 2.85 | `function pointer direct base, &Base::Call `
| 6.63 | 150,734,756.40 | 2.6% | 1.87 | `function pointer direct a, &A::Call`
| 6.88 | 145,364,737.50 | 3.1% | 1.93 | `function pointer direct b, &B::Call`
| 6.19 | 161,458,499.84 | 3.0% | 1.73 | `Sean::function a|b, &A|B::Call`
| 2.89 | 346,152,377.07 | 2.7% | 1.09 | `Sean::function a, &A::Call`
| 2.93 | 341,304,113.06 | 2.7% | 1.10 | `Sean::function b, &B::Call`
| 6.00 | 166,796,329.13 | 3.0% | 1.69 | `Sean::function direct a|b, &A|B::Call`
| 2.90 | 344,292,303.07 | 2.6% | 1.07 | `Sean::function direct a, &A::Call`
| 2.90 | 345,391,845.52 | 2.3% | 1.09 | `Sean::function direct b, &B::Call`
| 7.22 | 138,497,211.74 | 1.8% | 2.00 | `Sean::function a|b, &A|B::CallParams`
| 2.90 | 344,818,582.23 | 2.6% | 1.09 | `Sean::function a, &A::CallParams`
| 3.02 | 331,334,448.80 | 2.5% | 1.12 | `Sean::function b, &B::CallParams`
| 7.34 | 136,228,235.17 | 2.0% | 2.04 | `Sean::function direct a|b, &A|B::CallParams`
| 3.05 | 327,705,146.85 | 3.0% | 1.11 | `Sean::function direct a, &A::CallParams`
| 3.00 | 332,837,039.23 | 2.0% | 1.10 | `Sean::function direct b, &B::CallParams`
| 7.38 | 135,533,804.72 | 1.9% | 2.06 | `template sean::function a|b, &A|B::CallParams`
| 3.03 | 330,020,313.85 | 1.8% | 1.11 | `template sean::function a, &A::CallParams`
| 2.97 | 336,151,301.10 | 2.6% | 1.12 | `template sean::function b, &B::CallParams`
| 7.62 | 131,292,731.68 | 2.6% | 2.11 | `template check sean::function a|b, &A|B::CallParams`
| 3.09 | 323,415,838.11 | 3.0% | 1.09 | `template check sean::function a, &A::CallParams`
| 3.11 | 321,256,824.05 | 3.3% | 1.13 | `template check sean::function b, &B::CallParams`

I ran these tests because I use `std::function` a fair bit and was curious about its performance overhead. I had assumed it did some compile-time switching that would make it nearly 1:1 with a regular call, but that clearly isn't what is happening. The overhead may come from `bind` getting in the way and adding remapping costs. Still, for functions that map 1:1, I would expect the implementation to come close to what I have done.
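For reference, here is a minimal sketch of the kinds of calls being compared: a direct virtual call, a `std::function` built with `std::bind`, and a `std::function` that stores the pointer-to-member and takes the object at call time. The class names and the return values are assumptions made for illustration; this is not the benchmark code itself.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the benchmarked classes.
struct Base {
    virtual ~Base() = default;
    virtual int Call() { return 0; }
};
struct A : Base {
    int Call() override { return 1; }
};

// Direct virtual call through a base pointer.
inline int call_direct(Base* obj) { return obj->Call(); }

// std::function wrapping a std::bind expression (object bound up front).
inline std::function<int()> make_bound(Base* obj) {
    return std::bind(&Base::Call, obj);
}

// std::function holding the pointer-to-member; the object is passed per call.
inline std::function<int(Base*)> make_unbound() {
    return &Base::Call;
}

// All three routes dispatch to the same override.
inline int demo() {
    A a;
    int direct = call_direct(&a);
    int bound = make_bound(&a)();
    int unbound = make_unbound()(&a);
    assert(direct == bound && bound == unbound);
    return direct;
}
```

Each wrapper adds at least one indirect call through type-erased storage on top of the virtual dispatch, which is one plausible source of the gap between the `function` rows and the direct-call rows.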
My implementation is pretty inflexible (it requires a 1:1 binding) and isn't portable: it depends on the vtable being implemented in a particular way. It also isn't type safe, but you could add a layer of type safety without incurring overhead.
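To illustrate what a type-safe 1:1 binding can look like, here is a portable sketch that stores an object pointer plus one plain function pointer. It is explicitly not the vtable-based implementation described above: it swaps in a template-generated thunk instead of reading the vtable. `Delegate`, `Worker`, and the `int CallParams(int)` signature are all hypothetical names chosen for the example.

```cpp
#include <utility>

template <typename Sig>
class Delegate;  // primary template left undefined

// 1:1 delegate: the call signature must match the bound member exactly enough
// for the member call inside the thunk to compile, which is the type check.
template <typename R, typename... Args>
class Delegate<R(Args...)> {
public:
    // Method is a compile-time pointer-to-member, so the thunk can call it
    // without storing it (requires C++17 for `auto` template parameters).
    template <auto Method, typename T>
    static Delegate bind(T* obj) {
        Delegate d;
        d.obj_ = obj;
        d.thunk_ = [](void* o, Args... args) -> R {
            return (static_cast<T*>(o)->*Method)(std::forward<Args>(args)...);
        };
        return d;
    }

    R operator()(Args... args) const {
        return thunk_(obj_, std::forward<Args>(args)...);
    }

private:
    void* obj_ = nullptr;                    // type-erased object pointer
    R (*thunk_)(void*, Args...) = nullptr;   // captureless lambda as function pointer
};

// Hypothetical class used only for the example.
struct Worker {
    int base = 10;
    int CallParams(int x) { return base + x; }
};

inline int demo_delegate() {
    Worker w;
    auto d = Delegate<int(int)>::bind<&Worker::CallParams>(&w);
    return d(5);
}
```

Because the member pointer is a template argument, the compiler can often inline the target into the thunk, leaving a single indirect call. Whether that matches the numbers above would need to be measured.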
Okay, the above tests are invalid. My function pointers weren't bound to the correct functions, so the workloads were different. I'm glad this was the case, because I wouldn't otherwise have a good explanation for the significant difference in performance.
Here are corrected test results. The code was changed to calculate the function location based on &virtual X::function. gcc calculates this for you but clang doesn't. They both have the same vtable layout.
Interestingly, the implementation sits roughly midway between direct member calls and the `std::function` and `boost::function` wrappers.