Faster asin() was hiding in plain sight
TL;DR Highlight
While optimizing a ray tracer, the author hand-implemented asin() using a Taylor series and a Padé approximation, ending up faster than the system library version while still meeting the ray tracer's accuracy requirements.
Who Should Read
Graphics programmers, game engine developers, and performance engineers who care about floating-point performance and numerical methods.
Core Mechanics
- For ray tracing performance, the author needed a fast asin() (arcsine) function and found the system math library version was slower than necessary for their precision requirements.
- Taylor series approximation: expanding asin(x) around x=0 gives a polynomial that's fast to compute but loses accuracy near x=±1.
- Padé approximation: a rational function (polynomial/polynomial) that provides better accuracy than Taylor series for the same number of terms, especially near the boundaries of the domain.
- The custom implementation was faster than the system asin() while meeting the ray tracer's precision needs — the key insight being that ray tracers often don't need full IEEE 754 double precision.
- This kind of micro-optimization matters when asin() is called millions of times per frame in tight inner loops.
Evidence
- The author shared benchmark results comparing their Padé approximation against std::asin() on multiple platforms, showing a 2-3x speedup.
- HN commenters with numerical methods background discussed the tradeoffs between Taylor, Chebyshev, and Padé approximations for different domains.
- Some noted that SIMD-vectorized versions of these approximations (using AVX2/NEON) can push even larger speedups.
- Others pointed out that libm implementations on modern CPUs are already heavily tuned, so the actual wins depend heavily on the CPU microarchitecture and the math library in use.
How to Apply
- If you're calling math functions millions of times per frame, profile to confirm they're actual bottlenecks before optimizing — the standard library is often fast enough.
- When you do need a custom approximation, Padé approximations generally outperform Taylor series for the same computation cost when you need accuracy across a wider domain.
- Consider the precision requirements carefully: graphics often tolerates 1e-6 relative error, physics simulation might need 1e-10 — match the approximation to the requirement.
- Check if SIMD-vectorized math libraries (Intel SVML, Sleef) already provide what you need before writing your own approximation.
Code Example
// NVIDIA CG library-based fast_asin (Stack Overflow: https://stackoverflow.com/a/26030435)
// Original source: Hastings 1955 / Abramowitz & Stegun 4.4.45
// Requires <cmath> for std::fabs / std::sqrt.
float fast_asin(float x) {
    const float negate = float(x < 0.0f);
    x = std::fabs(x);
    // Degree-3 minimax polynomial, evaluated in Horner form.
    float ret = -0.0187293f;
    ret = ret * x + 0.0742610f;
    ret = ret * x - 0.2121144f;
    ret = ret * x + 1.5707288f;
    ret = 3.14159265358979f * 0.5f - std::sqrt(1.0f - x) * ret;
    return ret - 2.0f * negate * ret; // flip the sign for negative inputs
}
// Taylor series approximation (first five terms, up to x^9; author's own
// implementation, ~5% improvement, valid only in the range [-0.8, 0.8])
double _asin_approx_private(const double x) {
    if ((x < -0.8) || (x > 0.8)) {
        return std::asin(x); // fall back to the library outside the accurate range
    }
    // Series coefficients (2n-1)!!/(2n)!!: 1/2, 3/8, 5/16, 35/128.
    constexpr double a = 0.5;
    constexpr double b = a * 0.75;
    constexpr double c = b * (5.0 / 6.0);
    constexpr double d = c * (7.0 / 8.0);
    const double aa = (x * x * x) / 3.0;
    const double bb = (x * x * x * x * x) / 5.0;
    const double cc = (x * x * x * x * x * x * x) / 7.0;
    const double dd = (x * x * x * x * x * x * x * x * x) / 9.0;
    return x + (a * aa) + (b * bb) + (c * cc) + (d * dd);
}

Terminology
Taylor series: A polynomial approximation of a function, built by expanding it as an infinite series of derivatives around a point. Accurate near the expansion point, less so far away.
Padé approximation: An approximation of a function as a ratio of two polynomials. Generally more accurate than a Taylor series of the same computational cost, especially far from the expansion point.
Ray tracing: A rendering technique that simulates light paths to produce photorealistic 3D images. Computationally intensive, requiring millions of math operations per frame.
IEEE 754: The international standard defining floating-point number representation and arithmetic; it governs what 'double precision' means in most programming languages.
Related Resources
- Original: Faster asin() Was Hiding In Plain Sight
- Stack Overflow: Fast asin approximation (NVIDIA CG-based)
- Introduction to Chebyshev Approximation (embeddedrelated.com)
- Remez Algorithm (Wikipedia)
- Robin Green: Faster Math Functions (GDC slide deck 1)
- iquilezles: Avoiding Trigonometry (noacos)
- glibc asin implementation source
- Intel AVX-512 asin intrinsic guide