Commit 745c0f39, authored by Raymond Hettinger, committed by GitHub

Simplify vector_norm() by eliminating special cases in the main loop (GH-9006)

The *max* value is no longer treated as a special case in the main loop. Besides making the main loop simpler and branchless, this also lets us relax the restriction that *vec* contain only non-negative values.
parent aada63b2
@@ -2032,14 +2032,14 @@ math_fmod_impl(PyObject *module, double x, double y)
 }

 /*
-Given an *n* length *vec* of non-negative values
-where *max* is the largest value in the vector, compute:
+Given an *n* length *vec* of values and a value *max*, compute:

     max * sqrt(sum((x / max) ** 2 for x in vec))

-The value of the *max* variable must be present in *vec*
-or should equal to 0.0 when n==0. Likewise, *max* will
-be INF if an infinity is present in the vec.
+The value of the *max* variable must be non-negative and
+at least equal to the absolute value of the largest magnitude
+entry in the vector. If n==0, then *max* should be 0.0.
+If an infinity is present in the vec, *max* should be INF.

 The *found_nan* variable indicates whether some member of
 the *vec* is a NaN.
@@ -2053,16 +2053,19 @@ The *csum* variable tracks the cumulative sum and *frac* tracks
 the cumulative fractional errors at each step. Since this
 variant assumes that |csum| >= |x| at each step, we establish
 the precondition by starting the accumulation from 1.0 which
-represents an entry equal to *max*. This also provides a nice
-side benefit in that it lets us skip over a *max* entry (which
-is swapped into *last*) saving us one iteration through the loop.
+represents the largest possible value of (x/max)**2.
+
+After the loop is finished, the initial 1.0 is subtracted out
+for a net zero effect on the final sum. Since *csum* will be
+greater than 1.0, the subtraction of 1.0 will not cause
+fractional digits to be dropped from *csum*.

 */

 static inline double
 vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
 {
-    double x, csum = 1.0, oldcsum, frac = 0.0, last;
+    double x, csum = 1.0, oldcsum, frac = 0.0;
     Py_ssize_t i;

     if (Py_IS_INFINITY(max)) {
@@ -2071,27 +2074,20 @@ vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
     if (found_nan) {
         return Py_NAN;
     }
-    if (max == 0.0) {
-        return 0.0;
+    if (max == 0.0 || n == 1) {
+        return max;
     }
-    assert(n > 0);
-    last = vec[n-1];
-    for (i=0 ; i < n-1 ; i++) {
+    for (i=0 ; i < n ; i++) {
         x = vec[i];
-        assert(Py_IS_FINITE(x) && x >= 0.0 && x <= max);
-        if (x == max) {
-            x = last;
-            last = max;
-        }
+        assert(Py_IS_FINITE(x) && fabs(x) <= max);
         x /= max;
         x = x*x;
-        assert(csum >= x);
         oldcsum = csum;
         csum += x;
+        assert(csum >= x);
         frac += (oldcsum - csum) + x;
     }
-    assert(last == max);
-    return max * sqrt(csum + frac);
+    return max * sqrt(csum - 1.0 + frac);
 }

 #define NUM_STACK_ELEMS 16