c++ - I tried: valgrind, _GLIBCXX_DEBUG, -fno-strict-aliasing; how do I debug this error?
I have a strange error that I've spent several days trying to figure out, and I want to see if anyone has comments that would help me understand what's happening.
Some background: I'm working on a software project that involves adding C++ extensions to Python 2.7.1 using Boost 1.45, and the code is run through the Python interpreter. Recently, I made a change to the code that broke one of our regression tests. The regression test is sensitive to numerical fluctuations (e.g. between different machines), and I should fix that. However, since the regression was breaking on the same machine/compiler that produced the original regression results, I traced the difference in results to this snippet of numerical code (which is verifiably unrelated to the code I changed):
c[3] = 0.25 * (-3 * df[i-1] - 23 * df[i] - 13 * df[i+1] - df[i+2]
               - 12 * f[i-1] - 12 * f[i] + 20 * f[i+1] + 4 * f[i+2]);
printf("%2li %23a : %23a %23a %23a %23a : %23a %23a %23a %23a\n",
       i, c[3], df[i-1], df[i], df[i+1], df[i+2], f[i-1], f[i], f[i+1], f[i+2]);
which constructs numerical tables. Note that:
- %a prints an exact ASCII representation of the floating-point value
- the left hand side (LHS) is c[3], and the RHS is the other 8 values
- the output below is for values far from the boundaries of f and df
- this code exists within a loop over i, nested several layers deep (so I'm unable to provide an isolated case that reproduces this)
So I cloned the source tree, and the only difference between the two executables is that the compile of the clone includes some code that isn't executed in the test. This makes me suspect it must be a memory problem, since the only difference should be where the code sits in memory... Anyway, when I run the two executables, here's the difference in what they produce:
diff new.out old.out
655,656c655,656
< 6 -0x1.7c2a5a75fc046p-10 : 0x0p+0 0x0p+0 0x0p+0 -0x1.75eee7aa9b8ddp-7 : 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.1eaea08b55205p-4
< 7 -0x1.a18f0b3a3eb8p-10 : 0x0p+0 0x0p+0 -0x1.75eee7aa9b8ddp-7 -0x1.a4acc49fef001p-6 : 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.1eaea08b55205p-4 0x1.9f6a9bc4559cdp-5
---
> 6 -0x1.7c2a5a75fc006p-10 : 0x0p+0 0x0p+0 0x0p+0 -0x1.75eee7aa9b8ddp-7 : 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.1eaea08b55205p-4
> 7 -0x1.a18f0b3a3ec5cp-10 : 0x0p+0 0x0p+0 -0x1.75eee7aa9b8ddp-7 -0x1.a4acc49fef001p-6 : 0x1.304ec13281eccp-4 0x1.304ec13281eccp-4 0x1.1eaea08b55205p-4 0x1.9f6a9bc4559cdp-5
<more output truncated>
You can see that the value in c[3] is subtly different, while none of the RHS values are different. So somehow identical input is giving rise to different output. I tried simplifying the RHS expression, but any change I make eliminates the difference. If I print &c[3], the difference goes away. If I run on the two different machines (Linux, OSX) I have access to, there's no difference. Here's what I've tried:
- valgrind (it reported numerous problems in Python, but nothing in my code, and nothing that looked serious)
- -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_ASSERT -D_GLIBCXX_DEBUG_PEDASSERT -D_GLIBCXX_DEBUG_VERIFY (but nothing asserts)
- -fno-strict-aliasing (the only aliasing warnings at compile time come out of Boost code)
I tried switching from gcc 4.1.2 to gcc 4.5.2 on the machine that has the problem, and this specific, isolated difference goes away (but the regression still fails, so let's assume that's a different problem).
Is there anything I can do to isolate the problem further? For future reference, is there a way to analyze or understand this kind of problem more quickly? For example, given my description of the LHS changing even though the RHS is not, what would you conclude?
EDIT: The problem was entirely due to -ffast-math.
You can change the type of the floating-point data in your program. If you use float, you can switch to double; if c, f, and df are double, you can switch to long double (80-bit on Intel; 128-bit on SPARC). With gcc 4.5.2 you can also try the __float128 (128-bit) software-simulated type.

The rounding error will be smaller the longer the floating-point type is.
Why does adding code (even unexecuted code) change the result? GCC may compile the program differently if the code size changes. There are a lot of heuristics inside GCC, and some of those heuristics are based on function sizes, so GCC may end up compiling your function in a different way (different inlining, register allocation, and spilling decisions).
Also, try compiling your project with the flags -mfpmath=sse -msse2, because you are currently using x87 (the default fpmath for older gcc targeting 32-bit x86): http://gcc.gnu.org/wiki/x87note

By default, x87 arithmetic is not true 64/32-bit IEEE: intermediates are kept in 80-bit registers and are only rounded to 64/32 bits when spilled to memory, so the result can depend on register allocation.
PS: You should not use -ffast-math-like options when you are interested in stable numerical results: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Optimize-Options.html
-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
This is the part of fast-math that may change your results:
-funsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.
This part hides traps and NaN-like errors from the user (sometimes the user wants traps enabled in order to debug the code):
-fno-trapping-math
Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on "non-stop" IEEE arithmetic, for example.
This part of fast-math says the compiler can assume the default rounding mode everywhere (which can be false for some programs):
-fno-rounding-math
Enable transformations and optimizations that assume default floating-point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. ... This option enables constant folding of floating point expressions at compile-time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes.