c++ - Emulate "double" using 2 "float"s
I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate a double datatype using a tuple of two floats, so a double d will be emulated as a struct containing the tuple (float d.hi, float d.low).
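A minimal C++ sketch of that representation, assuming the usual "normalized" convention from the double-float literature: hi is the float nearest to the full value and lo (playing the role of d.low) holds the remainder. The names float2, from_double and to_double are hypothetical, and the two conversion helpers assume a host machine with a native double, so they are only meant for testing.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical type: an emulated double stored as the unevaluated sum hi + lo.
// Invariant ("normalized"): hi == hi + lo when evaluated in float arithmetic,
// so hi carries the leading ~24 significant bits and lo the next ~24.
struct float2 {
    float hi;
    float lo;
};

// Host-side test helper (assumption: a native double exists on the test machine;
// the embedded target would build float2 values from floats directly).
inline float2 from_double(double d) {
    assert(std::fabs(d) <= FLT_MAX);          // finite and within float range
    float hi = static_cast<float>(d);          // nearest float to d
    float lo = static_cast<float>(d - hi);     // next ~24 bits of d (d - hi is exact in double)
    float s = hi + lo;                         // Fast2Sum renormalization, exact
    return {s, lo - (s - hi)};                 // because |hi| >= |lo|
}

// Host-side test helper: collapse the pair back into a native double.
inline double to_double(float2 x) {
    return static_cast<double>(x.hi) + static_cast<double>(x.lo);
}
```

For example, from_double(1.0 + 2^-30) yields hi == 1.0f and lo == 2^-30: the low part keeps bits that a single float would round away.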
The comparison should be straightforward using lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
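Lexicographic comparison does work, provided the pairs are kept normalized (hi == hi + lo in float). In that case hi is the float nearest to the full value, so a differing hi already decides the order and lo only breaks ties. A minimal sketch with a hypothetical float2 {hi, lo} type:

```cpp
#include <cassert>

// Hypothetical representation: value == hi + lo, normalized so that
// hi == hi + lo when evaluated in float arithmetic.
struct float2 {
    float hi;
    float lo;
};

// Lexicographic "less than": compare hi first, fall back to lo on a tie.
// Only valid for normalized pairs; the assert is a debug check of that invariant.
inline bool ff_less(float2 a, float2 b) {
    assert(a.hi == a.hi + a.lo && b.hi == b.hi + b.lo);
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}

// Normalized pairs are unique for a given value, so equality is component-wise.
inline bool ff_equal(float2 a, float2 b) {
    return a.hi == b.hi && a.lo == b.lo;
}
```

The remaining relations (>, <=, >=) follow from these two in the usual way.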
How can this be done?

Edit (for clarity): I need the significant digits rather than the range.
The double-float technique uses pairs of single-precision numbers to achieve about twice the precision of single-precision arithmetic, accompanied by a slight reduction of the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T. J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs; however, much of the material covered in these papers is applicable independent of platform, so it should be useful for the task at hand.
Guillaume Da Graça, David Defour. Implementation of float-float operators on graphics hardware. 7th Conference on Real Numbers and Computers (RNC7). http://hal.archives-ouvertes.fr/docs/00/06/33/56/pdf/float-float.pdf

Andrew Thall. Extended-precision floating-point numbers for GPU computation. http://andrewthall.org/papers/df64_qf128.pdf
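To address the addition question directly: in the double-float scheme there is no fixed base such as FLT_MAX. The "carry" is the rounding error of each float addition, which can be recovered exactly with Knuth's TwoSum (no magnitude precondition) or Dekker's Fast2Sum (requires |a| >= |b|). The sketch below follows the addition sequence commonly used for double-double arithmetic, applied here to floats; the float2 type and function names are hypothetical. It assumes IEEE round-to-nearest single precision with no excess precision, and it must not be compiled with -ffast-math or similar reassociating options, since those would algebraically simplify the error terms to zero.

```cpp
#include <cassert>

// Hypothetical representation: value == hi + lo, normalized so that
// hi == hi + lo when evaluated in float arithmetic.
struct float2 {
    float hi;
    float lo;
};

// Knuth's TwoSum: s = fl(a + b) and e is its exact rounding error,
// so s + e == a + b exactly, whatever the magnitudes of a and b.
inline float2 two_sum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Dekker's Fast2Sum: same result in fewer operations when |a| >= |b|.
inline float2 fast_two_sum(float a, float b) {
    float s = a + b;
    float e = b - (s - a);
    return {s, e};
}

// Double-float addition: add the high parts and the low parts separately while
// keeping each rounding error (the "carry"), then fold the error terms back
// into a normalized (hi, lo) pair with two renormalization steps.
inline float2 ff_add(float2 a, float2 b) {
    assert(a.hi == a.hi + a.lo && b.hi == b.hi + b.lo);  // inputs normalized
    float2 s = two_sum(a.hi, b.hi);
    float2 t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = fast_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    s = fast_two_sum(s.hi, s.lo);
    return s;
}
```

As a usage example, in plain float arithmetic (1.0f + 2^-30) - 1.0f evaluates to 0, whereas ff_add({1.0f, 2^-30}, {-1.0f, 0.0f}) returns {2^-30, 0.0f}, because the low part survives the cancellation. Subtraction can be obtained by negating both components of the second operand.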