
Floating Point Arithmetic

Floating Point Numbers are Weird

The first mistake that nearly every single programmer makes is presuming that this code will work as intended:

float total = 0;
for(float a = 0; a != 2; a += 0.01f) {
    total += a;
}

The novice programmer assumes that this will sum every number in the sequence 0, 0.01, 0.02, 0.03, ..., 1.97, 1.98, 1.99 and yield 199, the mathematically correct answer.

Two things happen that make this untrue:

The program as written never concludes: a never becomes exactly equal to 2, so the loop never terminates.
If we rewrite the loop condition to check a < 2 instead, the loop terminates, but the total ends up different from 199. On IEEE 754-compliant machines, it will often sum to about 201 instead, which happens when accumulated rounding error leaves a just below 2 after 200 additions, so the body runs one extra time and adds nearly 2.

Both problems arise because floating point numbers represent approximations of their assigned values.

The classical example is the following computation:

double a = 0.1;
double b = 0.2;
double c = 0.3;
if(a + b == c)
    //This never prints on IEEE754-compliant machines
    std::cout << "This Computer is Magic!" << std::endl; 
else
    std::cout << "This Computer is pretty normal, all things considered." << std::endl;

Though what we, the programmers, see is three numbers written in base 10, what the compiler (and the underlying hardware) see are binary numbers. Because 0.1, 0.2, and 0.3 require exact division by 10 (which is easy in base 10 but impossible in base 2, since 10 has the prime factor 5), these numbers have to be stored as approximations, similar to how 1/3 has to be written in the imprecise form 0.333333333333333... in base 10.

//64-bit doubles have 53 binary digits (bits) of precision, including the implicit leading bit.
double a =     0011111110111001100110011001100110011001100110011001100110011010; //imperfect representation of 0.1
double b =     0011111111001001100110011001100110011001100110011001100110011010; //imperfect representation of 0.2
double c =     0011111111010011001100110011001100110011001100110011001100110011; //imperfect representation of 0.3
double a + b = 0011111111010011001100110011001100110011001100110011001100110100; //Note that this is not quite equal to the "canonical" 0.3!
