In C or C++, the traditional variable to use in a loop is “int i”, typically something like:
for (int i=0;i<length;i++) ... do stuff with i ...
Using “int” is traditional, but it's a really bad idea, because “int” is only 32 bits on essentially all modern machines. This means any loop that should run more than 2 binary billion iterations will fail (one binary billion == 1,073,741,824 == 1<<30).
If the loop iteration count exceeds what “int” can store, one possible outcome is a crash when i overflows and wraps around to -2147483648:
long length=3*1000*1000*1000L;
std::vector<char> v(length);
size_t count=0;
for (int i=0;i<length;i++) {
    if (0 == i%(128*1024*1024)) std::cout<<"i="<<i<<"\n";
    v[i]=3;
    count++;
}
return count;
On my 64-bit Linux machine, this produces no compile warnings and runs correctly for lengths below 2 binary billion. The whole thing takes about one second, including the time for std::vector to allocate and zero-initialize 3GB of RAM. But at runtime, everything works only until i overflows and wraps around:
i=0
i=134217728
i=268435456
i=402653184
i=536870912
i=671088640
i=805306368
i=939524096
i=1073741824
i=1207959552
i=1342177280
i=1476395008
i=1610612736
i=1744830464
i=1879048192
i=2013265920
i=-2147483648
-------------------
Caught signal SIGSEGV
When i wraps around to -2147483648, the program is actually trying to write to array element v[-2147483648]. Worse than a crash, this is a potential security hole if an attacker can arrange for this index to point to valid memory. (Strictly speaking, signed integer overflow is undefined behavior in C and C++, so the compiler is free to do anything at all here; the wrap to -2147483648 is just the typical result.)
If the loop body does not access memory, this instead becomes an infinite loop, since the int i can never reach the long loop bound.
In the program above, if we replace “long length” with “size_t length”, we at least get a compiler warning about comparison between signed (int) and unsigned (size_t) types. But now the loop silently stops before finishing the correct number of iterations:
i=0
i=134217728
i=268435456
i=402653184
i=536870912
i=671088640
i=805306368
i=939524096
i=1073741824
i=1207959552
i=1342177280
i=1476395008
i=1610612736
i=1744830464
i=1879048192
i=2013265920
Program complete.
Here, i has wrapped around, but the compiler converts it to unsigned before comparing it with length: the loop is effectively “for (int i=0;(size_t)i<length;i++)”. Since converting a negative int to the unsigned size_t yields a huge value, the loop exits after 2 binary billion iterations, leaving the rest of the array untouched.
Needless to say, this will result in a very confusing data corruption bug.
Typecasting the comparison the opposite way, “for (int i=0;i<(int)length;i++)”, eliminates the warning, but the 3-billion long length truncates to a negative int, and the loop executes zero iterations.
Hence using “int” as the loop index can result in:
- A crash and/or security hole
- An infinite loop
- Skipping the last loop iterations
- Skipping the loop entirely
Instead you should:
- Prefer C++11 range-based for loops, since they’re clean and reliable
- for (char &element : myVector)
- If you can’t use range-based for, use size_t as a loop index (on 64-bit machines, size_t is 64 bits)
- for (size_t i=0;i<length;i++)
If you’re stuck with int, you can’t reliably process data with more than 2 binary billion entries. In a world where phones can have 8GB of RAM, people regularly process files exceeding 2GB, and your CPU can sling multiple gigs around in under a second, this is not a good idea!