NaN-boxing in Go
Recently, I was reading this Go proposal to disallow NaN keys in maps whose keys are floating-point values. The proposal and accompanying discussion are interesting in their own right, as is the blog post to which it also links. While reading, I went to refresh my understanding of the structure of IEEE 754 floating-point values, and in particular how the “special cases” of ±infinity and NaN are handled.
For 32-bit IEEE 754 floats, one bit is used as the sign bit, eight bits are used for the exponent, and the remaining twenty-three bits are used for the significand, also know as the mantissa:
1 8 23
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
↑ exp mantissa
|
sign bit
64-bit floats are similar, with one bit for the sign, eleven bits for the exponent, and 52 bits for the significand. The special cases of ±infinity and NaN are represented by setting the exponent to all-ones. Then, the significand indicates the case. An all-zeros significand represents infinity, with sign controlled by the sign bit. Any other value in the significand represents NaN, with the most significant bit of the significand distinguishing between a quiet NaN and a signaling NaN. This means that there are 2^23-1 representations of NaN for a 32-bit float, and 2^52-1 for a 64-bit float! This is not an oversight in the specification; instead, these extra bits are intended to be used as a payload to indicate what went wrong in the preceding calculation that caused the result to be NaN. However, we can instead misuse these bits to store arbitrary data, a technique referred to as NaN-boxing, and is well-known. For example, Mozilla’s SpiderMonkey JS interpreter uses it, and here’s some discussion on why. Naturally, it’s also possible to implement in Go:
package main
import (
"fmt"
"math"
)
func main() {
e := encode("Hello world!")
fmt.Println(e)
d := decode(e)
fmt.Println(d)
}
func encode(s string) []float64 {
f := make([]float64, len(s))
for i, c := range s {
f[i] = math.Float64frombits(math.Float64bits(math.NaN()) & ^uint64(1<<8-1) | uint64(uint8(c)))
}
return f
}
func decode(f []float64) string {
b := make([]rune, len(f))
for i, c := range f {
b[i] = rune(math.Float64bits(c) & uint64(1<<8-1))
}
return string(b)
}
Running this prints:
[NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]
Hello world!
You can try it yourself here. This is a pretty naive implementation, with at least the following faults:
- It only encodes one ASCII character per float64, wasting a lot of space
- It silently encodes characters beyond ASCII incorrectly, since it simply truncates everything beyond the first byte
- It cannot encode
0x00
, because that would be positive infinity
All of these are solvable, of course, but this is just an example. For a more thorough, full-featured implementation of NaN-boxing in C, check out this project.