NaN-boxing in Go

Recently, I was reading this Go proposal to disallow NaN keys in maps whose keys are floating-point values. The proposal and accompanying discussion are interesting in their own right, as is the blog post to which it also links. While reading, I went to refresh my understanding of the structure of IEEE 754 floating-point values, and in particular how the “special cases” of ±infinity and NaN are handled.

For 32-bit IEEE 754 floats, one bit is used as the sign bit, eight bits are used for the exponent, and the remaining twenty-three bits are used for the significand, also know as the mantissa:

1    8                23
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
↑   exp            mantissa
|
sign bit

64-bit floats are similar, with one bit for the sign, eleven bits for the exponent, and 52 bits for the significand. The special cases of ±infinity and NaN are represented by setting the exponent to all-ones. Then, the significand indicates the case. An all-zeros significand represents infinity, with sign controlled by the sign bit. Any other value in the significand represents NaN, with the most significant bit of the significand distinguishing between a quiet NaN and a signaling NaN. This means that there are 2^23-1 representations of NaN for a 32-bit float, and 2^52-1 for a 64-bit float! This is not an oversight in the specification; instead, these extra bits are intended to be used as a payload to indicate what went wrong in the preceding calculation that caused the result to be NaN. However, we can instead misuse these bits to store arbitrary data, a technique referred to as NaN-boxing, and is well-known. For example, Mozilla’s SpiderMonkey JS interpreter uses it, and here’s some discussion on why. Naturally, it’s also possible to implement in Go:

package main

import (
	"fmt"
	"math"
)

func main() {
	e := encode("Hello world!")
	fmt.Println(e)
	d := decode(e)
	fmt.Println(d)
}

func encode(s string) []float64 {
	f := make([]float64, len(s))
	for i, c := range s {
		f[i] = math.Float64frombits(math.Float64bits(math.NaN()) & ^uint64(1<<8-1) | uint64(uint8(c)))
	}
	return f
}

func decode(f []float64) string {
	b := make([]rune, len(f))
	for i, c := range f {
		b[i] = rune(math.Float64bits(c) & uint64(1<<8-1))
	}
	return string(b)
}

Running this prints:

[NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]
Hello world!

You can try it yourself here. This is a pretty naive implementation, with at least the following faults:

It only encodes one ASCII character per float64, wasting a lot of space
It silently encodes characters beyond ASCII incorrectly, since it simply truncates everything beyond the first byte
It cannot encode 0x00, because that would be positive infinity

All of these are solvable, of course, but this is just an example. For a more thorough, full-featured implementation of NaN-boxing in C, check out this project.