# How many decimal digits are there, anyways?

Recently I was reviewing a PR opened by a colleague of mine and noticed a function like the following:

```
+ (BOOL)validateNumber:(NSString *)number {
NSCharacterSet *invertedDecimalDigitCharacterSet =
[[NSCharacterSet decimalDigitCharacterSet] invertedSet];
NSRange range = [number rangeOfCharacterFromSet:invertedDecimalDigitCharacterSet];
// Ensure that the given string is digits only
return range.location == NSNotFound
}
```

On a hunch, I checked the docs for `decimalDigitCharacterSet`

:

A character set containing the characters in the category of Decimal Numbers. … Informally, this set is the set of all characters used to represent the decimal values 0 through 9. These characters include, for example, the decimal digits of the Indic scripts and Arabic.

That’s definitely going to be trouble. `validateNumber`

is supposed to return `YES`

only when the input string is made
up of the decimal digits 0 through 9, but `decimalDigitCharacterSet`

includes every character in the Unicode category
Nd, also known as Decimal_Number (see the full list of categories here). Let’s try a few:

```
let s = CharacterSet.decimalDigits
// U+0031 DIGIT ONE
s.contains("1") // true as expected
// U+1D7D9 MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
s.contains("𝟙") // true!
// U+0967 DEVANAGARI DIGIT ONE
s.contains("१") // true!
// U+1811 MONGOLIAN DIGIT ONE
s.contains("᠑") // true!
```

Note that for convenience I’m using Swift’s CharacterSet, which is bridged to `NSCharacterSet`

, so observations about
it apply equally to its Objective-C counterpart.

Clearly this isn’t doing what my colleague intended. Just how many of these Nd characters are there? It seems like we
ought to be able to simply ask the `CharacterSet`

how many elements it has with `count`

, but it doesn’t conform to
`Collection`

, nor does `NSCharacterSet`

support any simple means of obtaining the number of characters represented, so
we’ll just have to do it the hard way:

```
import Foundation
func sizeOf(set: CharacterSet) -> Int {
return (0...Int(0x10FFFF))
.compactMap { Unicode.Scalar($0) }
.filter { set.contains($0) }
.count
}
let s = CharacterSet.decimalDigits
print(sizeOf(set: s)) // 610
```

`sizeOf(set:)`

enumerates every code point from zero to the maximum valid value, 0x10FFFF, and checks whether each value
is in the set. `compactMap`

lets us ignore cases where `Unicode.Scalar`

returns `nil`

, as it does in the range 0xD800 to
0xDFFF, because these are invalid code points reserved for use as surrogates in UTF-16 (incidentally, UTF-16 is also the
reason that 0x10FFFF is the maximum valid code point).

According to this function, then, there are not just ten decimal number characters, as you might expect, but *six
hundred* and ten! To fix the bug, we’ll just have to be more explicit:

```
+ (BOOL)validateAccountNumber:(NSString *)number {
NSCharacterSet *invertedArabicDecimalDigitCharacterSet =
[[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
NSRange range = [number rangeOfCharacterFromSet:invertedArabicDecimalDigitCharacterSet];
// Ensure that the given string is digits only
return range.location == NSNotFound
}
```

Amusingly, the accepted answer for this question on Stack Overflow also gets this wrong. I can hardly blame them!