Runes and String Encoding

In many programming languages (cough, C, cough), a "character" is a single byte. Using ASCII, the encoding traditionally used with C, we can represent 128 characters with 7 bits. This is enough for the English alphabet, numbers, and some special characters.

In Go, strings are just sequences of bytes: they can hold arbitrary data. However, Go also has a special type, rune, which is an alias for int32. This means that a rune is a 32-bit integer, which is large enough to hold any Unicode code point.
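To make the alias concrete, here is a minimal sketch. The helper name codePoint is hypothetical and exists only for illustration; it relies on nothing beyond the fact that rune and int32 are the same type.

```go
package main

import "fmt"

// codePoint is a hypothetical helper for illustration. Because rune is an
// alias for int32, returning a rune as an int32 needs no conversion.
func codePoint(r rune) int32 {
	return r
}

func main() {
	var r rune = 'é'
	// %c prints the character, %U prints the code point in U+XXXX form.
	fmt.Printf("%c is %U (decimal %d)\n", r, r, codePoint(r))
	// %T reports int32, since rune is only another name for that type.
	fmt.Printf("type: %T\n", r)
}
```

A rune literal is written with single quotes, while a string literal uses double quotes.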

When you're working with strings, you need to be aware of the encoding (how bytes map to characters). Go source code and string literals use UTF-8, a variable-length encoding in which each code point takes between 1 and 4 bytes.
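As a rough illustration of "variable-length", the sketch below asks the standard library how many bytes UTF-8 needs for a few sample characters. The wrapper name encodedLen is an assumption made for this example; utf8.RuneLen is the real standard library function doing the work.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen is a hypothetical wrapper that reports how many bytes
// UTF-8 needs to encode a single rune.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One sample each from the 1-, 2-, 3- and 4-byte ranges.
	for _, r := range []rune{'a', 'é', '世', '🐻'} {
		fmt.Printf("%c takes %d byte(s)\n", r, encodedLen(r))
	}
}
```

ASCII characters stay at one byte each, which is why plain English text has the same byte length and rune length.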

What Does This Mean?

There are 2 main takeaways:

  • When you need to work with individual characters in a string, you should use the rune type. Converting a string to []rune, or iterating over it with range, breaks it up into its individual characters, each of which can be more than one byte long.

  • We can include a wide variety of Unicode characters in our strings, such as emojis and Chinese characters, and Go will handle them just fine.
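Both takeaways can be sketched in a few lines. The reverse function below is a hypothetical example, not part of the lesson's assignment: it works on a []rune so that multi-byte characters stay intact, which would not be the case if it swapped individual bytes.

```go
package main

import "fmt"

// reverse is a hypothetical helper that reverses a string character by
// character. Converting to []rune decodes the UTF-8 bytes into code points,
// so a multi-byte character moves as a single unit.
func reverse(s string) string {
	runes := []rune(s)
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}

func main() {
	const s = "hi 🐻"
	// len counts bytes; the []rune conversion counts characters.
	fmt.Println(len(s), len([]rune(s)))
	// range yields the starting byte offset and the decoded rune.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
	fmt.Println(reverse(s))
}
```

Note that indexing a string directly, as in s[3], returns a single byte rather than a whole character.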

Assignment

Boots is a bear. (Not a dog, haters).

1

Run the code as-is

Notice that the simple string "boots" has 5 bytes, and 5 runes (characters).

main.go
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	const name = "boots"
	fmt.Printf("constant 'name' byte length: %d\n", len(name))
	fmt.Printf("constant 'name' rune length: %d\n", utf8.RuneCountInString(name))
	fmt.Println("=====================================")
	fmt.Printf("Hi %s, so good to have you back in the arcanum\n", name)
}

Output:

constant 'name' byte length: 5
constant 'name' rune length: 5
=====================================
Hi boots, so good to have you back in the arcanum
2

Update the name constant to the bear emoji

Replace the word "boots" with the bear emoji:

🐻
main.go
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	const name = "🐻"
	fmt.Printf("constant 'name' byte length: %d\n", len(name))
	fmt.Printf("constant 'name' rune length: %d\n", utf8.RuneCountInString(name))
	fmt.Println("=====================================")
	fmt.Printf("Hi %s, so good to have you back in the arcanum\n", name)
}

Output:

constant 'name' byte length: 4
constant 'name' rune length: 1
=====================================
Hi 🐻, so good to have you back in the arcanum