Matching with Match

Go's regexp package provides regular expression support. The syntax it accepts is documented in regexp/syntax.

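When writing a regular expression in Go, a raw string literal delimited by backquotes (`) is the more convenient choice, because the backslash (\) does not need to be escaped.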
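To make the difference concrete, here is a minimal sketch comparing the two spellings of the same pattern. The helper name samePattern and the sample text are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// samePattern reports whether the raw and the interpreted string literal
// spell exactly the same pattern text. (Hypothetical helper for illustration.)
func samePattern() bool {
	raw := `\d+\.\d+`       // raw string literal: backslashes are kept as written
	quoted := "\\d+\\.\\d+" // interpreted literal: every backslash must be doubled
	return raw == quoted
}

func main() {
	fmt.Println(samePattern()) // true

	matched, err := regexp.MatchString(`\d+\.\d+`, "pi is about 3.14")
	fmt.Println(matched, err) // true <nil>
}
```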

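Metacharacters have a special meaning in a regular expression, for example $ ^ * ( ) + { } [ ] | \ . ? and so on. To match one of these characters literally, you must escape it. Even though Go has raw string literals, handling the escaping yourself is still tedious, so you can let regexp's QuoteMeta function do it for you: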

func QuoteMeta(s string) string

For example:

fmt.Println(regexp.QuoteMeta(`main.exe`)) // main\.exe
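A typical use is searching for text that should be taken literally, such as user input. The following sketch assumes a small helper named containsLiteral, which is not part of the regexp package:

```go
package main

import (
	"fmt"
	"regexp"
)

// containsLiteral reports whether text contains literal verbatim, even when
// literal includes regexp metacharacters. (Hypothetical helper.)
func containsLiteral(literal, text string) bool {
	matched, err := regexp.MatchString(regexp.QuoteMeta(literal), text)
	return err == nil && matched
}

func main() {
	// Unescaped, the dot matches any character, so "main-exe" matches too.
	fmt.Println(regexp.MatchString(`main.exe`, "main-exe")) // true <nil>

	// After QuoteMeta the dot only matches a literal dot.
	fmt.Println(containsLiteral("main.exe", "main-exe"))         // false
	fmt.Println(containsLiteral("main.exe", "run main.exe now")) // true
}
```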

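The regexp package also provides top-level functions that check whether the source contains any part that matches a regular expression. The source can be a []byte, an io.RuneReader, or a string. The result is a boolean, and if the regular expression is malformed, the returned error is not nil: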

func Match(pattern string, b []byte) (matched bool, err error)
func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)
func MatchString(pattern string, s string) (matched bool, err error)

For example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Each call reports whether any part of the input matches the pattern.
    matched, err := regexp.MatchString(`\d{3}`, "Kaohsiung 803, Road 12")
    fmt.Println(matched, err) // true <nil>
    matched, err = regexp.MatchString(`\d{4}-\d{6}`, "0970-168168")
    fmt.Println(matched, err) // true <nil>
    matched, err = regexp.MatchString(`\d{4}-\d{6}`, "Phone: 0970-168168")
    fmt.Println(matched, err) // true <nil>
}
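The program above only exercises MatchString. As a rough sketch, the other two functions and the error case can be tried the same way. Because a match anywhere in the input counts, the last two calls show one way to require that the whole string match. The helpers isValidPattern and matchesWhole are assumptions made for this example, not part of regexp.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isValidPattern reports whether pattern parses as a regular expression.
// (Hypothetical helper.)
func isValidPattern(pattern string) bool {
	_, err := regexp.MatchString(pattern, "")
	return err == nil
}

// matchesWhole reports whether the entire string s matches pattern, by
// wrapping the pattern in ^(?:...)$. (Hypothetical helper.)
func matchesWhole(pattern, s string) bool {
	matched, err := regexp.MatchString(`^(?:`+pattern+`)$`, s)
	return err == nil && matched
}

func main() {
	// Match takes a byte slice.
	matched, err := regexp.Match(`\d{3}`, []byte("Kaohsiung 803"))
	fmt.Println(matched, err) // true <nil>

	// MatchReader takes an io.RuneReader; *strings.Reader implements it.
	matched, err = regexp.MatchReader(`\d{3}`, strings.NewReader("Road 12"))
	fmt.Println(matched, err) // false <nil>

	// An unclosed group is a malformed pattern, so err is not nil.
	fmt.Println(isValidPattern(`\d{3}`))  // true
	fmt.Println(isValidPattern(`(\d{3}`)) // false

	fmt.Println(matchesWhole(`\d{4}-\d{6}`, "0970-168168"))        // true
	fmt.Println(matchesWhole(`\d{4}-\d{6}`, "Phone: 0970-168168")) // false
}
```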

You can use embedded flag expressions in a regular expression. For example, (?i)dog matches without regard to case. To apply a flag to a specific group only, use the syntax (?i:dog).
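A small sketch of both forms follows. The match helper and the sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// match reports whether s contains a match for pattern; a malformed pattern
// counts as no match. (Hypothetical helper.)
func match(pattern, s string) bool {
	matched, err := regexp.MatchString(pattern, s)
	return err == nil && matched
}

func main() {
	fmt.Println(match(`dog`, "Hot DOG"))     // false
	fmt.Println(match(`(?i)dog`, "Hot DOG")) // true

	// The flag applies only inside the group, so "hot" is still case-sensitive.
	fmt.Println(match(`hot (?i:dog)`, "hot DOG")) // true
	fmt.Println(match(`hot (?i:dog)`, "HOT DOG")) // false
}
```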

The POSIX character classes available in Go, all of which cover ASCII only, are:

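[[:alnum:]]: letters and digits (same as [0-9A-Za-z])
[[:alpha:]]: letters (same as [A-Za-z])
[[:ascii:]]: ASCII (same as [\x00-\x7F])
[[:blank:]]: space or tab (same as [\t ])
[[:cntrl:]]: control characters (same as [\x00-\x1F\x7F])
[[:digit:]]: digits (same as [0-9])
[[:graph:]]: visible characters (same as [!-~], that is [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]: lowercase letters (same as [a-z])
[[:print:]]: printable characters (same as [ -~], that is [ [:graph:]])
[[:punct:]]: punctuation (same as [!-/:-@[-`{-~])
[[:space:]]: all whitespace (same as [\t\n\v\f\r ])
[[:upper:]]: uppercase letters (same as [A-Z])
[[:word:]]: word characters (same as [0-9A-Za-z_])
[[:xdigit:]]: hexadecimal digits (same as [0-9A-Fa-f])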
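A short sketch of these classes in use, including combining several classes in one bracket expression and negating a class with [:^...:]. The match helper and the sample inputs are assumptions for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// match reports whether s contains a match for pattern; a malformed pattern
// counts as no match. (Hypothetical helper.)
func match(pattern, s string) bool {
	matched, err := regexp.MatchString(pattern, s)
	return err == nil && matched
}

func main() {
	fmt.Println(match(`^[[:xdigit:]]+$`, "CAFE2024")) // true
	fmt.Println(match(`^[[:xdigit:]]+$`, "0xCAFE"))   // false, x is not a hex digit
	fmt.Println(match(`[[:punct:]]`, "hello, world")) // true

	// Several classes and ordinary characters can share one bracket expression.
	fmt.Println(match(`^[[:alpha:][:digit:]_]+$`, "user_01")) // true

	// [:^digit:] is the negated class.
	fmt.Println(match(`^[[:^digit:]]+$`, "abc")) // true

	// These classes are ASCII-only, so a Han character is not [[:alpha:]].
	fmt.Println(match(`[[:alpha:]]`, "林")) // false
}
```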

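For Unicode properties, \p and \P mean that a character has or does not have the specified property. In \pN and \PN, N stands for a one-letter property name. For names longer than one letter, use \p{...} and \P{...}.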

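Take the general category property as an example: \pL means letter, \pN means number, and so on. To narrow the match to a subcategory, write for example \p{Lu} for uppercase letters or \p{Ll} for lowercase letters: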

fmt.Println(regexp.MatchString(`\p{Ll}`, "a")) // true <nil>
fmt.Println(regexp.MatchString(`\p{Lu}`, "a")) // false <nil>
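As a further sketch, the subcategories can be combined into a larger pattern, and \P gives the complement. The match helper and the sample names are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// match reports whether s contains a match for pattern; a malformed pattern
// counts as no match. (Hypothetical helper.)
func match(pattern, s string) bool {
	matched, err := regexp.MatchString(pattern, s)
	return err == nil && matched
}

func main() {
	// One uppercase letter followed by lowercase letters, not limited to ASCII.
	fmt.Println(match(`^\p{Lu}\p{Ll}+$`, "Justin")) // true
	fmt.Println(match(`^\p{Lu}\p{Ll}+$`, "Élan"))   // true
	fmt.Println(match(`^\p{Lu}\p{Ll}+$`, "justin")) // false

	// \PL matches any character that is not a letter.
	fmt.Println(match(`\PL`, "abc"))  // false
	fmt.Println(match(`\PL`, "abc1")) // true
}
```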

Here is a fun match: 𝟏𝟐𝟑𝟜𝟝𝟞𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟺𝟻𝟼 are all decimal digits:

fmt.Println(regexp.MatchString(`\p{Nd}`, "𝟏𝟐𝟑𝟜𝟝𝟞𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟺𝟻𝟼")) // true <nil>

What about numbers in general? ²³¹¼½¾𝟏𝟐𝟑𝟜𝟝𝟞𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟺𝟻𝟼㉛㉜㉝ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ all qualify:

fmt.Println(regexp.MatchString(`\p{N}`, "²³¹¼½¾𝟏𝟐𝟑𝟜𝟝𝟞𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟺𝟻𝟼㉛㉜㉝ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ")) // true <nil>
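Because the calls above find a match anywhere in the input, a single qualifying character is enough for true. The sketch below tests individual characters to show how the number subcategories differ, and that \d in Go covers only the ASCII digits 0 through 9. The match helper is an assumption for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// match reports whether s contains a match for pattern; a malformed pattern
// counts as no match. (Hypothetical helper.)
func match(pattern, s string) bool {
	matched, err := regexp.MatchString(pattern, s)
	return err == nil && matched
}

func main() {
	fmt.Println(match(`\d`, "𝟏𝟐𝟑"))      // false, \d is ASCII-only
	fmt.Println(match(`\p{Nd}`, "𝟏𝟐𝟑")) // true, decimal digits

	fmt.Println(match(`\p{Nd}`, "Ⅳ")) // false, not a decimal digit
	fmt.Println(match(`\p{Nl}`, "Ⅳ")) // true, letter number
	fmt.Println(match(`\p{No}`, "½")) // true, other number
	fmt.Println(match(`\p{N}`, "½"))  // true, any number
}
```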

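Some languages are written with more than one script. Japanese, for example, uses kanji, hiragana, and katakana. Others use a single script, such as Thai. Unicode groups code points by a script property, so testing for a script only requires writing the script's name, for example Han or Greek: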

fmt.Println(regexp.MatchString(`\p{Han}`, "林"))  // true <nil>
fmt.Println(regexp.MatchString(`\p{Greek}`, "α")) // true <nil>
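The Japanese case mentioned above can be sketched as follows. The hasScript helper and the sample text are made up for illustration. The helper builds the pattern from the script name, and an unknown name yields a parse error, which it reports as false.

```go
package main

import (
	"fmt"
	"regexp"
)

// hasScript reports whether s contains at least one character of the named
// Unicode script. An unknown script name makes the pattern invalid, which is
// reported as false. (Hypothetical helper.)
func hasScript(script, s string) bool {
	matched, err := regexp.MatchString(`\p{`+script+`}`, s)
	return err == nil && matched
}

func main() {
	text := "日本語のテキスト" // kanji, hiragana, and katakana in one string

	fmt.Println(hasScript("Han", text))      // true
	fmt.Println(hasScript("Hiragana", text)) // true
	fmt.Println(hasScript("Katakana", text)) // true
	fmt.Println(hasScript("Greek", text))    // false

	fmt.Println(hasScript("Thai", "ภาษาไทย")) // true
}
```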