Regular expressions (Regex) in GoLang
A regular expression (regex) is a special text string that describes a search pattern. You can think of a regex as a far more powerful form of wildcards for text. It is extremely useful for string operations such as checking whether a string contains a pattern, finding the index of a match, extracting submatches and their indexes, and replacing text.
Unfortunately, many young developers don’t use regular expressions, and some don’t even know what they are. They write many lines of code where a single regex would do the job. Knowing regex is a real advantage, and most languages, low-level or high-level, provide a regular expression library.
Regex package in Go?
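The standard Go package for regular expressions is “regexp”. It accepts the RE2 syntax, which is largely the same general syntax used by Perl, Python, and other languages (RE2 omits a few Perl features such as backreferences).
import "regexp"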
Basics
Suppose we want to test whether our string starts with “b”.
To check whether a string contains a substring matching ^b, use the “MatchString” function. The caret (^) matches the beginning of the text.
match, err := regexp.MatchString(`^b`, "batur")
fmt.Println(match) // true
fmt.Println(err) // nil (regexp is valid)
The dollar sign ($) matches the end of the text, so it goes after the character it anchors: r$ tests whether the string ends with “r”.
match, err := regexp.MatchString(`r$`, "batur")
fmt.Println(match) // true
fmt.Println(err) // nil (regexp is valid)
To check whether the full string matches, anchor both the start and the end of the regexp, as in ^b(.*)r$. The syntax (.*) matches any sequence of characters, including an empty one.
match, err := regexp.MatchString(`^b(.*)r$`, "batur")
fmt.Println(match) // true
fmt.Println(err) // nil (regexp is valid)
Regexp Object (Compile)
For more complicated queries, or when you use the same pattern many times, compile the regular expression once to create a Regexp object and call its methods.
re, err := regexp.Compile(`^b(.*)r$`)
// error if regexp invalid
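Compile reports a malformed pattern through its error value, so the usual Go error check applies. In this sketch the helper name compileOrNil is made up for illustration, and the second pattern is deliberately broken (its group is never closed).

```go
package main

import (
	"fmt"
	"regexp"
)

// compileOrNil (hypothetical helper) compiles pattern and returns nil,
// after printing the error, if the pattern is not a valid regexp.
func compileOrNil(pattern string) *regexp.Regexp {
	re, err := regexp.Compile(pattern)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return nil
	}
	return re
}

func main() {
	if re := compileOrNil(`^b(.*)r$`); re != nil {
		fmt.Println(re.MatchString("batur")) // true
	}

	// The "(" is never closed, so Compile returns a non-nil error here.
	if compileOrNil(`^b(.*r$`) == nil {
		fmt.Println("falling back to a default behavior")
	}
}
```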
MustCompile is like Compile, but instead of returning an error for an invalid regex, it panics. Use it when the pattern is a constant you know is valid, for example in a package-level variable.
re := regexp.MustCompile(`^b(.*)r$`)
// panics if the regexp is invalid
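A common idiom is to compile a fixed pattern once with MustCompile in a package-level variable and reuse it. A Regexp is safe for concurrent use by multiple goroutines, so sharing it this way is fine. The variable name below is only illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// bToR is compiled once, when the package is initialized.
// If the pattern had a typo, MustCompile would panic at startup rather than later.
var bToR = regexp.MustCompile(`^b(.*)r$`)

func main() {
	for _, s := range []string{"batur", "bar", "orkun"} {
		// The same compiled Regexp is reused for every string; nothing is recompiled.
		fmt.Println(s, bToR.MatchString(s))
	}
}
```

This prints true for "batur" and "bar" and false for "orkun".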
How to find a string (First Match)
Use the FindString method to find the text of the first match. When there is no match, it returns an empty string.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`b..`)
	fmt.Println(re.FindString("batur orkun")) // Output: bat
	fmt.Println(re.FindString("last branch")) // Output: bra
	fmt.Println(re.FindString("orkun"))       // Output: "" (empty string)
}
How to find an index (Location)
Use the FindStringIndex method to find the location of the first match. The location is a two-element slice: the first element is the index where the match begins, and the second is the index just after the match ends, so s[loc[0]:loc[1]] is the matched text. When there is no match, it returns nil.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`b..`)
	fmt.Println(re.FindStringIndex("batur orkun")) // Output: [0 3]
	fmt.Println(re.FindStringIndex("last branch")) // Output: [5 8]
	fmt.Println(re.FindStringIndex("orkun"))       // Output: [] (nil slice)
}
How to find all strings (All matches)
Use the FindAllString method to find the text of all matches. When there is no match, it returns nil.
The second parameter, n, limits the number of matches: if n is greater than 0, the method returns at most n matches; if n is negative (for example -1), it returns all of them.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`a.`)
	fmt.Printf("%q\n", re.FindAllString("batur ankara", -1)) // Output: ["at" "an" "ar"]
	fmt.Printf("%q\n", re.FindAllString("batur ankara", 2))  // Output: ["at" "an"]
	fmt.Printf("%q\n", re.FindAllString("batur ankara", 1))  // Output: ["at"]
	fmt.Printf("%q\n", re.FindAllString("orkun", -1))        // Output: [] (nil slice)
}
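The sketch below runs the same pattern with several values of n, including 0, which makes FindAllString return nil without searching. The variable name aThenAny is illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// aThenAny matches an "a" followed by any single character.
var aThenAny = regexp.MustCompile(`a.`)

func main() {
	s := "batur ankara"
	// n < 0: all matches; n == 0: nil; n > 0: at most n matches.
	for _, n := range []int{-1, 0, 1, 5} {
		matches := aThenAny.FindAllString(s, n)
		fmt.Printf("n=%d -> %q (nil: %t)\n", n, matches, matches == nil)
	}
}
```

Asking for 5 matches still yields only the three that exist ("at", "an", "ar"); the final "a" has no character after it.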
How to find a string submatch (First Match + Submatch)
To find a string submatch, use the FindStringSubmatch method. It returns a slice holding the text of the first match of the regular expression, followed by the text of each submatch (capturing group). When there is no match, it returns nil.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`n([a-z]+)([0-9]+)`)
	fmt.Println(re.FindStringSubmatch("number1234 is wrong")) // Output: [number1234 umber 1234]
	fmt.Println(re.FindStringSubmatch("number is wrong"))     // Output: [] (nil slice)
}
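Since FindStringSubmatch returns nil when nothing matches, indexing the result without a check would panic. This sketch wraps the call in a small helper (splitWordNumber, a hypothetical name) that reports whether a match was found before reading the groups.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordThenDigits = regexp.MustCompile(`n([a-z]+)([0-9]+)`)

// splitWordNumber (hypothetical helper) returns the two capturing groups of the first match.
// ok is false when there is no match, so the caller never indexes a nil slice.
func splitWordNumber(s string) (word, digits string, ok bool) {
	m := wordThenDigits.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	// m[0] is the whole match; m[1] and m[2] are the groups, in order.
	return m[1], m[2], true
}

func main() {
	if w, d, ok := splitWordNumber("number1234 is wrong"); ok {
		fmt.Println(w, d) // umber 1234
	}
	_, _, ok := splitWordNumber("number is wrong")
	fmt.Println(ok) // false
}
```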
How to Replace
Use the ReplaceAllString method to replace the text of all matches. It returns a copy of the text in which every match is replaced by the replacement string. When there is no match, the text is returned unchanged.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`place`)
	c := "play"
	fmt.Println(re.ReplaceAllString("replace string", c)) // Output: replay string
	fmt.Println(re.ReplaceAllString("replay string", c))  // Output: replay string (no match)
	fmt.Println(re.ReplaceAllString("no played", c))      // Output: no played (no match)
	fmt.Println(re.ReplaceAllString("palace", c))         // Output: palace (no match)
}
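ReplaceAllString also expands $1, $2, … (or ${name} for named groups) in the replacement to the text of the corresponding capturing group; ReplaceAllLiteralString inserts the replacement as-is. One caveat: $1x is read as a group named "1x", so write ${1}x when a letter must follow the group. A minimal sketch, where firstLast and swapWords are illustrative names:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstLast captures two words separated by a single space.
var firstLast = regexp.MustCompile(`(\w+) (\w+)`)

// swapWords (hypothetical helper) swaps the two words using group references.
func swapWords(s string) string {
	return firstLast.ReplaceAllString(s, "$2 $1")
}

func main() {
	fmt.Println(swapWords("batur orkun")) // orkun batur

	// ReplaceAllLiteralString does not expand $, so the template text is inserted unchanged.
	fmt.Println(firstLast.ReplaceAllLiteralString("batur orkun", "$2 $1")) // $2 $1
}
```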
How to Split (Explode String)
Use the Split method to split (explode) the text around the matches of the expression. It returns a slice of the substrings between those matches. When there is no match, it returns a one-element slice containing the whole text.
The second parameter, n, limits the number of substrings: if n is greater than 0, the method returns at most n substrings and the last one is the unsplit remainder; if n is negative, it returns all substrings.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`\s+`)
	fmt.Printf("%q\n", re.Split("the number is 5", -1))              // Output: ["the" "number" "is" "5"]
	fmt.Printf("%q\n", re.Split("batur orkun ankara turkey", 3))    // Output: ["batur" "orkun" "ankara turkey"]
	fmt.Printf("%q\n", re.Split("baturorkun", 1))                   // Output: ["baturorkun"]
}
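A practical use of Split is breaking a comma-separated list whose spacing is inconsistent. In this sketch the separator pattern absorbs any whitespace around each comma; the variable name commaSep is illustrative. It also shows that n == 0 returns nil.

```go
package main

import (
	"fmt"
	"regexp"
)

// commaSep matches a comma together with any whitespace around it.
var commaSep = regexp.MustCompile(`\s*,\s*`)

func main() {
	fields := commaSep.Split("ankara , istanbul,izmir ,  bursa", -1)
	fmt.Printf("%q\n", fields) // ["ankara" "istanbul" "izmir" "bursa"]

	// n == 0 returns nil; n > 0 caps the count and leaves the remainder unsplit.
	fmt.Println(commaSep.Split("a,b,c", 0) == nil)   // true
	fmt.Printf("%q\n", commaSep.Split("a, b, c", 2)) // ["a" "b, c"]
}
```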
Basic Regex Cheat Sheet
I advise you to simply memorize the common character symbols and groupings.
Symbols
- “^” : Matches the beginning of the string.
- “$” : Matches the end of the string.
- “.” : Matches any single character except a newline (\n).
- “*” : Matches the preceding expression zero or more times.
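As a quick check of these symbols, this sketch shows that . does not match a newline by default, while the (?s) flag supported by RE2 lets . match \n as well. The variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// ^ and $ anchor the pattern; .* matches any run of characters except "\n".
	singleLine = regexp.MustCompile(`^b.*r$`)
	// The (?s) flag lets . match "\n" as well.
	dotAll = regexp.MustCompile(`(?s)^b.*r$`)
)

func main() {
	text := "ba\nr"
	fmt.Println(singleLine.MatchString("batur")) // true
	fmt.Println(singleLine.MatchString(text))    // false: . stops at the newline
	fmt.Println(dotAll.MatchString(text))        // true
}
```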
Characters
- “\d” : Any single digit character. Ex: 1, 2, 3, …
- “\w” : Any word character (alphanumeric & underscore). Ex: a, A, b, B, …
- “\W” : Any character that is not a word character. Ex: *, -, +, =
- “\D” : Any character that is not a digit. Ex: a, B, +
- “\s” : Any whitespace character. Ex: space, tab, newline
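The character shorthands are easiest to compare on one input. This sketch runs \d+, \D+ and \w+ over the same (made-up) sentence; the variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	digitRuns    = regexp.MustCompile(`\d+`) // runs of digits
	nonDigitRuns = regexp.MustCompile(`\D+`) // runs of anything except digits
	wordRuns     = regexp.MustCompile(`\w+`) // runs of letters, digits, underscore
)

func main() {
	s := "order_42 costs 15 TL"
	fmt.Printf("%q\n", digitRuns.FindAllString(s, -1))    // ["42" "15"]
	fmt.Printf("%q\n", nonDigitRuns.FindAllString(s, -1)) // ["order_" " costs " " TL"]
	fmt.Printf("%q\n", wordRuns.FindAllString(s, -1))     // ["order_42" "costs" "15" "TL"]
}
```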
Classes
- “[abc]” : Character set: any single character listed within the brackets. Values: a, b, c
- “[a-z]” : Character range: any character from “a” to “z”, i.e. any lowercase letter. Ex: a, b, c, …
- “[a-z]+” : One or more of any of the characters in the set.
- “[^a-z]” : Inside a character set, ^ means negation. This example matches any character that is NOT a lowercase letter.
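A small sketch of the three kinds of character classes applied to one sample string; the input and variable names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	anyOfABC     = regexp.MustCompile(`[abc]`)  // one of a, b or c
	lowercaseRun = regexp.MustCompile(`[a-z]+`) // one or more lowercase letters
	notLowercase = regexp.MustCompile(`[^a-z]`) // one character that is not lowercase
)

func main() {
	s := "cab Bay 42"
	fmt.Printf("%q\n", anyOfABC.FindAllString(s, -1))     // ["c" "a" "b" "a"]
	fmt.Printf("%q\n", lowercaseRun.FindAllString(s, -1)) // ["cab" "ay"]
	fmt.Printf("%q\n", notLowercase.FindAllString(s, -1)) // [" " "B" " " "4" "2"]
}
```

Note that the uppercase “B” in “Bay” is not matched by [abc] or [a-z]+, because the classes are case-sensitive.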
Logic
- “|” : OR (alternation) operator. Ex: 22|33
- “( … )” : Capturing group. Ex: B(arman|eer)
- “(?: … )” : Non-capturing group. Ex: B(?:arman|eer)
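The difference between a capturing and a non-capturing group shows up in FindStringSubmatch and NumSubexp: both patterns below match the same text, but only the capturing one reports the group. The variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	capturing    = regexp.MustCompile(`B(arman|eer)`)
	nonCapturing = regexp.MustCompile(`B(?:arman|eer)`)
	doubles      = regexp.MustCompile(`22|33`)
)

func main() {
	fmt.Printf("%q\n", capturing.FindStringSubmatch("Beer"))    // ["Beer" "eer"]
	fmt.Printf("%q\n", nonCapturing.FindStringSubmatch("Beer")) // ["Beer"]

	// NumSubexp counts only capturing groups.
	fmt.Println(capturing.NumSubexp(), nonCapturing.NumSubexp()) // 1 0

	// Alternation: either "22" or "33".
	fmt.Printf("%q\n", doubles.FindAllString("22 33 44", -1)) // ["22" "33"]
}
```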
Quantifiers
- “+” : Matches the preceding expression 1 or more times.
- “{3}” : Exactly three times. Ex: \d{3} -> 123, 321
- “?” : The preceding expression is optional (matches 0 or 1 times).
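Combining the quantifiers with anchors gives simple format checks. The seven-digit pattern below is a made-up format used only for illustration, not a real phone-number rule; the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \d{3}: exactly three digits; -?: an optional dash; \d{4}: exactly four digits.
	sevenDigits = regexp.MustCompile(`^\d{3}-?\d{4}$`)
	// o+: one or more "o" characters between "g" and "gle".
	goPlusGle = regexp.MustCompile(`^go+gle$`)
)

func main() {
	for _, s := range []string{"555-1234", "5551234", "55-51234"} {
		fmt.Println(s, sevenDigits.MatchString(s)) // true, true, false
	}
	for _, s := range []string{"gogle", "google", "ggle"} {
		fmt.Println(s, goPlusGle.MatchString(s)) // true, true, false
	}
}
```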
If you want to learn more about regex, you can check this website https://www.rexegg.com/regex-quickstart.html
If you want to try some queries online, you can check these websites: https://regexr.com or https://regex101.com