String Manipulations in Bash
--
You need string manipulation no matter what your role is. You may be a developer on a software team, a Linux administrator on a systems team, or a member of a DevOps team. Especially if you are a systems or DevOps engineer, you probably work with several languages, and it is nearly impossible to be an expert in string manipulation in all of them. But if you are an expert in Bash, it matters much less which language you are using when you need advanced string manipulation, because Bash is a universal language in the Linux world and you can call system commands from your main language at any time. So let's become experts in string manipulation in Bash. I am sure your life will be easier after that. :)
Bash supports a surprising number of string manipulation operations, far too many to cover here. But do not panic: I have been working on Linux for many years, and I know the most useful ones. First, we will glance at some basic operations, even though they sound very simple. After this fundamental knowledge, we will study real examples of more advanced operations.
Note: I will assume you already have basic Linux and Bash knowledge.
Length
We can access the length of a string using the hash (#) operator.
$ VAR=Batur
$ echo ${#VAR}
Output:
5
Substrings
We can extract a substring using the colon (:) operator: ${VAR:offset} or ${VAR:offset:length}. Offsets start at 0.
$ VAR=Batur
$ echo ${VAR:1}
$ echo ${VAR:2}
$ echo ${VAR:1:3}
Output:
atur
tur
atu
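A minimal sketch of two details that often surprise people: offsets are zero-based, and a negative offset counts from the end of the string. The space before the minus sign matters, because ${VAR:-3} is a different expansion (a default value, used only when VAR is unset or empty).

```shell
#!/usr/bin/env bash
# Offsets are zero-based; a negative offset counts back from the end of the string.
VAR=Batur
echo "${VAR:0:1}"     # first character: B
echo "${VAR: -3}"     # last three characters: tur (the space before "-" is required)
echo "${VAR: -3:2}"   # two characters, starting three from the end: tu
echo "${VAR:-3}"      # no space: default-value expansion, VAR is set, so this prints Batur
```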
Substring Match
The following syntaxes delete or replace a match of a pattern in $string. The pattern can be literal text or a glob pattern using *, ? and [...].
${string##sub}
The syntax deletes the longest match of $sub from the front of $string
$ VAR="Batur Orkun"
$ echo ${VAR##Batur}
Output:
Orkun
${string%%sub}
The syntax deletes the longest match of $sub from the back of $string
$ VAR="Batur Orkun"
$ echo ${VAR%%Orkun}
Output:
Batur
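With literal text, # and ## (or % and %%) behave the same. The difference appears with wildcards: a single # or % removes the shortest match, while ## or %% removes the longest. This sketch uses a hypothetical file name and path to show the common "extension" and "basename/dirname" tricks.

```shell
#!/usr/bin/env bash
# Hypothetical file name and path, used only for illustration.
FILE="backup.tar.gz"
echo "${FILE#*.}"     # shortest match of "*." removed from the front: tar.gz
echo "${FILE##*.}"    # longest match of "*." removed from the front: gz
echo "${FILE%.*}"     # shortest match of ".*" removed from the back: backup.tar
echo "${FILE%%.*}"    # longest match of ".*" removed from the back: backup

P="/var/log/nginx/access.log"
echo "${P##*/}"       # everything up to the last "/" removed: access.log
echo "${P%/*}"        # the last "/" and everything after it removed: /var/log/nginx
```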
${string/pattern/replacement}
It matches the pattern against the value of $string and replaces only the first match with the replacement.
$ VAR="Batur Orkun"
$ echo ${VAR/r/R}
Output:
BatuR Orkun
${string//pattern/replacement}
Replace all the matches
$ VAR="Batur Orkun"
$ echo ${VAR//r/R}
Output:
BatuR ORkun
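Two more Bash variations of the replacement syntax, sketched below: an empty replacement deletes the matches, and a leading # or % anchors the pattern to the start or the end of the string.

```shell
#!/usr/bin/env bash
VAR="Batur Orkun"
echo "${VAR//r/}"         # empty replacement deletes every "r": Batu Okun
echo "${VAR/#Batur/B.}"   # "#" anchors the pattern at the start: B. Orkun
echo "${VAR/%Orkun/O.}"   # "%" anchors the pattern at the end: Batur O.
```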
Note: Regular expressions (RegEx) are an important tool for string manipulation. You should learn at least the basics; knowing them makes life much easier when you are struggling with strings. But you must be careful when using them.
For example:
digit="456"
if [[ $digit =~ [0-9] ]]; then
  echo "$digit is a digit"
else
  echo "$digit is NO digit"
fi
Output:
456 is a digit
This is another simple example of using RegEx in Bash, but the logic is not right: [0-9] matches any string that contains at least one digit anywhere. If you set digit to “456a”, the output still says it is a digit. So we should fix it by anchoring the pattern to the whole string.
digit="456a"
if [[ $digit =~ ^-?[0-9]+$ ]]; then
  echo "$digit is a digit"
else
  echo "$digit is NO digit"
fi
Output:
456a is NO digit
Basic & Common RegEx Operators:
- The ^ indicates the beginning of the input.
- The - is a literal "-".
- The ? means "0 or 1 of the preceding item (-)".
- The + means "1 or more of the preceding item ([0-9])".
- The $ indicates the end of the input.
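To see why the anchors matter, this sketch runs the unanchored and the anchored pattern against a few sample values side by side. The helper name is_integer is hypothetical; the regex is kept in a variable because a quoted pattern on the right of =~ is matched as a literal string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 is an optional "-" followed by one or more digits.
is_integer() {
  local re='^-?[0-9]+$'     # unquoted $re below, so it is treated as a regex, not literal text
  [[ $1 =~ $re ]]
}

loose='[0-9]'               # the first, unanchored pattern: any digit anywhere
for value in 456 456a -12 12-3 abc; do
  if [[ $value =~ $loose ]]; then l=yes; else l=no; fi
  if is_integer "$value"; then s=yes; else s=no; fi
  printf '%-5s loose=%-3s anchored=%s\n' "$value" "$l" "$s"
done
```

Only 456 and -12 pass the anchored check; 456a and 12-3 pass the loose one because they contain a digit somewhere.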
Bash has a lot of magic for string operations. You can also use utilities that come preinstalled on most Linux systems. For example, grep, sed, and awk are very useful command-line utilities.
grep = global regular expression print
It is arguably the most popular and most used of the three. It searches for a string in the output of a command or in one or more files.
$ grep "batur" myfile
In this example, grep goes through every line of the file “myfile” and prints every line that contains the string “batur”.
grep has lots of options, and some of them are very useful. For example, if you need line numbers, use the -n option.
$ grep -n "batur" myfile
You can search in many files at once:
$ grep "batur" myfile1 myfile2
It prints every matching line prefixed with its filename. If you only need the filenames, use the -l option.
$ grep -l "batur" myfile1 myfile2
grep seems to have thought of everything you need, so have a look at the output of “man grep”. Here are a few useful options:
- -c : Print only a count of the lines that contain the pattern.
- -l : Print only the names of files with matching lines, separated by newline characters.
- -i : Ignore upper/lower case distinctions during comparisons.
- -n : Prefix each line with its line number in the file (the first line is 1).
- -v : Print all lines except those that contain the pattern.
- -r : Recursively search for the pattern in all files in the given directory and its subdirectories.
- -w : Match only whole words.
I think the -i option is especially important. Searches are normally case-sensitive; if you want a case-insensitive search, use the -i option.
$ grep -i "batur" myfile
It also finds words like “Batur”, “BATUR”, or “baTur”. Even better, you can use RegEx with grep.
For example, to get only the lines ending with “Orkun”:
$ grep "Orkun$" myfile
To get only the lines that consist exactly of “Batur Orkun”:
$ grep "^Batur Orkun$" myfile
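Here is a small self-contained sketch of several options from the list above. It writes a few sample lines to a temporary file that stands in for “myfile”; the file contents are made up for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                          # scratch file standing in for "myfile"
trap 'rm -f "$tmp"' EXIT               # remove it when the script exits
printf '%s\n' "Batur Orkun" "batur is here" "Hatice Ebru Orkun" "baturhan" > "$tmp"

grep -c "batur" "$tmp"     # case-sensitive count: 2 ("batur is here", "baturhan")
grep -ic "batur" "$tmp"    # case-insensitive count: 3
grep -iwc "batur" "$tmp"   # whole words only, so "baturhan" no longer counts: 2
grep -n "Orkun$" "$tmp"    # lines ending in "Orkun", prefixed with line numbers 1 and 3
grep -vi "batur" "$tmp"    # lines without "batur" in any case: Hatice Ebru Orkun
```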
Notice: pgrep is a grep-like command for processes; its name is often expanded as “Process-ID Global Regular Expression Print”. pgrep looks through the currently running processes and lists the IDs of those whose names match a pattern. It is useful when all you want to know is the process ID of a process.
$ pgrep nginx
If there are running processes whose names match “nginx”, their PIDs are displayed on the screen. If no matches are found, the output is empty.
Example output:
4567
76788
234
sed: stream editor
sed performs editing operations on text coming from standard input or a file. It can search, find and replace, insert, and delete. sed supports regular expressions, which allow it to perform complex pattern matching.
For example, suppose you want to change the content of a file.
$ sed 's/devops/DevOps/' myfile
It replaces the first “devops” on each line with “DevOps”. To replace every occurrence on a line, add the g flag: 's/devops/DevOps/g'.
sed prints the changed content to standard output and leaves the file itself untouched. You can save the result to another file with the “>” redirection operator, or edit the original file in place.
$ sed "s/devops/DevOps/" myfile > newfile
or
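$ sed -i "s/devops/DevOps/" myfile
“-i” : Edit files in place.
“-n” : Suppress automatic printing of pattern space (--quiet, --silent).
You do not need “-n” to keep the output off the screen: with “-i”, sed writes the result into the file instead of standard output. Combining “-n” with “-i” would actually empty the file, because nothing is written to it unless you print lines explicitly with the “p” command.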
If you use sed at all, you will probably want to know these commands.
“s” : substitute
The “s” command is probably the most important command in sed and has a lot of different options. Its basic concept is simple: it tries to match the pattern space against the supplied regexp. If the match is successful, the portion of the pattern space that matched is replaced with the replacement. We used it above.
Syntax: "s/regexp/replacement/flags"
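A minimal sketch of the g flag and of in-place editing, run against a throwaway file. It assumes a sed that accepts a backup suffix attached to -i (as in -i.bak), which GNU sed and BSD sed both do; the file contents are made up.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                              # scratch file standing in for "myfile"
trap 'rm -f "$tmp" "$tmp.bak"' EXIT
printf '%s\n' "devops and devops" "nothing to change" > "$tmp"

sed 's/devops/DevOps/' "$tmp"              # first match per line only: "DevOps and devops"
sed 's/devops/DevOps/g' "$tmp"             # "g" flag, every match: "DevOps and DevOps"
sed -i.bak 's/devops/DevOps/g' "$tmp"      # edit in place, keep the original as "$tmp.bak"
cat "$tmp"                                 # the file now holds the changed content
```

Keeping a backup suffix is a cheap safety net while you are still testing a sed expression on real files.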
“p”: Print
Print out the pattern space (to the standard output).
Syntax: "/pattern/ command" (the /pattern/ address selects the lines the command applies to)
For example, suppose you want to list the files or directories whose names include “picus”:
$ ls -l | sed -n '/picus/ p'
Maybe you only need directories, not files:
$ ls -l | sed -n '/picus/ p' | grep '^d'
You can also delete unwanted lines with sed:
$ sed -i '/picus/ d' myfile
This command deletes the lines that contain the word “picus” and edits your original file, because of the “-i” option.
“d”: Delete command
Delete the pattern space and start the next cycle, so the matching line is not printed.
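The “p” and “d” commands side by side, as a small sketch on made-up input piped in with printf instead of a real file or “ls -l” output. The last line shows that an address can also be a range of line numbers instead of a pattern.

```shell
#!/usr/bin/env bash
# Hypothetical input standing in for "ls -l" output or the lines of a file.
input=$(printf '%s\n' alpha picus-1 beta picus-2)

kept=$(printf '%s\n' "$input" | sed -n '/picus/ p')   # print only the matching lines
dropped=$(printf '%s\n' "$input" | sed '/picus/ d')   # delete matching lines, print the rest
middle=$(printf '%s\n' "$input" | sed -n '2,3 p')     # address by line numbers: lines 2 to 3

echo "$kept"      # picus-1, picus-2
echo "$dropped"   # alpha, beta
echo "$middle"    # picus-1, beta
```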
AWK
A text pattern scanning and processing language. Yes, you read that right! AWK is a text-processing programming language. It is one of Perl's predecessors and is still very useful on modern systems.
AWK can be used to:
- Perform arithmetic and string operations.
- Scan a file line by line.
- Split each input line into fields.
- Compare input lines/fields to a pattern.
- Perform actions on matched lines.
- Produce formatted reports.
- Use conditionals and loops.
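An AWK program can have three sections: an optional BEGIN{} section, the main {} section, and an optional END{} section.
BEGIN { … initialization commands, run once before any input is read … }
{ … main commands, run for every input line … }
END { … finalization commands, run once after the last line … }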
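A minimal sketch of the three sections working together. Three sample file sizes are piped in as input; BEGIN prints a header, the main block prints and adds up each value, and END prints the total and the average (NR still holds the number of lines read when END runs).

```shell
#!/usr/bin/env bash
# Three sample file sizes, one per line, stand in for real input.
report=$(printf '%s\n' 535 1890 7836 | awk '
  BEGIN { sum = 0; print "size" }                            # once, before the first line
  { sum += $1; print $1 }                                     # once per input line
  END { printf "total %d, average %.1f\n", sum, sum / NR }    # once, after the last line
')
echo "$report"    # size, 535, 1890, 7836, then "total 10261, average 3420.3"
```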
When you run the “ls -l” command in the terminal, you may see output like this:
-rw-rw-r--. 1 centos centos 535 Mar 20 2020 report-security-group-unused.py
-rw-rw-r--. 1 centos centos 1890 Feb 24 2020 report-security-group.py
-rw-rw-r--. 1 centos centos 7836 Feb 14 2020 report-vectors-hb-play.py
For example, to find the total size of the listed files:
$ ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
Output:
10261
If you split each line of this output on whitespace, the size is the fifth column.
An input line is made up of fields, normally separated by whitespace. If you want a different separator, set the field separator FS (for example with the -F option); it can be a regular expression. The fields are called $1, $2, …, while $0 refers to the entire line. In GNU awk, if FS is the empty string, each character of the line becomes a separate field.
Now you know why the size is “$5”. Here is the -F option with a different separator:
$ echo "A-B-C-D-E" | awk -F "-" '{ print $2 }'
Output:
B
$ echo "A-B-C-D-E" | awk -F "-" '{ print $1,$5 }'
Output:
A E
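The -F examples above can be taken a little further. This sketch uses a hypothetical /etc/passwd-style record to show that setting FS in a BEGIN block is equivalent to -F, that NF holds the number of fields (so $NF is the last field), and that the separator can be a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical /etc/passwd-style record: seven fields separated by ":".
line='root:x:0:0:root:/root:/bin/bash'

echo "$line" | awk 'BEGIN { FS = ":" } { print $1, $NF }'   # root /bin/bash
echo "$line" | awk -F ':' '{ print NF }'                    # 7, the number of fields
echo 'a,b;c' | awk -F '[,;]' '{ print $3 }'                 # c, split on "," or ";"
```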
You can also operate on files. Here is an example file called “myfile”:
1) Name Surname City
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95
For example, to print the name and city columns in a simple format:
$ awk '{print $2 "=" $4}' myfile
Output:
Name=City
Batur=Ankara
HaticeEbru=Istanbul
$ awk '/Orkun/ {print $0}' myfile
Output: (the lines that contain “Orkun”)
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95
$ awk '/Orkun/{++cnt} END {print "Count =", cnt}' myfile
Output:
Count = 2
$ awk 'length($2) > 5' myfile
The “length” function returns the length of a string.
Output: (the lines whose second field is longer than 5 characters)
3) HaticeEbru Orkun Istanbul 95
What if you want the lines whose fifth column value is greater than 70?
$ awk '{if ($5>70) print}' myfile
Output:
3) HaticeEbru Orkun Istanbul 95
What if you want to get the line numbers without the closing parenthesis?
$ awk '{ print substr($1, 1, 1) }' myfile
The “substr” function returns the part of a string that starts at the given position (the first character is position 1) and has the given length.
Output:
1
2
3
What if you don’t want to print the first line?
$ awk '{if(NR>1)print}' myfile
The NR variable holds the current line (record) number.
Output:
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95
What if you want to list only the lines that have 4 columns?
$ awk '{if (NF==4) print}' myfile
The NF variable holds the number of fields in the current line.
Output:
1) Name Surname City
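A short sketch that recreates “myfile” in a scratch directory and shows a few variations on the commands above: a bare condition works like if (...) print, sub() strips the parenthesis even for multi-digit labels such as “10)”, and NR > 1 skips the header while averaging the score column. These are alternatives for illustration, not the only way to write them.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1      # scratch directory so the sketch is self-contained
cat > myfile <<'EOF'
1) Name Surname City
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95
EOF

# A bare condition with no action block prints the matching lines.
awk 'NR > 1 && $5 > 70' myfile                        # 3) HaticeEbru Orkun Istanbul 95

# sub() removes a trailing ")" from the first field, then prints it: 1, 2, 3
awk '{ sub(/\)$/, "", $1); print $1 }' myfile

# Average of the fifth column, skipping the header line.
awk 'NR > 1 { sum += $5; n++ } END { if (n) printf "average %.1f\n", sum / n }' myfile   # average 82.5
```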
Notice: You may come across different awk implementations:
NAWK stands for “New AWK”. This is AT&T’s updated version of awk.
MAWK is a fast implementation that supports most of the standard features. It is smaller and faster than gawk, but it has limits on NF and the sprintf buffer size.
GAWK stands for “GNU AWK”. Most Linux distributions ship it, although Debian and Ubuntu install mawk as the default awk. It is compatible with POSIX awk and NAWK and adds many extensions.
Take care and do not forget…:)
All the best people in life seem to like LINUX. (S. Wozniak)