String Manipulations in Bash

Everyone needs string manipulation, whatever the role: a developer on the software team, a Linux administrator on the systems team, or a member of the DevOps team. That usually means learning string manipulation in several languages. System and DevOps engineers in particular work with many languages, and it is nearly impossible to be an expert at string manipulation in all of them. If you are an expert in Bash, though, it matters much less which language you are using when you need an advanced string operation, because Bash is the universal language of the Linux world and you can call a system command from your main language at any time. So let’s become experts at string manipulation in Bash. I am sure your life will be easier afterwards. :)

Bash supports a surprising number of string manipulation operations, far too many to cover here. Do not panic: I have been working with Linux for many years and I know the most useful ones. We will first glance at some basic operations, even though they may look very simple. With that foundation in place, we will study real examples of more advanced operations.

Note: I assume you know the fundamentals of Linux and Bash.

Length

$ VAR=Batur
$ echo ${#VAR}

Output:

5

Substrings

$ VAR=Batur
$ echo ${VAR:1}
$ echo ${VAR:2}
$ echo ${VAR:1:3}

Output:

atur
tur
atu
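
Offsets can also count from the end of the string. The following is a minimal sketch, assuming Bash 4.2 or newer (needed for the negative length); the variable names last_two and middle are only for illustration.

```shell
VAR=Batur

# A negative offset counts from the end. The space before the minus sign
# is required, otherwise Bash reads it as the ":-" default-value operator.
last_two=${VAR: -2}

# A negative length means "stop this many characters before the end".
middle=${VAR:1:-1}

echo "$last_two"   # ur
echo "$middle"     # atu
```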

Substring Match

${string##sub}

This syntax deletes the longest match of $sub from the front of $string.

$ VAR="Batur Orkun"
$ echo ${VAR##Batur}

Output:

Orkun

${string%%sub}

This syntax deletes the longest match of $sub from the back of $string.

$ VAR="Batur Orkun"
$ echo ${VAR%%Orkun}

Output:

Batur
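
The pattern does not have to be literal text. It can be a glob, and a single # or % deletes the shortest match instead of the longest. Here is a small sketch of how this is often used to split a file path; the path itself is a made-up example.

```shell
path="/var/log/nginx/access.log.gz"   # hypothetical path, for illustration only

file=${path##*/}    # longest match of "*/" from the front  -> access.log.gz
dir=${path%/*}      # shortest match of "/*" from the back  -> /var/log/nginx
ext=${file##*.}     # longest match of "*." from the front  -> gz
base=${file%%.*}    # longest match of ".*" from the back   -> access
rest=${file#*.}     # shortest match of "*." from the front -> log.gz

printf '%s\n' "$file" "$dir" "$ext" "$base" "$rest"
```

Quoting the expansions, as printf does here, keeps any spaces in the result intact.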

${string/pattern/replacement}

It matches the pattern against $string and replaces only the first match with the replacement.

$ VAR="Batur Orkun"
$ echo ${VAR/r/R}

Output:

BatuR Orkun

${string//pattern/replacement}

Replaces all matches of the pattern, not just the first one.

$ VAR="Batur Orkun"
$ echo ${VAR//r/R}

Output:

BatuR ORkun
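
The replacement syntax has a few more variants that are worth knowing. This is a short sketch, not a complete list: /# anchors the match at the start of the string, /% anchors it at the end, and an empty replacement deletes the matches.

```shell
VAR="Batur Orkun"

all=${VAR//r/R}          # every match            -> BatuR ORkun
front=${VAR/#Batur/B.}   # only at the very start -> B. Orkun
back=${VAR/%Orkun/O.}    # only at the very end   -> Batur O.
gone=${VAR// /}          # empty replacement      -> BaturOrkun

printf '%s\n' "$all" "$front" "$back" "$gone"
```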

Note: Regular expressions (RegEx) are an important tool for string manipulation, and you should learn at least the basics. Knowing them makes life much easier when you are struggling with strings, but you must be careful when using them.

For example:

digit="456"
if [[ $digit =~ [0-9] ]]; then
    echo "$digit is a digit"
else
    echo "$digit is NO digit"
fi

Output:

456 is a digit

This is a simple example of using RegEx in Bash, but the logic is not right: the pattern only checks that the value contains a digit somewhere. If you set digit to “456a”, the output still says it is a digit. So we should fix it.

digit="456a"
if [[ $digit =~ ^-?[0-9]+$ ]]; then
    echo "$digit is a digit"
else
    echo "$digit is NO digit"
fi

Output:

456a is NO digit

Basic & Common RegEx Operators:

  • The ^ indicates the beginning of the input pattern
  • The - is a literal "-"
  • The ? means "0 or 1 of the preceding (-)"
  • The + means "1 or more of the preceding ([0-9])"
  • The $ indicates the end of the input pattern
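
The same operators can be wrapped in a small function, and parentheses in the pattern capture parts of the match into the BASH_REMATCH array. The sketch below assumes a hypothetical helper name (is_integer) and a made-up input string. Keeping the pattern in a variable avoids quoting surprises inside [[ ]].

```shell
# Hypothetical helper: succeeds only if the whole argument is an
# optionally signed integer.
is_integer() {
    [[ $1 =~ ^-?[0-9]+$ ]]
}

is_integer "456"  && echo "456 is an integer"
is_integer "456a" || echo "456a is not an integer"

# Capture groups: BASH_REMATCH[0] is the whole match, [1], [2], ... the groups.
re='^([a-z]+)/([0-9]+)\.([0-9]+)'
version="nginx/1.18.0"   # made-up input, for illustration only
if [[ $version =~ $re ]]; then
    name=${BASH_REMATCH[1]}
    major=${BASH_REMATCH[2]}
    minor=${BASH_REMATCH[3]}
    echo "$name major=$major minor=$minor"   # nginx major=1 minor=18
fi
```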

You are not limited to Bash’s built-in string operations. You can also use the utilities that come preinstalled on most Linux systems. For example, grep, sed, and awk are very useful command-line utilities.

grep = global regular expression print

$ grep "batur" myfile

In this example, grep goes through every line of the file “myfile” and prints every line that contains “batur”.

grep accepts many options, and some of them are especially useful. For example, if you need line numbers, use the “-n” option.

$ grep -n "batur" myfile

You can also search many files at once, for example every file in the current directory:

$ grep "batur" *

It prints every matching line prefixed with its filename. If you only need the filenames, use the “-l” option.

$ grep -l "batur" *

grep has an option for almost everything you might need. You should glance at the output of “man grep”, but here are a few useful options.

  • “-c”: Print only a count of the lines that contain the pattern.
  • “-l”: Print only the names of files with matching lines, separated by newline characters.
  • “-i”: Ignore upper/lower case distinctions during comparisons.
  • “-n”: Precede each line with its line number in the file (the first line is 1).
  • “-v”: Print all lines except those that contain the pattern.
  • “-r”: Search for the pattern recursively in all files in the current directory and its subdirectories.
  • “-w”: Match whole words only.
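
These options can be combined. The following sketch builds a small temporary file (its content is invented for the example) and counts the matches in a few different ways.

```shell
tmp=$(mktemp)
printf '%s\n' "batur orkun" "Batur Orkun" "baturay" "ankara" > "$tmp"

count_exact=$(grep -c "batur" "$tmp")     # case-sensitive match
count_nocase=$(grep -ci "batur" "$tmp")   # ignore case
count_word=$(grep -ciw "batur" "$tmp")    # ignore case, whole words only
others=$(grep -vi "batur" "$tmp")         # lines that do NOT match

echo "$count_exact $count_nocase $count_word"   # 2 3 2
echo "$others"                                  # ankara

rm -f "$tmp"
```

“baturay” counts as a match for the first two searches but not for “-w”, because there the pattern is only part of a longer word.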

The “-i” option deserves a closer look. Normally, searches are case-sensitive. If you want a case-insensitive search, use the “-i” option.

$ grep -i "batur" myfile

It also finds words like “Batur”, “BATUR”, or “baTur”. Even better, you can use RegEx with grep.

For example, if you want only the lines ending with “Orkun”:

$ grep "Orkun$" myfile

If you want only the lines that consist of exactly “Batur Orkun”:

$ grep "^Batur Orkun$" myfile

Note: pgrep is a grep-like command for processes. Its name is commonly expanded as “Process-ID Global Regular Expression Print”. pgrep looks through the currently running processes and lists the process IDs of those that match. It is useful when all you want is the numeric ID of a process.

$ pgrep nginx

If there are running processes whose names match “nginx”, their PIDs are displayed on the screen. If no matches are found, the output is empty.

Example successful output

4567
76788
234

sed: stream editor

For example, say you want to change the content of a file.

$ sed 's/devops/DevOps/' myfile

It changes the first “devops” on each line to “DevOps”. To change every occurrence on a line, add the “g” flag: 's/devops/DevOps/g'.

The output is the changed content of the file; the file itself is not modified. You can save the output to another file with the “>” redirection operator, or edit the original file in place.

$ sed "s/devops/DevOps/" myfile > newfile

or

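$ sed -i "s/devops/DevOps/" myfile

“-i” : Edit files in place

“-n” : Suppress automatic printing of pattern space. (--quiet, --silent)

With “-i”, sed writes the result back to the file instead of printing it, so “-n” is not needed here. Do not combine “-n” with “-i” for a plain substitution: nothing would be printed, and the file would end up empty. “-n” becomes useful together with the “p” command described below.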
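
Because an in-place edit overwrites the file, it can be worth keeping a backup. The sketch below assumes GNU or BSD sed, both of which accept a backup suffix attached directly to “-i”. The file is a temporary one created just for the example.

```shell
tmp=$(mktemp)
printf '%s\n' "devops devops" "we love devops" > "$tmp"

# Edit in place, keep the original as "$tmp.bak".
# The g flag replaces every match on a line, not only the first.
sed -i.bak 's/devops/DevOps/g' "$tmp"

new=$(cat "$tmp")
old=$(cat "$tmp.bak")
echo "$new"

rm -f "$tmp" "$tmp.bak"
```

After the command runs, the edited file contains “DevOps” everywhere, while the .bak copy still holds the original text.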

If you use sed at all, you will probably want to know these commands.

“s” : substitute

Syntax: "s/regexp/replacement/flags"

“p”: Print

Syntax: "/pattern/ command"

For example, you want to list the files or directories whose names include “picus”:

$ ls -l | sed -n '/picus/ p'

Maybe you need only directories, not files:

$ ls -l | sed -n '/picus/ p' | grep '^d'

You can also delete unwanted lines with sed:

$ sed -i '/picus/ d' myfile

This command deletes the lines that contain “picus” and, because of the “-i” option, edits your original file.

“d”: Delete command
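
The “p” and “d” commands are mirror images of each other, and both accept line numbers as well as patterns. Here is a small sketch on an invented list of names, working on a variable instead of a file so that nothing on disk is touched.

```shell
listing=$(printf '%s\n' "picus-agent" "nginx" "picus-manager" "redis")

only_picus=$(printf '%s\n' "$listing" | sed -n '/picus/p')   # keep matching lines
no_picus=$(printf '%s\n' "$listing" | sed '/picus/d')        # drop matching lines
first_two=$(printf '%s\n' "$listing" | sed -n '1,2p')        # address by line range

echo "$only_picus"
echo "$no_picus"
echo "$first_two"
```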

AWK

The awk command can be used to:

  • Perform arithmetic and string operations.
  • Scan a file line by line.
  • Split each input line into fields.
  • Compare input lines/fields to a pattern.
  • Perform actions on matched lines.
  • Produce formatted reports.
  • Use conditionals and loops.

An AWK program can have three optional parts: the BEGIN {}, main {}, and END {} sections.

BEGIN { …. initialization awk commands …}
{ …. main awk commands …}
END { …. finalization awk commands …}
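
To see the three sections working together, here is a minimal sketch that prints a header, echoes each record while adding up a total, and then prints the average. The names and scores are sample data made up for this example.

```shell
report=$(printf '%s\n' "Batur 70" "Ebru 95" "Ali 45" | awk '
    BEGIN { print "Name Score" }            # runs once, before any input
          { total += $2; print $1, $2 }     # runs for every input line
    END   { print "Average", total / NR }   # runs once, after the last line
')

echo "$report"
```

Output:

Name Score
Batur 70
Ebru 95
Ali 45
Average 70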

When you run “ls -l” command on the terminal, you may see the output like below.

-rw-rw-r--. 1 centos centos 535 Mar 20 2020 report-security-group-unused.py
-rw-rw-r--. 1 centos centos 1890 Feb 24 2020 report-security-group.py
-rw-rw-r--. 1 centos centos 7836 Feb 14 2020 report-vectors-hb-play.py

For example, find the total size of the files listed:

$ ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'

Output:

10261

If you split each line of this output on whitespace, the size is the fifth column.

An input line is typically made up of fields separated by whitespace. If you want a different separator, set the FS variable (or use the “-F” option); it can be a regular expression. The fields are called $1, $2, …, while $0 refers to the entire line. If FS is null, gawk splits the input line into one field per character.

Now you can see where “$5” came from.

$ echo "A-B-C-D-E" | awk -F "-" '{ print $2 }'

Output:

B

$ echo "A-B-C-D-E" | awk -F "-" '{ print $1,$5 }'

Output:

A E
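
The separator can also be a regular expression, and the OFS variable controls what is printed between fields. This is a short sketch; the first input line follows the usual /etc/passwd layout, and the others are invented.

```shell
line="root:x:0:0:root:/root:/bin/bash"   # typical /etc/passwd layout

# $NF is the last field, whatever the number of fields is.
user_shell=$(echo "$line" | awk -F ':' '{ print $1, $NF }')

# A regular expression as separator: split on commas or semicolons.
second=$(echo "a,b;c" | awk -F '[,;]' '{ print $2 }')

# FS and OFS can also be set in a BEGIN block.
joined=$(echo "A-B-C" | awk 'BEGIN { FS = "-"; OFS = "+" } { print $1, $3 }')

echo "$user_shell"   # root /bin/bash
echo "$second"       # b
echo "$joined"       # A+C
```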

You can operate on files. Example file called “myfile”:

1) Name Surname City
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

For example, print the name and city columns in a simple format:

$ awk '{print $2 "=" $4}' myfile

Output:

Name=City
Batur=Ankara
HaticeEbru=Istanbul

$ awk '/Orkun/ {print $0}' myfile

Output: (the lines that contain “Orkun”)

2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

$ awk '/Orkun/ {++cnt} END {print "Count =", cnt}' myfile

Output:

Count = 2

$ awk 'length($2) > 5' myfile

The “length” function returns the length of its argument.
Output: (the lines whose second field is longer than 5 characters)

3) HaticeEbru Orkun Istanbul 95

What if you want the lines whose fifth column is greater than 70:

$ awk '{if ($5>70) print}' myfile

Output:

3) HaticeEbru Orkun Istanbul 95

What if you want the line numbers without the parenthesis:

$ awk '{ print substr( $1, 1, 1 ) }' myfile

The “substr” function returns the portion of the string specified by the start position and length parameters. Positions start at 1, not 0.

Output:

1
2
3
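
A fixed length of one character only works for single-digit line numbers. A slightly more general sketch computes the length from the field itself, so that everything except the trailing parenthesis is kept. The input lines here are invented for the example.

```shell
numbers=$(printf '%s\n' "9) a" "10) b" "123) c" |
    awk '{ print substr($1, 1, length($1) - 1) }')   # drop the last character of $1

echo "$numbers"
```

Output:

9
10
123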

What if you don’t want to print the first line:

$ awk '{if(NR>1)print}' myfile

The NR variable holds the current line number.

Output:

2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

What if you want to list the lines that have 4 columns (NF holds the number of fields in the current line):

$ awk '{if (NF==4) print}' myfile

Output:

1) Name Surname City
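
NR and NF can be combined in a single command. As a small sketch, the following skips the header line and reports, for each remaining line, its number, the name, the field count, and the last field ($NF). The data is the same sample used above, passed in through a variable.

```shell
data=$(printf '%s\n' \
    "1) Name Surname City" \
    "2) Batur Orkun Ankara 70" \
    "3) HaticeEbru Orkun Istanbul 95")

summary=$(printf '%s\n' "$data" |
    awk 'NR > 1 { printf "%d: %s has %d fields, last is %s\n", NR, $2, NF, $NF }')

echo "$summary"
```

Output:

2: Batur has 5 fields, last is 70
3: HaticeEbru has 5 fields, last is 95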

Note: You may come across different awk implementations:

NAWK stands for “New AWK”. This is AT&T’s version of awk.

MAWK is a fast implementation that supports mostly the standard features. It is smaller and faster than gawk but has limits on NF and on the “sprintf” buffer size.

GAWK stands for “GNU AWK”. Many Linux distributions ship it as the default awk, though some, such as Debian, default to mawk instead. It is fully compatible with AWK and NAWK.

Take care and do not forget…:)

All the best people in life seem to like LINUX. (S. Wozniak)

DevOps & Software & Architect & Linux Geek — http://baturorkun.com