String Manipulations in Bash

Batur Orkun
8 min read · Dec 18, 2020

Whatever your role, you need string manipulation. You may be a developer on a software team, a Linux administrator on a systems team, or a member of a DevOps team. System and DevOps engineers in particular work with many languages, and it is nearly impossible to be an expert in string manipulation in all of them. If you are an expert in Bash, though, the host language matters much less: Bash is available almost everywhere in the Linux world, and you can call a shell command from your main language at any time. So let’s become experts at string manipulation in Bash. I am sure your life will be easier after that. :)

Bash supports a surprising number of string manipulation operations, too many to cover here. But do not panic: I have been working with Linux for many years and I know the most useful ones. We will first glance at some basic operations, even though they may look very simple. After these fundamentals, we will study real examples of more advanced operations.

Note: I assume you know the fundamentals of Linux and Bash.

Length

We can access the length of a string using the hash (#) operator.

$ VAR=Batur
$ echo ${#VAR}

Output:

5

Substrings

We can extract a substring using the colon (:) operator.

$ VAR=Batur
$ echo ${VAR:1}
$ echo ${VAR:2}
$ echo ${VAR:1:3}

Output:

atur
tur
atu
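Offsets can also be counted from the end of the string. The following is a small sketch that goes slightly beyond the examples above; the variable names first_two and last_three are only illustrative. Note the space before the minus sign, which keeps Bash from reading it as the ${VAR:-default} form.

```shell
#!/usr/bin/env bash
VAR=Batur

first_two=${VAR:0:2}     # offset 0, length 2
last_three=${VAR: -3}    # negative offset counts from the end; the space is required

echo "$first_two"
echo "$last_three"
```

Output:

Ba
tur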

Substring Match

The following forms delete or replace the part of $string that matches a pattern.

${string##sub}

The syntax deletes the longest match of $sub from the front of $string

$ VAR="Batur Orkun"
$ echo ${VAR##Batur}

Output:

Orkun

${string%%sub}

The syntax deletes the longest match of $sub from the back of $string

$ VAR="Batur Orkun"
$ echo ${VAR%%Orkun}

Output:

Batur
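These operators become much more useful once the pattern contains a wildcard. Here is a minimal sketch that splits a file path into its parts; the path and the variable names are made up for illustration. It also uses the single # and % forms, which delete the shortest match instead of the longest one.

```shell
#!/usr/bin/env bash
path="/var/log/nginx/access.log.gz"   # hypothetical path, used only as sample data

filename=${path##*/}     # longest match of "*/" removed from the front
dir=${path%/*}           # shortest match of "/*" removed from the back
base=${filename%%.*}     # longest match of ".*" removed from the back
ext=${filename##*.}      # longest match of "*." removed from the front

echo "$filename"
echo "$dir"
echo "$base"
echo "$ext"
```

Output:

access.log.gz
/var/log/nginx
access
gz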

${string/pattern/replacement}

It matches the pattern against $string and replaces only the first match with the replacement.

$ VAR="Batur Orkun"
$ echo ${VAR/r/R}

Output:

BatuR Orkun

${string//pattern/replacement}

It replaces all matches of the pattern, not only the first one.

$ VAR="Batur Orkun"
$ echo ${VAR//r/R}

Output:

BatuR ORkun
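A common everyday use of the replace-all form is cleaning up names. The sketch below is only an illustration with made-up variable names; it turns spaces into underscores, and it also shows that an empty replacement deletes every match.

```shell
#!/usr/bin/env bash
title="String Manipulations in Bash"

slug=${title// /_}       # replace every space with an underscore
squeezed=${title// /}    # an empty replacement deletes every match

echo "$slug"
echo "$squeezed"
```

Output:

String_Manipulations_in_Bash
StringManipulationsinBash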

Note: Regular expressions (RegEx) are an important tool for string manipulation, and you should learn at least the basics. Knowing them will make your life easier when you struggle with strings. But you must be careful when using them.

For example:

digit="456"
if [[ $digit =~ [0-9] ]]; then
    echo "$digit is a digit"
else
    echo "$digit is NO digit"
fi

Output:

456 is a digit

This is a simple example of using RegEx in Bash, but the logic is not right: the pattern only checks that the string contains a digit somewhere. If you set digit to “456a”, the output still says it is a digit. So we should fix it.

digit="456a"
if [[ $digit =~ ^-?[0-9]+$ ]]; then
    echo "$digit is a digit"
else
    echo "$digit is NO digit"
fi

Output:

456a is NO digit

Basic & Common RegEx Operators:

  • The ^ anchors the match at the beginning of the input
  • The - is a literal "-"
  • The ? means "0 or 1 of the preceding (-)"
  • The + means "1 or more of the preceding ([0-9])"
  • The $ anchors the match at the end of the input
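The anchored pattern can be wrapped in a small helper so that the check is reusable. This is a minimal sketch; the function name is_integer is my own choice, not a Bash builtin.

```shell
#!/usr/bin/env bash

# is_integer is a hypothetical helper name; it succeeds only when the
# whole argument is an optional "-" followed by one or more digits.
is_integer() {
    [[ $1 =~ ^-?[0-9]+$ ]]
}

for value in 456 -12 456a ""; do
    if is_integer "$value"; then
        echo "'$value' is an integer"
    else
        echo "'$value' is NOT an integer"
    fi
done
```

Output:

'456' is an integer
'-12' is an integer
'456a' is NOT an integer
'' is NOT an integer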

Bash has plenty of built-in magic for string operations, but you can also use the utilities that come preinstalled on Linux. For example, grep, sed, and awk are very useful command-line utilities.

grep = global regular expression print

It is arguably the most popular and most used of them. It searches for a string in the output of a command or in one or more files.

$ grep "batur" myfile

In this example, grep goes through every line of the file “myfile” and prints every line that contains the string “batur”.

grep takes lots of options, and some of them are especially useful. For example, if you need line numbers, use the “-n” option.

$ grep -n "batur" myfile

You can also search in many files at once, for example with a glob:

$ grep "batur" myfile*

It prints all matching lines prefixed with the filename. But if you just need the filenames, use the “-l” option.

$ grep -l "batur" myfile*

grep has an option for almost everything you might need. You should glance at the output of “man grep”, but here are a few useful options.

  • “-c” : Print only a count of the lines that contain the pattern.
  • “-l” : Print only the names of files with matching lines, one per line.
  • “-i” : Ignore upper/lower case distinction during comparisons.
  • “-n” : Prefix each matching line with its line number in the file (the first line is 1).
  • “-v” : Print all lines except those that contain the pattern.
  • “-r” : Search for the pattern recursively in all the files in a directory and all its sub-directories.
  • “-w” : Match the pattern only as a whole word.

For example, I think the “-i” option is important. Normally, searches are case-sensitive. If you want a case-insensitive search, you must use the “-i” option.

$ grep -i "batur" myfile

It also finds words like “Batur”, “BATUR” or “baTur”. Even better: you can use RegEx with grep.

For example, if you want to get only the lines ending with “Orkun”:

$ grep "Orkun$" myfile

If you want to get only the lines that consist of exactly “Batur Orkun”:

$ grep "^Batur Orkun$" myfile
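To see several of these options side by side, here is a small sketch that runs against a throwaway file. The sample lines and the variable names are invented for the illustration.

```shell
#!/usr/bin/env bash
sample=$(mktemp)    # temporary file holding made-up sample lines
printf '%s\n' 'Batur Orkun' 'batur is here' 'nothing to see' 'Orkun, Batur' > "$sample"

match_count=$(grep -ci 'batur' "$sample")   # count matching lines, ignoring case
others=$(grep -vi 'batur' "$sample")        # lines that do NOT match
ends_with=$(grep -n 'Orkun$' "$sample")     # lines ending with "Orkun", with line numbers

echo "$match_count"
echo "$others"
echo "$ends_with"

rm -f "$sample"
```

Output:

3
nothing to see
1:Batur Orkun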

Note: pgrep is a grep-like command for processes. Its name is usually expanded as “Process-ID Global Regular Expression Print”. pgrep looks through the currently running processes and lists the IDs of those whose names match the pattern. It is useful when all you want to know is the process ID of a process.

$ pgrep nginx

If there are running processes whose names match “nginx”, their PIDs are displayed on the screen. If no matches are found, the output is empty.

Example of a successful output:

4567
76788
234

sed: stream editor

sed performs editing operations on text coming from standard input or a file. It can do many things to files, such as searching, find and replace, insertion, and deletion. sed supports regular expressions, which allow it to perform complex pattern matching.

For example, you want to change the content of a file.

$ sed 's/devops/DevOps/' myfile

It changes the first “devops” on each line to “DevOps”. To change every occurrence, add the “g” flag: 's/devops/DevOps/g'.

The output is the changed content; the file itself is not modified. You can save the output to another file with the “>” redirection operator or edit the original file in place.

$ sed "s/devops/DevOps/" myfile > newfile

or

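$ sed -i "s/devops/DevOps/" myfile

“-i” : Edit files in place

“-n” : Suppress automatic printing of pattern space. ( --quiet, --silent )

With “-i”, the result is written back to the file, so nothing is printed on the screen. Do not add “-n” here: without a “p” command, “-n” suppresses all output, and “-i” would then leave the file empty.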
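When editing in place, it is safer to keep a copy of the original. A minimal sketch, assuming a throwaway file created with mktemp: a suffix attached directly to “-i” asks sed to save a backup with that suffix before changing the file.

```shell
#!/usr/bin/env bash
target=$(mktemp)    # temporary file standing in for "myfile"
printf '%s\n' 'devops and devops' 'more devops' > "$target"

# -i.bak edits the file in place and keeps the original as <file>.bak
sed -i.bak 's/devops/DevOps/g' "$target"

edited=$(cat "$target")
backup=$(cat "$target.bak")

echo "$edited"
echo "$backup"

rm -f "$target" "$target.bak"
```

Output:

DevOps and DevOps
more DevOps
devops and devops
more devops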

If you use sed at all, you will probably want to know these commands.

“s” : substitute

The “s” command is probably the most important in sed and has a lot of different options. Its basic concept is simple: the s command attempts to match the pattern space against the supplied regexp. If the match is successful, the portion of the pattern space that was matched is replaced with the replacement. We used it above.

Syntax: "s/regexp/replacement/flags"
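The flags decide which matches are replaced. The sketch below compares no flag, the “g” flag, and a numeric flag on one made-up line, and then uses capture groups (with “-E” for extended regular expressions) to swap two words. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
line="devops devops devops"

first_only=$(echo "$line" | sed 's/devops/DevOps/')     # no flag: first match on the line
every_one=$(echo "$line" | sed 's/devops/DevOps/g')     # g: every match on the line
second_only=$(echo "$line" | sed 's/devops/DevOps/2')   # 2: only the second match

# \1 and \2 refer to the first and second parenthesized groups
swapped=$(echo "Batur Orkun" | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2, \1/')

echo "$first_only"
echo "$every_one"
echo "$second_only"
echo "$swapped"
```

Output:

DevOps devops devops
DevOps DevOps DevOps
devops DevOps devops
Orkun, Batur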

“p”: Print

Print out the pattern space (to the standard output).

Syntax: "/pattern/ command"

For example, you want to list the files or directories whose names include “picus”:

$ ls -l | sed -n '/picus/ p'

Maybe you need just the directories, not the files:

$ ls -l | sed -n '/picus/ p' | grep '^d'

You can delete unwanted lines with sed:

$ sed -i '/picus/ d' myfile

This command deletes the lines that contain “picus” and, because of the “-i” option, edits your original file.

“d”: Delete command

Delete the pattern space
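Both “p” and “d” also accept line numbers in place of a /pattern/. This goes a little beyond the examples above, so treat it as a sketch; the four sample lines and the variable names are made up.

```shell
#!/usr/bin/env bash
lines=$(printf '%s\n' one two three four)

middle=$(echo "$lines" | sed -n '2,3p')       # print only lines 2 to 3
no_first=$(echo "$lines" | sed '1d')          # delete the first line
no_t_lines=$(echo "$lines" | sed '/^t/d')     # delete lines starting with "t"

echo "$middle"
echo "---"
echo "$no_first"
echo "---"
echo "$no_t_lines"
```

Output:

two
three
---
two
three
four
---
one
four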

AWK

AWK is a text pattern scanning and processing language. Yes, you read that right: AWK is a programming language for text processing. It is a direct predecessor of Perl and is still very useful on modern systems.

AWK can:

  • Perform arithmetic and string operations.
  • Scan a file line by line.
  • Split each input line into fields.
  • Compare input lines or fields to a pattern.
  • Perform actions on matched lines.
  • Produce formatted reports.
  • Use conditionals and loops.

An AWK program can have three optional parts: a BEGIN block, a main block, and an END block.

BEGIN { …. initialization awk commands …}
{ …. main awk commands …}
END { …. finalization awk commands …}
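A minimal sketch of the three parts working together, using three made-up numbers as input: BEGIN runs once before any input, the main block runs for every line, and END runs once after the last line.

```shell
#!/usr/bin/env bash
average=$(printf '%s\n' 70 95 75 | awk '
    BEGIN { total = 0 }                                # runs once, before any input
    { total += $1 }                                    # runs for every input line
    END { if (NR > 0) print "average:", total / NR }   # runs once, after the last line
')

echo "$average"
```

Output:

average: 80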

When you run the “ls -l” command in the terminal, you may see output like the one below.

-rw-rw-r--. 1 centos centos 535 Mar 20 2020 report-security-group-unused.py
-rw-rw-r--. 1 centos centos 1890 Feb 24 2020 report-security-group.py
-rw-rw-r--. 1 centos centos 7836 Feb 14 2020 report-vectors-hb-play.py

For example, find the total size of the listed files:

$ ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'

Output:

10261

If you split this output on whitespace, the size is the fifth column, which is why we use “$5”.

An input line is typically made up of fields separated by whitespace. If you want a different separator, set it with the “-F” option (or the FS variable); it can be a regular expression. The fields are called $1, $2, …, while $0 refers to the entire line. If FS is null, the input line is split into one field per character.

Now you can see where “$5” comes from. Here is the same idea with a different separator:

$ echo "A-B-C-D-E" | awk -F "-" '{ print $2 }'

Output:

B

$ echo "A-B-C-D-E" | awk -F "-" '{ print $1,$5 }'

Output:

A E
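The same idea works on any delimited text. Here is a sketch on a made-up line in the colon-separated style of /etc/passwd; it also prints NF, the number of fields, and sets OFS, the separator that print puts between its arguments.

```shell
#!/usr/bin/env bash
entry="centos:x:1000:1000::/home/centos:/bin/bash"   # made-up sample line

user_shell=$(echo "$entry" | awk -F ':' '{ print $1, $7 }')
field_count=$(echo "$entry" | awk -F ':' '{ print NF }')
joined=$(echo "$entry" | awk -F ':' -v OFS=' -> ' '{ print $1, $6 }')

echo "$user_shell"
echo "$field_count"
echo "$joined"
```

Output:

centos /bin/bash
7
centos -> /home/centos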

You can also operate on files. Here is an example file called “myfile”:

1) Name Surname City
2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

For example, print the name and city columns in a simple format:

$ awk '{print $2 "=" $4}' myfile

Output:

Name=City
Batur=Ankara
HaticeEbru=Istanbul

$ awk '/Orkun/ {print $0}' myfile

Output (the lines that contain “Orkun”):

2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

$ awk '/Orkun/ {++cnt} END {print "Count =", cnt}' myfile

Output:

Count = 2

$ awk 'length($2) > 5' myfile

The “length” function returns the length of its argument.
Output (the lines whose second field is longer than 5 characters):

3) HaticeEbru Orkun Istanbul 95

What if you want the lines whose fifth column is greater than 70:

$ awk '{if ($5>70) print}' myfile

Output:

3) HaticeEbru Orkun Istanbul 95

What if you want the line numbers without the parenthesis:

$ awk '{ print substr($1, 1, 1) }' myfile

The “substr” function returns the portion of the string specified by the start position and length parameters. Positions start at 1 in AWK.

Output:

1
2
3

What if you don’t want to print the first line:

$ awk '{if(NR>1)print}' myfile

The NR variable holds the current line number.

Output:

2) Batur Orkun Ankara 70
3) HaticeEbru Orkun Istanbul 95

What if you want to list the lines that have 4 columns (NF holds the number of fields in the current line):

$ awk '{if (NF==4) print}' myfile

Output:

1) Name Surname City
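These pieces combine into small reports. The sketch below recreates the example file in a temporary location (an assumption, so that it runs on its own) and averages the fifth column of the lines that have five fields; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
data=$(mktemp)    # temporary copy of the example file
printf '%s\n' \
    '1) Name Surname City' \
    '2) Batur Orkun Ankara 70' \
    '3) HaticeEbru Orkun Istanbul 95' > "$data"

# Only lines with five fields carry a number in the fifth column
report=$(awk 'NF == 5 { sum += $5; rows++ }
              END { if (rows > 0) printf "%d rows, average %.1f\n", rows, sum / rows }' "$data")

echo "$report"

rm -f "$data"
```

Output:

2 rows, average 82.5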

Note: You may come across different AWK implementations:

NAWK stands for “New AWK”. This is AT&T’s version of AWK.

MAWK is a fast implementation that supports most standard features. It is smaller and faster than GAWK but has limits on NF and on the “sprintf” buffer size.

GAWK stands for “GNU AWK”. Most Linux distributions ship it, although some use MAWK as the default awk instead. It is fully compatible with AWK and NAWK.

Take care and do not forget…:)

All the best people in life seem to like LINUX. (S. Wozniak)
