Learning to Manipulate YAML on the Command Line with YQ

Batur Orkun
5 min read · Oct 31, 2023

You are probably familiar with “jq”. Almost every developer or DevOps engineer has had to handle JSON data at some point. “jq” is a lightweight and flexible command-line JSON processor, akin to sed, awk, and grep. Those Linux commands are great tools, but when you work with structured data such as JSON or YAML, a dedicated tool built for that format is usually a better choice because you do not have to write complex, fragile command pipelines.

“yq” is a lightweight and portable command-line YAML, JSON, and XML processor. It uses a jq-like syntax and works with YAML files as well as JSON, XML, properties, CSV, and TSV. When you search for “yq”, you will find two repositories. The “yq” tool this article refers to is “mikefarah/yq”, not “kislyuk/yq”. When you install yq from the Ubuntu Snap repository, “mikefarah/yq” is what gets installed. The version must be “4.x”; this is crucial, because there are huge differences between “3.x” and “4.x”.

YAML files are very popular nowadays. One reason is that Kubernetes manifests are written in YAML. YAML also offers several benefits, such as being more human-readable than many other formats. As a DevOps engineer working with Kubernetes, I can write a huge YAML file in a simple IDE. Either way, we need to operate on YAML files, so we need “yq”.

Before we get to the miracles of “yq”, let’s take a look at the example below.

Here is a simple YAML file (your_yaml_file_1.yaml):

staff:
  - name: "Batur"
    surname: "Orkun"
    city: "Ankara"
    street: "My street 1"
    number: 1
  - name: "Ebru"
    surname: "Orkun"
    city: "Antalya"
    street: "My street 2"
    number: 2

I want to pull out the city name of the staff member named Ebru.

“Antalya”

Using “awk”:

awk -F ': ' '/- name: "Ebru"/ {getline; getline; print $2}'  your_yaml_file_1.yaml | tr -d '"'

Using “yq”:

yq e '.staff[] | select(.name == "Ebru") | .city' your_yaml_file_1.yaml

You can easily see which one is more readable and easier to understand.

Maybe your YAML is a little more complex (your_yaml_file_2.yaml):

staff:
  - name: "Batur"
    surname: "Orkun"
    address:
      city: "Ankara"
      street: "My street 1"
      number: 1
  - name: "Ebru"
    surname: "Orkun"
    address:
      city: "Antalya"
      street: "My street 2"
      number: 2

Same request. I want to pull out the city name of the staff member named Ebru.

Using “awk”:

awk -F ': ' '/- name: "Ebru"/ {getline; while (getline) { if ($1 ~ /city/) { print $2; break; } }}' your_yaml_file_2.yaml

Using “yq”:

yq eval '.staff[] | select(.name == "Ebru") | .address.city' your_yaml_file_2.yaml

Notice that the awk version finds the “city” line with a regular expression. If we add another key that contains those letters, such as “publicity”, before “city”, the regex matches that line first and the awk command returns the wrong value. Commands like this break easily when the structure of the data changes, whereas the yq expression addresses the field by its path.
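The “publicity” problem can be reproduced with a small sketch. The scratch file and the “publicity” value below are made up for illustration; the two commands are the same ones used above, and the yq line assumes mikefarah/yq 4.x is installed.

```shell
# Hypothetical scratch file: same shape as your_yaml_file_2.yaml,
# with an extra "publicity" key placed before "city".
workdir=$(mktemp -d)
cat > "$workdir/staff.yaml" <<'EOF'
staff:
  - name: "Ebru"
    surname: "Orkun"
    address:
      publicity: "low"
      city: "Antalya"
      street: "My street 2"
      number: 2
EOF

# The awk one-liner stops at the first line whose key matches /city/,
# which is now "publicity".
awk_out=$(awk -F ': ' '/- name: "Ebru"/ {getline; while (getline) { if ($1 ~ /city/) { print $2; break; } }}' "$workdir/staff.yaml")

# The yq expression follows the path, so the extra key does not matter.
yq_out=$(yq '.staff[] | select(.name == "Ebru") | .address.city' "$workdir/staff.yaml")

echo "awk: $awk_out"   # awk: "low"
echo "yq:  $yq_out"    # yq:  Antalya
```

The awk result is not only wrong but also still carries the quotes, which is why the first awk example needed an extra tr step.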

How can you install “yq”? From Snap, Brew, or GitHub.

For Mac / OS X:

brew install yq

For Ubuntu/Debian:

snap install yq

For any Linux (64 bit):

wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq &&\
chmod +x /usr/bin/yq
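Since the version matters so much, it may be worth checking which “yq” ended up on your PATH. The helper below is only a sketch: the function name yq_flavor is hypothetical, and the sample version strings are assumptions about what each implementation prints, so compare them with your own `yq --version` output.

```shell
# Classify a `yq --version` line (sketch; the matched strings are assumptions).
yq_flavor() {
  case "$1" in
    *mikefarah*"version v4."*|*mikefarah*"version 4."*) echo "mikefarah-v4" ;;
    *mikefarah*) echo "mikefarah-other" ;;
    *) echo "unknown" ;;
  esac
}

# On a real machine you would run: yq_flavor "$(yq --version)"
yq_flavor "yq (https://github.com/mikefarah/yq/) version v4.35.2"   # mikefarah-v4
yq_flavor "yq 3.2.3"                                                # unknown
```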

In version “4.x”, “yq” dropped the old subcommands such as r and d; all operations now go through yq eval:

yq eval [expression] [yaml_file1]… [flags]
yq e [expression] [yaml_file1]… [flags]
yq [expression] [yaml_file1]… [flags]

All three forms are equivalent because eval is the default command. You can read or change any data in a YAML file, under any condition. Let’s take a look at the basics.

Here is the “fruits.yml” file:

fruits:
  apple: red
  pear: green
  banana: yellow

If you wonder about the color of the pear:

yq '.fruits.pear' fruits.yml

Output is “green”.

If you want to change a value in the file itself, use the “-i” (in-place) flag.

yq e '.fruits.apple = "verdant"' -i fruits.yml

After running the command, when you look at the file:

fruits:
  apple: verdant
  pear: green
  banana: yellow
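In scripts, the new value usually comes from a shell variable rather than being typed into the expression. A minimal sketch, assuming mikefarah/yq 4.x and its strenv() operator; the variable name NEW_COLOR and the scratch directory are made up for illustration:

```shell
# Scratch copy of fruits.yml so the original is not touched.
workdir=$(mktemp -d)
cat > "$workdir/fruits.yml" <<'EOF'
fruits:
  apple: red
  pear: green
  banana: yellow
EOF

# Preview first: without -i the result goes to stdout and the file is unchanged.
NEW_COLOR="verdant" yq '.fruits.apple = strenv(NEW_COLOR)' "$workdir/fruits.yml"

# Then apply the same expression in place.
NEW_COLOR="verdant" yq -i '.fruits.apple = strenv(NEW_COLOR)' "$workdir/fruits.yml"

apple_color=$(yq '.fruits.apple' "$workdir/fruits.yml")
echo "$apple_color"   # verdant
```

Passing the value through the environment avoids quoting problems that appear when a shell variable is spliced directly into the expression string.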

YAML files often contain arrays, so you need to know how to work with them. Let’s use a Kubernetes manifest (pod.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: busybox
      env:
        - name: DB_CONNECT
          value: postgres://192.168.10.10:5432
        - name: SHARED_PATH
          value: /var/nfs

We need the IP address of the PostgreSQL database from the YAML above. First, get the connection string:

yq ".spec.containers[0].env[0].value" pod.yaml

Output is “postgres://192.168.10.10:5432”.

We can use “awk” to parse the IP address.

yq ".spec.containers[0].env[0].value" pod.yaml | awk -F// '{split($2, a, ":"); print a[1]}'

We accessed the first element of the env array by index. If the order of the entries changes, we get the wrong value, so it is safer to select the entry by name:

yq '.spec.containers[0].env[] | select(.name == "DB_CONNECT") | .value' pod.yaml
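The IP address can also be extracted without leaving yq. The sketch below assumes mikefarah/yq 4.x and its split() string operator, and it recreates the pod manifest in a scratch directory so it can be run on its own:

```shell
# Scratch copy of the pod manifest from above.
workdir=$(mktemp -d)
cat > "$workdir/pod.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: busybox
      env:
        - name: DB_CONNECT
          value: postgres://192.168.10.10:5432
        - name: SHARED_PATH
          value: /var/nfs
EOF

# Select the entry by name, then cut the value apart:
#   split("//") separates the scheme from the host:port part,
#   split(":")  separates the host from the port.
db_host=$(yq '.spec.containers[0].env[] | select(.name == "DB_CONNECT") | .value | split("//") | .[1] | split(":") | .[0]' "$workdir/pod.yaml")

echo "$db_host"
```

This keeps the whole lookup in one expression, at the cost of a longer pipeline than the yq plus awk combination.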

Let’s make it a little harder.

Our new YAML file (users.yaml):

user:
  name: Batur
  surname: Orkun
  gender: male
  active: true
  addresses:
    city: Ankara
    country: Turkey
  subscriptions:
    active:
      - 12345
      - 56789
      - 10267
    passive:
      - 34566
      - 89011

I wonder if the user has any active subscriptions.

yq '.user.subscriptions | has("active")' users.yaml

Output is “true”.
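In a shell script you usually want to branch on that answer instead of reading it. A minimal sketch, assuming mikefarah/yq 4.x and its -e (--exit-status) flag, which makes yq exit with a non-zero status when the result is false or null; the function name has_subscription_kind and the “trial” key are hypothetical:

```shell
# Scratch copy of the relevant part of users.yaml.
workdir=$(mktemp -d)
cat > "$workdir/users.yaml" <<'EOF'
user:
  name: Batur
  subscriptions:
    active:
      - 12345
      - 56789
      - 10267
    passive:
      - 34566
      - 89011
EOF

# Succeeds (exit status 0) only if the given key exists under subscriptions.
has_subscription_kind() {
  yq -e ".user.subscriptions | has(\"$1\")" "$2" >/dev/null 2>&1
}

if has_subscription_kind active "$workdir/users.yaml"; then
  active_status="present"
else
  active_status="missing"
fi

if has_subscription_kind trial "$workdir/users.yaml"; then
  trial_status="present"
else
  trial_status="missing"
fi

echo "active: $active_status, trial: $trial_status"
```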

I wonder how many active subscriptions there are.

yq '.user.subscriptions.active | length' users.yaml

Output is “3”

Suppose you don’t know about “length”. Luckily, you know Linux: each array item is printed on its own line, so you can count the lines.

yq '.user.subscriptions.active' users.yaml | wc -l

I wonder which passive subscriptions start with “3”.

yq  '.user.subscriptions.passive[] | select(. == "3*")' users.yaml

Output is “34566”.

I want to continue with Kubernetes. If you are a DevOps engineer working with Kubernetes, you probably already know “yq”, or you will love it now.

You can pretty-print YAML with yq on the command line:

kubectl get ns kube-system -o yaml | yq '.' -

You can also delete fields. The command below removes all “annotations” from the YAML output:

kubectl get ns kube-system -o yaml | yq 'del(.metadata.annotations)'

Or you can add new fields. To add a new annotation:

kubectl get ns kube-system -o yaml | yq '.metadata.annotations.mydate |= "2023-10-31"'
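Both operations can be rehearsed offline before touching a cluster. The sketch below assumes mikefarah/yq 4.x; the Namespace manifest (name “demo”, annotation “owner”) is made up for illustration, and plain assignment (=) is used to add the annotation:

```shell
# Hypothetical Namespace manifest standing in for `kubectl get ns ... -o yaml`.
workdir=$(mktemp -d)
cat > "$workdir/ns.yaml" <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  annotations:
    owner: team-a
EOF

# Delete the whole annotations map; only the printed output changes, not the file.
without_annotations=$(yq 'del(.metadata.annotations)' "$workdir/ns.yaml")
echo "$without_annotations"

# Add a new annotation next to the existing one.
with_date=$(yq '.metadata.annotations.mydate = "2023-10-31"' "$workdir/ns.yaml")
echo "$with_date"
```

Without the -i flag yq never modifies the input, so the manifest on disk stays as it was.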

If you want to apply the modified YAML back to the cluster, pipe it into kubectl apply:

kubectl get ns batur -o yaml | yq '.metadata.annotations.mydate |= "2023-10-31"' | kubectl apply -f -

You have most likely seen a namespace stuck in the “Terminating” state when trying to remove it. You tried the “--force --grace-period=0” parameters, but nothing happened. A common workaround is to clear the namespace’s finalizers:

kubectl get ns batur -o yaml | yq -o=json '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/batur/finalize" -f -

Here is our new YAML file (services.yaml):

services:
  service-1:
    image: image1
  service-2:
    image: image2
    ports:
      - "9090:8080"
  service-3:
    image: image3
    ports:
      - "80:8081"

I wonder what my services are.

yq e '.services | keys'  services.yaml

Output:

- service-1
- service-2
- service-3

Maybe you need multiple pieces of information.

yq e '"Total Services: " + (.services | length), (.services | keys)[]' services.yaml

Total Services: 3
service-1
service-2
service-3

I only need the image names of the services that have ports.

yq eval '.services | with_entries(select(.value.ports != null)) | .[] | .image' services.yaml

Output:

image2
image3
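The same result can be reached with a shorter expression. This is only an alternative sketch, assuming mikefarah/yq 4.x: it iterates over the service values directly and keeps those that have a “ports” key, instead of going through with_entries.

```shell
# Scratch copy of services.yaml.
workdir=$(mktemp -d)
cat > "$workdir/services.yaml" <<'EOF'
services:
  service-1:
    image: image1
  service-2:
    image: image2
    ports:
      - "9090:8080"
  service-3:
    image: image3
    ports:
      - "80:8081"
EOF

# .services[] yields each service map; has("ports") filters them.
images_with_ports=$(yq '.services[] | select(has("ports")) | .image' "$workdir/services.yaml")

echo "$images_with_ports"
# image2
# image3
```

The with_entries form is still the better fit when the service names (the keys) are needed in the output as well.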

There are many more advanced uses of “yq”. I would have to write a book to cover them all. You can glance at the link below.
