HPR4407: A 're-response' Bash script
This show has been flagged as Explicit by the host.
Introduction
On 2025-06-19 Ken Fallon did a show, number 4404, responding to Kevie's show 4398, which came out on 2025-06-11.
Kevie was using a Bash pipeline to find the latest episode in an RSS feed and download it. He used grep to parse the XML of the feed.
Ken's response was to suggest the use of xmlstarlet to parse the XML, because such a complex structured format as XML cannot reliably be parsed without a program that "understands" the intricacies of the format's structure. The same applies to other complex formats such as HTML, YAML and JSON.
In his show Ken presented a Bash script which dealt with this problem, and also with the ordering of episodes in the feed. He asked how others would write such a script, and thus I was motivated to produce this response to his response!
Alternative script
My script is a remodelling of Ken's, not a completely different
solution. It contains a few alternative ways of doing what Ken did, and
a reordering of the parts of his original. We will examine the changes
in this episode.
Script
    #!/bin/bash
    # Original (c) CC-0 Ken Fallon 2025
    # Modified by Dave Morriss, 2025-06-14 (c) CC-0

    podcast="https://tuxjam.otherside.network/feed/podcast/"

    # [1]
    while read -r item
    do
        # [2]
        pubDate="${item%;*}"
        # [3]
        pubDate="$( \date --date="${pubDate}" --universal +%FT%T )"
        # [4]
        url="${item#*;}"
        # [5]
        echo "${pubDate};${url}"
    done < <(curl --silent "${podcast}" | \
        xmlstarlet sel --text --template --match 'rss/channel/item' \
            --value-of 'concat(pubDate, ";", enclosure/@url)' --nl - ) | \
        sort --numeric-sort --reverse | \
        head -1 | \
        cut -f2 -d';' | wget --quiet --input-file=- # [6]
I have placed some comments in the script in the form of '# [1]', and I'll refer to these as I describe the changes in the following numbered list.

Note: I checked, and the script will run with the comments, though they are only there to make it easier to refer to things.
1. The format of the pipeline is different. It starts by defining a while loop, but the data which the read command receives comes from a process substitution of the form '<(statements)' (see the process substitution section of "hpr2045 :: Some other Bash tips"). I have arranged the pipeline in this way because it's bad practice to place a while loop in a pipeline, as discussed in the show "hpr3985 :: Bash snippet - be careful when feeding data to loops". (I added -r to the read because shellcheck, which I run in the vim editor, nagged me!)
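The reason this matters is that a while loop at the receiving end of a pipe runs in a subshell, so any variables it sets are lost when the pipe finishes. A minimal sketch (not from the show) demonstrating the difference:

```shell
#!/bin/bash
# A 'while' fed by a pipe runs in a subshell: changes to 'count' are lost.
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$(( count + 1 ))
done
echo "after pipeline: ${count}"             # still 0

# Fed by process substitution, the loop runs in the current shell,
# so 'count' keeps its value afterwards.
count=0
while read -r line; do
    count=$(( count + 1 ))
done < <(printf 'a\nb\nc\n')
echo "after process substitution: ${count}" # now 3
```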
2. The lines coming from the process substitution are from running curl to collect the feed, then using xmlstarlet to pick out the pubDate field of the item and the url attribute of the enclosure field, returning them as two strings separated by a semicolon (';'). This is from Ken's original code. Each line is read into the variable item, and the first element (before the semicolon) is extracted with the Bash expression "${item%;*}". Parameter manipulation expressions were introduced in HPR show 1648. See the full notes section Remove matching suffix pattern for this one.
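As a quick illustration (the pubDate and URL here are invented, not from the real feed), '%;*' deletes the shortest suffix matching ';*', leaving just the date part:

```shell
# Hypothetical feed line in the form '<pubDate>;<url>'
item='Wed, 11 Jun 2025 00:00:00 +0000;https://example.com/ep.mp3'
pubDate="${item%;*}"   # strip shortest suffix matching ';*'
echo "${pubDate}"      # Wed, 11 Jun 2025 00:00:00 +0000
```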
3. I modified Ken's date command to simplify the generation of the ISO8601 date and time by using the pattern +%FT%T. This just saves typing!
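In GNU date, %F is shorthand for %Y-%m-%d and %T for %H:%M:%S, so +%FT%T produces an ISO8601 timestamp. A quick check with a made-up date:

```shell
# %F = %Y-%m-%d and %T = %H:%M:%S in GNU date, joined by a literal 'T'
date --date='Wed, 11 Jun 2025 12:34:56 +0000' --universal +%FT%T
# → 2025-06-11T12:34:56
```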
4. The url value is extracted from the contents of item with the expression "${item#*;}". See the section of show 1648 entitled Remove matching prefix pattern for details.
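This is the mirror image of the suffix removal: '#*;' deletes the shortest prefix matching '*;'. Again with an invented feed line:

```shell
# Hypothetical feed line in the form '<pubDate>;<url>'
item='Wed, 11 Jun 2025 00:00:00 +0000;https://example.com/ep.mp3'
url="${item#*;}"   # strip shortest prefix matching '*;'
echo "${url}"      # https://example.com/ep.mp3
```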
5. The echo which generates the list of podcast URLs prefixed with an ISO time stamp uses ';' as the delimiter where Ken used a tab character. I assume this was done for the benefit of either the following sort or the awk script. It's not needed for sort, since it sorts the line as-is and doesn't use fields. My version doesn't use awk.
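Because ISO8601 timestamps sort correctly as plain strings, a reverse sort puts the newest episode on the first line. A sketch with invented data:

```shell
# ISO8601 timestamps order chronologically even under plain string sorting,
# so a reverse sort brings the most recent episode to the top.
printf '%s\n' \
    '2025-06-01T10:00:00;https://example.com/old.mp3' \
    '2025-06-11T09:00:00;https://example.com/new.mp3' |
    sort --reverse | head -1
# → 2025-06-11T09:00:00;https://example.com/new.mp3
```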
6. Rather than using awk I use cut to remove the time stamp from the front of each line, returning the second field delimited by the semicolon. The result of this will be the URL for wget to download. In this case wget receives the URL on standard input (STDIN), and the --input-file=- option tells it to use that information for the download.
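The cut stage on its own looks like this (URL invented for illustration); in the full pipeline, wget's --input-file=- then reads the resulting URL from STDIN:

```shell
# Split on ';' and keep field 2, i.e. everything after the timestamp.
echo '2025-06-11T09:00:00;https://example.com/new.mp3' | cut -f2 -d';'
# → https://example.com/new.mp3
```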
Conclusion
I'm not sure my solution is better in any significant way. I prefer to use Bash functionality to do things where calling awk or sed could be overkill, but that's just a personal preference.
I might have replaced the head and cut with a sed expression, such as the following as the last line:

    sed -e '1{s/^.\+;//;q}' | wget --quiet --input-file=-
Here, the sed expression operates on the first line from the sort, where it removes everything from the start of the line to the semicolon. The expression then causes sed to quit, so that only the edited first line is passed to wget.
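A quick check of that sed expression on invented input shows it printing only the URL from the first (newest) line:

```shell
# On line 1: delete everything up to and including the ';', then quit,
# so later lines are never printed.
printf '%s\n' \
    '2025-06-11T09:00:00;https://example.com/new.mp3' \
    '2025-06-01T10:00:00;https://example.com/old.mp3' |
    sed -e '1{s/^.\+;//;q}'
# → https://example.com/new.mp3
```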
Links
- hpr1648 :: Bash parameter manipulation
  - Section entitled "Remove matching suffix pattern"
  - Section entitled "Remove matching prefix pattern"
  - Diagram showing the Bash parameter manipulation methods
- hpr2045 :: Some other Bash tips
  - Section on process substitution
- hpr3985 :: Bash snippet - be careful when feeding data to loops
- hpr4398 :: Command line fun: downloading a podcast, by Kevie
- hpr4404 :: Kevie nerd snipes Ken by grepping xml, by Ken Fallon