Keep Crashing Daemons Running on FreeBSD


UPDATE 1 [2019/05/11]: Thanks to @mirrorbox’s suggestion, I refactored the script to use service status instead of ps aux | grep, which makes the script even simpler. As a result, the syntax has changed. Since I am keeping the rest of the article untouched, visit either the GitHub or GitLab repository for the updated code. The new syntax is as follows:

# Syntax
$ /path/to/daemon-keeper.sh

Correct usage:

    daemon-keeper.sh -d {daemon} -e {extra daemon to (re)start} [-e {another extra daemon to (re)start}] [... even more -e and extra daemons to (re)start]

# Example
$ /path/to/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"

# Crontab
$ sudo -u root -g wheel crontab -l

# At every minute
*   *   *   *   *   /usr/local/cron-scripts/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"
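
For readers who only want the gist of that refactoring, here is a minimal sketch of a status-based check. It is not the code from the repositories: the function name keep_daemon and the SERVICE_CMD override are assumptions introduced for illustration, and the sketch relies on the rc.d status command exiting with a non-zero code when the daemon is not running.

```shell
#!/bin/sh

# Path to service(8); overridable so the sketch can be tried with a stub.
SERVICE_CMD="${SERVICE_CMD:-/usr/sbin/service}"

# keep_daemon DAEMON [EXTRA...]  (hypothetical helper)
# If "service DAEMON status" fails, start DAEMON and restart every EXTRA.
keep_daemon() {
    daemon="$1"; shift

    if ${SERVICE_CMD} "${daemon}" status > /dev/null 2>&1; then
        echo "'${daemon}' is running"
        return 0
    fi

    echo "'${daemon}' is not running"
    ${SERVICE_CMD} "${daemon}" start > /dev/null 2>&1 \
        || echo "failed to start '${daemon}'"

    for extra in "$@"; do
        ${SERVICE_CMD} "${extra}" restart > /dev/null 2>&1 \
            || echo "failed to restart '${extra}'"
    done
    return 1
}

# Example: keep_daemon "clamav-clamd" "dovecot"
```

Compared to the ps aux | grep approach described in the rest of the article, this leaves the question of "is it running?" to the service's own rc.d script, so no executable path has to be passed in.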

UPDATE 2 [2019/05/11]: Another thanks to @mirrorbox for mentioning sysutils/daemontools, which seems to be a proven solution for restarting a crashing daemon. It makes this hack redundant.

Daemontools is a small set of /very/ useful utilities, from Dan
Bernstein.  They are mainly used for controlling processes, and
maintaining logfiles.

WWW: http://cr.yp.to/daemontools.html

UPDATE 3 [2019/05/11]: Thanks to @dlangille for mentioning sysutils/py-supervisor, which seems to be a viable alternative to sysutils/daemontools.

Supervisor is a client/server system that allows its users
to monitor and control a number of processes on UNIX-like
operating systems.

It shares some of the same goals of programs like launchd,
daemontools, and runit. Unlike some of these programs, it is
not meant to be run as a substitute for init as "process id 1".
Instead it is meant to be used to control processes related to
a project or a customer, and is meant to start like any
other program at boot time.

WWW: http://supervisord.org/

UPDATE 4 [2019/05/13]: Thanks to @olevole for mentioning sysutils/fsc. It is minimalistic, dependency free and designed for FreeBSD:

The FreeBSD Services Control software provides service
monitoring, restarting, and event logging for FreeBSD
servers.  The core functionality is a daemon (fscd)
which is interfaced with using fscadm.  See manual pages
for more information.

UPDATE 5 [2019/05/13]: Thanks to @jcigar for bringing daemon(8) to my attention. It is available in the base system and seems perfectly capable of doing what I set out to achieve with my script, and more.


Amidst all the chaos in the current stage of my life, I don’t know exactly what got into me that I thought it was a good idea to perform a major upgrade on a production FreeBSD server from 11.2-RELENG to 12.0-RELENG, when I did not even have enough time to go through /usr/src/UPDATING thoroughly or consult the Release Notes or the Errata properly. Sure enough, I hit some esoteric changes which effectively crippled my mail server, and it took me over a week to realize that I had not been receiving any new emails.

At first, I did not take it seriously. I just rebooted the server and prayed to the gods that it would not happen again. It was a quick fix and it seemed to work, until a few days later I noticed that it had happened again. This time I prayed to the gods even harder - both the old ones and the new ones ¯\_(ツ)_/¯ - and rebuilt every installed port all over again in order to make sure I did not miss anything. I went for another reboot and, oops! There it was again, laughing at me. Thus I lost all faith in the gods, which led me to take responsibility and either investigate the issue further or ask the experts on the FreeBSD forums.

After messing around with it, it turned out that the culprit was the clamav-clamd service crashing, without any apparent reason at first. I fired up htop after restarting clamav-clamd and found that even when idle it devours around 30% of the available memory. According to this Stack Exchange answer:

ClamAV holds the search strings using the classic string (Boyer Moore) and regular expression (Aho Corasick) algorithms. Being algorithms from the 1970s they are extremely memory efficient.

The problem is the huge number of virus signatures. This leads to the algorithms’ datastructures growing quite large.

You can’t send those datastructures to swap, as there are no parts of the algorithms’ datastructures accessed less often than other parts. If you do force pages of them to swap disk, then they’ll be referenced moments later and just swap straight back in. (Technically we say “the random access of the datastructure forces the entire datastructure to be in the process’s working set of memory”.)

The datastructures are needed if you are scanning from the command line or scanning from a daemon.

You can’t use just a portion of the virus signatures, as you don’t get to choose which viruses you will be sent, and thus can’t tell which signatures you will need.

I guess due to some arcane changes in 12.0-RELEASE, FreeBSD kills memory hogs such as the clamav-clamd daemon (don’t take my word for it; it is just a poor man’s guess). I even tried to lower the memory usage, without much success. In the end, there were not too many choices or workarounds:

A. Pray to the gods that it goes away by itself, which I deemed impractical

B. Put aside laziness, and replace security/clamsmtp with security/amavisd-new in order to be able to run ClamAV on-demand which has its own pros and cons

C. Write a quick POSIX shell script that scans for a running clamav-clamd process using ps aux | grep clamd, set it up as a cron job with an X-minute interval, have it start the service whenever the process cannot be found, and be done with it for the time being.

For the sake of slothfulness, I opted to go with option C. As a consequence, I came up with a simple, generic script that is able not only to monitor and restart the clamav-clamd service but also to keep any other crashing service running on FreeBSD.

Requirements and Dependencies

Taking a look at the source code reveals the requirements for running the script successfully:

readonly BASENAME="basename"
readonly CUT="/usr/bin/cut"
readonly ECHO="echo -e"
readonly GREP="/usr/bin/grep"
readonly LOGGER="/usr/bin/logger"
readonly PS="/bin/ps"
readonly REV="/usr/bin/rev"
readonly SERVICE="/usr/sbin/service"
readonly TR="/usr/bin/tr"

All the dependencies in this list are either internal shell commands or are already present in the FreeBSD base system. So, for running the script, nothing extra is required.
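
If you would rather verify this on your own machine than take it on faith, a quick check along the following lines may help. It is only a sketch: check_tools is a hypothetical helper that is not part of daemon-keeper.sh, and it merely tests whether each given path is an executable file.

```shell
#!/bin/sh

# check_tools PATH...  (hypothetical helper)
# Print every path that is not an executable file; return 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if [ ! -x "${tool}" ]; then
            echo "missing: ${tool}"
            missing=1
        fi
    done
    return "${missing}"
}

# The absolute paths below are the ones listed by the script:
# check_tools /usr/bin/cut /usr/bin/grep /usr/bin/logger /bin/ps \
#     /usr/bin/rev /usr/sbin/service /usr/bin/tr
```

On a stock FreeBSD installation the commented-out call should print nothing and return 0.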

Furthermore, I did not want to rely on anything more than the standard POSIX shell for such a simple task, despite the fact that I prefer Bash over anything else for more complex tasks (e.g. OmniBackup: One Script to back them all up, available through FreeBSD Ports as sysutils/omnibackup; or the Reddit wallpaper downloader script).

Usage Syntax

Before running the script, please note that it must have the executable permission set. If it does not, grant the executable permission to all users:

$ chmod a+x /path/to/daemon-keeper.sh

Or, only the current user (the user who owns the file):

$ chmod u+x /path/to/daemon-keeper.sh

Or, the users in the group that owns the file:

$ chmod g+x /path/to/daemon-keeper.sh

Getting away from the basics, one can simply run the script without any arguments and it prints the correct usage syntax:

Correct usage:

    daemon-keeper.sh -e {executable full path} -s {service name to (re)start} [-s {another service name to (re)start}] [... even more -s and service names to (re)start]

Here is the detailed explanation for the available options:

  • -e: Expects the executable’s full path. For example, in my case the clamav-clamd service script is located at /usr/local/etc/rc.d/clamav-clamd and the executable path is /usr/local/sbin/clamd. “How do I know the name and path of the underlying executable?”, you may ask. Well, the answer is: by taking a look at the content of /usr/local/etc/rc.d/clamav-clamd:
command=/usr/local/sbin/clamd
  • -s: The service name to restart in case of a crash. Hmm, why is passing more than one service name, by repeating -s, allowed? Very good question indeed. Sometimes you may need to restart multiple services after a crash. In my case, I also had to restart the dovecot service in addition to clamav-clamd; otherwise, my mail server refused to receive any new emails even after clamav-clamd had been started again. The solution was to restart dovecot after starting up the crashed clamav-clamd service.
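
Looking up the value for -e can also be done from the command line. The following is a rough sketch, assuming the rc.d script assigns a literal path to command as the clamav-clamd one does; rc_command is a hypothetical helper name, and a value built from other variables (for instance a prefix variable) would be printed unexpanded.

```shell
#!/bin/sh

# rc_command FILE  (hypothetical helper)
# Print the value assigned to "command" in an rc.d script, without quotes.
# Lines such as "command_args=..." are not matched by the pattern.
rc_command() {
    sed -n 's/^command=//p' "$1" | tr -d "\"'" | head -n 1
}

# Example: rc_command /usr/local/etc/rc.d/clamav-clamd
```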

To illustrate, the following example is enough to take care of my mail server: it monitors the clamav-clamd service for crashes, then restarts the clamav-clamd and dovecot services if a crash happens:

$ /usr/local/cron-scripts/daemon-keeper.sh \
    -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"

[WARNING] '/usr/local/sbin/clamd' is not running!
[INFO] Stopping the service 'clamav-clamd'...
[ERROR] Failed to stop the 'clamav-clamd' service!
[INFO] Starting the service 'clamav-clamd'...
[INFO] The 'clamav-clamd' service has been started successfully!
[INFO] Stopping the service 'dovecot'...
[INFO] The 'dovecot' service has been stopped successfully!
[INFO] Starting the service 'dovecot'...
[INFO] The 'dovecot' service has been started successfully!

$ /usr/local/cron-scripts/daemon-keeper.sh \
    -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"

[INFO] '/usr/local/sbin/clamd' is running!
[INFO] No action is required!
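
Accepting a repeated option such as -s is something getopts handles naturally, since every occurrence is delivered in turn. The sketch below shows the idea in isolation. Note that parse_args is a hypothetical helper written for this illustration; it gathers the service names into one space-separated string, whereas the actual script simply runs getopts over its arguments a second time when a restart is needed.

```shell
#!/bin/sh

# parse_args [-e EXECUTABLE] [-s SERVICE]...  (hypothetical helper)
# Keeps the last -e value and appends every -s value to a list.
parse_args() {
    executable=""
    services=""
    OPTIND=1
    while getopts ":e:s:" opt "$@"; do
        case "${opt}" in
            e) executable="${OPTARG}" ;;
            s) services="${services}${services:+ }${OPTARG}" ;;
            :) echo "option -${OPTARG} requires a value" >&2; return 1 ;;
            \?) echo "invalid option: -${OPTARG}" >&2; return 1 ;;
        esac
    done
    echo "executable=${executable}"
    echo "services=${services}"
}

parse_args -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"
```

Running it prints executable=/usr/local/sbin/clamd followed by services=clamav-clamd dovecot.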

Running through a Cron Job

I have already written a guide on how to properly add a cron job on *nix systems, so I won’t go through this in detail. Fire up root’s crontab file in your favorite editor by issuing:

$ sudo -u root -g wheel -H crontab -e

I prefer to detect a crash as soon as possible and then restart the service right away. Therefore, I am running the script at a 1-minute interval:

# At every minute
*   *   *   *   *   /usr/local/cron-scripts/daemon-keeper.sh -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"

If you are not familiar with the crontab syntax, crontab.guru is a great visual aid.

On another note, because this script is designed to run as a cron job, its log messages are sent to the system log in addition to stdout and stderr. On FreeBSD this file is located at /var/log/messages. This portion of the system log is the result of the script running as a cron job:

$ tail -f /var/log/messages
May  9 20:49:00 3rr0r DAEMON-KEEPER[75509]: WARNING '/usr/local/sbin/clamd' is not running!
May  9 20:49:00 3rr0r DAEMON-KEEPER[78503]: INFO Stopping the service 'clamav-clamd'...
May  9 20:49:00 3rr0r DAEMON-KEEPER[11358]: ERROR Failed to stop the 'clamav-clamd' service!
May  9 20:49:00 3rr0r DAEMON-KEEPER[13204]: INFO Starting the service 'clamav-clamd'...
May  9 20:49:58 3rr0r DAEMON-KEEPER[34208]: INFO The 'clamav-clamd' service has been started successfully!
May  9 20:49:58 3rr0r DAEMON-KEEPER[36552]: INFO Stopping the service 'dovecot'...
May  9 20:49:59 3rr0r DAEMON-KEEPER[32672]: INFO The 'dovecot' service has been stopped successfully!
May  9 20:49:59 3rr0r DAEMON-KEEPER[36849]: INFO Starting the service 'dovecot'...
May  9 20:49:59 3rr0r DAEMON-KEEPER[2973]: INFO The 'dovecot' service has been started successfully!
May  9 20:50:00 3rr0r DAEMON-KEEPER[89081]: INFO '/usr/local/sbin/clamd' is running!
May  9 20:50:00 3rr0r DAEMON-KEEPER[94832]: INFO No action is required!
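
The DAEMON-KEEPER tag in these log lines is derived from the script's own file name: upper-cased, with the extension removed. The script does this with a tr, rev and cut pipeline; the sketch below reaches the same result with a parameter expansion instead, which is a substitution on my part and not the script's code. The function name syslog_tag is likewise made up for the example.

```shell
#!/bin/sh

# syslog_tag PATH  (hypothetical helper)
# Upper-case the file name and drop its last extension.
syslog_tag() {
    name="$(basename -- "$1" | tr '[:lower:]' '[:upper:]')"
    echo "${name%.*}"
}

syslog_tag "/usr/local/cron-scripts/daemon-keeper.sh"   # prints DAEMON-KEEPER

# The tag would then be handed to logger(1), for example:
# logger -t "$(syslog_tag "$0")" "INFO No action is required!"
```

Since every message carries this tag, grepping /var/log/messages for DAEMON-KEEPER is an easy way to review what the cron job has been doing.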

How it Works

Do not let ~200 lines of shell script code fool you. In fact, there is only one line of code in the script (broken into multiple lines for the purpose of readability) that does all the work:

readonly DAEMON_PROCESS_COUNT=$(${PS} aux \
    | ${GREP} -v "${GREP}" \
    | ${GREP} -v "${SCRIPT}" \
    | ${GREP} -c "${DAEMON}")

Technically, what it does is list all the running processes from all users on the system, filter out everything except the target daemon, and then count the remaining processes. If the daemon is not running, the process count is simply zero. As simple as that.

Leaving out the ${GREP} -v "${SCRIPT}" part (we will attend to this one in a moment) and the variable assignment, it basically translates to something similar to:

$ ps aux | grep -v grep | grep -c /usr/local/sbin/clamd

If clamd is running, the result of running the above command would be a number bigger than zero; otherwise, it would be zero. Well, let’s break it down brick by brick:

$ ps aux

# SORRY!
# I WON'T BE SHARING THE OUTPUT OF THIS COMMAND AS IT IS TOO DANGEROUS TO BE
# SHARED, SINCE ONE CAN GET TO KNOW WHAT EXACTLY I AM RUNNING ON THIS SERVER FOR
# A POTENTIAL EXPLOIT.
# INSTEAD, IF YOU WOULD LIKE TO, YOU CAN RUN IT ON YOUR OWN *NIX DISTRO, AND SEE
# FOR YOURSELF WHAT IT ACTUALLY DOES.

What ps aux essentially does is show all the processes for all users (for our purpose the ax flags would suffice and the u can be omitted; nonetheless, I keep it out of habit). Please consult the ps man page for more information.

Now try the following:

$ ps aux | grep /usr/local/sbin/clamd

clam      26199   0.0 23.7  747944 323252  -  Is   20:51     0:00.58 /usr/local/sbin/clamd
root      34001   0.0  0.2   11492   2768  0  S+   22:27     0:00.00 grep /usr/local/sbin/clamd

In case clamd is running, it returns the above results. If clamd is not running (e.g. crashed or has not been started yet):

$ ps aux | grep /usr/local/sbin/clamd

root      34001   0.0  0.2   11492   2768  0  S+   22:27     0:00.00 grep /usr/local/sbin/clamd

So, the grep command always gets counted as one line, since it is itself a running process at the moment ps aux on the left side of the pipe lists the processes. Using one more pipe, we eliminate any grep processes from the results before feeding the output to the last grep. This is what grep -v grep does in the following command. So, if it finds the clamd process it returns the following output, or else nothing at all (which signifies the daemon is not running):

$ ps aux | grep -v grep | grep /usr/local/sbin/clamd

clam      26199   0.0 23.7  747944 323252  -  Is   20:51     0:00.58 /usr/local/sbin/clamd
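
As an aside, a common alternative to the extra grep -v grep stage is to wrap one character of the pattern in a bracket expression, so that the grep process no longer matches its own pattern. This is not what the script does; the sketch below only demonstrates the idea on two canned lines modeled on the output above, with sample_ps being a made-up stand-in for ps.

```shell
#!/bin/sh

# Two canned ps lines: the daemon, and a grep using the bracketed pattern.
sample_ps() {
    cat <<'EOF'
clam  26199  0.0 23.7 747944 323252  -  Is  20:51  0:00.58 /usr/local/sbin/clamd
root  34001  0.0  0.2  11492   2768  0  S+  22:27  0:00.00 grep [/]usr/local/sbin/clamd
EOF
}

# The regular expression [/]usr/local/sbin/clamd matches the daemon's line,
# but not the grep line, whose text contains "[/]usr" rather than "/usr".
sample_ps | grep -c '[/]usr/local/sbin/clamd'   # prints 1

# On a live system the same idea would read:
# ps ax | grep -c '[/]usr/local/sbin/clamd'
```

The trick only hides the grep process itself; any other process whose command line contains the plain path would still be counted.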

As a final note, when we run the script as a cron job, there is one more thing that has to be taken care of: the ${GREP} -v "${SCRIPT}" part. Remember the cron job from the previous section?

# At every minute
*   *   *   *   *   /usr/local/cron-scripts/daemon-keeper.sh -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"

When we run the script from a cron job, if you haven’t noticed by now, we pass /usr/local/sbin/clamd to the script as an argument, so the script’s own command line shows up in the ps output and matches the final grep, always adding one more line to the output. So we have to eliminate this one, too; otherwise the script thinks the process is running because the count is at least 1 at all times:

$ ps aux \
    | grep -v "grep" \
    | grep -v "/usr/local/cron-scripts/daemon-keeper.sh" \
    | grep "/usr/local/sbin/clamd"

In order to count the number of running processes of the daemon (yes, it is possible for a daemon to spawn more than one process), the last thing we have to do is pass the -c argument to the last grep command:

$ ps aux \
    | grep -v "grep" \
    | grep -v "/usr/local/cron-scripts/daemon-keeper.sh" \
    | grep -c "/usr/local/sbin/clamd"

This prints 0 if the daemon is not running, or a number greater than 0 if it is.
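
To see the whole chain behave as described without touching a live server, the filters can be fed canned input. The following is a small sketch under that assumption: the process lines are shortened and made up, and count_daemon is a hypothetical wrapper around the three grep stages.

```shell
#!/bin/sh

# count_daemon PATTERN SCRIPT  (hypothetical helper)
# Read ps-style lines on stdin and count those that mention PATTERN, ignoring
# grep processes and the monitoring script itself. grep -c exits non-zero
# when the count is 0, so that exit status is deliberately ignored.
count_daemon() {
    grep -v "grep" | grep -v "$2" | grep -c "$1" || true
}

running="clam 26199 /usr/local/sbin/clamd
root 34001 grep /usr/local/sbin/clamd
root 34002 /bin/sh /usr/local/cron-scripts/daemon-keeper.sh -e /usr/local/sbin/clamd"

crashed="root 34001 grep /usr/local/sbin/clamd
root 34002 /bin/sh /usr/local/cron-scripts/daemon-keeper.sh -e /usr/local/sbin/clamd"

for snapshot in "${running}" "${crashed}"; do
    count="$(echo "${snapshot}" \
        | count_daemon "/usr/local/sbin/clamd" "daemon-keeper.sh")"
    if [ "${count}" -lt 1 ]; then
        echo "not running (count=${count})"
    else
        echo "running (count=${count})"
    fi
done
```

The first snapshot yields running (count=1) and the second one not running (count=0), which is exactly the decision the script makes before restarting anything.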

Obtaining the Source Code

The source code is available on both GitHub and GitLab for the sake of convenience. It can be downloaded directly using fetch, curl, aria2, or wget; for example, with curl:

# GitHub
$ curl -fLo /path/to/daemon-keeper.sh \
    --create-dirs \
    https://raw.githubusercontent.com/NuLL3rr0r/freebsd-daemon-keeper/master/daemon-keeper.sh

# GitLab
$ curl -fLo /path/to/daemon-keeper.sh \
    --create-dirs \
    https://gitlab.com/NuLL3rr0r/freebsd-daemon-keeper/raw/master/daemon-keeper.sh

It is also possible to obtain the whole repository by cloning it with git:

# GitHub
$ git clone \
    https://github.com/NuLL3rr0r/freebsd-daemon-keeper.git \
    /path/to/clone/freebsd-daemon-keeper

# GitLab
$ git clone \
    https://gitlab.com/NuLL3rr0r/freebsd-daemon-keeper.git \
    /path/to/clone/freebsd-daemon-keeper

Alternatively, it can be copy-pasted directly from here, which is strongly discouraged due to the Pastejacking exploitation technique:

daemon-keeper.sh
#!/usr/bin/env sh

#  (The MIT License)
#
#  Copyright (c) 2019 Mamadou Babaei
#
#  Permission is hereby granted, free of charge, to any person obtaining a copy
#  of this software and associated documentation files (the "Software"), to deal
#  in the Software without restriction, including without limitation the rights
#  to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
#  copies of the Software, and to permit persons to whom the Software is
#  furnished to do so, subject to the following conditions:
#
#  The above copyright notice and this permission notice shall be included in
#  all copies or substantial portions of the Software.
#
#  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
#  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
#  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
#  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
#  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
#  THE SOFTWARE.


set +e

readonly BASENAME="basename"
readonly CUT="/usr/bin/cut"
readonly ECHO="echo -e"
readonly GREP="/usr/bin/grep"
readonly LOGGER="/usr/bin/logger"
readonly PS="/bin/ps"
readonly REV="/usr/bin/rev"
readonly SERVICE="/usr/sbin/service"
readonly TR="/usr/bin/tr"

readonly FMT_OFF='\e[0m'
readonly FMT_INFO='\e[1;32m'
readonly FMT_WARN='\e[1;33m'
readonly FMT_ERR='\e[1;91m'
readonly FMT_FATAL='\e[1;31m'

readonly LOG_INFO="INFO"
readonly LOG_WARN="WARNING"
readonly LOG_ERR="ERROR"
readonly LOG_FATAL="FATAL"

readonly SCRIPT="$0"
readonly SCRIPT_NAME="$(${BASENAME} -- "${SCRIPT}")"
readonly SYSLOG_TAG="$(${BASENAME} -- "${SCRIPT}" \
    | ${TR} '[:lower:]' '[:upper:]' \
    | ${REV} \
    | ${CUT} -d "." -f2- \
    | ${REV})"

usage()
{
    ${ECHO}
    ${ECHO} "${FMT_INFO}Correct usage:${FMT_OFF}"
    ${ECHO}
    ${ECHO} "    ${FMT_INFO}${SCRIPT_NAME} -e {executable full path} -s {service name to (re)start} [-s {another service name to (re)start}] [... even more -s and service names to (re)start]${FMT_OFF}"
    ${ECHO}

    exit 1
}

log()
{
    log_type=$1; shift
    fmt=$1; shift

    # Use "$*" so that a message made of several words is tested and printed
    # as a single string.
    if [ -n "$*" ] ;
    then
        ${ECHO} "${fmt}[${log_type}] $*${FMT_OFF}"
        ${LOGGER} -t "${SYSLOG_TAG}" "${log_type} $*"
    fi
}

info()
{
    log "${LOG_INFO}" "${FMT_INFO}" "$@"
}

warn()
{
    log "${LOG_WARN}" "${FMT_WARN}" "$@"
}

err()
{
    log "${LOG_ERR}" "${FMT_ERR}" "$@"
}

fatal()
{
    log "${LOG_FATAL}" "${FMT_FATAL}" "$@"
    exit 1
}

restart_service()
{
    service_name="$1"

    info "Stopping the service '${service_name}'..."
    ${SERVICE} "${service_name}" stop > /dev/null 2>&1

    if [ "$?" -eq 0 ] ;
    then
        info "The '${service_name}' service has been stopped successfully!"
    else
        err "Failed to stop the '${service_name}' service!"
    fi

    info "Starting the service '${service_name}'..."
    ${SERVICE} "${service_name}" start > /dev/null 2>&1

    if [ "$?" -eq 0 ] ;
    then
        info "The '${service_name}' service has been started successfully!"
    else
        err "Failed to start the '${service_name}' service!"
    fi
}

if [ "$#" -eq 0 ] ;
then
    usage
fi

SERVICE_COUNT=0

while getopts ":e:s:" ARG ;
do
    case ${ARG} in
        e)
            if [ -z "${OPTARG}" ] ;
            then
                err "Missing executable ${OPTARG}!"
                usage
            fi

            if [ ! -f "${OPTARG}" ] ;
            then
                fatal "The executable '${OPTARG}' does not exist!"
            fi

            readonly DAEMON="${OPTARG}"
            ;;
        s)
            if [ ! -f "/etc/rc.d/${OPTARG}" \
                -a ! -f "/usr/local/etc/rc.d/${OPTARG}" ] ;
            then
                fatal "No such a service exists: '${OPTARG}'!"
            fi

            SERVICE_COUNT=$((SERVICE_COUNT+1))
            ;;
        \?)
            err "Invalid option: -${OPTARG}!"
            usage
        ;;
    esac
done

if [ "${SERVICE_COUNT}" -eq 0 ] ;
then
    err "At least one service name is required!"
    usage
fi

# Without -e, ${DAEMON} is empty and the final grep would match every line.
if [ -z "${DAEMON}" ] ;
then
    err "The executable full path is required!"
    usage
fi

readonly DAEMON_PROCESS_COUNT=$(${PS} aux \
    | ${GREP} -v "${GREP}" \
    | ${GREP} -v "${SCRIPT}" \
    | ${GREP} -c "${DAEMON}")

if [ "${DAEMON_PROCESS_COUNT}" -lt 1 ] ;
then
    warn "'${DAEMON}' is not running!"

    OPTIND=1
    while getopts ":e:s:" ARG ;
    do
        case ${ARG} in
            s)
                restart_service "${OPTARG}"
                ;;
            \?)
            ;;
        esac
    done
else
    info "'${DAEMON}' is running!"
    info "No action is required!"
fi