【Posted】: 2020-10-20 23:49:22
【Question】:
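UPDATE: I replaced run-one with run-this-one in */30 * * * * run-one /opt/scripts/staleFile.sh, and the log updated! So it looks like run-one was what blocked it. I'm not sure how run-one handles its lock, but something in my script is preventing the lock from being released. I searched ps aux | grep [script-name] and ps aux | grep [PID of original stuck cron command from syslog] and saw no sign that the script itself was stuck, so I think this is a run-one issue. I use run-one in several other cron scripts and have not had a problem with it so far. If anyone has a suggestion as to what is tripping it up, I'm all ears. /UPDATE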
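I have a bash script in root's crontab that runs every 30 minutes to check for stale NFS file handles and, if one exists, fixes it by remounting from fstab. A stale handle shows up every time I move data to the NFS share, and again around 7 a.m. the next morning, because data moved to the share is first written to a cache drive and then moved to the HDDs early in the morning. According to the log (pasted below the script), the script appears to complete successfully, but judging by the log file's timestamp it takes forever to finish (1hr51m), and it does not run again to fix a stale handle that turns up later. If the same script is simply run as root, i.e. "sudo ./staleFile.sh", it finishes quickly (in under a minute) and works as expected.
I have docker containers that depend on mergerfs mounts, which merge local data with data from my NFS shares; that is why I stop them while the script runs.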
Here is the relevant excerpt from my sudo crontab:
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
*/30 * * * * run-one /opt/scripts/staleFile.sh
Here is the script in question:
#!/bin/bash
logFile="/av/misc/logs/stale.log" #REMEMBER TO CHANGE!
exec &>> "$logFile"
# Exit if not being run by root user.
if [[ $(/usr/bin/id -u) -ne 0 ]]; then echo "script must be run as root. exiting..."; exit 1; fi
# Get time for log
now="$(/usr/bin/date +'%Y/%m/%d %H:%M')"
# Check for stale file handle, exit script if no problems
if ls /mnt/movies &>/dev/null; then :; else mov=1; fi
if ls /mnt/TV &>/dev/null; then :; else tv=1; fi
if [[ -z $mov && -z $tv ]]; then echo "$now: ok"; exit 0; fi
echo "Stale file handle...fixing"
echo "----------START----------"
printf "DATE: %s\n" "$now"
if [[ "$mov" && -z "$tv" ]]; then #check if just movies nfs share
    echo "STALE NFS MOVIE FILE HANDLE. FIXING..."
    docker-compose -f /opt/docker-compose.yml stop radarr
    docker-compose -f /opt/docker-compose.yml stop rutorrent
    echo "unmounting /av/mergerfs/movies"
    umount /av/mergerfs/movies
    systemctl stop plexmediaserver.service
    echo "unmounting /mnt/movies"
    umount /mnt/movies
    echo "remounting fstab"
    mount -a
    systemctl start plexmediaserver.service
    echo "remounting /av/mergerfs/movies"
    mergerfs -o allow_other,minfreespace=75G,async_read=false,use_ino,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,nonempty /av/movies=RW:/mnt/movies=RO /av/mergerfs/movies
    echo "relaunching docker containers"
    mergMovies=$(find /av/mergerfs/movies/* -maxdepth 0 | wc -l)
    mergTV=$(find /av/mergerfs/tv/* -maxdepth 0 | wc -l)
    if [ "$mergMovies" -gt 1000 ]; then docker-compose -f /opt/docker-compose.yml up -d radarr; fi
    if [[ $mergTV -gt 200 && $mergMovies -gt 1000 ]]; then docker-compose -f /opt/docker-compose.yml up -d rutorrent; fi
    docker-compose -f /opt/docker-compose.yml restart reverse
    echo "finished!"
    exit 0
elif [[ -z "$mov" && "$tv" ]]; then #check if just tv nfs share
    echo "STALE NFS TV FILE HANDLE. FIXING..."
    docker-compose -f /opt/docker-compose.yml stop sonarr
    docker-compose -f /opt/docker-compose.yml stop rutorrent
    echo "unmounting /av/mergerfs/*..."
    umount /av/mergerfs/tv
    systemctl stop plexmediaserver.service
    echo "unmounting /mnt/[services]"
    umount /mnt/TV
    echo "remounting fstab"
    mount -a
    systemctl start plexmediaserver.service
    echo "remounting /av/mergerfs/tv..."
    mergerfs -o allow_other,minfreespace=75G,async_read=false,use_ino,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,nonempty /av/tv=RW:/mnt/TV=RO /av/mergerfs/tv
    echo "relaunching docker containers"
    mergTV=$(find /av/mergerfs/tv/* -maxdepth 0 | wc -l)
    mergMovies=$(find /av/mergerfs/movies/* -maxdepth 0 | wc -l)
    if [ "$mergTV" -gt 200 ]; then docker-compose -f /opt/docker-compose.yml up -d sonarr; fi
    if [[ $mergTV -gt 200 && $mergMovies -gt 1000 ]]; then docker-compose -f /opt/docker-compose.yml up -d rutorrent; fi
    docker-compose -f /opt/docker-compose.yml restart reverse
    echo "finished!"
    exit 0
elif [[ "$mov" && "$tv" ]]; then #must be both
    echo "STALE NFS MOVIE & TV FILE HANDLE. FIXING..."
    docker-compose -f /opt/docker-compose.yml stop radarr
    docker-compose -f /opt/docker-compose.yml stop sonarr
    docker-compose -f /opt/docker-compose.yml stop rutorrent
    echo "unmounting /av/mergerfs/BOTH..."
    umount /av/mergerfs/movies
    umount /av/mergerfs/tv
    systemctl stop plexmediaserver.service
    echo "unmounting /mnt/BOTH"
    umount /mnt/movies
    umount /mnt/TV
    echo "remounting fstab"
    mount -a
    systemctl start plexmediaserver.service
    echo "remounting /av/mergerfs/movies..."
    mergerfs -o allow_other,minfreespace=75G,async_read=false,use_ino,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,nonempty /av/movies=RW:/mnt/movies=RO /av/mergerfs/movies
    echo "remounting /av/mergerfs/tv..."
    mergerfs -o allow_other,minfreespace=75G,async_read=false,use_ino,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,nonempty /av/tv=RW:/mnt/TV=RO /av/mergerfs/tv
    #restart docker containers, but check if mergerfs mount was successful based on number of files
    echo "relaunching docker containers"
    mergMovies=$(find /av/mergerfs/movies/* -maxdepth 0 | wc -l)
    mergTV=$(find /av/mergerfs/tv/* -maxdepth 0 | wc -l)
    if [ "$mergTV" -gt 200 ]; then docker-compose -f /opt/docker-compose.yml up -d sonarr; fi
    if [ "$mergMovies" -gt 1000 ]; then docker-compose -f /opt/docker-compose.yml up -d radarr; fi
    if [[ $mergTV -gt 200 && $mergMovies -gt 1000 ]]; then docker-compose -f /opt/docker-compose.yml up -d rutorrent; fi
    docker-compose -f /opt/docker-compose.yml restart reverse
    echo "finished!"
    exit 0
fi
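A side note on the check at the top of the script: ls on an NFS mount can block instead of failing when the server is unreachable, so a bounded check may be safer. The following is only a minimal sketch, assuming GNU coreutils timeout is installed; the helper name share_ok and the 10-second limit are illustrative and not part of the original script.

```bash
#!/bin/bash
# share_ok DIR [SECONDS]: succeed only if DIR can be listed within the limit.
# Hypothetical helper; the name and the default limit are assumptions.
share_ok() {
    local dir=$1 limit=${2:-10}
    # ls exits non-zero on errors such as "Stale file handle";
    # timeout exits non-zero if it has to terminate ls (TERM first,
    # then KILL two seconds later because of -k 2).
    timeout -k 2 "$limit" ls "$dir" &>/dev/null
}

# Collect every share that fails the check.
stale=()
for share in /mnt/movies /mnt/TV; do
    share_ok "$share" || stale+=("$share")
done

if ((${#stale[@]} == 0)); then
    echo "ok"
else
    echo "stale or unreachable: ${stale[*]}"
fi
```

Collecting the failing shares in an array would also allow the three near-identical branches to be driven by one loop, though that is a larger restructuring than the question asks about.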
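Here is an excerpt from the log (the odd characters come from docker highlighting "done" in green on the console; everything looks fine when viewed on a console):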
2020/06/30 04:00: ok
2020/06/30 04:30: ok
2020/06/30 05:00: ok
2020/06/30 05:30: ok
2020/06/30 06:00: ok
2020/06/30 06:30: ok
2020/06/30 07:00: ok
Stale file handle...fixing
----------START----------
DATE: 2020/06/30 07:30
STALE NFS TV FILE HANDLE. FIXING...
Stopping sonarr ...
[1A[2K
Stopping sonarr ... [32mdone[0m
[1BStopping rutorrent ...
[1A[2K
Stopping rutorrent ... [32mdone[0m
[1Bunmounting /av/mergerfs/*...
unmounting /mnt/[services]
remounting fstab
remounting /av/mergerfs/tv...
relaunching docker containers
Starting sonarr ...
[1A[2K
Starting sonarr ... [32mdone[0m
[1BStarting rutorrent ...
[1A[2K
Starting rutorrent ... [32mdone[0m
[1BRestarting reverse ...
[1A[2K
Restarting reverse ... [32mdone[0m
[1Bfinished!
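As the log shows, after the script prints "finished!" it no longer runs at the next scheduled half hour. Also, the timestamp on the log file is 8:51 a.m., which means the run took a very long time to complete in the first place (1 hour 51 minutes). My other root crontab scripts continue to run on schedule.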
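The odd characters in the log excerpt are ANSI escape sequences (colour and cursor movement). One way to keep them out of the log file is to filter the compose output before it is appended. This is a minimal sketch, assuming bash and GNU sed; the function name strip_ansi is illustrative. docker-compose 1.x also documents a --no-ansi option, which may be simpler where it is available.

```bash
#!/bin/bash
# strip_ansi: remove CSI escape sequences (ESC, "[", parameters, final letter)
# from standard input. Hypothetical helper, not part of the original script.
strip_ansi() {
    sed -E $'s/\x1b\\[[0-9;]*[A-Za-z]//g'
}

# Sample line shaped like the docker-compose output in the log above.
printf 'Stopping sonarr ... \x1b[32mdone\x1b[0m\n' | strip_ansi
```

In the script this could be applied per command, for example docker-compose -f /opt/docker-compose.yml stop sonarr 2>&1 | strip_ansi. Note that the pipeline then reports the exit status of sed unless set -o pipefail is in effect.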
【Comments】:
-
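Syslog shows that it is still trying to run.
Jun 30 13:00:01 ISOTHERMICX CRON[700109]: (root) CMD (run-one /opt/scripts/staleFile.sh)
Also, I checked ps aux to see whether the whole script was hanging and run-one was preventing it from re-executing, but when ps aux | grep staleFile.sh -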
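One possible explanation for a lock that outlives the script, offered as a hypothesis rather than a diagnosis: if run-one holds its lock with flock(1) on an open file descriptor, any long-lived process started by the script (a daemonizing mergerfs, for example) inherits that descriptor and keeps the lock after the script has exited, which would be consistent with ps showing no staleFile.sh. The sketch below reproduces the effect with plain flock from util-linux. The temporary lock file and the sleep standing in for a daemon are illustrative assumptions.

```bash
#!/bin/bash
lock=$(mktemp)

# Stand-in for the cron job: take the lock on fd 9, start a long-lived
# background process, then exit without waiting for it.
(
    exec 9>"$lock"
    flock -n 9 || exit 1
    sleep 2 &   # inherits fd 9, as a daemon started by the script would
)

# The "job" has exited, but the background process still holds the lock.
if flock -n "$lock" true; then first="free"; else first="held"; fi
echo "right after the job exits: $first"

sleep 3   # give the background process time to finish

# Once the last process holding the descriptor is gone, the lock is released.
if flock -n "$lock" true; then second="free"; else second="held"; fi
echo "after the child exits: $second"

rm -f "$lock"
```

If this is what is happening, flock's documented -o (--close) option, which closes the lock descriptor before running the command, would avoid the inheritance; a crontab entry using flock -n -o with a lock file of your choosing instead of run-one is one way to test the hypothesis.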
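I would add some serious wait and/or sleep statements in there. E.g. mount and systemctl may need some time to do their work. Also, I would echo a different exit message for each case, for better debugging - ymmv. -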
Or, e.g., after running mount, explicitly check whether the thing you wanted mounted really has been mounted. -
@Roadowl mount doesn't wait for a reply? Meaning the script carries on even if the mount has not finished? -
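No, mount does not wait until all drives have been mounted. If they are physical drives that need time, try it once on your machine and watch /var/log/kern.log 'do' its thing; with TB drives it can easily take several seconds. Just check /proc/mounts to see whether the thing you wanted to (re)mount is actually mounted now.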
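The /proc/mounts check suggested above could be sketched as follows. This is a minimal example, assuming bash and awk; the helper names is_mounted and wait_for_mount, the one-second polling interval, and the optional table argument (useful for testing against a file other than /proc/mounts) are all illustrative.

```bash
#!/bin/bash
# is_mounted DIR [TABLE]: succeed if DIR is listed as a mount point in TABLE
# (default /proc/mounts). Field 2 of each line is the mount point.
is_mounted() {
    local dir=$1 table=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# wait_for_mount DIR [TRIES] [TABLE]: poll once per second, up to TRIES times,
# and fail if DIR never shows up as mounted.
wait_for_mount() {
    local dir=$1 tries=${2:-10} table=${3:-/proc/mounts} i
    for ((i = 0; i < tries; i++)); do
        is_mounted "$dir" "$table" && return 0
        sleep 1
    done
    return 1
}
```

In the script, mount -a could then be followed by something like wait_for_mount /mnt/TV 30 || { echo "mount of /mnt/TV failed"; exit 1; } before mergerfs is started. Mount points containing spaces are escaped in /proc/mounts, so this simple comparison only suits paths like the ones in the question; mountpoint -q from util-linux is an alternative.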