
A complete walkthrough: fixing "psutil not found" when starting a Python script from rc.local

位面元哥 2023-06-03 00:00:03

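1, Problem

The scenario: after Windows 10 boots, other services should be started inside its Ubuntu subsystem.
Starting a shell script from rc.local in the Ubuntu subsystem works, but starting a Python script fails with an error saying a module cannot be found.
A Baidu search suggested the cause was that rc.local runs before the Python script's dependencies are available.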

2023-04-20 00:46:08.1681922768 import error: No module named 'psutil'

1.1,rc.local

#!/bin/bash -x

LOG_FILE="/mnt/e/111111111/package/log/rc.local.log"

exec 1>> $LOG_FILE 2>&1
set -x

echo password|sudo -S /etc/init.d/apache2 restart

TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
echo "$TIMES /etc/rc.local start apache2" >> $LOG_FILE

cd /xxxxx/package
nohup ./start.sh >> $LOG_FILE 2>&1 &

TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
echo "$TIMES /etc/rc.local start start.sh" >> $LOG_FILE

exit 0

1.2,watchdog.py

import subprocess
import time
import psutil
import setproctitle
# Rename the process so it shows up as "watchdog.py"
setproctitle.setproctitle("watchdog.py")

def checkProcess(process_name):
    for process in psutil.process_iter():
        if process.name() == process_name:
            return True
    return False

def killProcess(process_name):
    for process in psutil.process_iter():
        if process.name() == process_name:
            process.kill()

def startProces(path, program):
    param = [path + "/" + program]
    if checkProcess(program):
        killProcess(program)
        print("{} was already running and has been killed; it will be restarted shortly.".format(program))

    process = subprocess.Popen(param)
    print("{} started! process[{}]".format(program, process))
    return process

def run(path, program):
    process = startProces(path, program)
    while True:
        time.sleep(5)  # check every 5 seconds
        if process.poll() is not None:  # the process has exited (e.g. crashed)
            process = startProces(path, program)
        else:
            print("{} is running".format(program))

if __name__ == '__main__':
    run("server", "webserver")

2, Troubleshooting

To make troubleshooting easier, I prepared a shell script, start.sh: rc.local starts start.sh, and start.sh starts the Python script.


#!/bin/bash

LOG_FILE="/mnt/e/package/log/start.log"

LOG()
{
	TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
	echo "${TIMES}$1" >> $LOG_FILE
}

cd /mnt/e/package
nohup python3 watchdog.py >> $LOG_FILE 2>&1 &
LOG "EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:$?"

exit $?
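
One detail worth noting about the script above: `$?` always refers to the most recent command, so `exit $?` placed after a LOG call returns the status of LOG rather than of the command being logged. A minimal sketch of capturing the status first; `fails_with_3` and `log` are hypothetical stand-ins, not part of the original scripts:

```shell
#!/bin/bash
# log: simplified stand-in for the LOG function (prints instead of appending to a file)
log() { echo "$(date +"%Y-%m-%d %H:%M:%S") $1"; }

# fails_with_3: hypothetical command that fails with exit status 3
fails_with_3() { return 3; }

rc=0
fails_with_3 || rc=$?          # capture the status immediately
log "command finished. result:$rc" >/dev/null
after_log=$?                   # status of log itself, not of fails_with_3

echo "rc=$rc after_log=$after_log"
```

Ending the script with `exit $rc` would then report the status of the command that matters.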

2.1, Running start.sh manually works fine

start.log shows the following:

2023-04-20 19:37:26.780 EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:0

2.2, After boot, rc.local runs start.sh, and the Python script started by start.sh fails

Neither the watchdog process nor the webserver process came up. start.log shows the following:

2023-04-22 02:06:06.082EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'

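2.3, Suspicion: when rc.local runs, the psutil module used by the Python script is not available yet

Running start.sh manually works, which shows that the two scripts, start.sh and watchdog.py, are fine in themselves.

3, Approach 1: keep start.sh running in the background and exit only after watchdog.py has started successfully

3.1, start.sh retries at most 5 times, 5 seconds apart, then exits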

#!/bin/bash

LOG_FILE="/mnt/e/package/log/start.log"

LOG()
{
	TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
	echo "${TIMES} $1" >> $LOG_FILE
}

EXECUTE_CMD(){
	FILE_PATH=$1
	CMD=$2
	PROGRAM=$3
	RETRY=5     # maximum number of attempts
	INTERVAL=5  # seconds to wait between attempts
	COUNT=1     # current attempt
	while true; do
		cd $FILE_PATH
		nohup $CMD $PROGRAM >> $LOG_FILE 2>&1 &  # start the script in the background

		# Note: after "&", $? only reports that the job was launched, so it is always 0 here
		if [ $? -eq 0 ]; then
			# treated as success: leave the loop
			break
		fi

		if [ $COUNT -ge $RETRY ]; then
			# maximum number of attempts reached: give up
			LOG "ERROR: Command['$CMD $PROGRAM'] failed even after $COUNT retries! Exiting."
			return 1
		fi

		COUNT=$(expr $COUNT + 1)
		LOG "Command['$CMD $PROGRAM'] failed, retrying in $INTERVAL seconds... (retry $COUNT/$RETRY)"
		sleep $INTERVAL
	done

	LOG "Command['$CMD $PROGRAM'] succeeded after $COUNT retryies."
	return 0
}

EXECUTE_CMD /mnt/e/package python3 watchdog.py
LOG "EXECUTE_CMD /mnt/e/package python3 watchdog.py. result:$?"

exit $?

3.2, After boot, however, the script reported success first, and only then was the watchdog.py error printed

2023-04-22 02:20:56.955 Command['python3 watchdog.py'] succeeded after 1 retryies.
2023-04-22 02:20:56.958 EXECUTE_CMD /mnt/e/project/anweimian/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'
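
A likely reason the success message appears before the traceback: in bash, `$?` read right after a command launched with `&` only says that the background job was started, so it is 0 even when the command fails a moment later. A minimal sketch, using `false` as a stand-in for a failing command (an assumption for illustration, not the original `python3 watchdog.py`); `wait` on the job's PID returns the real exit status:

```shell
#!/bin/bash
# `false` stands in for a command that fails shortly after being launched.
false &
launch_status=$?      # status of launching the background job, not of `false`
pid=$!                # PID of the background job

# wait returns the exit status of the job it waited for
if wait "$pid"; then
    real_status=0
else
    real_status=$?
fi

echo "launch_status=$launch_status real_status=$real_status"
```

For a long-running program such as a watchdog, `wait` would block until the program exits, so a short `sleep` followed by a liveness check on the PID is probably the more practical test.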

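3.3, So could it be that nohup makes start.sh ignore the exception raised when watchdog.py starts?

nohup: runs a command so that it ignores the hangup signal (SIGHUP).
&: runs the command in the background.
nohup and & do different things. nohup decouples the command from the user's terminal, so the command keeps running after the SSH connection is closed. & only puts the command in the background; when the SSH session ends (the user logs out or the terminal hangs up), the command usually exits as well.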

# nohup $CMD $PROGRAM >> $LOG_FILE 2>&1 &  # restart the script
# changed to:
$CMD $PROGRAM >> $LOG_FILE 2>&1 &  # restart the script
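
For what it is worth, nohup only changes how the command reacts to the hangup signal; it passes the command's exit status through unchanged, so on its own it should not be able to hide a failure. A small sketch, again with `false` as a stand-in for a failing command:

```shell
#!/bin/bash
# Run a failing command under nohup in the foreground and record its exit status.
# Output is sent to /dev/null so nohup does not create a nohup.out file.
if nohup false >/dev/null 2>&1; then
    nohup_status=0
else
    nohup_status=$?
fi

echo "exit status under nohup: $nohup_status"
```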

3.4, After removing nohup, the error is still there and the start still fails

2023-04-22 02:31:56.906 Command['python3 watchdog.py'] succeeded after 1 retryies.
2023-04-22 02:31:56.909 EXECUTE_CMD /mnt/e/project/anweimian/package python3 watchdog.py. result:0
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'

4, Approach 2: check whether the process exists instead of relying on the return value

4.1, The start.sh script is as follows

#!/bin/bash

LOG_FILE="/mnt/e/package/log/start.log"

LOG()
{
	TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
	echo "${TIMES} $1" >> $LOG_FILE
}

RUN_PYTHON()
{
	FILE_PATH=$1
	PROGRAM=$2
	while true  # loop forever, checking whether the script has stopped
	do
		procnum=$(ps -ef | grep "$PROGRAM" | grep -v grep | wc -l) # number of running $PROGRAM processes

		if [[ ${procnum} == 0 ]] ; then  # no $PROGRAM process is running, so it has to be (re)started
			LOG "procnum[$procnum] not found $PROGRAM"
			cd $FILE_PATH
			nohup python3 $PROGRAM >> $LOG_FILE 2>&1 &  # restart the script
			sleep 2  # wait 2 seconds before checking again
		else
			LOG "procnum[$procnum] found $PROGRAM"
			sleep 60  # check again in 60 seconds
		fi
	
	done
}

RUN_PYTHON /mnt/e/package watchdog.py
LOG "RUN_PYTHON /mnt/e/package watchdog.py. result:$?"

exit $?
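
Counting `ps | grep` matches can also pick up unrelated processes whose command line happens to contain the program name. One alternative, sketched below under the assumption that the script itself launched the process, is to remember the PID from `$!` and probe it with `kill -0`, which sends no signal and only checks that the process exists. `is_alive` is a hypothetical helper and `sleep 30` stands in for `python3 watchdog.py`:

```shell
#!/bin/bash
# is_alive PID: succeeds while a process with that PID exists and can be signalled
is_alive() {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &            # stand-in for the long-running program
child=$!

if is_alive "$child"; then state_running=yes; else state_running=no; fi

kill "$child"                        # stop the stand-in process
wait "$child" 2>/dev/null || true    # reap it so the PID is really gone

if is_alive "$child"; then state_after=yes; else state_after=no; fi

echo "running=$state_running after_kill=$state_after"
```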

4.1, After boot, watchdog.py still fails to start and still reports that psutil cannot be found

2023-04-22 02:42:47.519 procnum[0] not found watchdog.py
Traceback (most recent call last):
  File "/mnt/e/project/anweimian/package/watchdog.py", line 3, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'

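This really should not happen: even if psutil were slow to become available, it should be found eventually, yet no matter how long start.sh loops and waits, watchdog.py keeps reporting the error.

4.2, Found that the start.sh process started by rc.local belongs to the root user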


4.3, Whereas the start.sh process started manually belongs to the user username

2023-04-22 02:51:11.555 procnum[0] not found watchdog.py
2023-04-22 02:51:13.573 procnum[1] found watchdog.py
2023-04-22 02:52:13.590 procnum[1] found watchdog.py


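4.4, So could the failure come from start.sh being started by different users, one of which has the psutil module and one of which does not?

Add the following command to the script: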
LOG "user name = ${USER}"

4.4.1, After boot, start.sh started by rc.local prints root as the user name

2023-04-20 00:55:24.271 user name = root

4.4.2, Started manually, start.sh prints the current user, username

2023-04-20 00:56:44.329 user name = username
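
This fits a common setup in which psutil was installed only for one user (for example with `pip install --user`, which places packages under that user's home directory), so root's python3 cannot see it. Whether that is what happened here is an assumption; the sketch below is one way to check. `check_module` is a hypothetical helper that reports the current user, the interpreter, the per-user site-packages directory, and whether a module can be imported:

```shell
#!/bin/bash
# check_module MODULE: report the current user and whether python3 can import MODULE.
# Returns 0 if importable, 1 if not, 2 if python3 is not on PATH.
check_module() {
    local module=$1
    # id -un does not depend on the USER environment variable being set
    echo "user=$(id -un) home=${HOME:-unset}"
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found on PATH"
        return 2
    fi
    python3 -c 'import sys, site; print("python:", sys.executable); print("user site:", site.getusersitepackages())'
    if python3 -c "import $module" 2>/dev/null; then
        echo "$module: importable"
    else
        echo "$module: NOT importable for this user"
        return 1
    fi
}

check_module psutil || echo "check_module returned $?"
```

Running the same check once from rc.local and once from an interactive shell should show whether the two users see different site-packages directories.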

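4.4.3, To trace whether the script is running the right commands, use exec 1>>file and set -x

# Append everything that would be printed to the terminal (stdout and stderr) to the file
exec 1>> $LOG_FILE 2>&1

# From here on, print each command before it is executed (the trace goes to stderr)
set -x

# Stop printing the executed commands
set +x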
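
A self-contained sketch of the same technique, for trying it out safely: the redirect and the tracing are wrapped in a subshell so they do not affect the rest of the script, and `TRACE_LOG` is a temporary file chosen for this example rather than the log path used above:

```shell
#!/bin/bash
# TRACE_LOG: temporary file used only for this demonstration
TRACE_LOG=$(mktemp)

(
    exec >> "$TRACE_LOG" 2>&1   # stdout and stderr now append to the file
    set -x                      # start tracing commands
    echo "hello from the traced section"
    set +x                      # stop tracing
    echo "tracing is off"
)

# Show what was captured: traced commands are prefixed with "+"
cat "$TRACE_LOG"
```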

5, Approach 3: specify the user that starts watchdog.py

5.1, Modify start.sh as follows

#!/bin/bash

LOG_FILE="/mnt/e/package/log/start.log"

exec 1> $LOG_FILE 2>&1

LOG()
{
	TIMES=`date +"%Y-%m-%d %H:%M:%S.%3N"`
	echo "${TIMES} $1" >> $LOG_FILE
}

RUN_PYTHON()
{
	FILE_PATH=$1
	PROGRAM=$2
	set -x
	cd $FILE_PATH
	nohup echo password|sudo -S -u username python3 $PROGRAM >> $LOG_FILE 2>&1 &  # start the script as username
	set +x
}

RUN_PYTHON /mnt/e/package watchdog.py
LOG "RUN_PYTHON /mnt/e/package watchdog.py. result:$?"

exit $?

5.2, After rebooting the computer, the problem was completely solved

start.log shows the following:

+ cd /mnt/e/package
+ nohup echo 931108
+ set +x
+ sudo -S -u wangxinyuan python3 watchdog.py
2023-04-22 03:20:51.157 RUN_PYTHON /mnt/e/package watchdog.py. result:0
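
Since rc.local already runs as root, and sudo by default does not ask root for a password, piping a password into `sudo -S` may not be needed at all. A possible variant is sketched below; `run_as` is a hypothetical helper, and the flags used are `-n` (never prompt), `-u` (target user) and `-H` (set HOME to the target user's home directory, so per-user Python packages are looked up in the right place):

```shell
#!/bin/bash
# run_as USER CMD...: run CMD as USER.
# If the script is already running as USER, the command is executed directly.
run_as() {
    local target=$1
    shift
    if [ "$(id -un)" = "$target" ]; then
        "$@"
    else
        sudo -n -H -u "$target" "$@"
    fi
}

# Demonstration with the current user, which needs no privileges
run_as "$(id -un)" id -un
```

In start.sh this would look like `run_as username python3 watchdog.py >> $LOG_FILE 2>&1 &`. Another option, if the module should be available to every user, is to install psutil system-wide, for example through the distribution's python3-psutil package.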