shell_util
1 Module overview
xlib/util/shell_util.py
Runs shell commands from within butterfly.
Note: this module can also be used on its own.
2 Usage
2.1 Example: using it in butterfly
2.1.1 Enter the butterfly home directory
cd butterfly
Note: the butterfly home directory is the directory that contains run.sh.
2.1.2 Create the shell app
mkdir -p handlers/shell
2.1.3 Write the handler
Write the following code to
handlers/shell/__init__.py
# coding=utf8
"""
# Description:
shell demo
"""
from xlib import retstat
from xlib.httpgateway import Request
from xlib.middleware import funcattr
from xlib.util import shell_util

__info = "shell demo"
__version = "1.0.1"


@funcattr.api
def run_cmd(req, cmd):
    """
    shell demo
    """
    isinstance(req, Request)
    ret = shell_util.run(cmd, timeout=10)
    if ret.success():
        # Success path: ret.output() holds the command output
        return retstat.OK, {"data": ret.output()}, [(__info, __version)]
    else:
        # Failure path: the error is logged automatically to logs/common.log.wf
        return retstat.ERR, {"data": ret.output()}, [(__info, __version)]
2.1.4 Start butterfly
Start:
bash run.sh start
or restart:
bash run.sh restart
2.1.5 Send a request
$ curl -d '{"cmd": "date"}' "http://127.0.0.1:8585/shell/run_cmd"
{"stat": "OK", "data": "Wed Mar 3 21:59:12 CST 2021"}
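The same request can also be sent from Python. The following is a minimal sketch written for Python 3, assuming the URL and the {"stat": ..., "data": ...} reply shape shown in the curl example; the helper names build_request, parse_response and call_run_cmd are illustrative and not part of butterfly.

```python
import json
import urllib.request


def build_request(cmd, base_url="http://127.0.0.1:8585"):
    """Build the POST request for the run_cmd handler (same payload as the curl call)."""
    body = json.dumps({"cmd": cmd}).encode("utf-8")
    # Supplying data makes urllib send a POST request
    return urllib.request.Request(base_url + "/shell/run_cmd", data=body)


def parse_response(text):
    """Return the command output, or raise if the handler reported a failure."""
    reply = json.loads(text)
    if reply.get("stat") != "OK":
        raise RuntimeError("run_cmd failed: %s" % reply.get("data"))
    return reply["data"]


def call_run_cmd(cmd, base_url="http://127.0.0.1:8585", timeout=15):
    """Send the request to a running butterfly instance and return the output."""
    request = build_request(cmd, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

With butterfly running locally, call_run_cmd("date") would return the same string that appears in the "data" field above.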
2.2 Calling the library from your own script
from xlib import logger
from xlib.util import shell_util

# Log path
logger.init_log("dev/common.log")

# A command that does not exist
cmd = "llll"
ret = shell_util.run(cmd, timeout=10)
if ret.success():
    # Success path: ret.output() holds the command output
    print ret.output()
else:
    # Failure path: the error is logged automatically to logs/common.log.wf
    print ret.output()
Log output
ERROR butterfly 03-03 22:08:58: shell_util.py:120 4670393792 @@@@@@@@@@@@@@@@ * [file=w.py:<module>:20 reqid= type=shell req_path=llll req_data=None cost=0.000061 is_success=False err_no=127 err_msg=/bin/sh: llll: command not found res_len=32 res_data=/bin/sh: llll: command not found:) res_attr=None]
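When such lines need to be inspected programmatically, the bracketed key=value section can be split by relying on the fixed key order. This is a rough Python 3 sketch, assuming every line follows the layout of the sample above; parse_shell_log and FIELDS are hypothetical names, not part of butterfly.

```python
import re

# Keys in the order they appear in the bracketed part of the log line
FIELDS = ["file", "reqid", "type", "req_path", "req_data", "cost",
          "is_success", "err_no", "err_msg", "res_len", "res_data", "res_attr"]

# Values may contain spaces, so each one is matched lazily up to the next known key
LOG_RE = re.compile(
    r"\[" + " ".join("%s=(?P<%s>.*?)" % (key, key) for key in FIELDS) + r"\]\s*$"
)


def parse_shell_log(line):
    """Return the fields of a shell log line as a dict of strings, or None if no match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return match.groupdict()
```

All values are returned as strings, so callers would convert err_no or cost themselves.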
3 Other practices
3.1 Running a local shell script with arguments on a remote server
3.1.1 Examples
exe = shell_util.remote_run(user="work", host=host, command='grep -c "model name" /proc/cpuinfo')
if exe.success():
    cpu_count = exe.output()
else:
    cpu_count = 0
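In the snippet above the success branch yields the command output as text while the failure branch yields the integer 0. A small helper can normalize both cases to an integer. This is only a sketch for Python 3; parse_count is a hypothetical name and not part of shell_util.

```python
def parse_count(output, default=0):
    """Convert command output such as '8\n' to an int, falling back to default."""
    try:
        return int(output.strip())
    except (AttributeError, ValueError):
        # None, empty output, or an error message instead of a number
        return default
```

For instance, cpu_count = parse_count(exe.output()) would give an integer whether or not the remote command printed a usable number.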
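If you need to kill a process, first filter the script itself, the ps command, and similar entries out of the process list.
Running a script on the remote host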
exe = shell_util.remote_run(user="work", host=host, command="bash ./xxx.sh arg1 arg2")
if exe.success():
    script_output = exe.output()
else:
    script_output = 0
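3.1.2 Common errors
# Execution timed out (for example, the login hangs and the command cannot run)
err_no=124 err_msg=exe timeout
# Login failed (for example, the target host requires a password)
err_no=255 err_msg=
# Host unreachable
err_no=1 err_msg=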
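For reporting, the codes above can be turned into readable messages with a lookup table. A minimal Python 3 sketch follows; the table only restates the three cases listed here, and explain_remote_error is a hypothetical helper name.

```python
# Meanings taken from the list of common errors above
REMOTE_ERRORS = {
    124: "execution timed out (for example, the login hung and the command never ran)",
    255: "login failed (for example, the target host requires a password)",
    1: "host unreachable",
}


def explain_remote_error(err_no):
    """Return a readable explanation for a remote_run error code."""
    return REMOTE_ERRORS.get(err_no, "command failed with exit status %d" % err_no)
```

Treat the message for code 1 as a hint only: the remote command may also exit with 1 on its own, as grep does when nothing matches.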
3.2 Return values of subprocess.Popen().poll()
import subprocess
proc = subprocess.Popen(['python', 'test.py'], stdout=subprocess.PIPE)
while 1:
    print proc.poll()
#while 1:
# print "hello"
print "hello"
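Return values of poll():
None the child is still running
0 the child exited normally
1 the child exited with a general error (for example, an uncaught Python exception)
2 the child exited with status 2 (for example, python when the script file does not exist)
-15 the child was killed by SIGTERM (kill); in general, -N means it was terminated by signal N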
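These values can be observed with a few short-lived children. The sketch below is written for Python 3 on a POSIX system (negative return codes are only reported there); poll_until_exit and demo_poll_values are illustrative names.

```python
import signal
import subprocess
import sys
import time


def poll_until_exit(proc, interval=0.05, limit=10.0):
    """Poll until the child exits; return its code, or None if it is still running."""
    deadline = time.time() + limit
    while time.time() < deadline:
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(interval)
    return proc.poll()


def demo_poll_values():
    """Start three children and collect what poll() reports for each of them."""
    ok = subprocess.Popen([sys.executable, "-c", "pass"])
    failed = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(2)"])
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    running = sleeper.poll()   # None: the child has not exited yet
    sleeper.terminate()        # sends SIGTERM
    return {
        "running": running,
        "ok": poll_until_exit(ok),
        "failed": poll_until_exit(failed),
        "terminated": poll_until_exit(sleeper),
    }
```

On Linux, demo_poll_values() is expected to report None, 0, 2 and -15 for the four keys.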
3.3 Popen zombie processes
ceshi.sh
#!/bin/bash
ping 127.0.0.1
python (the parent process) uses subprocess.Popen to create a new process (the child) that starts a shell
The shell starts another process (the grandchild) to run ping 127.0.0.1
$ ps -ef | grep ww.py
304804 3967 8242 0 21:04 pts/4 00:00:00 python ww.py
$ ps -ef | grep 3968
304804 3968 3967 0 21:04 pts/4 00:00:00 bash ./ceshi.sh
304804 3970 3968 0 21:04 pts/4 00:00:00 ping 127.0.0.1
---------------------------------------------------timeout
$ ps -ef | grep 3968
304804 3968 3967 0 21:04 pts/4 00:00:00 [bash] <defunct>
$ ps -ef | grep ping
304804 3970 1 0 21:04 pts/4 00:00:00 ping 127.0.0.1
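At this point process.communicate() blocks: the shell has become a zombie, but the orphaned ping still holds the write end of the stdout pipe, so the read never reaches end of file.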
output, _ = process.communicate()
3.4 Using a Timer to enforce the timeout
import inspect
import subprocess
import time
import logging
from threading import Timer

log = logging.getLogger("butterfly")


def kill_command(process):
    """
    kill command
    """
    process.terminate()
class Result(object):
    """
    Wraps the result returned by easyrun
    """
    def __init__(self, command="", retcode="", output="", cost="", reqid=""):
        """
        command : (str) command that was executed
        retcode : (int) return code of the command
        output  : (str) command output
        cost    : (str) time the command took
        """
        self.command = command or ''
        self.retcode = retcode
        self._output = output
        self._output_len = len(output)
        self._success = False
        self.cost = cost
        self.reqid = reqid
        if retcode == 0:
            self._success = True
            self.err_msg = "OK"
        else:
            self.err_msg = output
        self._logger()

    def __str__(self):
        """
        object str format
        """
        return "[command]:{command} [success]:{success} [output]:{output}".format(
            command=self.command,
            success=self._success,
            output=self._output
        )

    def _logger(self):
        """
        record log
        """
        f = inspect.currentframe().f_back.f_back
        file_name, lineno, func_name = self._get_backframe_info(f)
        # Only the first 50 characters of the output are logged
        if self._output_len > 50:
            output_log = self._output[:50].replace("\n", ">>>") + "... :("
        else:
            output_log = self._output.replace("\n", ">>>") + ":)"
        log_msg = ("[file={file_name}:{func_name}:{lineno} "
                   "reqid={reqid} "
                   "type=shell "
                   "req_path={req_path} "
                   "req_data=None "
                   "cost={cost} "
                   "is_success={is_success} "
                   "err_no={err_no} "
                   "err_msg={err_msg} "
                   "res_len={res_len} "
                   "res_data={res_data} "
                   "res_attr=None]".format(
                       file_name=file_name, func_name=func_name, lineno=lineno,
                       reqid=self.reqid,
                       req_path=self.command,
                       cost=self.cost,
                       is_success=self._success,
                       err_no=self.retcode,
                       err_msg=self.err_msg,
                       res_len=self._output_len,
                       res_data=output_log,
                   ))
        if self._success:
            log.info(log_msg)
        else:
            log.error(log_msg)

    def _get_backframe_info(self, f):
        """
        get backframe info
        """
        return f.f_back.f_code.co_filename, f.f_back.f_lineno, f.f_back.f_code.co_name

    def success(self):
        """
        Check whether the command succeeded
        """
        return self._success

    def output(self):
        """
        Return the command output
        """
        return self._output
def run(command, timeout=10, reqid=""):
    """
    Args:
        command : (str) command to execute
        timeout : (int) timeout in seconds, default 10
        reqid   : (str) reqid used to trace asynchronous tasks; this is the reqid of the originating request
    Returns:
        Result
    """
    timeout = int(timeout)
    process = subprocess.Popen(
        command,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        shell=True)
    timer = None
    if timeout > 0:
        # Terminate the child once the timeout expires
        timer = Timer(timeout, kill_command, [process])
        timer.start()
    t_beginning = time.time()
    try:
        output, _ = process.communicate()
    finally:
        if timer is not None:
            timer.cancel()
    seconds_passed = time.time() - t_beginning
    cost_str = "%.6f" % seconds_passed
    if timeout and seconds_passed > timeout:
        return Result(command=command, retcode=124, output="exe timeout", cost=cost_str, reqid=reqid)
    output = output.strip('\n')
    return Result(command=command, retcode=process.returncode, output=output, cost=cost_str, reqid=reqid)
Problem
If the kill leaves a zombie process behind, the parent blocks at process.communicate().
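One common way around this blocking is to put the child in its own process group and signal the whole group on timeout, so that grandchildren such as the ping in 3.3 are terminated too and the pipe closes. The following is a sketch of that technique for Python 3 on POSIX, not the way shell_util is implemented; run_in_group and kill_group are hypothetical names, and 124 simply mirrors the timeout code used above.

```python
import os
import signal
import subprocess
from threading import Timer


def kill_group(process):
    """Send SIGTERM to the whole process group led by the child."""
    try:
        os.killpg(process.pid, signal.SIGTERM)
    except OSError:
        pass  # the group has already exited


def run_in_group(command, timeout=10):
    """Run a shell command; on timeout, terminate the shell and its descendants.

    Returns a tuple (retcode, output, timed_out).
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # the child becomes the leader of a new process group
    )
    fired = []

    def on_timeout():
        fired.append(True)
        kill_group(process)

    timer = Timer(timeout, on_timeout) if timeout > 0 else None
    if timer is not None:
        timer.start()
    try:
        # Returns once every process holding the pipe has exited
        output, _ = process.communicate()
    finally:
        if timer is not None:
            timer.cancel()
    if fired:
        return 124, "exe timeout", True
    return process.returncode, output.decode("utf-8", "replace").strip("\n"), False
```

A grandchild that moves itself into another session or process group would still escape this signal, so the approach is a mitigation rather than a guarantee.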
3.5 Checking whether a host is alive
res = shell_util.run("ping -c 1 -w 1 {host}".format(host=host))
if res.success():
    return retstat.OK
else:
    return retstat.ERR
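Because run() in 3.4 passes the command to a shell (shell=True), a host value that comes from a request should be quoted before it is formatted into the command line. A minimal Python 3 sketch, where build_ping_command is a hypothetical helper:

```python
import shlex


def build_ping_command(host, count=1, deadline=1):
    """Build the ping command line with the host quoted for the shell."""
    return "ping -c {count} -w {deadline} {host}".format(
        count=int(count),
        deadline=int(deadline),
        host=shlex.quote(host),  # shell metacharacters in host stay literal
    )
```

The result could then be passed to shell_util.run() in place of the formatted string above.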
3.6 Containers
3.6.1 Logging in to a container
sshx.exp (an expect script that simulates a user interacting with the program)
#!/usr/bin/expect
set timeout 30
set host [lindex $argv 0]
set msgput [lindex $argv 1]
set pswd [lindex $argv 2]
set command [lindex $argv 3]
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=30 $host
expect {
    "Are you sure you want to continue connecting*" { send "yes\r"; exp_continue }
    "password:" { send "$pswd\r"; exp_continue }
    "mysql@" { send "export PS1='$msgput'\r"; interact }
    "root@" { send "$command \r";
        expect {
            "root@" { send "export PS1='$msgput'\r"; interact }
        }
    }
    "root@" { send "export PS1='$msgput'\r"; interact }
    "Administrator@" { send "export PS1='$msgput'\r"; interact }
}
Command
/home/work/scripts/sshx.exp root@<IP> '[\u@\[\033[34m\]todo\[\033[0m\](\[\033[36m\]unkonw\[\033[0m\]) \w]\$ ' <password> 'docker exec -it <容器 ID> /bin/bash'
3.6.2 Sending commands to a container through docker_exec.sh
docker_exec.sh
docker exec $1 /bin/bash -c "${*:2}"
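docker arguments
Do not pass -it in the script; otherwise docker fails with "the input device is not a TTY".
shell ${*:2}
Takes all positional parameters starting from the second one; inside double quotes they are joined into a single string.
Without double quotes, $* and $@ behave the same: each parameter expands to separate words.
Inside double quotes they differ: "$@" expands to one word per parameter, while "$*" makes the shell join all parameters into a single string.
Difference between $* and $@ (running ./script.sh arg1 arg2 arg3)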
#!/bin/bash
for arg in "${*:2}"; do echo "$arg"; done
arg2 arg3
"${*:2}" is expanded into the single string arg2 arg3
#!/bin/bash
for arg in "${@:2}"; do echo "$arg"; done
arg2
arg3
"${@:2}" iterates over each remaining positional parameter as a separately quoted string
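The difference can be checked without creating script files by driving bash from Python. This is a Python 3 sketch that assumes bash is on the PATH; expand is a hypothetical helper that returns one list entry per loop iteration.

```python
import subprocess


def expand(expression, args):
    """Loop over a bash expansion and return one list item per iteration."""
    script = 'for arg in %s; do echo "$arg"; done' % expression
    # "script.sh" fills $0, so args start at $1, as in ./script.sh arg1 arg2 arg3
    result = subprocess.run(
        ["bash", "-c", script, "script.sh"] + list(args),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8").splitlines()


args = ["arg1", "arg2", "arg3"]
joined = expand('"${*:2}"', args)     # one iteration with the joined string
separate = expand('"${@:2}"', args)   # one iteration per remaining parameter
```

Here joined holds a single item and separate holds two, matching the two outputs listed above.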
Command (for example, checking whether gcc 12 is available in the container)
/home/work/scripts/sshpass -p <password> ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=3 -o ConnectionAttempts=3 root@<IP> 'bash -s' < /home/work/scripts/shell_scripts/docker_exec.sh <container_id> '/opt/compiler/gcc-12/bin/gcc --version 2>/dev/null | grep -q 12.1.0 && echo stat:OK || echo stat:ERR'
3.6.3 Sending commands to containers by logging in to the host
x1-cli /x1/host_ssh <IP> --command="bash <shell_file>" --timeout 5
This is essentially the same as 3.6.2, except that a custom script sends the commands to the containers.
For example, a script that cleans up /var/spool/postfix/maildrop
#!/bin/bash
PROC_COUNT=$(ps -ef | grep /var/spool/postfix/maildrop | grep -vc grep)
if [[ "$PROC_COUNT" == "0" ]]; then
    for i in `docker ps |awk '(NR>1){print $1}'`;
    do
        echo $i;
        docker exec $i /bin/bash -c "cp /var/spool/cron/root /var/spool/cron/root.bak && echo 'MAILTO=\"\"' > /var/spool/cron/root.tmp && cat /var/spool/cron/root |grep -v MAILTO >> /var/spool/cron/root.tmp && mv -f /var/spool/cron/root.tmp /var/spool/cron/root"
        docker exec $i /bin/bash -c "echo 'MAILTO=\"\"' > /etc/cron.d/sysstat.tmp && cat /etc/cron.d/sysstat |grep -v MAILTO >> /etc/cron.d/sysstat.tmp && mv -f /etc/cron.d/sysstat.tmp /etc/cron.d/sysstat"
        docker exec $i /bin/bash -c "find /var/spool/postfix/maildrop -type f -exec rm {} \;"
    done
else
    echo "stat:ERR_HAVE_REPEATED_PROC"
    exit 0
fi
echo "stat:OK"
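3.6.4 Container timeouts
Symptom:
When only the shell_util timeout is used to run /home/work/scripts/sshpass -p <password> ssh root@<IP> 'bash -s' <SHELL_FILE>, then once the timeout is reached:
process.terminate in shell_util has no effect
[Executing host] the sshpass process and its child ssh root@<IP> are still present
[Remote host] the docker exec processes are still present
[Inside the container] the command the container was asked to run is still present
Enforcing the timeout with the timeout command
timeout 60s /home/work/scripts/sshpass ...
(1) Once the limit is reached, timeout sends SIGTERM to terminate sshpass
(2) The processes on the remote host and inside the container end once their work completes
(3) The script should be idempotent, so that running the same exec command against a container again does no harm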
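The timeout prefix can be added in code before the command is run. The following Python 3 sketch assumes GNU coreutils timeout, which exits with status 124 when the limit is hit; with_timeout and run_with_timeout are hypothetical helpers, and subprocess.run stands in for shell_util.run only to keep the example self-contained.

```python
import subprocess


def with_timeout(command, seconds):
    """Prefix a shell command line with the coreutils timeout command."""
    return "timeout {0}s {1}".format(int(seconds), command)


def run_with_timeout(command, seconds):
    """Run the command under timeout and return (exit status, output)."""
    proc = subprocess.run(
        with_timeout(command, seconds),
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # 124 means timeout had to stop the command
    return proc.returncode, proc.stdout.decode("utf-8", "replace")
```

The prefix only covers the first command of a compound line such as a && b, so a multi-step command would need to be wrapped in a script or bash -c first.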