shell_util

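1 Module overview

xlib/util/shell_util.py

Runs shell commands from within butterfly.

Note: this module can also be used on its own.

2 Usage

2.1 Example: using it inside butterfly

2.1.1 Change to the butterfly home directory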

cd butterfly

Note: the butterfly home directory is the directory that contains run.sh.

2.1.2 Create the shell app

mkdir -p handlers/shell

2.1.3 Write the handler

Write the following code to handlers/shell/__init__.py:

# coding=utf8
"""
# Description:
shell demo
"""
from xlib import retstat
from xlib.httpgateway import Request
from xlib.middleware import funcattr
from xlib.util import shell_util

__info = "shell demo"
__version = "1.0.1"


@funcattr.api
def run_cmd(req, cmd):
    """
    shell demo
    """
    isinstance(req, Request)

    ret = shell_util.run(cmd, timeout=10)
    if ret.success():
        # Success path: the command output is available via ret.output()
        return retstat.OK, {"data": ret.output()}, [(__info, __version)]
    else:
        # Failure path: the error is logged automatically to logs/common.log.wf
        return retstat.ERR, {"data": ret.output()}, [(__info, __version)]

2.1.4 Start butterfly

Start:

bash run.sh start

or

Restart:

bash run.sh restart

2.1.5 Send a request

$ curl -d '{"cmd": "date"}' "http://127.0.0.1:8585/shell/run_cmd"
{"stat": "OK", "data": "Wed Mar  3 21:59:12 CST 2021"}

2.2 Calling the library from your own script

from xlib import logger
from xlib.util import shell_util

# Log path
logger.init_log("dev/common.log")

# A command that does not exist
cmd = "llll"

ret = shell_util.run(cmd, timeout=10)
if ret.success():
    # Success path: the command output is available via ret.output()
    print(ret.output())
else:
    # Failure path: the error is logged automatically to logs/common.log.wf
    print(ret.output())

Resulting log entry:

ERROR butterfly 03-03 22:08:58: shell_util.py:120 4670393792 @@@@@@@@@@@@@@@@ * [file=w.py:<module>:20 reqid= type=shell req_path=llll req_data=None cost=0.000061 is_success=False err_no=127 err_msg=/bin/sh: llll: command not found res_len=32 res_data=/bin/sh: llll: command not found:) res_attr=None]

3 Other practices

3.1 Running a local shell script with arguments on a remote server

3.1.1 Examples

exe = shell_util.remote_run(user="work", host=host, command='grep -c "model name" /proc/cpuinfo')
if exe.success():
    cpu_count=exe.output()
else:
    cpu_count=0

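If you need to kill a process, filter out the script itself, the ps command, and similar entries when matching the process.

Running a script on the remote host: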

exe = shell_util.remote_run(user="work", host=host, command="bash ./xxx.sh arg1 arg2")
if exe.success():
    cpu_count=exe.output()
else:
    cpu_count=0

3.1.2 Common errors

# Execution timed out (e.g. the login hangs and the command never runs)
err_no=124 err_msg=exe timeout

# Login failed (e.g. the target host asks for a password)
err_no=255 err_msg=

# Host unreachable
err_no=1 err_msg=

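3.2 Return values of subprocess.Popen().poll()

import subprocess
import time

proc = subprocess.Popen(['python', 'test.py'], stdout=subprocess.PIPE)

# poll() returns None while the child is running and its exit status afterwards
while proc.poll() is None:
    print(proc.poll())
    time.sleep(1)

print(proc.poll())

Return codes from poll():

None  the child is still running
0     the child exited normally
1     sleep
2     the child does not exist (for example, python exits with 2 when test.py cannot be opened)
-15   the child was killed (SIGTERM)

Any other positive value is the child's own exit status; a negative value -N means the child was terminated by signal N.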
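
The return codes above can be reproduced with a small script. This is a minimal sketch, assuming Python 3 on a POSIX system; observe_poll is a hypothetical helper name and the child commands are stand-ins for test.py.

```python
import subprocess
import sys


def observe_poll():
    """Collect poll() results for a few typical child lifecycles."""
    results = {}

    # Child that is still running: poll() returns None.
    sleeper = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    results["running"] = sleeper.poll()

    # Child terminated with SIGTERM: poll() returns -15 on POSIX.
    sleeper.terminate()
    sleeper.wait()
    results["terminated"] = sleeper.poll()

    # Child that exits normally: poll() returns 0.
    ok = subprocess.Popen([sys.executable, "-c", "pass"])
    ok.wait()
    results["normal_exit"] = ok.poll()

    # Child that exits with its own status: poll() returns that status.
    failing = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(2)"])
    failing.wait()
    results["exit_2"] = failing.poll()

    return results


if __name__ == "__main__":
    for name, code in observe_poll().items():
        print(name, code)
```

The negative value is the signal number, so a child killed with SIGKILL would report -9 instead of -15.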

3.3 Zombie processes with Popen

ceshi.sh

#!/bin/bash
ping  127.0.0.1

python (the parent process) uses subprocess.Popen to create a new process (the child) that starts a shell.
The shell then starts a child of its own (the grandchild) to run ping 127.0.0.1.

$ ps -ef | grep ww.py
304804    3967  8242  0 21:04 pts/4    00:00:00 python ww.py

$ ps -ef | grep 3968
304804    3968  3967  0 21:04 pts/4    00:00:00 bash ./ceshi.sh
304804    3970  3968  0 21:04 pts/4    00:00:00 ping 127.0.0.1

---------------------------------------------------timeout
$ ps -ef | grep 3968
304804    3968  3967  0 21:04 pts/4    00:00:00 [bash] <defunct>

$ ps -ef | grep ping
304804    3970     1  0 21:04 pts/4    00:00:00 ping 127.0.0.1

At this point process.communicate() blocks, because the orphaned ping still holds the write end of the stdout pipe open:

output, _ = process.communicate()

3.4 Using a Timer to enforce the timeout

import inspect
import subprocess
import time
import logging
from threading import Timer


log = logging.getLogger("butterfly")


def kill_command(process):
    """
    kill command
    """
    process.terminate()


class Result(object):
    """
    Wrapper for the result returned by easyrun
    """

    def __init__(self, command="", retcode="", output="", cost="", reqid=""):
        """
        command : (str) command that was executed
        retcode : (int) return code of the command
        output  : (str) command output
        cost    : (str) time the command took
        """
        self.command = command or ''
        self.retcode = retcode
        self._output = output
        self._output_len = len(output)
        self._success = False
        self.cost = cost
        self.reqid = reqid
        if retcode == 0:
            self._success = True
            self.err_msg = "OK"
        else:
            self.err_msg = output

        self._logger()

    def __str__(self):
        """
        object str format
        """
        return "[command]:{command} [success]:{success} [output]:{output}".format(
            command=self.command,
            success=self._success,
            output=self._output
        )

    def _logger(self):
        """
        record log
        """
        f = inspect.currentframe().f_back.f_back
        file_name, lineno, func_name = self._get_backframe_info(f)

        if self._output_len > 50:
            output_log = self._output[:50].replace("\n", ">>>") + "... :("
        else:
            output_log = self._output.replace("\n", ">>>") + ":)"

        log_msg = ("[file={file_name}:{func_name}:{lineno} "
                   "reqid={reqid} "
                   "type=shell "
                   "req_path={req_path} "
                   "req_data=None "
                   "cost={cost} "
                   "is_success={is_success} "
                   "err_no={err_no} "
                   "err_msg={err_msg} "
                   "res_len={res_len} "
                   "res_data={res_data} "
                   "res_attr=None]".format(
                       file_name=file_name, func_name=func_name, lineno=lineno,
                       reqid=self.reqid,
                       req_path=self.command,
                       cost=self.cost,
                       is_success=self._success,
                       err_no=self.retcode,
                       err_msg=self.err_msg,
                       res_len=self._output_len,
                       res_data=output_log,
                   ))

        if self._success:
            log.info(log_msg)
        else:
            log.error(log_msg)

    def _get_backframe_info(self, f):
        """
        get backframe info
        """
        return f.f_back.f_code.co_filename, f.f_back.f_lineno, f.f_back.f_code.co_name

    def success(self):
        """
        Check whether the command succeeded
        """
        return self._success

    def output(self):
        """
        Return the command output
        """
        return self._output


def run(command, timeout=10, reqid=""):
    """
    Args:
        command : (str) command to execute
        timeout : (int) timeout in seconds, default 10
        reqid   : (str) reqid used to track async tasks; this is the reqid of the originating request
    Returns:
        Result
    """
    timeout = int(timeout)
    process = subprocess.Popen(
        command,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        shell=True)

    timer = None
    if timeout > 0:
        timer = Timer(timeout, kill_command, [process])
        timer.start()

    t_beginning = time.time()
    try:
        output, _ = process.communicate()
    finally:
        if timer is not None:
            timer.cancel()

    seconds_passed = time.time() - t_beginning
    cost_str = "%.6f" % seconds_passed
    if timeout and seconds_passed > timeout:
        return Result(command=command, retcode=124, output="exe timeout", cost=cost_str, reqid=reqid)

    output = output.strip('\n')
    return Result(command=command, retcode=process.returncode, output=output, cost=cost_str, reqid=reqid)

Known issue
If the kill only turns the shell into a zombie while its children keep running (see 3.3), the parent stays blocked in process.communicate().
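
One possible way around this blocking, shown as a sketch and not as what shell_util does: start the command in its own process group and kill the whole group on timeout, so that no descendant keeps the stdout pipe open. The code assumes Python 3 on a POSIX system; run_in_group is a hypothetical name, and the 124 return code simply mirrors the convention used by run() above.

```python
import os
import signal
import subprocess
import time


def run_in_group(command, timeout=10):
    """Run a shell command; on timeout, SIGKILL its whole process group.

    Returns (retcode, output). retcode is 124 on timeout.
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # the shell becomes leader of a new group
    )
    try:
        output, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Kill the shell and every descendant still in its group,
            # so nothing keeps the write end of the pipe open.
            os.killpg(process.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
        process.communicate()  # reap the shell and drain the pipe
        return 124, "exe timeout"
    return process.returncode, output.decode(errors="replace").strip("\n")


if __name__ == "__main__":
    started = time.time()
    print(run_in_group("sleep 30 & sleep 30", timeout=1))
    print("returned after %.1fs" % (time.time() - started))
```

A descendant that moves itself into another session or process group would escape this kill, so the approach only covers commands that stay in the group they were started in.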

3.5 Checking whether a host is alive

    res = shell_util.run("ping -c 1 -w 1 {host}".format(host=host))
    if res.success():
        return retstat.OK
    else:
        return retstat.ERR

3.6 Containers

3.6.1 Logging in to a container

sshx.exp (simulates a user interacting with the application)

#!/usr/bin/expect

set timeout 30
set host [lindex $argv 0]
set msgput [lindex $argv 1]
set pswd [lindex $argv 2]
set command [lindex $argv 3]

spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=30 $host
expect {
    "Are you sure you want to continue connecting*" { send "yes\r"; exp_continue }
    "password:" { send "$pswd\r"; exp_continue }
    "mysql@" { send "export PS1='$msgput'\r"; interact }
    "root@" { send "$command \r";
            expect {
                    "root@" { send "export PS1='$msgput'\r"; interact }
                  }
              }
    "root@" { send "export PS1='$msgput'\r"; interact }
    "Administrator@" { send "export PS1='$msgput'\r"; interact }
}

Command:

/home/work/scripts/sshx.exp root@<IP> '[\u@\[\033[34m\]todo\[\033[0m\](\[\033[36m\]unkonw\[\033[0m\]) \w]\$ ' <password> 'docker exec -it <container ID> /bin/bash'

3.6.2 Running commands in a container via docker_exec.sh

docker_exec.sh

docker exec $1 /bin/bash -c "${*:2}"

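docker flags

Do not add the -it flags in the script; otherwise docker fails with "the input device is not a TTY".

${*:2} in the shell: starting from the second positional parameter, all remaining positional parameters are treated as one single string.

Unquoted, $* and $@ behave the same: both expand to the individual parameters.
Quoted, they differ: "$@" expands to one word per parameter, while "$*" makes the shell join all parameters into a single string.

Difference between $* and $@ (running ./script.sh arg1 arg2 arg3)

Code:

#!/bin/bash

for arg in "${*:2}"; do echo "$arg"; done

Output:

arg2 arg3

Explanation: ${*:2} expands to the single string arg2 arg3.

Code:

#!/bin/bash

for arg in "${@:2}"; do echo "$arg"; done

Output:

arg2
arg3

Explanation: ${@:2} iterates over each remaining positional parameter as a separate quoted string.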
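
The two expansions can be compared side by side with a small harness. This is a sketch that assumes Python 3 and a bash binary on PATH; expand is a hypothetical helper, and the brackets in the output only make the word boundaries visible.

```python
import subprocess


def expand(expr, args):
    """Run `for arg in <expr>; do echo "[$arg]"; done` in bash with args."""
    script = 'for arg in %s; do echo "[$arg]"; done' % expr
    # With `bash -c script name a b c`, name becomes $0 and a b c are $1..$3.
    result = subprocess.run(
        ["bash", "-c", script, "script.sh"] + list(args),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    args = ["arg1", "arg2", "arg3"]
    print(expand('"${*:2}"', args))  # one joined word
    print(expand('"${@:2}"', args))  # one word per parameter
```

This is why docker_exec.sh uses "${*:2}": bash -c expects the whole command as one string, which is what the joined form delivers.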

Example command (checking whether gcc 12 is available inside the container):

/home/work/scripts/sshpass -p <password> ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=3 -o ConnectionAttempts=3 root@<IP> 'bash -s' < /home/work/scripts/shell_scripts/docker_exec.sh <container_id> '/opt/compiler/gcc-12/bin/gcc --version 2>/dev/null | grep -q 12.1.0 && echo stat:OK || echo stat:ERR'

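3.6.3 Running commands in containers by logging in to the host

 x1-cli /x1/host_ssh <IP> --command="bash <shell_file>" --timeout 5

This is essentially the same as 3.6.2, except that a custom script sends the commands to the containers.

For example, a script that cleans up /var/spool/postfix/maildrop: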

#!/bin/bash
PROC_COUNT=$(ps -ef | grep  /var/spool/postfix/maildrop | grep -vc grep)
if [[ "$PROC_COUNT" == "0" ]]; then
    for i in `docker ps |awk '(NR>1){print $1}'`;
        do
            echo $i;
            docker exec $i /bin/bash -c "cp /var/spool/cron/root /var/spool/cron/root.bak && echo 'MAILTO=\"\"' > /var/spool/cron/root.tmp && cat /var/spool/cron/root |grep -v MAILTO >> /var/spool/cron/root.tmp && mv -f /var/spool/cron/root.tmp /var/spool/cron/root"
            docker exec $i /bin/bash -c "echo 'MAILTO=\"\"' > /etc/cron.d/sysstat.tmp && cat /etc/cron.d/sysstat |grep -v MAILTO >> /etc/cron.d/sysstat.tmp && mv -f /etc/cron.d/sysstat.tmp /etc/cron.d/sysstat"
            docker exec $i /bin/bash -c "find /var/spool/postfix/maildrop -type f -exec rm {} \;"
        done
else
    echo "stat:ERR_HAVE_REPEATED_PROC"
    exit 0
fi
echo "stat:OK"

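3.6.4 Timeouts with containers

Symptom:

When only the shell_util timeout is used to run /home/work/scripts/sshpass -p <password> ssh root@<IP> 'bash -s' <SHELL_FILE>, the following happens once the timeout expires:

process.terminate in shell_util has no effect.

[Executing host] the sshpass process and its ssh root@<IP> child are still present
[Remote host] the docker exec processes are still present
[Inside the container] the command being executed is still present

Enforce the timeout with the timeout command instead:
timeout 60s /home/work/scripts/sshpass ...

(1) Once the limit is reached, timeout sends SIGTERM to terminate sshpass
(2) The processes on the remote host and inside the container end once they finish executing
(3) The script should be idempotent, to guard against the same exec command being run against the container more than once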
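
A caller can recognise this case by the exit status: coreutils timeout exits with 124 when the limit is hit, the same code shell_util reports for its own timeouts. The sketch below assumes Python 3 and a coreutils timeout binary on PATH; run_with_timeout is a hypothetical helper, and sleep stands in for the sshpass command line.

```python
import subprocess

# Exit status of coreutils `timeout` when the time limit is reached.
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(command, seconds):
    """Prefix a simple command line with `timeout` and run it.

    Returns (timed_out, retcode).
    """
    wrapped = "timeout {}s {}".format(int(seconds), command)
    retcode = subprocess.call(wrapped, shell=True)
    return retcode == TIMEOUT_EXIT_CODE, retcode


if __name__ == "__main__":
    print(run_with_timeout("sleep 5", 1))
    print(run_with_timeout("true", 5))
```

The prefix only covers a single simple command; a pipeline would need to be wrapped so that timeout supervises all of it.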

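3.7 Running a script remotely with arguments that start with a dash

After running ssh work@xxx "bash -s" < xx.sh -r bj, the -r is dropped and the script behaves as if it had been run as bash xx.sh bj.

Cause

ssh work@xxx "bash -s" < xx.sh -r bj

The command line is processed as follows:

< xx.sh is a local redirection that passes the script content over stdin to the remote bash -s
-r bj is appended to the remote command, where bash parses -r as one of its own options (restricted shell) instead of handing it to the script

The command that actually runs remotely is:

bash -s -r bj  # -r is consumed by bash, the script only sees bj

Solution

Use -- to explicitly separate bash's own options from the script's arguments:

Correct command:

ssh work@xxx "bash -s" < xx.sh -- -r bj

or:

cat xx.sh | ssh work@xxx "bash -s -- -r bj"

Key points

Role of --: it marks the end of bash's own options, so the arguments that follow (-r bj) are passed to the script rather than parsed by bash.
How bash -s works: -s makes bash read the script from stdin, and the remaining arguments are passed to the script itself ($1 is -r, $2 is bj).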
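
The effect of -- can be checked locally without ssh, since the remote side ends up running bash -s followed by the extra arguments. This is a sketch assuming Python 3 and bash on PATH; run_bash_s is a hypothetical helper and SCRIPT is a stand-in for xx.sh.

```python
import subprocess

# Stand-in for xx.sh: print every positional parameter on its own line.
SCRIPT = 'printf "%s\\n" "$@"\n'


def run_bash_s(extra_args):
    """Feed SCRIPT to `bash -s <extra_args>` on stdin, return what it saw."""
    result = subprocess.run(
        ["bash", "-s"] + list(extra_args),
        input=SCRIPT,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(run_bash_s(["-r", "bj"]))        # -r is taken by bash itself
    print(run_bash_s(["--", "-r", "bj"]))  # both arguments reach the script
```

Any leading argument that bash recognises as one of its own options is affected in the same way, so putting -- in front of the script's arguments is a safe default.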

3.8 Redis 4c CPU pinning script

#!/bin/bash
cpu_count=$(lscpu | grep "^CPU(s):" | awk '{print $NF}')
if [[ "${cpu_count}" != "4" ]]; then
    echo "stat:ERR cpu not 4"
    exit -1
fi

redis_pid=$(ps -ef | grep /root/agent/bin/redis-server  | grep -v grep  | awk '{print $2}')
if [[ "${redis_pid}" == "" ]]; then
    echo "stat:ERR pid not found"
    exit -1
fi

redis_server_tid=$(ps -Tp ${redis_pid} | grep redis-server | awk '{print $2}')
if [[ "${redis_server_tid}" == "" ]]; then
    echo "stat:ERR redis_server_tid not found"
    exit -1
else
    taskset -pc 1 ${redis_server_tid} | grep -q "new affinity list: 1" || exit -1
fi

# taskset io_thd_tid
cpu_num=1
for io_thd_tid in $(ps -T -p ${redis_pid} | grep io_thd_  | awk '{print $2}')
do
    let cpu_num=cpu_num+1
    if [ ${cpu_num} -gt 3 ];then
        echo "stat:ERR cpu_num gt 3"
        exit -1
    fi
    taskset -pc ${cpu_num} ${io_thd_tid} | grep -q "new affinity list: ${cpu_num}" || exit -1
done

echo "stat:OK"
exit 0

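The xxx || exit -1 checks in the script can call a helper function instead of a bare exit -1, so that a message is printed before the script exits.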
