# Data Analysis (Paoding)

## Paoding (Data Analysis)

## 1 Project Overview

### 1.1 Background and Goals

Paoding analyzes the data stored in Redis services, focusing on:

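> * Big key analysis
>   * What harm do big keys cause?
>     * Blocking Redis: Redis executes commands on a single thread, so a slow operation on a big key blocks every request queued behind it.
>     * Uneven memory usage: in a Redis cluster, big keys skew memory usage across nodes.
>     * Possible blocking on expiry: when a big key with a TTL expires, it is deleted. Unless the asynchronous expiry deletion introduced in Redis 4.0 (`lazyfree-lazy-expire`) is enabled, this deletion can block Redis, and it does not show up in the slow log because it is performed by an internal periodic task rather than by a client command.
> * Hot key analysis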

### 1.2 Glossary

### 1.3 Roadmap

## 2 Requirements Analysis

### 2.1 Functional Requirements

### 2.2 Non-functional Requirements

### 2.3 Research

#### 2.3.1 big key

**2.3.1.1 Redis cli**

`redis-cli --bigkeys` walks the keyspace with the SCAN command to find big keys:

```
redis-cli --bigkeys -i 0.1
```

> output

```
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest set    found so far 'redisorm:xxx2_monit:object:cHjyJazUyOi3ygSD:tags' with 2 members
[00.00%] Biggest string found so far 'redisorm:xxx_redis_matrix:object:vxFtvpeJBFFkBWr2' with 1489 bytes
[00.00%] Biggest set    found so far 'redisorm:0AAB5C3FE11F0D59:tags:time:2020-11-23 20:25:49' with 6 members
[00.01%] Biggest hash   found so far 'rq:job:c16a763b-4f1b-4544-a410-dd45245d96da' with 10 fields
[00.02%] Biggest set    found so far 'redisorm:xxx_conf:object:ZNtmrqCWGoUKIvt4:tags' with 10 members
[00.07%] Biggest string found so far 'redisorm:xxx_conf:object:cTSSZrWoo3vastEW' with 1553 bytes
[00.14%] Biggest set    found so far 'redisorm:xxx2_status:object:2u69AiuZR06kNv2p:tags' with 17 members
[00.32%] Biggest string found so far 'redisorm:xxx_replication:object:4DypIgMprlbRnPRF' with 11471 bytes
[00.56%] Biggest set    found so far 'redisorm:xxx_replication:tags:time:2021-04-12 16:44' with 59 members
[01.22%] Biggest set    found so far 'redisorm:xxx_replication:tags:time:2021-04-12 16:28' with 64 members
[01.24%] Biggest set    found so far 'redisorm:xxx_conf:tags:adapter_bin_hash:OK' with 2217 members
[11.60%] Biggest zset   found so far 'rq:finished:xxx_service' with 27 members
[12.98%] Biggest zset   found so far 'redisorm:xxx_qos:__expire__' with 779 members
[29.24%] Biggest hash   found so far 'rq:worker:f6a94ae9bdd0402e872386354b9a4c35' with 11 fields
[47.73%] Biggest set    found so far 'redisorm:xxx_redis_matrix:__all__' with 4120 members
[66.18%] Biggest list   found so far 'info' with 6 items
[86.92%] Biggest hash   found so far 'test' with 18 fields
[96.71%] Biggest zset   found so far 'rq:finished:default' with 2277 members

-------- summary -------

Sampled 91031 keys in the keyspace!
Total key length in bytes is 7497297 (avg len 82.36)

Biggest string found 'redisorm:xxx_replication:object:4DypIgMprlbRnPRF' has 11471 bytes
Biggest   list found 'info' has 6 items
Biggest    set found 'redisorm:xxx_redis_matrix:__all__' has 4120 members
Biggest   hash found 'test' has 18 fields
Biggest   zset found 'rq:finished:default' has 2277 members

23609 strings with 8878680 bytes (25.94% of keys, avg size 376.07)
1 lists with 6 items (00.00% of keys, avg size 6.00)
65031 sets with 225006 members (71.44% of keys, avg size 3.46)
2377 hashs with 23698 fields (02.61% of keys, avg size 9.97)
13 zsets with 3182 members (00.01% of keys, avg size 244.77)
```

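Drawbacks:

> * Impact on production: SCAN iterates the keyspace incrementally with a cursor, and in production the command can be run against a replica, but it is still an operation on a live system.
> * For set, zset, list, and hash keys it only reports the number of elements, and a key with many elements does not necessarily occupy a lot of memory.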
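The second drawback can be worked around by ranking keys by estimated memory rather than element count. The following is a minimal sketch, not part of Paoding itself: it assumes a client object that exposes `scan_iter` and `memory_usage` the way redis-py does (the `MEMORY USAGE` command requires Redis 4.0 or later). The function name `top_keys_by_memory` and its parameters are hypothetical.

```python
import heapq
from typing import List, Tuple


def top_keys_by_memory(client, top_n: int = 10, scan_count: int = 100,
                       samples: int = 5) -> List[Tuple[int, str]]:
    """Return up to top_n (bytes, key) pairs, largest first.

    `client` is assumed to behave like a redis-py client:
    scan_iter(count=...) yields key names via SCAN, and
    memory_usage(key, samples=...) issues MEMORY USAGE.
    """
    sizes = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key, samples=samples)
        if size is None:
            # The key expired or was deleted between SCAN and MEMORY USAGE.
            continue
        name = key.decode("utf-8", "replace") if isinstance(key, bytes) else str(key)
        sizes.append((size, name))
    # nlargest sorts by the first tuple element (bytes), then by key name.
    return heapq.nlargest(top_n, sizes)
```

With redis-py this could be called as `top_keys_by_memory(redis.Redis(host=..., port=...))`, preferably pointed at a replica. Like `--bigkeys`, it still sends one extra command per key to a live instance, so the first drawback above still applies. For nested types `MEMORY USAGE` samples elements, so the sizes are estimates.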

#### 2.3.2 hot key

**2.3.2.1 Facebook redis-faina (based on MONITOR)**

> Uses MONITOR to collect statistics on recently accessed hot keys

```
Overall Stats
========================================
Lines Processed     117773
Commands/Sec        11483.44

Top Prefixes
========================================
friendlist          69945
followedbycounter   25419
followingcounter    10139
recentcomments      3276
queued              7

Top Keys
========================================
friendlist:zzz:1:2     534
followingcount:zzz     227
friendlist:zxz:1:2     167
friendlist:xzz:1:2     165
friendlist:yzz:1:2     160
friendlist:gzz:1:2     160
friendlist:zdz:1:2     160
friendlist:zpz:1:2     156

Top Commands
========================================
SISMEMBER   59545
HGET        27681
HINCRBY     9413
SMEMBERS    9254
MULTI       3520
EXEC        3520
LPUSH       1620
EXPIRE      1598

Command Time (microsecs)
========================================
Median      78.25
75%         105.0
90%         187.25
99%         411.0

Heaviest Commands (microsecs)
========================================
SISMEMBER   5331651.0
HGET        2618868.0
HINCRBY     961192.5
SMEMBERS    856817.5
MULTI       311339.5
SADD        54900.75
SREM        40771.25
EXEC        28678.5

Slowest Calls
========================================
3490.75     "SMEMBERS" "friendlist:zzz:1:2"
2362.0      "SMEMBERS" "friendlist:xzz:1:3"
2061.0      "SMEMBERS" "friendlist:zpz:1:2"
1961.0      "SMEMBERS" "friendlist:yzz:1:2"
1947.5      "SMEMBERS" "friendlist:zpz:1:2"
1459.0      "SISMEMBER" "friendlist:hzz:1:2" "zzz"
1416.25     "SMEMBERS" "friendlist:zhz:1:2"
1389.75     "SISMEMBER" "friendlist:zzx:1:2" "zzz"
```

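This approach requires running the MONITOR command against Redis. MONITOR streams every executed command to the monitoring client, so the Redis client output buffer (`client-output-buffer-limit`) has to be taken into account.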
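The "Top Keys" statistic can be approximated with a few lines of parsing. The sketch below is not redis-faina's actual implementation. It assumes MONITOR lines of the form `<timestamp> [<db> <client>] "CMD" "arg1" ...` and treats the first argument of each command as its key, which is wrong for some commands (for example `EVAL` or multi-key commands). The function name `count_hot_keys` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# <timestamp> [<db> <client address>] <quoted command and arguments>
LINE_RE = re.compile(r'^\d+\.\d+ \[\d+ \S+\] (.*)$')
# One double-quoted argument, allowing backslash escapes inside it.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_hot_keys(lines: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count how often each key appears in MONITOR output lines."""
    keys = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the initial "OK" reply or a blank line
        args = ARG_RE.findall(match.group(1))
        if len(args) < 2:
            continue  # commands without arguments, such as MULTI or EXEC
        # Assumption: the first argument is the key.
        keys[args[1]] += 1
    return keys.most_common(top_n)
```

To limit the output buffer concern, the input can be a short, bounded capture (for example a few seconds of `redis-cli monitor` redirected to a file) rather than a long-running MONITOR session.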

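**2.3.2.2 aof-selector (based on AOF)**

<https://github.com/hongliuliao/aof-selector>

Requires AOF to be enabled on the production Redis instance. Because the AOF records only write commands, keys that are hot purely from read traffic are not visible with this approach.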
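To illustrate what an AOF-based analysis involves, here is a minimal sketch that counts the keys touched by write commands in an AOF. It is not necessarily how aof-selector works. It assumes a plain AOF consisting of RESP arrays of bulk strings (no RDB preamble), and it treats the first argument of each command as the key. The names `iter_aof_commands` and `count_written_keys` are hypothetical.

```python
from collections import Counter
from typing import BinaryIO, Iterator, List, Tuple


def iter_aof_commands(stream: BinaryIO) -> Iterator[List[bytes]]:
    """Yield each command in a plain RESP-encoded AOF as a list of byte strings."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of file
        if header.startswith(b"#"):
            continue  # annotation line, skipped by this sketch
        if not header.startswith(b"*"):
            raise ValueError("unsupported AOF content: %r" % header[:20])
        argc = int(header[1:].strip())
        args = []
        for _ in range(argc):
            length_line = stream.readline()
            if not length_line.startswith(b"$"):
                raise ValueError("expected bulk string header")
            length = int(length_line[1:].strip())
            data = stream.read(length)
            if len(data) != length:
                return  # truncated tail, stop quietly
            stream.read(2)  # consume the trailing CRLF
            args.append(data)
        yield args


def count_written_keys(stream: BinaryIO, top_n: int = 10) -> List[Tuple[str, int]]:
    """Count keys touched by write commands recorded in the AOF."""
    keys = Counter()
    for args in iter_aof_commands(stream):
        # SELECT carries a database index, not a key.
        if len(args) < 2 or args[0].upper() == b"SELECT":
            continue
        keys[args[1].decode("utf-8", "replace")] += 1
    return keys.most_common(top_n)
```

A call might look like `count_written_keys(open(path_to_aof, "rb"))`, run on a copy of the file so that the analysis does not compete with Redis for disk I/O.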

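**2.3.2.3 Alibaba Cloud hot key detection**

Hot data detection (statistics are collected on the Redis side)

> * Request statistics
> * Hotspot location
> * Hotspot feedback

Requires modifying the Redis kernel.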

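## 3 High-level Design

> The high-level design focuses on design decisions and trade-offs

### 3.1 System Architecture

> Typically a simple architecture diagram, accompanied by a brief textual description of the architecture.

### 3.2 Module Overview

> If the architecture diagram contains many modules, briefly describe what each module does.

### 3.3 Design and Trade-offs

> Design and trade-offs are the most important part of the high-level design.

### 3.4 Potential Risks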

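## 4 Detailed Design

> The detailed design focuses on being "detailed"

### 4.1 Module xx

> (With the database, interfaces, and workflow described, another engineer who picks up the detailed design document should be able to implement it.)

#### 4.1.1 Interaction Flow

> Simple interactions can be described in text; for complex interactions, use flowcharts, interaction diagrams, or other figures.

#### 4.1.2 Database Design

#### 4.1.3 Interface Design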

