Algorithms: Coefficient of Variation

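1 Statistical definition

The coefficient of variation (CV), also called the dispersion coefficient, is a dimensionless statistic that measures how spread out a dataset is. It is the ratio of the standard deviation to the mean, so it is only meaningful when the mean is nonzero.

The coefficient of variation is computed as:

CV = standard deviation / mean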
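Written out in symbols, the definition might look like the following. This is a sketch that assumes the population form of the standard deviation (dividing by n), which is what `ddof=0` selects in the numpy code of section 2.1. The symbols x_i, n, \mu and \sigma are notation introduced here, not taken from the original text.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}, \qquad
CV = \frac{\sigma}{\mu}

% Worked example for the data 1, 2, ..., 7 (n = 7):
\mu = \frac{28}{7} = 4, \qquad
\sigma = \sqrt{\frac{9 + 4 + 1 + 0 + 1 + 4 + 9}{7}} = \sqrt{4} = 2, \qquad
CV = \frac{2}{4} = 0.5
```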

2 Implementation

2.1 Implementation with numpy

import numpy


def coefficient_of_variation(data):
    mean = numpy.mean(data)  # arithmetic mean
    std = numpy.std(data, ddof=0)  # population standard deviation (divides by n)
    cv = std / mean
    return cv


data_test_1 = [1, 2, 3, 4, 5, 6, 7]
data_test_2 = [1, 1, 1, 4, 7, 7, 7]
print('CV_1', coefficient_of_variation(data_test_1))
print('CV_2', coefficient_of_variation(data_test_2))

Output

CV_1 0.5
CV_2 0.6943650748294136

2.2 Plain Python implementation

# Calculate the arithmetic mean.
def mean(values):
    """
    Return the arithmetic mean of values, or 0 for an empty sequence.

    Args:
        values: list of numbers, e.g. [1, 2, 3, 4]
    """
    if not len(values):
        return 0

    arithmetic_mean = sum(values) / float(len(values))
    return arithmetic_mean


# Calculate the population standard deviation (divides by n).
def standard_deviation(values):
    """
    Return the population standard deviation of values, or 0 for an
    empty sequence.

    Args:
        values: list of numbers, e.g. [1, 2, 3, 4]
    """
    if not len(values):
        return 0

    value_mean = mean(values)
    value_sum = 0
    for value in values:
        # Accumulate the squared deviation from the mean.
        value_sum += (value - value_mean) ** 2
    x = (value_sum / float(len(values))) ** 0.5
    return x


# Calculate the coefficient of variation.
def coefficient_of_variation(values):
    """
    Return standard deviation / mean, or 0 for an empty sequence.
    Raises ZeroDivisionError when the mean of values is 0.

    Args:
        values: list of numbers, e.g. [1, 2, 3, 4]
    """
    if not len(values):
        return 0

    value_mean = mean(values)
    # Use a local name that does not shadow this function.
    cv = standard_deviation(values) / float(value_mean)
    return cv

if __name__ == "__main__":
    assert coefficient_of_variation([1, 1, 1]) == 0

    values = [1, 2, 3, 4, 5, 6, 7]
    assert mean(values) == 4
    assert standard_deviation(values) == 2
    assert coefficient_of_variation(values) == 0.5

    values = [1, 1, 1, 4, 7, 7, 7]
    print(coefficient_of_variation(values))
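
Both implementations above divide by n, which gives the population standard deviation. As a rough sketch of an alternative, the standard library's `statistics` module can compute the same quantity, and it also provides the sample standard deviation (dividing by n - 1) for comparison. The function names `cv_population` and `cv_sample` are made up for this illustration.

```python
import statistics


def cv_population(values):
    """CV using the population standard deviation (divides by n)."""
    return statistics.pstdev(values) / statistics.mean(values)


def cv_sample(values):
    """CV using the sample standard deviation (divides by n - 1)."""
    return statistics.stdev(values) / statistics.mean(values)


values = [1, 2, 3, 4, 5, 6, 7]
print("population CV:", cv_population(values))
print("sample CV:", cv_sample(values))
```

The sample form always yields a slightly larger value for the same data, so it is worth stating which convention is in use when CV values are compared.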

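3 Practical use

The coefficient of variation is typically used to compare the dispersion of two datasets whose scales or units differ markedly, for example the dispersion of like counts on posts from two social media accounts with very different follower counts.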
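
The comparison described above could be sketched as follows. The like counts are invented purely for illustration, and the variable names `small_account_likes` and `large_account_likes` are hypothetical. The sketch uses the population standard deviation from the standard library, in line with section 2.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical like counts per post, invented for illustration only.
small_account_likes = [10, 12, 8, 11, 9]
large_account_likes = [1000, 1020, 980, 1010, 990]

for name, likes in [("small", small_account_likes),
                    ("large", large_account_likes)]:
    std = statistics.pstdev(likes)
    cv = coefficient_of_variation(likes)
    print(name, "std:", std, "CV:", cv)
```

With these made-up numbers the large account has the larger standard deviation in absolute terms but the smaller CV, which is the kind of scale-independent comparison the standard deviation alone cannot give.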
