信息增益比率
计算并给出一个信息增益比率的样例。
信息增益比率
因为信息增益中,不同的属性值越多,信息增益也越大,所以通过信息增益比率来做相关尝试。
样例
Outlook | Temperature | Humidity 湿度 | Wind 风 | Play |
---|---|---|---|---|
Sunny 晴朗 | Hot | High | False | No |
Sunny 晴朗 | Hot | High | True | No |
Overcast 低沉 | Hot | High | False | Yes |
Rainy 雨 | Mild 轻微 | High | False | Yes |
Rainy 雨 | Cool 凉 | Normal | False | Yes |
Rainy 雨 | Cool 凉 | Normal | True | No |
Overcast 低沉 | Cool 凉 | Normal | True | Yes |
Sunny 晴朗 | Mild 轻微 | High | False | No |
Sunny 晴朗 | Cool 凉 | Normal | False | Yes |
Rainy 雨 | Mild 轻微 | Normal | False | Yes |
Sunny 晴朗 | Mild 轻微 | Normal | False | Yes |
Overcast 低沉 | Mild 轻微 | High | True | Yes |
Overcast 低沉 | Hot | Normal | False | Yes |
Rainy 雨 | Mild 轻微 | High | True | No |
Outlook | Yes | No | Count of each group | Entropy 熵 |
---|---|---|---|---|
sunny 晴朗 | 2 | 3 | 5 | 0.971 |
overcast 低沉 | 4 | 0 | 4 | 0.000 |
rainy 雨 | 3 | 2 | 5 | 0.971 |
Results 结果 | Values 值 | |||
Information 信息 | 0.694 | |||
Overall entropy 总熵 | 0.940 | |||
Information gain 信息增益 | 0.247 | |||
Split information 拆分信息 | 1.577 | |||
Gain ratio 增益比率 | 0.156 |
- Overall entropy:
-5/14 * log2(5/14) - 9/14 * log2(9/14) = 0.940
- No 的数量为 5,Yes 的数量为 9,一共 14 个
- Entropy 计算:
- sunny 熵:
-2/5 * log2(2/5) - 3/5 * log2(3/5) = 0.971
- overcast 熵:
-4/4 * log2(4/4) - 0/4 * log2(0/4) = 0
- sunny 熵:
-3/5 * log2(3/5) - 2/5 * log2(2/5) = 0.971
- sunny 熵:
- Information:
5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.694
- Information gain:
0.940 - 0.694 = 0.247
此处有精度问题 - Split Information:
-5/14 * log2(5/14) - 4/14 * log2(4/14) - 5/14 * log2(5/14) = 1.577
- Gain ratio:
0.247 / 1.577 = 0.156
公式
overall entropy:
\[Overall = -\sum_{i=1}^{n}P(T_i)log_2P(T_i)\]Entropy:
\[Entropy(A_j) = -\sum_{i=1}^{n}P(A_j|T_i)log_2P(A_j|T_i)\]Information:
\[Information(A) = \sum_{j=1}^{m}P(A_j)Entropy(A_j)\]Information gain (A):
\[InformationGain(A) = Overall - Information(A)\]Split information:
\[SplitInformation(A) = -\sum_{j=1}^{m}P(A_j)log_2P(A_j)\]Gain ratio:
\[GainRatio(A) = \frac{InformationGain(A)}{SplitInformation(A)}\]参考
- https://en.wikipedia.org/wiki/Information_gain_(decision_tree)
- https://en.wikipedia.org/wiki/Information_gain_ratio
- https://www.itheima.com/news/20210916/180526.html