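A Study of the Stability of Spam Email Classifiers Built with Decision Trees on Continuous Attributes
2005-02-15 Classification number: F224
【Affiliation】School of Statistics, Renmin University of China; Department of Statistics, Renmin University of China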
【Abstract】Filtering spam mail is one of the most pressing problems in Internet technology. Finding the attribute, or combination of attributes, that best separates normal email from spam is the bottleneck in spam discrimination. In recent years decision trees have become popular because they are easy to interpret and can output explicit rules, and they have become a core technique for predicting spam. However, many well-known decision tree algorithms, such as C4.5 and CART, are not very robust: their output is unstable, which hampers the construction of a reliable classifier. In this paper we study the robustness of the CART algorithm, point out the robustness problem that arises when a decision tree classifier is used to distinguish spam from normal email on interval (continuous) attributes, and then attempt to use the BAGGING algorithm to obtain a more robust model while also improving on the performance of the initial models.
【Keywords】spam email; decision tree; BAGGING
【Funding】
【Journal Section】Statistical Research (统计研究)
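
The abstract describes applying BAGGING to unstable tree classifiers on continuous attributes. As a rough illustration only, the sketch below bags depth-1, CART-style trees (stumps chosen by Gini impurity) over bootstrap resamples and combines them by majority vote. It is not the paper's implementation: the stump is a simplified stand-in for full CART, and the single continuous attribute, the toy data, and all function names (`gini`, `fit_stump`, `fit_bagging`, `predict_bagging`) are assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def fit_stump(X, y):
    """Depth-1 CART-style tree: choose the (feature, threshold) pair that
    minimizes the weighted Gini impurity of the two children.
    Returns (feature, threshold, left_label, right_label); feature is None
    when no split improves on the unsplit node."""
    n = len(y)
    majority = int(sum(y) * 2 >= n)
    best = (gini(y), None, None, majority, majority)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, j, t,
                        int(sum(left) * 2 >= len(left)),
                        int(sum(right) * 2 >= len(right)))
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    if j is None:
        return left_label
    return left_label if row[j] <= t else right_label


def fit_bagging(X, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over the bagged stumps (use an odd n_trees to avoid ties)."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): one continuous attribute, label 0 = normal, 1 = spam.
ham = [[i / 100] for i in range(10, 30, 2)]
spam = [[i / 100] for i in range(70, 90, 2)]
X = ham + spam
y = [0] * len(ham) + [1] * len(spam)

model = fit_bagging(X, y, n_trees=25, seed=0)
print(predict_bagging(model, [0.0]), predict_bagging(model, [1.0]))
# The split threshold differs from resample to resample; the vote smooths this out.
print(sorted({s[1] for s in model if s[0] is not None}))
```

Each resample can yield a different threshold on the continuous attribute, which is the kind of instability the abstract refers to; voting across resamples is what makes the combined prediction less sensitive to any single split.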