Abstract:
|
In this work, a set of comparative experiments for the
problem of automatically filtering unwanted electronic mail
messages is performed on two public corpora: PU1 and
LingSpam. Several variants of the AdaBoost algorithm with
confidence-rated predictions (Schapire et al., 99), differing
in the complexity of the base learners considered, have been
applied. Two main conclusions can be drawn from our
experiments: a) The boosting-based methods clearly
outperform the results of the other learning algorithms published
on the two evaluation corpora, achieving very high levels of
the F_1 measure; b) Increasing the complexity of the base
learners makes it possible to obtain better high-precision
classifiers, which is very important when
misclassification costs are
considered. |