SylFilter - a message filter

   Copyright (C) 2011-2013 Hiroyuki Yamamoto
   Copyright (C) 2011-2013 Sylpheed Development Team

About This Program
==================

This is SylFilter, a generic message filter library and a set of
command-line tools.
SylFilter provides a Bayesian filter, a popular approach to spam filtering.
SylFilter is also internationalized and can be applied to any language.

The SylFilter library provides simple but powerful C APIs and can be used
from C programs.
The SylFilter command-line tool can be used as a junk filter program, like
other major tools such as bogofilter and bsfilter.

SylFilter is free software distributed under a BSD-like license.
See COPYING for details.

Install
=======

This program requires GLib and a key-value store engine.
Install them before building.

Currently SQLite (enabled by default), QDBM and GDBM are supported as the
key-value store engine.

  $ ./configure
  ( $ ./configure --disable-sqlite --enable-qdbm  (enables QDBM) )
  ( $ ./configure --disable-sqlite --enable-gdbm  (enables GDBM) )
  $ make
  $ sudo make install

By default, a built-in subset of libsylph is used for message parsing.
To use the libsylph installed on your system, specify the --with-libsylph
option.

  ./configure --with-libsylph=builtin     use built-in LibSylph (default)
  ./configure --with-libsylph=standalone  use standalone version of LibSylph
  ./configure --with-libsylph=sylpheed    use Sylpheed's LibSylph

If libsylph is installed in a non-standard location, also use the
--with-libsylph-dir option.

Usage
=====

SylFilter accepts rfc822 message files (for example: MH, Maildir, eml).

Learning junk mails

  $ sylfilter -j ~/Mail/junk/*

Learning clean mails

  $ sylfilter -c ~/Mail/clean/*

Classifying mails

  $ sylfilter ~/Mail/inbox/1234

Show learn status

  $ sylfilter -s

Show learn status and all learned tokens

  $ sylfilter -s -v

Show help message

  $ sylfilter -h
  $ sylfilter --help

Usage with Sylpheed
===================

On 'Common preferences... - Junk mail - Learning command:', manually set
each command as follows:

  Junk                : sylfilter -j
  Not Junk            : sylfilter -c
  Classifying command : sylfilter

Other information
=================

Token database files are created under ~/.sylfilter/ .
(On Windows: %APPDATA%\SylFilter\)

Library Design
==============

The filtering of SylFilter consists of a set of simple filter modules.

      (Learning)                    (Classifying)

    rfc822 message                 rfc822 message
          |                              |
  [ text content filter ]       [ text content filter ]
          |                              |
  [ word separator filter ]     [ blacklist filter ] --> spam
          |                              |
  [ n-gram filter ]             [ word separator filter ]
          |                              |
  [ learning filter ]           [ n-gram filter ]
                                         |
                                [ bayesian filter ] --> spam
                                         |
                                      non-spam

Library users can create arbitrary combinations of the provided filters.
Users can also add their own custom filters.

Please read the source of src/sylfilter.c for library usage.

Algorithm of Bayesian Filter
============================

SylFilter implements Fisher's method as described by Gary Robinson.
It is also implemented by bogofilter and bsfilter.

  http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  http://www.bgl.nu/bogofilter/fisher.html

SylFilter initially implemented a customized version of the algorithm
described by Paul Graham.

  http://paulgraham.com/spam.html
  http://paulgraham.com/better.html

The Robinson-Fisher method is used by default.

Basically, the algorithm can be described as follows:

1. Count the number of occurrences of each word in spam and non-spam
   messages.
2. For each word in a message, calculate the probability that a message
   containing that word is spam.
3. Calculate the combined probability using the important words in the
   message.

See the above Web pages for details.
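
The three steps can be illustrated with a small Python sketch. This is not
SylFilter's C implementation. It follows a commonly used formulation of
Robinson's smoothing and Fisher's chi-square combining, and every name and
constant in it (learn, word_prob, classify, s, x, min_dev) is a
hypothetical choice made for this illustration. Tokenizing is reduced to
splitting on whitespace, and "important words" are assumed to be those
whose probability is at least min_dev away from 0.5.

```python
import math
from collections import Counter


def learn(messages):
    """Step 1: count, per word, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def word_prob(word, spam, clean, n_spam, n_clean, s=1.0, x=0.5):
    """Step 2: smoothed probability that a message containing `word` is spam.

    s (strength of the prior) and x (prior for unseen words) are
    illustrative defaults, not values taken from SylFilter.
    """
    n = spam[word] + clean[word]
    if n == 0:
        return x  # never seen: fall back to the neutral prior
    b = spam[word] / n_spam if n_spam else 0.0
    g = clean[word] / n_clean if n_clean else 0.0
    p = b / (b + g)
    return (s * x + n * p) / (s + n)


def chi2q(x2, dof):
    """Chi-square survival function for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)  # may underflow to 0.0 for very long messages
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def classify(text, spam, clean, n_spam, n_clean, min_dev=0.1):
    """Step 3: combine the important word probabilities into one score in [0, 1]."""
    probs = [word_prob(w, spam, clean, n_spam, n_clean)
             for w in set(text.lower().split())]
    probs = [p for p in probs if abs(p - 0.5) >= min_dev]
    if not probs:
        return 0.5  # no evidence either way
    dof = 2 * len(probs)
    spam_ind = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), dof)
    ham_ind = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), dof)
    return (1.0 + spam_ind - ham_ind) / 2.0


# Toy training data, made up for this example.
junk_mails = ["buy cheap pills now", "cheap pills online"]
clean_mails = ["meeting agenda for monday", "lunch on monday"]
spam_counts, clean_counts = learn(junk_mails), learn(clean_mails)
score = classify("cheap pills", spam_counts, clean_counts,
                 len(junk_mails), len(clean_mails))
```

A score close to 1 suggests spam, a score close to 0 suggests non-spam, and
a score near 0.5 means the evidence is weak or contradictory. Where to put
the cutoff is left to the caller in this sketch.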