SylFilter is a generic message filter library with command-line tools. It provides a Bayesian filter, a popular algorithm for spam filtering. SylFilter is multilingual and can be applied to any language. It is implemented in C and runs fast with few resources.
The SylFilter library provides simple but powerful C APIs and can be used from C programs. Library users can combine the provided filters and their own custom filters in any arrangement.
The SylFilter command-line tool can be used as a junk filter program, like major tools such as bogofilter and bsfilter.
SylFilter is ready for actual use, though it is still under development. Learning data from older versions may become unusable in future versions because of changes in SylFilter.
Main features of SylFilter
- A compilation error caused by a missing config.h inclusion was fixed.
- A compilation error with newer GLib was fixed.
- An external LibSylph (standalone or Sylpheed) can be used instead of the built-in one.
- The Received header is now also used for classification.
- The verbose mode has several levels now.
- Sylfilter command: an option to specify database directory was added.
- Sylfilter command: -V (print version) option was added.
- The calculation of the combined probability in the Robinson-Fisher method was fixed. Version 0.4 wrongly used the total number of words in a message, without excluding the words within the minimum deviation. This fix gives more accurate filtering and a small speedup.
- Sylfilter command: options to specify Robinson-Fisher parameters were added.
- A bug that prevented multibyte filenames from being passed on Windows was fixed.
- The Robinson-Fisher method was implemented and is used by default. This improves filtering accuracy. The previous Paul/Naive Bayes method can still be used with the -m option.
- Sylfilter command: an option to specify the filtering method (-m) was added.
- Sylfilter command: the -j and -C options, or the -c and -J options, can be specified at the same time.
- Sylfilter command: -B (no-bias for clean messages) option was added.
- GDBM support was added (exclusive with QDBM).
- APIs for global configuration were added.
- CJK kanji runs longer than four characters are split into 4-grams. This mainly improves the detection of Chinese spam.
- Learning was sped up (data is incompatible with 0.1).
- Windows is supported.
- Initial release.
- Supports QDBM and SQLite.
- Provides the libsylfilter library and the sylfilter command-line tool.
- Initial implementation of the Bayesian filter.
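The 4-gram splitting of long kanji runs mentioned in the change list above can be illustrated with a short sketch. This is not SylFilter's C tokenizer: the function name `kanji_tokens`, the use of the CJK Unified Ideographs range U+4E00 to U+9FFF to detect kanji, and the choice of overlapping (sliding-window) 4-grams are all assumptions made for illustration.

```python
import re

# Assumption: "kanji" means the CJK Unified Ideographs block U+4E00..U+9FFF.
KANJI_RUN = re.compile("[\u4e00-\u9fff]+")


def kanji_tokens(text, n=4):
    """Return kanji tokens, splitting runs longer than n into overlapping n-grams."""
    tokens = []
    for run in KANJI_RUN.findall(text):
        if len(run) <= n:
            # Short runs are kept whole.
            tokens.append(run)
        else:
            # Slide a window of n characters over the run, one step at a time.
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens
```

A run of six kanji yields three 4-grams under this scheme, while a run of up to four kanji stays a single token. Text written without spaces between words, as in Chinese, then produces tokens of bounded length that can recur across messages.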
sylfilter-0.8.tar.gz (15 Mar 2013)
sylfilter-0.7.tar.gz (13 Jan 2012)
sylfilter-0.6.tar.gz (15 Nov 2011)
sylfilter-0.5.tar.gz (13 Oct 2011)
sylfilter-0.4.tar.gz (6 Oct 2011)
sylfilter-0.3.tar.gz (30 Sep 2011)
sylfilter-0.2.tar.gz (13 Sep 2011)
sylfilter-0.1.tar.gz (5 Sep 2011)
sylfilter-0.8.zip (15 Mar 2013)
sylfilter-0.7.zip (13 Jan 2012)
sylfilter-0.6.zip (15 Nov 2011)
sylfilter-0.5.zip (13 Oct 2011)
sylfilter-0.4.zip (6 Oct 2011)
sylfilter-0.3.zip (30 Sep 2011)
sylfilter-0.2.zip (13 Sep 2011)
- Initial copy
$ git clone http://floss.sraoss.jp/~yamamoto/sylfilter.git
- Update
$ cd sylfilter
$ git pull
SylFilter accepts RFC 822 message files (for example: MH, Maildir, eml).
- Learn junk (spam) messages in an MH/Maildir folder
$ sylfilter -j ~/Mail/junk/*
- Learn clean (non-spam) messages in an MH/Maildir folder
$ sylfilter -c ~/Mail/clean/*
- Classify a message
$ sylfilter ~/Mail/inbox/1234
- Show the learning status of the database
$ sylfilter -s
- Show the learning status of the database and all learned tokens
$ sylfilter -s -v
- Show the help message
$ sylfilter -h
$ sylfilter --help
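When the learning commands above are driven from a script instead of a shell, the file list has to be built explicitly, because no shell expands the `*`. The following is a minimal sketch under stated assumptions: `learn_command` and `run` are hypothetical helper names, and only the `-j` and `-c` options shown above are used. The meaning of the exit status is not described on this page, so the sketch returns the process result without interpreting it.

```python
import subprocess
from pathlib import Path


def learn_command(folder, junk):
    """Build the argument list for learning every message file in a folder."""
    flag = "-j" if junk else "-c"
    files = sorted(
        str(p)
        for p in Path(folder).expanduser().iterdir()
        # Like the shell glob '*', skip hidden files and anything that is not a file.
        if p.is_file() and not p.name.startswith(".")
    )
    if not files:
        raise ValueError(f"no message files found in {folder}")
    return ["sylfilter", flag, *files]


def run(argv):
    """Run the command and return the CompletedProcess unchanged."""
    return subprocess.run(argv, capture_output=True, text=True)
```

For example, `run(learn_command("~/Mail/junk", junk=True))` corresponds to `sylfilter -j ~/Mail/junk/*`, provided that sylfilter is on the PATH.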
Usage with Sylpheed
In 'Common preferences... - Junk mail - Learning command:', set each command as follows. Since Sylpheed 3.2beta4, sylfilter can be selected from the presets.
- Junk : sylfilter -j
- Not Junk: sylfilter -c
- Classifying command : sylfilter
Bug tracking system
The SylFilter bug tracking system is public. Please use it for bug reports and feature requests.
Algorithm of Bayesian Filter
SylFilter implements Fisher's method as described by Gary Robinson. The same algorithm is also implemented by bogofilter and bsfilter.
- Gary Robinson: Spam Detection
- Bogofilter Calculations: Comparing Geometric Mean with Fisher's Method for Combining Probabilities
SylFilter initially implemented a customized version of the algorithm described by Paul Graham.
The Robinson-Fisher method is now used by default.
Basically, the algorithm works as follows:
- Count the number of occurrences of each word in spam and non-spam messages.
- For each word in a message, calculate the probability that a message containing it is spam.
- Calculate the combined probability using the important words in the message.
See the web pages above for details.
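The three steps can be sketched in a few lines of Python. This follows Gary Robinson's published formulas, not SylFilter's C source. The function names, the parameters `s`, `x` and `min_dev` with their default values, and the choice to count each word once per learned message are assumptions made for illustration.

```python
import math
from collections import Counter


def learn(counts, tokens):
    """Step 1: count each distinct word once per learned message (assumption)."""
    counts.update(set(tokens))


def token_probability(token, junk, clean, n_junk, n_clean, s=1.0, x=0.5):
    """Step 2: Robinson's smoothed estimate that a message containing token is junk."""
    j = junk.get(token, 0)
    c = clean.get(token, 0)
    if j + c == 0:
        return x  # never seen: fall back to the neutral prior x
    junk_rate = j / max(n_junk, 1)
    clean_rate = c / max(n_clean, 1)
    p = junk_rate / (junk_rate + clean_rate)
    n = j + c
    # s is the weight given to the prior x; n is the amount of evidence.
    return (s * x + n * p) / (s + n)


def chi2q(x2, dof):
    """Upper-tail probability of a chi-square variable with an even dof."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_probability(probs, min_dev=0.1):
    """Step 3: combine per-word probabilities with Fisher's method."""
    # Use only "important" words: those far enough from the neutral 0.5.
    # Clamp so that log() never sees 0.
    used = [min(max(p, 1e-6), 1 - 1e-6) for p in probs if abs(p - 0.5) >= min_dev]
    n = len(used)  # the number of words actually used, not all words in the message
    if n == 0:
        return 0.5
    junk_side = chi2q(-2.0 * sum(math.log(p) for p in used), 2 * n)
    clean_side = chi2q(-2.0 * sum(math.log(1.0 - p) for p in used), 2 * n)
    return (1.0 + junk_side - clean_side) / 2.0
```

The result lies between 0 and 1: values near 1 suggest junk, values near 0 suggest a clean message, and values near 0.5 are uncertain. Note that the degrees of freedom are `2 * n`, with `n` counting only the words outside the minimum deviation.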
SylFilter is free software distributed under a BSD-like license. You can freely use, modify and redistribute it under the license.
Copyright (C) 2011-2013 Hiroyuki Yamamoto
Copyright (C) 2011-2013 Sylpheed Development Team
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Sylpheed Development Team nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.