  SylFilter - a message filter

  Copyright (C) 2011-2013 Hiroyuki Yamamoto <hiro-y@kcn.ne.jp>
  Copyright (C) 2011-2013 Sylpheed Development Team


About This Program
==================

This is SylFilter, a generic message filter library and a set of command-line
tools. SylFilter provides a Bayesian filter, a very popular spam filtering
algorithm. SylFilter is also internationalized and can be applied to any
language.

The SylFilter library provides simple but powerful C APIs and can be used from
C programs.

The SylFilter command-line tool can be used as a junk filter program, like
major tools such as bogofilter and bsfilter.

SylFilter is free software, distributed under a BSD-like license.
See COPYING for details.


Install
=======

This program requires GLib and a key-value store engine. Install them before
building. Currently SQLite (enabled by default), QDBM and GDBM are supported
as key-value store engines.

  $ ./configure
  ( $ ./configure --disable-sqlite --enable-qdbm (enables QDBM) )
  ( $ ./configure --disable-sqlite --enable-gdbm (enables GDBM) )

  $ make
  $ sudo make install

By default, a built-in subset of libsylph is used for message parsing.
To use the libsylph installed on your system, specify the --with-libsylph
option.

  ./configure --with-libsylph=builtin     use built-in LibSylph (default)
  ./configure --with-libsylph=standalone  use standalone version of LibSylph
  ./configure --with-libsylph=sylpheed    use Sylpheed's LibSylph

If libsylph is installed in a non-standard location, also use the
--with-libsylph-dir option.


Usage
=====

SylFilter accepts RFC 822 message files (for example: MH, Maildir, eml).

Learning junk mail

  $ sylfilter -j ~/Mail/junk/*

Learning clean mail

  $ sylfilter -c ~/Mail/clean/*

Classifying mail

  $ sylfilter ~/Mail/inbox/1234

Showing the learning status

  $ sylfilter -s

Showing the learning status and all learned tokens

  $ sylfilter -s -v

Showing the help message

  $ sylfilter -h
  $ sylfilter --help
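When the tool is driven from a script rather than a shell, the invocations
above can be assembled programmatically. The following is a minimal Python
sketch under stated assumptions: the helper name sylfilter_argv and its mode
names are hypothetical, and only the -j, -c, -s and -v options shown above are
used.

```python
import os.path

# Options taken from the command lines above; the mode names are our own
# labels and are not part of SylFilter.
MODE_FLAGS = {
    "learn-junk": ["-j"],
    "learn-clean": ["-c"],
    "classify": [],
    "status": ["-s"],
    "status-verbose": ["-s", "-v"],
}


def sylfilter_argv(mode, messages=()):
    """Build the argument list for one sylfilter run (hypothetical helper)."""
    flags = MODE_FLAGS[mode]  # raises KeyError for an unknown mode
    # Without a shell, "~" is not expanded automatically, so do it here.
    paths = [os.path.expanduser(m) for m in messages]
    return ["sylfilter", *flags, *paths]
```

The resulting list can be passed to subprocess.run(). Shell wildcards such as
~/Mail/junk/* are not expanded in that case, so the caller would have to list
the message files itself, for example with glob.glob().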


Usage with Sylpheed
===================

Under 'Common preferences... - Junk mail - Learning command:', set each
command manually as follows:

Junk                : sylfilter -j
Not Junk            : sylfilter -c
Classifying command : sylfilter


Other information
=================

Token database files are created under ~/.sylfilter/ .
(On Windows: %APPDATA%\SylFilter\)


Library Design
==============

Filtering in SylFilter is performed by a set of simple filter modules.

         (Learning)                   (Classifying)

        rfc822 message                rfc822 message
              |                             |
   [ text content filter ]       [ text content filter ]
              |                             |
  [ word separator filter ]       [ blacklist filter ]  --> spam
              |                             |
      [ n-gram filter ]         [ word separator filter ]
              |                             |
     [ learning filter ]            [ n-gram filter ]
                                            |
                                   [ bayesian filter ]  --> spam
                                            |
                                         non-spam

Library users can combine the provided filters arbitrarily.
Users can also add their own custom filters.
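
The chain structure in the diagram above can be illustrated with a short
conceptual sketch. It is written in Python for brevity and does not use
SylFilter's C API: every name in it (run_chain, the filter functions, the
status strings) is hypothetical, and the filters are simplified stand-ins.
Each filter receives the current data and returns a status plus the
transformed data; the chain stops as soon as a filter reaches a verdict.

```python
import re


def text_content_filter(message):
    """Stand-in for text extraction: here it only lowercases the message."""
    return "next", message.lower()


def make_blacklist_filter(blacklist):
    """Return a filter that reports spam when a blacklisted word appears."""
    def blacklist_filter(text):
        if any(word in text for word in blacklist):
            return "spam", text
        return "next", text
    return blacklist_filter


def word_separator_filter(text):
    """Split the text into words."""
    return "next", re.findall(r"\w+", text)


def ngram_filter(words, n=2):
    """Turn the word list into word n-grams (an assumed, simplified form)."""
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return "next", grams


def run_chain(filters, data, default="non-spam"):
    """Run filters in order; stop at the first one that gives a verdict."""
    for f in filters:
        status, data = f(data)
        if status != "next":
            return status, data
    return default, data
```

A custom filter is then just another function with the same shape, placed
anywhere in the list passed to run_chain().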

Please read the source in src/sylfilter.c for library usage.


Algorithm of Bayesian Filter
============================

SylFilter implements Fisher's method as described by Gary Robinson.
It is also implemented by bogofilter and bsfilter.

  http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  http://www.bgl.nu/bogofilter/fisher.html

SylFilter initially implemented a customized version of the algorithm
described by Paul Graham.

  http://paulgraham.com/spam.html
  http://paulgraham.com/better.html

The Robinson-Fisher method is used by default.

Basically, the algorithm can be described as follows:

1. Count the number of occurrences of each word in spam and non-spam messages.
2. For each word in a message, calculate the probability that a message
   containing it is spam.
3. Calculate the combined probability using the important words in the message.
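
The three steps above can be sketched in Python as follows. This is an
illustrative reading of Robinson's description, not SylFilter's actual code.
The class and function names, the constants s = 1 and x = 0.5, the 0.1 cutoff
used to pick "important" words, and counting each word once per message are
all assumptions made for the example.

```python
import math
from collections import Counter


def chi2_survival(x2, dof):
    """P(X >= x2) for a chi-square X with an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class TokenStats:
    def __init__(self):
        self.spam, self.clean = Counter(), Counter()
        self.n_spam = self.n_clean = 0

    def learn(self, words, is_spam):
        """Step 1: count, per word, the messages of each class containing it."""
        counts = self.spam if is_spam else self.clean
        counts.update(set(words))  # assumption: once per message
        if is_spam:
            self.n_spam += 1
        else:
            self.n_clean += 1

    def word_prob(self, word, s=1.0, x=0.5):
        """Step 2: Robinson's f(w), pulled toward x when evidence is thin."""
        b, g = self.spam[word], self.clean[word]
        n = b + g
        if n == 0:
            return x  # word never seen
        bad = b / max(self.n_spam, 1)
        good = g / max(self.n_clean, 1)
        p = bad / (bad + good)
        return (s * x + n * p) / (s + n)

    def spam_score(self, words, min_dev=0.1):
        """Step 3: combine the words far from 0.5 with Fisher's method."""
        probs = [self.word_prob(w) for w in set(words)]
        probs = [f for f in probs if abs(f - 0.5) >= min_dev]
        if not probs:
            return 0.5  # nothing informative: undecided
        dof = 2 * len(probs)
        h = chi2_survival(-2.0 * sum(math.log(f) for f in probs), dof)
        s = chi2_survival(-2.0 * sum(math.log(1.0 - f) for f in probs), dof)
        return (1.0 + h - s) / 2.0  # near 1: spam, near 0: non-spam
```

A score near 1 suggests spam and a score near 0 suggests non-spam; where to
place the cutoffs between them is left to the caller in this sketch.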

See the Web pages above for details.