  SylFilter - a message filter

  Copyright (C) 2011-2013 Hiroyuki Yamamoto <hiro-y@kcn.ne.jp>
  Copyright (C) 2011-2013 Sylpheed Development Team


About This Program
==================

This is SylFilter, a generic message filter library with command-line tools.
SylFilter provides a Bayesian filter, a very popular spam filtering
algorithm. SylFilter is also internationalized and can be applied to any
language.

The SylFilter library provides simple but powerful C APIs and can be used
from C programs.

The SylFilter command-line tool can be used as a junk filter program, like
major tools such as bogofilter and bsfilter.

SylFilter is free software distributed under a BSD-like license.
See COPYING for details.


Install
=======

This program requires GLib and a key-value store engine. Install them before building.
Currently SQLite (enabled by default), QDBM and GDBM are supported as the key-value store engine.

  $ ./configure
  ( $ ./configure --disable-sqlite --enable-qdbm (enables QDBM) )
  ( $ ./configure --disable-sqlite --enable-gdbm (enables GDBM) )

  $ make
  $ sudo make install

By default, a built-in subset of libsylph is used for message parsing.
To use libsylph installed on your system, specify the --with-libsylph option.

  ./configure --with-libsylph=builtin     use built-in LibSylph (default)
  ./configure --with-libsylph=standalone  use standalone version of LibSylph
  ./configure --with-libsylph=sylpheed    use Sylpheed's LibSylph

If libsylph is installed in a non-standard location, also use the
--with-libsylph-dir option.


Usage
=====

SylFilter accepts rfc822 message files (for example: MH, Maildir, eml).

Learning junk mail

  $ sylfilter -j ~/Mail/junk/*

Learning clean mail

  $ sylfilter -c ~/Mail/clean/*

Classifying mail

  $ sylfilter ~/Mail/inbox/1234

Show learning status

  $ sylfilter -s

Show learning status and all learned tokens

  $ sylfilter -s -v

Show help message

  $ sylfilter -h
  $ sylfilter --help


Usage with Sylpheed
===================

On 'Common preferences... - Junk mail - Learning command:', manually set
each command as follows:

Junk                : sylfilter -j
Not Junk            : sylfilter -c
Classifying command : sylfilter


Other information
=================

Token database files are created under ~/.sylfilter/ .
(On Windows: %APPDATA%\SylFilter\)


Library Design
==============

Filtering in SylFilter is performed by a chain of simple filter modules.

         (Learning)                   (Classifying)

        rfc822 message                rfc822 message
              |                             |
   [ text content filter ]       [ text content filter ]
              |                             |
  [ word separator filter ]       [ blacklist filter ]  --> spam
              |                             |
      [ n-gram filter ]         [ word separator filter ]
              |                             |
     [ learning filter ]            [ n-gram filter ]
                                            |
                                   [ bayesian filter ]  --> spam
                                            |
                                         non-spam

Library users can create arbitrary combinations of the provided filters.
Users can also add their own custom filters.

Please read the source of src/sylfilter.c for library usage.
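
The chaining idea in the diagram can be illustrated with a short Python
sketch. This is not SylFilter's C API: every function name, the dictionary
used to pass state, the example blacklist entry and the character-bigram
behaviour of the n-gram step are assumptions made for illustration only.
Each filter receives the message state and may return a verdict, which stops
the chain early.

```python
import re
from collections import Counter

# Every name here is hypothetical; none of it mirrors SylFilter's C API.

def text_content_filter(msg):
    # Keep only the body: everything after the first blank line.
    msg["text"] = msg["raw"].partition("\n\n")[2]

def blacklist_filter(msg, blacklist=("casino.example",)):
    # Return a verdict early when a blacklisted string occurs anywhere.
    if any(entry in msg["raw"] for entry in blacklist):
        return "spam"
    return None

def word_separator_filter(msg):
    # Split the body into lowercase word tokens.
    msg["tokens"] = re.findall(r"\w+", msg["text"].lower())

def ngram_filter(msg, n=2):
    # Assumed behaviour: character n-grams; short tokens are kept whole.
    grams = []
    for tok in msg["tokens"]:
        if len(tok) <= n:
            grams.append(tok)
        else:
            grams.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    msg["tokens"] = grams

def make_learning_filter(counts):
    # Build a filter that adds the message's tokens to a counter.
    def learning_filter(msg):
        counts.update(msg["tokens"])
    return learning_filter

def run_chain(filters, raw):
    """Run filters in order; stop at the first one that returns a verdict."""
    msg = {"raw": raw}
    for f in filters:
        verdict = f(msg)
        if verdict is not None:
            return verdict
    return None

# Learning chain: the left column of the diagram.
junk_counts = Counter()
learning = [text_content_filter, word_separator_filter, ngram_filter,
            make_learning_filter(junk_counts)]
run_chain(learning, "Subject: offer\n\nbuy now")

# Classifying chain without the final Bayesian step.
classifying = [text_content_filter, blacklist_filter,
               word_separator_filter, ngram_filter]
verdict = run_chain(classifying, "From: a@casino.example\n\nhello") or "non-spam"
print(verdict)
```

Because a chain is just an ordered list, a custom filter is added by writing
one more function and placing it at the desired position. A scoring filter
would be appended as the last step of the classifying chain.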


Algorithm of Bayesian Filter
============================

SylFilter implements Fisher's method as described by Gary Robinson.
It is also implemented by bogofilter and bsfilter.

  http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  http://www.bgl.nu/bogofilter/fisher.html

SylFilter initially implemented a customized version of the algorithm
described by Paul Graham.

  http://paulgraham.com/spam.html
  http://paulgraham.com/better.html

The Robinson-Fisher method is used by default.

Basically the algorithm can be described as follows:

1. Count the number of occurrences of each word in spam and non-spam messages.
2. For each word in a message, calculate the probability that a message
   containing it is spam.
3. Calculate the combined probability using the important words in the message.

See the Web pages above for details.
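
The three steps can be sketched in Python along the lines of Robinson's
description. This is a minimal illustration, not SylFilter's implementation:
the function names, the smoothing parameters s and x, the min_dev threshold
used to pick "important" words and the clamping constant are all assumptions,
and the values SylFilter actually uses may differ.

```python
import math

def token_probability(n_spam, n_clean, total_spam, total_clean, s=1.0, x=0.5):
    """Steps 1-2: smoothed probability that a message with this token is spam.

    n_spam / n_clean are the learned counts for the token; total_spam /
    total_clean are the numbers of learned messages. s and x are assumed
    values for the strength and prior of Robinson's smoothing.
    """
    spam_ratio = n_spam / total_spam if total_spam else 0.0
    clean_ratio = n_clean / total_clean if total_clean else 0.0
    n = n_spam + n_clean
    if n == 0 or spam_ratio + clean_ratio == 0:
        return x  # unknown token: fall back to the prior
    p = spam_ratio / (spam_ratio + clean_ratio)
    return (s * x + n * p) / (s + n)

def chi2q(x2, dof):
    """Upper-tail probability of chi-square for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs, min_dev=0.1):
    """Step 3: combine token probabilities; values near 1.0 suggest spam."""
    # Keep only "important" tokens, i.e. those far enough from 0.5 (assumed rule).
    picked = [min(max(f, 1e-6), 1.0 - 1e-6)
              for f in probs if abs(f - 0.5) >= min_dev]
    if not picked:
        return 0.5
    n = len(picked)
    q_f = chi2q(-2.0 * sum(math.log(f) for f in picked), 2 * n)
    q_1mf = chi2q(-2.0 * sum(math.log(1.0 - f) for f in picked), 2 * n)
    return (1.0 + q_f - q_1mf) / 2.0

# A token learned from 5 of 10 spam messages and from no clean message.
f = token_probability(5, 0, 10, 10)
score = fisher_score([f, 0.95, 0.2])
```

A caller would compare the score against a cutoff to decide between spam and
non-spam; the cutoff is left out here because the README does not state one.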