  SylFilter - a message filter

  Copyright (C) 2011-2013 Hiroyuki Yamamoto <hiro-y@kcn.ne.jp>
  Copyright (C) 2011-2013 Sylpheed Development Team


About This Program
==================

This is SylFilter, a generic message filtering library with command-line
tools. SylFilter provides a Bayesian filter, a very popular spam filtering
algorithm. SylFilter is also internationalized and can be applied to any
language.

The SylFilter library provides a simple but powerful C API that can be used
from C programs.

The SylFilter command-line tool can be used as a junk filter program, like
other major tools such as bogofilter and bsfilter.

SylFilter is free software distributed under a BSD-like license.
See COPYING for details.


Install
=======

This program requires GLib and a key-value store engine. Install them before
building. Currently SQLite (enabled by default), QDBM and GDBM are supported
as the key-value store engine.

  $ ./configure
  ( $ ./configure --disable-sqlite --enable-qdbm (enables QDBM) )
  ( $ ./configure --disable-sqlite --enable-gdbm (enables GDBM) )

  $ make
  $ sudo make install

By default, a built-in subset of LibSylph is used for message parsing.
To use a LibSylph installed on your system, specify the --with-libsylph
option:

  ./configure --with-libsylph=builtin     use built-in LibSylph (default)
  ./configure --with-libsylph=standalone  use standalone version of LibSylph
  ./configure --with-libsylph=sylpheed    use Sylpheed's LibSylph

If LibSylph is installed in a non-standard location, also use the
--with-libsylph-dir option.


Usage
=====

SylFilter accepts RFC 822 message files (for example, MH, Maildir, or eml
files).

Learning junk mail

  $ sylfilter -j ~/Mail/junk/*

Learning clean mail

  $ sylfilter -c ~/Mail/clean/*

Classifying mail

  $ sylfilter ~/Mail/inbox/1234

Showing the learning status

  $ sylfilter -s

Showing the learning status and all learned tokens

  $ sylfilter -s -v

Showing the help message

  $ sylfilter -h
  $ sylfilter --help


Usage with Sylpheed
===================

In 'Common preferences... - Junk mail - Learning command:', set each command
manually as follows:

Junk                : sylfilter -j
Not Junk            : sylfilter -c
Classifying command : sylfilter


Other information
=================

Token database files are created under ~/.sylfilter/
(on Windows: %APPDATA%\SylFilter\).


Library Design
==============

Filtering in SylFilter is performed by a set of simple filter modules
connected in sequence.

         (Learning)                   (Classifying)

        rfc822 message                rfc822 message
              |                             |
   [ text content filter ]       [ text content filter ]
              |                             |
  [ word separator filter ]       [ blacklist filter ]  --> spam
              |                             |
      [ n-gram filter ]         [ word separator filter ]
              |                             |
     [ learning filter ]            [ n-gram filter ]
                                            |
                                   [ bayesian filter ]  --> spam
                                            |
                                         non-spam

Library users can create arbitrary combinations of the provided filters and
can also add their own custom filters.

Please read the source of src/sylfilter.c to see how the library is used.


Algorithm of Bayesian Filter
============================

SylFilter implements Fisher's method as described by Gary Robinson;
bogofilter and bsfilter implement it as well.

  http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  http://www.bgl.nu/bogofilter/fisher.html

SylFilter initially implemented a customized version of the algorithm
described by Paul Graham.

  http://paulgraham.com/spam.html
  http://paulgraham.com/better.html

The Robinson-Fisher method is used by default.

Basically, the algorithm works as follows:

1. Count the number of occurrences of words in spam and non-spam messages.
2. For each word in a message, calculate the probability that a message
   containing that word is spam.
3. Calculate the combined probability using the important words in the
   message.

See the Web pages above for details.