]>
diplodocus.org Git - minc/blob - minc
7 B<minc> - Incorporate mail from a maildir into mh folders.
15 B<minc> is a program for incorporating mail from a maildir to a mh
16 folder hierarchy. It takes mail from a maildir folder (not a maildir
17 folder hierarchy), optionally checks for spam with a user-defined
18 spam-checking function, and optionally filters mail into separate mh
21 The filtering is quite sophisticated, as it is done using real Perl
22 matching (m//) commands.
29 require 'sysexits.ph';
33 use Fcntl
qw(O_WRONLY O_EXCL O_CREAT);
37 use Log
::Dispatch
::File
;
38 use POSIX
qw(strftime WEXITSTATUS WIFEXITED);
40 # If a filter set's header is $MAGIC_TO_TOKEN, that set is compared
41 # against headers matching this regex (taken from procmail).
42 my $MAGIC_TO_REGEX = '^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope |Apparently(-Resent)?)-To)';
43 my $MAGIC_TO_TOKEN = ' TO';
51 Dump (using Data::Dumper) the FILTERS list and exit. This is useful
52 for testing the syntax of .mincfilter.
56 Filter only, then exit. This is useful after running B<minc -s>.
64 Dry run; do not actually incorporate the mail, but log and report to
65 stdout/stderr as normal.
69 Process SPAM only, then exit, leaving all non-spam messages in the
85 our $opt_s; # ;; # stupid cperl-mode
87 if (not getopts
('dfhns')) {
94 print("Sorry bub, no help.\n");
102 } elsif ($opt_s) { # ))){ # stupid cperl-mode
112 Where configuration files (.mincfilter) are found. Also,
113 $HOME/Maildir is used for the maildir if MAILDIR is not set.
117 Where mail is delivered.
123 use Env
qw(HOME MAILDIR);
126 die("HOME environment variable must be set.\n");
129 $MAILDIR = "$HOME/Maildir";
136 =item $HOME/.mincfilter
138 This file is Perl code (included via the 'require' directive) which is
139 expected to define the FILTERS list.
141 =item $HOME/.mincspam
143 If this file exists, B<minc> will include it with the expectation that it
144 will define a B<is_spam> function. This function takes a message
145 filename as an argument and returns 1 if the message is spam, else 0.
146 If this file does not exist, B<minc> will define a simple function that
149 One of B<minc>'s global variables is available to the user-defined
150 is_spam function: $run. This boolean should be honored; is_spam
151 should only take real action (i.e. removing or creating files, running
152 external programs, etc.) if $run is non-zero.
154 XXX: need more details about the spam-handling process; for now read
157 =item `mhpath +`/logs/minc.log
159 Where B<minc> logs what it does, unless in -n mode.
161 =item `mhpath +`/logs/dryrun.log
163 Where B<minc> logs what it would do; used in -n mode.
165 =item `mhpath +`/.minc.context
167 B<minc> uses this file for context (i.e. current folder) instead of
168 `mhpath +`/context. This allows you some amount of freedom to use mh
169 while B<minc> is running. For example, you can changed folders without
170 causing a message to be stored in the wrong folder. Care must still
171 be taken, however, about the unseen sequence; if you change it
172 (whether via show, or mark, or anything else) while in the same folder
173 as B<minc>, it is likely the new message B<minc> stores will not end up
181 require "$HOME/.mincfilter";
183 if (-f
"$HOME/.mincspam") {
184 require "$HOME/.mincspam";
198 $logfile = $mh . '/logs/minc.log';
200 $logfile = $mh . '/logs/dryrun.log';
203 $ENV{"MHCONTEXT"} = $mh . '/.minc.context';
206 ###############################################################################
209 # debuglevels for the logger
210 use constant LOG_DEBUGLEVEL
=> 'info';
211 use constant SCREEN_DEBUGLEVEL
=> 'debug';
213 # characters used to wrap around the id field in the log so I can more
214 # easily parse the msg flow
215 use constant DEBUGCHAR
=> '%';
216 use constant INFOCHAR
=> '=';
217 use constant INCOMINGCHAR
=> '<';
218 use constant SAVECHAR
=> '>';
219 use constant WARNCHAR
=> 'W';
220 use constant ERRORCHAR
=> 'E';
222 my $logger = new Log
::Dispatch
;
223 $logger->add(new Log
::Dispatch
::File
(name
=>'logfile',
228 # log alias, handles getting passed vars that are undef
232 my $timestamp = strftime
('%b %e %H:%M:%S', localtime);
234 foreach my $part (@_) {
235 if (defined($part)) {
239 # no newlines in the log message, thanks
241 $msg = sprintf("%s %s%s %s\n", $timestamp, $act, $act, $msg);
242 $logger->log(level
=>$level, message
=>$msg);
244 if ($act eq SAVECHAR
) {
245 $logger->log(level
=>$level, message
=>"\n");
249 sub logsave
{ mylog
('notice', SAVECHAR
, @_); }
250 sub loginfo
{ mylog
('info', INFOCHAR
, @_); }
251 sub logdebug
{ mylog
('debug', DEBUGCHAR
, @_); }
252 sub logwarn
{ mylog
('warning', WARNCHAR
, @_); }
253 sub logerr
{ mylog
('error', ERRORCHAR
, @_); }
256 my ($text, @contents) = @_;
260 $last = $contents[-1];
265 mylog
('info', INCOMINGCHAR
, $text, $last);
271 # For an explanation of the %headers structure, see the
272 # get_headers function below.
273 logincoming
('From: ', @{$headers{'return-path'}});
274 logincoming
('To: ', @{$headers{'to'}});
275 logincoming
('Subject: ', @{$headers{'subject'}});
276 logincoming
('Message-Id: ', @{$headers{'message-id'}});
280 ###############################################################################
284 printf(STDERR
'%s:%s', __FILE__
, __LINE__
);
298 print(STDERR
": $!\n");
313 my $mhfolder = shift;
318 $folder = $mh . '/' . $mhfolder;
321 if (not -d
$folder) {
322 foreach $component (split('/', $folder)) {
323 if (defined($component) and length($component) >= 1) {
324 $target = $target . '/' . $component;
325 if (-d
$target or mkdir($target)) {
329 "Failed to create +$mhfolder ($component)");
340 if (not opendir(DIR
, $dir)) {
341 err
(&EX_OSERR
, "Failed opendir($dir)");
344 # Initialize $! to 0 (success) because stupid stupid Perl provides
345 # no way to distinguish an error from an empty directory; that
346 # means setting $! to 0 and testing it afterwards is the only way
347 # to detect an error. Real Programmers don't handle errors,
351 ($_ ne '.' and $_ ne '..')
352 and $_ = "$MAILDIR/new/$_"
356 err
(&EX_OSERR
, "Failed readdir($dir)");
364 sub get_highest_msgnum
{
365 my $mhfolder = shift;
371 $dir = "$mh/$mhfolder";
372 if (not opendir(DIR
, $dir)) {
373 err
(&EX_OSERR
, "Failed opendir($dir)");
377 @list = readdir(DIR
);
380 err
(&EX_OSERR
, "Failed readdir($dir)");
386 foreach $msgnum (@list) {
388 if ($msgnum =~ /^[0-9]+$/) {
389 if ($msgnum > $highest) {
398 # We want to print the name of each list that has new mail only once,
399 # so use this hash to store the lists that have already been printed.
400 # Start the list out with SPAM already in it, since we don't care when
401 # new messages are added to it.
402 my %FOLDERS = ('SPAM'=>1);
406 my $mhfolder = shift;
412 # We must do this even in -n mode because later steps fail without
413 # it. This should be harmless.
416 # This loop is a modified version of the maildir delivery algorithm.
417 $msgnum = get_highest_msgnum
($mhfolder);
418 for ($try = 0; ; $try++) {
420 $mhmsg = "$mh/$mhfolder/$msgnum";
422 if (not stat($mhmsg)) {
424 # Now we have a non-existent file, let's try to create
425 # it. We must create a zero-byte file first because a
426 # file my appear between our happy stat results and
427 # our later rename(2), which would clobber said file.
428 # So attempt to create a file with this name. If it
429 # succeeds, in just a bit here we'll knowingly clobber
430 # this file with the rename(2) call.
432 # Another way to do this is not to use rename(2), but
433 # use link(2) + unlink(2) instead. That's how the
434 # standard maildir algorithm does it. Each method has
435 # a disadvantage: the program may crash between the
436 # link(2) and unlink(2) calls. With the standard
437 # maildir algorithm, that means the message will end
438 # up duplicated. The advantage of creating an empty
439 # file followed by rename(2) is that an extra empty
440 # file is left behind as opposed to a duplicate
441 # message. This is more easiliy detected by the user.
443 if (sysopen(MSG
, "$mhmsg",
444 O_WRONLY
| O_EXCL
| O_CREAT
, 0600)) {
454 # This algorithm is different from the maildir one; let's make
455 # 10 tries instead of 3.
457 errx
(&EX_TEMPFAIL
, "Attempted filename $mhmsg exists.");
460 # This algorithm is different; i don't think we need to sleep.
464 logsave
("+$mhfolder");
467 if (not rename($msg, $mhmsg)) {
468 err
(&EX_OSERR
, "Failed rename($msg, $mhmsg)");
471 # Mark each message as soon as we store it and bomb if that
472 # fails. While it is slow, it is not safe to store multiple
473 # messages and then have a failure before marking some (or
475 if ($mhfolder ne 'SPAM') {
476 $status = system('mark', "+$mhfolder", "$msgnum", '-sequence',
478 if (not WIFEXITED
($status)) {
479 err
(&EX_OSERR
, "Failed to run mark");
480 } elsif (WEXITSTATUS
($status) != 0) {
481 errx
(&EX_SOFTWARE
, "Failed to mark message unseen.");
486 if (not $FOLDERS{$mhfolder}) {
487 print("+$mhfolder\n");
488 $FOLDERS{$mhfolder} = 1;
492 # Parse a message file into a structure describing the headers. The
493 # structure is a hash of arrays. The hash keys are the names of the
494 # headers found in the message, made all lower-case. Each item in
495 # the hash is an array of header text. The array has one element
496 # per occurrence of the header. Most headers will only have a single
497 # element because they occur only once. The Received header is the
498 # most common header with multiple texts.
502 my $current; # current header, used for unfolding lines
503 my $fieldname; # unmolested header name
504 my $contents; # contents of header
514 # folded header continuation
516 if (not defined($current)) {
517 print(STDERR
"Malformed message, cannot parse headers.\n");
521 @{$headers{$current}}[-1] .= $_;
523 ($fieldname) = split(/:/);
524 $current = lc($fieldname);
525 (undef, $contents) = split(/^\Q$fieldname\E:\s*/);
527 if (defined($headers{$current})) {
528 # multiple occurence of same header
529 push(@{$headers{$current}}, $contents);
531 # first occurence of a header
532 $headers{$current} = [$contents];
541 ###############################################################################
553 foreach $msg (@msglist) {
554 printf('%sChecking for spam... %6d/%d',
555 "\r", ++$i, scalar(@msglist));
558 store_message
($msg, 'SPAM');
563 print("\nDone: ", scalar(@result), " survivors\n");
569 ###############################################################################
584 %headers = get_headers
($msg);
589 log_headers
(%headers);
591 # Walk the list of filters. This structure is documented in
592 # pod at the end of the program.
593 foreach $filterref (@FILTERS) {
594 @filter = @$filterref;
595 $header = shift(@filter);
597 # Handle filters using the magic TO header specially.
598 if ($header eq $MAGIC_TO_TOKEN) {
599 foreach $header (keys(%headers)) {
600 if ($header =~ /$MAGIC_TO_REGEX/i) {
601 foreach $contents (@{$headers{lc($header)}}) {
602 foreach $pair (@filter) {
603 ($match, $expression) = @$pair;
604 if ($contents =~ /$match/) {
612 # Now that it's been processed specially, skip normal handling.
616 # Walk the list of message headers matching the filter's
618 foreach $contents (@{$headers{lc($header)}}) {
619 # Walk the filter's list of match/expression pairs.
620 foreach $pair (@filter) {
621 ($match, $expression) = @$pair;
622 if ($contents =~ /$match/) {
623 if (eval "\$result = \"$expression\"") {
639 foreach $msg (@msglist) {
640 $mhfolder = find_mh_folder
($msg);
641 store_message
($msg, $mhfolder);
650 $Data::Dumper
::Indent
= 1;
651 print(Dumper
(\
@FILTERS));
655 @msglist = getfiles
("$MAILDIR/new");
657 if (not $filteronly) {
658 @msglist = kill_spam
(@msglist);
662 filter_mail
(@msglist);
669 =head1 THE FILTERS STRUCTURE
671 The user's .mincfilter file must define the @FILTERS structure. This
672 structure is an array. Each element of @FILTERS is a filter. A
673 filter is itself an array. The first element of a filter is a string,
674 the name of the header this filter acts upon. The header name is not
675 case-sensitive. Each subsequent element of a filter is a pair (i.e. a
676 two-element array): first, a regular expression B<minc> uses to
677 determine whether this filter matches or not, and second, an
678 expression which B<minc> evaluates to get the folder name.
680 B<minc> decides where to store a message by iterating over the
681 @FILTERS array. It tests each regexp of each filter against all
682 headers matching that filter's specified header. As soon as a match
683 is found, B<minc> evaluates the second part of the pair. This part
684 may contain positional parameters from the matched regexp ($1, $2,
685 etc.). The result of this expression is used as the folder name.
687 Multiple occurrences of the same header are preserved and tested
688 individually against the filters. That means, for example, that
689 filters can search all the Received headers for a certain string.
691 It is important to note that all the arrays of the @FILTERS structure
692 are traversed I<in order>. This means the filters can be arranged so
693 that some have priority over others. XXX: get Doug to write an
694 example of having the same header matched more than once.
696 Lastly, B<minc> supports a magic ' TO' header. Filters using this
697 header are matched against a collection of headers related to (and
698 including) the To. Which headers to use is determined by a regular
699 expression borrowed from procmail.
706 ['<([[:graph:]]+)-([[:graph:]]+)\.lists\.sourceforge\.net>', 'l/$1/$2'],
707 ['<([[:graph:]]+?)(-list)?\.freedesktop\.org>', 'l/freedesktop/$1'],
708 ['<swig\.cs\.uchicago\.edu>', 'l/swig'],
712 ['<mailto:([[:graph:]]+)+@([[:graph:]]+)\.apache\.org>', 'l/apache/$2/$1'],
720 The first List-Id filter is a surprisingly effective one which creates
721 appropriate folder names (such as l/htdig/updates and l/rox/devel) for
722 all Sourceforge lists to which i subscribe. Certainly there are lists
723 hosted at Sourceforge which do not work well with this, and this is
724 where it is important to remember that the first match is the one
725 uses. Simply put the more specific rules before this one.
727 The next List-Id example is simple. The swig example demonstrates
728 that the folder name does not have to use a portion of the matched
729 header; you can just hardcode the folder name.
731 The List-Post example is a nice one. Most ASF projects have their own
732 apache.org subdomain, with mailing lists hosted there. So, given a
733 list such as dev@httpd.apache.org, this filter will create the folder
734 name l/apache/httpd/dev.
736 For an example is_spam function, see
737 L<http:E<047>E<047>pretzelnet.orgE<047>cvsE<047>dotfilesE<047>.mincspam>
741 Written by Eric Gillespie <epg@pretzelnet.org> with logging code
742 stolen from Adam Lazur <adam@lazur.org>.
744 Design by Eric Gillespie and Doug Porter <dsp@waterspout.com>.
749 # cperl-indent-level: 4
750 # perl-indent-level: 4
751 # indent-tabs-mode: nil
754 # vi: set tabstop=4 expandtab: