4 * Comments on these tokens:
6 * ATEXT is defined in RFC 5322 as:
8 * '!' / '#' / '$' / '%' / '&' / ''' / '*' / '+' / '-' / '/' /
9 * '=' / '?' / '^' / '_' / '`' / '{' / '|' / '}' / '~'
11 * All printable ASCII characters except for spaces and specials
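The ATEXT character class described above could be tested in a lexer roughly as follows. This is a minimal sketch, not the lexer that accompanies this grammar; the names `is_atext` and `is_atom` are hypothetical, and the check assumes plain 7-bit ASCII input.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: nonzero if c is an RFC 5322 atext character. */
static int is_atext(int c)
{
    /* Reject NUL and anything outside printable 7-bit ASCII first, */
    /* since isalnum() is locale-dependent for larger values. */
    if (c <= 0 || c > 126)
        return 0;
    if (isalnum(c))
        return 1;
    return strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL;
}

/* Hypothetical helper: nonzero if s is a non-empty run of atext. */
static int is_atom(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!is_atext((unsigned char) *s))
            return 0;
    }
    return 1;
}
```

With this sketch, `is_atom("user+tag")` is true, while `is_atom("user@host")` is false because '@' is a special.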
13 * QSTRING is a quoted string, which is printable ASCII characters except
14 * for \ or the quote character surrounded by quotes. Use \ for quoting
15 * \ and the quote character.
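The quoting rule for QSTRING can be illustrated with a small unquoting routine. This is a sketch under stated assumptions: `unquote` is a hypothetical name, the input starts at the opening quote, and the routine does not check that each character is printable ASCII.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy the contents of the quoted string at src
 * into dst, dropping the surrounding quotes and the backslash of each
 * escape. Returns the number of source characters consumed (including
 * both quotes), or 0 if the string is malformed or dst is too small.
 */
static size_t unquote(const char *src, char *dst, size_t dstlen)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);
    if (src[i++] != '"')
        return 0;                   /* must start with a quote */
    while (src[i] != '\0' && src[i] != '"') {
        if (src[i] == '\\') {
            i++;                    /* skip the backslash itself */
            if (src[i] == '\0')
                return 0;           /* dangling backslash */
        }
        if (o + 1 >= dstlen)
            return 0;               /* no room for char plus NUL */
        dst[o++] = src[i++];
    }
    if (src[i] != '"')
        return 0;                   /* unterminated string */
    dst[o] = '\0';
    return i + 1;
}
```

The return value lets a caller resume scanning right after the closing quote.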
17 * FWS is folding white space, which is defined as SP (\040), HTAB (\011),
18 * and NL (\012). Technically CR (\015) is part of that, but traditionally
19 * Unix format files don't have that character.
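The FWS definition above (SP, HTAB, and NL, with CR deliberately left out) might be expressed like this. The names `is_fws` and `fws_span` are hypothetical and serve only to illustrate the character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero for SP (\040), HTAB (\011), or NL (\012). */
/* CR (\015) is intentionally not treated as folding white space here. */
static int is_fws(int c)
{
    return c == '\040' || c == '\011' || c == '\012';
}

/* Hypothetical helper: length of the leading run of FWS in s. */
static size_t fws_span(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (is_fws((unsigned char) s[n]))
        n++;
    return n;
}
```

A lexer could use a span function like this to collapse a whole run of white space into a single FWS token.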
21 * COMMENT is a comment string, which is printable ASCII characters except
22 * for '(', ')', and '\'. Uses same quoting rules as QSTRING. To make
23 * the grammar slightly less conflict-happy, COMMENT must include any FWS
24 * before or after it (the lexer simply eats that FWS).
26 * Everything else is a SPECIAL, which is returned directly. These are
27 * defined in RFC 5322 as:
29 * '(' / ')' / '<' / '>' / '[' / ']' / ':' / ';' / '@' / '\' / ',' / '.' / '"'
32 * Technically we don't return all of these; we handle () in comments, " in
33 * quoted string handling, and \ in those handlers.
36 %token ATEXT QSTRING FWS COMMENT
41 * A list of addresses; the main entry point to the parser
43 address_list: /* nothing */
44 | address_list ',' address
48 * A single address; can be a single mailbox, or a group address
57 * A traditional single mailbox. Either in the form Name <user@name> or just a bare
58 * email address with no angle brackets.
67 * An email address, with the angle brackets. Optionally contains a display
68 * name in the front. The RFC says "display-name", but display-name is
69 * defined as a phrase, so we just use that.
78 cfws '<' addr_spec '>' cfws
79 | cfws '<' addr_spec '>'
80 | '<' addr_spec '>' cfws
85 * The group list syntax. The group list is allowed to be empty or be
86 * spaces, so we define group_list as either being a mailbox list or
87 * just being CFWS. mailbox_list can be empty, so that can handle the
88 * case of nothing being between the ':' and the ';'
91 phrase ':' group_list ';' cfws
92 | phrase ':' group_list ';'
101 mailbox_list: /* nothing */
102 | mailbox_list ',' mailbox
106 local_part '@' domain
120 cfws '[' dtext_fws ']' cfws
121 | cfws '[' dtext_fws ']'
122 | '[' dtext_fws ']' cfws
127 * It was hard to make a definition of dtext and domain-literal that
128 * exactly matched the RFC. This was the best I could come up with.
131 dtext_fws: /* nothing */
135 | dtext_fws FWS ATEXT FWS
136 | dtext_fws FWS ATEXT
137 | dtext_fws ATEXT FWS
148 * obs-phrase is basically the same as "phrase", but after the first word
149 * you're allowed to have a '.'. I believe this is correct.
159 | obs_phrase_list word
168 * This makes sure any comments and white space before/after the quoted string
186 * Making dot-atom work was a little confusing; I finally handled it by
187 * defining "dot_atom_text" as having two or more ATEXTs separated by
188 * '.', and defining dot_atom as allowing a single atom.
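The split between dot_atom_text (two or more atoms joined by '.') and a single atom can be checked outside the grammar with a small counting routine. This is only an illustrative sketch operating on a raw string rather than on lexer tokens; `is_atext` and `count_atoms` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if c is an RFC 5322 atext character. */
static int is_atext(int c)
{
    if (c <= 0 || c > 126)
        return 0;
    if (isalnum(c))
        return 1;
    return strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL;
}

/*
 * Hypothetical helper: count the atoms in s, where atoms are non-empty
 * runs of atext separated by single '.' characters. Returns 0 if s is
 * malformed (empty, leading/trailing/doubled '.', or other characters).
 * A result of 1 corresponds to a single atom; 2 or more corresponds to
 * what the grammar calls dot_atom_text.
 */
static int count_atoms(const char *s)
{
    int atoms = 0;

    assert(s != NULL);
    for (;;) {
        size_t n = 0;

        while (is_atext((unsigned char) s[n]))
            n++;
        if (n == 0)
            return 0;       /* empty atom */
        atoms++;
        s += n;
        if (*s == '\0')
            return atoms;   /* clean end of input */
        if (*s != '.')
            return 0;       /* not atext and not a separator */
        s++;                /* step over the '.' */
    }
}
```

Under these assumptions, `count_atoms("example")` yields 1 and `count_atoms("mail.example.org")` yields 3, while `count_atoms("a..b")` yields 0.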
192 | cfws dot_atom_text cfws
200 | dot_atom_text '.' ATEXT
204 * As mentioned above, the CFWS definition in the RFC technically allows
205 * FWS before and after the comment. The lexer is responsible for eating
206 * the FWS before/after comments.