Initial thoughts.

Tue Feb 24 11:30:02 GMT 2004

On Tue, 2004-02-24 at 09:39 +0000, Brian Candler wrote:
> The problem that SPF tries to solve is actually smaller than that: it only
> verifies the envelope-sender, i.e. the return address for bounces. By the
> time the message is displayed to the user, this information has either been
> lost, or has been put in a Return-Path: header which only experts would know
> how to view and interpret. The MUA displays the From: header, and SPF makes
> no attempt to validate this.

This is true. It's a very limited problem.

> I believe there are two reasons why people want to validate the
> envelope-sender:
> 
> (1) To prevent bounces being sent to forged addresses, otherwise known as
> "joe jobs" (that is, if a spammer puts my address as the envelope sender on
> a spam, and it bounces, then I get the bounce).
> This is a worthy aim, but
> - SPF won't help until the whole Internet adopts SPF (if ever)
> - SPF won't help with spam sent directly to an end-user with a null
>   envelope sender
> - There are simpler and better solutions anyway

Like what I'm doing with SRS of my own address. I can now reject bounces
to dwmw2 at infradead.org; the problem is solved. I receive no more bounces
to joe-jobs. Yes, there are people out there who don't bother to do any
form of checking on their incoming mail. There always _will_ be. But
they don't affect _me_.
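
As a rough sketch of the idea (this is _not_ the real SRS address format, and the secret, the '+tag' layout and the function names are all made up for illustration): sign the outgoing envelope sender with a keyed hash, and reject any bounce whose recipient doesn't carry a valid tag.

```python
import hashlib
import hmac

SECRET = b"example-secret"  # hypothetical site-local key


def _tag(local: str, domain: str) -> str:
    # Short keyed hash of the plain address; truncated to keep addresses short.
    mac = hmac.new(SECRET, f"{local}@{domain}".encode("utf-8"), hashlib.sha1)
    return mac.hexdigest()[:8]


def signed_return_path(local: str, domain: str) -> str:
    """Envelope sender to use on outgoing mail, e.g. user+1a2b3c4d@example.org."""
    return f"{local}+{_tag(local, domain)}@{domain}"


def accept_bounce(rcpt: str) -> bool:
    """Accept a bounce (MAIL FROM:<>) only if the recipient carries a valid tag."""
    local_part, _, domain = rcpt.rpartition("@")
    local, sep, tag = local_part.rpartition("+")
    if not sep:
        return False  # untagged address: we never sent mail with this sender
    return hmac.compare_digest(tag, _tag(local, domain))
```

A real scheme would also put a timestamp inside the signed data so that old tags expire; that is left out here.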

> (2) To identify spam. This is IMO a pointless exercise, since all it will do
> is change the spammers' behaviour so that their mails are SPF-compliant (and
> note that it's still easy to forge mails under SPF using any domain which
> your ISP's IP addresses permit)

(3) To obsolete SPF, by proposing a _practicable_ solution which doesn't
throw the baby out with the bathwater by making changes which are 
fundamentally incompatible with the way the world actually works.

> > A practicable solution will work end-to-end, without the need for
> > participation by conservative and uninterested third parties.
> 
> Well, I still think the first thing which needs to be done is to agree on
> the problem, before we look at the solution. I put some thoughts up here:
> http://archives.listbox.com/spf-discuss@v2.listbox.com/200402/0603.html

Interesting reading; thanks for the pointer.

Your (1) I think my SRS solution already covers. This does of course
require that I'm willing to accept (1c). But as I already refuse to
accept mail from sites which don't like 'MAIL FROM:<>' and which don't
accept mail to postmaster, etc... that's a policy decision I'm happy
with. They're violating a MUST. You can't work round people like that.

(1a) is easily fixable since it's only locally-generated addresses. I
control all the inputs -- I can make it fit. (1b) I think isn't too
likely. Some lists _do_ filter so only subscribers can post, but I
believe they all do it on the From: header not the reverse-path.

Your (2) isn't solved by my SRS toy, but crypto-signing of mail _would_
solve it. You mandate that 'all mail From: dwmw2 at infradead.org _shall_
be signed with key XXXX'. That's basically what we're here for, I think.
It's a far more interesting problem, and not one which SPF addresses.

> Personally, I exclude "verifying the envelope sender on deliverable mails"
> from the problem set; if the message is deliverable, then by definition we
> don't use the envelope sender anyway, because there's no bounce to be sent.

You don't know that at RCPT time. You know that you don't want to
generate a bounce _now_, but you (or one of your other machines) might
need to do so _later_. Consider the case of a backup MX. You accept the
mail because the primary is down (and hence can't tell you it's to an
invalid address). Later it comes back and you need to bounce. Or if
you're running virtual domains, you can get similar situations.

I agree that if you _can_ reject at SMTP time rather than accepting and
later bouncing, you should. But there _are_ situations in which you'll
still need to generate a bounce.

And, FWIW, I disagree with the philosophy that "the spammers will catch
up so it's all pointless". The spammers will work hard to catch up with
the _majority_. They won't necessarily catch up with the clueful. 

But these are policy decisions to be made locally. I'm sure you agree
that it's worthwhile to be able to verify the envelope sender, and
precisely _when_ we do it is a site-local choice. There's no need for us
to digress.

> But I do include forged headers (at least the From: header) as a worthwhile
> aim.

Yes. 

> > Ideally we'd be able to introduce a signing scheme where we put a hash
> > into the headers, cryptographically verifying some part of the mail
> > headers and contents. The questions are as follows:
> > 
> >         1. What part of the mail and its contents need verification?
> >         2. How shall it be done?
> 
> 3. How is this done in a way which is at least partly useful in the interim
> period, with unmodified MUAs?

Oh and the important one I forgot to mention, because it's so obvious:
	4. How do we prevent replay attacks?

> > The problem we have is that forwarding MTAs may perform arbitrary  
> > mutilation of our mail, disrupting our signature.
> 
> Indeed, although in many cases only the headers are munged.

A problem which is solved by signing only those headers which must _not_
be munged. I think it's reasonable, for example, to assume that a remote
MTA will not munge the From: header. The _local_ one may do so before
sending, but if we implement signing at the MTA level then that's fine.

I think we _should_ implement signing at the MTA level, precisely
because so much of the munging that goes on is done by the sending MTA.

> I did wonder whether it was worthwhile adding a sort of cryptographic
> "postmark", like a signed version of the Received: header. This could make
> declarations such as:
> - this mail was handled by the mail relay at foo.com
> - this mail was received from IP address x.x.x.x
> - this mail was sent by user X who authenticated using SMTP AUTH

Those are mostly uninteresting to me, I think. Well, except for 'this
mail was sent by user X at domain.com', where that is a statement which
verifies against the public key for domain.com. I don't care _how_ they
authenticated themselves; only that someone with the private key for
domain.com actually believed them.

> Having this information in a proper structured format (unlike the ad-hoc
> format of Received: headers), *and* signed for non-repudiation, would lead
> to faster tools for reporting of spam and hence for dynamic black-lists.

I think this is a digression. It's a cute idea, but it still requires
implementation by everyone. Only those who care would implement it, and
those who care _already_ put useful Received: headers in. It's hardly
difficult to see where real Received: headers stop and the faked ones
start; we don't need to address this here.

> Unfortunately, anyone who picks up one of these messages with signed
> postmarks can then modify it and re-send it, unless the signature covers the
> entire message. And thus we are back to the problem that the whole message
> needs to be signed.

Yes. We have to protect against replay attacks.

> If we are building an infrastructure which converts plain mails into fully
> OpenGPG or S/MIME signed mails: that is cool, in some ways. But the
> automatic verification then has to be tied into whatever their existing key
> infrastructure is, which is generally not suitable for our needs - we want
> something DNS-based.

And I think that the fact that OpenPGP and S/MIME are _visible_ to
uninterested third parties would be a showstopper to widespread
adoption. We really need something that can be hidden in the headers
where it doesn't offend anyone.

> And if an end-user has signed a message using their own private key, we must
> be able to add our own ISP-generated signature as well as it passes through
> our mailserver.

Why would a message need to be signed with more than one key? I was
envisaging a scheme where the end-user identifies themselves as usual
(and with unmodified software) with SMTP AUTH and the _ISP_ signs with
the appropriate key.

If we _do_ have a per-user key scheme where the end-user has signed a
message using their own key, why would it need signing by the ISP too?

> Then you have to consider how an end-point will verify the signatures. If
> it's an MTA, then it can make a lookup in the DNS in real-time. However if
> it's an off-line MUA, which has downloaded the mail using POP3 say, then
> either it must trust the final MTA to stamp the message somehow, or it must
> be able to verify the signatures off-line (which means a chain of trust up
> to some root certificates)

This should be left as flexible as possible, to allow sites to implement
their own policy. It should be possible to do _either_. 

An MUA verifying the signatures can still do the DNS lookups to get
the pubkey, though, surely? Just like SpamAssassin can do RBL and SPF
lookups. You're normally online at the time you _fetch_ the mail, or you
wouldn't be able to talk to the mail server.

> > It's late -- I must be missing something here. What are the _valid_
> > instances of mails getting rewritten in transit, by intermediate
> > forwarding hosts between those which are doing the signing and those
> > border MX hosts on the reception side which are _verifying_ the
> > signatures?
> 
> - extra Received: headers are added (as required by RFC2821)
> - other extra headers may be added (Sender:, X-wotsit:)
> - some evil systems rewrite From: (From: foo at mail.mydomain.com to
>   From: foo.bar at mydomain.com; typically only at the sending end though)
> - often footers are added (adverts, company signature, pointless virus
>   sweeper signatures)

I think we should avoid including any header in the signature which we
don't _really_ need; especially those which might get munged. We really
need to avoid false negatives, and err on the side of permissiveness.

In fact, I suspect we should include only the Date:, From:, and
Message-Id: headers. We have a very reasonable expectation that those
will not be changed in transit, except perhaps by the local MTA, which
as discussed is fine.

And yes, we should also include some form of fuzzy hash of the mail. As
a primitive example, we could perhaps specify the number of lines which
were present when the mail was signed, and then the receiver knows how
much to trim from the bottom. 

We probably also need to canonicalise the character set and encoding;
transition between systems supporting 8BITMIME and not will cause
breakage otherwise. 
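
A minimal sketch of that canonicalisation, using Python's standard email package. The function name canonical_text is mine, and falling back to us-ascii when no charset is given is an assumption:

```python
from email import message_from_bytes


def canonical_text(part) -> bytes:
    """Return the text of a non-multipart part as UTF-8 bytes,
    independent of its transfer encoding and declared charset."""
    # decode=True undoes base64 / quoted-printable transfer encoding.
    raw = part.get_payload(decode=True)
    # Assumption: treat a missing charset as us-ascii.
    charset = part.get_content_charset() or "us-ascii"
    return raw.decode(charset, errors="replace").encode("utf-8")
```

The same text sent as quoted-printable ISO-8859-1 or as 8-bit UTF-8 then gives identical hash input.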

> Whether people will accept the extra signature noise, or the concept of
> mails being signed by third-parties in transit, is another matter...

Signature noise can be hidden in the headers so that people don't notice
it. The concept of mail being signed by third parties would be a
conceptual problem for some people if we used GPG or S/MIME, but if we
introduce a new signing scheme which is _specifically_ for automatic
use by the sending party and not their agent, and which nobody ever
wants to use as _legal_ proof of sender's identity, that should be fine.

==============================

The proposal as it stands; as a straw man to be argued against...

Publish records in DNS stating somehow 'all mail From: *@infradead.org
shall be signed with key XXX' 

On sending, when the sender is authenticated (via SMTP AUTH or other
methods like just being a valid local user):

Generate an SHA1 hash of the From: Date: and Message-ID: headers. Call
it $HDR_HASH.
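
Something like the following would do for $HDR_HASH. This is only a sketch: the function name is hypothetical, and the canonical form (collapse whitespace, join the three values with newlines in a fixed order) is my assumption, since nothing above specifies one.

```python
import hashlib


def header_hash(from_hdr: str, date_hdr: str, message_id: str) -> str:
    """SHA1 over the From:, Date: and Message-ID: values, in that order."""
    # Collapse runs of whitespace (including folded-header line breaks)
    # so that refolding by a relay does not change the hash input.
    canonical = "\n".join(
        " ".join(value.split()) for value in (from_hdr, date_hdr, message_id)
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```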

For each MIME part of the message (considering non-MIME messages to be a
single text/plain), generate a hash for _that_ part, including $HDR_HASH
in the hash input. Undo any encoding (i.e. use the _binary_ as hash
input, not base64 encoding of it). If a part is _text_ and a charset is
specified, also convert to UTF-8 for the purposes of generating the
hash, and count the number of lines in the part, calling it $NLINES. 

Generate an SHA1 hash of $HDR_HASH, $NLINES, and the actual contents,
together. Sign it with the private key and add this in a Content-Hash:
header in the MIME part. If the part was text, also add a
Content-Hash-Lines: header with the number $NLINES. Finally, add an
_insecure_ hash, a simple rolling checksum, of the same textual content.
Add this in a Content-Hash-Locator: header.

Obviously for plain non-MIME mails these headers go in the main headers
where Content-* headers normally go.
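
To make the sending side concrete, here is a sketch for a single text part. The signing step itself is left out because no key format has been chosen yet; the function names, the newline-joined hash input, and the use of a sum of per-line CRC32 values as the "simple checksum" are all assumptions of mine.

```python
import hashlib
import zlib


def line_value(line: bytes) -> int:
    # Cheap per-line value; any fast checksum would do here.
    return zlib.crc32(line)


def locator_checksum(lines) -> int:
    # Additive, so a line can later be added or removed in O(1).
    return sum(line_value(line) for line in lines) % 2**32


def text_part_fields(hdr_hash: str, text: str) -> dict:
    """Compute the three Content-Hash* values for one text part."""
    lines = text.encode("utf-8").splitlines()  # line endings are dropped
    nlines = len(lines)
    data = b"\n".join([hdr_hash.encode("ascii"), str(nlines).encode("ascii"), *lines])
    return {
        # In the real scheme this digest would be signed with the private key.
        "Content-Hash": hashlib.sha1(data).hexdigest(),
        "Content-Hash-Lines": nlines,
        "Content-Hash-Locator": locator_checksum(lines),
    }
```

Because the input is split into lines before hashing, a relay converting CRLF to LF does not disturb any of the three values.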

---

On receipt: 

Look up the TXT records (or whatever) for domain of each mailbox listed
in the From: header. If one has a record which specifies a signature is
required, verify it as follows:

For each binary part...
  Look for the Content-Hash* headers, do the obvious thing.

For each text part...
  Look at $NLINES (the Content-Hash-Lines: value) and see how many extra
lines have been added in transit. If nothing's been added, do the
obvious thing -- there's been no munging.

If lines have been added, we use the trivial 'locator' checksum to find
the correct lines within what we've received. (Local policy may dictate
that you want to reject the mail at this stage if too many lines have
been added in 'transit'.)

Use the rolling checksum to _find_ a suitable run of $NLINES lines inside
the text, matching the contents of the Content-Hash-Locator: header. The reason we
use a rolling checksum is so that it's trivial to add a line _and_
'remove' a line from an _existing_ checksum. If you want to look for a
match at ten different places in the text, you don't have to recalculate
an SHA1 hash from scratch each time. 

Once you find a match for the cheap checksum, verify the strong hash. If
it fails, continue trying to find a match for the cheap checksum.
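
The search could look roughly like this. It is a sketch under assumptions: the checksum is a sum of per-line CRC32 values, the strong hash input is newline-joined, 'expected' stands for the digest recovered by verifying the signature, and all the names are made up.

```python
import hashlib
import zlib


def line_value(line: bytes) -> int:
    return zlib.crc32(line)


def strong_hash(hdr_hash: str, lines) -> str:
    data = b"\n".join([hdr_hash.encode("ascii"), str(len(lines)).encode("ascii"), *lines])
    return hashlib.sha1(data).hexdigest()


def find_signed_lines(received, nlines, locator, hdr_hash, expected):
    """Return the index where the signed run of lines starts, or None."""
    if nlines == 0 or nlines > len(received):
        return None
    values = [line_value(line) for line in received]
    window = sum(values[:nlines]) % 2**32
    for start in range(len(received) - nlines + 1):
        if start > 0:
            # Roll the window: drop the line above, take in the line below.
            window = (window - values[start - 1] + values[start + nlines - 1]) % 2**32
        # Only pay for SHA1 when the cheap checksum already matches.
        if window == locator and strong_hash(hdr_hash, received[start:start + nlines]) == expected:
            return start
    return None
```

A return value of 1, say, means one line of crap was added at the top and everything after the signed run was added at the bottom.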

----

I think it's OK to assume that the messages won't be munged in the
_middle_. We accept messages with crap added at the start, and at the
end, and we should even handle entire MIME parts being dropped by list
software. 

We cryptographically verify the sender, date and contents of the
message, without much chance of false negatives. Seem reasonable at
first glance?

-- 
dwmw2