Initial thoughts.

Tue Feb 24 13:30:52 GMT 2004

On Tue, Feb 24, 2004 at 11:30:02AM +0000, David Woodhouse wrote:
> Your (2) isn't solved by my SRS toy, but crypto-signing of mail _would_
> solve it. You mandate that 'all mail From: dwmw2 at infradead.org _shall_
> be signed with key XXXX'. That's basically what we're here for, I think.
> It's a far more interesting problem, and not one which SPF addresses.

Agreed.

> > Personally, I exclude "verifying the envelope sender on deliverable mails"
> > from the problem set; if the message is deliverable, then by definition we
> > don't use the envelope sender anyway, because there's no bounce to be sent.
> 
> You don't know that at RCPT time. You know that you don't want to
> generate a bounce _now_, but you (or one of your other machines) might
> need to do so _later_.

Absolutely true. But suppose you just accept the envelope-sender as given.
Then either:

1. it's correct, in which case the bounce goes to the right place; or

2. it's maliciously incorrect (a joe job pointing to someone else), in which
case it falls under the different problem case (1), i.e. I don't want to
accept bounces to mail I didn't send; or

3. it's an accidental misconfiguration by the end-user. Unfortunately, most
mail has already been through one relay (the sender's ISP's smarthost), so
detecting an invalid return address after this stage is futile.

It would be nice for the ISP's smarthost to validate the MAIL FROM address,
so it could reject mails (550) before accepting the content. This is
something which I have configured - at least, on smarthosts, I do a simple
DNS check that the MAIL FROM domain is OK.
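
As a rough sketch of that kind of check (Python; the function names and the
injected `lookup` callable are my own illustration, not any particular MTA's
API), the smarthost extracts the domain from MAIL FROM and rejects with 550
if it has neither MX nor A records. The null sender <> is let through because
bounces use it.

```python
from typing import Callable, List, Tuple

# Hypothetical resolver signature: lookup(domain, rrtype) -> list of records.
Lookup = Callable[[str, str], List[str]]


def mail_from_domain(mail_from: str) -> str:
    """Return the lower-cased domain of an envelope sender, or "" for <>."""
    addr = mail_from.strip().lstrip("<").rstrip(">")
    if not addr:
        return ""  # null sender, used for bounces
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("malformed MAIL FROM address")
    return domain.lower().rstrip(".")


def check_mail_from(mail_from: str, lookup: Lookup) -> Tuple[int, str]:
    """Return an SMTP-style (code, text) reply for the MAIL FROM command."""
    try:
        domain = mail_from_domain(mail_from)
    except ValueError:
        return 550, "malformed sender address"
    if domain == "":
        return 250, "OK"
    # Accept if the domain has an MX record, or failing that an A record.
    if lookup(domain, "MX") or lookup(domain, "A"):
        return 250, "OK"
    return 550, "sender domain does not resolve"
```

In a real deployment `lookup` would wrap the MTA's resolver; here it is
passed in so the policy can be exercised without touching the network.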

To do a stronger check at this point would involve requiring all users to do
SMTP AUTH and binding their AUTH username to the MAIL FROM name(s) they are
permitted to use.
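
A minimal sketch of such a binding, assuming the smarthost keeps a table
mapping each AUTH username to the envelope senders it may use (the table
layout and the names below are hypothetical):

```python
from typing import Dict, Set


def sender_permitted(auth_user: str, mail_from: str,
                     bindings: Dict[str, Set[str]]) -> bool:
    """True if the authenticated user may use this MAIL FROM address."""
    allowed = {addr.lower() for addr in bindings.get(auth_user, set())}
    return mail_from.lower() in allowed


# Illustrative data only: one user with two permitted sender addresses.
BINDINGS = {
    "brian": {"brian@myisp.net", "b.candler@pobox.com"},
}
```

Anything not listed for the authenticated user would be rejected at MAIL FROM
time, before the message content is accepted.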

> And, FWIW, I disagree with the philosophy that "the spammers will catch
> up so it's all pointless". The spammers will work hard to catch up with
> the _majority_. They won't necessarily catch up with the clueful. 

I think the SPF people are counting on some sort of equilibrium: say only 5%
of the Internet adopts SPF, so little that spammers won't care. The adopters
may still get some benefit if, say, spammers send mail from foo at aol.com
but aol.com is declaring information in SPF.

The adopters will also take the pain of SPF breakage, but they may gain some
net benefit.

But this is an unstable equilibrium. If SPF does actually deliver some
benefit, then everyone will want it, at which point the spammers will bypass
it. Personally I think the SRS-signing of outgoing mail as you have
implemented gives much greater benefit to the implementor, immediately, at
very little cost.

> But these are policy decisions to be made locally. I'm sure you agree
> that it's worthwhile to be able to verify the envelope sender

Actually, no :-) See above.

> > > The problem we have is that forwarding MTAs may perform arbitrary  
> > > mutilation of our mail, disrupting our signature.
> > 
> > Indeed, although in many cases only the headers are munged.
> 
> A problem which is solved by signing only those headers which must _not_
> be munged. I think it's reasonable, for example, to assume that a remote
> MTA will not munge the From: header. The _local_ one may do so before
> sending, but if we implement signing at the MTA level then that's fine.

Or else you put in a brand new header, say "X-Signed-From:", which includes
the crypto signature.

Any MUA which understands it can use it. Any final end-point MTA can compare
the signed address with the From: address, and modify the From: address
appropriately:

    From: "Dubya Bush" <president at whitehouse.gov>
becomes:
    From: "Dubya Bush [PROBABLY FORGED, claimed president at whitehouse.gov, real sender spam123 at isp.net]" <spam123 at isp.net>

The sending MTA can add a header saying "real sender was spam123 at isp.net"
without touching the original header.
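
The end-point rewrite could look roughly like this in Python, using the
standard email.utils helpers. It assumes the signature on the hypothetical
X-Signed-From: header has already been verified, and that `signed_addr` is
the address taken from it; addresses are written with a real "@" here.

```python
from email.utils import formataddr, parseaddr


def annotate_from(from_header: str, signed_addr: str) -> str:
    """Return the From: value, flagged if it disagrees with the signed address."""
    name, claimed = parseaddr(from_header)
    if claimed.lower() == signed_addr.lower():
        return from_header  # signed address matches, leave it alone
    label = "%s [PROBABLY FORGED, claimed %s, real sender %s]" % (
        name, claimed, signed_addr)
    # formataddr quotes the display name when it contains special characters.
    return formataddr((label.strip(), signed_addr))


rewritten = annotate_from('"Dubya Bush" <president@whitehouse.gov>',
                          "spam123@isp.net")
```

Putting the verified address in the angle brackets means a reply goes to the
real sender, while the display name still shows what was claimed.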

But as you say, you have to prevent replay (i.e. cut-and-paste of the signed
header), which means the header has to include, say, a hash of the entire
message body, and the message must be in a form which is not likely to be
corrupted in transit otherwise the hash is invalidated.
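
To illustrate binding the signed address to the body, here is a sketch in
Python. Big caveat: it uses an HMAC with a shared key purely as a stand-in
for a real public-key signature (the standard library has none), and the
header layout and canonicalisation rules are invented for the example.

```python
import hashlib
import hmac


def canonical_body(body: str) -> bytes:
    """Normalise line endings, trailing whitespace and trailing blank lines."""
    lines = [ln.rstrip() for ln in body.replace("\r\n", "\n").split("\n")]
    while lines and lines[-1] == "":
        lines.pop()
    return "\r\n".join(lines).encode("utf-8") + b"\r\n"


def sign_from(key: bytes, sender: str, body: str) -> str:
    """Build a header value covering both the sender and a hash of the body."""
    body_hash = hashlib.sha256(canonical_body(body)).hexdigest()
    signed = ("%s|%s" % (sender.lower(), body_hash)).encode("utf-8")
    sig = hmac.new(key, signed, hashlib.sha256).hexdigest()  # stand-in signature
    return "%s; bh=%s; sig=%s" % (sender, body_hash, sig)


def verify_from(key: bytes, header: str, body: str) -> bool:
    """Recompute the header for this body and compare in constant time."""
    sender = header.split(";", 1)[0].strip()
    expected = sign_from(key, sender, body)
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

Because the body hash is inside the signed data, pasting the header onto a
different message fails verification, while harmless changes such as CRLF
conversion or trailing whitespace survive the canonicalisation.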

A review of existing work in OpenPGP, S/MIME etc would be useful here I
think, because they must have exactly the same problem.

> > I did wonder whether it was worthwhile adding a sort of cryptographic
> > "postmark", like a signed version of the Received: header. This could make
> > declarations such as:
> > - this mail was handled by the mail relay at foo.com
> > - this mail was received from IP address x.x.x.x
> > - this mail was sent by user X who authenticated using SMTP AUTH
> 
> Those are mostly uninteresting to me, I think. Well, except for 'this
> mail was sent by user X at domain.com', where that is a statement signed
> with the public key for domain.com. I don't care _how_ they
> authenticated themselves; only that someone with the private key for
> domain.com actually believed them.

There are different levels of trust which could be indicated:

- a mail from "b.candler at pobox.com", sent via myisp.net, could be signed via
myisp.net saying "this mail came through me, but I cannot vouch for the
sender; the real sender could be anyone at myisp.net"

- a mail from "random at myisp.net", sent via myisp.net, could be signed as
"this mail came through me, and it's one of my domains, the RHS may be right
but I can't vouch for the LHS of the address" (this is roughly what SPF
attempts)

- a mail from "random at myisp.net", where the user authenticated with SMTP
AUTH and with a username which matched that address, could be signed "this
mail came through me and I vouch for the authenticity of the sender"

- a mail from "b.candler at pobox.com", where I authenticated with SMTP AUTH as
"brian at myisp.net", could be signed "this mail came through me from the user
brian at myisp.net, but I cannot vouch for the sender address given"

Such policies could allow existing practices to continue working, and for an
address which cannot be fully vouched for (which would be the vast majority
in the early days), you at least get _some_ information about the sender or
the domain.
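
The four cases above could be decided along these lines. This is only a
sketch in Python: the enum names are mine, and it assumes the AUTH username
is itself an address, as in the brian at myisp.net example.

```python
from enum import Enum
from typing import Optional, Set


class Vouch(Enum):
    RELAYED_ONLY = "came through me, cannot vouch for the sender"
    DOMAIN_ONLY = "one of my domains, cannot vouch for the LHS"
    FULL = "came through me, I vouch for the sender"
    AUTH_USER_ONLY = "authenticated user known, sender address not vouched for"


def classify(from_addr: str, auth_user: Optional[str],
             local_domains: Set[str]) -> Vouch:
    """Pick the strongest statement the ISP can honestly sign."""
    if auth_user is not None:
        if auth_user.lower() == from_addr.lower():
            return Vouch.FULL
        return Vouch.AUTH_USER_ONLY
    domain = from_addr.rpartition("@")[2].lower()
    if domain in local_domains:
        return Vouch.DOMAIN_ONLY
    return Vouch.RELAYED_ONLY
```

The chosen level, plus the authenticated username where there is one, would
then go into whatever gets signed.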

Then it's a logical extension to include the IP address. This is the
information which the ISP needs to trace the original user if they did not
authenticate with SMTP AUTH. A spam complaint containing such information
could be processed in a highly automated fashion, AND provide strong
evidence (e.g. if needed in a court of law) for kicking off the spammer.

Hmm. Let me put it another way. Suppose in our smarthost MTA we see a mail
which says
   From: president at whitehouse.gov

but whitehouse.gov has not published any key information, so we know the
recipient won't be able to verify it one way or the other.

I think it would be better for us to sign it saying "This mail probably came
from *@myisp.net", than not sign it at all. The receiver then has something
to verify (using the myisp.net key in the DNS) and show to the end-user.

Since the certification disagrees with the originally-provided From: header,
the end-user has a better indication of a forgery than if no signature were
applied, even though whitehouse.gov has not published anything.

> > Having this information in a proper structured format (unlike the ad-hoc
> > format of Received: headers), *and* signed for non-repudiation, would lead
> > to faster tools for reporting of spam and hence for dynamic black-lists.
> 
> I think this is a digression. It's a cute idea, but it still requires
> implementation by everyone. Only those who care would implement it, and
> those who care _already_ put useful Received: headers in.

If we digress, I don't think it's far, if we are talking initially about
crypto-signing with an ultimate view towards reduction of spam, not
crypto-signing for E-commerce applications etc.

Of course, crypto-signing will never be able to identify a piece of spam,
and it will never be able to identify a spammer (because spammers can create
new electronic identities for themselves at will). However, if it can help
reliably trace the *source* of a spam, then pressure can be put on that
source. And with dynamic IPs, having a fast response to spam is important
(spamcop etc).

> It's hardly
> difficult to see where real Received: headers stop and the faked ones
> start; we don't need to address this here.

Not for experts like you or me. But there are plenty of people with enough
knowledge to find full headers, but not enough to understand them, who send
spam complaints to the wrong places (based on forged HELO names, forged
Received: headers, and forged envelope senders). At least one person on
spf-discuss was arguing that as a case for validating HELO names.

> > And if an end-user has signed a message using their own private key, we must
> > be able to add our own ISP-generated signature as well as it passes through
> > our mailserver.
> 
> Why would a message need to be signed with more than one key? I was
> envisaging a scheme where the end-user identifies themselves as usual
> (and with unmodified software) with SMTP AUTH and the _ISP_ signs with
> the appropriate key.

Sure. I meant if we used an existing mechanism like OpenPGP. What if we come
across a mail which is already in OpenPGP format?

> If we _do_ have a per-user key scheme where the end-user has signed a
> message using their own key, why would it need signing by the ISP too?

We need to add some level of the ISP 'vouching for' the user - which, unless
the end-user's key has been signed by the ISP, means signing the message
somehow.

Otherwise, we are unable to add any trust for an existing OpenPGP message.
Spammers can always start sending OpenPGP-signed mails using throwaway keys.

If we treat OpenPGP as just a lump of MIME, and add our own signature in the
top headers, this should be fine.

> > Then you have to consider how an end-point will verify the signatures. If
> > it's an MTA, then it can make a lookup in the DNS in real-time. However if
> > it's an off-line MUA, which has downloaded the mail using POP3 say, then
> > either it must trust the final MTA to stamp the message somehow, or it must
> > be able to verify the signatures off-line (which means a chain of trust up
> > to some root certificates)
> 
> This should be left as flexible as possible, to allow sites to implement
> their own policy. It should be possible to do _either_. 

A good goal.

> Verifying the signatures in the MUA can still do the DNS lookups to get
> the pubkey, though, surely? Just like SpamAssassin can do RBL and SPF
> lookups. You're normally online at the time you _fetch_ the mail, or you
> wouldn't be able to talk to the mail server.

Not if you're offline (i.e. disconnected from the Internet)

> In fact, I suspect we should include only the Date:, From:, and
> Message-Id: headers. We have a very reasonable expectation that those
> will not be changed in transit, except perhaps by the local MTA which as
> discussed is fine.

If done properly then probably the 'right' thing to do would be to
encapsulate the whole message, with authenticated headers, inside a
message/rfc822 container, and sign that. But that would be too intrusive for
day one.
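
For what the encapsulation would look like, a small sketch using Python's
standard email package. The signing step itself is left out, and the choice
of which headers to copy to the outer message is my own assumption.

```python
from email import message_from_string
from email.mime.message import MIMEMessage


def encapsulate(raw_message: str, authenticated_from: str) -> MIMEMessage:
    """Wrap the original message, untouched, in a message/rfc822 container."""
    inner = message_from_string(raw_message)
    outer = MIMEMessage(inner)  # Content-Type: message/rfc822
    outer["From"] = authenticated_from  # the address the ISP can vouch for
    outer["Subject"] = inner.get("Subject", "")
    return outer


wrapped = encapsulate(
    "From: president@whitehouse.gov\nSubject: hi\n\nbody text\n",
    "spam123@isp.net",
)
```

The original headers survive intact inside the container, which is what
makes it safe to sign, and also what makes it intrusive for existing MUAs.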

> And yes, we should also include some form of fuzzy hash of the mail. As
> a primitive example, we could perhaps specify the number of lines which
> were present when the mail was signed, and then the receiver knows how
> much to trim from the bottom. 

A bit too easy to replay-attack. Just take the first N lines of a message to
a mailing list, say, and paste your spam at the bottom.

I actually find it hard to conceive of a value of N which would be useful.
Many messages are only 2 lines long; after that you get the extra footers
(virus-sweeper, mailing list info, etc)

The value of N could be included in the signature itself: i.e. "this message
was originally N lines long, and the hash covers those N lines; anything
after that is not signed". But the replay-and-add-spam attack is too easy.
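
The weakness is easy to demonstrate. In this Python sketch (function names
hypothetical, and a bare hash standing in for a signature over N and the
hash), a legitimate mailing-list footer and a pasted spam payload are
indistinguishable to the verifier:

```python
import hashlib
from typing import Tuple


def sign_first_n(body: str) -> Tuple[int, str]:
    """Record the line count N and a hash over exactly those N lines."""
    lines = body.splitlines()
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return len(lines), digest


def verify_first_n(body: str, n: int, digest: str) -> bool:
    """Check only the first N lines; anything after them is ignored."""
    lines = body.splitlines()
    if len(lines) < n:
        return False
    covered = "\n".join(lines[:n]).encode("utf-8")
    return hashlib.sha256(covered).hexdigest() == digest


original = "Line one\nLine two\n"
n, digest = sign_first_n(original)
with_footer = original + "-- \nlist footer\n"
with_spam = original + "BUY CHEAP PILLS\n"
```

Both `with_footer` and `with_spam` pass `verify_first_n`, so the scheme
tolerates footers only by also tolerating the attack.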

> We probably also need to canonicalise the character set and encoding;
> transition between systems supporting 8BITMIME and not will cause
> breakage otherwise. 

I guess that is already addressed by OpenPGP etc. Or maybe they just base64
encode the lot, which is not a friendly thing to do for us.

Regards,

Brian.