Initial thoughts.

Tue Feb 24 14:31:00 GMT 2004

On Tue, 2004-02-24 at 13:30 +0000, Brian Candler wrote:
> 2. it's maliciously incorrect (a joe job pointing to someone else), in which
> case it falls under the different problem case (1), i.e. I don't want to
> accept bounces to mail I didn't send; or

Indeed you don't. That's nice for you. Now _I_ am stuck with an
undeliverable bounce because you quite rightly don't want to accept it.

> I think the SPF people are counting on some sort of equilibrium: say only 5%
> of the Internet adopts SPF, so little that spammers won't care. But they may
> get some benefit if, say, spammers send mail from foo at aol.com but aol.com is
> declaring information in SPF.

They're on crack if they think this, since SPF can only be workable _if_
the whole of the Internet works around its brokenness by implementing
SRS.

In the case of SPF, I agree that it's pointless, since by definition if
it ever becomes workable, it becomes pointless. Other schemes are
different, in that they don't _need_ critical mass in order to help, and
hence aren't obsolete by the time they're useful.

> But this is an unstable equilibrium. If SPF does actually deliver some
> benefit, then everyone will want it, at which point the spammers will bypass
> it. Personally I think the SRS-signing of outgoing mail as you have
> implemented gives much greater benefit to the implementor, immediately, at
> very little cost.

To the implementor yes, but not to the poor sods out there who don't
bother to _check_ with me whether that's a valid return-path. 

That's their choice -- and it's yours. I like to check, though.

> > But these are policy decisions to be made locally. I'm sure you agree
> > that it's worthwhile to be able to verify the envelope sender
> 
> Actually, no :-) See above.

Fine. But we are still digressing from the point we _do_ agree on...

> Or else you put in a brand new header, say "X-Signed-From:", which includes
> the crypto signature.

That's basically what I was suggesting, surely?

> Any MUA which understands it can use it. Any final end-point MTA can compare
> the signed address with the From: address, and modify the From: address
> appropriately:
> 
>     From: "Dubya Bush" <president at whitehouse.gov>
> becomes:
>     From:  "Dubya Bush [PROBABLY FORGED, claimed president at whitehouse.gov, real sender spam123 at isp.net]" <spam123 at isp.net>
> 
> The sending MTA can add a header saying "real sender was spam123 at isp.net"
> without touching the original header.

I wouldn't recommend editing the From: address. I think RFC2821 says you
MUST NOT. It will suffice to add a warning header that a MUA can look
for if it wants, or which procmail can use to dump the message into a
spam box.
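
For illustration, something along these lines in the final MTA or a
delivery filter would do. It's only a sketch: the header name
X-Sender-Verify-Warning and the function name are invented here, not
part of any proposal, and it assumes the verified sender address has
already been obtained from whatever signature check is in use.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical header name; nothing in this thread fixes one.
WARNING_HEADER = "X-Sender-Verify-Warning"


def flag_unverified(raw_message: str, verified_sender: str) -> str:
    """Add a warning header when From: disagrees with the verified sender.

    The From: header itself is never modified.
    """
    msg = message_from_string(raw_message)
    # parseaddr() returns (display name, address); only the address matters.
    claimed = parseaddr(msg.get("From", ""))[1].lower()
    if claimed != verified_sender.lower():
        msg[WARNING_HEADER] = f"claimed={claimed}; verified={verified_sender}"
    return msg.as_string()
```

A procmail recipe or MUA filter then only needs to match on that one
header.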

> But as you say, you have to prevent replay (i.e. cut-and-paste of the signed
> header), which means the header has to include, say, a hash of the entire
> message body, and the message must be in a form which is not likely to be
> corrupted in transit otherwise the hash is invalidated.

Yes.
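
As a rough sketch of what that binding could look like: the signed
value covers the From: address, the date and a hash of the body, so a
header pasted onto a different body no longer verifies. HMAC with a
shared key stands in here for the real public-key signature, purely
because Python's standard library has no asymmetric signing; the
function names and the "|" separator are made up for illustration.

```python
import hashlib
import hmac


def sign_header(key: bytes, from_addr: str, date: str, body: str) -> str:
    """Return a signature binding the sender and date to this exact body."""
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    payload = f"{from_addr}|{date}|{body_hash}".encode("utf-8")
    # Stand-in for a private-key signature over the same payload.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_header(key: bytes, from_addr: str, date: str, body: str,
                  signature: str) -> bool:
    """Recompute the signature over the received body and compare."""
    expected = sign_header(key, from_addr, date, body)
    return hmac.compare_digest(expected, signature)
```

Note that this hashes the body byte-for-byte, so any munging in transit
breaks it; that is exactly the fragility mentioned above.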

> A review of existing work in OpenPGP, S/MIME etc would be useful here I
> think, because they must have exactly the same problem.

They make requirements about the MIME parts (like restricting to 7 bits)
which I don't think we can make in the general case. They don't handle
munging of text parts either.

You're right that we must avoid reinventing the wheel. As it is, I don't
think either of those wheels will fit. I'm not completely familiar with
S/MIME though -- I only glanced at the RFCs this morning for inspiration
and to assess reusability.

> There are different levels of trust which could be indicated:
> 
> - a mail from "b.candler at pobox.com", sent via myisp.net, could be signed via
> myisp.net saying "this mail came through me, but I cannot vouch for the
> sender; the real sender could be anyone at isp.net"

I don't see what this achieves. If the host in question is clueful, its
Received: headers are already enough, surely?

> - a mail from "random at myisp.net", sent via myisp.net, could be signed as
> "this mail came through me, and it's one of my domains, the RHS may be right
> but I can't vouch for the LHS of the address" (this is roughly what SPF
> attempts)
> 
> - a mail from "random at myisp.net", where the user authenticated with SMTP
> AUTH and with a username which matched that address, could be signed "this
> mail came through me and I vouch for the authenticity of the sender"

I _can_ see the point in making a distinction between the above two
cases, but I wonder if it's really _necessary_. Any proposal has to be
lightweight and we must avoid featuritis.

> - a mail from "b.candler at pobox.com", where I authenticated with SMTP AUTH as
> "brian at myisp.net" could be signed "this mail came through me from the user
> brian at myisp.net, but I cannot vouch for the sender address given"

According to RFC2822, in this case you should have brian at myisp.net in
the Sender: header, and b.candler at pobox.com in the From: header. 

I wonder if it's worth extending the scheme to allow verification of
Sender: addresses as well as From: addresses; I suspect that's another
example of featuritis though. 

> Such policies could allow existing practices to continue working, and for an
> address which cannot be fully vouched for (which would be the vast majority
> in the early days), you at least get _some_ information about the sender or
> the domain.

I think we could address this differently. The pubkey-in-DNS scheme
should allow you to specify different pubkeys on a per-user basis if you
want to. That way, the user himself can add the signatures using his
_own_ private key, and the ISP-du-jour doesn't have to be involved.

That'll be by _far_ the most common case, surely?
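
To make the lookup order concrete, here is a sketch of how a verifier
might derive the DNS names to query: the per-user key first, then a
domain-wide key as fallback. The "_sigkey" label and the function name
are purely hypothetical; no naming scheme has been agreed, and local
parts that aren't valid DNS labels would need some encoding that this
ignores.

```python
def key_lookup_names(address: str) -> list[str]:
    """Return candidate DNS names for the signing key, most specific first."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not a usable address: {address!r}")
    domain = domain.lower()
    return [
        # Hypothetical per-user record, e.g. brian._sigkey.myisp.net
        f"{local.lower()}._sigkey.{domain}",
        # Hypothetical domain-wide fallback record
        f"_sigkey.{domain}",
    ]
```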

> Then it's a logical extension to include the IP address. This is the
> information which the ISP needs to trace the original user if they did not
> authenticate with SMTP AUTH. A spam complaint containing such information
> could be processed in a highly automated fashion, AND provide strong
> evidence (e.g. if needed in a court of law) for kicking off the spammer.

It's in the Received: headers already. Outside our scope for today. 

> Hmm. Let me put it another way. Suppose in our smarthost MTA we see a mail
> which says
>    From: president at whitehouse.gov
> 
> but whitehouse.gov has not published any key information, so we know the
> recipient won't be able to verify it one way or the other.
> 
> I think it would be better for us to sign it saying "This mail probably came
> from *@myisp.net", than not sign it at all. The receiver then has something
> to verify (using the myisp.net key in the DNS) and show to the end-user.

But it means nothing. It's like signing a memo you found on your desk
without reading it, just to say "I saw this and it really did exist".

I disagree that such a signature would be useful.

> Since the certification disagrees with the originally-provided From: header,
> the end-user has a better indication of a forgery than if no signature were
> applied, even though whitehouse.gov has not published anything.

Why a better indication of a forgery? No information is available as to
whether this could be valid or not. Do you think it _is_ or it _isn't_
if you find such a signature? I don't see how any information is gained.

> > It's hardly
> > difficult to see where real Received: headers stop and the faked ones
> > start; we don't need to address this here.
> 
> Not for experts like you or me. But there are plenty of people with enough
> knowledge to find full headers, but not enough to understand them, who send
> spam complaints to the wrong places (based on forged HELO names, forged
> Received: headers, and forged envelope senders). At least one person on
> spf-discuss was arguing that as a case for validating HELO names.

Nothing will fix these people apart from a cluebat. Smell the coffee :)
Seriously, the Received: headers will always have to exist; people will
always misinterpret them. If you think such a scheme would help in any
but the most trivial amount, then I respectfully submit that you're
crazy. And, given your cynicism about even sender-verification,
inconsistently optimistic. :)

> Sure. I meant if we used an existing mechanism like OpenPGP. What if we come
> across a mail which is already in OpenPGP format?

We won't use GPG, because it's too obvious to non-participating users.
It would never get implemented. 

But if we see a message which is already signed with _our_ scheme, it is
a local policy decision as to whether we re-sign it or leave the
original signature to stand. Why should the RFC make this decision?

I think it's OK to allow only _one_ signature to exist for any given
mail. It gets complicated otherwise.

> > If we _do_ have a per-user key scheme where the end-user has signed a
> > message using their own key, why would it need signing by the ISP too?
> 
> We need to add some level of the ISP 'vouching for' the user - which, unless
> the end-user's key has been signed by the ISP, means signing the message
> somehow.

The ISP vouches for the user by listing the appropriate pubkey in the
DNS for that user at isp-domain. Isn't that sufficient?

> Otherwise, we are unable to add any trust for an existing GPG message.
> Spammers can always start sending OpenPGP-signed mails using throwaway keys.

True. The DNS (or whatever) must list the keys which individual users
are using, if signing is done by individual users.

> If we treat OpenPGP as just a lump of MIME, and add our own signature in the
> top headers, this should be fine.

Yep.

> > > Then you have to consider how an end-point will verify the signatures. If
> > > it's an MTA, then it can make a lookup in the DNS in real-time. However if
> > > it's an off-line MUA, which has downloaded the mail using POP3 say, then
> > > either it must trust the final MTA to stamp the message somehow, or it must
> > > be able to verify the signatures off-line (which means a chain of trust up
> > > to some root certificates)
> > 
> > This should be left as flexible as possible, to allow sites to implement
> > their own policy. It should be possible to do _either_. 
> 
> A good goal.
> 
> > Verifying the signatures in the MUA can still do the DNS lookups to get
> > the pubkey, though, surely? Just like SpamAssassin can do RBL and SPF
> > lookups. You're normally online at the time you _fetch_ the mail, or you
> > wouldn't be able to talk to the mail server.
> 
> Not if you're offline (i.e. disconnected from the Internet)

Then you're unlikely to be fetching your mail, surely? I think we can
declare this an uninteresting sub-problem.

	"How do you look up keys in the public database while offline"
	"You don't."

> If done properly then probably the 'right' thing to do would be to
> encapsulate the whole message, with authenticated headers, inside a
> message/rfc822 container, and sign that. But that would be too intrusive for
> day one.

Yes. That's the _technical_ reason we're not using GPG or S/MIME.

> > And yes, we should also include some form of fuzzy hash of the mail. As
> > a primitive example, we could perhaps specify the number of lines which
> > were present when the mail was signed, and then the receiver knows how
> > much to trim from the bottom. 
> 
> A bit too easy to replay-attack. Just take the first N lines of a message to
> a mailing list, say, and paste your spam at the bottom.

Hence the suggested policy of rejecting a mail where there's more than a
very few lines added to the original NLINES. Also bear in mind that the
date was included in the signature, so you can place a time limit on
replay attacks. Again up to local policy.
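
A sketch of such a receiver-side policy follows. It assumes the line
count, the hash of those lines and the signing date have already been
authenticated by the signature; the function names and the default
limits (5 extra lines, 7 days) are arbitrary illustrations, not
suggested values.

```python
import hashlib
from datetime import datetime, timedelta


def hash_lines(lines: list[str]) -> str:
    """Hash the given lines, joined with a single newline."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def accept(body: str, nlines: int, signed_hash: str,
           signed_at: datetime, now: datetime,
           max_extra_lines: int = 5,
           max_age: timedelta = timedelta(days=7)) -> bool:
    """Apply a local policy to a message signed over its first nlines lines."""
    lines = body.splitlines()
    if len(lines) < nlines:
        return False  # signed content has been truncated
    if hash_lines(lines[:nlines]) != signed_hash:
        return False  # signed part was altered
    if len(lines) - nlines > max_extra_lines:
        return False  # too much unsigned text appended (replay-and-add-spam)
    if now - signed_at > max_age:
        return False  # signature too old; limits the replay window
    return True
```

A short list footer gets through; twenty lines of pasted spam, or a
month-old signature, does not.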

> I actually find it hard to conceive of a value of N which would be useful.
> Many messages are only 2 lines long; after that you get the extra footers
> (virus-sweeper, mailing list info, etc)
> 
> The value of N could be included in the signature itself: i.e. "this message
> was originally N lines long, and the hash covers those N lines; anything
> after that is not signed". But the replay-and-add-spam attack is too easy.

Yes, that's what I suggested. You include the number of lines in the
headers, signed for authenticity. 

> > We probably also need to canonicalise the character set and encoding;
> > transition between systems supporting 8BITMIME and not will cause
> > breakage otherwise. 
> 
> I guess already addressed by OpenPGP etc. Or maybe they just base64 encode
> the lot, which is not a friendly thing to do for us.

Indeed.
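
One way to canonicalise without base64-encoding everything is to hash
the _decoded_ text instead of the bytes on the wire. The sketch below
undoes the transfer encoding, decodes the declared charset, and hashes
the result as UTF-8 with trailing whitespace stripped. It handles only
a single-part text message, and the function name and the choice of
SHA-256 are assumptions for illustration.

```python
import hashlib
from email import message_from_bytes


def canonical_body_hash(raw: bytes) -> str:
    """Hash a single-part text body independently of its transfer encoding."""
    msg = message_from_bytes(raw)
    if msg.is_multipart():
        raise ValueError("sketch handles single-part messages only")
    # decode=True undoes quoted-printable or base64; 8bit passes through.
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "us-ascii"
    text = payload.decode(charset, errors="replace")
    # Normalise line endings and trailing whitespace before hashing.
    canonical = "\n".join(line.rstrip() for line in text.splitlines())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this, the same text sent as 8bit and re-encoded as
quoted-printable by a relay without 8BITMIME yields the same hash.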

-- 
dwmw2