Add checking expiry in handshake#12
Conversation
real certs are coming.
Used at startup to load certs, and will be used to reload when the ACME client has completed issuance Drop the cert struct and just store a *tls.Certificate. The struct isn't needed.
This prevents serving certificates which are expired (if they should be valid/revoked), or serving certificates which should be expired (but aren't). Once we have metrics wired up, this will be a key monitoring point.
Make sure all four possible cases are covered.
| certs map[string]*tls.Certificate | ||
|
|
||
| // expired is a map of domain to whether the cert is expected to be expired | ||
| expired map[string]bool |
There was a problem hiding this comment.
I have a slight preference for storing this as metadata on the certificate itself, rather than having a bunch of separate maps. Something like:
type certificate {
*tls.Certificate
shouldBeExpired bool
shouldBeRevoked bool
}
type CertManager struct {
...
certs map[string]*certificate
}Though I admit that this somewhat conflicts with the current design of "LoadCertificate modifies the CertManager, but is agnostic of what cert it is loading".
There was a problem hiding this comment.
That was my first instinct (and I wrote it that way), but I found the conditionals in the rest of the code felt more awkward. This also keeps expired entirely static, which just feels less prone to errors.
When I found a bug in my implementation, realized I could eliminate the possibility of that bug by writing it in this style. I'm not wedded to this choice, and I've already done it both ways, but this one looked nicer in the end :)
There was a problem hiding this comment.
What about the conditionals elsewhere felt worse to you? In my mind it's as simple as
if cert.shouldBeExpired && !expired {
...
} else if !cert.shouldBeExpired && expired {
...
}Or a different sketch:
type certKind int
const (
Valid certKind = iota
Expired
Revoked
)
type certificate {
*tls.Certificate
kind certKind
}
func (c *CertManager) GetCertificate(info *tls.ClientHelloInfo) (*tls.Certificate, error) {
...
switch certificate.kind {
case Valid:
// ensure it's actually valid
case Expired:
if !time.Now().After(cert.Leaf.NotAfter) { oops }
case Revoked:
// ensure it's actually revoked
}Though that has its own code duplication problems.
So I guess I'm largely sold on the idea of "here are static maps identifying the domains that should use expired certs". Then my mind returns to the same question as above: why the asymmetry? If we take for granted that the cert manager is going to keep track of which domains should be serving which kinds of certs, then I'd expect:
type certManager struct {
// certs is a map of domain to the certificate served
certs map[string]*tls.Certificate
// valid is the set of domains whose cert is expected to be valid
valid map[string]struct{}
// expired is the set of domains whose cert is expected to be expired
expired map[string]struct{}
// revoked is the set of domains whose cert is expected to be revoked
revoked map[string]struct{}There was a problem hiding this comment.
Right now, LoadCertificate just unconditionally overwrites the map entry.
Without that, the code had to do a read first, just for the .expired bool field, then read from storage, and write back a new struct.
But at startup we might not have a cert yet, but we want to track if it should be expired, so we put a struct with a nil certificate but the flag set.
Then, GetCert needs to check if there's a cert provisioned as well, since we'd always have a map entry (for the flag) with a nil cert.
The only reason to track expiry here is because of the very nature of time itself: It progresses forwards, and can be checked very cheaply.
All the other checks (like revocation) are going to be handled async by the ACME client. Ideally we'd never hit this, but if something else failed, I wouldn't want to let the revoked/valid certs expire. Arguably I wouldn't even bother to check if an expired cert is expired (because that code shouldn't have made it go current until after it expires, and certs can't un-expire except via clock shenanigans), but it seems nice from a symmetry perspective to just add that very cheap check here.
There was a problem hiding this comment.
here's a single-map of structs version:
https://github.com/letsencrypt/test-certs-site/pull/17/files
aarongable
left a comment
There was a problem hiding this comment.
Ok, I don't have a strong preference between this and #17. I think that both of these feel slightly weird and non-idiomatic, pointing at some deeper design refactoring that could happen to make this feel obvious and natural. I think that this solution makes the expired status of certs feel too important (being given a whole data member on the top-level manager struct), and begs for more symmetry. I think that the other solution meshes poorly with the modification-in-place done by LoadCertificate. In the end this is a small project that we're trying to move quickly, and it won't make a huge difference to readability overall.
|
ok! I think we can definitely think about how to refactor things once the rest of this comes together and we've got a working project, and see what's left feeling unnnatural |
This adds tracking of whether a certificate is expected to be expired to the CertManager.
In GetCertificate, called in the handshake, we verify that the certificate is expired or not against what it should be.
This means we won't serve a certificate instead of serving an incorrect one. This is a tradeoff, but we think is more likely to go unnoticed, and avoids a class of compliance problems we've seen other CAs have.