Skip to content

Add checking expiry in handshake#12

Merged
mcpherrinm merged 10 commits intomainfrom
mattm-expired-check-2
Dec 3, 2025
Merged

Add checking expiry in handshake#12
mcpherrinm merged 10 commits intomainfrom
mattm-expired-check-2

Conversation

@mcpherrinm
Copy link
Copy Markdown
Contributor

@mcpherrinm mcpherrinm commented Nov 26, 2025

This adds tracking of whether a certificate is expected to be expired to the CertManager.

In GetCertificate, called in the handshake, we verify that the certificate is expired or not against what it should be.

This means we won't serve a certificate instead of serving an incorrect one. This is a tradeoff, but we think is more likely to go unnoticed, and avoids a class of compliance problems we've seen other CAs have.

real certs are coming.
Used at startup to load certs, and will be used to reload when the ACME
client has completed issuance

Drop the cert struct and just store a *tls.Certificate.

The struct isn't needed.
This prevents serving certificates which are expired (if they should be valid/revoked),
or serving certificates which should be expired (but aren't).

Once we have metrics wired up, this will be a key monitoring point.
Make sure all four possible cases are covered.
@mcpherrinm mcpherrinm requested a review from a team as a code owner November 26, 2025 20:07
@mcpherrinm mcpherrinm requested review from beautifulentropy and removed request for a team November 26, 2025 20:07
Comment thread certs/certs.go Outdated
Comment thread certs/certs.go
certs map[string]*tls.Certificate

// expired is a map of domain to whether the cert is expected to be expired
expired map[string]bool
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for storing this as metadata on the certificate itself, rather than having a bunch of separate maps. Something like:

type certificate {
  *tls.Certificate
  shouldBeExpired bool
  shouldBeRevoked bool
}

type CertManager struct {
  ...
   certs map[string]*certificate
}

Though I admit that this somewhat conflicts with the current design of "LoadCertificate modifies the CertManager, but is agnostic of what cert it is loading".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first instinct (and I wrote it that way), but I found the conditionals in the rest of the code felt more awkward. This also keeps expired entirely static, which just feels less prone to errors.

When I found a bug in my implementation, realized I could eliminate the possibility of that bug by writing it in this style. I'm not wedded to this choice, and I've already done it both ways, but this one looked nicer in the end :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the conditionals elsewhere felt worse to you? In my mind it's as simple as

if cert.shouldBeExpired && !expired {
  ...
} else if !cert.shouldBeExpired && expired {
  ...
}

Or a different sketch:

type certKind int
const (
  Valid certKind = iota
  Expired
  Revoked
)
type certificate {
  *tls.Certificate
  kind certKind
}

func (c *CertManager) GetCertificate(info *tls.ClientHelloInfo) (*tls.Certificate, error) {
  ...
  switch certificate.kind {
  case Valid:
    // ensure it's actually valid
  case Expired:
    if !time.Now().After(cert.Leaf.NotAfter) { oops }
  case Revoked:
    // ensure it's actually revoked
}

Though that has its own code duplication problems.

So I guess I'm largely sold on the idea of "here are static maps identifying the domains that should use expired certs". Then my mind returns to the same question as above: why the asymmetry? If we take for granted that the cert manager is going to keep track of which domains should be serving which kinds of certs, then I'd expect:

type certManager struct {
	// certs is a map of domain to the certificate served
	certs map[string]*tls.Certificate

	// valid is the set of domains whose cert is expected to be valid
	valid map[string]struct{}
	// expired is the set of domains whose cert is expected to be expired
	expired map[string]struct{}
	// revoked is the set of domains whose cert is expected to be revoked
	revoked map[string]struct{}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, LoadCertificate just unconditionally overwrites the map entry.
Without that, the code had to do a read first, just for the .expired bool field, then read from storage, and write back a new struct.

But at startup we might not have a cert yet, but we want to track if it should be expired, so we put a struct with a nil certificate but the flag set.

Then, GetCert needs to check if there's a cert provisioned as well, since we'd always have a map entry (for the flag) with a nil cert.

The only reason to track expiry here is because of the very nature of time itself: It progresses forwards, and can be checked very cheaply.

All the other checks (like revocation) are going to be handled async by the ACME client. Ideally we'd never hit this, but if something else failed, I wouldn't want to let the revoked/valid certs expire. Arguably I wouldn't even bother to check if an expired cert is expired (because that code shouldn't have made it go current until after it expires, and certs can't un-expire except via clock shenanigans), but it seems nice from a symmetry perspective to just add that very cheap check here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's a single-map of structs version:

https://github.com/letsencrypt/test-certs-site/pull/17/files

Base automatically changed from mattm-basic-storage to main December 1, 2025 20:10
mcpherrinm added a commit that referenced this pull request Dec 2, 2025
This makes the Certificate Manager implement Lego's challenge.Provider
interface, and serve challenge certs when a TLS-ALPN-01 challenge comes
in.

It builds on #11 to avoid some merge conflicts, but it will inevitably
have a bit of conflict with #12 as well.
aarongable
aarongable previously approved these changes Dec 2, 2025
Copy link
Copy Markdown
Contributor

@aarongable aarongable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I don't have a strong preference between this and #17. I think that both of these feel slightly weird and non-idiomatic, pointing at some deeper design refactoring that could happen to make this feel obvious and natural. I think that this solution makes the expired status of certs feel too important (being given a whole data member on the top-level manager struct), and begs for more symmetry. I think that the other solution meshes poorly with the modification-in-place done by LoadCertificate. In the end this is a small project that we're trying to move quickly, and it won't make a huge difference to readability overall.

@mcpherrinm
Copy link
Copy Markdown
Contributor Author

ok! I think we can definitely think about how to refactor things once the rest of this comes together and we've got a working project, and see what's left feeling unnnatural

@mcpherrinm mcpherrinm merged commit 89f45f6 into main Dec 3, 2025
1 check passed
@mcpherrinm mcpherrinm deleted the mattm-expired-check-2 branch December 3, 2025 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants