Reflections on certificates, Part 1

I’ve written a couple of posts on (X.509v3) certificates in the past, starting with this one in 2001. In the two decades since then a number of developments have taken place (to name a few: OCSP, ACME, Let’s Encrypt certificates and the general role of automation). On the other hand the fundamental mechanisms of certificates have stayed the same. In this post I argue that understanding the inherent (but often hidden) complexity, the trust relationships and the trade-offs of certificate use in a given environment can lead to better decision making and to more efficient operations.

The basic scheme (for the purposes of this post) usually involves a set of parties:

  • (1) A server (in the sense of an entity receiving a connection request, incl. network devices)
  • (2) A client (an entity that initiates a connection)
  • (3) A user who uses the client, and we can safely assume this is a human, so motivations & desires come into play (which can influence trust decisions)
  • (4) An operator being in charge of (1), or of (2), or of both. Here again we assume humans, so they have objectives (in particular “make the users happy by providing a service which is available, and which they can use with their present skill set”)
  • (5) CAs who issue certificates to be used on (1), or on (2), or on both. Evidently this involves (potentially complicated) relationships with the operators
  • (6) developers
  • (7) infosec people


Let’s start with some high-level concepts (yep, regular readers remember my love for those ;-).

Complexity

Working with certificates frequently induces a high level of complexity (definition of the term here), for a number of reasons:

  • multiple standards bodies have contributed to specifying what we have today, one of them (ITU) being notorious for complex outcomes. The main IETF document, that is RFC 5280, has 151 pages.
  • using certificates often involves other, not necessarily simple, things like ASN.1 or DER.
  • most importantly there are all types of extensions which can be employed for nearly unlimited creative uses ;-). See this part of the table of contents of RFC 5280

Unfortunately one of the objectives of the ‘traditional’ certificate use case (that is: securely buying stuff on the Internet) was to hide this complexity from the users. At the same time the fact that certificates are capabilities (see below), which get deployed once and, seemingly, don’t have to be ‘operationally taken care of’ at least for a while, causes them & their complexity to be underestimated (& to stay ‘invisible until something breaks’) in fast-moving environments.
Realizing that certificates are complex beasts, and especially so when employed for certain use cases (=> below), might be the first step towards getting better at handling them ;-).

Trust

By their very core value proposition trust (some definition of the term here) plays a huge role when certificates come into play. They’re exactly meant to contribute to trust between communication partners (by assuring the identity of one or multiple of them). In the classic use case this works as follows:

  • I can trust that this web site I’m visiting belongs to the organization holding the domain name I typed into my browser, because I see that little lock in the URL.
  • Behind the scenes this trust is established as another party (the CA) assured the binding of some cryptographic material to some identity information, based on some more or less rigorous checks. I might not know this other party but my browser does, and the mere existence in my browser’s (or OS’s) certificate store expresses this trust.

Alas, matters involving trust can be way more complex in today’s world. Imagine you operate an application which runs on several systems and at some point connects to a system operated by a 3rd party (called $ORG in the following), e.g. for querying a database. As smart & security-conscious people are involved, certificates are used everywhere incl. that one external system. When asked about the dimension of trust as for the certificate over there (in the following: $CERT) one might be tempted to respond “well, that one enables us to trust we’re connecting to the right system (and infosec told us to ubiquitously use certificates anyway)”.
However, in reality

  • you now inherently trust $ORG to have done a reasonable job when getting an appropriate certificate for the purpose.
  • you trust the respective CA to have done a proper job vetting $ORG (and to have issued an appropriate certificate for the purpose).
  • you now inherently trust that $ORG knows or monitors the expiry date of $CERT (and, evidently/subsequently, that related alerting capabilities are in place).
  • you inherently trust that some sufficiently qualified personnel will be available, at the latest, on the day when $CERT expires.
  • overall you inherently trust $ORG’s operational maturity to properly handle certificates ;-).
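One way to depend a bit less on $ORG’s monitoring is to watch the expiry date of $CERT yourself. Here is a minimal sketch of such a check, using only Python’s standard library. The function names, the 30-day warning threshold and the sample date are my assumptions for illustration; the timestamp format is the one Python’s ssl module reports for a certificate’s notAfter field.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a cert's notAfter time.

    `not_after` uses the text format reported by Python's ssl module,
    e.g. "Jun  1 12:00:00 2030 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiry_status(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'expiring-soon' or 'ok'.

    `warn_days` is a hypothetical alerting threshold, pick your own.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring-soon"
    return "ok"
```

In practice the notAfter string would come from the certificate the remote system presents, and the status would feed into whatever alerting you already have. The sketch deliberately leaves both parts out.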

Looking closer you may also find out that $CERT is a wildcard cert covering the full domain of $ORG, so the initial assumption of trust (‘make sure we connect to the right system’) might be… debatable.
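To illustrate why a wildcard weakens the ‘right system’ assumption, here is a deliberately simplified sketch of wildcard host name matching, in which ‘*’ may only stand for the whole left-most label. Real TLS libraries apply more rules than this, and the host names below are made up.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Simplified matching: '*' may replace exactly one left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard covers any single non-empty label in that position.
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


# A cert issued for one specific host only matches that host ...
print(hostname_matches("db.org.example", "wiki.org.example"))   # False
# ... while a wildcard cert matches every host directly below the domain.
print(hostname_matches("*.org.example", "db.org.example"))      # True
print(hostname_matches("*.org.example", "wiki.org.example"))    # True
```

So with a wildcard cert, any system of $ORG holding the corresponding private key can successfully pose as ‘the database’.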
In short understanding the (hidden) trust relationships in an environment can generally be beneficial for prioritizing operational resources. Which brings me directly to the next point.

Trade-offs

The world of certificates is full of trade-offs (as, of course, are all settings with many different parties and their – differing – objectives). Here they are usually clustered around two main themes:

  • performing certificate validation at all ;-). This may sound strange at the 1st glance – I mean, using certificates only makes sense once you validate them, right? – but many of us know situations of the “oops, that expired cert over there breaks our service delivery right now. what about temporarily [ed.: by some definition of temporary 😂] disabling cert validation for the TLS connections between those systems to quickly fix the issue?” type. You may also look at the Wi-Fi authentication use case below.
  • how to determine if a certificate is (still) valid. This can be time-based, or based on checks of the revocation status, or both. Such checks (and the concept of certificate lifetimes/validity periods as a whole) are related to a specific property of certificates (them being capabilities, see next section), and these checks can induce significant operational complexity (e.g. see the post I referenced at the beginning of this one). I will cover certificate revocation & checking in a later part of this series.
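To make the first of these trade-offs a bit more tangible: in many TLS stacks ‘temporarily disabling validation’ is just two lines of configuration, which is part of why it is so tempting. A minimal sketch using Python’s standard ssl module follows. The two function names are mine; the attributes are the ones the module provides.

```python
import ssl


def validating_context() -> ssl.SSLContext:
    """Default client context: chain validation and host name check are on."""
    return ssl.create_default_context()


def non_validating_context() -> ssl.SSLContext:
    """The 'quick fix': the connection is still encrypted, but ANY certificate is accepted."""
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off first, otherwise setting
    # verify_mode to CERT_NONE raises a ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Note that nothing in such a configuration expires on its own, so the ‘temporary’ part is entirely up to the humans involved.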

Finding the right balance between objectives of different parties, read: going with the right trade-offs, can greatly help to efficiently steer operational resources (in all directions, e.g. increasing cert lifetimes between systems which are all part of the same – your – operational domain can be a good idea when cert expiry is a frequent cause of issues. better yet to increase the level of automation for renewal then ;-).
You may hence spend some intellectual cycles on understanding/questioning the trade-offs in your environment.
As stated above quite some of the trade-offs are commonly related to the most important, yet at times least understood, point of my little theory discourse here, that is:

Certificates are capabilities

Imagine there’s a subject (a user/process) that wants to access an object, e.g. a resource (network, file etc.). The enforcement mechanism controlling the subject’s access to the object can then look

  • at an attribute of the object itself (we could call it sth like ‘access control list’). This attribute/list is then checked every time the subject shows up and asks for access, and it’s usually maintained by the object’s owner.
  • for an entitlement (not to be confused with, but similar to these) which at an earlier point of time was granted to the subject and which generally allows some access. Such a thing is sometimes called a capability, and certificates can be perfect examples of capabilities (strictly speaking & technically the private key corresponding to a cert’s pub key constitutes the actual capability, but let’s keep it simple).
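The difference between the two models can be sketched in a few lines of Python. This is a toy under loud assumptions: an HMAC-signed token stands in for a certificate (real certificates use asymmetric signatures from a CA), and all names, keys and objects are hypothetical.

```python
import hashlib
import hmac

# Model 1: the object carries an access control list, checked on every access.
ACL = {"payroll-db": {"alice"}}


def acl_allows(subject: str, obj: str) -> bool:
    return subject in ACL.get(obj, set())


# Model 2: an issuer hands out a signed token (the capability) once.
ISSUER_KEY = b"demo-only-secret"  # assumption: known to issuer and verifier only


def issue_capability(obj: str) -> str:
    tag = hmac.new(ISSUER_KEY, obj.encode(), hashlib.sha256).hexdigest()
    return f"{obj}:{tag}"


def capability_allows(token: str, obj: str) -> bool:
    name, _, tag = token.rpartition(":")
    expected = hmac.new(ISSUER_KEY, name.encode(), hashlib.sha256).hexdigest()
    # Note what is NOT checked here: who presents the token.
    return name == obj and hmac.compare_digest(tag, expected)


token = issue_capability("payroll-db")
print(acl_allows("bob", "payroll-db"))         # False: bob is not on the list
print(capability_allows(token, "payroll-db"))  # True: for whoever holds the token
```

The verifier in the second model never asks who is presenting the token, and it has no list from which an entry could be removed.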

I’m using the above terms a bit loosely here, and there’s a lot of theoretical discussion in OS security circles on these. In any case capabilities have two main challenges:

  • Delegation: how can you make sure that one subject does not transfer the capability to another subject after it has been granted.
  • Revocation: if circumstances change (e.g. a system/key material is compromised or when a user leaves an organization) how can you make sure that the once-granted entitlement can no longer be used.

Both are well-known in certificate circles, and various architectural or technical approaches exist to deal with them, including:

  • Come up with a flag (‘non-exportable’) for private keys and hope that the OS environment properly enforces it.
  • Store the private key(s) in some extra-secure place. That’s the main reason why smart cards once gained a lot of popularity in some industry sectors (namely heavily regulated ones like banks), and why hardware security modules (HSMs) exist.
  • Implement an additional layer where, at the very moment of a certificate’s use, some extra check of the ‘ok, it is still within its validity period, but has it been revoked?’ type happens. Voilà the birth of certificate revocation checking, and welcome to a whole new space of complexity, trust relationships, and trade-offs (=> detailed discussion in next post).
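The last approach can be sketched as a check in two layers. In this minimal Python sketch the set of revoked serial numbers is a hypothetical in-memory stand-in for whatever a real deployment would consult (a CRL, an OCSP responder); the function name and return values are assumptions as well.

```python
from datetime import datetime, timezone


def certificate_usable(serial: int, not_before: datetime, not_after: datetime,
                       now: datetime, revoked_serials: set) -> str:
    """Return 'ok' or the reason why the certificate must not be accepted."""
    # Layer 1: the validity period baked into the certificate itself.
    if now < not_before:
        return "not-yet-valid"
    if now > not_after:
        return "expired"
    # Layer 2: ask somebody else whether the capability has been withdrawn.
    if serial in revoked_serials:
        return "revoked"
    return "ok"


now = datetime(2030, 3, 1, tzinfo=timezone.utc)
start = datetime(2030, 1, 1, tzinfo=timezone.utc)
end = datetime(2031, 1, 1, tzinfo=timezone.utc)
print(certificate_usable(1001, start, end, now, revoked_serials={1001}))  # revoked
print(certificate_usable(1002, start, end, now, revoked_serials={1001}))  # ok
```

Layer 1 needs nothing but the certificate and a clock, whereas layer 2 needs fresh data from a third party at the moment of use.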

It should be noted that

  • revocation checks significantly change the trust relationships (“ok, I see the cert that you present to me. It was meant to create trust between you and me, but I’m not convinced. let me reach out to somebody else to verify.”)
  • they kind-of move the needle towards an object-based security model which many people intuitively prefer as it gives them the notion of being in control (also this is better aligned with many compliance frameworks 😉 ).

Let’s now discuss some certificate use cases from the above perspectives. In the following I will look at five of them (the first two in this post, the others in the next):

  • E-commerce web server offering HTTPS
  • Authentication in enterprise Wi-Fi networks
  • Client/user authentication (e.g. for VPN access)
  • Client/user authorization (as in “enrich a certificate with additional information which is then parsed in order to take security decisions like controlling access to a specific resource”)
  • mTLS 

Use case: e-commerce web server with HTTPS

This is probably the most classic use case, and it’s the one that paved the way for widespread use of certificates. When e-commerce became a thing, there were two challenges to be solved from a user’s (buyer’s) perspective:

  • How do I know I’m connected to the right server (assuming that this one only uses my credit card data for the goods I want to purchase)? 
  • How can I be sure that my payment data is not compromised when using the Internet for its transfer?

Both could be addressed by deploying a cert on the web server(s) and enabling HTTPS.

To note:

  • From a trust perspective this is a kind-of easy one. The user has a certain desire (e.g. to buy something, or to watch specific content) which generally highly influences trust decisions (otherwise Ponzi schemes wouldn’t work). The CAs were trusted as there were only a few of them, and their trustworthiness was rarely questioned or verified by the people requesting certificates (in the early days the latter sometimes even were part of a company’s marketing team, which usually has a more optimistic approach to life than those ever-skeptical infosec folks anyway).
  • From a company’s security objective perspective it was an easy one, too: none of the to-be-protected assets (users’ credit card data) were really of relevance (with regard to protection need) for the owners of the web servers. This only changed when PCI came up.
  • From an operations perspective it wasn’t particularly difficult either: certs had comparably long lifetimes (usually two years), there were only a few of them, and while renewal was known to be somewhat inconvenient it was at least less cumbersome than the initial request.

Use case: authentication protocols used in enterprise Wi-Fi networks

Pretty much all extensible authentication protocols (EAPs, some overview here) used in enterprise Wi-Fi networks employ certificates, some of them only on the side of infrastructure elements (e.g. PEAP), others (EAP-TLS) also for clients. Especially the latter one brings high operational complexity (see for example this old setup guide which my fine buddy Chris Werny authored many yrs ago). With that come both heavily differing objectives of the involved parties and quite interesting failure scenarios.
Let’s analyze some of the involved parties.

  • operators of the RADIUS servers. They might not be super-familiar with certificates, and installing those may not be a daily task for them, so they’d be happy with generally longer cert lifetimes.
  • ‘enterprise desktop team’ – they will strive for auto-enrollment & renewal, and again they will want to keep things simple (“why do they bother us with this certificate stuff, our life is already difficult”). This group/task could be outsourced (=> $CONTRACTOR1).
  • the users just want Wi-Fi to work, they (legitimately) don’t care about the underlying technologies, and they will happily click away any certificate-related warnings “as long as the damn corporate Wi-Fi works”.
  • the infosec people want to prevent the users from doing the latter, and they’d be happy if lifetimes of involved certificates were shorter rather than longer. Bonus if they come up with the idea of implementing some additional scheme where “Wi-Fi (security) profiles” are mapped to certain parts of the certificate (did I already mention that certificates have various types of fields which can be overloaded/populated with all types of information?)
  • the operators of the whole Wi-Fi infrastructure want to keep the users happy. Some chance here that operations of (some parts of) the network infrastructure might be outsourced/provided by contractors ($CONTRACTOR2).
  • The CA issuing the involved certificates might be in-house, or not. Common scenario that this is another contracted service ($CONTRACTOR3). Bonus if the wireless infrastructure uses intermediary certificates from another CA ($CONTRACTOR4).

Let’s imagine at some point one of the following two things happens

  • Something breaks
  • One of the certificates, in particular on the infrastructure level (RADIUS server, AP, wireless controllers), expires. High chance that the renewal requires human labor & skills and, evidently, requires touching availability-critical network infrastructure. Maybe the certificates in question are not monitored. Overall quite some probability that cert expiry leads to “something breaks”.

How well, do you think, will $CONTRACTOR{1..4} interact in such a case? – Exactly 😉

It should be noted that most of the above parties do not have a deep familiarity with certificates in their daily life. The fact that certificates are mostly invisible until sth breaks doesn’t help either (=> incentives?).
I can also tell you from practical experience (from my days as a network consultant in a US Fortune 10 company 15 years ago) that all of the above parties (except the infosec folks) will happily & immediately sacrifice all cert-related security properties once, say, 50K users might not be able to use the corp Wi-Fi anymore (due to expiring intermediate certs from a vendor with whom $CONTRACTOR2 had ended their contractual relationship). Then the following suggestions might show up on the table:

  • Can’t we just disable certificate validation as a whole, on certain $INFRASTRUCTURE_ELEMENTS?
  • What about publishing guidance in which we tell users to ignore certificate warnings?
  • Any chance of configuring some grace period, say 4–8 weeks, during which we still accept the expired certs? $VENDOR already promised us a custom image which somehow avoids the issue (don’t ask…).

If only some group of experts had reflected on the certificate deployment in that environment, its operational complexities, its inherent trust relationships, and the trade-offs between the different parties & their incentives earlier ;-).
If this post makes you think about these aspects in your own world, I’m a happy man. Thank you for your time spent reading, and see you in a few weeks for the next part.

Published by Enno

Old-school networking guy with a certain focus on network security. This blog is a private blog and it contains private musings, even though I have a day job around the Internet Protocol. I leave it to the valued reader to guess which version of it ;-). Some tweets on related topics at https://twitter.com/enno_insinuator.
