Launch browser. Enter username and password. Log in. Done. How hard could it be? This is a challenge I run into when I talk about security and identity. My demos are so boring, because after all that jazz, you see “Hello username” written in Times New Roman. It really punctures the demo, doesn’t it? As it turns out, authentication can be quite involved. And it’s still evolving every day because this boundary to your application is under duress, constantly, and our enemies are getting smarter. That’s what this article is about. And although I’ll talk generically and stick to general concepts, I’ll use Azure AD as an example.
A Long Time Ago
A long time ago, when the internet was nascent, we used a text-based browser called Lynx. There were other mechanisms to access information on the internet, too. There was this protocol called SMTP, which is still widely used today. Did you know that SMTP existed basically as an unauthenticated protocol for almost three decades? All it took was understanding the protocol and Telnetting to the SMTP server, and you could send email as anyone. Boy, did I have some fun with that.
I used to wonder, how could this be so simple? I could set up three email addresses, A, B, and C. A forwards to B and B forwards to A and C, and C is the target email address. Now, I could just initiate an unauthenticated email to A, and C would get overwhelmed by email messages, causing an old-style DDoS (distributed denial of service), and you know what? It worked. This is how insecure the internet used to be, not too long ago. Frankly, with a slight modification, you could defeat inboxes even today.
This is what keeps me up at night. The internet, still, feels very flimsy, and thanks to my unimpressive demos, almost nobody cares about security in a project. It’s all about deadlines and features, and security is just an inconvenience to get past.
A very long time ago, we invented an authentication protocol called basic authentication. This was just sending the username and password over the wire in a header. We tried to secure it using HTTPS, but plenty of governments and organizations can sniff HTTPS traffic as a man-in-the-middle. So we came up with NT Lan Manager version 1 (NTLMv1), which was easily defeatable. Followed by NTLMv2, which was better and is still in wide usage today, but has plenty of shortcomings. And perhaps the most common legacy protocol in widespread use today is Kerberos. Kerberos relies on a central authority and plenty of network chit chat, so it didn’t scale to the internet.
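To see why basic authentication is so fragile, here's a minimal Python sketch of what “sending the username and password over the wire in a header” amounts to. The function names are mine, purely for illustration. The point is that the credentials are only base64-encoded, so anyone who can read the header can reverse it.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for basic authentication."""
    # The credentials are base64-encoded, not encrypted.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(header: str) -> tuple[str, str]:
    """Recover the username and password from a Basic header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon only; the password itself may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Anything sitting between the browser and the server that can see the header can call the second function just as easily as the server can.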
As these dinosaur protocols roamed the earth, a meteorite called the internet came by to spoil their party. Organizations tried to solve the puzzle by using VPNs and trust relationships between active directories, but you can’t survive a direct hit from a meteorite like the internet. Well, to be precise, some dinosaurs survived. I mean, we still have crocs and gators around us, right? Kerberos still has a place on your intranet. But we needed something different, something that focused on the internet.
WS-Fed and SAML
In the early 2000s, the WS-* standards started taking shape. One of these was WS-Fed, and with it came SAML, the Security Assertion Markup Language. SAML is an XML packet format. In the early 2000s, accessing the internet was either through a website, i.e., a browser on your desktop, or via protocols such as FTP, Telnet, etc. It became clear that websites could offer value if we found a secure way for users to log in.
A post and redirect-based standard emerged, called WS-Fed. At a high level, it separates the responsibility of the IdP (or identity provider), the entity that performs authentication, and the RP (or relying party), also known as the service provider, which is the application you’re trying to access. The RP trusted the IdP, and this trust was established using certificates. The idea was that the user lands on the RP, and the RP says, “hey, you aren’t authenticated, so please go here to prove who you are.” The user could optionally be given more than one choice of an IdP. The user goes to the IdP, proves who they are (via credentials such as username, password or more), and the IdP sends back a SAML packet with enough information about the user that the RP can use to establish identity and proceed.
This “enough information” is the attributes about the user, also known as claims.
WS-Fed served us well for many years. One of the products that used it was SharePoint. As time progressed, the demands of applications increased. For instance, they expanded who could initiate an authentication. If the RP initiates an authentication, can it request specific claims in specific situations?
Over time, a new protocol called SAML 2.0 was developed. Hold on for a second there. I thought SAML was a packet format? Well yes, it is a packet format, and a protocol. SAML 2.0 is a protocol, which also uses SAML assertions (fancy name for packets) to perform the authentication dance.
SAML has served us well for many years. These days if someone says, “hey we use SAML-based authentication,” they’re probably talking about SAML 2.0. SAML is still in use in many enterprises, but SAML is designed for the web. Now it’s been shoehorned into other scenarios, such as mobile, but it was never designed for mobile.
I should note that whenever we talk about authentication and its history, you’ll hear dissenting voices. You will find a very smart person explain how SAML works perfectly fine for mobile apps. Sigh! You can also row to Japan in a little boat, but that doesn’t mean it’s a good idea. (For my readers in Japan, just row to Hawaii instead.)
The World Today
A few years back, Steve Jobs took the stage and released a pocket-based device that was an internet communicator, web browser, and phone all rolled into one device, called the iPhone. Since then, Tim Cook has called the same tired design an “all new design” for the last 10 years in a row. But the iPhone was revolutionary. Google shortly thereafter came up with another very capable platform called Android. And Microsoft gave it a good college try with Windows Phone. Whichever way you looked, the world had changed. Just think of all you did on your phone in the last week. Did you order food? Did you order a cab? Did you share your location? Take pictures? What websites did you visit and how many times did you sign in? We take the simplicity of it for granted, but when I check my security camera with Siri, a lot happens.
First, AI recognizes my voice and the command I issue. More importantly, it differentiates me from some other person. Then, an authenticated request goes from my phone to the cloud and bounces across numerous servers in the cloud, which are not all controlled by the same vendor. The request then lands on my security camera, which then sends a secure feed to the cloud, which is then streamed securely to my screen, all the while making sure I’m authenticated and the stream for me is viewed by me, so I can answer the doorbell and receive the pizza I ordered on another app, talking to another system.
There is a lot at play here, but let’s break it down in authentication terms.
First, the action from Siri talks to an app on my phone which securely communicates using my identity to a cloud-based service. This means that the app has a secure way of remembering my identity. And apps must use a consistent mechanism because we cannot rely on every app reinventing the wheel.
That cloud-based service needs to communicate to various nodes, sometimes to get a secret, sometimes to get some configuration. Here we have an example of server-to-server communication not done under the user’s identity. Additionally, all the infrastructure powering those containers needs to be tracked, paid for, patched, upgraded, etc. All that has its own layers of security and identity.
Then the request comes from there to my camera. Here, a bunch of network boundaries must be crossed. Also the camera’s identity must be confirmed. There’s an IoT device proving its identity to the cloud and the cloud making sure this identity matches with a camera on my account.
Then a stream is sent to my phone and my mobile device should somehow receive data securely, so we have transport layer security (TLS) at play, my device’s identity, and my identity in play.
You see, the world is a lot more complex than a web app redirect can handle.
Meanwhile, a delegation protocol called OAuth emerged, mostly pushed by social media companies with very poor intentions. OAuth meant that I’m allowing website A to do X on my behalf based on an identity proven by website B. The word “website” could be replaced by any technology. The problem is that to do X, A requests all sorts of permissions from B. And people would just say “Okay” and click so they could participate in an internet fight.
One concern about X is that we’d like to know the user’s identity. So this delegation protocol effectively ended up being used as an authentication protocol. The problem is that nobody agreed on which claim establishes the user’s identity. Also OAuth 1 was horrendously complex because it didn’t bake in the requirement of HTTPS.
The world then agreed that we needed something better, and OpenID Connect (OIDC) emerged. OIDC was OAuth + standards. The world agreed that we will have certain kinds of tokens, certain kinds of endpoints, and certain minimum claims in tokens to qualify to be OIDC-compliant. Additionally, the world agreed on certain flows/grants to support scenarios, such as mobile apps, web apps, etc.
And this is where we are today.
A Peek into the Future
Okay, let’s take a little diversion and talk about the future before I return to talking about the main theme of this article, which is OIDC and Web App authentication. OIDC is maturing every single day. As new threats and scenarios emerge, a lot of smart people collaborate and improve upon OIDC. There are some interesting new standards emerging that you can expect to see in the future.
You have proof of possession, which protects tokens against exfiltration. To be precise, an exfiltrated token will be useless because the server can verify that whoever presents it doesn’t hold the key of the client it was issued to.
Continuous access evaluation (CAE) allows the IdP to inform RPs that an access token is now invalid due to revocation at the server. The problem is that access tokens are usually valid for a duration that’s short, but still too long. Short because it isn’t days or months. Too long because the typical 30-minute to two-hour duration is far too long for many scenarios. CAE bridges that problem, allowing RPs to know if an access token, while still valid, belongs to an invalid user.
Step-up authentication is great for scenarios where an elevated action or anomaly detection can cause the user to reauthenticate with a higher strength factor to continue. This is a very good compromise between convenience and security.
Verifiable credentials (VCs) are essentially credentials that the user controls, but that originate from an attesting authority. For example, you could create a VC based on your driver’s license, but the credential and the various attributes are yours to keep and secure. They are proven via a distributed technology such as blockchain. And you’re the owner of that identity, so next time someone asks you to prove that you are over 21 years of age, you can share only that information instead of sharing your address, eye color, height, weight, etc.
Broker-based authentication is another emerging standard in the Microsoft space. Brokers involve some executing code on your computer, typically very secure and built into the OS, that can back a primary refresh token to secure hardware. This means that they can use a primary refresh token to ask for access tokens for more than one audience, giving the user the convenience of single sign on across a family of apps and also giving the user greater security.
A Speedy OIDC Tutorial
Treat this section as the bare minimum you need to know about OIDC. It’s intentionally incomplete. There are three things you need to know about OIDC: tokens, endpoints, and grants.
Tokens
In OIDC, among other tokens, there are three main tokens you need to know about. These are issued by the IdP to the RP.
The first is an ID token. An ID token establishes the user’s identity. The token is signed by the server (IdP) and its veracity can be verified by the RP. The ID token contains a bare minimum set of claims in a standard format. No matter what IdP issues an ID token, you know that there’s a standard mechanism of verifying it, and a standard claim you need to look at to know the user’s identity. Once the app receives an ID token, it “logs the user in.” The exact definition of login depends on the kind of application and platform. For instance, a web-based app may establish a web-based session. A mobile app, on the other hand, could use a different approach.
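To make the “standard format” a bit more concrete: an ID token is a JSON Web Token, which is three base64url segments (header, payload, signature) joined by dots. Here's a hedged Python sketch that peeks at the claims in the payload. It deliberately does not verify the signature, so treat it as a debugging aid and never as a login check. The helper names and the demo claim values are made up for illustration.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment that has had its padding stripped."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as base64url without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_unverified_claims(jwt: str) -> dict:
    """Return the claims in a JWT payload WITHOUT checking the signature."""
    _header_b64, payload_b64, _signature_b64 = jwt.split(".")
    return json.loads(b64url_decode(payload_b64))


def make_demo_token(claims: dict) -> str:
    """Build an unsigned, illustration-only token so the sketch runs offline."""
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}."


demo_token = make_demo_token(
    {"iss": "https://idp.example", "sub": "user-123", "aud": "my-client-id"}
)
demo_claims = read_unverified_claims(demo_token)
```

In a real application, you'd let a maintained library verify the signature against the IdP's published signing keys before trusting any claim, including sub, the standard claim that identifies the user.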
An access token is what you use to establish the identity of a user or process to an API. The access token is also issued by the IdP, and the RP (in this case, the API) can also verify its veracity. There are standard claims in an access token that help the API establish the caller’s identity and the validity of the access token. The access token is valid for a short duration. Although the OIDC standard doesn’t mandate a specific duration, it’s typical to see access tokens valid for between 30 minutes and two hours, although plenty of exceptions exist. For example, Azure AD-managed identity access tokens are valid for 24 hours.
Because an access token is valid for a short duration and you don’t want to show the login UI to the user every single time an access token expires, you’re given another kind of token called the refresh token. A refresh token, typically an opaque string, can be used to request new access tokens. Usually, you’ll ask for a new access token when the old access token is about to expire. This is so that the user sees an uninterrupted experience and isn’t shown a log-in UI when, from the user’s perspective, they never logged out. A refresh token also allows you to use the user’s identity when the user isn’t present. This is why you ask for a specific scope, called offline_access, to request a refresh token.
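Here's a small Python sketch of the refresh idea described above: decide that the access token is close to expiry, then build the form body you'd POST to the IdP's token endpoint. The parameter names (grant_type, refresh_token, client_id, scope) come from the OAuth 2.0 refresh grant. The function names and the five-minute safety margin are my own assumptions, and nothing here actually calls the network.

```python
from urllib.parse import urlencode


def needs_refresh(expires_at: float, now: float, skew_seconds: int = 300) -> bool:
    """True when the access token expires within the safety margin."""
    return now >= expires_at - skew_seconds


def build_refresh_request_body(client_id: str, refresh_token: str, scope: str) -> str:
    """Form-encoded body for a refresh_token grant at the token endpoint."""
    return urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "refresh_token": refresh_token,
            # offline_access is the scope that asks for a refresh token.
            "scope": scope,
        }
    )
```

For example, build_refresh_request_body("my-client-id", "opaque-refresh-token", "openid offline_access") produces a body you could send with any HTTP client. A confidential client would also have to authenticate itself, which I've left out.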
Endpoints
There are many endpoints in OIDC. There is one where you discover the OIDC configuration of an IdP. There’s another where you can get the signing keys from. There’s another that you can use for introspection, which is evaluating a token just in time by sending it back to the IdP. And yet another for a special flow called device code flow.
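The discovery endpoint is what ties the others together. Per the OIDC discovery standard, it lives at a well-known path under the issuer and returns a JSON document listing the other endpoints. Here's a Python sketch that builds that URL and picks out the interesting entries. The issuer https://idp.example and the sample document are made up so the sketch runs offline, and a given IdP may not publish every entry.

```python
import json


def discovery_url(issuer: str) -> str:
    """Well-known location of an issuer's OIDC configuration document."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def pick_endpoints(discovery_document: str) -> dict:
    """Pull the endpoints discussed above out of a discovery document."""
    config = json.loads(discovery_document)
    wanted = (
        "authorization_endpoint",
        "token_endpoint",
        "jwks_uri",  # where the signing keys are published
        "introspection_endpoint",
        "device_authorization_endpoint",
    )
    # Entries the IdP does not publish come back as None.
    return {name: config.get(name) for name in wanted}


# Hypothetical, trimmed-down discovery document for illustration only.
SAMPLE_DOCUMENT = json.dumps(
    {
        "issuer": "https://idp.example",
        "authorization_endpoint": "https://idp.example/authorize",
        "token_endpoint": "https://idp.example/token",
        "jwks_uri": "https://idp.example/keys",
    }
)
sample_endpoints = pick_endpoints(SAMPLE_DOCUMENT)
```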
But there are two key endpoints that you must know about: the authorize endpoint and the token endpoint.
The authorize endpoint is where the user proves who they are (via entering a password or doing MFA or FIDO2 etc.) and the user consents to whatever the requesting application needs. In the case of Azure AD, this endpoint is at this URL: https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize.
The {tenant} is either the name or GUID tenantID of the tenant. For example, Microsoft Corp’s tenant is at microsoft.onmicrosoft.com, so you can try visiting https://login.microsoftonline.com/microsoft.onmicrosoft.com/oauth2/v2.0/authorize to see what Azure AD does. You should get an error saying that a client_id was not present. Well, that makes sense. The client_id is the unique ID of the application you’re trying to sign into. With no application ID and no sign in, you’re given an error.
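A real sign-in request carries that client_id plus a handful of other query string parameters. Here's a hedged Python sketch that assembles such a URL for the authorization code grant. The parameter names are the standard OAuth/OIDC ones. The tenant, client ID, and redirect URI you pass in are up to you, and the values in the usage note are placeholders rather than a real app registration.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize"


def build_authorize_url(
    tenant: str,
    client_id: str,
    redirect_uri: str,
    scopes: list[str],
    state: str,
    nonce: str,
) -> str:
    """Assemble the URL the browser is redirected to for sign-in."""
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",  # ask for an auth code
            "redirect_uri": redirect_uri,
            "response_mode": "query",
            "scope": " ".join(scopes),
            "state": state,  # echoed back so you can match the response to the request
            "nonce": nonce,  # echoed inside the ID token to help detect replay
        }
    )
    return AUTHORIZE_URL.format(tenant=tenant) + "?" + query
```

You'd call it along the lines of build_authorize_url("contoso.onmicrosoft.com", "my-client-id", "https://localhost/callback", ["openid", "profile"], state, nonce), generating state and nonce freshly per request with something like secrets.token_urlsafe().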
The other screen you’ll see after you’ve proven your identity is the consent screen.
Certain permissions require you to be an administrator. Tenants are frequently configured by tenant admins so that average (non-admin) users cannot grant consent. Or you may have custom consent policies where certain kinds of consents can be granted by the average user, and others need approvals. Consent, as you can imagine, is a primary way that information leaks out to third parties. All it takes is to download an app from an App Store and grant consent where you shouldn’t have. It’s no surprise that, over the years, Microsoft has invested and built a lot of thought, process, and features to lock this facility down.
The other endpoint you need to know about is the token endpoint. The token endpoint for Azure AD is at this URL: https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token.
Usually, this isn’t an endpoint that the user interacts with. But the application issues a POST request here, and exchanges one kind of token, such as an auth_code, for an ID token, refresh token, or access token. This, of course, depends on the grant type in use, which is the next thing you need to know about.
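To give you a feel for that POST, here's a Python sketch that prepares (but doesn't send) the request a web app would make to exchange an auth code for tokens. It assumes a confidential client that authenticates with a client secret. The function name is mine, and the values you pass in are whatever your app registration uses.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_code_exchange_request(
    tenant: str,
    client_id: str,
    client_secret: str,
    code: str,
    redirect_uri: str,
) -> Request:
    """Prepare the back-channel POST that trades an auth code for tokens."""
    body = urlencode(
        {
            "grant_type": "authorization_code",
            "client_id": client_id,
            "client_secret": client_secret,  # assumption: secret-based client auth
            "code": code,
            # Must match the redirect_uri used on the authorize request.
            "redirect_uri": redirect_uri,
        }
    ).encode("ascii")
    return Request(
        TOKEN_URL.format(tenant=tenant),
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result to urllib.request.urlopen would send it. A successful response is a JSON document carrying the tokens, which ones depending on the scopes you asked for.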
Grants
OIDC is a standard, or, more accurately, an umbrella of standards. It’s designed to support many kinds of applications. For example, it can support single page applications, or web applications, or mobile apps, etc.
Here are some important grant types you should be familiar with.
Implicit Grant was designed for single page application (SPA) kinds of applications. It relied on a hidden iframe to renew an access token. This meant that the RP at www.yourrp.com is making requests on a hidden iframe to login.microsoft.com, which is the IdP. Modern browsers block this because this request looks shockingly like a tracker. Safari was the first browser that blocked this and now all modern browsers do. As a result, this flow is no longer recommended, and you should use auth code with PKCE instead.
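Since PKCE (Proof Key for Code Exchange, RFC 7636) is the recommended replacement, here's a minimal Python sketch of the part the client is responsible for. The app makes up a random code verifier, sends only its SHA-256 hash (the code challenge) on the authorize request, and reveals the verifier later when it redeems the auth code. The function names are mine.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random, URL-safe verifier; the spec allows 43 to 128 characters."""
    return secrets.token_urlsafe(64)  # 64 random bytes become 86 characters


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
# The authorize request carries code_challenge=<challenge> and
# code_challenge_method=S256; the token request carries code_verifier=<verifier>.
```

An attacker who intercepts the auth code on the front channel still can't redeem it, because they never saw the verifier.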
Authorization code grant provides a way to retrieve tokens on a back channel as opposed to the browser’s front channel. It also supports client authentication. This means that after you prove your identity, you get a one-time use auth code, which you use to exchange for an ID token, refresh token, or access token. You can certainly use this grant type on its own, but it’s common to combine it with identity tokens, which turns it into the so-called hybrid flow. Hybrid flow gives you important extra features, like signed protocol responses.
Hybrid grant is a combination of the implicit and authorization code flows; it uses combinations of multiple grant types. Usually in an auth code grant, you ask for a code. Here you can ask for multiple token types, typically code id_token. This means that the IdP returns both the code and id_token in one response. This is great for applications that don’t need to call an API. Effectively, they skip an extra hop to the IdP in the log-in process because they no longer have to exchange the code for an ID token. The back channel can still be used to retrieve the access and refresh token. In hybrid flow, the identity token is transmitted via the browser channel and contains the signed protocol response along with signatures for other artifacts like the authorization code. This mitigates a number of attacks that apply to the browser channel.
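One of those “signatures for other artifacts” is the c_hash claim. As OIDC defines it, it's the base64url encoding of the left half of a hash of the auth code, placed inside the signed ID token. Here's a Python sketch of the check, assuming the ID token was signed with a SHA-256 based algorithm such as RS256 and that its signature has already been verified. The function names are mine.

```python
import base64
import hashlib
import hmac


def left_half_hash(value: str) -> str:
    """base64url (no padding) of the left half of SHA-256(value)."""
    digest = hashlib.sha256(value.encode("ascii")).digest()
    half = digest[: len(digest) // 2]
    return base64.urlsafe_b64encode(half).rstrip(b"=").decode("ascii")


def code_matches_id_token(code: str, id_token_claims: dict) -> bool:
    """True if the auth code is the one the signed ID token vouches for."""
    expected = id_token_claims.get("c_hash")
    if expected is None:
        return False
    # Constant-time comparison of the two ASCII strings.
    return hmac.compare_digest(left_half_hash(code), expected)
```

If someone swaps the code in the browser redirect, the hash no longer matches the one in the signed ID token, and the app can refuse to redeem it.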
Client credentials grant is the simplest grant type and is used for server-to-server communication. The tokens never have a user identity, and requesting a token is a matter of a POST request to the token endpoint with the client ID and a credential (secret or a derivation of a certificate). The return is always an access token, and never a refresh token. Also, because this is server-to-server, and there’s no user interface, there’s no opportunity to do consents, so all consents must be done ahead of time.
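Because there's never a refresh token in this grant, a service typically caches the access token and simply asks for a new one shortly before it expires. Here's a hedged Python sketch of that pattern. The class and function names are mine, the secret-based credential is an assumption, and fetch_token stands in for whatever actually performs the POST and returns the parsed JSON response with its access_token and expires_in fields.

```python
import time
from typing import Callable
from urllib.parse import urlencode


def build_client_credentials_body(client_id: str, client_secret: str, scope: str) -> str:
    """Form-encoded body for a client_credentials grant."""
    return urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,  # assumption: secret, not certificate
            "scope": scope,
        }
    )


class ClientCredentialsTokenCache:
    """Re-requests an access token when the cached one is about to expire."""

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        clock: Callable[[], float] = time.time,
        skew_seconds: int = 60,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._skew = skew_seconds
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        # No refresh token exists here, so "refreshing" is just asking again.
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            response = self._fetch_token()
            self._token = response["access_token"]
            self._expires_at = self._clock() + response["expires_in"]
        return self._token
```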
Resource owner password (ROPC) grant allows you to request tokens on behalf of a user by sending the user’s name and password to the token endpoint. You can imagine why this is a bad idea. It gets past many conditional access policies, it thwarts many security protections, and the app gets to know the user’s password. I’ve been in so many meetings where someone insisted they must have this because it allows them to own the native UI for the login experience. I try very hard to discourage anyone from using this grant type.
Device flow grant is designed for input-constrained devices, where the device is unable to securely capture user credentials. This flow shows you a code, which you enter on a different, more capable device to sign in. This flow is typically used by IoT devices and can request both identity and API resources. I’m not a fan of this grant type because authentication is being performed on a device that’s different from the device where the tokens are sent. This means that all the protections are applied on one device while the tokens are sent to another, effectively defeating many of those protections.
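For completeness, here's roughly what the constrained device does while you're typing that code in somewhere else: it polls the token endpoint until the IdP says the sign-in finished. This Python sketch follows the error codes defined in the device authorization grant standard (RFC 8628). The function name is mine, and post_form is a stand-in for whatever HTTP helper you use; it's assumed to POST a form and return the parsed JSON response as a dictionary.

```python
import time
from typing import Callable

DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def poll_for_tokens(
    post_form: Callable[[str, dict], dict],
    token_url: str,
    client_id: str,
    device_code: str,
    interval: int,
    expires_in: int,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the token endpoint until the user finishes signing in elsewhere."""
    waited = 0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        response = post_form(
            token_url,
            {
                "grant_type": DEVICE_GRANT,
                "client_id": client_id,
                "device_code": device_code,
            },
        )
        error = response.get("error")
        if error is None:
            return response  # tokens arrived
        if error == "authorization_pending":
            continue  # the user hasn't finished yet
        if error == "slow_down":
            interval += 5  # the IdP asked us to back off
            continue
        raise RuntimeError(f"device code flow failed: {error}")
    raise TimeoutError("the user did not finish signing in before the code expired")
```

Notice that nothing in this loop knows or cares which device the user authenticated on, which is exactly my complaint.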
I also see this particular flow being misused way more than it should. I see numerous desktop applications and lazy developers making up for shortcomings of their architecture by piggybacking this flow. In fact, I’ve literally seen some major applications where they give you a code, you authenticate somewhere else, they show you an access token, that you are required to copy paste somewhere else. I’m not going to name this application, but you know who you are and you suck. Your first sin is using device code where you didn’t have to. Your second sin is to show the access token as plain text to the user. Your third sin is to allow the access token to move from device to device and still be valid. Anyway, I’m getting carried away.