
Allow authentication by JWT Bearer token #7826

Open
melton-jason wants to merge 19 commits into main from issue-5163


@melton-jason melton-jason commented Mar 18, 2026

Fixes #5163

This PR allows a new method of authenticating with the API. Specifically, this PR allows authentication via JWT Bearer tokens.

Previously, authenticating via the API required:

  • Sending a GET request to an endpoint that doesn't require clients to be logged in (e.g., /context/login/)
  • Extracting the CSRF token from the response's cookies
    • The token must be passed as an X-CSRFToken header with each unsafe request made to the backend
  • Sending a PUT request to /context/login/ with the user's name, password, and collection id

For example, the prior workflow can be modeled by something like the following Python sketch (using the requests library):

import requests

BASE_URL = "http://localhost"
session = requests.Session()

initial_resp = session.get(f"{BASE_URL}/context/login/")
# collections is the mapping of collection name to collection id
collections = initial_resp.json()["collections"]
my_collection_id = 4  # pick an id from the collections mapping
# We need to store the CSRF token for later.
# It must be passed along with every unsafe request (PUT, POST, DELETE)
# in the X-CSRFToken header.
csrf_token = initial_resp.cookies["csrftoken"]

login_resp = session.put(
    f"{BASE_URL}/context/login/",
    json={"username": "myuser", "password": "mypassword", "collection": my_collection_id},
    headers={"X-CSRFToken": csrf_token},
)

if login_resp.status_code != 204:
    raise SystemExit("Login failed: invalid credentials")

# Now the user is logged in.
# Note they still have to pass the CSRF token if they want to make an unsafe request.

# For example, to create a new John Doe Agent:
new_agent = session.post(
    f"{BASE_URL}/api/specify/agent/",
    json={"agenttype": 1, "lastname": "Doe", "firstname": "John"},
    headers={"X-CSRFToken": csrf_token},
)

With the new approach, users of the API only need to:

  • Know the ID of the Collection they wish to perform actions in (this can still be retrieved from the prior /context/login/ GET endpoint)
  • Send a POST request to /accounts/token/ with their username, password, and desired collection id to retrieve an access token
  • Send the access token within an Authorization header in future requests

Overview

Acquiring an Access Token

Access tokens can be acquired by sending a POST request to /accounts/token/ and passing the username, password, collectionid, and optionally expires.

If the request is successful, the access token is available under the access_token key of the response's JSON body.

By default, access tokens last 1800 seconds (30 minutes), but their lifespan can be configured (see Setting a token's lifespan below).

Example with curl:

> curl -d "username=myuser&password=mypass&collectionid=4" http://localhost/accounts/token/
{"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOjEsInVzZXJuYW1lIjoic3BmaXNoYWRtaW4iLCJjb2xsZWN0aW9uIjo0LCJqdGkiOiI2NWMzMmYwNy1hYTMzLTQxN2MtYjI2Ny02MDQwOGQyOTQ0ZjYiLCJpYXQiOjE3NzM5NDUxODIsImV4cCI6MTc3Mzk0Njk4Mn0.s3FTc9EeObiSmm9FLywlpdHkXMKiAob1QuVkW8pp3_o", "expires_in": 1800}

In the above case, the resulting access token is eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOjEsInVzZXJuYW1lIjoic3BmaXNoYWRtaW4iLCJjb2xsZWN0aW9uIjo0LCJqdGkiOiI2NWMzMmYwNy1hYTMzLTQxN2MtYjI2Ny02MDQwOGQyOTQ0ZjYiLCJpYXQiOjE3NzM5NDUxODIsImV4cCI6MTc3Mzk0Njk4Mn0.s3FTc9EeObiSmm9FLywlpdHkXMKiAob1QuVkW8pp3_o.
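
Because the access token is a JWT, its payload segment can be inspected on the client without knowing the signing key. The following is a minimal sketch that decodes (but does not verify) the payload of the example token above using only the standard library. The helper name decode_jwt_payload is illustrative and not part of this PR, and the claim names are simply those present in this sample token.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims from a JWT's payload segment WITHOUT verifying the signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# The example token from the curl response above, split across lines for readability.
EXAMPLE_TOKEN = (
    "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9."
    "eyJzdWIiOjEsInVzZXJuYW1lIjoic3BmaXNoYWRtaW4iLCJjb2xsZWN0aW9uIjo0"
    "LCJqdGkiOiI2NWMzMmYwNy1hYTMzLTQxN2MtYjI2Ny02MDQwOGQyOTQ0ZjYi"
    "LCJpYXQiOjE3NzM5NDUxODIsImV4cCI6MTc3Mzk0Njk4Mn0."
    "s3FTc9EeObiSmm9FLywlpdHkXMKiAob1QuVkW8pp3_o"
)

claims = decode_jwt_payload(EXAMPLE_TOKEN)
# The exp and iat claims are Unix timestamps; their difference is the lifespan.
print(claims["collection"], claims["exp"] - claims["iat"])
```

For this token, the difference between exp and iat is 1800 seconds, which matches the expires_in value in the response. Decoding like this is only useful for inspection; only the server can decide whether a token is valid.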

Example with Python requests:

import requests
session = requests.Session()

resp = session.post('http://localhost/accounts/token/', data={
    "username": "myuser",
    "password": "mypass",
    "collectionid": 4
})

response = resp.json()
# The token itself is stored under the access_token key
access_token = response["access_token"]

Setting a token's lifespan

By default, access tokens last 1800 seconds (30 minutes).
An access token's lifespan can be set by passing in an expires attribute when requesting the token. The backend expects expires to be in seconds.

Once an access token expires, it will not be usable and a new access token needs to be generated.
An access token can be made invalid regardless of its expiration time by revoking it (see Revoking an Access Token).

Example of generating an access token that's live for 5 minutes (300 seconds) with curl:

curl -d "username=myuser&password=mypass&collectionid=4&expires=300" http://localhost/accounts/token/

Example of generating an access token that's live for 5 minutes (300 seconds) with Python requests:

import requests
session = requests.Session()

session.post('http://localhost/accounts/token/', data={
    "username": "myuser",
    "password": "mypass",
    "collectionid": 4,
    "expires": 300
})
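
Since the token response also reports expires_in, a client can keep track of when its token will stop working and request a new one ahead of time. Below is a minimal sketch, assuming a response body shaped like the one shown earlier. IssuedToken, needs_renewal, and the 30-second safety margin are illustrative names and values, not part of this PR.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssuedToken:
    access_token: str
    expires_at: float  # absolute expiry time, in seconds since the epoch

    @classmethod
    def from_response(cls, body: dict, issued_at: Optional[float] = None) -> "IssuedToken":
        """Build from the JSON body returned by /accounts/token/."""
        issued_at = time.time() if issued_at is None else issued_at
        return cls(body["access_token"], issued_at + body["expires_in"])

    def needs_renewal(self, now: Optional[float] = None, margin: float = 30.0) -> bool:
        """True once the token is within `margin` seconds of expiring."""
        now = time.time() if now is None else now
        return now >= self.expires_at - margin


# A token requested with expires=300 at (fake) time 1000.0 expires at 1300.0.
token = IssuedToken.from_response(
    {"access_token": "abc", "expires_in": 300}, issued_at=1000.0
)
```

A client could check token.needs_renewal() before each request and call /accounts/token/ again when it returns True, rather than waiting for a 401 response.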

Using an Access Token

Once an access token is generated, it can be used by passing it in the Authorization header of subsequent requests with the Bearer scheme.
In other words, the general form of the Authorization header should look like Authorization: Bearer <my_token>, where <my_token> is replaced with the access token.

Example of fetching the institutional hierarchy (Institution, Division, Discipline, Collection) for each Collection using curl:

> curl -H "Authorization: Bearer my_token" "http://localhost/api/specify_rows/institution/?fields=name,divisions__name,divisions__disciplines__name,divisions__disciplines__collections__collectionname"
[["University of Kansas Biodiversity Institute", "Ichthyology", "Ichthyology", "KU Fish Observation Collection"], ["University of Kansas Biodiversity Institute", "Ichthyology", "Ichthyology", "KU Fish Teaching Collection"], ["University of Kansas Biodiversity Institute", "Ichthyology", "Ichthyology", "KU Fish Tissue Collection"], ["University of Kansas Biodiversity Institute", "Ichthyology", "Ichthyology", "KU Fish Voucher Collection"]]

Example of creating a new agent using Python requests:

import requests

session = requests.Session()
my_collection_id = 4

resp = session.post("http://localhost/accounts/token/", data={
    "username": "myuser",
    "password": "mypassword",
    "collectionid": my_collection_id
})

token = resp.json()["access_token"]

session.post(
    "http://localhost/api/specify/agent/",
    json={"agenttype": 1, "lastname": "Doe", "firstname": "John"},
    headers={"Authorization": f"Bearer {token}"},
)

If the token is invalid, expired, or revoked, then Specify will return a 401 Unauthorized response with a WWW-Authenticate header indicating an invalid token:

> curl -I -H "Authorization: Bearer my_invalid_token" http://localhost/api/specify/collectionobject/
HTTP/1.1 401 Unauthorized
Server: nginx/1.29.6
Date: Thu, 19 Mar 2026 19:13:31 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
WWW-Authenticate: error="invalid_token", error_description="The access token is expired, revoked, or invalid"
Vary: Accept-Language
Content-Language: en-us
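
A client can use this header to tell a rejected token apart from other failures and decide to request a new token. The following is a rough sketch that parses the key="value" pairs from a header value like the one above. The function names are illustrative, and the regular expression only covers the simple quoted form shown here, not every form a WWW-Authenticate header may take.

```python
import re


def parse_auth_params(header_value: str) -> dict:
    """Extract key="value" pairs from a WWW-Authenticate header value."""
    return dict(re.findall(r'(\w+)="([^"]*)"', header_value))


def token_rejected(status_code: int, headers: dict) -> bool:
    """True if the response says the bearer token is expired, revoked, or invalid."""
    if status_code != 401:
        return False
    # A plain dict is case-sensitive; requests' response.headers is not.
    params = parse_auth_params(headers.get("WWW-Authenticate", ""))
    return params.get("error") == "invalid_token"


# Header value copied from the curl output above.
example_headers = {
    "WWW-Authenticate": 'error="invalid_token", '
    'error_description="The access token is expired, revoked, or invalid"'
}
print(token_rejected(401, example_headers))
```

With requests, this could be called as token_rejected(resp.status_code, resp.headers) after any API call.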

Revoking an Access Token

An access token can be made invalid by revoking it. To revoke a token, a POST request can be sent to /accounts/token/revoke/ where the request body includes the token to be revoked under an access_token key.
The client must be authenticated (whether via the previous session authentication or by access token) to make the request.

The same token that is being used to authorize the revocation request can itself be revoked. That is, a token can revoke itself.

Below is a snippet of Python that shows how to revoke an access token:

session.post(
    "http://localhost/accounts/token/revoke/",
    headers={"Authorization": f"Bearer {my_existing_token}"},
    data={"access_token": my_token_to_revoke},
)

If the token to be revoked is invalid or expired, a 400 Bad Request is returned by the server.
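
Because a token can revoke itself, a short-lived script can clean up after itself by revoking its own token when it finishes. One possible sketch is a context manager like the one below. self_revoking_token and its parameters are hypothetical names; post is expected to be any callable with a requests-style signature, such as session.post.

```python
from contextlib import contextmanager


@contextmanager
def self_revoking_token(post, base_url: str, token: str):
    """Yield `token`, then revoke it (authorized by itself) when the block exits."""
    try:
        yield token
    finally:
        # Runs even if the body raised, so the token does not outlive the work.
        post(
            f"{base_url}/accounts/token/revoke/",
            headers={"Authorization": f"Bearer {token}"},
            data={"access_token": token},
        )
```

Usage might look like `with self_revoking_token(session.post, "http://localhost", token) as t:` followed by the API calls that use t. This sketch ignores the revocation response; a stricter client could check it for the 400 Bad Request case described above.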

OpenAPI

If you need a refresher on the token endpoints, they are documented and available to try out on the instance's Operations API page (accessible via User Tools).

Screenshot 2026-03-19 at 3 13 48 PM

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests

TODO

  • (In this PR or in the future) Support passing a refresh token along with an authorization token when providing the access token to the client. Decrease/limit the lifespan of access tokens and instead allow refresh tokens to assign new access tokens to an "already authorized" client.

Testing instructions

In your testing, you can use any client that supports sending HTTP/HTTPS requests: curl, Postman, any supported programming language, etc.

  • Send a POST request to /accounts/token/ containing the username for the user you want to log in as, the password, and the desired collection

  • Ensure the access token is returned, and record the access token for use in future requests

  • Send a "safe" request (one with a GET method) that requires permissions (such as fetching a specific record or a collection of records) and set the Authorization header of the request to Bearer <my_token>, replacing <my_token> with your access token

  • Ensure the request can be fulfilled and the correct data is returned

  • Send an "unsafe" request (one with a POST, PUT, or DELETE method, such as creating, updating, or deleting a record) and set the Authorization header of the request to Bearer <my_token>, replacing <my_token> with your access token

  • Ensure the request can be fulfilled and the requested operation successfully performed

  • Generate an access token with a short lifespan (time to live), such as 30 seconds, 1 minute, or 3 minutes

  • Wait for the token's lifespan to elapse so that it expires

  • Send a privileged request using the access token and ensure the request fails and the response has a 401 status code

  • Revoke an active access token, one that will still be unexpired when the next step is performed, using the /accounts/token/revoke/ endpoint

  • Send a privileged request using the revoked access token and ensure the request fails and the response has a 401 status code

  • Attempt to generate an access token to a collection that exists but that the user does not have access to

  • Ensure the server returns a 403 Forbidden response and does not generate the access token

@melton-jason melton-jason marked this pull request as ready for review March 19, 2026 18:28
@melton-jason melton-jason added this to the 7.12.1 milestone Mar 19, 2026
Member

@acwhite211 acwhite211 left a comment

Nice improvement to our API authentication 👍

Comment on lines +642 to +649
@require_POST
@csrf_exempt
def acquire_access_token(request):
    username = request.POST.get("username")
    password = request.POST.get("password")
    collection_id = request.POST.get("collectionid")
    raw_expires_in = request.POST.get("expires", DEFAULT_AUTH_LIFESPAN_SECONDS)

Member

Could we cap the expires parameter on the server side? Right now any positive value is accepted, so a client could technically create very long-lived bearer tokens. A max TTL setting or clamp might be a good idea.

Contributor Author

Yeah, from a strictly security standpoint I'd like to cap the TTL. Using access tokens as something like an API key is not recommended and should be actively discouraged.

I left it uncapped because the user still has to be very intentional if they want to issue an access token with a non-default, and possibly very long, lifespan.
Essentially: allow the client the flexibility to issue access tokens with any non-zero lifespan to fit their needs, but also place the burden on the client of knowing the security implications if they do issue a longer-than-reasonable access token.
Simple applications can become a little more complex if the lifespan is too short, because they have to worry about any request in their pipeline returning a 401 Unauthorized and then programmatically obtaining a new access token (though at that point we might recommend the prior session-based authentication if they don't want to deal with that complexity 🤷).

I was actually considering setting the default access token lifespan to 15 minutes rather than 30 (15 minutes seems to be the de facto standard for access token lifespan in the industry), but figured most Specify applications would be longer-lived, and we don't have a refresh token equivalent at the moment, so for the simplicity of (hopefully) most API clients, I left it longer than normal.

What do you think would be a reasonable cap on access token lifespan, given that we currently don't have a refresh token equivalent?

Comment on lines +233 to +245
RUN cat <<'EOF' > settings/secret_key.py
import os
import secrets

current_key = os.getenv('SECRET_KEY')

if current_key is None or current_key.strip() == "" or current_key.strip().replace(" ", "_") == "change_this_to_some_unique_random_string":
    new_key = secrets.token_hex(16)
else:
    new_key = current_key

SECRET_KEY = new_key
EOF
Member

Is it OK to be generating a random key at import time? Would this cause any problems with differing keys when restarting containers, or when Django is running in multiple processes (e.g., gunicorn -w 3)? Not sure; it might be fine.

@github-project-automation github-project-automation bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Mar 20, 2026

Projects

Status: Dev Attention Needed

Development

Successfully merging this pull request may close these issues.

Improve API authentication

2 participants