Joe Thomas | Writing a REST API with Dream

Writing a REST API with Dream

In the last few weeks I've been building a simple REST API using the Dream web framework. Writing APIs has been one major component of my paid programming work and I wanted to compare and contrast the experience of writing an API in a strongly typed functional programming language with the same workflow in a dynamically typed language like Python.

You can find the source code for this project here. Hopefully, this repo will be a useful resource for anyone who wants to look at a slightly larger example application that uses Dream. Though I'm still a beginner, I was surprised by how easy it was to complete familiar tasks in this framework. I'm optimistic about the future of web programming in OCaml!

This post provides a sort of "experience report" for working with Dream, Yojson, and Caqti. Obviously, these types of reports are highly subjective and you should run your own experiments too! I'll be comparing with my experience in working with Pyramid and SQLAlchemy, since those are the Python projects I've used most.

Finally, it's important to point out that Dream is in alpha (version 1.0.0~alpha2 as of this writing). This is an early version! Comparisons with frameworks like Pyramid or Flask are in some sense apples-and-oranges, because those projects have had a long time (and many engineering hours) to mature. Ultimately, the point of this post (and the repo that goes with it) isn't to make value judgements ("framework X is good, Y is bad"), but rather to understand how to translate concepts from one workflow to another.

Problem Definition

I decided to build an API for managing time series data. My requirements for the project were as follows:

The API will allow creating, reading, and deleting "sensors". A sensor is an abstract IoT device that measures a floating point quantity at a specific cadence (for example a weather station measuring average hourly windspeed).
Every sensor gets an API key that allows it to upload (POST) data to an endpoint.
A GET endpoint will allow users to retrieve the data a sensor generated between two dates.
The API sends and receives data formatted as JSON and must be backed by a PostgreSQL database.
The API should have login/logout endpoints for a (not yet built) frontend to use.
Finally, a user should only be allowed to access sensor data that belongs to them.

I chose these requirements because they cover a number of different read/write operations that require data formatting/parsing and tracking relationships between four different entities (users, keys, sensors, and readings).
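To make the four entities and the date-range requirement concrete, here is a rough sketch using only the standard library. The record shapes, field names, float timestamps, and the half-open interval are all assumptions made for illustration; they are not taken from the project's schema.

```ocaml
(* Hypothetical record shapes for the four entities; the field names
   are illustrative and not taken from the project's schema. *)
type user = { user_id : int; username : string }
type api_key = { key_id : int; uuid : string }
type sensor = { sensor_id : int; name : string; key : int; owner : int }

(* A reading may be missing, hence the option. Timestamps are assumed
   to be seconds since the epoch. *)
type reading = { sensor : int; timestamp : float; value : float option }

(* Readings for one sensor with start <= timestamp < stop. *)
let readings_between ~sensor_id ~start ~stop readings =
  List.filter
    (fun r ->
      r.sensor = sensor_id && r.timestamp >= start && r.timestamp < stop)
    readings
```

In the real API this filtering would happen in a SQL WHERE clause rather than in memory; the function above only pins down the intended semantics.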

Dependencies and Setup

For this build, I needed three core dependencies:

1. A server to handle web requests (Dream).
2. Support for parsing and synthesizing JSON records.
3. A library for integrating with PostgreSQL.

After a bit of research, I settled on Yojson and Caqti for (2) and (3) respectively. Both of these are fairly standard and appear in the excellent corpus of examples that accompany Dream. Since I had previously written an example of how to use Dream with a dockerized PostgreSQL container, I used that as a starting point.

It's useful to have a way to run (and re-run) ad-hoc requests against the API during development. I used Insomnia for this, but other tools like curl would work too.

Adding an endpoint

To give a sense of my development workflow, I want to walk through my process for adding a simple endpoint, /api/login. Inside the router, that endpoint looks like:

Dream.post "/api/login" login;

In this excerpt login is a "handler". Handlers are responsible for translating requests into responses; this is captured in their type signature.
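In Dream, a handler has the type request -> response promise. The sketch below models that idea with plain standard-library types to show how a router dispatches to handlers. The request and response records, the route function, and the version string are simplified stand-ins invented for illustration, and the promise is left out for brevity.

```ocaml
(* Simplified stand-ins: Dream's real request and response types are
   abstract, and real handlers return a promise of a response. *)
type request = { meth : string; path : string }
type response = { status : int; payload : string }

(* A handler turns a request into a response. *)
type handler = request -> response

(* A hypothetical handler in the spirit of the /version endpoint. *)
let version : handler = fun _request -> { status = 200; payload = "0.1.0" }

(* A toy router: the first route matching method and path wins. *)
let route (routes : (string * string * handler) list) : handler =
 fun request ->
  match
    List.find_opt
      (fun (m, p, _) -> m = request.meth && p = request.path)
      routes
  with
  | Some (_, _, h) -> h request
  | None -> { status = 404; payload = "" }

let app = route [ ("GET", "/version", version) ]
```

Because the router is itself a handler, the whole application has the same type as any single endpoint.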

Conceptually, the login handler needs to do three things:

Receive a JSON body and confirm it's formatted as expected.
Look up the relevant user/password combination in the database.
Load the relevant session information if the credentials are valid.

For the first part, the endpoint needs to receive a JSON payload that looks like {"username": "...", "password": "..."}. To describe that payload, I introduced a simple record type:

type login_doc = {
  username : string;
  password : string;
} [@@deriving yojson]

Adding [@@deriving yojson] means the compiler can generate functions to convert these records to and from JSON. In this case those functions are called login_doc_of_yojson and yojson_of_login_doc.
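To give a feel for what the derived functions do, here is a hand-written approximation that depends only on the standard library. The json type is a local stand-in for a small subset of Yojson.Safe.t, and the function names and error messages are invented for this sketch; the code the ppx actually generates differs in detail and raises its own exception type.

```ocaml
(* A local stand-in for a small subset of Yojson.Safe.t. *)
type json = [ `Assoc of (string * json) list | `String of string | `Null ]

type login_doc = { username : string; password : string }

(* Roughly what yojson_of_login_doc does: record -> JSON object. *)
let json_of_login_doc (d : login_doc) : json =
  `Assoc [ ("username", `String d.username); ("password", `String d.password) ]

(* Roughly what login_doc_of_yojson does: JSON object -> record,
   raising if a field is missing or has the wrong type. *)
let login_doc_of_json (j : json) : login_doc =
  let field name fields =
    match List.assoc_opt name fields with
    | Some (`String s) -> s
    | _ -> failwith ("missing or non-string field: " ^ name)
  in
  match j with
  | `Assoc fields ->
    { username = field "username" fields; password = field "password" fields }
  | _ -> failwith "expected a JSON object"
```

The important property is that the record type is the single source of truth: changing a field changes both converters at once.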

My login handler looked roughly like this:

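(* Parse the request body with json_parser and pass the result to handler.
   Respond with 400 Bad Request if the body cannot be parsed.
   error_doc and json_response are not shown in this excerpt. *)
let json_receiver json_parser handler request =
  let%lwt body = Dream.body request in
  let parse =
    (* Both the JSON parser and the derived converter raise on bad input. *)
    try
      Some (body
      |> Yojson.Safe.from_string
      |> json_parser)
    with _ ->
      None
  in
  match parse with
  | Some doc -> handler doc request
  | None ->
    { error = "Received invalid JSON input." }
    |> yojson_of_error_doc
    |> json_response ~status:`Bad_Request


let login =
  let login_base login_doc request =
    (* Look up the user's primary key for this username/password pair. *)
    let%lwt user_id = Dream.sql request
        (Models.User.get login_doc.username login_doc.password) in
    match user_id with
    | Some id ->
      (* Start a fresh session, then record who is logged in. *)
      let%lwt () = Dream.invalidate_session request in
      let%lwt () = Dream.put_session "user" (Int.to_string id) request in
      Dream.empty `OK
    | None -> Dream.empty `Forbidden
  in
  json_receiver login_doc_of_yojson login_base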

If a user uploads an invalid JSON body, Yojson.Safe.from_string or login_doc_of_yojson will raise an exception. By default, this produces a 500 server error response. To handle this situation more gracefully, I introduced json_receiver. If the parser passed to json_receiver succeeds on the request body, the result is passed to the inner handler. Otherwise, the server responds with a 400 Bad Request. Re-using json_receiver across my endpoints allows me to avoid introducing a try/with block any time I need to handle JSON input.

I decided to manage models in a separate library, Models. The build tool, dune, made this easy to do; I just needed separate dune files for the server/model and server/bin folders. I like this design because it simplifies re-use of the database portion of my project. For example, if I later needed to build a CLI admin tool, that tool could utilize my existing queries without being exposed to API concerns.

Inside User, the get query function is defined as:

let get =
  let query =
    R.find_opt T.(tup2 T.string T.string) T.int
      "SELECT id FROM app_user WHERE username = ? and password = ?" in
  fun username password (module Db : DB) ->
    let%lwt user_or_error = Db.find_opt query (username, password) in
    Caqti_lwt.or_fail user_or_error

Caqti allows us to define a function that consumes our query parameters (the username and password) along with a database connection, and produces a promise that will resolve with the results of the query. In this case, the types encode that the function produces an int option containing the user's primary key if the credentials are valid. (Note that it's not a good idea to store passwords in plain text like this; this is just for illustrative purposes.)

Session management, the final item that the endpoint needs to address, is handled by Dream. The framework allows sessions to be stored in cookies, memory, or the database; I opted for database sessions because I had used similar session backends in the past. Sessions allow us to store key/value pairs of strings across requests. I used the session just to store the logged-in user's ID, so that the API has easy access in later database queries.
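Because session values are strings, the user ID has to be converted on the way in and parsed again on the way out. The toy model below shows that round trip with a plain hash table; the session type and both function names are made up for this sketch and say nothing about how Dream stores sessions internally.

```ocaml
(* A toy session: a table of string keys and string values. *)
type session = (string, string) Hashtbl.t

(* Store the logged-in user's ID under the "user" key. *)
let put_user (s : session) (id : int) =
  Hashtbl.replace s "user" (Int.to_string id)

(* Values come back as strings, so the ID has to be parsed again.
   A missing or malformed entry yields None. *)
let current_user (s : session) : int option =
  Option.bind (Hashtbl.find_opt s "user") int_of_string_opt
```

Returning an option forces every caller to decide what to do when nobody is logged in.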

Thoughts on the Dream Router

The final router for my server looked like this:

let () =
  Dream.run ~interface:"0.0.0.0"
  @@ Dream.logger
  @@ Dream.sql_pool "postgresql://sensors@127.0.0.1/sensors"
  @@ Dream.sql_sessions
  @@ Dream.router [
    (* More endpoints here ... *)

    Dream.get "/version" version;
    Dream.post "/api/login" login;
    Dream.post "/api/logout" logout;

    Dream.scope "/api" [login_required] [
      Dream.get "/user/:user_id" user;
      (* More endpoints here... *)
    ];

    Dream.scope "/sensor" [] [
      Dream.post "/upload" @@ api_key_required sensor_upload;
    ]
  ]
  @@ Dream.not_found

I really appreciate how Dream makes it possible to get a concise overview of the API; in some Pyramid projects, I found this wasn't always possible. In fact, in Pyramid I sometimes struggled with subtle routing bugs that were not obvious until run time. Having the compiler validate the router in Dream was a welcome change.
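The router refers to login_required without showing it. In Dream, a middleware is a function from handler to handler, so a guard like this can wrap any route. The following is a minimal model of that idea using plain standard-library types; the record types, the 401 status, and the body of login_required are assumptions made for illustration and are not taken from the project.

```ocaml
(* Simplified stand-ins: a request carries its session as an
   association list, and a response is just a status code. *)
type request = { session : (string * string) list; path : string }
type response = { status : int }
type handler = request -> response
type middleware = handler -> handler

(* A guess at what a login_required middleware might do: reject the
   request unless the session holds a "user" entry. *)
let login_required : middleware =
 fun inner request ->
  match List.assoc_opt "user" request.session with
  | Some _ -> inner request
  | None -> { status = 401 }

(* Wrap a handler in a list of middlewares, outermost first, much as a
   scope applies its middlewares to each of its routes. *)
let scope (middlewares : middleware list) (h : handler) : handler =
  List.fold_right (fun m acc -> m acc) middlewares h

let user : handler = fun _ -> { status = 200 }
let guarded = scope [ login_required ] user
```

Since a guarded handler is still just a handler, the type checker treats protected and unprotected routes uniformly.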

At the moment, Pyramid makes it somewhat easier to handle access/permissions concerns compared to Dream. Working in Pyramid, it was relatively common for the routes to manage some amount of parameter validation and permissions. For example, given a path /user/123/article/abc456, the route (or "resource" in Pyramid terminology) would be responsible for:

Extracting the user and article IDs (123 and abc456).
Determining whether those records actually exist in the database, and passing them to the view/handler in a context record.
Validating that the requester has permissions to operate on those User and Article records.

This is discussed a bit more in the Pyramid Docs under URL traversal. Effectively, Pyramid resources mean that handler/view code downstream can focus on updating database records and/or rendering HTML/JSON without worrying about permissions concerns.
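The three responsibilities above can be modelled as a single function that either builds a context or explains why it could not. This is a toy OCaml model of the idea, not Pyramid's implementation: the types, the in-memory tables, and the ownership rule are all assumptions chosen to keep the example small.

```ocaml
type user = { user_id : int }
type article = { article_id : string; owner : int }
type context = { user : user; article : article }
type error = Bad_path | Missing | Forbidden

(* Toy in-memory tables standing in for the database. *)
let users = [ { user_id = 123 } ]
let articles = [ { article_id = "abc456"; owner = 123 } ]

(* Step 1: extract the IDs from /user/<id>/article/<id>. *)
let parse_path path =
  match String.split_on_char '/' path with
  | [ ""; "user"; uid; "article"; aid ] ->
    Option.map (fun uid -> (uid, aid)) (int_of_string_opt uid)
  | _ -> None

(* Steps 2 and 3: load both records, then check that the requester
   is the user in the path and owns the article. *)
let resolve ~requester path : (context, error) result =
  match parse_path path with
  | None -> Error Bad_path
  | Some (uid, aid) -> (
    let user = List.find_opt (fun u -> u.user_id = uid) users in
    let article = List.find_opt (fun a -> a.article_id = aid) articles in
    match (user, article) with
    | Some user, Some article ->
      if requester = user.user_id && article.owner = user.user_id then
        Ok { user; article }
      else Error Forbidden
    | _ -> Error Missing)
```

A handler that receives an Ok context can go straight to its real work, which is the separation of concerns the resource system provides.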

I didn't want to build a permissions system for my API, so instead I incorporated user IDs into my function signatures in Models and used inner joins to model permissions. For example, here is a query that fetches the metadata for a single sensor, but only if that sensor belongs to the given user:

SELECT s.name, s.description, k.uuid
FROM sensor s
INNER JOIN user_sensor us
 ON us.sensor = s.id AND us.app_user = $1 AND us.sensor = $2
INNER JOIN api_key k
 ON s.api_key = k.id

By joining against user_sensor, I ensure that a user only gets data for the sensors they own.
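One consequence of this design is that a sensor owned by someone else looks exactly like a sensor that does not exist: in both cases the query returns no rows. The in-memory model below mirrors the join against user_sensor to show that behaviour. The sample data and the find_sensor name are invented, and the api_key join is omitted to keep the sketch short.

```ocaml
type sensor = { id : int; name : string; description : string }

(* One row per ownership pair, like the user_sensor table. *)
type user_sensor = { app_user : int; sensor : int }

let sensors =
  [ { id = 1; name = "wind"; description = "hourly windspeed" };
    { id = 2; name = "temp"; description = "hourly temperature" } ]

let user_sensors =
  [ { app_user = 10; sensor = 1 }; { app_user = 20; sensor = 2 } ]

(* Mirror the INNER JOIN: a sensor is returned only when an ownership
   row links it to the requesting user ($1) and it has the ID ($2). *)
let find_sensor ~user_id ~sensor_id =
  List.find_opt
    (fun s ->
      s.id = sensor_id
      && List.exists
           (fun us -> us.app_user = user_id && us.sensor = s.id)
           user_sensors)
    sensors
```

Here find_sensor ~user_id:10 ~sensor_id:2 and find_sensor ~user_id:10 ~sensor_id:99 both give None, so a handler can answer with a 404 in either case without revealing which sensors exist.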

I should point out that Dream allows us to use a custom router, however! The project has an issue for more elaborate routing and Ulrik Strid has introduced dream-routes, which allows additional type information to be encoded in routes. So, writing a more sophisticated router that behaves like Pyramid's resource system is certainly possible.

Comparing Caqti and SQLAlchemy

As a web programmer, especially one working on an analytics project, it's important to get comfortable working with the database library you've chosen for your project. Indeed, a significant portion of this project consisted of familiarizing myself with Caqti.

Caqti is a bit different from SQLAlchemy. With SQLAlchemy, you typically define one class for each table. Then, using SQLAlchemy's metaprogramming features, those classes are used to define queries. For comparison, here are equivalent queries against the user table:

Caqti

let get =
  let query =
    R.find_opt T.(tup2 T.string T.string) T.int
      "SELECT id FROM app_user WHERE username = ? and password = ?" in
  fun username password (module Db : DB) ->
    let%lwt user_or_error = Db.find_opt query (username, password) in
    Caqti_lwt.or_fail user_or_error

SQLAlchemy

db.session.query(User) \
    .filter(
        User.username == username,
        User.password == password
    ).one_or_none()

Working with Caqti, a mistake in the syntax of a SQL query won't be discovered until runtime. I made fewer mistakes like this with SQLAlchemy because most queries are written in Python and can be linted by the IDE. Another recurring point of confusion for me with Caqti had to do with matching the function on the Caqti_request (R.find_opt above) with the function invoked in Db (Db.find_opt); this is how we indicate that the query produces zero, one, zero/one, or multiple rows. If the two don't agree, the program won't compile; initially I struggled to understand what this compiler error meant:

Error: The function applied to this argument has type
         ?env:(Caqti_driver_info.t -> string -> Caqti_query.t) ->
         ?oneshot:bool ->
         ('a, unit, [< `Many | `One | `Zero > `Zero ]) Caqti_request.t
This argument cannot be applied without label
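The error comes from the third type parameter of Caqti_request.t, which records how many rows a request may produce. A stripped-down model of the mechanism, using a phantom type parameter and nothing beyond the standard library, looks like this. The request record and the run_* functions are invented for the sketch and are not Caqti's API; the rows are passed in directly instead of coming from a database.

```ocaml
(* A request tagged with a phantom parameter 'm that says how many
   rows it may produce. The tag exists only at the type level. *)
type 'm request = { sql : string }

let find sql : [ `One ] request = { sql }
let find_opt sql : [ `Zero | `One ] request = { sql }
let collect sql : [ `Zero | `One | `Many ] request = { sql }

(* Each runner accepts only requests whose tag it can handle. *)
let run_find (_ : [< `One ] request) (rows : 'a list) : 'a =
  match rows with [ r ] -> r | _ -> failwith "expected exactly one row"

let run_find_opt (_ : [< `Zero | `One ] request) (rows : 'a list) : 'a option =
  match rows with
  | [] -> None
  | [ r ] -> Some r
  | _ -> failwith "expected at most one row"

let run_collect (_ : [< `Zero | `One | `Many ] request) (rows : 'a list) = rows

let q = find_opt "SELECT id FROM app_user WHERE username = ?"

(* run_find_opt q and run_collect q are accepted. run_find q is rejected
   by the type checker, because q may yield zero rows. *)
```

Read this way, the compiler error is saying that the request's multiplicity is wider than the Db function is prepared to accept.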

Using Caqti became easier as I became accustomed to thinking in terms of prepared queries. About halfway through the project I realized I needed to switch from SQLite to PostgreSQL, and changing databases was fairly painless because Caqti supports both.

I also made use of Caqti's features for dealing with custom column types. Caqti does not have built-in support for JSON columns, but adding a custom column type to handle this was straightforward:

type readings = float option array [@@deriving yojson]

let sql_readings =
  let encode (a: readings) =
    Ok (a |> yojson_of_readings |> Yojson.Safe.to_string)
  in
  let decode text =
    (* Report malformed JSON as a decoding error instead of raising. *)
    try Ok (text |> Yojson.Safe.from_string |> readings_of_yojson)
    with exn -> Error (Printexc.to_string exn)
  in
  T.(custom ~encode ~decode string)

In my case, the JSON that I needed to store consisted of arrays of floating point numbers, so building the custom type was simply a matter of connecting Yojson and Caqti in the right way.
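The contract a custom column type has to satisfy is that decode undoes encode, and that bad input is reported as an Error value. The sketch below demonstrates that contract without any dependencies by using a deliberately simple hand-rolled serializer in place of Yojson. The text format handling is a toy (it accepts some non-JSON float spellings), and the error messages are invented.

```ocaml
type readings = float option array

(* Encode as a JSON-style array such as "[1.5,null,2]". *)
let encode (a : readings) : (string, string) result =
  let item = function None -> "null" | Some f -> Printf.sprintf "%.17g" f in
  Ok ("[" ^ String.concat "," (Array.to_list (Array.map item a)) ^ "]")

(* Decode the same format, reporting bad input as Error. *)
let decode (text : string) : (readings, string) result =
  let n = String.length text in
  if n < 2 || text.[0] <> '[' || text.[n - 1] <> ']' then
    Error "expected [...]"
  else
    let inner = String.trim (String.sub text 1 (n - 2)) in
    if inner = "" then Ok [||]
    else
      (* Some None is a null reading; None marks an unparseable item. *)
      let parse s =
        match String.trim s with
        | "null" -> Some None
        | s -> Option.map Option.some (float_of_string_opt s)
      in
      let items = List.map parse (String.split_on_char ',' inner) in
      if List.mem None items then Error "bad array element"
      else Ok (Array.of_list (List.filter_map Fun.id items))
```

With a real JSON library the two functions shrink to a few lines each, but the shape of the pair stays the same.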

Discussion

Ever since I started building web applications in Python, I've thought of an API server as a big function that transforms requests into responses, with any nontrivial state residing in databases or services like S3. Dream provides a modern web framework that matches my mental model for how a web server should work.

Web programming in OCaml "feels" a bit different than working in Python. In Python it's possible to get something running quickly, but it's also easy to forget corner cases and introduce defects that have to be addressed later in the application's lifecycle. An OCaml project requires some up-front investment, but this investment pays off later in several ways. First, fast feedback from the compiler (together with editor integrations like merlin and tuareg) helped me to identify issues earlier. Yojson made it simple to enforce that JSON requests and responses adhered to a fixed schema, and helped me process inputs more systematically. OCaml made refactoring safe and easy, so that I could confidently adapt my design as I introduced new requirements. In the right context, I think the advantages of using OCaml could significantly reduce the total lifecycle costs of maintaining many web applications without reducing the pace of new development.

References

I found the resources below helpful for working on this project:

The Dream API docs set a high standard for readability and helped me quickly get up to speed with the framework.
This section of the Dream repo has excellent examples, both for working with the framework and deploying it.
I referred to Bobby Priambodo's blog posts about interfacing Caqti and PostgreSQL as I wrote the models in my project.
The Caqti docs, especially for Caqti_type and Caqti_request, were helpful as I was writing queries.
There is also an example in the Caqti repo that is worth looking through.
The README on ppx_yojson_conv had helpful examples that I referred to while writing my own JSON-related types.

Feedback

If you ended up looking through the source code for my API, let me know what you thought! I'm interested in adding more tutorial resources to the OCaml ecosystem, so feel free to post a PR or issue to sensors if you have ideas about how to make these resources better.

posted at 00:00 · OCaml

Jul 12, 2021