Jul 23, 2021
I recently learned the
OCaml ecosystem has most of the libraries needed to build email
features, but I wasn't able to find an example that brings all of
those pieces together. I wanted a simple example I could easily adapt
to build common account verification and password recovery workflows
in the Dream web framework. This post is my attempt to solve that
problem and explain what I learned in the process. The accompanying
code can be found
here.
Defining the problem
As an application developer, my main concerns when
sending mail are:
- Trying to ensure the address I'm sending to is valid.
- Handling the email synthesis process asynchronously, so it doesn't
block the web server from handling requests quickly.
- Transmitting the email content (SMTP or a REST API call)
- Monitoring that the applications messages aren't getting
blocked/dropped for some reason.
To test how easily I could satisfy these requirements, I built a web
app that lets the user send an arbitary email.
The app has two parts:
- A server process responsible for showing the form above and
validating user inputs.
- A "worker" process responsible for sending mail. The worker gets its
tasks from a queue populated by the server.
Validating Email Addresses
I decided to utilize the library
emile
for address
validation. An email address might be invalid for reasons outside of
the application's control (like the domain going way), but emile
at
least provides a way to check a given string conforms to 5(!) RFCs
relevant to parsing email addresses. Check out the
tests
to see the impressive diversity of valid email addresses on the web.
The library provides several different predicates for checking
addresses. I utilized Emile.of_string
, which generates a simple
result type, like this:
let form_post request =
match%lwt Dream.form request with
| `Ok ["address", address; "message", message; "subject", subject ] ->
let parse = Emile.of_string address in
let alert = match parse with
| Ok _ -> Printf.sprintf "Sent an email to %s." address
| Error _ -> Printf.sprintf "%s is not a valid email address." address
in
let queue_req = match parse with
| Ok _ -> queue_email address subject message
| Error _ -> Lwt.return_unit
in
queue_req >>= (fun _ ->
Dream.html (Template.show_form ~alert request))
| _ -> Dream.empty `Bad_Request
This way, if a user provides an invalid address, the form displays a simple
warning message and won't queue any email tasks.
Queuing Tasks with RabbitMQ
The second issue I needed to address is handling the email send
process with a background worker (queue_email
in the code excerpt
above). Sending an email can be a slow process, especially if you need
to make additional database queries or render images to produce a
customized email. That shouldn't block the web server from quickly
responding to the user's requests. Instead, the server should record
the email task in a queue, to be handled by a background worker that
isn't as sensitive to latency.
There are several different technologies that can be used to solve
this problem, for example RabbitMQ,
Redis
and SQS. I decided to use RabbitMQ
because I didn't want to tie my implementation to one cloud provider
and RabbitMQ is supported by
amqp-client on
OPAM. This isn't the only way to go: for Redis and SQS, you might have
success with the packages
redis or
aws-sqs, respectively.
The worker and server process share a common function for establishing
a connection to RabbitMQ:
let rabbit_connection () =
let%lwt connection = Connection.connect ~id:"dream" "localhost" in
let%lwt channel = Connection.open_channel ~id:"email" Channel.no_confirm connection in
let%lwt queue = Queue.declare channel ~arguments:[Rpc.Server.queue_argument] "email" in
Lwt.return (channel, queue)
I was initially little confused about why I needed a "channel" when I
already had a connection to the broker. Both the worker and the server
connect to the RabbitMQ Broker via TCP. It's common for clients to
need several connections to the broker, but it's costly to have many
TCP connections. RabbitMQ lets us avoid this cost with channels, which
are like "lightweight connections that share a single TCP
connection", according to these docs.
I created this function to submit messages to the queue:
type email = {
address: string;
subject: string;
text: string;
} [@@deriving yojson]
let queue_email address subject text =
let email = {address; subject; text} in
let text = email |> yojson_of_email |> Yojson.Safe.to_string in
let%lwt channel, queue = rabbit_connection () in
let%lwt _ = Queue.publish channel queue (Message.make text) in
Lwt.return_unit
Here the server writes the address, subject, and text into a record
type that can be serialized to JSON before being pushed into the queue.
The worker process, queue_worker
, is fairly simple to define:
let handle_message (m: Message.t) =
let text = snd m.message in
let email =
try
Some (text |> Yojson.Safe.from_string |> email_of_yojson)
with _ ->
Printf.printf "Received invalid email record from RabbitMQ: %s" text;
None
in
match email with
| Some e -> send_email e
| None -> Lwt.return_unit
let queue_worker () =
Printf.printf "Starting Queue Worker\n";
let%lwt channel, queue = rabbit_connection () in
let rec handle () =
Queue.get ~no_ack:true channel queue >>= fun message ->
let task = match message with
| Some m -> flush stdout; handle_message m
| _ -> Printf.printf "No new tasks, waiting.\n"; flush stdout; Lwt_unix.sleep 5.
in
task >>= handle
in
handle ()
The worker consists of an infinite loop. When the worker wakes and
finds tasks in the queue, it uses yojson
to deserialize the
messages and passes them to send_email
. If no work is available,
the worker just sleeps for 5 seconds.
Sending Email with cohttp
and Mailgun
In order to send mail, I set up a trial
Mailgun
account. I selected this provider for several reasons:
- Mailgun has a generous free tier (5000 emails).
- A new account also comes with a "sandbox" domain, so I didn't have
to deal with registering a domain.
- Mailgun supports both a REST API and SMTP, allowing me to try
different integrations.
- Mailgun provides dashboards for tracking messages sent over time as
well as emails that were rejected (bounced, marked as spam, etc.).
Mailgun's API allowed me to send an email by sending a POST with an API key and form
data, like this:
let send_email (e: email) =
let open Cohttp in
let open Cohttp_lwt_unix in
let api_key = Sys.getenv "MAILGUN_API_KEY" in
let sender = Sys.getenv "MAILGUN_SEND_ADDRESS" in
let api_base = Sys.getenv "MAILGUN_API_BASE" in
let params = [
("from", [sender]);
("to", [e.address]);
("subject", [e.subject]);
("text", [e.text])
] in
let cred = `Basic ("api", api_key) in
let uri = api_base ^ "/messages" |> Uri.of_string in
let headers = Header.add_authorization (Header.init ()) cred in
let start =
Printf.printf "Initiating email send to %s.\n" e.address;
Unix.gettimeofday () in
let%lwt resp, rbody = Client.post_form uri ~params ~headers in
let%lwt finish = Unix.gettimeofday () |> Lwt.return in
let%lwt rbody_str = (Cohttp_lwt.Body.to_string rbody) in
Printf.printf "API Response Status: %d.\n" (Code.code_of_status resp.status);
Printf.printf "API Response body %s.\n" rbody_str;
Printf.printf "Time to send an email: %.2fs\n" (finish -. start);
Lwt.return ()
The endpoint replies with either 200 and a JSON body like this:
{
"id": "<some mailgun email address here>"
"message": "Queued. Thank you."
}
or else a 400 and an explanation of why the request was rejected. In
my tests, a successful API request typically took between 0.5 and 1
seconds. For comparison, the webserver can serve the email form in
less than 200 microseconds, so the performance benefits to shifting
email tasks to a background queue are nontrivial.
Sending email with SMTP
Mailgun supports SMTP but they recommend using their API, especially
for sending large amounts of mail at once. I used
letters to test their SMTP
support with this script:
let body = Letters.Plain {|This is a test email body.|}
let send_email () =
let config = (Letters.Config.make
~username: Sys.getenv "MAILGUN_SMTP_USER"
~password: Sys.getenv "MAILGUN_SMTP_PASSWORD"
~hostname:"smtp.mailgun.org"
~with_starttls:true)
in
let sender = Sys.getenv "MAILGUN_SMTP_SENDER" in
let recipients = [Letters.To (Sys.getenv "EMAIL_ADDRESS")] in
let subject = "Test email from mailgun." in
let mail = Letters.build_email ~from:sender ~recipients ~subject ~body in
Printexc.record_backtrace true;
match mail with
| Ok message ->
(try%lwt
Letters.send ~config ~sender ~recipients ~message
with
e ->
Printf.printf "Error. %s\n" (Printexc.to_string e);
Printexc.print_backtrace stdout;
Lwt.return_unit
)
| Error _ -> Printf.printf "Message synthesis failed."; flush stdout; Lwt.return_unit
let () =
Lwt_main.run @@ send_email ()
I ran into some
difficulties because
Mailgun does not consistently follow RFC 4954 (thanks to Calascibetta
Romain for figuring out the problem and creating a
patch). Once I applied
the patch, I was able to send mail successfully.
I haven't fully explored the tradeoffs of using SMTP versus an
API. The main goal of my experiements was to confirm that if I needed
to integrate with a specific SMTP server, I knew how to do so.
Discussion
This example focused on email, but task queues and background workers
can be used to implement other important pieces of functionality for
web applications. For example, I've utilized
celery
in the past to generate weekly reports and schedule recurring ETL
jobs. This exercise was a nice opportunity to see how one would
construct a system similar to celery
in OCaml. It also provides a
simple illustration of how we can utilize queues to facilitate
horizontal scaling. If the volume of email requests grows too big for
one machine, we can put the web server and the worker on separate
instances or have multiple worker instances consuming from the email
queue.
I didn't cover error handling in this example, but that's clearly an
important piece of functionality to include when building a production
email system. At a minimum, it would be useful to add a retry
mechanism in the event that Mailgun is unavailable.
References
I found the resources below helpful for working on this project:
- The
amqp-client
documentation,
especially the
examples
were helpful for setting up with RabbitMQ.
- The client
tutorial
was helpful for getting started with
cohttp
. Later, I utilized
this
section
of the docs to understand that I needed to use post_form
instead
of post
when interacting with Mailgun's API.
Feedback
If you ended up looking through the source code for this example, let
me know how what you thought! I'm interested in adding more tutorial
resources to the OCaml ecosystem, so feel free to post a PR or issue
to
dream-email-example
if you have ideas about how to make these resources better.
Jul 12, 2021
In the last few weeks I've been building a simple REST API using the
Dream web framework. Writing APIs
has been one major component of my paid programming work and I wanted
to compare and contrast the experience of writing an API in a strongly
typed functional programming language with the same workflow in a
dynamically typed language like Python.
You can find the source code for this project
here. Hopefully, this repo will
be a useful resource for anyone who wants to look at a slightly larger
example application that uses Dream. Though I'm still a beginner, I
was surprised how easy it was to complete familiar tasks in this
framework. I'm optimistic about the future of web programming in
OCaml!
This post provides a sort of "experience report" for working with
Dream, Yojson, and Caqti. Obviously, these types of reports are highly
subjective and you should run your own experiments too! I'll be
comparing with my experience in working with
Pyramid and
SQLAlchemy, since those are the Python
projects I've used most.
Finally, it's important to point out that Dream is in alpha (version
1.0.0~alpha2
as of this writing). This is an early version!
Comparisons with frameworks like Pyramid or Flask are in some sense
apples-and-oranges, because those projects have had a long time (and
many engineering hours) to mature. Ultimately, the point of this post
(and the repo that goes with it) isn't to make make value judgements
("framework X is good, Y is bad"), but rather to understand how to
translate concepts from one workflow to another.
Problem Definition
I decided to build a API for managing time series data. My
requirements for the project were as follows:
- The API will allow creating, reading, and deleting "sensors". A
sensor is an abstract IoT device that measures a floating point
quantity at a specific cadence (for example a weather station
measuring average hourly windspeed).
- Every sensor gets an API key that allows it to upload (POST) data to
an endpoint.
- A GET endpoint will allow users to retrieve the data a sensor
generated between two dates.
- The API sends and receives data formatted as JSON and must be backed
by a PostgreSQL database.
- The API should have
login/logout
endpoints for a (not yet built)
frontend to use.
- Finally, a user should only be allowed to access sensor data that
belongs to them.
I chose these requirements because they cover a number of different
read/write operations that require data formatting/parsing and
tracking relationships between four different entities (users, keys,
sensors, and readings).
Dependencies and Setup
For this build, I needed three core dependencies:
- A server to handle web requests (Dream).
- Support for parsing and synthesizing JSON records.
- A library for integrating with PostgreSQL.
After a bit of research, I settled on
Yojson and
Caqti for (2) and (3)
respectively. Both of these are fairly standard and appear in the
excellent corpus of
examples that
accompany Dream. Since I had previously written an
example
of how to use Dream with a dockerized PostrgreSQL container, I used
that as a starting point.
It's useful to have a way to run (and re-run) ad-hoc requests against
the API during development. I used Insomnia
for this, but other tools like curl
would work too.
Adding an endpoint
To give a sense of my development workflow, I want to walk through my
process for adding a simple endpoint, /api/login
. Inside the router,
that endpoint looks like:
Dream.post "/api/login" login;
In this excerpt login
is a "handler". Handlers are responsible for
translating requests into responses; this is captured in their type
signature.
Conceptually, the login
handler needs to do three things:
- Receive a JSON body and confirm it's formatted as expected.
- Look up the relevant user/password combination in the database.
- Load the relevant session information if the credentials are valid.
For the first part, the endpoint needs to receive a JSON payload that
looks like {"username": "...", "password": "..."}
. To describe that
payload, I introduced a simple record type:
type login_doc = {
username : string;
password : string;
} [@@deriving yojson]
Adding [@@deriving yojson]
means the compiler can generate functions
to convert these records to and from JSON. In this case those
functions are called login_doc_of_yojson
and yojson_of_login_doc
.
My login
handler looked roughly like this:
let json_receiver json_parser handler request =
let%lwt body = Dream.body request in
let parse =
try
Some (body
|> Yojson.Safe.from_string
|> json_parser)
with _ ->
None
in
match parse with
| Some doc -> handler doc request
| None ->
{ error="Received invalid JSON input." }
|> yojson_of_error_doc
|> json_response ~status: `Bad_Request
let login =
let login_base login_doc request =
let%lwt user_id = Dream.sql request
(Models.User.get login_doc.username login_doc.password) in
match user_id with
| Some id ->
let%lwt () = Dream.invalidate_session request in
let%lwt () = Dream.put_session "user" (Int.to_string id) request in
Dream.empty `OK
| None -> Dream.empty `Forbidden
in
json_receiver login_doc_of_yojson login_base
If a user uploads an invalid JSON body, this will cause
login_doc_of_yojson
to throw an exception. By default, this produces
a 500 server error response. To handle this situation more gracefully,
I introduced json_receiver
. If the parser passed to json_receiver
succeeds on the request body, the results are passed to the inner
handler. Otherwise, the server responds with a 400 Bad
Request
. Re-using json_receiver
across my endpoints allows me to
avoid introducing atry/with
block any time time I need to handle
JSON input.
I decided to manage models in a separate library, Models
. The
build tool, dune
, made this easy to do; I just needed separate
dune
files for the server/model
and server/bin
folders. I like
this design because it simplifies re-use of the database portion of my
project. For example, if I later needed to build a CLI admin tool,
that tool could utilize my existing queries without being exposed to
API concerns.
Inside User
, the get
query function is defined as:
let get =
let query =
R.find_opt T.(tup2 T.string T.string) T.int
"SELECT id FROM app_user WHERE username = ? and password = ?" in
fun username password (module Db : DB) ->
let%lwt user_or_error = Db.find_opt query (username, password) in
Caqti_lwt.or_fail user_or_error
Caqti allows us to define a function that consumes our query
parameters (the username and password) along with a database
connection, and produces a promise that will resolve with the results
of the query. In this case, the types encode that the function
produces an int option
containing the user's primary key if the
credentials are valid. (Note that it's not a good idea to store
passwords in plain text like this; this is just for illustrative
purposes.)
Session management, the final item that the endpoint needs to address,
is handled by Dream. The framework allows sessions to be stored in
cookies, memory, or the database; I opted for database sessions
because I had used similar session backends in the past. Sessions
allow us to store pairs key/value strings across requests. I used the
session just to store the logged-in user's ID, so that the API has
easy access in later database queries.
Thoughts on the Dream Router
The final router for my server looked like this:
let () =
Dream.run ~interface:"0.0.0.0"
@@ Dream.logger
@@ Dream.sql_pool "postgresql://sensors@127.0.0.1/sensors"
@@ Dream.sql_sessions
@@ Dream.router [
(* More endpoints here ... *)
Dream.get "/version" version;
Dream.post "/api/login" login;
Dream.post "/api/logout" logout;
Dream.scope "/api" [login_required] [
Dream.get "/user/:user_id" user;
(* More endpoints here... *)
];
Dream.scope "/sensor" [] [
Dream.post "/upload" @@ api_key_required sensor_upload;
]
]
@@ Dream.not_found
I really appreciate how Dream makes it possible to get a concise
overview of the API; in some Pyramid projects, I found this wasn't
always possible. In fact, in Pyramid I sometimes struggled with subtle
routing bugs that were not obvious until run time. Having the compiler
validate the router in Dream was a welcome change.
At the moment, Pyramid makes it somewhat easier to handle
access/permissions concerns compared to Dream. Working in Pyramid, it
was relatively common for the routes to manage some amount of
parameter validation and permissions. For example, given a path
/user/123/article/abc456
, the route (or "resources" in Pyramid
terminology) would be responsible for:
- Extracting the user and article IDs (
123
and abc456
).
- Determining those records actually exist in the database, and
passing them to the view/handler in a context record.
- Validating that the requester has permissions to operate on those
User and Article records.
This is discussed a bit more in the Pyramid Docs under URL
traversal.
Effectively, Pyramid resources mean that handler/view code downstream
can focus on updating database records and/or rendering HTML/JSON
without worrying about permissions concerns.
I didn't want to build a permissions system for my API so instead I
incorporated User IDs into my function signatures in Model
and used
inner joins to model permissions. For example, here is an example of a
query that fetches the metadata for all sensors belonging to a
particular user:
SELECT s.name, s.description, k.uuid
FROM sensor s
INNER JOIN user_sensor us
ON us.sensor = s.id AND us.app_user = $1 AND us.sensor = $2
INNER JOIN api_key k
ON s.api_key = k.id
By joining against user_sensor
, I ensure that a user only gets data
for the sensors they own.
I should point out that Dream allows us to use a custom router,
however! The project has an
issue for more elaborate
routing and Ulrik Strid has introduced
dream-routes, which
allows additional type information to be encoded in routes. So,
writing a more sophisticated router that behaves like Pyramid's
resource system is certainly possible.
Comparing Caqti and SQLAlchemy
As a web programmer, especially one working on an analytics project,
it's important to get comfortable working with the database library
you've chosen for your project. Indeed, a significant portion of this
project consisted of familiarizing myself with Caqti
.
Caqti is a bit different from SQLAlchemy. With SQLAlchemy, you
typically define one class for each table. Then, using SQLAlchemy's
metaprogramming features, those classes are used to define
queries. For comparison, here are equivalent queries against the user
table:
Caqti
let get =
let query =
R.find_opt T.(tup2 T.string T.string) T.int
"SELECT id FROM app_user WHERE username = ? and password = ?" in
fun username password (module Db : DB) ->
let%lwt user_or_error = Db.find_opt query (username, password) in
Caqti_lwt.or_fail user_or_error
SQLAlchemy
db.session.query(User) \
.filter(
User.username == username,
User.password == password
).one_or_none()
Working with Caqti, a mistake in the syntax of a SQL query won't be
discovered until runtime. I made fewer mistakes like this with
SQLAlchemy because most queries are written in Python and can be
linted by the IDE. Another recurring point of confusion for me with
Caqti had to do with matching the function on the Caqti_request
(R.find_opt
above) with the function invoked in Db
(Db.find_opt
); this is how we indicate that the query produces zero,
one, zero/one, or multiple rows. If the two don't agree, the program
won't compile; initially I struggled to understand what this
compiler error meant:
Error: The function applied to this argument has type
?env:(Caqti_driver_info.t -> string -> Caqti_query.t) ->
?oneshot:bool ->
('a, unit, [< `Many | `One | `Zero > `Zero ]) Caqti_request.t
This argument cannot be applied without label
Using Caqti became easier as I became accustomed to thinking in terms of
prepared queries. About halfway through the project I realized I
needed to switch from SQLite to PostgreSQL, and changing databases was
fairly painless because Caqti supports both.
I also made use of Caqti's features for dealing with custom column
types. Caqti does not have built-in support for JSON columns, but
adding a custom column type to handle this was straightforward:
type readings = float option array [@@deriving yojson]
let sql_readings =
let encode (a: readings) =
Ok (a |> yojson_of_readings |> Yojson.Safe.to_string)
in
let decode text =
Ok (text |> Yojson.Safe.from_string |> readings_of_yojson)
in
T.(custom ~encode ~decode string)
In my case, the JSON that I needed to store consisted of arrays of
floating point numbers, so building the custom type was simply a
matter of connecting Yojson and Caqti in the right way.
Discussion
Ever since I started building web applications in Python, I've thought
of an API server as a big function that transforms requests into
responses, with any nontrivial state residing in databases or services
like S3. Dream provides a modern web framework that matches my mental
model for how a web server should work.
Web programming in OCaml "feels" a bit different than working in
Python. In Python it's possible to get something running quickly, but
it's also easy to forget corner cases and introduce defects that have
to be addressed later in the application's lifecycle. An OCaml project
requires some up-front investment, but this investment pays off later
in several ways. First, fast feedback from the compiler (together with
editor integrations like merlin
and tuareg
) helped me to identify
issues earlier. Yojson
made it simple to enforce that JSON requests
and responses adhered to a fixed schema, and helped me process inputs
more systematically. OCaml made refactoring safe and easy, so that I
could confidently adapt my design as I introduced new requirements. In
the right context, I think the advantages of using OCaml could
significantly reduce the total lifecycle costs of maintaining many web
applications without reducing the pace of new development.
References
I found the resources below helpful for working on this project:
- The Dream API docs set a high
standard for readability and helped me quickly get up to speed with
the framework.
- This section
of the Dream repo has excellent examples, both for working with the
framework and deploying it.
- I referred to Bobby Priambodo's blog
posts
about interfacing Caqti and PostgreSQL as I wrote the models in my project.
- The Caqti
docs,
especially for
Caqti_type
and Caqti_request
were helpful as I
was writing queries.
- There is also an
example
in the Caqti repo that is worth looking through.
- The README on
ppx_yojson_conv had
helpful examples that I referred to while writing my own
JSON-related types.
Feedback
If you ended up looking through the source code for my API, let me
know how what you thought! I'm interested in adding more tutorial
resources to the OCaml ecosystem, so feel free to post a PR or issue
to
sensors
if you have ideas about how to make these resources better.
Jul 01, 2021
This Summer I'm studying OCaml and systems programming at the Recurse
Center. Currently I'm reading the
delightful book Operating Systems: Three Easy
Pieces by Remzi
Arpaci-Dusseau and Andrea Arpaci-Dusseau. Besides great exposition,
the book includes several interesting projects, one of which is to
write a
shell.
The project materials, a set of shell scripts that define tests, are
language agnostic. I believe the authors intended for students to use
C, but I worked through the exercise in OCaml; you can find the final
result in this
repo. This blog post
is a short experience report; if you're interested in working on a
beginner OCaml project, hopefully it provides some useful scaffolding
for your studies.
Shell features
The shell I built (osh
) has a somewhat limited feature set. It:
-
Allows several commands to be run in parallel (e.g. program1 arg1 &
program2 arg2 arg3
)
-
Supports simple output redirection (e.g. echo "hello" >
hello.txt
). Both stdout
and stderr
go to the same file.
-
Has three builtin functions, cd
, exit
, and path
. path
allows
you to define a list of directories to search for executables; we
need this because environment variables aren't supported.
Other niceties, like tab completion and pipes, aren't supported. To
facilitate testing, the shell can run in a batch mode (osh <my file
of commands>
) as well as an interactive mode. A final simplifying
assumption is that osh
only provides a basic error message when
inputs are malformed.
Concepts
There are three main topics I studied while working on this project:
-
The Unix
module, especially fork
and
execv
. This
page, from
Xavier Leroy and Didier Rémy, was helpful for getting up to
speed. The man-pages were also useful.
-
The Stream
module. Streams
in OCaml are roughly analogous to generators in Python; they
allowed me to use a single type (a string Stream.t
) to represent
commands coming from an interactive session or a batch file.
-
The Angstrom
parser-combinator library. I used this library to determine whether
a given line of user input is well formed.
I suspect you could implement the parser without Angstrom, if you
really wanted to. I wanted to learn more about this library, and this
project seemed like a good place to try it out.
Although I previously worked through some simple monadic parser
exercises in Haskell, I struggled the most with Angstrom. Reading one
of the main mli
files
seemed to be the best way to get an overview of the combinators the
library provides. I also spent a fair amount of time experimenting in
utop
as I put my parser together.
Steps
Here is a rough outline for how I built my shell.
First, I defined a main function that would fetch a line of text from
stdin
and echo it back to the user. Then I added the ability to
represent either interactive or batch input with a Stream
:
let prompt_stream =
let f _ =
Printf.printf "osh> ";
Some (read_line ()) in
Stream.from f
let file_stream filename =
let in_channel = open_in filename in
let f _ =
try
Some (input_line in_channel)
with End_of_file ->
close_in in_channel;
None
in
Stream.from f
It's useful to do this part as early as possible, since batch input
allows you to run the tests.
Once I could get osh
to fail all 22 test cases, I started adding the
simpler built-in commands cd
and exit
. (Note ./test-osh.sh -c
will run all 22 tests, without stopping at the first one that fails.)
My first version of main
just treated user input as string list
:
let rec main stream =
let parse text = String.split_on_char ' ' text in
let process text =
match parse text with
| [] -> main stream (* Empty lines should just be treated as no-ops. *)
| [""] -> main stream
| ["exit"] -> exit 0
| ["cd"; x] -> Unix.chdir x
| "cd":: _ -> print_error ()
| x::xs ->
match Unix.fork () with
| 0 -> (* Child *)
Unix.execv x (Array.of_list xs)
| child_pid -> (* Parent *)
let _ = Unix.waitpid [] child_pid
in ()
in
Stream.iter process stream
This worked well enough for running simple commands like /usr/bin/ls
-lah
. In order to get output redirection and &
to work, I needed to
upgrade the program's parsing capabilities. I switched from
representing user input with a string list
to a variant type:
type exec = {
executable: string;
arguments: string list;
output: string option;
}
type line =
| NoOp
| ChangeDirectory of string
| PathChange of string list
| Executable of exec list
| Quit
I found the OCaml compiler (and its emacs integrations, merlin
and
tuareg
) very helpful for refactoring. When I revised my types to
make them better reflect the data I wanted to model, type errors
identified all of the places I needed to update. At this point, I
introduced an Angstrom parser and re-implemented support for cd
,
exit
, and path
. That allowed me to get familiar with the parser
combinators *>
, many
, and choice
, before attempting the more
complicated parsers needed to support &
and >
.
The type system protected me from a number of silly mistakes, but it
wasn't a silver bullet. I spent an hour confused about why one of my
parsers would go into an infinite loop before realizing I needed to
use many1
instead of many
. I think this is the sort of mistake I
would learn to avoid if I spent more time working with Angstrom.
Discussion
As I worked through this project, I started to appreciate how OCaml
chooses "good defaults" for my code (especially immutable data)
without forbidding me from working in an imperative mode when I need
to.
For example, I found that the function responsible for running a child
process was most natural to express with imperative language features:
let run_executable path {executable; arguments; output} =
let full_paths = List.map (fun p -> p ^ "/" ^ executable) path in
let executables = List.append full_paths [executable] in
let present = List.filter executable_present executables in
match present with
| [] -> print_error (); None
| x :: _ ->
match Unix.fork () with
| 0 -> (* Child *)
let out = match output with
| Some filename -> Unix.openfile filename [Unix.O_TRUNC; Unix.O_WRONLY; Unix.O_CREAT ] 0o640
| None -> Unix.stdout
in
let err = match output with
| None -> Unix.stderr
| _ -> out
in
Unix.dup2 out Unix.stdout;
Unix.dup2 err Unix.stderr;
Unix.execv x @@ Array.of_list @@ executable::arguments;
| child_pid -> (* Parent *)
Some child_pid
Writing this way made it easy for me to switch between C and OCaml; in
both languages, we follow a pattern of calling fork
, then execv
in
the child process. The fact that I can easily re-use my knowledge of
the UNIX APIs across different programming languages is one of the
main reasons I'm excited to learn more systems programming concepts!
Variants and pattern matching are features I miss when working in
languages like Python. These features make it so easy to look at a
function like:
let main stream =
let path = ref ["/bin"] in
let process text =
match Parser.parse_line text with
| Error _ -> print_error ()
| Ok (NoOp) -> () (* Empty lines should just be treated as no-ops. *)
| Ok (Quit) -> exit 0
| Ok (ChangeDirectory x) -> change_directory x
| Ok (PathChange paths) ->
let cwd = Unix.getcwd () in
path := List.map (amend_path cwd) paths
| Ok (Executable execs) -> run_executables !path execs
in
Stream.iter process stream
and quickly understand all of the different commands osh
supports.
Although it took me some time to get up and running with Angstrom, I
strongly prefer working with parser combinators to writing an
imperative parser (as I've had to do in Python and Java). Angstrom
makes it easy to build up small parsers and test them in isolation
before composing them into the larger parser you really need. If I
planned to develop a shell with more features, I would probably break
Parser
out into its own file with separate unit tests.
Feedback
If you give this project a try, let me know how it turned out! I'm
interested in adding more tutorial resources to the OCaml ecosystem,
so feel free to post a PR or issue to
simple-ocaml-shell
if you have ideas about how to make these resources better.
Jun 06, 2017
Map
is one of the first higher-order functions I remember
encountering when I learned some rudimentary functional programming
topics as an undergraduate. More recently I began learning OCaml. The
map
implementation
I found in the standard library was the textbook definition I was
expecting
let rec map f = function
[] -> []
| a::l -> let r = f a in r :: map f l
but I was surprised to learn there are actually several
implementations of map
in the OCaml ecosystem. I wanted to know more
about these different implementations and their performance
characteristics. This article summarizes what I learned.
Part of my motivation to write this post arose from
this discussion of a pull
request that proposes the default map
implementation in lwt
should
be tail recursive. After reading all of the comments, I wanted to
develop experiments that would convince me whether this was a good
idea.
Introduction
Since I'll introduce several versions of map
, I'll refer to the
implementation above as stdlib
. An important aspect of this version
is that it isn't tail recursive. A tail recursive function uses
recursion in a specific way; it only calls itself as the final
operation in the function. In OCaml, tail recursive functions get the
benefit of tail-call optimization so that they only use one frame on
the stack. In contrast, the stdlib
implementation takes space on the
stack proportional to the length of the input list. For long enough
lists, this causes a stack overflow.
We can make a basic tail-recursive version of map
by composing two
standard library functions that are tail recursive: List.rev_map
(which creates the output of map
, but in reversed order) and
List.rev
which reverses lists. I'll call this naive implementation
ntr
(naive tail recursive). Compared to stdlib
, ntr
allocates
twice as much space on the heap (one list for the output of
List.rev_map
, and a second equally long list for List.rev
) and
spends additional time performing a list traversal. The benefit is
that with this implementation, calling map
on a long list won't
result in a stack overflow.
As we'll see later in the Results section, the naive tail recursive
implementation is, overall, slower than the stdlib
implementation. However, there are some optimizations we can apply to
improve the performance of both versions.
The first trick I call "unrolling". Similar to
loop unrolling in procedural languages,
the idea is to use cases to map
on groups of list elements so fewer
function calls are needed to complete the work. For example, an
unrolled version of stdlib
looks like:
let rec map f l =
match l with
| [] -> []
| [x1] ->
let f1 = f x1 in
[f1]
| [x1; x2] ->
let f1 = f x1 in
let f2 = f x2 in
[f1; f2]
| x1 :: x2 :: x3 :: tl ->
let f1 = f x1 in
let f2 = f x2 in
let f3 = f x3 in
f1 :: f2 :: f3 :: map tl
We get to choose how many elements to unroll (here I picked 3); some
profiling is needed to decide the most appropriate number.
The second trick I call "hybridizing". The stdlib
version is faster,
but isn't safe for long lists. The tail recursive implementation is
slow, but safe for all lists. When we "hybridize" the two, we use the
faster version up to some fixed number of elements (e.g. 1000), and
then switch to the safe version for the remainder of the list.
These two optimizations are useful for understanding the
implementations of map
we find in other libraries. For example, both
base
and
containers
use an unrolled stdlib
-style map
, hybridized with an
ntr
-style map
.
Looking at ntr
, one might ask: Is it strictly necessary to use
additional heap space to get the implementation to be tail recursive?
The answer is no. The batteries
package
implements map
so that the work is done by a single tail recursive function, creating
one new list on the heap. To achieve this, the implementation uses
mutable state and casting to avoid a second list traversal
(specifically,
here).
As with any optimization, one makes tradeoffs between elegance,
performance, and the language features one considers acceptable to
use. I wanted to know:
Both base
and containers
decided to hybridize with an ntr
-style
map
for safety. But the batteries
implementation is also robust to
stack overflow. That led to one final question:
- How fast is an unrolled
stdlib
-style map
hybridized with a
batteries
-style map
?
Experimental Setup
In 2014, Jane Street introduced a
library for
microbenchmarking called core_bench
. As they explain
on their blog,
core_bench
is intended to measure the performance of small pieces of
OCaml code. This library helps developers to better understand the
cost of individual operations, like map
.
There are a couple benefits to measuring map
implementations with
core_bench
.
-
The library makes it easy to track both time and use of the heap.
-
Once you specify your test, core_bench
automatically provides
many command line
options
to help you present your test data in different ways.
-
The library uses statistical techniques (bootstrapping and linear
regression) to reduce many runs worth of data to a small number of
meaningful performance metrics that account for the amortized cost
of garbage collection and error introduced by system activity.
I wrote
this program
to compare performance between the six implementations I described
above. The program includes the batteries-hybrid I described above,
which I'll refer to as batt-hybr
. Other test details:
- I tested each algorithm against lists of lengths \(N=10^2, 10^3,
10^4\) and \(10^5\). On my system, \(N=10^5\) was the highest power of 10
for which the
stdlib
implementation did not fail due to a stack
overflow.
- Each list consisted of integer elements (all 0), and the function
mapped onto the list simply added one to each element.
- I ran these tests on a Lenovo X220, Intel Core i5-2520M CPU (2.50GHz ×
4 cores), with 4GB RAM.
- I used the OCaml 4.03.0 compiler, and compiled to native code
without using
flambda
. (This
makefile
gives the full specifications.)
Results
For each list size, core_bench
produces a table with several metrics besides time:
mWd
: Words allocated on the minor heap.
mjWd
: Words allocated on the major heap.
Prom
: Words promoted from minor to major heap.
The library also allows us to produce 95% confidence intervals and
\(R^2\) values for the time estimates.
Map Benchmark, N = 100
┌────────────┬──────────┬──────────┬───────────────┬─────────┬──────────┬──────────┬────────────┐
│ Name │ Time R^2 │ Time/Run │ 95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────┼──────────┼──────────┼───────────────┼─────────┼──────────┼──────────┼────────────┤
│ ntr │ 1.00 │ 741.20ns │ -0.23% +0.22% │ 609.01w │ 0.53w │ 0.53w │ 100.00% │
│ containers │ 1.00 │ 439.55ns │ -0.64% +0.67% │ 304.03w │ 0.17w │ 0.17w │ 59.30% │
│ batteries │ 1.00 │ 581.65ns │ -0.14% +0.16% │ 309.01w │ 0.35w │ 0.35w │ 78.47% │
│ base │ 1.00 │ 438.63ns │ -0.19% +0.20% │ 304.03w │ 0.16w │ 0.16w │ 59.18% │
│ stdlib │ 1.00 │ 610.45ns │ -0.10% +0.11% │ 304.01w │ 0.17w │ 0.17w │ 82.36% │
│ batt-hybr │ 1.00 │ 426.76ns │ -0.15% +0.17% │ 304.03w │ 0.17w │ 0.17w │ 57.58% │
└────────────┴──────────┴──────────┴───────────────┴─────────┴──────────┴──────────┴────────────┘
Map Benchmark, N = 1000
┌────────────┬──────────┬──────────┬───────────────┬─────────┬──────────┬──────────┬────────────┐
│ Name │ Time R^2 │ Time/Run │ 95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────┼──────────┼──────────┼───────────────┼─────────┼──────────┼──────────┼────────────┤
│ ntr │ 1.00 │ 8.84us │ -0.14% +0.15% │ 6.01kw │ 52.08w │ 52.08w │ 100.00% │
│ containers │ 1.00 │ 5.07us │ -0.13% +0.15% │ 3.00kw │ 17.23w │ 17.23w │ 57.34% │
│ batteries │ 1.00 │ 6.57us │ -0.14% +0.14% │ 3.01kw │ 34.71w │ 34.71w │ 74.32% │
│ base │ 1.00 │ 4.99us │ -0.20% +0.24% │ 3.00kw │ 17.08w │ 17.08w │ 56.44% │
│ stdlib │ 1.00 │ 6.84us │ -0.10% +0.10% │ 3.00kw │ 17.22w │ 17.22w │ 77.34% │
│ batt-hybr │ 1.00 │ 4.96us │ -0.13% +0.13% │ 3.00kw │ 17.07w │ 17.07w │ 56.10% │
└────────────┴──────────┴──────────┴───────────────┴─────────┴──────────┴──────────┴────────────┘
Map Benchmark, N = 10,000
┌────────────┬──────────┬──────────┬───────────────┬─────────┬──────────┬──────────┬────────────┐
│ Name │ Time R^2 │ Time/Run │ 95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────┼──────────┼──────────┼───────────────┼─────────┼──────────┼──────────┼────────────┤
│ ntr │ 0.99 │ 203.11us │ -0.45% +0.53% │ 60.01kw │ 5.40kw │ 5.40kw │ 100.00% │
│ containers │ 1.00 │ 144.95us │ -0.50% +0.60% │ 48.01kw │ 3.04kw │ 3.04kw │ 71.37% │
│ batteries │ 1.00 │ 136.87us │ -0.51% +0.63% │ 30.01kw │ 3.64kw │ 3.64kw │ 67.39% │
│ base │ 1.00 │ 130.85us │ -0.34% +0.39% │ 44.98kw │ 2.66kw │ 2.66kw │ 64.42% │
│ stdlib │ 1.00 │ 118.70us │ -0.30% +0.29% │ 30.00kw │ 1.80kw │ 1.80kw │ 58.44% │
│ batt-hybr │ 1.00 │ 106.91us │ -0.42% +0.52% │ 30.01kw │ 2.22kw │ 2.22kw │ 52.64% │
└────────────┴──────────┴──────────┴───────────────┴─────────┴──────────┴──────────┴────────────┘
Map Benchmark, N = 100,000
┌────────────┬──────────┬──────────┬───────────────┬──────────┬──────────┬──────────┬────────────┐
│ Name │ Time R^2 │ Time/Run │ 95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────┼──────────┼──────────┼───────────────┼──────────┼──────────┼──────────┼────────────┤
│ ntr │ 0.98 │ 10.27ms │ -2.39% +2.32% │ 600.02kw │ 411.83kw │ 411.83kw │ 94.33% │
│ containers │ 0.99 │ 10.62ms │ -2.19% +1.83% │ 588.02kw │ 404.91kw │ 404.91kw │ 97.61% │
│ batteries │ 1.00 │ 7.19ms │ -0.78% +0.79% │ 300.02kw │ 300.35kw │ 300.35kw │ 66.02% │
│ base │ 0.99 │ 10.88ms │ -1.54% +1.41% │ 584.99kw │ 397.60kw │ 397.60kw │ 100.00% │
│ stdlib │ 0.99 │ 6.33ms │ -1.08% +1.07% │ 300.01kw │ 171.73kw │ 171.73kw │ 58.18% │
│ batt-hybr │ 0.93 │ 6.09ms │ -4.30% +4.63% │ 300.02kw │ 285.46kw │ 285.46kw │ 55.93% │
└────────────┴──────────┴──────────┴───────────────┴──────────┴──────────┴──────────┴────────────┘
A few things I noticed about the data:
-
The tail recursive implementation is slow, but not always the
slowest. In the \(N=10^5\) case, ntr
,base
, and containers
show
similar performance; this makes sense given they have the same
behavior after a prefix of the list.
-
For short lists, base
, containers
, and batt-hybr
are fastest,
likely because of unrolling.
-
We can see from the mWd
and mjWd
columns that ntr
uses the
most heap space, as we would expect.
-
As we increase the size of the list, stdlib
gets faster relative
to ntr
; I suspect this can be attributed to the cost of garbage
collection.
-
Overall, batt-hybr
appears to have the best performance (though in
the \(N=10^5\) case, the confidence interval is large enough that we
can't conclude batt-hybr
is faster than stdlib
, and the \(R^2\)
value is a bit low).
Conclusions
The data above suggest three main findings:
-
The stack-based standard library implementation of map
is faster
than the naive tail recursive implementation, taking about 60-80%
of the time to do the same work depending on the size of the list.
-
There are clear benefits to both unrolling and
hybridizing. Hybridizing lets us take an implementation that
performs well on long lists (batteries
), and make it even better
by combining it with one that's fast on short lists, to produce
batt-hybr
.
-
Overall, the data show we don't have to give up safety (resistance
to stack overflow) to get better performance. A tail recursive
implementation can be quite competitive, albeit more complex.
Discussion
A follow up question to this analysis, in the context of the
discussion of Lwt_list.map_p
is: "How significant is the cost of
map
compared to other operations?" To partially address this, I
wrote
another program
that measures how long it takes to write 4096 bytes to a Unix pipe
using Lwt_unix.write
, and then read it back using Lwt_unix.read
. Using
core_bench
, I found:
Unix Pipe Read/Write Benchmark
┌─────────┬──────────┬───────────────┬─────────┬────────────┐
│Time R^2 │ Time/Run │ 95ci │ mWd/Run │ Percentage │
├─────────┼──────────┼───────────────┼─────────┼────────────┤
│ 1.00 │ 893.33ns │ -0.16% +0.18% │ 56.00w │ 100.00% │
└─────────┴──────────┴───────────────┴─────────┴────────────┘
It seems reasonable to estimate that Lwt_list.map_p
would primarily
be applied to IO operations like the ones in this experiment. In that
case, the data suggest the cost per element of applying the slowest
map
implementation (ntr
) is about 0.82% of the cost of one 4KB
read/write operation on a unix pipe. In the context of Lwt
, it seems
reasonable to conclude the implementation of map
isn't a significant
concern; the real cost is performing IO. In light of that, it seems
preferable to use an implementation that cannot cause a stack
overflow.
A second question that might be asked about this analysis is: "Was it
necessary to introduce the added complexity of core_bench
to measure
the performance of map
?" For example, one might set up a performance
test with just successive calls to Time.now
or
Unix.gettimeofday
. In an earlier draft of this article, I tried such
an
approach,
producing some misleading data. In that test, I did a full garbage
collection in between runs (applications of map
), which suppressed
that (nontrivial) cost and made the tail recursive implementation
appear quite fast. core_bench
does a much better job incorporating
the cost of garbage collection by varying the number of test runs
performed in a row and then establishing a trend (with linear
regression) that reflects the amortized cost of garbage collection per
run.
One interesting finding did arise from my earlier tests: for short
lists, allocating memory on the heap actually appears to be faster
than creating frames on the stack. Since the tests perform a full
garbage collection between test runs, the timings show only the cost
of allocating space on the heap. This produced data like the
following:
Summary Statistics, \(N=10^3\) Garbage Collected Map Benchmark Times (μs)
|
Std. Lib. |
NTR |
Base |
Batteries |
Containers |
Mean |
27.18 |
27.21 |
11.23 |
19.03 |
13.06 |
Min |
15.97 |
14.07 |
6.91 |
9.06 |
7.87 |
25% |
18.84 |
17.88 |
7.15 |
10.97 |
10.01 |
50% |
20.03 |
19.07 |
8.11 |
15.50 |
10.01 |
75% |
23.13 |
22.17 |
12.87 |
18.84 |
10.97 |
Max |
849.01 |
576.02 |
379.80 |
622.03 |
434.88 |
Summary Statistics, \(N=10^4\) Garbage Collected Map Benchmark Times (μs)
|
Std. Lib. |
NTR |
Base |
Batteries |
Containers |
Mean |
230.23 |
216.60 |
141.39 |
144.59 |
161.56 |
Min |
77.01 |
73.91 |
61.04 |
54.84 |
63.18 |
25% |
97.04 |
86.07 |
67.95 |
58.89 |
70.81 |
50% |
121.95 |
99.90 |
75.10 |
61.99 |
77.01 |
75% |
193.83 |
284.91 |
148.59 |
118.26 |
175.24 |
Max |
12719.87 |
2753.97 |
2960.21 |
3616.09 |
13603.93 |
When we ignore the time spent on garbage collection, we see that ntr
is surprisingly competitive, especially in comparison with stdlib
;
the difference between the two implementations, of course, is that
ntr
allocates more memory on the heap while stdlib
creates more
stack frames.
Notes
Thanks to Anton Bachin and
Simon Cruanes for reading drafts of this
post.