Oct 20, 2021

What I learned publishing an open source package

Today I published my first package on OPAM! You can find tidy_email and its sub-libraries (tidy_email_mailgun, tidy_email_sendgrid, and tidy_email_smtp) here. This wrapped up a goal I wanted to accomplish while at Recurse Center: Writing a simple (but useful) OCaml library and making it available via opam.

I chose this project for several reasons:

I wanted more experience designing small APIs in OCaml.
I thought releasing a small library would help me understand how "open source maintainer" work differs from paid closed-source application development.
I wanted to learn more about the mechanics/process of packaging code for OPAM.

The rest of this post explains the process I used and which parts worked well. Most of this discussion is applicable to general open source development, but some of the tooling-related points are specific to OCaml.

Picking a Topic

When I started my batch at RC, I knew that I wanted to publish something, but hadn't settled on a topic. Contributing to other projects was a valuable source of ideas. After working on a couple of issues for the web framework Dream, I wrote a small blog post showing how to build a simple email feature with Mailgun. That started a discussion with other Dream contributors about how to simplify the implementation of email features.

After a few more conversations, I decided to write an "adapter" library that creates a consistent interface across different email services. Instead of having to study Mailgun or Sendgrid's REST API and worry about the details of making HTTP requests, you can install tidy_email, provide your API keys, and use a send function that consumes a simple email type. The library also makes switching email providers as simple as changing your credentials.
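As a rough illustration of the adapter idea, here is a minimal sketch in OCaml. Every name in it (the `email` record, the `provider` variant, `describe_request`, and `send`) is hypothetical and chosen for this example; it is not tidy_email's actual API, and the stub only describes the request a real backend would make instead of performing any network I/O.

```ocaml
(* Hypothetical sketch of an email "adapter"; these names are illustrative
   and are not taken from tidy_email's actual API. *)

(* A provider-independent description of a message. *)
type email = {
  sender : string;
  recipient : string;
  subject : string;
  body : string;
}

(* Each provider needs different credentials. *)
type provider =
  | Mailgun of { api_key : string; domain : string }
  | Sendgrid of { api_key : string }

(* Stand-in for the provider-specific HTTP request. A real backend would
   perform network I/O here; this stub only describes what it would do. *)
let describe_request provider email =
  match provider with
  | Mailgun { domain; _ } ->
    Printf.sprintf "POST to Mailgun (domain %s): %s -> %s"
      domain email.sender email.recipient
  | Sendgrid _ ->
    Printf.sprintf "POST to Sendgrid: %s -> %s" email.sender email.recipient

(* The single entry point callers use, whatever the provider. *)
let send provider email : (string, string) result =
  if email.recipient = "" then Error "recipient must not be empty"
  else Ok (describe_request provider email)

let () =
  let email =
    { sender = "a@example.com"; recipient = "b@example.com";
      subject = "Hi"; body = "Hello" }
  in
  (* Switching providers only changes the first argument. *)
  match send (Sendgrid { api_key = "hypothetical-key" }) email with
  | Ok msg -> print_endline msg
  | Error e -> prerr_endline e
```

The point of the design is that calling code depends only on `email` and `send`, so swapping one provider for another touches the credentials and nothing else.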

I intentionally kept the scope of the library very small, since this allowed me to complete the project in a short time frame while learning several new skills/systems.

Looking at "Prior Art"

Before I started implementing, I researched existing libraries that dealt with email. This took time but paid dividends by helping me to avoid re-inventing existing functionality. I learned more about letters, a library for sending email via SMTP and supporting libraries like mrmime for parsing and generating email. Later, I utilized some of these libraries while adding an SMTP option to tidy_email.

Investigating other work in the OCaml web development space helped me to articulate some of the design goals for my library. I looked at the web framework sihl, which has email among its many capabilities, and realized I wanted to make a very "slim" library that didn't couple sending email with other dependencies related to SQL, HTTP, etc.

Picking a Name

Naming things is hard! I spent some time looking through OPAM to see which names were already taken (emile, letters). Ultimately, I decided on these constraints:

The name needed to be short, easy to search and remember.
Ideally it would have only simple English words in it.
The name should relate to email in an obvious way.

I settled on tidy_email based on Dream's description as a "tidy, feature-complete web framework". The goal of the library would be to provide a "tidy" (i.e. well-designed, beginner-friendly) API for sending mail, and hopefully appeal to people who also like Dream.

One mistake I made early on was not sticking to a consistent naming scheme; I had tidy-email, tidy_email, and "tidy email" in different places. Fortunately, github makes it fairly easy to re-name a repo if you change your mind later; it will even redirect from the old repo name to the new one.

Writing a First Draft

Even though tidy_email is quite simple, it took me some time to set up all the resources.

Before I could get an account with Sendgrid, I had to register a domain and create a simple website. I used AWS for this because I was already familiar with it from past jobs. I also needed to register for Mailgun and create keys for API and SMTP access. At some point, I had accumulated enough credentials that I needed to set up password management to stay organized.

Once I had access to services, I was able to write a first draft of the library. This turned out to be harder than I expected! I wasted a lot of time over-complicating my code with new concepts I had learned (functors and higher order modules). Eventually I came to my senses and wrote a much simpler implementation.
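To make the trade-off concrete, the sketch below contrasts a functor-based design with a plain function that takes the backend as an ordinary value. It is a hypothetical illustration of the general pattern, not the code from either draft of tidy_email; the names `BACKEND`, `Make`, `backend`, and `Stub` are invented for this example.

```ocaml
(* Hypothetical illustration only; neither version is tidy_email's code. *)

type email = { recipient : string; subject : string; body : string }

(* Version 1: a functor. Each backend is a module, and callers must
   instantiate Make before they can send anything. *)
module type BACKEND = sig
  val name : string
  val deliver : email -> (unit, string) result
end

module Make (B : BACKEND) = struct
  let send email =
    if email.recipient = "" then Error (B.name ^ ": empty recipient")
    else B.deliver email
end

(* Version 2: a plain function. The backend is an ordinary record value
   that is passed in as an argument. *)
type backend = { name : string; deliver : email -> (unit, string) result }

let send backend email =
  if email.recipient = "" then Error (backend.name ^ ": empty recipient")
  else backend.deliver email

(* A stub backend that accepts every message without doing any I/O,
   written once for each version. *)
module Stub = struct
  let name = "stub"
  let deliver (_ : email) = Ok ()
end

module Stub_sender = Make (Stub)

let stub = { name = "stub"; deliver = (fun _ -> Ok ()) }
```

Both versions behave the same for this small case. The second needs no module signature or functor application, which is often enough when a library has only a handful of backends.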

I found keeping good notes to be quite helpful in this phase. Later, I turned these notes into documentation explaining how to set up mail services and some relevant trade-offs when choosing a service (tl;dr: Mailgun was the easiest, lowest-commitment way to get started).

Getting Feedback

My main advice is: Don't go it alone. Having someone (ideally a more experienced maintainer) review early drafts of work was really valuable for staying on track. This helped me:

Understand how new users would think.
Re-order things to make them easier for first-time users to understand.
Remove confusing or superfluous parts of my API.
Leverage github features I wasn't aware of (actions and secrets).

I also found it helpful to attend OCaml Cafe (this is currently meeting on Zoom). A lecture about dune given by Rudi Grinberg showed me some techniques that made managing multiple .opam files much easier.

Testing

I didn't keep a careful count of my hours but I estimate that more than half were spent writing unit and integration tests. Setting these up early made it much easier to iterate on my design and accept pull requests. Github actions were pretty straightforward to configure (specified with a simple YAML file) after looking at a few examples. I appreciated not having to sign up for yet another service in order to get a simple CI system working.

In tidy_email, an "integration test" is just a small executable that tries to send an email to a mailbox. I used github secrets to add API keys to my project so that integration tests could run securely on any PR/merge that I made. It would've been nice to be able to run integration tests on outside PRs automatically, but there isn't really a way to do that securely.
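A test executable of this kind might look roughly like the sketch below. It assumes the CI workflow exposes the secret to the process as an environment variable; the variable name `EMAIL_API_KEY`, the recipient address, and the `send_test_email` stub are all hypothetical stand-ins for a real call into the library.

```ocaml
(* Hypothetical integration-test executable. The variable name
   EMAIL_API_KEY and the send_test_email stub are assumptions made for
   this sketch, not part of tidy_email. *)

(* Stand-in for a real call into the library under test. *)
let send_test_email ~api_key ~recipient : (unit, string) result =
  if api_key = "" then Error "empty API key"
  else if recipient = "" then Error "empty recipient"
  else Ok ()

(* Returns the process exit code: 0 on success, 1 on any failure, so the
   CI job fails whenever the email cannot be sent. The lookup function is
   a parameter so the logic can be exercised without real secrets. *)
let run ~lookup =
  match lookup "EMAIL_API_KEY" with
  | None ->
    prerr_endline "EMAIL_API_KEY is not set; cannot run integration test";
    1
  | Some api_key ->
    (match send_test_email ~api_key ~recipient:"inbox@example.com" with
     | Ok () ->
       print_endline "integration test passed";
       0
     | Error msg ->
       prerr_endline ("integration test failed: " ^ msg);
       1)
```

In a real executable the entry point would be something like `let () = exit (run ~lookup:Sys.getenv_opt)`, so that a missing or rejected key makes the CI step fail instead of passing silently.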

Integration tests ended up being very advantageous for shaking out problems that new users are likely to encounter. For example, by testing with (intentionally) invalid SMTP credentials, I learned that Mailgun can terminate SMTP connections in a way that is hard for colombe to handle. Debugging this example gave me some valuable experience digging into tidy_email's dependencies and helped me file a good bug report to help improve colombe.

Documentation

I spent more time than expected writing documentation. The main lesson I learned is: Don't bury the lede. When people look at your repo, they have limited time. Therefore, the first things you should show them are

A simple, one sentence summary of what the library does.
A minimal working example (code) illustrating the library in action.

Everything that comes after this introduction should be in the service of getting the user set up as painlessly as possible. In my case, that meant adding sign-up links to the relevant services the library supports. I also provided a brief explanation of the trade-offs when choosing a service.

tidy_email is small enough that I didn't write documentation beyond a readme, comments in the code, and folders of working examples. I think for a project any larger than this, I would have had to invest some time thinking about how to generate and host a static site with documentation (e.g. ReadTheDocs).

Release

Publishing my code on opam was relatively painless. To sum up the process:

I ran dune build one final time and verified my opam files were up to date.
I created a "tag" for my project (git tag -a 0.0.1 && git push origin 0.0.1).
I ran "opam publish", which created a pull request on ocaml/opam-repository.
~12 hours later my PR passed CI and was approved.
About 6 hours after that, my PR was merged and showed up on opam.

My first attempt at publishing didn't pass CI because of some simple mistakes I made in my .opam files. Fixing these was relatively straightforward after reading the docs and looking at some other repos for reference.

Final Thoughts

None of the work involved with releasing tidy_email was especially hard, but there were many small tasks to understand. Compared to paid application development work, I found that this project had more "admin work": signing up for services, configuring things, and writing prose.

Overall, I found this type of work quite rewarding. It has been fun to see new people discover and star the repo on Github. The project motivated me to learn aspects of email that I would not have otherwise explored. Given how much open source software I use, it feels great to make a small contribution to the community.

posted at 00:00 · OSS, Projects

Jun 27, 2021

How I Contribute to Open Source Projects

Contributing to public projects is one of the most fulfilling aspects of my software development practice. It allows me to make new professional connections and gain exposure to challenges I would never encounter on the job. But it is effortful, especially after a full day of paid work. Over time, I've developed (mostly by trial and error) a process that makes it easier for me to fit open source work into my schedule and make this extra effort worthwhile.

Earlier in my career I wanted to contribute to public projects but didn't know how to begin. In this post, I've tried to provide a concise summary of the advice I wish I'd received long ago.

Why contribute?

Like most engineers, my paid work has been on years-long, closed-source projects driven by deadlines and business requirements. In that context, it's normal to build with the most standard components possible. If your Django app needs flash messages, you probably want to use the messages framework, rather than rolling your own solution; there just isn't much business value in writing something new. As a result, there are some design problems you don't get to spend much time thinking about while doing paid work.

Working on OSS projects lets me grow my skills beyond the boundaries imposed by business concerns. For instance, working on this issue let me build a flash message system for a new web framework, Dream. Within the structure of that project, I had an opportunity to study how Rails, Django, and Phoenix implement flash messages. Discussing the merits of these different approaches with more experienced engineers helped me become a better designer.

I've also found that OSS projects are an effective way to practice the investigative skills I use in the first few months of a new job. When I join an organization, I have to get up to speed on new codebases (sometimes in a language I haven't used before). Working on an unfamiliar open source project simulates these conditions, without the accompanying pressures of a new job.

How do I find new projects to work on?

Conferences focused on a specific programming language or family of technologies are a great way to meet maintainers. When I attended Compose 2017, a serendipitous conversation in between sessions led to a 3-4 month period of making contributions to lwt, learning a lot more about OCaml, and this blog post.

Conferences can be a bit of a gamble. The costs of travel, lodging, and fees are significant. Most conferences will post their talks later. Watching these is a great way to stay informed. This Strange Loop talk, for instance, gave me a good overview of the Unison programming language, which eventually allowed me to contribute this change. This approach is much more efficient for finding projects that spark your interest, since you can watch talks at 2x speed and skip ones that aren't useful.

Evaluating whether a project is a good fit

Once I've found a project (usually on github) that I think is interesting, there are a few things I check before deciding whether to contribute.

First, I look at whether the project has recent commits. If the most recent commit is more than a few months old, that might mean the project is abandoned. Since I'm interested in being a contributor rather than a maintainer, working on these repos is usually not a good use of time.

The second thing I look at is the project's pull requests. If there are many old, open pull requests (ones that don't have any comments), this is another sign that the project might be abandoned. Looking through closed PRs, or PRs with a lot of comments, typically gives me some idea of how contributors and maintainers interact. The maintainers I've interacted with have been friendly and welcoming; I avoid projects where that isn't the norm.

Next, I'll take a look at the project's issue list. If users and maintainers write clearly, it's much easier for me to make progress on an issue without having to ask too many questions. Issues with links to relevant code, minimal working examples, and repro steps are all good signs.

I also look to see whether the project has Travis or Circle CI set up, since that means I can get quick feedback on a PR without talking to anyone. The CI system is also helpful as a model for setting up my development environment; often I will look through the CI configuration in order to run my unit tests with the exact same databases/dependencies/settings.

Finally, I'll take a more in-depth look at the codebase itself. At this point, I'm just trying to answer basic questions, like:

How is the source tree organized?
Do most of the functions/classes/modules have comprehensible names?
Where are the tests? Where would I add new tests?
Are there comments?

Finding a task

Once I'm confident that I want to participate, my next step is to choose an open issue to work on. Ones that are marked "good first issue" or "beginner" are great for a first contribution, and I'll often start with those.

I try to keep the scope of my first contribution small. For example, this issue on the OCaml library pgx asked for someone to add support for a different date-time library (in this case ptime) because not everyone can use Jane Street's Core library. This was an ideal first issue because I could use the existing Core implementation as a model for what I needed to build.

If an issue is a little bit old, it's a good idea to leave a note asking whether the maintainer is still interested in having it addressed. For example, I was interested in working on this issue, but it was a month old. Rather than jumping into an implementation that might go to waste, I left a comment expressing interest and offering some design ideas. Replies in that thread allowed me to create this pull request, confident in the knowledge that someone would want to use my work.

Once I've picked out a task, it's (finally) time to implement a solution. If I can get as far as implementing some unit tests and getting most of them to pass, I post a comment on the issue indicating that I'm working on a pull request. Personally, I consider it best practice to only post one of these comments if I'm very confident that I'll have a PR ready in the next few days. On the other hand, no one wants to duplicate effort, so it's courteous to indicate that you're preparing a PR.

Pull request and review

Since pull requests take time to review, I try to perform at least one round of self-review before opening a PR. Generally, I won't open a PR unless I have fairly good test coverage for my work and the tests (and lint steps) are passing. At a paid position, it's easy to have a meeting or use slack to discuss a change. Public projects tend to be more asynchronous, so it's important to read instructions carefully, write clearly, and address obvious issues on your own.

When I write a description for my PR, I include a link to the issue it addresses, usually in the first sentence. If the change adds new types or functions, I include a summary of that as well. If there are design details I'm unsure of, I flag these with comments so that the reviewer(s) can discuss them with me.

The code review process on a public project can be a little different from code review inside a company. At some jobs, I've observed an unspoken rule that PRs need to be more or less correct on the first or second try, otherwise you'll be considered ineffective. My sense is that the review process on public projects is much more iterative. Sometimes it isn't possible to gather all of the necessary requirements until a first PR is drafted and people have something to react to. It can take several rounds of revision before all of the interested parties agree that the design is right.

This PR, for instance, required several tries to find a flash message implementation that fit with the philosophy of the project. These situations are an opportunity to practice discussing (and justifying) your design decisions with other technical-minded people. I think the key to success with public reviews is to read reviewers' comments carefully, make an effort to understand their goals, and then fit your changes to support those goals.

General suggestions

If you plan on making OSS contributions long term, I'd suggest having both a schedule and dedicated hardware for your work. Working after dinner from 6 to 8 PM weeknights helps me get into a productive mindset quickly. Time-boxing my work also helps me avoid exhausting my interest in a project by putting in too many consecutive hours. Using an inexpensive, second-hand laptop for my public projects makes it easy for me to suspend my work when I'm done for the night and then start the next day exactly where I left off, without having to set up my browser/editor/terminals again.

posted at 00:00 · OSS, Projects