Writing my own static site generator

maurycyz.com2026年01月16日 00:00

In principle, a static site generator is a good idea: They automatically populate your homepage, index pages and RSS feeds, making it impossible to forget anything.

Unlike a CMS like Wordpress, they don't add runtime cost or security vulnerabilities: They run once to generate your site and are never exposed to the internet.

However, they all put weird restrictions on how you structure your site:

Nearly all of them require you to write in Markdown, which is common, but poorly specified and difficult to parse. As soon as you do anything more complex then bolding a few words, it becomes a struggle to get the parser to do the right thing.

I recently spent way too much time trying to figure out how to stop it from link-ifying something that looked like a URL.

In any case, the whole thing seems pointless: I'm writing a website, something that will be published exclusively on the web... so...

Why not just use HTML?

Converting from Markdown would make sense if HTML was super ugly, but it's not. Here's what bolding looks like in Markdown:

This is an **important point**

... and here's the same in HTML:

This is an <b>important point</b>

That might be harder to type on a phone keyboard, but if you're writing on a phone, you're not having a good time anyway.

There are some things that suck with HTML, like manual syntax highlighting, but that sucks in any language. Adding a whole markdown parser just to improve code blocks is overkill.

These translation layers are fragile: The original website still renders fine in modern browsers, but there's no shortage of Wordpress blogs where all the formatting broke after a server update... and not just a Wordpress problem.

Configuration:

The Markdown thing is somewhat justifiable: A lot of people are more comfortable with markdown then HTML.

... but they all use weird template formats, which are really entire programming languages shoved inside HTML: complete with questionable design decisions and without nearly enough documentation. As a result, it's very difficult to get these tools to do what you want:

I knew that something as simple as removing the publication date from pages that shouldn't have one (like the archives) would result in hours of googling cryptic error messages.

Using them doesn't feel like handcrafting a website. It feels like throwing arcane bullshit at the wall and seeing what sticks.

I'll just do it myself:

Each article on my site is an HTML file prefixed with a metadata block:

title Example page:
date 2021-02-11
edited 2025-01-21
tag electronics
tag radiation
---

Paragraph one.
.
Paragraph two 
<!-- snip -->
.
... the rest of the page

My program copies all files from the source tree into the destination directory. Everything except HTML is left unchanged.

When it sees an HTML file: The metadata block is parsed and saved for later. Everything else is added to a template with the navigation bar and styles and written to the output directory.

To make writing easier, the generator supports using a dot on it's own line:

.

... as a shorthand for paragraph breaks:

</p><p>

During this process, the generator also saves everything before the <!-- snip --> comment as an excerpt for the homepage. This manual because a good excerpt should be long enough get an idea of what the post is about... but not any longer.

Once done with the content, all the posts are sorted by date in reverse order: The first 5 are added to the homepage. The first 20 are included in the the RSS feed. All posts except space photos are included in the archive page. All 12 tags get their own index pages.

There's also my catalog page which is generated from objects tagged in photos I've taken: The index page has a section for each astronomical catalog (i.e. NGC), with links sorted by their number in that catalog. (i.e. 5194) Images of the same object are grouped together, and sorted by date taken. Everything shoulds to be on one page for easy searching and browsing.

This would be very difficult or even impossible to do in a templateing language, but is trivial with a custom site generator.

All of this is contained in a single C file: It has no configuration files, no external assets and no weird domain-specific-language. The code is ~700 lines, including the template HTML. It's something one person can understand and maintain.

I highly recommend doing this:

  • There will never be an update that breaks your site. Sure, you could stay on an old version, but it's only a matter of time before you find an annoying bug.

  • You have complete control: Want to write pages in Emacs Org-mode? LaTeX? AsciiDoc? Plain text? No problem.

  • Do you need something weird and very specific? Just add it.

It's not hard to write one: It took only a few hours of work to get mine to the point it could replace Hugo with it's 1000+ dependencies... and mine is written in C: Not exactly the friendliest language.

Switching:

The longest part was converting all my 100+ blog posts into the new format. This could have been automated, but I wanted the chance to freshen up the writing. (... and it still took less time then configuring Hugo.)

I started this blog to get better at writing: All the posts from before 2023 are bad.

Really, wanting to delete everything out of shame proves it's working. But since this is a blog, I don't have to delete the old stuff: I can just fix all the mistakes.

I made sure to keep all URLs and RSS <guid>-s the same as the old site, so most people shouldn't notice the change. If something broke, please send me an email:

[email protected]