Having Fun with Markdown and Remark

By Georges Haidar · Aug 2020 · 5 min read

I went down a rabbit hole of learning how to parse Markdown to add custom syntax to it. I had to go through a few projects and learn how to work with a new type of abstract syntax tree (AST). Previously, my experience with ASTs was limited to writing codemods using jscodeshift. In the end, it turned out to be a worthwhile investment of time! In this post, I'll go over what I learned using a problem I wanted to solve.

The problem

I wanted to introduce a new syntax to Markdown that lets authors add stylised subtitles to their documents. Using smaller headings to give the appearance of subtitles is not a great solution. It is not considered semantic HTML and it also hurts accessibility because it introduces extra, unintended headings to the document's accessibility tree.

[Image: a screenshot of a blog post that has a subtitle]

What I wanted was a new syntax that looked like this:

# I am a level 1 heading

#- I am a level 1 subtitle

##- I am a level 2 subtitle

######- Up to six levels are supported corresponding to heading levels

When rendered to HTML these should come out as:

<h1>I am a level 1 heading</h1>
<p class="subtitle subtitle--1">I am a level 1 subtitle</p>
<p class="subtitle subtitle--2">I am a level 2 subtitle</p>
<p class="subtitle subtitle--6">
  Up to six levels are supported corresponding to heading levels
</p>
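
The mapping above can be sketched as a plain function before any Markdown tooling is involved. This is only an illustration: `subtitleToHtml` is a hypothetical helper, it handles a single line, and it neither escapes HTML nor processes inline Markdown.

```javascript
// Hypothetical helper: turns one line of the proposed subtitle syntax into HTML.
// It ignores inline Markdown and does no HTML escaping, so treat it as a sketch.
function subtitleToHtml(line) {
  // One to six '#' characters, then '-', then at least one whitespace character.
  const match = line.match(/^(#{1,6})-\s+(.*)$/);
  if (!match) {
    return null; // not a subtitle line (e.g. a regular heading or paragraph)
  }
  const depth = match[1].length;
  return `<p class="subtitle subtitle--${depth}">${match[2]}</p>`;
}

console.log(subtitleToHtml("#- I am a level 1 subtitle"));
console.log(subtitleToHtml("# I am a level 1 heading"));
```

A regular heading such as `# I am a level 1 heading` does not match because the hashes are not followed by a dash, so the helper returns `null` for it.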

The ecosystem

Currently, the best tools for this task live in the Node.js ecosystem:

  • unified - an interface for parsing, inspecting, transforming, and serializing content through syntax trees
  • remark - a Markdown processor powered by plugins, part of the unified collective
  • mdast - a specification for representing Markdown in a syntax tree (see this example)
  • hast - a specification for representing HTML (and embedded SVG or MathML) as an abstract syntax tree. You can use rehype to parse HTML text into hast.
  • unist - a specification for syntax trees. mdast and hast are both unist-compliant syntax trees.
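
To make the unist and mdast entries more concrete, here is a hand-written approximation of the tree for the line `#- Hello __you__`. Real parser output also carries `position` data, which is omitted here, and `collectTypes` is a hypothetical helper that is not part of any of these libraries.

```javascript
// Hand-written approximation of the mdast tree for: "#- Hello __you__"
// (real parser output also includes `position` fields, omitted here).
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "#- Hello " },
        { type: "strong", children: [{ type: "text", value: "you" }] },
      ],
    },
  ],
};

// Hypothetical helper: every unist node has a `type`, and parent nodes have
// `children`, which is enough to walk any mdast or hast tree depth-first.
function collectTypes(node, types = []) {
  types.push(node.type);
  for (const child of node.children || []) {
    collectTypes(child, types);
  }
  return types;
}

console.log(collectTypes(tree).join(" > "));
```

Note that the line is parsed as a paragraph, not a heading, because the `#` is not followed by a space. That is why a plugin for this syntax looks at paragraph nodes.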

What do these tools look like in practice? The following code gives an idea of what a processing pipeline looks like:

// our Markdown parser that spits out mdast
const remark = require("remark");
// an mdast to html serializer
const html = require("remark-html");
// the plugin we'll write
const subtitlePlugin = require("./remark-subtitles");

const text = `
# Hello
###- How are __you__?
Great!`;

remark()
  .use(subtitlePlugin)
  .use(html)
  .process(text /* Markdown in */, function (err, file) {
    if (err) throw err;
    console.log(String(file)); /* HTML out */
  });

The solution

Starting from this, we can now write our plugin. The following snippet shows the plugin code with comments annotating the interesting parts.

// These `unist-util-*` utilities are super useful when working with unist
// syntax trees
const is = require("unist-util-is");
const visit = require("unist-util-visit");

// We're going to need this to convert some mdast nodes to hast nodes later on
const mdastToHast = require("mdast-util-to-hast");

// Our plugin's constructor function. This would receive configuration options.
module.exports = function subtitlePlugin() {
  // Plugins need to return a transform function that takes a unified-compatible
  // AST and manipulates or walks it.
  return async function transform(tree) {
    // Go through the Markdown document (in mdast form) and call my callback
    // whenever you see paragraph nodes.
    visit(tree, "paragraph", (paragraphNode) => {
      const { children } = paragraphNode;

      // Get the first child node under the paragraph and make sure it's a text
      // node. If it's not, skip processing this paragraph node.
      const textNode = children && children[0];
      if (!is(textNode, "text")) {
        return;
      }

      // Does this text node start with a sequence of hash ('#') signs followed
      // by a dash ('-')?
      const text =
        typeof textNode.value === "string" ? textNode.value.trimStart() : "";
      const re = /^(#{1,6})-\s+/;
      const matches = text.match(re);
      // `text` is always a string at this point, so checking the match is enough
      if (!matches) {
        return;
      }

      // If it did let's count the number of '#'s as that will be our subtitle
      // depth
      const depth = matches[1].length;

      // Once we have what we need, let's make a copy of this text node without
      // the leading subtitle syntax.
      // i.e. '##- hello world' becomes 'hello world'
      const newValue = text.replace(re, "");

      // We can now attach some metadata to an mdast node. If the node is being
      // serialized to html by a hast-compatible library, it will know to use
      // these overrides instead of the default behaviour of rendering a plain
      // <p> tag.
      paragraphNode.data = {
        // we could use a different html tag but "p" is semantically correct for
        // the subtitle
        hName: "p",
        // The <p> tag will have the following attributes added to it.
        // Note that we need to use "className" for the html "class" attribute.
        hProperties: {
          className: `subtitle subtitle--${depth}`,
          "data-remark-subtype": "subtitle",
          "data-subtitle": depth,
        },
        // When we are passing custom children, it is our responsibility to make
        // sure they are in hast format instead of mdast. We use the library,
        // mdast-util-to-hast, to do this conversion.
        hChildren: [
          // We pass in a modified text node without the leading subtitle
          // characters
          {
            ...textNode,
            value: newValue,
          },
          // Then we pass in the rest of the children under this paragraph node
          ...children.slice(1),
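          // Wrap the call so that map's index argument is not passed along as
          // mdastToHast's second (options) argument.
        ].map((node) => mdastToHast(node)), // Finally convert it all to hast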
      };
    });
  };
};

If you'd like to play with a runnable version of this code, check out this RunKit demo.
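
To see why the `hName`, `hProperties` and `hChildren` fields matter, the sketch below imitates what an mdast-to-hast converter does with them. `toHastSketch` is a simplified stand-in written for this post, not the real mdast-util-to-hast: it only knows paragraph, strong and text nodes.

```javascript
// Simplified stand-in for an mdast-to-hast converter. It only knows paragraph,
// strong and text nodes; the real mdast-util-to-hast handles far more.
const defaultTags = { paragraph: "p", strong: "strong" };

function toHastSketch(node) {
  if (node.type === "text") {
    return { type: "text", value: node.value };
  }
  const data = node.data || {};
  return {
    type: "element",
    // hName overrides the default tag name
    tagName: data.hName || defaultTags[node.type],
    // hProperties become the element's properties (attributes)
    properties: { ...data.hProperties },
    // hChildren, when present, replace the converted mdast children
    children:
      data.hChildren || (node.children || []).map((child) => toHastSketch(child)),
  };
}

// A paragraph node as the plugin would leave it for the input "##- hello world"
const paragraph = {
  type: "paragraph",
  children: [{ type: "text", value: "##- hello world" }],
  data: {
    hName: "p",
    hProperties: { className: "subtitle subtitle--2" },
    hChildren: [{ type: "text", value: "hello world" }],
  },
};

console.log(JSON.stringify(toHastSketch(paragraph), null, 2));
```

The original mdast children still contain the `##- ` prefix. The prefix is absent from the output only because `hChildren` replaces them.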

Wrap up: Why is this cool?

Beyond adding new syntax, being able to analyse Markdown unlocks a lot of automation and authoring enhancements. Here are some ideas of where you can go with this:

  • Write your incident response documents in Markdown and use - [ ] to create action items. You can then write a parser and pipeline that runs in CI to create tickets for these and update the document with links to them.
  • Use remark to lint Markdown documents for things like too many spaces (e.g. extra   space)
  • Use remark to find links to Excalidraw diagrams and add a preview-on-hover feature for them
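
As a rough sketch of the first idea, the snippet below walks a hand-written mdast-style tree and collects unchecked action items. It assumes the parser marks task list items with a `checked` field, as mdast does for GFM task lists. `collectOpenTasks`, `textOf` and the sample tree are made up for illustration, and the ticket creation step is left out.

```javascript
// Hand-written tree resembling mdast for:
//   - [ ] Rotate the API keys
//   - [x] Restart the service
const tree = {
  type: "root",
  children: [
    {
      type: "list",
      children: [
        {
          type: "listItem",
          checked: false,
          children: [
            { type: "paragraph", children: [{ type: "text", value: "Rotate the API keys" }] },
          ],
        },
        {
          type: "listItem",
          checked: true,
          children: [
            { type: "paragraph", children: [{ type: "text", value: "Restart the service" }] },
          ],
        },
      ],
    },
  ],
};

// Hypothetical helper: concatenates the text found under a node.
function textOf(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map((child) => textOf(child)).join("");
}

// Hypothetical helper: collects the text of every unchecked task item.
// Nested lists are not treated specially, so a parent item includes its
// children's text.
function collectOpenTasks(node, tasks = []) {
  if (node.type === "listItem" && node.checked === false) {
    tasks.push(textOf(node));
  }
  for (const child of node.children || []) {
    collectOpenTasks(child, tasks);
  }
  return tasks;
}

console.log(collectOpenTasks(tree));
```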

Check out these awesome remark plugins if you're looking for inspiration. I hope this helps you get started!

