Making markdown-clj babashka-compatible

Recently I migrated the highlighting of Clojure code in this blog from CLJS to pure babashka code. See this blog post.

For rendering markdown files to html files I was using bootleg as a pod. A pod is a binary which can act as an RPC server for babashka. Learn more about that here.

Today I noticed a new PR in bootleg. The user who submitted the PR also used bootleg for markdown compilation in babashka, but bootleg didn't expose an option that the underlying library, markdown-clj, supports. The PR fixes that.

Then I wondered, can babashka run markdown-clj from source, rather than via a pod? Babashka supports a large subset of Clojure and a large subset of classes from the JVM. By now it can run a fair share of Clojure libraries from source, but sometimes minor tweaks are necessary.

I looked at the dependencies of markdown-clj. The only dependency it uses is clj-commons/clj-yaml, which is luckily included in babashka. If it weren't, I'm pretty sure that dependency could be made optional for those who do not need any YAML support in their markdown compilation. When looking closer, I learned that this dependency is used in a small corner of markdown-clj to parse front matter. I stripped the front matter out when I migrated from Octopress to babashka, but I realize now I could have left it in. Since the rest of markdown-clj is pure Clojure, there's a reasonable chance it will work with babashka. Let's try and see.

I cloned the repo locally and tried this:

$ git clone git@github.com:yogthos/markdown-clj.git
$ cd markdown-clj
$ bb -cp src/clj:src/cljc -e "(require '[markdown.core :as md])
  (md/md-to-html-string \"# 100\")"
----- Error --------------------------------------------------------------------
Type:     java.lang.IllegalStateException
Message:  Can't dynamically bind non-dynamic var #'markdown.common/*substring*
Location: 73:3

The issue here is that SCI, the interpreter running Clojure code in babashka, doesn't understand this way of defining dynamic vars yet:

(declare ^{:dynamic true} *substring*)

Of course this should be fixed, so I filed an issue with SCI here.

But changing the code in markdown-clj to:

(def ^{:dynamic true} *substring*)

doesn't seem to have any downsides and would make that part compatible with babashka. There was another dynamic var that had the same problem. After fixing that, I got:

$ bb -cp src/clj:src/cljc -e "(require '[markdown.core :as md])
  (md/md-to-html-string \"# 100\")"
"<h1>100</h1>"

It worked!
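As a minimal sketch of the pattern involved: a var defined with def and :dynamic metadata can be left without a root value and given one later with binding. The tail-from helper and the function bound below are hypothetical and are not taken from markdown-clj.

```clojure
;; A dynamic var without a root value, defined the way the patched code does it.
(def ^{:dynamic true} *substring*)

;; Hypothetical helper that relies on whatever *substring* is bound to.
(defn tail-from [s n]
  (*substring* s n))

;; Supply an implementation for the dynamic extent of the body.
(def result
  (binding [*substring* (fn [s n] (subs s n))]
    (tail-from "hello" 2)))

(println result)
```

Outside of the binding form the var is unbound again, so calling tail-from there would fail.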

To be sure all markdown-clj tests pass with babashka, I added a test runner. The Cognitect Labs test-runner (Cognitest :-D) works with babashka, provided that you include the babashka compatible tools.namespace fork in your dependencies. The bb.edn so far:

{:deps {markdown-clj/markdown-clj {:local/root "."}}
 :tasks
 {test:bb {:doc "Runs tests with babashka"
           :extra-paths ["test"]
           :extra-deps {io.github.cognitect-labs/test-runner
                        {:git/tag "v0.5.0" :git/sha "b3fd0d2"}
                        org.clojure/tools.namespace
                        {:git/url "https://github.com/babashka/tools.namespace"
                         :git/sha "3625153ee66dfcec2ba600851b5b2cbdab8fae6c"}}
           :requires ([cognitect.test-runner :as tr])
           :task (apply tr/-main "-d" "test" *command-line-args*)}
  ,,,}}

After that, you can run bb test:bb:

Running tests in #{"test"}
----- Error --------------------------------------------------------------------
Type:     clojure.lang.ExceptionInfo
Message:  Could not resolve symbol: org.apache.commons.lang.StringEscapeUtils/unescapeHtml
Location: /private/tmp/markdown-clj/test/markdown/md_test.cljc:331:11
Phase:    analysis

Ouch. The class org.apache.commons.lang.StringEscapeUtils, which is used in the tests, isn't available in babashka. Let's comment that one out. Luckily, the tests are already in a .cljc file, so using the :bb reader conditional lets us make changes specifically for babashka while leaving it the same for the other targets:

#?(:bb nil
   :default
   (is (= "<p><a href=\"mailto:abc@google.com\">abc@google.com</a></p>"
          (#?(:clj org.apache.commons.lang.StringEscapeUtils/unescapeHtml
              :cljs goog.string/unescapeEntities)
           (entry-function "<abc@google.com>")))))

#?(:bb nil
   :default
   (is (= "<p><a href=\"mailto:abc_def_ghi@google.com\">abc_def_ghi@google.com</a></p>"
          (#?(:clj org.apache.commons.lang.StringEscapeUtils/unescapeHtml
              :cljs goog.string/unescapeEntities)
           (entry-function "<abc_def_ghi@google.com>")))))

Yes, reader conditionals can be nested. You didn't see that one coming, did you? After this change, bingo!

$ bb test:bb

Running tests in #{"test"}

Testing markdown.md-file-test

Testing markdown.md-test

Ran 84 tests containing 146 assertions.
0 failures, 0 errors.
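One detail worth keeping in mind with the :bb branch is that babashka also accepts :clj, and the reader takes the first branch whose feature matches, so :bb has to come before :clj. The sketch below simulates this on JVM Clojure by reading strings with an extra :bb feature; the two example forms are made up for illustration.

```clojure
;; JVM Clojure always has the :clj feature; adding :bb to the feature set
;; mimics a reader that accepts both :bb and :clj.
(def bb-first  "#?(:bb :babashka :clj :jvm)")
(def clj-first "#?(:clj :jvm :bb :babashka)")

(defn read-as-bb [s]
  (read-string {:read-cond :allow :features #{:bb}} s))

;; The first matching branch wins, so the order of the branches matters.
(println (read-as-bb bb-first))
(println (read-as-bb clj-first))
```

With :bb listed first the babashka-specific branch is chosen; with :clj listed first it is never reached.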

While I was at it, I also added a deps.edn and tasks for running the Clojure and ClojureScript tests.

deps.edn:

{:paths ["src/clj" "src/cljs" "src/cljc"]
 :deps {clj-commons/clj-yaml {:mvn/version "0.7.107"}}
 :aliases
 {:test
  {:extra-paths ["test"]
   :extra-deps {io.github.cognitect-labs/test-runner
                {:git/tag "v0.5.0" :git/sha "b3fd0d2"}
                criterium/criterium {:mvn/version "0.4.4"}
                commons-lang/commons-lang {:mvn/version "2.6"}}
   :main-opts ["-m" "cognitect.test-runner"]
   :exec-fn cognitect.test-runner.api/test}

  :cljs-test
  {:extra-paths ["test"]
   :extra-deps {olical/cljs-test-runner {:mvn/version "3.8.0"}
                org.clojure/clojure {:mvn/version "1.10.1"}
                org.clojure/clojurescript {:mvn/version "1.10.520"}}
   :main-opts ["-m" "cljs-test-runner.main"]}}}

bb.edn:

{:deps {markdown-clj/markdown-clj {:local/root "."}}
 :tasks
 {,,,
  test:clj {:doc "Runs tests with JVM Clojure"
            :task (clojure "-X:test")}

  test:cljs {:doc "Runs tests with ClojureScript"
             :task (clojure "-M:cljs-test")}}}

$ bb test:clj

Running tests in #{"test"}

Testing markdown.md-file-test

Testing markdown.md-test

Ran 84 tests containing 148 assertions.
0 failures, 0 errors.

$ bb test:cljs

Testing markdown.md-test

Ran 75 tests containing 136 assertions.
0 failures, 0 errors.

After that change, I could use markdown-clj directly in the code for rendering this blog. You can see the diff here. Previously I also used bootleg for hiccup, but babashka already has hiccup as a built-in dependency so that wasn't necessary anymore either. So the blog rendering code is pure babashka now.

I submitted a PR with these changes to the markdown-clj repository. This PR was merged and a new version was published to Clojars, which is used in the deps.edn of this blog.

Performance considerations

What about performance? Previously, re-rendering all of the blog posts took 4 seconds; now it takes a second longer. Running markdown-clj from source is slower than using the pod since the code in the pod is all pre-compiled and doesn't run through SCI. Compiling a single blog post isn't noticeably slower. The difference is small enough to move forward with markdown-clj from source for now. Since it's easy to move between pure babashka, the bootleg pod, and JVM Clojure, I keep my options open.

After writing the last paragraph, I made this blog's code run with JVM Clojure. Let's compare the time for recompiling all blog posts (which is triggered by a change to e.g. render.clj):

$ touch render.clj
$ time bb render
...
bb render   3.51s  user 0.17s system 75% cpu 4.891 total

$ touch render.clj
$ time clojure -M -m render
...
clojure -M -m render   21.97s  user 1.21s system 276% cpu 8.386 total

Recompiling the entire blog with babashka is still faster, likely because of JVM startup time: Clojure has to load more libraries at startup.

If we AOT-compile those libraries, JVM Clojure becomes faster, but its runtime performance still doesn't make up for the startup time:

$ mkdir -p classes
$ clojure -M -e "(compile 'render)"
$ time clojure -M -m render
...
clojure -M -m render   13.91s  user 1.05s system 259% cpu 5.753 total

Most of the time I won't recompile all my blog posts but just one:

$ touch posts/markdown-clj-babashka-compatible.md
$ time bb render
Processing markdown for file: posts/markdown-clj-babashka-compatible.md
bb render   0.42s  user 0.12s system 79% cpu 0.682 total

$ touch posts/markdown-clj-babashka-compatible.md
$ time clojure -M -m render
Processing markdown for file: posts/markdown-clj-babashka-compatible.md
clojure -M -m render   5.58s  user 0.60s system 189% cpu 3.266 total

nbb

Dmitri Sotnikov, the author of markdown-clj, suggested that markdown-clj could also be made compatible with scittle and nbb. The only change I had to make to the original source was to change cljs.reader into clojure.edn and then it worked:

$ nbb -cp src/clj:src/cljc:src/cljs -e "(require '[markdown.core :as md])
  (md/md->html \"# 100\")"
"<h1>100</h1>"
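The call sites that changed are not shown in this post, so the following is only a sketch of what reading EDN through clojure.edn looks like; the input string is made up for illustration.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical input, not taken from markdown-clj.
(def parsed
  (edn/read-string "{:title \"Hello\" :tags [\"clojure\" \"nbb\"]}"))

(println (:title parsed) (:tags parsed))
```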

After adding a runner:

(ns nbb-runner
  (:require [clojure.string :as str]
            [clojure.test :refer [run-tests]]
            [nbb.classpath :as cp]))

(cp/add-classpath (str/join ":" ["src/cljs" "src/cljc" "test"]))

(require '[markdown.md-test])

(run-tests 'markdown.md-test)

and a few minor tweaks to the tests, the library runs with nbb:

Testing markdown.md-test

Ran 75 tests containing 134 assertions.
0 failures, 0 errors.

Discuss this post here.

Published: 2021-11-17

Publishing an nbb project to npm

This post describes how to publish an nbb based project to npm. It is extracted from this entry from the nbb documentation.

Nbb is an ad-hoc CLJS scripting environment for Node.js. You could say that it is to CLJS on Node.js what babashka is to JVM Clojure.

As an example we will build a CLI, print-cli-args, that prints command line arguments.

First, create a new directory print-cli-args and cd into it. Then create a package.json:

{
  "name": "print-cli-args",
  "version": "0.0.1",
  "dependencies": {
    "nbb": "0.0.109"
  },
  "bin": {
    "print-cli-args": "index.mjs"
  }
}

The CLI depends on a specific version of nbb and exposes itself as a binary called print-cli-args, which is linked to index.mjs in our project. It is important to use .mjs rather than .js so Node.js recognizes the file as an ES6 module.

The index.mjs file is a small wrapper that sets up the classpath for nbb to the src directory relative to the wrapper using addClassPath. It also calls the initial CLJS file using loadFile.

#!/usr/bin/env node

import { addClassPath, loadFile } from 'nbb';
import { fileURLToPath } from 'url';
import { dirname, resolve } from 'path';

const __dirname = fileURLToPath(dirname(import.meta.url));

addClassPath(resolve(__dirname, 'src'));
await loadFile(resolve(__dirname, 'src/print_cli_args/core.cljs'));

Finally, in src/print_cli_args/core.cljs we write the CLJS code:

(ns print-cli-args.core
  (:require [clojure.string :as str]))

(def cmd-line-args (not-empty (js->clj (.slice js/process.argv 2))))

(println "Your command line arguments:"
         (or (some->> cmd-line-args (str/join " "))
             "None"))

To test the CLI in development, run node index.mjs 1 2 3.
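The not-empty call matters here: joining an empty vector yields an empty string, which is truthy, so the "None" fallback would never be reached without it. A small sketch of the same logic as a pure function (describe-args is a hypothetical name, not part of the example project):

```clojure
(require '[clojure.string :as str])

;; Hypothetical pure helper mirroring the logic in core.cljs.
(defn describe-args [args]
  (or (some->> (not-empty args) (str/join " "))
      "None"))

(println (describe-args ["1" "2" "3"]))
(println (describe-args []))
```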

When you npm install -g from within the project, you can call print-cli-args from anywhere on your system.

When everything looks good, it's time to npm publish so everyone can enjoy your new CLI.

After you have done so, you can run this example from npm using:

$ npx print-cli-args 1 2 3
Your command line arguments: 1 2 3

or:

$ npm install -g print-cli-args
$ print-cli-args 1 2 3
Your command line arguments: 1 2 3

Discuss this post here.

Published: 2021-11-10

Writing a Clojure highlighter from scratch

In the aftermath of my previous blog post about using Nextjournal's clojure-mode for better highlighting, I tried optimizing the JS output and got a look at the internals of CodeMirror 6. I realized that writing a Clojure highlighter from scratch wasn't that hard if you had the right tools at hand, such as clj-kondo and rewrite-clj, both of which appear below.

I spent my Sunday afternoon combining these tools which resulted in a 160 line script called highlighter.clj which is now used to do the highlighting of this blog.

This blog post is a high level walkthrough of the code. Let's begin with the first step.

1. Parse blocks of Clojure code from markdown and apply highlighting.

(defn highlight-clojure [markdown]
  (str/replace markdown #"(?m)```\s*clojure\n([\s\S]+?)\n\s*```"
               (fn [[block code]]
                 (try (-> (str/trim code)
                          (htmlize)
                          (str/replace "[" "&#91;")
                          (str/replace "]" "&#93;")
                          (str/replace "*" "&#42;")
                          (str/replace "_" "&#95;"))
                      (catch Exception e
                        (log "Could not highlight: " (ex-message e) code)
                        ;; fall back on the unprocessed block, not the whole post
                        block)))))

Parsing blocks of Clojure code from a markdown post is done using a basic regex. Then we pass the Clojure code to the htmlize function. After that we escape some markdown-specific characters, so the markdown compiler won't be confused by them. If the highlighting fails for some reason, we log it and fall back on the unprocessed markdown. During the implementation I found several snippets of Clojure code with unbalanced parens which I had to fix, since rewrite-clj doesn't accept them. So all examples from this blog should be copy-pastable into your Clojure editor without problems from now on.
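To see what the escaping step does in isolation, here is a small sketch that applies the same four replacements to a made-up string. The escape-markdown name is hypothetical; in the post the replacements are inlined in highlight-clojure.

```clojure
(require '[clojure.string :as str])

;; Replace characters that markdown would otherwise treat as markup
;; with numeric HTML entities.
(defn escape-markdown [html]
  (-> html
      (str/replace "[" "&#91;")
      (str/replace "]" "&#93;")
      (str/replace "*" "&#42;")
      (str/replace "_" "&#95;")))

(def escaped (escape-markdown "(def *my_var* [1 2])"))

(println escaped)
```

Without this step, a var name like the one above could be read as emphasis by the markdown compiler.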

2. Parse and analyze Clojure using clj-kondo and rewrite-clj:

(defn htmlize [code]
  (binding [*analysis*
            (let [ana (analysis code)]
              {:locals (locals ana)
               :var-defs (var-defs ana)})]
    (let [html (-> code p/parse-string-all node->html)]
      (format "<pre><code class=\"clojure hljs\">%s</code></pre>" html))))

Clj-kondo provides information about vars, keywords and locals. We will apply special styling to var definitions and locals and their usages.

3. Clj-kondo analysis

(pods/load-pod 'clj-kondo/clj-kondo "2021.10.19")

(require '[pod.borkdude.clj-kondo :as clj-kondo])

(defn analysis [code]
  (let [tmp (doto (fs/file (fs/create-temp-dir) "code.clj")
              fs/delete-on-exit)]
    (spit tmp code)
    (-> (clj-kondo/run!
         {:lint [(str tmp)]
          :config {:output {:analysis {:locals true}}}})
        :analysis)))

To call clj-kondo from babashka, we use the binary from the pod registry which is automatically downloaded via load-pod if you provide a fully qualified symbol and version. We write the code to a temp file and lint it. We ask for the static analysis data. Locals are not included by default, so we set :locals to true. Later on we want to detect if a symbol is a local or a var. We do this by making a set of locations from the analysis data for each group:

(defn locals [analysis]
  (->> analysis
       ((juxt :locals :local-usages))
       (apply concat)
       (map (juxt :row :col))
       set))

(defn var-defs [analysis]
  (->> analysis
       :var-definitions
       (map (juxt :name-row :name-col))
       set))
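To illustrate what these two functions return, the sketch below runs them on hand-written data shaped like the analysis output for (defn f [x] (inc x)). The sample map only contains the keys used by the functions; real clj-kondo output has more fields.

```clojure
(defn locals [analysis]
  (->> analysis
       ((juxt :locals :local-usages))
       (apply concat)
       (map (juxt :row :col))
       set))

(defn var-defs [analysis]
  (->> analysis
       :var-definitions
       (map (juxt :name-row :name-col))
       set))

;; Illustrative data, written by hand rather than produced by clj-kondo.
(def sample
  {:locals [{:name 'x :row 1 :col 10}]
   :local-usages [{:name 'x :row 1 :col 18}]
   :var-definitions [{:name 'f :name-row 1 :name-col 7}]})

(println (locals sample))
(println (var-defs sample))
```

Looking up a symbol's [row col] pair in these sets is then a constant-time contains? check.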

4. Rewrite-clj nodes

Next, we parse the code to rewrite-clj nodes. Each node has a tag for which we write a multi-method to dispatch on:

(defmulti node->html tag)

For each kind of node we will emit a <span> element with an associated class. For instance, :foo will become <span class="keyword">:foo</span> and so on.

A small helper function:

(defn span [class contents]
  (format "<span class=\"%s\">%s</span>"
          class contents))

Here is the implementation for a map node:

(defmethod node->html :map [node]
  (span "map" (format "{%s}"
                      (str/join (map node->html (:children node))))))

A map node has :children so we just call node->html for each child and join the strings together.

I wrote a :default implementation that logs a warning for nodes that I hadn't implemented yet:

(defmethod node->html :default [node]
  (log "Unhandled tag:" (tag node))
  (span (name (tag node))
        (if (:children node)
          (str/join "" (map node->html (:children node)))
          (str node))))

and added implementations for all of the nodes that occurred in Clojure snippets in all the posts of this blog so far, by working through the list of unhandled tags.
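The dispatch mechanism can be tried without rewrite-clj. In the sketch below plain maps with a :tag key stand in for nodes, and the :text key and the tag names used in the sample data are assumptions made for this illustration only.

```clojure
(require '[clojure.string :as str])

(defn span [class contents]
  (format "<span class=\"%s\">%s</span>" class contents))

;; Plain maps stand in for rewrite-clj nodes; dispatch on their :tag.
(defmulti node->html :tag)

(defmethod node->html :map [node]
  (span "map" (format "{%s}" (str/join (map node->html (:children node))))))

;; Anything without a dedicated method is wrapped using its tag name.
(defmethod node->html :default [node]
  (span (name (:tag node)) (:text node)))

(def html
  (node->html {:tag :map
               :children [{:tag :keyword :text ":a"}
                          {:tag :whitespace :text " "}
                          {:tag :number :text "1"}]}))

(println html)
```

Adding support for a new kind of node is then a matter of adding one more defmethod.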

Rewrite-clj doesn't give different tags for symbols, strings, numbers and so on: it groups them under the :token tag. So there is some extra work needed to get different highlighting for different types of tokens. I wrote a function that returns a CSS class by looking at the contents of the node or at the type of value of the node. For a symbol node, I want different highlighting for vars and locals. This is where I check in the clj-kondo analysis whether the symbol at that location is a local or a var definition, and otherwise fall back on the general symbol CSS class.

(defn token-class [node]
  (cond (:k node) "keyword"
        (:lines node) "string"
        (contains? node :value)
        (let [v (:value node)]
          (cond (number? v) "number"
                (string? v) "string"
                (boolean? v) "boolean"
                (nil? v) "nil"
                (symbol? v)
                (cond (contains? (:locals *analysis*)
                                 ((juxt :row :col) (meta node)))
                      "local"
                      (contains? (:var-defs *analysis*)
                                 ((juxt :row :col) (meta node)))
                      "def"
                      :else
                      "symbol")
                :else (name (tag node))))
        ;; fallback, log missing case
        :else (log (tag node) (keys node) (sexpr node) (type (sexpr node)))))

(defmethod node->html :token [node]
  (span (token-class node)
        (escape (str node))))
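The escape function is not shown in this post. A minimal sketch, assuming it only needs to make the token text safe to embed in HTML, could look like this:

```clojure
(require '[clojure.string :as str])

;; Assumed behavior: replace the characters that are special in HTML text.
;; & is handled first so the entities produced below are not escaped again.
(defn escape [s]
  (-> s
      (str/replace "&" "&amp;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")))

(def escaped (escape "(-> a <b> & c)"))

(println escaped)
```

Without escaping, a threading macro such as -> inside a snippet could be misread by the browser as the end of a tag.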

5. Styling

Finally I wrote some styling:

.def { color: #00f; }
.symbol { color: #708; }
.local { color: cadetblue; }
.string { color: #a11; }
.number { color: blue; }
.keyword { color: #219; }
.uneval { filter: opacity(0.5); }

For :uneval nodes, which is rewrite-clj's name for expressions that are ignored using the reader underscore dispatch macro: #_(+ 1 2 3), I set opacity to 0.5. Can you see the difference?

(+ 1 2 3)
#_(+ 1 2 3)

That's it really. A Sunday afternoon well spent. The code for the highlighter is here. In the future I might pull out this code into a library. The renderer could support ANSI escape code sequences for the terminal as well. Let me know what you think.

Discuss this post here.

Published: 2021-11-08
