Config spec feature creep

Configuration starts simple: a few keys and values. Then you want to group values and nest them, and then somebody asks for simple expressions, conditionals, variables …

Any sufficiently complicated configuration system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a programming language.

Greenspun’s Tenth Rule, applied to config

You gradually end up with a programming language that was never designed as one. Tools built on YAML bolted on templating, Nginx added if, and Terraform came with its own language, HCL.

Helm is a rough example: two languages, two escaping contexts, sometimes conflicting indentation rules, and a preprocessor step.

{{- if eq .Values.env "production" }}
- path: /admin
  backend: admin-svc
{{- end }}

HCL avoids much of Helm’s complexity and duality, but it still exposes a fixed set of language constructs to your config, which can be too much or too little. For example, there is no if block, so a conditional block has to be written as a dynamic block over a one- or zero-element list:

dynamic "route" {
  for_each = var.env == "production" ? [1] : []
  content { path = "/admin" }
}

Bring a full language in

Another method is to just embed a scripting language such as Lua or Python. Now the config can do everything, and then some.

if os.getenv("ENV") == "production" then
  add_route("/admin", admin_handler)
end

A full language usually also means os.execute(), io.open(), and require(). We can remove os, io, and require before handing it untrusted code. But we are just blacklisting, and …

blacklisting is preventing known dangers, but there are always unknown unknowns, and you can’t prevent what you don’t know you should
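
The difference between the two models can be sketched in a few lines of Go. This is only an illustration: the capability names are made up for the example, and "debug.sethook" stands in for whatever capability nobody remembered to block.

```go
package main

import (
    "fmt"
    "sort"
)

// denylistFilter keeps every capability except the ones named in deny.
func denylistFilter(available, deny []string) []string {
    blocked := make(map[string]bool, len(deny))
    for _, name := range deny {
        blocked[name] = true
    }
    var out []string
    for _, name := range available {
        if !blocked[name] {
            out = append(out, name)
        }
    }
    sort.Strings(out)
    return out
}

// allowlistFilter keeps only the capabilities named in allow.
func allowlistFilter(available, allow []string) []string {
    allowed := make(map[string]bool, len(allow))
    for _, name := range allow {
        allowed[name] = true
    }
    var out []string
    for _, name := range available {
        if allowed[name] {
            out = append(out, name)
        }
    }
    sort.Strings(out)
    return out
}

func main() {
    // Hypothetical runtime surface; the last entry was never put on the denylist.
    available := []string{"string.upper", "os.execute", "io.open", "debug.sethook"}

    fmt.Println(denylistFilter(available, []string{"os.execute", "io.open"}))
    // prints [debug.sethook string.upper]
    fmt.Println(allowlistFilter(available, []string{"string.upper"}))
    // prints [string.upper]
}
```

With the denylist, anything new in the runtime is exposed by default. With the allowlist, it stays unreachable until somebody decides to grant it.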

Whitelist with Rye

Quick preview

Rye takes the opposite approach: start with no language features at all, then explicitly whitelist capabilities.

Here’s a quick preview:

// Go side: grant exactly two operations
evaldo.RegisterBuiltinsFilter(ps, []string{"_++", "os/cwd?"})

On the Go side, we register just two builtin functions, _++ and os/cwd? (cwd? is the current-working-directory builtin defined inside the os context).

; Config side: use them
docs: os/cwd? ++ "/docs"

This is the entire vocabulary. Nothing else was ever granted, so no other word is defined. If you try to read a file:

Read %my-secrets
; Error: Word `Read` not found

This is not the place for details, but if it bothers you, the links explain why ++ takes an underscore prefix when it is referenced 1 and why Read is capitalized 2.

What is Rye

Rye is a general-purpose language written in pure Go (no CGO). You can also import it as a Go library.

Rye is a homoiconic language, and every active word is just a function. Active words are added at the library level; no if, fn, or loop behaviour is hardcoded into the evaluator.

On its own, the language can only load source into blocks of Rye values and assign values to words, because the word types that do the assigning (set-words and mod-words) are part of the syntax. Nothing else is built in.

More on set- and mod-words

What about Starlark

Starlark was built for exactly this. It is a mature solution and it brings a lot to the table, so the differences discussed here are conceptual ones. Starlark gives you if, for, and def unconditionally; you can’t take them away. Rye has no reserved forms: words like if are just functions you choose to register. Starlark’s modules are closer to all-or-nothing, while Rye lets you grant _+ but not _*. Starlark is much more mature, but it is a niche language; Rye is still in development, but it aims to be a general-purpose language.


Example: Markdown serving web-server

We will build a Go web server that reads markdown, converts it to HTML, and serves it over HTTP. Rye is used for the config file.

Step 1 - The minimal server (~50 lines + validation)

Two dependencies: goldmark for markdown rendering and rye for the config.

package main

import (
    "fmt"
    "html/template"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"

    "github.com/refaktor/rye/env"
    "github.com/refaktor/rye/evaldo"
    "github.com/refaktor/rye/loader"
    "github.com/yuin/goldmark"
)

func safeMarkdownPath(baseDir, slug string) (string, error) {
    /* full code: https://github.com/refaktor/rye/blob/main/examples/whitelist-config-with-rye/step1-minimal/main.go */
}

func main() {
    raw, err := os.ReadFile("config.rye")
    if err != nil {
        log.Fatalf("failed to read config: %v", err)
    }

    ps := env.NewProgramState()

    blk := loader.LoadString(string(raw), false, ps)

    // Parse errors
    if errorObj, ok := blk.(env.Error); ok {
        log.Fatalf("parse error: %s", errorObj.Message)
    }

    evaldo.EvalBlock(ps, blk.(env.Block))

    // Runtime errors
    if ps.ErrorFlag {
        log.Fatalf("runtime error: %s", ps.Res.Print(*ps.Idx))
    }

    port := ps.Ctx.GetStringOr("port", ps.Idx, "8080")
    dir := ps.Ctx.GetStringOr("docs-dir", ps.Idx, "docs")

    tpl := template.Must(template.New("").Parse(
        `<html><body>{{.}}</body></html>`))

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        slug := strings.TrimPrefix(r.URL.Path, "/")

        path, err := safeMarkdownPath(dir, slug)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        md, err := os.ReadFile(path)
        if err != nil {
            http.NotFound(w, r)
            return
        }

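        var buf strings.Builder
        if err := goldmark.Convert(md, &buf); err != nil {
            http.Error(w, "render error", http.StatusInternalServerError)
            return
        }

        // template.HTML disables escaping, so the rendered markdown is trusted here
        if err := tpl.Execute(w, template.HTML(buf.String())); err != nil {
            log.Printf("template error: %v", err)
        }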
    })

    fmt.Printf("Serving on port %s\n", port)
    // ListenAndServe only returns on failure (for example, port already in use)
    log.Fatal(http.ListenAndServe(":"+port, nil))
}

And the config:

port:     "3000"
docs-dir: "content"

The config looks like YAML, but it’s normal Rye code. Rye does not care about indentation or newlines, but it does require whitespace between all tokens (parentheses included).

There is no builtin registration call in the Go code. That’s deliberate. Without registering any builtins, the evaluator has nothing to call. The config file is just data notation: bind a value to a word with :, and that’s all.

No functions, no conditions, no arithmetic. The person writing config.rye can declare values and nothing else. This is the zero-capability baseline. From here we add capabilities one explicit registration at a time.

Step 2 - Basic computation (+2 lines of Go)

evaldo.RegisterBuiltinsFilter(ps, []string{"_*", "_+"})

Now the config can compute derived values:

port:          "3000"
docs-dir:      "content"
cache-max-age: 60 * 60 * 24    ; one day in seconds
max-body-kb:   10 * 1024       ; 10240 kB, i.e. 10 MB

Two words registered. Two operations available, nothing else is there.

Step 3 - Reading from the environment (+6 lines of Go)

This time we create and register a custom builtin, get-env, which returns false when a variable isn’t set (or is set to an empty string). We also register Rye’s standard any combinator.

ps.RegisterBuiltin("get-env", 1, "get-env key",
    func(ps *env.ProgramState, a0, a1, a2, a3, a4 env.Object) env.Object {
        // unchecked assertion: panics if the config passes a non-string key
        if v := os.Getenv(a0.(env.String).Value); v != "" {
            return *env.NewString(v)
        }
        return *env.NewBoolean(false)
    })

evaldo.RegisterBuiltinsFilter(ps, []string{"any"})

any evaluates expressions in a block and returns the first result that is not false.

port:          any { get-env "PORT" "3000" }
docs-dir:      any { get-env "DOCS_DIR" "content" }
cache-max-age: 60 * 60 * 24
max-body-kb:   10 * 1024

Step 4 - The config registers its own HTTP routes (+14 lines of Go)

So far the config has only produced values. Now it starts actively configuring the server. We use a Go map and register a custom built-in that adds to the map.

routes := map[string]string{}
ps.RegisterBuiltin("route", 2, "Defines a route",
    func(ps *env.ProgramState, a0, a1, a2, a3, a4 env.Object) env.Object {
        routes[a0.(env.String).Value] = a1.(env.String).Value
        return env.Void{}
    })

Then after evaluating the config, we wire up the collected routes:

for prefix, dir := range routes {
    dir := dir
    http.Handle(prefix+"/", http.StripPrefix(prefix,
        http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            slug := strings.TrimPrefix(r.URL.Path, "/")

            path, err := safeMarkdownPath(dir, slug)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            
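            md, err := os.ReadFile(path)
            if err != nil {
                http.NotFound(w, r)
                return
            }
            var buf strings.Builder
            if err := goldmark.Convert(md, &buf); err != nil {
                http.Error(w, "render error", http.StatusInternalServerError)
                return
            }
            if err := tpl.Execute(w, template.HTML(buf.String())); err != nil {
                log.Printf("template error: %v", err)
            }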
        })))
}

We also register Rye’s if and _= so the config can have conditional routes:

evaldo.RegisterBuiltinsFilter(ps, []string{"if", "_="})

Now this is possible:

port:          any { get-env "PORT" "3000" }
docs-dir:      "content"
cache-max-age: 60 * 60 * 24

route "/blog" "posts"
route "/docs" "docs"

if ( get-env "DEBUG" ) = "1" {
    route "/drafts" "drafts"
}

The /drafts route only exists when DEBUG=1. The config author is making runtime decisions about server structure using Rye’s if function.
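
To check the wiring without Rye or markdown files, the route table can be exercised with net/http/httptest. This is a sketch under assumptions: buildMux and get are hypothetical helpers, the handler body only reports which file it would serve, and a plain bool stands in for the DEBUG check.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "strings"
)

// buildMux turns a prefix -> directory table (like the routes map above)
// into handlers. The handler is a stand-in that only reports which
// directory and slug it would serve.
func buildMux(routes map[string]string) *http.ServeMux {
    mux := http.NewServeMux()
    for prefix, dir := range routes {
        dir := dir // per-iteration copy, needed before Go 1.22
        h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            slug := strings.TrimPrefix(r.URL.Path, "/")
            fmt.Fprintf(w, "%s/%s.md", dir, slug)
        })
        mux.Handle(prefix+"/", http.StripPrefix(prefix, h))
    }
    return mux
}

// get issues an in-memory request and returns status code and body.
func get(mux *http.ServeMux, path string) (int, string) {
    rec := httptest.NewRecorder()
    mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
    body, _ := io.ReadAll(rec.Result().Body)
    return rec.Code, string(body)
}

func main() {
    routes := map[string]string{"/blog": "posts", "/docs": "docs"}
    debug := false // stands in for the DEBUG=1 check in the config
    if debug {
        routes["/drafts"] = "drafts"
    }
    mux := buildMux(routes)

    fmt.Println(get(mux, "/blog/hello"))
    fmt.Println(get(mux, "/drafts/wip"))
}
```

With debug off, /blog/hello resolves to the posts directory and /drafts/wip gets a 404, because that route was never registered.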

Step 5 - The config injects logic into request handling (+6 lines of Go)

Up to now the config ran once at startup. Now it defines a function that the Go runtime will call on every request. For this we register fn, the builtin that creates functions.

evaldo.RegisterBuiltinsFilter(ps, []string{"fn", "replace", "capitalize", "str"})

In the HTTP handler:

title := slug
if fn, ok := ps.Ctx.GetFunction("page-title", ps.Idx); ok {
    evaldo.CallFunctionArgsN(fn, ps, ps.Ctx, *env.NewString(slug))
    if s, ok := ps.Res.(env.String); ok {
        title = s.Value
    }
}

ps.Ctx.GetFunction looks up a word in the config’s context. evaldo.CallFunctionArgsN invokes it with the program state, so the function sees the context it was defined in. The result comes back in ps.Res, type-asserted to string.

page-title: fn { slug } {
    slug .replace "-" " " |capitalize
}

So our config now has custom functions that the parent Go app uses.

Step 6 - Debugging the config live (+2 lines of Go)

Config files aren’t usually known for good debugging. If we are lucky we can print or log some value. Rye has a helpful function probe and even a console (REPL), so we can do better.

evaldo.RegisterBuiltinsFilter(ps, []string{"probe", "enter-console"})

// Execution limits for safety
ps.MaxCallDepth = 50
ps.MaxOps = 10_000

probe prints a value (with type information) and passes it through:

port: probe any { get-env "PORT" "3000" }
; prints [String 3000] when the env variable isn't set

enter-console is the useful one. Your process drops into a live REPL with the full current context. If your config is failing in production, you don’t just get a log line, you get a terminal:

[enter-console: after-routes]
> lc   ; list context
port         [String 3000]
docs-dir     [String content]
route        [Builtin(2): Defines a route]
page-title   [Function(1)]

> probe port
[String 3000]

> port:: "8080"

> [Ctrl-c]

Changes made at the prompt take effect when you exit. The rest of the config continues evaluating with the modified context. Remove the enter-console line when done.
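
Since the debug words are just more entries in the allowlist, one option is to grant them only when debugging is explicitly requested. A small sketch of that idea follows; allowedWords is a hypothetical helper and CONFIG_DEBUG is a variable name invented for this example. The returned slice is what would be handed to the builtin filter.

```go
package main

import (
    "fmt"
    "os"
)

// allowedWords builds the word list for the builtin filter. The base
// words are the ones registered in steps 2 to 5; the debug words are
// appended only on request.
func allowedWords(debug bool) []string {
    words := []string{"_*", "_+", "any", "if", "_=", "fn", "replace", "capitalize", "str"}
    if debug {
        words = append(words, "probe", "enter-console")
    }
    return words
}

func main() {
    // CONFIG_DEBUG is a hypothetical switch, not something Rye defines.
    debug := os.Getenv("CONFIG_DEBUG") == "1"
    fmt.Println(allowedWords(debug))
}
```

With the switch off, a leftover enter-console line in the config fails as an undefined word instead of opening a prompt.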

What we ended up with

Step   Go lines   Config capability
1      ~80        Static values - pure data notation
2      +2         Arithmetic, derived values
3      +6         Env vars with any { } fallbacks
4      +14        Routes, if, conditional logic
5      +6         User-defined fn callbacks into request handling
6      +2         probe + live REPL debugging

A note on execution limits

Beyond controlling which words exist, you can also cap how much work the evaluator is allowed to do:

ps.MaxCallDepth = 50        // stop if recursion exceeds 50 frames
ps.MaxOps      = 10_000     // stop after 10k expression evaluations

Both default to zero (unlimited). Set them before calling EvalBlock, and a runaway config (infinite recursion, for example) returns an error instead of spinning forever.

Recap

The blacklist model (embed Lua, remove os and io) is fragile by construction: it only blocks the dangers you already know about. In our Rye examples we started from zero, with nothing more than INI-level config, and added only what we decided to add. By step 5 the config author can even define functions that the server calls at request time.

I’m not saying to dump all other approaches and go with Rye. It’s still a work-in-progress language, but I hope I’ve shown that the concept could be a good one, and that I’ve documented how to embed Rye into Go apps, for config or for more.

Warning: This is capability control and an exploration, not a security sandbox or something you should go and use today. Rye code still runs inside your process. If you expose unsafe builtins, the config gains those capabilities.


You can find all examples in full, ready to be executed, in Rye’s examples folder: examples/whitelist-config-with-rye