Add frontmatter#2197

Draft
traviscross wants to merge 12 commits into master from TC/frontmatter

@traviscross traviscross commented Mar 4, 2026

cc @ehuss

(Note the draft status. I put this up so @ehuss and I could discuss this in the lang-docs call, but it's not yet ready for general review -- I'm still revising.)

r[frontmatter.body]
No line in the body may start with a sequence of hyphens (`-`) equal to or longer than the opening fence. The body may not contain carriage returns.

[horizontal whitespace]: whitespace.md#grammar-HORIZONTAL_WHITESPACE
If you want, I think this can use the automatic link:

Suggested change:
- [horizontal whitespace]: whitespace.md#grammar-HORIZONTAL_WHITESPACE
+ [horizontal whitespace]: grammar-HORIZONTAL_WHITESPACE


traviscross and others added 2 commits March 4, 2026 04:39
Currently, in our Markdown, we support `[text][RULE_NAME]` and
`[text][grammar-RULE_NAME]` for linking to grammar rules, but
we don't support this syntax within link reference definitions,
i.e., `[text]: grammar-RULE_NAME`, even though we do support
linking to (non-grammar) rule identifiers within link reference
definitions.  That's an inconsistency that continually surprises
us.  Let's fix that.

In this commit, we add `grammar_link_references`, which scans
link reference definitions for destinations that match a grammar
rule name -- either with a `grammar-` prefix or not.  When a match
is found, the destination is replaced with the resolved path and
anchor, just as `rule_link_references` does for rules.  Unrecognized
destinations pass through unchanged, falling through to `std_links`
for rustdoc resolution -- the same behavior as unresolved
`[text][NAME]` reference links.

We also update the dev-guide to document the new feature in both
`links.md` and `grammar.md`.
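
To make the rewriting step concrete, here is a minimal sketch of the idea. It is not the actual `mdbook-spec` implementation. The function shape, the `grammar_paths` map (rule name to chapter path), and the helper `rewrite_definition` are assumptions made for illustration, and the sketch only recognizes the simple one-line `[label]: destination` form.

```rust
use std::collections::HashMap;

/// Rewrites link reference definitions whose destination names a grammar
/// rule. `grammar_paths` is a hypothetical map from rule name to the
/// chapter file that defines it.
fn grammar_link_references(markdown: &str, grammar_paths: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(markdown.len());
    for line in markdown.split_inclusive('\n') {
        match rewrite_definition(line, grammar_paths) {
            Some(rewritten) => out.push_str(&rewritten),
            // Unrecognized destinations pass through unchanged.
            None => out.push_str(line),
        }
    }
    out
}

/// Returns the rewritten line if it is a `[label]: destination` definition
/// whose destination is a known grammar rule, with or without the
/// `grammar-` prefix.
fn rewrite_definition(line: &str, grammar_paths: &HashMap<&str, &str>) -> Option<String> {
    let (content, newline) = match line.strip_suffix('\n') {
        Some(stripped) => (stripped, "\n"),
        None => (line, ""),
    };
    let (label, dest) = content.split_once("]: ")?;
    if !label.starts_with('[') {
        return None;
    }
    let dest = dest.trim();
    let name = dest.strip_prefix("grammar-").unwrap_or(dest);
    let path = grammar_paths.get(name)?;
    Some(format!("{label}]: {path}#grammar-{name}{newline}"))
}
```

With `HORIZONTAL_WHITESPACE` mapped to `whitespace.md`, the definition `[horizontal whitespace]: grammar-HORIZONTAL_WHITESPACE` comes out as `[horizontal whitespace]: whitespace.md#grammar-HORIZONTAL_WHITESPACE`, while a destination such as `std::vec::Vec` is left for later passes.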

@traviscross traviscross force-pushed the TC/frontmatter branch 3 times, most recently from c36c071 to ec0193c Compare March 4, 2026 06:22

The prior commit added a grammar for frontmatter, but the grammar
notation available at the time that commit was prepared couldn't
express all of the invariants the language requires.  Opening and
closing fences must have the same dash count.  Indented fences must
be rejected as an error.  And once an opening fence is recognized,
the parser must commit -- it can't backtrack and reinterpret the
dashes as tokens.

Since then, we've added named range repeats, hard cut, and negative
lookahead to the grammar notation.  With these, we can express the
invariants directly.

In this commit, we rewrite the frontmatter grammar.  Named range
repeats let the closing fence reference the opening fence's
dash count.  Hard cut commits the parse after the opening
dashes.  And `FRONTMATTER_INVALID` uses hard cut followed
by the bottom rule (`^ ⊥`) to express that indented fences
are a recognized-and-rejected syntactic form.  We also add
`⊥` as a primitive production in the Notation chapter, move
`HORIZONTAL_WHITESPACE` to Whitespace, and fix some minor editorial
matters such as indentation and comment style.
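
As a rough illustration of the three invariants (matching dash counts, rejected indented fences, and committing once the opening fence is seen), here is a hand-written recognizer. It is a sketch under stated assumptions, not the grammar in this PR. The names `parse_frontmatter`, `Frontmatter`, and `FenceError` are hypothetical, the input is assumed to start where frontmatter may appear, the 3 to 255 bound is taken from the fence prose, and the body rules are not checked.

```rust
/// Result of looking for frontmatter at the start of the input.
#[derive(Debug, PartialEq)]
enum Frontmatter<'a> {
    /// No opening fence; the input is ordinary source text.
    Absent,
    /// A well-formed block, plus the source text that follows it.
    Present { info: &'a str, body: &'a str, rest: &'a str },
}

#[derive(Debug, PartialEq)]
enum FenceError {
    /// The fence is preceded by horizontal whitespace.
    Indented,
    /// The opening fence has more than 255 hyphens.
    TooLong,
    /// No closing fence with the same hyphen count was found.
    Unclosed,
}

fn is_hws(c: char) -> bool {
    c == ' ' || c == '\t'
}

fn parse_frontmatter(src: &str) -> Result<Frontmatter<'_>, FenceError> {
    let (first, after_first) = match src.split_once('\n') {
        Some((line, tail)) => (line, tail),
        None => (src, ""),
    };
    // An indented fence is recognized and rejected, not reinterpreted.
    let unindented = first.trim_start_matches(is_hws);
    if unindented.len() != first.len() && unindented.starts_with("---") {
        return Err(FenceError::Indented);
    }
    let dashes = first.len() - first.trim_start_matches('-').len();
    if dashes < 3 {
        return Ok(Frontmatter::Absent);
    }
    // From here on the parse is committed: every failure is an error,
    // never a fallback to treating the hyphens as ordinary tokens.
    if dashes > 255 {
        return Err(FenceError::TooLong);
    }
    let info = first[dashes..].trim_matches(is_hws);
    let mut offset = 0;
    for line in after_first.split_inclusive('\n') {
        let content = line.strip_suffix('\n').unwrap_or(line);
        let trimmed = content.trim_end_matches(is_hws);
        // The closing fence must repeat the opening fence's hyphen count.
        if trimmed.len() == dashes && trimmed.bytes().all(|b| b == b'-') {
            let body = &after_first[..offset];
            let rest = &after_first[offset + line.len()..];
            return Ok(Frontmatter::Present { info, body, rest });
        }
        offset += line.len();
    }
    Err(FenceError::Unclosed)
}
```

The early `return Ok(Frontmatter::Absent)` is the only place the recognizer declines. Everything after the dash count check returns either a block or an error, which is the hand-written analogue of the hard cut.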

The fence description uses the phrase "a matching pair of hyphens",
which can be misread as describing exactly two individual hyphens.
The constraints on fence length and matching are also compressed into
a single sentence with a trailing subclause ("from 3 to 255") that
reads as nonrestrictive.

Let's give each constraint its own sentence: what a fence is, where
it must appear, the length bounds on the opening fence, the matching
requirement for the closing fence, and trailing whitespace.  This
makes the structure clearer.

The infostring sentence uses an inverted construction ("Following the
opening fence may be an infostring"); it's a bit awkward.

Let's use active voice and tighten the phrasing.

The body restriction sentence combines two unrelated constraints --
the hyphen-line restriction and the carriage-return ban -- in a single
sentence joined by "or".  This makes "or carriage returns" read as
parallel to "hyphens", as though the line could maybe start with
carriage returns.

Let's split these into two separate sentences so that each constraint
stands on its own.
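
Treating the two constraints as independent checks looks roughly like this. It is only a sketch: `check_body`, `BodyError`, and the 1-based line numbering are hypothetical names and choices made for illustration, and `fence_len` is assumed to be the hyphen count of the opening fence.

```rust
#[derive(Debug, PartialEq)]
enum BodyError {
    /// The body contains a carriage return somewhere.
    CarriageReturn,
    /// A body line starts with at least as many hyphens as the opening
    /// fence. `line` is the 1-based line number within the body.
    FenceLikeLine { line: usize },
}

fn check_body(body: &str, fence_len: usize) -> Result<(), BodyError> {
    // Constraint 1: no carriage returns anywhere in the body.
    if body.contains('\r') {
        return Err(BodyError::CarriageReturn);
    }
    // Constraint 2: no line may start with a run of hyphens equal to or
    // longer than the opening fence.
    for (index, line) in body.lines().enumerate() {
        let hyphens = line.len() - line.trim_start_matches('-').len();
        if hyphens >= fence_len {
            return Err(BodyError::FenceLikeLine { line: index + 1 });
        }
    }
    Ok(())
}
```

The carriage-return check applies to the whole body, while the hyphen check applies only to line starts, which is the distinction the split sentences are meant to make visible.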

The prose mentions "horizontal whitespace" in two places (fence
trailing content and infostring trailing content) without linking
to the grammar definition.  Since `HORIZONTAL_WHITESPACE` is now a
defined production in Whitespace, let's add a link so readers can
click through to the precise definition.

The frontmatter removal section in `input-format.md` is a
single sentence ("After some whitespace, frontmatter may next
appear in the input") that doesn't clearly describe the removal
behavior.  By contrast, the shebang removal section provides a full
description with an example.

Let's rewrite the section with a precise description of the removal
process and add an annotated example.

The `frontmatter.document` rule said "Frontmatter may only be preceded
by a shebang and whitespace", where the "and" could be misread as
requiring both a shebang and whitespace rather than listing the set of
things allowed to precede frontmatter.

Since we merged the shebang prose revision (#2192),
the shebang position rule now reads as a positive statement of where
the shebang may appear.  Let's follow the same pattern here: state
positively where frontmatter may appear rather than leaning on "only"
and a negative constraint.

We'll also rename the rule identifier to `frontmatter.position` in
keeping with our conventions.

The example under `frontmatter.intro` used an external crate, a
nontrivial script body, and a bare `rust` code block that would
fail CI since the test runner doesn't support frontmatter.  Let's
simplify it to mirror the example in the frontmatter removal section
of `input-format.md`, and let's wrap it in an `EXAMPLE` admonition
consistent with our convention for examples that aren't demonstrating
the behavior of a specific rule.

The intro under `frontmatter.intro` said "an optional section for
content intended for external tools without requiring these tools
to have full knowledge of the Rust grammar."  This was a negative
construction (what frontmatter doesn't require) rather than a positive
one (what it is and what it enables).

In this commit, we rewrite the intro as "an optional section of
metadata whose syntax allows external tools to read it without parsing
Rust."  This tells the reader three things in one sentence: what
frontmatter is, who it's for, and the key design property.

For the `WHITESPACE` grammar rule, we cite `Pattern_White_Space`.  For
`HORIZONTAL_WHITESPACE`, we hadn't cited provenance.  Let's do that.

Horizontal whitespace, in a Unicode context, is defined by UAX
31, Section 4.1, which categorizes `Pattern_White_Space` into
line endings, ignorable format controls, and horizontal space.
The horizontal space category is exactly the two characters our
grammar specifies.
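
For reference, that three-way split can be written out as a small classifier. This is a sketch based on my reading of UAX 31 and should be checked against the standard. The code points listed, including tab and space as the two horizontal space characters, are stated from memory rather than taken from this PR, and `classify` and `WhiteSpaceKind` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum WhiteSpaceKind {
    LineEnding,
    IgnorableFormatControl,
    HorizontalSpace,
}

/// Classifies a `Pattern_White_Space` character, returning `None` for
/// anything outside that set.
fn classify(c: char) -> Option<WhiteSpaceKind> {
    match c {
        // LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
        '\u{000A}'..='\u{000D}' | '\u{0085}' | '\u{2028}' | '\u{2029}' => {
            Some(WhiteSpaceKind::LineEnding)
        }
        // LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
        '\u{200E}' | '\u{200F}' => Some(WhiteSpaceKind::IgnorableFormatControl),
        // CHARACTER TABULATION, SPACE
        '\u{0009}' | '\u{0020}' => Some(WhiteSpaceKind::HorizontalSpace),
        _ => None,
    }
}
```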