14 changes: 14 additions & 0 deletions dev-guide/src/grammar.md
@@ -154,5 +154,19 @@ The [`mdbook-spec`] plugin automatically adds Markdown link definitions for all

In some cases, there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. The prefix can also be used when explicitness would aid clarity.

Production names can also be used in link reference definitions to provide custom link text, both with and without the `grammar-` prefix.

```markdown
We accept any [type].

[type]: grammar-Type
```

```markdown
We accept any [type].

[type]: Type
```

[`mdbook-spec`]: tooling/mdbook-spec.md
[Notation]: https://doc.rust-lang.org/nightly/reference/notation.html
4 changes: 4 additions & 0 deletions dev-guide/src/links.md
@@ -74,6 +74,10 @@ Link definitions are automatically generated for all grammar production names. S
This attribute uses the [MetaWord] syntax.

Explicit grammar links can have the `grammar-` prefix like [Type][grammar-Type].

Grammar links can also appear in link reference definitions, e.g. [type].

[type]: grammar-Type
```

## Outside book links
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -6,6 +6,7 @@

- [Lexical structure](lexical-structure.md)
- [Input format](input-format.md)
- [Frontmatter](frontmatter.md)
- [Keywords](keywords.md)
- [Identifiers](identifiers.md)
- [Comments](comments.md)
66 changes: 66 additions & 0 deletions src/frontmatter.md
@@ -0,0 +1,66 @@
r[frontmatter]
# Frontmatter

r[frontmatter.syntax]
```grammar,lexer
@root FRONTMATTER ->
    WHITESPACE_ONLY_LINE*
    !FRONTMATTER_INVALID
    FRONTMATTER_MAIN

WHITESPACE_ONLY_LINE -> (!LF WHITESPACE)* LF

FRONTMATTER_INVALID -> (!LF WHITESPACE)+ `---` ^ ⊥

FRONTMATTER_MAIN ->
    `-`{n:3..=255} ^ FRONTMATTER_REST

FRONTMATTER_REST ->
    FRONTMATTER_FENCE_START
    FRONTMATTER_LINE*
    FRONTMATTER_FENCE_END

FRONTMATTER_FENCE_START ->
    MAYBE_INFOSTRING_OR_WS LF

FRONTMATTER_FENCE_END ->
    `-`{n} HORIZONTAL_WHITESPACE* ( LF | EOF )

FRONTMATTER_LINE -> !`-`{n} ~[LF CR]* LF

MAYBE_INFOSTRING_OR_WS ->
    HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE*

INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
```

r[frontmatter.intro]
Frontmatter is an optional section of metadata whose syntax allows external tools to read it without parsing Rust.

> [!EXAMPLE]
> <!-- ignore: test runner doesn't support frontmatter -->
> ```rust,ignore
> #!/usr/bin/env cargo
> --- cargo
> package.edition = 2024
> ---
>
> fn main() {}
> ```

r[frontmatter.position]
Frontmatter may appear at the start of the file (after the optional [byte order mark]) or after a [shebang]. In either case, it may be preceded by [whitespace].

r[frontmatter.fence]
Frontmatter must start and end with a *fence*. Each fence must start at the beginning of a line. The opening fence must consist of at least 3 and no more than 255 hyphens (`-`). The closing fence must have exactly the same number of hyphens as the opening fence. The hyphens of either fence may be followed by [horizontal whitespace].

r[frontmatter.infostring]
The opening fence, after optional [horizontal whitespace], may be followed by an infostring that identifies the format or purpose of the body. An infostring may be followed by horizontal whitespace.

r[frontmatter.body]
No line in the body may start with a sequence of hyphens (`-`) equal to or longer than the opening fence. The body may not contain carriage returns.
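
The rules above can be sketched as a small standalone recognizer. This is a minimal illustration, not the compiler's implementation: the names `Frontmatter`, `parse_frontmatter`, and `is_infostring` are hypothetical, the input is assumed to have already had the byte order mark and shebang removed and CRLF normalized, and `char::is_alphabetic` and `char::is_alphanumeric` stand in for `XID_Start` and `XID_Continue` as an approximation.

```rust
/// Outcome of looking for frontmatter at the start of `src` (hypothetical type).
#[derive(Debug, PartialEq)]
enum Frontmatter<'a> {
    /// No frontmatter; nothing is removed.
    Absent,
    /// Well-formed frontmatter; `rest` is the input left after removal.
    Present { infostring: &'a str, body: &'a str, rest: &'a str },
    /// Input that commits to being frontmatter but breaks a rule.
    Invalid(&'static str),
}

fn is_horizontal_whitespace(c: char) -> bool {
    c == '\t' || c == ' '
}

/// Characters with the `Pattern_White_Space` property.
fn is_whitespace(c: char) -> bool {
    matches!(c, '\u{9}'..='\u{D}' | ' ' | '\u{85}' | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}')
}

/// Approximates INFOSTRING. Assumption: `is_alphabetic` / `is_alphanumeric`
/// stand in for XID_Start / XID_Continue, which std does not expose.
fn is_infostring(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        None => true, // the infostring is optional
        Some(c) if c == '_' || c.is_alphabetic() => {
            chars.all(|c| matches!(c, '_' | '-' | '.') || c.is_alphanumeric())
        }
        Some(_) => false,
    }
}

fn parse_frontmatter(src: &str) -> Frontmatter<'_> {
    // WHITESPACE_ONLY_LINE*: skip leading lines that contain only whitespace.
    let mut rest = src;
    while let Some(end) = rest.find('\n') {
        if !rest[..end].chars().all(is_whitespace) {
            break;
        }
        rest = &rest[end + 1..];
    }
    // FRONTMATTER_INVALID: a fence preceded by whitespace on the same line.
    let trimmed = rest.trim_start_matches(|c: char| c != '\n' && is_whitespace(c));
    if trimmed.len() < rest.len() && trimmed.starts_with("---") {
        return Frontmatter::Invalid("opening fence must start at the beginning of a line");
    }
    // Opening fence: at least 3 and no more than 255 hyphens.
    let n = rest.bytes().take_while(|&b| b == b'-').count();
    if n < 3 {
        return Frontmatter::Absent;
    }
    if n > 255 {
        return Frontmatter::Invalid("opening fence has more than 255 hyphens");
    }
    let fence = &rest[..n];
    let after_fence = &rest[n..];
    let eol = match after_fence.find('\n') {
        Some(i) => i,
        None => return Frontmatter::Invalid("missing closing fence"),
    };
    let infostring = after_fence[..eol].trim_matches(is_horizontal_whitespace);
    if !is_infostring(infostring) {
        return Frontmatter::Invalid("malformed infostring");
    }
    // Body lines, up to the closing fence.
    let body_and_rest = &after_fence[eol + 1..];
    let mut offset = 0;
    loop {
        let remaining = &body_and_rest[offset..];
        let (line, next) = match remaining.find('\n') {
            Some(i) => (&remaining[..i], Some(offset + i + 1)),
            None => (remaining, None),
        };
        if line.starts_with(fence) {
            // A line starting with `n` hyphens must be exactly the closing fence.
            if !line[n..].chars().all(is_horizontal_whitespace) {
                return Frontmatter::Invalid("body line starts with a fence-length run of hyphens");
            }
            let body = &body_and_rest[..offset];
            let rest = next.map_or("", |j| &body_and_rest[j..]);
            return Frontmatter::Present { infostring, body, rest };
        }
        if line.contains('\r') {
            return Frontmatter::Invalid("body contains a carriage return");
        }
        match next {
            Some(j) => offset = j,
            None => return Frontmatter::Invalid("missing closing fence"),
        }
    }
}
```

For the `--- cargo` example above (without the shebang line), this sketch reports the infostring `cargo`, the body `package.edition = 2024` followed by a line feed, and a remainder that starts with an empty line followed by `fn main() {}`.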

[byte order mark]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
[horizontal whitespace]: grammar-HORIZONTAL_WHITESPACE
[shebang]: input-format.md#shebang-removal
[whitespace]: whitespace.md
22 changes: 21 additions & 1 deletion src/input-format.md
@@ -69,6 +69,25 @@ The shebang may appear immediately at the start of the file or after the optiona
r[input.shebang.removal]
The shebang is removed from the input sequence (and is therefore ignored).

r[input.frontmatter]
## Frontmatter removal

r[input.frontmatter.removal]
If the remaining input begins with a [frontmatter] fence, optionally preceded by lines containing only [whitespace], the [frontmatter] and any preceding whitespace are removed.

For example, given the following file:

<!-- ignore: test runner doesn't support frontmatter -->
```rust,ignore
--- cargo
package.edition = 2024
---

fn main() {}
```

The first three lines (the opening fence, body, and closing fence) would be removed, leaving an empty line followed by `fn main() {}`.

r[input.tokenization]
## Tokenization

@@ -79,7 +98,7 @@ The resulting sequence of characters is then converted into tokens as described
>
> - Byte order mark removal.
> - CRLF normalization.
> - Shebang removal when invoked in an item context (as opposed to expression or statement contexts).
> - Shebang and frontmatter removal when invoked in an item context (as opposed to expression or statement contexts).
>
> The [`include_str!`] and [`include_bytes!`] macros do not apply these transformations.

@@ -88,4 +107,5 @@ The resulting sequence of characters is then converted into tokens as described
[comments]: comments.md
[Crates and source files]: crates-and-source-files.md
[shebang]: https://en.wikipedia.org/wiki/Shebang_(Unix)
[frontmatter]: frontmatter.md
[whitespace]: whitespace.md
2 changes: 1 addition & 1 deletion src/items/modules.md
@@ -123,7 +123,7 @@ r[items.mod.attributes]
## Attributes on modules

r[items.mod.attributes.intro]
Modules, like all items, accept outer attributes. They also accept inner attributes: either after `{` for a module with a body, or at the beginning of the source file, after the optional BOM and shebang.
Modules, like all items, accept outer attributes. They also accept inner attributes: either after `{` for a module with a body, or at the beginning of the source file, after the optional BOM, shebang, and frontmatter.

r[items.mod.attributes.supported]
The built-in attributes that have meaning on a module are [`cfg`], [`deprecated`], [`doc`], [the lint check attributes], [`path`], and [`no_implicit_prelude`]. Modules also accept macro attributes.
14 changes: 14 additions & 0 deletions src/notation.md
@@ -45,6 +45,20 @@ Mizushima et al. introduced [cut operators][cut operator paper] to parsing expre

The hard cut operator is necessary because some tokens in Rust begin with a prefix that is itself a valid token. For example, `c"` begins a C string literal, but `c` alone is a valid identifier. Without the cut, if `c"\0"` failed to lex as a C string literal (because null bytes are not allowed in C strings), the parser could backtrack and lex it as two tokens: the identifier `c` and the string literal `"\0"`. The [cut after `c"`] prevents this --- once the opening delimiter is recognized, the parser cannot go back. The same reasoning applies to [byte literals], [byte string literals], [raw string literals], and other literals with prefixes that are themselves valid tokens.

r[notation.grammar.bottom]
### The bottom rule

In logic, ⊥ (*bottom*) represents absurdity --- a proposition that is always false. In type theory, it is the *empty type*: a type with no inhabitants. The grammar borrows both senses: the rule ⊥ matches nothing --- not any character, not even the end of input.

```grammar,notation
// The bottom rule does not match anything.
⊥ -> !(CHAR | EOF)
```

Placed after a [hard cut operator], `^ ⊥` makes a rule fail unconditionally once the parser has committed past the cut. This gives the grammar a way to express *recognition without acceptance*: the parser identifies the input, commits so that no other alternative can be tried, and then rejects it. In the frontmatter grammar, for example, [`FRONTMATTER_INVALID`] uses `^ ⊥` to recognize an opening fence preceded by whitespace on the same line --- input that is close enough to frontmatter to rule out other interpretations, but that is not valid.
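
The interaction between the cut and ⊥ can be sketched with a toy ordered choice. This is only an illustration under stated assumptions: `Outcome`, `ordered_choice`, and the fallback alternative `any_line` are hypothetical names that do not appear in the grammar, and whitespace is simplified to spaces and tabs.

```rust
/// Result of trying one alternative of an ordered choice (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The alternative matched and consumed this many bytes.
    Matched(usize),
    /// The alternative failed before any cut; the next alternative may be tried.
    Backtrack,
    /// The alternative failed after a hard cut; no other alternative is tried.
    Rejected,
}

/// Models `FRONTMATTER_INVALID -> (!LF WHITESPACE)+ `---` ^ ⊥`,
/// with whitespace simplified to spaces and tabs.
fn frontmatter_invalid(input: &str) -> Outcome {
    let rest = input.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let saw_whitespace = rest.len() < input.len();
    if !(saw_whitespace && rest.starts_with("---")) {
        // Failure before the cut: ordinary backtracking is still allowed.
        return Outcome::Backtrack;
    }
    // Past the cut. ⊥ matches nothing, so the rule fails here, and the
    // cut makes that failure final.
    Outcome::Rejected
}

/// Hypothetical fallback alternative that accepts the rest of the line.
fn any_line(input: &str) -> Outcome {
    Outcome::Matched(input.find('\n').unwrap_or(input.len()))
}

/// Tries alternatives in order, stopping at the first match or hard failure.
fn ordered_choice(alternatives: &[fn(&str) -> Outcome], input: &str) -> Outcome {
    for alternative in alternatives {
        match alternative(input) {
            Outcome::Backtrack => continue,
            decided => return decided,
        }
    }
    Outcome::Backtrack
}
```

With `frontmatter_invalid` listed before `any_line`, an indented fence is rejected outright even though `any_line` would have accepted it, while other input falls through to the fallback as usual.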

[`FRONTMATTER_INVALID`]: frontmatter.md#grammar-FRONTMATTER_INVALID

r[notation.grammar.string-tables]
### String table productions

8 changes: 8 additions & 0 deletions src/whitespace.md
@@ -16,6 +16,10 @@ WHITESPACE ->
| U+2028 // Line separator
| U+2029 // Paragraph separator

HORIZONTAL_WHITESPACE ->
U+0009 // Horizontal tab, `'\t'`
| U+0020 // Space, `' '`

TAB -> U+0009 // Horizontal tab, `'\t'`

LF -> U+000A // Line feed, `'\n'`
@@ -26,10 +30,14 @@ CR -> U+000D // Carriage return, `'\r'`
r[lex.whitespace.intro]
Whitespace is any non-empty string containing only characters that have the [`Pattern_White_Space`] Unicode property.

r[lex.whitespace.horizontal]
[HORIZONTAL_WHITESPACE] is the horizontal space subset of [`Pattern_White_Space`] as categorized by [UAX #31, Section 4.1][uax31-4.1].

r[lex.whitespace.token-sep]
Rust is a "free-form" language, meaning that all forms of whitespace serve only to separate _tokens_ in the grammar, and have no semantic significance.

r[lex.whitespace.replacement]
A Rust program has identical meaning if each whitespace element is replaced with any other legal whitespace element, such as a single space character.

[`Pattern_White_Space`]: https://www.unicode.org/reports/tr31/
[uax31-4.1]: https://www.unicode.org/reports/tr31/#Whitespace_and_Syntax
41 changes: 41 additions & 0 deletions tools/mdbook-spec/src/grammar.rs
@@ -75,6 +75,47 @@ pub fn insert_grammar(grammar: &Grammar, chapter: &Chapter, diag: &mut Diagnosti
content
}

/// Converts link reference definitions that point to a grammar rule
/// to the correct link.
///
/// For example:
///
/// ```markdown
/// We accept any [token].
///
/// [token]: grammar-Token
/// ```
///
/// This rewrites the `[token]` definition so that it points
/// to the rendered location of the `Token` production.
///
/// This supports both a `grammar-` prefixed form (e.g.
/// `grammar-Token`) and a bare rule name (e.g. `Token`).
pub fn grammar_link_references(chapter: &Chapter, grammar: &Grammar) -> String {
    let current_path = chapter.path.as_ref().unwrap().parent().unwrap();
    let for_summary = is_summary(chapter);
    crate::MD_LINK_REFERENCE_DEFINITION
        .replace_all(&chapter.content, |caps: &Captures<'_>| {
            let dest = &caps["dest"];
            let name = dest.strip_prefix("grammar-").unwrap_or(dest);
            if let Some(production) = grammar.productions.get(name) {
                let label = &caps["label"];
                let relative = pathdiff::diff_paths(&production.path, current_path).unwrap();
                // Adjust paths for Windows.
                let relative = relative.display().to_string().replace('\\', "/");
                let id = render_markdown::markdown_id(name, for_summary);
                if for_summary {
                    format!("[{label}]: #{id}")
                } else {
                    format!("[{label}]: {relative}#{id}")
                }
            } else {
                caps.get(0).unwrap().as_str().to_string()
            }
        })
        .to_string()
}
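
As a rough illustration of the rewrite this function performs, here is a simplified, standard-library-only sketch. It is not the plugin's code: `rewrite_link_definitions` and `rewrite_line` are hypothetical names, a plain `[label]: dest` line match replaces the `MD_LINK_REFERENCE_DEFINITION` regex, the map from production name to page path replaces `Grammar`, and the `grammar-NAME` anchor format is an assumption standing in for `markdown_id`.

```rust
use std::collections::HashMap;

/// Rewrites one `[label]: dest` line if `dest` names a known production.
/// Returns `None` for any line that should be left untouched.
fn rewrite_line(line: &str, productions: &HashMap<&str, &str>) -> Option<String> {
    let (label, dest) = line.strip_prefix('[')?.split_once("]: ")?;
    let dest = dest.trim();
    // Accept both `grammar-Type` and the bare rule name `Type`.
    let name = dest.strip_prefix("grammar-").unwrap_or(dest);
    let path = productions.get(name)?;
    // Assumption: anchors take the form `grammar-NAME`.
    Some(format!("[{label}]: {path}#grammar-{name}"))
}

/// Applies `rewrite_line` to every line of `content`.
/// `productions` maps a production name to the page that defines it.
fn rewrite_link_definitions(content: &str, productions: &HashMap<&str, &str>) -> String {
    content
        .lines()
        .map(|line| rewrite_line(line, productions).unwrap_or_else(|| line.to_string()))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Definitions whose destination is not a known production, such as ordinary URLs, pass through unchanged, which mirrors the `else` branch above.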

/// Creates a map of production name -> relative link path.
fn make_relative_link_map(grammar: &Grammar, chapter: &Chapter) -> HashMap<String, String> {
let current_path = chapter.path.as_ref().unwrap().parent().unwrap();
1 change: 1 addition & 0 deletions tools/mdbook-spec/src/lib.rs
@@ -168,6 +168,7 @@ impl Preprocessor for Spec {
}
ch.content = admonitions::admonitions(&ch, &mut diag);
ch.content = self.rule_link_references(&ch, &rules);
ch.content = grammar::grammar_link_references(&ch, &grammar);
ch.content = self.auto_link_references(&ch, &rules);
ch.content = self.render_rule_definitions(&ch.content, &tests, &git_ref);
if ch.name == "Test summary" {