Proposal
Add two zero-width operators that match against the upcoming input without consuming it:
public static Parser<Void>.OrEmpty lookAhead(Parser<?> inner);
public static Parser<Void>.OrEmpty notFollowedBy(Parser<?> inner);
Both return OrEmpty so they inherit dot-parse's existing zero-width safety: composing them in repetitions (atLeastOnce, zeroOrMore) or self-references (Parser.Rule) is a compile error, exactly as it is for optional() / orElse().
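For readers unfamiliar with the semantics being requested, here is a minimal, self-contained sketch of what "match without consuming" means. It is a toy model, not dot-parse's implementation: Match, P, literal, and this followedBy are hypothetical names invented for the illustration, and it assumes Java 16+ for records.

```java
import java.util.Optional;

/** Toy model of zero-width lookahead. Every name here is hypothetical; this is not dot-parse's API. */
public class LookaheadSketch {
  /** A successful match: the parsed value and the position just past it. */
  public record Match<T>(T value, int end) {}

  /** A parser maps (input, start position) to a match, or to empty on failure. */
  public interface P<T> {
    Optional<Match<T>> parse(String input, int at);
  }

  /** Matches the literal text and consumes it. */
  public static P<String> literal(String text) {
    return (in, at) -> {
      if (!in.startsWith(text, at)) return Optional.empty();
      return Optional.of(new Match<String>(text, at + text.length()));
    };
  }

  /** Succeeds iff inner matches here, but reports end == at, so nothing is consumed. */
  public static P<Void> lookAhead(P<?> inner) {
    return (in, at) -> {
      if (inner.parse(in, at).isEmpty()) return Optional.empty();
      return Optional.of(new Match<Void>(null, at));
    };
  }

  /** Succeeds iff inner does NOT match here; also consumes nothing. */
  public static P<Void> notFollowedBy(P<?> inner) {
    return (in, at) -> {
      if (inner.parse(in, at).isPresent()) return Optional.empty();
      return Optional.of(new Match<Void>(null, at));
    };
  }

  /** Runs first, then second from where first ended; keeps first's value. */
  public static <T> P<T> followedBy(P<T> first, P<?> second) {
    return (in, at) -> {
      Optional<Match<T>> head = first.parse(in, at);
      if (head.isEmpty()) return Optional.empty();
      var tail = second.parse(in, head.get().end());
      if (tail.isEmpty()) return Optional.empty();
      return Optional.of(new Match<T>(head.get().value(), tail.get().end()));
    };
  }

  public static void main(String[] args) {
    P<String> verbThenIf = followedBy(literal("verb"), lookAhead(literal(". If")));
    System.out.println(verbThenIf.parse("verb. If x", 0)); // Optional[Match[value=verb, end=4]]
    System.out.println(verbThenIf.parse("verb. Then", 0)); // Optional.empty

    P<String> bareVerb = followedBy(literal("verb"), notFollowedBy(literal("s")));
    System.out.println(bareVerb.parse("verbs", 0).isPresent());  // false
    System.out.println(bareVerb.parse("verb x", 0).isPresent()); // true
  }
}
```

Note that the successful match ends at position 4 in both cases: the ". If" text is inspected but left in the input for whatever parser runs next.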
Why
Some grammar rules genuinely depend on what follows the current position. Example: a verb's meaning differs based on whether a particular continuation comes after it. The natural expression is:
phrase("verb").then(BODY).followedBy(lookAhead(phrase(". If")))
.map(MeaningA::new)
Today, with no lookahead, authors must approximate the rule by filtering on the parsed value:
phrase("verb").then(BODY.suchThat(MEANING_A_FILTER, "meaning-A shape"))
.map(MeaningA::new)
suchThat only sees the parser's output, not the input that follows, so the filter encodes a heuristic rather than the actual rule. The two correlate but aren't equivalent — silent correctness gaps appear when ambiguous shapes occur in continuations they shouldn't, or when unambiguous shapes occur without the expected continuation.
The same gap shows up when disambiguating overlapping vocabularies: today, dispatch order plus result-shape filters substitute for a "this arm requires X to follow" assertion, which scales poorly and forces every contributor to learn an implicit ordering invariant.
Compatibility with the no-zero-width philosophy
dot-parse's safety contract is "no zero-width parsers in unsafe positions," enforced by the OrEmpty type system. Lookahead is inherently zero-width, so naïve composition would introduce the same pathology already guarded against:
lookAhead(p).atLeastOnce() // would loop
expr.definedAs(lookAhead(expr).then(body)) // would recurse without progress
The fix is the one dot-parse already uses: lookAhead and notFollowedBy return Parser<Void>.OrEmpty, inheriting every existing restriction. The compiler rejects unsafe composition exactly as it does for optional().atLeastOnce() today. No new class of pathology is introduced; the two new operators slot into the existing safety machinery.
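To make the type-level argument concrete, here is a rough sketch of the general technique: give zero-width parsers their own type that simply lacks repetition methods. This is not dot-parse's actual OrEmpty implementation; Matcher, ZeroWidth, and Consuming are hypothetical names, and the sketch assumes Java 11+.

```java
import java.util.OptionalInt;

/** Toy illustration of typing zero-width parsers separately. Names are hypothetical, not dot-parse's. */
public class ZeroWidthSafetySketch {
  /** Returns the end position of a match starting at `at`, or empty on failure. */
  public interface Matcher {
    OptionalInt match(String input, int at);
  }

  /** May succeed without consuming input. Deliberately exposes no repetition methods. */
  public static final class ZeroWidth {
    final Matcher matcher;
    ZeroWidth(Matcher matcher) { this.matcher = matcher; }
  }

  /** Consumes at least one character on success, so repeating it always makes progress. */
  public static final class Consuming {
    final Matcher matcher;
    Consuming(Matcher matcher) { this.matcher = matcher; }

    public OptionalInt match(String input, int at) { return matcher.match(input, at); }

    /** Sequencing with a zero-width check still consumes, so the result stays Consuming. */
    public Consuming followedBy(ZeroWidth next) {
      return new Consuming((in, at) -> {
        OptionalInt end = matcher.match(in, at);
        return end.isPresent() ? next.matcher.match(in, end.getAsInt()) : end;
      });
    }

    /** One or more repetitions; terminates because each iteration advances the position. */
    public Consuming atLeastOnce() {
      return new Consuming((in, at) -> {
        OptionalInt end = matcher.match(in, at);
        if (end.isEmpty()) return OptionalInt.empty();
        int pos = end.getAsInt();
        for (OptionalInt next = matcher.match(in, pos); next.isPresent(); next = matcher.match(in, pos)) {
          pos = next.getAsInt(); // strictly greater than before, since this matcher consumes
        }
        return OptionalInt.of(pos);
      });
    }
  }

  public static Consuming literal(String text) {
    if (text.isEmpty()) throw new IllegalArgumentException("literal must be non-empty");
    return new Consuming((in, at) ->
        in.startsWith(text, at) ? OptionalInt.of(at + text.length()) : OptionalInt.empty());
  }

  public static ZeroWidth lookAhead(Consuming inner) {
    return new ZeroWidth((in, at) ->
        inner.matcher.match(in, at).isPresent() ? OptionalInt.of(at) : OptionalInt.empty());
  }

  public static void main(String[] args) {
    Consuming items = literal("ab").atLeastOnce().followedBy(lookAhead(literal(";")));
    System.out.println(items.match("ababab;", 0)); // OptionalInt[6]

    // lookAhead(literal("ab")).atLeastOnce();
    // ^ does not compile: ZeroWidth has no atLeastOnce(), so the non-terminating loop cannot be written.
  }
}
```

The design choice is that the unsafe composition is unrepresentable rather than detected at runtime, which is the property this proposal relies on when it asks for the new operators to return OrEmpty.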
Why not suchThat or flatMap workarounds
- suchThat(predicate, name): operates on the parsed value, not on what follows. Cannot express continuation-dependent rules without re-encoding them as heuristics.
- Manual position-tracking via flatMap: requires first-class position handles and rewind primitives that dot-parse doesn't expose, and would smuggle backtracking into a library that deliberately avoids it.
- Phrase-template inflections (phrase("verb(s)")): work for one-token suffixes only.
API references
Lookahead is a baseline feature of PEG and of most parser-combinator libraries: PEG (&p, !p), Megaparsec / Parsec (lookAhead, notFollowedBy), scala-parser-combinators (guard, not), nom (peek, not).
Note: my agent flagged this as the second most important source of friction when writing parsers; the first is sequence(OrEmpty, Parser).