Add lookahead operators (lookAhead / notFollowedBy) #116

@ogregoire

Proposal

Add two zero-width operators that match against the upcoming input without consuming it:

public static Parser<Void>.OrEmpty lookAhead(Parser<?> inner);
public static Parser<Void>.OrEmpty notFollowedBy(Parser<?> inner);

Both return OrEmpty so they inherit dot-parse's existing zero-width safety: composing them in repetitions (atLeastOnce, zeroOrMore) or self-references (Parser.Rule) is a compile error, exactly as it is for optional() / orElse().
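The intended semantics can be sketched with a toy model. This is not dot-parse's API: `LookaheadSketch`, `Matcher`, `literal`, and `sequence` are hypothetical names invented for the illustration, and a "parser" here is just a function from (input, offset) to the offset after a match. The point is only that both operators report the original offset, so they never consume input.

```java
import java.util.Optional;

/** Toy model for illustration only; none of these names come from dot-parse. */
final class LookaheadSketch {
    /** Returns the offset just past a successful match, or empty on failure. */
    interface Matcher {
        Optional<Integer> match(String input, int offset);
    }

    /** Matches a literal at the current offset and consumes it. */
    static Matcher literal(String text) {
        return (input, offset) ->
                input.startsWith(text, offset)
                        ? Optional.of(offset + text.length())
                        : Optional.empty();
    }

    /** Runs first, then runs second from where first stopped. */
    static Matcher sequence(Matcher first, Matcher second) {
        return (input, offset) ->
                first.match(input, offset).flatMap(next -> second.match(input, next));
    }

    /** Succeeds iff inner matches here, but reports the original offset (zero-width). */
    static Matcher lookAhead(Matcher inner) {
        return (input, offset) -> inner.match(input, offset).map(ignored -> offset);
    }

    /** Succeeds iff inner does NOT match here; also zero-width. */
    static Matcher notFollowedBy(Matcher inner) {
        return (input, offset) ->
                inner.match(input, offset).isPresent()
                        ? Optional.empty()
                        : Optional.of(offset);
    }

    public static void main(String[] args) {
        Matcher verbThenIf = sequence(literal("verb"), lookAhead(literal(". If")));
        System.out.println(verbThenIf.match("verb. If x", 0)); // Optional[4]: ". If" was checked, not consumed
        System.out.println(verbThenIf.match("verb. Then", 0)); // Optional.empty
    }
}
```

In the first call the match ends at offset 4, right after "verb", even though the lookahead inspected the four characters that follow.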

Why

Some grammar rules genuinely depend on what follows the current position. Example: a verb's meaning differs based on whether a particular continuation comes after it. The natural expression is:

phrase("verb").then(BODY).followedBy(lookAhead(phrase(". If")))
        .map(MeaningA::new)

Today, with no lookahead, authors must approximate the rule by filtering on the parsed value:

phrase("verb").then(BODY.suchThat(MEANING_A_FILTER, "meaning-A shape"))
        .map(MeaningA::new)

suchThat only sees the parser's output, not the input that follows, so the filter encodes a heuristic rather than the actual rule. The two correlate but aren't equivalent: silent correctness gaps appear when an ambiguous shape shows up before a continuation it should not precede, or when an unambiguous shape shows up without the expected continuation.
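A small, self-contained illustration of that gap follows. Everything in it is hypothetical: the "body ends with ed" shape heuristic, the helper names, and the sample inputs are made up for the example, and inputs are assumed to start with "verb ". It only shows that a predicate over the parsed body and a check on the following input can disagree in both directions.

```java
/** Illustration only: hypothetical heuristic and inputs, not taken from any real grammar. */
final class FilterVersusLookahead {
    static final String VERB = "verb ";
    static final String CONTINUATION = ". If";

    /** Extracts the body: the text after "verb " up to the first '.' (or end of input). */
    static String body(String input) {
        int end = input.indexOf('.');
        return input.substring(VERB.length(), end < 0 ? input.length() : end);
    }

    /** Value filter: guesses meaning A from the body's shape alone (made-up heuristic). */
    static boolean shapeFilterSaysA(String input) {
        return body(input).endsWith("ed");
    }

    /** Actual rule: meaning A applies iff the body is followed by ". If". */
    static boolean lookaheadSaysA(String input) {
        int bodyEnd = VERB.length() + body(input).length();
        return input.startsWith(CONTINUATION, bodyEnd);
    }

    public static void main(String[] args) {
        String falsePositive = "verb opened. Then"; // expected shape, wrong continuation
        String falseNegative = "verb open. If so";  // unexpected shape, right continuation
        System.out.println(shapeFilterSaysA(falsePositive) + " vs " + lookaheadSaysA(falsePositive)); // true vs false
        System.out.println(shapeFilterSaysA(falseNegative) + " vs " + lookaheadSaysA(falseNegative)); // false vs true
    }
}
```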

The same gap shows up when disambiguating overlapping vocabularies: today, dispatch order plus result-shape filters substitute for a "this arm requires X to follow" assertion, which scales poorly and forces every contributor to learn an implicit ordering invariant.

Compatibility with the no-zero-width philosophy

dot-parse's safety contract is "no zero-width parsers in unsafe positions," enforced by the OrEmpty type system. Lookahead is inherently zero-width, so naïve composition would introduce the same pathology already guarded against:

lookAhead(p).atLeastOnce()                       // would loop
expr.definedAs(lookAhead(expr).then(body))       // would recurse without progress

The fix is the one dot-parse already uses: lookAhead and notFollowedBy return Parser<Void>.OrEmpty, inheriting every existing restriction. The compiler rejects unsafe composition exactly as it does for optional().atLeastOnce() today. No new class of pathology is introduced; the new operators slot into the existing safety machinery.
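The general idea of enforcing this through types can be sketched as follows. This is a toy model under stated assumptions, not dot-parse's implementation: `Consuming` and `ZeroWidth` are hypothetical stand-ins for Parser and OrEmpty. The zero-width type simply has no repetition method, so the looping composition cannot be written, while attaching a zero-width check to a consuming parser stays safe.

```java
import java.util.Optional;

/** Toy model for illustration only; Consuming and ZeroWidth are hypothetical names. */
final class ZeroWidthSafetySketch {
    interface Matcher {
        Optional<Integer> match(String input, int offset);
    }

    /** A parser that always consumes at least one character when it succeeds. */
    static final class Consuming {
        final Matcher matcher;

        Consuming(Matcher matcher) {
            this.matcher = matcher;
        }

        /** Safe to repeat: every successful iteration advances the offset. */
        Consuming atLeastOnce() {
            return new Consuming((input, offset) -> {
                Optional<Integer> end = matcher.match(input, offset);
                if (!end.isPresent()) {
                    return end;
                }
                int last = end.get();
                while (true) {
                    Optional<Integer> next = matcher.match(input, last);
                    if (!next.isPresent()) {
                        return Optional.of(last);
                    }
                    last = next.get();
                }
            });
        }

        /** A consuming parser followed by a zero-width check still consumes input. */
        Consuming followedBy(ZeroWidth check) {
            return new Consuming((input, offset) ->
                    matcher.match(input, offset).flatMap(next -> check.matcher.match(input, next)));
        }
    }

    /** May succeed without consuming anything; deliberately offers no atLeastOnce(). */
    static final class ZeroWidth {
        final Matcher matcher;

        ZeroWidth(Matcher matcher) {
            this.matcher = matcher;
        }
    }

    /** Non-empty literals only, so the "always consumes" guarantee holds. */
    static Consuming literal(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("literal must be non-empty");
        }
        return new Consuming((input, offset) ->
                input.startsWith(text, offset)
                        ? Optional.of(offset + text.length())
                        : Optional.empty());
    }

    static ZeroWidth lookAhead(Consuming inner) {
        return new ZeroWidth((input, offset) ->
                inner.matcher.match(input, offset).map(ignored -> offset));
    }

    public static void main(String[] args) {
        Consuming asThenBang = literal("a").atLeastOnce().followedBy(lookAhead(literal("!")));
        System.out.println(asThenBang.matcher.match("aaa!", 0)); // Optional[3]
        // lookAhead(literal("a")).atLeastOnce();  // does not compile: ZeroWidth has no atLeastOnce()
    }
}
```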

Why not suchThat or flatMap workarounds

  • suchThat(predicate, name): operates on the parsed value, not on what follows. Cannot express continuation-dependent rules without re-encoding them as heuristics.
  • Manual position-tracking via flatMap: requires first-class position handles and rewind primitives that dot-parse doesn't expose, and would smuggle backtracking into a library that deliberately avoids it.
  • Phrase-template inflections (phrase("verb(s)")): work for one-token suffixes only.

API references

Lookahead is a baseline feature of PEGs (&p, !p) and of most parser-combinator libraries: Megaparsec / parsec (lookAhead, notFollowedBy), scala-parser-combinators (guard, not), nom (peek, not).


Note: my agent flagged this as the second most important friction it encounters when writing parsers, the first being sequence(OrEmpty, Parser).