Implement nom-based ESI parser with streaming support #43
Open
…ut esi:vars and recognize basic interpolation.
…little cleanup.
This merge brings in the new nom-based parser implementation up to commit f216023:
Key changes:
- Added nom dependency for modern parsing capabilities
- Introduced new_parse.rs with comprehensive nom-based ESI parser
- Added parser_types.rs with new Expr and Operator types for better type safety
- Enhanced interpolation handling with process_interpolated_chars function
- Improved error handling and debugging output
- Fixed HTML tag closure issues in examples
- Added support for expression comparisons (matches, matches_i operators)
- Better integration between parser and interpreter with consistent types
The new parser provides better performance, more robust parsing, and improved maintainability while maintaining full compatibility with existing ESI functionality.
… expressions and nested structures
…pdate related parsing logic
…a nested vector structure for improved handling of events
…nt handling
- Introduced a request field in the Fragment struct to retain the original request.
- Updated the Processor to utilize the provided fragment response processor.
- Added tests to verify behavior with is_escaped_content configuration and response processing.
- Modify the `lower` function to return `Null` for `Value::Null` arguments.
- Refactor `process_nom_chunk_to_output` to handle expression evaluation more robustly, including skipping output for `Null` values.
- Replace the simple expression evaluator with a more comprehensive parser that supports additional operators and expressions.
- Introduce parsing for interpolated strings and standalone ESI expressions.
- Add support for logical, comparison, and negation operators in the parser.
- Update `parser_types` to include new expression types and operators.
- Enhance tests for expression parsing to cover new functionality.
…cess_chunk_to_output
- Introduced `WhenBranch` struct to represent branches in choose blocks.
- Updated `Tag` enum to use `WhenBranch` and changed `Chunk` to `Element`.
- Modified `Assign` tag to accept `Expr` instead of `String` for value.
- Enhanced parsing logic to support interpolated expressions in assignments.
- Added tests for long form assignments with interpolation and multiple variables.
- Updated existing tests to reflect changes in parser structure and expression handling.
…`*chunk*` to `*element*`
…ess_include function for improved clarity and error management; update parser types to use string slices.
complex_document benchmark: ~15µs → ~12.5µs
- Updated tests in `streaming_behavior.rs` to utilize the `Parser` trait for improved clarity and consistency.
- Changed instances of `tag` and `is_not` to use the `.parse()` method for better error handling with incomplete input.
- Ensured that all relevant tests correctly assert the expected `Incomplete` results when parsing incomplete data.
minor style improvements
…ries, ensuring reference semantics
…ename and cleanup
- Clarified some documentation mistakes in README
- Improved ESI expression evaluation by optimizing null value handling.
- Refactored caching logic to utilize `Cow<str>` for better performance and reduced allocations.
- Enhanced error handling and logging in cache configuration and request processing.
- Updated various functions to use zero-copy techniques for string manipulation, improving efficiency.
- Cleaned up code formatting and comments for better readability and maintainability.
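The `Cow<str>` refactor mentioned in this commit typically follows a borrow-unless-changed pattern: return the input slice untouched when no rewrite is needed and allocate only when it is. The following is a minimal sketch of that pattern, not code from the PR; `normalize_key` is a hypothetical helper invented for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from the PR): lowercases a cache key,
/// borrowing the input when it is already lowercase so that the
/// common case performs no allocation.
fn normalize_key(key: &str) -> Cow<'_, str> {
    if key.bytes().any(|b| b.is_ascii_uppercase()) {
        // Only this branch allocates a new String.
        Cow::Owned(key.to_ascii_lowercase())
    } else {
        Cow::Borrowed(key)
    }
}
```

Callers can treat the result as a `&str` through deref, so the borrowed and owned cases need no separate handling at the call site.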
…ce in parsing functions
… improved clarity and consistency
…caping, arithmetic safety
- Change OP_AND/OP_OR from && || to & | per ESI spec (when bitwise not configured)
- Add backslash escape support (\X → literal X) in strings, interpolated content, and attribute values
- Check arithmetic for integer overflow and return errors
- Short-circuit And/Or evaluation
- Propagate errors from esi:assign expression evaluation instead of swallowing them
- Pass fragment_response_handler through on_eval with `dsa=esi` mode
- Emit HTML comment on failed fragment request only for HTML content (is_escaped_content)
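The overflow checks and short-circuit evaluation described in this commit could look roughly like the sketch below. It rests on assumptions: the `i64` integer width, the `EvalError` type, and the function names `eval_arith`, `eval_and`, and `eval_or` are hypothetical and are not taken from the PR.

```rust
/// Hypothetical error type for this sketch; the PR's `ESIError` may differ.
#[derive(Debug, PartialEq)]
enum EvalError {
    Overflow,
    DivisionByZero,
    UnknownOperator(char),
}

/// Integer arithmetic that reports overflow as an error instead of
/// wrapping or panicking. The `i64` width is an assumption.
fn eval_arith(op: char, a: i64, b: i64) -> Result<i64, EvalError> {
    let result = match op {
        '+' => a.checked_add(b),
        '-' => a.checked_sub(b),
        '*' => a.checked_mul(b),
        '/' | '%' if b == 0 => return Err(EvalError::DivisionByZero),
        '/' => a.checked_div(b),
        '%' => a.checked_rem(b),
        other => return Err(EvalError::UnknownOperator(other)),
    };
    // `None` from a checked operation means the result did not fit.
    result.ok_or(EvalError::Overflow)
}

/// Short-circuit AND: the right-hand side is a closure, so it is only
/// evaluated (and can only fail) when the left-hand side is true.
fn eval_and<F>(lhs: bool, rhs: F) -> Result<bool, EvalError>
where
    F: FnOnce() -> Result<bool, EvalError>,
{
    if lhs { rhs() } else { Ok(false) }
}

/// Short-circuit OR: the right-hand side is skipped when the left is true.
fn eval_or<F>(lhs: bool, rhs: F) -> Result<bool, EvalError>
where
    F: FnOnce() -> Result<bool, EvalError>,
{
    if lhs { Ok(true) } else { rhs() }
}
```

Passing the right operand as a closure means an expression whose right side would fail never surfaces that error when the left side already decides the result.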
…proved parser functionality
- Renamed `ExecutionError` to `ESIError` for clarity and consistency.
- Enhanced error descriptions for better debugging and understanding.
- Modified tests to reflect the changes in error handling and ensure correctness.
Introduce a new ParsingMode::Eof that treats incomplete ESI tags as truncation errors (ESIError::UnexpectedEndOfDocument) while still consuming trailing non-ESI text normally. This distinguishes genuine document truncation from the streaming "need more data" case.
- Add parse_eof() for final-chunk parsing that errors on a truncated tag
- Early-return from parse_loop when all input is consumed
- Simplify buffer carry-forward logic in the streaming processor
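To illustrate the distinction this commit draws, here is a deliberately simplified sketch with no nom involved. `ParsingMode::Eof` and `ESIError::UnexpectedEndOfDocument` are named in the commit message; the `Streaming` variant name, the `consumable_len` function, and the naive "`<esi:` without a later `>`" test are assumptions made for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ParsingMode {
    Streaming, // assumed name: more chunks may follow
    Eof,       // this is the final chunk
}

#[derive(Debug, PartialEq)]
enum ESIError {
    UnexpectedEndOfDocument,
}

/// Returns how many leading bytes of `input` can be processed now.
/// Anything after that must be carried forward into the next chunk.
fn consumable_len(input: &str, mode: ParsingMode) -> Result<usize, ESIError> {
    const OPEN: &str = "<esi:";

    // Naive check: an ESI tag was opened but its '>' has not arrived yet.
    if let Some(start) = input.rfind(OPEN) {
        if !input[start..].contains('>') {
            return match mode {
                // Streaming: keep the partial tag for the next chunk.
                ParsingMode::Streaming => Ok(start),
                // Final chunk: the document was truncated mid-tag.
                ParsingMode::Eof => Err(ESIError::UnexpectedEndOfDocument),
            };
        }
    }

    if mode == ParsingMode::Streaming {
        // Hold back a trailing fragment such as "<" or "<es" that could
        // still turn into "<esi:" once more data arrives.
        for keep in (1..OPEN.len()).rev() {
            if input.ends_with(&OPEN[..keep]) {
                return Ok(input.len() - keep);
            }
        }
    }

    // At EOF, trailing non-ESI text is consumed normally.
    Ok(input.len())
}
```

The same truncated input yields "wait for more data" in streaming mode and a hard error at end of document, while plain trailing text is emitted in both.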
… dynamic expressions
… and string/list repetition
Summary
Complete rewrite of the ESI parser from XML-based to nom-based parsing with full streaming support, comprehensive expression evaluation, and a rich function library.
New Features
ESI Tags
- `<esi:eval>`: fetches content and always parses it as ESI (blocking operation), with `dca` support for two-phase processing
- `<esi:param>`: nested inside include/eval for query parameter injection
- `<esi:foreach>` / `<esi:break>`: iteration over lists and dicts
- `<esi:function>` / `<esi:return>`: user-defined functions with recursion depth control
- `<esi:text>`: raw passthrough (content emitted verbatim, no ESI processing)
Expression Features
- … (`['a', 'b', 'c']`) and dictionary literals (`{'key1': 'val1', 'key2': 'val2'}`)
- … (`['one', 2, ['nested']]`)
- … (`[1..10]`)
- `$(VAR{$(dynamic_key)})`
- `has`, `has_i`, `matches`, `matches_i`, `+`, `-`, `*`, `/`, `%`
- … (`3 == '3'` is true); `+` does integer addition when both operands are integers (e.g. `3 + 4` = `7`), list concatenation for two lists, string concatenation otherwise (e.g. `3 + '4'` = `'34'`); `*` does integer multiplication, or string/list repetition with an integer count (e.g. `3 * 'ab'` = `'ababab'`); expressions evaluate left to right, so `2 + 8 + ' days'` = `'10 days'` (integer add first, then string coercion); `-`, `/`, `%` require integer operands
- … (`$int` parses strings to integers, `$substr` coerces index args)
- `$fn_name(args...)` with nested calls supported
- … (`\'`, `\\`, `\$`, `\<`)
Function Library
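The `+` and `*` coercion rules listed under Expression Features can be sketched in plain Rust as follows. This is an illustration under assumptions, not the PR's evaluator: the `Value` enum, the `i64` width, the comma-joined rendering of lists in `to_text`, and treating a negative repeat count as zero are all choices made for this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// String form used when `+` falls back to concatenation.
/// The list rendering (comma-joined) is an assumption.
fn to_text(v: &Value) -> String {
    match v {
        Value::Int(i) => i.to_string(),
        Value::Str(s) => s.clone(),
        Value::List(items) => items.iter().map(to_text).collect::<Vec<_>>().join(","),
    }
}

/// `+`: integer addition, list concatenation, otherwise string concatenation.
/// Returns `None` on integer overflow.
fn add(a: Value, b: Value) -> Option<Value> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x.checked_add(y).map(Value::Int),
        (Value::List(mut x), Value::List(y)) => {
            x.extend(y);
            Some(Value::List(x))
        }
        (a, b) => Some(Value::Str(format!("{}{}", to_text(&a), to_text(&b)))),
    }
}

/// `*`: integer multiplication, or string/list repetition by an integer count.
/// Returns `None` for unsupported operand pairs or overflow.
fn mul(a: Value, b: Value) -> Option<Value> {
    // Assumption: a negative count repeats zero times.
    let count = |n: i64| n.max(0) as usize;
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x.checked_mul(y).map(Value::Int),
        (Value::Int(n), Value::Str(s)) | (Value::Str(s), Value::Int(n)) => {
            Some(Value::Str(s.repeat(count(n))))
        }
        (Value::Int(n), Value::List(l)) | (Value::List(l), Value::Int(n)) => {
            let mut out = Vec::new();
            for _ in 0..count(n) {
                out.extend(l.iter().cloned());
            }
            Some(Value::List(out))
        }
        _ => None,
    }
}
```

Because evaluation is left to right, `2 + 8 + ' days'` corresponds to `add(add(Int(2), Int(8))?, Str(" days"))`: the inner call is an integer add and only the outer call coerces to a string.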
- `$upper`, `$lstrip`, `$rstrip`, `$strip`, `$substr`
- `$html_decode`, `$url_encode`, `$url_decode`, `$base64_encode`, `$base64_decode`, `$convert_to_unicode`, `$convert_from_unicode`
- `$dollar`, `$dquote`, `$squote`
- `$len`, `$exists`, `$is_empty`, `$index`, `$rindex`, `$string_split`, `$join`, `$list_delitem`
- `$int`, `$str`
- `$digest_md5`, `$digest_md5_hex`, `$bin_int`
- `$time`, `$http_time`, `$strftime`
- `$rand`, `$last_rand`
- `$add_header`, `$set_response_code`, `$set_redirect`
- `<esi:function>` / `<esi:return>`
Streaming Processing
- … (`src`, `alt`, `dca`, `ttl`, `method`, `entity`, headers, params) are parsed into expression ASTs during parsing, then fully evaluated before each request is dispatched (old code treated `src`/`alt` as raw strings with `$(VAR)` interpolation only)
- … `buf` slots for ordered flushing
- `fastly::http::request::select`: replaces sequential `.wait()` calls; all pending includes share a single pool and responses are harvested as they arrive while preserving document order
- `RequestKey` (method + URL) mapped through `url_map` to `SlotEntry`
- `dca="none"` (raw insertion) and `dca="esi"` (parse response as ESI)
- `TryBlockTracker` with per-attempt slot tracking, failure propagation, and except-block fallback via `assemble_try_block`
- … `CacheConfig`
Variable System
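The Streaming Processing notes describe responses being harvested as they arrive while output keeps document order. One way to picture that is a slot-per-fragment buffer that only flushes a contiguous prefix of completed slots. The sketch below is an assumption-laden illustration: `OrderedOutput` and its methods are hypothetical names and do not reproduce the PR's `buf` slots or `SlotEntry` types.

```rust
/// Hypothetical ordered output buffer: one slot per pending fragment,
/// indexed by the fragment's position in the document.
struct OrderedOutput {
    slots: Vec<Option<String>>,
    next: usize, // index of the first slot not yet flushed
}

impl OrderedOutput {
    fn new(pending: usize) -> Self {
        Self { slots: vec![None; pending], next: 0 }
    }

    /// Records the response for `slot` (panics if `slot` is out of range),
    /// then returns everything that can now be flushed in document order.
    fn complete(&mut self, slot: usize, body: String) -> String {
        self.slots[slot] = Some(body);
        let mut out = String::new();
        // Flush the contiguous run of completed slots starting at `next`.
        while self.next < self.slots.len() {
            match self.slots[self.next].take() {
                Some(chunk) => {
                    out.push_str(&chunk);
                    self.next += 1;
                }
                None => break, // an earlier fragment is still pending
            }
        }
        out
    }
}
```

A response that arrives early simply waits in its slot; nothing is written until every fragment ahead of it in the document has completed.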
- `$(VAR|'fallback')`: if undefined, the default expression is used
- `$(HTTP_*)` maps any prefix to the corresponding header; `$(HTTP_COOKIE{'name'})` for cookies, `$(QUERY_STRING{'param'})` for query params
- `$(REQUEST_METHOD)`, `$(REQUEST_PATH)`, `$(REMOTE_ADDR)`
- `$(ARGS)` / `$(ARGS{n})` for positional access
- `$(MATCHES{n})` populated by `matches`/`matches_i` operators
Configuration
- `chunk_size`: streaming read buffer size
- `function_recursion_depth`: max user-defined function call depth
- `CacheConfig`: rendered output caching and cache-control header generation
Validation
Improved Features
Parser Architecture (rewritten)
- … `Incomplete` signals for partial input
- … `Bytes` slices from the original input buffer wherever possible (`slice_as_bytes`)
- … (`tag_handler`) that parses the tag name once and routes to specific handlers
- … (`esi_opening_tag`) ensures full opening tags are buffered before dispatching to complete-mode attribute parsers
Improved ESI Tags
- `<esi:include>`: now with full attribute set: `src`, `alt`, `dca`, `ttl`, `maxwait`, `no-store`, `method`, `entity`, `onerror`, `appendheader`, `setheader`, `removeheader` (previously only `src`, `alt`, `onerror`)
- `<esi:try>` / `<esi:attempt>` / `<esi:except>`: now supports parallel execution with multiple `<esi:attempt>` blocks
- `<esi:vars>`: now supports short form (`name=` attribute) and long form (with body)
- `<esi:assign>`: now with short and long form
- `<esi:choose>` / `<esi:when>` / `<esi:otherwise>`: now with pre-parsed expression evaluation
Improved Functions
- `$lower`: now handles edge cases properly
- `$html_encode`: encodes 4 special characters per ESI spec (`>`, `<`, `&`, `"`)
- `$replace`: now supports optional count parameter
Testing
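As a rough picture of the `$html_encode` and `$replace` behavior listed under Improved Functions, the sketch below encodes exactly the four characters named there and applies an optional replacement count. The entity names (`&gt;`, `&lt;`, `&amp;`, `&quot;`) are the standard HTML ones and are an assumption here, as are the Rust function names and signatures; none of this is taken from the PR's source.

```rust
/// Encodes only the four characters the PR description lists.
/// Other characters, including the single quote, pass through unchanged.
fn html_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '>' => out.push_str("&gt;"),
            '<' => out.push_str("&lt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}

/// Replaces occurrences of `from` with `to`. With `Some(n)` only the
/// first `n` matches are replaced; with `None` all of them are.
fn replace_with_count(haystack: &str, from: &str, to: &str, count: Option<usize>) -> String {
    match count {
        Some(n) => haystack.replacen(from, to, n),
        None => haystack.replace(from, to),
    }
}
```

Modeling the optional count as `Option<usize>` keeps the two-argument and three-argument forms of the function on one code path.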
- `parser.rs`: tag parsing, expression parsing, operator precedence, backslash escapes, variable name validation, subkey assignment
- `esi-tests.rs`: end-to-end processing with fragment dispatching, configuration options, variable evaluation
- `streaming_behavior.rs`: incomplete tag detection for all ESI tag types
- `tests/parser.rs`: try/attempt/except, include attributes, header manipulation
- `tests/eval_tests.rs`: DCA modes, eval vs include behavior
- `functions.rs`: all built-in functions
- `expression.rs`: function calls, HTML encoding, evaluation
Benchmarks
- `parser_benchmarks`: direct comparison with old XML parser using identical test cases (`esi_documents` group)
- `nom_parser_features`: HTML comments, script tags, assigns, advanced expressions, mixed content
- `parser_scaling`: 100 to 10,000 element documents
- `expression_parsing`: variable access, comparisons, logical operators, function calls
- `interpolated_strings`: text with embedded expressions
Examples (updated)
All existing examples were updated to work with the new API — no new examples were added.
- `esi_example_minimal`: updated fragment dispatcher signature (`|req, _maxwait|`)
- `esi_example_advanced_error_handling`: migrated from `Reader`/`Writer`/`parse_tags` to `process_stream` with `BufReader`; direct output stream writing
- `esi_try_example`: updated fragment dispatcher signature
- `esi_vars_example`: updated fragment dispatcher signature
- `esi_example_variants`: migrated from `parse_tags` + `process_parsed_document` + URL map to `process_stream` with inline URL rewriting in the dispatcher