Fixed-Size Token Buffer Causes Memory Corruption #11

@SAMBA8695

Description
The lexer allocates a fixed-size token array of 256 elements:

Token* tokens = malloc(256 * sizeof(Token));

There is no bounds checking or resizing logic. If the input produces more than 256 tokens, the lexer writes past the end of the allocated buffer, corrupting the heap and invoking undefined behavior.

Current Behavior

  • Token buffer has a hard-coded size of 256
  • No checks prevent writing past the allocated memory
  • Large source files can cause crashes or corrupted output
  • No error is reported when token limit is exceeded

Expected Behavior

  • Token buffer should grow dynamically as needed
  • Lexer should safely handle arbitrarily large input
  • Memory writes must remain within allocated bounds

Suggested Direction

  • Replace fixed-size token array with dynamically resizing buffer
  • Use realloc() when token capacity is exceeded
  • Alternatively, pre-count tokens before allocation
  • Optionally report an error if allocation fails
