Skip to content

[SPARK-56459][SQL] Fix MatchError in FileDataSourceV2.attachFilePath for fatal errors#55324

Open
yaooqinn wants to merge 2 commits intoapache:masterfrom
yaooqinn:SPARK-56459
Open

[SPARK-56459][SQL] Fix MatchError in FileDataSourceV2.attachFilePath for fatal errors#55324
yaooqinn wants to merge 2 commits intoapache:masterfrom
yaooqinn:SPARK-56459

Conversation

@yaooqinn
Copy link
Copy Markdown
Member

@yaooqinn yaooqinn commented Apr 13, 2026

What changes were proposed in this pull request?

Add a catch-all case in FileDataSourceV2.attachFilePath to rethrow fatal errors (like OutOfMemoryError) directly instead of letting them cause a MatchError.

Why are the changes needed?

attachFilePath uses NonFatal pattern matching which does not match fatal errors like OutOfMemoryError. Callers catch Throwable and pass it to this method:

// FileScanRDD.scala:141
case e: Throwable => throw FileDataSourceV2.attachFilePath(currentFile.urlEncodedPath, e)

When OOM occurs during file reading (e.g., large Parquet row groups), the incomplete match causes a scala.MatchError that masks the original error:

scala.MatchError: java.lang.OutOfMemoryError: Java heap space (of class java.lang.OutOfMemoryError)
    at o.a.s.sql.execution.datasources.v2.FileDataSourceV2$.attachFilePath(FileDataSourceV2.scala:127)
    at o.a.s.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:141)

Does this PR introduce any user-facing change?

Yes. Previously, users would see a confusing MatchError when OOM occurred during file reading. Now they see the original OutOfMemoryError.

How was this patch tested?

Observed the MatchError during TPC-DS SF100 benchmark with large Parquet row groups on executors with constrained heap memory.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored using GitHub Copilot CLI (Claude Opus 4.6).

throw QueryExecutionErrors.fileNotExistError(filePath, e)
case NonFatal(e) =>
throw QueryExecutionErrors.cannotReadFilesError(e, filePath)
case other => throw other
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a unit test case for this, @yaooqinn ?

How was this patch tested?

Observed the MatchError during TPC-DS SF100 benchmark with large Parquet row groups on executors with constrained heap memory.

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the code itself.

yaooqinn and others added 2 commits April 14, 2026 15:17
…for fatal errors

attachFilePath uses NonFatal pattern matching which does not match
fatal errors like OutOfMemoryError. When OOM occurs during file
reading, the incomplete match causes a MatchError that masks the
original error.

Add a catch-all case to rethrow fatal errors directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lePath

Add comprehensive unit tests for all branches of FileDataSourceV2.attachFilePath:
- SparkUpgradeException is rethrown directly
- FAILED_READ_FILE.CANNOT_READ_FILE_FOOTER is rethrown directly
- SchemaColumnConvertNotSupportedException is wrapped
- FileNotFoundException is wrapped
- NonFatal exceptions are wrapped
- Fatal errors (OutOfMemoryError, StackOverflowError) are rethrown directly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
assert(thrown.getCause eq cause)
}

test("SPARK-56459: attachFilePath rethrows fatal errors directly") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this.

assert(thrown eq oom)
}

test("SPARK-56459: attachFilePath rethrows StackOverflowError directly") {
Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants