
[GH-2652] Add RS_AsCOG SQL function for Cloud Optimized GeoTiff output #2669

Merged
jiayuasu merged 6 commits into apache:master from jiayuasu:feature/rs-as-cog-2652
Feb 22, 2026


Member

jiayuasu commented Feb 21, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This PR adds the RS_AsCOG SQL function that converts a raster to a Cloud Optimized GeoTIFF (COG) byte array. The underlying pure Java COG writer was already merged via #2663 (sub-issue #2662). This PR wires it up as a Spark SQL function with positional overloads:

RS_AsCOG(raster)
RS_AsCOG(raster, compression)
RS_AsCOG(raster, compression, tileSize)
RS_AsCOG(raster, compression, tileSize, quality)
RS_AsCOG(raster, compression, tileSize, quality, resampling)
RS_AsCOG(raster, compression, tileSize, quality, resampling, overviewCount)

Parameters:

  • compression: Deflate (default), LZW, JPEG, PackBits
  • tileSize: Tile width/height in pixels, must be a power of 2 (default 256)
  • quality: Compression quality from 0.0 (max compression) to 1.0 (default 0.2)
  • resampling: Overview resampling algorithm - Nearest (default), Bilinear, Bicubic
  • overviewCount: Number of overview levels, -1 for auto (default), 0 for none
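
The parameter constraints above can be sketched as a small standalone check. This is only an illustration of the documented rules, not the validation that `CogOptions` actually performs: the class `CogParamCheck` and its methods are hypothetical names, and the accepted values are copied from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical checks mirroring the documented RS_AsCOG parameter rules. */
final class CogParamCheck {
  // Compression names as listed in the PR description.
  static final List<String> COMPRESSIONS = List.of("Deflate", "LZW", "JPEG", "PackBits");

  /** A positive int is a power of two when exactly one bit is set. */
  static boolean isPowerOfTwo(int n) {
    return n > 0 && (n & (n - 1)) == 0;
  }

  /** Returns one message per violated constraint; an empty list means all arguments are acceptable. */
  static List<String> violations(
      String compression, int tileSize, double quality, int overviewCount) {
    List<String> problems = new ArrayList<>();
    // Case-insensitive comparison; a null compression matches nothing.
    if (COMPRESSIONS.stream().noneMatch(c -> c.equalsIgnoreCase(compression))) {
      problems.add("compression must be one of " + COMPRESSIONS);
    }
    if (!isPowerOfTwo(tileSize)) {
      problems.add("tileSize must be a power of 2");
    }
    // Written as a negated range check so that NaN is also rejected.
    if (!(quality >= 0.0 && quality <= 1.0)) {
      problems.add("quality must be between 0.0 and 1.0");
    }
    if (overviewCount < -1) {
      problems.add("overviewCount must be -1 (auto), 0 (none), or a positive count");
    }
    return problems;
  }
}
```

For example, `CogParamCheck.violations("deflate", 256, 0.2, -1)` returns an empty list, while a tile size of 300 is reported because it is not a power of 2.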

Files changed:

  • common/.../raster/RasterOutputs.java - 6 positional asCOG Java overloads
  • spark/.../expressions/raster/RasterOutputs.scala - RS_AsCOG Spark SQL expression
  • spark/.../UDF/Catalog.scala - Function registration
  • common/.../raster/RasterOutputTest.java - 8 Java unit tests including round-trip
  • spark/.../sql/rasteralgebraTest.scala - 5 Spark integration tests including round-trip
  • docs/api/sql/Raster-writer.md - RS_AsCOG API reference documentation
  • docs/tutorial/raster.md - Added COG section to raster tutorial

How was this patch tested?

  • 8 Java unit tests in RasterOutputTest covering all overloads and a GeoTiff round-trip (read GeoTiff, write COG, read back, verify envelope, dimensions, and band count)
  • 5 Spark SQL integration tests in rasteralgebraTest covering defaults, compression variants, tile size, all arguments, and a round-trip through RS_FromGeoTiff(RS_AsCOG(raster))
  • All tests pass locally

Did this PR include necessary documentation updates?

  • Yes, added RS_AsCOG section to the Raster writer API docs (docs/api/sql/Raster-writer.md) with full parameter descriptions, format signatures, SQL examples, and output schema
  • Added a Cloud Optimized GeoTiff section to the raster tutorial (docs/tutorial/raster.md)

@jiayuasu jiayuasu added this to the sedona-1.9.0 milestone Feb 21, 2026
@jiayuasu jiayuasu marked this pull request as draft February 21, 2026 20:44
… in Scala

- Remove 6 Java asCOG() wrapper overloads from RasterOutputs.java
- Rewrite RS_AsCOG as custom Expression with ImplicitCastInputTypes
  that builds CogOptions via builder and calls asCloudOptimizedGeoTiff directly
- Update Java tests to use asCloudOptimizedGeoTiff + CogOptions.builder() directly
- Restore 6 Java asCOG() overloads in RasterOutputs.java that delegate
  to asCloudOptimizedGeoTiff via CogOptions builder
- Simplify Scala RS_AsCOG to InferredExpression, consistent with
  RS_AsGeoTiff, RS_AsPNG, and other raster output functions
- Update Java tests to call asCOG() directly
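
A rough sketch of the "positional overloads that delegate via a builder" pattern described in the commit message above. Everything here is a stand-in: `DemoCogOptions` and `DemoOutputs` are hypothetical names, the raster argument is omitted, and the overloads return the options object rather than COG bytes so the applied defaults are visible. Only three of the six arities are shown; the defaults are the ones listed in the PR description.

```java
/** Hypothetical stand-in for an options object; this is not Sedona's CogOptions. */
final class DemoCogOptions {
  final String compression;
  final int tileSize;
  final double quality;
  final String resampling;
  final int overviewCount;

  private DemoCogOptions(Builder b) {
    this.compression = b.compression;
    this.tileSize = b.tileSize;
    this.quality = b.quality;
    this.resampling = b.resampling;
    this.overviewCount = b.overviewCount;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Defaults as listed in the PR description.
    private String compression = "Deflate";
    private int tileSize = 256;
    private double quality = 0.2;
    private String resampling = "Nearest";
    private int overviewCount = -1; // -1 means "auto"

    Builder compression(String v) { this.compression = v; return this; }
    Builder tileSize(int v) { this.tileSize = v; return this; }
    Builder quality(double v) { this.quality = v; return this; }
    Builder resampling(String v) { this.resampling = v; return this; }
    Builder overviewCount(int v) { this.overviewCount = v; return this; }

    DemoCogOptions build() { return new DemoCogOptions(this); }
  }
}

/** Hypothetical overloads: each sets only the supplied arguments and leaves the rest at defaults. */
final class DemoOutputs {
  static DemoCogOptions asCOG() {
    return DemoCogOptions.builder().build();
  }

  static DemoCogOptions asCOG(String compression, int tileSize) {
    return DemoCogOptions.builder().compression(compression).tileSize(tileSize).build();
  }

  static DemoCogOptions asCOG(
      String compression, int tileSize, double quality, String resampling, int overviewCount) {
    return DemoCogOptions.builder()
        .compression(compression)
        .tileSize(tileSize)
        .quality(quality)
        .resampling(resampling)
        .overviewCount(overviewCount)
        .build();
  }
}
```

Keeping the defaults in the builder means each shorter overload only names the arguments it received, so a default changes in one place.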
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Sedona Spark SQL raster writer function RS_AsCOG that converts a raster to Cloud Optimized GeoTIFF (COG) bytes, wiring the existing Java COG writer into the SQL/UDF surface and documenting/testing it.

Changes:

  • Added RS_AsCOG Spark SQL expression and registered it in the UDF catalog.
  • Introduced Java RasterOutputs.asCOG positional overloads backed by asCloudOptimizedGeoTiff and updated CogOptions to accept case-insensitive inputs.
  • Added Java + Spark tests and updated SQL API docs + raster tutorial documentation for COG output.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterOutputs.scala | Adds RS_AsCOG inferred Spark SQL expression with 1–6 argument overload resolution. |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Registers RS_AsCOG in the raster function catalog. |
| common/src/main/java/org/apache/sedona/common/raster/RasterOutputs.java | Adds asCOG overloads that build CogOptions and delegate to the COG writer. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java | Adjusts option validation/normalization for case-insensitive compression/resampling. |
| common/src/test/java/org/apache/sedona/common/raster/RasterOutputTest.java | Adds unit tests for asCOG overloads and basic round-trip validation. |
| spark/common/src/test/scala/org/apache/sedona/sql/rasteralgebraTest.scala | Adds Spark SQL integration tests for RS_AsCOG and round-trip via RS_FromGeoTiff. |
| docs/api/sql/Raster-writer.md | Documents RS_AsCOG signatures, parameters, examples, and output schema. |
| docs/tutorial/raster.md | Adds tutorial section describing COG output usage. |
Comments suppressed due to low confidence (2)

common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java:235

  • This change makes resampling validation fail when callers explicitly set resampling to null/empty, whereas the previous logic treated null/empty as the default (Nearest). That’s a behavioral change for the public CogOptions.Builder API and can also make RS_AsCOG(..., resampling => NULL) unexpectedly error in Spark SQL. Consider restoring the previous behavior (treat null/blank as default) while still doing case-insensitive matching for non-empty values.
      // Case-insensitive matching for resampling
      String normalized = matchIgnoreCase(VALID_RESAMPLING, resampling);
      if (normalized == null) {
        throw new IllegalArgumentException(
            "resampling must be one of " + VALID_RESAMPLING + ", got: '" + resampling + "'");
      }
      this.resampling = normalized;
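
One way the suggested behavior could look, as a standalone sketch: null or blank input falls back to the default, non-empty input is matched case-insensitively, and the error message reports the caller's original value. The class `ResamplingNormalizer` is a hypothetical name, and the body of `matchIgnoreCase` is a guess at what the helper in the snippet above does, not the code in `CogOptions`.

```java
import java.util.List;

/** Illustrative only: hypothetical normalization of the resampling option. */
final class ResamplingNormalizer {
  // Values and default as listed in the PR description.
  static final List<String> VALID_RESAMPLING = List.of("Nearest", "Bilinear", "Bicubic");
  static final String DEFAULT_RESAMPLING = "Nearest";

  /** Returns the canonical spelling from candidates that equals value ignoring case, or null. */
  static String matchIgnoreCase(List<String> candidates, String value) {
    for (String candidate : candidates) {
      if (candidate.equalsIgnoreCase(value)) {
        return candidate;
      }
    }
    return null;
  }

  /** Treats null/blank as the default; otherwise requires a case-insensitive match. */
  static String normalize(String resampling) {
    if (resampling == null || resampling.trim().isEmpty()) {
      return DEFAULT_RESAMPLING;
    }
    String normalized = matchIgnoreCase(VALID_RESAMPLING, resampling);
    if (normalized == null) {
      // Report the original input so the message shows what the caller passed.
      throw new IllegalArgumentException(
          "resampling must be one of " + VALID_RESAMPLING + ", got: '" + resampling + "'");
    }
    return normalized;
  }
}
```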

common/src/test/java/org/apache/sedona/common/raster/RasterOutputTest.java:268

  • testAsCOGDefaults indexes cogBytes[1] but only asserts cogBytes.length > 0 first. If the writer ever returns a 1-byte array (e.g., due to an upstream bug), this will throw ArrayIndexOutOfBoundsException instead of failing with a clear assertion. Assert cogBytes.length >= 2 before reading the TIFF byte-order marker bytes.
    byte[] cogBytes = RasterOutputs.asCOG(raster);
    assertNotNull(cogBytes);
    assertTrue(cogBytes.length > 0);
    // Verify it is a valid TIFF (starts with II or MM)
    assertTrue(
        (cogBytes[0] == 'I' && cogBytes[1] == 'I') || (cogBytes[0] == 'M' && cogBytes[1] == 'M'));
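
A length-guarded version of the same check, written as a standalone helper. `TiffHeader` is a hypothetical name used only for this sketch; the "II" (little-endian) and "MM" (big-endian) markers are the standard TIFF byte-order bytes.

```java
/** Illustrative only: hypothetical helper that checks the first two bytes of a TIFF stream. */
final class TiffHeader {
  /** True if the array begins with a TIFF byte-order marker: "II" or "MM". */
  static boolean hasByteOrderMarker(byte[] bytes) {
    // Guard the length first so short arrays fail the check instead of throwing.
    if (bytes == null || bytes.length < 2) {
      return false;
    }
    return (bytes[0] == 'I' && bytes[1] == 'I') || (bytes[0] == 'M' && bytes[1] == 'M');
  }
}
```

In a test, `assertTrue(TiffHeader.hasByteOrderMarker(cogBytes))` would then fail with a normal assertion error on a 1-byte array rather than an ArrayIndexOutOfBoundsException.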


- Preserve original compression/resampling input for error reporting
  so invalid values are shown instead of 'null'
- Treat null/blank resampling as default (Nearest) instead of throwing
- Assert cogBytes.length >= 2 before checking TIFF byte-order marker
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.



@jiayuasu jiayuasu marked this pull request as ready for review February 22, 2026 08:59
@jiayuasu jiayuasu merged commit 5b9d2bf into apache:master Feb 22, 2026
45 of 46 checks passed


Development

Successfully merging this pull request may close these issues.

Add RS_AsCOG (Cloud Optimized GeoTiff) writer with necessary configs

2 participants