[GH-2652] Add RS_AsCOG SQL function for Cloud Optimized GeoTiff output #2669
jiayuasu merged 6 commits into apache:master
… in Scala
- Remove 6 Java asCOG() wrapper overloads from RasterOutputs.java
- Rewrite RS_AsCOG as custom Expression with ImplicitCastInputTypes that builds CogOptions via builder and calls asCloudOptimizedGeoTiff directly
- Update Java tests to use asCloudOptimizedGeoTiff + CogOptions.builder() directly
- Restore 6 Java asCOG() overloads in RasterOutputs.java that delegate to asCloudOptimizedGeoTiff via CogOptions builder
- Simplify Scala RS_AsCOG to InferredExpression, consistent with RS_AsGeoTiff, RS_AsPNG, and other raster output functions
- Update Java tests to call asCOG() directly
Pull request overview
Adds a new Sedona Spark SQL raster writer function RS_AsCOG that converts a raster to Cloud Optimized GeoTIFF (COG) bytes, wiring the existing Java COG writer into the SQL/UDF surface and documenting/testing it.
Changes:
- Added `RS_AsCOG` Spark SQL expression and registered it in the UDF catalog.
- Introduced Java `RasterOutputs.asCOG` positional overloads backed by `asCloudOptimizedGeoTiff` and updated `CogOptions` to accept case-insensitive inputs.
- Added Java + Spark tests and updated SQL API docs + raster tutorial documentation for COG output.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterOutputs.scala | Adds RS_AsCOG inferred Spark SQL expression with 1–6 argument overload resolution. |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Registers RS_AsCOG in the raster function catalog. |
| common/src/main/java/org/apache/sedona/common/raster/RasterOutputs.java | Adds asCOG overloads that build CogOptions and delegate to the COG writer. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java | Adjusts option validation/normalization for case-insensitive compression/resampling. |
| common/src/test/java/org/apache/sedona/common/raster/RasterOutputTest.java | Adds unit tests for asCOG overloads and basic round-trip validation. |
| spark/common/src/test/scala/org/apache/sedona/sql/rasteralgebraTest.scala | Adds Spark SQL integration tests for RS_AsCOG and round-trip via RS_FromGeoTiff. |
| docs/api/sql/Raster-writer.md | Documents RS_AsCOG signatures, parameters, examples, and output schema. |
| docs/tutorial/raster.md | Adds tutorial section describing COG output usage. |
Comments suppressed due to low confidence (2)
common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java:235
- This change makes `resampling` validation fail when callers explicitly set `resampling` to null/empty, whereas the previous logic treated null/empty as the default (Nearest). That's a behavioral change for the public `CogOptions.Builder` API and can also make `RS_AsCOG(..., resampling => NULL)` unexpectedly error in Spark SQL. Consider restoring the previous behavior (treat null/blank as default) while still doing case-insensitive matching for non-empty values.
    // Case-insensitive matching for resampling
    String normalized = matchIgnoreCase(VALID_RESAMPLING, resampling);
    if (normalized == null) {
      throw new IllegalArgumentException(
          "resampling must be one of " + VALID_RESAMPLING + ", got: '" + resampling + "'");
    }
    this.resampling = normalized;
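The behavior this comment asks for (null or blank falls back to the default, non-empty values are matched case-insensitively, and the error message shows the caller's original input) might look roughly like the sketch below. It is not the actual `CogOptions.Builder` code: the class name `ResamplingSketch`, the `DEFAULT_RESAMPLING` constant, and the body of `matchIgnoreCase` are assumptions made for illustration; only the three resampling names come from the PR description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the resampling handling in CogOptions.Builder.
// Class name, constants, and helper body are assumptions, not Sedona code.
final class ResamplingSketch {
  static final List<String> VALID_RESAMPLING = Arrays.asList("Nearest", "Bilinear", "Bicubic");
  static final String DEFAULT_RESAMPLING = "Nearest";

  // Returns the canonical spelling from `valid` that equals `value` ignoring case,
  // or null when nothing matches.
  static String matchIgnoreCase(List<String> valid, String value) {
    for (String candidate : valid) {
      if (candidate.equalsIgnoreCase(value)) {
        return candidate;
      }
    }
    return null;
  }

  static String normalizeResampling(String resampling) {
    // Null or blank input falls back to the default instead of failing validation.
    if (resampling == null || resampling.trim().isEmpty()) {
      return DEFAULT_RESAMPLING;
    }
    String normalized = matchIgnoreCase(VALID_RESAMPLING, resampling);
    if (normalized == null) {
      // Report the caller's original input rather than the failed lookup result.
      throw new IllegalArgumentException(
          "resampling must be one of " + VALID_RESAMPLING + ", got: '" + resampling + "'");
    }
    return normalized;
  }
}
```

Keeping the original argument in a separate variable from the normalized result is what lets the exception message show the invalid value instead of `null`.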
common/src/test/java/org/apache/sedona/common/raster/RasterOutputTest.java:268
`testAsCOGDefaults` indexes `cogBytes[1]` but only asserts `cogBytes.length > 0` first. If the writer ever returns a 1-byte array (e.g., due to an upstream bug), this will throw `ArrayIndexOutOfBoundsException` instead of failing with a clear assertion. Assert `cogBytes.length >= 2` before reading the TIFF byte-order marker bytes.
    byte[] cogBytes = RasterOutputs.asCOG(raster);
    assertNotNull(cogBytes);
    assertTrue(cogBytes.length > 0);
    // Verify it is a valid TIFF (starts with II or MM)
    assertTrue(
        (cogBytes[0] == 'I' && cogBytes[1] == 'I') || (cogBytes[0] == 'M' && cogBytes[1] == 'M'));
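One way to make the marker check safe against short arrays is to fold the length guard into a small helper, as in the sketch below. `TiffMagicSketch` and `hasTiffByteOrderMarker` are hypothetical names used only for illustration; they are not part of Sedona or of this PR's test code.

```java
// Hypothetical helper, not part of Sedona: checks the two-byte TIFF byte-order marker.
final class TiffMagicSketch {
  static boolean hasTiffByteOrderMarker(byte[] bytes) {
    // Guard the length first so a short array yields false instead of throwing
    // ArrayIndexOutOfBoundsException.
    if (bytes == null || bytes.length < 2) {
      return false;
    }
    boolean littleEndian = bytes[0] == 'I' && bytes[1] == 'I'; // "II"
    boolean bigEndian = bytes[0] == 'M' && bytes[1] == 'M'; // "MM"
    return littleEndian || bigEndian;
  }
}
```

In a test, a single `assertTrue(TiffMagicSketch.hasTiffByteOrderMarker(cogBytes))` would then fail with an assertion error rather than an index exception when the writer returns too few bytes.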
- Preserve original compression/resampling input for error reporting so invalid values are shown instead of 'null'
- Treat null/blank resampling as default (Nearest) instead of throwing
- Assert cogBytes.length >= 2 before checking TIFF byte-order marker
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes "Add RS_AsCOG (Cloud Optimized GeoTiff) writer with necessary configs" #2652
What changes were proposed in this PR?
This PR adds the `RS_AsCOG` SQL function that converts a raster to a Cloud Optimized GeoTIFF (COG) byte array. The underlying pure Java COG writer was already merged via #2663 (sub-issue #2662). This PR wires it up as a Spark SQL function with positional overloads.
Parameters:
- `compression`: Deflate (default), LZW, JPEG, PackBits
- `tileSize`: Tile width/height in pixels, must be a power of 2 (default 256)
- `quality`: Compression quality from 0.0 (max compression) to 1.0 (default 0.2)
- `resampling`: Overview resampling algorithm - Nearest (default), Bilinear, Bicubic
- `overviewCount`: Number of overview levels, -1 for auto (default), 0 for none
Files changed:
- `common/.../raster/RasterOutputs.java` - 6 positional asCOG Java overloads
- `spark/.../expressions/raster/RasterOutputs.scala` - RS_AsCOG Spark SQL expression
- `spark/.../UDF/Catalog.scala` - Function registration
- `common/.../raster/RasterOutputTest.java` - 8 Java unit tests including round-trip
- `spark/.../sql/rasteralgebraTest.scala` - 5 Spark integration tests including round-trip
- `docs/api/sql/Raster-writer.md` - RS_AsCOG API reference documentation
- `docs/tutorial/raster.md` - Added COG section to raster tutorial
How was this patch tested?
Did this PR include necessary documentation updates?
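For reference, two of the parameter constraints from the description above (`tileSize` must be a power of 2, `quality` lies between 0.0 and 1.0) can be sketched as standalone checks. This is only an illustration: `CogParamSketch` and its methods are hypothetical names, the inclusive bounds on `quality` are an assumption, and the real validation lives in `CogOptions`, which may differ.

```java
// Hypothetical checks mirroring the documented RS_AsCOG parameter constraints.
// Not Sedona code; the inclusive quality bounds are an assumption.
final class CogParamSketch {
  // A positive int is a power of two exactly when it has a single bit set.
  static boolean isValidTileSize(int tileSize) {
    return tileSize > 0 && (tileSize & (tileSize - 1)) == 0;
  }

  // Quality is described as running from 0.0 (max compression) to 1.0.
  // NaN fails both comparisons and is therefore rejected.
  static boolean isValidQuality(double quality) {
    return quality >= 0.0 && quality <= 1.0;
  }
}
```

With these checks the documented defaults (`tileSize` 256, `quality` 0.2) pass, while a tile size such as 300 is rejected.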