Conversation
|
There are some metrics that ice is not currently tracking CommitMetrics |
shyiko
left a comment
There was a problem hiding this comment.
Looks good overall. Thank you!
ice/src/main/resources/logback.xml
Outdated
| <!-- hide "Unable to load metrics class: 'org.apache.iceberg.hadoop.HadoopMetricsContext', falling back to null metrics" --> | ||
| <logger name="org.apache.iceberg.aws.s3.S3FileIO" level="ERROR"/> | ||
| <!-- hide "Unclosed input stream" warnings from Iceberg's S3InputStream finalizer --> | ||
| <logger name="org.apache.iceberg.aws.s3.S3InputStream" level="ERROR"/> |
There was a problem hiding this comment.
isn't this an indicator that we're not doing a good job at closing input streams and need to fix that?
There was a problem hiding this comment.
This was coming from iceberg API and it was flooding the logs in a loop, I will get the exception
There was a problem hiding this comment.
removed, not able to reproduce it, will add in a separate PR.
|
|
||
| // ========================================================================== | ||
| // Public methods | ||
| // ========================================================================== |
There was a problem hiding this comment.
nit: comments like these should be avoided
| private final Counter messageParseErrorsTotal; | ||
|
|
||
| /** Returns the singleton instance of the metrics reporter. */ | ||
| public static InsertWatchMetrics getInstance() { |
There was a problem hiding this comment.
nit: why not use IoDH for instance lazy loading (and avoid the lock?
| "Total number of retry attempts due to failures"; | ||
|
|
||
| // SQS errors | ||
| private static final String SQS_RECEIVE_ERRORS_TOTAL_NAME = "ice_watch_sqs_receive_errors_total"; |
There was a problem hiding this comment.
should we perhaps drop sqs from metric names here and below for consistency with metrics above + in case we decide to add support for other message queue?
examples/scratch/README.md
Outdated
|
|
||
|
|
||
| # delete partition | ||
| ice delete nyc.taxis_p_by_day --partition '[{"name": "tpep_pickup_datetime", "values": ["2024-12-31T23:51:20"]}]' --dry-run=false |
| // Record deletion metrics | ||
| metrics.recordOrphanFilesDeleted(tableName, deletedCount.get()); | ||
| for (int i = 0; i < failedCount.get(); i++) { | ||
| metrics.recordOrphanDeleteFailure(tableName); |
There was a problem hiding this comment.
any reason it's not metrics.recordOrphanDeleteFailure(tableName, failedCount.get())?
| } | ||
|
|
||
| // Initialize Iceberg metrics reporter for Prometheus (singleton) | ||
| PrometheusMetricsReporter metricsReporter = PrometheusMetricsReporter.getInstance(); |
There was a problem hiding this comment.
some Metrics objects are passed around (like here), others are not (like MaintenanceMetrics.getInstance()). Let's settle on one approach
| builder.endpointOverride(endpoint); | ||
| } | ||
| } catch (Exception e) { | ||
| logger.warn("Failed to parse SQS queue URL for endpoint extraction: {}", e.getMessage()); |
There was a problem hiding this comment.
throw instead of ignore-logging, perhaps?
| if (host != null && !host.endsWith(".amazonaws.com")) { | ||
| URI endpoint = new URI(uri.getScheme(), null, host, uri.getPort(), null, null, null); | ||
| logger.info("Using custom SQS endpoint: {}", endpoint); | ||
| builder.endpointOverride(endpoint); |
There was a problem hiding this comment.
I'd suggest introducing cli option (e.g. --watch-endpoint) that allow to specify a different value from --watch as proposed approach falls apart when using localstack, etc. This way things also can continue to use AWS_ENDPOINT_URL_SQS env var when needed.
| */ | ||
| package com.altinity.ice.rest.catalog.internal.metrics; | ||
|
|
||
| import static com.altinity.ice.rest.catalog.internal.metrics.IcebergMetricNames.*; |
There was a problem hiding this comment.
examples/s3watch/test/README.md
Outdated
| `docker compose up` | ||
|
|
||
| Start ice in insert mode. | ||
| `insert flowers.iris -p --no-copy --skip-duplicates s3://bucket1/flowers/iris/external-data/ --watch="http://localhost:9324/000000000000/s3-events"` |
There was a problem hiding this comment.
this needs to be updated, right?
|
d8a655c should not be included in this PR as it's unrelated to the rest of the changes |
removed. |
|
Added iceberg_catalog_tables_total (gauge) snapshots_count |
xieandrew
left a comment
There was a problem hiding this comment.
Issues I found during testing:
- Tables created by insert watch don't increment the iceberg_catalog_tables (count of all tables) metric
- The "Scan Metrics" row in Grafana is empty. The corresponding panels are under "Overview" instead.





No description provided.