[FLINK-39422] [REST][docs] Fix OpenAPI spec for SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers #27903

Open
nikhilsu wants to merge 8 commits into apache:master from nikhilsu:fix-slotsharinggroupid-openapi-schema

Conversation

@nikhilsu

@nikhilsu nikhilsu commented Apr 8, 2026

What is the purpose of the change

This PR fixes three Flink request/response DTOs that use custom Jackson serializers, while their OpenAPI specs are generated from the DTO schema. This leads to a mismatch between the actual wire format and what the generated spec describes, breaking downstream clients:

  1. SlotSharingGroupId: Serialized as a hex string by SlotSharingGroupIDSerializer (added in FLINK-20090), but the spec incorrectly defines it as an object with bytes, lowerPart, upperPart fields because overrideIdSchemas() does not include SlotSharingGroupId.

  2. MetricCollectionResponseBody: Serialized as a raw JSON array [{"id": "metricName", "value": "1"}] by a custom Serializer (annotated with @JsonSerialize), but the spec incorrectly defines it as an object with a metrics property.

  3. SerializedThrowable: Serialized with three fields (class, stack-trace, serialized-throwable) by SerializedThrowableSerializer, but the spec override only included serialized-throwable, omitting the class and stack-trace string fields.
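To make item 3 concrete, here is a minimal sketch of the three-field shape using only the JDK. The field names come from the description above; the class `ThrowableFieldsSketch`, its `toFields` helper, and the choice of Java serialization plus Base64 for the `serialized-throwable` value are assumptions for illustration and are not the code of `SerializedThrowableSerializer`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a three-field throwable payload; not Flink code. */
final class ThrowableFieldsSketch {

    /** Builds the three fields named in the PR description for the given throwable. */
    static Map<String, String> toFields(Throwable t) {
        // Render the stack trace as a plain string.
        StringWriter stack = new StringWriter();
        t.printStackTrace(new PrintWriter(stack, true));

        // Assumption: the binary form is Java serialization, shown here as Base64 text.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("class", t.getClass().getName());
        fields.put("stack-trace", stack.toString());
        fields.put("serialized-throwable", Base64.getEncoder().encodeToString(bytes.toByteArray()));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = toFields(new IllegalStateException("checkpoint failed"));
        // A schema that lists only "serialized-throwable" would miss the other two keys.
        System.out.println(fields.keySet());
    }
}
```

A schema that declares only `serialized-throwable` leaves the two string fields undocumented, which is the gap the override in this PR closes.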

None of these changes break backward compatibility - the previous schemas never matched the actual wire format, so any client generated from the spec would have already failed to decode these fields correctly.

Affected endpoints:

  • GET /jobs/:jobid (SlotSharingGroupId in JobDetailsInfo.JobVertexDetailsInfo)
  • GET /jobmanager/metrics
  • GET /jobs/:jobid/metrics
  • GET /jobs/:jobid/vertices/:vertexid/metrics
  • GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
  • GET /jobs/:jobid/vertices/:vertexid/watermarks
  • GET /taskmanagers/:taskmanagerid/metrics
  • GET /taskmanagers/metrics
  • Any endpoint returning SerializedThrowable (savepoint/checkpoint failure responses)

Brief change log

  • Added SlotSharingGroupId to OpenApiSpecGenerator.overrideIdSchemas() as type: string, pattern: [0-9a-f]{32}
  • Refactored MetricCollectionResponseBody to extend ArrayList<Metric>, removing the custom serializer/deserializer. The wire format is identical (JSON array), but the class structure now matches, so the spec generator produces the correct schema without a manual override.
  • Added class and stack-trace fields to the SerializedThrowable schema override in OpenApiSpecGenerator, using public constants from SerializedThrowableSerializer.
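As a rough illustration of the `type: string, pattern: [0-9a-f]{32}` override in the first bullet, the sketch below checks that a 128-bit ID rendered as lowercase hex satisfies that pattern, while an object-shaped value does not. Only the pattern comes from the change log; `HexIdSketch`, `toHex`, and the order of the two halves are hypothetical and do not reproduce Flink's `SlotSharingGroupId`.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for a 128-bit ID; not Flink's SlotSharingGroupId. */
final class HexIdSketch {

    /** The pattern from the schema override described in the change log. */
    static final Pattern ID_PATTERN = Pattern.compile("[0-9a-f]{32}");

    /** Renders two 64-bit halves as 32 lowercase hex characters (assumed layout). */
    static String toHex(long firstHalf, long secondHalf) {
        return String.format("%016x%016x", firstHalf, secondHalf);
    }

    /**
     * Returns true if the whole value matches the pattern. This full match is
     * stricter than the unanchored search that JSON Schema patterns allow.
     */
    static boolean matchesSchema(String wireValue) {
        return ID_PATTERN.matcher(wireValue).matches();
    }

    public static void main(String[] args) {
        String id = toHex(0x0123456789abcdefL, 0xfedcba9876543210L);
        System.out.println(id + " -> " + matchesSchema(id));
        // The object shape from the old spec is not a valid hex string.
        System.out.println(matchesSchema("{\"lowerPart\":1,\"upperPart\":2}"));
    }
}
```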

Verifying this change

This change is already covered by existing tests, such as OpenApiSpecGeneratorTest and MetricCollectionResponseBodyTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: yes (MetricCollectionResponseBody custom serializer/deserializer removed, replaced by extending ArrayList<Metric> which Jackson serializes identically as a JSON array)
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@nikhilsu nikhilsu marked this pull request as ready for review April 8, 2026 07:59
@flinkbot
Collaborator

flinkbot commented Apr 8, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@nikhilsu
Author

nikhilsu commented Apr 8, 2026

cc: @zentol @gaborgsomogyi

@github-actions github-actions bot added the community-reviewed label (PR has been reviewed by the community) Apr 8, 2026
@nikhilsu nikhilsu changed the title from "[hotfix][docs] Add SlotSharingGroupId to OpenAPI spec generator ID schema overrides" to "[hotfix][docs] Fix OpenAPI spec for SlotSharingGroupId and MetricCollectionResponseBody custom serializers" Apr 8, 2026
@nikhilsu nikhilsu force-pushed the fix-slotsharinggroupid-openapi-schema branch from a89e2bd to 6b7ff1e Compare April 8, 2026 20:58
@nikhilsu nikhilsu changed the title from "[hotfix][docs] Fix OpenAPI spec for SlotSharingGroupId and MetricCollectionResponseBody custom serializers" to "[hotfix][docs] Fix OpenAPI spec for SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers" Apr 8, 2026
@nikhilsu nikhilsu changed the title from "[hotfix][docs] Fix OpenAPI spec for SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers" to "[FLINK-39422] [REST][docs] Fix OpenAPI spec for SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers" Apr 10, 2026
@nikhilsu
Author

@RocMarshal do you know who can review this PR?

@RocMarshal RocMarshal self-assigned this Apr 11, 2026
Contributor

@RocMarshal RocMarshal left a comment

Hi @nikhilsu, thanks a lot for the contribution.
I left a few comments.
Please take a look~

* serializer that writes the metrics collection as a raw JSON array, not as an object with a
* "metrics" field.
*/
private static void overrideMetricCollectionSchema(final OpenAPI openApi) {
Contributor

IIUC, this method is not needed, because MetricCollectionResponseBody is not a collection like an array or a list.

Image

If MetricCollectionResponseBody were defined as follows, the change might be needed.

Image

Pls let me know your opinion.

Author

Hey @RocMarshal thanks for the review!

This is needed because MetricCollectionResponseBody uses a custom Serializer, which omits the metrics key and serializes the response object as a JSON array.

Here's what a sample response looks like if you call the GET /v1/jobs/${jobid}/metrics endpoint. Notice that there is no metrics key here; MetricCollectionResponseBody is serialized to a JSON array:

$ curl 'http://localhost:8081/v1/jobs/bc6df938e80006c05639a139e6cf716f/metrics?get=numRecordsInPerSecond,numRecordsOutPerSecond,numRestarts' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  --insecure | jq .

[
  {
    "id": "numRestarts",
    "value": "0"
  }
]

However, the problem is that the OpenAPI Spec Generator derives the schema from the class (through reflection) and generates API specs in which the response includes the metrics key: https://github.com/apache/flink/blob/release-1.20/docs/static/generated/rest_v1_dispatcher.yml#L2721-L2727

Screenshot 2026-04-10 at 11 23 29 PM

This causes a mismatch in Flink clients generated from the OpenAPI spec. The client expects a metrics key in the response, but it does not actually exist.
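The mismatch can be sketched without any JSON library: the first non-whitespace character of a JSON document determines its top-level type, so a client that expects an object (as the old spec declared) can see that the endpoint actually returns an array. `JsonShapeSketch` and `topLevelType` are hypothetical names used only for illustration, and the object-shaped string is a hand-written example of what the old spec implied.

```java
/** Hypothetical helper that reports the top-level type of a JSON document. */
final class JsonShapeSketch {

    /** Returns "array", "object", "other", or "empty" based on the first non-whitespace character. */
    static String topLevelType(String json) {
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '[') {
                return "array";
            }
            if (c == '{') {
                return "object";
            }
            return "other";
        }
        return "empty";
    }

    public static void main(String[] args) {
        // Shape seen on the wire in the curl output above.
        String wire = "[{\"id\":\"numRestarts\",\"value\":\"0\"}]";
        // Shape implied by the previously generated spec (hand-written example).
        String spec = "{\"metrics\":[{\"id\":\"numRestarts\",\"value\":\"0\"}]}";
        System.out.println("wire: " + topLevelType(wire));
        System.out.println("spec: " + topLevelType(spec));
    }
}
```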

Contributor

Sorry for overlooking the details, and thank you @nikhilsu for the clarification.
Got it~

* serializer that writes the metrics collection as a raw JSON array, not as an object with a
* "metrics" field.
*/
private static void overrideMetricCollectionSchema(final OpenAPI openApi) {
Contributor

@RocMarshal RocMarshal Apr 11, 2026

Hi @nikhilsu,
In my limited reading, perhaps implementing the MetricCollectionResponseBody class in the style of
org.apache.flink.runtime.rest.messages.ConfigurationInfo would be a good choice.
Main considerations are as follows:

  • We wouldn’t need to modify the OpenApiSpecGenerator class when introducing similar new classes in the future
  • There is already a precedent with the implementation of ConfigurationInfo, and we wouldn’t need to explicitly introduce serialization and deserialization processors
  • In the REST API’s HTML and YML documentation, the definition of the return type would be relatively more detailed. For example, an object type would be refined to something like Array<Metric>
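One reading of this suggestion, consistent with the `ArrayList<Metric>` refactoring mentioned in the PR's change log, is to make the response body itself a list type. The sketch below uses hypothetical classes (`MetricSketch`, `MetricCollectionSketch`), not the Flink classes, and shows that the element type is then visible on the class declaration, which is the kind of information a reflection-based spec generator can read without a manual override.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical metric entry with the two fields seen on the wire. */
final class MetricSketch {
    final String id;
    final String value;

    MetricSketch(String id, String value) {
        this.id = id;
        this.value = value;
    }
}

/** Hypothetical response body that is itself a list, so no custom serializer is involved. */
final class MetricCollectionSketch extends ArrayList<MetricSketch> {
    private static final long serialVersionUID = 1L;

    MetricCollectionSketch(Collection<MetricSketch> metrics) {
        super(metrics);
    }

    /** Reads the element type from the generic superclass, as a reflection-based tool could. */
    static Class<?> elementType() {
        Type superType = MetricCollectionSketch.class.getGenericSuperclass();
        ParameterizedType listType = (ParameterizedType) superType;
        return (Class<?>) listType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        MetricCollectionSketch body =
                new MetricCollectionSketch(Arrays.asList(new MetricSketch("numRestarts", "0")));
        System.out.println("entries: " + body.size());
        System.out.println("element type: " + elementType().getSimpleName());
    }
}
```

The trade-off is that the named wrapper type disappears from the schema in favour of an inline array, so the class structure and the wire format agree but generated client types may change.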

Author

Hey @RocMarshal, this is a good suggestion. It would be much cleaner to have MetricCollectionResponseBody extend ArrayList<Metric>.

However, after this change, endpoints no longer reference MetricCollectionResponseBody - they inline the array. Generated clients would need regeneration. The wire format is the same (JSON array), but the type in generated code changes from MetricCollectionResponseBody to List[Metric].

Here is the diff of the spec.

Image

Author

This would break downstream client code. But it might be OK, as this endpoint would never have worked in the first place.

Author

I've pushed the change. Let me know what you think.

Contributor

Hi @nikhilsu, in my limited reading, perhaps implementing the MetricCollectionResponseBody class in the style of org.apache.flink.runtime.rest.messages.ConfigurationInfo would be a good choice. Main considerations are as follows:

  • We wouldn’t need to modify the OpenApiSpecGenerator class when introducing similar new classes in the future
  • There is already a precedent with the implementation of ConfigurationInfo, and we wouldn’t need to explicitly introduce serialization and deserialization processors
  • In the REST API’s HTML and YML documentation, the definition of the return type would be relatively more detailed. For example, an object type would be refined to something like Array<Metric>

Hi @1996fanrui, could you help take a look at this strategy?

I guess the generation of the REST API documentation might be related to the release quality of 2.3, although the impact is probably relatively small.

Thank you very much.

@nikhilsu nikhilsu force-pushed the fix-slotsharinggroupid-openapi-schema branch from e684820 to 5beedd0 Compare April 12, 2026 06:05
Author

FYI: changes to *.html and *.yml were made as part of running ./mvnw package -Dgenerate-rest-docs -pl flink-docs -am -nsu -DskipTests 2>&1

Labels

community-reviewed PR has been reviewed by the community.

3 participants