DOC-13717 rest api for app telemetry #4088

Open
ggray-cb wants to merge 8 commits into release/8.0 from DOC-13717_REST_API_for_appTelemetry

@ggray-cb
Contributor

This PR adds documentation for the Server /settings/appTelemetry REST API endpoint.

Here's a list of the changes in this PR, with links to a preview. You will need the Docs Team credentials on Confluence to access the preview.

Note: I've made some assumptions about the maxScrapeClientsPerNode and scrapeIntervalSeconds settings, so be sure to pay special attention to their descriptions.

Copilot AI left a comment

Pull request overview

Adds Couchbase Server documentation for the /settings/appTelemetry REST API and cross-links it from Prometheus monitoring docs and navigation.

Changes:

  • Adds a new REST API reference page for Application Telemetry (/settings/appTelemetry) with GET/POST examples and parameter descriptions.
  • Updates Prometheus monitoring docs to mention enabling application telemetry to expose SDK/application metrics.
  • Updates shared REST API curl parameter partial and adds the new page to the REST API navigation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| modules/rest-api/partials/user-pw-host-port-params.adoc | Updates the shared parameter definitions for host/port used by REST API curl examples. |
| modules/rest-api/pages/application-telemetry.adoc | New REST API reference page for application telemetry status and configuration. |
| modules/manage/pages/monitor/set-up-prometheus-for-monitoring.adoc | Adds a section describing how application telemetry affects Prometheus metrics output. |
| modules/ROOT/nav.adoc | Adds the new REST API page to the documentation navigation. |


[source,bash]
----
curl -sS -u $USER:$PASSWORD \
-X GET 'http[s]://{host}:{port}/settings/appTelemetry'

Copilot AI Feb 13, 2026

This curl example uses {host}/{port} placeholders, but most REST API docs use {HOST}/{PORT}. Consider switching to the established placeholder casing (and matching whatever is documented in user-pw-host-port-params.adoc) to keep examples consistent.

Suggested change
- -X GET 'http[s]://{host}:{port}/settings/appTelemetry'
+ -X GET 'http[s]://{HOST}:{PORT}/settings/appTelemetry'

Contributor Author

For what it's worth, we're making {host} and {port} the standard for these placeholders going forward.
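
For readers who script against the endpoint rather than use curl, the GET request in the example above could be sketched in Python roughly as follows. This is an illustration only: the path `/settings/appTelemetry` comes from the PR, while the function name `build_status_request`, the host name, port 8091, and the credentials are placeholders assumed for the example. The sketch builds the request without sending it.

```python
import base64
import urllib.request


def build_status_request(host, port, user, password, tls=False):
    """Build (but do not send) a GET request for /settings/appTelemetry."""
    scheme = "https" if tls else "http"
    url = f"{scheme}://{host}:{port}/settings/appTelemetry"
    # Equivalent of curl's `-u user:password`: HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    req = build_status_request("node1.example.com", 8091, "Administrator", "password")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. The response format is not shown in this conversation, so the sketch stops short of parsing it.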

Contributor

@ingenthr left a comment

Nice work on this, @ggray-cb. We're getting a number of questions about this functionality, so it'll be great to see it in the docs.

I had a few small comments, and @shivaniguptacb may want to review. Also, it looks like Copilot caught a few important things (like the Kotlin/Python ambiguity).

@shivaniguptasf

Nice work!
One small thing - I didn't see a mention that in 8.0 it is disabled by default. Might want to add that both to the Server Monitoring page, and to the REST API reference page that you added.

@ggray-cb
Contributor Author

@shivaniguptasf Added default value for the enabled setting, plus added notes stating that it defaults to off but that may change in the future.

@ggray-cb
Contributor Author

Adding @RichardSmedley because he expressed interest in reviewing the current draft. Also added @Peter-Searby in hopes he could review the technical details about what happens if a node hits its limit for connected nodes. @ingenthr asked if there was a way for users to tell this has happened.

@ggray-cb requested a review from ingenthr February 18, 2026 20:58

Contributor

@ingenthr left a comment

Nearly there, thanks! There's one bit of ambiguity I think we should try to address if we can. See comments.

If you enable application telemetry on a mixed-mode cluster with pre-8.0 nodes, it does not advertise to clients that it can collect telemetry.

* Your applications must use an SDK that implements version 3.8 or later of the SDK API.
See the table in xref:java-sdk:project-docs:compatibility.adoc#api-version[API Version] to determine which version of the SDK your application uses implements version 3.8 of the SDK API.
Contributor

Do note that this is a link to Java, so the user will have to switch to their platform. I don't know if you want to turn this into something like "see the table for your SDK. For example, in Java…"

@RichardSmedley may have a better way to approach this.

Contributor

It's not easy, but I think that we do need to avoid telling customers about the SDK API version, which will only cause confusion.
Really, the only way to do this is linking every SDK, much like the rendered-on-a-single-line links at https://docs.couchbase.com/server/current/guides/creating-data.html#related-links
(in this case going straight to https://docs.couchbase.com/java-sdk/3.9/howtos/collecting-information-and-logging.html#sdk-telemetry-from-the-server etc., with an "and all later versions" at the end of the list, perhaps.)

Probably not the only solution, so happy to discuss further.

Contributor

Tangential point: we want people using the latest version of the SDK, so we could just link to current and say it's in "recent" versions.

You can enable application telemetry to have Couchbase Server collect metrics from your applications that use the Couchbase SDKs.
When you enable application telemetry, Couchbase Server collects telemetry data from your applications.

Couchbase Server reports the collected data as metrics through the same Prometheus endpoint that it uses to report its own metrics.
Contributor

Might be worth mentioning that the metrics are aggregated across clients. Currently it reads in a way that suggests you'd be able to see per-client metrics, which is not the case.


=== Prerequisites

You Couchbase Server cluster and your clients must meet the following requirements to use application telemetry:
Contributor

Suggested change
- You Couchbase Server cluster and your clients must meet the following requirements to use application telemetry:
+ Your Couchbase Server cluster and your clients must meet the following requirements to use application telemetry:

* A Couchbase Server cluster only supports application telemetry when all of its nodes are running version 8.0 or later.
Earlier versions of Couchbase Server do not support application telemetry.
Your cluster cannot collect application telemetry if it's running in mixed mode where some nodes are running a pre-8.0 version.
If you enable application telemetry on a mixed-mode cluster with pre-8.0 nodes, it does not advertise to clients that it can collect telemetry.
Contributor

What is this referring to? You can't configure app telemetry on a mixed-mode cluster.

[#get-status]
== Get Application Telemetry Status

The following method gets the current state of application telemetry for the cluster.
Contributor

Do we normally refer to settings as "state"? Feels like it suggests it would be a status, rather than the configuration

Comment on lines +169 to +170
You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data.
However, lowering it too far could result in clients being unable to find a node for their telemetry connection.
Contributor

> lower value to reduce potential overhead

> lowering it too far could result in clients being unable to find a node

Feels like basically saying "lower value reduces number of clients, but lowering it too far could result in reducing the number of clients". Like, the whole point is to reduce the number of clients, so the two statements don't really feel consistent with each other.

Suggested change
- You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data.
- However, lowering it too far could result in clients being unable to find a node for their telemetry connection.
+ You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data by preventing too many clients connecting to each node.
+ However, if the limit is reached on all nodes, this will result in any additional clients being unable to find a node for their telemetry connection.
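
The trade-off described in this suggestion can be illustrated with a small Python sketch. It is a toy model, not Couchbase Server or SDK behavior: it assumes each client holds one telemetry connection and simply takes the first node that is below its limit. The function name `assign_clients` and the node and client names are hypothetical.

```python
def assign_clients(clients, nodes, max_scrape_clients_per_node):
    """Toy model: place each client on the first node below its limit.

    Returns (assignments, rejected), where assignments maps each node to
    the clients placed on it and rejected lists the clients that found
    every node full.
    """
    assignments = {node: [] for node in nodes}
    rejected = []
    for client in clients:
        for node in nodes:
            if len(assignments[node]) < max_scrape_clients_per_node:
                assignments[node].append(client)
                break
        else:
            # Every node has reached the limit, so this client has
            # nowhere to send its telemetry.
            rejected.append(client)
    return assignments, rejected


if __name__ == "__main__":
    clients = [f"client-{i}" for i in range(7)]
    placed, rejected = assign_clients(clients, ["node-a", "node-b"], 3)
    print(placed)
    print(rejected)
```

In this model the cluster accepts at most `len(nodes) * max_scrape_clients_per_node` telemetry connections, which is the point the suggested wording makes: clients are only left without a node once every node is at its limit.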

+
The default value is `60`.
You can increase this value to reduce the overhead of collecting telemetry data on your nodes.
However, setting it too high could result in a loss of telemetry data when clients close their connections.
Contributor

What does this mean?

@Peter-Searby
Contributor

> Also added @Peter-Searby in hopes he could review the technical details about what happens if a node hits its limit for connected nodes. @ingenthr asked if there was a way for users to tell this has happened.

That's basically up to the SDKs to handle. When they attempt to connect to a node that is full, they'll try other nodes, but I can't remember the exact details. Users would presumably see this from the SDK logging, but perhaps @DemetrisChr can confirm?

The admins for the cluster would also be able to detect this happening by following the cm_app_telemetry_curr_connections metric. Once this reaches the max scrape clients configured for that node, further connections are rejected.
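
As a rough illustration of that check, the Python sketch below reads Prometheus text-format output and reports whether `cm_app_telemetry_curr_connections` has reached a configured limit. Only the metric name comes from this thread. The helper names, the sample output, and the limit of 100 are assumptions made for the example, and whether the metric carries labels is not stated here, so the parser tolerates both forms.

```python
METRIC = "cm_app_telemetry_curr_connections"


def current_connections(metrics_text, metric=METRIC):
    """Return every sample value of `metric` in Prometheus text-format output."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(metric):
            continue  # comments (# HELP, # TYPE) and other metrics
        rest = line[len(metric):]
        if not rest or rest[0] not in "{ ":
            continue  # a different metric that shares the same prefix
        if rest[0] == "{":
            # Skip the label set; the value follows the closing brace.
            rest = rest[rest.rfind("}") + 1:]
        fields = rest.split()
        if fields:
            values.append(float(fields[0]))
    return values


def limit_reached(metrics_text, max_scrape_clients_per_node):
    """True if any sample is at or above the configured per-node limit."""
    return any(
        value >= max_scrape_clients_per_node
        for value in current_connections(metrics_text)
    )


if __name__ == "__main__":
    # Hypothetical scrape output, for illustration only.
    sample = (
        "# TYPE cm_app_telemetry_curr_connections gauge\n"
        "cm_app_telemetry_curr_connections 100\n"
    )
    print(limit_reached(sample, 100))
```

An alerting rule in the monitoring system would normally do this comparison. The sketch only shows the logic of comparing the gauge to the configured limit.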

* See the SDK Telemetry from the Server section of the Collecting Information and Logging page in the documentation for the SDK you use.
For example:

** xref:cxx-sdk:howtos:collecting-information-and-logging.adoc#sdk-telemetry-from-the-server[C++ SDK]
Contributor

See line 34 comments (and note the use of a single line for 12 SDK links in the linked docs page).
