Pull request overview
Adds Couchbase Server documentation for the /settings/appTelemetry REST API and cross-links it from Prometheus monitoring docs and navigation.
Changes:
- Adds a new REST API reference page for Application Telemetry (/settings/appTelemetry) with GET/POST examples and parameter descriptions.
- Updates Prometheus monitoring docs to mention enabling application telemetry to expose SDK/application metrics.
- Updates shared REST API curl parameter partial and adds the new page to the REST API navigation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| modules/rest-api/partials/user-pw-host-port-params.adoc | Updates the shared parameter definitions for host/port used by REST API curl examples. |
| modules/rest-api/pages/application-telemetry.adoc | New REST API reference page for application telemetry status and configuration. |
| modules/manage/pages/monitor/set-up-prometheus-for-monitoring.adoc | Adds a section describing how application telemetry affects Prometheus metrics output. |
| modules/ROOT/nav.adoc | Adds the new REST API page to the documentation navigation. |
| [source,bash] | ||
| ---- | ||
| curl -sS -u $USER:$PASSWORD \ | ||
| -X GET 'http[s]://{host}:{port}/settings/appTelemetry' |
This curl example uses {host}/{port} placeholders, but most REST API docs use {HOST}/{PORT}. Consider switching to the established placeholder casing (and matching whatever is documented in user-pw-host-port-params.adoc) to keep examples consistent.
| -X GET 'http[s]://{host}:{port}/settings/appTelemetry' | |
| -X GET 'http[s]://{HOST}:{PORT}/settings/appTelemetry' |
For what it's worth, we're making {host} and {port} the standard for these placeholders going forward.
ingenthr
left a comment
Nice work on this, @ggray-cb. We're getting a number of questions about this functionality, so it'll be great to see it in the docs.
I had a few small comments and @shivaniguptacb may want to review. Also, it looks like Copilot caught a few important things (like the Kotlin/Python ambiguity).
Nice work!
…ements. Also adding the preview config so the preview build will pull in the correct branches.
@shivaniguptasf Added default value for the
Adding @RichardSmedley because he expressed interest in reviewing the current draft. Also added @Peter-Searby in hopes he could review the technical details about what happens if a node hits its limit for connected nodes. @ingenthr asked if there was a way for users to tell this has happened.
ingenthr
left a comment
Nearly there, thanks! There's one bit of ambiguity I think we should try to address if we can. See comments.
| If you enable application telemetry on a mixed-mode cluster with pre-8.0 nodes, it does not advertise to clients that it can collect telemetry. | ||
|
|
||
| * Your applications must use an SDK that implements version 3.8 or later of the SDK API. | ||
| See the table in xref:java-sdk:project-docs:compatibility.adoc#api-version[API Version] to determine which version of the SDK your application uses implements version 3.8 of the SDK API. |
Do note that this is a link to Java, so the user will have to switch to their platform. I don't know if you want to turn this into something like "see the table for your SDK. For example, in Java…"
@RichardSmedley may have a better way to approach this.
It's not easy, but I think that we do need to avoid telling customers about the SDK API version, which will only cause confusion.
Really, the only way to do this is linking every SDK, much like the rendered-on-a-single-line links at https://docs.couchbase.com/server/current/guides/creating-data.html#related-links
(in this case going straight to https://docs.couchbase.com/java-sdk/3.9/howtos/collecting-information-and-logging.html#sdk-telemetry-from-the-server etc., with an "and all later versions" at the end of the list, perhaps.)
Probably not the only solution, so happy to discuss further.
tangential point: we want people using the latest version of the SDK,
so could just link to current, and say it's in "recent" versions.
| You can enable application telemetry to have Couchbase Server collect metrics from your applications that use the Couchbase SDKs. | ||
| When you enable application telemetry, Couchbase Server collects telemetry data from your applications. | ||
|
|
||
| Couchbase Server reports the collected data as metrics through the same Prometheus endpoint that it uses to report its own metrics. |
Might be worth mentioning that the metrics are aggregated across clients. Currently it reads in a way that suggests you'd be able to see per-client metrics, which is not the case.
|
|
||
| === Prerequisites | ||
|
|
||
| You Couchbase Server cluster and your clients must meet the following requirements to use application telemetry: |
| You Couchbase Server cluster and your clients must meet the following requirements to use application telemetry: | |
| Your Couchbase Server cluster and your clients must meet the following requirements to use application telemetry: |
| * A Couchbase Server cluster only supports application telemetry when all of its nodes are running version 8.0 or later. | ||
| Earlier versions of Couchbase Server do not support application telemetry. | ||
| Your cluster cannot collect application telemetry if it's running in mixed mode where some nodes are running a pre-8.0 version. | ||
| If you enable application telemetry on a mixed-mode cluster with pre-8.0 nodes, it does not advertise to clients that it can collect telemetry. |
What is this referring to? You can't configure app telemetry on a mixed-mode cluster
| [#get-status] | ||
| == Get Application Telemetry Status | ||
|
|
||
| The following method gets the current state of application telemetry for the cluster. |
Do we normally refer to settings as "state"? Feels like it suggests it would be a status, rather than the configuration
| You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data. | ||
| However, lowering it too far could result in clients being unable to find a node for their telemetry connection. |
lower value to reduce potential overhead
lowering it too far could result in clients being unable to find a node
Feels like basically saying "lower value reduces number of clients, but lowering it too far could result in reducing the number of clients". Like, the whole point is to reduce the number of clients, so the two statements don't really feel consistent with each other.
| You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data. | |
| However, lowering it too far could result in clients being unable to find a node for their telemetry connection. | |
| You can set `maxScrapeClientsPerNode` to a lower value to reduce potential overhead on your nodes from collecting telemetry data by preventing too many clients connecting to each node. | |
| However, if the limit is reached on all nodes, this will result in any additional clients being unable to find a node for their telemetry connection. |
| + | ||
| The default value is `60`. | ||
| You can increase this value to reduce the overhead of collecting telemetry data on your nodes. | ||
| However, setting it too high could result in a loss of telemetry data when clients close their connections. |
That's basically up to the SDKs to handle. When they attempt to connect to a node that is full, they'll try other nodes, but I can't remember the exact details. Users would presumably see this from the SDK logging, but perhaps @DemetrisChr can confirm? The admins for the cluster would also be able to detect this happening by following the
| * See the SDK Telemetry from the Server section of the Collecting Information and Logging page in the documentation for the SDK you use. | ||
| For example: | ||
|
|
||
| ** xref:cxx-sdk:howtos:collecting-information-and-logging.adoc#sdk-telemetry-from-the-server[C++ SDK] |
See line 34 comments (and note the use of a single line for 12 SDK links in the linked docs page).
This PR adds documentation for the Server `/settings/appTelemetry` REST API endpoint.
Here's a list of the changes in this PR, with links to a preview. You will need the Docs Team credentials on Confluence to access the preview.
Note: I've made some assumptions about the `maxScrapeClientsPerNode` and `scrapeIntervalSeconds` settings, so be sure to pay special attention to their descriptions.
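For anyone checking those two settings against a test cluster, here is a rough sketch of how a request could be put together. It is not taken from the new page: the helper names `app_telemetry_url` and `app_telemetry_body` are hypothetical, the form-encoded body is an assumption based on how other Server settings endpoints are usually called, and the values shown are arbitrary examples rather than recommended settings.

```shell
# Hypothetical helper: builds the endpoint URL from scheme, host, and port.
app_telemetry_url() {
  printf '%s://%s:%s/settings/appTelemetry\n' "$1" "$2" "$3"
}

# Hypothetical helper: builds a form-encoded body for the two settings
# discussed in this PR. Form encoding is an assumption, not confirmed here.
app_telemetry_body() {
  printf 'maxScrapeClientsPerNode=%s&scrapeIntervalSeconds=%s\n' "$1" "$2"
}

# Example call (commented out: needs a live cluster and valid credentials).
# The values 100 and 60 are placeholders for illustration only.
# curl -sS -u "$USER:$PASSWORD" \
#   -X POST "$(app_telemetry_url http localhost 8091)" \
#   -d "$(app_telemetry_body 100 60)"
```

Keeping the URL and body in separate helpers makes it easy to print and review the exact request before sending anything to a cluster.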