Skip to content

Conversation

@Anyitechs
Copy link
Contributor

This introduces the foundational telemetry infrastructure to improve the observability of LDK Server.

It adds a new /metrics endpoint exposed on the REST service address, which serves Prometheus-compatible metrics. This endpoint is public and does not require HMAC authentication, allowing for easy integration with monitoring systems.

  • Added prometheus dependency and a Metrics utility struct.
  • Introduced a basic ldk_health_score gauge (0-100) that reflects the node's operational status based on connection to peer, sync state, and running status.

This is the first step in a larger effort to provide comprehensive telemetry. Future updates will expand this to include metrics for channels, balances, payments, and other critical node activities.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Jan 27, 2026

I've assigned @benthecarman as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@Anyitechs Anyitechs force-pushed the introduce-telemetry branch from a3831dc to 8692c8c Compare January 27, 2026 01:09
This introduces the foundational telemetry infrastructure to improve
the observability of LDK Server.

It adds a new `/metrics` endpoint exposed on the REST service address,
which serves Prometheus-compatible metrics. This endpoint is public and
does not require HMAC authentication, allowing for easy integration with
monitoring systems.

- Added `prometheus` dependency and a `Metrics` utility struct.
- Introduced a basic `ldk_health_score` gauge (0-100) that
reflects the node's operational status based on connection to peer,
sync state, and running status.

This is the first step in a larger effort to provide comprehensive telemetry.
Future updates will expand this to include metrics for channels, balances,
payments, and other critical node activities.
@Anyitechs Anyitechs force-pushed the introduce-telemetry branch from 8692c8c to 56b5e4a Compare January 27, 2026 01:51
@Anyitechs Anyitechs marked this pull request as ready for review January 27, 2026 01:59
chrono = { version = "0.4", default-features = false, features = ["clock"] }
log = "0.4.28"
base64 = { version = "0.21", default-features = false, features = ["std"] }
lazy_static = "1.5.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid taking this dependency and rather use one of the std::sync primitives (Once/OnceLock/LazyLock), if we need this at all.

log = "0.4.28"
base64 = { version = "0.21", default-features = false, features = ["std"] }
lazy_static = "1.5.0"
prometheus = "0.14.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this dependency, or can we just reimplement it easily locally?

If we need it, this should be at the very least made optional behind the metrics feature, and disable any default features.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, its a plain text file with some numbers, don't think we need to take a dep :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this dependency, or can we just reimplement it easily locally?

We can reimplement locally, but I think the dependency already offers some benefits that will come in handy when we want to provide metrics for things like balances in different states, payments, fees, etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also pulls in several solo-maintainer dependencies (okay, with relatively well-known Rust community folks like burntsushi and dtolnay but also the apparently-kinda-unmaintained fnv crate) which we very strongly try to avoid given the security risk. Unless there's something that takes many hundred to a few thousand lines of code or very complicated and hard to test code to replicate, we should absolutely avoid taking a dependency for it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I've implemented building prometheus endpoints in bash and python and....many times with probably less effort than it would have taken to figure out how to add a dependency and use it, so I'm not at all convinced that its worth it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will drop the dependency and reimplement locally.

self.service_health_score.set(score);
}

/// The health score computation is pretty basic for now and simply
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is it customary to have such a score? I would find it very hard to interpret, tbh.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is it customary to have such a score?

I think it is as some users might want to rely on that to know how their node is performing per time at a glance.

I would find it very hard to interpret, tbh.?

The current computation is pretty basic and relies on the NodeStatus informations from ldk-node. Though, we might want to refine that to include more informations and decide the best weightage value to assign for each event.

pub const BUILD_METRICS_INTERVAL: Duration = Duration::from_secs(60);

lazy_static! {
pub static ref METRICS: Metrics = Metrics::new(default_registry());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having global statics is somewhat of an anti-pattern, can we avoid it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it's safe since it's not a mutable state and wrapped around lazy_static. But will drop that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants