
Adds monitor endpoint to show manager responsibilities #6309

Open
keith-turner wants to merge 1 commit into apache:main from keith-turner:manager-info

Conversation

@keith-turner
Contributor

Needed for #6190

@keith-turner added this to the 4.0.0 milestone Apr 10, 2026
@keith-turner
Contributor Author

The output of this looks like the following when running 5 managers.

{
  "localhost:10000": [
    "FatePartition[start=FATE:META:00000000-0000-0000-0000-000000000000, end=FATE:META:ffffffff-ffff-ffff-ffff-ffffffffffff]",
    "FatePartition[start=FATE:USER:cccccccc-cccc-ccc0-0000-000000000000, end=FATE:USER:ffffffff-ffff-ffff-ffff-ffffffffffff]",
    "TABLET_MANAGEMENT",
    "BALANCING",
    "CLIENT_RPC",
    "TSERVER_MONITORING",
    "CLUSTER_MAINTENANCE",
    "COMPACTION_COORDINATION"
  ],
  "localhost:10001": [
    "FatePartition[start=FATE:USER:00000000-0000-0000-0000-000000000000, end=FATE:USER:33333333-3333-3330-0000-000000000000]"
  ],
  "localhost:9999": [
    "FatePartition[start=FATE:USER:33333333-3333-3330-0000-000000000000, end=FATE:USER:66666666-6666-6660-0000-000000000000]"
  ],
  "localhost:10002": [
    "FatePartition[start=FATE:USER:99999999-9999-9990-0000-000000000000, end=FATE:USER:cccccccc-cccc-ccc0-0000-000000000000]"
  ],
  "localhost:10003": [
    "FatePartition[start=FATE:USER:66666666-6666-6660-0000-000000000000, end=FATE:USER:99999999-9999-9990-0000-000000000000]"
  ]
}

@dlmarion
Contributor

What function does TSERVER_MONITORING equate to?

In the Manager view in #6278 I think all of these functions can be deduced from the metrics. For example, the presence of balancer metrics implies balancing, the presence of compaction metrics implies coordination, etc. Now, if we remove those metrics, then we will need something.

How do you envision displaying the Fate partition information?

@keith-turner
Contributor Author

As an experiment, I pulled these changes into #6217 and modified them to display info about compaction coordination.

{
  "localhost:10000": [
    "FatePartition[start=FATE:USER:cccccccc-cccc-ccc0-0000-000000000000, end=FATE:USER:ffffffff-ffff-ffff-ffff-ffffffffffff]",
    "FatePartition[start=FATE:META:00000000-0000-0000-0000-000000000000, end=FATE:META:ffffffff-ffff-ffff-ffff-ffffffffffff]",
    "TABLET_MANAGEMENT",
    "BALANCING",
    "CLIENT_RPC",
    "TSERVER_MONITORING",
    "CLUSTER_MAINTENANCE",
    "COMPACTOR_GROUPS:[accumulo]"
  ],
  "localhost:9999": [
    "FatePartition[start=FATE:USER:33333333-3333-3330-0000-000000000000, end=FATE:USER:66666666-6666-6660-0000-000000000000]",
    "COMPACTOR_GROUPS:[ci_lrg]"
  ],
  "localhost:10001": [
    "FatePartition[start=FATE:USER:00000000-0000-0000-0000-000000000000, end=FATE:USER:33333333-3333-3330-0000-000000000000]",
    "COMPACTOR_GROUPS:[ci_small]"
  ],
  "localhost:10002": [
    "FatePartition[start=FATE:USER:99999999-9999-9990-0000-000000000000, end=FATE:USER:cccccccc-cccc-ccc0-0000-000000000000]",
    "COMPACTOR_GROUPS:[default]"
  ],
  "localhost:10003": [
    "FatePartition[start=FATE:USER:66666666-6666-6660-0000-000000000000, end=FATE:USER:99999999-9999-9990-0000-000000000000]"
  ]
}

@keith-turner
Contributor Author

> What function does TSERVER_MONITORING equate to?

The manager periodically pings all the tservers to get stats, and it will eventually attempt to kill unresponsive tservers. We may remove the custom stats collection in favor of metrics. We may still want to keep this ping functionality, though, to check for tservers that hold a lock but cannot be reached via RPC. This may be easy to eventually distribute across the managers: just have each assistant manager hash mod the tservers and only ping the ones that match its ordinal.
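The hash-mod idea above could be sketched roughly as follows. This is only an illustration, not code from the PR: the class name `TserverPingSharding`, its methods, and the example addresses are all hypothetical, and it assumes each manager knows its own ordinal and the total manager count.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from Accumulo.
final class TserverPingSharding {

  // True when the manager with the given ordinal is the one that should ping this tserver.
  static boolean ownedBy(String tserverAddress, int managerOrdinal, int managerCount) {
    // floorMod keeps the bucket in [0, managerCount) even when hashCode() is negative.
    return Math.floorMod(tserverAddress.hashCode(), managerCount) == managerOrdinal;
  }

  // Filters the full tserver list down to the ones this manager is responsible for.
  static List<String> tserversToPing(List<String> allTservers, int managerOrdinal,
      int managerCount) {
    return allTservers.stream()
        .filter(ts -> ownedBy(ts, managerOrdinal, managerCount))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Example addresses are made up for illustration.
    List<String> tservers = List.of("host1:9997", "host2:9997", "host3:9997", "host4:9997");
    int managerCount = 3;
    for (int ordinal = 0; ordinal < managerCount; ordinal++) {
      System.out.println(
          "manager " + ordinal + " pings " + tserversToPing(tservers, ordinal, managerCount));
    }
  }
}
```

Each tserver lands in exactly one bucket, so no tserver is pinged twice or skipped. One trade-off of plain hash mod is that most assignments shift whenever the number of managers changes.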

> How do you envision displaying the Fate partition information?

I am not completely sure. I kind of like the way the formatted JSON displays it, with a list of responsibilities under each manager address.

> In the Manager view in #6278 I think all of these functions can be deduced from the metrics.

We could probably do that for a lot of this info. The current impl pulls everything from zoocache, which gives the most up-to-date and consistent view. Metrics may lag: for example, if a compactor group is moved from manager A to manager B, I am not sure how long that would take to work its way through the metrics. Pulling the small amount of data directly from zoocache will be quick and up to date.

@keith-turner
Contributor Author

> In the Manager view in #6278 I think all of these functions can be deduced from the metrics.

This can wait until #6278 is complete and then build on it.

@Path("manager/responsibilities")
@Produces(MediaType.APPLICATION_JSON)
@Description("Returns each manager's responsibilities")
public Map<String,List<String>> getManagerResponsibilities() {
Contributor

Putting this here, instead of in SystemInformation, will hit ZooKeeper every time this endpoint is hit, so every page refresh for every browser showing the Monitor. The model that we have been using so far is to store the information we want to display in the SystemInformation object, which will provide a consistent response for a point in time until the object is refreshed. This would make this endpoint real-time vs point-in-time.
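The point-in-time model described above can be illustrated with a small cached snapshot. This is a sketch under assumptions, not how `SystemInformation` is actually implemented: the class `CachedResponsibilities`, the `loader` supplier standing in for the ZooKeeper-backed read, and the refresh interval are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of a point-in-time cache; not Accumulo's SystemInformation.
final class CachedResponsibilities {
  // Stand-in for the expensive read (assumed here to be the ZooKeeper-backed lookup).
  private final Supplier<Map<String,List<String>>> loader;
  private final long maxAgeMillis;
  private Map<String,List<String>> snapshot;
  private long loadedAtMillis;

  CachedResponsibilities(Supplier<Map<String,List<String>>> loader, long maxAgeMillis) {
    this.loader = loader;
    this.maxAgeMillis = maxAgeMillis;
  }

  // Returns the cached snapshot, reloading only when it is missing or too old.
  // The current time is passed in so the behavior is easy to test.
  synchronized Map<String,List<String>> get(long nowMillis) {
    if (snapshot == null || nowMillis - loadedAtMillis >= maxAgeMillis) {
      snapshot = Map.copyOf(loader.get()); // immutable copy shared by all callers
      loadedAtMillis = nowMillis;
    }
    return snapshot;
  }

  public static void main(String[] args) {
    AtomicInteger loads = new AtomicInteger();
    CachedResponsibilities cache = new CachedResponsibilities(() -> {
      loads.incrementAndGet();
      return Map.of("localhost:10000", List.of("BALANCING", "CLIENT_RPC"));
    }, 5_000);

    long now = System.currentTimeMillis();
    cache.get(now);         // first request triggers a load
    cache.get(now + 1_000); // served from the snapshot, no second load
    System.out.println("loads after two requests: " + loads.get());
  }
}
```

With this shape, every browser refresh inside the interval sees the same answer and the backing store is read at most once per interval, at the cost of the staleness discussed earlier in the thread.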

2 participants