Skip to content

HBASE-30058 Eliminate unnecessary connection creation in snapshot operations#8030

Open
mini666 wants to merge 1 commit intoapache:masterfrom
mini666:HBASE-30058
Open

HBASE-30058 Eliminate unnecessary connection creation in snapshot operations#8030
mini666 wants to merge 1 commit intoapache:masterfrom
mini666:HBASE-30058

Conversation

@mini666
Copy link
Copy Markdown
Contributor

@mini666 mini666 commented Apr 8, 2026

Summary

Each snapshot operation created two short-lived connections in SnapshotDescriptionUtils.validate():

  1. isSecurityAvailable(conf) — created a Connection + Admin just to check hbase:acl table existence
  2. writeAclToSnapshotDescription() — created another Connection to read ACL data from hbase:acl

In Kerberos environments with the default ZKConnectionRegistry, each connection triggered a new ZK session with GSSAPI authentication and a TGS request to the KDC. During batch snapshot operations, this caused excessive KDC load that could lead to IP blocking.

This patch reuses the caller's existing connection instead of creating new ones:

  • isSecurityAvailable() now accepts a Connection parameter
  • writeAclToSnapshotDescription() passes the shared connection's Table to PermissionStorage.getTablePermissions()
  • All callers (MasterRpcServices, RestoreSnapshotProcedure, CloneSnapshotProcedure) pass through their available connection

Zero behavioral change — the same checks are performed, the same data is read, the same ACL is written to snapshots.

See HBASE-30058 for detailed root cause analysis including design ambiguity discussion.

@mini666 mini666 force-pushed the HBASE-30058 branch 2 times, most recently from 3e6e68c to c0497ca Compare April 8, 2026 01:41
Copy link
Copy Markdown
Member

@junegunn junegunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions:

Keep the old method signatures and overload

Could we add overloads that take Connection rather than changing existing signatures? It would minimize code changes and keep the existing API working. For example:

@Deprecated
public static SnapshotDescription validate(SnapshotDescription snapshot, Configuration conf)
    throws IOException {
  try (Connection conn = ConnectionFactory.createConnection(conf)) {
    return validate(conn, snapshot, conf);
  }
}

public static SnapshotDescription validate(Connection conn, SnapshotDescription snapshot,
    Configuration conf) throws IOException {
  // ... new implementation
}

Same idea for isSecurityAvailable.

Commit message

Please drop the Signed-off-by: JeongMin Ju <mini666@apache.org> line from the commit message. In the HBase project, Signed-off-by is used to attribute reviewers of the PR, not the author.

@mini666
Copy link
Copy Markdown
Contributor Author

mini666 commented May 5, 2026

@junegunn Thanks for the review! Updated:

  1. Overloads: Kept the original validate(SnapshotDescription, Configuration) and isSecurityAvailable(Configuration) signatures as @Deprecated overloads that delegate to the new Connection-based methods. Reverted the test file changes since they no longer need the null Connection workaround.
  2. Commit message: Dropped the Signed-off-by line.

Diff is now down to 5 production files (no test changes). Please take another look when you get a chance.

@junegunn
Copy link
Copy Markdown
Member

junegunn commented May 5, 2026

Do we need to deprecate the original ones? We're still using them in multiple places. I think it's enough to mention that the new ones are for avoiding connection overhead.

@mini666
Copy link
Copy Markdown
Contributor Author

mini666 commented May 6, 2026

@junegunn Good point — agreed. Dropped @Deprecated from the original validate(SnapshotDescription, Configuration) and isSecurityAvailable(Configuration) overloads, and added a Javadoc note on the new Connection-based overloads explaining they're preferred when the caller already holds a Connection (to avoid the per-call connection setup cost). Updated the commit message accordingly.

Comment on lines +489 to +493
public static ListMultimap<String, UserPermission> getTablePermissions(Configuration conf,
TableName tableName, Table t) throws IOException {
return getPermissions(conf, tableName != null ? tableName.getName() : null, t, null, null, null,
false);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing one without t can call this:

  public static ListMultimap<String, UserPermission> getTablePermissions(Configuration conf,
    TableName tableName) throws IOException {
    return getTablePermissions(conf, tableName, null);
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — done. The 2-arg overload now delegates to the 3-arg one with t=null to avoid duplicating the getPermissions(...) call.

…rations

Each snapshot operation created two short-lived connections in
SnapshotDescriptionUtils.validate() — one for isSecurityAvailable()
to check hbase:acl table existence, and another in
writeAclToSnapshotDescription() to read ACL data. In Kerberos
environments with ZKConnectionRegistry, each connection triggered
a new ZK session with GSSAPI authentication and a TGS request to
the KDC, causing excessive KDC load during batch snapshot operations.

This patch reuses the caller's existing connection instead of
creating new ones:

- isSecurityAvailable() now accepts a Connection parameter
- writeAclToSnapshotDescription() passes the shared connection's
  Table to PermissionStorage.getTablePermissions()
- All callers (MasterRpcServices, RestoreSnapshotProcedure,
  CloneSnapshotProcedure) pass through their available connection

The original validate(SnapshotDescription, Configuration) and
isSecurityAvailable(Configuration) signatures are preserved as
overloads that delegate to the new Connection-based methods, so
existing callers (including tests) keep working unchanged. The
new Connection-based overloads are documented as preferred for
callers that already hold a Connection, to avoid the per-call
connection setup cost.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants