Skip to content

Comments

Add Private link support#706

Draft
sylwiaszunejko wants to merge 2 commits intoscylladb:masterfrom
sylwiaszunejko:private-link
Draft

Add Private link support#706
sylwiaszunejko wants to merge 2 commits intoscylladb:masterfrom
sylwiaszunejko:private-link

Conversation

@sylwiaszunejko
Copy link
Collaborator

Implement scylla-specific ClientRoutes feature

This feature was implemented in scylladb/scylladb#27323
Idea is to enable clients to dynamically learn address translation information from the system.client_routes table.
When this table is updated drivers get CLIENT_ROUTES_CHANGE event with scope of the change.

This PR adds ability to configure driver to read this table and events and maintain address translation mapping updated.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Fixes: #692

This feature was implemented in scylladb/scylladb#27323
Idea is to enable clients to dynamically learn address translation
information from the system.client_routes table.
When this table is updated drivers get CLIENT_ROUTES_CHANGE event with
scope of the change.

This PR adds ability to configure driver to read this table and events
and maintain address translation mapping updated.
"""

def __init__(self, endpoints, table_name="system.client_routes",
max_resolver_concurrency=1, resolve_healthy_endpoint_period_ms=500,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_healthy_endpoint_period_ms is too small, every client will re-resolve every known endpoints in half of milisecond, it will create dns storms.

self.table_name = table_name
self.max_resolver_concurrency = max_resolver_concurrency
self.resolve_healthy_endpoint_period_ms = resolve_healthy_endpoint_period_ms
self.block_unknown_endpoints = block_unknown_endpoints
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove block_unknown_endpoints - it should be blocking unknown all the time, unfortunately in gocql it was introduced and then i scraped id to reduce configuration cases but left config in place, which is missleading.


# CAS retry loop (similar to Go's for loop)
max_retries = 10
for attempt in range(max_retries):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see any way it could iterate even once, i.e. retry and tbh, it is not needed, is it copy-cat from gocql code ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I used Copilot to compare my code to golang implementation and I guess I missed copilot making this change and accepted it by mistake, I will remove this

log.warning("[client routes] CAS retry limit reached for host_id=%s", host_id)
return current_ip

def get_port(self, host):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not used anywhere

column_encryption_policy=None,
application_info:Optional[ApplicationInfoBase]=None
application_info:Optional[ApplicationInfoBase]=None,
client_routes_config=None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add annotations


if address_translator is not None:
# Validate mutual exclusivity of client_routes_config and address_translator
if client_routes_config is not None and address_translator is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default for class variable of Cluster.address_translator is a IdentityTranslator, we need to sort it out to make it less confusing

Comment on lines +634 to +636
if current_route.current_ip and current_route.update_time and updated_route.update_time:
if current_route.update_time >= updated_route.update_time:
return current_route.current_ip
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you join em in a single if ?

self._lock = threading.Lock()
self._semaphore = threading.Semaphore(max_concurrent)

def resolve(self, hostname, cached_ips=None, current_ip=None, cached_at=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason why DNSResolver exists in gocql is to address possible problem when one endpoint exposed on multiple AZs and cross AZ traffic is not allowed on nlb or DNS is returning DNS records without considering local AZ.
This issue of the table now, since on latest stages it was decided to allow cross az traffic, and if DNS does not consider local AZ it could be fixed on DNS side.
So, now you can resolve hostname to the ip right away with socket.getaddrinfo you still need to make sure it caches responses for some time to make sure it is not spamming DNS server when no system caching is present.

Comment on lines +252 to +255
if hasattr(translator, 'translate_with_host_id'):
translated_addr = translator.translate_with_host_id(addr, host_id)
else:
translated_addr = translator.translate(addr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may happen that cloud infra will redeploy private link intefrace, when it happens driver need to react quick to avoid prolonged traffic disruption, which means that you can't cache DNS responses onto host.endpoint as it is currently done.
It means that you need to find another way to solve this problem and not relay on AddressTranslator.
Good place to look at is an Endpoint:

def _get_socket_addresses(self):
address, port = self.endpoint.resolve()
if hasattr(socket, 'AF_UNIX') and self.endpoint.socket_family == socket.AF_UNIX:
return [(socket.AF_UNIX, socket.SOCK_STREAM, 0, None, address)]
addresses = socket.getaddrinfo(address, port, self.endpoint.socket_family, socket.SOCK_STREAM)
if not addresses:
raise ConnectionException("getaddrinfo returned empty list for %s" % (self.endpoint,))
return addresses

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PrivateLink support

2 participants