Conversation
This feature was implemented in scylladb/scylladb#27323 Idea is to enable clients to dynamically learn address translation information from the system.client_routes table. When this table is updated drivers get CLIENT_ROUTES_CHANGE event with scope of the change. This PR adds ability to configure driver to read this table and events and maintain address translation mapping updated.
| """ | ||
|
|
||
| def __init__(self, endpoints, table_name="system.client_routes", | ||
| max_resolver_concurrency=1, resolve_healthy_endpoint_period_ms=500, |
There was a problem hiding this comment.
resolve_healthy_endpoint_period_ms is too small, every client will re-resolve every known endpoints in half of milisecond, it will create dns storms.
| self.table_name = table_name | ||
| self.max_resolver_concurrency = max_resolver_concurrency | ||
| self.resolve_healthy_endpoint_period_ms = resolve_healthy_endpoint_period_ms | ||
| self.block_unknown_endpoints = block_unknown_endpoints |
There was a problem hiding this comment.
Please remove block_unknown_endpoints - it should be blocking unknown all the time, unfortunately in gocql it was introduced and then i scraped id to reduce configuration cases but left config in place, which is missleading.
|
|
||
| # CAS retry loop (similar to Go's for loop) | ||
| max_retries = 10 | ||
| for attempt in range(max_retries): |
There was a problem hiding this comment.
don't see any way it could iterate even once, i.e. retry and tbh, it is not needed, is it copy-cat from gocql code ?
There was a problem hiding this comment.
My mistake, I used Copilot to compare my code to golang implementation and I guess I missed copilot making this change and accepted it by mistake, I will remove this
| log.warning("[client routes] CAS retry limit reached for host_id=%s", host_id) | ||
| return current_ip | ||
|
|
||
| def get_port(self, host): |
There was a problem hiding this comment.
This one is not used anywhere
| column_encryption_policy=None, | ||
| application_info:Optional[ApplicationInfoBase]=None | ||
| application_info:Optional[ApplicationInfoBase]=None, | ||
| client_routes_config=None |
|
|
||
| if address_translator is not None: | ||
| # Validate mutual exclusivity of client_routes_config and address_translator | ||
| if client_routes_config is not None and address_translator is not None: |
There was a problem hiding this comment.
Default for class variable of Cluster.address_translator is a IdentityTranslator, we need to sort it out to make it less confusing
| if current_route.current_ip and current_route.update_time and updated_route.update_time: | ||
| if current_route.update_time >= updated_route.update_time: | ||
| return current_route.current_ip |
There was a problem hiding this comment.
can you join em in a single if ?
| self._lock = threading.Lock() | ||
| self._semaphore = threading.Semaphore(max_concurrent) | ||
|
|
||
| def resolve(self, hostname, cached_ips=None, current_ip=None, cached_at=None): |
There was a problem hiding this comment.
The only reason why DNSResolver exists in gocql is to address possible problem when one endpoint exposed on multiple AZs and cross AZ traffic is not allowed on nlb or DNS is returning DNS records without considering local AZ.
This issue of the table now, since on latest stages it was decided to allow cross az traffic, and if DNS does not consider local AZ it could be fixed on DNS side.
So, now you can resolve hostname to the ip right away with socket.getaddrinfo you still need to make sure it caches responses for some time to make sure it is not spamming DNS server when no system caching is present.
| if hasattr(translator, 'translate_with_host_id'): | ||
| translated_addr = translator.translate_with_host_id(addr, host_id) | ||
| else: | ||
| translated_addr = translator.translate(addr) |
There was a problem hiding this comment.
It may happen that cloud infra will redeploy private link intefrace, when it happens driver need to react quick to avoid prolonged traffic disruption, which means that you can't cache DNS responses onto host.endpoint as it is currently done.
It means that you need to find another way to solve this problem and not relay on AddressTranslator.
Good place to look at is an Endpoint:
python-driver/cassandra/connection.py
Lines 944 to 954 in 16fac63
Implement scylla-specific ClientRoutes feature
This feature was implemented in scylladb/scylladb#27323
Idea is to enable clients to dynamically learn address translation information from the system.client_routes table.
When this table is updated drivers get CLIENT_ROUTES_CHANGE event with scope of the change.
This PR adds ability to configure driver to read this table and events and maintain address translation mapping updated.
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.Fixes: #692