IGNITE-27871 Improve deployment lookup to reduce deploy() contention …#12760

Open
oleg-vlsk wants to merge 3 commits into apache:master from oleg-vlsk:ignite-27871
@oleg-vlsk
Contributor

…for locally available tasks with peerClassLoadingEnabled=true

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see TC.Bot: Check PR)

Notes

If you need any help, please email dev@ignite.apache.org or ask any advice on http://asf.slack.com #ignite channel.

…for locally available tasks with peerClassLoadingEnabled=true
P2PClassLoadingFailureHandlingTest.class,
P2PClassLoadingIssuesTest.class
P2PClassLoadingIssuesTest.class,
GridDeploymentLocalStoreReuseTest.class
Contributor

Please add a comma at the end of the line (to reduce merge conflicts).

Contributor Author

Done.

Comment on lines 135 to 141
CompletableFuture<T2<UUID, Set<UUID>>> fut = client.compute()
    .withTimeout(timeout).
    <T2<UUID, Set<UUID>>, T2<UUID, Set<UUID>>>executeAsync2(TestTask.class.getName(), null)
    .toCompletableFuture();

try {
    fut.get();
Contributor

                client.compute().execute(TestTask.class.getName(), null);

Contributor Author

@oleg-vlsk oleg-vlsk Feb 21, 2026

Used this snippet, thank you.

Comment on lines 108 to 113
List<IgniteInternalFuture<Void>> futs = new ArrayList<>(CLIENT_CNT);

for (IgniteClient client : clients)
    futs.add(runAsync(() -> executeTasksOnClient(client, EXEC_CNT, 5_000L)));

waitForAllFutures(futs.toArray(new IgniteInternalFuture[0]));
Contributor

            runMultiThreaded(i -> executeTasksOnClient(clients.get(i), EXEC_CNT), CLIENT_CNT, "worker");

Contributor Author

I decided not to go for multi-threaded execution, as the purpose of the test is to verify certain behaviour during subsequent executions of the same task. So I ended up using a simple for-loop.

ClusterNode[] allServerNodes = grid(0).cluster().forServers().nodes().toArray(new ClusterNode[0]);

for (int i = 0; i < CLIENT_CNT; i++)
    clients.add(startClient(allServerNodes));
Contributor

You can connect to any server node; it's not necessary to provide all nodes, one is enough, i.e. clients.add(startClient(0));

Contributor Author

Done.

Comment on lines 149 to 177
    /** */
    private static class DeploymentListeningLogger extends ListeningTestLogger {
        /** */
        private final ConcurrentLinkedQueue<String> depNotFound = new ConcurrentLinkedQueue<>();

        /** */
        public DeploymentListeningLogger(IgniteLogger log) {
            super(log);
        }

        /** {@inheritDoc} */
        @Override public void debug(String msg) {
            if (msg.contains("Deployment was not found for class with specific class loader"))
                depNotFound.add(msg);

            super.debug(msg);
        }

        /** {@inheritDoc} */
        @Override public ListeningTestLogger getLogger(Object ctgr) {
            return this;
        }

        /** */
        public List<String> depNotFound() {
            return depNotFound.stream().collect(Collectors.toUnmodifiableList());
        }
    }
}
Contributor

This is an incorrect use of the listening logger. All you need is to register a listener, like this:

            LogListener lsnr = LogListener.matches(notFoundMsg).times(CLIENT_CNT).build();

            listeningTestLog.registerListener(lsnr);

listeningTestLog should be created on top of the standard logger, for example:

        setLoggerDebugLevel();

        listeningTestLog = new ListeningTestLogger(log);

It should then be passed to the Ignite configuration. There is no need for a separate logger for each node.

Contributor Author

Done.

meta.alias(rsrcName);
meta.className(clsName);
meta.senderNodeId(ctx.localNodeId());
meta.classLoader(ldr);
Contributor

Contributor Author

Moved the local app classloader check to GridDeploymentLocalStore#deployment so that in the initial call the meta does not contain a classloader.

private final ConcurrentMap<String, Deque<GridDeployment>> cache = new ConcurrentHashMap<>();

/** Deployment cache by classloader. */
private final ConcurrentMap<ClassLoader, Deque<GridDeployment>> cacheByLdr = new ConcurrentHashMap<>();
Contributor

cacheByLdr is always used under the mux lock, so the ConcurrentMap overhead is not required here.
Also, it may be worth using IdentityHashMap in case someone redefines the classloader's equals() incorrectly.
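
A minimal JDK-only sketch of the IdentityHashMap point above. `OddLoader` is a hypothetical class loader with a deliberately broken equals()/hashCode(); it is not part of Ignite and only illustrates how the two map types differ for such keys.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapSketch {
    /** Hypothetical loader that wrongly treats every instance as equal. */
    static class OddLoader extends ClassLoader {
        @Override public boolean equals(Object o) {
            return o instanceof OddLoader;
        }

        @Override public int hashCode() {
            return 1;
        }
    }

    /** Puts two distinct loader instances into the map and returns its size. */
    static int sizeAfterTwoPuts(Map<ClassLoader, String> map) {
        map.put(new OddLoader(), "dep-1");
        map.put(new OddLoader(), "dep-2");

        return map.size();
    }

    public static void main(String[] args) {
        // HashMap relies on equals()/hashCode(), so the second put overwrites the first.
        System.out.println(sizeAfterTwoPuts(new HashMap<>()));

        // IdentityHashMap compares keys by reference, so both entries survive.
        System.out.println(sizeAfterTwoPuts(new IdentityHashMap<>()));
    }
}
```

Since the map is only touched under the lock, a plain (non-concurrent) map type is sufficient in this sketch as well.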

Contributor Author

Done, thank you for the hint.


dep = d;
for (GridDeployment d : depsByLdr) {
if (!d.undeployed() && d.classLoader() == ldr) {
Contributor

If it is undeployed, it is removed from the cache, so how can we find it?
Why do we need to check the classloader if we only put items with exactly this classloader into the cache?

Contributor Author

Yes, those were 'extra safety' checks. Changed the lookup logic altogether (see below).

dep = candidate;
}
}
else {
Contributor

Do we still need this check? If a deployment is not found by classloader in the classloader cache, it can't be found in the aliases cache either. We keep both caches synchronized and modify them only under the lock.
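
The invariant described above can be sketched with plain JDK collections. Everything here is hypothetical (the class `TwoCacheSketch`, its method names, and the use of `String` ids in place of `GridDeployment`), and it assumes one deployment per classloader in the loader cache, which is the reviewer's reading rather than a confirmed fact.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class TwoCacheSketch {
    /** Single lock guarding both caches. */
    private final Object mux = new Object();

    /** Deployment ids by alias, newest first. */
    private final Map<String, Deque<String>> byAlias = new HashMap<>();

    /** Deployment id by classloader (compared by reference). */
    private final Map<ClassLoader, String> byLdr = new IdentityHashMap<>();

    /** Registers a deployment in both caches in one critical section. */
    void deploy(String alias, ClassLoader ldr, String depId) {
        synchronized (mux) {
            byAlias.computeIfAbsent(alias, k -> new ArrayDeque<>()).addFirst(depId);
            byLdr.put(ldr, depId);
        }
    }

    /** Removes a deployment from both caches in one critical section. */
    void undeploy(ClassLoader ldr) {
        synchronized (mux) {
            String depId = byLdr.remove(ldr);

            if (depId == null)
                return;

            for (Deque<String> deps : byAlias.values())
                deps.remove(depId);

            byAlias.values().removeIf(Deque::isEmpty);
        }
    }

    /** Returns the deployment id for the loader, or null if there is none. */
    String findByLoader(ClassLoader ldr) {
        synchronized (mux) {
            return byLdr.get(ldr);
        }
    }

    /** Returns true if any alias still references the deployment id. */
    boolean aliasCacheContains(String depId) {
        synchronized (mux) {
            return byAlias.values().stream().anyMatch(deps -> deps.contains(depId));
        }
    }
}
```

Because every mutation updates both maps inside the same synchronized block, a miss in the loader cache implies a miss in the alias cache, so a fallback lookup adds nothing in this model.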

Contributor Author

Removed this else block.

if (d.classLoader() == ldr) {
// Cache class and alias.
fireEvt = d.addDeployedClass(cls, alias);
Deque<GridDeployment> depsByLdr = cacheByLdr.get(ldr);
Contributor

It looks like there is a one-to-one relation between a deployment and a classloader. Did I miss something?

Contributor Author

@oleg-vlsk oleg-vlsk Feb 21, 2026

Not necessarily. In GridDeploymentLocalStore#cache we can have several deployments with the same classloader associated with one alias/class name (see attached screenshots). The most recent deployments are added to the beginning of the queue (the addFirst() call in GridDeploymentLocalStore#deploy).

[Screenshots: local deployments cache - 1, local deployments cache - 2]
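
A small sketch of the alias cache behaviour described in the comment above: several deployments under one alias, with the newest at the head of the deque. `AliasDequeSketch` and its `String` deployment ids are hypothetical stand-ins; only the map-of-deques shape and the addFirst() call come from the discussion.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

class AliasDequeSketch {
    /** Deployment ids by alias or class name. */
    private final Map<String, Deque<String>> cache = new ConcurrentHashMap<>();

    /** Adds a deployment for the alias; the newest entry goes to the head. */
    void deploy(String alias, String depId) {
        cache.computeIfAbsent(alias, k -> new ConcurrentLinkedDeque<>()).addFirst(depId);
    }

    /** Returns the most recently added deployment for the alias, or null. */
    String newest(String alias) {
        Deque<String> deps = cache.get(alias);

        return deps == null ? null : deps.peekFirst();
    }

    /** Returns how many deployments are stored under the alias. */
    int count(String alias) {
        Deque<String> deps = cache.get(alias);

        return deps == null ? 0 : deps.size();
    }
}
```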

Contributor

I'm talking about cacheByLdr, not cache. For cacheByLdr it looks like only one deployment is possible for one classloader.

Valuyskiy.O.Y added 2 commits February 22, 2026 07:20
…calStore#deployment, correct cache lookup mechanism in GridDeploymentLocalStore#deploy, simplify GridDeploymentLocalStoreReuseTest#testNoExcessiveLocalDeploymentCacheMisses
Comment on lines +271 to +274
ClassLoader ldr = Thread.currentThread().getContextClassLoader();

if (ldr == null)
    ldr = U.resolveClassLoader(ctx.config());
Contributor

  1. Let's move ldr initialization outside the loop.
  2. Just add || dep.classLoader() == ldr to the if condition
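
The two suggestions above could look roughly like this. It is only a sketch: `Dep`, `resolveLoader`, and `find` are hypothetical names, the `fallback` parameter stands in for `U.resolveClassLoader(ctx.config())`, and the first operand of the condition (`d.ldr == requested`) is a placeholder, since the real condition is not shown in this thread.

```java
import java.util.List;

class LoaderLookupSketch {
    /** Hypothetical stand-in for a deployment. */
    static final class Dep {
        final String name;
        final ClassLoader ldr;

        Dep(String name, ClassLoader ldr) {
            this.name = name;
            this.ldr = ldr;
        }
    }

    /** Context classloader of the current thread, or the fallback if it is null. */
    static ClassLoader resolveLoader(ClassLoader fallback) {
        ClassLoader ldr = Thread.currentThread().getContextClassLoader();

        return ldr != null ? ldr : fallback;
    }

    /** Returns the first deployment matching either loader, or null. */
    static Dep find(List<Dep> deps, ClassLoader requested, ClassLoader fallback) {
        // Resolved once, outside the loop.
        ClassLoader ldr = resolveLoader(fallback);

        for (Dep d : deps) {
            // Placeholder condition extended with the extra identity check.
            if (d.ldr == requested || d.ldr == ldr)
                return d;
        }

        return null;
    }
}
```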

Comment on lines +111 to +112
assertTrue(lsnr0.check(5_000));
assertTrue(lsnr1.check(5_000));
Contributor

Why do we need to wait here? As far as I understand, there is a strict happens-before relation between task completion and the log message.
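
A JDK-only illustration of the happens-before argument, under the assumption that the log message is written by the task before it completes. `HappensBeforeSketch` is hypothetical and uses a plain list in place of the logger; whether Ignite's compute path gives the same guarantee is exactly what the question above asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class HappensBeforeSketch {
    /** Runs a task that "logs" one message and returns how many are visible after get(). */
    static int messagesVisibleAfterGet() {
        // Deliberately not thread-safe: visibility comes from Future.get() alone.
        List<String> log = new ArrayList<>();

        CompletableFuture<Void> task = CompletableFuture.runAsync(
            () -> log.add("Deployment was not found for class with specific class loader"));

        try {
            // Actions of the async computation happen-before the return of get().
            task.get();
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }

        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(messagesVisibleAfterGet());
    }
}
```

If the same ordering holds in the test, a non-waiting check of the listener right after the task returns would be enough.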
