
Warn when libp2p mdns multicast send fails #1681

Open
sblOWPCKCR wants to merge 2 commits into exo-explore:main from sblOWPCKCR:mdns-err

@sblOWPCKCR
Contributor

Motivation

Every so often, peer discovery fails on my boxes. Once it gets into this state, nothing short of a reboot fixes it. I never had the firewall enabled.

I traced it down to multicast not working. Here's a minimal example to check whether you're in this state (it should fail both with and without --pin):

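```rust
// Minimal repro: bind a UDP socket to <src-ip> and send one byte to the
// mDNS IPv4 multicast group. Requires the `socket2` crate.
use socket2::{Domain, Protocol, Socket, Type};
use std::env;
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};

fn main() -> std::io::Result<()> {
    // Source address to send from (an IPv4 address assigned to this host).
    let ip: Ipv4Addr = env::args().nth(1).expect("usage: mdns-repro <src-ip> [--pin]").parse().unwrap();
    // --pin: explicitly select the outgoing multicast interface (IP_MULTICAST_IF).
    let pin = env::args().any(|a| a == "--pin");
    let sock = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    sock.bind(&SocketAddrV4::new(ip, 0).into())?;
    if pin {
        sock.set_multicast_if_v4(&ip)?;
    }
    // Convert to a std socket for the actual send.
    let sock = UdpSocket::from(sock);
    // mDNS IPv4 multicast group and port.
    let dst = SocketAddrV4::new(Ipv4Addr::new(224, 0, 0, 251), 5353);
    println!("bound={}, pin_multicast_if={pin}", sock.local_addr()?);
    // In the "bad" state this send fails; print the error instead of exiting.
    match sock.send_to(b"x", dst) {
        Ok(n) => println!("sent {n} bytes to {dst}"),
        Err(e) => println!("send_to({dst}) -> {e:?}"),
    }
    Ok(())
}
```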

Interestingly, equivalent code works with /usr/bin/python3!

Even more interestingly, the above code (and exo) consistently failed for me in my long-running tmux session, but both worked just fine when I created a new tmux (or even plain ssh) session.

My hypothesis is that macOS applies different network policies to a process depending on something about the session it runs in. Somebody more knowledgeable can pick up the investigation.

This PR only makes mDNS errors visible, so you know immediately that discovery is non-functional:

[ 12:58:25.6102PM | WARNING ] libp2p mDNS multicast send failed. address=192.168.1.10. Peer auto-discovery may not work in this process context. If peers do not form a cluster, relaunch exo from a fresh shell/session (for example outside an existing tmux server).

Changes

Set up tracing for libp2p mDNS so that multicast send failures are logged as warnings
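The diff itself is not reproduced on this page. As a rough, std-only sketch of the same idea (surfacing a failed multicast send as a warning instead of failing silently), here is a hypothetical probe that sends one byte to the mDNS group 224.0.0.251:5353 like the repro in the Motivation section, but without pinning the multicast interface, since `std::net::UdpSocket` has no setter for `IP_MULTICAST_IF`. The helper `mdns_send_warning` and the probe are illustrative assumptions, not exo's implementation; the actual change sets up tracing for libp2p's mDNS rather than sending its own packet.

```rust
use std::env;
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};

/// Hypothetical helper (not exo's code): map the result of one multicast send
/// to the warning text this PR logs, or `None` if the send succeeded.
fn mdns_send_warning(src: Ipv4Addr, send_result: &io::Result<usize>) -> Option<String> {
    match send_result {
        Ok(_) => None,
        Err(_) => Some(format!(
            "libp2p mDNS multicast send failed. address={src}. \
             Peer auto-discovery may not work in this process context. \
             If peers do not form a cluster, relaunch exo from a fresh shell/session \
             (for example outside an existing tmux server)."
        )),
    }
}

fn main() {
    // Source address to probe from; falls back to 0.0.0.0 (let the OS choose).
    let src: Ipv4Addr = env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(Ipv4Addr::UNSPECIFIED);
    // A bind failure (e.g. `src` is not assigned to this host) is a different
    // problem from the multicast issue, so report it separately.
    let sock = match UdpSocket::bind(SocketAddrV4::new(src, 0)) {
        Ok(sock) => sock,
        Err(e) => {
            eprintln!("cannot bind to {src}: {e}");
            return;
        }
    };
    // mDNS IPv4 multicast group and port, as in the repro.
    let dst = SocketAddrV4::new(Ipv4Addr::new(224, 0, 0, 251), 5353);
    let send_result = sock.send_to(b"x", dst);
    match mdns_send_warning(src, &send_result) {
        // Append the raw io::Error so the OS error code is not lost.
        Some(warning) => eprintln!("WARNING {warning} ({send_result:?})"),
        None => println!("multicast send from {src} to {dst} succeeded"),
    }
}
```

Pass the source address as the first argument; keeping the message formatting separate from the socket I/O makes the warning testable without a network.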

Why It Works

It doesn't solve the problem, but it makes the problem visible.

Test Plan

Manual Testing

I now see this warning when I'm in the 'bad' state:

[ 12:58:25.6102PM | WARNING ] libp2p mDNS multicast send failed. address=192.168.1.10. Peer auto-discovery may not work in this process context. If peers do not form a cluster, relaunch exo from a fresh shell/session (for example outside an existing tmux server).

Automated Testing

N/A

Potentially related: #950

@AlexCheema
Contributor

Sounds potentially similar to issues @ciaranbor was running into with tmux?

@ciaranbor
Member

Yes, this is exactly the same issue. When the login shell that the tmux session was created in closes, discovery breaks and will never work again for that session. Perhaps the daemon stuff @Evanev7 is working on will fix this?
