Thursday, July 1, 2010

JGroups address reuse

One of the things we avoided for years in GemFire was the fact that its JGroups component can reuse an old address with a new member.

When a process joins a JGroups channel it creates an ephemeral datagram socket and uses its port number + the machines IP address to create a membership ID. It then finds the membership coordinator and sends a Join message to it with this ID. The coordinator sends out a new membership view containing the new member's ID.

If the address happens to be one that was recently used by another process on the same machine, it can look like the member left and then joined again. This is actually possible for a single process if you use the JGroups merge protocols.

GemFire keeps track of recently departed members in a "shunned members" collection and has used this to keep addresses from being reused. If a member tries to join using an old ID, all messages from it are ignored. The new member then times out in its attempt to send startup messages to other members of the system and throws a SystemConnectException.

That was fine for a long time, but new customers have brought new demands and we now allow you to restrict the range of ephemeral ports that JGroups uses. This makes ID reuse much more likely. Setting a port range of 1000-1050, for instance, only provides a pool of 51 ports to select from.

We tried several approaches to making identifiers less likely to be reused before finding a solution that worked. We tried adding the cache's peer-to-peer stream socket port number to the ID and adding randomly selected bytes to the ID before hitting on the idea of using the JGroups Lamport clock.

Now when a member starts to join the system it creates a temporary ID that contains its InetAddress and JGroups port. This address is put in the Locator's discovery set and is sent to the membership coordinator in a Join request. The coordinator responds by sending back the current membership view and a modified ID that the member should use. This ID has the JGroups Lamport timestamp embedded in it and this is then used in ID equality checks to differentiate between two IDs that have the same InetAddress and port.

In JGroups the Lamport timestamp is nothing more than the membership view number. It is usually incremented by 1 each time a new membership view is formed. Having this information in the ID not only lets us make them "more unique" but it also lets you see at a glance when a member joined the system.

Having this information in the identifier has eliminated address reuse completely. The Lamport clock is never decremented or reset, so each new member is guaranteed to have a unique ID and not be confused with an old, departed member.

1 comment:

  1. That's really bright, Bruce. This solves the problem completely. Good work!

    ReplyDelete