Friday, August 9, 2013

Shifting madness

My team was chasing an odd bug for a couple of weeks. In the GemFire distributed cache we store a 64-bit timestamp representing the last-modified-time for a cache entry and use it for entry expiration and inter-site consistency checks.  This value was all of a sudden going back in time a day and a half, causing early expiration and inter-site inconsistencies.

Well, what do you do?  You review recent changes to the product, for one thing.  Not long ago two people worked on the code for handling this timestamp.  The significant change seemed to be in using the top 8 bits of this field to store boolean flags and using a java.util.concurrent.atomic.AtomicLongFieldUpdater to access the field.

  private static final long LAST_MODIFIED_MASK = 0x00FFFFFFFFFFFFFFL;

   long storedValue;
    long newValue;
    do {
      storedValue = lastModifiedUpdater.get(this);
      newValue = storedValue & ~LAST_MODIFIED_MASK;
      newValue |= lastModifiedTime;
    } while (!lastModifiedUpdater.compareAndSet(this, storedValue, newValue));

This code looks okay unless the relatively new AtomicLongFieldUpdater is messing up.  The other change was introduction of a new bit to store in the 8 top bits of the field:

  private static final long VALUE_RESULT_OF_SEARCH   = 0x01L << 56;
  private static final long UPDATE_IN_PROGRESS       = 0x02L << 56;
  private static final long TOMBSTONE_SCHEDULED      = 0x04L << 56;
  private static final long LISTENER_INVOCATION_IN_PROGRESS = 0x08 << 56;

There's a problem with this last line.  We're shifting a 32-bit integer 56 places to the left. A good C compiler will complain about this but the Java compiler seems okay with it.  Here's a C program:

#include "stdio.h"

int main(int argc, char *argv[]) {
    long l = 1 << 30;
    printf("1 << 30=%ld",l);

    l = 1<<32 font="">
    printf("1 << 32=%ld",l);
    l = 3<<32 br="">    printf("3 << 32=%ld",l);
~> cc -o testit test.c
test.c: In function 'main':
test.c:6: warning: left shift count >= width of type
test.c:8: warning: left shift count >= width of type

And the equivalent Java program:


public class test {
  public static void main(String[] args) {
    long l = 1 << 30;
    System.out.println("1 << 30="+l);

    l = 1 << 32;
    System.out.println("1 << 32="+l);

    l = 3 << 32;
    System.out.println("3 << 32="+l);


~> javac

Running these two programs gives different results

~> ./testit
1 << 30=1073741824

1 << 32=0
3 << 32=0

~> java test
1 << 30=1073741824

1 << 32=1
3 << 32=3

So the << operator works differently in Java than in C.  Shifting 0x08 fifty six bits in C results in a 0 but Java turns it into 0x800,0000!  The Java Language Specification says

If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
 So 56 (111000) is silently turned into 24 (011000) by the javac compiler! The new constant, 0x8000000, is 8 << 24, not 8 << 56 as intended!

This bit was being set and cleared in the timestamps when the new flag was used.  For instance, 2013/07/25 15:55:10.906 PDT is 1374792910906 on the millisecond clock.  Clearing bit 0x8000000 turns the clock back to 1374658693178, which is 2013/07/24 02:38:13.178 PDT.  That's roughly 37 hours earlier than the unmolested timestamp.  No wonder entries were being considered "old" before their time.

Changing the constant to have "L" like the others fixed the problem.