Friday, August 9, 2013

Shifting madness


My team was chasing an odd bug for a couple of weeks. In the GemFire distributed cache we store a 64-bit timestamp representing the last-modified-time for a cache entry and use it for entry expiration and inter-site consistency checks.  This value was all of a sudden going back in time a day and a half, causing early expiration and inter-site inconsistencies.

Well, what do you do?  You review recent changes to the product, for one thing.  Not long ago two people worked on the code for handling this timestamp.  The significant change seemed to be in using the top 8 bits of this field to store boolean flags and using a java.util.concurrent.atomic.AtomicLongFieldUpdater to access the field.

  private static final long LAST_MODIFIED_MASK = 0x00FFFFFFFFFFFFFFL;

   long storedValue;
    long newValue;
    do {
      storedValue = lastModifiedUpdater.get(this);
      newValue = storedValue & ~LAST_MODIFIED_MASK;
      newValue |= lastModifiedTime;
    } while (!lastModifiedUpdater.compareAndSet(this, storedValue, newValue));


This code looks okay unless the relatively new AtomicLongFieldUpdater is messing up.  The other change was introduction of a new bit to store in the 8 top bits of the field:

  private static final long VALUE_RESULT_OF_SEARCH   = 0x01L << 56;
  private static final long UPDATE_IN_PROGRESS       = 0x02L << 56;
  private static final long TOMBSTONE_SCHEDULED      = 0x04L << 56;
  private static final long LISTENER_INVOCATION_IN_PROGRESS = 0x08 << 56;


There's a problem with this last line.  We're shifting a 32-bit integer 56 places to the left. A good C compiler will complain about this but the Java compiler seems okay with it.  Here's a C program:


#include "stdio.h"

int main(int argc, char *argv[]) {
    long l = 1 << 30;
    printf("1 << 30=%ld",l);

    l = 1<<32 font="">
    printf("1 << 32=%ld",l);
    l = 3<<32 br="">    printf("3 << 32=%ld",l);
}
~> cc -o testit test.c
test.c: In function 'main':
test.c:6: warning: left shift count >= width of type
test.c:8: warning: left shift count >= width of type


And the equivalent Java program:


import java.io.*;

public class test {
  public static void main(String[] args) {
    long l = 1 << 30;
    System.out.println("1 << 30="+l);

    l = 1 << 32;
    System.out.println("1 << 32="+l);

    l = 3 << 32;
    System.out.println("3 << 32="+l);

  }
}


~> javac test.java



Running these two programs gives different results


~> ./testit
1 << 30=1073741824

1 << 32=0
3 << 32=0

~> java test
1 << 30=1073741824

1 << 32=1
3 << 32=3

So the << operator works differently in Java than in C.  Shifting 0x08 fifty six bits in C results in a 0 but Java turns it into 0x800,0000!  The Java Language Specification says

If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
 So 56 (111000) is silently turned into 24 (011000) by the javac compiler! The new constant, 0x8000000, is 8 << 24, not 8 << 56 as intended!

This bit was being set and cleared in the timestamps when the new flag was used.  For instance, 2013/07/25 15:55:10.906 PDT is 1374792910906 on the millisecond clock.  Clearing bit 0x8000000 turns the clock back to 1374658693178, which is 2013/07/24 02:38:13.178 PDT.  That's roughly 37 hours earlier than the unmolested timestamp.  No wonder entries were being considered "old" before their time.

Changing the constant to have "L" like the others fixed the problem.

Tuesday, February 5, 2013

I was reviewing some code for a coworker and saw something that I didn't know was possible...


Integer counter = 0;
.
.
.
synchronized(counter) {
  counter++;
  // do other work under sync
}


I thought that the compiler might be accepting this as a valid use of autoboxing, as in

Integer counter = 0;

synchronized(counter) {
  int i = counter.intValue();
  i++;
  // do other work under sync
}

in other words the value is pulled out of the Integer and held in a temporary variable where it is incremented and then thrown away.

This turned out not to be the case at all.  The compiler actually creates code that will increment the value like this but it affects counter just as if this were an "int" field!  So my coworker was right - this "++" is incrementing the counter like he wanted it to do.

But now there is something else wrong with the code!  Java Integers are immutable, and that "++" is assigning a new Integer to the counter variable.  This makes his synchronized(counter)statement useless in protecting anything but the counter++.  Once that's finished there is a new object in counter.  If one thread synchronized on Integer(0) the counter++ would change it to Integer(1)  Another thread could then enter the synchronized block holding a lock on Integer(1) while the original thread continued to lock on Integer(0).

t1: synchronized(Integer(0)) {
t1: counter++ // in other words, counter=Integer(1)
t2: synchronized(Integer(1)) {
t1 & t2: // do other work under sync


What else is wrong with this?  What about the objects we're synchronizing on?  I wrote a short program to look at the Integers generated in code such as this.


public class incInt {

  public static void main(String args[]) {
    Integer i = 0;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));
    i++;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));
    i++;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));

    System.out.println("resetting to zero");

    i = 0;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));
    i++;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));
    i++;
    System.out.println("i="+i + " hash="+System.identityHashCode(i));
  }

}

The result of running this with Oracle's JRE 1.7.0_5 shows that there are canonical values for autoboxed zero, one and two.

> java incInt
i=0 hash=4991049
i=1 hash=32043680
i=2 hash=9499036
resetting to zero
i=0 hash=4991049
i=1 hash=32043680
i=2 hash=9499036

Here's a blog post that claims that [-128,127] are cached by the JVM and used in autoboxing.  It turns out that the post is right.  I modified the test to print out the hashes for zero, one and two and they are the same objects

> java incInt
i=0 hash=31879808
i=1 hash=6770745
i=2 hash=12835244
resetting to zero
i=0 hash=31879808
i=1 hash=6770745
i=2 hash=12835244
Integer.valueOf(0)=31879808
Integer.valueOf(1)=6770745
Integer.valueOf(2)=12835244

Getting back to the code under review, this means that the synchronization is at least sometimes using a canonical object used by the whole JVM.  Anything could sync on Integer.valueOf(0), causing this code to be affected by code running in other threads.  All synchronization should be done on private state or by using well-tested concurrency utilities to avoid accidental conflicts and meddling.