How to deal with filesystem softlinks/symbolic links in Java

Ok, here is the problem: you are assigned to write a method for a Java-based file manager that can delete an entire directory tree. Sounds like a trivial task that can easily be solved by a very simple recursive algorithm, doesn't it? Take a File object. If it is a file, just delete it. If it is a directory, list its contents, call yourself for every entry in it, and then delete the now empty directory. That's it: 10 minutes of coding and the rest of the day off. Easy money.
A day later, however, one of the beta testers calls in. After she tried to delete a directory, all her personal files went missing as well.
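The naive algorithm described above might be sketched like this with java.io.File (the class and method names are hypothetical). It is shown only to make the flaw concrete: isDirectory() and listFiles() follow symbolic links, so this is exactly the dangerous version.

```java
import java.io.File;

final class NaiveDelete {
    /**
     * Deletes f and, if it is a directory, everything below it.
     * WARNING: java.io.File follows symbolic links, so a link to a directory
     * is descended into and the contents of its target are deleted too.
     */
    static boolean deleteTree(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles(); // null if the directory cannot be read
            if (children != null) {
                for (File child : children) {
                    deleteTree(child); // "call yourself for every entry"
                }
            }
        }
        return f.delete(); // a directory can only be deleted once it is empty
    }
}
```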

You are assigned to debug the problem. After spending several hours digging through the entire source code in every way you can think of, you conclude that the algorithm cannot possibly have broken out of the assigned directory tree. The beta tester must therefore have been running some other software at the same time, which went postal and, coincidentally, destroyed all evidence of itself in the process.

Happy and content with yourself, you go on vacation. Upon your return, you learn that in the meantime the product has shipped, and a lot of angry customers have been calling in, reporting the very problem you told your boss did not exist. Your company is threatened by several lawsuits and is losing money hand over fist. What happened?

Meet the symbolic link. Your beta tester was not running any goofy software at all (apart from yours). She was working on a Linux box, and the directory she wanted to delete contained a symbolic (soft) link. This link pointed to her home directory and thus outside the intended directory tree. Your algorithm was perfectly correct for trees, but you based your work on a wrong assumption, namely that a directory hierarchy is always a tree. Symbolic links turn it into a general graph: a link can point anywhere, including outside the tree or back up to one of its own ancestors, which creates a cycle. Since every modern desktop OS supports symbolic links in some form, your current code is extremely dangerous.
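The failure is easy to reproduce on a POSIX system. The sketch below (hypothetical names; it uses java.nio.file from Java SE 7 only to create the link, which java.io cannot do) shows that java.io.File reports a symbolic link to a directory as a directory and lists the contents of its target, which is how the naive recursion escapes the tree. It leaves its temporary directories behind.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LinkHazardDemo {
    /** Returns the names java.io.File sees "inside" a link that points outside the tree. */
    static String[] listThroughLink() throws IOException {
        Path outside = Files.createTempDirectory("home"); // stands in for the home directory
        Files.write(outside.resolve("personal.txt"), new byte[0]);
        Path tree = Files.createTempDirectory("tree");    // the directory to be deleted
        Path link = Files.createSymbolicLink(tree.resolve("link"), outside);

        File asFile = link.toFile();
        // isDirectory() follows the link, so the naive recursion would descend into it...
        boolean looksLikeDirectory = asFile.isDirectory();
        // ...and list() returns the entries of the target, outside the tree
        return looksLikeDirectory ? asFile.list() : new String[0];
    }
}
```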

The fix to the faulty algorithm sounds easy: never descend into a directory via a symbolic link, and always simply delete symlinks, no matter whether they reference directories or files. With that in mind, you check the API of java.io.File for an isLink() method, only to find out... it does not exist! Right: java.io.File has no way to identify links (isDirectory() and isFile() silently follow them), and that has been an issue with Java for years. The problem is being addressed by JSR 203, but that solution will not go gold before Java SE 7. So in the meantime a workaround is required, and this means doing some dirty work using JNI (Java Native Interface).
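For reference, this is roughly how the fix looks once JSR 203 is available (it became java.nio.file in Java SE 7); it is a sketch with hypothetical names, not the JNI workaround developed below. Files.isDirectory() with LinkOption.NOFOLLOW_LINKS returns false for a symbolic link, so links are never descended into, and Files.delete() removes the link itself rather than its target.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class SafeDeleteNio {
    /** Deletes path and everything below it without ever following a symbolic link. */
    static void deleteTree(Path path) throws IOException {
        // NOFOLLOW_LINKS: a link to a directory is *not* treated as a directory
        if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                for (Path entry : entries) {
                    deleteTree(entry);
                }
            } // the stream is closed here, before the directory itself is deleted
        }
        // Removes files, now-empty directories and links (the link, not its target)
        Files.delete(path);
    }
}
```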

Let's get started by defining a class called Link. This class has a single static native method, isLink, which takes a path and returns -1 if the path does not exist (or cannot be examined), 0 if it exists but is not a link (a regular file or a directory, for instance), and 1 if it is a symbolic link:

    import java.io.*;
 
    public class Link {
 
      /**
       * Check whether or not a file is a link.
       * @param fname the file in question. This is a string since the C function
       * doing the work expects a string as pathname, and unwrapping File
       * objects is much easier in Java than in C.
       * @return The file can either be a link (1), not a link (0) or not
       * exist at all (-1), so a boolean does not suffice as return value.
       */
      public static native int isLink(String fname);

      // Main method, expecting filenames as arguments
      public static void main(String[] args) throws Exception {

        // Load the native code "link.so" from the current directory. Note that
        // System.load() requires an absolute pathname. The library cannot be
        // loaded from within a jar file.
        System.load(new File("link.so").getAbsolutePath());

        // Iterate over all command-line arguments and check whether each file is a link
        for (String arg : args) {
          System.err.println("Is link (" + arg + "): " + isLink(arg));
        }
      }
    }

This class can be compiled the usual way with javac. The magic starts with another tool called javah, which creates a C header file from the compiled class file. There are two important things to note about javah:

  1. javah operates on the classpath, not on files: invoke it with a class name such as Link, just as you would start the virtual machine.
  2. javah does not overwrite previously generated header files. If you change the signature of a native method in the Java code, delete the corresponding header file before running javah again.

You should get a file called Link.h with content similar to the following (the exact layout depends on the javah implementation you use; if in doubt, you can simply copy and paste the code below):

    /* DO NOT EDIT THIS FILE - it is machine generated */
 
    #include <jni.h>
 
    #ifndef __Link__
    #define __Link__
 
    #ifdef __cplusplus
    extern "C"
    {
    #endif
 
    JNIEXPORT jint JNICALL Java_Link_isLink (JNIEnv *env, jclass, jstring);
 
    #ifdef __cplusplus
    }
    #endif
 
    #endif /* __Link__ */
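The name Java_Link_isLink is not arbitrary: JNI derives it from the class and method names. The helper below is a sketch of the "short name" rules from the JNI specification, as far as they matter here: the prefix Java_, then the fully qualified class name with each '.' turned into '_', then '_' and the method name, where '_' is escaped as _1, ';' as _2, '[' as _3 and any other character that is not an ASCII letter or digit as _0 followed by four lowercase hex digits. Overloaded native methods additionally need an encoded argument signature, which is not covered; the class and method names are hypothetical.

```java
final class JniNames {
    /** Escapes one name segment according to the JNI name-mangling rules. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else {
                // Any other character, e.g. '$' of a nested class, becomes _0xxxx
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    /** Short JNI function name for a non-overloaded native method. */
    static String shortName(String className, String methodName) {
        StringBuilder sb = new StringBuilder("Java_");
        for (String part : className.split("\\.")) {
            sb.append(escape(part)).append('_'); // package separators become '_'
        }
        return sb.append(escape(methodName)).toString();
    }
}
```

For example, shortName("Link", "isLink") returns Java_Link_isLink, matching the header above.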

This is a so-called C header file. It declares the signature of the function that has to be implemented in C. The implementation looks like this:

    /* In case you are not familiar with C: #include roughly corresponds to
     * Java's import statement. Header files in angle brackets are searched
     * for in the system include paths, headers in quotes are looked up
     * relative to the current source file first. The tricky one is jni.h,
     * which ships with the JDK and is usually not in a system path. If the
     * compiler cannot find it, add the "include" directory of your JDK and
     * its platform subdirectory (e.g. include/linux) to the search path with
     * -I, since jni.h itself includes the platform-specific jni_md.h.
     */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <jni.h>
    #include "Link.h"
 
    /*
     * When copying and pasting the signature from Link.h, remember that jclass
     * and jstring are both types (defined in jni.h) and that the C compiler
     * wants the parameters of a function definition to be named.
     * @param env reference back into the JVM
     * @param jc the class object; if isLink were not static, this would be a
     *        jobject referring to "this" instead.
     * @param fname the string actually passed to isLink().
     * @return a jint, a 32-bit integer type defined via jni.h.
     */
    JNIEXPORT jint JNICALL Java_Link_isLink (JNIEnv *env, jclass jc, jstring fname) {
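      /* GetStringUTFChars returns "modified UTF-8", which equals the file
       * system's encoding for ASCII names but may not for others. */
      const char *c_string = (*env)->GetStringUTFChars(env, fname, NULL);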
      if (c_string == NULL) {
        return -2; /* OutOfMemoryError already thrown */
      }
 
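      struct stat buf;
      /* lstat, unlike stat, does not follow symbolic links (see its manpage) */
      int ret = lstat(c_string, &buf);
      (*env)->ReleaseStringUTFChars(env, fname, c_string);
      if (ret == -1) return -1;
      /* S_ISLNK only guarantees a non-zero value, so normalize it to 1 */
      return S_ISLNK(buf.st_mode) ? 1 : 0;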
    }
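On Java SE 7 and later, the lstat() call above has a pure-Java counterpart: reading a file's attributes with LinkOption.NOFOLLOW_LINKS. The following sketch (the class name is hypothetical) keeps the same -1/0/1 contract as the native Link.isLink, so it could serve as a cross-check for the native library on newer JVMs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

final class PureJavaLink {
    /** Same contract as Link.isLink: 1 = link, 0 = exists but not a link, -1 = cannot be examined. */
    static int isLink(String fname) {
        try {
            Path p = Paths.get(fname);
            // NOFOLLOW_LINKS makes this behave like lstat(2) rather than stat(2)
            BasicFileAttributes attrs =
                Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
            return attrs.isSymbolicLink() ? 1 : 0;
        } catch (IOException | InvalidPathException e) {
            return -1; // e.g. NoSuchFileException, like lstat returning -1
        }
    }
}
```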

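Assuming the snippet above is saved as Link.c, gcc can compile it into a shared object, which you can then test as follows (the include paths assume that JAVA_HOME points to a JDK on Linux; adjust them for your platform):

    gcc -shared -fPIC -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" -o link.so Link.c
    ln -s Link.java A_LINK_FOR_TESTING
    java -cp . Link *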

Congratulations, that's it so far. The C code shown above uses POSIX API calls only, so it should work on every POSIX-compliant platform. Of course, the shared object itself is still platform dependent, so one precompiled binary must be shipped for every supported platform.
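With the native library in place, the recursive delete can follow the rule stated earlier: never descend through a link, just delete it. The sketch below uses hypothetical names and, so that it compiles and can be tested without link.so, takes the link check as a parameter (the LinkTest interface is an assumption, not part of the Link class). In the file manager you would pass an implementation that calls Link.isLink, for instance Link::isLink on Java 8 or later.

```java
import java.io.File;

final class SafeDelete {
    /** Hypothetical abstraction over Link.isLink: 1 = link, 0 = not a link, -1 = missing. */
    interface LinkTest {
        int isLink(String path);
    }

    /** Deletes f and everything below it, treating symbolic links as leaves. */
    static boolean deleteTree(File f, LinkTest links) {
        int kind = links.isLink(f.getPath());
        if (kind == -1) {
            return false; // nothing there (or lstat failed)
        }
        // Only descend into real directories; a link is deleted like a plain file
        if (kind == 0 && f.isDirectory()) {
            File[] children = f.listFiles(); // null if the directory cannot be read
            if (children != null) {
                for (File child : children) {
                    deleteTree(child, links);
                }
            }
        }
        return f.delete(); // on a link, this removes the link itself, not its target
    }
}
```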