Archive for September, 2010

JBoss AS 6 release date published

11 September 2010

Although JBoss AS 6 CR1 and GA have been on JBoss’ JIRA roadmap for some time, no due date was set. Until recently…

These are the dates mentioned:

JBoss AS 6 CR1: 15 Nov. 2010 (jira)
JBoss AS 6 GA/Final: 17 Dec. 2010 (jira)

From CR1 on, JBoss AS 6 would be an officially certified Java EE 6 implementation. If JBoss manages to stick to these release dates, it will be able to offer an implementation almost exactly one year after the Java EE 6 spec was released.

For the Java landscape, this release is fairly important. The entire concept of having a spec depends on there being multiple implementations, but for the better part of this year only Glassfish has had a Java EE 6 implementation.

Besides that, JBoss AS 6 will also simply mean that more developers have access to the wealth of improvements that Java EE 6 brings to the table. After all, JBoss AS is one of the most used Java EE Application Servers. Although technically possible, most people won’t swap application servers just to be able to use the new spec and will more likely wait for ‘their’ AS to be updated.

Arjan Tijms

Efficient way to determine if a String is a Number

9 September 2010

Introduction

After cloning the M4N projects from Mercurial, one of the first classes I checked was the M4N Common Utils class. I saw the following method:

public static boolean isNumber(String string) {
    try {
        Long.parseLong(string);
    } catch (Exception e) {
        return false;
    }
    return true;
}

It’s the easiest way to determine whether the String represents a valid number. But it’s not the cheapest way when the String doesn’t represent a valid number at all, because an Exception has to be created. Creating an Exception is relatively expensive; among other things, the whole stack trace has to be collected during its creation. Even when no exception is thrown, the method does another unnecessary job: computing the long value from the digits found.
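To make that cost a little more tangible, here is a minimal sketch (the class and method names are made up for illustration) that triggers the failure path once and reports how many stack frames the resulting exception carries. The exact frame count depends on the JVM and the call depth, so no particular number should be expected.

```java
final class ParseFailureDemo {

    // Returns the number of stack frames recorded in the exception that
    // Long.parseLong throws for the given input, or -1 if parsing succeeds.
    static int framesCollectedFor(String input) {
        try {
            Long.parseLong(input);
            return -1;
        } catch (NumberFormatException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("frames for \"foo\": " + framesCollectedFor("foo"));
        System.out.println("frames for \"123\": " + framesCollectedFor("123"));
    }
}
```

Every one of those frames is gathered just to be thrown away again by the catch block in isNumber().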

Alternatives

We could copy the source of Long#parseLong() and modify it so that it returns false rather than throwing an Exception and doesn’t compute a long value. It uses Character#digit() to obtain each character of the String as a digit. We can replace this with Character#isDigit().

public static boolean isNumber(String string) {
    if (string == null || string.isEmpty()) {
        return false;
    }
    int i = 0;
    if (string.charAt(0) == '-') {
        if (string.length() > 1) {
            i++;
        } else {
            return false;
        }
    }
    for (; i < string.length(); i++) {
        if (!Character.isDigit(string.charAt(i))) {
            return false;
        }
    }
    return true;
}

Since we're looking for a certain pattern in a String, we could also reach for regex. True, regular expressions are not the holy grail, but their speed may turn out to be perfectly acceptable. We want to allow an optional minus sign in front (-?) followed by digits only (\d+), so the regex pattern ends up looking like this:

private static final Pattern numberPattern = Pattern.compile("-?\\d+");

public static boolean isNumber(String string) {
    return string != null && numberPattern.matcher(string).matches();
}

Because compiling the pattern is an expensive task on its own, we want to do it only once and declare it as a static final field.
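Two details of the regex variant are easy to miss, so here is a small sketch of them (the class and method names are hypothetical). First, matches() requires the whole input to match, whereas find() would already be satisfied with a numeric prefix. Second, by default \d only matches the ASCII digits 0-9, while Character#isDigit() also accepts digits from other scripts, so the two approaches can disagree on such input.

```java
import java.util.regex.Pattern;

final class RegexDigitDemo {

    // Compiled once; note the doubled backslash required in Java source.
    private static final Pattern NUMBER_PATTERN = Pattern.compile("-?\\d+");

    // True only if the entire string is an optional minus followed by digits.
    static boolean matchesWhole(String s) {
        return s != null && NUMBER_PATTERN.matcher(s).matches();
    }

    // True if the pattern occurs anywhere in the string.
    static boolean findsAnywhere(String s) {
        return s != null && NUMBER_PATTERN.matcher(s).find();
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // the digits 1, 2, 3 in Arabic-Indic script
        System.out.println(matchesWhole("-123"));      // true
        System.out.println(matchesWhole("12a"));       // false
        System.out.println(findsAnywhere("12a"));      // true, the "12" prefix is enough
        System.out.println(matchesWhole(arabicIndic)); // false, \d is ASCII-only by default
        System.out.println(Character.isDigit(arabicIndic.charAt(0))); // true
    }
}
```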

(Micro) Benchmarking

Now we want to benchmark those three different approaches. This can basically be done by obtaining System#nanoTime() as the start time, executing the piece of code (preferably for a fixed number of iterations), obtaining System#nanoTime() once again as the end time, and finally calculating the difference between the two. However, there are some gotchas in this approach. You may end up benchmarking the JVM/HotSpot/JIT being used rather than the code. The JIT, for example, may apply optimizations that lead to misleading benchmark results. Here are two articles which tell a bit more about the gotchas:

The most important considerations are that we'd like to put the code we want to benchmark in its own method which has a return value (which we in turn shouldn't ignore!) and that we also want to execute the particular method a bunch of times beforehand to trigger the JIT optimizations (to "warmup" the JVM).

public static void main(String... args) {
    // Prepare.
    String[] strings = { 
        null, "foo", "123", "+123", "-123", "0", "--123", "12345678901234567890"
    };
    int iterations = 1000000;
    boolean result = false;

    // Show the isNumber() results for each of the strings.
    for (String string : strings) {
        System.out.printf("String: %s isNumberWithParseLong: %s WithIsDigit:"
            + " %s WithRegex: %s%n", string, isNumberWithParseLong(string),
                isNumberWithIsDigit(string), isNumberWithRegex(string));
    }

    // JVM warmup.
    System.out.print("Warming up JVM .. ");
    for (int i = 0; i < iterations / 10; i++) {
        for (String string : strings) {
            result ^= isNumberWithParseLong(string);
            result ^= isNumberWithIsDigit(string);
            result ^= isNumberWithRegex(string);
        }
    }
    System.out.println("Finished! Now the benchmarks ..");

    // Benchmark isNumber() with Long#parseLong().
    long st1 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithParseLong(string);
        }
    }
    long et1 = System.nanoTime();
    System.out.printf("isNumberWithParseLong: %d ms%n", (et1 - st1) / 1000000);

    // Benchmark isNumber() with Character#isDigit().
    long st2 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithIsDigit(string);
        }
    }
    long et2 = System.nanoTime();
    System.out.printf("isNumberWithIsDigit: %d ms%n", (et2 - st2) / 1000000);

    // Benchmark isNumber() with regex.
    long st3 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        for (String string : strings) {
            result ^= isNumberWithRegex(string);
        }
    }
    long et3 = System.nanoTime();
    System.out.printf("isNumberWithRegex: %d ms%n", (et3 - st3) / 1000000);
    
    // Print the result. This way the JIT knows that we're interested in the
    // result, so that it doesn't optimize any of the calls away.
    System.out.println(result);
}

On the current machine, a Dell Latitude E5500 with a Core2Duo P8400, the results look like this:

String: null isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: foo isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: 123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: +123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: -123 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: 0 isNumberWithParseLong: true WithIsDigit: true WithRegex: true
String: --123 isNumberWithParseLong: false WithIsDigit: false WithRegex: false
String: 12345678901234567890 isNumberWithParseLong: false WithIsDigit: true WithRegex: true
Warming up JVM .. Finished! Now the benchmarks ..
isNumberWithParseLong: 9392 ms
isNumberWithIsDigit: 369 ms
isNumberWithRegex: 2763 ms
false

You see, in this particular benchmark Character#isDigit() is about 25 times faster than Long#parseLong(). True, this benchmark also covers the corner cases. In a lot of cases we expect valid numbers. If you remove the invalid numbers from the String[], you'll see that the difference isn't 25 times anymore, but only about 2.5 times.
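The three timing loops in the benchmark repeat the same pattern, which makes it tedious to rerun with a different String[]. As a rough sketch only, the loop could be factored into one helper. This assumes Java 8 or newer (for Predicate and method references), and the names `IsNumberBenchmark`, `time` and `sink` are invented here. Be aware that the extra indirection through Predicate can itself influence what the JIT does, so the numbers are not directly comparable with the ones above.

```java
import java.util.function.Predicate;

final class IsNumberBenchmark {

    // Accumulates results so that the JIT cannot discard the measured calls.
    static boolean sink;

    // Runs the check over all inputs for the given number of iterations and
    // returns the elapsed time in nanoseconds.
    static long time(Predicate<String> check, String[] inputs, int iterations) {
        boolean result = false;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (String input : inputs) {
                result ^= check.test(input);
            }
        }
        long elapsed = System.nanoTime() - start;
        sink ^= result;
        return elapsed;
    }

    // Same behavior as the original exception-based isNumber().
    static boolean isNumberWithParseLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = { null, "foo", "123", "-123", "0" };
        time(IsNumberBenchmark::isNumberWithParseLong, inputs, 20_000); // warm-up run
        long nanos = time(IsNumberBenchmark::isNumberWithParseLong, inputs, 200_000);
        System.out.printf("isNumberWithParseLong: %d ms%n", nanos / 1_000_000);
        System.out.println(sink); // use the result so it is not optimized away
    }
}
```

Swapping the contents of `inputs` for valid numbers only is then a one-line change.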

Long Overflow

Maybe you've also noticed that there's a 12345678901234567890 string which is invalid according to Long#parseLong() (because it overflows), but valid according to the others. In practice, numbers won't be that long, but if this has to be taken into consideration in the new isNumber() method as well to ensure its robustness, then it's worth the effort to call Long#parseLong() anyway when the string's length is equal to or greater than the number of digits in Long.MAX_VALUE. We'll change our winning isNumber() method accordingly:

private static final int NUMBER_MAX_LENGTH = String.valueOf(Long.MAX_VALUE).length();

public static boolean isNumber(String string) {
    if (string == null || string.isEmpty()) {
        return false;
    }
    if (string.length() >= NUMBER_MAX_LENGTH) {
        try {
            Long.parseLong(string);
        } catch (Exception e) {
            return false;
        }
    } else {
        int i = 0;
        if (string.charAt(0) == '-') {
            if (string.length() > 1) {
                i++;
            } else {
                return false;
            }
        }
        for (; i < string.length(); i++) {
            if (!Character.isDigit(string.charAt(i))) {
                return false;
            }
        }
    }
    return true;
}

It has become quite a piece of code, but it's at least faster in the majority of use cases. It is, however, not very beneficial in a web application with a 200ms response time and only one or two isNumber() calls.
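For long strings the method above still pays for an exception whenever the input turns out to be invalid. If that matters, one possible alternative (not the approach used above, just a sketch) is to compare the digits against the textual form of the long limits. It assumes ASCII digits only, and the class and constant names are invented for this example. Like the versions above it rejects a leading plus sign, although Long#parseLong() accepts one since Java 7.

```java
final class OverflowSafeNumberCheck {

    private static final String MAX_MAGNITUDE = "9223372036854775807"; // Long.MAX_VALUE
    private static final String MIN_MAGNITUDE = "9223372036854775808"; // Long.MIN_VALUE without the sign

    static boolean isNumber(String string) {
        if (string == null || string.isEmpty()) {
            return false;
        }
        boolean negative = string.charAt(0) == '-';
        int start = negative ? 1 : 0;
        if (start == string.length()) {
            return false; // just a minus sign
        }
        for (int i = start; i < string.length(); i++) {
            char c = string.charAt(i);
            if (c < '0' || c > '9') {
                return false; // assumption: only ASCII digits count
            }
        }
        // Leading zeros do not contribute to the magnitude, so skip them.
        while (start < string.length() - 1 && string.charAt(start) == '0') {
            start++;
        }
        String limit = negative ? MIN_MAGNITUDE : MAX_MAGNITUDE;
        int digits = string.length() - start;
        if (digits != limit.length()) {
            return digits < limit.length();
        }
        // Same length: lexicographic order equals numeric order.
        return string.substring(start).compareTo(limit) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isNumber("9223372036854775807"));  // true
        System.out.println(isNumber("9223372036854775808"));  // false
        System.out.println(isNumber("-9223372036854775808")); // true
        System.out.println(isNumber("12345678901234567890")); // false
    }
}
```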

Bauke Scholtz

Where to put named queries in JPA?

7 September 2010

JPA provides multiple ways to obtain entities. There is a very simple programmatic API that allows us to get an entity by ID, there’s a more elaborate one called the Criteria API and then there’s a query language called JPQL (Java Persistence Query Language).

JPQL is an object oriented query language that is based on the simplicity of SQL, but works directly on objects and their properties. The problem with such a language that’s used inside another language (Java) is where to store the query definition. Traditionally there have been 2 solutions:

  1. Store the definitions in annotations on some entity.
  2. Construct strings holding the definitions inline in your Java code.

The first solution is called a Named Query in JPA, it looks like this:


@NamedQueries(value={
   @NamedQuery(
      name = "Website.getWebsiteByUserId",
      query="select website from Website website where website.userId = :userId"),
   @NamedQuery(...)
})
@Entity
public class Website { ... }

The advantages of this method are twofold: JPA checks that your query is valid at startup time (no runtime surprises) and the query definition is parsed only once and re-used afterwards. As an extra bonus, it also strongly encourages the use of named parameters. The disadvantage however is that its location is just plain awkward. The entity is typically not the location where we wish to store this kind of logic. It gets even more awkward when the query is about multiple entities, yet you have to choose a single one to store the query definition on. Instead, a DAO, Service or whatever code is used to interact with the entity manager is a much more logical place to store a query definition. Unfortunately, @NamedQuery only works on entities. Neither Enterprise beans nor any other kind of managed bean in Java EE supports them.

This thus brings us to the second solution, which looks like this:


public Website foo() {
    // some code
    Website website = entityManager.createQuery(
      "select website from Website website where website.userId = :userId", Website.class)
      .setParameter("userId", userId)
      .getSingleResult();
    return website;
}

This is arguably a much better location, though still not ideal. If the query is long, we have to concatenate strings which makes the query hard to read and hard to maintain. It has the major disadvantage that the query is only checked at runtime and has to be re-parsed over and over again. There are some limited opportunities for reusing a Query object obtained by createQuery(), but since this object is only valid as long as the persistence context in which it was created is still active, those opportunities are really rather limited. Additionally, this style of query definition can make it tempting for developers to build their queries dynamically, giving rise to some nasty potential injection holes.
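To illustrate the injection risk, here is a deliberately simplified sketch that only builds query strings and involves no JPA at all. The class name, the method names and the input value are hypothetical. It shows how concatenated input becomes part of the query text, whereas with a named parameter the text stays constant.

```java
final class DynamicQueryDemo {

    // Anti-pattern: user input becomes part of the query text itself.
    static String concatenated(String userId) {
        return "select website from Website website where website.userId = " + userId;
    }

    // With a named parameter the query text is a constant; the value would be
    // supplied separately via setParameter("userId", ...).
    static String parameterized() {
        return "select website from Website website where website.userId = :userId";
    }

    public static void main(String[] args) {
        String input = "42 or 1 = 1"; // hypothetical malicious input
        System.out.println(concatenated(input));
        System.out.println(parameterized());
    }
}
```

The first line printed ends in "website.userId = 42 or 1 = 1", a condition that holds for every row, while the second query cannot be altered by the input.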

So having the choice between those two, which one do we choose? Actually, there is a third solution, which for some reason is often overlooked:

  3. Store the query text in XML (mapping) files.

In addition to annotations, JPA (and pretty much every API in Java EE that uses annotations) allows you to define the same thing or occasionally a little more in XML. Of course we don’t want one huge XML file with all our queries, but as it turns out JPA simply allows us to use as many files as we need and organize them in whatever way we want. We could for example put all queries related to some entity in one file anyway, or put all financial queries in one file and all core queries in another file, etc. The mechanism is actually quite similar to using multiple faces-config.xml files in JSF. The XML based solution looks as follows.

persistence.xml:


<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd"
	version="1.0">

	<persistence-unit name="somePU">
		<jta-data-source>java:/someDS</jta-data-source>

		<!-- Named JPQL queries per entity, but any other organization is possible  -->
		<mapping-file>META-INF/jpql/Website.xml</mapping-file>
		<mapping-file>META-INF/jpql/User.xml</mapping-file>
	</persistence-unit>
</persistence>

Website.xml:


<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="1.0" xmlns="http://java.sun.com/xml/ns/persistence/orm" 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm http://java.sun.com/xml/ns/persistence/orm_1_0.xsd "
>
	<named-query name="Website.getByUserId">
		<query>
			SELECT 
				website
			FROM 
				Website website
			WHERE
				website.userId = :userId
		</query>
	</named-query>

	<named-query name="Website.foobar">
		<query>
			...
		</query>
	</named-query>

</entity-mappings>

Finally, using such query definitions in code is exactly the same as if they had been defined using annotations, i.e. by calling entityManager.createNamedQuery().

Each XML file can contain as few or as many queries as you like. It might make sense to put a really complicated and huge query in one file, but to group several smaller related queries in another file. Do note that query names are part of a global namespace and are not automatically put in any namespace based on the file they are defined in. In the example above queries are prefixed with “Website.”, which happens to be the name of the entity, but you can choose anything you want here. In the example META-INF/jpql was used as the directory to store queries, but any location on the class path, including inside jars, will do.

As mentioned, for some reason this XML method seems to be overlooked by many people. I’ve personally met several people who built themselves a management system for storing JPQL queries in files, loading them, substituting parameters, etc., while such a mechanism is in fact readily available in JPA (and has been since JPA 1.0!). Of course, the home-grown systems don’t have the startup-time validation of queries, nor do they do any pre-parsing and pre-compilation of queries.

Arjan
