Java best practices 5 – Code to Interface, Access by name and Instance Data

7 October 2007, by: Arjan Tijms

In this 5th and final installment of my series on best practices in Java I’ll wrap up with the last 3 items from the list I compiled in the first installment. Today we’ll be looking at generalizing parameters to counter the “Coding to a class instead of to an interface” bad practice, discuss why “Selecting ResultSet columns by index, instead of by name” is not always a good idea and finally we take a look at some of the usage patterns that are involved with the “needless use of instance variables”.  

Coding to a class instead of to an interface

One of the advantages of Java is its pervasive OO model. I’m aware that on the one hand not everyone thinks OO is the best thing since sliced bread and at the other hand some languages take object orientation to an even higher level. But let’s not digress, suppose you’re a Java programmer and that your team has decided OO is in fact a Good Thing.

A classic mistake people often make when starting to program in an OO language is not actually making use of the explicit features OO offers. Of course, the paradigm entails more than just putting “public class { … }” around your existing code. For people just coming from a procedural language, it may be a little understandable if they did just that. I have however seen people that have been programming in Java for 6, 7 years, even calling themselves ‘Sr Java programmer’, and in fact still do little more than just putting “public class { … }” around their otherwise procedural code.

Not using OO manifests itself at many levels in a software design. A particular devious way, and the one about which I’ll be talking here is “Coding to a class instead of to an interface”.

One of the (well known) benefits in an OO design is that a subclass can be used wherever code expects a superclass of that type. To actually reap this benefit a programmer has to be careful when e.g. crafting the signature of her methods. The parameters being used should not just be -a- superclass, but in fact the most general one that still captures all functionality that the code -actually- uses. For flexibility concerns and because of the fact that Java only supports single inheritance, an interface (aka a ‘pure abstract superclass’ in some other languages) is often the most preferred one.

As an example, consider the follow code:

public void doStuff( ArrayList<Integer> list ) {		
	list.add(1);
	// do stuff
	list.get(0);	
}

Using the ArrayList in the method’s signature severely limits the usage of the doStuff method, as only ArrayLists and their subclasses can be passed in. In this case it’s an unnecessary restriction. Nothing in the method uses anything that is specific to an ArrayList, making it a bad practice to require it.

A better practice is to use a superclass or an interface here, but which one? ArrayList has many, including superclasses AbstractList and AbstractCollection and interfaces List, Collection and Iterable. The choice between the superclass and the interface is easy here; the superclasses are merely default implementations of parts of the (complex) interfaces, so we’ll go with the interface. Looking at the method’s body we see add() and get() is being used. This means Collection is actually too general (it doesn’t have get()) so we’ll go for List:

public void doStuff( List<Integer> list ) {		
	list.add(1);
	// do stuff
	list.get(0);	
}

We can generalize this one step further by generalizing the generic parameter:

public void doStuff( List<? super Integer> list ) {		
	list.add(1);
	// do stuff
	list.get(0);	
}

Some readers might ask why the more intuitively List<Number> can’t be used here. Indeed, we could try to define the method as taking a List<Number> or a List<? extends Number>, but the first definition would exclude the possibility for passing in an actual ArrayList<Integer>, while the second definition would disallow the add() method (as someone may otherwise be passing an ArrayList<Float> in, and would find an Integer to be between the Floats after the call to doStuff).

In a way one could informally say that ‘coding to an interface’ for generic arguments works the other way around than for regular parameters in the OO model; instead of specifying the most general type and silently accept subtypes (e.g. specify List, accept ArrayList), you specify the most specific type and ‘silently’ accept more general types (e.g. specify <? super Integer>, accept <Number>). Of course, the latter is not completely silent as the Java generic syntax forces you to explicitly state your willingness to accept super types.

Selecting ResultSet columns by index, instead of by name

In Java a ResultSet represents a table of data that is typically obtained as the result from executing an SQL query. The two primary ways to retrieve data from such a Resultset’s columns is by using either the column’s index number or the column’s name.

Although certainly not everyone would agree with it, I think retrieving data by index from a resultset in Java business code is often a bad practice; it leads to code that is less readable and less maintainable. E.g. consider the following code fragment:

formatItem( allNews.getString("title"), allNews.getString("summary") );

Accessing by index it would look like this:

formatItem( allNews.getString(1), allNews.getString(2) );

With the former statement it’s fairly clear what’s happening, while with the latter we have a lot more guessing to do, especially if we are not also the writer of this piece of code. In fact, writing code in such a way has many parallels with the bad practice of using cryptic variable names (e.g. names like aa, bz, x, etc). While writing, it may be somewhat clear which variable holds what, but when reading it later on the meaning becomes incredibly hard to grasp.

As mentioned, there’s also a maintenance problem with this code. Suppose the Resultset is generated from this simple SQL query:

SELECT
	description,
	summary,
	text
FROM
	news_items

Now, later on someone may decide to add an extra column, say “author”:

SELECT
	author, 
	description,
	summary,
	text
FROM
	news_items

Especially if the query is shared it may not be obvious that somewhere there is some code with a dependency on the column order. If this new column is thus added before “description”, the Java statement given above will silently use the wrong columns.

SQL is not Java, but compare this with a Java class definition with some simple fields:

class NewsItem {
	String description;
	String summary;
	String text;	
}

When adding an additional field, nobody in his right mind would worry about code that might access a field by the order in which it was defined (although this is technically possible using reflection) and thus the new field would be added at a random location in the class definition.

Proponents sometimes argue that accessing Resultsets by index has several advantages; namely performance, the ability to select an unnamed column and the ability to distinguish between columns with equal names.

To start countering these arguments; for general business code the difference in performance between selecting by index and name is most probably unmeasurable. It’s the same kind of premature optimization that we encountered when looking at native arrays vs a Collection based alternative; the typical business code that retrieves a handful of values from a Resultset is not going to notice any difference here.

The ability to select an unnamed column or distinguish among equally named columns is more often than not a way around a bad practice in SQL. Instead of employing yet another bad practice to cover up an earlier one, it’s better to just fix this earlier one. E.g. by using the SQL clause “AS” in the original query.

Does this mean all methods for accessing results by column index should be deprecated? Of course not. For code that simply needs to iterate over all columns in a resultset, index based getters are most suitable. Other specialized code may have some specific use for them too.

However, in pure business oriented code these cases may be rare. In the code example above (a very common case in business software) accessing by index is surely a bad practice.

Needless use of instance variables

When discussing the ‘coding to a class instead of to an interface’ bad practice, we mentioned that this was one symptom of not using OO correctly. Another such symptom is the “needless use of instance variables”, which we too often see in code being written by people with a procedural background.

If OO means little more to you than putting “public class { … }” around your code, then instance variables in Java are the equivalent to (file scoped) global variables in procedural code. Many people agree that using global variables, even when limited to just the file scope, is not very often a good idea.

During auditing sessions I encountered classes consisting of 2000~3000 lines of code (which is actually a bad practice by itself), with in excess of 100 instance variables. In such a situation it becomes totally unclear which variable is effected by what method; a mess that looks and feels quite similar to the ‘global variable hell’ in some other languages.

In unraveling this mess I often encounter a number of patterns, some good, some bad.

A good practice for using instance variables is when those variables actually represent state that needs to be stored inside the mentioned object. It just may be the case that if there are many of these variables, grouping them into separate classes is recommended. E.g. consider a subscription class with 15 instance variables for personal user data and some 15 other for company data. It may be a better idea to group the first 15 variables to a separate class called User and the second 15 to a class called Company. Other than that, there’s nothing wrong with this usage of instance variables.

A truly bad practice for using instance variables is when these variables have no meaningful value outside of any method. In other words, they are actually just local variables that for some reason were declared at class scope. It’s a mistake sometimes made by beginners who seem to think that a declaration of a variable is somehow equal to actually instantiating an object and thus ‘optimize’ their code by declaring the variable as instance data. E.g.

class Foo {

	int temp1, temp2;
	
	public int multiplyAdd( int nr1, int nr2, int nr3, int nr4 ) {	
		temp1 = nr1 * nr2;
		temp2 = nr3 * nr4;
		
		return temp1 + temp2;		
	}
	
}

In the above example, temp1 and temp2 should of course be local to the multiplyAdd method.

Another bad practice for using instance variables is ‘inappropriate caching’. Of course it highly depends on the specifics of some piece of code what is actually inappropriate. It may be appropriate to instantiate some helper class that all methods are using as instance data, whenever that helper class is expensive to create. It’s however often not appropriate to use such caching when we’re dealing with stuff that is already cached at some other level or that represents some resource which needs to be closed after usage and is actually expensive to keep open. A Database connection in a Java EE environment is often a prime example of such inappropriate caching using instance data. E.g. :

class Bar {

	Connection connection = null;
	
	public void init() {
		connection = SomeFactory.getConnection();
	}

	public int doFooStuff() {
		Statement stmt = connection.createStatement();
		// execute some query and do foo stuff
	}
	
	public int doBarStuff() {
		Statement stmt = connection.createStatement();
		// execute some query and do bar stuff
	}
	
	public void close() {
		connection.close();
	}
}

In a typical Java EE AS, DB connections are already cached at the AS level (through a connectionpool). Caching a connection in instance data introduces far more problems than it solves. A typical usage pattern of a class such as Bar in the example above is to instantiate it, keep it around for some time, and call any of its doStuff methods when necessary. The problem with that is that it keeps the connection open for a much longer time than strictly required. Many application servers impose a limit on the number of simultaneously opened connections, so it’s often much better to close the connection as soon as we’re done with it. An additional problem with this pattern is the fact that the user of the Bar class needs to manually close the connection. This is not only an extra burden but also leads to the possibility of the user forgetting it and thus causing a connection leak.

Using instance data for ‘pseudo state’ may be another bad practice. With ‘pseudo state’ I refer to data that enters an object through a method call, is being used by several other methods that execute during this original call and which loses its usefulness for the object after this original call returns. Due to the fact that multiple methods make use of such data, it may be tempting for a programmer to simply store this data in an instance variable and let all methods refer to that. Sometimes this may indeed result in code that is more readable, but when used carelessly we’ll find ourselves again in the situation depicted by the “global variable hell”. Consider the following code:

class Kaz {
	
	public int data;
	
	public int doComputation(int data) {	
		this.data = data;
		increase();
		multiply(4);
		
		return this.data
	}
	
	public void increase() {
		data++;
	}
	
	public void multiply(int factor) {
		data*=factor;
	}	
}

In this example, an integer value is passed into doComputation() which assigns it to an instance variable upon which increase() and multiply() operate. After the call to doComputation(), the integer stored in data has no meaningful value for Bar anymore. It actually was a ‘shared local’ variable, where local doesn’t refer to a single method, but a small group of methods. Some people argue that this usage is exactly what instance data is for and that the alternative (passing the date value along to the different methods making use of it) actually degrades the coding style to procedural. Yet other people argue that instance data should only be used for data that has a meaningful value for the object as a whole.

In the example given above we have the additional problem that invoking the increase() or multiply() methods by themselves requires knowledge of the internals of these methods. If the method is long and complicated, it may not be obvious at all what instance data functions as ‘input’ for the method. E.g. to invoke increase() by itself we would need to do something like this:

Kaz kaz = new Kaz();
kaz.data = 4;
kaz.increase();
int result = kaz.data;

Indeed, a usage pattern that has a strong resemblance with invoking functions that depend on global variables in procedural languages. Naturally, most people would instantly say that the above is madness. Nevertheless, more often than not these patterns appear in code as the result of the ‘pseudo data’ practice. A programmer initially starts out with a class like Kaz, making only doComputation() public. After a while however there emerges a use to call increase() directly and instead of refactoring the code, the insane usage pattern of needing to assign directly to kaz.data before invoking increase() appears in the code.

Well, with this we arrived at the end of this series on best practices in Java. I hope it has been useful to people. Undoubtedly I’ll write a follow up sometime in the future. If there’s one certainty in my job, then it’s the fact that people always find ways to code using exciting new bad practices ;) In the meantime, please take a look at some of the other blogs my team-mates and I have written.

 

Arjan Tijms

 

10 comments to “Java best practices 5 – Code to Interface, Access by name and Instance Data”

  1. LiveStone says:

    Thanks, that was useful for me.

  2. aaawww says:

    I don’t agree on the result set best practice: the access by index is indeed a lot faster. the best practice to be used for bulk data is to retrieve the result set metadata, ask for the index of the interesting columns by column name and then use the index to get data

  3. arjan says:

    > the access by index is indeed a lot faster.

     

    A lot depends on your context. Please also read what I have to say about such micro optimizations in http://jdevelopment.nl/java/java-best-practices-4-native-arrays-and-not-using-java-5. This case is actually the same kind of tradeoff as using native arrays vs arraylists. The former is theoretically faster, but most businesscode is not going to be influenced by that. The bottomline is; if you have actually used a profiler to ‘prove’ that accesses by name are indeed a bottlenneck in your application, by all means, access by index. But please don’t sacrifice readability in your code for an ‘premature optimization’ which may have zero impact.

     

    >the best practice to be used for bulk data is to retrieve the result set metadata, ask for the index of the interesting columns by column name and then use the index to get data

     

    This is an intermediate solution. It’s atleast a lot better than only using hardcoded index references in your code ;) On the other hand, this does add some slight complexity to your code and degrades the readability slightly. The performance gain is also very minimal. Most (nowadays all?) JDBC drivers use a fast HashMap for looking up the columnname. So, per getter you would safe one hash lookup. On modern computers and for typical business code this is absolutely nothing. Even if you were doing thousands of such lookups, the effect would be barebly noticable, if at all.

     

    The tradeoff here is between a minimal performance gain and better readability of code. Would you like to have a 0.00001 second shorter execution time, but in case of a bug somewhere have your technicians spend 2 minutes more on reading the code, and have a 30 percent higher chance of misinterpreting the code when under presure? Of course everyone should make that decision for himself, although I think most businesses are better off to go for readability (and maintainability first and then optimize proven bottlenecks when they emerge.

  4. aaawww says:

    well, what about:int COLUMN_TITLE = getIDForColumn("title");and then the iteration over the result set?this doesn’t inpact performances, nor readabilitywell, assuming that the result set is used for a bulk read or update; else the the performance impact is negligible (as the overhead for doing it), so it’s just a kind of habit 

  5. arjan says:

    Yes I agree with you. If in COLUMN_TITLE, "TITLE" is actually the name of the column it hardly effects readability. Like I said before, refering to the column by name once and then dynamically using the index is certainly miles better than using static index numbers in your code.
     

    You do add at least one extra statement to your code for each column you access this way. Of course this doesn’t add that much ‘reading overhead’, but then again, even for bulk reads the performance gain typically isn’t that much either, if any. Just try it ;)

  6. Java best practices | J-Development says:

    [...] Coding to a class instead of to an interface [...]

  7. Java best practices 4 – Native Arrays and Not Using Java 5. | J-Development says:

    [...] that’s it for today again. Stay tuned for the next installment where we’ll be talking about “Coding to a class instead of to an [...]

  8. rggffgg says:

    any thigs

  9. hamadwazir says:

    you should have to disply complet coding

  10. unable to understand wildcard generics in this example in Java? | CopyQuery says:

    [...] reading through this article I got stuck here. I am pasting this from the link. I do not understand the reasoning given for why [...]

Type your comment below:


× 9 = eighteen

css.php best counter