Null pointers and how to avoid their negative effects
How can we describe well-written code? What does it look like, and would we recognize it if we stumbled upon it? Do we need well-written code at all, or just working code? The last question seems the easiest to answer: it depends on how many people will use the code and how often. More users mean more maintenance and more frequent changes. We had better put ourselves in the right position to handle such feature requests and make our future life a bit easier. Unfortunately, we don’t know much about better positioning in software development. It would require knowing the future and addressing future issues at the very beginning. We only know that changes occur continuously and that we have to deal with them somehow. In his paper “No Silver Bullet — Essence and Accident in Software Engineering”, Fred Brooks describes two types of complexity: accidental and essential. Accidental complexity refers to problems that software developers can fix (and that we often create ourselves), while essential complexity is inherent in the domain. For now we’ll set essential complexity aside and discuss only the accidental kind. We have more control over it, and we can avoid some common mistakes. Avoiding them will hopefully put us in a better position to handle future requests with less effort.
Dealing with mistakes requires some effort: to identify them, understand their effects, and come up with a solution. We already know a few low-hanging fruits we could easily pick. The list includes null pointers, getters and setters, inheritance, and ordering, to name a few. Don’t worry, I won’t attempt to ban any of them. They are quite popular, and getting rid of them requires new habits, new tools, new frameworks, and so on. I’ll simply discuss their negative effects over the next few posts, along with ways to minimize them. I’ll start with null pointers. Even their inventor calls them a mistake, but I won’t blame him. In a similar situation I would most likely have done exactly the same. Sooner or later someone was bound to invent the concept. He just introduced it first.
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
- Tony Hoare
A null pointer is a special value that indicates an invalid object: one that doesn’t point to any memory and doesn’t contain any data. We use it in two main cases: to indicate the end of a list, stream, or file, and to signal a failure to perform an action. It leads to one of the most common exceptions, the null pointer exception. If its inventor describes it as a mistake, we can probably learn something from his experience and avoid making the same or similar mistakes. Some people may dispute this, and I understand their point. When we stumble upon an error caused by accessing an invalid object, we simply add a check before the access, and the error vanishes until next time. Living with null seems easier than dealing with its complexities.
I also know that the devil hides in the details. If we ignore such small details, we soon find ourselves in hell. In software development, hell means fragile source code, longer and more difficult maintenance, extra effort for every new feature, passing tests that no longer guarantee working software, and so on. Invalid objects aren’t the sole reason for this situation, but they contribute to it. Often the problem comes not from the fact that we use invalid objects, but from how we use them. Some uses contribute less than others.
Some may attempt to ban null references altogether, but a closer look reveals how much effort that requires: we would have to avoid many built-in functions, frameworks, libraries, and so on. The task becomes practically impossible. Instead of forbidding null references, we can attempt to limit their spread. This means designing our methods so that they never return invalid objects. It turns out this limit solves the original problem as well. If none of our methods returns null, we won’t stumble upon null pointer exceptions in the higher levels of our code, while we can still use null in a limited scope, as internal state in low-level functionality.
To understand how to limit the spread of null pointers, we need to examine a case study for each typical use and discuss solutions with and without such objects, along with their pros and cons.
Typically, we use invalid pointers to represent the end of a file or stream. An example implementation of reading a file may look like this.
while ((line = reader.readLine()) != null) {
    // process the file line by line
}
The reader returns null when it finishes reading the file contents. Null is the signal that tells the client to exit the loop. In this implementation the client needs to know two things: how to read a line and how the reader class represents the end of a file. If the reader’s internal representation changes, we must update every module that uses this functionality. Replacing null with any other value won’t make much difference, as the client still needs the same amount of information to do its job. Instead, we should reduce the required knowledge by hiding the internal representation and exposing only a stable high-level interface, such as an iterator. Our iterator will use null or a similar state internally, but this internal state won’t force other modules to deal with it.
while (iterator.hasNext()) {
    processLine(iterator.next());
}
As this example shows, null points to a bigger issue: unnecessary information shared between objects. Omitting that information reduces coupling and improves readability at the same time. We can treat such uses of null as a signal that our design is missing a method or even a class, like the missing iterator in the example.
When client code invokes a function with invalid input parameters, the function can’t do much, but it still needs to return something. In such cases the service often returns null. As in the first scenario, the client must check the value and do something about it before it can continue with the business logic. This time we can’t hide null inside the object, because we must return a result. Such client code may look like this:
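To make the missing iterator concrete, here is a minimal sketch of one way to write it. The class name LineIterator is my own invention for illustration, and wrapping a BufferedReader is an assumption about what the reader in the example is. The null that readLine() returns at the end of the stream stays inside the class and never reaches the client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper: exposes lines through the Iterator interface and
// keeps the null end-of-stream marker as private internal state.
final class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;      // null means "nothing buffered yet"; never leaves this class
    private boolean finished; // true once readLine() has reported the end of the stream

    LineIterator(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (next == null && !finished) {
            try {
                next = reader.readLine();
            } catch (IOException e) {
                // Iterator methods cannot throw checked exceptions.
                throw new UncheckedIOException(e);
            }
            finished = (next == null);
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more lines");
        }
        String line = next;
        next = null; // force the following hasNext() to read again
        return line;
    }
}
```

With such a class the client loop only needs hasNext() and next(). The standard library also provides BufferedReader.lines(), which returns a stream of lines and hides the end-of-file marker in a similar way.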
result = exampleMethod();
if (result == null) {
    // do something to handle the invalid value
}
As we saw earlier, to invoke a method the client needs prior knowledge about it: typically its name, parameters, and return type. Returning null makes the return type ambiguous, because the value is valid in some cases and invalid in others. The client needs additional knowledge to process the result. More required knowledge means extra branching and extra coupling between client and service. We used to call this additional code boilerplate, but nowadays boilerplate means additional code required by the language itself. Because of this shift in meaning I prefer to call it implementation noise, to make it obvious that the extra code comes from our own designs and implementations and that we should take care of it.
Returning null also breaks the single responsibility principle to some degree. Such a method returns a usable result only in some of the scenarios. For the rest, it delegates the decision to other parts of the system. Not handling all the scenarios makes the method incomplete and spreads the responsibility for dealing with the same input over multiple methods or even systems. Some may say this approach brings flexibility, but when we need to change something we must change it in several places instead of one. This increases the cost of maintenance. Instead of returning null, we should do something else. Depending on the context, we can either throw an exception, if the operation is critical, or return a default value and continue the program. Consider first a method that returns null, followed by the client code that has to check its result:
public String someAction() {
    String result = null;
    if (someCondition) {
        result = "the result";
    }
    return result;
}

// Client code: every caller has to repeat this check.
String result = someAction();
if (result == null) {
    throw new RuntimeException();
}
In scenarios where we must fail, I prefer to fail as early as possible and throw the exception inside the method, instead of forcing the client to verify the result after each call. Below is a better implementation.
public String someAction() {
    if (!someCondition) {
        throw new RuntimeException();
    }
    return "the result";
}
If we design our systems like this, we gain more than it appears at first glance. We free the client from fulfilling other methods’ responsibilities and let it focus on its own job. By reducing the information required to invoke a method, we decrease the coupling between client and service, which improves both. The client becomes shorter and more readable at the same time. Meanwhile, the service handles all possible scenarios and becomes complete.
Similar ideas apply to optional results, but instead of throwing an exception we return a default value. When the method returns null, the client has to substitute the default itself:
result = someAction();
if (result == null) {
    result = defaultValue;
}
Sometimes we need to return different default values depending on the context. In such cases, instead of hard-coding the value inside the method, we can make the method accept an extra parameter for the default value and return it instead of null.
result = someAction(defaultValueInCaseOfFailure);
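As a rough illustration of the service side of such a call, here is a small sketch. The Settings class, its valueOf method, and the backing map are hypothetical names I made up for the example; the only point is that the lookup’s null stays inside the method and the caller always receives a usable value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service: the caller supplies the fallback, so the method
// never hands null back to its clients.
final class Settings {
    private final Map<String, String> values;

    Settings(Map<String, String> values) {
        this.values = new HashMap<>(values); // defensive copy
    }

    String valueOf(String key, String defaultValueInCaseOfFailure) {
        String value = values.get(key); // null (missing key) stays inside this method
        return value != null ? value : defaultValueInCaseOfFailure;
    }
}
```

The client then reads as a single line, for example settings.valueOf("theme", "light"), with no check afterwards.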
We have to choose default values carefully, so that they don’t force extra branching the way null pointers do. Popular values include empty strings, empty arrays, false, 0, and 1, but the right choice depends on the situation. We strive to avoid unnecessary branching. The value should pass through the computations in which it participates without affecting them. Examples include adding zero to a number, multiplying a number by one, merging or iterating over an empty array, and concatenating with an empty string. None of these values has any effect on the result. In abstract algebra this is called the identity element, and we need exactly the same behavior. Let’s compare two sample implementations: one that checks for an invalid value and one that uses a default value. We will implement a catalog for a book store with filters for author, genre, and ISBN. We store the books in a relational database, and our method should return an SQL string.
public String buildSQL(Arguments arguments) {
    String sql = "SELECT * from books WHERE 1=1";
    if (arguments.filter("author") != null) {
        sql += " AND author = '" + arguments.filter("author") + "'";
    }
    if (arguments.filter("isbn") != null) {
        sql += " AND isbn = '" + arguments.filter("isbn") + "'";
    }
    if (arguments.filter("genre") != null) {
        sql += " AND genre = '" + arguments.filter("genre") + "'";
    }
    return sql;
}
And the second implementation:
/*
This code is for demonstration purposes.
It lacks parameter binding and should not be used in production systems.
*/
public String select(Arguments arguments) {
    return "SELECT * from books" + where(arguments);
}

private String where(Arguments arguments) {
    if (arguments.empty()) {
        // Note: the default value doesn't force a value check in the select method
        return "";
    }
    return " WHERE " + String.join(" AND ", arguments.filters(this::whereClause));
}

private String whereClause(Map.Entry<String, String> entry) {
    return entry.getKey() + " = '" + entry.getValue() + "'";
}
The second implementation behaves the same way whether we list all the books or filter by some criteria. The select method contains no branching and no value checks, and it doesn’t change when we add a new filter. We only need to make sure the arguments instance contains valid strings. Many books and experts say never trust user input, and they are right, but we can and should trust our own functions. Once validation passes, such constructions become safe to use.
Hiding invalid objects from the outside world, replacing them with an identity element, and failing as early as possible should minimize the negative effects of null pointers. In practice, we need a bit more effort. Invalid objects often come from external libraries or from the language itself, and we need to take care of those cases too. We can’t reach into them and change their source code, but we can wrap them and use the wrapper instead of the external (to our system) instances. That may sound like overhead at first, but it pays off in the long term. By wrapping external objects we gain more control, and we can stop the spread of invalid objects.
In conclusion, null usage touches on deeper topics such as coupling and information hiding. We can treat it as a signal for refactoring, or at least as a reason to think twice about whether we actually need it. We also discussed four strategies for reducing the effect of invalid objects.
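For readers who want to run the idea, below is a self-contained sketch of the same approach. The BookQuery class is a hypothetical stand-in for the Arguments class, which the listings above don’t define. Unlike the demonstration code, this sketch emits ? placeholders and returns the values separately, so they could be bound to a prepared statement. Column names are still concatenated into the SQL text, so I assume they come from a fixed, trusted set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the undefined Arguments class.
final class BookQuery {
    // LinkedHashMap keeps the filters in insertion order, so the
    // placeholders and the parameter list line up.
    private final Map<String, String> filters = new LinkedHashMap<>();

    // Assumption: column names come from a fixed, trusted set.
    BookQuery filter(String column, String value) {
        filters.put(column, value);
        return this;
    }

    String sql() {
        // No value check here: an empty filter set contributes "".
        return "SELECT * FROM books" + where();
    }

    List<String> parameters() {
        return new ArrayList<>(filters.values());
    }

    private String where() {
        if (filters.isEmpty()) {
            return ""; // identity element for string concatenation
        }
        return filters.keySet().stream()
                .map(column -> column + " = ?")
                .collect(Collectors.joining(" AND ", " WHERE ", ""));
    }
}
```

Adding a fourth filter, such as a publisher, needs no change to sql() or where(); the caller simply adds one more filter(...) call.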
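One familiar case from the Java standard library is File.listFiles(), which returns null when the path is not a directory or when an I/O error occurs. A wrapper along the following lines keeps that null at the boundary. The Directories class and the filesIn method are hypothetical names, and returning an empty list is only one possible policy; if a missing directory is a critical failure in our context, throwing an exception inside the wrapper would fit better.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper around an external API that may return null.
final class Directories {
    private Directories() {
    }

    // Always returns a list; an unreadable or missing directory yields
    // an empty one, so callers can iterate without a null check.
    static List<File> filesIn(File directory) {
        File[] files = directory.listFiles(); // null if not a directory or on I/O error
        if (files == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(files);
    }
}
```

The rest of the system calls Directories.filesIn(...) and never touches listFiles() directly, so the null check lives in exactly one place.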
- Hide them inside internal logic.
- Raise an error as early as possible, instead of postponing it.
- Return a default value instead of null.
- Wrap external modules, so they don’t return invalid objects.