
It’s been said before that debugging code is like “solving a mystery” and that developers should “debug like a detective”. I recently ran into a situation where I had to put on my best deerstalker (it’s the hat Sherlock Holmes wears) and debug why an application wasn’t running.

What follows are some of the steps we performed when trying to figure out why a Java application wasn’t running. Keep in mind that although the tech stack itself (Java, WebSphere, and Db2) isn’t very exciting these days, the principles used to figure out why things weren’t working are still very relevant.

Prologue

At any sufficiently large organization, admitting you have any sort of passing knowledge of a topic means that you’re now the go-to expert for all questions about that topic. I interned on the WebSphere team more than a decade ago, so it’s inevitable that I get the occasional Slack DM asking me to look at something WebSphere-related.

This time the problem was that the two maintainers of an internal application had left IBM and no one was tagged to keep it up and running. The application itself was your basic three-tier application that used LDAP for authentication. It was mature, well defined in terms of scope, and had been running for years with minimal maintenance and changes. But now it was offline.

To solve this mystery I was given the bare minimum of details: a hostname for a RHEL VM, the root user’s password, and a link to the source code on GHE. With plenty to fix and not a lot to go on, we assembled a team (John Walicki and David Carew) to go about getting this application up and running.

Could not start or stop the server

The first thing we tried to do was to restart the application server. After all, “Have you tried turning it off and on again?” is the best advice. But that resulted in an error asking for the application server’s credentials, which we definitely didn’t have.

./bin/stopServer oss -profileName Node01 

ADMN0022E: Access denied for the stop operation on Server oss due 
to insufficient or empty credentials.

To get around this issue we first found the running Java processes and killed them by PID:

$ ps -aef | grep java

We then followed a support doc to turn off WebSphere security. The gist of it is below.

NOTE: Obviously this is a terrible idea for a production application. We re-enabled security soon after.

$ cd $WAS_HOME
$ ./bin/wsadmin.sh -conntype NONE
$ wsadmin> securityoff()
$ wsadmin> exit

Now the application was starting, yay! But accessing the application in our browser immediately caused stack traces in our logs, and the frontend wasn’t actually responding properly. We weren’t done yet.

Application was loading but not working

The browser’s console showed a pretty minimal log message:

Unable to load /oss/HTMLTemplates.action status: 500

The backend logs were much more useful. The application was bombing in our own source code and throwing a NullPointerException (NPE).

E com.opensymphony.xwork2.util.logging.commons.CommonsLogger error Exception occurred during processing request: null
    java.lang.NullPointerException
        at com.ibm.oss.action.HTMLTemplates.execute(HTMLTemplates.java:22)

The source code showed that the application was trying to create a File object based on a value in a properties file.

public String execute() {
	// set up the method
	for (File file : new File(getText("oss.templates.location")).listFiles()) {
		// do more stuff here
	}
	// finish the method
}
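The likely mechanism behind the NPE is worth spelling out: File.listFiles() returns null, not an empty array, when the path is not an existing directory (or an I/O error occurs), so the for-each loop ends up iterating over null. The sketch below shows one defensive way to handle that. TemplateFiles and its list method are hypothetical names used only for illustration and are not part of the original application.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not taken from the application's source.
final class TemplateFiles {
    private TemplateFiles() {}

    // Returns the entries of the given directory, or an empty list when the
    // path is missing or is not a directory (the cases where listFiles()
    // returns null instead of an array).
    static List<File> list(String location) {
        File[] files = new File(location).listFiles();
        if (files == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(files);
    }
}
```

Returning an empty list hides the misconfiguration, though, so in a real application it may be better to log or throw with the offending path so the problem is visible right away.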

The properties file was using a relative path to a folder. We tried a few different values but none of them worked, so we ended up using an absolute path.

# Application configuration strings.
#oss.templates.location=oss.war/jsp/templates
oss.templates.location=/opt/IBM/WebSphere/AppServer9.0/profiles/Node01/installedApps/localhostCell01/oss.ear/oss.war/jsp/templates
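One reason a relative path can be fragile here: java.io.File resolves relative paths against the JVM's working directory (the user.dir system property), which depends on how and where the server process was started. The sketch below, under that assumption, resolves a configured location to an absolute path and fails fast with the resolved path in the error message. TemplateLocation, resolve, and requireDirectory are hypothetical names, not code from the application.

```java
import java.io.File;

// Hypothetical helper for illustration; not taken from the application's source.
final class TemplateLocation {
    private TemplateLocation() {}

    // Resolves the configured value the way java.io.File does: an absolute
    // path is kept as-is, a relative one is resolved against the JVM's
    // working directory (user.dir).
    static File resolve(String configured) {
        return new File(configured).getAbsoluteFile();
    }

    // Returns the resolved directory, or throws with the full path in the
    // message so a bad configuration value is easy to spot in the logs.
    static File requireDirectory(String configured) {
        File dir = resolve(configured);
        if (!dir.isDirectory()) {
            throw new IllegalStateException("Template directory not found: " + dir);
        }
        return dir;
    }
}
```

Logging the output of resolve("oss.war/jsp/templates") at startup would have shown exactly where the server was looking, which is often faster than guessing at alternative relative values.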

With that done, our application was loading correctly and we could log in! But we couldn’t see the panels that administrators could typically see. Dang! We scanned the source code to find where the administrators were defined and luckily found a snippet indicating that they came from the database, not from an environment variable or the code itself. So we had to look at the database… who doesn’t love Db2! (I had never used Db2 before.)

Flipping a bit in Db2

To connect to Db2 we had a little bit of luck. The password for the default superuser db2admin was the same as the root password. Yikes!

$ db2> connect to ossa user db2admin using 'mySuperSecretPassword!'

I also learned that to list all tables you can run something like the command below. Otherwise, by default you’ll only see the tables that belong to the current user.

$ db2> list tables for all

We saw that there was a USERS table and noticed that there was a column called ADMIN… that seemed like a good spot to update! So we went ahead and updated our user IDs.

$ db2> update USERS set ADMIN=CHAR('y') where UID='123456789'

And with that, our application was finally working properly! We definitely let out a big Hurrah when nothing was bombing out.

The big takeaway here for me is that as a software developer you need to be able to understand error messages and determine what is likely causing them. Though the tech may change, the approach shouldn’t.
