Splunk and J2EE application logging – Part 1

All modern J2EE applications are distributed in nature and typically involve web servers, application servers, database servers, etc. Diagnosing application problems in a distributed environment is challenging. Tracing a problem as application requests and transactions traverse the distributed components, whether the web server, the application server or the database, is always time-consuming and requires quite a bit of application knowledge. A number of monitoring tools have promised the holy grail of transaction tracking and diagnosis but have largely failed to deliver. Many of them end up as large dashboards with red, yellow and green markers that indicate whether the system is running fine or has issues, but they offer too little information to diagnose those issues quickly. Enterprises spend millions in licensing costs to monitor these distributed components, only to spend more money on additional resources to diagnose and fix them. Most monitoring tools end up in the hands of operations teams that lack the deep application knowledge needed to diagnose problems quickly.

My years of experience diagnosing problems have taught me that to diagnose problems quickly, one has to turn to application logs, stack traces, debug traces and a deep knowledge of the application. Deep application monitoring tools such as Wily, ITCAM for Application Diagnostics, etc. do offer deep instrumentation, but they carry very high performance overhead when left on all the time. Despite their deep tracing capabilities, they still fall short of correlating individual requests to provide the meaningful insight needed to resolve problems quickly.

All application components produce some kind of log, but collecting the logs from disparate systems and correlating their entries for application diagnosis is a time-consuming process.

The Splunk log monitoring tool lets you search log files spread across disparate systems and organizes the results chronologically, by host, by log type, etc., but by itself it is not sufficient to correlate the log entries. Splunk takes a slightly different approach to application problem monitoring by scanning application logs for error symptoms. However, Splunk does require some prior knowledge of the application's exception handling and of the common error strings to look for. While Splunk is extremely powerful at scanning logs and displaying search results chronologically across the various distributed components, it still falls short of correlating them unless the application provides some unique identifier to tie the log entries together. Building a logging framework that can be shared across the distributed application will help application diagnosis immensely.
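As a rough illustration of such a unique identifier, here is a minimal Java sketch, assuming a per-request correlation ID is kept on the current thread and written into every log line as key=value pairs. The class `CorrelationContext` and the field names `tier`, `level`, `corrId` and `msg` are hypothetical, chosen only for this example; this is not the implementation discussed in Part 2.

```java
import java.util.UUID;

/**
 * Hypothetical helper (not part of any framework) that keeps a per-request
 * correlation ID on the current thread so every log line can carry it.
 */
final class CorrelationContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CorrelationContext() {}

    /** Reuses an ID handed over by an upstream tier, or mints a new UUID. */
    static String begin(String incomingId) {
        String id = (incomingId == null || incomingId.isEmpty())
                ? UUID.randomUUID().toString()
                : incomingId;
        CURRENT.set(id);
        return id;
    }

    /** Returns the ID bound to this thread, or "none" outside a request. */
    static String currentId() {
        String id = CURRENT.get();
        return id == null ? "none" : id;
    }

    /** Clears the ID; app servers pool threads, so this must run when the request ends. */
    static void end() {
        CURRENT.remove();
    }

    /** Emits key=value pairs, a layout Splunk can pick up as search fields. */
    static String format(String tier, String level, String message) {
        return String.format("tier=%s level=%s corrId=%s msg=\"%s\"",
                tier, level, currentId(), message);
    }
}
```

In this sketch, each tier would call `begin(...)` when a request arrives (passing along an ID received from the upstream tier, for example via a request header whose name is up to you) and `end()` in a `finally` block. A single Splunk search on one `corrId` value would then pull together the entries from the web, application and database-facing tiers in chronological order.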

All modern application frameworks provide some mechanism for adding filters to application requests, or you can leverage aspect-oriented programming to add contextual information as requests pass through the different layers of your application. Caching the application traces for each request and printing them only when the request exceeds a response-time threshold can greatly aid application diagnostics.
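The trace-caching idea can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions: the class `SlowRequestTracer` and its output format are hypothetical, one instance is created per request (for example from a request filter or an aspect), and the caller hands whatever `finish()` returns to its normal logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical per-request tracer: trace messages are cached in memory and
 * only returned for logging when the request exceeds a response-time threshold.
 */
final class SlowRequestTracer {
    private final long thresholdMillis;
    private final long startNanos;
    private final List<String> buffer = new ArrayList<>();

    SlowRequestTracer(long thresholdMillis) {
        this(thresholdMillis, System.nanoTime());
    }

    /** Explicit start time, handy for tests. */
    SlowRequestTracer(long thresholdMillis, long startNanos) {
        this.thresholdMillis = thresholdMillis;
        this.startNanos = startNanos;
    }

    void trace(String message) {
        trace(message, System.nanoTime());
    }

    /** Caches a message tagged with its offset from the start of the request. */
    void trace(String message, long nowNanos) {
        long offsetMillis = (nowNanos - startNanos) / 1_000_000;
        buffer.add("+" + offsetMillis + "ms " + message);
    }

    List<String> finish() {
        return finish(System.nanoTime());
    }

    /**
     * Ends the request. If it was slower than the threshold, returns a summary
     * line followed by the cached trace; otherwise returns an empty list.
     * The cache is discarded in both cases.
     */
    List<String> finish(long endNanos) {
        long elapsedMillis = (endNanos - startNanos) / 1_000_000;
        List<String> result = Collections.emptyList();
        if (elapsedMillis > thresholdMillis) {
            result = new ArrayList<>();
            result.add("SLOW elapsedMs=" + elapsedMillis + " thresholdMs=" + thresholdMillis);
            result.addAll(buffer);
        }
        buffer.clear();
        return result;
    }
}
```

The design choice here is that fast requests cost only an in-memory list that is thrown away, so log volume stays low, while slow requests get their full step-by-step trace written out in one place where Splunk can find it.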

In Part 2 of this series, I will talk about such an implementation, one that leverages the Splunk deployment in our application infrastructure to improve our performance and application diagnosis process.

 

WebSphere Process Server Topologies

WebSphere Process Server (WPS) consists of three key functional components: user applications, the messaging infrastructure and the CEI (Common Event Infrastructure) support infrastructure. WPS can be deployed in various topologies depending on how heavily each of these components is used. Three topologies, popularly called “bronze”, “silver” and “gold”, address these different usage requirements.

“Bronze”, or the Single Cluster topology, is used when WPS is deployed with all three functional components in a single cluster.

“Silver”, or the Remote Messaging topology, is used when user applications and CEI share a single cluster and the messaging infrastructure is deployed in its own cluster. This is the best option when the CEI infrastructure is not used heavily.

“Gold”, or the Remote Messaging and Remote Support topology, is used when each of the three functional components of WPS has its own cluster.
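To make the three placements concrete, here is a small illustrative Java model of which cluster hosts each functional component in each topology. It is only a sketch for comparison, not WPS configuration: the cluster names `AppTarget`, `Messaging` and `Support` are assumptions used for illustration, and real clusters can be named anything.

```java
import java.util.EnumMap;
import java.util.Map;

/** The three WPS functional components described above. */
enum WpsComponent { USER_APPLICATIONS, MESSAGING, CEI }

/** Illustrative model: which (hypothetically named) cluster hosts each component. */
enum WpsTopology {
    BRONZE("AppTarget", "AppTarget", "AppTarget"),  // single cluster
    SILVER("AppTarget", "Messaging", "AppTarget"),  // remote messaging
    GOLD("AppTarget", "Messaging", "Support");      // remote messaging and remote support

    private final Map<WpsComponent, String> placement = new EnumMap<>(WpsComponent.class);

    WpsTopology(String apps, String messaging, String cei) {
        placement.put(WpsComponent.USER_APPLICATIONS, apps);
        placement.put(WpsComponent.MESSAGING, messaging);
        placement.put(WpsComponent.CEI, cei);
    }

    /** Name of the cluster that hosts the given component in this topology. */
    String clusterFor(WpsComponent component) {
        return placement.get(component);
    }

    /** Number of distinct clusters this topology needs. */
    long clusterCount() {
        return placement.values().stream().distinct().count();
    }
}
```

Reading the model row by row gives one, two and three clusters respectively, which is the trade-off between simplicity and isolation that the bronze, silver and gold names describe.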

For more detail, refer to the following very informative developerWorks article: http://www.ibm.com/developerworks/websphere/library/techarticles/0803_chilanti/0803_chilanti.html

The WPS InfoCenter also provides additional information on these deployment topologies: http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/index.jsp?topic=/com.ibm.websphere.wps.620.doc/doc/ctut_buildingclusteredtopologies.html