In my previous blog on this subject, I talked about the value of log files in application diagnosis and performance issue resolution, and how Splunk makes it easy to chronologically list, search and analyze application log statements. Splunk is most useful when the log statements contain data in a name=value format such as <name>=<value>. It can then index, search, sort and tabulate such data very easily.
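As an illustration, a log line like the following (the field names here are hypothetical, not from PerfLog) lets Splunk automatically extract each field for searching and tabulating:

```
2013-04-02 10:15:32,118 INFO requestId=8f3a user=jdoe action=checkout elapsedMs=245 status=OK
```

A search such as `action=checkout elapsedMs>1000` then isolates slow requests without any custom parsing.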
Pexus recently released the Pexus PerfLog software package. Pexus PerfLog is an open source performance logging and diagnostic package for J2EE applications. PerfLog can capture key performance metrics from servlets, portlets, web services and SQL queries, along with request contextual data, and persist them to log files as well as to a database.
The performance logs it generates are Splunk friendly, which makes analyzing performance issues in J2EE applications much easier.
All modern J2EE applications are distributed in nature and typically involve web servers, application servers, database servers and so on. Diagnosing application problems in a distributed environment is challenging. Tracing problems as requests and transactions traverse the distributed components, whether the web server, the application server or the database, is always time consuming and requires quite a bit of application knowledge. A number of monitoring tools have promised the holy grail of transaction tracking and diagnosis but have largely failed. Many end up as large dashboards with red, yellow and green markers that indicate whether the system is running fine or has issues, but offer too little information to diagnose those issues quickly. Enterprises spend millions in licensing costs to monitor these distributed components, only to spend more money on additional resources to diagnose and fix problems. Most monitoring tools end up in the hands of operational teams that lack the deep application knowledge needed to diagnose problems quickly.
My years of experience diagnosing problems have taught me that to diagnose problems quickly one has to resort to application logs, stack traces, debug traces and some deep knowledge of the application. Deep application monitoring tools such as Wily and ITCAM for Application Diagnostics do offer deep instrumentation, but carry very high performance overheads when left on permanently. Despite their deep tracing capabilities, they still fall short of correlating individual requests to provide meaningful insight for resolving problems quickly.
All application components produce some kind of logs, but obtaining logs from disparate systems and correlating their entries is a time-consuming process.
The Splunk log monitoring tool helps you search log files spread across disparate systems and organizes the results chronologically, by host, by log type and so on, but by itself it is not sufficient to correlate the log entries. Splunk offers a slightly different approach to application problem monitoring by helping you scan application logs for error symptoms; however, this requires some prior knowledge of the application's exception handling and the common error strings to look for. While Splunk is extremely powerful at scanning logs and displaying chronological search results that span the various distributed components, it still falls short of correlating them unless the application provides some unique identifier to tie the log entries together. Building a logging framework shared by the distributed application components helps application diagnosis immensely.
All modern application frameworks provide some mechanism to add filters to application requests, or you can leverage aspect-oriented programming to add contextual information as requests pass through the different layers of your application. Caching application traces for each request and printing them when the request exceeds a response time threshold can greatly aid application diagnostics.
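One way to provide the unique identifier that ties distributed log entries together is a per-thread request context. The following is a minimal sketch of that idea in plain Java; the class and method names are my own illustration, not part of PerfLog or any framework. In a real application, a servlet filter or aspect would call beginRequest/endRequest around each request.

```java
import java.util.UUID;

// Hypothetical sketch: a per-thread request context that tags every log
// line with a correlation id, so Splunk can tie together entries emitted
// by different layers (servlet, service, DAO) for the same request.
public class RequestContext {
    private static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();

    // A servlet filter (or aspect) would call this once per incoming request.
    public static void beginRequest() {
        CORRELATION_ID.set(UUID.randomUUID().toString());
    }

    // ...and this in a finally block, so pooled threads don't leak ids.
    public static void endRequest() {
        CORRELATION_ID.remove();
    }

    // Emit Splunk-friendly name=value pairs, prefixed with the correlation id.
    public static String logLine(String component, long elapsedMs) {
        return "correlationId=" + CORRELATION_ID.get()
                + " component=" + component
                + " elapsedMs=" + elapsedMs;
    }

    public static void main(String[] args) {
        beginRequest();
        try {
            // Two layers of the same request share one correlation id.
            System.out.println(logLine("OrderServlet", 420));
            System.out.println(logLine("OrderDAO", 180));
        } finally {
            endRequest();
        }
    }
}
```

Because every line carries the same correlationId, a single Splunk search on that value returns the full trace of one request across components.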
I will describe such an implementation in Part 2 of this series, one that leverages the Splunk deployment in our application infrastructure to enhance the performance and application diagnosis process.
If you use WebSphere Portal with JSF (JavaServer Faces) and are having issues with memory, garbage collection and a slow-performing Portal, you may want to review the JSF settings in your web.xml. JSF is known to be a memory hog. By default it maintains 15 view states on the server side. These states are stored in the user session and may live for a considerable amount of time in applications that keep users logged in for the whole work day. We found that on a JVM with a typical 2 GB heap, about 800 MB of memory was held by these JSF view states. You can configure the number of view states to save in the web.xml file of your portal application.
We set the number of view states to 2 (from the default of 15) and were surprised to see session memory drop from about 800 MB to about 200 MB over a 6-8 hour period. That was a considerable saving in memory.
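In the Sun/Mojarra JSF 1.x implementation (the one whose com.sun.faces.util.LRUMap shows up in the heap dumps below), the setting is a context parameter in web.xml along these lines; check the exact parameter name for your JSF version, as it varies between implementations and releases:

```xml
<!-- web.xml of the portal application: cap the number of
     server-side JSF view states kept per session. -->
<context-param>
    <param-name>com.sun.faces.numberOfViewsInSession</param-name>
    <param-value>2</param-value>
</context-param>
```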
You can use the IBM Support Assistant, a free tool from IBM that provides a number of JVM problem analysis tools, including a memory analyzer that lets you inspect heap dumps, thread dumps, logs and more. You can download it from the following location – IBM Support Assistant
You can use the memory analyzer tool to inspect heap dumps before and after changing this configuration and compare the memory held in session by the key offending class – com.sun.faces.util.LRUMap. Attached are some screenshots of our findings.
Data using the default view states of 15:
Heap Distribution and com.sun.faces.util.LRUMap instance data
– Notice the roughly 800 MB chunk of heap attributed to com.sun.faces.util.LRUMap class instances
Data using the view states set to 2:
Heap Distribution and com.sun.faces.util.LRUMap instance data
– Notice that the amount of memory held has now come down to about 200 MB
WebSphere Process Server (WPS) consists of three key functional components – user applications, the messaging infrastructure and the CEI (Common Event Infrastructure) support infrastructure. WebSphere Process Server can be deployed in various topologies depending on how heavily each of these components is used. The popularly named “bronze”, “silver” and “gold” topologies address these different usage requirements.
“Bronze”, or Single Cluster, topology is used when WPS is deployed with all three functional components in a single cluster.
“Silver”, or Remote Messaging, topology is used when the user applications and CEI share a single cluster and the messaging infrastructure is deployed in its own cluster. This is the best option when the CEI infrastructure is not heavily used.
“Gold”, or Remote Messaging and Remote Support, topology is used when each of the three functional pieces of WPS has its own cluster.