Effective Response Time Monitoring
Response Time Monitoring
Service response time is one of the most important factors in determining a website visitor's satisfaction. Preventing downtime matters, but avoiding crashes alone is no longer enough for managing and administering today's web systems. Visitors must be able to move through the site with ease, so waiting time and page loading time must be minimized. A slow website frustrates its visitors; slow response is one of the leading causes of declining customer satisfaction and, ultimately, of lost customers. Service response time must therefore be treated as part of customer satisfaction, and website administrators must continually work to improve it, prioritizing the services that are accessed most frequently.
The traditional graph for monitoring response time has been a line graph of average response time over time.

However, the traditional line graph conveys nothing beyond the average response time itself. It cannot show whether there is a specific performance problem, whether response time has increased across the board (e.g., because of a network issue), or whether only certain services are receiving an overwhelming number of requests. Furthermore, a line graph is ineffective for discovering which service is experiencing a problem or what the root cause of the slow response time is.
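To make the limitation concrete, here is a minimal Python sketch. The service names and response times are invented for illustration and do not come from any real system. It computes the single averaged value a line graph would plot for one interval, then the per-service view that individual transactions would reveal.

```python
from statistics import mean

# Hypothetical transactions in one monitoring interval:
# (service name, response time in seconds). All values are made up.
transactions = [
    ("/login", 0.2), ("/login", 0.3), ("/search", 0.25),
    ("/search", 0.3), ("/order", 9.5), ("/order", 10.1),
    ("/login", 0.2), ("/search", 0.35),
]

def average_response_time(txs):
    """The one number a line graph of averages would plot for this interval."""
    return mean(rt for _, rt in txs)

def slowest_by_service(txs):
    """Worst response time per service, visible only when dots stay separate."""
    worst = {}
    for service, rt in txs:
        worst[service] = max(rt, worst.get(service, 0.0))
    return worst

avg = average_response_time(transactions)
worst = slowest_by_service(transactions)
print(f"average: {avg:.2f}s")
for service, rt in sorted(worst.items()):
    print(f"{service}: slowest {rt:.2f}s")
```

The average alone suggests a moderately slow system, while the per-service breakdown shows that two of the three hypothetical services are fast and only `/order` is slow.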
Monitoring Response Time with Scatter Graph
Thus, response time should be monitored in a single graph in which individual service transactions are plotted separately. A line graph cannot express individual transactions in a single graph; a scatter graph is needed.

We call this type of graph the Response Time Scatter Graph, also known as X-View. The entire history of service transactions can be monitored in a single graph, and when one transaction or a group of transactions is selected, detailed information about the selection (method call path, SQL, socket, file, etc.) is displayed in a separate window.

While the concept of using a scatter graph to display service transactions, with a drag-and-drop feature to see individual transaction details, is revolutionary, it is not the only advantage X-View offers. When there is a performance problem in the WAS or in related systems during monitoring, the problem typically shows up as a specific pattern in X-View. The pattern found in the scatter graph gives the user important information about the monitored WAS system.
Below are examples of typical scatter graph patterns and their descriptions.
♣ A pattern is a three-part rule that expresses the relation between a certain context, a problem, and a solution.
- Christopher Alexander
♣ A pattern is a proven solution to a problem that occurs in a specific context.
- Gang-of-Four
♣ A pattern is an approach that proved useful in one context and can be usefully applied in others.
- Martin Fowler
Vertical Streaming Pattern
The service transactions line up vertically, as if water were being poured slowly from a kettle.

This pattern appears when transactions called by different service requests experience delayed response times because of a shortage of the same system resource. For example, when many different transactions are affected by a single database lock, a vertical streaming pattern forms.
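The following small simulation sketches why a shared lock would produce a vertical line. It assumes, and this is an assumption not stated in the text, that each dot is plotted at the transaction's completion time on the horizontal axis and its response time on the vertical axis. The function name `simulate_lock` and all timings are hypothetical.

```python
def simulate_lock(start_times, lock_release, work_time=0.1):
    """Return (end_time, response_time) dots for transactions blocked on one lock.

    Assumed model: every transaction needs the lock, the lock is released at
    `lock_release`, and each transaction then needs `work_time` seconds.
    """
    dots = []
    for start in start_times:
        # A transaction cannot proceed before the lock is released.
        end = max(start, lock_release) + work_time
        dots.append((round(end, 3), round(end - start, 3)))
    return dots

# Four transactions start at different moments but wait on the same lock.
dots = simulate_lock([0.0, 1.0, 2.5, 4.0], lock_release=5.0)
for end, response in dots:
    print(f"end={end:.1f}s  response={response:.1f}s")
```

Under this model all four transactions finish at the same moment with different response times, so their dots stack into one vertical column.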
Finding the common factor within this pattern is relatively easy. If the plotted dots come from multiple WAS instances, the cause is usually a system resource outside the WAS; if they come from a single WAS instance, the cause is a system resource within the WAS. Also, if the plotted dots come from the same service request, the problem is a shared resource within the application (usually the DB).
During web system tuning, the administrator should give priority to transactions in a vertical streaming pattern over any single plotted dot with an extremely high response time. A single slow dot may have its own specific issue, but a group of lined-up transactions may indicate a significant problem in the entire system.
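The triage rule for a vertical streaming pattern can be sketched as a small function. This is an illustrative sketch only: the `Dot` type and `suspect_location` function are hypothetical names, and the order in which the checks are applied (same service request first, then instance count) is an assumption, since the text does not say which check takes precedence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dot:
    """One plotted transaction (hypothetical structure for this sketch)."""
    instance: str  # WAS instance that served the transaction
    service: str   # service request name, e.g. a URL

def suspect_location(dots):
    """Suggest where to look first for the shared resource behind the dots."""
    if not dots:
        raise ValueError("select at least one dot")
    instances = {d.instance for d in dots}
    services = {d.service for d in dots}
    # Assumed precedence: a single service request points at the application.
    if len(services) == 1:
        return "shared resource within the application (usually DB)"
    if len(instances) > 1:
        return "system resource outside the WAS"
    return "system resource within the WAS"

selected = [Dot("was1", "/order"), Dot("was2", "/search"), Dot("was2", "/login")]
print(suspect_location(selected))
```

The result is only a starting point for investigation; the selected transactions' details still have to confirm the actual cause.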
Layered Cake Pattern
As throughput increases, horizontal lines may form in X-View. If the number of horizontal lines does not grow as throughput grows, this pattern does not apply. Multiple applications with frequent service requests may also form a horizontal line, but in that case the number of horizontal lines does not increase even when throughput increases.

The Layered Cake pattern appears when many transactions are forced to wait for the same amount of time until a shared resource becomes available. As throughput increases, the probability of waiting rises and response time grows; it grows uniformly across all transactions that use the shared resource, so horizontal lines form in X-View. In this case, simply checking whether the transactions in a line come from the same WAS instance can provide an important clue to the root cause. It is also necessary to look for resources with a parameter that matches the wait time (the time difference between one layer and the next).
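One way to estimate the wait time between layers is to bucket response times and measure the distance between the crowded buckets. The sketch below is a simple illustration under stated assumptions, not how X-View itself works: the function names, the bin width, the minimum count, and the sample data (built around a hypothetical 3-second wait) are all invented.

```python
from collections import Counter

def find_layers(response_times, bin_width=0.1, min_count=3):
    """Return response-time levels (seconds) where at least min_count dots pile up."""
    bins = Counter(round(rt / bin_width) for rt in response_times)
    return sorted(round(b * bin_width, 3) for b, n in bins.items() if n >= min_count)

def layer_spacing(layers):
    """Time difference between each layer and the next one above it."""
    return [round(upper - lower, 3) for lower, upper in zip(layers, layers[1:])]

# Hypothetical response times: scattered fast dots plus bands near 3s, 6s and 9s.
samples = [0.21, 0.47, 0.83, 1.32,
           2.98, 3.0, 3.01, 3.02,
           5.99, 6.01, 6.02,
           8.99, 9.0, 9.01]

layers = find_layers(samples)
spacing = layer_spacing(layers)
print("layers:", layers)
print("spacing:", spacing)
```

A constant spacing is the value to compare against timeout or wait parameters of shared resources. Real data would need a bin width chosen to suit the response-time range being examined.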
Matrix Pattern
Named after the opening scene of the movie The Matrix, this pattern consists of many short vertical (and sometimes horizontal) lines spread all over the screen.

It appears when a resource bottleneck (typically a lock) causes problems for only some of the applications, and the problems occur evenly spread out over time.
One example is a database in which dirty reads are not allowed, so that locking is in effect at READ_COMMITTED or a higher level. The Matrix pattern is not caused by a single resource issue and is not associated with typical hardware resource problems; it has been observed in environments with Sybase or MS-SQL databases. Once the Matrix pattern is observed, the problem cannot be resolved without adjusting the LOCK LEVEL of the entire database. A fundamental adjustment of the LOCK LEVEL, and the consequent adjustment of the application, is inevitable.
Waterfall Pattern
The plotted dots resemble a series of waterfalls. The Waterfall pattern appears when a specific resource suddenly runs short or is locked, returns to normal, and then repeats this cycle multiple times.

When the Waterfall pattern appears, nothing indicates a sudden increase in throughput or response time, yet the requests that use the problem resource must have increased. In this case, usage of the resource has no proportional effect on overall response time until the resource is depleted; the Waterfall pattern occurs at the point of depletion. Another characteristic is that short transactions of less than 3 seconds can still be seen in the midst of the pattern. This means the resource responsible for the Waterfall pattern is not one used by the entire application.
As the second screenshot shows, Waterfall patterns share a fixed height as one of their characteristics. The pattern occurs when a resource that is held only briefly is suddenly depleted (full, overflow), and services are suspended temporarily until the resource is refreshed and the suspension is released.
Conclusion
The Response Time Scatter Graph is direct and powerful. Users will find it more effective than several line graphs combined.
Lastly, the screenshots below show the two types of graph, captured from the same production system at the same time.

Line Graph
Do the red boxes suggest some sort of performance problem? Do they make you want to investigate?
How about the next graph? Do you want to find out which transaction each plotted dot represents?

Response Time Scatter Graph
This is the strength of the Response Time Scatter Graph.