Performance Tuning - Don't Forget the Filesystem

Claremont’s Database Administrators (DBAs) are very familiar with traditional performance tuning, including the Oracle database’s SGA, PGA, SQL profiles and indexing. However, when DBAs step back and look at the architecture, the implicit aim is often to bring tuning back into the database: adding memory so that we can enlarge the SGA, adding CPUs so we can increase the level of parallelism, or increasing the number of disks, or the IOPS available from the storage layer, to improve the native speed of I/O.

Thus it is often tempting to purchase new hardware – with low-cost hardware, virtualised platforms and blade centres it’s a relatively small investment simply to upgrade the hardware. However, in many cases we can and should avoid doing this.

Claremont’s DBAs came across an example of this with one of our managed services customers for whom we host and support a 3.5TB Oracle E-Business Suite system. Their system was suffering from occasional poor performance caused by an I/O bottleneck. Rather than purchasing additional storage capacity, we were able to provide a significant boost in performance by employing traditional configuration-based performance tuning.

The Scenario

Our customer was experiencing system slow-downs, to the point of the system hanging, when the environment was put under significant load. The problematic load tended to be simultaneous heavy OLTP and batch processing, but running both together was necessary to meet business requirements.

During these periods Claremont’s monitoring detected huge amounts of I/O wait:

This graph shows the I/O wait for a single block read from the database taking upwards of 70ms – this is a huge and unacceptable wait when the generally accepted rule of thumb is that 5ms is excellent, 10ms is satisfactory and above 15ms will likely affect the user experience.
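To make the rule of thumb concrete, here is a minimal Python sketch that labels a single-block read latency. The function name and the exact cut-offs are our own reading of the figures above, and the rule of thumb does not name the 10ms to 15ms band, so the "borderline" label is an assumption.

```python
def classify_read_latency(ms: float) -> str:
    """Label a single-block read latency using the rule of thumb in the text.

    Assumption: each quoted figure is treated as the upper bound of its band.
    """
    if ms <= 5:
        return "excellent"
    if ms <= 10:
        return "satisfactory"
    if ms <= 15:
        # The rule of thumb does not name this band; "borderline" is our label.
        return "borderline"
    return "likely to affect users"


# Illustrative values only; 70 ms is the figure quoted in the text.
for sample_ms in (4, 9, 17, 70):
    print(f"{sample_ms} ms -> {classify_read_latency(sample_ms)}")
```

By this reading, the 70ms reads seen on the customer's system fall well inside the band that users will notice.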

Oracle Enterprise Manager (OEM) also reported quite striking I/O activity (light blue):

Clearly there was something untoward causing such a drastic I/O problem.

The Fix

After generating a test case where the I/O issues could be reproduced at will in a pre-production environment, Claremont’s DBAs tested various changes in the database and architecture. These included the SGA size, redo log size, SAN caching, virtualisation mode and more. This extensive and time-consuming work identified some useful performance improvements that were subsequently put live, but none addressed the underlying problem. The poor performance persisted, and at each stage the EMC SAN reported that it was underutilised and responding to I/O requests with low wait times.

It was clear that I/O requests were queuing up somewhere higher in the infrastructure. But this is a virtualised platform, using Oracle VM, and identifying the bottleneck would be a significant challenge. Was the issue in the database, operating system, virtualisation layer, server hardware or SAN fibre? Only systematic testing and analysis of each layer would reveal the problem. Ultimately it was the write barriers on the ext4 filesystem that proved to be the source of the bottleneck.

The fix was to turn off the write barrier for the ext4 filesystem by setting the mount option “barrier=0”. Write barriers ensure that writes to disk are completed before the I/O operation is marked as complete, so that no data is lost in the event of a power failure. Essentially the filesystem writes one data block and then applies a barrier, which waits for that block to reach the physical storage before allowing the next block write. Should the disk lose power, anything still sitting in the storage cache is also still held in filesystem memory and will not be lost. However, our EMC SAN has an internal battery backup in addition to our data centre’s uninterruptible power supply (UPS). Even if the UPS fails and power is lost to the SAN, the internal battery will take over to ensure the cache is flushed to disk before the SAN shuts down. So in this instance the data is safe without the need for write barriers.
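One way to confirm which ext4 filesystems are running without barriers is to inspect the mount options the kernel reports. The Python sketch below parses text in the /proc/mounts format and assumes a disabled barrier appears as either "barrier=0" or "nobarrier" in the options column, with no such option meaning the ext4 default (barriers on). The device names and mount points in the sample are hypothetical, not taken from the customer's system.

```python
def ext4_barrier_status(mounts_text: str) -> dict:
    """Map each ext4 mount point to True if barriers appear enabled.

    Assumption: a disabled barrier is reported as "barrier=0" or "nobarrier";
    if neither option is present, the ext4 default (enabled) is assumed.
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        disabled = "nobarrier" in options or "barrier=0" in options
        status[mount_point] = not disabled
    return status


# Hypothetical sample in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/mapper/vg-oradata /u01/oradata ext4 rw,noatime,barrier=0 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

print(ext4_barrier_status(SAMPLE_MOUNTS))
# {'/': True, '/u01/oradata': False}
```

On a Linux host the same function could be fed the contents of /proc/mounts, for example `ext4_barrier_status(open("/proc/mounts").read())`. The check only reports configuration. Whether disabling barriers is safe still depends on the storage having a protected write cache, as described above.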

In many scenarios using a write barrier is perfectly acceptable. However, barriers will significantly affect I/O efficiency if the storage has a large cache. The barrier waits for the block to be written through the cache to the physical storage, so each write effectively causes a flush of the storage cache. At best the cache isn’t being used and at worst it is slowing down I/O operations. Claremont had invested in significant SAN memory cache and 500GB of solid state disk cache for this customer, and that investment was proving counter-productive under high load, which is when it was needed most.

The cache is there to smooth out the “lumps and bumps” in performance that are experienced as I/O load ebbs and flows. However, in this high load scenario the cache was effectively redundant and the server had to wait for the native speed of disk to deal with the I/O. This is fine in the short term, with minimal slow-down, but a system under sustained load will stack up requests in the filesystem because the in-memory operations are always going to be quicker than the disk. A queue develops, and the time spent in this queue is what accounts for the drastically high I/O wait times.
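The queueing effect can be illustrated with a deliberately simple fixed-rate model in Python. The rates below are made-up numbers chosen only to show the shape of the behaviour, not measurements from the customer's system, and the function name is our own.

```python
def simulate_queue(arrivals_per_tick: int, served_per_tick: int, ticks: int) -> list:
    """Return the queue depth at the end of each tick for a fixed-rate model."""
    depth = 0
    depths = []
    for _ in range(ticks):
        depth += arrivals_per_tick            # new I/O requests issued this tick
        depth -= min(depth, served_per_tick)  # requests the disk completes this tick
        depths.append(depth)
    return depths


# Load below the disk's service rate: the queue drains every tick.
print(simulate_queue(arrivals_per_tick=8, served_per_tick=10, ticks=5))
# [0, 0, 0, 0, 0]

# Sustained load above the service rate: the queue grows without bound.
print(simulate_queue(arrivals_per_tick=12, served_per_tick=10, ticks=5))
# [2, 4, 6, 8, 10]
```

In the second case every new request waits behind a backlog that keeps growing, which is the pattern behind the rising I/O wait times described above. A working cache helps by raising the effective service rate during bursts.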

Turning off the write barriers allowed the filesystem to consider a write to the storage layer as being “on disk”. As a result the cache is no longer flushed with every write and its efficiency improves. That meant the SAN could do its work in smoothing out the load, preventing a queue from building up.

The Results

With the write barrier removed, our monitoring tools reported that the I/O waits were drastically reduced:

Now I/O waits peak at 17ms as opposed to 74ms and OEM also shows a much healthier picture:

Ultimately, however, the goal here, as with any performance tuning exercise, was not to make the technical figures look good, but to actually improve the users’ experience. The following graph highlights 9 key business transactions and their average runtimes before (purple) and after (pink) the write barriers were turned off:

Users saw an average 60% improvement in system performance and that has delivered real business benefit to the customer.
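As a separate check on the storage-level figures quoted earlier (peak I/O wait of 74ms before and 17ms after), the relative reduction can be worked out with a few lines of Python. This is plain arithmetic on those two numbers and is a different measure from the 60% average improvement in transaction runtimes; the function name is our own.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


peak_before_ms = 74  # peak I/O wait with write barriers on (from the text)
peak_after_ms = 17   # peak I/O wait with write barriers off (from the text)

print(f"{percent_reduction(peak_before_ms, peak_after_ms):.0f}% lower peak I/O wait")
# 77% lower peak I/O wait
```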

In Conclusion

When tuning a system’s performance it is essential to look at all aspects of the architecture and focus on the root cause of the issue. It is tempting to view hardware upgrades as a quick and easy solution, but with many performance issues, as in this case, no amount of hardware can overcome an underlying software issue. This case demonstrates that traditional, structured performance tuning, using the DBAs’ experience as much as tuning tools, is very worthwhile and can prove extremely beneficial to customers’ environments.
