
BVQ analysis VAAI: undercover write operations

White Paper

Download as pdf: BVQ Analysis VAAI undercover write operations.pdf
Download foils: BVQ Analysis PPT VAAI undercover write operations.pdf

Performance Bottleneck Analysis: VAAI undercover write operations
In this example we will see that VAAI usage is hidden from ordinary monitoring methods because the VAAI data streams are not visible on the top SCSI layers. A system may therefore appear to be overloaded for no reason. The reason can be uncovered when you look deeper into the system, provided your analysis tool is tailored well enough to SVC/Storwize to show the performance indicators you need.
"vStorage APIs for Array Integration is a feature introduced in ESXi/ESX 4.1 that provides hardware acceleration functionality. It enables your host to offload specific virtual machine and storage management operations to compliant storage hardware. With the storage hardware assistance, your host performs these operations faster and consumes less CPU, memory, and storage fabric bandwidth." (from http://kb.vmware.com)

Customer situation:
The customer had been experiencing performance issues in his high-performance managed disk groups (mdgs) since he added two new mdgs for a new type of storage. With these new mdgs the SVC had to divide the available write cache into more partitions, so the cache share of each existing mdg was halved from 60% of the global cache to 30%.
In the same timeframe the customer also started to use VAAI, so it was unclear whether the performance issue came from the VAAI usage or from the new mdgs.
The correct answer is that the performance issue comes from an overload situation in the mdg that arose when the customer added the new managed disk groups. The best solution for this is adding new nodes.
In some situations the overload of the mdg could be explained by high write data rates, which led to a cache-full condition in the mdg cache partition. In several situations this explanation did not work. So the question came up: what is responsible for the mdg cache overload?
In Pict 4 we found that the volume with the highest write activity is working with a cache size near zero. By itself this would not be a problem, because this volume does not benefit from cache. But a volume with no visible activity owns the complete write cache, which is highly unusual.
Pict 6 shows the explanation: the volume with no visible write activity is in fact writing heavily, but this is only visible underneath the cache layer. Astonishingly, this volume consumes the complete cache. The writes are not performed by a host; they are performed by VAAI, which offloads operations from the VMware host into the storage hardware. This is why they cannot be seen on the volume layer and can only be found with tools that allow deep analysis.

Pict 1: The cache partition of the managed disk group in this customer example is already heavily loaded, so only a little more load is needed to drive it into response time issues for all 257 volumes in this managed disk group. Typically the mean partition fullness is at 80% or more, and the max values reach 100% full for longer timeframes. The constant mean cache fullness of 80% or more shows that the mdg is already on the edge.
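The rule of thumb in Pict 1 can be written down as a small check. This is only a sketch: the function name, the parameter names and the sample values are assumptions made for illustration and are not taken from BVQ or SVC. Only the 80% mean and the 100% max thresholds come from the text above.

```python
from statistics import mean


def partition_on_the_edge(fullness_pct, mean_limit=80.0, full_value=100.0,
                          min_full_samples=2):
    """Return True when a cache partition looks permanently overloaded.

    fullness_pct     -- partition fullness samples in percent (hypothetical input)
    mean_limit       -- mean fullness that counts as "on the edge" (80% in Pict 1)
    full_value       -- fullness that counts as a full partition (100% in Pict 1)
    min_full_samples -- assumed number of consecutive full samples that
                        counts as "a longer timeframe"
    """
    if not fullness_pct:
        return False
    longest_run = run = 0
    for value in fullness_pct:
        # Count how many samples in a row show a completely full partition.
        run = run + 1 if value >= full_value else 0
        longest_run = max(longest_run, run)
    return mean(fullness_pct) >= mean_limit and longest_run >= min_full_samples


# Made-up samples for illustration only, not measurements from the customer.
samples = [82, 85, 100, 100, 100, 88, 81]
print(partition_on_the_edge(samples))
```

How many consecutive full samples count as a "longer timeframe" depends on the measurement interval and is a judgment call.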



Pict 2: This is an aggregate curve of the 257 volumes in a managed disk group. The curve shows more or less steady read and write IO (green) and two response time peaks (red, the mean response time of all 257 volumes) which are not explained by higher IO load.


Pict 3: This graph shows the response times of all volumes. In the timeframe in question nearly all volumes have higher response times, up to 1100 ms. The peaks look like towers, so I call them response time towers (RTT). The front-runners are the two SRD volumes with 520 ms. The even higher response times belong to volumes with close to 0 IOPS.


Pict 4: Analysis of the volume write cache sizes (pink) together with the write data rates (blue). At first glance these curves look quite ordinary: a high data rate of single volumes together with high cache usage. The astonishing thing in RTT1 is that the volume with the high cache usage (vmvdi03_v02, 0.02 MB/s, cache size 10080 MB) is not the volume with the high data rate (vmvdi03, 70 MB/s, cache size 0 MB). This means that something is happening on vmvdi03_v02 that we cannot see in the top SCSI layer and that is reserving the complete write cache of the managed disk group.
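The mismatch in Pict 4 can be expressed as a simple filter: look for volumes that hold a large share of the write cache while showing almost no write data rate on the top SCSI layer. The following is a minimal sketch and not how BVQ works internally. The class, the function and both thresholds are assumptions for illustration. Only the two volumes and their values are taken from Pict 4, and the cache share is computed against the listed volumes only, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class VolumeSample:
    name: str
    frontend_write_mb_s: float  # write data rate seen on the SCSI layer above cache
    write_cache_mb: float       # write cache currently held by the volume


def suspect_hidden_writers(samples, min_cache_share=0.5, max_frontend_mb_s=1.0):
    """Return names of volumes that hold much cache without visible writes.

    Both thresholds are assumed values and would need tuning per system.
    """
    total_cache = sum(s.write_cache_mb for s in samples)
    if total_cache == 0:
        return []
    return [
        s.name
        for s in samples
        if s.write_cache_mb / total_cache >= min_cache_share
        and s.frontend_write_mb_s <= max_frontend_mb_s
    ]


# Values for RTT1 as given in the Pict 4 caption.
rtt1 = [
    VolumeSample("vmvdi03_v02", 0.02, 10080.0),
    VolumeSample("vmvdi03", 70.0, 0.0),
]
print(suspect_hidden_writers(rtt1))  # ['vmvdi03_v02']
```

A volume flagged this way is only a candidate. As in Pict 6, the write activity below the cache layer still has to be checked to confirm that offloaded operations such as VAAI copy tasks are the cause.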

Pict 5: The same analysis for the second peak, RTT2. In the first 5-minute measurement period we find very typical behavior: volume vmvdi03_v00 has high write data rates and a cache usage of 9734 MB. This is understandable. In the second 5 minutes, however, the situation changes back to what we saw in RTT1: volume vmvdi03_v02 again reserves the complete cache, and volume vmvdi03_v00 works with the same data rate but no cache.


Pict 6: A deeper look into vmvdi03_v02 in RTT1. We did not find write activity in the SCSI layer above the cache, but we find high activity in the layer below the SVC cache and a high cache size for this volume. This activity came from VAAI copy tasks. The tricky part is that this load is not visible where we expect it, yet it uses SVC resources like any other data write operation.


Pict 7: This is what a typical response time peak should look like: the acting volume reserves the majority of the cache.
Thanks to Thomas, who wrote a first description of this!
