Archive for June, 2009

IntelliJ – The IDE I wanted to love

27 June 2009

In the Java market we have a plethora of IDEs to choose from. The one I am most familiar with, and also see used most often, is Eclipse. Everyone knows Eclipse, of course, and Eclipse is not without its faults, so it's natural to look around every so often. Eclipse is my baseline. At this point we arrive at IntelliJ. IntelliJ is another well-known IDE. I have heard IntelliJ described as the only IDE for Java developers, the IDE that will increase productivity 10-100 times. It's a commercial product, which means you can have a reasonable expectation of stability, customer support and all the nifty tooling you would expect in Eclipse or any other IDE. Naturally you expect it to be better than Eclipse or you wouldn't pay for it. I proceeded to download, install and put IntelliJ to use.

A few words on environment. I run Debian Lenny on a Dell Precision T-3400.

IntelliJ is an easy install. Download, unpack, run (???, profit!). It's only 100MB compressed, which is less than the 150MB for Eclipse Classic (and a lot less than the 650MB for MyEclipse). You may need to check your JDK_HOME variables, but I'm guessing most people will have this set up already; otherwise, what are you doing with a Java-specific IDE? You may be confronted with a dialog or two. To begin with you should be safe with the defaults. Odds are you only need a subset of the defaults to begin with anyway.

My first impression after the install is that IntelliJ is very pretty, especially after I changed the theme. This all went really well. Everything was fast and seemed well thought out. Of course IntelliJ has its own key mappings, menu ordering and things of that nature. If you like, you can enable Eclipse key mappings, but if you want to get to know a program rather than get to work quickly, I recommend going with what's totally unfamiliar. Although I am of the opinion that Ctrl-S means save in any language.

The Subversion plugin works like a charm. I am even convinced it’s faster than Subversive…at least in our local environment where Subversion has, at times, been a pain to work with. If for no other reason, this plugin makes me want to use IntelliJ. If my work confined itself to talking to the repository I would be done. Forever.

So here we have a fresh working copy of our project, complete with Eclipse project files. IntelliJ understands Eclipse project files or at least so it claims. An import or two later and I have an official IntelliJ project. As far as I’m concerned I just need to point this puppy to Tomcat and we’re good to go.

Now IntelliJ has a lot of spiffy UI thingies. You've got screens for configuring everything. You've got modules and facets. You've got raindrops on roses and warm woolen mittens. Confidently I configured the project to deploy to Tomcat. Of course it wouldn't work at once; that would be too much to ask. Still, I was convinced that whatever problems I was sure to encounter would be easy to solve.

My first problem comes in the form of compiler errors. "But wait," you say, "did you not do a fresh checkout? Are your colleagues so daft that they check in broken code?" That would be yes and not usually. IntelliJ seems less permissive than Eclipse in allowing certain constructs. Mind you, Eclipse has no compile errors on the same code… so it can't really be a compiler error, since they use the same one. OK, so it's a pre-compile syntax validation error or whatever. It's not really a big deal.

Unfortunately I have no clue what IntelliJ is really doing. Somewhere there exists a Tomcat. Somewhere there exists at least one copy of this Tomcat. Somewhere you may or may not be deploying your project. OK, I can actually find where IntelliJ hides its Tomcat copy. Why can't it just use my Tomcat? I don't know. Oh, you can, sort of, but then you suddenly have to start and stop Tomcat yourself. Not that I really care that IntelliJ uses a copy of Tomcat, except that this copy still uses the original's directories. Sort of. I think. I'm not really sure.

A word on IntelliJ support. If you happen to write them, they answer quickly. Really quickly. Actually, the reply is there almost before you press compose mail. I was seriously impressed. Unfortunately they couldn't help me, and I didn't want to rely heavily on support, which is mostly meant for paying customers anyway. I also don't like using support lines in general. It feels like a flaw in the product. Which it is.

You may be asking why I didn't simply look up the answers in the documentation. IntelliJ's documentation is some of the worst I have ever seen. It's not worse than factually wrong information, but it's not better than a blank piece of paper. Basically, if you have something labeled "Perform action", the documentation will read "Performs the action Action". The documentation never actually explains what is happening or goes into the details of anything.

So now I have a project which may or may not be deploying somewhere. After some looking around I do figure out I have lots of classpath problems. Also, my Ant builds aren't being run. OK, add library here, Ant build before deploy… Some progress made, but not enough. My project still won't work.

To make a long story short, I've been playing with IntelliJ for about a week. I've installed, reinstalled, checked out, imported, etc, etc. I can't get our project to work. This is a project which does work under Eclipse, so I blame IntelliJ. Where Eclipse always seems to have the thing you are looking for where you are looking for it, IntelliJ will make you search for it. Where Eclipse forces you to know what you're doing and set up this and do that, IntelliJ has done it for you (mostly), and otherwise you can pretty much forget it.

To me IntelliJ is like a beautiful woman with a really annoying personality and a deaf ear. I really wanted to like it. Ask my colleagues: I am ready and willing to say that I hate something. I do it regularly and with passion. I do it to Eclipse all the time. Yet, despite all this, I really can't hate IntelliJ. I would still really like to work with it. I'm convinced it could be really good. I can't hate it and yet I can't love it. For now it will remain the IDE I wanted to love.

100K+ IOPS on semi-commodity hardware

1 June 2009

Abstract

A while ago we conducted a study to find the fastest IO subsystem that money can buy these days, the only restriction being that it should be built out of (semi-)commodity hardware. That meant extremely expensive solutions, where a sales representative needs to visit your office just to give you a price indication, were off limits. We reported our findings here: One DVD per second

The 65K IOPS barrier

When we examined the graphs resulting from our previous study, it was pretty obvious that we were hitting a bottleneck somewhere. No matter how much hardware we threw at the system, all graphs quickly saturated at around 65K IOPS. This is most apparent in the following graphs, which we repeat here from the previous study:

fig 1. random IOPS vs. block size and threads
RAID0, 8 SSDs
[image: iops-random-raid0-8.png]

fig 2. random IOPS vs. block size and threads
RAID0, 12 SSDs
[image: iops-random-raid0-12.png]

fig 3. random IOPS vs. block size and threads
2x8xRAID0 (16 SSDs total)
[image: iops-random-raid00-16.png]

In these examples, maxing out the number of SSDs per controller (figure 2) or doubling the entire storage system and striping the two halves together (figure 3) does not yield any benefit with respect to the maximum number of random IOPS. We do see that the setup depicted in figure 3 gives us some performance benefits, but basically it only pushes two more data points (64K and 128K block sizes) towards the imaginary 60-65K barrier.

Alignment, stripe size and data stripe width

For SSD performance, correct alignment is of the utmost importance. A misaligned SSD will seriously underperform. In our previous study we did our best to find a correct alignment. Since we were using LVM to do the striping, we set the metadata size to 250KB. This is actually a little trick we found, and apparently others found out about it too:

LVM likes to allocate 192k for its header information, and 192k is not a multiple of 128k. So if you are creating file systems as logical volumes, and you want those volume to be properly aligned you have to tell LVM that it should reserve slightly more space for its meta-data, so that the physical extents that it allocates for its logical volumes are properly aligned. Unfortunately, the way this is done is slightly baroque:

# pvcreate --metadatasize 250k /dev/sdb2
Physical volume "/dev/sdb2" successfully created

Why 250k and not 256k? I can’t tell you — sometimes the LVM tools aren’t terribly intuitive. However, you can test to make sure that physical extents start at the proper offset by using:

See: Aligning filesystems to an SSD's erase block size
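The quote trails off at that point; the check it alludes to can be done with the pe_start field of pvs, which reports the offset at which the first physical extent starts. A minimal sketch, using the same /dev/sdb2 from the quote:

# show where the first physical extent starts; it should land on the intended alignment boundary
pvs /dev/sdb2 -o+pe_start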

When digging some more into this, we learned about the concept “data stripe width”, which is defined as:

data stripe width = array stripe size * number of data disks in the array

E.g. for RAID6 it's array stripe size * (number of disks in the array - 2), and for RAID5 it's array stripe size * (number of disks in the array - 1). This is an important number when you are striping multiple smaller arrays (sub-arrays, or what we called 'legs') into a larger array. It gives you the amount of data that occupies exactly one array stripe on each device that makes up the sub-array. It's this number that you should work with when trying to align LVM; a filled-in example follows after the quote below.

Stevecs explains this rather well:

pvcreate --metadatasize X /dev/sda /dev/sdb
# X = 511 (or whatever it takes to start at 512KiB with padding)

vgcreate ssd -s Y /dev/sda /dev/sdb
# Y here is your physical extent size, which you probably want to be a multiple of your data stripe width
# the default is 4MiB and that's fine, as our data stripe widths here divide evenly into that, both for 8K (32KiB) and 128K (512KiB) array stripes

lvcreate -i2 -IZ -L447G -n ssd-striped ssd
# Z here should be a multiple of your data stripe width (32KiB or 512KiB)

See: xtremesystems
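Filled in for the configuration used later in this article, two 6-disk RAID6 legs with a 128KiB array stripe size, the data stripe width is 128KiB * (6 - 2) = 512KiB, and the commands would look roughly as follows. This is a sketch with assumed values, not a verified transcript of what we ran:

pvcreate --metadatasize 511k /dev/sda /dev/sdb
# pad the metadata so the first physical extent starts at 512KiB on each leg

vgcreate ssd -s 4M /dev/sda /dev/sdb
# 4MiB physical extents are an even multiple of the 512KiB data stripe width

lvcreate -i2 -I512 -L447G -n ssd-striped ssd
# LVM stripe size (-I, in KiB) set to the 512KiB data stripe width, striped over the two legs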

Applying this theory did help in some cases, but the 65k barrier hadn’t moved an inch.

The Linux IO scheduler

Something we overlooked in the previous study is the fact that Linux has a configurable IO scheduler. Currently available are the following four:

  1. Complete Fair Queueing Scheduler (CFQ)
  2. Anticipatory IO Scheduler (AS)
  3. Deadline Scheduler
  4. No-op Scheduler

For an in-depth discussion of each scheduler, see: 3. Schedulers / Elevators

The CFQ and AS schedulers in particular are designed to reorder IO in such a way that seek latencies are minimized. The thing is, of course, that SSDs don't have any notable seek delay, and certainly none that depends on how far apart the requested blocks are. It thus doesn't matter at all whether you fetch, say, blocks 1, 10 and 20 sequentially or totally at random. Setting the scheduler to noop, which doesn't try to be smart and basically does nothing at all, improved overall performance again. However, there was STILL a clear saturation point in our performance graphs. The barrier had moved up a little, but it was still very much there.
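For reference, the scheduler can be inspected and switched per block device through sysfs; a minimal sketch, assuming the device in question is /dev/sda:

# list the available schedulers; the active one is shown between brackets
cat /sys/block/sda/queue/scheduler

# switch this device to the noop scheduler (as root)
echo noop > /sys/block/sda/queue/scheduler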

The file-system

At this point we were frantically wading through the C source code of the Areca driver, monitoring the interrupts and basically grasping at every straw within our reach. It was then that my co-worker Dennis noticed the following when examining the jfs module:

modinfo jfs
filename: /lib/modules/2.6.26-1-amd64/kernel/fs/jfs/jfs.ko
license: GPL
author: Steve Best/Dave Kleikamp/Barry Arndt, IBM
description: The Journaled Filesystem (JFS)
depends: nls_base
vermagic: 2.6.26-1-amd64 SMP mod_unload modversions
parm: nTxBlock:Number of transaction blocks (max:65536) (int)
parm: nTxLock:Number of transaction locks (max:65536) (int)
parm: commit_threads:Number of commit threads (int)

After we also examined the source code of the jfs driver (hooray for open source, see jfs_txnmgr.c), it became clear that we had found our bottleneck: the number of transaction blocks and transaction locks is capped at 65536, suspiciously close to the ~65K barrier we kept hitting. We therefore tried another high-performance file system that we had wanted to test before, but had never found the time for. This turned out to be the breakthrough we were looking for. Basically, by changing a J into an X in our setup, we more than doubled the number of IOPS and got far more than the 100K IOPS we were targeting.
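In practice that swap comes down to formatting the logical volume with mkfs.xfs instead of mkfs.jfs. The stripe-geometry options below are an assumption based on the 128KiB array stripe size and 4 data disks per RAID6 leg discussed above, not the exact flags we used:

# assumed example: align XFS to a 128KiB stripe unit and 4 data disks per leg
mkfs.xfs -d su=128k,sw=4 /dev/ssd/ssd-striped

# mount without access-time updates to avoid needless writes
mount -o noatime /dev/ssd/ssd-striped /mnt/ssd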

Figure 4 shows the difference between using JFS with the CFQ scheduler and using XFS with the noop scheduler, both on the 2x6xRAID6 configuration (12 disks total). Results, as before, are obtained from the easy-co benchmark tool. Unfortunately, we changed our RAID controllers from a dual Areca 1680 to a dual Areca 1231 in the meantime and we didn't redo the easy-co benchmark. It nevertheless clearly shows how the barrier has been broken:

fig 4. random IOPS vs. block size
2x6xRAID0 (12 SSDs total)
[image: iops-random-raid0-8.png]

There are still some questions left unanswered. With this setup, adding more disks to the array (from 2×6 to 2×8) did not improve the IOPS performance at all. Neither did changing the RAID level from 6 to 10, which theoretically should have given us another performance boost.

Alternative benchmark tools

In the previous study we only used easy-co's bm-flash as our benchmark tool. For this study we verified our findings using another tool: xdd. Where bm-flash tests for a fixed amount of time (10 seconds per item), xdd tests for a given amount of data. This makes it easier to rule out caching effects by specifying an amount of data that surely doesn't fit in the cache, thereby forcing the controller to actually go to disk. Since our setup has 8GB of cache, we used 16GB of data for each test run, read randomly from a 64GB file using direct IO. Each test was repeated multiple times and the final result taken as an average over these runs (though there was no notable difference between the runs).
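An xdd invocation along these lines would look roughly like the sketch below. The file name and the 8KB request size are made-up values for illustration, and option spellings can differ between xdd versions, so treat this as the shape of the command rather than our exact command line:

# random direct-IO reads: 16GB read from a 64GB test file in 8KB requests, 128 outstanding requests, averaged over 3 passes
xdd -op read -targets 1 /mnt/ssd/testfile -dio -seek random -blocksize 1024 -reqsize 8 -mbytes 16384 -queuedepth 128 -passes 3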

Figure 5 shows the results for a 2x6xRAID6 setup, using 128 threads, LVM, array stripe size set to 128KB and the software stripe size set to 512KB. The following exceptions hold:

  • ssz = array stripe size 8KB, software stripe size 32KB
  • mdadm = mdadm used instead of LVM for software striping
  • raid10 = RAID10 used instead of RAID6

fig 5. random IOPS vs. block size
2x6xRAID6 (12 SSDs total)
[image: xdd_2x6xraid6_jfs_vs_xfs.png]

Note that this graph only shows part of the data that was shown in fig 4. Since xdd tests take a great deal longer than bm-flash tests, and the interesting data is within the 1KB to 32KB range anyway, we decided to limit the xdd tests to that range.

Looking at figure 5, it becomes clear we're seeing exactly the same barrier as with bm-flash. In this figure too, the graph shows that XFS breaks through this barrier. However, the actual number of reported IOPS is a good deal lower than what bm-flash reported. Here too, going from 2x6xRAID6 to 2x8xRAID10 did not improve performance, although there is a slight difference for the larger block sizes, 16KB and 32KB.

Conclusion

Performance tuning remains a difficult thing. Having powerful hardware is one thing, but extracting the raw power from it can be challenging to say the least. In this study we have looked at alignment, IO schedulers and file systems. Of course it doesn't stop there. There are many more file systems, and most file systems have their own additional tuning parameters. XFS especially is rich in that area, but we have yet to look into that.

Benchmarking correctly is a whole other story. It's important to change only one single parameter between test runs meant for comparison, and to document all the settings involved for each and every test run. Next to that, it's important to realize what it is that you're testing. Since bm-flash only runs each sub-test for 10 seconds, it's likely that the cache is tested in addition to the actual disks. Also, in this study we only looked at random read performance, but write performance is important too. Taking write performance into account (pure writes, or a 90% read / 10% write mix) can paint a rather different picture of the performance characteristics.

Acknowledgement

Many thanks go to stevecs, who kindly provided lots of insight that helped us tune our system. See this topic for additional details on that.


This benchmark was brought to you by Dennis Brouwer and Arjan Tijms of JDevelopment, an M4N team.
