SSD performance improvements now live

22 May 2009, by development

In a previous blog entry (One DVD per second) we described how we built a new, fast SSD-based database server and how we benchmarked it.

This week we put the new SSD database server to the ultimate test: “going live”.

It turned out that our investments definitely paid off. Our main online application, M4N, has become dramatically faster. Across all queries we measured an average speedup of about 6 times. As a result, the initial average performance increase for the application as a whole is approximately a factor of 2. Keep in mind that the application depends on more than just the main database: the performance of the Java application server and the network bandwidth play important roles as well.
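The two numbers can be reconciled with Amdahl's law. As a rough sketch, assuming (our simplification, not a measurement) that only the database share f of the average page time became 6 times faster while everything else stayed the same, an overall factor of 2 implies:

$$\frac{1}{(1-f) + f/6} = 2 \;\Longrightarrow\; 1 - \tfrac{5}{6}f = \tfrac{1}{2} \;\Longrightarrow\; f = 0.6$$

In other words, under that assumption roughly 60% of the average page time was spent waiting for the database before the switch.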

The performance increase is most apparent when simply browsing through the different pages of the application. The whole experience now feels very snappy, and even the more complex pages load nearly instantly. We keep statistics on our average page speeds, so I included these in the pictures.

Picture one shows the speed improvements we measured:

Speed improvements new database server SSD Postgresql




Picture two shows the improvement in query execution speed:

Statistics speed improvement Postgresql database




Finally, during the night we execute a slew of maintenance queries. Normally, we see timings like this once the script that triggers these queries has finished executing:


real 348m6.451s
user 0m4.912s
sys 0m0.808s

The other morning, however, we saw this:


real 30m34.747s
user 0m2.752s
sys 0m1.008s

This amounts to a performance increase of over 11 times.
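The speedup can be recomputed from the two `real` times above. This is a minimal bash sketch, assuming the minutes-and-seconds format (`348m6.451s`) that bash's `time` keyword prints; `to_seconds` and `speedup` are our own hypothetical helper names:

```shell
#!/usr/bin/env bash
# Convert a bash `time` value such as "348m6.451s" into seconds.
# Assumes the minutes+seconds format printed by bash's `time` keyword.
to_seconds() {
  # Splitting on "m" and "s" gives: $1 = minutes, $2 = seconds.
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the ratio before/after with two decimals.
speedup() {
  local before after
  before=$(to_seconds "$1")
  after=$(to_seconds "$2")
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.2f\n", b / a }'
}

# Nightly maintenance run: old server vs. new SSD server.
speedup 348m6.451s 30m34.747s
```

This prints 11.38 (20886.451 s versus 1834.747 s), which matches the “over 11 times” above.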

For now, we cautiously conclude that using SSDs does indeed improve performance by a large margin.


Postgresql: the most advanced open source database, and full text search

23 April 2009, by development

We work with Postgresql and are very happy with its performance and with the database in general. We are also very happy with the new full text search.
But look what happened when we searched the Postgresql website for documentation.
We searched for:
http://search.postgresql.org/search?q=All

The result for the search term “ALL” was:

“Your search for All returned no hits.”

When searching for LIMIT ALL or “LIMIT ALL” (which is what I was actually looking for), the “ALL” was ignored and only hits containing “LIMIT” were returned.

I tried some more words, like “This”, “because” and “that”. None of them returned any results.

Now look at the slogan on the right side of the site:
“The world’s most advanced open source database”?
That gives us some really nice thoughts! ;-)



One DVD per second

20 February 2009, by development

Abstract

For modern database systems the main bottleneck is usually IO. To speed up our database, we conducted a study to find the fastest IO system currently available. In this article we describe our approach and findings. The result: with our new SSD-filled database server we can transfer the equivalent of nearly one DVD of data per second, 3.3GB/sec.

Introduction

Our current SAS hard-disk based database server (6 disks, RAID10) was suffering from an access time bottleneck: too much concurrent database access caused the server to slow down considerably. In a previous article, Building the new battleship Mtron, we promised test results using our new hardware.

We can now finally show you some of the test results and give you a taste of what SSD-based storage will bring. Of course, we ran into several firmware incompatibility issues with this emerging hardware, found motherboard PCI-E bottlenecks, discovered Linux kernel performance differences, ran into benchmarking problems specific to certain software and saw strange file system hiccups, but we worked around most of these to produce the figures listed in this article.

Benchmarking was not a goal in itself. Given the amount of hardware (which represents a certain amount of hard cash), the question was: what would be the best performing, cost-effective and fail-safe setup using this hardware?

Well, we think we figured that out. Where to start… First, let us explain why we chose this hardware setup in the first place. And for those who want to jump to the conclusions: be our guest.

Hardware, the outfit

SSD: SSDs come in several flavors, and the best taste is the SSD built using SLC memory: it lasts longer and generally performs better on writes. The downside: SLC drives are much more expensive. We already had some hands-on experience with MTRON SSDs, so we chose the newer, better and faster MTRON 7500 Pro series. Of course there are also Intel, MemoRight, OCZ and Transcend, but at the time of writing MTRON was the best performer on paper. We ordered 12 MTRON MSP7535-032 (32GB) and 4 extra MTRON MSP7535-064 (64GB) SSDs.

RAID controller: this should be a PCI-E x8 board to handle all the IO. Unfortunately, at the time of writing no PCI-E x16 boards were available (and few, if any, server motherboards support PCI-E x16 slots). Cache, and even more cache, should be available (which is only useful when a Battery Backup Unit is fitted). And, this may sound odd, but specifically for us the controller also needed to support SAS disks, because we wanted to redeploy some of our SAS hard-disks for good old times’ sake. A logical choice was the ARC1680IX-12. Not only had we already gained some experience with its smaller brother, the ARC1680IX-8, on a test server, but the recent firmware release 1.45 also put it nearly on par with the native SATA controllers of the same brand. A problem for us is that all current high-performance RAID controllers have been designed with traditional hard disks in mind, which means that the RAID controller often becomes the bottleneck. To overcome this bottleneck we used not one, but two ARC1680IX-12s in parallel. The setup we used is depicted in figure A.

fig A. RAID05 dual controller setup
dual_raid_setup.png

Server: the selected server motherboard is the Super Micro X7DWN+, which can hold up to 128GB of RAM and two Quad Core Xeon CPUs. We equipped the board with two X5460 CPUs (at that time the highest clocked Xeon, which comes in handy for CPU-bound calculations) and a mere 48GB of RAM. The server will act as a database server, so the more RAM the better: that way most of the data can be held in memory.

Casing: a couple of OS disks, legacy SAS hard-disks and 12 SSDs need a shelter. We chose the Super Micro SC846, a 4U server casing that nicely matches our two 1680IX-12 controllers and comes with a redundant power supply, which is truly enterprise-worthy.

Firmware, the hurdles

The first thing we did was install the controller cards, wire the backplane, insert an ordinary hard disk for the OS, install Debian on it and insert all SSDs into the cabinet in random order. Then the reboot, and… show time!

No, not yet: the event log of the Areca controllers showed many Time-Out errors. Even routine tasks like building RAID arrays seemed impossible, let alone creating a file system on the newly created block devices.

Research revealed that the hardware came with different firmware than we had used before. The 64GB SSDs were shipped with firmware 0.18R1H3, the same version we used in our test server, while the 32GB SSDs came with 0.19R1, which was new to us. The drives with firmware 0.18R1H3 did indeed work properly. So we first asked our supplier to send us the newer firmware 0.19R1H2 available at that time (sometimes newer is better). We flashed the disks to 0.19R1H2, but still no good: a lot of Time-Out errors appeared in the event log again. Then we figured that sometimes older is better and asked MTRON whether it was possible to downgrade the firmware; luckily it was. We were provided with the DOWN.EXE executable and the proper MTRON.MFI, and we flashed the SSDs back to 0.18R1H3. Bingo… everything seemed to work fine now. We recently tried to upgrade some SSDs to firmware 0.20R1, but that didn’t work either.

MTRON is not the only one releasing firmware on a regular basis; so does Areca. We therefore upgraded the ARC1680IX-12s to the 1.46 firmware released on 23-01-2009 and performed the whole ritual again, flashing the SSDs to the latest version 0.20R1. At first glance no Time-Out errors were noticed after the flash; however, building RAID arrays slowed down nearly ten times, and a file system created on the resulting block device performed over 20 times worse than expected. We are back on 0.18R1H3 again. Areca and Mtron are currently investigating the cause of these problems in a joint effort. Some hurdles have been cleared, but we haven’t reached the finish yet; we will keep you posted. The latest news is that Areca was able to reproduce the problems and traced them to Intel’s IOP348 firmware. Of course Areca is unable to fix anything there, so it’s now Intel’s turn.

Motherboard, keeping the right PCI-E x8 lane

The motherboard comes with plenty of PCI-E x8 slots, and we installed the RAID controllers in the slots that would give the best airflow in the casing. Since we have two controllers, we created identical arrays on both (let’s call them “legs”). Tests showed a read performance difference of some 25% between those legs. Swapping the controller cards, and even the SSDs, showed that the cards were performing okay and the SSDs were not to blame either. Looking in the manual, we discovered that the PCI-E x8 slots we had used were driven by different chipsets. Our board comes with an Intel 631xESB/632xESB IO controller chip (south-bridge) and the 5400MCH (north-bridge), and the 631xESB/632xESB performs worse. Whenever you put a piece of hardware in your server, be aware of, or better yet test, which slot to use.

Put in figures: using the 5400MCH we were able to do multi-threaded sequential and random READs with a maximum total bandwidth of 1750MB/s, while the 631xESB/632xESB IO controller stopped at 1400MB/s. WRITING was not affected by the choice of slot, since it is bottlenecked by the disks themselves.
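The two slot figures are consistent with the difference between the legs mentioned above:

$$\frac{1750\ \text{MB/s}}{1400\ \text{MB/s}} = 1.25$$

so the slots driven by the 5400MCH delivered 25% more read bandwidth than those behind the 631xESB/632xESB.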

Benchmarking, the software as referee

Which benchmark software to use may be considered one of the most important decisions when deriving figures from a server. Ideally it would be software for which a lot of results are already available for comparison. We evaluated bonnie++, IOMeter, IOZone, bm-flash, Orion and a few more, each with its own pros and cons.

We eventually settled on bm-flash and IOMeter. bm-flash takes a fixed amount of time to finish and gives a quick impression of file system performance, while IOMeter with its default workload is widely used, so there is an abundance of figures for comparison. We dropped bonnie++ because under some conditions it only shows “++++” when no useful figures could be calculated. IOZone and Orion would have required a complete study to interpret.

In order to use IOMeter we had to drop Linux and install Windows Server 2008: Dynamo (the test client) running on Linux and the IOMeter GUI (manager) running on Windows are somehow incompatible and continuously caused crashes on our test server. Having escaped to Windows Server 2008, we were stuck with NTFS.

For benchmarking on Linux we chose the JFS file system with default settings, because previous tests showed that (in our case) it had the least CPU overhead and performed better than ext3. One can get totally lost in file system tuning, so we decided it was not feasible to tune many parameters at this time. Note that the mount options were set to minimize writes to the disks:

rw,noatime,nodiratime
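To make these options permanent they would typically go into /etc/fstab, for example as `/dev/vg_ssd/lv_db /ssd jfs rw,noatime,nodiratime 0 2` (device and volume names are hypothetical). The bash sketch below checks which options are actually in effect; `mount_opts` and `has_opts` are our own helpers that read a mount table in /proc/mounts format, and the `MOUNTS` variable only exists so the table can be swapped for testing:

```shell
#!/usr/bin/env bash
# Mount table in /proc/mounts format: device mountpoint fstype options dump pass.
MOUNTS=${MOUNTS:-/proc/mounts}

# Print the comma-separated options of the file system mounted at $1.
mount_opts() {
  awk -v m="$1" '$2 == m { print $4 }' "$MOUNTS"
}

# Succeed only if every option given after the mount point is in effect.
has_opts() {
  local mnt=$1 opt opts
  shift
  opts=",$(mount_opts "$mnt"),"
  for opt in "$@"; do
    case "$opts" in
      *",$opt,"*) ;;
      *) echo "missing option $opt on $mnt" >&2; return 1 ;;
    esac
  done
}

# Example (hypothetical mount point /ssd):
# has_opts /ssd rw noatime nodiratime && echo "mount options OK"
```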

Linux software striping, the odd and even lanes

In order to increase the bandwidth we wanted to use software striping, which as a side effect doubles the amount of RAM cache. We created two identical legs on the controllers, plugged them into equally performing lanes, and installed the mdadm and lvm2 packages for Debian (Debian Lenny, Linux 2.6.26-1-amd64). With both mdadm and lvm2 we configured striped block devices. Let the games begin…
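A minimal sketch of how two hardware RAID legs can be striped together with lvm2 (with mdadm as the alternative) and prepared for benchmarking. The device names /dev/sdb and /dev/sdc (one leg per controller), the volume group `vg_ssd` and the volume `lv_db` are hypothetical; the 128kB stripe size is taken from the test specification. To avoid accidents the script only prints the commands unless `DRY_RUN=0` is set:

```shell
#!/usr/bin/env bash
# Print commands instead of executing them unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

LEG1=/dev/sdb   # hypothetical: RAID leg on controller 1
LEG2=/dev/sdc   # hypothetical: RAID leg on controller 2

# lvm2: register both legs, group them, and stripe one volume over both.
# -i 2 = two stripes (one per leg), -I 128 = 128kB stripe size.
run pvcreate "$LEG1" "$LEG2"
run vgcreate vg_ssd "$LEG1" "$LEG2"
run lvcreate -i 2 -I 128 -l 100%FREE -n lv_db vg_ssd

# Read-ahead in 512-byte sectors (8192 = 4MB); we assume this is how the
# "READ AHEAD 8192" in the test specification was applied.
run blockdev --setra 8192 /dev/vg_ssd/lv_db

# JFS with the write-minimizing mount options used for the benchmarks.
run mkfs.jfs -q /dev/vg_ssd/lv_db
run mount -o rw,noatime,nodiratime /dev/vg_ssd/lv_db /ssd

# mdadm alternative (use one or the other, never both on the same legs):
# run mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=128 "$LEG1" "$LEG2"
```

Printing by default makes it easy to review the exact commands before touching real disks.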

Without bothering you with too many figures right now: we made an interesting finding. Block devices created by mdadm performed about 10% better than those created by lvm2. Write bandwidth is quite similar, but the number of IOPS and the read bandwidth are higher with mdadm. Searching the internet, we found that more people had complained about this performance difference, and that file system alignment and read-ahead settings should be set properly. But even when everything is properly configured, the difference remains. For example, we can do at most 65000 IOPS using mdadm but only 59000 IOPS with lvm2, and we can read (sequential, single-threaded) at most 1050MB/s using mdadm but only 980MB/s with lvm2. These measurements were obtained with reasonably fresh SSDs; later on performance degraded, and there was seemingly no way to get the original performance back. The mdadm/lvm2 performance difference can be attributed to the Linux kernel: mdadm uses the md kernel module while lvm2 uses the dm kernel module, and these are beyond our control. For reasons of system administration support we will not use mdadm but will go with lvm2.
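Expressed as ratios, the two examples above correspond to roughly 10% more IOPS and about 7% more sequential read bandwidth for mdadm:

$$\frac{65000}{59000} \approx 1.10, \qquad \frac{1050\ \text{MB/s}}{980\ \text{MB/s}} \approx 1.07$$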

Note: we recently retested on Debian Lenny, which is now stable. The maximum read bandwidth (sequential, single-threaded) for both mdadm and lvm2 stopped at 990MB/s. The difference in IOPS remained (65000 for mdadm vs. 60000 for lvm2), but was only noticeable for small block sizes (512B – 1kB) and is therefore of minor importance to us.

Hiccups, the file system’s full stop

During our test runs we sometimes noticed that the performance of the system was below expectations. A typical, well-behaved test run using bm-flash looks like this [RAID0, 12 SSDs attached to one controller]:

test:/benchmark# ./bm-flash /ssd/test.txt 

Filling 4G before testing  ...   4096 MB done in 3 seconds (1365 MB/sec).

Read Tests:

Block |   1 thread    |  10 threads   |  40 threads
 Size |  IOPS    BW   |  IOPS    BW   |  IOPS    BW
      |               |               |
 512B | 11307    5.5M | 63751   31.1M | 63312   30.9M
   1K | 22720   22.1M | 63717   62.2M | 63086   61.6M
   2K | 24788   48.4M | 63510  124.0M | 62734  122.5M
   4K | 29158  113.8M | 63088  246.4M | 62361  243.5M
   8K | 30170  235.7M | 61824  483.0M | 61089  477.2M
  16K | 26110  407.9M | 61337  958.3M | 60991  952.9M
  32K | 19936  623.0M | 51250 1601.5M | 50853 1589.1M
  64K | 13589  849.3M | 27716 1732.2M | 27717 1732.3M
 128K |  8273 1034.2M | 13877 1734.7M | 13881 1735.1M
 256K |  4828 1207.2M |  6942 1735.5M |  6947 1736.8M
 512K |  2563 1281.5M |  3472 1736.1M |  3476 1738.4M
   1M |  1447 1447.0M |  1736 1736.6M |  1740 1740.2M
   2M |   796 1592.3M |   866 1732.5M |   868 1737.1M
   4M |   407 1630.0M |   430 1720.7M |   437 1748.3M 

Write Tests:

Block |   1 thread    |  10 threads   |  40 threads
 Size |  IOPS    BW   |  IOPS    BW   |  IOPS    BW
      |               |               |
 512B | 28529   13.9M | 29048   14.1M | 28567   13.9M
   1K | 27759   27.1M | 28213   27.5M | 28053   27.3M
   2K | 27393   53.5M | 28036   54.7M | 27513   53.7M
   4K | 26289  102.6M | 25278   98.7M | 26446  103.3M
   8K | 23068  180.2M | 22138  172.9M | 21824  170.5M
  16K | 17426  272.2M | 16820  262.8M | 17404  271.9M
  32K | 10607  331.4M | 10988  343.3M | 11173  349.1M
  64K |  6208  388.0M |  6894  430.9M |  6941  433.8M
 128K |  3611  451.4M |  3895  486.9M |  4019  502.4M
 256K |  1999  499.9M |  2116  529.0M |  2060  515.2M
 512K |   834  417.4M |  1074  537.3M |  1024  512.2M
   1M |   412  412.6M |   596  596.6M |   561  561.6M
   2M |   280  561.3M |   240  480.0M |   207  414.7M
   4M |   153  615.1M |   137  551.5M |   119  478.7M
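The BW columns follow directly from the IOPS columns: bandwidth is IOPS times block size, and the numbers show that bm-flash's M suffix means MiB (2^20 bytes) per second. For example, for the 8K single-thread read row of the run above:

$$30170\ \tfrac{\text{IO}}{\text{s}} \times 8\ \text{KiB} = 241360\ \tfrac{\text{KiB}}{\text{s}} = \frac{241360}{1024}\ \tfrac{\text{MiB}}{\text{s}} \approx 235.7\ \tfrac{\text{MiB}}{\text{s}}$$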

However, with some other RAID configurations we noticed “gaps” in performance; the output would typically look like this [RAID5, 4 SSDs attached to one controller]:

test:/benchmark# ./bm-flash /ssd/test.txt 

Filling 4G before testing  ...   4096 MB done in 3 seconds (1365 MB/sec).

Read Tests:

Block |   1 thread    |  10 threads   |  40 threads
 Size |  IOPS    BW   |  IOPS    BW   |  IOPS    BW
      |               |               |
 512B |    24   12.2K | 28061   13.7M | 63593   31.0M
   1K | 22599   22.0M | 63882   62.3M | 63582   62.0M
   2K | 24644   48.1M | 63869  124.7M | 63275  123.5M
   4K | 29294  114.4M | 63519  248.1M | 62923  245.7M
   8K | 30188  235.8M | 62176  485.7M | 61474  480.2M
  16K | 26207  409.4M | 61542  961.5M | 61231  956.7M
  32K | 19944  623.2M | 51450 1607.8M | 50947 1592.1M
  64K | 13661  853.8M | 27717 1732.3M | 27718 1732.4M
 128K |  8356 1044.5M | 13878 1734.7M | 13881 1735.2M
 256K |  4756 1189.0M |  6942 1735.6M |  6947 1736.9M
 512K |  2563 1281.8M |  3473 1736.5M |  3476 1738.4M
   1M |  1383 1383.0M |  1737 1737.0M |  1739 1739.7M
   2M |   795 1590.1M |   852 1704.1M |   866 1733.0M
   4M |   408 1632.0M |   430 1720.3M |   436 1744.0M

Write Tests:

Block |   1 thread    |  10 threads   |  40 threads
 Size |  IOPS    BW   |  IOPS    BW   |  IOPS    BW
      |               |               |
 512B |  5642    2.7M |  7221    3.5M |  9223    4.5M
   1K |  7851    7.6M |  5218    5.0M |  5815    5.6M
   2K |  4142    8.0M |  4357    8.5M |  3976    7.7M
   4K |  3314   12.9M |  3379   13.2M |  2890   11.2M
   8K |  2251   17.5M |  2018   15.7M |  2500   19.5M
  16K |  1980   30.9M |  2659   41.5M |  3499   54.6M
  32K |  1303   40.7M |  1217   38.0M |  1342   41.9M
  64K |   240   15.0M |   144    9.0M |   126    7.9M
 128K |   499   62.3M |    96   12.0M |   110   13.8M
 256K |   309   77.2M |    13    3.2M
 512K |     6    3.4M
   1M |   154  154.0M |    10   10.5M
   2M |    80  160.3M |    99  198.7M |    99  199.1M
   4M |    42  170.7M |    54  218.7M |    57  230.0M

Notice the collapse in write performance that starts at the 64kB blocks.

Tests with IOMeter showed that under certain conditions the whole controller became unresponsive for several seconds; we even measured file system freezes in the order of 10 seconds (both reading and writing). We are still puzzling over what causes these freezes, but we assume that after heavy writing, once the controller’s cache is completely filled up, the controller first writes some cached data to disk and only then resumes normal operation. We have asked Areca what the intended behavior is in this situation. We will keep you posted.

Figures, the benchmark results

Those interested in hard figures are probably looking at the graphs already. We deliberately do not show all the test results; where possible we graphed them to give an overview of SSD performance in different RAID configurations.

In short: we have 16 SSDs and two controllers that support a maximum of 12 SSDs each. We tested RAID0 and RAID5 but skipped RAID6 and RAID10 arrays, because write performance on these was rather poor. We repeatedly tested 1 to 12 SSDs in RAID0 and 3 to 12 SSDs in RAID5 configuration. For lvm2 striped block devices (software RAID0 over the two controllers) we repeatedly tested 2×4 to 2×8 SSDs in both RAID0 and RAID5 configurations.

The 8kB block-size figures are emphasized because our database (Postgresql) is compiled to work with an 8kB block size.

Sequential read, write and file copy versus number of SSDs and RAID configuration

dd if=/dev/zero of=/ssd/file.txt bs=8K count=5M   # write a 40GB file
dd if=/ssd/file.txt of=/dev/null bs=8K            # read the 40GB file
cp /ssd/file.txt /ssd/copy-of-file.txt            # copy the 40GB file
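The file size follows from GNU dd's binary suffixes (K = 1024, M = 1024 × 1024), so the write command produces:

$$8\ \text{KiB} \times 5\ \text{Mi} = 2^{13} \times 5 \cdot 2^{20}\ \text{bytes} = 5 \cdot 2^{33}\ \text{bytes} = 40\ \text{GiB} \approx 42.9\ \text{GB}$$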

fig 1. sequential read, write bandwidth [MB/s] v.s. SSDs and RAID configuration
8kB block-size

sequential-8k-1-thread.png

fig 2. 40GB file copy [seconds] v.s. SSDs and RAID configuration (lower is better)
40gb-file-copy.png
Conclusion: starting from 6 SSDs the controller’s read bandwidth saturates, while writing may benefit from adding more SSDs. lvm2 software RAID0 over two controllers with the same total number of SSDs (8, 10, 12) always outperforms a single controller.

Random read, write bandwidth versus number of SSDs and RAID configuration

bm-flash /ssd/file.txt [random read, write and IOPS v.s block-size and threads]

fig 3. random read, write bandwidth in [MB/s] v.s. SSDs and RAID configuration
8kB block-size, 1 thread

8k-random-1-thread.png

fig 4. random read, write bandwidth in [MB/s] v.s. SSDs and RAID configuration
8kB block-size, 10 threads

8k-random-10-threads.png

fig 5. random read, write bandwidth in [MB/s] v.s. SSDs and RAID configuration
8kB block-size, 40 threads

8k-random-40-threads.png

Conclusion: bandwidth seems to be limited by the number of IO requests that can be spawned by a single thread (process). Scaling up the number of threads from 10 to 40 hardly influences the total bandwidth. Using lvm2 software RAID0 has a slight performance penalty on random read.
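A back-of-the-envelope check of this conclusion uses Little's law, N = λ·W (requests in flight = throughput × time per request). Assuming that a single bm-flash thread keeps exactly one request in flight (our assumption about how the tool works), the 8K single-thread read of the RAID0 run with 12 SSDs listed earlier implies a per-request time of

$$W = \frac{N}{\lambda} = \frac{1}{30170\ \text{s}^{-1}} \approx 33\ \mu\text{s}$$

With 10 threads the same run reaches 61824 IOPS, so W = 10 / 61824 s ≈ 162 µs: once several requests are in flight, throughput stops scaling because another ceiling, around 63000 IOPS in those runs, has been reached.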

Random read, write bandwidth and IOPS versus block size and RAID configuration

bm-flash /ssd/file.txt [random read, write and IOPS v.s block-size and threads]

Note: since we concluded that 10 or 40 threads are not making much of a difference we only show the 1 and 10 thread random read, write bandwidth and IOPS graphs for readability.

RAID 0

fig 6. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID0 4 SSDs

bw-random-raid0-4.png

fig 7. random IOps v.s. block-size and threads
RAID0 4 SSDs

iops-random-raid0-4.png

fig 8. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID0 8 SSDs

bw-random-raid0-8.png

fig 9. random IOps v.s. block-size and threads
RAID0 8 SSDs

iops-random-raid0-8.png

fig 10. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID0 12 SSDs

bw-random-raid0-12.png

fig 11. random IOps v.s. block-size and threads
RAID0 12 SSDs

iops-random-raid0-12.png

RAID 5

fig 12. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID5 4 SSDs

bw-random-raid5-4.png

fig 13. random IOps v.s. block-size and threads
RAID5 4 SSDs

iops-random-raid5-4.png

fig 14. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID5 8 SSDs

bw-random-raid5-8.png

fig 15. random IOps v.s. block-size and threads
RAID5 8 SSDs

iops-random-raid5-8.png

fig 16. random read, write bandwidth in [MB/s] v.s. block-size and threads
RAID5 12 SSDs

bw-random-raid5-12.png

fig 17. random IOps v.s. block-size and threads
RAID5 12 SSDs

iops-random-raid5-12.png

RAID00 lvm2

fig 18. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x4xRAID0 (8 SSDs total)

bw-random-raid00-8.png

fig 19. random IOps v.s. block-size and threads
2x4xRAID0 (8 SSDs total)

iops-random-raid00-8.png

fig 20. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x6xRAID0 (12 SSDs total)

bw-random-raid00-12.png

fig 21. random IOps v.s. block-size and threads
2x6xRAID0 (12 SSDs total)

iops-random-raid00-12.png

fig 22. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x8xRAID0 (16 SSDs total)

bw-random-raid00-16.png

fig 23. random IOps v.s. block-size and threads
2x8xRAID0 (16 SSDs total)

iops-random-raid00-16.png

As we can see in figures 18, 20 and 22, starting from a 256kB block size we finally reach our claimed 3.3 GB/sec. Although reading with 40 threads generally didn’t influence the results much, in this case we reached the 3.3 GB/sec consistently at a smaller block size (128kB) using 40 threads. Figure 22 shows the results we obtained with 16 SSDs over 2 RAID controllers. However, exactly the same read performance was also observed with 8, 10, 12 and 14 SSDs over 2 RAID controllers, meaning that read performance saturates quite rapidly and adding SSDs beyond a certain threshold doesn’t further improve read performance.

RAID05 lvm2

fig 24. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x4xRAID5 (8 SSDs total)

bw-random-raid05-81.png

fig 25. random IOps v.s. block-size and threads
2x4xRAID5 (8 SSDs total)

iops-random-raid05-8.png

fig 26. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x6xRAID5 (12 SSDs total)

bw-random-raid05-12.png

fig 27. random IOps v.s. block-size and threads
2x6xRAID5 (12 SSDs total)

iops-random-raid05-12.png

fig 28. random read, write bandwidth in [MB/s] v.s. block-size and threads
2x8xRAID5 (16 SSDs total)

bw-random-raid05-16.png

fig 29. random IOps v.s. block-size and threads
2x8xRAID5 (16 SSDs total)

iops-random-raid05-161.png

Figures 24 to 29 clearly show the awkward ‘gaps’ that we encountered multiple times during testing. The gaps shown here are relatively mild (we saw much worse ones). As of now we have no definitive explanation; we suspect compatibility problems between the Mtron SATA disks and the Areca SATA implementation (via the Intel IOP348 SATA stack).

RAID 5 – closer look at write performance

fig 30. random write only bandwidth in [MB/s] v.s. block-size and number of disks

bw-random-raid5-multiple.png

Figure 30 zooms in on the write performance we measured for several RAID 5 configurations, using 10 threads. The general trend is clearly that more disks improve write performance, a trend that was not as clearly visible for read performance. However, the graph in figure 30 is also a cause for great concern. There are a couple of awkward gaps, and the 2×6 configuration in particular is problematic: its first few and last few measurement points are exactly on par with the 2×8 configuration, but in the entire middle section all performance is gone.

Intended production setup: lvm2 – using 2 x RAID5 – 5 + 1 hot spare SSDs

In the tests above we found that software striping outperformed any configuration on a single RAID controller. We therefore intend to configure a software RAID0 setup using lvm2 for the performance improvement. On each controller we chose a hardware RAID5 setup of 5 SSDs plus 1 hot spare, which gives much better performance than RAID6 while still offering redundancy. If one SSD fails, the RAID will rebuild itself using the hot spare in about 15 minutes. During this time window we are vulnerable to data loss, but that is a rather small and acceptable risk.
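Some quick arithmetic for this layout, assuming the nominal 32GB of the MSP7535-032 drives from the test specification: each leg has 6 SSDs, of which 5 form the RAID5 set (one drive's worth of capacity goes to parity) and 1 is the hot spare, so the usable space of the striped volume is

$$2 \times (5 - 1) \times 32\ \text{GB} = 256\ \text{GB}$$

If a rebuild rewrites the entire 32GB spare (our assumption), finishing in about 15 minutes corresponds to roughly

$$\frac{32\,000\ \text{MB}}{15 \times 60\ \text{s}} \approx 36\ \text{MB/s}$$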

The performance measured for this setup sits somewhere between that shown in figures 24 and 26 (bandwidth) and in figures 25 and 27 (IOPS), so we will spare you another graph. What about sequential read and write performance? We measured this using a simple dd command on a big file (69GB), to rule out the mere 8GB of controller RAM cache, and on a small file (4.3GB) that only hits the cache. In practice, inserts and updates on the database will only hit the RAM cache, while reads from the database will be served by a mixture of the 48GB of OS cache, the 8GB of on-board cache on the Arecas and finally the SSDs.

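The big file (69GB):
# block size 512B x 2^i, count 16k x 2^(13-i): every pass writes 2^36 bytes (64GiB, about 69GB)
for i in $(seq 0 13); do
  dd if=/dev/zero of=/ssd/file.txt bs=$((512 << i)) count=$((16384 << (13 - i)))
  dd if=/ssd/file.txt of=/dev/null bs=$((512 << i))
done

The small file (4.3GB):
# same block sizes, count 1k x 2^(13-i): every pass writes 2^32 bytes (4GiB, about 4.3GB)
for i in $(seq 0 13); do
  dd if=/dev/zero of=/ssd/file.txt bs=$((512 << i)) count=$((1024 << (13 - i)))
  dd if=/ssd/file.txt of=/dev/null bs=$((512 << i))
done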

fig 31. sequential read, write in [MB/s] v.s. block-size
bw-sustained-raid5-10.png

IOMeter

Coming soon

PostgreSQL performance

Coming soon

Test specification
Server; SuperMicro X7DWN+, 2 x X5460, 48GB RAM, 2 x ARC1680IX-12, 12 x MSP7535-032.
RAID controller ARC1680IX-12; 4GB DDR2 533MHz ECC, Firmware 1.46 23-01-2009, SAS 4.4.3.0, HDD Read Ahead = Auto, Cache = Enabled, Disk Write Cache Mode = Auto
RAID configuration; 2x RAID5 5 + 1 hot spare SSDs per controller – 128kB stripe size, Tagged Queuing = Enabled, Cache Mode = Write Back
SSD MTRON MSP 7535-032; Firmware 0.18R1H3
OS: Linux 2.6.26-1-amd64 [Debian Lenny]
Software RAID configuration: lvm2 with two RAID controllers – 128kB stripe size – READ AHEAD 8192

Conclusion

SSDs perform great, especially for database servers where lots of concurrent read and write operations are carried out. Tests show an overall performance improvement of ten times for our database server, but a general figure cannot be given: it all depends on your file system usage.

When used wisely, SSDs are not much more expensive than traditional hard-disks (in the future probably cheaper, because you need fewer SSDs to outperform hard-disks), and they consume less energy.

However, because SSDs are a rather new technology, a lot of testing is required. It thus takes some time before one can actually go into production, and until we have solved, or at least understand, where the file system hiccups originate, we will not go live with SSDs.


This benchmark was brought to you by Dennis Brouwer and Arjan Tijms of JDevelopment, an M4N team.


Jboss AS 5 GA released!

5 December 2008, by arjan

Today is a historic day for Java: Jboss AS 5, one of the leading implementations of Java EE 5, has *finally* been released. It was originally planned for early 2007 or so, and its release had been promised almost every quarter since.

But today, no more speculation and no guessing. It’s here, and this time it’s for real! Read all about this fabulous release here: http://www.jboss.com/index.html?module=bb&op=viewtopic&t=146773

Download it from here: http://www.jboss.org/jbossas/downloads/

With Jboss officially implementing the Java EE 5 spec now, a new baseline for Java has been set. From now on, technologies like JSF 1.2, EJB3 and JPA can really be considered basic, standard available technologies.

Congrats to the Jboss team, and hoping it won’t take them as long to release Java EE 6 ;)



Jboss AS 5 GA release date

24 November 2008, by arjan

People have been wondering for some time about the release date of Jboss AS 5. As many of you know, Jboss 5 has been in development for quite some time and its release has been highly anticipated. Arguably, Jboss is one of the most important, if not -the- most important, Java EE implementations. The fact that its official Java EE 5 release has been overdue for so long has been painful. Of course, Jboss 4.2.x supports most of the Java EE 5 stuff, but it’s just not the Real Thing.

The importance of Jboss AS is due to a number of reasons. Of course there are other Java EE 5 implementations out there, but the number of offerings seems to be declining. On the commercial side of the fence there’s basically IBM’s Websphere and Oracle’s OC4J, and on the open source side there’s Glassfish, Geronimo and Jboss AS 5. For many people, a closed source, commercial Java EE implementation is not very attractive. This leaves us with basically only 3 options for the moment. Geronimo may be nice, but nobody seems to be using it. Glassfish should maybe be the default choice (after all, the Sun Java SE implementation is typically the default choice for many too), but it has two main problems:

  1. It can’t be configured.
  2. It doesn’t offer any services.

This basically leaves Jboss AS 5 as the only choice, if it weren’t for the fact that it was still in beta or RC. In a way, this left a kind of void in the Java EE world (unless of course configuration and services don’t matter to you, in which case I suppose Glassfish would be perfectly fine).

Previously, Jboss AS 5 had been announced for February 2008 (see Where is JBossAS GA 5), but this turned out to be too optimistic. A major milestone was reached when Jboss AS 5 RC2 was officially EE 5 certified. A careful guess for the GA release date was then made for early November 2008 (see JBoss AS is now EE5 certified!).

Today however, a definite date has been given by Project Lead Dimitris Andreadis:

December 2008!

(See Jboss 5 release date?)



Building the new battleship Mtron

20 November 2008, by arjan

A while back I stumbled upon the legendary article Battleship Mtron, about the absurdly fast RAID array built with 9 Mtron SSDs on a blazingly fast Areca ARC-1231ML, sporting an amazing 800 MHz Intel IOP341.

It was the fastest thing on the planet. Period.

A year has passed since then. At M4N we have been experimenting with an SSD setup consisting of 4 Mtron 7000 SSDs on a development server. After some extensive benchmarking, it appeared that in nearly all situations the IO power of these beasts is far superior to that of the traditional hard disk. A decision was made to order a batch of SSDs to be placed in multiple servers. 12 of those arrived today, along with 2 Areca ARC-1680IX-12s, each equipped with a whopping 4 GB of cache and the fast 1.2 GHz IOP348.

Seeing all that hardware together, however, made us think. What if we… assembled it all into -1- massive storage array? 12 Mtron 7500s on 2 Areca 1680s (6 per controller), combining the power of the RAID sets of both Arecas into 1 single volume using software RAID. What would the performance of that be?
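The post does not say which software RAID level was planned for joining the two controllers’ RAID sets. Assuming plain striping (RAID 0), the hypothetical sketch below shows how a logical offset on the combined volume could be mapped onto the two underlying sets; the 64 KiB chunk size is an illustrative value, not one from the post.

```java
/**
 * Hypothetical sketch of RAID 0 striping: how a software RAID volume built on top of
 * two hardware RAID sets could spread a logical byte offset over its members.
 */
public final class Raid0Mapper {
    private final int members;     // underlying volumes, e.g. one RAID set per Areca controller
    private final long chunkBytes; // stripe chunk size; 64 KiB is an assumed example value

    public Raid0Mapper(int members, long chunkBytes) {
        if (members < 1 || chunkBytes < 1) {
            throw new IllegalArgumentException("members and chunkBytes must be positive");
        }
        this.members = members;
        this.chunkBytes = chunkBytes;
    }

    /** Returns {memberIndex, offsetOnThatMember} for a logical byte offset on the combined volume. */
    public long[] map(long logicalOffset) {
        long chunk = logicalOffset / chunkBytes;  // index of the chunk on the combined volume
        long within = logicalOffset % chunkBytes; // position inside that chunk
        long member = chunk % members;            // consecutive chunks alternate between members
        long chunkOnMember = chunk / members;     // number of chunks preceding it on that member
        return new long[] {member, chunkOnMember * chunkBytes + within};
    }

    public static void main(String[] args) {
        Raid0Mapper volume = new Raid0Mapper(2, 64 * 1024);
        long[] offsets = {0, 64 * 1024, 128 * 1024 + 10, 3 * 64 * 1024};
        for (long offset : offsets) {
            long[] target = volume.map(offset);
            System.out.println(offset + " -> member " + target[0] + ", offset " + target[1]);
        }
    }
}
```

Consecutive chunks alternate between the two members, so a large sequential transfer keeps both controllers busy at once, which is where any combined speed-up would come from.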

Stay tuned for our upcoming benchmark reports!

12 Mtrons arrived at the office.

Arjan


How we won an iPhone dev competition without any prior knowledge

1 September 2008, by arjan

About 2 weeks ago one of my team members, Robin Eggenkamp, mentioned there would be an iPhone dev ‘conference’ this month (iPhone Dev Camp, Amsterdam), originally to be held in the building directly opposite our own office. Since I’m always interested in anything related to development, I agreed to tag along. I expected it to be the kind of event where some talks are organized along with a small hands-on lab in which an expert developer teaches newbies how to get up and running.

Now I’m certainly not a newbie when it comes to development. I lead a team of Java developers working on a rather large (200,000+ LOC) Java EE enterprise application, and in a past life I worked as a Win32/MFC C++ developer. Somewhere in between I also managed to finish a master’s degree in CS. But… the iPhone was new to me. Although I’ve owned Apple computers ever since System 6 was still new & shiny, I’d never touched Xcode, Objective-C or Cocoa before. The closest I ever came was firing up Project Builder on my G3 iMac running OS X 10.2, but only to test some C++ routine.

Because of the hands-on part, I did plan to at least read an Objective-C tutorial the night before the event, but unfortunately couldn’t find the time to do so. When we arrived at the scene at exactly 10:00 in the morning, the place was already rather full. We found a cozy spot in the back, and while Robin started to connect his MacBook, I looked around and noticed it was not exactly what I thought it would be. Instead of an organized series of talks, this was a bunch of people sitting behind their computers, hacking away at stuff. The atmosphere seemed top notch though, and I had a quick chat with some of the other people. At around 11 there was a short introduction talk and it became clear that the intent was to code something up for the iPhone and demo it at 17:00. The best apps would win a prize, with the first prize being a speaker set and a copy of Adobe CS3 or so.

By total coincidence, only moments after connecting his MacBook to the network, Robin finally received an email saying he had been accepted into the iPhone developer program, something he had applied for a whole month before. That meant we could start with some real development now! Robin had a little bit of experience with Xcode, but had done barely more than deploy some hello world examples to the simulator and tinker a bit with the code. The fun thing about this was that normally, whenever I need to use a new technique for my regular enterprise development, I first get myself a book of at least 600 pages, read the first 200 pages, try out some basic concepts, read another 200 pages, try another aspect of the tech, etc., before I even attempt to apply it. Now we had to build something in a language neither of us knew, on a platform we didn’t know, with tools we didn’t know, and all of that in the course of a day :P

We started thinking about what kind of application we would try to create for the iPhone, and I suddenly got the idea of letting the iPhone connect to a Mac and using data from its accelerometer to move the Mac’s mouse pointer. I started by formulating some simple milestones to reach that goal:

  • Create a local Mac app that moves the mouse programmatically.
  • Create a local iPhone app that just prints accelerometer values to the screen.
  • Set up a connection from the iPhone to the Mac that just sends “hello”. Let the Mac print this.
  • Integrate the individual steps to become the app we actually want. I assumed we would need some time to calibrate the raw accelerometer values and to find a suitable mapping from the meter’s range to the pixels on the Mac’s screen.

Meanwhile Robin was attempting to deploy his hello world example app to his iPhone using his newly obtained certificate. It should have been a trivial thing, but after each deployment attempt a message box was displayed saying something like “0xE8000001, your mobile device has encountered an unexpected error (0x…) during the install phase: verifying application”. We tried many things, but nothing seemed to work. While we were feverishly googling for a solution, precious time ticked away. It must have been somewhere around 13:00 when Robin finally found out which project settings needed to be adjusted, and how. The example hello world app deployed correctly to the iPhone and it worked! Looking at the clock, we realized we only had about 4 hours to go and we hadn’t written a single line of code ourselves yet…

Milestone 1 – The local Mac app – moving the mouse

The initial plan was to build the Mac app in Cocoa, but we decided that using Java would be the fastest way for us, simply because we know the language and environment. This milestone was easily completed: using the java.awt.Robot class, moving the Mac’s mouse pointer was a breeze.
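A minimal sketch of what such a Mac-side mover could look like, assuming a desktop session is available. Robot.mouseMove(x, y) takes absolute screen coordinates; the class name and the straight-line path helper are hypothetical and only serve to make the movement visible.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Robot;
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: move the local mouse pointer with java.awt.Robot. */
public final class MouseMoverDemo {
    /** Points along a straight line from (startX, startY), moving (dx, dy) per step. */
    static List<Point> path(int startX, int startY, int dx, int dy, int steps) {
        List<Point> points = new ArrayList<>();
        for (int i = 0; i <= steps; i++) {
            points.add(new Point(startX + i * dx, startY + i * dy));
        }
        return points;
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            // Robot needs a real display; on a headless machine its constructor throws AWTException.
            System.out.println("No display available; not moving the pointer.");
            return;
        }
        Robot robot = new Robot();
        for (Point p : path(100, 100, 20, 10, 20)) {
            robot.mouseMove(p.x, p.y); // absolute screen coordinates
            robot.delay(50);           // pause 50 ms so the movement is visible
        }
    }
}
```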

Milestone 2 – The local iPhone app – printing accelerometer values

For this milestone we couldn’t shy away from Objective-C anymore and actually had to take the plunge. We first looked up an example for getting data from the accelerometer and luckily Apple had provided one. The next thing was to build a simple app, barely more than a hello world, that prints these values to the screen. This proved to be a little harder. Objective-C sometimes looks like Java and sometimes doesn’t. What are those square brackets everywhere? They looked like a kind of method call, but I couldn’t really figure out the meaning of the square brackets themselves. And how were we supposed to define properties so we could take advantage of the injection features of Interface Builder? Using @property seemed obvious to us, but the compiler kept generating tons of warnings and errors. And how do we organize our code? We had created an AppDelegate, which we connected in Interface Builder to a mainView that inherited from the Window class. We added two labels that we injected into this view class, deployed our app to the iPhone, and… nothing happened. After feeling a little silly, we actually took a moment to read some documentation. We learned that the square brackets have no extra special meaning; they are simply the Objective-C syntax for a method call. A @property declaration in the header needed to be accompanied by another directive in the implementation file, @synthesize, which is there to actually generate the getter and setter. Also, when creating a new project Xcode had already created an AppDelegate for us, something we had overlooked.

With this new insight we ‘almost’ got our first real code completely working, but a few small things still didn’t go as planned. We therefore decided to throw it all away and change our strategy: start with an existing iPhone example application, throw away what we don’t need and add what we do need. Going that route would save us from dealing with some of the nitty-gritty.

It was 14:00 by then and lunch had started. We enjoyed our nice free lunch and chatted again with the other guys. It seemed that we were hopelessly behind, since we still didn’t really have anything. After lunch things started to improve, though. Having some idea of the Objective-C syntax now and using some of my almost forgotten C knowledge, we were quickly able to adapt an existing app to just print the 3 accelerometer values (x, y, z) to 3 separate labels. Check!

Milestone 3 – Connecting iPhone and Mac

Since we had wasted a tremendous amount of time on the deployment and the second milestone, we had only a little time remaining. My original plan was to have one thread on the Mac listening for incoming communication, fetching commands and dispatching these to a (blocking) queue, which would be read by another thread that controls the mouse movement. For the communication we wanted to dig through the iPhone API a little to see what it had to offer. With only 2 hours remaining, however, we decided to use the most basic communication method available: a simple BSD socket. On the Mac side we used a simple ServerSocket in Java, and on the iPhone side we used the low-level C socket()/connect() functions, for which we found a basic snippet of code that needed only a few adjustments. Although it was absolutely not the best technical solution, we decided to create and close a connection for each message sent.
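The post does not show the Mac-side code. The sketch below combines the “original plan” (a listener thread feeding a blocking queue that a second thread drains) with the shortcut actually taken (one connection per message). The port number 5555 and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: one connection per message, handed off to a consumer via a queue. */
public final class MessageListener {
    /** Accepts up to maxMessages connections and queues the single line each one carries. */
    static void listen(ServerSocket server, BlockingQueue<String> queue, int maxMessages)
            throws IOException, InterruptedException {
        for (int i = 0; i < maxMessages; i++) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII))) {
                String line = in.readLine();        // also returns text not ended by a newline
                if (line != null) {
                    queue.put(line);                // hand off to the mouse-control thread
                }
            }                                       // connection closed after each message
        }
    }

    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 5555; // hypothetical default port
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();      // blocks until the listener enqueues something
                    System.out.println("received: " + msg); // a real app would move the mouse here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        try (ServerSocket server = new ServerSocket(port)) {
            listen(server, queue, Integer.MAX_VALUE);
        }
    }
}
```

Because each message arrives on its own connection, the listener simply reads until the client closes the socket; BufferedReader.readLine() returns the text even when the iPhone side sends no trailing newline. The queue keeps the network thread and the mouse thread independent of each other.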

Sending a basic test string from the iPhone to the Mac worked perfectly, and a little later we were able to send the accelerometer values to the Mac as well. Check!

Milestone 4 – Integration

We had all the separate components up and running and now only needed to integrate them. The accelerometer’s values appeared to be in the range of -3 to 3 for all axes, while Robin’s Mac had a 1280x800 resolution. When completely at rest, there was a certain noise margin in the values we got from the accelerometer, so we expected that a little calibration would be required. As a first test, though, we simply multiplied the values we got by 60 and added the result to the current mouse position. Surprisingly, this already gave fairly good results: the multiplication and the rounding down to whole pixels cancelled out the noise perfectly. Within a few minutes we ended up with a really simple mapping that was just something like max(0,min(forceX * 15,1280)) for the movement on the X-axis. Sending about 15 messages per second appeared to be enough for smooth motion.
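The post quotes the final mapping only approximately, as max(0,min(forceX * 15,1280)). The sketch below assumes, as in the first experiment, that the scaled value is added to the current pointer position each time a message arrives, and it clamps to 1279 rather than 1280 because pixel coordinates on a 1280-pixel-wide screen run from 0 to 1279. Class and method names, and the sample readings, are hypothetical.

```java
/**
 * Hypothetical sketch of the tilt-to-pointer mapping: the scaled accelerometer value is
 * treated as a per-message delta and the result is clamped to the screen.
 */
public final class TiltMapper {
    static final double SCALE = 15.0; // pixels per unit of force, the factor quoted in the post

    /** New pointer coordinate along one axis, clamped to [0, screenSize - 1]. */
    static int next(int current, double force, int screenSize) {
        // The (int) cast truncates toward zero, so any |force| < 1/15 yields no movement;
        // this is one plausible reason the small resting noise disappeared.
        int delta = (int) (force * SCALE);
        return Math.max(0, Math.min(current + delta, screenSize - 1));
    }

    public static void main(String[] args) throws InterruptedException {
        int x = 640;
        double[] samples = {0.03, -0.04, 1.0, 2.5, 3.0}; // made-up readings, not measured data
        for (double fx : samples) {
            x = next(x, fx, 1280);
            System.out.println("force " + fx + " -> x " + x);
            Thread.sleep(1000 / 15); // roughly 15 updates per second, as mentioned in the post
        }
    }
}
```

The same function can be applied to the Y-axis with the screen height (800 on Robin’s Mac), after which the resulting coordinates would be passed to Robot.mouseMove.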

By now we suddenly had some time left, so we used it to implement the ability to also do a mouse click. Our initial approach was to send a separate message for a mouse click, but it turned out to be more robust to just add the mouse button state as a fourth parameter to the existing message. At the very last moment there was a little panic when all messages being sent appeared to be empty. Apparently, our string formatting syntax for a boolean wasn’t supported by Objective-C (we used something like “%d,%d,%d,%b”), or maybe there was a difference between a primitive boolean and an object boolean. We decided not to pursue the issue and simply used the strings “false” and “true” (something I normally always stay far away from, but with 30 seconds remaining on the clock there wasn’t much choice). Since we had been fumbling with the code for most of the day, we figured that our chances of winning anything were rather slim. Nevertheless, we were happy that we had come up with something that worked, and actually worked rather nicely.
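A hypothetical Mac-side parser for the four-field message described above, assuming the format is the three accelerometer values followed by the literal text true or false, separated by commas. It rejects anything else instead of guessing, which would have made the empty-message problem obvious right away.

```java
/** Hypothetical parser for the "x,y,z,button" text messages described in the post. */
public final class TiltMessage {
    final double x;
    final double y;
    final double z;
    final boolean buttonDown;

    TiltMessage(double x, double y, double z, boolean buttonDown) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.buttonDown = buttonDown;
    }

    static TiltMessage parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 4) {
            // An empty message ends up here instead of being silently read as "no movement".
            throw new IllegalArgumentException("expected 4 comma-separated fields: \"" + line + "\"");
        }
        return new TiltMessage(
                Double.parseDouble(parts[0].trim()),   // also accepts integers such as "12"
                Double.parseDouble(parts[1].trim()),
                Double.parseDouble(parts[2].trim()),
                parseStrictBoolean(parts[3].trim()));
    }

    /** Accepts only "true" or "false"; Boolean.parseBoolean would map any other text to false. */
    private static boolean parseStrictBoolean(String s) {
        if (s.equals("true")) {
            return true;
        }
        if (s.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException("expected true or false, got \"" + s + "\"");
    }

    public static void main(String[] args) {
        TiltMessage m = parse("0.5,-1.25,0.98,true");
        System.out.println(m.x + " " + m.y + " " + m.z + " " + m.buttonDown);
    }
}
```

On the Mac, a change of buttonDown from false to true could then be turned into robot.mousePress(InputEvent.BUTTON1_DOWN_MASK), and a change back into robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK).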

The demo

It was now time for all of us to demo our applications. Among others, there was a tip-of-the-day app, an app that retrieved quotes from the Internet, and a very cool-looking game where you had to touch the screen to create a kind of bubble off which a moving object bounced to the other side, complete with sound effects and all. There was also a very impressive-looking application that measured your air time when skiing, in addition to your speed, path and direction. Unfortunately, this last app turned out to be only concept art.

When it was our turn to demo, I explained the technical shortcuts we had taken, while Robin demonstrated how to use the iPhone to paint a running man in a painting application on the Mac.

Showing the demo (picture by tizzle, see Flickr).

Much to our surprise, our application was well received and we won first prize: a nice speaker set for the iPhone or iPod. We’ll install it in the office :)

Arjan Tijms



JSF 2.0, a glance

27 August 2008, by jasper.floor

Having recently been thrown into the deep waters of JSF and Facelets, I thought it natural to evaluate where this technology is going. JSF is currently at version 1.2. Version 2.0 is scheduled for release with Java EE 6. So what goodies will it bring us?

The main goal of the revision is to make development easier by integrating into the core system many of the tools and frameworks that were built on top of previous JSF versions. Ajax support, Facelets-like templating, improved development support and better performance are all mentioned in the specification.

These are the highlights of what is coming:

I’m sure everyone is familiar with Ajax. JSF 2.0 includes Ajax in its life cycle and offers more direct support for its use. The view state can now be partially updated to reflect that only part of the view has changed. Ajax will make use of the new resource handler scheme. This isn’t very exciting in and of itself; it is a natural progression in the way one would expect this technology to mature.

Facelets is another technology being incorporated into 2.0. Facelets allows us to define our pages in a different way by using XML and templating. The way to write these pages is very close to the way we already work, which makes it easy to learn. Templating also lets us write less cluttered pages and increases code reuse. The popularity of this approach and its obvious usefulness made it inevitable that it would be included in JSF 2.0. It is currently referred to as ‘page description language’ in the JSF specification. While the name is less exciting than Facelets, it is accurate, and there is no doubt that this is a welcome addition to JSF.

Development for JSF is getting some support through the ‘Project_Stage’ feature. This is no more or less than a context parameter. Many people are excited about this, but I don’t quite share the sentiment. In theory, this will allow actions to depend on where the project is in its life cycle: is it in production, unit test, system test or development? The options are limited, though probably sufficient. What I wonder is what it is doing in JSF. While there is a use for a setting like this (indeed, it seems common in many disparate projects), it seems to me to be a setting at a higher level than JSF. JSF should support it, but not claim it.
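In the JSF 2.0 API as eventually released, the context parameter is named javax.faces.PROJECT_STAGE and the stage is available through Application.getProjectStage(), defaulting to Production when the parameter is not set. The self-contained sketch below does not use JSF itself; it only mimics that lookup over a plain Map, which illustrates how little more than a context parameter is involved. The class name is hypothetical, and treating unrecognised values as Production is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch: resolving a project stage from servlet context parameters. */
public final class ProjectStageLookup {
    /** Mirrors the constant names of javax.faces.application.ProjectStage in JSF 2.0. */
    enum Stage { Development, UnitTest, SystemTest, Production }

    static final String PARAM_NAME = "javax.faces.PROJECT_STAGE";

    static Stage resolve(Map<String, String> contextParams) {
        String value = contextParams.get(PARAM_NAME);
        if (value == null) {
            return Stage.Production; // no parameter set: behave as production
        }
        try {
            return Stage.valueOf(value.trim());
        } catch (IllegalArgumentException unknown) {
            return Stage.Production; // assumption: unrecognised values also fall back to production
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap(PARAM_NAME, "Development");
        if (resolve(params) == Stage.Development) {
            System.out.println("Development stage: extra checks and verbose messages could be enabled.");
        }
    }
}
```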

Another interesting feature is the Resource Handler. Resources are anything included in a component that is required for the component to be rendered correctly to the user agent: think of images, JavaScript or CSS files. The Resource Handler enforces a structure on your resources, which in and of itself helps in organizing your project. This will also make life easier for component developers: they know exactly where to put or look for resources without having to know which application is using them. Locale and versioning are handled automatically. Another advantage is that replacing resources at runtime will be much cleaner; no restart of the system should be needed.
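This post predates the final specification. In JSF 2.0 as released, a web application’s resources live under /resources (and under META-INF/resources inside a JAR) following the pattern [localePrefix/][libraryName/][libraryVersion/]resourceName; the spec also allows an optional per-resource version, which this sketch leaves out. The hypothetical helper below only assembles such a path to show the structure; it is not part of any JSF API.

```java
/** Hypothetical sketch of the resource layout defined by the final JSF 2.0 specification. */
public final class ResourcePaths {
    /**
     * Builds /resources/[localePrefix/][libraryName/][libraryVersion/]resourceName.
     * Optional parts may be null or empty and are then skipped.
     */
    static String path(String localePrefix, String libraryName, String libraryVersion, String resourceName) {
        if (resourceName == null || resourceName.isEmpty()) {
            throw new IllegalArgumentException("resourceName is required");
        }
        StringBuilder sb = new StringBuilder("/resources");
        for (String part : new String[] {localePrefix, libraryName, libraryVersion, resourceName}) {
            if (part != null && !part.isEmpty()) {
                sb.append('/').append(part);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(path(null, null, null, "logo.png"));
        System.out.println(path("nl", "mylib", "1_0", "menu.js")); // hypothetical library name
    }
}
```

Because the locale and version are separate path segments, a handler can pick the highest library version or the right locale without the page author spelling out the full path.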

There are some things about resources I don’t like, however. First of all, the specification only specifies a structure for webapps. Implementations for other platforms are even invited to do whatever they want. That seems an unfortunate choice, since you really want a standard to be standard. Another is the relocatable resource: you can now tell certain resources where they are supposed to be rendered. Put the tag in the body, but tell it to be rendered in the head. I don’t understand the reasoning for this. Perhaps if it is based on some conditional logic elsewhere on the page… but quite frankly, your page is probably getting too complicated at that point. If something needs to be in the head (body, wherever), just put it there. I wonder whether I am missing some essential insight that will suddenly make me see what a great and useful feature this is.

One argument for the relocatable resource is that components may wish to write things in locations outside their direct knowledge. The head of an HTML document is an example. A component, possibly called by other components, could decide it wanted to change the head of a document. This is completely undesirable behavior. If the component is generic, it should never bother with anything outside its scope. If it isn’t generic, then its use is limited anyway and one could ask why it is even a component. Thinking specifically of style information, I would say that this should never be overridden at component level. Scripts can be included in the body and shouldn’t require anything beyond what exists in the component anyway. Apparently the community desires this functionality. To me it seems a harmful construct which should not be supported.

All in all, I think JSF 2.0 will be an improvement of the technology. It has listened to and learned from what the community has built on top of JSF, and incorporated the features that were used most. It also improves its internal workings, which is never a bad thing from a user perspective. While I may not yet understand all the choices made, it is obvious that the changes were well thought out and planned. Assuming the implementation is improved as well, JSF 2.0 will be pretty much what you want from a version 2.0.

Jasper Floor



Finding Java developers and programmers is quite an art

2 August 2008, by development

How do you find a good programmer? Programmers have different hobbies, like different kinds of music, have different opinions about politics, and so on. You can’t just walk into a society of programmers where you can pick the people with the right knowledge. So where do you find that good programmer?

One thing I do know: someone must have started programming early to really get a feel for it. You have to be able to think in zeros and ones, as it were. If you played with an Atari or a Commodore 64 as a kid, then you got an early start as a programmer.

Nowadays you are no longer forced to know what programming is. In the past, with BASIC, you really had no choice. Many technical developments have made that experience unnecessary. There are now all kinds of ways to get around real programming. The programmer we are looking for does not shy away from leaving those options aside and diving really deep into the code.

But past experience isn’t everything. You also need to be able to master new techniques quickly, and you need to read a lot to keep expanding your knowledge. And of course you need to know what you can use your computer for. That programmers are sometimes lazy is actually a good quality. I believe in the word automation! Let the computer do the work.

Back to the subject. Where are the developers we are looking for? If you are reading this, we may be a small step closer… Take a look at our Vacatures (job openings) page.

Kind regards,

Klaas


Java EE 6 progress page

28 June 2008, by development

On this page I will try to keep track of resources related to the upcoming release of Java EE 6.

Java EE 6 will be the next edition of the enterprise platform that powers quite a lot of (web) applications. Java EE itself consists of many sub-specifications, with JSF (web) and EJB (business) being major parts of it. New in this release will be Web Beans, a specification that integrates JSF and EJB more tightly than was possible before.

JSF 2.0

Despite some early criticism, JSF is becoming the default web layer technology in Java EE. In a way, JSF can be seen as a foundational technology upon which a very vibrant community is able to build exciting new solutions.

Main JSR: http://jcp.org/en/jsr/detail?id=314

Discussion:
http://www.theserverside.com/news/thread.tss?thread_id=49870

Ryan Lubke’s blog:

New features in JSF 2.0
Part 1: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature2
Part 2: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature5
Part 3: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature
Part 4: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature3
Part 5: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature1
Part 6: http://blogs.sun.com/rlubke/entry/jsf_2_0_new_feature4

Ed Burns’ blog:

JSF and AJAX
http://weblogs.java.net/blog/edburns/archive/2008/02/jsf_20_update.html

Easier JSF custom components
http://weblogs.java.net/blog/edburns/archive/2007/11/jsf_usage_and_j.html

JSF 2.0 presentation
http://weblogs.java.net/blog/edburns/archive/20070628-swiss-jsf-user-group-jsf-2_0.odp

New JSF 2.0 features discussion
http://weblogs.java.net/blog/edburns/archive/2007/05/jsf_20_eg_kick.html

EJB 3.1

EJB is the part of Java EE that provides solutions for managing artifacts related to business code, mainly transactions and domain entities. EJB started off in the wrong direction, and EJB 2 in particular has received a lot of criticism for being extremely cumbersome and heavyweight. EJB 3, however, is a complete redesign, using an ultra-light approach and a very elegant design. Some say that EJB 3 is actually version 1.0 of a completely new technology. In Java EE 6, few major new additions will be made to EJB. Instead, existing functionality will be tuned and polished.

Main JSR: http://jcp.org/en/jsr/detail?id=318

New features in EJB3.1 by Reza Rahman:

Part 1: http://www.theserverside.com/tt/articles/article.tss?l=NewFeaturesinEJB3-1
Part 2: http://www.theserverside.com/tt/articles/article.tss?l=NewFeaturesEJB31
Part 3: http://www.theserverside.com/tt/articles/article.tss?l=NewFeaturesEJB31-3
Part 4: http://www.theserverside.com/tt/articles/article.tss?l=NewFeaturesinEJB3-Part4

Feedback on Reza Rahman’s articles:

Part 1: http://www.theserverside.com/news/thread.tss?thread_id=48198
Part 2: http://www.theserverside.com/news/thread.tss?thread_id=48684
Part 3: http://www.theserverside.com/news/thread.tss?thread_id=49108
Part 4: http://www.theserverside.com/news/thread.tss?thread_id=49749

JPA 2.0

JPA, the Java Persistence API, is the standard ORM API in Java. Basically, it allows a developer to specify a simple mapping from an object to a relational database table, using annotations or XML. JPA is based on existing ORM solutions like Oracle’s TopLink and Hibernate. Although originally part of EJB 3, JPA has always been applicable to the entire Java platform, including Java SE. The reference implementation of JPA 2.0 will be based on EclipseLink.

Main JSR: http://jcp.org/en/jsr/detail?id=317

Discussion:
Eclipselink for JPA 2.0
http://www.theserverside.com/news/thread.tss?thread_id=48757

Announcement
http://www.theserverside.com/news/thread.tss?thread_id=46406

Linda DeMichiel’s Blog:
http://blogs.sun.com/ldemichiel/entry/java_persistence_2_0_early

