We’re waiting for a set of backplanes to ship from CoRAID so we can test them in our dev cluster and finalize the rollout plan. We’re expecting the test backplanes to arrive on Monday, November 24, 2008.

For more insight into the CoRAID backplane issues, here is an email from CoRAID describing how the problem was discovered and what the fix involves.

Hello Lee and all,

Recreating and identifying the cause of the multi-disk failure was not an
easy task.  Our initial testing of the chassis when it was first introduced
did not expose any problems.  Units were shipped to Engine Yard and other
customers without reported problems.  Our disk failure testing did not
expose any problems.  But it should be noted that the disks used by Coraid in
our tests were not the same model and make as Engine Yard is using.

In our search to locate the source of multi-disk failures in the SR2461
chassis, we tested many different theories; including firmware code bugs,
disk temperature, disk vibration and backplane power.  All of these tests
were unsuccessful in exposing the problem until we discovered that early
SR2461 chassis had been shipped with a different disk backplane.  We then
discovered that our chassis supplier had updated the backplane without
notifying us of the change.  Our purchasing agreement with our vendor
requires them to share ECO changes when they are made, but for this
particular change a mistake was made and it was not reported to Coraid.

After discovering the backplane change had occurred, we surveyed our test
systems and found that they all had the newer versions of backplane.  We
located an early backplane version and installed it in our lab test
environment.  When power stress testing was done on a full complement of
disks, we were able to repeatedly replicate the multi disk failure case that
Engine Yard has experienced.  When the backplane was replaced with the new
version, our power stress testing works without problem.

To ensure we are satisfied with the test results, we purchased 24 of the
drives you use and repeated our stress tests without seeing any problems
with the new version backplane.

Disk drive power consumption varies based upon operating load, manufacture
and model.  The power problem associated with the early backplane was caused
by marginal power distribution on the backplane.  The new design backplane
has corrected this power distribution problem, at least with all the disk
drives we have tested.

Coraid has obtained replacement backplanes for all SR2461's with the early
backplane version.  We are ready to start the backplane change out as soon
as Engine Yard has completed its survey to locate the affected chassis.  We
will do our best to work with Engine Yard's operational schedules to
expedite this field change.

Coraid will do all we can to remedy this problem.  Please advise how you
would like to proceed.

My apologies,

Jim

-Edward Muller
