We met with EMC on Friday. It was a 4-hour meeting. :-)

We sent them away to redo their proposal, something we’ve done with all the other vendors. We’ve been promised a new quote by noon Monday.

I’m waiting for the revised proposals from Pillar, NetApp & EMC at this point. 3Par came back quickly. The others have promised delivery either tonight or early Monday.

Once we have all of those proposals we’ll crunch a bit, discuss a bit and make a decision.

To validate the numbers we’ve been getting from vendors, I compiled my own IOPS graphs on Saturday. Three cheers for RRDtool! Data visualization is cool.

I’ve also been reading over NetApp documentation and pulling random plugs out of the Pillar test system we have to see how it handles random failures.

There’s been a lot of lively discussion about the SAN inside of EY, including a brief session during an internal “all hands” call last Friday, where I spoke a little about our plans.

We’re very close to narrowing things down, just waiting for the final proposals.

-Edward Muller
P.S. Here’s another graphical teaser. The ey02 IOPS for a sample period taken last month.

The executive team met with NetApp today. These guys are hungry for our business. They are a solid company, with a long history in NAS. They are the only vendor to date to offer a 10Gb Ethernet option. They are also the only vendor, so far, that offers RAID-6-like protection (surviving dual disk failures) in a RAID set, which they call RAID-DP (dual parity).

Pillar Data, along with our data center staff, installed an Axiom One for testing purposes today in our CA data center. I hope to get some time tomorrow to play with it personally. I know Dan Peterson, one of our Systems Engineers, is going to be working it over though. I can’t wait to see some benchmark results.

Friday morning, we have a meeting to discuss a new cluster layout and an initial pass at a plan for migrating our customers to the new SAN-based cluster(s).

We’re also meeting with EMC tomorrow. I’m excited to see what they’ve put together for us. Perhaps we’re saving the best for last?

We’ll have more info tomorrow night.

Thank You!

-Edward Muller

P.S. Here’s a pic of the baby Pillar Axiom test system…

Dan Peterson made some progress in the lab yesterday and today setting up a reference iSCSI implementation and testing failover scenarios with the native Linux dm-multipath software and multipathd.

We’re getting a Pillar Data Systems test setup to play with tomorrow, mostly for iSCSI testing. It can’t hurt to get hands-on with a potential purchase either.

Most of the executive team and some engineering resources met with both Pillar Data Systems and 3Par today. They presented their initial proposals to us.

NetApp is Thursday, EMC is Friday.

-Edward Muller

Thinking

Ever since I started this SAN project, even before we moved into this sudden death overtime mode, I’ve spent a lot of time thinking about how we would use the SAN we purchase. After meeting with several vendors I came to the conclusion, a conclusion I still believe to be true, that ALL of the vendors I met with can do everything we need.

So what matters then? IMO what matters are the edges of the integration points. The devil truly is in the details. We’ve been paying a lot of attention to the different architectures offered by each vendor, how they impact our existing designs, how we can get the highest utilization of the disks we buy, what the migration strategy will be, failure modes, recovery modes, upgrade issues, etc.

We’ll be in the lab over the next few days testing iSCSI failure and recovery modes as well as basic performance and latency testing.

Capacity & IOPS

The first thing we need to do, though, is properly size a solution from each vendor. To that end, I’ve spent a lot of time since Friday talking to each vendor’s technical teams so that they understand our existing architecture and the iostat & capacity calculations we’ve sent them.

All of the short-list vendors have had multiple people crunching our numbers all weekend long to make sure the solution they provide will meet our existing needs as well as scale with our growth. This is very important: 12 months ago, Engine Yard had only 3 clusters to manage, compared to 12 (and growing) clusters today.
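As a rough illustration of the kind of number crunching involved, here is a minimal Python sketch that turns iostat-style samples into a sizing target. It assumes each sample carries the reads-per-second and writes-per-second figures that `iostat -x` reports as `r/s` and `w/s`. The `IostatSample` type, the function names, and the `growth_factor` parameter are all hypothetical, made up for this example; this is not the calculation we or the vendors actually used.

```python
import math
from dataclasses import dataclass


@dataclass
class IostatSample:
    """One iostat-style observation for a device (hypothetical structure)."""
    device: str
    reads_per_sec: float   # corresponds to the "r/s" column of `iostat -x`
    writes_per_sec: float  # corresponds to the "w/s" column of `iostat -x`


def total_iops(sample: IostatSample) -> float:
    """Total IOPS for a sample is reads per second plus writes per second."""
    return sample.reads_per_sec + sample.writes_per_sec


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sizing_summary(samples: list[IostatSample], growth_factor: float = 1.0) -> dict:
    """Summarize observed IOPS and scale the peak by an assumed growth factor.

    growth_factor is an assumption supplied by the caller (e.g. 2.0 to plan
    for double today's peak); it is not derived from the data.
    """
    iops = [total_iops(s) for s in samples]
    peak = max(iops)
    return {
        "peak_iops": peak,
        "p95_iops": percentile(iops, 95),
        "target_iops": peak * growth_factor,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    samples = [
        IostatSample("sda", 100.0, 50.0),
        IostatSample("sda", 200.0, 100.0),
        IostatSample("sda", 300.0, 150.0),
        IostatSample("sda", 400.0, 200.0),
    ]
    print(sizing_summary(samples, growth_factor=2.0))
```

Sizing against the peak rather than the average is a deliberately conservative choice in this sketch; a real proposal would also need to account for read/write mix, I/O size, and latency targets.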

Here are some graphs customers may find interesting. These graphs are provided by 3Par. Most other vendors have provided similar numbers.

Those graphs give us a pretty good idea of our IOPS over the sample period, which is 1 week IIRC.

So that’s all I have for today. It’s 2:00 AM and I’ve been at this since 9:00 AM yesterday. :-)  

More to come after some sleep.

-Edward Muller

Product Literature

Today we received new information from a number of vendors. Most of it was in the form of product comparison literature. We are currently using this literature to build “cheat sheets” for each vendor.

The cheat sheets will help us refine our understanding of each product’s strengths and weaknesses. During talks with the vendors, we can use these sheets to check product features, verify the accuracy of new literature, and generate questions about competing products.

Outage Related Developments

With the bad comes the good, and the outage last week was no exception.  I’m very hopeful that we will be able to announce some technical developments that our staff have been hard at work on since Friday.  As soon as we have finished testing, we’ll post a notice here.

–Taylor

The Team

Lance Walley (CEO), Tom Mornini (CTO), Sunil Pareenja (VP Finance), James Ash (Dir. Sales), Joe Arnold (Dir. Engineering), Taylor Weibley (Dir. Support), Dan Peterson (Sys. Engineer), Ed Muller (Man. Sys. Engineering), Lee Jensen (Man. Support) and Roy Pickron (Man. DC Ops.).

Team Meetings

Our Executive Staff team convened multiple times on Friday to formulate and review our plan for addressing the needs of those affected, and for the purchase, integration, and migration of a new SAN.

Contacting Vendors

Ed Muller sent an email to EMC, NetApp, Pillar and 3Par. The email included a description of our needs, the proposal / quote delivery deadline, contact information for all of the staff involved, and iostat information for evaluation by the engineers at each vendor.

We already heard back from one vendor who promised to assemble a team which would work through the weekend.  I think the following line might have encouraged them to work over the weekend:

Roy Pickron, our Data Center Operation Manager, and myself are available all weekend and next week to discuss… Please call/email us at any time if you have any questions.

Vendor Follow Up

Engine Yard does not sleep! We will contact each vendor Monday in an effort to answer any questions and reiterate our firm Wednesday deadline. Additionally, we will have intra-team meetings on Monday to review any information we have received over the weekend.

Next Communication

Sunday.  I’ll detail some of the more technically related actions we’ve taken since Friday, and I’ll provide updates on interesting weekend communication.

–Taylor

Please check back tomorrow for information on our SAN upgrade.