Bummer Day for Production November 17, 2009
Posted by skunkworkscmj in DBMS Systems, Hardware. Tags: Greenplum, Sun 4540
Our production Greenplum database system consists of 1 master node and 3 segment node servers. The hardware is Sun X4540/Thor servers with 32 GB of RAM and 250 GB drives. We’ve been working through some workload management issues as we’ve brought the system online – mostly struggling with the few queries with huge costs that run at the same time as normal queries. Either the single high-cost query consumes all of the resources, or it never makes it into the queue.
Part of the resolution was to double the memory in the 3 segment nodes. The RAM was on back-order with Sun for a few weeks, then showed up today. We brought down our QA system to install its memory upgrade and everything went fine. At 3:00 pm we brought down production and began the installation . . . at 6:45 I was notified that after the memory was installed 2 of the 3 segments came back to life, but the third wouldn’t even boot. We are on Sun’s Gold service package with a 4-hr SLA, so we logged a ticket with them and hoped for the best. It’s now 10:40 and I am hearing that we are in the process of installing a replacement server and re-distributing the data.
Definitely a bummer day for production; we’ll see what the damage is in the morning. The Sys-Admin teams are burning the midnight oil on this one – Thanks guys!