CRBS Systems Status


May 28, 2012 - 2:00 pm PDT

For additional updates on our power outage recovery efforts, please visit the status blog on our CRBS Wiki. Email support@crbs.ucsd.edu if you are experiencing problems that aren't already listed in the blog.

May 28, 2012 - 12:00 pm PDT

No significant change in status in the last 13 hours

EMAIL - Delays of up to 12 hours are still being seen on some messages
CAMERA - Oracle down, Portal down
CCDB - Oracle down, iRODS down because Oracle is down
NIF - nif-apps[1-2] are down, blog and website are down
NITRC - believed up, waiting for confirmation
WBC - Dave Little reported servers are down

May 27, 2012 - 11:00 pm PDT

All of our servers are up. We think we have resolved all the network issues, save possibly one affecting CAMERA. We are still working on many of the services. If you notice a service down, please submit a Jira ticket.
iRODS is currently down. Oracle is currently down.

May 27, 2012 - 10:00 am PDT

Campus network issues affecting CRBS appear to have been resolved. Email is being delivered, though it will take some time for it to catch up with all the messages that were queued up. No mail should have been lost, but time will tell.
We are resuming our recovery efforts. Many systems that depended on servers at NCMIR or Holly are still offline and we are working on getting them back online.

May 27, 2012 - 7:30 am PDT

There are campus network routing issues that are preventing access to many of our systems - for example, our CRBS Status Page, NCMIR email, and CAMERA systems. When the network issues are resolved, we will resume our recovery efforts.

Many thanks to Sean Penticoff, Edmond Negado, and Brandon Carl for working through the night and early this morning to get us this far.

May 27, 2012 - 5:00 am PDT

WORKING - some CRBS systems are now back online - for example, this wiki and our support page

May 27, 2012 - 1:30 am PDT

Power restored

May 26, 2012 - ~11:45 am PDT

Most of UCSD Main Campus lost electrical power

Tuesday, September 20, 2011
10:58pm PDT

SDSC's network maintenance and our server relocations have been completed.


At this time, we are aware of two issues:
- the license server at NCMIR (aka crbswin) is not booting
- there is a reverse DNS issue for mail.nitrc.org that will need to be resolved with campus in the morning

Please take a minute to verify that your systems are working as expected. If not, please report the problem via http://support.crbs.ucsd.edu, or by emailing support@crbs.ucsd.edu.

Thanks,
CRBS SysOps
==================== ====================================================

Friday, September 9, 2011
1:30pm PDT

At this point, if you notice anything not working properly, please submit a ticket via http://support.crbs.ucsd.edu.


There are a few dev and stage systems that still need to be brought online.

Additionally, we are working on getting the CAMERA cylume Rocks cluster and its associated systems, including the gama server, back online.

Thank you for your patience through this extraordinary event.

Sincerely, CRBS SysOps
==================== ====================================================

Friday, September 9, 2011
11:56am PDT

Specific service status (NOTE: Many services are responding slowly due to startup load.)

Email - Mail server (mail.ncmir.ucsd.edu) is back online. Spam filter (ironport) is back online. Email should be filtering through, but slowly.
Authentication - Campus AD is up. CRBS LDAP is up. Atlassian Crowd auth (for CRBS SSO) is up.
Websites - CRBS Status, NITRC, CCDB, NCMIR are up.
Production Virtual Machines - All are online.
Atlassian systems (Jira, Confluence, Bamboo, Crucible and Fisheye) are up. Configuration Control systems (svn, cvs, mercurial) are up. CCDB iRODS
Databases - Production MySQL is up.
CAMERA Oracle RAC is up (all 3 nodes).
CRBS Oracle RAC is up (all 3 nodes).
Development Systems - Development systems are coming up now.

Friday, September 9, 2011
11:56am PDT

Status by Project (alphabetically)

BrainInfo - All production systems are online.
CCDB - Public website online. Production portal, iRODS system, image servers and WIB are up.
NCMIR - Data Storage: All systems online.
NCMIR - Data Processing: All systems online (jane, txbr1, txbr2, iridium).
NIF - All production systems are online.
NITRC - All production systems are online.

Friday, September 9, 2011
8:00am PDT

It will take longer than originally anticipated to get our systems back online.
Revised estimate is 10:00am PT.

Friday, September 9, 2011
5:55am PDT

Power was out in our area until ~1:30am PT this morning. Core campus systems appear to be up and functioning properly at this time, and we are starting to bring CRBS systems online.
We estimate 2-3 hours to get everything back online.

==================== ====================================================

Wed., August 31, 2011
8:35am PDT

The Crowd authentication server is down, affecting logins on Jira, Confluence, Subversion, etc.
We do not yet have an estimate for when it will be fixed.

==================== ====================================================

Tues., April 19, 2011
7:17pm PDT

The network routing issues have been resolved.

From Ron Joyce at SDSC:

Event: SERVICE RESTORED : UCSD campus has experienced a disruption to its network connections beyond campus.

Time: Restored at approximately 7:40PM Tuesday, April 19th, 2011

Duration: 1 hour 45 minutes

Cause: Physical cut of fiber lines

Services Impacted: Internal services including local email services continued to work, external resources inbound and outbound were not available during this time.

If you are still experiencing problems with your Collocated systems please contact SDSC Operations at (858)534-5090 or operations@sdsc.edu, or your designated service contact. SDSC Operations staff are available 24/7 to assist you.

Tues., April 19, 2011
5:38pm PDT

The UCSD Campus is having network routing issues. Updated information will be posted as it becomes available.

==================== ====================================================

Tues., October 5th, 2010
7:50pm PT

The maintenance work at SDSC has been completed. No disruptions were detected or reported during the outage window.

Tues., October 5th, 2010
5:00pm-8:00pm PT

ACT is retiring a major network switch located at SDSC. Network connectivity to systems at SDSC may be disrupted during this maintenance window.

We apologize for the inconvenience and thank you for your patience and understanding.

==================== ====================================================

Fri., October 1st, 2010
6:15pm PT

There was a problem with one of our CRBS switches. The problem has been resolved and systems are coming back online. We are still working to determine the root cause.

Fri., October 1st, 2010
5:50pm PT

We are experiencing network outages affecting connectivity to our SDSC-hosted servers and systems. Updated information regarding affected systems, etc. will be posted here as it becomes available.

We apologize for the inconvenience and thank you for your patience and understanding.

==================== ====================================================

Tues., September 21st, 2010
8:21am PT

At this point, all systems should be back online and operating nominally. Please visit http://support.crbs.ucsd.edu, or email support@crbs.ucsd.edu if you are experiencing problems.

Tues., September 21st, 2010
12:47am PT

We are expecting network related outages affecting connectivity to some of our SDSC-hosted servers and systems as a result of maintenance work that is scheduled to begin at 7am this morning. Updated information regarding affected systems, etc. will be posted here as it becomes available.

It is expected that both the CRBS Compute cluster (aka cluster0) and our prototype lustre cluster will be offline for the duration of the outage.

We apologize for the inconvenience and thank you for your patience and understanding.

==================== ====================================================

Thurs., September 9th
3:19pm PDT

At this point, all systems should be back online and operating nominally. Please visit http://support.crbs.ucsd.edu, or email support@crbs.ucsd.edu if you are experiencing problems.

Thurs., September 9th
2:28pm PDT

The cause of the problem has been identified and we are working to restore service. When the power outage occurred, all the servers in our primary production VM rack simultaneously moved to the alternate power source. This surge overloaded the circuit, resulting in shutdown of the entire rack. Power to the rack has been restored and heavy loads have been moved off the circuit to prevent recurrence. Systems are coming back online.

Thurs., September 9th
2:23pm PDT

Atlassian applications, including Jira, all Confluence-based wikis, Crowd user authentication, Bamboo automated build testing and Crucible are off-line.

Thurs., September 9th
2:11pm PDT

All Linux workstations at NCMIR are unaffected, though it may take a few seconds longer than usual to log in.

Thurs., September 9th
2:00pm PDT

Power outage at SDSC affecting network connectivity to production equipment. Engineers are working to resolve the issue.

==================== ====================================================

Mon., April 19th
9am-12pm PDT

COMPLETED - Scheduled Maintenance: Torque install on CRBS cluster.

==================== ====================================================

Thurs., April 15th
2-4pm PDT

Scheduled Maintenance: Confluence upgrade.
20 minute outage expected, plus DNS propagation

UPDATE 4:12pm - We have received reports of issues related to
propagation of the DNS change. We are looking into
the problem.
Users experiencing this problem are being redirected
from the wiki to the maintenance page on
yukon.crbs.ucsd.edu, which is incorrect.

==================== ====================================================

Sun., Feb. 28th
6-9pm PST

Scheduled Maintenance: NetApp upgrade. No outage expected

UPDATE 7:23pm - one of the VM Hosts rebooted, affecting
exeter.crbs.ucsd.edu, neurolex.org, and puppet.crbs.ucsd.edu
At this point, our monitoring systems indicate all systems are up and functioning properly.
Please enter a ticket at http://support.crbs.ucsd.edu if you notice any problems.

UPDATE 7:12pm - we are investigating reports of possible outages

Affected systems include anything using data stored on the NetApp appliance, primarily
- CAMERA 2.0 11g production and staging databases (not yet in production use)
- Most of the VMs hosted at SDSC, including but not limited to:
Jira
Confluence
Bamboo
LDAP/Crowd
Production NIF VMs
NIF, LAMHDI and WBC mailservers
SVN and CVS

Detailed VM hosting info can be found at https://confluence.crbs.ucsd.edu/display/CRBS/VM+guests+by+host

Status information will be posted at
http://status.crbs.ucsd.edu
https://confluence.crbs.ucsd.edu/display/CRBS/Current+Status

Any problems should be reported by email to support@crbs.ucsd.edu

==================== ====================================================

Mon., Feb. 8th
9am PDT

RESOLVED: 12:30pm PDT
Problems with Crowd, Jira, Confluence
We are currently troubleshooting a problem with the server that hosts Jira and Confluence (our wikis). We hope to have the problem resolved by 1pm PDT.

==================== ====================================================

Wed., Sept. 30th
8:00pm PDT

Rehosting of primary DNS server.
No interruption of Domain Name resolution is expected.

==================== ====================================================

Thurs., Sept. 17th, 5:00-9:00pm PDT

Update 7:45pm PDT: We have restored all connectivity and services. At this point, please visit http://support.crbs.ucsd.edu, or email support@crbs.ucsd.edu if you are experiencing problems.

Update 6:40pm PDT: We have restored network connectivity to all servers and are working to bring a few services that are still down back online.
Update 6:00pm PDT: We are still experiencing difficulties with network connectivity to some of the servers at SDSC. We are working to correct the problem.
Update 5:20pm PDT: Switches have been switched to alternate power. Systems that experienced an interruption are coming back online. It may take a few minutes for the network traffic to settle.
Power interruption at SDSC to facilitate upgrade of Emergency Power Off switch. Downtime for CRBS equipment varies from no interruption to down for the duration. We are working to acquire additional hardware that will either shorten or eliminate the outage for most servers. Some servers, particularly the CRBS cluster and the CCDB cluster, including non-cluster machines located in those racks, will be unavailable for the duration of the power interruption. A list of affected servers, and the extent to which they will be impacted, can be found here.

==================== ====================================================

Sun., Sept. 13th, 11:46pm PDT

Crucible is now back online, joining Jira, Confluence, and Crowd. Some minor configuration cleanup remains to be done. Please visit http://support.crbs.ucsd.edu or email support@crbs.ucsd.edu if you notice any problems.

==================== ====================================================

Sun., Sept. 13th, 1:41am PDT

The upgrade of the server that hosts Jira, Confluence, Crowd and Fisheye/Crucible, and of the applications themselves, is nearly complete. The applications are up and running on the new server, with the exception of Fisheye/Crucible. Some minor configuration cleanup remains to be done. We hope to have Fisheye/Crucible back online later today. Please visit http://support.crbs.ucsd.edu or email support@crbs.ucsd.edu if you notice any problems.
For more information, visit here.

==================== ====================================================

Sat., Sept. 12th, 6:00pm-TBD

The server for Jira, Confluence, Crowd and Fisheye/Crucible is being upgraded. Operation of these applications will be intermittent until further notice.
For more information, visit here.

==================== ====================================================

Tues., Aug. 18th, 10:20pm-11:30pm

We experienced network connectivity problems to servers at SDSC from some places outside UCSD/SDSC. SDSC resolved the problem.
For more information, visit here.

==================== ====================================================

Tues., Aug. 18th, 9pm-midnight

We will be upgrading the Bamboo build test server to version 2.3.
For more information, visit here.

==================== ====================================================

----- COMPLETED ----- Sun., Aug. 16, 6pm -
Mon., Aug. 17th, 7am

As of 7:14 am, the Oracle RAC upgrade and domain name change are complete.
For more information, visit here.

At this time, all Oracle RAC databases are back online and functioning nominally.

==================== ====================================================

------- EXTENDED ------- Sun., Aug. 16, 6pm -
Mon., Aug. 17th, 7am

As of 3:35am, upgrade was progressing, but not yet complete. Next update will be at 7am, Monday, August 17th.

We are patching our production Oracle RAC system.
During this time, databases hosted on this system will be unavailable.
For more information, visit here.

==================== ====================================================
August 5, 2009 12:30pm PDT We are experiencing problems with authentication in Jira, Confluence, etc due to problems connecting to UCSD Campus Active Directory. We are working to resolve the problem.

If this issue is affecting you, you may request a CRBS SSO account, as an alternative. Please visit this link.
==================== ====================================================
August 4, 2009 9:30pm PDT Upgrade of Fisheye/Crucible to resolve performance and minor bug issues.
==================== ====================================================
August 1, 2009 9:00am-5:00pm PDT

We have rolled back from our attempt to upgrade our CRBS SSO (Single Sign-On) application and the hardware hosting it, along with Jira, Confluence and Fisheye/Crucible.


At this time, all affected systems have been restored to their state prior to the upgrade attempt. The upgrade will be rescheduled for a later date.


If you have questions or concerns, contact Vicky Rowley, vrowley@ucsd.edu.

==================== ====================================================
July 19, 2009 7:42 pm PDT The work on the network infrastructure at SDSC has been completed, except for restart of Oracle RAC and verification of any systems that use it. If you notice any anomalies, please open a ticket by visiting http://support.crbs.ucsd.edu, or by emailing support@crbs.ucsd.edu. Again, we apologize for the inconvenience.
==================== ====================================================
July 19, 2009 4:26 pm PDT We are upgrading the network at SDSC. All systems at SDSC have been affected, including the Oracle RAC. The network work is complete and we are in the process of bringing the systems back online. We expect all systems to be back online later this evening. We apologize for the inconvenience.
==================== ====================================================
May 26, 2009 1:56 pm PDT Network problems at SDSC have been resolved. We apologize for the inconvenience.
==================== ====================================================
May 26, 2009 12:20pm PDT SDSC is experiencing network problems. This affects all services hosted at SDSC, including Jira, Confluence, CVS, database connectivity, etc.
==================== ====================================================
May 2, 2009 5:30am-6:30am PDT SCOPE: Network hardware upgrades and router reloads.

IMPACT: Rolling outages of attached networks during this maintenance window. Intermittent connectivity loss of approximately 10 minutes during this window can be expected. This includes both wired and wireless networks. CalIT (aka Atkinson Hall) will definitely be affected! Email may be delayed.
==================== ====================================================
Apr. 30, 2009 10pm-11:45pm PDT SCOPE: Network hardware upgrades and router reloads.

IMPACT: Rolling outages of attached networks during this maintenance window. Intermittent connectivity loss of approximately 10 minutes during this window can be expected. This includes both wired and wireless networks. Basic Sciences Building (BSB) and Holly Building will definitely be affected! Email may be delayed.
==================== ====================================================
4/28/09 4:50pm-5:13pm PDT Disruption of service to NCMIR Home directories has been resolved.
==================== ====================================================
4/19/09 8:46pm PDT COMPLETE: Crowd, Bamboo, Confluence, Jira, Fisheye and Crucible have been upgraded.
New Jira Features
New Confluence Features
NOTE: New URLs:

     Bamboo Build Testing:     http://bamboo.crbs.ucsd.edu
     Confluence Wiki:          http://confluence.crbs.ucsd.edu
     Crowd Single Sign-On:     http://crowd.crbs.ucsd.edu
     Crucible Peer Review:     http://crucible.crbs.ucsd.edu
     Fisheye CVS Interface:    http://fisheye.crbs.ucsd.edu
     Jira Tracking System:     http://support.crbs.ucsd.edu
                               (http://jira.crbs.ucsd.edu will also work for support.)
==================== ====================================================
Apr. 13, 2009 6am PDT COMPLETE: SDSC Network upgrades - minimal disruption expected as we are switched from one main router to another
==================== ====================================================
April 7, 2009 - 3:05pm COMPLETE: The Crowd database was successfully restored. All Jira logins should operate normally at this time.
==================== ====================================================