Recently, I had the honor of being a guest on the Nutanix Community Podcast with Angelo Luciani. We discussed the new book and the writing process. There are many methods that writers use to get their thoughts out on paper or pixels. I prefer to write things down as quickly as they come [...]
Blog
VMworld 2017 – Day Two – Live Blog
9:02AM - Pat Gelsinger / Michael Dell on stage - first topic on support - then a discussion on AI
9:14AM - Discussion on Dell/EMC and VMware relationship - Commitment to open $VMW ecosystem
9:17AM - Pivotal is now on stage to discuss evolution of Pivotal - Talk about challenges with Kubernetes - Pivotal Container Service office [...]
VMworld 2017 – Day One – Live blog
There are a number of anticipated announcements today. We should see some more details on: VMware Cloud on AWS; VMware Cloud Services; AppDefense and New Security Paradigms; Data Center Modernization and SDDC. I will be live blogging the announcements as they come in. 8:58AM - Things are just getting started. Here is a blog discussing some [...]
Why it pays to allow failure to occur in your infrastructure – The AWS S3 failure
On the last day of February 2017, a failure occurred in the AWS platform. It was not an outage, per se, but its effect was seen as such. The terminology was: AWS services and customer applications depending on S3 will continue to experience high error rates. S3 services in the US-east-1 region [...]
When self healing systems attack themselves
On January 15th, 1990, the AT&T long-distance network suffered a full-on collapse due to a single line of code and an obscure set of circumstances. The part of the event that caused more pain was the fact that the system was supposed to isolate problem switches, or ones deemed “crazy”. In this instance [...]
Risk Intelligence Quotient (RQ) and decision fatigue
The idea of a risk intelligence quotient was put forth by the behavioral scientist Dylan Evans. He postulated that everyone has a capacity for determining probability based on the ability to internalize and mull information gathered from various sources. A quote from Dylan is: ‘Risk intelligence is not about solving probability puzzles; it is about [...]
The cost of an outage
Recently there was an article that discussed the measurable and unmeasurable costs of downtime. To quote from the article: Measurable Costs of Downtime Once you’ve understood the scope of the impact, you can quantify the actual dollar value of losses associated with downtime. In general, measurable downtime costs fall into the following areas: Direct and indirect [...]
The continual search for creative perspectives in infrastructure design
I like to find relevance in the obscure, to find parallels in topics that diverged long ago yet have great similarities. In this instance I am talking about the history of structural engineering and material sciences on one side and designing scalable IT infrastructures on the other. The foundations are very similar if viewed in [...]
Transparency in the site failure (of Jan 31st 2017) and post-mortem for GitLab
On January 31st, 2017, the GitLab site had to be taken down for emergency maintenance. Something bad had happened due to a number of issues occurring, and the planned recovery process was thrown out the window because, in their own words: “out of five backup/replication techniques deployed none are working reliably or set up in the [...]
Site Reliability Engineering and embracing risk
Over the years, Google has had many different services come into existence. Some worked with the market, others did not. One thing that you could always count on, though, was the reliability of any services they did have. Gmail, for example, has over 1 billion active users and has had 300% growth year over year [...]