Sunday, February 19, 2017

Customer reliability engineering

I was blown away when I spoke with Dave Rensin during my visit to the Google campus. He is Google's Director of Customer Reliability Engineering, and his views on customer support are world-class.

Dave's goal is to drive customer anxiety to zero; to remove the things that would cause customers to want to leave.

To do this, Dave's team has several principles they follow:
  1. Make sure your customer understands they are not alone. They need to feel a sense that "we are in this together" and you will stay with them until the problem is solved.
  2. Ensure customers never feel like they are talking in a vacuum. Never let the customer feel like you know more than you are telling them. Tell your customer all the details (without sensitive info, of course).
  3. Create a shared fate with your customers. Compensating for issues with money (or credits) is not good enough. Dave's team reviews their customers' production systems and provides guidance on how to bring them up to Google's standards. If the customer meets those standards, the team will identify when issues are caused by the customer's own systems and proactively reach out with possible solutions (using a shared dashboard both his team and the customer can see) -- and they do that for $0. This drives mutual accountability; truly being "in it together."
Dave goes into more detail on the above in the video below. More can be found in Dave's Google blog post.


Sunday, January 8, 2017

Bet on machine learning

Companies like Google and Microsoft offer impressive machine learning capabilities in their public cloud products. This means artificial intelligence is significantly more accessible to businesses than ever before.

How does it work?
At a high level, machine learning is a data analysis method which uses historical data, examples, and experience to build a model that automatically predicts future outcomes, instead of relying on hard-coded rules. The key is the "learning" part: the algorithm continues to evolve, making its predictions more accurate over time.
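As a rough illustration of "a model learned from data instead of hard-coded rules," here is a minimal sketch using the scikit-learn library on a made-up customer-churn dataset (the feature names and numbers are invented):

```python
# A minimal sketch of "learn a model from historical data" instead of writing
# hard-coded rules. Uses scikit-learn; the churn data below is entirely made up.
from sklearn.linear_model import LogisticRegression

# Historical examples: [monthly_spend, support_tickets] -> churned (1) or stayed (0)
X = [[20, 5], [90, 0], [35, 4], [120, 1], [15, 6], [80, 2]]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(X, y)                          # the "learning": fit parameters to past outcomes

print(model.predict([[25, 3]]))          # predicted outcome for a new customer
print(model.predict_proba([[25, 3]]))    # and the probability behind that prediction
```

Feeding the model new examples and refitting it over time is the "learning" part described above.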


Traditional machine learning involved more manual methods of developing models and algorithms. IBM's Deep Blue, for example, was programmed to play chess in the 1990s (and beat the world champion, Garry Kasparov, in 1997). However, chess has a relatively small, finite set of moves per position (about 35 on average) -- a game a computer can master largely through brute-force search.

Fast forward to 2016 and Google DeepMind's AlphaGo. It utilizes sophisticated neural network algorithms and was used to defeat the world Go champion. Go has about 250 possible moves per position, with more possible board configurations than there are atoms in the universe! This demonstrates the power of neural networks. Most importantly, it shows the promise of general-purpose learning algorithms.

Neural networks loosely mimic the learning process of the human brain. DeepMind's AI uses a technique called deep reinforcement learning: it learns from experience, in some cases using nothing but raw pixels as input. AlphaGo was shown hundreds of thousands of Go games so it could learn from human players. Then Google had the AI play against itself 30 million times. Over time it got better, to the point where one version of the algorithm had an almost 90% win rate against the others. That was the version selected.

Naturally, a human could never play 30 million Go games in a lifetime. The machine does not get tired, nor does it make emotional mistakes. The AI's experience becomes superhuman, despite the fact that it originally learned from humans.

Watch what happened when Google used the same approach to train a machine on the famous Atari game Breakout. The goal given to the machine was to maximize its score in the shortest amount of time. At first, the AI is pretty terrible at the game; after about 2 hours of playing, it is very good. After 6 hours it does something amazing: it becomes superhuman.
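For intuition, here is a toy sketch of the reward-maximizing idea behind reinforcement learning. It is tabular Q-learning on a made-up five-cell corridor, not the deep neural network DeepMind used, but the core loop (act, observe a reward, nudge the value estimates) is the same principle:

```python
# A toy, tabular Q-learning loop on a made-up 5-cell corridor: reaching the right
# end pays +1, every other step costs -0.01. This is a heavily simplified stand-in
# for the deep reinforcement learning described above (no neural network, no pixels),
# but the act -> observe reward -> update values loop is the same idea.
import random

n_states = 5
actions = [-1, +1]                                   # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1                # learning rate, discount, exploration rate

def best_action(state):
    return max(actions, key=lambda a: Q[(state, a)])

for episode in range(2000):
    state = 0
    while state != n_states - 1:
        # Explore occasionally; otherwise act greedily on current estimates.
        action = random.choice(actions) if random.random() < epsilon else best_action(state)
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else -0.01
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print({s: best_action(s) for s in range(n_states)})  # learned policy: move right everywhere
```

After a couple thousand episodes, the learned policy is simply "always move right," because that is what maximizes reward. Scale the same loop up to a deep network reading raw pixels and you get the kind of Breakout play described above.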


SwiftKey, the maker of a keyboard app for mobile devices, nicely demonstrates how a neural network helps improve its word predictions.


Using ML in your organization
The ability to plug directly into Google's (and others', like Microsoft's and Amazon's) algorithms in the cloud makes ML much more accessible. I am more familiar with Google's offerings, so I will highlight a few:

Google's Cloud Vision API is image recognition in the cloud. It can detect what is occurring in images (including the sentiment of human faces). A city in Canada trained Google's AI using thousands of school-bus stop-sign videos. The goal was to have the machine watch the videos and identify whether a vehicle had illegally driven past the bus's stop sign. The algorithm was trained to identify when the sign was out and active, and when vehicles had passed it. It turned out to be 99% effective, while humans were only 83%. This resulted in increased revenue through traffic-violation tickets.
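As a rough sketch, calling the Cloud Vision API from Python looks something like this (assuming the google-cloud-vision client library and credentials are set up; the exact classes vary a bit by library version, and the file name here is made up):

```python
# A rough sketch of label detection with the Cloud Vision API. Assumes the
# google-cloud-vision client library and application credentials are configured;
# "bus_stop_frame.jpg" is a hypothetical video frame.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("bus_stop_frame.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))   # e.g. "Vehicle 0.97"
```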

Disney used Cloud Vision for the marketing campaign for the movie Pete's Dragon. The site sent children on a hunt through their homes for common objects (a chair, a door, a tree, clouds, etc.). Once the algorithm detected an object, Elliot the dragon would magically appear on the screen.


Google's Natural Language API could be leveraged in the example given in my earlier blog post on data. By analyzing millions of public social media posts for certain sentiments and cues, a sales team can potentially surface better-qualified leads.
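A similar sketch for the Natural Language API's sentiment analysis (again assuming the google-cloud-language client library and credentials are configured; the sample post is invented):

```python
# A rough sketch of sentiment analysis with the Cloud Natural Language API.
# Assumes the google-cloud-language client library and credentials are configured;
# the post text is a made-up example.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="Thinking about buying the new Fiat. Can't decide between that or the Prius.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)   # score runs from -1 (negative) to +1 (positive)
```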

Google also offers translation, speech-to-text, and even a new job search API.

Lastly, Google's open-source TensorFlow is a machine learning library for numerical computation using data flow graphs. Developers can use it to build models with very little code and eventually turn them into products on Google's cloud.
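To show how little code a model can take, here is a hedged sketch using the Keras API that ships with recent TensorFlow releases, trained on made-up data:

```python
# A tiny model built with the Keras API bundled in recent TensorFlow releases,
# trained on made-up data, just to show how little code a model requires.
import numpy as np
import tensorflow as tf

X = np.random.rand(200, 4).astype("float32")       # 200 examples, 4 features
y = (X.sum(axis=1) > 2.0).astype("float32")        # made-up labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[:3]))                        # predicted probabilities for 3 examples
```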

The future: humans + machines
I believe the businesses which adopt and master machine learning the best will be the most successful in the future, regardless of industry. (Of course, it helps to have a lot of data to train your model.)

While ML may eliminate some jobs, I feel it is the successful partnership of humans and machines that will bring the most fruitful benefits. Take a radiologist, for example: she may leverage ML to get through her readings faster, while still providing the oversight and deeper analysis the machine cannot.

Ultimately, where AI takes us is hard to predict, but the positive impact and advancements will almost certainly be exponential.

Monday, January 2, 2017

Boss vs. leader

I strive to be the best leader by emulating some of my favorite leaders. They tend to be the ones who put others' interests first, drive collaboration and teamwork, and promote and inspire a positive future.

The following two graphics demonstrate what I feel is the difference between being a "boss" and being a "leader."




I wrote earlier about the importance of being a great leader.

Friday, December 30, 2016

Data, data, data

"Data, data, data" is the new "location, location, location."

Uber owns no taxis, yet it is the largest taxi company in the world. Airbnb owns no real estate, yet it offers the most accommodations in the world. These companies run their businesses on data, and lots of it.

Data is king, and it is only becoming more important. Proper analysis and utilization of data helps to uncover the what, the why, and even predict the future. As a result, data must be a core component of your digital strategy.

Hindsight
At the most basic level, data gives us hindsight. A simple example is how grocery stores utilize loyalty cards. Customers sign up for them with some basic personal information, and in return the store gives the customers discounts when they use their card. Data collected from these cards helps the grocer identify individual purchasing habits -- it gives them hindsight.

This is why online retailers encourage customers to create accounts. The data collected (which products are being viewed, which terms are being searched for, etc.) all help track what is happening in their store.

Insight
Understanding the "what" is just the basics when it comes to data analytics. Having the view into the "why" provides insight.

Why do certain customers buy one product over another? Why do certain products sell better at certain times of the year? These are the types of questions the data can help provide insight into.

Foresight
Being able to predict behavior is the next step; this is where the most positive transformation can occur for an organization.

Again using grocers as an example, stores can use big data to predict and suggest price points for certain products at certain times to ensure the right amount is in stock and fresh. If the price of strawberries, for example, is too high, grocers risk having too many in stock and watching them go bad. If they accurately predict the right price point, they can keep the right amount moving off the shelves at a pace that ensures each package is still fresh.
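As a hedged sketch of that foresight step, here is a simple demand-curve fit on entirely made-up sales history; a real model would account for seasonality, promotions, weather, and more:

```python
# A rough sketch of "foresight" on made-up data: fit a simple demand curve
# (units sold vs. price) and pick the price that should move the stock on hand
# before it spoils. All numbers are invented for illustration.
import numpy as np

prices = np.array([1.99, 2.49, 2.99, 3.49, 3.99, 4.49])   # historical weekly prices
units = np.array([520, 455, 390, 310, 250, 180])           # packages sold at each price

slope, intercept = np.polyfit(prices, units, 1)             # linear demand fit

stock_on_hand = 350                                          # packages that must sell this week
target_price = (stock_on_hand - intercept) / slope
print(f"Suggested price: ${target_price:.2f}")
```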

Lastly, there are some scenarios where proper data analysis can actually help to prescribe some actions. In other words, using data can help make things happen.

Let us use the car company Fiat in a fictitious example. If Fiat mined publicly available social media posts specifically looking for terms which suggest a propensity to buy their car, they may be able to drive more sales. The scenario could go something like this: John Smith posts to Twitter, "Thinking about buying the new Fiat. Can't decide between that or the Toyota Prius." That post gets picked up by Fiat's social media scanning algorithm, which alerts the salesperson in John's region to contact him directly. That contact may help influence John to purchase a Fiat.
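A hypothetical sketch of the scanning step might look like the following; the posts, keywords, regions, and alerting are all invented for illustration:

```python
# A hypothetical sketch of scanning public posts for purchase-intent keywords.
# The posts, keywords, and alerting below are made up for illustration only.
INTENT_KEYWORDS = ["thinking about buying", "can't decide between", "test drive", "shopping for"]
BRAND = "fiat"

def flag_prospects(posts):
    """Return posts that mention the brand alongside a purchase-intent phrase."""
    flagged = []
    for post in posts:
        text = post["text"].lower()
        if BRAND in text and any(kw in text for kw in INTENT_KEYWORDS):
            flagged.append(post)
    return flagged

posts = [
    {"user": "john_smith", "region": "Midwest",
     "text": "Thinking about buying the new Fiat. Can't decide between that or the Toyota Prius."},
    {"user": "jane_doe", "region": "Northeast", "text": "Traffic was terrible today."},
]

for prospect in flag_prospects(posts):
    print(f"Alert sales rep in {prospect['region']}: reach out to @{prospect['user']}")
```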

Making it happen in your organization
To leverage data effectively, naturally you need data. Determine the sources, and if none exist start setting up your data collection processes.

Once you have the data, it needs to be usable. Having it in 25 disparate systems will make life tough. Instead, centralizing it and "cleansing" it for use (i.e., ensuring accuracy, removing duplicates, etc.) is key.
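For example, a small cleansing pass with the pandas library on a made-up customer extract (column names invented) might look like this:

```python
# A small sketch of the "cleansing" step using pandas on a made-up customer file:
# normalize formats, drop rows missing the key, and remove duplicates.
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "A@Example.com ", "b@example.com", None],
    "purchases": [3, 3, 5, 1],
})

df["email"] = df["email"].str.strip().str.lower()   # normalize before de-duplicating
df = df.dropna(subset=["email"])                    # rows with no key are unusable here
df = df.drop_duplicates(subset=["email"])           # remove duplicate customers

print(df)
```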

Additionally, data can help create a source of revenue. Identify any data which may be unique to your organization which others externally may pay to access. Ensure proper usage controls and governance are in place.

Also keep in mind potential external integrations or partnerships.

Ultimately, there are endless possibilities to how you can utilize data. Start small, take an MVP approach, and build from there as you learn what works for your organization.

Friday, December 23, 2016

High-performing teams, Part II - Being proactive

There is not one thing which creates a high-performing team (HPT). Trying to define the numerous aspects of an HPT culture took me an entire blog post. However, being proactive is one key attribute required for all individuals of a high-performing team.

Doing what is expected
My prior post discussed team growth expectations. In order to achieve continuous growth, each individual simply doing what is expected of them is not enough to reach HPT status.

Take a software developer, for example. They are expected to create X features working on Y product while collaborating with their teammates. They are expected to complete those features on time, follow proper standards, and ensure their code is efficient and secure. That is the baseline. That is expected of them each and every day. While that may sound great, my view is that if everyone on the team did only that, year after year, team growth would be stagnant. (And it may get boring for the developer!)

Being proactive
Being proactive is the key to unlock exponential growth and creativity in both individuals and teams.

The definition of proactive:
Creating or controlling a situation by causing something to happen rather than responding to it after it has happened.
Take a software developer again as an example: They can be proactive in numerous ways, including identifying a new solution to a problem the team is facing (without being told, of course), implementing it, and organizing a lunch-and-learn session to ensure everyone is aware and understands the new way forward.

The key is for individuals to take the initiative in looking for ways they can help improve themselves, the team, and the company. This is often where new and creative ideas emerge, which naturally leads to learning and mastery.

The proactive expectation (and contradiction?)
I argue being proactive is therefore expected of all team members.

Does this mean, however, any proactive work is then simply viewed as par for the course? Does this mean no individual can ever be seen as going above-and-beyond?

No. The beauty of being proactive is that while it is expected of everyone, there are countless ways to do it. Because it is impossible to define exactly how it should be done, no specific proactive act can ever be explicitly expected.

Monday, November 21, 2016

Team growth expectations

Most companies have year-end performance reviews. This is a time to reflect on how each individual has performed. Managers have a critical responsibility to also look at the team as a whole.

  • Is the team growing overall?
  • Where is the team this year compared to last year?
  • Is the team well prepared for the year to come?

Each team is expected to continuously grow.
Each team may define growth differently, but I like to think of team growth as the collective improvement of skills (both technical and "soft"), processes, and tools. A high-performing team knows they must continuously improve to remain ahead of the competition. This implies constant growth.

New things mean new challenges; expect them.
In order for team members to grow, managers must provide the space and a safe environment where trying new things and failure are OK. Managers should expect small bumps in the road ("growing pains") when new things do not work out or take time to learn. As long as they are followed up quickly by learning and more improvement, failures are OK.

Doing the same as last year is unacceptable.
These new improvements will become common knowledge, and collectively the team will grow. This sets a new bar for the team to aspire to. The collective team's growth means every individual is expected to keep pace. Someone performing at the same level the team was collectively at a year or two ago is now significantly behind pace. These individuals must demonstrate an immediate improvement.

I try to illustrate these expectations in a very rough graphical sketch below. The green line represents (at a high level) the overall growth of the team. Highlighted are sharp improvements (perhaps a new process was implemented, for example) followed by small declines, or "growing pains." The team's collective growth sets the bar for the next year, and all the individuals are expected to keep pace.

Sunday, September 18, 2016

Measuring anything

Most projects require some sort of measurement to obtain approval, determine viability, estimate return on investment, etc. It can appear challenging to think of how to measure risk, productivity, profit, etc.; however, Douglas W. Hubbard's book, How to Measure Anything, demonstrates anything can be measured, and in more practical ways than you might think.

A reduction of uncertainty
It is important to note how Hubbard defines measurement: observations that quantitatively reduce uncertainty. This is key, as it takes the pressure off individuals to be perfectly precise in their answers. Especially when just starting out, even a small reduction in uncertainty can be a large step toward a particular outcome. Hubbard points out that even sophisticated scientific experiments have margins of error; measurements for business are no different.

Really, anything?
Yes, anything can be measured. (Although not everything necessarily should be measured.) Hubbard suggests the following to help demonstrate this:
If it matters at all, it is detectable, observable.
If it is detectable, it can be detected as an amount or range of possible amounts.
If it can be detected as a range of possible amounts, it can be measured.
Determining the "what"
Understanding why you want to measure something helps guide the scope of what can be measured. For example, someone may say, "We want to measure IT security." The first question to ask is: what is IT security? From there, you should be able to identify particular objects of measurement within each part of your answer. Once you have your object of measurement and understand what it means, you are halfway there.

It is easier than you think
When we are struggling with measurements, Hubbard reminds us of the following:
  1. Your problem is not as unique as you think. Recognizing that others may have solved similar types of problems in the past may help to put things in more perspective.
  2. You have more data than you think. Some data is better than none. 
  3. You need less data than you think. Again, we are not looking for 100% certainty.
  4. An adequate amount of new data is more accessible than you think.
Obtaining measurements
Hubbard's "Applied Information Economics" has 5 steps to help obtain measurements. I try to summarize them below:
  1. Define a decision problem and the relevant variables. Asking "why?" helps here. Start with the decisions you need to make, then identify the variables which would make your decision easier if you had better estimates of their values. What is the decision this measurement is supposed to support?
  2. Determine what you know. Quantify your uncertainty about those variables in terms of ranges and probabilities. Hubbard uses the term Confidence Interval (CI) to gauge the level of uncertainty over a given interval. A 90% CI is one in which there is a 90% chance the true value falls within the interval you provided. For example, my 90% CI for average commute times in my office is 30-70 minutes. It is important to be "well-calibrated" when giving your 90% CI; Hubbard suggests the equivalent bet test as a way to gauge how calibrated you are. (See the sketch after this list for how these ranges feed a simple simulation.)
  3. Pick a variable, and compute the value of information for that variable. Some variables' measurements will be more valuable than others. The goal is to find the variable with a reasonably high information value. (If you do not find one, then skip to step 5.)
  4. Apply the relevant measurement instruments to the high-information-value variable. Go back to step 3 to repeat this process with any remaining high-value variables.
  5. Make a decision and act on it. 
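To make steps 1 and 2 concrete, here is a hedged sketch of feeding calibrated 90% CI estimates into a simple Monte Carlo simulation for a made-up decision (whether a new tool is worth buying). Converting a 90% CI to a normal distribution with its mean at the midpoint and a standard deviation of the width divided by 3.29 follows Hubbard's rule of thumb; all of the ranges and the $300,000 threshold are invented:

```python
# A hedged sketch: turn calibrated 90% CI estimates into a simple Monte Carlo
# simulation of yearly savings from a hypothetical new tool. The ranges below
# are made up; mean = midpoint and std = width / 3.29 is Hubbard's rule of thumb
# for treating a 90% CI as a normal distribution.
import random

def sample_from_ci(low, high):
    mid, std = (low + high) / 2, (high - low) / 3.29
    return random.gauss(mid, std)

trials = []
for _ in range(10_000):
    hours_saved_per_week = sample_from_ci(2, 10)     # 90% CI: 2 to 10 hours
    people_affected = sample_from_ci(20, 60)         # 90% CI: 20 to 60 people
    loaded_hourly_cost = sample_from_ci(40, 90)      # 90% CI: $40 to $90
    trials.append(hours_saved_per_week * people_affected * loaded_hourly_cost * 48)

trials.sort()
print(f"Median yearly savings: ${trials[len(trials) // 2]:,.0f}")
print(f"Chance savings exceed $300,000: {sum(t > 300_000 for t in trials) / len(trials):.0%}")
```

The output is not a single "right" number; it is a distribution that tells you how likely the investment is to clear whatever threshold your decision requires.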
For large efforts, Hubbard suggests spending at least 10% of the project budget on measurements to justify the investment in the first place.

Note: Beware the "measurement inversion." Hubbard warns that most managers tend to measure the data which are easiest to obtain but provide the least economic value. This is why step 3 above is critical.

Measurement instruments
Hubbard outlines the following to help start us toward our measurements:
  • Decomposition: Which parts of the thing are we uncertain about?
  • Secondary research: How has it (or its parts) been measured by others?
  • Observation: How do the identified observables lend themselves to measurement? Can you create a way to observe it indirectly?
  • Measure just enough: How much do we need to measure it?
  • Consider the error: How might our observations be misleading? Consider things like confirmation, observer, and selection bias. 

Hubbard describes at length many different types of measurement instruments like controlled experiments, regression modeling, and Monte Carlo simulations. I will highlight just a few of those which do not involve too much (or any) math, because I think it is important to have a few straightforward methods "in your pocket:"
  • Rule of 5. There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population. This rule lets us obtain a better-than-90% CI by sampling only a handful of the population (see the sketch after this list).
  • Spot sampling. Determining how many fish are in a lake can seem impossible (unless you drain the lake!), but spot sampling can help. A biologist might catch 1,000 fish, tag them, and release them back into the lake. A few days later she catches another 1,000 fish and sees that only 50 of them (5%) have a tag. This implies there are approximately 20,000 fish in the lake.
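Both bullets are easy to sanity-check with a few lines of Python on made-up numbers:

```python
# A quick sketch verifying both bullets above with made-up numbers.
import random

# Rule of 5: how often does the population median fall between the min and max
# of a random sample of five? (Analytically: 1 - 2 * 0.5**5 = 93.75%.)
population = [random.gauss(45, 12) for _ in range(10_000)]   # e.g. commute times
median = sorted(population)[len(population) // 2]
hits = 0
for _ in range(20_000):
    sample = random.sample(population, 5)
    hits += min(sample) <= median <= max(sample)
print(f"Median inside sample range: {hits / 20_000:.1%}")     # roughly 93.8%

# Spot sampling (mark and recapture): 50 of 1,000 recaptured fish are tagged,
# so the 1,000 tagged fish are roughly 5% of the lake.
tagged, recaptured, tagged_in_recapture = 1_000, 1_000, 50
print(f"Estimated fish in lake: {tagged * recaptured / tagged_in_recapture:,.0f}")   # 20,000
```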

Simple personal example
I'll conclude with a personal example of how understanding that anything can be measured can help expand possibilities. I was in a senior management meeting on a topic around improving our company's leadership, and someone said, "It is almost impossible to measure the performance of managers."

So I suggested the following:
  1. What things do we consider make a good manager or leader? 
  2. Of those traits, what are ways we can observe and measure them?
The group discussed many areas like team performance (which itself needed to be broken down further to define measurements), as well as retention/attrition rates, referrals (i.e. employees referring friends for open positions under that manager), promotions, etc.

The group was able to identify why manager performance was worth measuring, and as a team identified possible measurements. The next step would be for us to put a value on each variable and decide which decisions the measurements would support.

Sunday, August 21, 2016

Be open: Integrate and let integrate

A key principle I drive at my organization is technical openness. This means all the tech we leverage should be based on open standards and frameworks. There are many reasons for this, including:
  • Enables superior interoperability and integration with other systems.
  • Prevents "re-inventing the wheel."
  • Avoids being locked into proprietary and costly technologies or vendors.
  • Improves agility and choice; can select best-of-breed solutions for each job.
  • Broadens the support pool and extends support timelines.
  • Increases innovation, as open standards invite everyone to participate in providing feedback.
I believe the ability to integrate fast and effectively is a skill which all companies will need to survive over the next few years. This is why the first bullet above is most critical.

Examples of key business integrations 
Here are a few "integrations" which help drive business growth:
  1. Pizza Hut orders can be placed directly through Amazon Echo (Alexa). Amazon provides vendors a standard way of connecting to its Echo service, and companies like Pizza Hut can connect their ordering systems to open up another potential revenue stream. Pizza Hut was one of the first onto the platform because its systems already allow external sources to place orders.

  2. Uber is a great example of being able to integrate with various channels. Users can request rides directly within both Google Maps and Facebook Messenger. They try to capitalize on being available to request a ride at the exact moment when someone is likely to need one.

The examples above demonstrate the need for in-the-moment, simple, and fluid purchasing capabilities. None of this would be possible if the systems were closed and unable to move quickly to meet the changing dynamics of their users.

There are other examples which do not involve purchases, but rather provide information or other services more easily through atypical channels (see KLM's Messenger integration, for example). Those help to drive customer engagement, satisfaction, and loyalty. All wins for good business, and only possible with open technologies.

Integrate and let integrate!

Sunday, August 14, 2016

The future: Our fluid connectivity

I enjoy making predictions about the future. It is fun to see how accurate the predictions are as time goes by.

I believe the technologies of the future build continuously upon the present. At times inventions may appear to be huge leaps, but in reality they are logical progressions of existing ideas, or novel combinations of the existing and the new.

The iPhone, for example, can seem extremely ahead of its time (it was, of course, as compared to the competitors), but even that device was a combination of existing tech. As Steve Jobs introduced the iPhone in 2007, he said it was a:
  1. Widescreen iPod with touch controls,
  2. Revolutionary mobile phone, and
  3. Breakthrough internet communicator
MP3 players, mobile phones, and the networks required to communicate over the internet all existed before the iPhone. But the iPhone brought them together in novel ways.

The "problem"
Many innovations aim to solve problems or enhance our way of life. My prediction is no different. Here are some "problems" which I envision can be enhanced:
  1. We are dependent on physical screens to access our digital world. TVs, computer monitors, and smartphones all have screens. They are a fixed size, and TVs and PC monitors are generally fixed to a single location. Even with laptops and smartphones, moving around while looking at the screen requires coordination and one or both hands.
  2. With these many devices come different data stores and modes of authentication. At your PC you may store documents, photos, etc., while your Xbox is used for gaming and your smartphone for apps. Each one is registered to you in some way, usually authenticated by some form of password.
The problems listed above may not seem like problems to you, but to me it seems inefficient to depend on so much physical hardware (especially screens and monitors).

My present-day connectivity is intermittent, bouncing between all devices. Here is a typical weekday:
  • I wake up and check for any important messages on my smartphone.
  • I arrive at work and use the PC in my office for work.
  • I come home and use my laptop for browsing the web and other personal work. Maybe watch something on TV as well.
  • And throughout the day, I have several brief interactions with my smartphone.
Each bullet above requires actively seeking out, authenticating, and differentiating the devices for specific tasks.

My prediction: hyper fluidity
Like the iPhone, my prediction builds on and combines many existing technologies:
  1. Wireless hard drives. These devices make your data available to any device over Wi-Fi. This is useful as it removes the need for wires or complicated connections requiring software, drivers, etc.
  2. Smartphone to laptop. This invention allows you to turn your smartphone into a laptop (kind of). Plugging the phone into a special laptop gives you essentially a full laptop experience running off the phone's software.
  3. Authentication apps and tech. These allow you to authenticate with other services using a two-factor approach or your fingerprint, eyes, etc.
  4. Voice and gesture control. Think Amazon Echo and Xbox Kinect.
  5. The cloud. Software as a service through cloud providers is key as it reduces the need for both hardware and software at every endpoint.
  6. Drones (of course). Drones are getting quite sophisticated. They can stick to walls and ceilings, and can even coordinate among themselves.
My vision is one of hyper-fluid connectivity. Where your data, files, etc. are always within "reach," but you do not use your hands. Where authentication is seamless. Where moving from business to personal is instant. And where you are not tied to a specific location.

Imagine having a cup of coffee at your kitchen in the morning, and wanting to see the news. Why lug your laptop over? Why pull out your smartphone? Why not be hands-free and have your super-quiet smart drones do it? And they'll do it in a way which ensures proper posture by putting a display at your exact eye level.

Here's a concept sketch. Note that the screen in it is not a typical monitor simply being held up by the drone; rather, it could be a projection or another type of lightweight, resizable display medium.

Illustration by Empty Bee Artwork & Photography

The smartphone-to-laptop idea listed above demonstrates people's desire to need just one device for everything. My prediction is that the one device will turn into something which can authenticate you and connect you to a fleet of drones. It will be the bots that know who you are and which apps, files, etc. you have access to. There will be significantly less need for traditional physical screens or monitors. A display will be able to appear anywhere the necessary number of drones can go.

Our phones may get smaller and be used simply as a secondary authenticator, allowing us to connect to drones and other bots as needed. The drones will learn our preferences and styles, and eventually be able to predict what we want. It may not just be displaying the news while we drink coffee, but also making the cup of coffee.

Will I buy a new TV soon? Yes, probably. But I am hoping soon after I will just need to buy 4K-capable projection drones with internet connectivity, authentication, and swarm capabilities to coordinate between my other ultra-smart drones.

UPDATE (10/24/16): I feel like we are one step closer to the vision above based on this amazing research done by the Ishikawa Watanabe Laboratory in Tokyo. Their image projection technology can keep an image stable even with a moving target object as the screen. Here is their video:


Sunday, July 31, 2016

Do DevOps

DevOps is not a buzzword; it is the way quality software gets deployed fast.

In order for software teams to truly embrace DevOps, they must have an inherent continuous improvement culture which embraces ruthless amounts of automation. Many of my examples below will be Java-specific, but this can apply to all types of software languages.

The deployment pipeline
Your deployment pipeline is critical to enabling speed, so I will expand a bit more here. Some questions to ask yourself: How often do you deploy code to production? How long are your builds? How long does it take to do a production deployment? How often do you have bugs in production? In staging/UAT? In dev? The answers may vary based on many factors, but odds are you can improve dramatically in all areas.

  • Continuous integration. Enabling a distributed group of developers to integrate their local code into a shared development environment as efficiently as possible is the key first step. Generally a build server (like Jenkins or Bamboo) can enable this. Most important, though, are the automated checks which run on the code before it moves to development. These can be tools like PMD or SonarQube, which check for best-practice violations, standards, or bugs. Similarly, unit, integration, and security tests can and should run here. The key is that code is not allowed to move to development until all tests pass. We strive for quality, production-ready code even in development.

  • Peer code reviews.* This is probably the only manual step of the deployment process. Having an additional pair of (usually senior- or architect-level) eyes helps to drive team standards, code re-use, scalability, security, and efficiency. Some teams may find it hard to incorporate this critical step, but it must become part of the process.

  • Automated testing. Automated tests can occur at each stage, either with each build (depending on speed) or on some regular rhythm (like nightly). These tests can be regression, smoke, integration, or performance tests. Visibility of the results is key, as test failures must be addressed promptly. Regular testing also helps ensure the tests stay current. As the test suite grows to a comfortable percentage of coverage, code can move to production faster with less manual testing.

  • Auto-build, auto-deploy. The build servers mentioned above can automate the process of building and deploying code to each environment. Moving to production may require additional steps due to segregation of duties and change controls. As a result, I recommend making these deployments standard changes; that way a change ticket can be opened automatically by the deployment process rather than requiring manual approval. In the lower environments, builds and deploys can be scheduled automatically or triggered as soon as new code is committed.

  • Same artifact in each environment. Consistency is key in ensuring quality. Using the same artifact (or Docker image if you use Docker) throughout each environment minimizes variability.

  • Visibility. It is important that all of the above is easily accessible and visible to all stakeholders -- from the project managers to the developers. Broken builds, for example, need to be remediated fast, as they block other developers' code from moving.

  • Forward and back. Getting to production quickly is important, but it is also imperative to have a way to revert deployments fast. Your pipeline should support this.

Configuration management
Configuring and managing environments in a streamlined and automated fashion enables speed and consistency. Configuration management tools like Puppet or Chef enable centralized management of multiple servers at once. This is key to being able to quickly spin up or down new environments as needed, patch, or ensure the same settings are applied to each without individually tending to each.

These tools can also be used to push software to desktops. This is useful for a team of developers looking to ensure everyone has the same version and configuration of tools on their machines at all times. It also helps with installing those tools as it can literally be a simple double click and go get a cup of coffee.


Containers & container orchestration
Step aside, VMs: containers are the new thing. Docker containers wrap your software in a complete filesystem. A container is more lightweight than a VM and enables speed by standardizing the environment. Containers' small size means you can run several inside one VM. The key point is that containers enable true application portability, because they abstract the underlying infrastructure from the app itself.

As your environment grows with more and more containers, orchestration tools like Kubernetes become important to help manage them all from a central place.


Situational awareness
It is key for the team to know the health of the system at all times. Situational awareness encompasses the following:

  • Monitoring. A constant pulse on the key metrics (response times, CPU usage, server memory, etc.) allows for quick identification of potential issues and can help prevent failures. Tools like Icinga can even automate the creation of monitors when servers are provisioned through Puppet, for example. I recommend making as many of these metrics as possible visible using tools like Graphite, StatsD, and Grafana.
     
  • Logging. Having additional details at hand helps give more insight into the various systems. Centralizing log output using the ELK stack (Elasticsearch, Logstash, Kibana) or tools like Takipi can help reduce the time it takes to remediate issues.

  • Alerting. In addition to visual dashboards, automated alerting of key thresholds plays a key role in ensuring timely resolution of issues. A tool like Seyren can be useful here in conjunction with Graphite.


Zero downtime

Who likes staying late or working on the weekend to push new code live? No one. One of the main reasons this occurs is that many deployments incur downtime in some fashion. With a streamlined pipeline, and a little help from Docker, staying late may become a thing of the past.

When we push new code live, we launch a second Docker container in production and point only our internal network traffic to it using Vulcan. If all tests pass, we point all traffic to the new container (using Redis to maintain sessions) and we are live without any downtime! The same can be done in lower environments as well.


Conclusion
Ultimately we want to achieve a continuous delivery state, where code changes have the potential to go live very quickly, with high assurance of quality at each step. Visibility is key to this process, as it ensures everyone is on the same page.

Lastly, the term DevOps is a combination of Development and Operations. Traditionally, development teams and operations teams have competing priorities: devs want to move code to production fast; ops wants to keep the environment stable. With DevOps, developers take more ownership throughout the process, while operations gets involved earlier and gains more automated tooling and better visibility into the pipeline. This partnership is what drives great business results.


*Side note on peer code reviews: There may be times when code reviews seem a bit of a burden. Two situations come to mind.

First, when refactoring is required or requested by the reviewer. Refactoring is an important and natural part of keeping the code base in good order over time. There may be times when refactoring is not possible due to time constraints, in which case I suggest adding a user story (a requirement, in Agile terms) to the top of the backlog and doing it in a subsequent sprint (keeping in mind that there is nothing more permanent than temporary code). If you are following Scrum, ensure your teams do not consider their user stories "done" until all the code review comments are addressed.

Second, when there are disagreements between the reviewer and the developer. This is pretty simple to resolve, especially when the reviewer is an architect -- the developer does what the architect says. Discussions are always welcome, but tie goes to the architect.