Intelligence by Design: Culture

Saturday, March 18, 2017

Technical debt

There were times in college when I would not do laundry for weeks (gross, I know). Then finally I would have no choice but to spend an entire Sunday in the laundry room. I could not do any of the fun things I would normally associate with a Sunday, and I would kick myself for not doing it more regularly to make it less of a hassle.

That is exactly what it is like when software, systems, and tools are not upgraded, patched, or replaced on a regular basis. The time and money associated with these legacy systems pile up and contribute to your company's "technical debt." The longer it lingers, the harder technical debt is to clean up.

The perils of too much technical debt...

Preventing and eliminating technical debt

The best way to avoid technical debt is to have a continuous pulse on your systems and their health. This is everyone's job. Everyone must be aware of the risks and understand the impact of their technology's lifecycle. This requires the people most familiar (usually the most technical) to be able to articulate the need and value of an upgrade, for example, in ways senior management or product owners can understand. That is not always easy to do, so I recommend ways in which addressing technical debt becomes part of the process:

Continuous improvement culture - A culture in which people are rewarded for making things better and always striving to improve is the single best way to minimize technical debt. When team members inherently feel motivated and empowered, technical debt tends to shrink quickly.
Mandate open standards - Utilizing proprietary technologies can lock companies into vendors or tools for many years. This can be quite costly. Mandating the use of technologies which are built upon or use open standards is required to be nimble and relevant long term. Open standards prevent team members from re-inventing the wheel, help avoid vendor lock-in, improve agility and choice, and dramatically increase application portability and integration. These all in turn help minimize technical debt.
Add a hardening sprint - If you are following Agile practices (and even if you are not), you can consider adding a hardening sprint to your schedule every quarter. So if you do 2-week sprints, the last 2 weeks of each quarter can be dedicated to working on stories related to reducing technical debt (or "hardening" the system). The beauty of this is that it becomes part of the whole team's routine, and increases the visibility of the importance and benefits of eliminating technical debt. It does still require those closest to the technologies to be proactive in identifying improvements and articulating the value to the product owner.
Semi-dedicated Systems Team - Having a separate team of people dedicated to "owning" the architecture and maintaining system health sounds good on paper, but actually it leads to accountability problems. Instead, having everyone support what they build makes people think twice about cutting corners and throwing things "over the fence." As a result, I suggest having a group of senior technical folks dedicate 20-30% (not 100%) of their time to continuously improving and monitoring system health. This does not mean that they do 100% of the work, but rather that they set the standards, goals, and requirements for themselves and the rest of the team to follow. Some (or all) of the work done in a hardening sprint, for example, likely will be suggested by the Systems Team.
Call for back-up - In many cases, legacy systems come with significant risks to the business, usually security-related. Leverage your friends in the security department to help you build a business case for making the necessary changes to your systems (or what might happen if the changes are not made). Be sure to also paint a positive picture of the business benefits to come in the future as a result.
Governance - While my least favorite on the list, sometimes it does take some hard and fast governance rules to help prevent technical debt. Projects should not be approved which build upon legacy systems or have no clear plan for future upgrades.

Monday, January 2, 2017

Boss vs. leader

I strive to be the best leader by emulating some of my favorite leaders. They tend to be the ones who put others' interests first, drive collaboration and teamwork, and promote and inspire a positive future.

The following two graphics demonstrate what I feel is the difference between being a "boss" and being a "leader."

I wrote earlier about the importance of being a great leader.

Sunday, July 31, 2016

Do DevOps

DevOps is not a buzzword; it is the way quality software gets deployed fast.

In order for software teams to truly embrace DevOps, they must have an inherent continuous improvement culture which embraces ruthless amounts of automation. Many of my examples below will be Java-specific, but this can apply to all types of software languages.

The deployment pipeline
Your deployment pipeline is critical to enabling speed, so I will expand a bit more here. Some questions to ask yourself: How often do you deploy code to production? How long are your builds? How long does it take to do a production deployment? How often do you have bugs in production? Staging/UAT? Dev? The answers may vary based on many factors, but odds are, you can improve dramatically in all areas.

Continuous integration. Enabling a distributed group of developers to integrate their local code into a shared development environment as efficiently as possible is the key first step. Generally a build server (like Jenkins or Bamboo) can help to enable this. Most important, though, are the automated tests which run on the code before moving it to development. These can include tools like PMD or SonarQube, which check for best-practice violations, standards, or bugs. Similarly, unit, integration, and security tests can and should be run here. The key is that code is not allowed to move to development until all tests pass. We strive for quality, production-ready code even in development.
Peer code reviews.* This is probably the only manual step of the deployment process. Having an additional pair of (usually senior- or architect-level) eyes helps to drive team standards, code re-use, scalability, security, and efficiency. Some teams may find it hard to incorporate this critical step, but it must become part of the process.
Automated testing. Automated tests can occur at each stage, either with each build (depending on speed), or on some regular rhythm (like nightly). These tests can be regression, smoke, integration, or performance tests. Visibility of the results is key, as test failures must be addressed promptly. Regular testing also helps to ensure tests stay current. As the test suite grows to have a comfortable percentage of coverage, code can move faster to production with less manual testing.
Auto-build, auto-deploy. The build servers mentioned above can automate the process of building and deploying code to each environment. Moving to production may require additional steps due to segregation of duties and change controls. As a result, I recommend making everything standard changes -- this way a change control ticket can be opened automatically by the deployment process rather than requiring manual change controls to be approved. In the lower environments, builds and deploys can be scheduled automatically or occur automatically once new code is committed.
Same artifact in each environment. Consistency is key in ensuring quality. Using the same artifact (or Docker image if you use Docker) throughout each environment minimizes variability.
Visibility. All of the above should be easily accessible and visible to every stakeholder -- from the project managers to the developers. Broken builds, for example, need to be remediated fast because they prevent the other developers' code from moving.
Forward and back. Getting to production quickly is important, but it is also imperative to have a way to revert deployments fast. Your pipeline should support this.
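
To make the gating, same-artifact, and rollback ideas above concrete, here is a minimal sketch in Python. It is not how Jenkins or Bamboo work internally; the `Pipeline` class, the environment names, and the `gates` mapping are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical environment names, in promotion order.
ENVIRONMENTS = ["dev", "staging", "production"]

Check = Callable[[str], bool]


@dataclass
class Pipeline:
    # Maps each environment to the artifacts deployed there, oldest first.
    history: Dict[str, List[str]] = field(
        default_factory=lambda: {env: [] for env in ENVIRONMENTS}
    )

    def promote(self, artifact: str, gates: Dict[str, List[Check]]) -> str:
        """Move one artifact through the environments in order.

        Every check registered for an environment must pass before the
        artifact is deployed there. Returns the last environment reached,
        or "" if the artifact never got past the first gate.
        """
        reached = ""
        for env in ENVIRONMENTS:
            if not all(check(artifact) for check in gates.get(env, [])):
                break  # a failed gate stops the promotion here
            self.history[env].append(artifact)  # same artifact everywhere
            reached = env
        return reached

    def rollback(self, env: str) -> str:
        """Revert an environment to the artifact deployed before the current one."""
        versions = self.history[env]
        if len(versions) < 2:
            raise ValueError(f"no earlier artifact to roll back to in {env}")
        versions.pop()
        return versions[-1]
```

A failing check for staging, for instance, leaves the artifact in dev only, and `rollback("dev")` returns dev to whatever was deployed before it.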

Configuration management
Configuring and managing environments in a streamlined and automated fashion enables speed and consistency. Configuration management tools like Puppet or Chef enable centralized management of multiple servers at once. This is key to being able to quickly spin up or down new environments as needed, patch, or ensure the same settings are applied to each without individually tending to each.

These tools can also be used to push software to desktops. This is useful for a team of developers looking to ensure everyone has the same version and configuration of tools on their machines at all times. It also helps with installing those tools as it can literally be a simple double click and go get a cup of coffee.
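
The core idea behind these tools is declaring a desired state once and letting automation converge every machine toward it. The sketch below shows that idea in plain Python; it does not use Puppet's or Chef's actual APIs, and the package names, versions, and function names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical desired state: package name -> version every server should run.
DESIRED = {"openjdk": "8u121", "nginx": "1.10.3"}

Step = Tuple[str, str, str]  # (action, package, version)


def plan_changes(actual: Dict[str, str], desired: Dict[str, str]) -> List[Step]:
    """Return the steps needed to bring one server to the desired state."""
    steps: List[Step] = []
    for package, version in sorted(desired.items()):
        if package not in actual:
            steps.append(("install", package, version))
        elif actual[package] != version:
            steps.append(("upgrade", package, version))
    return steps


def converge(actual: Dict[str, str], desired: Dict[str, str]) -> Dict[str, str]:
    """Apply the plan to an in-memory server description and return the new state."""
    updated = dict(actual)
    for _action, package, version in plan_changes(actual, desired):
        updated[package] = version
    return updated
```

Running `converge` a second time on an already-converged server produces an empty plan, which is the property that makes it safe to apply the same settings to many servers repeatedly.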

Containers & container orchestration
Step aside VMs, containers are the new thing. Docker containers wrap your software in a complete filesystem. They are more lightweight than VMs, and enable speed by ensuring standardization of the environment. Their small size means you can have several containers inside one VM. The key point is that containers enable true application portability, as they abstract the underlying infrastructure from the app itself.

As your environment grows with more and more containers, orchestration tools like Kubernetes become important to help manage them all from a central place.

Situational awareness
It is key for the team to know the health of the system at all times. It encompasses the following:

Monitoring. A constant pulse on the key metrics (response times, CPU usage, server memory, etc.) allows for quick identification of potential issues and can help prevent failures. Tools like Icinga can even automate the creation of monitors when setting up servers through Puppet, for example. I recommend making as many of these metrics visible as possible using tools like Graphite, StatsD, and Grafana.
Logging. Having additional details at hand helps to give more insight into the various systems. Centralizing logging outputs using the ELK stack (Elasticsearch, Logstash, Kibana), or using tools like Takipi, can help to reduce the time it takes to remediate issues.
Alerting. In addition to visual dashboards, automated alerting of key thresholds plays a key role in ensuring timely resolution of issues. A tool like Seyren can be useful here in conjunction with Graphite.
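
As a rough illustration of threshold-based alerting, the sketch below compares one sample of metrics against fixed limits. It is not Seyren's or Graphite's API; the metric names, the limits, and the `check_thresholds` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical thresholds: metric name -> maximum acceptable value.
THRESHOLDS = {"response_ms": 500.0, "cpu_percent": 90.0, "memory_percent": 85.0}


def check_thresholds(sample: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one alert message per metric whose value exceeds its threshold."""
    alerts: List[str] = []
    for metric, limit in sorted(thresholds.items()):
        value = sample.get(metric)
        # Metrics missing from the sample are skipped rather than treated as failures.
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts
```

In practice the returned messages would be routed to whatever channel the team watches (email, chat, paging); that part is left out here.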

Zero downtime
Who likes staying late or working on the weekend to push new code live? No one. One of the main reasons this occurs is that many deployments incur downtime in some fashion. With a streamlined pipeline, and a little help from Docker, staying late may become a thing of the past.

When we push new code live, we launch a second Docker container in production and point only our internal network traffic to it using Vulcan. If all tests pass, we point all traffic to the new container (using Redis to maintain sessions) and we are live without any downtime! The same can be done in lower environments as well.
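
The traffic switch described above can be sketched as a tiny state machine. This is only a model of the sequence of steps; it does not talk to Docker, Vulcan, or Redis, and the `Router` class and container names are hypothetical.

```python
from typing import Callable, Dict


class Router:
    """Toy router tracking which container serves internal and public traffic."""

    def __init__(self, live: str) -> None:
        self.routes: Dict[str, str] = {"internal": live, "public": live}

    def release(self, candidate: str, smoke_test: Callable[[str], bool]) -> bool:
        """Try to switch traffic to `candidate`; return True if it went live."""
        previous = self.routes["internal"]
        # Step 1: only internal traffic reaches the new container.
        self.routes["internal"] = candidate
        if not smoke_test(candidate):
            # Tests failed: restore internal traffic; public traffic never moved.
            self.routes["internal"] = previous
            return False
        # Step 2: tests passed, so public traffic follows with no downtime window.
        self.routes["public"] = candidate
        return True
```

Because public traffic only moves after the tests pass, a bad release is never visible to customers, and the old container stays available until the switch completes.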

Conclusion
Ultimately we want to achieve a continuous delivery state, where code changes have the potential to go live very quickly, with high assurance of quality at each step. Visibility is key to this process, as it ensures everyone is on the same page.

Lastly, the term DevOps is the combination of Development and Operations. Traditionally development teams and operations teams have competing priorities: devs want to move code to production fast; ops wants to keep the environment stable. With DevOps, the developers take more ownership throughout the process, while operations gets involved earlier, gains more automated tools, and has better visibility of the pipeline. The partnership is what drives great business results.

*Side note on peer code reviews: There may be times when code reviews seem a bit of a burden.

First, when refactoring is required/requested by the reviewer. Refactoring is an important and natural part of keeping the code base in good order over time. There may be times when refactoring is not possible due to time constraints, in which case I suggest that a user story (a requirement in Agile) be added to the top of the backlog and done in a subsequent sprint (keeping in mind that there is nothing more permanent than temporary code). If you are following Scrum, ensure your teams do not consider their user stories to be "done" until all the code review comments are addressed.

Second, when there are disagreements between the reviewer and the developer. This is pretty simple to resolve, especially when the reviewer is an architect -- the developer does what the architect says. Discussions are always welcome, but tie goes to the architect.

Sunday, June 26, 2016

High performing teams -- Part I: Culture

We all want our teams to be high performing. Here are just a few general traits I associate with high performing teams:

Team members collaborate extremely well, with deep trust and openness;
Consistently output high quality;
Deliver at a rapid pace;
Continuously learn and are able to shift to new areas;
Demonstrate innovation and creativity;
Have strong customer focus and knowledge;
All team members contribute to their work and also proactively seek to improve the team;
Have the same goals and move in the same direction.

How do you create high performing teams? It takes a bit of effort across several different areas. I will try to use my experience to provide a framework, starting with culture.

CULTURE

Having the right culture in place is the first step toward achieving high performance. Here are some key areas I focus on:

Relentless optimism. Doubt, fear, and negative outcomes from past experiences can hinder individuals, and bring down the entire team. With a positive outlook comes more possibilities. More possibilities bring infinite upside.

Relentless optimism must start from the top. Leadership has to believe in positive change, positive results, and envision a future that is bright.

This also means the senior members of the team (those whom the team looks up to) need to pay attention to seemingly minor things like body language in meetings, and wisecracks that may spark doubt in others. There cannot be rolling of the eyes or anyone blurting out "Yeah, right!" Statements like, "This will never work," should be replaced with, "This could work if..."
Ownership. When individuals are held accountable for their deliverables, they are more likely to ensure their quality.

Leadership must identify areas where they want the team to take full ownership of their work. The team can also help to identify ways they want to be measured or demonstrate accountability. It is important for leadership to truly step away here; allow the team to be autonomous in their solutions while never micromanaging.

Taking ownership should not be dreaded, but rather done with pride. Leadership must help paint it in this light, recognizing that demonstrating a lack of ownership (perhaps as first examples for the team) could be challenging. So I remind leadership to ensure praise is given to those doing well.
Failure + rapid learning is OK. A high performing team knows that it is fine to fail fast and cheap, as long as they learn from it. A team creating a new product, for example, does not want to find out it will not sell in the market after 18 months of development. They want to test their hypotheses and obtain feedback early to ensure the company does not waste time and money.
Continuous improvement. The team must have a constant and proactive urge to improve processes, products, tools, and each other. I wrote about the bad words in a previous post -- those must all be removed in some fashion. I also wrote about finding the time to innovate through automation and elimination -- the essence of continuous improvement.

Leadership must also make time for learning and development. Sending employees to courses can be beneficial, but how many of them are truly worth the time and money? Pick and choose wisely, and ensure results from training can be measured and demonstrated. Look for other avenues to embed training in ways which engage the team more. This takes a deep understanding of the individuals on the team, and how they operate.
Need for speed. Each team member must have an eye on the speedometer. What is slowing down the process? How can we get something out the door faster? This is similar to a continuous improvement mindset, but focused strictly on speed to deliver.
Safe to speak up. In order to achieve a lot of the above, the team must feel safe to speak up. Challenging status quo can be uncomfortable if leadership is not open and does not truly listen. New ideas will not emerge from team members if they are continuously stifled or ignored. Create an environment where everyone feels like their opinion is valued and they can make a difference.
Have a purpose. With everyone moving toward the same goal, vision, and standards of work, the team will move in lock-step even when leadership is not looking. Leadership must set out clear goals, measurements, and objectives. They must be continuously reinforced and revisited in various ways to demonstrate progress and positive impact.
Have fun. A culture of friendliness, fun, and collaboration ensures everyone trusts each other, is willing to lend a helping hand, and enjoys coming to work. Smile, give high fives, and lighten up a little. :)

Sunday, April 10, 2016

The bad words

Certain phrases are considered curse words on my teams. They represent the opposite of the culture I try to instill. We strive for continuous improvement and innovation. We cannot settle or become overly comfortable, because technology moves at the speed of light. We must always be learning and thinking ahead.

Here are a few of those:

"That's the way we've always done it..."
Or also, "We've done it this way for years." If you hear this often, it generally means your team is probably far behind high-performing teams. Getting complacent or not having a constant pulse on improvement will eventually make your team irrelevant.

"Legacy system"
Why does this legacy system still exist? It is likely that managing it is painful, it contains critical security holes, and only a few employees understand it. Removing or upgrading it will reap many positive benefits. Quantify those, demonstrate the value, and kill the legacy stuff!

"Temporary code"
There is nothing more permanent than temporary code. We spend a little bit more time up front to get things right and not have to pay the price three-fold in the future (when things may break or require additional efforts due to earlier "shortcuts").

"Manual work"
We believe in automating everything. We want to be doing deep work tasks, letting the machines handle the trivial stuff.

Changing culture and creating the time to innovate does not come overnight, but demonstrating small wins along the way helps to reinforce the desired behavior.

Tuesday, January 19, 2016

Creating the time to innovate -- Part I

I am frequently approached by leadership from other divisions asking how my teams find the time to be so innovative. I propose that it is not finding the time, but rather creating the time. We all need more time in the day, but if you create a culture which inspires quality, you will naturally have the time you've been looking for.

Culture
A culture of continuous improvement runs through my team's veins. When inefficiencies arise, the team identifies solutions to improve productivity.

Team members are encouraged to give back to the team (I call this "team community service") by proposing and implementing better ways of doing things. Generally it's about 20% of their time (equating to about 1 day per week).

The key is not assigning tasks nor me saying what to do, but rather giving each individual a blank slate to identify and contribute to the areas they are most passionate about. (See previous post about motivation.)

Quality
Where do we get the time to implement these solutions? We have a constant pulse on things which prevent us from working on value-add tasks. From here we identify where we need to simplify or improve quality. These improvements in quality add up to very large time savings.

At first the team uses this extra time to catch up on value-add work and achieve a consistent flow. However, once we achieve optimal flow, we use the extra time gained to continue to innovate, gradually reaching the magic 20% time for each individual.

Suggestions to get started
Analyze your team's errors, production bugs, defects, and other distractions which require someone to stop what they're doing and spend time fixing issues. Use the 80/20 rule to determine the 20% of items causing 80% of the issues, and start to eliminate them.
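
One way to apply the 80/20 rule to a list of logged issues is to count them by cause and keep the most frequent causes until they cover roughly 80% of the total. The sketch below assumes each issue has already been tagged with a single cause; the `vital_few` function and the cause labels are hypothetical.

```python
from collections import Counter
from typing import List


def vital_few(issue_causes: List[str], share: float = 0.8) -> List[str]:
    """Return the most frequent causes that together cover `share` of all issues."""
    counts = Counter(issue_causes)
    total = sum(counts.values())
    selected: List[str] = []
    covered = 0
    # most_common() yields causes from most to least frequent.
    for cause, count in counts.most_common():
        if covered >= share * total:
            break
        selected.append(cause)
        covered += count
    return selected


# Hypothetical issue log: one entry per interruption, tagged by cause.
issues = ["config drift"] * 6 + ["flaky test"] * 3 + ["manual deploy"] * 1
top_causes = vital_few(issues)  # the causes to eliminate first
```

The exact split will rarely be 80/20; the point is to rank causes by how much disruption they account for and start at the top.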

It may be difficult at first, but leverage key team member strengths and passions. Some folks will not mind putting in extra effort, especially if it means helping the team in the long run and working on something they enjoy.

Be sure to track your team's progress. Take a baseline of key metrics today (number of production defects, average time spent fixing issues, etc.), and track improvements along the way.

The key is to have a tipping point in mind: when do you stop giving the time saved back to "business as usual" work and start giving it to "team community service"? Some individuals may only be able to reach 10%, while others may reach 20% or more.

You will see that 10-20% of time spent on innovation and continuous improvement will produce 2-10x gains for your team in the long run. Create the time to do it.

Update: Read Part II of this topic here.