Visions and Principles
Vision Drafts
“Our DevOps systems empower teams to automatically release high-quality functionality built with secure, scalable, and reusable standards.”
“Consistent automated releases of secure, reliable, and high-quality functionality to customers.”
Key Elements of the second draft:
- Consistent: Implies scheduling, predictability, and frequent standard releases.
- Automated: A non-negotiable requirement covering build, pipelines, deployment, testing, and monitoring.
- Release: Encompasses more than just deployment and build; it extends to customer delivery.
- Reliable: Must be measurable against a defined standard, which in turn requires observability.
- High-quality: Requires well-defined testing and quality assurance.
“Empower our developers to focus directly on providing customer value without worrying about how to securely develop, build, test, and deploy. The CI/CD platform provides common interfaces that satisfy the needs of all stakeholders across supported languages and environments.”
Principle Drafts
1. Standardized
In Practice:
- Maintain a finite selection of languages, runtimes, cloud services, security tools, and code templates.
- Adapt flexibly as software development life cycle (SDLC) needs evolve.
- Implement critical guardrails to manage security and operational constraints with minimal overhead.
- Use standards to enable releases at scale.
- Provide freedom to build within secure constraints.
- Enforce appropriate standards for workflows and tools.
2. Metrics-Driven
In Practice:
- Continuously measure and improve SDLC performance over time.
- Implement a scorecard for tracking progress.
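One way to picture such a scorecard is as a list of metrics, each compared against a target. The sketch below is illustrative only: the metric names, targets, and values are hypothetical and are not taken from any existing system, and the scoring rule (fraction of targets met) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One scorecard row. Names and targets used below are hypothetical."""
    name: str
    value: float
    target: float
    higher_is_better: bool = True

    def met(self) -> bool:
        # A metric passes when it sits on the right side of its target.
        if self.higher_is_better:
            return self.value >= self.target
        return self.value <= self.target


def score(metrics: list[Metric]) -> float:
    """Return the fraction of metrics that meet their target (0.0 to 1.0)."""
    if not metrics:
        return 0.0
    return sum(m.met() for m in metrics) / len(metrics)


# Illustrative values only; not measurements from a real pipeline.
scorecard = [
    Metric("deploys_per_week", value=12, target=10),
    Metric("change_failure_rate", value=0.08, target=0.15, higher_is_better=False),
    Metric("lead_time_hours", value=30, target=24, higher_is_better=False),
    Metric("time_to_restore_minutes", value=45, target=60, higher_is_better=False),
]

for m in scorecard:
    print(f"{m.name}: {'ok' if m.met() else 'miss'}")
print(f"score: {score(scorecard):.2f}")
```

Recording the same scorecard at regular intervals gives the trend needed to judge whether SDLC performance is improving.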
3. Remote and Distributed
In Practice:
- Enforce SDLC practices designed for distributed teams.
- Architect systems that scale to geographically diverse users.
- Promote remote-friendly development practices.
- Maintain comprehensive documentation and runbooks.
- Establish clear ownership of systems and processes.
- Enhance local development environments using cloud resources.
- Ensure equal access to networking and deployment capabilities for all developers.
- Support async approvals and on-call best practices.
4. Reusable
In Practice:
- Favor libraries over duplicated code.
- Use shared network and storage patterns over custom one-off solutions.
- Contribute to shared codebases when feasible.
- Prefer primitives over frameworks (adapted from AWS).
- Follow modular and composable design principles (inspired by HashiCorp and Unix philosophy).
5. Scalable
In Practice:
- Reduce technical debt by enabling refactoring over full rewrites.
- Build systems using small, independently iterated components.
- Design microservices and nanoservices for easy component swaps.
- Ensure monoliths can evolve efficiently through structured refactoring.
- Architect for seamless growth and adaptability (adapted from AWS).
6. Automated
In Practice:
- Deploy code to customers without manual intervention.
- Automate build, deployment, canary testing, and feature flag management.
- Use automation to enforce constraints, standards, and best practices.
- Eliminate risks introduced by manual steps.
- Strive for zero operational toil (adapted from Google SRE and AWS).
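As a rough illustration of automated canary testing, a pipeline step could compare the canary's error rate with the baseline and decide without human input. This is a minimal sketch under stated assumptions: the `Sample` type, the `min_requests` and `tolerance` thresholds, and the three decision labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Request and error counts observed for one deployment (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic has been observed yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: Sample,
    canary: Sample,
    min_requests: int = 500,   # assumed minimum traffic before judging
    tolerance: float = 0.01,   # assumed allowed error-rate increase
) -> str:
    """Return "wait", "rollback", or "promote" for the canary."""
    if canary.requests < min_requests:
        return "wait"
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"
    return "promote"


baseline = Sample(requests=10_000, errors=50)
print(canary_decision(baseline, Sample(requests=100, errors=0)))
print(canary_decision(baseline, Sample(requests=1_000, errors=6)))
print(canary_decision(baseline, Sample(requests=1_000, errors=40)))
```

Encoding the thresholds in code is also one way to use automation to enforce a standard: the rule is versioned, reviewed, and applied identically on every release.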
7. Optimize for Easy Onboarding
In Practice:
- Ensure new and existing team members can seamlessly work with systems, both locally and remotely.
- Minimize effort required for non-engineers to engage in SDLC workflows.
- Maintain high parity between local and remote development environments.
- Enable designers, PMs, architects, QA engineers, and SREs to integrate into engineering workflows.
- Provide a scorecard for assessing adherence to principles.
8. Design API-First
In Practice:
- Prioritize system integration through APIs over direct code interactions.
- Define a control plane API to facilitate automation.
- Ensure all operational tasks can be executed via APIs.
- Maintain modular system architecture with API-driven automation (adapted from AWS).
9. Expect Failures
In Practice:
- Document failure states and recovery strategies.
- Automate outage recovery processes.
- Provide a control plane API for system operators.
- Define:
  - How to detect failure states.
  - How to differentiate atomic from partial failures.
  - Indicators of degraded performance.
  - Steps for automated outage recovery (adapted from AWS).
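The definitions above could be captured in code roughly as follows. This is a sketch, not a prescribed design. It assumes that an "atomic" failure means every component is down at once, that a "partial" failure means only some are, and that degradation is judged by a p99 latency budget. The `Probe` type, the 500 ms budget, and the recovery steps are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    PARTIAL_FAILURE = "partial_failure"
    ATOMIC_FAILURE = "atomic_failure"


@dataclass(frozen=True)
class Probe:
    """Result of one health check (hypothetical shape)."""
    up: bool
    p99_latency_ms: float


def classify(probes: dict[str, Probe], latency_budget_ms: float = 500.0) -> State:
    """Map component probes to a single system state."""
    if not probes:
        raise ValueError("at least one probe is required")
    down = [name for name, probe in probes.items() if not probe.up]
    if len(down) == len(probes):
        return State.ATOMIC_FAILURE   # everything failed together
    if down:
        return State.PARTIAL_FAILURE  # only some components failed
    if any(p.p99_latency_ms > latency_budget_ms for p in probes.values()):
        return State.DEGRADED         # all up, but slower than the budget
    return State.HEALTHY


# Hypothetical first recovery step per state, for illustration only.
RECOVERY_STEP = {
    State.HEALTHY: "no action",
    State.DEGRADED: "scale out and alert the owner",
    State.PARTIAL_FAILURE: "restart failed components",
    State.ATOMIC_FAILURE: "roll back the last release",
}

probes = {
    "api": Probe(up=True, p99_latency_ms=120.0),
    "queue": Probe(up=False, p99_latency_ms=0.0),
}
state = classify(probes)
print(state.value, "->", RECOVERY_STEP[state])
```

Keeping the state definitions and the recovery mapping in one place makes the documented failure states and the automated response harder to drift apart.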
10. Performance as a Competitive Advantage
In Practice:
- Improve performance continuously.
- Implement deep observability and testing strategies.
- Define measurable performance metrics and SLAs.
- Commit to iterative system improvements through refactoring.
- Conduct performance testing at release and in production.
- Utilize A/B testing to optimize performance.
- Implement logging, metrics, and tracing across all system tiers.
- Monitor resource budgets (cloud, licensing, etc.).
- Track system changes and SDLC contributions.
- Generate comprehensive test reports.
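A measurable performance metric can be as simple as a latency percentile checked against a limit. The sketch below uses the nearest-rank percentile method. The function names, the 95th-percentile choice, and the 200 ms limit are assumptions for illustration, not agreed SLAs.

```python
def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # ceil(p * n / 100) computed with integers to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_sla(samples: list[float], p: int, limit_ms: float) -> bool:
    """True when the p-th percentile latency is within the limit."""
    return percentile(samples, p) <= limit_ms


# Synthetic latencies of 1 ms to 100 ms; not real measurements.
latencies_ms = [float(ms) for ms in range(1, 101)]
print(percentile(latencies_ms, 95))
print(meets_sla(latencies_ms, 95, limit_ms=200.0))
```

The same check can run in a release pipeline against load-test results and in production against live metrics, so both stages are held to one definition.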
By following these principles, we ensure that our development and operational practices are scalable, reliable, and focused on delivering high-quality software efficiently.