Optimization in the cloud can differ from optimization in a traditional data center environment. A key reason is that site reliability engineers put automated and decision-point processes in place from day one of implementation and carry them forward across existing and new hybrid workloads. At the same time, customers want a proactive rather than reactive approach. To become a cloud clairvoyant, you must factor a design (or redesign) of resources, processes, and tools into the optimization process for the cloud environment.
Optimization best practices must be planned while leaders map requirements to solutions, design migration steps, and make target foundations enterprise-ready. Customers and vendors (e.g., systems integrators, or SIs) jointly agree on these practices. It's essential to start with the information you know (for example, contract agreements, business requirements, resource plans), then determine the resources needed to optimize, and integrate the right skills and optimization processes and tools into cloud operations, including CI/CD pipelines. From there, tracking and measuring the results and acting on a feedback loop bring agility to continuous cloud cost optimization.
Here are some questions and/or discussion points that can be used during Agile cloud optimization sessions:
- Lay out your cloud structure: subscription account(s), organization structure, departmental breakdown. Who controls what, and have you taken advantage of initial enterprise discounts?
- Obtain a list of delivery roles, and determine which qualified resources will perform cloud optimization roles. (Dedicated or part-time? Onsite, onshore, offshore, or a combination?)
- Obtain your standard delivery plan, and determine how continuous optimization is integrated into it. (How is optimization defined in the contract agreement, including the SLA?)
- Does the customer have any corporate or IT cost savings goals? Do they have any required optimization/cost reduction goals that they must meet annually (e.g., dictated in contracts)?
- What is the cloud capacity plan? How does continuous optimization of resources align with that plan?
- What are your planned cloud resources (compute, storage, database), and how are they billed (per time, per volume, per ingress/egress)?
- Start mapping your cloud resources to your annual savings target. What would be the ideal structure that would achieve the desired cost savings?
- What are all the cloud resources we need to continually optimize, and who will be part of the cloud delivery team (solution architects, engineers, cloud developers, etc.)?
- To be proactive rather than reactive, what does the customer want to see in ongoing reporting? Resources, metrics, baseline vs. savings, graphs/charts, notifications, feedback?
- As part of the ongoing delivery cadence, how will optimization be reported to various stakeholders (program, project board, executive board, etc.) to show progress toward savings goals and execution against the contract? Think about the level of detail needed to satisfy each level and the mode used to communicate that information or summary.
- What are the workflows to capture/monitor, alert, analyze, decide, optimize, and report for each resource? For each resource, how do we set up the process to optimize? What does that architecture look like? Who is going to design it, and who is going to engineer it during delivery?
- Evaluate the trade-offs between the benefits, costs, risk, and viability of cost optimization initiatives. Develop a simple grid that maps these trade-offs for your cost reduction strategy.
- Determine the best set of tools, whether cloud-native, 3rd-party vendor, or supplier intellectual capital, that meets your optimization and savings goals at the least cost. You may have to weigh the cost/benefit of build vs. buy, depending on how complex your optimization solution is. Will cloud-native tools with some automation do? Will you get additional benefit from 3rd-party tools?
- In your design, don’t forget data dependencies, data gaps, time lags, and statistics.
- When you think about toolsets, consider getting the best license option for 3rd-party tools and make a business case for additional benefits along with ROI on sought monthly savings.
- What are the ways within respective cloud environments to automate the process with code? Who can do that automation? Who will maintain the change control?
- When you think about analysis, will it require some intelligence provided by machine learning, or a more manual effort? If some manual effort is required, who will do that analysis?
- When you think about the decision, determine which recommendations can be automated and which need verification. For decisions requiring recommendations to be stored, communicated, and decided by the owner, determine the process once the decision has been approved or denied.
- When you build your optimization process, think about the customer. Design in proactive thinking, from monitoring and alerts, to the way decisions are communicated, to reporting. How do I leverage cloud automation (storage, workflow automation, use of code) to make it easier for customers to consume?
- Think about data retention for collected, analyzed, reported, and decided-upon data and reports. How long do you need to keep reports for auditing, contract disputes, and DoS troubleshooting? Leverage storage archiving to save storage costs for older data.
- How do you design DevSecOps so that optimization decisions are part of design and deployment, and ensure that new system implementations are part of the continuous optimization process? Think about dev/test systems that may not be fully optimized during development but will become so as they are deployed.
- Develop and make use of a heatmap showing peaks and valleys in infrastructure/resource demand.
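
As a starting point for the heatmap mentioned in the last item, here is a minimal Python sketch that averages utilization readings into a weekday-by-hour grid and lists the peak and valley cells. The sample data, the function names (`build_heatmap`, `find_cells`), and the 70% and 20% thresholds are all hypothetical, chosen only for illustration; real readings would come from whatever monitoring tool you select.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_heatmap(samples):
    """Average utilization per (weekday, hour) cell.

    samples: iterable of (datetime, utilization_percent) pairs.
    Returns a 7x24 grid (rows Monday..Sunday); cells without data are None.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (weekday, hour) -> [sum, count]
    for ts, util in samples:
        cell = totals[(ts.weekday(), ts.hour)]
        cell[0] += util
        cell[1] += 1
    grid = [[None] * 24 for _ in range(7)]
    for (day, hour), (total, count) in totals.items():
        grid[day][hour] = total / count
    return grid


def find_cells(grid, predicate):
    """Return the (weekday, hour) cells whose average satisfies predicate."""
    return [(d, h) for d in range(7) for h in range(24)
            if grid[d][h] is not None and predicate(grid[d][h])]


# Hypothetical data: two weeks of hourly readings for one resource,
# busy (80%) on weekdays from 09:00 to 17:00 and quiet (10%) otherwise.
start = datetime(2024, 1, 1)  # a Monday
samples = []
for i in range(14 * 24):
    ts = start + timedelta(hours=i)
    busy = ts.weekday() < 5 and 9 <= ts.hour < 17
    samples.append((ts, 80.0 if busy else 10.0))

grid = build_heatmap(samples)
peaks = find_cells(grid, lambda u: u >= 70.0)    # assumed peak threshold
valleys = find_cells(grid, lambda u: u <= 20.0)  # assumed valley threshold

# Text rendering: '#' marks a peak, '.' a valley, ' ' anything in between.
for day, row in zip(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], grid):
    marks = "".join("#" if u is not None and u >= 70.0
                    else "." if u is not None and u <= 20.0
                    else " " for u in row)
    print(day, marks)
```

The valley cells are candidates for scheduled scale-down or shutdown, and the peak cells show where capacity has to be protected. Whether acting on them can be automated or needs owner approval is one of the decision questions above.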
Optimization is a continuous process, and you may have many agile sessions with your customers until your process matures and is well established.
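
For those sessions, the trade-off grid suggested in the discussion points can start out very simply. The sketch below is one possible shape, not a prescribed method: the initiatives, the 1-to-5 ratings, and the unweighted scoring rule are all assumptions made for illustration, and you would agree on the real values with the customer.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    benefit: int    # 1 (low) to 5 (high)
    cost: int       # 1 (cheap) to 5 (expensive)
    risk: int       # 1 (safe) to 5 (risky)
    viability: int  # 1 (hard to deliver) to 5 (easy to deliver)

    def score(self) -> int:
        # Assumed, unweighted rule: reward benefit and viability,
        # penalize cost and risk.
        return self.benefit + self.viability - self.cost - self.risk


# Hypothetical initiatives with illustrative ratings.
initiatives = [
    Initiative("Re-architect database tier", benefit=5, cost=5, risk=4, viability=2),
    Initiative("Right-size idle compute", benefit=4, cost=1, risk=2, viability=5),
    Initiative("Archive old report data", benefit=2, cost=1, risk=1, viability=4),
]

# Highest score first, so the most attractive trade-off leads the grid.
ranked = sorted(initiatives, key=lambda item: item.score(), reverse=True)

print(f"{'Initiative':<30}{'Benefit':>8}{'Cost':>6}{'Risk':>6}{'Viab.':>7}{'Score':>7}")
for item in ranked:
    print(f"{item.name:<30}{item.benefit:>8}{item.cost:>6}"
          f"{item.risk:>6}{item.viability:>7}{item.score():>7}")
```

A grid like this is meant to prompt discussion rather than settle it. If the ranking looks wrong to the people in the room, that usually means a rating or the scoring rule needs to change.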