IaC Development for Video Hosting

Project Description
The client provides high-load video hosting services. The entire infrastructure runs on bare-metal servers in Europe and Canada, with UCDN as a backup CDN that absorbs load spikes during emergencies. The content is divided into several independent blocks, each served by its own independent group of servers. Most nodes were configured manually and were covered only by basic monitoring.

The client's main request was to improve manageability, speed up the addition of new nodes, make monitoring more transparent, and reduce the frequency of emergencies.
Key Metrics
  • 60+ manually configured servers
  • 3+ petabytes of monthly traffic for each server group
  • 400+ domain names associated with various projects
  • 97.2% availability, below the client's target service level
Project Goals
  • To cover the entire infrastructure with code, following Infrastructure-as-Code principles.
  • To reduce the deployment time of a new node from 5 working days to 1 working hour.
  • To increase infrastructure availability from 97.2% to 99.9% or higher.
  • To develop a methodology for forecasting reserve capacity and the need for storage expansion.
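To put the availability goal in concrete terms, the short sketch below converts the two percentages into allowed downtime. It assumes a 30-day month, which is our simplification and not a figure from the project. Under that assumption, 97.2% corresponds to roughly 20 hours of downtime per month, while 99.9% leaves about 43 minutes.

```python
# Convert an availability percentage into the downtime it implies.
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    return (1 - availability_pct / 100) * period_hours


before = downtime_hours(97.2)  # starting availability from the case study
target = downtime_hours(99.9)  # target availability from the case study

print(f"97.2% -> {before:.2f} h of downtime per 30-day month")
print(f"99.9% -> {target:.2f} h of downtime per 30-day month")
```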
Key Challenges and Results
We covered the entire infrastructure with code:
  • The entire server configuration is now deployed fully automatically using the Ansible roles we developed.
  • We integrated monitoring based on Datadog. All necessary checks, including the availability of each of the hundreds of domains, are deployed using Ansible. The Terragrunt/Terraform templates we developed automatically update all diagnostic dashboards whenever the server composition changes, using the Ansible configuration as the single source of truth.
  • We developed Terraform modules that manage the client's CDN and switch automatically to the backup CDN in emergencies.
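The single-source-of-truth idea from the list above can be illustrated with a minimal sketch: one inventory structure drives both the per-domain availability checks and the host lists shown on dashboards, so adding a server or domain in one place updates everything derived from it. The inventory shape, the group and domain names, and the `DomainCheck` type are all hypothetical, chosen for illustration only. This is not the project's Ansible or Terraform code, and it does not call any Datadog API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical inventory: each server group lists its hosts and the domains it serves.
INVENTORY: Dict[str, Dict[str, List[str]]] = {
    "block_a": {
        "hosts": ["a-01.example.net", "a-02.example.net"],
        "domains": ["video-a.example.com", "static-a.example.com"],
    },
    "block_b": {
        "hosts": ["b-01.example.net"],
        "domains": ["video-b.example.com"],
    },
}


@dataclass(frozen=True)
class DomainCheck:
    """A monitoring-tool-agnostic description of one HTTP availability check."""
    name: str
    url: str
    tags: Tuple[str, ...]


def build_domain_checks(inventory: Dict[str, Dict[str, List[str]]]) -> List[DomainCheck]:
    """Derive one availability check per domain, tagged with its server group."""
    checks = []
    for group in sorted(inventory):
        for domain in sorted(inventory[group]["domains"]):
            checks.append(
                DomainCheck(
                    name=f"http-{domain}",
                    url=f"https://{domain}/",
                    tags=(f"group:{group}",),
                )
            )
    return checks


def dashboard_hosts(inventory: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Derive the host list each group's dashboard should display."""
    return {group: sorted(data["hosts"]) for group, data in sorted(inventory.items())}


checks = build_domain_checks(INVENTORY)
for check in checks:
    print(check.name, check.url, check.tags)
print(dashboard_hosts(INVENTORY))
```

In a real pipeline the derived structures would be rendered into whatever the monitoring and provisioning tools expect; the point of the sketch is only that nothing is maintained by hand in two places.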
As a result, adding a new node to a server group now takes 45-70 minutes on average, which meets the client's requirements.

Thanks to our infrastructure changes and to diversification across providers and geographies, we raised infrastructure availability to the required 99.9% without increasing the unit cost of data storage and distribution.
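As a rough illustration of why diversification helps, the sketch below applies the textbook formula for redundant paths with independent failures: combined availability is 1 minus the product of the individual unavailabilities. The independence assumption and the choice of two paths at 97.2% each are ours, for illustration; they are not the model or measurements used in the project. Under those assumptions the combined figure comes out at about 99.92%.

```python
from typing import Iterable


def combined_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant paths, assuming their failures are independent.

    The service is down only when every path is down at the same time,
    so the combined unavailability is the product of the individual ones.
    """
    unavailable = 1.0
    for availability in availabilities:
        unavailable *= 1.0 - availability
    return 1.0 - unavailable


single = 0.972  # illustrative: one path at the starting availability
pair = combined_availability([single, single])  # two independent paths

print(f"one path:  {single:.2%}")
print(f"two paths: {pair:.2%}")
```

Real failures are rarely fully independent (shared DNS, shared configuration, correlated attacks), so the formula gives an upper bound on what redundancy alone can deliver.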

We also conducted a large-scale study on test nodes, simulating various emergency situations, bot attacks, DDoS, and other types of problems. Based on this study, we created normalized dashboards that show an integral characteristic of each group's reserve resources. Our methodology allowed for precise planning of cluster expansion with minimal overspending on reserve capacity. In the end, we reduced spending on reserves by 38.4% while increasing the overall reliability of the system.
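One simple way to express a normalized reserve figure and a storage forecast is sketched below. It is a hedged illustration, not the methodology developed in the project: the reserve score is taken as the headroom on the tightest resource, and the storage forecast is a least-squares linear trend over daily samples. The `GroupUsage` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GroupUsage:
    """Current utilisation of a server group, each as a fraction of capacity (0..1)."""
    cpu: float
    network: float
    disk_io: float
    storage: float


def reserve_score(usage: GroupUsage) -> float:
    """Normalized reserve: remaining headroom on the most constrained resource."""
    return 1.0 - max(usage.cpu, usage.network, usage.disk_io, usage.storage)


def days_until_full(used_tb: Sequence[float], capacity_tb: float) -> Optional[float]:
    """Estimate days until storage is full from daily usage samples.

    Fits a least-squares line through the samples and extrapolates from the
    latest one. Returns None if there are too few samples or usage is not growing.
    """
    n = len(used_tb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator  # growth in TB per day
    if slope <= 0:
        return None
    return (capacity_tb - used_tb[-1]) / slope


# Hypothetical figures for one server group.
usage = GroupUsage(cpu=0.55, network=0.70, disk_io=0.40, storage=0.62)
history_tb = [100.0, 102.0, 104.0, 106.0, 108.0]  # one sample per day

score = reserve_score(usage)
days = days_until_full(history_tb, capacity_tb=128.0)

print(f"reserve score: {score:.2f}")
print(f"days until storage is full: {days:.1f}")
```

Taking the minimum headroom keeps the score honest: a group with plenty of CPU but a nearly saturated uplink still shows a small reserve.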