INMAGINE is a global creative ecosystem powered by design, technological innovation, and entrepreneurship, specializing in creative content and services. The group has been in the market for over 20 years, with a strong presence across the United States, Europe, and Asia.
Their extensive portfolio of SaaS businesses includes leading brands such as 123RF.com, Pixlr.com, and Designs.ai, which empower designers and non-designers to design smarter, faster, and more easily. Together, these brands form a creative ecosystem that gives designers access to ready-to-use stock content and tools, from photo editing to engaging video-based workflows, for producing virtually any type of creative content for their projects.
Unpredictable Traffic Patterns, Growing Needs and DDoS Attack Challenges
Inmagine began their digital journey hosting their applications in a data center. Over time, as their global presence continued to grow, unpredictable web traffic patterns and unexpected Distributed Denial of Service (DDoS) attacks began to emerge. The team realized that their traditional operating model of forecasting and managing compute capacity in their data center was not sustainable: it led to site reliability issues and a growing number of application performance degradation incidents. Managing storage capacity was the other main concern, as the team spent countless hours optimizing and maintaining storage infrastructure. Together, these issues drove up operational costs, with the team overinvesting in compute capacity and in staff to manage the growing infrastructure. In this migration case study, we cover how 123rf.com, the flagship website of Inmagine Group, was migrated to AWS.
Moving to the AWS Cloud for Improved Operational Resilience and Efficiency
Given the urgency of providing the best experience to their customers, the team saw the AWS Cloud as an attractive option to explore, with elasticity, security, and cost effectiveness as the primary business considerations.
Inmagine chose to host their web applications, content, and search engine on the AWS Cloud to leverage the following:
a) AWS Global and Elastic Infrastructure
With operational uptime being a key consideration, the web applications were placed behind Application Load Balancers (ALBs) spanning multiple Availability Zones (AZs), coupled with EC2 Auto Recovery to recover instances from underlying hardware failures. The team also plans to rearchitect the web applications to adopt Predictive Scaling so that capacity keeps pace with evolving web traffic.
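Predictive Scaling, a feature of Amazon EC2 Auto Scaling, forecasts load from historical metrics and launches capacity ahead of demand. AWS’s forecasting model is its own; the sketch below only illustrates the idea, assuming a hypothetical seasonal-naive forecast (the same hour one week earlier), a hypothetical per-instance throughput figure, and a floor of two instances so that at least two Availability Zones stay covered.

```python
import math

HOURS_PER_WEEK = 24 * 7


def forecast_next_hour(hourly_avg_rps: list[float], season: int = HOURS_PER_WEEK) -> float:
    """Seasonal-naive forecast: the next hour repeats the same hour one season earlier.

    Hypothetical stand-in for a real forecasting model.
    """
    if len(hourly_avg_rps) < season:
        raise ValueError("need at least one full season of hourly history")
    # The upcoming hour sits at index len(history); one season earlier is index -season.
    return hourly_avg_rps[-season]


def instances_needed(forecast_rps: float, rps_per_instance: float,
                     headroom: float = 0.2, minimum: int = 2) -> int:
    """Instances required to serve the forecast plus a safety margin."""
    required = math.ceil(forecast_rps * (1 + headroom) / rps_per_instance)
    # Never drop below `minimum` (assumed: one instance per Availability Zone).
    return max(minimum, required)


if __name__ == "__main__":
    history = [100.0] * HOURS_PER_WEEK
    history[0] = 500.0  # a traffic spike at this hour last week
    forecast = forecast_next_hour(history)
    print(forecast, instances_needed(forecast, rps_per_instance=100.0))
```

A seasonal-naive forecast is deliberately simple; it mainly shows why capacity can be provisioned before a recurring spike arrives, instead of after reactive scaling notices it.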
The other important consideration was offloading storage management. With Amazon S3, the team no longer needed to worry about the capacity, availability, or reliability of their storage infrastructure. Storage management was simplified with S3 Lifecycle management and S3 Inventory reports. These features let the team perform housekeeping with less effort and optimize costs by choosing S3 storage classes based on historical request patterns.
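As an illustration of cost optimization with storage classes, the sketch below builds one rule in the shape S3 expects for a bucket lifecycle configuration, moving objects to STANDARD_IA and then to GLACIER as they age. The rule ID, prefix, and day thresholds are hypothetical; in practice they would be derived from the historical request patterns described above.

```python
def build_lifecycle_rule(rule_id: str, prefix: str,
                         ia_after_days: int, archive_after_days: int) -> dict:
    """Return one S3 lifecycle rule that tiers objects down as they age.

    The dict mirrors the rule shape used by the S3 lifecycle configuration API.
    """
    # S3 does not transition objects younger than 30 days to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("STANDARD_IA transitions require objects at least 30 days old")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive transition must happen after the STANDARD_IA transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }


if __name__ == "__main__":
    # Hypothetical: preview renditions are rarely requested after a month.
    config = {"Rules": [build_lifecycle_rule("tier-old-previews", "previews/", 30, 180)]}
    print(config)
```

With boto3, a dictionary like `config` could be passed as the `LifecycleConfiguration` argument of the S3 client’s `put_bucket_lifecycle_configuration`; the call itself is left out so the sketch runs without AWS credentials.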
b) DDoS Protection
Countless hours had been spent researching how to harden their data center against DDoS attacks, which led to the procurement of hardware-based firewalls that could not scale as web traffic grew. By moving their dynamic content traffic to Amazon CloudFront and AWS WAF, they also gained AWS Shield, which protects against common layer 3 and layer 4 attacks such as SYN/UDP floods and reflection attacks. Additionally, AWS WAF provided protection at the web layer against various layer 7 threats: rate-based rules block clients that send too many requests, protecting the web applications from brute-force login attempts, bad bots, and more. The team relied on AWS Managed Rules for AWS WAF to simplify complex rule management.
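AWS WAF rate-based rules count requests from each client IP over a trailing time window and block the IP while its count exceeds a configured limit. The sketch below models that behaviour in memory with a sliding window; it is a simplified illustration rather than how AWS WAF is implemented, and the limit, window, and IP address in the example are hypothetical.

```python
from collections import defaultdict, deque


class RateBasedBlocker:
    """Block a client IP while it exceeds `limit` requests in a trailing window."""

    def __init__(self, limit: int, window_seconds: float = 300.0) -> None:
        self.limit = limit
        self.window = window_seconds
        # Per-IP timestamps of recent requests, oldest first.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have slid out of the trailing window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        # In this sketch blocked requests still count, so a sustained flood stays blocked.
        return len(hits) <= self.limit


if __name__ == "__main__":
    blocker = RateBasedBlocker(limit=3, window_seconds=10)
    print([blocker.allow("198.51.100.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
    print(blocker.allow("198.51.100.7", 20))  # True: the earlier requests have expired
```

Tracking counts per IP is what lets a rate-based rule stop a single brute-force client without affecting everyone else.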
How Inmagine Performed the Migration
As part of Inmagine’s risk management strategy, the migration was performed in well-defined phases to minimize any impact on customers. The following describes the initial state and the stages of the migration:
i) Initial State
Before the migration, Inmagine’s servers were hosted in their data center, where on-premise load balancers, web application firewalls, web servers, databases, Elasticsearch, and storage infrastructure powered their websites. On the application side, 123rf.com was designed using a Service Oriented Architecture (SOA), in which the application was split into services such as payment, search, photo details, checkout, and more. These services were hosted across multiple virtual machines; certain services had their own databases, while the services shared a common Elasticsearch cluster. A third-party CDN provider delivered static content to customers, and Nginx served as the ingress controller, handling complex URL rewrites based on business requirements.
ii) Assessment Phase
In this phase, the Inmagine team worked closely with the AWS team to build a comprehensive migration plan aimed at minimizing potential disruptions to business operations, with security, reliability, and performance treated as priorities. The areas evaluated included:
a) Establishing a baseline environment that enables governance across billing, security and workload management in the AWS Cloud
b) Minimizing the risk of performance degradation during the migration with resilient, high-performance network connectivity
c) Establishing performance baselines for each service within 123rf.com to quickly identify deviations in application performance
d) Identifying application services with potential compatibility issues on the AWS Cloud that could cause application degradation or downtime
e) Identifying a migration approach that mitigates the risk of business impact during the migration
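One simple way to turn per-service performance baselines into deviation alerts is to record the mean and standard deviation of each service’s response time before the migration, then flag any service whose current value exceeds the mean by more than k standard deviations. The sketch below assumes that approach; the service names come from the SOA description of 123rf.com, while the sample values and k = 3 are hypothetical.

```python
import statistics


def build_baselines(samples: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of response time (ms) per service."""
    return {svc: (statistics.mean(vals), statistics.stdev(vals))
            for svc, vals in samples.items()}


def find_deviations(baselines: dict[str, tuple[float, float]],
                    current: dict[str, float], k: float = 3.0) -> list[str]:
    """Services whose current response time exceeds mean + k * stdev."""
    flagged = []
    for svc, value in current.items():
        mean, stdev = baselines[svc]
        if value > mean + k * stdev:
            flagged.append(svc)
    return sorted(flagged)


if __name__ == "__main__":
    before = {
        "payment": [100.0, 110.0, 90.0, 100.0, 100.0],
        "search": [50.0, 60.0, 40.0, 50.0, 50.0],
    }
    baselines = build_baselines(before)
    print(find_deviations(baselines, {"payment": 105.0, "search": 90.0}))  # ['search']
```

Comparing each service against its own history, rather than against one global threshold, is what makes a slow-down in a normally fast service like search stand out.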
The Inmagine team had two options to carry out the migration.
a) Full migration before going live on AWS: Keep serving web traffic from the on-premise data center and migrate every service within 123rf.com before shifting web traffic over to the AWS Cloud.
b) Gradual migration with key services going live on AWS immediately: Migrate selected key services and deploy them on AWS, then shift web traffic over to AWS, with Nginx as the ingress controller directing each request either to services on AWS or, over AWS Direct Connect, to services still in the data center. The remaining on-premise services would then be tested and moved over to AWS.
The team chose the second option, in which services within 123rf.com would run across both the AWS Cloud and their on-premise data center while application testing and migration work were carried out. This approach let the team battle-test migrated services quickly with production traffic and apply those learnings to other services. It also addressed their immediate business concerns around capacity and DDoS attacks.
iii) Migration Phase
Running 123rf.com across two sites at the same time required a high-speed, resilient network to maintain the site’s performance. Inmagine deployed and thoroughly tested the following AWS infrastructure to serve as the core components supporting the entire migration:
a) AWS Direct Connect was established to provide a dedicated, high-throughput network connection and minimize network latency for real-time database transactions between the two sites. The target was to keep network latency below 20 ms.
b) AWS Transit Gateway was implemented to securely simplify network connectivity across multiple connection points: AWS VPN, AWS Direct Connect, and Amazon VPCs.
c) Nginx was deployed on AWS as the ingress controller, with rewrite rules that inspect incoming HTTP requests and add headers to them. Each request was then directed to a service residing either on-premise or on the AWS Cloud.
d) Database read replicas were migrated to AWS to serve read traffic, while write traffic was sent to the data center over AWS Direct Connect. There were two reasons for this strategy. First, 123rf.com is a content-heavy site and therefore read-heavy, so having database read replicas on AWS would reduce web application response times for visitors. Second, with the write database remaining on-premise, the team could easily roll traffic back from AWS to the data center if there were network problems.
e) Each key workload (such as 123rf.com) was deployed in its own Amazon VPC and AWS account as part of Inmagine’s AWS Cloud governance strategy.
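The hybrid setup above combines two routing decisions: the ingress layer picks a site for each service, and the application picks a database for each query. The sketch below models both in plain Python rather than Nginx configuration; the upstream and database names, the `X-Served-From` header, and the rule of sending only plain `SELECT` statements to the replica are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical upstream and database names, not Inmagine's real endpoints.
AWS_UPSTREAM = "aws-alb"
ONPREM_UPSTREAM = "onprem-lb"
READ_REPLICA = "aws-read-replica"
PRIMARY = "onprem-primary"


@dataclass
class HybridRouter:
    # First path segments of services that already run on AWS, e.g. {"search"}.
    migrated: set[str] = field(default_factory=set)

    def route(self, path: str, headers: dict[str, str]) -> tuple[str, dict[str, str]]:
        service = path.strip("/").split("/", 1)[0]
        on_aws = service in self.migrated
        upstream = AWS_UPSTREAM if on_aws else ONPREM_UPSTREAM
        # Add a header, as an Nginx rewrite rule might, recording which site served the request.
        tagged = {**headers, "X-Served-From": "aws" if on_aws else "onprem"}
        return upstream, tagged


def pick_database(sql: str) -> str:
    """Plain reads go to the AWS read replica; writes and locking reads go on-premise."""
    words = sql.upper().split()
    plain_read = bool(words) and words[0] == "SELECT" and "FOR UPDATE" not in " ".join(words)
    return READ_REPLICA if plain_read else PRIMARY


if __name__ == "__main__":
    router = HybridRouter(migrated={"search", "photo"})
    print(router.route("/search/cats", {}))    # served from AWS
    print(router.route("/checkout/cart", {}))  # still served on-premise
    print(pick_database("SELECT * FROM photos WHERE id = 1"))
    print(pick_database("INSERT INTO orders VALUES (1)"))
```

Checking the first keyword is a simplification: real read/write splitting also has to account for transactions and replication lag, since a read issued right after a write may not yet see that write on the replica.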
The team also relied on multiple performance dashboards to have full visibility over the system during migration:
a) A custom application performance dashboard on New Relic monitored the backend response time of each service within 123rf.com, with different alerting policies based on each service’s service level objectives (SLOs). Each 123rf.com service was instrumented to identify bottlenecks in dependent resources such as third-party APIs or data stores.
b) The team also monitored their customers’ browser load times with New Relic’s real user monitoring (RUM), giving them a bird’s-eye view of both server-side and client-side performance.
c) A network performance dashboard built on Amazon CloudWatch monitored the health and utilization of the Direct Connect link.
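A network dashboard like this ultimately reduces to a few checks. The sketch below computes peak link utilization from bits-per-second samples (for example, values from the `ConnectionBpsEgress` metric that CloudWatch publishes for a Direct Connect connection) and compares the 99th-percentile latency against the 20 ms target from the migration plan. The 1 Gbps port speed, the 80% utilization threshold, and the latency probe that produces the samples are assumptions.

```python
import math

LATENCY_TARGET_MS = 20.0  # target stated for the Direct Connect link
UTILIZATION_ALARM = 0.8   # hypothetical alarm threshold


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def link_report(bps_samples: list[float], port_speed_bps: float,
                latency_ms_samples: list[float]) -> dict[str, object]:
    peak_util = max(bps_samples) / port_speed_bps
    p99_latency = percentile(latency_ms_samples, 99)
    return {
        "peak_utilization": peak_util,
        "p99_latency_ms": p99_latency,
        "utilization_alarm": peak_util >= UTILIZATION_ALARM,
        # The target is "below 20 ms", so reaching 20 ms already counts as a breach.
        "latency_alarm": p99_latency >= LATENCY_TARGET_MS,
    }


if __name__ == "__main__":
    report = link_report(
        bps_samples=[2e8, 5e8, 9e8],  # hypothetical egress samples
        port_speed_bps=1e9,           # assumed 1 Gbps port
        latency_ms_samples=[5.0] * 98 + [25.0, 25.0],
    )
    print(report)
```

Using a high percentile rather than the average matters here: real-time database transactions over the link suffer from the slow tail, which an average would hide.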
The rollback strategy had two considerations:
a) Service Issues:
Should any performance issues be detected, the Nginx ingress controller would be updated to direct requests back to the corresponding services running in the on-premise data center.
b) Network Issues:
Should unexpected network issues across the Direct Connect and VPN links lead to downtime, Amazon Route 53 would be updated to shift traffic back to the on-premise load balancer.
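The two rollback paths can be captured as a small decision function. The sketch below assumes the ingress controller keeps a set of services routed to AWS, and that the Route 53 change uses weighted records for an AWS endpoint and an on-premise endpoint; the case study only says Route 53 would be updated, so the weighted-record approach and the weight values are assumptions.

```python
from enum import Enum


class Issue(Enum):
    NONE = "none"
    SERVICE = "service"  # a migrated service is breaching its performance objectives
    NETWORK = "network"  # Direct Connect and VPN problems are causing downtime


def plan_rollback(issue: Issue, service: str,
                  on_aws: frozenset[str]) -> tuple[frozenset[str], dict[str, int]]:
    """Return (services the ingress keeps on AWS, hypothetical DNS weights per site)."""
    normal_weights = {"aws": 100, "onprem": 0}
    if issue is Issue.SERVICE:
        # Service-level rollback: repoint only this service at its on-premise copy.
        return on_aws - {service}, normal_weights
    if issue is Issue.NETWORK:
        # Network-level rollback: send all traffic to the on-premise load balancer.
        return on_aws, {"aws": 0, "onprem": 100}
    return on_aws, normal_weights


if __name__ == "__main__":
    current = frozenset({"search", "photo"})
    print(plan_rollback(Issue.SERVICE, "search", current))
    print(plan_rollback(Issue.NETWORK, "", current))
```

Keeping the write database on-premise is what makes the network-level rollback low-risk: no writes are stranded on the AWS side when traffic moves back to the data center.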
“Running our web applications on-prem over the past few years has certainly been challenging; we were constantly facing the dilemma of whether to acquire more hardware to support the growth of the business.
While we clearly understood what benefits the AWS Cloud provided, the decision to move from our on-premise environment certainly wasn’t an easy one, given the large footprint we had in our data center.
With the AWS team support, we were able to build a migration strategy that kept the websites up and running 24/7, while we migrated. Now, we have 123rf.com architected for failure across multiple availability zones, along with AWS WAF and AWS Shield protecting our web applications. This initiative has reduced application performance degradation by 34%.
Not only that, with AWS handling the undifferentiated heavy lifting (such as capacity management), we’ve been able to spend more time optimizing and innovating new features for our customers. It’s exciting to see new products of ours, such as Designs.ai, leveraging services such as Amazon Cognito and Amazon Polly to enable us to build faster and deliver new and unique experiences to our customers.
Even more exciting, we have also observed an overall increase in our site’s performance, with dynamic content page load times improving by 25%. Amazon CloudFront’s distributed points of presence have not only reduced the distance our customers’ traffic travels to reach the AWS network, but also allowed us to offload TLS handshakes and persistent connections from our system, improving our site’s content delivery performance,” says Pang Jack Sen, CTO of Inmagine Group.
“We have always strived to stay ahead of the curve and respond to our customers’ needs quickly. Therefore, accelerating our ability to deliver new features to market will be our next target, and we believe containerizing our applications will help us achieve this goal. Our engineering teams are already building proofs of concept for select use cases on top of Amazon ECS, and the results have been promising so far. With containers, we’re already seeing fewer dependency-related issues, and we’re excited to see where it takes us,” says Jack Sen.
Jack Sen is the Group CTO at Inmagine Group. He is deeply passionate about software engineering, algorithms, machine learning, and creative products, and has over 20 years of full-stack experience. In his spare time, he enjoys photography, playing badminton, and working out.
Fabian Tan is a Senior Solutions Architect at Amazon Web Services. He has a strong passion for databases, data analytics, and machine learning, and works closely with the Malaysian developer community to help them innovate. In his spare time, he enjoys camping outdoors with his family, reading, and playing sports.