It’s fast becoming a national tradition: Thanksgiving on Thursday, shopping in-store on Black Friday, and then hitting the internet on the following Monday. While the massive influx of eager shoppers certainly helps retailers clear inventory and balance the books for the year, it also places a strain on infrastructure.
We see it easily enough with travel: Roads are clogged, flights are chaotic, and everyone seems to be going someplace else. But when it comes to the internet, the buying spree is more behind-the-scenes. All the more reason to highlight the effect that holiday shopping has on the internet, and cloud infrastructure in particular.
Note these effects and see if they occur this holiday season. If you know what to look for, you will be in a position to plan effectively for the next one.
Increased Traffic Will Stress Test Websites, Apps
For the past decade, online sales have been slowly climbing every holiday season. A recent survey by Google Cloud predicts that digital sales will grow between 11 and 14 percent this year, accounting for almost half of overall holiday sales.
Naturally, such numbers are a welcome thing. They indicate a public that has eagerly embraced buying digitally, justifying investments in the technology. But sudden spikes in site traffic over such a short period of time can expose “cracks,” i.e., small problems with the underlying servers or architecture.
For example, an app might start running slower and slower, or a website might take a longer time to process items in a shopping cart. Sometimes, connections could be rejected. These problems are usually solved by spinning up new VM instances and/or using modern load balancing to scale out the resources for the app or website in question.
But scale is not the only thing to watch. Sometimes, small, infrequent errors become more prominent when traffic increases. If your website receives 500 orders a month (say), an error that occurs once every 100 purchases affects about five customers and will hardly register in the organization. However, when 5,000+ orders come in over one weekend, customer service suddenly has 50 or more disgruntled customers to handle: ten times the usual monthly volume, compressed into a few days.
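The arithmetic is simple enough to check directly. Here is a minimal sketch in Python, assuming the one-in-100 error rate stays constant as order volume grows; the function name is purely illustrative:

```python
def expected_failed_orders(orders: int, error_rate: float) -> float:
    """Expected number of orders that hit the error, assuming a constant rate."""
    return orders * error_rate


ERROR_RATE = 1 / 100  # one error per 100 purchases, the rate used in the text

normal_month = expected_failed_orders(500, ERROR_RATE)
holiday_weekend = expected_failed_orders(5_000, ERROR_RATE)

print(f"Typical month:   about {normal_month:.0f} affected orders")
print(f"Holiday weekend: about {holiday_weekend:.0f} affected orders")
```

In practice, error rates often rise under load rather than staying flat, so treat these figures as a lower bound.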
Problems Cause Downtime, Which Costs
If the above scenarios sound unlikely, consider what happened in 2018: Black Friday traffic actually overwhelmed several retailers, including J. Crew, Lowe’s, Ulta, and lululemon. They joined a long list of retailers that have suffered Black Friday outages, including Best Buy, Cabela’s, and more.
The first obvious effect of these outages was the sheer number of lost sales for these major brands. Beyond that, there was serious harm done to their brand reputations as well. The downtime at Lowe’s is suspected to have driven even more traffic to rival Home Depot, while incensed customers of Ulta and lululemon took to Twitter to make their dissatisfaction clear.
How costly was this reputational damage? That’s hard to quantify. But given how simple it is to do a little stress testing, the ROI is certainly there.
Sudden Scaling Will Spike Cloud Spend
Let’s assume, though, that your cloud infrastructure scales as needed and so can handle the added traffic. You’ve also stress-tested your website and apps and no major problems arise. Are you out of the woods?
Maybe not. All of that extra capacity comes with a cost, of course. As resources automatically scale, so will your spend. If your team has not put alerts in place, you might be in for a nasty shock when the next bill comes.
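A spend alert can be as simple as comparing month-to-date cost against a budget. The sketch below is a minimal illustration in Python. The budget figure, the thresholds, and the `check_spend` helper are all hypothetical, and a real setup would more likely rely on your cloud provider's own budgeting and alerting tools than on hand-rolled code.

```python
from dataclasses import dataclass


@dataclass
class SpendAlert:
    threshold: float  # fraction of the budget that has been reached
    spend: float      # month-to-date spend
    budget: float     # monthly budget


def check_spend(spend_to_date: float, monthly_budget: float,
                thresholds=(0.5, 0.8, 1.0)) -> list[SpendAlert]:
    """Return one alert for each budget threshold that spend has reached."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    used = spend_to_date / monthly_budget
    return [SpendAlert(t, spend_to_date, monthly_budget)
            for t in sorted(thresholds) if used >= t]


# Hypothetical numbers: $8,600 spent against a $10,000 monthly budget.
alerts = check_spend(spend_to_date=8_600.0, monthly_budget=10_000.0)
for alert in alerts:
    print(f"Spend has reached {alert.threshold:.0%} of budget "
          f"(${alert.spend:,.2f} of ${alert.budget:,.2f})")
```

Using several thresholds rather than a single one gives the team an early warning before the budget is actually exceeded.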
That additional cost of cloud resources might be easily justified by the sheer volume of sales that come in during the extended holiday weekend. Still, it’s good to plan not only for the expanded infrastructure needs, but for the expanded budget as well.
The Issue of Bot Traffic
Much of the traffic that happens over the internet is the result not of human activity, but of automated bots. These bots have been estimated to account for nearly half the traffic on Black Friday and Cyber Monday, and they can degrade performance on retail sites as easily as human traffic.
What are these bots doing? Many times, what they do is typical and innocuous…but not always. Some bots are checking prices on goods so that competitors can set prices and undersell you. Other bots are buying up low-inventory items to drive up demand (and prices). Still others are designed to commit advertising fraud, and these bots are particularly active during the busy holiday shopping season.
What Can Be Done to Prevent These Problems?
Needless to say, these problems can be extraordinarily costly. So how can a retailer prepare for them before the holiday season? There are a few simple steps that need to be followed:
Use a good cloud provider that can scale. Not only should it have a mechanism for scaling quickly and doing the appropriate load balancing, but it should have a wide enough geographic spread to ensure that your customers are served in a timely fashion (and that there is redundancy should any one data center encounter problems).
Do your stress testing well in advance. Predict what your traffic will be like during these key shopping days and then increase your prediction by 30 percent. Stress test the system under this assumption. What is the latency like? What are error rates like? Be sure to measure all relevant KPIs and adjust the system accordingly.
Be prepared to monitor everything. Traffic should be continuously monitored in real-time to look for spikes or unusual patterns. Be sure to set up alerts for anything that looks anomalous (sudden downswing in traffic, increase in error rate, a flood of traffic from one small region or set of IP addresses, etc.). It’s a good idea to keep an eye on spend, too, as new cloud resources are mobilized. In order to do all this, though, you might need to…
Get help. Chances are that your team members want to enjoy the holiday, too, and are not so eager to remain focused on monitoring your infrastructure. This is where third-party cloud monitoring and remote management come into play. There are many components to effective cloud monitoring, and these, too, will be “stressed” during the buying season. Having a company with this as their core competency can relieve some of the load on your team while ensuring uptime.
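The stress-testing and monitoring steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production monitor: the 30 percent headroom comes from the text, while the function names, the sample numbers, and the error-rate, latency, and traffic thresholds are all hypothetical and should be replaced with your own KPIs.

```python
import math


def stress_test_target(predicted_peak_rps: float, headroom: float = 0.30) -> float:
    """Load to test against: the predicted peak plus a safety margin."""
    return predicted_peak_rps * (1 + headroom)


def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def check_window(requests: int, errors: int, latencies_ms: list[float],
                 baseline_requests: int,
                 max_error_rate: float = 0.01,    # hypothetical threshold
                 max_p95_ms: float = 800.0,       # hypothetical threshold
                 min_traffic_ratio: float = 0.5   # hypothetical threshold
                 ) -> list[str]:
    """Return human-readable alerts for one monitoring window."""
    alerts = []
    if requests < baseline_requests * min_traffic_ratio:
        alerts.append("traffic dropped sharply versus baseline")
    if requests and errors / requests > max_error_rate:
        alerts.append(f"error rate {errors / requests:.1%} exceeds limit")
    if latencies_ms and p95(latencies_ms) > max_p95_ms:
        alerts.append(f"p95 latency {p95(latencies_ms):.0f} ms exceeds limit")
    return alerts


# Hypothetical forecast: 2,000 requests per second at peak.
target = stress_test_target(predicted_peak_rps=2_000)
print(f"Stress test at {target:.0f} requests/second")

# Hypothetical monitoring window with elevated errors and slow responses.
window_alerts = check_window(
    requests=12_000, errors=300,
    latencies_ms=[120, 150, 180, 220, 950, 1_400],
    baseline_requests=10_000,
)
for message in window_alerts:
    print("ALERT:", message)
```

The same checks apply both during a stress test and in live monitoring. Only the source of the numbers changes.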