How we used an NGINX traffic split to run a stress-free website migration


NGINX is great for A/B testing and load-balancing. Why not use the same approach to gradually “cross-fade” your new website?

By Stephan Schulze

If you do an online search for “website migration” or “replatforming”, you’ll find a lot of ominous-sounding articles that warn about the tremendous risks involved in moving your website to a new CMS. And the anxiety around replatforming is understandable. If your business relies on e-commerce, you stand to lose a significant portion of your revenue if you don’t get it right. This is why many people talk about “launch day” with a tone of unease and trepidation.

You don’t have to launch your new site all at once

This epiphany came to us after we were asked to help with a replatforming project. We had already helped other startups with A/B testing and load-balancing, so why not use the same approach to migrate a website?

In our case, it was the website of a leading German tour operator. They sold package tours offline through brochures and print advertising, but they were getting more and more business through their website. But they needed a new CMS. Their current CMS was old and difficult to use. More importantly, they wanted to expand their online business, and their old CMS just didn’t have the right features.

The plan was to migrate to a more modern CMS, which was part of SAP Commerce Cloud. In fact, the CMS migration was part of a wider expansion strategy that included many business systems, such as the booking engine. The SAP Commerce Cloud provided the right infrastructure to support the steady growth of their online business.

Of course, moving the CMS meant rebuilding the website. To help with this, they brought in an external agency whose job was to rewrite the front end and optimize it for SEO. Our job was to help them deploy the new website. We had successfully tried out traffic splits with a previous company, and we realized it was the perfect solution for their website migration.


Implementing the traffic split

The first task was to set up a reverse proxy on the existing server, which was hosted at 1&1 IONOS. We used NGINX for this since it can handle a lot of traffic and is easy to configure for traffic splitting. For the time being, we routed all traffic to the old web app through their existing unified threat management (UTM) gateway.

The transition from old to new with split traffic as the transition phase

We would eventually start routing some of the traffic to the SAP Commerce Cloud instance, but initially, we simply passed all traffic to the old web app. It provided us with a safe way to configure the server and test the new website before we changed the DNS entry and pointed all traffic directly to the SAP Commerce Cloud instance (new traffic flow). This setup would also allow us to instantly switch back to the old website in case we had a critical problem with the new website (in fact, we did have a critical problem after the migration, and this setup was a lifesaver).

We first updated the NGINX configuration for traffic splitting. We used the split_clients directive from the module ngx_http_split_clients_module. It allows you to create two completely separate traffic flows without investing in any proprietary A/B testing tools.

We created two upstreams that represented the old and new sites, respectively, like this:

upstream upstream_cms_old {
    server 123.23.123.123:443;
}
upstream upstream_sap {
    server 44.44.44.44:443;
}

Then we added the split_clients directive to the site configuration as follows:

split_clients "${remote_addr}${http_user_agent}" $split_upstream {
    99% upstream_cms_old;
    * upstream_sap;
}

As you can see, 99% of the traffic was still going to the old site, but we started to reroute the remaining 1% to the new website. The agency that was designing the new website had a first version ready to go live, so we decided to run a tentative test with a very small user sample. This equated to approximately 60 to 100 user sessions per day.

We also wanted to make sure users had a consistent experience. People who had already seen the new website should never be taken back to the old website when they refreshed the page or opened a new window.

To do this, we used the NGINX map directive to map the value of a migration_split cookie to the matching site version.

map $cookie_migration_split $chosen_upstream {
    default $split_upstream;
    "upstream_cms_old" "upstream_cms_old";
    "upstream_sap" "upstream_sap";
}

The value “upstream_sap” indicates the new SAP-based website.

Later in the site configuration, we set the cookie according to which site the user had seen, and configured it to expire after a year (defined in seconds).

if ($deliver_upstream = 'upstream_sap') {
    # users should stay on the new site forever
    set $cookie_lifetime '31536000'; # 60*60*24*365 = 1 year
}
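The if block above only covers users on the new site. As an alternative sketch, the same value could come from a map, which also gives the variable a default. The default of 0 (which makes the browser discard the cookie straight away, so old-site users are not pinned) is an assumption; the original configuration does not show what old-site users received.

```nginx
# Goes in the http context next to the other map blocks, in place of the if block.
map $deliver_upstream $cookie_lifetime {
    default        0;          # assumption: do not pin users who saw the old site
    upstream_sap   31536000;   # 60*60*24*365 = 1 year
}
```

Because split_clients hashes the same IP address and user agent into the same bucket every time, old-site users would still get a consistent experience without a long-lived cookie, as long as the percentages stay the same.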

We also added another map directive to make sure that testers could override their cookie by appending a specific parameter to the page URL.

# overwrite the upstream by appending "?force_upstream=oldcms" to the URL
map $arg_force_upstream $deliver_upstream {
    default $chosen_upstream;
    "oldcms" "upstream_cms_old";
    "sap" "upstream_sap";
}

We could then route normal users based on the value of migration_split like so:

location / {  # enclosing block assumed; the excerpt only showed its closing brace
    add_header Set-Cookie "migration_split=$deliver_upstream;Path=/;Max-Age=$cookie_lifetime;";
    proxy_pass https://$deliver_upstream;
}
  • If a user has seen the new website, the variable $deliver_upstream is assigned the value upstream_sap.
  • The value upstream_sap is the name of an upstream group, which resolves to the server 44.44.44.44:443 as defined in the upstream directive.
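One detail that the excerpt does not show: when proxy_pass targets an upstream group by name, NGINX by default sends that name (here upstream_sap) as the Host header and sends no SNI name in the TLS handshake. If the backends rely on either, lines like the following might be needed. This is a sketch under that assumption; the original configuration may have handled it differently.

```nginx
location / {
    add_header Set-Cookie "migration_split=$deliver_upstream;Path=/;Max-Age=$cookie_lifetime;";

    # Assumption: both backends expect the public hostname from the client request.
    proxy_set_header Host $host;   # default would be $proxy_host, i.e. the upstream name
    proxy_ssl_server_name on;      # send SNI to the backend (off by default)
    proxy_ssl_name $host;          # name used for SNI

    proxy_pass https://$deliver_upstream;
}
```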

For more details on how to set up this kind of configuration, see this handy tutorial on freecodecamp.org.


Special Cases

There were cases where we needed to ensure that traffic from a specific source always saw a specific version of the website.

Internal and External Employees

We had cases where certain departments, such as the marketing department, needed to see the old website until the migration was done. Employees of the web design agency, however, always needed to see the new SAP-based website.

To address this requirement, we added map directives to redirect internal and external employees based on the IP address of their network. The following example shows how we did it (with dummy IP addresses).

map $remote_addr $bypass_ip {
    default 0;
    "1.111.111.111" 1;
}
map $remote_addr $bypass_ip_to_sap {
    default 0;
    "22.22.22.22" 1;
    "33.33.33.33" 1;
    "44.44.44.44" 1;
}
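A map on $remote_addr works well for a handful of individual addresses. If an office uses a whole address range, the geo directive accepts CIDR notation and matches against $remote_addr by default. The following sketch would take the place of the second map above; the 203.0.113.0/24 range is a made-up documentation range, not one we used.

```nginx
# Goes in the http context. geo matches against $remote_addr unless told otherwise.
geo $bypass_ip_to_sap {
    default          0;
    22.22.22.22      1;   # single address, as in the map version
    203.0.113.0/24   1;   # hypothetical: an entire office network
}
```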

Search Engine Crawlers

We also wanted to make sure that search engine crawlers only saw the old website until we had a 100% split going to the new website. To achieve this, we added the following map directive to reroute the request based on user-agent:

map $http_user_agent $bypass_user_agent {
    default 0;
    ~*(Googlebot|bingbot|Slurp) 1;
}
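The excerpts in this section only define the bypass flags. One way to let them override the split is a short chain of map blocks, sketched below. The variable names $after_crawlers, $after_agency, and $final_upstream, and the order of precedence, are assumptions for illustration.

```nginx
# All three blocks go in the http context.

# Crawlers always get the old site; everyone else keeps the split/cookie decision.
map $bypass_user_agent $after_crawlers {
    default $deliver_upstream;
    1       upstream_cms_old;
}

# Agency IP addresses always get the new SAP-based site.
map $bypass_ip_to_sap $after_agency {
    default $after_crawlers;
    1       upstream_sap;
}

# Internal IP addresses (for example, marketing) always get the old site.
map $bypass_ip $final_upstream {
    default $after_agency;
    1       upstream_cms_old;
}
```

With this in place, proxy_pass would point at https://$final_upstream instead of https://$deliver_upstream.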

This configuration prevented any search results from containing a confusing mix of old and new links while the traffic was still being split.


Early observations

At a 1% split, the data sample was too small to make an informed assessment about conversions or checkout behavior. However, it was enough to know that the new frontend was too slow. This issue first came up in internal testing, but our split test confirmed it.

We found that the long load times were having a detrimental effect on the bounce rate and conversions, so we didn’t want to expose more people to the new website until the issue was fixed. We communicated this to the web design agency, who worked on improving the speed index: the time it takes for above-the-fold content to completely render to the screen. This metric usually indicates the earliest point at which the page becomes usable.

As you can see in the following graph, it took some time to get the speed index time to a point where it was an improvement on the original website.

Getting the speed index down to an acceptable level

To get the speed index down, the agency worked on optimizing the asset caching and employed lazy loading.

It turned out that between 55% and 60% of the traffic came from mobile devices and tablets, but the old website wasn’t effectively optimized for mobile. The agency focused on improving usability and performance on small screens and mobile connections.


Moving 100% of the traffic to the new website

After the initial hurdle of improving the front-end stats, we gradually increased the split. We felt confident that it wouldn’t take long to get to 100%. Commercial metrics such as conversions, checkout goals, and newsletter subscriptions all looked good.

We updated the percentage in the NGINX split_clients directive several times over three weeks. By September 5, 100% of the traffic was routed to the new website.

The split was gradually increased over 3 weeks
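Each step of the ramp-up is a one-line change to the split_clients block, followed by a configuration reload (for example with nginx -s reload). The 50% figure below is a hypothetical intermediate step; the article does not list the exact percentages used along the way.

```nginx
split_clients "${remote_addr}${http_user_agent}" $split_upstream {
    50%   upstream_cms_old;   # hypothetical intermediate step, down from 99%
    *     upstream_sap;       # everything else goes to the new site
}

# Final stage: remove the percentage line so that only
#     *     upstream_sap;
# remains inside the block.
```

Because the old site's share is listed first and the hash for a given IP address and user agent never changes, lowering that percentage only moves visitors from the old site to the new one. Nobody who was already in the new-site bucket is sent back.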

In terms of front-end performance, the new website outperformed the old version by just over 2 seconds. While we were running the split, the old site averaged a speed index of 6.8 seconds, whereas the new site averaged 4.7 seconds.


The trouble with redirects

Redirects are one of the most important aspects of website migration, and you can find countless articles on how to manage them. We tried to follow best practices, but admittedly a few pages fell through the cracks.

Since we’re talking about a tour operator rather than a spare parts vendor, we didn’t have too many pages to redirect. The website featured a few hundred products that were generally over €1,000. We decided to manage the redirects with a map file.

However, there was one “gotcha”. We were using NGINX as a temporary, intermediate solution to redirect requests to the SAP server. Sooner or later, the SAP server would have to handle the redirections on its own, without the help of NGINX.

To avoid doing things twice, we created a map file on the SAP server, rather than using an NGINX map file. The SAP server requires the map file to be in Apache syntax, so our map file resembled the following truncated example:

Redirect /reiseziele/baskenland /reisen/baskenland
Redirect /reiseziele/st-petersburg /reisen/st-petersburg
Redirect /reiseziele/azoren-insel /reisen/azoren-insel
Redirect /reiseziele/irland /reisen/irland
...
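For comparison, an NGINX-side version of the same rules might look roughly like the sketch below. This is the approach we decided against, shown only to illustrate the difference. The variable name $redirect_target and the 301 status code are assumptions. Note also that this map matches exact paths only, whereas Apache's Redirect also matches anything beneath the given path.

```nginx
# http context: look up the request path in the redirect table.
map $uri $redirect_target {
    default                      "";
    /reiseziele/baskenland       /reisen/baskenland;
    /reiseziele/st-petersburg    /reisen/st-petersburg;
    /reiseziele/azoren-insel     /reisen/azoren-insel;
    /reiseziele/irland           /reisen/irland;
}

server {
    # An empty value counts as false, so only mapped paths are redirected.
    if ($redirect_target) {
        return 301 $redirect_target;
    }
}
```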

Redirects from search results didn’t always work

When we started the split, Google was indexing the old version of the website, which was still ranking well for all the relevant keywords. However, if a “new website user” clicked a search result from the old website, we had to redirect them to the equivalent page on the new website. Most of the time, the user experience was pretty seamless. The agency had taken care to ensure that the page architecture of the new website matched the old version.

However, there were occasionally pages that referred to older travel deals or other features that the company had decided not to include in the new website. These clicks usually resulted in a 404 page.

But this problem existed before the website migration. Tour offers get added and removed all the time based on demand and availability. Inevitably, a user will sometimes click a link to an expired offer before Googlebot has had a chance to remove it from the search index.

Newsletter links weren’t always updated

The tour operator generated a large portion of their traffic from direct marketing initiatives. These initiatives included graphic-heavy promotions with lots of links to different parts of the site. Of course, there was a process in place for the marketing team to retrieve the correct links when constructing their newsletters. However, the process was error-prone because the components of old newsletters were often reused. These components linked to parts of the old website which had no direct equivalents in the new version. Again, users had to contend with occasional 404s.

How to deal with the redirect problem?

To solve this problem, some experts recommend a “content freeze” on the old site before you start the migration. However, in the fast-moving travel industry, this wasn’t really viable for us. The marketing team needed to be able to run different tour promotions and react to changing customer demand. Occasionally, though, our team or the web design agency were left out of the loop.

This meant that the old website had promotions or pages that weren’t featured on the new website. Or the new website had matching content on a different URL, which forced us to catch up with these changes and update our redirect rules.

Although automation is a great backup, communication is the best way to solve this problem. Make extra sure that you’ve identified all stakeholders in your website migration. These might be people who rarely talk to one another as part of their routine jobs (developers and marketers, I’m looking at you) but you have to ensure that they all communicate regularly until the migration is done.


Traffic splitting is a great tool, but use it wisely

Bear in mind that the scenario I’m describing is pretty specific. Traffic splits are great if you’re replatforming your website without changing its basic architecture. Even in this case, if your site has over 1,000 pages or products, managing the redirects can get prohibitively complicated.

Additionally, traffic splits aren’t suitable if you’re doing anything that requires a change to your URLs. Such changes include moving to a new domain, changing from HTTP to HTTPS, or simply redesigning your content structure. If you tried to split traffic in any of these scenarios, it might be a nightmare managing the redirects and SEO performance.

However, for our replatforming project, it was the perfect solution. It was like launching the new website with training wheels. Once we knew it would be OK, we could take the wheels off and watch like proud parents as it went from strength to strength.