TMP, AMADEUS, SABRE, Webtool 1, 2, 3 — Our Challenges Along the Trendtours Journey

Co-authors: Stephan Schulze and Stefanie Jahnke

When it comes to Project A’s portfolio, most people probably think of start-ups and young, fresh, innovative businesses. And of course, we support these kinds of companies a lot. But a huge part of our job is also to work with established organizations that have already been on the market for more than a decade. One of those is trendtours Touristik.

Trendtours homepage featuring available holiday packages

Trendtours has been selling organized travel packages since 1988, and they’re quite successful at it.

When we started to work with these kinds of companies, we were often confronted with challenges slightly different from those we know from start-ups. The codebase has usually grown over many years, so the technology stack often doesn’t meet the very latest standards.

To give you some more insights into the challenges and benefits of our work with Trendtours, Stephan Schulze and I will give a talk at this year’s Project A Knowledge Conference. Here’s a sneak peek.

We started to support Trendtours at the beginning of 2019, and we soon had to increase our support dramatically when their technical lead decided to leave the company. Why is that? Because they only had a tiny IT team compared to the huge number of applications that they were hosting (mainly in-house).

Up to then, well, the technical lead was very dedicated to his job. He did coding, built pipelines, deployed, fixed bugs, maintained servers, set up test environments for the other developers, researched new technologies, and was the only person you would go to with whatever technical problem or question you had. Now imagine removing that person from the team. It’s like removing some of the load-bearing walls from a house. At first, everything still looks like it did before, but the moment you breathe a sigh of relief, everything crashes down. And you don’t have enough people to stabilize all the walls at the same time. And you don’t know what will happen when you let one part collapse. What else will be pulled down with it?

So we were confronted with a lot of challenges, even though we’d started to support Trendtours before the technical lead left. We gathered some of his knowledge and prepared for the moment when he would be gone. To get a first overview, we created a map of all the applications used at Trendtours. For each app, you could see at a glance its URL, name, and IP, the VM it was running on, whether it was hosted in-house, and whether it was dockerized.
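To make the idea concrete, here is a minimal sketch of how such a map could be kept as structured data rather than a drawing. This is not the format we actually used; the class name, the field layout, and all URLs, IPs, and VM names below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppEntry:
    """One row of the application map (all example values are made up)."""
    name: str
    url: str
    ip: str
    vm: str
    in_house: bool
    dockerized: bool


# Hypothetical entries; a real map would list the actual applications.
APP_MAP = {
    entry.name.lower(): entry
    for entry in [
        AppEntry("TMP", "https://tmp.example.internal", "10.0.0.11", "vm-01", True, False),
        AppEntry("Webtool1", "https://webtool1.example.internal", "10.0.0.12", "vm-02", True, True),
    ]
}


def lookup(name: str) -> Optional[AppEntry]:
    """Case-insensitive lookup, so 'tmp' and 'TMP' find the same entry."""
    return APP_MAP.get(name.strip().lower())
```

With something like this, answering "where does the TMP run?" becomes a single call such as `lookup("tmp")`, which returns the entry or `None` if the name is unknown.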

That paid off fast. In the beginning, it was hard to remember all this new information. When someone threw sentences at you like “The TMP is down. None of the travel agents is able to work anymore. Do something!” and you were like, “Right… And the TMP is what again?”, you could always check the map.

Of course, we also started with documentation: In Confluence, we created an overview page for each application. We collected information about the build processes, deployments, monitoring, and logging, as well as any quirks specific to that application.

For example, one of the apps was rebooted on a daily basis to close all the database connections that it had kept open. For another, someone was deleting the logs manually every few days to free up some disk space. For some of them, there was no monitoring in place apart from users complaining that the app wasn’t working anymore.

We also had to find out which repositories were associated with which applications. If you choose such meaningful app names as “Webtool1” and “Webtool2”, it can be hard to find the matching GitLab repos.

Another thing that’s sure to confuse developers is to call your repository “AWS-Batch” and have it contain logic for the “Amadeus Web Service” rather than Amazon.

Documenting all this led to quite a few tasks that we wanted to get started with. But soon, we recognized why all those “quick fixes” had been implemented instead of fixing the real problems. It felt like every 5 minutes, a new issue came up. Just when you started to work on one task, another more pressing problem showed up that needed to be fixed immediately.

We put together a team of four people and started by improving the monitoring so that alarms were sent out when something broke. We adapted the deployments to prevent application downtime and got rid of some unnecessary manual tasks by implementing regularly running jobs that did the work.
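As an illustration of what such a job can look like, here is a minimal sketch that replaces the manual log deletion mentioned earlier. It is not the job we actually deployed: the function name, the `*.log` pattern, and the age threshold are assumptions, and scheduling it (for example via cron) is left out.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_old_logs(log_dir: Path, max_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Delete *.log files in log_dir older than max_age_days.

    Returns the list of removed paths so the caller can log or alert on it.
    `now` can be injected to make the function testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(log_dir.glob("*.log")):
        # Only touch regular files whose last modification is before the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run once a day with a threshold of a few days, a function like this does the same work as the manual cleanup, and the returned list gives the monitoring something to report.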

On top of that, we also extended the architecture with a so-called “Data Layer”. Its main purpose was to decouple and standardize the data streams in Trendtours’ overall setup. You can imagine this as a Slack for data, where the Data Layer pushes data change events into different channels. At the same time, the Data Layer also provides constant, up-to-date access to all datasets that are stored within the data store itself. A system that is interested in a specific set of information can get a full dump of the available information and, in addition, listen for any updates to that data so that it can keep its local storage up to date. These examples may sound like small improvements, but they already had a big effect.
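The "full dump plus updates" pattern can be sketched in a few lines. This is a simplified in-memory stand-in, not the actual Data Layer: the class and method names, the event shape, and the channel names in the usage example are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class DataLayer:
    """Stores the latest value per key and channel, and fans out change events."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = defaultdict(dict)
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        """Register a handler that is called for every change on the channel."""
        self._subscribers[channel].append(handler)

    def dump(self, channel: str) -> Dict[str, Any]:
        """Return a copy of everything currently stored for the channel."""
        return dict(self._store[channel])

    def publish(self, channel: str, key: str, value: Any) -> None:
        """Store the new value, then push a change event to all subscribers."""
        self._store[channel][key] = value
        event = {"channel": channel, "key": key, "value": value}
        for handler in list(self._subscribers[channel]):
            handler(event)


class Consumer:
    """A system that keeps a local copy of one channel up to date."""

    def __init__(self, layer: DataLayer, channel: str) -> None:
        self.local: Dict[str, Any] = {}
        # Listen for updates, then load the full dump. In this single-threaded
        # sketch the order is harmless; a real system has to handle updates
        # that arrive while the dump is still being loaded.
        layer.subscribe(channel, self._on_event)
        self.local.update(layer.dump(channel))

    def _on_event(self, event: Event) -> None:
        self.local[event["key"]] = event["value"]
```

For example, a consumer created on a hypothetical `"trips"` channel starts with everything already published there and then follows later changes, while events on other channels never reach it. The producing system does not need to know who is listening, which is the decoupling the Data Layer was introduced for.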

In general, the days are much more relaxed and plannable now. It doesn’t feel like we’re fighting one fire after the other but more like we’re doing standard feature development and bug fixing.

The journey with Trendtours has been a wild ride so far. Sometimes it was stressful, but it has also been one of the most interesting projects I have ever been involved in. Not one week went by without learning a bunch of new things, discovering a technology, or improving something that really made a difference.
