By Rudolf Donauer
Over the last couple of years, the data team at Project A has supported multiple portfolio companies in building their data warehouses. A data warehouse is used for reporting and data analysis and is considered a core component of business intelligence. By combining different data sources in one integrated system, the data can serve many different use cases (e.g. advanced marketing attribution, churn prediction, sales funnel analysis), which makes it relevant for B2C and B2B companies alike. Before we go into details, here are a few trends we have observed across the startup ecosystem:
- Data-driven workforce: demand for data is increasing as more employees are eager to gain access to reports, focus on KPIs and crunch data themselves
- Greater usage of tools: startups deploy a wide range of SaaS tools to run their business, thus creating valuable data points in various systems
- Scalable infrastructure: there are new technologies available that are scalable, cost-efficient and easy to maintain
Given these trends, it has never been easier or cheaper to build a data stack. In general, we expect most of our ventures to build a data warehouse (DWH) at some point. The timing, however, will differ depending on the industry, the business model and the role data plays in it. We have seen DWHs being developed early (<10 employees) and late (>200 employees), and have thus gained a better understanding of the reasons that lead companies to build a more advanced data stack.
Data at early-stage startups
A typical startup around its Series A funding round frequently has the following set-up: 1) an application database, 2) Google Sheets and .csv files, 3) external tools. Reporting in spreadsheets is based on the output of multiple SQL queries, copied and pasted from one place to another. With each new report, the complexity and the time spent on maintenance increase, and the reporting becomes hard to debug and test. Also, there is no designated ownership of data-related topics. Don’t get us wrong here: we encourage reporting in tools and Google Sheets up to a certain stage because it’s fast, cheap and easily accessible. However, you should consider stepping up your data infrastructure game when some of the indicators below start to feel all too familiar:
- Built-in reporting by CRM tools is no longer sufficient and creates knowledge silos
- Heavy maintenance to sustain reporting in Excel / Google Sheets
- A team is in place that is capable of shaping a data-driven culture
- The increasing volume of data exceeds the limits of current solutions
- Unclear definitions of core KPIs resulting in conflicting analyses and communication
- Growing demand for more advanced insights by business stakeholders
- Frequent data inconsistency issues and consequent lack of trust in reports
- Increasing complexity of calculations due to more advanced business logic
- Requirements for a central report that visualizes performance across multiple data sources
- Many repetitive tasks such as manual historicization of KPIs per week
While some of the indicators are more visible (e.g. data inconsistencies), others usually only come to attention once the unstructured gathering and processing of data has become a huge hurdle. Sooner or later, almost every founder will face some of the issues above, and the solution is a data warehouse.
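To make two of the indicators above more concrete (unclear KPI definitions and manual weekly historicization), here is a minimal sketch of what a warehouse-style approach can look like. It is an illustration under assumptions, not a recommended implementation: the `orders` table, the `paid_orders` KPI, the `kpi_current` view and the `kpi_history` table are all hypothetical names, and SQLite merely stands in for whatever database you use. The KPI is defined once in a view, and a small job stores its value per week instead of somebody copying numbers into a sheet.

```python
import sqlite3
from datetime import date


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

        -- Single, shared definition of the KPI (hypothetical example).
        CREATE VIEW kpi_current AS
        SELECT 'paid_orders' AS kpi_name, COUNT(*) AS kpi_value
        FROM orders
        WHERE status = 'paid';

        -- One row per KPI and snapshot date.
        CREATE TABLE kpi_history (
            snapshot_date TEXT NOT NULL,
            kpi_name TEXT NOT NULL,
            kpi_value INTEGER NOT NULL,
            PRIMARY KEY (snapshot_date, kpi_name)
        );

        INSERT INTO orders (status) VALUES ('paid'), ('paid'), ('cancelled');
        """
    )
    return conn


def snapshot_weekly_kpis(conn: sqlite3.Connection, snapshot_date: date) -> None:
    """Copy the current KPI values into the history table.

    Re-running the job for the same date replaces the existing rows,
    so the snapshot is safe to repeat.
    """
    conn.execute(
        """
        INSERT OR REPLACE INTO kpi_history (snapshot_date, kpi_name, kpi_value)
        SELECT ?, kpi_name, kpi_value FROM kpi_current
        """,
        (snapshot_date.isoformat(),),
    )
    conn.commit()
```

In this sketch, a scheduler would call `snapshot_weekly_kpis` once per week. Because every report reads the KPI from the same view, there is only one definition to maintain and to argue about.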
Your first data hire
Before you can start building something awesome, there has to be somebody, or even a team, that can take ownership of this project (see indicator #3 in the list above). Therefore, it is essential to recruit a strong, senior generalist in the field of data. Think of somebody who can do data transformations in SQL and also understands business processes in sales, marketing and operations very well. Quite often, these people have a consulting or business development background and have since moved into business intelligence. These profiles are generally hard to find, but don’t make the mistake of hiring a junior candidate or a ‘pure’ data scientist. It’s important to lay a solid foundation before you start getting fancy, as initial mistakes in the setup will be costly once you are scaling. Here’s a list of traits that we look for in our candidates:
- University degree and (more importantly) an affinity for numbers
- Relevant working experience in the fields of data and analytics in fast-paced, high-performance environments
- Familiarity with the data tool landscape to extract, transform and visualize data (Google Analytics, Snowplow, Snowflake, BigQuery, PostgreSQL, Mode, Tableau, Looker, and so on)
- Excellent analytical and strong communication skills, as well as a hands-on mentality
Please keep in mind that having all relevant data available in one place (a single source of truth) is not the end but rather the beginning of a journey towards data-driven decision making.
I would like to thank my colleagues Selim, Cyprien, and Martin for their feedback.