April 11, 2017

Business Intelligence at TextNow

Written by developer-textnow


Here at TextNow, we take data very seriously. Data is used across the company to drive critical business, product, and engineering decisions: designing new user experiences, developing new features, improving reliability and quality, and forecasting growth. Streaming event logs are analyzed and made actionable using our custom Business Intelligence (BI) infrastructure.

“In God we trust. All others must bring data.” — W. Edwards Deming


BI Infrastructure

When building the BI infrastructure for TextNow, our goal was to build something that was self-managed, scalable, and easy to use for our internal customers. As a result, we focused on three components: the Data Pipeline, the Data Warehouse, and Data Visualization.

Data Pipeline

There are two main sources of the data: instrumentation in source code that sends events through AWS Kinesis Firehose, and production MySQL dumps via AWS Data Pipeline.

For streaming data we use AWS Kinesis Firehose, which is one of the easiest ways to load streaming data into Amazon Web Services (AWS). It helps us capture, transform, and load streaming data into Amazon Redshift (with a hop through Amazon S3), enabling near real-time analytics and easy plug-and-play with our various internal visualization tools. Amazon Kinesis Firehose is a fully managed service that scales dynamically with the volume of our data, including peak traffic. It provides a multi-day queue, batching, compression, and encryption for granular manipulation of streaming data.
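To make the batching step concrete, here is a minimal sketch of how instrumentation code might group events before handing them to Firehose. It is not our production client: the event fields and the `send_batch` callable are hypothetical, and the only Firehose-specific facts it relies on are the documented `PutRecordBatch` limits of 500 records and 4 MiB per call.

```python
import json
from typing import Callable, Iterable, Iterator, List

# Documented PutRecordBatch limits: 500 records and 4 MiB per call.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024


def encode_event(event: dict) -> bytes:
    """Serialize one event as a newline-terminated JSON record."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def batch_records(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group encoded records into batches that respect both limits."""
    batch: List[bytes] = []
    size = 0
    for record in records:
        too_many = len(batch) >= MAX_RECORDS_PER_BATCH
        too_big = size + len(record) > MAX_BYTES_PER_BATCH
        if batch and (too_many or too_big):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += len(record)
    if batch:
        yield batch


def send_events(events: Iterable[dict],
                send_batch: Callable[[List[bytes]], None]) -> int:
    """Encode events and pass each batch to `send_batch` (hypothetical hook).

    Returns the number of batches sent.
    """
    count = 0
    for batch in batch_records(encode_event(e) for e in events):
        send_batch(batch)
        count += 1
    return count
```

In a real client, `send_batch` would likely wrap boto3's `put_record_batch(DeliveryStreamName=..., Records=[{"Data": record}, ...])` and retry any records reported in `FailedPutCount`. Newline-delimited JSON is one reasonable record format here because it can be loaded from S3 into Redshift without further transformation.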

For MySQL data dumps and for extracting data from third-party APIs, we use AWS Data Pipeline, which helps us create complex data processing workloads that are fault-tolerant, scheduled, and highly available.

500 Million+ events a day via 20 streams

Data Warehousing

Our main data warehouse is Amazon Redshift, which has a number of benefits:

  • Petabyte-scale capacity while remaining performant

  • Columnar data storage and massively parallel processing (MPP)

  • Easy to unload/load various formats of data to/from S3

  • SQL based
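As an illustration of the S3 load path, the sketch below builds a Redshift `COPY` statement for gzip-compressed, newline-delimited JSON. The table name, bucket, and IAM role ARN in the usage example are placeholders, not our real resources, and the helper itself is hypothetical.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         gzip: bool = True) -> str:
    """Return a Redshift COPY statement for newline-delimited JSON in S3.

    Values are interpolated directly into the SQL text, so this is only
    suitable for trusted, internally generated arguments.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS JSON 'auto'",  # map JSON keys to column names
    ]
    if gzip:
        parts.append("GZIP")
    return "\n".join(parts) + ";"


# Placeholder names for illustration only.
statement = build_copy_statement(
    table="events.message_sent",
    s3_uri="s3://example-bucket/firehose/message_sent/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

When Firehose delivers to Redshift it issues a `COPY` of this general shape on our behalf; a hand-written statement like this one is mainly useful for backfills and one-off loads.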

Cluster Specs:

  • Nodes: 8

  • CPU: 13 EC2 Compute Units

  • Memory: 31 GiB per node

  • Storage: 2 TB HDD per node

We also have a faster MySQL database in RDS that we use for summary tables and faster reporting of Key Performance Indicators (KPIs).
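The idea behind a summary table can be sketched as follows. This example uses Python's built-in `sqlite3` module purely as a stand-in for the MySQL reporting database, and the table and column names (`events`, `daily_kpi_summary`) are hypothetical. In practice the aggregation would more likely run in Redshift, with only the small result set written to MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL reporting database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_date TEXT, event_name TEXT, user_id INTEGER);
CREATE TABLE daily_kpi_summary (
    event_date   TEXT,
    event_name   TEXT,
    event_count  INTEGER,
    unique_users INTEGER,
    PRIMARY KEY (event_date, event_name)
);
""")

# A handful of made-up raw events.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2017-04-10", "message_sent", 1),
        ("2017-04-10", "message_sent", 1),
        ("2017-04-10", "message_sent", 2),
        ("2017-04-10", "registration", 3),
        ("2017-04-11", "message_sent", 2),
    ],
)


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the summary table from the raw events."""
    conn.execute("DELETE FROM daily_kpi_summary")
    conn.execute("""
        INSERT INTO daily_kpi_summary
        SELECT event_date, event_name, COUNT(*), COUNT(DISTINCT user_id)
        FROM events
        GROUP BY event_date, event_name
    """)
    conn.commit()


refresh_summary(conn)
rows = conn.execute(
    "SELECT * FROM daily_kpi_summary ORDER BY event_date, event_name"
).fetchall()
for row in rows:
    print(row)
```

Dashboards that read from a pre-aggregated table like this touch a few rows per day instead of scanning the raw event stream, which is what makes KPI reporting fast.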

Data Visualization

Collecting high volumes of data is just the beginning. The real magic begins when this data is turned into insights via data visualizations. We evaluated a lot of existing tools in the market and finally decided to use Airbnb’s open-source data exploration tool called Superset.

We cloned their GitHub repository and brought up our own instance on Amazon Elastic Beanstalk. Superset provides:

  • Easy-to-add databases using a SQLAlchemy URI

  • Intuitive drag-and-drop interface to build a rich set of visualizations

  • Built-in statistical functions

  • Extensible, high-granularity security model allowing intricate rules

  • Fast loading dashboards with configurable caching
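For readers unfamiliar with SQLAlchemy URIs, they follow the pattern `dialect+driver://user:password@host:port/database`. The helper below is a small sketch of how such a URI can be assembled safely. The function name, hosts, and credentials are made up for illustration, and which dialect string to use depends on the drivers installed alongside Superset.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect: str, user: str, password: str,
                         host: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy database URI.

    The user and password are URL-encoded so that characters such as
    '@' or '/' do not break URI parsing.
    """
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; 5439 and 3306 are the default Redshift and MySQL ports.
redshift_uri = build_sqlalchemy_uri(
    "postgresql+psycopg2", "bi_reader", "p@ss/word",
    "example-cluster.example.com", 5439, "analytics",
)
mysql_uri = build_sqlalchemy_uri(
    "mysql+pymysql", "bi_reader", "secret",
    "example-rds.example.com", 3306, "kpi",
)
print(redshift_uri)
print(mysql_uri)
```

Because Redshift speaks the PostgreSQL wire protocol, a PostgreSQL dialect is one common way to connect to it; a dedicated Redshift dialect is another option if it is installed.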

Measuring Success

Here are just a few of the results we saw after implementing our BI infrastructure:

  • With detailed acquisition funnel analytics, we are able to stay on the bleeding edge of performance marketing and acquire users at scale, with upwards of 4.5 million new registrations in the last three quarters.

  • With automated alerts for certain categories of spam, our response time for specific types of abuse has dropped from a few days to just a few hours.

  • With customized sales, we increased premium conversion by ~15%.

Business Intelligence empowers us to measure and improve business KPIs, net promoter score (NPS), spam detection, app performance, new feature development, and many other things so we can continuously improve the user experience.

BI is part of Growth at TextNow, and we’re actively looking for web developers, engagement managers and data scientists to join our talented team in San Francisco. Click here to learn more about our career opportunities.