Explainable Churn Analysis with MemSQL and Fiddler AI

Published

May 5, 2020

Last Edited

April 18, 2024

Kalyan Ram Chintalapati

Director of Engineering

Fiddler AI

Fiddler and SingleStore (formerly MemSQL) are partnering to offer the power of SingleStore to users of Fiddler's toolset for explainable AI - and to offer Fiddler's explainability tools to the many SingleStore customers who are already using, or moving to operational AI. To this end, the two companies are offering new, efficient ways to connect SingleStore self-managed software, and the SingleStore Helios managed service in the cloud, to Fiddler's toolset.

Fiddler provides a vital need, as AI moves out of labs and into the real world: Explainable AI. With Fiddler, you can describe why your AI models reached a given conclusion - why did one person get a loan, and another get selected for a clinical trial of a new drug? You need to have answers to these kinds of questions, beyond "the model said so." With Fiddler, business analytics and data science teams can build and deploy models that are inherently explainable, and provide explainability even for models that do not have it built in from the start.

SingleStore is very well-suited to the demands of operationalizing AI - that is, powering machine learning models and AI applications as they're put into production. SingleStore processes relational data, JSON data, time series data, geospatial data, and more, with blazing fast ingest speeds, eye-popping transaction performance, unmatched query responsiveness, and high concurrency. There are many resources available which demonstrate this, but among the best is this webinar on machine learning and AI from Eric Hanson of SingleStore.

In this blog post, we show how the SingleStore database and Fiddler work together to solve a knotty business problem: reducing churn among services customers. Solving this single problem cost-effectively can go far toward improving profitability in your business.

A version of this blog post also appears on the SingleStore website.

Reducing Churn

In today’s turbulent economy, customer needs are changing swiftly, causing business disruptions. As a leader, it’s more important than ever to understand the ‘why’ behind customers' actions, so you can empower your teams to build successful products and services. Having the right infrastructure and tools is critical to enable your teams to respond to these dynamic needs quickly.

Analyzing, predicting and monitoring churn accurately is critical for every data science or business intelligence team, especially in times like these.

By complementing SingleStore's industry-leading capability of enabling fast access to data at scale - across both streaming and historical datasets - with Fiddler’s Explainable AI Platform, which provides visibility, insights, and actionable analytics, business intelligence and analytics teams are perfectly positioned to respond to shifting customer needs and to ensure that customers are well served.

Challenges with Churn Analysis

There are a common set of analytics challenges to address when monitoring customer churn:

Descriptive analytics - Identifying possible reasons for customer churn
Diagnostic analytics - Ascribing actual customer churn to specific reasons
Predictive analytics - Predicting churn and the reasons for it
Prescriptive analytics - Identifying potential actions to reduce future churn

Solution

To begin with, all customer data needs to be effectively organized in one place, to enable teams to leverage AI-powered technologies to model the churn. The database needs to be able to handle streaming data in real time, so analytics are performed on the "latest and greatest" data.

SingleStore's fast streaming database is an ideal platform for organizing this data. SingleStore provides unmatched processing capabilities for both transactions and queries, vital for operational analytics, machine learning, and AI. By streaming customer data into SingleStore, users get all the interactive query capabilities, along with the ability to keep the data up-to-date within milliseconds.

We can then run churn analytics on this by connecting it with Fiddler, an explainable AI platform that helps data scientists and analysts build trust with AI decisions inside their organizations. Cutting-edge explainability algorithms help business users make sense of AI, getting answers to cause-and-effect questions on the drivers behind a prediction.

BI teams regularly iterate on multiple models to predict churn. Fiddler allows comparison of performance of multiple models to identify the most effective one for a given task. The Explainable AI Platform offers a lens to assess model performance and validate models. Since the precision of the churn model not only impacts performance but also decision-making, customers would like to iterate on the models, and monitor several versions of the model, to help in identifying problems and solutions.

Integrating SingleStore with Fiddler - As Easy as 1-2-3!

While many analytics tasks bring in data from a CSV file, data used in machine learning generally resides in a database like SingleStore. Bringing this data into Fiddler’s Explainable AI Platform as an ML dataset is the first step in the AI/ML development workflow.

Explainable Churn Analysis with MemSQL and Fiddler

There are a few ways to bring data into Fiddler. It can be imported directly from any database which Fiddler supports, such as SingleStore; uploaded as a CSV file in the browser; or loaded directly from a file store such as AWS S3.

1. Preparing Data in SingleStore

For the purpose of this blog post, we used the popular Telco Churn Dataset from Kaggle as an example. Let’s assume this Telco company saves all customer data in SingleStore in a database named churn_example and a table named telco_customer_churn. Here's the DDL; you can also access the DDL on Github. The sample dataset is available in for download from an S3 bucket.

DROP DATABASE IF EXISTS churn_example;
CREATE DATABASE churn_example;
USE churn_example;
CREATE TABLE telco_customer_churn
(
   customerID TEXT,
   gender TEXT,
   SeniorCitizen BOOLEAN,
   Partner TEXT,
   Partner TEXT,
   Dependents TEXT,
   tenure INT,
   PhoneService TEXT,
   MultipleLines TEXT,
   InternetService TEXT,
   OnlineSecurity TEXT,
   OnlineBackup TEXT,
   DeviceProtection TEXT,
   TechSupport TEXT,
   StreamingTV TEXT,
   StreamingMovies TEXT,
   Contract TEXT,
   PaperlessBilling TEXT,
   PaymentMethod TEXT,
   MonthlyCharges DECIMAL(13, 4),
   TotalCharges DECIMAL(13, 4),
   Churn TEXT,
   PRIMARY KEY (customerID)
);

For the purposes of this tutorial, we will populate telco_customer_churn with the information from the Kaggle Telco Churn dataset.

This can be done by creating a SingleStore Pipeline to load the data from S3.

CREATE or REPLACE PIPELINE `telco_customer_churn` AS
  LOAD DATA S3
'download.memsql.com/first-time/WA_Fn-UseC_-Telco-Customer-Churn.csv'
  CONFIG '{"region": "us-east-1"}'
  SKIP DUPLICATE KEY ERRORS
  INTO TABLE `telco_customer_churn`
  FIELDS
       TERMINATED BY ','
       OPTIONALLY ENCLOSED BY '"'
  IGNORE 1 LINES;
START PIPELINE `telco_customer_churn` FOREGROUND;

Once the data is in place, run SELECT * from telco_customer_churn LIMIT 10 to validate the data and the column names.

2. Connecting SingleStore to Fiddler

To add SingleStore as a data source, we need the authentication information to construct the database URI. We can add MemSQL as the type of database in Fiddler and furnish the rest of the details like the hostname, port, username, password, the database to connect to and add the connector.

The settings are validated via a connection to the database. The ability to add and remove database connectors is an Administrator-privileged operation, whereas usage of data from the connectors themselves is a non-administrators operation.

Once the connector for SingleStore is in place, users can then import data using the connector into Fiddler. To do this, begin Fiddler’s dataset upload workflow, to add this data as a dataset for churn analysis. Select the data source that was just added, then enter the SQL query to select the data to be imported into Fiddler.

In the background, a database connection is established, the SQL query is run, and its results are ingested into Fiddler. The data is then parsed and validated to infer the column names and data types, which are presented to the user for adjustment as needed.

3. Analyzing Churn using Explainable AI

Next, start analyzing the data. Glean more insights about the features like their mean, variance, and also look at the statistical covariance across all the features. Fiddler’s Explainable AI Platform allows us to analyze the features using feature distribution and mutual information charts to visualize their statistical dependencies, among other details.

In order to leverage Explainable AI, Fiddler offers 2 options.

Bring in a custom pre-trained model
Build an interpretable model

Data scientists can use option #1 to bring in their own ML models, built on open-source or custom ML platforms, and then use Fiddler to explain them. To do this, Fiddler offers a Python library that data scientists can use to upload a pre-trained model.

Alternatively, they can follow option #2, and use Fiddler to build an interpretable model on the platform. Once the models are ingested, Fiddler uses sophisticated explainability algorithms to compute the causal drivers for model predictions. And the explanations are presented in a collection of dashboards, consumable by business users as well as data scientists.

For example, an Account Manager in a Customer Success team can use the dashboard below to understand why this customer is likely to churn, with a probability of 75%.

As shown in the picture above, the top five reasons why this customer is likely to churn are:

Short tenure (only 4 months) on the telecom service
Lack of online security in her package
Being on a month-to-month contract
Not having tech support
And paying high monthly charges of $76

Using this information, the Account Manager can then intervene and fiddle with the inputs, and examine what-if scenarios. For example, they can see what would happen if they tried a couple of actions.

Offer the customer “TechSupport”
Reduce her Monthly Charges from $76 to $60

Two actions would reduce the customer’s likelihood to churn from 75% to 40%.

In addition to simple, business user-facing explanations, Fiddler also supports advanced slicing and explanation capabilities to go deeper into the data and the models - for instance, to understand why the cohort of, for example, low-tenure users are churning.

Slicing to explain a cohort of high churn, low tenure users

4. Using SingleStore and Fiddler Together in Production

Once a churn model is operationalized, Fiddler can be connected to a live SingleStore database to continuously monitor, predict, and explain the churn model.

After the model is live, users can monitor the performance in production and close the feedback loop. Fiddler will connect with SingleStore to score the model in a continuous manner and monitor performance. That way our users can track business KPIs, performance metrics, and setup alerts when something goes out of the ordinary. Fiddler’s Explainable Monitoring features help analysts and data scientists to keep track of the following:

Feature Attributions: Outputs of explainability algorithms that allow further investigation, helping to understand which features are the most important causal drivers for model predictions within a given time frame.
Data Drift: Track the data coming from SingleStore, so that analysts and data scientists can get visibility into any training-serving skew.
Outliers: The prediction time series from the model outputs, and outliers that are automatically detected for egregious high-churn or low-churn predictions.

Monitoring dashboard showing Outliers (orange dots) in Churn Prediction

SingleStore is well-suited to work in tandem with an Explainable AI Platform like Fiddler:

Speed: the faster the database runs, the more up-to-date monitoring is, and more explanations can be tested against more data in a given time frame. This increases the functional value of Fiddler.
Scale: SingleStore, as a fully distributed database, can easily be scaled out as needed. There's no barrier to handling more data, or to speeding up processing of existing data volumes.
SQL: SingleStore is a relational database with native ANSI SQL support. It fully preserves schema, which serves as a valuable input to Fiddler, while connecting to the wide range of BI tools that depend on SQL. (Also, see this take on NoSQL vs NewSQL.)
Spark: In addition to supporting a wide range of monitoring and analytics tools, including Fiddler, SingleStore also boasts the MemSQL Spark Connector 3.0, speeding model processing via predicate pushdown.
Kafka: Streaming data platforms such as Kafka are often used to bring the "latest and greatest" data to machine learning models, without delay. Kafka partitions map directly to SingleStore leaf node partitions, allowing very rapid, parallel ingest and exactly-once processing.
Converged data: In addition to relational data, SingleStore handles semi-structured JSON data, full-text search for unstructured data, geospatial data, and time series data, with specific time series functionality.

In addition to downloadable, self-managed software that runs on all platforms, from on-premises to any cloud, SingleStore offers MemSQL Helios, an elastic managed service in the cloud. With Helios, you can minimize the time to stand up, and the operational effort to run, an advanced, feature-rich database.

Like SingleStore, Fiddler works with a wide range of BI tools. You simply export data from Fiddler to a range of tools. The data from Fiddler can then be integrated into dashboards, reports, and the answers to interactive queries.

Conclusion

The seamless integration between SingleStore and Fiddler enables easy import of data from inside SingleStore to Fiddler’s Explainable AI Platform, for ML insights in minutes. Data science and analytics teams working on customer churn can upload their pre-trained models, or quickly generate interpretable models on Fiddler.

Once the models are in place, Fiddler users can easily create interactive dashboards for users in the business. They can also export explanations into their favorite BI tool of choice.

Business users, such as account managers, can then self-serve to understand why a customer is likely to churn, run what-if scenarios, and fiddle with data to see what actions they can take to save a customer from churning.

Explainable Churn Analysis with SingleStore and Fiddler