We’re excited to introduce Anusha Sethuraman, the newest member of our team. Anusha joins us as our Head of Product Marketing.
Anusha comes from a diverse product marketing background across startups and enterprises, most recently on Microsoft’s AI Thought Leadership team where she spearheaded the team’s storytelling strategy with stories being featured in CEO and exec-level Keynotes. Before this, she was at Xamarin (acquired by Microsoft) leading enterprise product marketing where she launched Xamarin’s first decision-maker event and was instrumental in creating the integrated Microsoft + Xamarin story. And prior to that, she was at New Relic (pre-IPO) leading product marketing for New Relic’s mobile monitoring product.
Anusha believes in a world where AI is responsible, ethical, and understandable. In her own words:
“The idea of democratizing AI is great, but even better – democratizing AI that has ethics and responsibility inbuilt. Today’s AI-powered world is nowhere close to being trustworthy: we still run into everyday instances of not knowing the why and how behind the decisions AI generates. Fiddler’s bold ambitions to create a world where technology is built responsibly, where humanity is not only putting AI to the best use possible across all industries and scenarios but creating this ethically and responsibly right from the start is something I care about deeply. I’m very excited to be joining Fiddler to lead Product Marketing and work towards building an AI-powered world that is understandable, transparent, explainable, and secure.”
In today’s world, data has played a huge role in the success of technology giants like Google, Amazon, and Facebook. All of these companies have built massively scalable infrastructure to process data and provide great product experiences for their users. In the last 5 years, we’ve seen a real emergence of AI as a new technology stack. For example, Facebook built an end-to-end platform called FBLearner that enables an ML Engineer or a Data Scientist to build Machine Learning pipelines, run lots of experiments, share model architectures and datasets with team members, and scale ML algorithms for billions of Facebook users worldwide. Since its inception, millions of models have been trained on FBLearner and every day these models answer billions of real-time queries to personalize News Feed, show relevant Ads, recommend Friend connections, etc.
However, for most other companies, building AI applications remains extremely expensive. This is primarily due to a lack of systems and tools for supporting end-to-end machine learning (ML) application development, from data preparation and labeling to operationalization and monitoring.
The goal of this post is twofold:
List the challenges with adopting AI successfully: data management, model training, evaluation, deployment, and monitoring;
List the tools I think we need to create to allow developers to meet these challenges: a data-centric IDE with capabilities like explainable recommendations, robust dataset management, model-aware testing, model deployment, measurement, and monitoring capabilities.
Challenges of adopting AI
In order to build an end-to-end ML platform, a data scientist has to jump through multiple hoops in the following workflow.
End-to-End ML Workflow
A big challenge to building AI applications is that different stages of the workflow require new software abstractions that can accommodate complex interactions with the underlying data used in AI training or prediction. For example:
Data Management requires a data scientist to build and operate systems like Hive, Hadoop, Airflow, Kafka, Spark, etc. to assemble data from different tables, clean datasets, procure labeling data, construct features, and make them ready for training. In most companies, data scientists rely on their data engineering teams to maintain this infrastructure and help build ETL pipelines to get feature datasets ready.
Training models is more of an art than a science. It requires understanding which features work and what modeling algorithms are suitable to the problem at hand. Although there are libraries like PyTorch, TensorFlow, Scikit-Learn, etc., there is a lot of manual work in feature selection, parameter optimization, and experimentation.
Model evaluation is often performed as a team activity since it requires other people to review the model’s performance across a variety of metrics, such as AUC, the ROC curve, and Precision/Recall, and to ensure that the model is well calibrated. In the case of Facebook, this was built into FBLearner, where every model created on the platform would get an auto-generated dashboard showing all these statistics.
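To make the metrics above concrete, here is a minimal sketch in plain Python of how AUC and Precision/Recall can be computed from labels and model scores. The function names are illustrative, not FBLearner's API, and a real pipeline would more likely use a library such as Scikit-Learn.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) form is for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))           # 0.75
print(precision_recall(labels, scores))  # (1.0, 0.5)
```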
Deploying models requires data scientists to first pick the optimal model and make it ready to be deployed to production. If the model is going to impact business metrics of the product and will be consumed in a realtime manner, we need to deploy it to only a small % of traffic and run an A/B test with an existing production model. Once the A/B test is positive in terms of business metrics, the model gets rolled out to 100% of production traffic.
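One common way to expose a new model to only a small share of traffic is deterministic, hash-based bucketing. The sketch below assumes string user IDs and a hypothetical experiment name; it is one possible approach, not a description of any particular company's rollout system.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: float) -> str:
    """Route a user to 'treatment' (new model) or 'control' (production model).

    Hashing the experiment name together with the user ID keeps a user's
    assignment stable across requests and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return "treatment" if bucket < treatment_percent * 100 else "control"


# Start with 5% of traffic; raising the percentage later keeps every user who
# was already in treatment, because their bucket number does not change.
variant = assign_variant("user-42", "ranker-v2", treatment_percent=5.0)
```

Rolling out to 100% of production traffic then amounts to setting `treatment_percent` to 100 once the A/B test is positive.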
Inference is closely tied to deployment. There are 2 ways a model can be made available to make predictions:
batch inference, where a data pipeline is built to scan through a dataset and make predictions on each record or a batch of records.
realtime inference, where a micro-service hosts the model and makes predictions in a low-latency manner.
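The two modes can be sketched side by side. The `score` function below is a stand-in model with made-up weights and feature names; in practice the batch path would run inside a data pipeline and the realtime path behind a service endpoint.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, float]


def score(record: Record) -> float:
    """Stand-in model: a fixed linear scorer (hypothetical weights)."""
    return 0.25 * record["clicks"] + 0.5 * record["purchases"]


def batch_inference(records: Iterable[Record], batch_size: int = 2) -> Iterator[List[float]]:
    """Scan a dataset and yield predictions one batch at a time."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield [score(r) for r in batch]
            batch = []
    if batch:  # flush the final partial batch
        yield [score(r) for r in batch]


def realtime_inference(request: Record) -> Dict[str, float]:
    """What a micro-service handler would do for one low-latency request."""
    return {"prediction": score(request)}
```

The same model code serves both paths; what differs is how records arrive and how quickly an answer is expected.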
Monitoring predictions is very important because unlike traditional applications, model performance is non-deterministic and depends on various factors such as seasonality, new user behavior trends, and data pipeline unreliability leading to broken features. For example, a perfectly functioning Ads model might need to be updated when a new holiday season arrives, or a model trained to show content recommendations in the US may not do very well for users signing up internationally. There is also a need for alerts and notifications to detect model degradation quickly and take action.
As we can see, the workflow to build machine learning models is significantly different from building general software applications. If models are becoming first-class citizens in the modern enterprise stack, they need better tools. As Tesla’s Director of AI Andrej Karpathy succinctly puts it, AI is Software 2.0 and it needs new tools.
If we compare the stack of Software 1.0 with 2.0, I claim we require transformational thinking to build the new developer stack for AI.
We need new tools for AI engineering
In Software 1.0, we have seen a vast amount of tooling built in the past few decades to help developers write code, share it with other developers, get it reviewed, debug it, release it to production and monitor its performance. If we were to map these tools in the 2.0 stack, there is a big gap!
What would an ideal Developer Toolkit look like for an AI engineer?
To start with, we need to take a data-first approach as we build this toolkit because, unlike Software 1.0, the fundamental unit of input for 2.0 is data.
Integrated Development Environment (IDE): Traditional IDEs focus on helping developers write code, with features like syntax highlighting, code checkpointing, unit testing, code refactoring, etc.
For machine learning, we need an IDE that allows easy import and exploration of data, and cleaning and massaging of tables. Jupyter notebooks are somewhat useful, but they have their own problems, including the lack of versioning and review tools. A powerful 2.0 IDE would be more data-centric: it would start by allowing the data scientist to slice and dice data, edit the model architecture either via code or UI, and debug the model on egregious cases where it might not be performing well. I see traction in this space with products like StreamLit reimagining IDEs for ML.
Tools like Git, Jenkins, Puppet, Docker have been very successful in traditional software development by taking care of continuous integration and deployment of software. When it comes to machine learning, the following steps would constitute the release process.
Model Versioning: As more models get into production, managing their various versions becomes important. Git can be reused for models; however, it won’t scale for large datasets. The reason to version datasets is that, to be able to reproduce a model, we need a snapshot of the data the model was trained on. Naive implementations of this could explode the amount of data we’re versioning: think 1-copy-of-dataset-per-model-version. DVC, an open-source version control system, is a good start and is gaining momentum.
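One way to avoid the 1-copy-of-dataset-per-model-version blowup is content addressing: store each chunk of data once under its hash, and let each model version keep only a list of hashes. The toy class below illustrates the idea in memory; it is an assumption-laden sketch, not a description of how DVC is implemented.

```python
import hashlib
from typing import Dict, List


class DatasetStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}         # content hash -> chunk bytes
        self.manifests: Dict[str, List[str]] = {}  # model version -> chunk hashes

    def snapshot(self, model_version: str, chunks: List[bytes]) -> List[str]:
        """Record which chunks a model version was trained on."""
        hashes = []
        for chunk in chunks:
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # no-op if already stored
            hashes.append(key)
        self.manifests[model_version] = hashes
        return hashes

    def restore(self, model_version: str) -> List[bytes]:
        """Rebuild the exact training data for a model version."""
        return [self.chunks[h] for h in self.manifests[model_version]]


store = DatasetStore()
store.snapshot("model-v1", [b"day-1 rows", b"day-2 rows"])
store.snapshot("model-v2", [b"day-1 rows", b"day-2 rows", b"day-3 rows"])
print(len(store.chunks))  # 3 chunks stored, not 5
```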
Unit Testing is another important part of the build & release cycle. For ML, we need unit tests that catch not only code quality bugs but also data quality bugs.
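A data quality unit test can look much like a code unit test, except that the thing under test is a table. The sketch below assumes hypothetical `age` and `label` columns and made-up valid ranges; real checks would come from the dataset's own schema.

```python
import math
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def data_quality_issues(rows: List[Row]) -> List[str]:
    """Return a human-readable list of problems found in the training rows."""
    issues = []
    for i, row in enumerate(rows):
        age = row.get("age")
        if age is None or math.isnan(age):
            issues.append(f"row {i}: age is missing")
        elif not 0 <= age <= 120:
            issues.append(f"row {i}: age {age} out of range")
        if row.get("label") not in (0, 1):
            issues.append(f"row {i}: label must be 0 or 1")
    return issues


def test_training_data_is_clean() -> None:
    """Runs in the build alongside ordinary code tests (e.g. under pytest)."""
    rows = [{"age": 34, "label": 1}, {"age": 51, "label": 0}]
    assert data_quality_issues(rows) == []
```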
Canary Tests are minimal tests to quickly and automatically verify that everything we depend on is ready. We typically run Canary tests before other time-consuming tests, and before wasting time investigating the code when the other tests are failing. In Machine Learning, this means being able to replay a previous set of examples on the new Model and ensuring that it meets a certain minimal set of conditions.
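As a rough sketch of that replay idea, the check below runs a small set of saved examples through a candidate model and requires a minimum accuracy before anything slower is attempted. The threshold and the "golden" examples are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, expected label)


def canary_passes(
    model: Callable[[Sequence[float]], int],
    golden_examples: List[Example],
    min_accuracy: float = 0.9,
) -> bool:
    """Replay saved examples on a new model and check a minimal condition."""
    if not golden_examples:
        raise ValueError("canary needs at least one saved example")
    correct = sum(1 for features, expected in golden_examples if model(features) == expected)
    return correct / len(golden_examples) >= min_accuracy


golden = [([0.1], 0), ([0.9], 1), ([0.8], 1), ([0.2], 0)]
candidate = lambda features: int(features[0] > 0.5)
print(canary_passes(candidate, golden))  # True
```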
A/B Testing is a method of comparing two versions of an application change to determine which one performs better. For ML, A/B testing is an experiment where two or more variations of the ML model are exposed to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal. For example, in the dashboard below, we’re measuring click conversion on an A/B experiment dashboard that my team built at Pinterest, and it shows the performance of the ML experiments against business metrics like repins, likes, etc. CometML lets data scientists keep track of ML experiments and collaborate with their team members.
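The statistical analysis mentioned above can be as simple as a two-proportion z-test on conversion counts. This is a minimal sketch of that one test, with made-up numbers; it is not the method behind the Pinterest dashboard, and real experiment platforms typically add corrections for multiple metrics and repeated looks at the data.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conversions_a: int, users_a: int, conversions_b: int, users_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: production model A vs. candidate model B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference in conversion is unlikely to be noise, which is the signal used to decide whether the candidate model is rolled out further.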
Debugging: One of the main features of an IDE is the ability to debug the code and find exactly the line where the error occurred. For machine learning, this becomes a hard problem because models are often opaque, and therefore pinpointing exactly why a particular example was misclassified is difficult. However, if we can understand the relationship between feature variables and the target variable in a consistent manner, it goes a long way in debugging models. This is also called model interpretability, which is an active area of research. At Fiddler, we’re working on a product offering that allows data scientists to debug any kind of model and perform root cause analysis.
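One widely used, model-agnostic way to probe the relationship between a feature and the model's output is permutation importance: shuffle one feature column and see how much accuracy drops. The sketch below is a generic illustration of that technique under simple assumptions (a classifier exposed as a function, accuracy as the metric); it is not a description of Fiddler's product.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows where the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(
    model: Model, X: List[List[float]], y: List[int], feature_index: int, seed: int = 0
) -> float:
    """Accuracy drop when one feature column is shuffled across rows."""
    baseline = accuracy(model, X, y)
    column = [row[feature_index] for row in X]
    random.Random(seed).shuffle(column)  # seeded so the result is repeatable
    shuffled = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(X, column)
    ]
    return baseline - accuracy(model, shuffled, y)


# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
X = [[float(i % 2), (i * 7 % 10) / 10] for i in range(100)]
y = [i % 2 for i in range(100)]
model = lambda row: int(row[0] > 0.5)
```

A feature the model truly relies on shows a clear drop, while a feature it ignores shows none, which helps narrow down where to look when predictions go wrong.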
Profiling: Performance analysis is an important part of the SDLC in 1.0, and profiling tools allow engineers to figure out why an application is slow and improve it. For models, it is also about improving performance metrics like AUC, log loss, etc. Oftentimes, a given model could have a higher score on an aggregate metric but perform poorly on certain instances or subsets of the dataset. This is where tools like Manifold can enhance the capabilities of traditional performance analysis.
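The gap between an aggregate metric and per-subset performance is easy to see with a small slicing helper. The column names and data below are hypothetical, and this is only the basic idea, not how Manifold works internally.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def accuracy_by_slice(rows: List[dict], slice_key: str) -> Dict[Hashable, float]:
    """Accuracy computed separately for each value of `slice_key`."""
    totals: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        group = row[slice_key]
        totals[group] += 1
        correct[group] += int(row["prediction"] == row["label"])
    return {group: correct[group] / totals[group] for group in totals}


# 8 US rows, all correct; 2 international rows, only 1 correct.
rows = [{"country": "US", "label": 1, "prediction": 1} for _ in range(8)]
rows += [
    {"country": "INTL", "label": 1, "prediction": 1},
    {"country": "INTL", "label": 1, "prediction": 0},
]
overall = sum(r["prediction"] == r["label"] for r in rows) / len(rows)
print(overall)                              # 0.9
print(accuracy_by_slice(rows, "country"))   # {'US': 1.0, 'INTL': 0.5}
```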
Monitoring: While superficially, application monitoring might seem similar to model monitoring and could actually be a good place to start, we need to track a different class of metrics for machine learning. Monitoring is crucial for models that automatically incorporate new data in a continual or ongoing fashion at training time, and is always needed for models that serve predictions in an on-demand fashion. We can categorize monitoring into 4 broad classes:
Feature Monitoring: This is to ensure that features are stable over time, that certain data invariants are upheld, and that any privacy checks can be made, as well as to provide continuous insight into statistics like feature correlations.
Model Ops Monitoring: Staleness, regressions in serving latency, throughput, RAM usage, etc.
Model Performance Monitoring: Regressions in prediction quality at inference time.
Model Bias Monitoring: Unknown introductions of bias both direct and latent.
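For feature monitoring in particular, one commonly used stability statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The sketch below is one simple formulation; the bin count, the floor used for empty bins, and the rule-of-thumb alert threshold in the comment are assumptions, not a standard mandated anywhere in this post.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a baseline sample (`expected`) and a live sample (`actual`)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]
production = [0.5 + i / 200 for i in range(100)]  # shifted toward high values
drift = population_stability_index(training, production)
# A common rule of thumb treats PSI above roughly 0.25 as a significant shift.
```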
I walked through 1) some challenges to successfully deploying AI (data management, model training, evaluation, deployment, and monitoring), 2) some tools I propose we need to meet these challenges (a data-centric IDE with capabilities like slicing & dicing of data, robust dataset management, model-aware testing, and model deployment, measurement, and monitoring capabilities). If you are interested in some of these tools, we’re working on them at Fiddler Labs. And if you’re interested in building these tools, we would love to hear from you at https://angel.co/fiddler-labs
A few weeks ago, I was on the panel for Explainable AI at the PRMIA Fintech Horizons Conference in SF. The participants were predominantly from the finance industry like Banks, Hedge Funds and Fintech Startups.
We had a very interesting discussion on topics like:
Automated AI vs Human-Centered AI
How catastrophic can it be when a Business Risk is left unmanaged?
Example: Boeing 737 Max 8 failure
Special challenges in Quantitative Finance with AI. Can we quantify Model Risk in terms of a $-value?
Who in the organizations needs to care about Explainable AI? Is it the Data Scientist? Chief Risk Officer? Business Owner?
Bob Mark – Former CRO & Treasurer of CIBC, Managing Partner Black Diamond Risk – Moderator
We’re excited to introduce Ankur Taly, the newest member of our team. Ankur joins us as the Head of Data Science.
Previously, he was a Staff Research Scientist at Google Brain where he worked on Machine Learning Interpretability and was most well-known for his contribution to developing and applying Integrated Gradients — a new interpretability algorithm for Deep Neural Networks. Ankur has a broad research background and has published in several areas including Computer Security, Programming Languages, Formal Verification, and Machine Learning. Ankur obtained his Ph.D. in CS from Stanford University and a B. Tech in CS from IIT Bombay.
Ankur passionately believes in the need for Explainable AI and is excited to join Fiddler Labs. In his own words:
“Explainability is one of the key missing pieces in the ongoing machine learning (ML) revolution. As ML models continue to become more complex and opaque, being able to explain their predictions is getting increasingly important and challenging. The ability to explain a model’s predictions would enable users to build trust in the model, business stakeholders to derive actionable insights and strategies, regulators to assess model fairness and risk, and data scientists to iterate on the model in a principled manner. In response to this need, there has been a large surge in research on explaining various aspects of ML models. Fiddler Labs has taken up the ambitious task of driving this research to industrial practice, by making it available as a cutting-edge enterprise product catering to several business needs. This is incredibly promising, and I am super excited to join Fiddler on this journey!”
As we stand on the brink of a technological transformation that will fundamentally alter the way we live, work, and relate to one another, I am reminded of this famous line from one of Tennyson’s poems I used to recite as a young boy growing up.
Artificial Intelligence is set to dramatically change human lives – optimize jobs and activities, minimize risks, help us make effective decisions – however, there are many questions that remain to be answered and many concerns that need to be addressed. It is a no-brainer that every enterprise, whether public or private, needs to be adopting AI today, or risk becoming obsolete and outgunned by the competition.
There are two main challenges that businesses face today with adopting AI:
Higher costs of building AI products.
Increasing lack of trust in AI.
Operationalizing AI applications continues to be prohibitively time-consuming and expensive even for the most sophisticated companies. This is primarily because the tools that researchers and scientists use for building AI models are not scalable for real production systems. These systems need end-to-end AI platforms for everything from data preparation and labeling, to operationalization and monitoring. Additionally, the ROI ambiguity of AI within the enterprise makes them pursue a ‘golden use case’ thus holding back many from fully exploiting its potential. Existing enterprise AI platforms, especially those deployed on-premise, have poor UX, limited features and lack distributed computing. The only alternative for enterprises seems to be to either move to cloud offerings like AWS, Google Cloud or Azure ML, or start custom engineering projects.
Enterprises are therefore investing in significant R&D to build the custom AI infrastructure that they need. The biggest tech companies have had a considerable head start here. First, they pioneered data collection, data engineering, and ML frameworks. Now they are building a new kind of proprietary infrastructure in-house, e.g., FBLearner at Facebook, TFX at Google, Michelangelo at Uber, Notebook Data Platform at Netflix, Cortex at Twitter, and BigHead at Airbnb. We call this kind of infrastructure the ‘AI Engine’. This infrastructure manages compute loads, automates deployments for ML models, and provides tools for managing AI projects across the organization.
Trust in AI is a looming societal and technological problem.
One of the biggest questions people are concerned about: Is bias creeping into AI, or is it already present in the data that is fueling models? Fairness in AI poses a difficult and subjective question. Sometimes there can be a trade-off between accuracy and fairness of ML models. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have dangerous consequences. AI fairness is tricky because we cannot design a randomized experiment with race or gender. Instead, we need to understand and monitor data throughout the lifecycle of AI, from generation and collection to sampling, feature engineering, and so on. It’s a good thing that as a society, we have become more sensitive about how we use data. There is also a clear consensus emerging that businesses need to be alert to the dangers of bias and its harmful effects on AI-powered decision-making.
To solve these problems, we have started Fiddler Labs to help enterprises adopt cutting edge AI by simplifying operationalization, removing the ambiguity around ROI, and crucially, by creating a culture of trust.
Converting data into intelligence has been our team’s multi-decade journey through companies like Facebook, Google, Twitter, Pinterest, Amazon, Lyft, PayPal and Microsoft. Over the years, we have seen many new tools, algorithms, and systems built, deployed and scaled in production. Our team has worked with business owners, data scientists, business analysts and devops to develop a deep understanding of the challenges they face day-to-day in converting data into intelligence and insight. These experiences led us to build a new kind of AI Platform that’s both more trustable and more efficient: The ExplainableAI Engine.