We’ve seen it before, we’re seeing it again now with the recent alleged credit card bias issue at Apple and Goldman Sachs, and we’ll very likely continue seeing it well into 2020 and beyond. Bias in AI is there, it’s usually hidden (until it comes out), and it needs a foundational fix.
This past weekend we saw just how quickly the issue with the Apple Card, which is managed by Goldman Sachs, spiralled out of control. What started as a tweet thread with multiple reports of alleged bias (including from Apple’s very own co-founder, Steve Wozniak, and his spouse) eventually led to a regulator opening an investigation into Goldman Sachs and its algorithmic prediction practices.
If we dig into the allegations to find the source of the problem, two issues stand out:
1) The algorithm making credit decisions for the Apple Card is biased
2) The customer support teams from Goldman Sachs and Apple had zero insight into how the algorithm worked when they were asked to explain certain decisions
For (1) above, we saw multiple responses to the original tweet thread that corroborated the allegation. Multiple people reported a similar outcome: with all other input factors being the same (or in some cases better, like a higher annual income or credit score) and gender being the only difference, women were given significantly lower credit limits than their male spouses. This definitely comes across as problematic. Following the allegations, a separate group of unrelated women and men ran a test to check for bias and noticed significant differences in credit limits. Men with bad credit scores and irregular income got better offers than women with high incomes and good credit scores.
Was the algorithm biased against women? We can’t say for sure because we don’t know. This is the key – we don’t know what’s going on inside the algorithm, so we can’t analyze the root cause. Based on external outcomes, we can only guess that this is likely what happened.
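The outside-in comparison described above, where every input is held fixed and only gender changes, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `credit_limit` is a hypothetical, deliberately biased toy standing in for a black-box model, and the `Applicant` fields are invented for the example. It says nothing about how the Apple Card algorithm actually works.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    income: float
    credit_score: int
    gender: str  # "F" or "M" (hypothetical encoding)


def credit_limit(a: Applicant) -> float:
    """Hypothetical black-box stand-in; biased on purpose for the demo."""
    base = 0.2 * a.income + 10 * (a.credit_score - 600)
    return base * (0.5 if a.gender == "F" else 1.0)


def flip_gender_gap(model, a: Applicant) -> float:
    """Change in the model output when only the gender field is flipped."""
    flipped = replace(a, gender="M" if a.gender == "F" else "F")
    return model(flipped) - model(a)


applicant = Applicant(income=100_000, credit_score=750, gender="F")
gap = flip_gender_gap(credit_limit, applicant)
print(f"Limit changes by {gap:.2f} when only gender is flipped")
```

A gap of zero across many applicants would not prove the model is fair, since other inputs can act as proxies for gender, but a consistently non-zero gap is a strong signal that something deserves a closer look.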
Impact of this problem
The primary issue here is the black-box algorithm that generated the Apple Card’s credit lending decisions. As laid out in the tweet thread, Apple Card’s customer service reps were rendered powerless against the algorithm’s decisions. Not only did they have no insight into why certain decisions were made, they were also unable to override them.
Humans don’t want a future ruled by algorithms, especially biased ones. Algorithms permeate all aspects of our lives today, from lending and housing decisions to decisions in criminal justice. If we continue to let algorithms operate the way they do today, in a black box and without human oversight, the result is a dystopian world where unfair decisions are made by unseen algorithms operating in the unknown.
How could this issue have been avoided or at least handled better? Let’s come back to the initial statement above on how bias in AI needs a foundational fix. It’s very likely that in this particular credit lending decision, the algorithm was trained on biased data to begin with. To better understand this, let’s look at the high-level lifecycle of an AI solution:
Identify a use case for AI (credit lending, criminal justice, cancer prediction, etc.)
Access historical data to build models
Import this data to train and test models
Test and validate models
Deploy models into production
Monitor models in production to ensure optimal performance
If the data is flawed to begin with, this flaw permeates everything that an algorithm does going forward. What we need is a way to check for bias and other issues in both data and models through all stages of the AI lifecycle. What we also need is human oversight – AI is simply not ready to function on its own. We need humans-in-the-loop who will ensure that AI is functioning as it should.
In the Apple credit card example above, it’s likely this issue could have been avoided if humans had visibility into every stage of the AI lifecycle. They could have seen examples in the test and validate stage of how the model was behaving when a certain input factor was isolated and compared with the global dataset. They could have also had the ability to override an algorithm’s prediction in the test/validate stage if they felt it was unfair or incorrect. This would have resulted in an algorithm that was getting trained in the right way to produce accurate results when in production.
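One simple form of the test-and-validate check described above is to isolate a single input factor and compare the model's outputs per group. The sketch below is only an illustration: the validation rows are made-up numbers, the helper names are hypothetical, and the 0.8 cutoff is an assumed review threshold rather than a legal or regulatory standard.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, group_key, value_key):
    """Average model output for each value of the isolated input factor."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in buckets.items()}


def min_max_ratio(means):
    """Lowest group average divided by the highest; 1.0 means parity."""
    return min(means.values()) / max(means.values())


# Made-up validation predictions (credit limits) for illustration only.
validation_rows = [
    {"gender": "F", "limit": 5000},
    {"gender": "F", "limit": 6000},
    {"gender": "F", "limit": 4000},
    {"gender": "M", "limit": 10000},
    {"gender": "M", "limit": 12000},
    {"gender": "M", "limit": 8000},
]

REVIEW_THRESHOLD = 0.8  # assumed cutoff for sending the model to human review

means = group_means(validation_rows, "gender", "limit")
ratio = min_max_ratio(means)
flagged = ratio < REVIEW_THRESHOLD
print(means, round(ratio, 2), "needs review" if flagged else "ok")
```

A raw gap like this does not control for legitimate differences between groups, so a flag here is a prompt for a human to investigate, not a verdict.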
How Fiddler solves for this
This is exactly what we’re addressing at Fiddler. We’re working to unlock the AI black-box and empower all relevant stakeholders with more visibility into their AI than exists today. We’re working on infusing visibility and insight with explainable AI into every stage of an AI solution’s lifecycle: from the data and training through to when a model is deployed and in production.
We dive into the details to explain each prediction a model makes – whether a training, test, or production model – so users can understand the why behind individual decisions.
Our goal is to empower users with easy-to-grasp explanations of AI decisions. This empowers different stakeholders in an organization: data scientists are empowered to build the best and most accurate models, risk officers are empowered to publicize models with minimal risk, and customer support representatives are empowered to answer customer questions around the why behind decisions.
Fiddler’s explainable AI, built into the AI lifecycle, helps teams ensure they are compliant with regulations and are protecting their algorithms from inherent and hidden bias.
We’re continuing to build capabilities into Fiddler to ensure explainability is infused throughout the AI lifecycle and are working with a variety of customers to build this functionality into their existing and new models. If you’re interested in working with us, please reach out.
We recently chatted with Ganesh Nagarathnam, Director of Analytics and Machine Learning Engineering at S&P Global. Take a listen to the podcast below or read the transcript. (Transcript edited for clarity and length.)
Fiddler: Welcome to Fiddler’s explainable AI podcast. I’m Anusha Sethuraman. And today I have with me on the podcast, Ganesh Nagarathnam, from S&P Global. He’s the director for analytics and machine learning engineering. Ganesh, thank you so much for joining us. We’re super excited to have you. Could you please tell us a little bit about yourself and what you do at S&P Global?
Ganesh: Thank you, Anusha, for inviting me to the podcast. I’m currently working with the S&P Global Market Intelligence line of business as a director for analytics and machine learning engineering. I have 20-plus years of experience in building distributed and scalable software systems on a variety of technologies, from the likes of Java2 and Java3 all the way to Java9. And right now, I’m heavily into the big data ecosystem on the public cloud. I have had opportunities to work with great firms, from Verizon, Verizon Wireless, Goldman Sachs, and JP Morgan to, now, S&P Global Market Intelligence.
Fiddler: Wonderful. Picking up on that point of big data a little bit: how do you use data and AI in your organization today?
Ganesh: So, at S&P Global I work with the innovation team and the product team where we work on an idea, or an innovation gets germinated by our interactions with our customers. Then the idea gets into our analytics team, which is my team, where we try to build the MVP. Our job is to wire the necessary technology stack in accordance with corporate standards and get the product out to market quickly. My primary focus is to get the machine learning models that are being developed into production as quickly as possible. So, having said that we use AI extensively. We build an idea; we build a model from simple to complex and we try to get that out. At S&P Global, it’s all about data. We have a humongous amount of data and we think about how we can provide actionable intelligence to our clients with the right amount of information at the right time. That’s our primary goal.
Fiddler: I’m curious: what is a typical process for your team for creating AI solutions? You mentioned you come up with an idea and innovate on it. Can you touch upon a few of the details there in terms of how you go through that entire process of getting it into production?
Ganesh: So, as we discussed, we have lots of data and that’s a sufficient reason for us to explore or go down the AI route. Right from predictive analytics to interactive analytics and from visual analytics to simple data visualization, all we’re trying to do here is leverage that momentum to get to market quickly.
So, the typical process would be once we identify an innovative idea, we go to the drawing board and we discuss the product needs and we try to figure out what the appropriate technology stack is like. And then we invest 20% of our effort to deliver 80% value for our clients.
This means that we don’t want to iterate for too long and we involve our customers end-to-end when innovating. We then get this out to the product team and they take it to their customers to validate and ask for customer feedback. Then the process gets funneled with appropriate funding. So, the 20% of effort we’ve put into it doesn’t go to waste.
Fiddler: Great. Thank you for that insight. You did mention technology stack. So, I wanted to dig into that. What are the core AI and ML tools you use today and what are the main reasons why you use them?
Ganesh: That’s a great question. To begin with we are migrating into the public cloud and we have a lot of home-grown tools and external tools like Domino and AWS. On the AWS side, we use ML pipeline. We also use the Spark ML pipeline to do our preliminary feature engineering and then build the entire stack. Historically – if you look at Gartner’s report – around 68 to 70% of models being developed don’t get to see the production phase, meaning that they are sitting somewhere as Jupyter notebooks on desktops. So there is no set of well-defined processes around how you take an idea, develop a model, and then how you deliver it into production. That was the missing piece there.
Fiddler: I’m curious – you mentioned a lot of these models are just sitting there and that’s part of the challenge. What are some of the core challenges that your team is facing when you’re taking this AI solution all the way from inception to production?
Ganesh: The main focus for us is around how quickly we can show the dollar value by building the MVP. What we do is when we identify a solution, we remove our organizational hats (this organization or that organization) and we try to address the problem with a holistic approach. We figure out the appropriate solution with an open-minded approach. Once we find the right solution, we look at the boundaries or the bounding boxes in which we operate. Every organization has their specific set of boundaries. Then we look at those boundaries and see how we can factor in the solution which we are planning to build into the existing boundaries. We also take a closer look at these boundaries – are they legacy boundaries and is there something that can be tweaked so that the solution can be implemented seamlessly? That to me is a big challenge. On one side you’ve got to get to market pretty quickly, and on the other side, you have to work with the boundaries that you have within an organization. So how do you balance these two? That’s a challenge for us.
Fiddler: What tools do you think are lacking today to fill these gaps in the process?
Ganesh: We use Scrum in our day-to-day project stability. When it comes down to machine learning, you have to be truly agile when building machine learning products. The reason why I’m saying that is suddenly, with machine learning coming in and meeting software engineering, everybody is talking about ML Ops. How do you get to show the value by involving the product team right from the outset? How do you iterate faster? But the more important thing is how do we iterate smarter? That is the key to me.
To me the data science team should also be empowered to get the model from inception to production. If I really look at it, the half-life of a model is determined by its north star metric. The moment these metrics go off track, you will have to retrain the model within weeks if not days. So, do we have that edge? Are we ready? That’s the key thing to me. I wouldn’t call it a gap – it is something which we are working on to streamline the process. And that is why we as an organization are marching full steam into ML Ops. We have defined our core set of drivers which are key to achieving a successful ML Ops culture within the organization.
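The idea of retraining once a north star metric goes off track can be sketched as a small monitor. This is only an illustrative sketch, not a description of S&P Global's tooling: the `MetricMonitor` class, the rolling-window rule, and the baseline, tolerance, and observation values are all assumptions made up for the example.

```python
from collections import deque
from statistics import mean


class MetricMonitor:
    """Signals when the rolling mean of a tracked metric drops too far."""

    def __init__(self, baseline: float, tolerance: float, window: int):
        self.baseline = baseline      # metric value at deployment time
        self.tolerance = tolerance    # acceptable drop below the baseline
        self.recent = deque(maxlen=window)

    def record(self, value: float) -> bool:
        """Store one observation; return True if retraining looks due."""
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough observations to judge yet
        return mean(self.recent) < self.baseline - self.tolerance


# Hypothetical weekly accuracy readings for a deployed model.
monitor = MetricMonitor(baseline=0.90, tolerance=0.05, window=3)
observations = [0.91, 0.90, 0.89, 0.80, 0.78]
alerts = [monitor.record(value) for value in observations]
print(alerts)
```

Using a rolling mean rather than a single reading keeps one noisy week from triggering a retrain, at the cost of reacting a little later to a real drop.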
Fiddler: Ganesh we didn’t spend a lot of time talking about exactly what S&P Global does for your clients. Can you tell me a bit about that in terms of things like what sort of risk and trust and safety issues you’re dealing with?
Ganesh: Risk – that brings us to the core concept of explainability. Right now, we haven’t seen any adverse effect from not being able to explain our models, but eventually we’ll all get there. S&P Global has four lines of business. One is Platts, then we have Indices, and then Market Intelligence and the Ratings division. I am part of the S&P Global Market Intelligence team. Our main focus is to gather raw data transcripts and generate sentiment scores and provide actionable intelligence to our clients.
But when it really comes down to the risks in building these machine learning models, I don’t think organizations, and not just S&P Global, I don’t think organizations across the board are ready to take that leap. With all of the regulations coming up like GDPR regulations, it is so important for explainability to be a key factor in your AI. Think about it – if you are making a prediction with your AI, and the customer is going to ask you why, and you’re not in a position to explain that, then that would cause the trust to go whacky. And on the other hand, you don’t want your models to introduce any bias. Right now, as part of the ML Ops framework and design thinking, we wanted to incorporate explainability right in the design phase, and not at the end of the machine learning model’s lifecycle. So, you don’t want the machine learning model to go into production and then figure out explainability there.
Fiddler: I’m sure not too many people have heard this concept of explainable AI – XAI – as you mentioned. So, can you tell us a little bit about this black box AI model as it exists today and the need for something like explainable AI?
Ganesh: Eventually, whenever we build systems in traditional software engineering, we have people – as a software developer, when I started my career and got queries from the client, I would go and look into the database or look into the code and then figure out what the reason was – as simple as that. To me the same principle holds good when we go into the machine learning and AI world. Why did the AI system make a specific prediction or a decision, or why didn’t the AI system do something else? When did the AI system that we built succeed or fail, or how can the AI systems correct the errors that are coming out of it? Those are some things which resonated with me.
To me there are traditional models, for example the classic random forest models or any of the Bayesian algorithms, that can be explained, but if you look at the core neural networks, they’re a little more difficult. Think of deep, layered neural networks with many millions of parameters – ResNet-50 has roughly 25 million and VGG-16 roughly 138 million. There’s hope that sufficient progress can be made so that we can have both power and accuracy for our machine learning models to predict something, and at the same time we don’t lose the required transparency and explainability.
And that to me is very important – it’s good for the business. The community has already started talking about it in one form or another. They are visualizing what-if scenarios during the design phase. That’s what we do. And that has become our core element for our ML Ops journey. We know that explainability is important and we might decide to work on that later. Sometimes customers might need XAI upfront – we don’t know. So, this is where we need to have a tradeoff. It’s a balance between the fine art of the power and accuracy of your model predictions and transparency.
Fiddler: What do you feel about how you might have to build some of these things? How do you do it: are you building these things or are you looking for external solutions to help you include explainable AI in the design phase?
Ganesh: There have been some interesting conversations around this but we haven’t given it serious thought. When we iterate on new projects, we’re engaging product owners and the customers and then asking them if this needs to be explainable. Not every model needs to be explainable. You don’t want to invest in explainability just for the sake of it. But if it really comes down to a project which has strict GDPR regulations, it’s better to ask all the right questions upfront during the design phase. You may not have answers but startups like Fiddler might have answers to explainability. As data scientists and engineers representing these bigger firms, it is so important for us to ask those questions upfront in the design phase and then if needed, put in the right thought process and engage the right people in a discussion. And think about how you would explain it – do you want some kind of visual dashboard? If customers were to ask ‘why did my loan get rejected?’ or what the important parameters that go into this prediction are, you have to go back and then explain it. You don’t want to lose your customer because of the time it takes for you to explain it. We are not there yet, but eventually we’ll get there.
Fiddler: It’s getting important especially with all these regulations you mentioned. I’m curious: it seems like you might not have come across a situation yet where this black box AI has negatively impacted your organization or have you already come across a situation like this?
Ganesh: No, not really, but I’m thinking ahead. The reason is when you really look at credit risk or, taking a step away from the financial industry, let’s talk about the health industry. If you’re going to make serious predictions which have a human impact, it can become extremely problematic not only for the lack of transparency, but also for possible biases which are inherited by the algorithms. This could come from human prejudices or artifacts hidden in the training data that can lead to unfair or wrong decisions. How do you uncover this? Right now, every organization, every line of business, every sub project in these businesses has some amount of data science going on. But they might not get to see the bigger picture. So, as a technology leader, my job is to ask these questions upfront – how do we learn about explainability? It’s through my interactions and attendance at industry conferences. And that’s when you get to understand what’s going on in the space.
Fiddler: As we come to the close of this episode what do you think are some of the core things that teams will need to think about?
Ganesh: The core things I would say an organization should think right now – we need to be thinking about ML Ops. That’s where the heart is right now. We have machine learning models and human brains and we need to figure out how to take this idea, iterate quickly and get to market. The third piece is explainability. That’s where we have to be upfront in asking the right set of questions during the design phase and then take it forward from there. Let’s try to get better with ML Ops so that people are able to see value in ideas that are generated and involve the customer at every phase. Needless to say, explainability will kick in around the corner with regulations coming up, and then you won’t have a choice.
Fiddler: Well thank you so much Ganesh for sharing all your insights on this. We really appreciate your time. Thanks for joining us today.
Ganesh: Thank you so much, Anusha. It’s been a pleasure.
Fiddler will be at the very first TwiML conference next week on October 1 & 2! It’s a new conference hosted by the amazing folks at TwiML, and we can’t wait to explore and learn about the latest and greatest for AI in the enterprise.
At Fiddler, our mission is to enable businesses to deliver trustworthy and responsible AI experiences by unlocking the AI black box.
Where to find us
1) October 1, 11:20-11:45am, Robertson 2
Session: Why and how to build Explainability into your ML workflow
Join our CEO & Founder, Krishna Gade, to learn how Explainable AI is the best way for companies to deal with business risks associated with deploying AI – especially in regulation- and compliance-heavy industries. Krishna comes from a data and explainability background, having led the team that built Explainable AI at Facebook.
2) October 1 & 2, Community Hall, Booth #6 (see our location on the map below)
Come chat with us about:
Why it’s important to provide transparent, reliable and accountable AI experiences
Risks associated with lack of visibility into AI behavior
How to understand, manage, analyze & validate models using explainability
Schedule a time to connect with us
If you’d like to set up a meeting beforehand, fill out this meeting form and we’ll be in touch to finalize dates & times. We’re excited to chat with you!
Fiddler’s very own Ankur Taly, Head of Data Science, will be speaking on September 12 on Explaining Machine Learning Models. Ankur is well-known for his contribution to developing and applying Integrated Gradients — a new interpretability algorithm for Deep Neural Networks. He has a broad research background and has published in several areas including Computer Security and Machine Learning. We hope to see you at his session!
At Fiddler, our mission is to enable businesses of all sizes to unlock the AI black box and deliver trustworthy and responsible AI experiences. Come chat with us about:
Risks associated with not having visibility into model outputs
Most innovative ways to understand, manage, and analyze your ML models
Importance of Explainable AI and providing transparent and reliable experiences to end users
Schedule a time to connect with us
If you’d like to set up a meeting beforehand, then fill out this meeting form and we’ll be in touch. We’re excited to chat with you!
Where to find us
September 11 & 12
We’ll be in the Innovator Pavilion: Booth #K10, so stop by and say hi!
As machine learning models get deployed to high-stakes tasks like medical diagnosis, credit scoring, and fraud detection, an overarching question that arises is – why did the model make this prediction? This talk will discuss techniques for answering this question, and applications of the techniques in interpreting, debugging, and evaluating machine learning models.
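For readers who want a feel for what answering that question can look like, here is a rough sketch of Integrated Gradients, the attribution method mentioned earlier in this announcement. It is not taken from the talk: the tiny logistic model, its weights, the all-zeros baseline, and the midpoint-rule approximation of the path integral are all assumptions chosen to keep the example self-contained.

```python
import math

# Hypothetical logistic model: three input features, fixed weights.
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = -0.2


def predict(x):
    """Model output: sigmoid of a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of predict with respect to each input."""
    p = predict(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate Integrated Gradients along the straight baseline-to-x path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    # Average gradient along the path, scaled by each input's displacement.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions, sum(attributions), predict(x) - predict(baseline))
```

The attributions should add up, approximately, to the difference between the prediction at `x` and the prediction at the baseline, which is a useful sanity check when trying the method on a real model.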