Analyzing Consumer Complaints with Natural Language Processing

The following post is a summarized overview of my advanced data science research project. For the full code and a more in-depth overview, view the project on GitHub.

Consumer Complaint Processing

Encouraging Corporate Social Responsibility

As financial institutions face increasing scrutiny over their customer service practices, understanding the patterns, impacts, and predictive value of complaint data has become essential for practical risk management and for companies’ continued success.

The Consumer Financial Protection Bureau (CFPB) serves as a key agency for regulatory transparency. When its consumer complaint disclosures became publicly available in 2013, corporations proactively worked to improve their corporate social responsibility (CSR) (Wang, Tsang, Xiang, and Yan, 2024).

Consumers pay attention to data on the behavior of companies they intend to do business with, and negative sentiment is contagious. In 2016, Wells Fargo was involved in a cross-selling scandal whose fallout quickly spread to its unrelated mortgage products. The negative effects of poor customer experience persist (Noel and Osman, 2024).

Presentation for my Advanced Data Science research project

Keywords: Regulatory transparency, Consumer complaints, Severity scoring, Natural Language Processing (NLP), Logistic Regression

Consumer complaint data is from the CFPB, 2023

Key Terminology

Corporate Social Responsibility (CSR):
When companies voluntarily take responsibility for their impact on society, the environment, and stakeholders beyond just making profits

Sentiment Contagion:
Negative (or positive) customer feelings about one aspect of a company or product spread to completely unrelated aspects of that same company

Research Questions

Can consumer complaint narratives be efficiently analyzed for severity using a combined approach of keyword scoring and sentiment analysis?
Can high-severity complaints be predicted from product and issue categories at the time of complaint submission?

Background

Financial institutions are using more sophisticated technology to analyze customer complaints. Athira, Adith, and Gupta (2025) showed that machine learning models, especially transformer-based approaches like DistilBERT and RoBERTa, can automatically classify complaints by sentiment, severity, and emotion. Adding sentiment analysis to traditional financial metrics also improves early warning systems. Pitta de Jesus and Besarria (2023) found that including bank manager sentiment from quarterly reports significantly improved machine learning models’ ability to predict bank insolvency risk.

The original data download contained 1,292,107 rows (CFPB, 2023). The data used in the severity scoring and predictive modeling was filtered to include only rows where ‘Consumer complaint narrative’ was not null. The filtered dataset contained 487,445 rows, or 37.7% of the original data. The choice to limit the analysis to these rows proved to be a key limitation for predictive modeling.
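The filtering step can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code. The column name ‘Consumer complaint narrative’ comes from the CFPB export; the function name and the three-row inline sample are made up for the example.

```python
import csv
import io

NARRATIVE_COL = "Consumer complaint narrative"  # column name from the CFPB export


def keep_rows_with_narrative(rows):
    """Return only the rows whose narrative field is present and non-blank."""
    return [r for r in rows if (r.get(NARRATIVE_COL) or "").strip()]


# Hypothetical three-row sample standing in for the full CSV download.
sample_csv = io.StringIO(
    "Product,Issue,Consumer complaint narrative\n"
    "Credit reporting,Incorrect information on your report,My report lists an account I never opened.\n"
    "Mortgage,Trouble during payment process,\n"
    "Credit card,Problem with a purchase shown on your statement,I was charged twice.\n"
)

rows = list(csv.DictReader(sample_csv))
filtered = keep_rows_with_narrative(rows)
share = len(filtered) / len(rows)
print(f"{len(filtered)} of {len(rows)} rows kept ({share:.1%})")
```

Applied to the full download, the same ratio is 487,445 / 1,292,107, or about 37.7%.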

BERT pipelines

The natural language processing (NLP) was split into emotion analysis using DistilBERT (Sanh et al., 2019) and sentiment analysis with RoBERTa (Liu et al., 2019).

72.3% of the consumer complaint narratives were classified as ‘negative’. For the emotion analysis, 49.7% of the narratives were labeled as ‘neutral’, consistent with the large number of straightforward consumer narratives. Surprisingly, 255 of the consumer complaints were labeled with the emotion ‘joy’. This small group closely aligns with the narratives classified as ‘positive’ sentiment. These narratives included a mix of responses expressing gratitude for the CFPB, as well as a number of sarcastic entries such as “I’m so happy to know that I’m not the only person that was tricked into applying for a credit card.”

BERT struggles with sarcasm.

Severity Scoring

Sentiment analysis was balanced with a keyword count algorithm. I worked through six iterative cycles to build the scoring lists, sampling the results after each iteration. The large share of complaints scored as ‘low’ severity indicates that these lists need further refinement.

High (> .7): severe customer harm such as fraud, harassment, or financial distress
Medium (.3 to .7): frustration with service issues
Low (< .3): routine concerns, inquiries or gratitude
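One way to picture how a keyword count and a sentiment score could be blended and then mapped to the bands above is sketched below. Only the band thresholds come from this post. The term list, the saturation point, and the 50/50 weighting are hypothetical placeholders, not the lists or weights used in the project.

```python
import re

# Hypothetical severe terms; the project's real lists took six iterations to build.
SEVERE_TERMS = {"fraud", "fraudulent", "harassing", "threaten", "homelessness", "ruined"}
SATURATION_HITS = 2    # assumed: two or more hits give the maximum keyword score
KEYWORD_WEIGHT = 0.5   # assumed: equal blend of keyword and sentiment signals


def keyword_score(text):
    """Count severe-term hits and scale the count to the 0-1 range."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for t in tokens if t in SEVERE_TERMS)
    return min(hits / SATURATION_HITS, 1.0)


def severity_score(text, negative_prob):
    """Blend the keyword score with a model's negative-sentiment probability (0 to 1)."""
    return KEYWORD_WEIGHT * keyword_score(text) + (1 - KEYWORD_WEIGHT) * negative_prob


def severity_band(score):
    """Map a 0-1 score to the High / Medium / Low bands listed above."""
    if score > 0.7:
        return "High"
    if score >= 0.3:
        return "Medium"
    return "Low"


# negative_prob would come from the sentiment model; 0.9 is a made-up value.
text = "They are harassing me and threaten collections"
score = severity_score(text, negative_prob=0.9)
print(severity_band(score))
```

Capping the keyword score keeps very long narratives from dominating purely because they repeat a term many times; whether that is desirable depends on the use case.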

The results of this severity scoring proved highly effective for quickly filtering urgent consumer complaints. In industry, the keyword lists could be tailored to the needs of an individual corporation, with significant time-saving potential.

If each of the 487,445 consumer complaints took 30 seconds to process manually, this would result in 4,062 hours of labor (roughly two FTEs annually). The BERT pipeline, running on my home GPU, finished in 1.8 hours.
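The labor estimate above is simple arithmetic, reproduced here as a sanity check. The 2,080-hour work year (40 hours x 52 weeks) is an assumption added for this sketch; a different definition of a full-time year shifts the FTE figure slightly.

```python
COMPLAINTS = 487_445          # narratives in the filtered dataset
SECONDS_PER_COMPLAINT = 30    # manual review time assumed in the text
FTE_HOURS_PER_YEAR = 2_080    # assumption: 40 hours x 52 weeks
PIPELINE_HOURS = 1.8          # reported BERT pipeline runtime

manual_hours = COMPLAINTS * SECONDS_PER_COMPLAINT / 3600
fte_years = manual_hours / FTE_HOURS_PER_YEAR
speedup = manual_hours / PIPELINE_HOURS

print(f"manual review: {manual_hours:,.0f} hours")
print(f"full-time equivalents: {fte_years:.2f}")
print(f"manual hours per pipeline hour: {speedup:,.0f}")
```

The last figure compares human labor hours with GPU wall-clock time, so it is a rough indication of scale rather than a like-for-like benchmark.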

Top Severity Consumer Complaint Narratives	Score
This is my 3rd request […] my credit is being ruined, im being denied and on the verge of homelessness	.973
[…] This is over $ XXXX and they do not care one bit. I am at a total loss and this fake bank needs to be out of business as they have the worst service and are a fraudulent organization	.971
The company is harassing me and my family. We have tried to work with them and they will not give us any grace. They call over 5x a day and we told them we can not pay the amount they need and they continue to threaten to send us to collections. […]	.971
I have no knowledge of this account; it is fraudulent […]	.970
My credit was damaged, they have defamed my character with these fraudulent charges! […]	.970

Predicting High Severity Consumer Complaints

Logistic regression and random forest models were developed using ‘Product’ and ‘Issue’ as explanatory variables to predict high-severity consumer complaints. The 110,131 high-severity cases, identified through the BERT pipeline, represented 22.6% of the total narratives.

None of the models had adequate predictive power. With ROC-AUC values around 0.61, where 0.5 corresponds to random guessing, these models are not much better than chance. The target variable, ‘high_severity’, is itself problematic because the score is derived rather than observed. Institutions seeking to predict high-severity consumer complaints need a better indicator variable, as well as predictors beyond product and issue type.
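To make the 0.61 figure concrete, ROC-AUC can be read as the probability that a randomly chosen high-severity complaint receives a higher predicted score than a randomly chosen non-high-severity one. The sketch below computes it directly from that definition on made-up toy data; it is an illustration of the metric, not the evaluation code used in the project.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = high severity, 0 = not high severity.
labels = [1, 1, 0, 0]
perfect = [0.9, 0.8, 0.2, 0.1]   # every positive outranks every negative
constant = [0.5, 0.5, 0.5, 0.5]  # no information: every pair is a tie

print(roc_auc(labels, perfect))   # 1.0
print(roc_auc(labels, constant))  # 0.5
```

A value of 0.61 therefore means the model ranks a high-severity complaint above a lower-severity one only about 61% of the time. The pairwise loop is quadratic, so for a dataset of this size a sort-based implementation or a library routine would be the practical choice.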

Financial institutions need more than publicly available CFPB variables to predict high-severity complaints effectively. However, one actionable insight emerged: complaints with consumer narratives deserve priority review. Although only 37.7% of complaints included narratives, all major banks in this study showed narrative rates above 50%. This suggests that when customers invest time writing detailed narratives, severity is likely higher.

Top Logistic Regression Features	Coefficient
Product_Credit reporting	1.35
Issue_Problem with a lender or other company charging your account	1.28
Issue_Problem with a purchase shown on your statement	1.16
Issue_Incorrect exchange rate	-1.11
Issue_Problem with a purchase or transfer	1.02
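Logistic regression coefficients are on the log-odds scale, so exponentiating them gives a more intuitive odds ratio. The sketch below converts the coefficients in the table above. It assumes the features are unscaled 0/1 indicator columns; the exact interpretation also depends on the encoding baseline and regularization settings, which are not covered in this post.

```python
import math

# Coefficients copied from the table above.
coefficients = {
    "Product_Credit reporting": 1.35,
    "Issue_Problem with a lender or other company charging your account": 1.28,
    "Issue_Problem with a purchase shown on your statement": 1.16,
    "Issue_Incorrect exchange rate": -1.11,
    "Issue_Problem with a purchase or transfer": 1.02,
}

# exp(coefficient) is the multiplicative change in the odds of 'high severity'
# when the indicator is 1 rather than 0, holding the other features fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ratio:5.2f}x  {name}")
```

Under those assumptions, a credit reporting complaint has roughly 3.9 times the odds of being scored high severity, while an incorrect exchange rate issue cuts the odds to about a third.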

Credit Reporting Correlation

The CFPB data visualization (below) shows a significant increase in consumer complaints related to the credit reporting product from 2023 to 2025 (CFPB, 2025). While the logistic regression model lacked predictive power, this product type did become the overwhelming source of consumer complaints in the current year.

In addition to incorporating internal data for more predictive corporate level modeling, future research should extend to a five-year window and focus on trends within the ‘Credit reporting’ product.

Conclusions

While the severity scoring proved effective for quickly identifying consumer complaints for high priority triage, predicting severity from product and issue categories alone proved insufficient, suggesting that financial institutions need more than publicly available CFPB variables to predict high-severity complaints effectively.

The sentiment analysis revealed a subset of consumer complaint narratives labeled as ‘joy’:

“[…] I am pleased to acknowledge that the late payment entries have been corrected, and I want to thank you for your diligence in resolving this matter. Your swift action in rectifying the reporting errors is greatly appreciated […].” Consumer ‘joy’ response, CFPB 2023

These narratives highlighted gratitude for the role the CFPB played in resolving consumers’ financial disputes. This finding, combined with Wang et al.’s (2024) research showing that public complaint data improves corporate social responsibility, demonstrates that regulatory transparency produces tangible benefits. When consumer complaint data is publicly available, institutions are held to higher standards and consumers benefit from increased accountability.

References

Athira, Adith, M., & Gupta, D. (2025). Effective complaint detection in financial services through complaint, severity, emotion and sentiment analysis. Procedia Computer Science, 258, 2220-2231.

Consumer Financial Protection Bureau. (2023). Consumer complaint database [Data set]. U.S. Government. https://www.consumerfinance.gov/data-research/consumer-complaints/

Consumer Financial Protection Bureau. (2025). Consumer complaint database visualization tool [Interactive database]. U.S. Government. Retrieved October 5, 2025 from https://www.consumerfinance.gov/data-research/consumer-complaints/

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach [arXiv preprint arXiv:1907.11692]. arXiv. https://arxiv.org/abs/1907.11692

Maxham, J. G., III, & Netemeyer, R. G. (2002). A longitudinal study of complaining customers’ evaluations of multiple service failures and recovery efforts. Journal of Marketing, 66(4), 57-71.

Noel, M. D., & Osman, S. M. I. (2024). Bank scandal contagion: Evidence from the Wells Fargo cross-selling scandal. Global Finance Journal, 63, 101044.

Pitta de Jesus, D., & Besarria, C. N. (2023). Machine learning and sentiment analysis: Projecting bank insolvency risk. Research in Economics, 77, 226-238.

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv. https://arxiv.org/abs/1910.01108

Wang, Y., Tsang, A., Xiang, Y., & Yan, S. (2024). How can regulators affect corporate social responsibility? Evidence from regulatory disclosures of consumer complaints in the U.S. The British Accounting Review, 56, 101280.
