Earnings Calls Deception Analysis
Student Capstone Project
A team of MDS Vancouver students partnered with a global, privately owned investment firm to determine whether or not deception in earnings calls could be detected using data science.
An earnings call is a teleconference or webcast in which a public company discusses the financial results of a reporting period. Being able to detect deception would give the firm’s analysts additional insight into companies and executives, reduce manual time and effort, and help shortlist calls for further inspection.
The MDS Vancouver students looked at calls at two levels: (1) the call level, where they asked whether the responses from all executives on a call, taken together, could be considered deceptive, and (2) the response level, where they asked whether an individual response to a question was deceptive.
The team worked with approximately 20,000 transcripts from about 500 different publicly traded companies. The transcripts obtained spanned a 15-year period (2006 to 2020) and were split into “speaker sections,” such as the Q&A portion, with each row of data representing an individual speaker.
Research suggested that the Q&A section of a transcript would be the best place to search for deception. Because this section is not rehearsed, the team expected to pick up natural linguistic cues there rather than the polished language of prepared statements.
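As a rough illustration of narrowing the data to unrehearsed answers, the sketch below keeps only executive responses from the Q&A portion of a call. The row layout and the field names (call_id, section, role, text) are assumptions made for this example; the source does not describe the actual schema.

```python
# Hypothetical rows, one per speaker turn. The field names are illustrative
# assumptions, not the project's real schema.
rows = [
    {"call_id": "c1", "section": "Prepared Remarks", "role": "executive",
     "text": "Welcome everyone."},
    {"call_id": "c1", "section": "Q&A", "role": "analyst",
     "text": "How did margins trend?"},
    {"call_id": "c1", "section": "Q&A", "role": "executive",
     "text": "Margins were roughly flat."},
]


def qa_executive_responses(rows):
    """Return only the rows where an executive speaks during the Q&A section."""
    return [r for r in rows if r["section"] == "Q&A" and r["role"] == "executive"]


responses = qa_executive_responses(rows)
for r in responses:
    print(r["call_id"], r["text"])
```

Filtering first keeps later feature calculations focused on the spontaneous answers rather than on scripted remarks.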
To find deceptive calls or statements in the data set, the MDS Vancouver students separated their capstone project into four sub-objectives. The first was to establish a meaningful proxy for deception in lieu of labelled data, and the second was to engineer linguistic features, suggested by the literature, that could signal deception. The team then had to find a way to evaluate the data and, finally, they had to identify anomalies in calls through unsupervised techniques, to see if these could add any insights to their supervised approach.
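To make the second and fourth sub-objectives more concrete, here is a minimal sketch of what a linguistic feature and a simple unsupervised anomaly check might look like. The word lists, the feature names, and the z-score rule are all illustrative assumptions; the source does not state which features or anomaly detection methods the team used.

```python
import re
from statistics import mean, pstdev

# Illustrative word lists only; the team's real feature set is not given.
HEDGES = {"maybe", "perhaps", "possibly", "roughly", "probably", "somewhat"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def linguistic_features(text):
    """Compute a few simple per-response rates from lowercased word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"word_count": 0, "hedge_rate": 0.0, "first_person_rate": 0.0}
    return {
        "word_count": n,
        "hedge_rate": sum(t in HEDGES for t in tokens) / n,
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
    }


def flag_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


features = linguistic_features("I think maybe revenue grew")
print(features)

# Hypothetical hedge rates for ten responses; the last one is unusually high.
hedge_rates = [0.1] * 9 + [0.9]
print(flag_anomalies(hedge_rates))
```

A flag from a rule like this marks a response as unusual, not as deceptive; at most it would be a pointer for an analyst to take a closer look.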
In the end, the MDS Vancouver team provided the investment firm with a reproducible report that detailed their methods and findings. The students also provided a Python package with a robust feature engineering pipeline and detailed documentation that will allow the firm’s data science team to quickly understand, modify, and build similar NLP features going forward.