Overview

As a Data Science Fellow at Insight, New York, I consulted for a tech company that receives more than 12,000 job applications per year across several different domains (e.g., Data Science, Data Engineering). Like many companies, my client faces a large volume of incoming applications, which means significant time spent processing them manually and a slow response to applicants of up to several weeks. I used natural language processing (NLP) and targeted feature engineering to develop a machine learning model that scores and ranks applications, helping my client fast-track top applicants to the interview stage. My product saves my client on average 12,000 hours per year of manual processing time and shortens their response time from weeks to days.


Data Cleaning and Feature Engineering

I obtained 3,600 job applications that were labeled as either No (the application was rejected during the first round of review) or Yes+ (the application was passed on to the next stage). What made this task challenging was that the applications consisted almost entirely of unstructured text, namely the applicants’ answers to a variety of application questions in which they briefly described their education and professional background, domain knowledge, industry-specific skills, and motivations.

To extract meaningful information from the unstructured (text-based) applications, I engineered features based on expert domain knowledge (i.e., from in-depth discussions with people who have first-hand experience in the hiring process). I considered various aspects of successful applications, such as the presence of specific keywords related to an applicant’s relevant skills (e.g., machine learning, statistics), knowledge of relevant tools (e.g., programming languages), education level and study area, as well as the length of responses to the application questions.
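As an illustration, keyword-based features like these can be computed with a few lines of Python. The keyword lists and feature names below are hypothetical stand-ins for the expert-derived lists, which are not public:

```python
import re

# Hypothetical keyword lists; the real lists came from domain experts.
SKILL_KEYWORDS = {"machine learning", "statistics"}
TOOL_KEYWORDS = {"python", "sql", "r"}

def extract_features(answer_text):
    """Turn one free-text application answer into simple numeric features."""
    text = answer_text.lower()
    return {
        # count skill phrases appearing anywhere in the answer
        "n_skill_keywords": sum(kw in text for kw in SKILL_KEYWORDS),
        # count tool names, matched as whole words (so "r" != "for")
        "n_tool_keywords": sum(
            bool(re.search(rf"\b{re.escape(kw)}\b", text))
            for kw in TOOL_KEYWORDS
        ),
        # length of the response in words
        "response_length": len(text.split()),
    }
```

Features of this shape (keyword counts plus response length) can then be assembled into a numeric feature matrix for modeling.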

Data processing pipeline. Unstructured text from raw applications was converted into meaningful features using natural language processing (NLP) and targeted keyword extraction.

Logistic Regression Models to Score Applications

I employed logistic regression models to predict the suitability of job applications for five different tech domains of my client company: Data Science, Health Data Science, Data Engineering, Artificial Intelligence, and DevOps. Logistic regressions are ideal for this task because they are simple, easy to interpret, and, as parametric models, well suited to making predictions on new data. In addition, logistic regressions model the probability of class membership (here, No vs. Yes+) rather than strictly classifying data into binary classes. These classification probabilities can be interpreted directly as application scores and used to rank applicants.
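A minimal sketch of this scoring approach using scikit-learn, with randomly generated stand-in data in place of the real (confidential) application features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for the engineered features: 200 applications x 4 features
X = rng.normal(size=(200, 4))
# Synthetic labels: 1 = Yes+, 0 = No, driven by the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# predict_proba returns P(No) and P(Yes+); the Yes+ column is the score
scores = model.predict_proba(X)[:, 1]

# Rank applicants from highest to lowest score
ranking = np.argsort(scores)[::-1]
```

Using `predict_proba` rather than `predict` is what turns the classifier into a ranking tool: the continuous Yes+ probability serves as the applicant score.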

For each domain, I started with a full model (all potentially relevant features) and reduced the feature space using univariate feature selection.
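Univariate feature selection can be sketched with scikit-learn's `SelectKBest`, which scores each feature independently against the labels. The random data and the choice of an ANOVA F-test as the scoring function are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)

# Full model: 300 toy applications x 66 candidate features
X = rng.normal(size=(300, 66))
# Synthetic labels driven by the first two features only
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Score each feature independently (ANOVA F-test) and keep the top 18
selector = SelectKBest(score_func=f_classif, k=18).fit(X, y)
X_reduced = selector.transform(X)

# Indices of the retained features
kept = selector.get_support(indices=True)
```

The 66-to-18 reduction mirrors the numbers reported for the Data Science domain below.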

Modeling and feature selection for job applications to the Data Science domain of my client company. After splitting the data into train (70%) and test (30%) sets, I reduced the model from 66 potentially relevant features to 18 highly predictive features using univariate feature selection. The remaining features fell into four broad categories: education, length of response to the application questions, domain-relevant tools, and skills.

Validation

To assess model performance, I calculated the area under the Receiver Operating Characteristic (ROC) curve, or AUC. Unlike other commonly used validation metrics such as accuracy or F1 score, AUC is threshold independent (i.e., it is suitable for continuous classification probabilities rather than binary classifications), while also being appropriate for balanced classes, as is the case here.
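AUC can be computed directly from continuous scores and binary labels; the toy labels and scores below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy ground-truth labels (1 = Yes+, 0 = No) and model scores
y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])

# AUC = probability that a random Yes+ outscores a random No
auc = roc_auc_score(y_true, scores)
```

Here 8 of the 9 possible (Yes+, No) pairs are ranked correctly, giving an AUC of 8/9; no threshold ever has to be chosen to compute it.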

To validate my models, I compared model predictions for the test data sets (i.e., the 30% of applications that were not used to train the models) to the actual labels in the test data (i.e., No and Yes+). Visualizing model performance this way allowed me to quickly assess how useful the models’ application scores are for classifying applications into No and Yes+ categories across a continuous range of classification thresholds.

Predicted classification probabilities (i.e., Scores) of applications in the test data set vs. the known categories (No and Yes+) of applications for the Data Science domain. The higher the “Applicant Score”, the greater the proportion of Yes+ labels, indicating good model performance.

To my client, I delivered a module consisting of the application processing pipeline and the domain models, as well as a user-friendly interactive platform that allows hiring managers at various departments (i.e., locations) to tailor ApplicantScore to their requirements and preferences. In the example below, a user at the BOS location (drop-down menu) specifies a classification threshold of 0.7 (slider). At this threshold, roughly 30% of applications will be fast-tracked, of which almost 95% would have been forwarded if the applications had been processed manually.
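The threshold logic behind the slider can be sketched as a simple filter over applicant scores; the scores and the `fast_track` helper below are hypothetical, not the delivered implementation:

```python
import numpy as np

def fast_track(scores, threshold=0.7):
    """Return indices of applications whose score clears the threshold."""
    scores = np.asarray(scores)
    return np.flatnonzero(scores >= threshold)

# Toy applicant scores for one location
scores = [0.95, 0.42, 0.71, 0.15, 0.88]
picked = fast_track(scores, threshold=0.7)

# Fraction of applications fast-tracked at this threshold
fraction = len(picked) / len(scores)
```

Raising the slider's threshold shrinks the fast-tracked fraction while increasing its precision, which is the trade-off the hiring managers tune per location.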


An Interactive Tool to Help my Client Fast Track High Quality Applications


More details on this project and code (public version) can be found here.