Regression Analysis for EdTech Firm

Client Profile

The client is an edtech startup that provides standards-aligned curriculum through an article-based platform, with both free and paid versions. Their data strategy focuses on increasing student and teacher usage through targeted product improvements. For this project, the goal was to identify what drives a key metric called the AB Score, which measures the likelihood of license renewal. Understanding which engagement metrics most strongly influence the AB Score helps the client improve retention and long-term product use.

Solution

We used Elastic Net regression with AB Score as the outcome and 39 client-provided engagement metrics as inputs. Regression was chosen because AB Score is a continuous measure and because the client wanted interpretable, diagnostic insights rather than pure prediction. Elastic Net was especially useful because it narrowed a large set of potential drivers into a smaller, more actionable group while also producing coefficients that show the relative importance of each engagement metric.
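As a rough illustration of this kind of workflow, the Python sketch below fits a cross-validated Elastic Net with scikit-learn and ranks the coefficients that survive. Everything in it is an assumption made for illustration: the project's actual library, scaling, and tuning procedure are not described here, and the data, the `metric_NN` names, and the coefficient values are synthetic stand-ins for the client's 39 metrics.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 500 schools, 39 engagement metrics.
rng = np.random.default_rng(42)
n_schools, n_metrics = 500, 39
metric_names = [f"metric_{i:02d}" for i in range(n_metrics)]
X = rng.normal(size=(n_schools, n_metrics))

# Assume only the first five metrics truly drive the outcome.
true_coef = np.zeros(n_metrics)
true_coef[:5] = [4.0, -3.0, 2.5, 2.0, -1.5]
ab_score = X @ true_coef + rng.normal(scale=3.0, size=n_schools)

# Standardize so coefficient magnitudes are comparable across metrics,
# then let cross-validation choose the penalty strength and L1/L2 mix.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.5, 0.9, 1.0], cv=5, random_state=0),
)
model.fit(X, ab_score)

# Rank metrics by absolute coefficient; zero coefficients were dropped.
coefs = model.named_steps["elasticnetcv"].coef_
ranked = sorted(zip(metric_names, coefs), key=lambda p: abs(p[1]), reverse=True)
selected = [(name, c) for name, c in ranked if c != 0.0]

print(f"{len(selected)} of {n_metrics} metrics kept")
for name, c in selected[:5]:
    print(f"{name}: {c:+.3f}")
```

Standardizing the inputs first is a design choice in this sketch: without a common scale, a coefficient's size would reflect the metric's units as much as its importance.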

What Is Elastic Net Regression?

Elastic Net Regression is a type of shrinkage (regularized) regression that estimates coefficients while automatically shrinking or removing less important variables. Unlike traditional regression, which requires separate steps for feature selection and model fitting, Elastic Net does both at the same time.

There are two main types of shrinkage regression: LASSO and Ridge. LASSO uses a penalty that can shrink some coefficients all the way to zero, effectively removing those variables from the model. Ridge uses a different penalty that shrinks coefficients but keeps all variables in the model.
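The difference between the two penalties can be seen in a small Python sketch using scikit-learn. The data is synthetic and the penalty strength (`alpha=0.5`) is an arbitrary value chosen for illustration, not a setting from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (hypothetical): 10 features, only the first 3 matter.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

# Fit both models with the same (arbitrary) penalty strength.
lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero in each model.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))

print("LASSO coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Exact zeros, LASSO:", n_zero_lasso, "| Ridge:", n_zero_ridge)
```

In runs like this, LASSO sets some coefficients exactly to zero, while Ridge leaves every coefficient nonzero, only smaller.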

These penalties are known as L1 (LASSO) and L2 (Ridge) regularization. Elastic Net combines both, so it can balance LASSO's variable selection against Ridge's more stable coefficient estimates.
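One common way to write the combined penalty is shown below. The notation is an assumption for illustration, since no formula is given in this write-up: $\beta$ is the coefficient vector, $n$ the number of observations, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mix between the two penalties. The intercept is omitted for brevity.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2
  \;+\; \lambda\left[\,\alpha\,\lVert\beta\rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\,\right]
```

Setting $\alpha = 1$ leaves only the L1 term (LASSO), and setting $\alpha = 0$ leaves only the L2 term (Ridge); values in between blend the two.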


Points Of Note Regarding Solution

  1. Avoidance of Multicollinearity

    Multicollinearity describes an interpretational problem that arises when multiple features (regressor variables) are highly correlated. When this happens, the algorithm has trouble determining how much variability can be attributed to each feature in a set of highly correlated features. Elastic Net mitigates this threat automatically: its L1 penalty can drop redundant variables from a correlated set, while its L2 penalty shrinks and stabilizes the coefficients of those that remain. This further supports our opinion that Elastic Net is the proper algorithm to use for this project.

  2. Avoidance of Overfitting

    Overfitting is a major risk in regression analysis, especially when many features are involved. Including too many variables in a standard OLS regression can cause the model to fit the training data too closely and perform poorly on new data. Elastic Net helps address this issue by removing irrelevant variables, reducing model complexity and the risk of overfitting.

    We also used cross-validation to confirm the model was generalizing well. Overfitting would show up as much stronger performance on the training data than on the test data. Instead, the model produced similar R² values for both (0.352 for training and 0.339 for testing), indicating that it generalizes well to unseen data.
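Both points can be illustrated on synthetic data. The Python sketch below is a hypothetical setup, not the project's data or settings: it builds two nearly identical features, compares how ordinary least squares and Elastic Net assign coefficients to them, and then compares training, test, and cross-validated R² for the Elastic Net fit.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (hypothetical): x1 is a near-duplicate of x0,
# plus 8 features unrelated to the outcome.
rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)
unrelated = rng.normal(size=(n, 8))
X = np.column_stack([x0, x1, unrelated])
y = 2.0 * x0 + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Penalty settings are arbitrary illustration values.
ols = LinearRegression().fit(X_train, y_train)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_train, y_train)

# 1. Multicollinearity: OLS can split the shared effect erratically
#    between x0 and x1; the penalized coefficients stay bounded.
print("OLS coefficients on correlated pair:        ", ols.coef_[:2])
print("Elastic Net coefficients on correlated pair:", enet.coef_[:2])

# 2. Overfitting check: a large gap between training and test R^2
#    would be a warning sign; cross-validation gives a second view.
r2_train = enet.score(X_train, y_train)
r2_test = enet.score(X_test, y_test)
cv_r2 = cross_val_score(
    ElasticNet(alpha=0.5, l1_ratio=0.5), X_train, y_train, cv=5, scoring="r2"
)
print(f"R^2 train: {r2_train:.3f} | test: {r2_test:.3f}")
print(f"5-fold CV R^2 mean: {cv_r2.mean():.3f}")
```

The same comparison of training and test R² is what the figures quoted above (0.352 and 0.339) summarize for the real model.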

Conclusion

The model found that several factors were strongly linked to AB Score: the number of premium actions taken (especially purchasing the ELA version), whether a school had renewed in the past, and the school’s internal tier. Focusing on these areas is likely to increase AB Score. The analysis also suggested that the salesperson involved may influence AB Score, though we left it to the client to decide how to act on that insight.

The model’s R² was relatively low, with a testing value of 0.339, meaning the included variables explained about 33.9% of the variation in AB Score. This does not indicate a problem with the model; rather, it shows that many other factors beyond the engagement metrics analyzed also influence renewal likelihood—an insight that was valuable on its own.
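For reference, the arithmetic behind that interpretation uses the standard definition of R², where $SS_{\text{res}}$ is the sum of squared prediction errors and $SS_{\text{tot}}$ is the total variation of AB Score around its mean (the symbols are introduced here for illustration):

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
\qquad\Longrightarrow\qquad
\frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - 0.339 = 0.661
```

In other words, roughly 66.1% of the variation in AB Score on the test data is left unexplained by the engagement metrics in the model.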

Bespoke Solutions For Your Organization

Boxplot Analytics is passionate about working with all clients, regardless of their prior level of experience with data. If your organization is looking for a solution similar to the one described in this article, or any other data-oriented capability, please contact us.