You won’t be surprised if I tell you machine learning, data mining and data science are hot topics today. But, you might be wondering what makes you an expert in these fields? You definitely need to know basic data wrangling, munging and manipulation. Also, you should be familiar with tools like Python, R, MATLAB, and a database querying language like SQL. However, to be a good expert you are required to know more.

The field of statistics doesn’t really get elevated to the stature of machine learning or data mining until it is coupled with mathematical optimization and advanced computational technology.

Let me back my statement with a simple example. I start with the most fundamental and basic machine learning model, **linear regression**. In the linear regression, given a set of observations, we want to find the relationship between a dependent variable (*y*) and multiple predictors (*x*).

In the simple model, the relationship is typically expressed like *y*~*a*+*bx*. (Note that it is important to understand why *y*~*a*+*bx* and *y*~*a*+*b* log(*x*) are both linear regression models!) To do so, we can use two approaches, which lead to the same result.

**Econometric approach** or least sum of square errors: Given a set of observations, find parameters “a” and “b” in a way to have minimum total error. It is being modeled as:

Well, this is a convex unconstrained quadratic optimization model.

**Statistical approach** or maximum likelihood: Given a set of observations, find estimates of parameters “*a*” and “*b*” that are statistically most consistent with the sample data. Here, we assume that errors are identically independent normal random variables. Leaving out the statistical details, we need to solve the following problem:

Well again, this is a convex unconstrained optimization model. (Note that the objective function is a concave maximization objective, which is considered a convex optimization model.) Basically, with the help of statistics, we model the problem of concern mathematically, and the mathematical model is an optimization problem, which must be solved numerically or analytically to effectively complete a machine learning process. Other machine learning models are also well modeled as optimization problems. For example, **support vector machine** models are linearly constrained quadratic programs.

Can you name any others?