All Posts – Boostedml

Using ARIMA Models for Forecasting

In this post we describe the fundamentals of using ARIMA models for forecasting. We first describe the high-level idea ...

Fundamentals of Exponential Families

In this post we describe exponential families. We first motivate them via the problem of density estimation and wanting to ...

Attention Mechanisms: A Geometric View

In this post we describe attention mechanisms, motivating them with geometric intuition. The ideas are largely based on ...

The Nyström Method for Finding Eigenpairs of a Kernel Function

In this post we describe the Nyström method for finding the eigenvalues and eigenfunctions of a kernel function. This has ...

Gradient Descent and Momentum: The Heavy Ball Method

In this post we describe the use of momentum to speed up gradient descent. We first describe the intuition for ...
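
The heavy ball idea fits in a few lines of Python. This is a minimal sketch, not the post's own code: the test function $f(u, v) = \frac{1}{2}(\mu u^2 + L v^2)$ with $\mu = 1$, $L = 10$, the helper names, and the step count are hypothetical choices. It uses Polyak's update $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1})$ with the standard tuned values for a quadratic with curvatures in $[\mu, L]$, namely $\alpha = 4/(\sqrt{L}+\sqrt{\mu})^2$ and $\beta = \big((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu})\big)^2$, and compares it with plain gradient descent using its standard tuned step $2/(L+\mu)$.

```python
import math

# Hypothetical test problem: f(u, v) = 0.5 * (MU * u**2 + L * v**2),
# a convex quadratic whose curvatures are MU = 1 and L = 10.
MU, L = 1.0, 10.0


def grad(p):
    """Gradient of the quadratic test function at p = [u, v]."""
    return [MU * p[0], L * p[1]]


def gradient_descent(p0, alpha, steps):
    """Plain gradient descent: p_{k+1} = p_k - alpha * grad(p_k)."""
    p = list(p0)
    for _ in range(steps):
        g = grad(p)
        p = [pi - alpha * gi for pi, gi in zip(p, g)]
    return p


def heavy_ball(p0, alpha, beta, steps):
    """Heavy ball: p_{k+1} = p_k - alpha * grad(p_k) + beta * (p_k - p_{k-1})."""
    p_prev, p = list(p0), list(p0)  # no momentum term on the first step
    for _ in range(steps):
        g = grad(p)
        p_next = [pi - alpha * gi + beta * (pi - qi)
                  for pi, gi, qi in zip(p, g, p_prev)]
        p_prev, p = p, p_next
    return p


# Standard tuned parameters for quadratics with curvatures in [MU, L]
alpha_gd = 2.0 / (L + MU)
alpha_hb = 4.0 / (math.sqrt(L) + math.sqrt(MU)) ** 2
beta_hb = ((math.sqrt(L) - math.sqrt(MU)) / (math.sqrt(L) + math.sqrt(MU))) ** 2

gd_end = gradient_descent([1.0, 1.0], alpha_gd, steps=50)
hb_end = heavy_ball([1.0, 1.0], alpha_hb, beta_hb, steps=50)
print("distance to optimum, GD:", math.hypot(*gd_end))
print("distance to optimum, heavy ball:", math.hypot(*hb_end))
```

After the same 50 steps the momentum iterate is far closer to the minimizer at the origin, because its per-step contraction on this problem is roughly $(\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}) \approx 0.52$ versus $(L-\mu)/(L+\mu) \approx 0.82$ for plain gradient descent.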

Dealing with Bugs/Deprecated Code in Popular GitHub Repos: Look at the Forks

Popular papers often have code on GitHub, but the authors are super busy writing new papers, so you may notice ...

Projected Gradient Descent for Constrained Optimization

In this post we describe how to do gradient descent with constraints. We first describe the problem, including why we ...
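
A minimal sketch of projected gradient descent, assuming a box constraint so that the Euclidean projection is just coordinate-wise clipping: take an ordinary gradient step, then map the result back to the closest feasible point. The objective, the vector `c`, and the function names are hypothetical illustrations.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n: clip each coordinate."""
    return [min(max(xi, lo), hi) for xi in x]


def projected_gradient_descent(grad, project, x0, alpha=0.1, steps=500):
    """Gradient step followed by projection back onto the feasible set."""
    x = project(x0)
    for _ in range(steps):
        g = grad(x)
        x = project([xi - alpha * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical problem: minimize ||x - c||^2 subject to 0 <= x_i <= 1
c = [1.5, -0.3, 0.4]


def grad_f(x):
    """Gradient of ||x - c||^2."""
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]


x_star = projected_gradient_descent(grad_f, project_box, [0.5, 0.5, 0.5])
print(x_star)
```

Because this objective is separable, the constrained minimizer is simply `c` clipped to the box, i.e. approximately `[1.0, 0.0, 0.4]`, which gives an easy check. For other constraint sets only the projection function changes.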

Gradient Descent for Convex Optimization: The Basic Idea

In this post we describe the high-level idea behind gradient descent for convex optimization. Much of the intuition comes from ...

Decision Trees, Entropy, and Information Gain

An important class of machine learning models is decision trees: you can use them for both classification and regression. In ...
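
To make entropy and information gain concrete, here is a small Python sketch. The labels and the split are toy examples and the helper names are hypothetical. Entropy is measured in bits, and the information gain of a split is the parent node's entropy minus the size-weighted entropy of its children.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with 5 "yes" and 5 "no" labels, split into two purer children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
gain = information_gain(parent, [left, right])
print(entropy(parent), gain)
```

Here the split lowers the entropy from 1 bit to about 0.72 bits, a gain of about 0.28; a tree learner using this criterion would compare the gains of candidate splits and keep the largest.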

Visualizing Time Series in R

In this post we describe several methods for visualizing time series data. Time series visualization has several uses. First, it ...

Visualizing Missing Data in R: The Basics with VIM

In this post we describe basic visualization of missing data patterns in R with VIM. We describe how to see ...

Stationarity and Non-stationary Time Series with Applications in R

In this post we describe stationary and non-stationary time series. We first ask why we want stationarity, then describe stationarity ...

An Introduction to Time Series Smoothing in R

In this post we describe the basics of time series smoothing in R. We first describe why to do smoothing, ...

Solving Full Rank Linear Least Squares Without Matrix Inversion in Python and Numpy

In this post we describe how to solve the full rank least squares problem without inverting a matrix, as inverting ...
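
One standard way to avoid the inverse, sketched here with NumPy (the post may use a different factorization), is a reduced QR factorization $X = QR$. Since $Q^\top Q = I$, the normal equations $X^\top X\beta = X^\top y$ become $R^\top R\beta = R^\top Q^\top y$, and because $R$ is invertible when $X$ has full column rank, this reduces to the triangular system $R\beta = Q^\top y$. The simulated data and the function name `least_squares_qr` are assumptions for illustration.

```python
import numpy as np


def least_squares_qr(X, y):
    """Solve min ||X b - y||^2 for full-column-rank X without forming an inverse."""
    Q, R = np.linalg.qr(X)  # reduced QR: Q is n x p, R is p x p upper triangular
    return np.linalg.solve(R, Q.T @ y)  # solve R b = Q^T y


# Simulated data (hypothetical): intercept column plus two random features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_hat = least_squares_qr(X, y)
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference answer
print(beta_hat, beta_ref)
```

Since `R` is upper triangular, `scipy.linalg.solve_triangular` could replace `np.linalg.solve` to exploit that structure; `np.linalg.lstsq` is used here only as a cross-check.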

1-d Convolutional Neural Networks for Time Series: Basic Intuition

In this post we describe the basics of 1-d convolutional neural networks, which can be used in time series forecasting ...

Feedforward Neural Networks and Multilayer Perceptrons

In this post we describe multilayer perceptrons. We first describe why we want to use neural networks, and what feedforward ...

The Basics of Missing Data: Assumptions and Settings

In this post we describe the basics of missing data. We first ask whether we should consider the data to ...

The Autoregressive (AR) Model

In this post we describe the autoregressive (AR) time series model. We define it, describe steps to take before fitting ...
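
As a minimal illustration of the AR model, restricted here to the zero-mean AR(1) case $x_t = \phi x_{t-1} + \epsilon_t$, the sketch below simulates a series and estimates $\phi$ by regressing $x_t$ on $x_{t-1}$ with least squares. The simulation settings and function names are hypothetical; in practice one would usually rely on a library fitting routine.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2), from x_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Least-squares estimate of phi in a zero-mean AR(1): regress x_t on x_{t-1}."""
    num = sum(cur * prev for cur, prev in zip(x[1:], x[:-1]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = fit_ar1(series)
next_value = phi_hat * series[-1]  # one-step-ahead point forecast
print(phi_hat, next_value)
```

With 5000 observations the estimate should land close to the true value 0.7, since its standard error is roughly $\sqrt{(1-\phi^2)/n} \approx 0.01$.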

Testing Predictive Value in Time Series: Granger Causality in R

In this post, we describe Granger causality, which helps us answer the question of whether one time series is useful ...

The Basics of Long Short-Term Memory (LSTM)

In this post we describe the basics of long short-term memory (LSTM). We first describe some alternative classical approaches and ...

Linear Regression: Comparing Models Between Two Groups with linearHypothesis

In this post, we describe how to compare linear regression models between two groups. Without Regression: Testing Marginal Means Between ...

Linear Mixed Models: Making Predictions and Evaluating Accuracy

In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs) ...

Why You Should Center Your Features in Linear Regression

In this post we describe centering features in linear regression: you should do it because it changes the interpretation of ...
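
One concrete effect of centering, which may or may not be the post's full argument, is on the intercept. The sketch below uses made-up numbers (height in cm as the feature, weight in kg as the response; both hypothetical) and a hypothetical helper `ols_simple`. Centering leaves the slope unchanged, but the intercept becomes the predicted response at the average feature value instead of at a feature value of zero.

```python
def ols_simple(x, y):
    """Intercept and slope of y = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data: height (cm) and weight (kg)
x = [150.0, 160.0, 170.0, 180.0, 190.0]
y = [52.0, 58.0, 66.0, 71.0, 80.0]

b0_raw, b1_raw = ols_simple(x, y)
x_mean = sum(x) / len(x)
b0_c, b1_c = ols_simple([xi - x_mean for xi in x], y)
print(b0_raw, b1_raw, b0_c, b1_c)
```

Here the uncentered intercept is -51.9 kg, the predicted weight at a height of 0 cm, while the centered intercept is 65.4 kg, the mean response, which is the prediction for someone of average height.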

Some Healthcare Companies Doing Machine Learning

In this post we briefly describe some interesting-looking companies doing machine learning in the healthcare space. Flatiron Health: Flatiron ...

Handling Imbalanced Classification Datasets in Python: Choice of Classifier and Cost Sensitive Learning

In this post we describe the problem of class imbalance in classification datasets, how it affects classifier learning as well ...

Binary Classification in R: Logistic Regression, Probit Regression and More

In this post we describe how to do binary classification in R, with a focus on logistic regression. Some of ...

Linear Regression Summary (lm): Interpreting in R

In this post we describe how to interpret the summary of a linear regression ...

Classification Accuracy in R: Difference Between Accuracy, Precision, Recall, Sensitivity and Specificity

In this article we discuss how to evaluate classification accuracy, with a focus on binary classification and using software from ...
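
The post works in R; as a language-neutral sketch, all four quantities can be computed directly from the confusion-matrix counts. The labels below are toy data, and the hypothetical function `binary_metrics` assumes 0/1 labels with at least one actual positive, one predicted positive, and one actual negative, so that no denominator is zero.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall (= sensitivity) and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # also called sensitivity
        "specificity": tn / (tn + fp),
    }


# Toy labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, giving accuracy 0.8, precision and recall 0.75, and specificity 5/6.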

Regression with Count Data: Poisson Regression, Overdispersion, Negative Binomial Regression, and Zero Inflation in R

In this post we describe how to do regression with count data using R. In many applications we want to ...

Kaplan Meier: Non-Parametric Survival Analysis in R

In this post we describe the Kaplan Meier non-parametric estimator of the survival function. We first describe what problem it ...
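
The estimator itself is short enough to sketch directly: at each distinct event time $t_i$ with $d_i$ events among $n_i$ subjects still at risk, the survival estimate is multiplied by $1 - d_i/n_i$. This is a hypothetical Python helper on toy data (the post works in R). Subjects censored at $t_i$ are counted as still at risk at $t_i$, the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i).

    times:  follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (event time, estimated survival just after that time).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # both events and censorings leave the risk set
    return curve


# Toy data: six subjects, two of them censored (event = 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```

For the toy data the estimate drops to 5/6 at time 2, 2/3 at time 3, 4/9 at time 5, and 0 at time 8; the censoring at time 6 shrinks the risk set without moving the curve.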

Introduction to Linear Regression Analysis: How to do Linear Regression Analysis in R in Six Steps

In this article we describe how to perform linear regression. We go over some linear regression basics and answer the ...

Linear Regression: Log Transforming Response

There are several reasons to log transform the response. The obvious one is to fix linearity violations, but in many ...

Linear Regression: Log Transformation of Features

In linear regression, you fit the model $y=X\beta+\epsilon$. However, often the relationship between your $x$ and $y$ variables is not ...

Linear Regression Plots: Residuals vs Leverage

In this post we analyze the residuals vs leverage plot. This can help detect outliers in a linear regression model ...

The Scale Location Plot: Interpretation in R

In this post we describe how to analyze a scale location plot. You may also be interested in the fitted ...

The QQ Plot in Linear Regression

In this post we describe how to interpret a QQ plot, including how the comparison between empirical and theoretical quantiles ...

Linear Regression Plots: Fitted vs Residuals

In this post we describe the fitted vs residuals plot, which allows us to detect several types of violations in ...

Causality: Basics, Potential Outcomes, and Counterfactuals

Note: this is loosely based on Coursera's "A Crash Course on Causality: Inferring Causal Effects from Observational Data". In ...

Observational vs Experimental Data: Linear Regression, Exogeneity, and Endogeneity

Classical statistics was developed to study how to collect and analyze data in the setting of controlled studies. However, ...

When Should You Use Non-parametric Tests?

You should use non-parametric tests when the most naive distributional assumptions of a parametric test fail and you can’t invoke ...

Wearable Technology in Healthcare: Detection and Recognition

We want to use wearable technology in healthcare to drive interventions. For example, an app might send a notification encouraging ...

How Machine Learning Will Revolutionize Healthcare: Applications and Challenges

Machine learning and statistics in healthcare have potentially game-changing applications, but also pose new challenges for modeling and analysis ...

Solutions to Stanford’s CS 231n Assignment 1 Inline Problems: KNN

These are solutions to the intuition questions from Stanford's Convolutional Networks for Visual Recognition (Stanford CS 231n) assignment 1 inline ...

Linear Mixed Models for Longitudinal Data

This article is in part based on http://www2.stat.duke.edu/~sayan/Sta613/2017/lec/LMM.pdf. In this post we describe how linear mixed models can be used ...

The Best Machine Learning Algorithms and Models: Five to Know

When you start you should learn a few basic algorithms and understand them well. Here are five good ones. Linear ...

What is the Difference Between Machine Learning and Statistics?

Machine learning and statistics use very similar tools: probability distributions, representations of conditional probability, maximum likelihood estimation, Bayesian inference, etc. ...

When Should You Use Non-Parametric, Parametric, and Semi-Parametric Survival Analysis?

In Survival Analysis, you have three options for modeling the survival function: non-parametric (such as Kaplan-Meier), semi-parametric (Cox regression), and ...

How to Learn Machine Learning Fast: Three Benefits of Apprenticing Yourself

One way to learn anything quickly is to constantly apprentice yourself to people better than you at what you're trying ...

How to Get Better at Math for Machine Learning

Math is difficult, but is extremely important for statistics and machine learning. Sometimes when you work with great researchers, they ...

What is the Difference Between Logit and Logistic Regression?

Logit and logistic regression are the same thing. However, they actually relate to generalized linear models. In a generalized linear ...
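
The relationship fits in two functions. In the generalized linear model for a binary response, the logit (log-odds) is the link function, $\operatorname{logit}(p) = \log\frac{p}{1-p} = x^\top\beta$, and its inverse, the logistic function, maps the linear predictor back to a probability. A minimal Python sketch; the function names are illustrative.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    """Logit (log-odds) function for 0 < p < 1: the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Round trip: logit and logistic undo each other
p = 0.3
print(logit(p), logistic(logit(p)))
```

Fitting "logit regression" means modeling $\operatorname{logit}(P(Y=1 \mid x))$ as linear in $x$, which is the same as modeling $P(Y=1 \mid x) = \operatorname{logistic}(x^\top\beta)$, i.e. logistic regression.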

Using t-tests: Beware of Land Mines

You are testing a new drug treatment for HIV, and your new drug costs 10x the old one. You run ...

Cox Regression: The Basic Idea

In survival analysis, we want to model the time to a first event, often death. One way to model the ...

Six Great Learning Resources for Survival Analysis

Here are some books and courses for survival analysis that are useful for learning it. People of various skill levels ...

Modeling Disease Progression or Mood Evolution/Ecological Momentary Assessment: Discrete Latent State or Continuous-Valued ‘True’ Trajectory?

Say you want to model the evolution over time of a disease like HIV in an individual, or the evolution ...

The Probabilistic Cauchy-Schwarz and the Analysis Cauchy-Schwarz

I'd often seen two different versions of Cauchy-Schwarz (CS). In analysis and linear algebra I'd learned that if $x,y$ ...
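
For reference, the standard forms of the two inequalities are below. The probabilistic version is the analysis version applied to the inner product $\langle X, Y\rangle = \mathbb{E}[XY]$ on random variables with finite second moments; this is standard background, not a summary of the post's argument.

```latex
\begin{align}
|\langle x, y\rangle| &\le \|x\|\,\|y\|, \qquad \|x\| = \sqrt{\langle x, x\rangle} \\
|\mathbb{E}[XY]| &\le \sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]}
\end{align}
```

Applying the second line to the centered variables $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$ gives $|\operatorname{Cov}(X,Y)| \le \sigma_X \sigma_Y$, which is why correlations lie in $[-1, 1]$.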

SARA: An App for Engaging Users in Mobile Health Data Collection

I recently read the paper "SARA: A Mobile App to Engage Users in Health Data Collection." [1] The problem they ...

Disease Progression Modeling: Six Main Questions

An important area in applied and methodological statistics as well as machine learning is disease progression modeling. There are arguably ...

Where R Shines Over Python: Statistical Models

Python is the language of choice for many data scientists and researchers who analyze data, and it has far superior ...

Estimating Treatment Effect: Feature or Two Sample Test?

Say you have an experiment and you want to test whether there is a difference in the treatment response between ...

The Five Linear Regression Assumptions: Testing on the Kaggle Housing Price Dataset

In this post we check the assumptions of linear regression using Python. Linear regression models the relationship between a design ...

Hypothesis testing 2: two sample t-test

In a previous post, we introduced the basic terminology of hypothesis testing. We also wanted to test the null hypothesis ...
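
A minimal sketch of the equal-variance (pooled) two-sample t statistic, $t = (\bar{x}_1 - \bar{x}_2) / \big(s_p\sqrt{1/n_1 + 1/n_2}\big)$ with pooled variance $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $n_1 + n_2 - 2$ degrees of freedom. The samples are made-up numbers and the function name is hypothetical.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1 denominator)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


# Made-up samples from two groups
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.2, 4.8, 4.4, 5.0, 4.6]
t, df = pooled_t_statistic(a, b)
print(t, df)
```

Comparing $t$ (about 3.49 here) to a t distribution with 8 degrees of freedom gives the p-value; `scipy.stats.ttest_ind(a, b)`, whose default assumes equal variances, should report the same statistic.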

Basic hypothesis testing. Null vs alternative hypothesis. Type 1, Type 2 errors, and p-values.

Here we'll go over the fundamental concepts of hypothesis testing. Generally we want to test two hypotheses. Let's say we ...

Good fundamentals MOOCs for machine learning and statistics

In a previous post, I mentioned some books that are useful if you want to eventually be able to read ...

The Problem of Multicollinearity in Linear Regression

Here we'll talk about multicollinearity in linear regression. This occurs when there is correlation among features, and causes the learned ...
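
One common way to quantify multicollinearity, not necessarily the one the post uses, is the variance inflation factor $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on the other features; it is the factor by which the variance of $\hat\beta_j$ is inflated relative to uncorrelated features. With exactly two features $R_j^2$ is the squared correlation $r^2$, which keeps the sketch short. The feature columns and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_features(x1, x2):
    """VIF for a model with exactly two features: 1 / (1 - r^2), shared by both.

    It grows without bound as |r| approaches 1 (perfect collinearity).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical feature columns: nearly collinear vs. uncorrelated
vif_collinear = vif_two_features([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
vif_uncorrelated = vif_two_features([1, 2, 3, 4], [1, -1, -1, 1])
print(vif_collinear, vif_uncorrelated)
```

The nearly collinear pair gives a VIF above 100, far past the common rule-of-thumb thresholds of 5 or 10, while the uncorrelated pair gives exactly 1.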

What should someone starting a PhD in machine learning know?

I saw a question like this on Quora and had been meaning to start a blog, so I decided to answer it ...