All Posts

In this post we describe the fundamentals of using ARIMA models for forecasting. We first describe the high-level idea ...
Read More
In this post we describe exponential families. We first motivate them via the problem of density estimation and wanting to ...
Read More
In this post we describe attention mechanisms, motivating them with geometric intuition. The ideas are largely based on ...
Read More
In this post we describe the Nyström method for finding the eigenvalues and eigenfunctions of a kernel function. This has ...
Read More
In this post we describe the use of momentum to speed up gradient descent. We first describe the intuition for ...
Read More
Popular papers often have code on GitHub, but the authors are super busy writing new papers, so you may notice ...
Read More
In this post we describe how to do gradient descent with constraints. We first describe the problem, including why we ...
Read More
In this post we describe the high-level idea behind gradient descent for convex optimization. Much of the intuition comes from ...
Read More
An important class of machine learning models is decision trees: you can use them for both classification and regression. In ...
Read More
In this post we describe several methods for visualizing time series data. Time series visualization has several uses. First, it ...
Read More
In this post we describe basic visualization of missing data patterns in R with VIM. We describe how to see ...
Read More
In this post we describe stationary and non-stationary time series. We first ask why we want stationarity, then describe stationarity ...
Read More
In this post we describe the basics of time series smoothing in R. We first describe why to do smoothing, ...
Read More
In this post we describe how to solve the full rank least squares problem without inverting a matrix, as inverting ...
Read More
In this post we describe the basics of 1-d convolutional neural networks, which can be used in time series forecasting ...
Read More
In this post we describe multilayer perceptrons. We first describe why we want to use neural networks, and what feedforward ...
Read More
In this post we describe the basics of missing data. We first ask whether we should consider the data to ...
Read More
In this post we describe the autoregressive (AR) time series model. We define it, describe steps to take before fitting ...
Read More
In this post, we describe Granger causality, which helps us answer the question of whether one time series is useful ...
Read More
In this post we describe the basics of long short-term memory (LSTM). We first describe some alternative classical approaches and ...
Read More
In this post, we describe how to compare linear regression models between two groups. Without Regression: Testing Marginal Means Between ...
Read More
In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs) ...
Read More
In this post we describe centering features in linear regression: you should do it because it changes the interpretation of ...
Read More
In this post we briefly describe some interesting looking companies doing machine learning in the healthcare space. Flatiron Health Flatiron ...
Read More
In this post we describe the problem of class imbalance in classification datasets, how it affects classifier learning as well ...
Read More
In this post we describe how to do binary classification in R, with a focus on logistic regression. Some of ...
Read More
Introduction to Linear Regression Summary Printouts In this post we describe how to interpret the summary of a linear regression ...
Read More
In this article we discuss how to evaluate classification accuracy, with a focus on binary classification and using software from ...
Read More
In this post we describe how to do regression with count data using R. In many applications we want to ...
Read More
In this post we describe the Kaplan-Meier non-parametric estimator of the survival function. We first describe what problem it ...
Read More
In this article we describe how to perform linear regression. We go over some linear regression basics and answer the ...
Read More
There are several reasons to log transform the response. The obvious one is to fix linearity violations, but in many ...
Read More
In linear regression, you fit the model \begin{align}y=X\beta+\epsilon\end{align} However, often the relationship between your $x$ and $y$ variables is not ...
Read More
In this post we analyze the residuals vs leverage plot. This can help detect outliers in a linear regression model ...
Read More
In this post we describe how to analyze a scale location plot. You may also be interested in the fitted ...
Read More
In this post we describe how to interpret a QQ plot, including how the comparison between empirical and theoretical quantiles ...
Read More
In this post we describe the fitted vs residuals plot, which allows us to detect several types of violations in ...
Read More
Note: this is loosely based on Coursera's A Crash Course on Causality: Inferring Causal Effects from Observational Data Introduction In ...
Read More
Background Classical statistics was developed to study how to collect and analyze data in the setting of controlled studies. However, ...
Read More
You should use non-parametric tests when the most naive distributional assumptions of a parametric test fail and you can’t invoke ...
Read More
We want to use wearable technology in healthcare to drive interventions. For example, an app might send a notification encouraging ...
Read More
Machine learning and statistics in healthcare have potentially game-changing applications, but also pose new challenges for modeling and analysis ...
Read More
These are solutions to the intuition questions from Stanford's Convolutional Neural Networks for Visual Recognition (Stanford CS 231n) assignment 1 inline ...
Read More
This article is in part based on http://www2.stat.duke.edu/~sayan/Sta613/2017/lec/LMM.pdf. In this post we describe how linear mixed models can be used ...
Read More
When you start you should learn a few basic algorithms and understand them well.  Here are five good ones. Linear ...
Read More
Machine learning and statistics use very similar tools: probability distributions, representations of conditional probability, maximum likelihood estimation, Bayesian inference, etc.  ...
Read More
In survival analysis, you have three options for modeling the survival function: non-parametric (such as Kaplan-Meier), semi-parametric (Cox regression), and ...
Read More
One way to learn anything quickly is to constantly apprentice yourself to people better than you at what you're trying ...
Read More
Math is difficult, but is extremely important for statistics and machine learning.  Sometimes when you work with great researchers, they ...
Read More
Logit regression and logistic regression are the same thing, and both are special cases of generalized linear models.  In a generalized linear ...
Read More
You are testing a new drug treatment for HIV, and your new drug costs 10x the old one.  You run ...
Read More
In survival analysis, we want to model the time to a first event, often death. One way to model the ...
Read More
Here are some books and courses for survival analysis that are useful for learning it.  People of various skill levels ...
Read More
Say you want to model the evolution over time of a disease like HIV in an individual, or the evolution ...
Read More
I'd often seen two different versions of Cauchy-Schwarz (CS).  In analysis and linear algebra I'd learned that if $x,y$ ...
Read More
I recently read the paper "SARA: A Mobile App to Engage Users in Health Data Collection." [1]  The problem they ...
Read More
An important area in applied and methodological statistics as well as machine learning is disease progression modeling.  There are arguably ...
Read More
Python is the language of choice for many data scientists and researchers who analyze data, and it has far superior ...
Read More
Say you have an experiment and you want to test whether there is a difference in the treatment response between ...
Read More
In this post we check the assumptions of linear regression using Python. Linear regression models the relationship between a design ...
Read More
In a previous post, we introduced the basic terminology of hypothesis testing. We also wanted to test the null hypothesis ...
Read More
Here we'll go over the fundamental concepts of hypothesis testing. Generally we want to test two hypotheses. Let's say we ...
Read More
In a previous post, I mentioned some books that are useful if you want to eventually be able to read ...
Read More
Here we'll talk about multicollinearity in linear regression. This occurs when there is correlation among features, and causes the learned ...
Read More
I saw a question like this on Quora, and have been meaning to start a blog, so I decided to answer it ...
Read More