In this post we describe the fundamentals of using ARIMA models for forecasting. We first describe the high level idea ...

Read More

In this post we describe exponential families. We first motivate them via the problem of density estimation and wanting to ...

Read More

Introduction: In this post we describe attention mechanisms, motivating them with geometric intuition. The ideas are largely based on ...

Read More

In this post we describe the Nyström method for finding the eigenvalues and eigenfunctions of a kernel function. This has ...

Read More

In this post we describe the use of momentum to speed up gradient descent. We first describe the intuition for ...

Read More

Popular papers often have code on GitHub, but the authors are super busy writing new papers, so you may notice ...

Read More

In this post we describe how to do gradient descent with constraints. We first describe the problem, including why we ...

Read More

In this post we describe the high-level idea behind gradient descent for convex optimization. Much of the intuition comes from ...

Read More

An important class of machine learning models is decision trees: you can use them for both classification and regression. In ...

Read More

In this post we describe several methods for visualizing time series data. Time series visualization has several uses. First, it ...

Read More

In this post we describe basic visualization of missing data patterns in R with VIM. We describe how to see ...

Read More

In this post we describe stationary and non-stationary time series. We first ask why we want stationarity, then describe stationarity ...

Read More

In this post we describe the basics of time series smoothing in R. We first describe why to do smoothing, ...

Read More

In this post we describe how to solve the full rank least squares problem without inverting a matrix, as inverting ...

Read More

In this post we describe the basics of 1-d convolutional neural networks, which can be used in time series forecasting ...

Read More

In this post we describe multilayer perceptrons. We first describe why we want to use neural networks, and what feedforward ...

Read More

In this post we describe the basics of missing data. We first ask whether we should consider the data to ...

Read More

In this post we describe the autoregressive (AR) time series model. We define it, describe steps to take before fitting ...

Read More

In this post, we describe Granger causality, which helps us answer the question of whether one time series is useful ...

Read More

In this post we describe the basics of long short-term memory (LSTM). We first describe some alternative classical approaches and ...

Read More

In this post, we describe how to compare linear regression models between two groups. Without Regression: Testing Marginal Means Between ...

Read More

In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs) ...

Read More

In this post we describe centering features in linear regression: you should do it because it changes the interpretation of ...

Read More

In this post we briefly describe some interesting-looking companies doing machine learning in the healthcare space. Flatiron Health: Flatiron ...

Read More

In this post we describe the problem of class imbalance in classification datasets, how it affects classifier learning as well ...

Read More

In this post we describe how to do binary classification in R, with a focus on logistic regression. Some of ...

Read More

Introduction to Linear Regression Summary Printouts: In this post we describe how to interpret the summary of a linear regression ...

Read More

In this article we discuss how to evaluate classification accuracy, with a focus on binary classification and using software from ...

Read More

In this post we describe how to do regression with count data using R. In many applications we want to ...

Read More

In this post we describe the Kaplan-Meier non-parametric estimator of the survival function. We first describe what problem it ...

Read More

In this article we describe how to perform linear regression. We go over some linear regression basics and answer the ...

Read More

There are several reasons to log transform the response. The obvious one is to fix linearity violations, but in many ...

Read More

In linear regression, you fit the model \begin{align}y=X\beta+\epsilon\end{align} However, often the relationship between your $x$ and $y$ variables is not ...

Read More

In this post we analyze the residuals vs leverage plot. This can help detect outliers in a linear regression model ...

Read More

In this post we describe how to analyze a scale location plot. You may also be interested in the fitted ...

Read More

In this post we describe how to interpret a QQ plot, including how the comparison between empirical and theoretical quantiles ...

Read More

In this post we describe the fitted vs residuals plot, which allows us to detect several types of violations in ...

Read More

Note: this is loosely based on Coursera's A Crash Course on Causality: Inferring Causal Effects from Observational Data. Introduction: In ...

Read More

Background: Classical statistics was developed to study how to collect and analyze data in the setting of controlled studies. However, ...

Read More

You should use non-parametric tests when the most naive distributional assumptions of a parametric test fail and you can’t invoke ...

Read More

We want to use wearable technology in healthcare to drive interventions. For example, an app might send a notification encouraging ...

Read More

Machine learning and statistics in healthcare have potentially game changing applications, but also pose new challenges for modeling and analysis ...

Read More

These are solutions to the intuition questions from Stanford's Convolutional Neural Networks for Visual Recognition (Stanford CS 231n) assignment 1 inline ...

Read More

This article is in part based on http://www2.stat.duke.edu/~sayan/Sta613/2017/lec/LMM.pdf. In this post we describe how linear mixed models can be used ...

Read More

When you start you should learn a few basic algorithms and understand them well. Here are five good ones. Linear ...

Read More

Machine learning and statistics use very similar tools: probability distributions, representations of conditional probability, maximum likelihood estimation, Bayesian inference, etc. ...

Read More

In Survival Analysis, you have three options for modeling the survival function: non-parametric (such as Kaplan-Meier), semi-parametric (Cox regression), and ...

Read More

One way to learn anything quickly is to constantly apprentice yourself to people better than you at what you're trying ...

Read More

Math is difficult, but is extremely important for statistics and machine learning. Sometimes when you work with great researchers, they ...

Read More

Logit and logistic regression are the same thing. However, they actually relate to generalized linear models. In a generalized linear ...

Read More

You are testing a new drug treatment for HIV, and your new drug costs 10x the old one. You run ...

Read More

In survival analysis, we want to model the time to a first event, often death. One way to model the ...

Read More

Here are some books and courses for survival analysis that are useful for learning it. People of various skill levels ...

Read More

Say you want to model the evolution over time of a disease like HIV in an individual, or the evolution ...

Read More

I'd often seen two different versions of Cauchy-Schwarz (CS). In analysis and linear algebra I'd learned that if $x,y$ ...

Read More

I recently read the paper "SARA: A Mobile App to Engage Users in Health Data Collection." [1] The problem they ...

Read More

An important area in applied and methodological statistics as well as machine learning is disease progression modeling. There are arguably ...

Read More

Python is the language of choice for many data scientists and researchers who analyze data, and it has far superior ...

Read More

Say you have an experiment and you want to test whether there is a difference in the treatment response between ...

Read More

In this post we check the assumptions of linear regression using Python. Linear regression models the relationship between a design ...

Read More

In a previous post, we introduced the basic terminology of hypothesis testing. We also wanted to test the null hypothesis ...

Read More

Here we'll go over the fundamental concepts of hypothesis testing. Generally we want to test two hypotheses. Let's say we ...

Read More

In a previous post, I mentioned some books that are useful if you want to eventually be able to read ...

Read More

Here we'll talk about multicollinearity in linear regression. This occurs when there is correlation among features, and causes the learned ...

Read More

I saw a question like this on Quora, and had been meaning to start a blog, so I decided to answer it ...

Read More
