Below are all pages related to Statistics.
Statistical Machine Translation: R Package
I wrote an R package for conducting Statistical Machine Translation (SMT) as part of my first-year comps. Find it here. It is based largely on Koehn’s 2009 SMT book and implements the so-called “IBM” models, as well as phrase-based translation. While these methods have been largely supplanted by neural network-based methods, they are still interesting models, and the IBM models can be used to derive word alignments between a sentence and its translation.
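To show how the IBM models produce word alignments, here is a minimal Python sketch of EM training for IBM Model 1. It is not code from the R package. It assumes the corpus is a list of tokenized sentence pairs, omits the NULL token for brevity, and uses a three-sentence toy German-English corpus. The function names `train_ibm_model1` and `best_alignment` are hypothetical.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=20):
    """EM training of IBM Model 1 lexical translation probabilities.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns t with t[(e, f)] = estimated p(e | f).
    Simplification (assumption): no NULL token, so every English word
    must align to some foreign word in its own sentence.
    """
    e_vocab = {e for _, es in corpus for e in es}
    # Start from uniform translation probabilities.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: spread each English word's mass over the foreign words
        # in its sentence, in proportion to the current t(e | f).
        for fs, es in corpus:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_alignment(t, fs, es):
    """Most probable alignment under Model 1: since Model 1 ignores word
    positions, each English word independently picks its best foreign word."""
    return [(e, max(fs, key=lambda f: t[(e, f)])) for e in es]

corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
print(best_alignment(t, ["das", "Haus"], ["the", "house"]))
```

On this toy corpus, "the" should end up aligned with "das" and "house" with "Haus". No sentence pair says this directly; EM recovers it from the fact that "das" keeps co-occurring with "the" across different sentences.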
R Tutorial: Multi-State Models
I wrote this tutorial on estimating multi-state models in R as part of the class STAT 935 (Survival Analysis) at the University of Waterloo. There are other tutorials out there, but this one (in my biased opinion, and to the best of my knowledge) is the only one that goes through each type of multi-state model one by one, covering the theory, how to structure the data, and how to estimate the models using the coxph function in R.
The Big Data Paradox in COVID Surveys
Meng (2018) summarizes the Big Data Paradox as "The more the data, the surer we fool ourselves."
It may seem counterintuitive at first; even a trained statistician is likely to get caught up in the idea that "more data is better" at some point. But this idea can be quickly squashed with a simple example. If we are interested in measuring the average height of a population, surveying 1,000 men (and no women) will give us an estimate that is too high, and surveying 100,000 men will not fix it: the larger sample only makes us more confident in the same biased answer.
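A quick simulation makes the point concrete. This is a minimal Python sketch with a hypothetical population. Heights are drawn from normal distributions whose means (175 cm for men, 162 cm for women) and common standard deviation (7 cm) are illustrative assumptions, not real data. It compares a random sample of 100 people from the whole population with men-only samples of increasing size.

```python
import random
import statistics

def simulate_population(n_men=50_000, n_women=50_000, seed=1):
    """Hypothetical population of heights in cm. The means (175, 162) and the
    common SD (7) are illustrative assumptions, not real survey figures."""
    rng = random.Random(seed)
    men = [rng.gauss(175, 7) for _ in range(n_men)]
    women = [rng.gauss(162, 7) for _ in range(n_women)]
    return men, women

def survey_errors(sizes=(100, 1_000, 10_000), seed=2):
    """Compare a small simple random sample of everyone with ever-larger
    samples drawn only from men. Returns the true mean and the errors."""
    men, women = simulate_population()
    everyone = men + women
    true_mean = statistics.fmean(everyone)
    rng = random.Random(seed)
    # 100 people drawn at random from the whole population: unbiased, just noisy.
    small_srs_error = statistics.fmean(rng.sample(everyone, 100)) - true_mean
    # Men-only "surveys": the error does not shrink with n, because it is bias.
    men_only_errors = {n: statistics.fmean(rng.sample(men, n)) - true_mean for n in sizes}
    return true_mean, small_srs_error, men_only_errors

if __name__ == "__main__":
    true_mean, srs_err, biased = survey_errors()
    print(f"population mean: {true_mean:.1f} cm")
    print(f"random sample of 100: error {srs_err:+.2f} cm")
    for n, err in biased.items():
        print(f"men only, n={n}: error {err:+.2f} cm")
```

Under these assumptions the men-only error should hover around 6.5 cm (the gap between the men's mean and the population mean) whatever n is. The random sample of 100 should typically land within about 2 cm.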
ksmirnovk: Stata Program for Performing a k-sample Kolmogorov-Smirnov Test
Here is some Stata code I wrote back in 2017. A colleague asked how to perform a Kolmogorov-Smirnov test in Stata when there are more than two groups. I was surprised to find that such a test is not implemented in Stata, nor widely implemented elsewhere. At first I wondered whether this was because no such test existed.
On the contrary, a k-sample analogue to the Kolmogorov-Smirnov test was developed back in 1959 by Jack Kiefer, a mathematical statistician at Cornell.
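To show what such a test computes, here is a minimal Python sketch; it is not the ksmirnovk Stata code. It assumes the statistic is T = sup_x Σ_j n_j (F_j(x) - F(x))^2, where F_j is the empirical CDF of group j and F is the pooled empirical CDF. This is my reading of Kiefer's k-sample analogue. Instead of Kiefer's asymptotic tables, it approximates a p-value by permuting group labels, which is a swapped-in approach. The function names are hypothetical.

```python
import random
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Right-continuous empirical CDF of an already sorted sample, evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def kiefer_statistic(samples):
    """T = sup_x sum_j n_j * (F_j(x) - F(x))**2, with F_j the ECDF of sample j
    and F the ECDF of all samples pooled (assumed form, see lead-in)."""
    sorted_samples = [sorted(s) for s in samples]
    pooled = sorted(x for s in samples for x in s)
    best = 0.0
    # Every ECDF is a step function that only jumps at observed values,
    # so checking the distinct pooled values is enough to find the supremum.
    for x in sorted(set(pooled)):
        f_pooled = ecdf(pooled, x)
        stat = sum(len(s) * (ecdf(s, x) - f_pooled) ** 2 for s in sorted_samples)
        best = max(best, stat)
    return best

def permutation_pvalue(samples, n_perm=999, seed=0):
    """Monte Carlo p-value from randomly reassigning observations to groups
    (group sizes held fixed). This replaces Kiefer's asymptotic tables."""
    rng = random.Random(seed)
    observed = kiefer_statistic(samples)
    sizes = [len(s) for s in samples]
    pooled = [x for s in samples for x in s]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        groups, start = [], 0
        for n in sizes:
            groups.append(pooled[start:start + n])
            start += n
        # Small tolerance so ties with the observed value count as "as extreme".
        if kiefer_statistic(groups) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For k = 2 the statistic reduces to (n1 n2 / N) D^2, where D is the usual two-sample Kolmogorov-Smirnov statistic. This gives a handy sanity check against existing two-sample implementations.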