 Feb 2022

s3.uswest2.amazonaws.com

Note: 1. Eighty-four percent of autocracies from 1946 to 2010 had a ruling party (Cheibub, Gandhi, and Vreeland 2010), and 57 percent of these parties failed to outlive the founding leader (Meng 2019).

Note: 2. I use the terms “authoritarian regime” and “dictatorship” synonymously. I also use the terms “dictator,” “authoritarian leader,” and “president” interchangeably.

 Jan 2022


I claim that constitutional rules that designate a formal successor play a critical role in promoting peaceful leadership transitions in dictatorships.

Figure 1. Autocratic leadership transitions, 1946 to 2014.
peaceful vs. unpeaceful power transitions:
From 1946 to 2014, only 44 percent of autocratic leadership transitions were peaceful and resulted in the continuation of the regime after the departure of the incumbent.

regimes that formally designate the vice president as the successor are more likely to undergo peaceful transitions
leadership succession, authoritarian regime, constitutional rules, Africa


Local file

1.1 Bernoulli distribution
$$ Y \sim f_{B}(y ; \theta)= \begin{cases}\theta^{y}(1-\theta)^{1-y} & \forall y \in\{0,1\} \\ 0 & \text { otherwise }\end{cases} $$
$$E[Y]=\theta$$
$$\operatorname{var}(Y)=\theta(1-\theta)$$
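The Bernoulli moments above can be checked by simulation. A minimal sketch, assuming an arbitrary theta of 0.3 chosen purely for illustration:

```python
import numpy as np

# Illustrative check of the Bernoulli moments: simulate draws with a
# chosen theta (0.3 is an arbitrary assumption) and compare the sample
# mean and variance to theta and theta*(1 - theta).
theta = 0.3
rng = np.random.default_rng(0)
draws = rng.binomial(n=1, p=theta, size=200_000)  # Bernoulli(theta) draws

print(draws.mean())   # close to E[Y] = theta = 0.3
print(draws.var())    # close to var(Y) = theta*(1 - theta) = 0.21
```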

1.6 Conclusion
The key innovation in the likelihood framework is treating the observed data as fixed and asking which combination of probability model and parameter values is most likely to have generated these specific data.

maximum likelihood: general
General Steps
 Step 1: Express the joint probability of the data.
 Step 2: Convert the joint probability into a likelihood.
 Step 3: Use the chosen stochastic and systematic components to specify a probability model and functional form.
 Step 4: Simplify the expression by first taking the log and then eliminating terms that do not depend on unknown parameters.
 Step 5: Find the extrema of this expression either analytically or by writing a program that uses numerical tools to identify maxima and minima.
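The steps above can be sketched end to end for a simple case. A minimal illustration, assuming iid Bernoulli data (my choice of stochastic component) and a grid search as the numerical tool in Step 5; the analytic maximizer is the sample mean, so the two should agree:

```python
import numpy as np

# Steps 1-4 for iid Bernoulli draws: the joint probability becomes the
# log-likelihood l(theta) = sum_i [y_i*log(theta) + (1-y_i)*log(1-theta)],
# after dropping terms that do not depend on theta (none here).
y = np.array([1, 0, 0, 1, 1, 1, 0, 1])   # observed data, held fixed

def log_lik(theta, y):
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

# Step 5: numerical maximization via a simple grid search over (0, 1)
grid = np.linspace(0.001, 0.999, 9_999)
mle = grid[np.argmax([log_lik(t, y) for t in grid])]

print(round(mle, 3), y.mean())   # both 0.625, the sample mean 5/8
```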

Definition 1.1 (Sum of squared errors (SSE))
$$ \mathrm{SSE}=\sum_{i=1}^{n}\left[y_{i}-\left(\beta_{0}+\beta_{1} x_{i}\right)\right]^{2} $$
$$ \begin{aligned} &\hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1} \bar{x} \\ &\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}} \end{aligned} $$
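The closed-form SSE minimizers can be computed directly. A minimal sketch on a made-up noiseless dataset generated as y = 2 + 3x, so the formulas should recover the intercept and slope exactly:

```python
import numpy as np

# Closed-form OLS estimates from the formulas above; the data are an
# illustrative assumption (exact line y = 2 + 3x, no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

print(beta0, beta1)   # 2.0 3.0
```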

1.4 Gaussian (normal) distribution
$$Y_i$$ is distributed iid normal with mean $$\mu_i$$ and variance $$\sigma^2$$:
$$ Y \sim f_{\mathcal{N}}(y ; \boldsymbol{\theta})=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left[-\frac{(y-\mu)^{2}}{2 \sigma^{2}}\right] $$

Rather than consider the data as random and the parameters as fixed, the principle of maximum likelihood treats the observed data as fixed and asks: “What parameter values are most likely to have generated the data?”
maximum likelihood:
The MLEs are those that provide the density or mass function with the highest likelihood of generating the observed data.

1.3 Bias and mean squared error
Let $$T(X)$$ be an estimator for $$\theta$$. The bias of $$T(X)$$, denoted $$\operatorname{bias}(\theta)$$, is $$ \operatorname{bias}(\theta)=\mathrm{E}[T(X)]-\theta $$ The mean squared error, $$\operatorname{MSE}(\theta)$$, is given as $$ \begin{aligned} \operatorname{MSE}(\theta) &=\mathrm{E}\left[(T(X)-\theta)^{2}\right] \\ &=\operatorname{var}(T(X))+\operatorname{bias}(\theta)^{2} \end{aligned} $$
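The variance-plus-squared-bias decomposition can be illustrated by Monte Carlo. A sketch under illustrative assumptions: the estimator is the biased (ddof=0) sample variance of n normal draws, and the decomposition holds exactly for the empirical distribution of the replicated estimates because it is an algebraic identity:

```python
import numpy as np

# Monte Carlo illustration of MSE = var + bias^2. The estimator t is
# the biased sample variance (ddof=0), whose expectation is
# (n-1)/n * theta, so it has nonzero bias.
rng = np.random.default_rng(1)
theta = 1.0                      # true variance of the data
n, reps = 20, 50_000
draws = rng.normal(0.0, np.sqrt(theta), size=(reps, n))
t = draws.var(axis=1)            # one estimate per replication

mse = np.mean((t - theta) ** 2)
decomp = t.var() + (t.mean() - theta) ** 2
print(np.isclose(mse, decomp))   # True: the identity holds exactly
```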

1.2 Binomial distribution
$$ \begin{aligned} X & \sim f_{b}(x ; n, p) \\ \operatorname{Pr}(X=k) &=\left\{\begin{array}{ll} \dbinom{n}{k} p^{k}(1-p)^{n-k} & \forall \; k \in\{0, \ldots, n\} \\ 0 & \forall \; k \notin\{0, \ldots, n\} \end{array}\right. \end{aligned} $$
where $$\binom{n}{k}=\frac{n !}{k !(n-k) !}$$ and with $$\mathrm{E}[X]=n p$$ and $$\operatorname{var}(X)=n p(1-p)$$. The Bernoulli distribution is a binomial distribution with $$n=1$$.
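The binomial mass function and its mean can be verified directly. A minimal sketch, with n = 10 and p = 0.4 chosen arbitrarily for illustration:

```python
from math import comb

# Direct check of the binomial formulas above for illustrative values.
n, p = 10, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))

print(round(sum(pmf), 10))   # probabilities sum to 1
print(round(mean, 10))       # E[X] = n*p = 4.0
```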

The value of θ that maximizes the likelihood function is called the maximum likelihood estimate.
Definition of MLE.

4.2 Mixture distribution / mixture model
$$ f\left(x ; w_{j}, \boldsymbol{\theta}_{j}\right)=\sum_{j=1}^{J} w_{j} g_{j}\left(x ; \boldsymbol{\theta}_{j}\right) $$
$$ \mathcal{L}\left(w_{j}, \boldsymbol{\theta}_{j} \mid \mathbf{x}\right)=\prod_{i=1}^{n}\left[\sum_{j=1}^{J} w_{j} g_{j}\left(x_{i} ; \boldsymbol{\theta}_{j}\right)\right] $$
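The mixture likelihood above can be evaluated numerically. A minimal sketch assuming a two-component normal mixture; the components, weights, and data points are all illustrative choices, not from the source:

```python
import numpy as np

# Evaluate the log of the mixture likelihood prod_i [sum_j w_j g_j(x_i)]
# for normal component densities g_j (an illustrative assumption).
def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def mixture_log_lik(x, weights, mus, sigmas):
    dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
    return np.sum(np.log(dens))

x = np.array([-1.2, -0.8, 0.1, 2.9, 3.1, 3.4])
ll = mixture_log_lik(x, weights=[0.5, 0.5], mus=[-1.0, 3.0], sigmas=[0.5, 0.5])
print(ll)   # a finite log-likelihood for the observed points
```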

Definition 4.1 (Profile Likelihood)
$$ \begin{aligned} \mathcal{L}_{p}\left(\boldsymbol{\theta}_{1}\right) & \equiv \max _{\boldsymbol{\theta}_{2}} \mathcal{L}\left(\boldsymbol{\theta}_{1}, \boldsymbol{\theta}_{2}\right) \\ & \equiv \mathcal{L}\left(\boldsymbol{\theta}_{1}, \hat{\boldsymbol{\theta}}_{2}\left(\boldsymbol{\theta}_{1}\right)\right) . \end{aligned} $$

4.1 Uniform distribution
Uniform distribution: $$ f(x)= \begin{cases}\frac{1}{b-a} & x \in[a, b] \\ 0 & \text { otherwise }\end{cases} $$ $$\mathrm{E}[X]={a+b\over2}$$ $$\operatorname{var}(X)={(b-a)^2\over12}$$


towardsdatascience.com

Central Limit Theorem
the Central Limit Theorem tells us that the sampling distribution of X̄ is closely approximated by a normal distribution.

the sample standard deviation S

standard error

Generally, the bootstrap involves the following steps:
 Start with a sample of size n drawn from the population.
 Draw a sample of size n from the original sample data with replacement, and repeat this B times; each resampled sample is called a Bootstrap Sample, so there are B Bootstrap Samples in total.
 Evaluate the statistic estimating θ for each Bootstrap Sample, yielding B estimates of θ.
 Construct a sampling distribution from these B Bootstrap statistics and use it for further statistical inference, such as:
 Estimating the standard error of the statistic for θ.
 Obtaining a confidence interval for θ.
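The steps above can be sketched for the sample mean. A minimal illustration; the exponential data, B = 5,000, and the 95% percentile interval are all assumptions made for the example:

```python
import numpy as np

# Bootstrap standard error and percentile CI for the sample mean,
# following the recipe above. Data and B are illustrative choices.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)   # "original sample", n = 100
B = 5_000

boot_stats = np.array([
    rng.choice(data, size=data.size, replace=True).mean()  # one Bootstrap Sample's statistic
    for _ in range(B)
])

se = boot_stats.std(ddof=1)                      # bootstrap standard error
lo, hi = np.percentile(boot_stats, [2.5, 97.5])  # percentile 95% CI
print(se, (lo, hi))
```

The bootstrap standard error should land close to the usual analytic estimate s/√n for the mean, which is a quick sanity check on the resampling loop.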
