Simple Regression Data – Common Regression Issues

generate_ols_data

 generate_ols_data (sample_size:int, n_exogenous_vars:int,
                    n_confounder:int=0, noise_sigma:float=1.0,
                    random_seed:Optional[int]=None)

Generate simple synthetic data for OLS regression, optionally with confounder variables.

	Type	Default	Details
sample_size	int		Number of observations to generate
n_exogenous_vars	int		Number of variables with a direct effect on the dependent variable
n_confounder	int	0	Number of confounder variables to include
noise_sigma	float	1.0	Level (sigma) of unexplained Gaussian noise to add
random_seed	Optional[int]	None	Random seed for reproducibility
Returns	Dataset		Generated data
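
The implementation of generate_ols_data is not shown on this page. As a rough illustration of what a generator with these parameters could do, here is a minimal sketch using NumPy. Everything in it is an assumption rather than the library's behaviour: the name `sketch_ols_data`, the way confounders feed into the exogenous variables, the randomly drawn coefficients, and the plain dict return type. Only the column names (depvar, var_i, con_i) are taken from the regression tables on this page.

```python
from typing import Dict, Optional

import numpy as np


def sketch_ols_data(
    sample_size: int,
    n_exogenous_vars: int,
    n_confounder: int = 0,
    noise_sigma: float = 1.0,
    random_seed: Optional[int] = None,
) -> Dict[str, np.ndarray]:
    """Hypothetical stand-in for generate_ols_data (not the real implementation)."""
    rng = np.random.default_rng(random_seed)

    # Confounders: drawn first because they influence both the regressors
    # and the dependent variable.
    con = rng.normal(size=(sample_size, n_confounder))

    # Exogenous variables: independent noise plus a random linear
    # contribution from the confounders (assumed mechanism).
    loadings = rng.normal(size=(n_confounder, n_exogenous_vars))
    var = rng.normal(size=(sample_size, n_exogenous_vars)) + con @ loadings

    # True coefficients are drawn at random (assumed).
    intercept = rng.normal()
    beta_var = rng.normal(size=n_exogenous_vars)
    beta_con = rng.normal(size=n_confounder)

    # Dependent variable: linear model plus Gaussian noise with the given sigma.
    noise = rng.normal(scale=noise_sigma, size=sample_size)
    depvar = intercept + var @ beta_var + con @ beta_con + noise

    columns = {"depvar": depvar}
    for i in range(n_exogenous_vars):
        columns[f"var_{i}"] = var[:, i]
    for i in range(n_confounder):
        columns[f"con_{i}"] = con[:, i]
    return columns


data_sketch = sketch_ols_data(156, 2, n_confounder=2, noise_sigma=1.0, random_seed=42)
print(sorted(data_sketch))
```

Calling the sketch twice with the same random_seed returns identical arrays, which is the role the random_seed parameter plays in the table above.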

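# Settings for the synthetic dataset
SAMPLE_SIZE = 156    # number of observations
N_INDEPVAR = 2       # variables with a direct effect on the dependent variable
N_CONFOUNDER = 2     # confounder variables
NOISE_SIGMA = 1      # level of unexplained Gaussian noise
RANDOM_SEED = 42     # fixed seed for reproducibility

data = generate_ols_data(
    SAMPLE_SIZE, N_INDEPVAR,
    n_confounder=N_CONFOUNDER,
    noise_sigma=NOISE_SIGMA,
    random_seed=RANDOM_SEED)
data.head()  # preview the first rows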

Table 1: OLS on synthetic data without controlling for confounds

OLS Regression Results
Dep. Variable:	depvar	R-squared:	0.912
Model:	OLS	Adj. R-squared:	0.911
Method:	Least Squares	F-statistic:	795.4
Date:	Sat, 09 Nov 2024	Prob (F-statistic):	1.43e-81
Time:	18:17:04	Log-Likelihood:	-296.16
No. Observations:	156	AIC:	598.3
Df Residuals:	153	BIC:	607.5
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	-1.2774	0.131	-9.775	0.000	-1.536	-1.019
var_0	-2.2304	0.134	-16.666	0.000	-2.495	-1.966
var_1	-4.3309	0.116	-37.226	0.000	-4.561	-4.101

Omnibus:	1.649	Durbin-Watson:	2.158
Prob(Omnibus):	0.438	Jarque-Bera (JB):	1.242
Skew:	-0.188	Prob(JB):	0.537
Kurtosis:	3.223	Cond. No.	1.17

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
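
Several entries in Table 1 follow arithmetically from others, so the table can be sanity-checked by hand. A minimal sketch using only the standard library, with inputs copied from Table 1. The helper name `summary_checks` is made up for illustration, and the formulas are the textbook definitions with the intercept counted as a parameter; that this matches how the table was produced is an assumption, although the numbers agree.

```python
import math


def summary_checks(n_obs, n_regressors, log_likelihood, r_squared):
    """Recompute derived summary statistics from the headline numbers."""
    n_params = n_regressors + 1  # regressors plus the intercept
    df_resid = n_obs - n_params
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
    return {
        "df_resid": df_resid,
        "aic": round(aic, 1),
        "bic": round(bic, 1),
        "adj_r_squared": round(adj_r_squared, 3),
    }


# Inputs taken from Table 1: 156 observations, 2 regressors (var_0, var_1).
table_1 = summary_checks(n_obs=156, n_regressors=2,
                         log_likelihood=-296.16, r_squared=0.912)
print(table_1)
# {'df_resid': 153, 'aic': 598.3, 'bic': 607.5, 'adj_r_squared': 0.911}
```

The recomputed values match the Df Residuals, AIC, BIC, and Adj. R-squared entries reported in Table 1.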

Figure 2: OLS fit on synthetic data without controlling for confounds

Table 2: OLS on synthetic data controlling for confounds

OLS Regression Results
Dep. Variable:	depvar	R-squared:	0.965
Model:	OLS	Adj. R-squared:	0.965
Method:	Least Squares	F-statistic:	1055.
Date:	Sat, 09 Nov 2024	Prob (F-statistic):	3.26e-109
Time:	18:17:04	Log-Likelihood:	-223.44
No. Observations:	156	AIC:	456.9
Df Residuals:	151	BIC:	472.1
Df Model:	4
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	-1.2551	0.083	-15.145	0.000	-1.419	-1.091
var_0	-2.1015	0.086	-24.360	0.000	-2.272	-1.931
var_1	-4.8119	0.093	-51.774	0.000	-4.996	-4.628
con_0	0.6550	0.086	7.611	0.000	0.485	0.825
con_1	1.1080	0.110	10.041	0.000	0.890	1.326

Omnibus:	1.168	Durbin-Watson:	1.685
Prob(Omnibus):	0.558	Jarque-Bera (JB):	1.216
Skew:	-0.132	Prob(JB):	0.544
Kurtosis:	2.657	Cond. No.	2.21

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Figure 3: OLS fit on synthetic data controlling for confounds
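
Between Table 1 and Table 2 the var_1 estimate moves from -4.3309 (confounders left out) to -4.8119 (confounders included). The same pattern can be reproduced in a few lines with NumPy on separately simulated data. This is a sketch under stated assumptions: one regressor and one confounder, with made-up coefficients (intercept 1, slope 2, confounder effect 3), a made-up loading of 0.8, and 5000 observations. None of these values come from generate_ols_data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# The confounder drives both the regressor x and the outcome y.
con = rng.normal(size=n)
x = 0.8 * con + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * con + rng.normal(size=n)


def ols(columns, target):
    """Least-squares fit with an intercept; returns [const, coef_1, ...]."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


coef_short = ols([x], y)       # confounder omitted
coef_long = ols([x, con], y)   # confounder included

print("without confounder:", coef_short.round(3))
print("with confounder:   ", coef_long.round(3))
```

With the confounder included, the slope on x comes out close to the true value of 2. Without it, the slope absorbs part of the confounder's effect and lands near 2 + 3 · 0.8 / 1.64 ≈ 3.46, which is the omitted-variable bias for this setup.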