Data Generation for Demonstrating Multicollinearity
Data simulated with the causal structure defined in Figure 1, for use in "When is Multicollinearity an Issue?".
hill
hill (x, K=1, n=1.2)
Hill transformation
| | Type | Default | Details |
|---|---|---|---|
| x | | | input array |
| K | int | 1 | Half saturation point |
| n | float | 1.2 | Shape parameter |
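The source shows only the signature of `hill`, not its body. As a minimal sketch, assuming the standard Hill saturation form x^n / (K^n + x^n) and non-negative inputs, the transformation could look like the following. The name `hill_sketch` is hypothetical and is not the package's function.

```python
import numpy as np


def hill_sketch(x, K=1, n=1.2):
    """Illustrative Hill saturation curve (assumed form, not the package's code).

    x : array-like of non-negative values
    K : half saturation point (the output is 0.5 when x == K)
    n : shape parameter (larger values give a steeper, more S-shaped curve)
    """
    x = np.asarray(x, dtype=float)
    # Assumed standard form: x^n / (K^n + x^n), which lies in [0, 1) for x >= 0.
    return x**n / (K**n + x**n)


# The curve rises monotonically from 0 and flattens as x grows past K.
impressions = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
saturated = hill_sketch(impressions, K=1, n=1.2)
```

Under this assumed form, `K` fixes where the response reaches half of its maximum, and `n` controls how sharply the curve bends around that point.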
sample_random_data
sample_random_data (N_weeks:int, include_hidden_confounds:bool=False, random_seed:int|None=None)
| | Type | Default | Details |
|---|---|---|---|
| N_weeks | int | | Number of weeks to generate |
| include_hidden_confounds | bool | False | Should hidden confounds be included in the dataset |
| random_seed | int \| None | None | Random seed |
| Returns | Dataset | | Dataset containing the variables described by the above causal model |
dataset = sample_random_data(156, random_seed=2, include_hidden_confounds=True)
dataset.head()
<xarray.Dataset> Size: 520B
Dimensions: (Period: 5)
Coordinates:
* Period (Period) datetime64[ns] 40B 2021-01-04 ... 2021-...
Data variables:
price (Period) float64 40B 3.856 3.767 3.767 4.3 4.3
season (Period) float64 40B -0.9613 -0.9732 ... -1.0
olv_sentiment (Period) int64 40B 0 0 0 0 0
social_impressions (Period) float64 40B 4.516e+03 ... 2.289e+03
olv_impressions (Period) float64 40B 1.69e+04 ... 1.227e+04
demand (Period) float64 40B 28.92 29.4 31.64 27.92 26.14
search_query (Period) float64 40B 2.182e+06 ... 2.366e+06
auction (Period) float64 40B 0.07852 0.08937 ... 0.0854
paid_search_impressions (Period) float64 40B 1.713e+05 ... 2.021e+05
paid_search_clicks (Period) float64 40B 669.2 938.3 ... 645.8 636.1
organic_search (Period) float64 40B 2.054e+06 ... 2.055e+06
sales (Period) float64 40B 6.47e+03 ... 6.29e+03
Attributes:
olv_params: {'K': 0.7960875579897572, 'n': 1.0072716356157956}
social_params: {'K': 0.9767776434467591, 'n': 2.7744792328982424}
olv_beta: 0.21671224557085256
social_beta: 0.06881513644667057