Data Generation for Demonstrating Multicollinearity
Data are simulated from the causal structure defined in Figure 1, for use in "When is Multicollinearity an Issue?".
hill
hill (x, K=1, n=1.2)
Hill transformation
| | Type | Default | Details |
|---|---|---|---|
| x | | | input array |
| K | int | 1 | Half saturation point |
| n | float | 1.2 | Shape parameter |
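The body of `hill` is not shown on this page, so the following is only a sketch. It assumes the standard Hill form `x**n / (K**n + x**n)`, in which the output reaches one half when `x` equals `K`, consistent with the "half saturation point" description above. The name `hill_sketch` is hypothetical and is not part of the library.

```python
def hill_sketch(x, K=1, n=1.2):
    """Assumed standard Hill curve: x**n / (K**n + x**n).

    Hypothetical stand-in, not the library's own `hill`.
    Expects x >= 0. Works on plain floats; NumPy arrays also work
    because the arithmetic operators apply elementwise.
    """
    x_n = x ** n
    return x_n / (K ** n + x_n)


# At x == K the curve sits at half of its upper limit of 1.
print(hill_sketch(1.0))            # 0.5
print(hill_sketch(2.0, K=2, n=3))  # 0.5

# A larger n makes the curve steeper around K.
for n in (0.5, 1.2, 3.0):
    print(n, [round(hill_sketch(x, K=1, n=n), 3) for x in (0.25, 1.0, 4.0)])
```

Under this assumed form, `K` shifts where saturation sets in and `n` controls how sharply the response bends, which is how the two parameters are described in the table.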
sample_random_data
sample_random_data (N_weeks:int, include_hidden_confounds:bool=False, random_seed:int|None=None)
| | Type | Default | Details |
|---|---|---|---|
| N_weeks | int | | Number of weeks to generate |
| include_hidden_confounds | bool | False | Should hidden confounds be included in the dataset |
| random_seed | int \| None | None | Random Seed |
| Returns | Dataset | | Dataset containing the variables described by the above causal model |
dataset = sample_random_data(156, random_seed=2, include_hidden_confounds=True)
dataset.head()
<xarray.Dataset> Size: 520B
Dimensions:                  (Period: 5)
Coordinates:
  * Period                   (Period) datetime64[ns] 40B 2021-01-04 ... 2021-...
Data variables:
    price                    (Period) float64 40B 3.856 3.767 3.767 4.3 4.3
    season                   (Period) float64 40B -0.9613 -0.9732 ... -1.0
    olv_sentiment            (Period) int64 40B 0 0 0 0 0
    social_impressions       (Period) float64 40B 4.516e+03 ... 2.289e+03
    olv_impressions          (Period) float64 40B 1.69e+04 ... 1.227e+04
    demand                   (Period) float64 40B 28.92 29.4 31.64 27.92 26.14
    search_query             (Period) float64 40B 2.182e+06 ... 2.366e+06
    auction                  (Period) float64 40B 0.07852 0.08937 ... 0.0854
    paid_search_impressions  (Period) float64 40B 1.713e+05 ... 2.021e+05
    paid_search_clicks       (Period) float64 40B 669.2 938.3 ... 645.8 636.1
    organic_search           (Period) float64 40B 2.054e+06 ... 2.055e+06
    sales                    (Period) float64 40B 6.47e+03 ... 6.29e+03
Attributes:
    olv_params:     {'K': 0.7960875579897572, 'n': 1.0072716356157956}
    social_params:  {'K': 0.9767776434467591, 'n': 2.7744792328982424}
    olv_beta:       0.21671224557085256
    social_beta:    0.06881513644667057
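Since the dataset is meant for studying multicollinearity, a natural first check is how strongly pairs of variables move together. The sketch below computes a Pearson correlation for `price` and `demand`, the two float variables whose five `head()` values are printed in full above. The helper `pearson` is hypothetical, and five rows are far too few to support any conclusion; the point is only to show the calculation.

```python
import math

# First five weeks, copied from the dataset.head() output above.
price = [3.856, 3.767, 3.767, 4.3, 4.3]
demand = [28.92, 29.4, 31.64, 27.92, 26.14]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(ss_a * ss_b)


r = pearson(price, demand)
print(f"price vs demand over the first five weeks: r = {r:.3f}")
```

On the full 156-week dataset, one option is to convert it with `dataset.to_dataframe()` and call `.corr()` on the resulting pandas DataFrame to get every pairwise correlation at once.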