Data Generation for Demonstrating Multicollinearity

The data is simulated from the causal structure defined in Figure 1, for use in "When is Multicollinearity an Issue?".
Causal graph "Paid Media on Sales" (edges, parent -> children):
Seasonality -> OLV Impression, Demand, Social Impression, Search Query
Video Platform Sentiment -> OLV Impression
OLV Impression -> Demand
Social Impression -> Demand
Price -> Demand, Sales
Demand -> Search Query, Paid Search Click, Sales
Search Query -> Auction, Paid Search Impression, Paid Search Click, Organic Search
Auction -> Paid Search Impression
Paid Search Impression -> Paid Search Click
Paid Search Click -> Sales
Organic Search -> Sales
Figure 1: Causal Model of the data generating process
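
Because the data follows a causal graph, each variable can only be sampled after its parents. The sketch below is not the package's implementation; it only transcribes the edges of Figure 1 into a list and uses the standard-library `graphlib` module (Python 3.9+) to derive one valid sampling order. The names `EDGES` and `simulation_order` are hypothetical and introduced here for illustration.

```python
from graphlib import TopologicalSorter

# (parent, child) pairs transcribed from Figure 1.
EDGES = [
    ("Seasonality", "OLV Impression"),
    ("Seasonality", "Demand"),
    ("Seasonality", "Social Impression"),
    ("Seasonality", "Search Query"),
    ("OLV Impression", "Demand"),
    ("Video Platform Sentiment", "OLV Impression"),
    ("Demand", "Search Query"),
    ("Demand", "Paid Search Click"),
    ("Demand", "Sales"),
    ("Social Impression", "Demand"),
    ("Search Query", "Auction"),
    ("Search Query", "Paid Search Impression"),
    ("Search Query", "Paid Search Click"),
    ("Search Query", "Organic Search"),
    ("Auction", "Paid Search Impression"),
    ("Paid Search Impression", "Paid Search Click"),
    ("Paid Search Click", "Sales"),
    ("Organic Search", "Sales"),
    ("Price", "Demand"),
    ("Price", "Sales"),
]


def simulation_order(edges):
    """Return node names ordered so that every parent precedes its children."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, set())             # register root nodes too
        parents.setdefault(child, set()).add(parent)  # child depends on parent
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(parents).static_order())


order = simulation_order(EDGES)
print(order)
```

Sales is the only node without children, so it always comes last in the order. The exact position of the other nodes can vary between valid orderings.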

hill

 hill (x, K=1, n=1.2)

Hill transformation

| | Type | Default | Details |
|---|---|---|---|
| x | | | input array |
| K | int | 1 | Half saturation point |
| n | float | 1.2 | Shape parameter |
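
The page does not show the body of `hill`, so the following is only a sketch of the common Hill saturation form, x^n / (K^n + x^n), which is consistent with K being the half saturation point (the output is 0.5 when x equals K). The name `hill_sketch` is hypothetical, it assumes non-negative inputs, and it uses plain Python lists rather than arrays to stay dependency-free.

```python
def hill_sketch(x, K=1, n=1.2):
    """Apply x**n / (K**n + x**n) element-wise to non-negative values.

    Assumed form only; the package's own `hill` may differ.
    """
    return [xi**n / (K**n + xi**n) for xi in x]


values = hill_sketch([0.0, 0.5, 1.0, 2.0, 10.0])
print(values)
```

Under this form the curve starts at 0, passes through 0.5 at x = K, and approaches 1 as x grows, while n controls how sharply it bends.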

sample_random_data

 sample_random_data (N_weeks:int, include_hidden_confounds:bool=False,
                     random_seed:int|None=None)
| | Type | Default | Details |
|---|---|---|---|
| N_weeks | int | | Number of weeks to generate |
| include_hidden_confounds | bool | False | Should hidden confounds be included in the dataset |
| random_seed | int \| None | None | Random seed |
| Returns | Dataset | | Dataset containing the variables described by the above causal model |
dataset = sample_random_data(156, random_seed=2, include_hidden_confounds=True)
dataset.head()
<xarray.Dataset> Size: 520B
Dimensions:                  (Period: 5)
Coordinates:
  * Period                   (Period) datetime64[ns] 40B 2021-01-04 ... 2021-...
Data variables:
    price                    (Period) float64 40B 3.856 3.767 3.767 4.3 4.3
    season                   (Period) float64 40B -0.9613 -0.9732 ... -1.0
    olv_sentiment            (Period) int64 40B 0 0 0 0 0
    social_impressions       (Period) float64 40B 4.516e+03 ... 2.289e+03
    olv_impressions          (Period) float64 40B 1.69e+04 ... 1.227e+04
    demand                   (Period) float64 40B 28.92 29.4 31.64 27.92 26.14
    search_query             (Period) float64 40B 2.182e+06 ... 2.366e+06
    auction                  (Period) float64 40B 0.07852 0.08937 ... 0.0854
    paid_search_impressions  (Period) float64 40B 1.713e+05 ... 2.021e+05
    paid_search_clicks       (Period) float64 40B 669.2 938.3 ... 645.8 636.1
    organic_search           (Period) float64 40B 2.054e+06 ... 2.055e+06
    sales                    (Period) float64 40B 6.47e+03 ... 6.29e+03
Attributes:
    olv_params:     {'K': 0.7960875579897572, 'n': 1.0072716356157956}
    social_params:  {'K': 0.9767776434467591, 'n': 2.7744792328982424}
    olv_beta:       0.21671224557085256
    social_beta:    0.06881513644667057
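
A quick way to see how strongly two variables move together is a pairwise Pearson correlation. The sketch below is illustrative only: it uses just the five `price` and `demand` values printed in the `head()` output above, which is far too few rows to draw conclusions from, and the helper `pearson` is a hypothetical name written in plain Python. On the full dataset the same check would run over all 156 weeks.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# First five weeks, copied from the head() output above.
price = [3.856, 3.767, 3.767, 4.3, 4.3]
demand = [28.92, 29.4, 31.64, 27.92, 26.14]

r = pearson(price, demand)
print(f"price vs demand (5 rows): r = {r:.3f}")
```

The coefficient is negative for these rows, which matches the direction one would expect from the Price -> Demand edge in Figure 1, though five observations cannot confirm it.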