Example: Predicting Daily Curtailment Events¶
Process Overview:
Label curtailment events (i.e. define a decision boundary)
Partition historic data into training and test sets
Restrict to the spring season (Feb–May)
Fit a statistical model to the training data
Predict against the test data
Evaluate the performance of the model against known labels in the test data.
[1]:
import pandas as pd
from dask import dataframe as dd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from src.conf import settings
Early Spring¶
In our exploratory data analysis, we found evidence that seasonality strongly influences curtailment intensity, and that the effect is particularly noticeable in early spring. Here we examine only February–May curtailment and derive a model accordingly.
[2]:
# Read curtailment data, aggregate to dailies.
df = pd.concat(
    [
        pd.read_parquet(
            settings.DATA_DIR / f"processed/caiso/{y}.parquet"
        ) for y in range(2017, 2020)
    ]
)
df.columns = df.columns.str.lower().str.replace(" ", "_")
df.index = df.index.tz_convert("US/Pacific")
df = df.groupby(pd.Grouper(freq="D")).sum()
# Hourly resampling is a bad idea because hour-to-hour values are strongly autocorrelated
# df = df[columns].groupby(pd.Grouper(freq="H")).sum()
df.reset_index(inplace=True)
[3]:
df.dtypes
[3]:
timestamp datetime64[ns, US/Pacific]
load float64
solar float64
wind float64
net_load float64
renewables float64
nuclear float64
large_hydro float64
imports float64
generation float64
thermal float64
load_less_(generation+imports) float64
wind_curtailment float64
solar_curtailment float64
dtype: object
[4]:
df.head()
[4]:
timestamp | load | solar | wind | net_load | renewables | nuclear | large_hydro | imports | generation | thermal | load_less_(generation+imports) | wind_curtailment | solar_curtailment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2017-01-01 00:00:00-08:00 | 6.451973e+06 | 502302.914737 | 656544.662054 | 5.293125e+06 | 1.665067e+06 | 652721.169749 | 653481.452160 | 2.200843e+06 | 4.251096e+06 | 1.279826e+06 | 33.747653 | 10107.108141 | 26760.716117 |
1 | 2017-01-02 00:00:00-08:00 | 6.838524e+06 | 203004.927518 | 443945.262752 | 6.191574e+06 | 1.156723e+06 | 653163.556261 | 705478.766030 | 2.372526e+06 | 4.465951e+06 | 1.950586e+06 | 47.689558 | 401.789387 | 43.705500 |
2 | 2017-01-03 00:00:00-08:00 | 7.391597e+06 | 256791.945920 | 279703.466170 | 6.855101e+06 | 1.053297e+06 | 654631.092257 | 793118.552495 | 1.885086e+06 | 5.506580e+06 | 3.005534e+06 | -69.892993 | 0.000000 | 54.841500 |
3 | 2017-01-04 00:00:00-08:00 | 7.304337e+06 | 351737.160810 | 339561.662186 | 6.613038e+06 | 1.213044e+06 | 653906.204819 | 779211.177282 | 1.877239e+06 | 5.427051e+06 | 2.780890e+06 | 46.565469 | 0.000000 | 20.247000 |
4 | 2017-01-05 00:00:00-08:00 | 7.246891e+06 | 343183.228333 | 629678.639548 | 6.274029e+06 | 1.493144e+06 | 653460.137940 | 865964.297512 | 1.239591e+06 | 6.007303e+06 | 2.994735e+06 | -3.179387 | 804.842781 | 191.525216 |
[5]:
# Subset curtailments data to relevant time range
df = df[df["timestamp"].dt.month.isin(range(2,6))]
[6]:
# Analysis Period of Dataset
df["timestamp"].describe()
[6]:
count 360
unique 360
top 2019-05-24 00:00:00-07:00
freq 1
first 2017-02-01 00:00:00-08:00
last 2019-05-31 00:00:00-07:00
Name: timestamp, dtype: object
[7]:
# Label Data - based on our EDA, we start by "guessing" a threshold of importance:
# a day counts as a curtailment event when more than 10% of its solar output is curtailed.
# Later methods will be less biased and allow for more variance.
# TODO: Try to find natural clusterings through an unsupervised process to label the dataset, and try to predict those labels (see the sketch below).
df["curtailment_event"] = pd.Categorical(df["solar_curtailment"]/df["solar"] > .1)
# NOTE: weekdays 5 and 6 are Saturday and Sunday, so despite its name this flag is True on weekends.
df["is_weekday"] = pd.Categorical(df["timestamp"].dt.weekday.isin([5, 6]))
[8]:
# Merge Weather Data
# Use Day-Ahead forecasts
forecasts = [
    settings.DATA_DIR / f"interim/gfs/ca/gfs_3_201[7-9][01][2-5]*_0000_{i*3:03}.parquet"
    for i in range(5, 10)
]
dayahead_weather = dd.read_parquet(forecasts).compute()
[9]:
dayahead_weather["timestamp"] = dayahead_weather["valid_time"].dt.tz_localize("UTC").dt.tz_convert("US/Pacific")
[10]:
dayahead_weather.head()
[10]:
latitude | longitude | t | gust | tp | dswrf | uswrf | SUNSD | al | sp | ... | MTFCC | CSAFP | CBSAFP | METDIVFP | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | timestamp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | |||||||||||||||||||||
17517 | 42.0 | 237.0 | 269.899994 | 2.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 86861.039062 | ... | G4020 | None | None | None | A | 16261974847 | 179108278 | +41.5879861 | -122.5332868 | 2017-02-01 07:00:00-08:00 |
17518 | 42.0 | 238.0 | 266.000000 | 1.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 86630.640625 | ... | G4020 | None | None | None | A | 16261974847 | 179108278 | +41.5879861 | -122.5332868 | 2017-02-01 07:00:00-08:00 |
17876 | 41.0 | 236.0 | 278.600006 | 2.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 97531.437500 | ... | G4020 | None | 21700 | None | A | 9240992572 | 1254297982 | +40.7066731 | -123.9258181 | 2017-02-01 07:00:00-08:00 |
17877 | 41.0 | 237.0 | 270.100006 | 2.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 81915.437500 | ... | G4020 | None | None | None | A | 8234265201 | 73407949 | +40.6477241 | -123.1144043 | 2017-02-01 07:00:00-08:00 |
18237 | 40.0 | 237.0 | 272.000000 | 2.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 83081.843750 | ... | G4020 | None | None | None | A | 8234265201 | 73407949 | +40.6477241 | -123.1144043 | 2017-02-01 07:00:00-08:00 |
5 rows × 37 columns
[11]:
dayahead_weather.iloc[0]
[11]:
latitude 42
longitude 237
t 269.9
gust 2.9
tp 0
dswrf 0
uswrf 0
SUNSD 0
al 0
sp 86861
csnow 0
cicep 0
cfrzr 0
crain 0
sde 0.34
surface 0
time 2017-02-01 00:00:00
valid_time 2017-02-01 15:00:00
index_right 54
STATEFP 06
COUNTYFP 093
COUNTYNS 00277311
GEOID 06093
NAME Siskiyou
NAMELSAD Siskiyou County
LSAD 06
CLASSFP H1
MTFCC G4020
CSAFP None
CBSAFP None
METDIVFP None
FUNCSTAT A
ALAND 16261974847
AWATER 179108278
INTPTLAT +41.5879861
INTPTLON -122.5332868
timestamp 2017-02-01 07:00:00-08:00
Name: 17517, dtype: object
[12]:
dayahead_weather.dtypes
[12]:
latitude float64
longitude float64
t float32
gust float32
tp float32
dswrf float32
uswrf float32
SUNSD float32
al float32
sp float32
csnow float32
cicep float32
cfrzr float32
crain float32
sde float32
surface int64
time datetime64[ns]
valid_time datetime64[ns]
index_right int64
STATEFP object
COUNTYFP object
COUNTYNS object
GEOID object
NAME object
NAMELSAD object
LSAD object
CLASSFP object
MTFCC object
CSAFP object
CBSAFP object
METDIVFP object
FUNCSTAT object
ALAND int64
AWATER int64
INTPTLAT object
INTPTLON object
timestamp datetime64[ns, US/Pacific]
dtype: object
[13]:
# Take an average over all datapoints (no weighting)
# FIXME: We need to be more thoughtful about how to integrate this data.
# It should be weighted somehow, or perhaps certain locations should enter as their own IVs
# (a weighted variant is sketched after the next cell). Look into Utility CZs.
dayahead_hourly = dayahead_weather.groupby(pd.Grouper(key="timestamp", freq="H"))[["t", "dswrf", "uswrf", "gust", "SUNSD"]].mean().interpolate().reset_index()
[14]:
dayahead_daily = dayahead_weather.groupby(pd.Grouper(key="timestamp", freq="D")).agg({"t": "mean", "dswrf": "mean", "uswrf": "mean", "SUNSD": "sum"})
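The unweighted means above treat every grid cell equally; as the FIXME notes, some weighting is probably warranted. A rough sketch of one option, using the county land area `ALAND` already present in the weather frame purely as a placeholder weight (installed solar capacity per county would be more appropriate if available):

[ ]:
# Hypothetical sketch: area-weighted daily means (ALAND is only a stand-in weight).
def weighted_daily_mean(group, cols, weight_col="ALAND"):
    w = group[weight_col]
    return pd.Series({c: (group[c] * w).sum() / w.sum() for c in cols})

dayahead_daily_weighted = (
    dayahead_weather
    .groupby(pd.Grouper(key="timestamp", freq="D"))
    .apply(weighted_daily_mean, cols=["t", "dswrf", "uswrf"])
)
# SUNSD could still be aggregated as a daily sum, as above.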
[15]:
data = df.merge(dayahead_daily, on="timestamp", how="inner")
data
[15]:
timestamp | load | solar | wind | net_load | renewables | nuclear | large_hydro | imports | generation | thermal | load_less_(generation+imports) | wind_curtailment | solar_curtailment | curtailment_event | is_weekday | t | dswrf | uswrf | SUNSD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2017-02-01 00:00:00-08:00 | 7.404797e+06 | 6.365488e+05 | 8.839707e+04 | 6.679851e+06 | 1.241231e+06 | 652218.031425 | 1.058698e+06 | 2.043459e+06 | 5.361323e+06 | 2.409176e+06 | 14.109963 | 64.670000 | 8480.663464 | False | False | 282.630219 | 204.382217 | 46.266666 | 2130985.0 |
1 | 2017-02-02 00:00:00-08:00 | 7.367050e+06 | 5.002730e+05 | 2.195530e+05 | 6.647224e+06 | 1.235706e+06 | 651180.678103 | 1.077320e+06 | 1.941564e+06 | 5.425467e+06 | 2.461260e+06 | 19.334575 | 2691.564962 | 288.956667 | False | False | 282.440430 | 155.364441 | 36.537777 | 1721099.0 |
2 | 2017-02-03 00:00:00-08:00 | 7.272859e+06 | 5.318559e+05 | 2.889443e+05 | 6.452059e+06 | 1.332687e+06 | 650538.682373 | 1.103827e+06 | 1.913683e+06 | 5.359143e+06 | 2.272091e+06 | 32.670400 | 488.852142 | 869.203799 | False | False | 283.654236 | 162.995560 | 35.911110 | 1762346.0 |
3 | 2017-02-04 00:00:00-08:00 | 6.611103e+06 | 6.363877e+05 | 4.979092e+05 | 5.476806e+06 | 1.657910e+06 | 649935.253088 | 1.102862e+06 | 1.667016e+06 | 4.944096e+06 | 1.533388e+06 | -9.417728 | 8260.820991 | 69858.576657 | True | True | 283.166656 | 197.639999 | 42.088890 | 1947685.0 |
4 | 2017-02-05 00:00:00-08:00 | 6.661248e+06 | 4.002112e+05 | 2.128107e+05 | 6.048226e+06 | 1.123362e+06 | 649679.457890 | 1.093060e+06 | 2.046784e+06 | 4.614450e+06 | 1.748349e+06 | 12.990172 | 0.000000 | 25.348000 | False | True | 283.199097 | 174.813339 | 35.973331 | 2019408.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
355 | 2019-05-27 00:00:00-07:00 | 5.672302e+06 | 7.720765e+05 | 1.113156e+06 | 3.787069e+06 | 2.453673e+06 | 656141.389061 | 9.970034e+05 | 1.169736e+06 | 4.502609e+06 | 3.957918e+05 | -43.080195 | 18796.425323 | 444354.402287 | True | False | 292.328644 | 457.533325 | 73.480003 | 3091204.0 |
356 | 2019-05-28 00:00:00-07:00 | 6.436942e+06 | 1.194693e+06 | 9.471656e+05 | 4.295084e+06 | 2.701523e+06 | 656950.514763 | 1.010069e+06 | 1.417568e+06 | 5.019443e+06 | 6.509010e+05 | -69.383111 | 924.968950 | 63554.570744 | False | False | 296.436951 | 507.760010 | 74.080002 | 3122509.0 |
357 | 2019-05-29 00:00:00-07:00 | 6.762152e+06 | 1.239861e+06 | 5.052720e+05 | 5.017019e+06 | 2.309672e+06 | 657307.503346 | 1.089729e+06 | 1.752092e+06 | 5.010103e+06 | 9.533939e+05 | -42.688343 | 10658.002569 | 72766.329577 | False | False | 299.156006 | 511.271118 | 73.702225 | 3112164.0 |
358 | 2019-05-30 00:00:00-07:00 | 6.865228e+06 | 1.295425e+06 | 5.069673e+05 | 5.062835e+06 | 2.359094e+06 | 657488.225968 | 1.192374e+06 | 1.771105e+06 | 5.094174e+06 | 8.852177e+05 | -50.745769 | 13743.059075 | 7661.455448 | False | False | 299.053741 | 506.617767 | 73.831108 | 3118246.0 |
359 | 2019-05-31 00:00:00-07:00 | 6.929919e+06 | 1.244073e+06 | 4.013874e+05 | 5.284459e+06 | 2.219221e+06 | 657313.148060 | 1.212114e+06 | 1.889459e+06 | 5.040392e+06 | 9.517434e+05 | 67.953527 | 909.440812 | 48430.745252 | False | False | 298.764374 | 538.822205 | 76.894447 | 2290201.0 |
360 rows × 20 columns
Single Model Run¶
[16]:
test_data = data.sample(int(len(data)*.3//1))
training_data = data[~data.index.isin(test_data.index)]
model = "C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + t"
result = smf.glm(
    model,
    training_data,
    family=sm.families.Binomial()
).fit()
result.summary()
[16]:
Dep. Variable: | ['C(curtailment_event)[False]', 'C(curtailment_event)[True]'] | No. Observations: | 248 |
---|---|---|---|
Model: | GLM | Df Residuals: | 242 |
Model Family: | Binomial | Df Model: | 5 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -72.320 |
Date: | Thu, 30 Apr 2020 | Deviance: | 144.64 |
Time: | 12:32:09 | Pearson chi2: | 191. |
No. Iterations: | 6 | ||
Covariance Type: | nonrobust |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | -29.0311 | 17.196 | -1.688 | 0.091 | -62.735 | 4.673 |
C(timestamp.dt.month)[T.3] | 0.0594 | 0.736 | 0.081 | 0.936 | -1.384 | 1.502 |
C(timestamp.dt.month)[T.4] | -1.3746 | 0.860 | -1.599 | 0.110 | -3.059 | 0.310 |
C(timestamp.dt.month)[T.5] | -1.6381 | 1.027 | -1.595 | 0.111 | -3.652 | 0.375 |
C(is_weekday)[T.True] | -2.4666 | 0.472 | -5.223 | 0.000 | -3.392 | -1.541 |
t | 0.1142 | 0.061 | 1.869 | 0.062 | -0.006 | 0.234 |
[17]:
predictions = result.predict(test_data.drop(columns=["curtailment_event"]))
predictions.name = "probability"
predictions = test_data.merge(predictions, left_index=True, right_index=True)
cutoff = .7
# NOTE: with C() on the left-hand side, the fitted probability is for the first level,
# [False], i.e. the probability of *no* curtailment event, and the two "false_*" names
# below are swapped relative to the usual convention. The simulate() function further
# down flips the probability (1 - predictions) and uses the conventional labels.
true_positives = predictions.query("probability > @cutoff")["curtailment_event"].value_counts().loc[True]
false_negatives = predictions.query("probability > @cutoff")["curtailment_event"].value_counts().loc[False]
true_negatives = predictions.query("probability <= @cutoff")["curtailment_event"].value_counts().loc[False]
false_positives = predictions.query("probability <= @cutoff")["curtailment_event"].value_counts().loc[True]
accuracy = (true_positives+true_negatives)/len(predictions)
precision = true_positives / (true_positives + false_positives)
print(f"Accuracy: {accuracy}; Precision: {precision}")
Accuracy: 0.19444444444444445; Precision: 1.0
[18]:
predictions["curtailment_event"].astype(bool).sum()/len(predictions["curtailment_event"])
[18]:
0.06481481481481481
[19]:
1 - predictions["probability"].mean()
[19]:
0.10371318777291871
Multi-Model Run¶
TODO:
Try a k-fold approach instead. Divide the training set into k folds (without resampling), use k-1 folds as the training data and withhold 1 fold. Run the model k times, withholding the next fold each time. Calculate the accuracy metrics on the withheld fold as the “test” data within each run (see the sketch after these notes).
Other Notes:
Folding is difficult when predicting low-frequency events. A fold may consist of a consecutive run of rows with no curtailment events at all, effectively leaving only one class of event to train on or predict.
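One way to address the low-frequency concern is stratified folding, which keeps the event/non-event ratio roughly constant in every fold so no fold is left with a single class. A minimal sketch, assuming scikit-learn is available (it is not otherwise used in this notebook) and reusing the smf/sm imports from the first cell:

[ ]:
# Hypothetical sketch: stratified k-fold evaluation so every fold contains some curtailment events.
from sklearn.model_selection import StratifiedKFold

def kfold_accuracy(model, data, k=5, cutoff=.7):
    y = data["curtailment_event"].astype(bool)
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in folds.split(data, y):
        train, test = data.iloc[train_idx], data.iloc[test_idx]
        fit = smf.glm(model, train, family=sm.families.Binomial()).fit()
        # The fitted probability is for the [False] level, so flip it to get P(event).
        p_event = 1 - fit.predict(test)
        accuracies.append(((p_event > cutoff) == y.iloc[test_idx]).mean())
    return sum(accuracies) / k

# kfold_accuracy("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + t", data)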
[30]:
def simulate(model, data, cutoff=.8):
    test_data = data.sample(int(len(data)*.2//1))
    training_data = data[~data.index.isin(test_data.index)]
    result = smf.glm(
        model,
        training_data,
        family=sm.families.Binomial()
    ).fit()
    predictions = result.predict(test_data.drop(columns=["curtailment_event"]))
    predictions.name = "probability"
    # The fitted probability is for the [False] level; flip it to get P(curtailment event).
    predictions = 1 - predictions
    predictions = test_data.merge(predictions, left_index=True, right_index=True)
    positive_predictions = predictions.query("probability > @cutoff")["curtailment_event"].value_counts()
    negative_predictions = predictions.query("probability <= @cutoff")["curtailment_event"].value_counts()
    true_positives = positive_predictions.loc[True]
    false_positives = positive_predictions.loc[False]
    true_negatives = negative_predictions.loc[False]
    false_negatives = negative_predictions.loc[True]
    accuracy = (true_positives+true_negatives)/len(predictions)
    precision = true_positives / (true_positives + false_positives)
    return {"results": result, "accuracy": accuracy, "precision": precision}
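The `nan` precision values and the `RuntimeWarning` in some runs below come from draws in which the model makes no positive predictions, so `true_positives + false_positives` is an integer 0/0. A small sketch of a more explicit counting step that avoids the warning (boolean masks instead of `value_counts().loc[...]`, with a guard on the zero denominator):

[ ]:
# Hypothetical sketch: explicit confusion-matrix counts with a guard against 0/0 precision.
def confusion_counts(predictions, cutoff):
    predicted_positive = predictions["probability"] > cutoff
    actual_positive = predictions["curtailment_event"].astype(bool)
    tp = int((predicted_positive & actual_positive).sum())
    fp = int((predicted_positive & ~actual_positive).sum())
    tn = int((~predicted_positive & ~actual_positive).sum())
    fn = int((~predicted_positive & actual_positive).sum())
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    accuracy = (tp + tn) / len(predictions)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "accuracy": accuracy, "precision": precision}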
[31]:
# Run this 100 times:
model1 = []
for i in range(100):
    model1.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + load + t + dswrf", data, cutoff=.7))
results1 = pd.DataFrame(model1)
print("Accuracy : {}".format(results1["accuracy"].mean()), "Precision : {}".format(results1["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.8873611111111112 Precision : 0.8489247311827957
[32]:
model1[1]["results"].summary()
[32]:
Dep. Variable: | ['C(curtailment_event)[False]', 'C(curtailment_event)[True]'] | No. Observations: | 283 |
---|---|---|---|
Model: | GLM | Df Residuals: | 275 |
Model Family: | Binomial | Df Model: | 7 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -68.257 |
Date: | Thu, 30 Apr 2020 | Deviance: | 136.51 |
Time: | 13:25:21 | Pearson chi2: | 212. |
No. Iterations: | 7 | ||
Covariance Type: | nonrobust |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | -57.6466 | 22.202 | -2.596 | 0.009 | -101.162 | -14.132 |
C(timestamp.dt.month)[T.3] | 1.1928 | 0.808 | 1.477 | 0.140 | -0.390 | 2.776 |
C(timestamp.dt.month)[T.4] | 1.0888 | 0.993 | 1.097 | 0.273 | -0.857 | 3.035 |
C(timestamp.dt.month)[T.5] | 0.9277 | 1.139 | 0.814 | 0.415 | -1.305 | 3.160 |
C(is_weekday)[T.True] | 1.2901 | 0.866 | 1.490 | 0.136 | -0.407 | 2.987 |
load | 4.799e-06 | 1.14e-06 | 4.213 | 0.000 | 2.57e-06 | 7.03e-06 |
t | 0.1028 | 0.074 | 1.384 | 0.166 | -0.043 | 0.248 |
dswrf | -0.0069 | 0.003 | -2.295 | 0.022 | -0.013 | -0.001 |
[33]:
# Run this 100 times:
model2 = []
for i in range(100):
    model2.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + load", data, cutoff=.7))
results2 = pd.DataFrame(model2)
print("Accuracy : {}".format(results2["accuracy"].mean()), "Precision : {}".format(results2["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.9038888888888887 Precision : 0.8506289308176099
[36]:
# Run this 100 times:
model3 = []
for i in range(100):
    model3.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + t * dswrf", data, cutoff=.8))
results3 = pd.DataFrame(model3)
print("Accuracy : {}".format(results3["accuracy"].mean()), "Precision : {}".format(results3["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.8783333333333332 Precision : nan
[25]:
# Run this 100 times:
model4 = []
for i in range(100):
    model4.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + t + dswrf", data, cutoff=.7))
results4 = pd.DataFrame(model4)
print("Accuracy : {}".format(results4["accuracy"].mean()), "Precision : {}".format(results4["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.8748611111111111 Precision : 0.15
[26]:
# Run this 100 times:
model5 = []
for i in range(100):
    model5.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + SUNSD", data, cutoff=.7))
results5 = pd.DataFrame(model5)
print("Accuracy : {}".format(results5["accuracy"].mean()), "Precision : {}".format(results5["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.9001388888888888 Precision : nan
[27]:
# Run this 100 times:
model_ = []
for i in range(100):
    model_.append(simulate("C(curtailment_event) ~ C(timestamp.dt.month) + C(is_weekday) + t", data, cutoff=.7))
results_ = pd.DataFrame(model_)
print("Accuracy : {}".format(results_["accuracy"].mean()), "Precision : {}".format(results_["precision"].mean()))
/home/ttu/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/ipykernel_launcher.py:25: RuntimeWarning: invalid value encountered in long_scalars
Accuracy : 0.8776388888888887 Precision : 0.0
Visualize Results¶
TODO:
Histogram of probabilities on days where there is or isn’t a “substantial” curtailment event
[28]:
model_[0]
[28]:
{'results': <statsmodels.genmod.generalized_linear_model.GLMResultsWrapper at 0x7efcb07d8910>,
'accuracy': 0.9166666666666666,
'precision': nan}
[29]:
import altair as alt
alt.Chart()
---------------------------------------------------------------------------
SchemaValidationError Traceback (most recent call last)
~/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/altair/vegalite/v4/api.py in to_dict(self, *args, **kwargs)
380 if dct is None:
381 kwargs["validate"] = "deep"
--> 382 dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
383
384 # TODO: following entries are added after validation. Should they be validated?
~/.local/share/virtualenvs/CaReCur-b3qbtQ7S/lib/python3.7/site-packages/altair/utils/schemapi.py in to_dict(self, validate, ignore, context)
337 self.validate(result)
338 except jsonschema.ValidationError as err:
--> 339 raise SchemaValidationError(self, err)
340 return result
341
SchemaValidationError: Invalid specification
altair.vegalite.v4.api.Chart, validating 'required'
'data' is a required property
[29]:
alt.Chart(...)
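The cell above fails because `alt.Chart` requires data. A minimal sketch of the histogram described in the TODO (predicted probability split by whether a curtailment event was actually observed), using the `predictions` frame from the single-model run above, whose "probability" column is P(no event) and so is flipped first:

[ ]:
# Hypothetical sketch of the TODO histogram: predicted event probability, faceted by the observed label.
hist_data = predictions[["probability", "curtailment_event"]].assign(
    p_event=lambda d: 1 - d["probability"],
    observed_event=lambda d: d["curtailment_event"].astype(bool),
)
alt.Chart(hist_data).mark_bar().encode(
    x=alt.X("p_event:Q", bin=alt.Bin(maxbins=20), title="predicted P(curtailment event)"),
    y="count()",
    column="observed_event:N",
)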