[personal profile] gusl
I'm currently working with some really messy time-series data on power consumption in office buildings. There's missing data, multiple periodicities (daily, weekly, yearly), freakish outliers (e.g. holidays), and bursty anomalies (buildings generally use less power on summer days, except during heat waves, when ACs use *tons* of power). The task is daunting.

There are some things I want to do with the data for which I have no probabilistic interpretation (e.g. filter out certain frequencies).
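
For the frequency-filtering example, here is a minimal sketch of a notch filter done with an FFT. Everything in it is an assumption for illustration: the function name `remove_period`, the `width` parameter, and the synthetic series are made up, and it presumes the gaps in the data have already been filled, since a plain FFT needs evenly spaced samples with no missing values.

```python
import numpy as np

def remove_period(x, period, width=1):
    """Zero the Fourier bins nearest to `period` (in samples), then invert."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    k = int(round(n / period))                 # bin closest to that period
    lo = max(k - width, 1)                     # never touch bin 0 (the mean)
    hi = min(k + width, len(spectrum) - 1)
    spectrum[lo:hi + 1] = 0                    # notch out the bin and its neighbours
    return np.fft.irfft(spectrum, n=n)

# Synthetic hourly series (made up): a constant level plus daily and weekly cycles.
hours = np.arange(24 * 7 * 8)                  # eight weeks of hourly samples
daily = np.sin(2 * np.pi * hours / 24)
weekly = 0.5 * np.sin(2 * np.pi * hours / (24 * 7))
series = 10 + daily + weekly

no_daily = remove_period(series, period=24)    # what is left is the level plus the weekly cycle
```

On real data the periods will not line up exactly with the FFT bins, so some energy leaks into neighbouring bins; widening `width` is the crude fix.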

I've spent the first several days exploring the data, making scatterplots, etc. I've seen some weird patterns and puzzling clusters. Modeling these would entail non-parametric density estimation, but that wouldn't tell me how to make actual predictions.

I should get some basic predictions working.

But there are so many possible models! Even though I'm only considering past power usage! (I'm not even looking at temperature.)

Here are some basic ideas that have been floated:
* model the function using Gaussian Processes (for some kernel(s))
* model [prev n hours, next k hours] as a multivariate Gaussian (maybe this is the same as the above idea)
* autoregressive models, e.g. ridge regression on a subset of past times (including polynomial basis expansion, etc.)
* nearest neighbor (for some geometry(s))
* parameterized functional forms: model variation in daily bumps as a parameterized family of bumps, e.g. height, fatness, tail skewness, etc., using splines
* State-Space Models, a.k.a. continuous-state HMMs (for some family of functions)
* Gaussian Lilypads (I need to read up on this)
* ... and of course, ensembles of the above.
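
As one concrete instance of the autoregressive idea from the list, here is a rough sketch of ridge regression on a hand-picked subset of past hours. The helper names `make_lagged` and `fit_ridge`, the choice of lags, the value of `alpha`, and the synthetic load curve are all assumptions for illustration, not anything fitted to the real data.

```python
import numpy as np

def make_lagged(y, lags):
    """Design matrix whose columns are y[t - lag], one column per lag."""
    start = max(lags)
    X = np.column_stack([y[start - lag:len(y) - lag] for lag in lags])
    return X, y[start:]

def fit_ridge(X, target, alpha=1.0):
    """Closed-form ridge on centered data, so the intercept is not penalized."""
    x_mean, t_mean = X.mean(axis=0), target.mean()
    Xc, tc = X - x_mean, target - t_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ tc)
    return w, t_mean - x_mean @ w

# Synthetic hourly load (made up): a daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
y = 100 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)

lags = [1, 2, 24, 168]            # previous two hours, same hour yesterday, same hour last week
X, target = make_lagged(y, lags)
split = len(target) - 24 * 7      # hold out the final week
w, b = fit_ridge(X[:split], target[:split], alpha=1.0)
pred = X[split:] @ w + b          # one-step-ahead predictions on the held-out week
rmse = np.sqrt(np.mean((pred - target[split:]) ** 2))
```

This only predicts one hour ahead. Predicting the next k hours would need either one such model per horizon or feeding predictions back in as inputs.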

Following the principle of starting really simple, I plan to start by modeling daily totals, rather than hourly data.
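
Going from hourly readings to daily totals could look something like the sketch below. It assumes the series starts at midnight, has exactly 24 readings per day, and marks missing hours as NaN; the function name `daily_totals` and the `max_missing` threshold are made up for illustration. Days with a few gaps are scaled up from the hours that were observed, which is only one of several defensible ways to handle them.

```python
import numpy as np

def daily_totals(hourly, max_missing=0):
    """Sum each block of 24 hourly readings; days with too many gaps become NaN."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // 24                       # drop a trailing partial day
    by_day = hourly[:n_days * 24].reshape(n_days, 24)
    missing = np.isnan(by_day).sum(axis=1)
    observed = np.maximum(24 - missing, 1)           # avoid dividing by zero on empty days
    # Scale the sum of the observed hours up to a full 24-hour day.
    scaled = np.nansum(by_day, axis=1) * 24 / observed
    return np.where(missing <= max_missing, scaled, np.nan)

# Three made-up days at a constant 2.0 units per hour, with one gap on day two.
readings = np.full(72, 2.0)
readings[30] = np.nan

strict = daily_totals(readings)                      # day two is dropped
lenient = daily_totals(readings, max_missing=2)      # day two is scaled from 23 hours
```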
