gusl | the reality of really real data

I'm currently working with some really messy time-series data, about power consumption in office buildings. There's missing data, multiple periodicities (daily, weekly, yearly), freakish outliers (e.g. holidays), and bursty anomalies (summer days generally use less power, except during heat waves, when ACs use *tons* of power). The task is daunting.

There are some things I want to do with the data for which I have no probabilistic interpretation (e.g. filter out certain frequencies).
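One model-free way to do this kind of filtering is a plain FFT notch filter. The sketch below is only an illustration: it assumes evenly spaced hourly samples with no gaps (missing readings would have to be filled in first), and the function name `remove_period` and the synthetic series are made up for the example.

```python
import numpy as np

def remove_period(x, period, width=1):
    """Zero out the FFT bins nearest to one cycle per `period` samples.

    Assumes x is evenly sampled and has no missing values.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0)            # in cycles per sample
    k = int(np.argmin(np.abs(freqs - 1.0 / period)))  # bin closest to the target
    lo, hi = max(k - width, 0), min(k + width, len(spectrum) - 1)
    spectrum[lo:hi + 1] = 0.0                    # notch out the target and its neighbours
    return np.fft.irfft(spectrum, n=n)

# Synthetic example: 4 weeks of hourly data with a daily and a weekly cycle.
t = np.arange(24 * 7 * 4)
daily = np.sin(2 * np.pi * t / 24)
weekly = 0.5 * np.sin(2 * np.pi * t / (24 * 7))
filtered = remove_period(daily + weekly, period=24)  # daily cycle removed
```

On real data the cycles will not line up with FFT bins this neatly, so some leakage into neighbouring bins is expected; `width` is a crude knob for that.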

I've spent the first several days exploring the data, making scatterplots, etc. I've seen some weird patterns, puzzling clusters. Modeling these would entail non-parametric density estimation, but this wouldn't tell me what to do wrt making actual predictions.

I should get some basic predictions working.

But there are so many possible models! Even though I'm only considering past power usage! (I'm not even looking at temperature)

Here are some basic ideas that have been floated:
* model the function using Gaussian Processes (for some kernel(s))
* model [prev n hours, next k hours] as a multivariate Gaussian (maybe this is the same as the above idea)
* autoregressive models, e.g. ridge regression on a subset of past times (including polynomial basis expansion, etc.)
* nearest neighbor (for some geometry(s))
* parameterized functional forms: model variation in daily bumps as a parameterized family of bumps, e.g. height, fatness, tail skewness, etc., using splines
* State-Space Models, a.k.a. continuous-state HMMs (for some family of functions)
* Gaussian Lilypads (I need to read up on this)
* ... and of course, ensembles of the above.
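As a concrete instance of the autoregressive idea in the list above, here is a minimal sketch of ridge regression on a hand-picked subset of past times. The lag choice (1 hour, 1 day, 1 week), the penalty `alpha`, and the function names are all assumptions for illustration, and the series is synthetic.

```python
import numpy as np

def fit_ridge_ar(y, lags, alpha=1.0):
    """Fit y[t] ~ b + sum_j w[j] * y[t - lags[j]] by ridge regression."""
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    # Column j holds y[t - lags[j]] for every target index t >= max_lag.
    X = np.column_stack([y[max_lag - lag: len(y) - lag] for lag in lags])
    target = y[max_lag:]
    # Center the data so that the intercept is not penalized.
    x_mean, t_mean = X.mean(axis=0), target.mean()
    Xc, tc = X - x_mean, target - t_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(len(lags)), Xc.T @ tc)
    b = t_mean - x_mean @ w
    return w, b

def predict_next(y, lags, w, b):
    """One-step-ahead prediction from the end of the series."""
    y = np.asarray(y, dtype=float)
    feats = np.array([y[len(y) - lag] for lag in lags])
    return float(feats @ w + b)

# Synthetic hourly series with a clean daily cycle (4 weeks long).
t = np.arange(24 * 7 * 4)
y = 10 + np.sin(2 * np.pi * t / 24)
lags = [1, 24, 168]
w, b = fit_ridge_ar(y, lags)
pred = predict_next(y, lags, w, b)
true_next = 10 + np.sin(2 * np.pi * len(t) / 24)
```

In this toy series the 24-hour and 168-hour lags are perfectly collinear, which ordinary least squares cannot handle; the ridge penalty simply splits the weight between them.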

Following the principle of starting really simple, I plan to start by modeling daily totals, rather than hourly data.
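Going from hourly readings to daily totals is a small reshape. This sketch assumes the series starts at midnight and has one reading per hour; the function name and the readings are made up.

```python
import numpy as np

def daily_totals(hourly, hours_per_day=24):
    """Sum hourly readings into daily totals.

    A trailing partial day is dropped. A missing reading (NaN) makes that
    day's total NaN, so gaps stay visible instead of being silently undercounted.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day
    full_days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return full_days.sum(axis=1)

# Three complete days of made-up readings plus five hours of a fourth day.
hourly = np.concatenate(
    [np.full(24, 1.0), np.full(24, 2.0), np.full(24, 3.0), np.full(5, 9.0)]
)
totals = daily_totals(hourly)

# The same data with one missing reading on the second day.
with_gap = hourly.copy()
with_gap[30] = np.nan
totals_with_gap = daily_totals(with_gap)
```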


bhudson.livejournal.com

Oh wow, that sounds like fun!

the-locster.livejournal.com

Welcome to the world of Fractional Brownian Motion.

http://www.proba.ucc.ie/~td3/mastersthesis/

It won't help you with prediction, but it should give you a better idea of the characteristics of this type of data.

gustavolacerda.livejournal.com

what makes this Brownian motion "fractional"?

There are many chaotic inputs into power consumption; you mention one: weather. Then you have human dynamics, e.g. everyone turning up for work at 9am and switching stuff (PCs) on at about the same time (FBM concentration). Or they might all be delayed by traffic problems. Or say you have a virus in the computer network: this changes the behaviour pattern of workers, which in turn changes the power usage pattern. Or a memo from head office changes work patterns.

Many of these inputs you can't predict. Probably the best you can do is factor out some known effects, such as the tendency for power consumption to rise and fall in steady periods that correspond with the seasons. I think this is essentially what goes on at the power and gas supply companies: you can make predictions at the large scale for overall usage over a month, a season or a year, and keep coal and gas reserves at optimal levels, trading off the probability of power outages against the cost of holding reserves.

If you can get hold of power consumption plots at differing scales, like the ones in the Dieker paper for computer network usage, you should be able to see some of the main FBM characteristics: self-similarity and long-term dependence.
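A rough numerical counterpart to eyeballing plots at different scales is the aggregated-variance estimate of the Hurst exponent H. The sketch below is not taken from the linked thesis. It assumes a stationary input (e.g. residuals after removing the daily and seasonal cycles), and the function name and block sizes are illustrative. White noise should give H near 0.5; long-range dependence shows up as H above 0.5.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    """Estimate the Hurst exponent H of a stationary series x.

    For each block size m, average x over non-overlapping blocks and take the
    variance of those block means. Under long-range dependence this variance
    scales like m**(2H - 2), so H = 1 + slope / 2 on a log-log fit.
    """
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(means.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)   # fit log-variance against log-m
    return 1.0 + slope / 2.0

# Sanity check on seeded white noise, which has no long-range dependence.
rng = np.random.default_rng(0)
white = rng.standard_normal(2**14)
h = hurst_aggregated_variance(white, block_sizes=[1, 2, 4, 8, 16, 32, 64])
```

Strong periodicities left in the data will distort this estimate, so it only makes sense after the known cycles have been factored out.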

stepleton.livejournal.com

This blog post is the #1 Google hit for "Gaussian lilypads". Remaining hits seem to be Photoshop tutorials or people using lilypads as metaphors for HMMs with discrete states and normal distribution emission models. Is that what you mean by Gaussian lilypads?

PS: Don't forget to add conditional random fields to your toolbox.

are CRFs essentially generalized HMMs in which the state-transition probabilities are not necessarily 1st-order Markovian?

<< metaphors for HMMs with discrete states and normal distribution emission models. >>

I think so. It was Kevin Murphy who mentioned it. Can you give me the link?

<< normal distribution emission models >>

does this mean that each hidden state corresponds to a different Gaussian cloud?

I find it a bit strange that he's suggesting discretizing the state-space, since it is naturally continuous and this only makes it harder to learn. But I guess this isn't so bad since we're in 1D.

Gustavo Lacerda
