in defense of R
Jan. 13th, 2011 02:50 pm

A lot of people in the field of machine learning like to trash R. But as someone who comes from machine learning and who has programmed all his life, in many languages and paradigms, I have to say that R can be pretty pleasant to work with. It's not very fast (supposedly much slower than Matlab on matrix computations, and a lot slower than C++); its commands are a bit quirky at first, and many defaults are annoying (e.g. whitespace is the default separator); there are plenty of imperfections and missing features (e.g. hashes); and there is no serious type system.
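(On the missing hashes: environments can stand in for a hash table in base R. A minimal sketch, using only standard functions:)

```r
## an environment gives key-value lookup, much like a hash table
h <- new.env(hash = TRUE)
assign("alpha", 1, envir = h)              ## store a value under a key
assign("beta", 2, envir = h)
get("alpha", envir = h)                    ## look up a key
exists("gamma", envir = h, inherits = FALSE) ## membership test (FALSE here)
ls(h)                                      ## list all keys
```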
However, I find that R readily accommodates my desire to reinvent the language, which makes me very happy. Functions are first-class objects.
apply and Reduce often spare me from writing looping code. We have eval! Although there is no defmacro, a lot can be accomplished with deparse and substitute (to be honest, I have yet to do any serious macro-ing). In function calls, "all remaining arguments" bind to '...'. The source code is within easy reach, in case you ever wonder how, e.g., plot implements its default axis labels. do-while has a substitute in the form of repeat { ...; if (cond) break }, since repeat is the same as while(TRUE).

---
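(To illustrate a couple of the features above; showName is a made-up example function, not from any library:)

```r
## substitute() captures the unevaluated argument expression;
## deparse() turns that expression into a string
showName <- function(x) deparse(substitute(x))
showName(my.variable)   ## returns "my.variable", not the value

## a do-while loop via repeat: the body always runs at least once
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}
```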
Anyway, I have produced a substantial library for myself, and almost everything I do nowadays depends on it. Since I think this code could be useful for a lot of people (my debugging function, in particular), I should release a package of general-purpose R goodies someday.
Today I'm addressing the annoyance of having to remember parameter values and pass them again and again to the different distribution-specific functions (e.g., in the case of the normal distribution, the set pnorm, qnorm, rnorm, dnorm). This code bundles together the four distribution functions (p, q, r, d) for any given distribution:
## given the parameter values, make the p,q,r,d functions
jPaste <- function(...) paste(..., sep = "") ## stand-in for the author's library helper (assumed to concatenate without a separator)
distributionFuns <- function(family, ...) {  ## '...' contains the parameter values
  fam <- deparse(substitute(family))         ## e.g. distributionFuns(norm, ...) gives "norm"
  pfun <- eval(parse(text = jPaste("p", fam)))
  qfun <- eval(parse(text = jPaste("q", fam)))
  rfun <- eval(parse(text = jPaste("r", fam)))
  dfun <- eval(parse(text = jPaste("d", fam)))
  pFun <- function(x) pfun(x, ...)           ## each closure captures the parameters in '...'
  qFun <- function(x) qfun(x, ...)
  rFun <- function(x) rfun(x, ...)
  dFun <- function(x) dfun(x, ...)
  list(pfun = pFun, qfun = qFun, rfun = rFun, dfun = dFun)
}
distr <- distributionFuns(norm, mean = 10, sd = 2)
distr$qfun(0.5) ## median, a.k.a. 0.5 quantile
distr$pfun(9)   ## cdf at 9
distr$rfun(5)   ## sample 5 times
distr$dfun(10)  ## density at 10
distr <- distributionFuns(beta, shape1 = 2, shape2 = 5)
plot(distr$dfun); abline(h = 0); abline(v = distr$qfun(0.5), lty = 2) ## density, with the median marked
distr$pfun(.2) ## cdf at 0.2
n <- 300; points(distr$rfun(n), rep(0, n), col = "#00000022") ## overlay a sample along the x-axis
distr$pfun(.5) ## cdf at 0.5