gusl: (Default)
One of the annoyances in R is dealing with functions that don't evaluate one or more of the arguments you pass, or that otherwise use the name of the variable rather than its value. The problem appears when you try to write abstract code.

e.g. with(data, ZQ/Total.Z) will compute data$ZQ / data$Total.Z. What 'with' is effectively doing is parsing that expression, figuring out which variable tokens are already present in the current environment, and putting "data$" in front of the rest. Yesterday, in my naivety, I implemented just that (28 easy-to-read lines of R).
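For instance, with made-up data:

data <- data.frame(ZQ = c(90, 100), Total.Z = c(300, 400))
with(data, ZQ/Total.Z)    ## 0.30 0.25
data$ZQ / data$Total.Z    ## same thing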

However, it's hard to do something more abstract, e.g. with(data, property) will try to look up a variable literally named "property", rather than the expression stored in it. To circumvent this, one can make a call to eval:

withExpr <- jPaste("with(x,", property, ")")  ## builds e.g. the string "with(x, ZQ/Total.Z)"
eval(parse(text=withExpr))


I am not happy with this, but there is NO OTHER WAY. I say this confidently because 'with' appears to completely discard the value of the variable passed, using only its name, i.e. internally doing something like:

property <- deparse(substitute(property))
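Seen in isolation (a toy example of my own):

grabName <- function(x) deparse(substitute(x))
grabName(ZQ/Total.Z)   ## "ZQ/Total.Z" -- the argument is never evaluated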

Having to call eval is the price we pay for the convenience of not using quotes.

And, guess what, I take the deal! Yesterday, I wrote 'violinPlot', which is like a 'boxplot' but with kernel density estimates instead of quantiles. The two basic arguments to violinPlot are 'datasets' and 'property': for each dataset, it extracts the property and plots a violin.

l <- list(mon, tue, wed, thu, fri, sat, sun)
violinPlot(l, ZQ/Total.Z, col=c(rep("#AAAAFF",5), rep("orange", 2)), horizontal=FALSE)  ## 'col' partial-matches 'colors'


My code starts with:
violinPlot <- function(datasets, property,
                       labels=c("M", "T", "W", "R", "F", "Sa", "Su"),
                       horizontal=TRUE, colors=NA){
  property <- deparse(substitute(property))  ## capture the unevaluated expression as a string
  colors <- rep(colors, length(datasets)/length(colors)+1)  ## recycle colors over all datasets
  densities <- lapply(datasets, function(x) density(with2(x, property)))
  ...
}

with2 <- function(data, expr, ...)   ## expr is a string, e.g. "ZQ/Total.Z"
  with(data, eval(parse(text=expr)), ...)


You can see above that I also wanted to pass 'property' without quotes. Having essentially reimplemented 'with', I am in a position to modify it so that the syntax becomes with(data, "ZQ/Total.Z"), and spare myself the eval next time... but I don't wanna.

But here's what I might do: instead of with(data, expr), make it with(data, exprLiteral=NULL, exprToEvaluate=NULL), where you pass exactly one of the two expr arguments. The difference: 'exprToEvaluate' gets evaluated into a string (so it had better be a string!), whereas 'exprLiteral' gets turned into a string directly, and corresponds to the current syntax of 'with'. And since 'exprLiteral' comes first (in the second position of the argument list), current calls to 'with' would continue working. Yay, backward-compatibility!
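A minimal sketch of that design (my own function name, so as not to mask base 'with'; untested):

with3 <- function(data, exprLiteral=NULL, exprToEvaluate=NULL) {
  if (is.null(exprToEvaluate))
    exprToEvaluate <- deparse(substitute(exprLiteral))  ## capture the literal, as 'with' does
  eval(parse(text=exprToEvaluate), envir=data)
}

with3(data, ZQ/Total.Z)                   ## current-style call keeps working
with3(data, exprToEvaluate="ZQ/Total.Z")  ## string variant, no substitute tricks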


More pretty graphics.
gusl: (Default)
One thing I really love about R is how I can write improperly scoped code, and everything still works.

gSmooth <- function(x, y, kernelSd=1, kernel=function(z) dnorm(z, mean=x[i], sd=kernelSd)){
  ## the default 'kernel' refers to the loop index 'i' below; this works because
  ## default arguments are evaluated lazily, inside this function's own frame
  v <- c()
  for (i in seq_len(length(x))){
    weights <- sapply(x, kernel)          ## weight of every point relative to x[i]
    v[i] <- sum(weights*y)/sum(weights)   ## kernel-weighted average
  }
  list(x=x, y=v)
}

n <- length(data$ZQ)  ## assumed: n is the series length (it was undefined here)
plot(data$ZQ, type="l", ylim=c(0,130))
ss <- gSmooth(1:n, data$ZQ)
pplot(ss$x, ss$y, type="l", col="red")   ## pplot: my helper for overlaying on an existing plot (not shown)
ss <- gSmooth(1:n, data$ZQ, kernelSd=3)
pplot(ss$x, ss$y, type="l", col="blue")




----

This is much cleaner:
gSmooth <- function(x, y, kernel=gaussKernel){
  v <- c()
  for (i in seq_len(length(x))){
    center <- x[i]
    weights <- sapply(x, function(z) kernel(z, center))
    v[i] <- sum(weights*y)/sum(weights)
  }
  list(x=x, y=v)
}

gaussKernel <- function(z, center) dnorm(z, mean=center, sd=kernelSd)  ## kernelSd is a global, set below

emaKernel <- function(z, center)   ## Exponential Moving Average: one-sided, backward-looking
  if (z <= center) exp((z - center)/kernelSd) else 0

plot(data$dayNumber, data$ZQ, type="p")
pplot(data$dayNumber, data$ZQ, type="l")


kernelSd <- 3
ss <- gSmooth(data$dayNumber,data$ZQ)
pplot(ss$x,ss$y, type="l", col="red")

kernelSd <- 3
ss <- gSmooth(data$dayNumber,data$ZQ, kernel=emaKernel)
pplot(ss$x,ss$y, type="l", col="blue")



Note how the Exponential Moving Average (in blue) is backward-looking, and less smooth than the Gaussian one (in red), even though these kernels, when viewed as distributions, have the same standard deviation of 3.

I think that this is in part due to the Exponential kernel not being as smooth as the Gaussian one, but I also suspect that it weights the points less evenly.
gusl: (Default)
A lot of people in the field of machine learning like to trash R. But as someone who comes from machine learning and who has programmed all his life, in many languages and paradigms, I have to say that R can be pretty pleasant to work with. It's not very fast (supposedly much slower than Matlab on matrix computations, and a lot slower than C++); its commands are a bit quirky at first, and many defaults are annoying (e.g. whitespace as the default separator in read.table); and there are plenty of imperfections and missing features (e.g. hashes). And there is no serious type system.

However, I find that R readily accommodates my desire to reinvent the language, which makes me very happy. Functions are first-class objects. apply and Reduce often spare me from writing looping code. We have eval! Although there is no defmacro, a lot can be accomplished with deparse and substitute (to be honest, I have yet to do any serious macro-ing). In function calls, "all remaining arguments" bind to '...'. The source code is within easy reach, in case you ever wonder how, e.g., plot implements its default axis labels. And do-while has a substitute in the form of repeat with a conditional break (repeat being the same as while(TRUE)).
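A few of these in action (toy lines of my own):

Reduce(`+`, 1:10)                               ## 55, no explicit loop
compose <- function(f, g) function(x) f(g(x))   ## functions as first-class values
inc <- function(x) x + 1
compose(inc, inc)(0)                            ## 2
i <- 0
repeat { i <- i + 1; if (i >= 3) break }        ## do-while via repeat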

---

Anyway, I have produced a substantial library for myself, and almost everything I do nowadays depends on it. Since I think this code could be useful for a lot of people (my debugging function, in particular), I should release a package of general-purpose R goodies someday.

Today I'm addressing the annoyance of having to remember parameter values and pass them again and again to the different distribution-specific functions (e.g., in the case of the normal distribution, the set pnorm, qnorm, rnorm, dnorm). This code bundles together the 4 distribution functions for any given distribution:
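The code itself was behind the cut; a minimal sketch of the idea, with my own names:

makeDist <- function(name, ...) {
  params <- list(...)                      ## remembered once, e.g. mean=2, sd=3
  bind1 <- function(prefix) {
    f <- get(paste(prefix, name, sep=""))  ## e.g. "dnorm", "pnorm", "qnorm", "rnorm"
    function(x) do.call(f, c(list(x), params))
  }
  list(d=bind1("d"), p=bind1("p"), q=bind1("q"), r=bind1("r"))
}

norm23 <- makeDist("norm", mean=2, sd=3)
norm23$d(2)  ## density at 2; no need to repeat the parameters
norm23$r(5)  ## 5 random draws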
gusl: (Default)
inspect <- function(stuff){
  q <- as.character(match.call()[2])   ## the caller's expression, as a string
  if (length(stuff)==1 && !is.data.frame(stuff)) ## not a proper vector / matrix / complex object
    jCat(q, " = ", stuff)              ## jCat: my cat-like printing helper
  else {                               ## braces keep print() inside the else-branch
    jCat(q, " = ")
    print(stuff)
  }
}

> inspect(nv)  ## calls jCat to print everything in one line
nv = 14

> inspect(ranking)
ranking = 
 [1] "A1~C4" "C5~C6" "B1~C2" "A1~A3" "A2~A3" "B1~B4" "C3~C6" "C2~C5" "A1~C2"
[10] "B3~C6" "B1~C6"

> inspect(nv+5)
nv + 5 = 19


The obvious drawback is the need to write quotes around the expression, so I'm wondering if we can define macros in R in the style of Common Lisp's defmacro, i.e. something that will stop evaluation. Thanks serapio!

R

May. 1st, 2010 03:00 pm
gusl: (Default)
One thing that drives me crazy about R is that problems sometimes go away spontaneously, and I don't feel like I learned anything. Part of this is due to working in a REPL, part mystery.
gusl: (Default)
One programming annoyance when writing loops is wanting to access each element of a list without passing the index, i.e. with code like for (el in list), while simultaneously wanting to know the index without writing i=0 and i++.

I'm not aware of any solutions to this in current usage.
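The obvious compromise is to loop over indices and bind the element yourself, which is exactly the boilerplate I want to avoid:

for (i in seq_along(l)) {
  el <- l[[i]]
  ## ... use both i and el here ...
}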
gusl: (Default)
doPlots <- TRUE; doPdflatex <- TRUE ## default settings
args <- commandArgs(TRUE)
if (length(args) > 0) {
  eval(parse(text=args[1]))  ## run the first command-line argument as R code
}

The idea is that, since we're using R's reflection mechanism, we can change the variables from the command line:
Rscript fix-multiple.R "doPlots=FALSE; doPdflatex=FALSE"

(You could also theoretically pass code that actually does stuff, but that would be strange.)
gusl: (Default)
I have one big piece of R code for my research (distributed in several files that source each other). I'm currently deciding how many pieces to break it into.


Advantages of breaking into a lot of pieces (big shell script):

* if there is an error halfway down the program, at least some data has been recorded and it's straightforward to continue from there (BUT this can be done from R too...)

* it may be easier to reproduce results and debug things, without needing to control the random seed / other potential sources of variability (BUT this can be done from R too...)

* frequent garbage collection (but is this really a concern? probably not!)


Advantages of keeping it all in one R process:

* if I run programs from the shell, the same R libraries have to be loaded again and again.

* if I ever use a real IDE (e.g. Eclipse), it might follow function calls to function definitions.



I'm tempted to just write an "R script" that looks a lot like a shell script... maybe call forgetEverything() every other line, and have each called function remember what it needs to remember, for the sake of showing that the program is not cheating. (Again, is this a real concern?)

forgetEverything is rm(list=ls()).
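As an actual function, it has to target the global environment explicitly, since rm and ls default to the function's own frame:

forgetEverything <- function()
  rm(list=ls(envir=globalenv()), envir=globalenv())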
gusl: (Default)
In R, you loop over your list l as follows

for (i in 1:length(l))

But if length(l) is zero, you still visit the loop body twice, since 1:0 is c(1, 0)... I do not know an elegant solution to this.

Another one has to do with operator precedence: 1:k+3 gets parsed as (1:k)+3 rather than 1:(k+3).
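Both gotchas, concretely (my lines; for what it's worth, seq_along avoids the empty case):

l <- list()
for (i in 1:length(l)) print(i)   ## prints 1, then 0 -- 1:0 counts down
for (i in seq_along(l)) print(i)  ## prints nothing

k <- 2
1:k+3    ## 4 5
1:(k+3)  ## 1 2 3 4 5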
gusl: (Default)
You'd like to write code like:

do {
  A
} while(test)


but your language does not offer do-while (a.k.a. do-loop). What do you do?

(a)
A
while(test){
  A
}

(b)
while(TRUE){
  A
  if (!test) break
}

Other?
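In R, at least, repeat spells (b) directly, with no artificial while(TRUE):

repeat {
  A
  if (!test) break
}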
gusl: (Default)
Imagine you are to write a certain function 'f'. The purpose of calling it is to do X for every element of a list (and optionally Y; this is controlled by a flag).

Which of the following is better?

(a)
function f()
  for 1:1000
    do X
    if(flag)
      do Y


(b)
function f()
  for 1:1000
    do X
  if(flag)
    for 1:1000
      do Y


In all languages I know, (a) is more elegant but (b) is more efficient.

In Lisp, another alternative exists: you can write a macro to do a partial evaluation given the value of 'flag'. This is as efficient as (b), but whether it's elegant is debatable. Personally I've never seen an IDE that makes code-rewriting macros nice to work with.
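In R, a cheap approximation to that partial evaluation is to choose the loop body once, outside the loop (a sketch; doX, doY, flag, and elements are placeholders):

step <- if (flag) function(e) { doX(e); doY(e) } else function(e) doX(e)
for (e in elements) step(e)   ## 'flag' is tested once, as in (b)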
gusl: (Default)

## `*` is the hyper of `+`, `^` is the hyper of `*`
> hyper <- function(fn) function(a,b) Reduce(fn, rep(a,b))

> compose <- function(fn1,fn2) function(x) fn1(fn2(x))

> hyperoperation <- function(n) Reduce(compose,listRep(hyper,n))(`+`)


('rep(obj,n)' and 'listRep(obj,n)' just return a list containing 'obj' n times. I had to invent 'listRep' for technical reasons: passing closures to 'rep' returns the error "object of type 'closure' is not subsettable".)
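For reference, listRep can be as simple as this, and then the tower checks out (my lines):

listRep <- function(obj, n) lapply(seq_len(n), function(i) obj)

hyperoperation(1)(2, 3)  ## 6 = 2+2+2, i.e. 2*3
hyperoperation(2)(2, 3)  ## 8 = 2*2*2, i.e. 2^3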

gusl: (Default)
I'm pleased about the Tetrad code being in Java 5, which really is a slightly different language. Java 5 has two nice features:

* nicer 'for' loops (++), such as:
for (int sampleSize : new int[]{1000, 10000}){ ... }


* Parametric types, which avoid all that unnecessary casting that Java is notorious for (+):
List<Node> nodes = dag.getNodes();
Node node0 = nodes.get(0);


Combined, we can now do:
for (Node node : dag.getNodes()) { ... }

instead of
for (int i=0; i < dag.getNodes().size(); i++) {
    Node node = dag.getNodes().get(i);
    ...
}


(although this translation would probably break if you remove an element from the list inside this loop without decrementing i)

-----------------

Sometimes, after your program has finished execution, you notice that some object (e.g. a graph) had a really interesting property... or, if the program is non-deterministic and you notice a bug, you may want to re-run the same example for debugging purposes (or restore the entire state of the program, if that were possible).

Unfortunately, the object is now gone, and all you have is its footprints. By the time you think of putting Object.serialize() in the code, it's too late: you have to pray that you'll run into that situation again.

What do you do?
gusl: (Default)
Given classes A and B,

A extends B
means that instances of A can invoke methods implemented in B.

A implements B
means that B is an interface that A implements: the methods are declared in B but implemented in A.

In this sense, "extends" and "implements" are opposites.

But I often think of them as being the same, because with the class names normally used, both of them can be read as "ISA". This is confusing.

I think the solution is to consciously distinguish "is a subtype of" from "is an implementation of".

---

Multiple inheritance is a bit tricky. I don't understand how all these things fit together, but from here I learned that:

* You need to create a new class Child, to be the result of the mix between Parent (first parent) and Other (second parent).
* Child extends Parent and implements 2 interfaces, MProvides and MRequires.
* There is a class called Mixin, used for the purpose of communicating with Other, which implements MProvides, and whose constructor requires an instance of MRequires. This instance simulates a parent-class for Mixin.
* The Child's constructor instantiates a Mixin, passing itself ("this") as an argument. For each method func(...) that you inherit from Other, you need to implement a method that delegates to mixin.func(...).

I don't yet understand whether this last annoyance is circumvented with "General Multiple Inheritance".
gusl: (Default)
I don't like most programming books, tutorials and reference manuals. They tend to be too low-level, walking you through too many obvious steps, and they don't help you find the RightThing for your sort of problem.

I think I much prefer books that are goal-oriented, like:

* How to Solve Problems with X
* Patterns in X

They are better for someone who needs to write code, rather than someone who needs to read it.

With these books, it should be much easier to find the answers to my questions, partly because they tend to *have* the answers, partly because they have less boring, distracting junk.

Right now, I want to find a good book about How to Solve Problems with Java.
gusl: (Default)
from here
Athena is a programming language and an interactive theorem proving environment rolled in one.


Can anyone explain this to me?

--

I occasionally write a function multiple times, in parallel. I wish my language could somehow prove that they are equivalent. What would be the point?, you may ask...

Well, you know how they say programs are meant to be read, and only occasionally executed?

Likewise, the goal of a piece of code may be to make a reader understand something.

As any educator knows, redundancy makes learning more robust: the more ways you present something, the better, especially if you can make the student understand why the two presentations are equivalent.
gusl: (Default)
Is it possible to write a function that does something only after its caller is finished?

I am writing a function that creates a temporary file, writes some stuff to it, and returns the filename. The expression where this is called gets the returned value and uses the file, but nobody uses it after that. The file should therefore be deleted, but by whom?

If I delete it inside the function that creates it, then the file can't be used by the expression that called it.

I can just delete the file in the next S-expression, right after the expression that calls the function, but that seems like a violation of modularity, since the creation and deletion are not in the same place (not to mention that this requires me to save the returned value, introducing a let).

It would be good if my function could stay in the background after returning, and somehow know when to delete the file. Of course, I would need to tell it when: however, something like "two levels up the call stack" sounds like a non-robust solution. Maybe you could have a scope for temporary files (i.e. a let): as soon as you fall out of that scope, all temporary files created inside get deleted.

Maybe there is a way of creating temporary files so that they get cleaned up by a garbage collector as soon as the program ends.
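In R, for what it's worth, both ideas exist: tempfile() creates names under a per-session directory that goes away when the session ends, and on.exit gives the scoped version. A sketch, with my own wrapper name:

withTempFile <- function(f) {
  path <- tempfile()
  on.exit(unlink(path))   ## runs when this scope exits, however it exits
  f(path)                 ## the caller-supplied code uses the file here
}

result <- withTempFile(function(path) {
  writeLines("some stuff", path)
  readLines(path)
})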

--

This reminds me of an old debate, about a solution that seemed much easier by using GOTO. It was probably a reaction to "GOTO Considered Harmful".
gusl: (Default)
I really really need some better debugging tools for Lisp/Slime.

Instead of having to back up working code all the time, I should be able to change one or two things, and align the two run-time call-trees to see what's different.

I am often asking "What part of my code will react negatively to this change?", and tracing this through testing is very time-consuming. trace gives me way too much to look at. I wonder if diff'ing the two traces would help.

I'd also like a stepper with variable watch. Right now, I'm doing prints that get commented in and out.

Also, my code needs to be better designed. Implementing what should be a trivial change is proving to be a lot of work. Much of this is legacy code from the 2-month-old Lisper... I was so stupid back then... I feel much more confident as an 8-month-old Lisper. Although the design also needed more thought, period... much of the code was evolved "genetically", with my tests playing the role of natural selection.

---

Also, I'd like to publicly bitch about emacs not letting me paste text from other windows (even if they're emacs windows).
