[personal profile] gusl
LJ Archive has a Word Count Analyzer tool. With "Ignore common words" checked, these are my most frequent words of 4 letters or more:

theory 611
language 563
interesting 544
today 524
maybe 494
learning 486
problem 484
science 407
system 388
logic 386
information 347
using 344
data 342
ideas 331
someone 326


(I filtered out "http", which appeared 534 times.)
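The post doesn't say how the analyzer works internally, so here is only a minimal Python sketch of this kind of count. It assumes a simple letters-only tokenizer, case folding, and a hypothetical `COMMON_WORDS` stop list standing in for the "Ignore common words" option; the `exclude` parameter mirrors filtering out "http" by hand.

```python
import re
from collections import Counter

# Hypothetical stop list: LJ Archive's real "common words" list is not known here.
COMMON_WORDS = {"that", "this", "with", "have", "from", "what", "just", "about",
                "would", "there", "they", "which", "will", "more", "some", "like",
                "been", "their", "when", "were"}

def word_counts(text, min_len=4, ignore=COMMON_WORDS, exclude=()):
    """Count lowercased words of at least min_len letters, skipping stop words."""
    # Simplification: only runs of a-z count as words, so "doesn't" splits in two.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(
        w for w in words
        if len(w) >= min_len and w not in ignore and w not in exclude
    )

posts = ["Today I read an interesting paper on logic and learning theory.",
         "Maybe the theory of language learning is interesting: http://example.com"]
counts = word_counts(" ".join(posts), exclude={"http"})
print(counts.most_common(3))
```

Calling `word_counts(text, min_len=15)` gives the long-word variant below.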

I wonder if LJ Archive would be open to researchers submitting plug-ins for data-collection purposes.

And, for fun, the most frequent long words (15 letters or more):
representations 60
interpretations 20
recommendations 20
interdisciplinary 19
computationally 15
transformations 13
straightforward 13
counterfactuals 11


--

Annoyingly, LJ Archive doesn't do case-sensitive searches.

(no subject)

Date: 2009-03-20 02:59 pm (UTC)
From: [identity profile] bhudson.livejournal.com
More importantly, what are your eigenwords?

(no subject)

Date: 2009-03-20 09:49 pm (UTC)
From: [identity profile] bhudson.livejournal.com
I just made up the term, and I leave the precise definition up to the reader. But tossing out the short words and then listing the words you use most often is very coarse. It seems like you should be able to take a couple of corpora, hit them with statistics and machine learning, and get the words that most identify you, as opposed to someone else. A few years ago we did this with the zephyr archives at CMU; the results were amusing.
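The comment doesn't name a method, and nothing here says what was done with the zephyr archives. One standard way to get "the words that most identify you" relative to a reference corpus is a z-scored log-odds ratio, shown below as a simplified form of the Dirichlet-prior approach of Monroe, Colaresi and Quinn (2008). As an assumption for brevity, every word gets the same pseudo-count `alpha`; the function name and the toy corpora are hypothetical.

```python
import math
from collections import Counter

def distinctive_words(mine, reference, alpha=0.5, top=5):
    """Rank words by the z-scored log-odds of appearing in `mine` vs `reference`.

    Both arguments are Counters of word frequencies. Each word gets the same
    pseudo-count `alpha` (a uniform Dirichlet prior) so zero counts stay finite.
    Assumes the combined vocabulary has at least two words.
    """
    vocab = set(mine) | set(reference)
    a0 = alpha * len(vocab)
    n_mine, n_ref = sum(mine.values()), sum(reference.values())
    scores = {}
    for w in vocab:
        y1, y2 = mine[w] + alpha, reference[w] + alpha
        # Difference of smoothed log-odds between the two corpora.
        delta = (math.log(y1 / (n_mine + a0 - y1))
                 - math.log(y2 / (n_ref + a0 - y2)))
        # Approximate variance of delta; dividing by its root gives a z-score.
        variance = 1 / y1 + 1 / y2
        scores[w] = delta / math.sqrt(variance)
    return sorted(scores, key=scores.get, reverse=True)[:top]

mine = Counter("theory logic theory learning theory language logic".split())
ref = Counter("game movie theory party movie weekend game".split())
print(distinctive_words(mine, ref, top=2))
```

Here "logic" ranks above "theory" even though "theory" is more frequent in `mine`, because "theory" also shows up in the reference corpus. As the reply points out, the result depends on which reference class you compare against.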

(no subject)

Date: 2009-03-20 09:58 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
I agree, of course.
But this would be relative to a reference class... and we'd need to collect all that data, etc.
