gusl: (Default)
[personal profile] gusl
A question for systems people:

If a plain-text log file gets too big for emacs to open (in my case 379MB), how can I search through it? Is it possible to chop such a file into several pieces?

grep works, but it doesn't let me see the context surrounding my hits.

UPDATE: grep -An -Bn pattern file shows n lines after (-A) and before (-B) each hit. Still, not as good as a text editor...
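A runnable sketch of those context flags, with a made-up file name and contents:

```shell
# Stand-in for the real log (hypothetical contents)
printf 'one\ntwo\nERROR here\nfour\nfive\n' > sample.log

# Two lines of context before (-B) and after (-A) each hit
grep -A 2 -B 2 'ERROR' sample.log
```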

(no subject)

Date: 2007-10-18 04:50 pm (UTC)
From: [identity profile] jcreed.livejournal.com
use "less"? Hit the slash key to search.

(no subject)

Date: 2007-10-18 04:55 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Thanks! That is helpful.

(no subject)

Date: 2007-10-18 05:05 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
If 'less' can handle it, why won't emacs? Why can't emacs do it like 'less'?

(no subject)

Date: 2007-10-18 06:09 pm (UTC)
From: [identity profile] jcreed.livejournal.com
This is a fair question, and I don't really know the answer.

(no subject)

Date: 2007-10-18 07:33 pm (UTC)
From: [identity profile] cdtwigg.livejournal.com
You would hope that emacs would be smart enough to page in parts of the file at a time, but it's entirely possible it isn't. I know for sure that less holds only the current chunk of the file in memory and uses an indexing scheme to map between line numbers and disk byte offsets (if you open a big enough file, it will actually stall while building this index unless you tell it otherwise).

Some programs also have the ext2 file-size limit built in, so even if you're accessing files on filesystems without this limit, they can't handle them. Apache, for example, refused to serve files bigger than 2GB until very recently (like, within the last year or so, if I remember correctly).

(no subject)

Date: 2007-10-18 04:58 pm (UTC)
From: [identity profile] avocado-tom.livejournal.com
wc -l <filename.txt>
head --lines=<half the number of lines returned by wc> <filename.txt> > <file_1.txt>
tail --lines=<remaining number of lines> <filename.txt> > <file_2.txt>

(no subject)

Date: 2007-10-18 05:17 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Thanks! Maybe I should make a script to split a file into several chunks, given a chunk size.

Is there a command similar to 'head' and 'tail', but where you can specify a range in the middle?
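For the chunk-splitting part, coreutils already ships a tool for exactly this: split. A minimal sketch with a stand-in file and a made-up chunk size:

```shell
# Stand-in for the real log (hypothetical name and contents)
printf 'a\nb\nc\nd\ne\n' > big.log

# Cut into pieces of 2 lines each, named big_aa, big_ab, big_ac
split --lines=2 big.log big_
```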

(no subject)

Date: 2007-10-18 05:39 pm (UTC)
From: [identity profile] avocado-tom.livejournal.com
Not that I know of. You can actually just use...

head --lines=<X> <file.txt> | tail --lines=<X> > <file_middle.txt>

But you're starting to get into the realm where writing a quick perl script would probably be easier / more functional. If my perl weren't super rusty, I'd offer to do it, but... it is, and I'm crazy busy. :-)

(no subject)

Date: 2007-10-18 05:40 pm (UTC)
From: [identity profile] avocado-tom.livejournal.com
oh, that "lines" arg for tail should be Y, where Y < X
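Putting those two comments together: to get lines A through B, take the first B lines, then the last B-A+1 of those. A runnable sketch with made-up numbers:

```shell
# Ten numbered lines as a stand-in file
printf '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n' > file.txt

# Lines 4 through 7: take the first 7 lines, then the last 4 of those
head --lines=7 file.txt | tail --lines=4 > file_middle.txt
```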

(no subject)

Date: 2007-10-18 07:33 pm (UTC)
From: [identity profile] inferno0069.livejournal.com
Here, sed would be less overpowered than perl and maybe faster: sed -n '{start},{end}p' < input > output, where {start} and {end} are line numbers and both are included. You could also do things like sed -n '/^Oct 18/p' to extract lines that start with "Oct 18", which could be useful if your logfile has a format like that of my /var/log/messages.
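Both sed forms as a runnable sketch, with stand-in file names and contents:

```shell
# Stand-in input (hypothetical contents)
printf 'a\nb\nc\nd\ne\n' > input.log

# Print lines 2 through 4, inclusive
sed -n '2,4p' input.log > middle.log

# Print only lines starting with "Oct 18"
printf 'Oct 18 boot\nOct 19 halt\n' > messages.log
sed -n '/^Oct 18/p' messages.log
```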

(no subject)

Date: 2007-10-18 05:39 pm (UTC)
From: [identity profile] gwillen.livejournal.com
I recommend vim... it surprises me that Emacs will die on a large file, but I know that vim will not.

(no subject)

Date: 2007-10-18 06:45 pm (UTC)
ikeepaleopard: (Default)
From: [personal profile] ikeepaleopard
In my experience, vim dies on long lines but not on large files. It probably has something to do with the data structures it builds up to make moving around fast.

(no subject)

Date: 2007-10-18 07:05 pm (UTC)
From: [identity profile] gwillen.livejournal.com
Yeah, vim's data structures are definitely line-oriented.

Now, if we've learned anything from the sad tale of Endo the alien, it's that _ropes_ are the right data structure for a text editor.... :-D

(no subject)

Date: 2007-10-18 07:11 pm (UTC)
From: [personal profile] chrisamaphone
:D

(no subject)

Date: 2007-10-18 08:03 pm (UTC)
gregh1983: (Default)
From: [personal profile] gregh1983
Wow... is the English lexicon really starting to look that much like German?

(no subject)

Date: 2007-10-18 08:18 pm (UTC)
ikeepaleopard: (Default)
From: [personal profile] ikeepaleopard
I know Word uses some weird out-of-order data structure, which (I'm less sure about this) has to be defragmented periodically.

(no subject)

Date: 2007-10-18 05:40 pm (UTC)
From: [identity profile] gwillen.livejournal.com
Also, if you're willing to split the file up by byte ranges instead of lines:

dd if=oldfilename of=filepartN bs=1024 count=K skip=J

will copy K kilobytes, starting at kilobyte J, from oldfilename into filepartN.
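A worked sketch with made-up sizes (a 4-kilobyte stand-in file, copying its second kilobyte, i.e. K=1, J=1):

```shell
# Build a 4-kilobyte stand-in file (hypothetical name)
dd if=/dev/zero of=old.log bs=1024 count=4 2>/dev/null

# Copy 1 kilobyte starting at the 1-kilobyte mark
dd if=old.log of=filepart2 bs=1024 count=1 skip=1 2>/dev/null
```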

(no subject)

Date: 2007-10-18 07:10 pm (UTC)
From: [identity profile] dachte.livejournal.com
This may, however, be somewhat suboptimal for whatever poor line (likely) ends up being chopped into parts. It's probably worth the cost in most cases though :)

(no subject)

Date: 2007-10-18 07:12 pm (UTC)
From: [identity profile] darius.livejournal.com
When that comes up I use an emacs workalike I wrote myself, http://www.accesscom.com/~darius/hacks/alph.tar.gz

Not as fast as grep or anything like as featureful as emacs, but it doesn't choke on big files or long lines.

(no subject)

Date: 2007-10-18 07:29 pm (UTC)
From: [identity profile] darius.livejournal.com
Oh, also you can say just grep -5 instead of grep -A5 -B5. I'd never even heard of -A and -B before.

(no subject)

Date: 2007-10-18 08:38 pm (UTC)
From: [identity profile] gwillen.livejournal.com
That's interesting... I thought -C5 would be what you're saying -5 is. The -A/-B/-C options are definitely specific to GNU grep...

(no subject)

Date: 2007-10-18 08:42 pm (UTC)
From: [identity profile] darius.livejournal.com
You're probably right; I didn't look up the docs on -A or -B, or -C for that matter. Just tried them out after Gustavo's note.
