bleg: web mirroring tool
Feb. 16th, 2007 12:01 pm
My current project is to annotate web pages. Since these pages could change, go down, etc., I need to make a static mirror.
I have used WebSuck+WebGet, which mirror the HTML pages they find. I imagine this worked great 10 years ago, before the era of dynamically generated web content.
It has a few problems:
* if it visits a URL that ends in "/" (i.e. one served from index.html or similar), it doesn't know to save the file as index.html.
* if it visits a dynamically generated page, it doesn't save the content as an HTML file. If I wanted to keep PHP pages as PHP, I would need some way to set up a server, etc., which is a bad idea. The ideal solution is to rename the saved PHP output (what gets saved is already static) and fix the links.
* it doesn't fix the links to point to content in the mirror. This shouldn't be too hard to do with a search-and-replace script.
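The three fixes above could be sketched roughly as follows in Python. This is only an illustration under assumptions, not something WebSuck or WebGet provides: the names local_name, rewrite_links, and DYNAMIC_EXTS, the file-naming scheme, and the example.com URLs are all made up for the example, and the regex-based link matching is a shortcut that a real HTML parser would handle more robustly.

```python
import posixpath
import re
from urllib.parse import urljoin, urlsplit

# Assumed list of extensions whose saved output is really HTML.
DYNAMIC_EXTS = {".php", ".asp", ".aspx", ".jsp", ".cgi"}

def local_name(url):
    """Map a URL to a relative file path inside the mirror (hypothetical scheme)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URL: save as index.html
    root, ext = posixpath.splitext(path)
    if parts.query:
        # Fold the query string into the name so ?id=1 and ?id=2 stay distinct.
        root += "_" + re.sub(r"[^A-Za-z0-9]+", "_", parts.query).strip("_")
    if ext.lower() in DYNAMIC_EXTS or ext == "":
        ext = ".html"  # saved output of a dynamic page is plain HTML
    return parts.netloc + root + ext

# Matches href="..." and src="..." attributes (a simplification of real HTML).
LINK_RE = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)

def rewrite_links(html, page_url, mirrored):
    """Point links at mirror files when the target URL is in the set `mirrored`."""
    here = posixpath.dirname(local_name(page_url))

    def fix(match):
        target, _, fragment = urljoin(page_url, match.group("url")).partition("#")
        if target not in mirrored:
            return match.group(0)  # leave links to unmirrored pages alone
        rel = posixpath.relpath(local_name(target), here)
        if fragment:
            rel += "#" + fragment
        return match.group("attr") + match.group("q") + rel + match.group("q")

    return LINK_RE.sub(fix, html)
```

Because local_name is a pure function of the URL, the same page gets the same file name on every run, and rewrite_links only touches links whose targets were actually saved.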
Any ideas?
(no subject)
Date: 2007-02-16 05:45 pm (UTC)
You might be underestimating the difficulty of properly renaming the PHP for a directory that may have a mix of PHP and HTML files and may be used over many separate runs.
(no subject)
Date: 2007-02-16 05:52 pm (UTC)
Just keep in mind that, as Pat mentioned, what you're getting is HTML and not the PHP script (since PHP is interpreted server-side). What you'd need is to get your hands on the .phps version of the script to actually download it.
(no subject)
Date: 2007-02-16 05:53 pm (UTC)