Sure, and the number will likely be larger when converted to static HTML anyway.

Google Cloud Storage is $0.02/GB per month for standard storage and $0.01/GB per month for “infrequent access” storage. Probably the cheapest way.

I sent an email with a link to this page to the ucsb create lab,
we’ll see…
i know it’s kind of a different solution, and maybe for sometime in the future
I picture plugging a hard drive in at the Stanford library and calling it something
:slightly_smiling_face:

My guy at the Archive says the community team there would take probably just “a couple of hours” to do the work. I’ve asked him how to initiate the process. Note the “robots.txt” info in his comment:

>The result would be the entire site archive will be permanently preserved in the Wayback Machine, which is also in the process of gaining search features. This will mean in 10+ years, people searching for topics discussed in threads there will be able to find them during one-stop-shopping browsing of Wayback holdings. Archive Team and archive.org would capture and preserve the content for free, of course. (A caveat to all this is that the robots.txt file at the site archive must allow robots…)

robots.txt is clear:

http://archive.monome.org/robots.txt

Cool. Thanks. He’s checking in on the Archive-community process.

So archive.org will pick up the images and zip files as well? That would be so cool.

That’s my understanding. I’ll confirm as I discuss further with them. Waiting on a reply now.

curious if anything came of this. i don’t see any new archive.org refreshes on the old forum since march-- and there is certainly not a complete archive of the threads.

did you get any response? thanks!

No response since I last checked in with him, but he’s got two little kids. I’ll shoot him a note tomorrow. I also may see him Friday.

well, i missed that thread when it happened, but i just wanted to underline that “old” forums, especially those focused on a specialty topic like the former monome/community, should never be deleted.
There is an invaluable amount of information and knowledge in there, contributed by many people. In fact, Google search often directs me there, and more than once it has dug up some “ta-dah”-inducing info.

I have in mind two forums that revolved around the broadcast industry and that disappeared into a black hole a few years ago. Those forums gathered total noobs, amateurs, seasoned professionals, and engineers from world-class manufacturers and media networks. They were cluttered, but well indexed, and the mountains of knowledge in there helped me become a better professional in my field. Reference books are one thing, but the shared experience, the tiny details, are what really make a difference.
When a forum goes away, the willingness to contribute of the people who invested time in sharing goes with it, along with the possibility for newcomers to better grasp a field and start on the shoulders of those who came before.

As for how to archive, given the context i would think that converting the whole thing into static files is the simplest and most future-proof way, as it “only” requires maintaining a simple file server. No idea how much space it would take, though. (I just launched an httrack process, so i guess i will know in a few hours/days.)
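
For reference, a basic httrack invocation for this kind of mirror looks roughly like the sketch below (generic placeholders for the output folder and filter, not necessarily the exact options used for this run):

```
# Generic httrack mirror sketch -- placeholder paths/filters, not the exact run:
#   -O                          output folder for the mirror
#   "+*archive.monome.org/*"    filter: stay within the archive subdomain
#   -v                          verbose progress
httrack "http://archive.monome.org/" -O ./monome-archive "+*archive.monome.org/*" -v
```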

please share your experiences with this! i absolutely want to preserve the forum-- it’s just very expensive per month in its current state, given that it already “acts” like a static web page-- the search in the forum software is so broken as to be useless. i’ll investigate httrack

I posted instructions for archiving to flat files using wget above. httrack is new to me, but if it works, great.
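
(For anyone who missed that earlier post, the usual wget recipe for a flat-file mirror is roughly the following; these are standard wget flags, not necessarily the exact command that was posted:)

```
# Standard wget flat-file mirror (a sketch, not the exact command posted earlier):
#   --mirror            recursive download with timestamping
#   --convert-links     rewrite links so the local copy browses offline
#   --adjust-extension  save pages with .html extensions
#   --page-requisites   also grab the images/CSS needed to render each page
#   --no-parent         don't wander above the starting directory
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent \
     http://archive.monome.org/
```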

Just chiming in to agree that flat files is the way.

httrack smoking two servers at once, this will be running for a very long time…

thanks, this is good work

oh, sorry i didn’t report back earlier on this.
The httrack session i ran took a few days to complete, and then i didn’t find time to check whether it looked alright. I had to restart it once to override the default limit on the maximum number of links it follows.

The resulting folder weighs around 13GB. It sits on a dedicated server for now.
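
For anyone repeating this: the default cap that needed overriding is httrack’s maximum-links limit, which can be raised with the -#L option, roughly as below (flag recalled from the httrack manual; worth double-checking against `man httrack` for your version):

```
# Same mirror, with a much higher ceiling on the number of links httrack
# will follow (-#L); the default is too low for a large forum.
httrack "http://archive.monome.org/" -O ./monome-archive "+*archive.monome.org/*" -v -#L10000000
```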

update.

files moved and hosted on the new monome server. old server shut down and saving $$ now.

need some help tuning the apache mod_rewrite.

httrack turns everything into .html files, so old google searches are now busted

ie

http://archive.monome.org/community/discussion/17189/want-to-buy-a-monome-512/p1

is actually now:

http://archive.monome.org/community/discussion/17189/want-to-buy-a-monome-512/p1.html

this one is easy. however:

http://archive.monome.org/community/discussion/16913/raga-music#Item_17

becoming

http://archive.monome.org/community/discussion/16913/raga-music.html#Item_17

is harder. any suggestions?

should we just strip off anchor links?

eventually we should be able to just use google to search “keyword site:archive.monome.org” and get something useful out of it. i mean, heaps of misinformation, but maybe something useful too.

Is this what you are looking for?
http://htaccesscheatsheet.com/htaccess.php?tip=alias-clean-urls

Anchor (#) links are not sent to web servers; they are handled locally by the web browser. I suspect that for the purposes of mod_rewrite you can ignore them.
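
Putting that together, a minimal .htaccess sketch for the archive (assuming Apache with mod_rewrite enabled; this is essentially the clean-URL pattern from that cheat sheet) could be:

```
# Minimal .htaccess sketch -- assumes mod_rewrite is enabled.
# If the requested path isn't a real file or directory, but the same
# path with ".html" appended is, serve the .html file instead.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*)$ $1.html [L]
```

With that, /community/discussion/17189/want-to-buy-a-monome-512/p1 serves p1.html, and /community/discussion/16913/raga-music#Item_17 serves raga-music.html, with the browser re-applying the #Item_17 anchor on its own.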

Cool. Still no word from Archive. Not sure what’s up there.