Musicdsp.org (bandwidth exceeded)

#41

Apologies for even throwing the scraping idea out there, if it has intensified the bandwidth problem. Glad that you’re still ‘about’ and that there is hope of recovering the site in a sensible way.

I’m doing some hobby synth coding again and feel like I progressed to a stage with it where (especially) the optimisation recipes I saw on wayback could come in really handy…

0 Likes

#42

Just to clarify, what I meant to say was that the wikis in GitHub are repositories themselves, but they are separate from the code.

I wouldn’t recommend putting the main content there, but I was under the impression that there was some content that would make more sense as a wiki…

0 Likes

#43

FWIW, I’ve only attempted to scrape from the archive.org snapshot, never musicdsp.org itself, mostly because I haven’t managed to catch it when it’s been up long enough to try. It wasn’t me, I swear! :slight_smile:

Glad you’re involved @bdejong!

Storing all the content as flat files sounds great to me. Seems like we could stick a wiki interface on top of that for editing if that would help automate the process of maintaining the site – still restricted by at least the requirement of having a github account, if we use github, and maybe further by using a github organization? Spam should be pretty easy to avoid by just leaving the public editing option off:

By default, only people with write access to your repository can make changes to wikis, although you can allow everyone on GitHub to contribute to a wiki in a public repository.

I just cloned a random wiki to look, and the repo is just a directory full of markdown files. We could then use the main repo to host whatever frontend we use to publish the wiki as HTML/CSS.

Otherwise I suppose we’d just do updates via pull requests with someone(s) in charge of merging the submissions into the main repo – or just grant folks direct write access to the repo?

The wikis seem perfectly suited to what’s being described above, and I think would work well as a place to hold all the content for the site IMHO. I like to edit my markdown offline in a text editor, but it couldn’t hurt to have a wiki-style editor attached for free for those who prefer it.

1 Like

#44

Hope I didn’t kill the momentum, just posting here to say I still think this would be a really nice thing to work on.

0 Likes

#45

Hey all,

I finally managed to score the mysql dump and all the files on the site. The database consists of 99% spam. :+1: :man_facepalming: This also explains why the site is going down continuously.

My current idea is the following: I will create some Python scripts that can dump the whole DB and everything to markdown files, more specifically using Sphinx. Then dump all of this into GitHub.
That way people who understand markdown, GitHub and pull requests will be able to submit more notes (comments) or new recipes (code, …) as pull requests. That should be a damned good measure against spam :wink:
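
For anyone curious what such a dump script might look like, here is a rough sketch, not Bram's actual code. It assumes the rows have already been read out of the database into plain dicts; the column names (`id`, `title`, `body`, `recipe_id`, `author`, `text`) and the file naming scheme are made up for illustration.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a recipe title into a safe file name stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def recipe_to_markdown(recipe: dict, comments: list) -> str:
    """Render one recipe row and its comments as a Markdown page."""
    lines = [f"# {recipe['title']}", "", recipe["body"].strip(), ""]
    if comments:
        lines += ["## Comments", ""]
        for c in comments:
            lines.append(f"- **{c['author']}**: {c['text'].strip()}")
        lines.append("")
    return "\n".join(lines)


def dump_recipes(recipes: list, comments: list, out_dir: Path) -> list:
    """Write one .md file per recipe and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for recipe in recipes:
        # Hypothetical schema: each comment points at its recipe via recipe_id.
        own = [c for c in comments if c["recipe_id"] == recipe["id"]]
        path = out_dir / f"{recipe['id']:03d}-{slugify(recipe['title'])}.md"
        path.write_text(recipe_to_markdown(recipe, own), encoding="utf-8")
        written.append(path)
    return written
```

One file per recipe keeps pull requests small: a new comment is a one-line diff in a single Markdown file.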

And what the hell, I might still add some ads ;-)))

Anyone have a better idea, let me know!

Bram

4 Likes

#46

Well, that’s not too bad…

mysql> select count(*) from archive;
+----------+
| count(*) |
+----------+
|      266 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from archivecomments;
+----------+
| count(*) |
+----------+
|     1674 |
+----------+
1 row in set (0.00 sec)

3 Likes

#47

Thanks for the efforts on this!! Sincerely appreciated. :slight_smile:

0 Likes

#48

So, I’ve imported the SQL dumps and all the files into a private GitHub repo. Is there anyone here who wants to help clean up the dumps? Basically, I need to go through about 1600 comments, deleting the ones that are spam and the ones that are… useless. Anyone here volunteering to help? It’s still private because the dumps contain a lot of email addresses… :’(

If you want to help, just shoot me an email at bram.dejong at (ends with mail.com and starts with a g). I can only add 3 private collaborators, so if you volunteer, be serious about it :wink:

cheers,

Bram

5 Likes

#49

A suggestion: you could do a first pass using an API like Akismet and save a lot of time.
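
Something along these lines, as a rough sketch only. The endpoint and parameter names are from my recollection of Akismet's `comment-check` REST call and should be checked against their docs before use. The comment dict keys (`ip`, `author`, `email`, `text`) are hypothetical, and whether the dump even contains IP addresses is an assumption. The `send` hook is there so the logic can be tried without hitting the network.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Akismet API documentation.
AKISMET_URL = "https://rest.akismet.com/1.1/comment-check"


def build_check_request(api_key, site_url, comment):
    """Build (but do not send) a comment-check POST for one comment."""
    params = {
        "api_key": api_key,
        "blog": site_url,
        "user_ip": comment.get("ip", ""),
        "comment_type": "comment",
        "comment_author": comment.get("author", ""),
        "comment_author_email": comment.get("email", ""),
        "comment_content": comment.get("text", ""),
    }
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(AKISMET_URL, data=data, method="POST")


def is_spam(response_body):
    """The service is expected to answer with the plain text 'true' or 'false'."""
    return response_body.strip().lower() == "true"


def first_pass(comments, api_key, site_url, send=None):
    """Split comments into (probably_spam, needs_manual_review)."""
    def default_send(req):
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8")

    send = send or default_send
    spam, review = [], []
    for c in comments:
        body = send(build_check_request(api_key, site_url, c))
        (spam if is_spam(body) else review).append(c)
    return spam, review
```

Anything flagged as spam could be dropped outright, leaving only the `review` pile to go through by hand.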

1 Like

#50

FYI - I’ve manually cleaned up the database and am now working on generating markdown files from the database entries using Sphinx as a primary engine. This easily allows the site to use the “Read the Docs” formatting, which I rather enjoy.

Once this is done I will dump all the markdown files and Sphinx config into GitHub and open it all up. The goal is to then generate the HTML files in some CI pipeline and publish them to the musicdsp website.
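
For reference, a Sphinx config for this kind of setup can be very small. The sketch below is a guess at the general shape, not the actual file: the project name, the choice of `myst_parser` for reading Markdown, and the excluded directory are all assumptions. `sphinx_rtd_theme` is the Read the Docs theme package and has to be installed separately.

```python
# conf.py -- hypothetical minimal Sphinx configuration (a sketch, not the real one)

project = "musicdsp"

# Assumption: Markdown sources are parsed by MyST; other parsers exist.
extensions = ["myst_parser"]

# Accept both reStructuredText and Markdown source files.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# The "Read the Docs" look mentioned above.
html_theme = "sphinx_rtd_theme"

# Keep build output out of the source scan.
exclude_patterns = ["_build"]
```

A CI job would then only need to install the dependencies and run `sphinx-build -b html . _build/html` from the directory holding `conf.py`.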

Bram

7 Likes

#51

Hey all,

I’m struggling with the DNS settings to redirect musicdsp.org, but…

https://musicdsp.readthedocs.io

Comments, pull requests and edits very welcome :wink:

Bram

6 Likes

#52

Wow, Bram. That is fantastic. Thanks for all the effort you expended to archive this important content.

0 Likes

#53

In case you’d still want/need some wiki functionality, you can disable public editing of the wiki in your repo’s settings.

0 Likes