antiX oldforums archive (you can browse online, or clone, or download a copy)
Tagged: old forums
This topic has 19 replies, 10 voices, and was last updated Jun 30 at 11:39 am by mroot.
November 14, 2017 at 1:02 pm #2544
Anonymous
updated Dec 26, 2017
A searchable copy of the oldforums archive content is now available.
To search, use the navigation link provided in the page header bar (“Forum” … “Quick News” …):
click “Forum” and choose “Old Forum Archive” from the dropdown menu.
You can browse the archive contents here: https://antixlinux.com/archive

=================
(below, the links / screenshots from the earlier announcement are still valid).
You can go here to browse the archive: antiX oldforums archive (2007–2017)
To download a copy for offline browsing, visit https://github.com/antix-skidoo/antix-skidoo.github.io
and click the “Clone or Download” button.
The zipfile is 27 MB; the expanded content occupies 112 MB on disk.
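For readers who prefer to script the download rather than click the button, here is a minimal Python sketch using only the standard library. The branch name in the zip URL follows GitHub's usual convention but is an assumption; verify it on the repository page if the download fails.

```python
# Hedged sketch: fetch and unpack the oldforums archive zip without git.
# The "master" branch name is an assumption for this 2017-era repository.
import urllib.request
import zipfile

ZIP_URL = "https://github.com/antix-skidoo/antix-skidoo.github.io/archive/refs/heads/master.zip"

urllib.request.urlretrieve(ZIP_URL, "oldforums-archive.zip")   # ~27 MB download
with zipfile.ZipFile("oldforums-archive.zip") as zf:
    zf.extractall("oldforums-archive")                         # ~112 MB on disk
print("extracted; open archive/index.html inside the extracted folder")
```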
You can change the styling of the archived html pages by editing the css rules within aaa_oldforums.css
Within the downloaded zipfile, the index page (homepage) is archive/index.html
-=-
Tip: you can gain full search capability (even wildcard queries) for the content of your downloaded copy
by installing the debian package “recoll” (and reading its docs) and creating a search index of ~/path/to/extracted_files/archive
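A rough sketch of that tip, assuming recoll is already installed: write a dedicated recoll configuration whose topdirs points at the extracted archive, then run recollindex against it. Both paths below are assumptions; substitute your real directories, and check the recoll docs for the exact query tooling.

```python
# Hedged sketch: build a dedicated recoll index covering only the extracted archive.
# "topdirs" in recoll.conf and "recollindex -c <confdir>" are standard recoll usage;
# the two paths below are assumptions -- point them at your own directories.
import subprocess
from pathlib import Path

confdir = Path("~/.recoll-antix-archive").expanduser()                 # assumed config location
archive_dir = Path("~/path/to/extracted_files/archive").expanduser()   # assumed extraction path

confdir.mkdir(parents=True, exist_ok=True)
(confdir / "recoll.conf").write_text(f"topdirs = {archive_dir}\n")

# build (or update) the index; re-run after re-extracting the archive
subprocess.run(["recollindex", "-c", str(confdir)], check=True)

# to search, start the recoll GUI against this config ("recoll -c <confdir>"),
# or use recoll's command-line query tool, e.g.:
#   recollq -c ~/.recoll-antix-archive 'wildcar*'
```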



notes:
— median topic page size is only 9 KB (gobs of inline scripts and stylesheets have been removed)
— joinDate, numTotalPosts ~~ these details are present, but (to minimize clutter) are currently suppressed from display via css rules
— the archive scrape (er, mirror) operation did NOT capture attached files, nor (uploaded and embedded) attached images
— each of the pages calls 2 external files: aaa_oldforums.css and aaa_oldforums.js (within your copy of the archive fileset, you can modify these to suit)
November 14, 2017 at 1:37 pm #2550
Forum Admin
BitJam
November 14, 2017 at 1:43 pm #2551
Moderator
caprea
November 14, 2017 at 2:10 pm #2552
Forum Admin
anticapitalista
W.O.W! Brilliant work skidoo.
Philosophers have interpreted the world in many ways; the point is to change it.
antiX with runit - leaner and meaner.
November 14, 2017 at 2:22 pm #2554
Forum Admin
dolphin_oracle
November 14, 2017 at 4:46 pm #2562
Forum Admin
Dave
Skidoo, is this a static html archive, as if made by recursively crawling the links in a web browser or using wget?
I am wondering if we could work up a script to parse the html into a comma-separated sql file, then try to import it into the current forum sql database.
On second thought, maybe it is better to host this alongside the forum under a separate archive link and make a decent search page for the archive.
Computers are like air conditioners. They work fine until you start opening Windows. ~Author Unknown
November 14, 2017 at 6:35 pm #2565
Anonymous
Dave, I extracted links from the 280 or so topic list pages using the firefox addon “Link Gopher”,
then fed the pages to “httrack” (a crawler, not a scraper), instructing it to “get separated (sic) pages”.
python + scrapy (an xpath scraper library) could extract (from the original, or from the pages in the archive set):
author.userid
author.name
author.join_date (00 Feb 2000)
author.total_num_posts
subforum.id
subforum.name
subforum.total_topics
subforum.total_posts
subforum.last_post_date (00 Feb 0000, 00:00)
post.id
post.datetime (00-0000-00T00:00)
post.num_within_topic
post.content
post.topic_id
topic.id
topic.title
topic.startedby_userid
topic.num_posts
topic.last_post_datetime (2008-00-00T00:00)

A script could sanitize these, then store them to your target db engine + schema.
Importing into an existing database would be difficult, due to collisions (user.id, topic.id, post.id).
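As a concrete illustration of that extraction approach (not the script actually used, and using lxml's XPath support rather than the scrapy framework to keep it self-contained), here is a hedged Python sketch. Every XPath expression and the archive path are hypothetical placeholders that would need to be matched against the archive's real markup.

```python
# Hedged sketch: pull a few of the fields listed above out of archived topic pages
# with XPath. The selectors are placeholders -- inspect the archive's actual markup
# (e.g. with a browser inspector) and adjust them before relying on the output.
from pathlib import Path
from lxml import html

ARCHIVE_ROOT = Path("~/path/to/extracted_files/archive").expanduser()  # assumed location

def extract_posts(topic_file: Path):
    """Yield one dict per post found in a single archived topic page."""
    tree = html.parse(str(topic_file))
    # Hypothetical selectors -- replace the class names with the archive's real ones.
    for node in tree.xpath('//div[contains(@class, "post")]'):
        yield {
            "post.id": node.get("id"),
            "author.name": "".join(node.xpath('.//*[contains(@class, "author")]//text()')).strip(),
            "post.datetime": "".join(node.xpath('.//*[contains(@class, "date")]//text()')).strip(),
            "post.content": "".join(node.xpath('.//*[contains(@class, "content")]//text()')).strip(),
        }

if __name__ == "__main__":
    for topic_file in sorted(ARCHIVE_ROOT.glob("**/*.html")):
        for post in extract_posts(topic_file):
            print(post)
```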
November 14, 2017 at 7:53 pm #2566
Anonymous
Excellent work skidoo
I used antix before, just not a member of the forums.
I used antix (circa 15) as a rescue disk on my friends’ windoze xp and vista pc computers; I’ll have to dig the old disk out and look.

November 15, 2017 at 12:37 am #2583
Member
watsoccurring
November 15, 2017 at 11:26 am #2610
Forum Admin
Dave
@skidoo there would likely be a collision problem, though I think most ids would be auto-incrementing, so they could be left blank to allow that to happen. The user ID would be a big problem; we would probably need to make an archive user and set that user ID as the default in the sql.
That being said, it would be difficult, so maybe we could work out a search index on the website and use the extraction as a static archive that can be linked to in the forum, like the faq/user manual?
Computers are like air conditioners. They work fine until you start opening Windows. ~Author Unknown
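To make the collision point concrete, here is a small sketch of the import approach Dave outlines: let the target database assign fresh auto-incrementing ids (the scraped post/topic ids are discarded) and attribute every archived post to one placeholder “archive” user. The table layout and the user id are invented for illustration, not the forum's real schema.

```python
# Hedged sketch (standard-library sqlite3, invented schema): import scraped posts
# while sidestepping id collisions via auto-increment and a single archive user.
import sqlite3

ARCHIVE_USER_ID = 9999  # assumed id of a pre-created "archive" user in the target forum

conn = sqlite3.connect("forum_import_test.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS posts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,   -- fresh id assigned here; old post.id dropped
        user_id INTEGER NOT NULL,
        topic_title TEXT,
        posted_at TEXT,
        content TEXT
    )
""")

def import_post(scraped: dict) -> None:
    """Insert one scraped post, dropping its original ids to avoid collisions."""
    conn.execute(
        "INSERT INTO posts (user_id, topic_title, posted_at, content) VALUES (?, ?, ?, ?)",
        (ARCHIVE_USER_ID, scraped.get("topic.title"),
         scraped.get("post.datetime"), scraped.get("post.content")),
    )

# example usage with a record shaped like the scraper output sketched earlier
import_post({"topic.title": "example", "post.datetime": "2008-01-01T00:00", "post.content": "hello"})
conn.commit()
```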
December 23, 2017 at 12:35 am #4228
Moderator
BobC
December 23, 2017 at 8:57 am #4238
Forum Admin
Dave
Does the archive appear fine under the forum link, and does the content display properly?
I tried embedding it into the main antiX site.
Archive
Unwrapped content:
https://antixlinux.com/forum-archive
Perhaps later a search function could be added.
Added: duckduckgo, but has not been crawled.
Anyone that knows how to submit a url to duckduckgo, feel free to add antixlinux.com/forum-archive/
Computers are like air conditioners. They work fine until you start opening Windows. ~Author Unknown
December 23, 2017 at 11:25 am #4246
Anonymous
The newly-added navigation link (and links in your post) work. Search via DDG does not find any results.
DDG does not accept crawl requests, but they utilize the searchindex of yandex.com… and I created a yandex webmaster account and submitted the archive index page URL.
When I later found (still) no results returned by DDG, I returned to yandex and read (realized):
they don’t accept “crawl requests” (sigh). What they provide is a way to request indexing of specific URLs, and you must paste each URL into a submission box ~~ max 100 per day. This is not viable; the archive comprises 7,000+ URLs.

December 23, 2017 at 11:49 am #4247
Forum Admin
Dave
Bummer. Maybe they will crawl based on one url being submitted? I see that the new forum and website show up if you use them in the site: specification.
Edit:
Better yet, I can make a webpage holding nothing but the ls of the archive. If we submit that one page, would they not index all the urls within?
Computers are like air conditioners. They work fine until you start opening Windows. ~Author Unknown
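A rough sketch of the page Dave describes above: walk a local copy of the archive and write one flat html page linking every file, so that a crawler which indexes that single page can discover all of the archive URLs. The base URL and the local path are assumptions taken from this thread; adjust both to the real deployment.

```python
# Hedged sketch: generate a single "ls of the archive" html page for crawlers.
# BASE_URL and ARCHIVE_ROOT are assumptions -- change them to match your setup.
from pathlib import Path

BASE_URL = "https://antixlinux.com/forum-archive"                      # assumed public prefix
ARCHIVE_ROOT = Path("~/path/to/extracted_files/archive").expanduser()  # assumed local copy

links = []
for page in sorted(ARCHIVE_ROOT.glob("**/*.html")):
    rel = page.relative_to(ARCHIVE_ROOT).as_posix()
    links.append(f'<li><a href="{BASE_URL}/{rel}">{rel}</a></li>')

Path("archive-sitemap.html").write_text(
    "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n",
    encoding="utf-8",
)
print(f"wrote archive-sitemap.html with {len(links)} links")
```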
December 23, 2017 at 12:17 pm #4250
Anonymous
“If we submit that one page would they not index all urls within?”
As mentioned in my prior post, I’m convinced yandex is a dead-end.
This https://www.webnots.com/how-to-submit-your-site-to-yandex/ reads like an accurate walkthrough of the steps I had performed.
Different from my recollection, the article/screenshot indicates “max 20 per day” (vs 100).
I’m sending you a message containing the yandex webmaster account login details. Maybe you’ll find something, some “trick”, that I’ve missed.