datahoarder

6608 readers

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago

MODERATORS

archivist@lemmy.ml

Thoughts on ArchiveBox for archiving webpages? (github.com)

submitted 1 year ago by stricken_liftoff@feddit.ch to c/datahoarder@lemmy.ml

Has anyone used ArchiveBox for self-hosted web archiving? If so, what are your thoughts on it compared to the Internet Archive or other publicly available services?

[–] hoodlem@hoodlem.me 3 points 1 year ago (1 children)

I used it, but unfortunately it didn't meet my needs. I'm interested in a full mirror of a website, while ArchiveBox focuses on single webpages, following links at most one level deep. I use wget personally, but if your goal is to archive a single webpage, ArchiveBox might be a good choice.
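For anyone wondering what a "full mirror with wget" looks like in practice, here is a minimal sketch in Python that builds and optionally runs a typical wget mirroring command. The flags are standard wget options, but the exact set the commenter uses isn't stated, so treat this combination as an assumption; `build_wget_mirror_cmd`, `mirror_site`, the URL and the output folder are hypothetical names chosen for illustration.

```python
import shlex
import subprocess


def build_wget_mirror_cmd(url: str, dest_dir: str) -> list[str]:
    """Build a wget command line that mirrors a whole site for offline browsing.

    Hypothetical helper: the flag selection is a common choice, not the
    commenter's confirmed setup.
    """
    return [
        "wget",
        "--mirror",            # recursive, timestamping, unlimited depth
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # save HTML pages with a .html suffix
        "--page-requisites",   # also fetch the CSS, images and scripts pages need
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={dest_dir}",  # where the mirror is written
        url,
    ]


def mirror_site(url: str, dest_dir: str, dry_run: bool = True) -> int:
    """Print the command (dry run) or actually run wget; returns the exit code."""
    cmd = build_wget_mirror_cmd(url, dest_dir)
    if dry_run:
        # Show the shell-quoted command instead of downloading anything.
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode
```

Calling `mirror_site("https://example.com/", "example-mirror")` only prints the command, and passing `dry_run=False` hands it to wget. ArchiveBox, by contrast, is aimed at snapshots of individual pages rather than this kind of recursive crawl.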

[–] stricken_liftoff@feddit.ch 2 points 1 year ago

Thanks for the info! Single page with no link following is all I need for this project, so I'll give it a go.

[–] Fryboyter@discuss.tchncs.de 3 points 1 year ago* (last edited 1 year ago)

I don't particularly like the graphical interface, as shown at https://demo.archivebox.io/public/. In my opinion, it displays too much at once.

For my part, I use Wallabag to save individual web pages. I think its graphical interface is better, though it isn't perfect either.

[–] ThorrJo@lemmy.sdf.org 2 points 1 year ago (1 children)

I have been experimenting with it, and for what it is, it works pretty well... for now. I have concerns about the fact that it's a ton of moving parts, basically duct-taped together by an abuse of the Django admin (Django is the web app platform it's based on, and I was a developer for it long ago). Also, the search function is primitive at best. I don't think it's going to be my long-term solution for this need, but maybe I'm wrong.

[–] oldfart@lemm.ee 1 points 1 year ago

The archived pages are available as files on disk, and I also added a script that generates an index.html so I can browse them without starting the program. Basically, the only time I run ArchiveBox code is when adding a new site. I never look at the GUI; it brings nothing to the table.
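The commenter's script isn't shown, so here is a minimal sketch of the same idea in Python. It assumes each snapshot lives in its own subfolder of ArchiveBox's `archive/` directory and contains an `index.html`. That layout is an assumption worth checking against your own data folder. `write_index` is a hypothetical helper. It writes a plain HTML list linking to every snapshot it finds.

```python
import html
from pathlib import Path
from urllib.parse import quote


def write_index(archive_dir: str, out_name: str = "index.html") -> int:
    """Write a static index page linking to each snapshot folder's index.html.

    Assumes (hypothetically) one subfolder per snapshot under archive_dir,
    each holding its own index.html. Returns the number of snapshots linked.
    """
    root = Path(archive_dir)
    # Keep only subdirectories that actually contain a snapshot page.
    snapshots = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "index.html").is_file()
    )
    # quote() makes folder names safe inside the href; escape() for link text.
    items = "\n".join(
        f'<li><a href="{quote(name)}/index.html">{html.escape(name)}</a></li>'
        for name in snapshots
    )
    page = (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8"><title>Archive index</title></head>\n'
        f"<body><ul>\n{items}\n</ul></body></html>\n"
    )
    (root / out_name).write_text(page, encoding="utf-8")
    return len(snapshots)
```

Running something like `write_index("archive")` after each `archivebox add` would keep the static index current without ever starting the web UI. It only lists folder names. Pulling titles or URLs out of each snapshot's metadata would be a natural extension, but this sketch doesn't assume that metadata's format.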

[–] BustedPancake@lemmy.world 1 points 1 year ago

It's a great tool, but it depends on what you expect from it and your use case. Personally, I tried it but was always disappointed by it. I always just end up using SingleFile(Z) in my browser or on the CLI, along with the usual yt-dlp and the like, and that's really all I need. If I need to save an entire site, I just use wget or HTTrack. I don't really need a browsable archive of my saved pages; I usually sort them by subject when saving.