I used it, but unfortunately it did not meet my needs. I'm interested in a full mirror of a website, while ArchiveBox focuses on a single webpage and follows links at most one level deep. I use wget personally, but if your goal is to archive a single webpage then ArchiveBox might be a good choice.
Thanks for the info! Single page with no link following is all I need for this project, so I'll give it a go.
I don't particularly like the graphical interface as shown at https://demo.archivebox.io/public/. In my opinion, it displays too much at once.
For my part, I use Wallabag to save single web pages. I think its graphical interface is better, though it is not perfect either.
I have been experimenting with it, and for what it is, it works pretty well ... for now. I have concerns about the fact that it's a ton of moving parts basically duct-taped together by an abuse of the Django admin (Django is the web app platform it's based on, which I was a developer for long ago). Also, the search function is primitive at best. I don't think it's going to be my long-term solution for this need, but maybe I'm wrong.
The archived pages are available as files on disk. I also added a script that generates an index.html so I can browse the archive without starting the program. Basically, the only time I run ArchiveBox code is when adding a new site. I never look at the GUI; it brings nothing to the table.
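For anyone wanting to try the same idea, here is a rough sketch of what such an index script could look like. It is not the script mentioned above. It assumes a layout where every snapshot lives in its own subfolder with an index.html inside, which may not match your setup, and the names `build_index` and `snapshot_title` are made up for this example.

```python
import html
import re
from pathlib import Path

# Naive <title> extraction; good enough for a listing, not a real HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def snapshot_title(page: Path) -> str:
    """Return the page's <title>, falling back to the snapshot folder name."""
    text = page.read_text(encoding="utf-8", errors="replace")
    match = TITLE_RE.search(text)
    title = " ".join(match.group(1).split()) if match else ""
    return html.unescape(title) or page.parent.name


def build_index(archive_root: Path) -> Path:
    """Write archive_root/index.html linking to every <subfolder>/index.html.

    Assumption: one subfolder per snapshot, each containing an index.html.
    """
    rows = []
    for page in sorted(archive_root.glob("*/index.html")):
        href = html.escape(page.relative_to(archive_root).as_posix(), quote=True)
        label = html.escape(snapshot_title(page))
        rows.append(f'<li><a href="{href}">{label}</a></li>')
    out = archive_root / "index.html"
    out.write_text(
        "<!doctype html>\n<meta charset='utf-8'>\n<title>Archive index</title>\n"
        "<ul>\n" + "\n".join(rows) + "\n</ul>\n",
        encoding="utf-8",
    )
    return out
```

Calling `build_index(Path("archive"))` (the folder name is a placeholder) writes a static page you can open straight from disk, so nothing has to be running to browse the snapshots.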
It's a great tool, but it depends on what you expect from it and on your use case. Personally, I tried it but was always disappointed. I always end up using SingleFile(Z) in my browser or on the CLI, along with the usual yt-dlp and the like, and that's really all I need. If I need to save an entire site, I just use wget or HTTrack. I don't really need a browsable archive of my saved pages; I usually sort them by subject when saving.
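As a concrete illustration of the "just use wget" route for a whole site, here is a minimal sketch that wraps a typical mirroring call in Python. The wget flags are standard ones, but the particular combination, the one-second wait, and the helper names `wget_mirror_command` and `mirror_site` are assumptions for this example, not anyone's actual setup from this thread.

```python
import shutil
import subprocess


def wget_mirror_command(url: str, dest: str = "mirror") -> list[str]:
    """Build a wget command line for mirroring a whole site into dest."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping, no depth limit
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # save HTML/CSS with matching file extensions
        "--page-requisites",   # also fetch images, stylesheets, and scripts
        "--no-parent",         # never climb above the starting directory
        "--wait=1",            # pause between requests (assumed politeness delay)
        f"--directory-prefix={dest}",
        url,
    ]


def mirror_site(url: str, dest: str = "mirror") -> None:
    """Run the mirror; raises if wget is missing or exits with an error."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget not found on PATH")
    subprocess.run(wget_mirror_command(url, dest), check=True)
```

Something like `mirror_site("https://example.org/", "example-mirror")` would then start the download; both arguments are placeholders.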