Does lemmy have any communities dedicated to archiving/hoarding data?
Official numbers here https://www.debian.org/mirror/size
About 4.4TB, but that’s all architectures and (I believe?) all distributions (stable, testing…).
If you only want source+all+amd64+arm64, and only want stable, it will be smaller of course.
Not nothing, but at $10/TB or so, it’s not much.
And if you’re following 3-2-1, I’m pretty sure the “1” is already handled for you :)
Kinda curious where you’re getting $10/TB from
Wait, isn’t there an offline copy of a part of Wikipedia? Just buy yourself a nice printer with enough ink and do it yourself ;)
It could cost a bit if you wanted to keep it up to date.
This post foreshadowed today’s AWS outage.
👀
So I actually have a dockerized Debian/Ubuntu mirror. I think it’s like 2 versions of Debian and the latest Ubuntu, and still less than 1TB in total size. The English Wikipedia is 50GB, so overall not that much and very doable. However, pretty unnecessary.
At this point I just keep it because I’m too lazy to change the apt sources files for the VMs/physical PCs in my network again.
I would also add Openstreetmap to the list
Don’t forget 3-2-1 when you do!
What is that?
3 copies of data, 2 of which are on different storage media (HDD, tape drive, etc.), 1 at an offsite location.
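If it helps, the rule above can be written down as a tiny check. This is only a sketch: the `Copy` class, its field names, and the example plan are made up for illustration, and "different storage media" is read here as "at least two distinct media types".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical structure, for illustration only)."""
    label: str     # where the copy lives, e.g. "desktop"
    medium: str    # storage medium type, e.g. "hdd", "ssd", "tape"
    offsite: bool  # True if the copy is stored at another location


def satisfies_3_2_1(copies):
    """Return True if the copies meet 3-2-1 as described in the comment above."""
    enough_copies = len(copies) >= 3                        # 3 copies of the data
    distinct_media = len({c.medium for c in copies}) >= 2   # 2 different media types
    has_offsite = any(c.offsite for c in copies)            # 1 copy offsite
    return enough_copies and distinct_media and has_offsite


# Example plan (assumed, not anyone's real setup).
plan = [
    Copy("desktop", "hdd", offsite=False),
    Copy("nas", "hdd", offsite=False),
    Copy("safe deposit box", "ssd", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Three copies on the same kind of drive in the same house would fail the check twice over, which is pretty much the point of the rule.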
Don’t forget the copies sealed in faraday containers.
I’ve lost family photos to a bad seagate drive in the 2000s and digitized VHSs to a fire in 2011.
If it matters, throw it on an SSD in a safe deposit box. They are surprisingly affordable.
💯
I kind of want that hackermans diy pc that runs on 18650 cells
This is just minor datahoarding. I do it, on an extreme level.
Get out of my mind.
FWIW :
fabien@debian2080ti:/media/fabien/slowdisk$ ls -lhS offline_prep/
total 341G
-rw-r--r-- 1 fabien fabien 103G Jul 6 2024 wikipedia_en_all_maxi_2024-01.zim
-rw-r--r-- 1 fabien fabien 81G Apr 22 2023 gutenberg_mul_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 75G Jul 7 2024 stackoverflow.com_en_all_2023-11.zim
-rw-r--r-- 1 fabien fabien 74G Mar 10 2024 planet-240304.osm.pbf
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso
-rw-r--r-- 1 fabien fabien 2.6G May 7 2023 ifixit_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 1.6G May 7 2023 developer.mozilla.org_en_all_2023-02.zim
-rw-r--r-- 1 fabien fabien 931M May 7 2023 diy.stackexchange.com_en_all_2023-03.zim
-rw-r--r-- 1 fabien fabien 808M Jun 5 2023 wikivoyage_en_all_maxi_2023-05.zim
-rw-r--r-- 1 fabien fabien 296M Apr 30 2023 raspberrypi.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 131M May 7 2023 rapsberry_pi_docs_2023-01.zim
-rw-r--r-- 1 fabien fabien 100M May 7 2023 100r-off-the-grid_en_2022-06.zim
-rw-r--r-- 1 fabien fabien 61M May 7 2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 45M May 7 2023 computergraphics.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 37M May 7 2023 wordnet_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 23M Jul 17 2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz
-rw-r--r-- 1 fabien fabien 16M Oct 6 21:32 be-stib-gtfs.zip
-rw-r--r-- 1 fabien fabien 3.8M Oct 6 21:32 be-sncb-gtfs.zip
-rw-r--r-- 1 fabien fabien 2.3M May 7 2023 termux_en_all_maxi_2022-12.zim
-rw-r--r-- 1 fabien fabien 1.9M May 7 2023 kiwix-firefox_3.8.0.xpi
but if you want the easier version just get Kiwix on whatever device in front of you right now (yes, even mobile phone assuming you have the space) then get whatever content you need.
If you need a bit of help, I recorded TechSovereignty at home, episode 11 - Offline Wikipedia, Kiwix and checksums with a friend just 3 weeks ago.
I also wrote, and randomly update, https://fabien.benetou.fr/Content/Vademecum and coded https://git.benetou.fr/utopiah/offline-octopus but tbh KDE-Connect is much better now.
The point though is that having such a repository takes minutes. If you don’t have the space, buy a 512GB microSD for 50EUR, put that on, stuff it in a drawer, then move on. If you want to, update it every 3 months or whenever you feel like it.
TL;DR: takes longer to write such a meme than actually do it.
Whoa, what are all those things you have?
Commenting inline :
-rw-r--r-- 1 fabien fabien 103G Jul 6 2024 wikipedia_en_all_maxi_2024-01.zim # encyclopedia Wikipedia English with images and more
-rw-r--r-- 1 fabien fabien 81G Apr 22 2023 gutenberg_mul_all_2023-04.zim # Project Gutenberg, book collection in multiple languages
-rw-r--r-- 1 fabien fabien 75G Jul 7 2024 stackoverflow.com_en_all_2023-11.zim # StackOverflow, programming questions and answers
-rw-r--r-- 1 fabien fabien 74G Mar 10 2024 planet-240304.osm.pbf # OpenStreetMap low resolution for the whole World
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso # Debian base ISO
-rw-r--r-- 1 fabien fabien 2.6G May 7 2023 ifixit_en_all_2023-04.zim # iFixit collection of guides to fix appliances
-rw-r--r-- 1 fabien fabien 1.6G May 7 2023 developer.mozilla.org_en_all_2023-02.zim # Web development documentation
-rw-r--r-- 1 fabien fabien 931M May 7 2023 diy.stackexchange.com_en_all_2023-03.zim # Do It Yourself Q&A
-rw-r--r-- 1 fabien fabien 808M Jun 5 2023 wikivoyage_en_all_maxi_2023-05.zim # WikiVoyage, the version of Wikipedia for traveling
-rw-r--r-- 1 fabien fabien 296M Apr 30 2023 raspberrypi.stackexchange.com_en_all_2022-11.zim # Raspberry Pi Q&A
-rw-r--r-- 1 fabien fabien 131M May 7 2023 rapsberry_pi_docs_2023-01.zim # Raspberry Pi documentation
-rw-r--r-- 1 fabien fabien 100M May 7 2023 100r-off-the-grid_en_2022-06.zim # Off the grid documents
-rw-r--r-- 1 fabien fabien 61M May 7 2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim # Quantum computer Q&A
-rw-r--r-- 1 fabien fabien 45M May 7 2023 computergraphics.stackexchange.com_en_all_2022-11.zim # Computer graphics Q&A
-rw-r--r-- 1 fabien fabien 37M May 7 2023 wordnet_en_all_2023-04.zim # Graph of words in English
-rw-r--r-- 1 fabien fabien 23M Jul 17 2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz # Kiwix to read .zim files
-rw-r--r-- 1 fabien fabien 16M Oct 6 21:32 be-stib-gtfs.zip # public transport database in Brussels, Belgium
-rw-r--r-- 1 fabien fabien 3.8M Oct 6 21:32 be-sncb-gtfs.zip # train transport database in Belgium
-rw-r--r-- 1 fabien fabien 2.3M May 7 2023 termux_en_all_maxi_2022-12.zim # Termux, Linux tooling on Android, documentation in English
-rw-r--r-- 1 fabien fabien 1.9M May 7 2023 kiwix-firefox_3.8.0.xpi # Kiwix Web Extension for the Firefox browser
By the way, there’s now a Wikipedia 2025 snapshot.
I am currently trying to fit that on my phone somehow. I wish I could just omit the index database at the end that can’t be split it seems. I have to keep it, but when it’s split up, it doesn’t work anyway (search is broken that way) (https://github.com/openzim/zim-tools/issues/295).
My phone can only do FAT32 for SD cards… For 2024 Wikipedia, that seems to be around 18GiB of wasted space.
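For anyone wondering why FAT32 forces the split in the first place: a single file on FAT32 cannot exceed 4 GiB minus 1 byte. Here is a rough sketch of how many pieces a big file would need. The function name is made up, and the 103 GiB figure is just the size of the 2024 Wikipedia .zim from the listing earlier in the thread, assuming `ls -lh` reports GiB.

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1  # FAT32 caps a single file at 4 GiB minus 1 byte


def parts_needed(file_bytes, part_bytes=FAT32_MAX_FILE_BYTES):
    """Return how many parts of at most part_bytes are needed for file_bytes."""
    if file_bytes < 0 or part_bytes <= 0:
        raise ValueError("file_bytes must be >= 0 and part_bytes must be > 0")
    # Integer ceiling division, avoids floating point rounding on large sizes.
    return (file_bytes + part_bytes - 1) // part_bytes


# Size taken from the listing above (103G, read here as 103 GiB).
wikipedia_bytes = 103 * 2**30
print(parts_needed(wikipedia_bytes))
```

This only counts chunks. It says nothing about whether a given reader can still search across them, which is the actual problem described in the linked issue.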
Thanks, updating (~20min) accordingly.
FWIW I have a CMF Nothing 1 and I can put a 500GB microSD in it.
I’ve got a Ulefone Armor 24. It can take a 1TB microSD, but only FAT32. Why a Linux-based OS can only do FAT32, despite supporting other FSs on internal storage, is beyond me.
Weird, assuming you have Android 13 it should be usable at least as exFAT and thus can be large enough
Watch out for flash data corruption. Lots of cheap flash (USB sticks, SD cards, SSDs) lose data after just a few years of offline storage. Something something quantum tunnel bullshit, iirc.
So either look for media that guarantee long cold storage retention (lots of businesses need to keep shit for 10 years for tax reasons), or occasionally plug it in and let it do the housekeeping.
It’s more that flash NAND uses a small electric charge to keep the NAND gates in the correct configuration. Over time, that charge dissipates. If you power the storage device every once in a while, you minimize these chances.
Here’s a video explaining why it happens to Wii U’s after being powered off for a while. https://youtu.be/JHME4zLs6Qs
Using older flash tech can be useful here. You might not always need the highest density storage if you want to maintain files for a long time. Getting stuff built on a much larger process node makes for a much more stable form of storage.
Or look for industrial / business grade stuff with long retention times. Old flash also means less sophisticated controllers etc
Thanks, but even though it’s on a plugged-in HDD, I don’t even care about any of that data. What I mean is that none of that data is sensitive. It might be useful, potentially, but it’s not unique. What I mean is that if somehow my .zim file for Wikipedia was corrupted I could download it again from https://library.kiwix.org/#lang=eng&category=wikipedia or elsewhere in ~30min (just checked).
What I’m trying to highlight here is more the process than the actual outcome.
TL;DR: yes, if one is actually serious about just getting and storing, they should verify periodically if the data is indeed fine. What I do want to highlight though is to first know how to do it at all. Anyway, you are right that for a proper solution in the long run one must understand how (cold) storage actually works. My heuristic is that it’s like canned food (which I don’t use much): it might last a while, but not forever.
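For the "verify periodically" part, a minimal sketch of a checksum check, assuming you noted down a SHA-256 digest for the file when you first downloaded it. The function names and the throwaway demo file are made up for illustration; swap in your own path and digest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a 100 GB .zim never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-256 matches the digest recorded earlier."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a small temporary file (stand-in for a real archive).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"offline copy")
    recorded = sha256_of(sample)          # what you would store at download time
    print(verify(sample, recorded))       # file unchanged
    sample.write_bytes(b"offline c0py")   # simulate silent corruption
    print(verify(sample, recorded))       # digest no longer matches
```

Run something like this every few months on the drive in the drawer and you at least find out about rot while re-downloading is still an option.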
I thought the point of backing stuff up was to have things in case just downloading it again isn’t a viable option?
old pcs off amazon usually come with good reliable 1/2tb harddrive.
I would love to have a small Wikipedia browser that can survive the apocalypse.
E-ink display, mini keyboard and touchpad, multiple ways/ports to transfer info, all wrapped up in a heavy duty equipment case that’s able to survive a building that collapses and burns in an earthquake, and that’s shielded from EMP.
Sounds like the beginning of a proper Hitchhikers Guide to the Galaxy.
Actually having something telling me Don’t Panic in big friendly letters would help my mental health…
You mean like the wiki reader:

I used it as an ebook reader until the screen gave out.
I would love to have a small Wikipedia browser that can survive the apocalypse.
I’ve got the full 120 GB Wikipedia dump running in Kiwix on a Raspberry Pi Zero. Works great (surprisingly)
E-ink display, mini keyboard
Have been using a Minimal Phone for a few months now which has both of those. Can connect to the Pi easily.
multiple ways/ports to transfer info,
Add a USB-C hub (or add a hub to the Pi) and you’re set
All wrapped up in a heavy duty equipment case that’s able to survive a building that collapses and burns in an earthquake, and that’s shielded from EMP.
And that’s where I’m limited - My 3D printer can only do so much lol. 😆
I’ve been working on a side project this week with an Orange Pi Zero 2W (Pi Zero “clone” but with better specs). It’s got the Kiwix+Wikipedia like my older Pi (described above) plus a bunch of other neat stuff. It’s kind of a combination travel router, portable web app server, party box, and extremely over-engineered bluetooth speaker all-in-one. Hoping to put together a show-and-tell post about it when I get the last of it squared away.
Very interested in your setup for that opi2w. I have one that is being retired from pihole duty that I’ll be doing something similar with. Also want to add an sdr to it so it can pull ghostnet js8call and the like.
Ooh, I haven’t tried RTL-SDR on it yet, but I think I’m nearing capacity on what it can do at once lol.
Here’s the block diagram for it (in spoiler below). Everything’s up and running except the Bluetooth Receiver -> Snapcast (it works on the bench but I don’t have the scripting/automation done yet). I’m also adding an SMA connector for an external antenna, but the new base part is still printing. Photo shows it “as is” as of this writing.
SSL for the web apps was a PITA since I wanted real certs. Had to make a wildcard domain under my main hobby domain, so all my apps are like “https://{APP_NAME}.mobile.mydomain.xyz/”
As soon as I can get the Bluetooth + Pulseaudio scripting done, I’m gonna try to do a write up and maybe a show/tell post.
Block Diagram

Current Case

If you do this please share your IP so I can use your backup too
You can find me at ::1
Unlike OP, I’m not some hacker trying to get your IP address. I just need your regular address? :)
Welcome to datahoarders.
We’ve been here for decades.
“backups”? Pray tell, fine sir and or madam, what is that?
You know there’s only two kinds of people: those who do backups and those that haven’t lost a hard drive/data before. Also: RAID is no backup
Still remember the PSU blast taking out my main drive plus my backup drive in like 2001. I thought I was so good because I at least had a backup 😑. Those were the days 🤷🏻‍♀️
That sounds like an adventure!
Ya, me learning that a dinky PSU is your worst enemy. I upgraded my SO’s old Duron to an Athlon for work, which used more energy…
My condolences! That said Athlons were late 90s (?) cool.













