Posted to Data Hoarder on 12 Nov 2023

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

Hi. I am a long-time lurker, first-time poster to Reddit! I have a problem that has me stumped. Normally I like to work things out for myself, but I am banging my head against the wall.

I have a YouTube channel where I post images that have been downloaded with this tool, presented as stop-motion videos. These could be MRIs, photos of the sun from various solar probes, photos from weather satellites, and so on.

There are many solar probes in space analyzing the behavior of the sun. Each takes photos at a regular interval using a timer; this could be once every three minutes.

If a solar probe is the property of NASA, its photos go into a big database that NASA hosts. These photos are free to use, provided the relevant people and entities are acknowledged. The images are indexed by month: every month a new index is started, and subsequent photos go into that one.

There is an index directory of photos from STEREO-B that I would like to download; however, I am having some trouble.

This is an example of a directory that can be accessed using a web browser (Google Chrome):

https://iswa.gsfc.nasa.gov/iswa_data_tree/observation/solar/stereo-B/euvi/2007/04/

The web browser displays the photos correctly, but it must sometimes wait up to two minutes after the request to receive a response, depending on how many photos are in that index.
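
You can reproduce the delay by timing the index request, for example with curl (just one way to measure it; the five-minute cap is only illustrative):

    # report the HTTP status and how long the index takes to respond
    curl -s -o /dev/null -w "HTTP %{response_code} in %{time_total}s\n" --max-time 300 \
         https://iswa.gsfc.nasa.gov/iswa_data_tree/observation/solar/stereo-B/euvi/2007/04/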

However, this doesn't work at all in Cyotek WebCopy or any of the other web crawlers I have tried.

I mention Cyotek WebCopy because it is the closest thing to a working solution that I have found. However, it times out after 20 seconds, which is not long enough for the website to deliver a response.
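
From what I can tell, the fix is simply a client with a much longer timeout. In wget terms I believe it would look something like this (an untested sketch; the 300-second timeout and the output folder name are only illustrative):

    # download every image in the monthly index; --timeout raises the
    # DNS/connect/read timeouts well past the ~2 minute response delay
    wget -r -np -nd -e robots=off --timeout=300 --tries=3 \
         -A "jpg,jpeg,png,gif" -P stereo-b_euvi_2007-04 \
         https://iswa.gsfc.nasa.gov/iswa_data_tree/observation/solar/stereo-B/euvi/2007/04/

Here -np stops wget from climbing out of the month, -nd flattens the deep directory structure, and -A keeps only the image files.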

After playing around with the speed limit settings (under Project > Project Settings), I was able to download this directory. It has 14,000 images, and notice that it is on the same domain:

https://iswa.gsfc.nasa.gov/iswa_data_tree/observation/solar/sdo/aia-0131_1024x1024/2018/05/

Can someone please help me? I don't know if this is possible without the developers issuing a patch. If they do issue a patch, along with instructions, I would be endlessly grateful.

Has anyone else had better luck? What did you do, how did you do it, and with what? I would love to know. Downloading them all one by one isn't practical!

Comments

[email protected] 1 points 1 year ago

tl;dr: Offline Explorer by MetaProducts

https://metaproducts.com/products/product-comparison-chart

In Offline Explorer there is a sequencer that lets you automatically check a website/webpage for updates on a timed schedule and download them. I have not used that feature. Can't afford it, haha.

For general site mirroring, wget and Offline Explorer are what I usually go to. JDownloader also has a Scheduler plugin, but I have not used that either.
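
If all you need is the timed re-check, a cron entry plus wget's timestamping gets you most of the way there for free. A rough sketch (untested; the nightly schedule, target folder, and timeout are only illustrative):

    # re-check the index at 03:00 every night; -N (timestamping) skips
    # files that have not changed since the previous run
    0 3 * * * wget -r -np -nd -N -e robots=off --timeout=300 -A "jpg,png" -P $HOME/stereo-b https://iswa.gsfc.nasa.gov/iswa_data_tree/observation/solar/stereo-B/euvi/2007/04/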