

  • Ah. Well, I still see the web interface pulling in new posts as I sit on the home page. But then, as I mentioned, my Lemmy instance (or rather, the instance I’ve joined) is a couple of versions behind. (I’m not sure if they’re behind on both Lemmy and the UI or on just one.) If that behavior was changed in newer versions, that could be why I’m still seeing the web interface pull in new posts while you don’t.

    And if that behavior is removed in newer versions, then I can probably expect all the issues I’ve mentioned in this thread to be resolved as soon as latte.isnot.coffee updates to more recent versions of Lemmy, Lemmy-UI, or both.




  • Oh, I’m with you. There used to be (though I haven’t been able to find any lately) Tor web gateways that would let you visit a Tor site without having to run Tor or Tor Browser yourself. They don’t protect your identity the way Tor Browser does, but they can be used. And some onion sites still come up as results when you search DDG for something like “Hidden Wiki site:onion.pet”. The result doesn’t link you to the .onion address, but to a .onion.pet address that takes you to the same page/site.

    As far as Tor and speeds go, Tor mostly imposes large latencies (that is, it can take a few seconds to get a download started), which is what you’re really experiencing when sites feel “slow” while browsing through Tor. Bandwidth itself isn’t affected all that much.

    One caveat, though. When downloading through Tor, your request is relayed through a chain of nodes. If any one of those is slow or purposefully limits speeds, that will cap your bandwidth. That’s a problem maybe 30% of the time or so. But there are commands you can use to tell Tor to “please select a different route.” After doing that once or twice, you’ll generally get a decently fast “circuit.”
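
    In Tor Browser that’s the “New Tor Circuit for this Site” menu item. If you’re running the tor daemon yourself, a minimal sketch (assuming ControlPort 9051 is enabled in your torrc and no control-port auth is configured; adjust the AUTHENTICATE line otherwise) looks something like:

        # Ask the local tor daemon to build fresh circuits for new connections.
        # Note: NEWNYM doesn't reroute downloads that are already in progress.
        printf 'AUTHENTICATE ""\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc 127.0.0.1 9051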

    Just as a test, I downloaded the latest Arch Linux ISO (which is 853 MB in size) from here both via Tor and directly. Direct took 7 minutes 36.324 seconds, for an average speed of 1.869 MB/s. Tor took 9 minutes 26.627 seconds, for an average speed of 1.505 MB/s. In short, a pretty moderate difference in speed.
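
    If you want to reproduce the comparison, a rough sketch (placeholder URL; assuming tor’s SOCKS proxy is listening on the default 127.0.0.1:9050) would be:

        # Timed direct download (URL is a placeholder):
        time curl -o /dev/null 'https://example.org/archlinux.iso'

        # Same download, proxied through Tor; --socks5-hostname also resolves
        # DNS through Tor so the lookup doesn't leak outside the proxy.
        time curl --socks5-hostname 127.0.0.1:9050 -o /dev/null 'https://example.org/archlinux.iso'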

    And, yes, this is a highly unscientific, n=1 test, but I think it’s pretty well in line with what I’ve seen in the past.



  • Closest thing I’ve found was /r/OpenDirectories on the site that shall not be named. Which is to say: no, there’s not really any such thing as “Pirate Bay but for direct downloads.” At least not that I’ve found.

    “Pirate Bay but for direct downloads” does seem like something that could thrive on the dark web, though, doesn’t it? I wonder why something like that hasn’t become a thing and gotten big.

    I suppose a site that just acts as a searchable directory of IPFS links could be used in combination with IPFS web gateways. But I haven’t found anything like that.
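
    (To illustrate the gateway part: once you have a content ID from such a directory, any public IPFS gateway can serve it over plain HTTP, no local node required. The CID below is a placeholder.)

        # Fetch IPFS content through a public HTTP gateway:
        wget 'https://ipfs.io/ipfs/<CID>'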





  • If you log in via a browser, it’ll most likely give you a “session cookie” that you should be able to see in the developer tools. (If you’re using Firefox’s developer tools, it’d be under the “Storage” section.) The name of the cookie will generally have the word “session” in it. After logging in, that cookie identifies you to the server, letting the server know that “this particular request is from CucumberSalad” (or whatever your user is named on that service). Wget probably hasn’t been working because the requests from wget don’t include that cookie the way the requests from your browser do.

    (Just looking at my developer tools while using Lemmy, it seems like the Lemmy web UI doesn’t use session cookies but rather a JSON web token in a cookie named “jwt”. But I think that cookie would suffice if I were trying to scrape the Lemmy web UI.)

    Once you have the proper cookie name and value, you can have wget send the cookie to the server with each request by adding the flag --header 'Cookie: <cookie name>=<cookie value>' (replacing the values in angle brackets; for example: --header 'Cookie: JSESSIONID=ksdjflasjdfaskdjfaosidhwe').
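
    Putting it together with the Lemmy example above, a hypothetical invocation (cookie value and URL made up for illustration) might look like:

        # Send the login cookie with every request wget makes:
        wget --header 'Cookie: jwt=<value copied from dev tools>' \
             'https://latte.isnot.coffee/u/CucumberSalad'

    (Wget can also manage the cookie jar itself: log in once with --save-cookies and --keep-session-cookies, then reuse it on later requests with --load-cookies.)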

    Also, if you can provide more info as to what you’re trying to scrape, folks can probably help more. Hopefully what I’ve given you is enough to get you going, but it’s possible there might be more hurdles to jump to get it working.



  • I saw another post today about ArchiveTeam Warrior and, on a lark, started up a Docker container.
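
    (If anyone wants to try it, the command I ran was something along these lines; the image name is what the ArchiveTeam wiki listed when I looked, so double-check there for the current one:)

        # Run the ArchiveTeam Warrior as a Docker container, then pick a
        # project from the web UI at http://localhost:8001
        docker run --detach --name archiveteam-warrior \
            --publish 8001:8001 --restart on-failure \
            atdr.meo.ws/archiveteam/warrior-dockerfile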

    But it occurred to me that maybe today isn’t the best day to be archiving things. Right?

    With so many subreddits shut down, isn’t ArchiveTeam going to get a whole lot of “this sub is private” messages rather than actual content?

    Hopefully the mothership is smart enough to account for that gracefully. Maybe, centrally, they keep track of those pages and “reassign” them to be fetched again after a good number of hours (or days) have passed.