Lemmy newb here; not sure if this is the right /c for this.

Here's an article I found by someone who hosts their own website and micro-social network. It covers their experience with web-scraping bots that refuse to respect robots.txt, and how they deal with them.

    • splendoruranium@infosec.pub
      4 months ago

      They block VPN exit nodes. Why bother hosting a web site if you don’t want anyone to read your content?

      Fuck that noise. My privacy is more important to me than your blog.

      It’s a minimalist private blog that sets no 3rd party cookies and loads no 3rd party resources. I presume that alleviates your concerns? 😜

    • tripflag@lemmy.world
      4 months ago

      and filtering malicious traffic is more important to me than you visiting my services, so I guess that makes us even :-)

        • tripflag@lemmy.world
          4 months ago

          Absolutely; if I were a company, or hosting something important, or something intended for the general public, then I’d agree.

          But I’m just an idiot hosting whimsical stuff from my basement, and 99% of it is only of interest to my friends. I know ~everyone in my target audience, and I know that none of them use a VPN for general-purpose browsing.

          As it is, I don’t mind keeping the door open to the general public, but nothing of value will be lost if I need to pull the plug on a few more ASNs to preserve my bandwidth. For example, a week ago a guy hopping through a VPN in Sweden decided to download the same zip file thousands of times, wasting terabytes of traffic over a few hours.