So recently, I had to spider a site. I looked online for free tools that would let me spider a single site.
I didn’t find any that worked well on Mac without some limit on how much you can spider. I just wanted to spider a site quickly.
So I thought, why not build my own personal script for it? With the help of AI, I built a simple Python script to spider the site.
So I created an initial version of the script with BeautifulSoup to parse the HTML for links and then spider those links for more. It added everything to a CSV file, just as I needed. It was working perfectly, with one issue. It was too damn slow.
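To give a rough idea of the link-extraction step, here is a minimal sketch, not the actual script. It assumes only same-site links are wanted and that the CSV has two columns, and the names `LinkParser`, `extract_links` and `write_csv` are made up for illustration. It swaps in the standard library’s `html.parser` so it runs without installing anything; with BeautifulSoup the equivalent lookup would be `soup.find_all("a", href=True)`.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, same-site http(s) links found in html, in page order."""
    parser = LinkParser()
    parser.feed(html)
    site = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == site:
            links.append(absolute)
    return links


def write_csv(rows, out):
    """Write (source, link) pairs to an open text file, with a header row."""
    writer = csv.writer(out)
    writer.writerow(["source", "link"])  # assumed column names
    writer.writerows(rows)


# Example with an inline page instead of a real download.
page = (
    '<a href="/about">About</a>'
    '<a href="https://other.test/x">Elsewhere</a>'
    '<a href="page#top">Page</a>'
    '<a href="mailto:a@b.c">Mail</a>'
)
found = extract_links(page, "https://example.com/blog/")
buffer = io.StringIO()
write_csv([("https://example.com/blog/", link) for link in found], buffer)
```

Fetching each page one after another and feeding it through something like this is simple, but every request waits for the previous one to finish, which is where the slowness comes from.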
So then I came across asyncio. It lets a single thread keep many requests in flight at once instead of waiting on each response in turn, which made the script insanely fast. I added a few checks to avoid crawling duplicate URLs and I was all set.
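The concurrent version with the duplicate check might look roughly like this. It is a sketch under assumptions, not the actual script: `crawl`, `fetch_links`, `fake_fetch_links` and `FAKE_SITE` are hypothetical names, and the "network" is a small in-memory dictionary so the example runs offline. In a real script, `fetch_links` would download the page with an async HTTP client and return the links found on it.

```python
import asyncio


async def crawl(start_url, fetch_links, max_concurrency=10):
    """Visit every URL reachable from start_url exactly once.

    fetch_links is an async callable that takes a URL and returns the
    list of links found on that page (hypothetical, supplied by the caller).
    """
    seen = {start_url}   # URLs already queued or visited
    edges = []           # (source, link) rows, ready to write to a CSV
    limit = asyncio.Semaphore(max_concurrency)

    async def visit(url):
        async with limit:            # cap the number of requests in flight
            links = await fetch_links(url)
        new_urls = []
        for link in links:
            edges.append((url, link))
            if link not in seen:     # the duplicate check
                seen.add(link)
                new_urls.append(link)
        return new_urls

    frontier = [start_url]
    while frontier:
        # Fetch the whole frontier concurrently, then go one level deeper.
        results = await asyncio.gather(*(visit(u) for u in frontier))
        frontier = [u for batch in results for u in batch]
    return seen, edges


# A tiny fake site: each key is a page, each value the links on it.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a", "/c"],
    "/c": [],
}


async def fake_fetch_links(url):
    await asyncio.sleep(0.01)        # stands in for network latency
    return FAKE_SITE.get(url, [])


seen, edges = asyncio.run(crawl("/", fake_fetch_links))
```

Because everything runs on one thread and there is no `await` between checking `seen` and adding to it, two tasks cannot both claim the same URL, so a plain set is enough for deduplication. The semaphore is there to avoid hammering the site with too many requests at once.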
Here’s the link to my script.