Hacking a captcha on a government website for a client. #HackCaptcha EP 1/3
I was approached by the head of a construction company that owns more than 2,000 apartment units spread throughout the city. He told me:
"Every month I have 8 employees dedicated to this: they access the city hall website and enter 2,000 IPTU numbers. For each number they type the security captcha, about 3 times because the captcha letters are always scrambled, open a screen to select the quota, and download the PDF of the IPTU quota. In this municipality each IPTU sheet comes with 3 quotas, so they have to split the sheet into 3 pieces and keep only the one for the current month. Would it be possible to create a bot to do this?"
I thought: I need a way to get past the security captcha, download the PDF page, split it into 3 pages, and save each quota separately in a folder for payment.
As I always say:
My dear friend, if you pay me the right amount I can even send a missile to the moon.
Dear Trump, I’m just kidding!
I asked for some time to build a proof of concept; if it works, he will pay me for it. So let’s sketch the architecture:
1 — Create a machine learning model that learns to read captchas. For this I will use Google's TensorFlow. Why? Because the code is mine, because I always wanted to use it, and because it is fashionable. :)
For the Machine Learning part, the logic would basically be this:
1 — Create a script that accesses the website using Puppeteer and downloads 10,000 captcha images.
2 — Ask an unemployed friend to rename the 10,000 captcha images so that each file name is the captcha's value. I need this mass of labeled data to train TensorFlow so that it can solve captchas it has never seen.
3 — Feed the renamed captcha images to TensorFlow, which trains on them so it can solve the next captchas.
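Steps 2 and 3 work because the file name itself carries the label. Here is a minimal sketch of how those names could be turned into training targets, in plain Python and before any TensorFlow code. The alphabet (lowercase letters plus digits), the captcha length of 5, the `.png` naming, and the function names are all my assumptions for illustration, not something taken from the site.

```python
import string
from pathlib import Path
from typing import List

# Assumptions (hypothetical, not confirmed by the site):
# captchas use lowercase letters and digits and are always 5 characters long.
ALPHABET = string.ascii_lowercase + string.digits
CAPTCHA_LENGTH = 5


def label_from_filename(path: str) -> str:
    """Return the captcha text stored in a file name such as 'ab3x9.png'."""
    label = Path(path).stem.lower()
    if len(label) != CAPTCHA_LENGTH or any(c not in ALPHABET for c in label):
        raise ValueError(f"unexpected label in file name: {path!r}")
    return label


def one_hot(label: str) -> List[List[int]]:
    """Encode each character as a vector with a single 1 at its alphabet index."""
    return [[1 if c == symbol else 0 for symbol in ALPHABET] for c in label]


def decode(vectors: List[List[int]]) -> str:
    """Turn per-character score vectors back into text by taking the highest score."""
    return "".join(ALPHABET[row.index(max(row))] for row in vectors)
```

A model trained on such targets would output one score vector per character, and `decode` would turn that output back into the text to type into the form.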
For the part that accesses the government website, we will create a script that performs the following steps:
1 — Open a session on the site in a browser controlled by Puppeteer.
2 — Enter the IPTU number.
3 — Capture the captcha image, send it to TensorFlow to solve the captcha, enter the value in the website's text field and submit the form.
4 — Take 3 screenshots of the IPTU quota page, creating one file per quota.
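Since the employees need about 3 tries per captcha, the bot should probably expect wrong guesses too. Below is a rough sketch of the retry logic around step 3. The three callables are hypothetical placeholders for the Puppeteer and TensorFlow pieces that do not exist yet, and the limit of 3 attempts is just my assumption borrowed from the client's description.

```python
from typing import Callable, Optional


def solve_with_retries(
    fetch_captcha: Callable[[], bytes],
    predict: Callable[[bytes], str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Try to pass the captcha, fetching a fresh image on every attempt.

    fetch_captcha: hypothetical hook that returns the current captcha image.
    predict:       hypothetical hook that returns the model's guess for an image.
    submit:        hypothetical hook that sends the guess and reports success.
    Returns the accepted guess, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        image = fetch_captcha()
        guess = predict(image)
        if submit(guess):
            return guess
    return None
```

Passing the steps in as functions keeps the loop testable with fake implementations while the real browser and model code are still being written.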
Phew, that's a lot. Take a look at the drawing at the top of the page, since pictures have always been easier to read.
So that’s it for today. This was just the draft of the idea and the first episode of our three-part series. In the next one we will go into the details of the implementation; the TensorFlow part is taking a lot of work, but it is moving along.
Did you find this article relevant? Do you want to know about the next episodes? Do you or your company have a similar problem? Please contact me and I will not hesitate to charge you to solve it. :)