Captcha > TensorFlow > Puppeteer

Hacking a captcha on a government website for a client. #HackCaptcha EP 1/3

I was approached by the head of a construction company that owns more than 2,000 apartments spread across the city, who told me:

Every month I have 8 employees dedicated to accessing the city hall website and entering 2,000 IPTU (municipal property tax) numbers. For each number they have to type the security captcha, usually about 3 times, because the captcha letters are always scrambled. Then they open a screen to select the quota and download the PDF of the IPTU quota. In this municipality each IPTU sheet comes with 3 quotas, so they have to cut the sheet into 3 pieces and keep only the one for the current month. Would it be possible to create a bot to do this?

I thought: I need to find a way to get past the security captcha, download the PDF page, split it into 3 pages and put each quota for payment in its own folder.
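
The splitting step is simple arithmetic once the page is an image. Below is a minimal sketch, assuming the three quotas are stacked top to bottom with equal heights (the real IPTU layout may need measured offsets instead). Boxes use a top-left origin in the (left, upper, right, lower) order that Pillow's `Image.crop` accepts; `quota_boxes` and `box_for_quota` are hypothetical helper names, not part of any library.

```python
from typing import List, Tuple

# A crop box is (left, upper, right, lower) in pixels, origin at the top-left,
# the tuple order Pillow's Image.crop() expects.
Box = Tuple[int, int, int, int]


def quota_boxes(width: int, height: int, parts: int = 3) -> List[Box]:
    """Split a page image into `parts` horizontal strips of (nearly) equal height.

    Assumption: the quotas are stacked top to bottom with equal heights.
    Rounding the edges keeps the strips contiguous and covering the whole page.
    """
    if parts < 1 or width <= 0 or height <= 0:
        raise ValueError("width, height and parts must be positive")
    edges = [round(i * height / parts) for i in range(parts + 1)]
    return [(0, edges[i], width, edges[i + 1]) for i in range(parts)]


def box_for_quota(width: int, height: int, quota_index: int, parts: int = 3) -> Box:
    """Crop box of a single quota, counted from the top (0-based).

    Which index holds the current month's quota depends on the real sheet
    layout, so the caller has to decide it.
    """
    boxes = quota_boxes(width, height, parts)
    if not 0 <= quota_index < parts:
        raise IndexError(f"quota_index must be between 0 and {parts - 1}")
    return boxes[quota_index]
```

With Pillow installed, `img.crop(box_for_quota(img.width, img.height, 0))` on an opened page image would cut out the top quota, which can then be saved into that quota's folder.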

As I always say:

My dear friend, if you pay me the right amount I can even send a missile to the moon.

Dear Trump, I’m just kidding!

I asked for some time to build a proof of concept; if it works, he will pay me for it. So let’s sketch the architecture:

1 - Create a machine learning model that learns to read the captchas. For this I will use Google's TensorFlow. Why? Because the code is mine, I have always wanted to use it, and it is fashionable. :)

2 - Use a website testing tool to access the website programmatically. I thought about using Puppeteer, a Node.js library from Google for controlling Chrome with JavaScript. Why? Because JavaScript is very cool and widely supported, in case I want to turn this into an application.

For the Machine Learning part, the logic would basically be this:

1 - Write a script that uses Puppeteer to access the website and download 10,000 captcha images.

2 - Ask an unemployed friend to rename each of the 10,000 captcha images to the text shown in the captcha. I need this volume of labeled data to train the TensorFlow model so that it can read new captchas.

3 - Feed the renamed captcha images to TensorFlow so the model learns to solve future captchas.
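
Before any training happens, the renamed files have to become numeric targets. The sketch below covers only that preparation step, assuming the file name (minus extension) is the captcha text; the model itself is not shown. The underscore suffix for duplicate texts (e.g. `a3kx9_2.png`) is a hypothetical convention, and all function names are illustrative.

```python
import os
from typing import Dict, List, Sequence, Tuple


def label_from_filename(path: str) -> str:
    """'captchas/a3kx9.png' -> 'a3kx9'. Anything after the first '_' is ignored,
    a hypothetical convention for two captchas that share the same text."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.split("_", 1)[0]


def build_vocabulary(labels: Sequence[str]) -> Dict[str, int]:
    """Give every character seen in the labels an integer id; 0 is kept for padding."""
    return {ch: i + 1 for i, ch in enumerate(sorted(set("".join(labels))))}


def encode(label: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Map a label to a fixed-length list of ids, right-padded with 0."""
    if len(label) > length:
        raise ValueError(f"label {label!r} is longer than {length}")
    ids = [vocab[ch] for ch in label]
    return ids + [0] * (length - len(ids))


def decode(ids: Sequence[int], vocab: Dict[str, int]) -> str:
    """Inverse of encode(): drop the padding and map ids back to characters."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids if i != 0)


def build_dataset(paths: Sequence[str]) -> Tuple[List[str], List[List[int]], Dict[str, int]]:
    """Labels, padded integer targets and the vocabulary for a list of image paths."""
    if not paths:
        raise ValueError("no captcha images given")
    labels = [label_from_filename(p) for p in paths]
    vocab = build_vocabulary(labels)
    length = max(len(label) for label in labels)
    targets = [encode(label, vocab, length) for label in labels]
    return labels, targets, vocab
```

The padded targets can be converted to arrays and paired with the loaded images for whatever TensorFlow model is chosen; `decode` is what turns the model's predicted ids back into the text to type into the form.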

For accessing the government website, we will create a script that performs the following steps:

1 - Open a session on the site in a browser controlled by Puppeteer.

2 - Enter the IPTU number.

3 - Capture the captcha image, send it to the TensorFlow model to predict its value, type that value into the site's text field and submit the form.

4 - Take 3 screenshots of the IPTU quota page, creating one file per quota.

Phew, that's a lot. Take a look at the drawing at the top of the page; pictures have always been easier to read.
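
Since the captcha often needs several attempts, steps 2 and 3 boil down to a retry loop. The Puppeteer side would be JavaScript; this Python sketch shows only the control flow, with the browser and the model injected as plain functions. `fetch_captcha`, `predict` and `submit` are hypothetical stand-ins, and the default of 3 tries is an arbitrary choice echoing the client's "about 3 times".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Attempt:
    iptu: str
    tries: int
    ok: bool
    answer: Optional[str] = None


def solve_with_retries(
    iptu: str,
    fetch_captcha: Callable[[], bytes],   # hypothetical: screenshot of the captcha element
    predict: Callable[[bytes], str],      # hypothetical: the trained model's guess
    submit: Callable[[str, str], bool],   # hypothetical: True if the site accepted the form
    max_tries: int = 3,
) -> Attempt:
    """Fetch a captcha, predict its text and submit; on rejection, retry with a fresh captcha."""
    for attempt in range(1, max_tries + 1):
        image = fetch_captcha()
        guess = predict(image)
        if submit(iptu, guess):
            return Attempt(iptu, attempt, True, guess)
    return Attempt(iptu, max_tries, False)


def run_batch(
    iptus: Sequence[str],
    fetch_captcha: Callable[[], bytes],
    predict: Callable[[bytes], str],
    submit: Callable[[str, str], bool],
    max_tries: int = 3,
) -> List[str]:
    """Process every IPTU number and return those that still failed, for manual follow-up."""
    failed = []
    for iptu in iptus:
        if not solve_with_retries(iptu, fetch_captcha, predict, submit, max_tries).ok:
            failed.append(iptu)
    return failed
```

Keeping the browser and model behind plain functions makes the loop testable with fakes, and returning the failed numbers means the employees only handle the leftovers instead of all 2,000.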

That’s it for today. This was just a draft of the idea and the first episode of our 3-part series. In the next one we will go into the implementation details; the TensorFlow part is taking a lot of work, but it is progressing.

Did you find this article relevant? Do you want to read the next episodes? Do you or your company have a similar problem? Contact me and I will not hesitate to charge you for solving it. :)

Expert in software development and data manipulation.

Samuel JB