Python, Pandas, extracting text from PDF and MERGE 2 CSV files

Published: 28 February 2020
on channel: Python 360

So I had to combine data from a PDF with the data in a CSV to send back to the powers that be.
They foolishly sent me a PDF rather than a CSV of recommended firmware versions.
In this video I convert the PDF using pdftotext to create a CSV, which I can then merge using the Pandas module in Python.
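As a rough sketch of the conversion step, pdftotext can be called from Python with the standard `subprocess` module. The file names and the helper names `build_pdftotext_command` and `pdf_to_text` are made up for illustration, and the `-layout` option (which asks pdftotext to keep the physical column layout) is an assumption; the video may have used different options.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext command line (hypothetical helper).

    -layout is an assumption: it asks pdftotext to preserve the
    physical layout of the page, which keeps table columns aligned.
    """
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if the tool fails.

    Requires the pdftotext command-line tool to be installed and on PATH.
    """
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
```

Calling `pdf_to_text("firmware.pdf", "firmware.txt")` (hypothetical file names) would write the extracted text next to the PDF.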


There was a snag: the data in the CSV from the PDF was not delimited with commas or a consistent number of spaces, so I had no reliable way of organising the columns based on the number of spaces between column one and column two.


The cheat was to do this visually with Notepad++. Not ideal, but it would have been extremely difficult to code it for repeatability otherwise.
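For a file that happens to be better behaved than mine, a sketch like the one below might save the manual step. It assumes that columns are always separated by at least two spaces and that the values themselves never contain more than a single space in a row. That assumption did not hold for my file. The sample text and the function names are invented for illustration.

```python
import csv
import io
import re


def layout_text_to_rows(text):
    """Split each non-empty line into columns on runs of 2+ whitespace chars.

    Assumption: a gap of two or more spaces always means "next column".
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample resembling layout-preserved text from a PDF table.
SAMPLE = """Model        Recommended firmware
X100 Series      1.4.0
X200   3.0.1
"""

rows = layout_text_to_rows(SAMPLE)
csv_text = rows_to_csv(rows)
print(csv_text)
```

If any row comes out with the wrong number of columns, the two-space assumption has failed for that line and it is back to fixing things by eye.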


Once I had the 2 files, I tested the code in Jupyter, which let me view the tables ("dataframes") nicely. Then I took a step back and made the code work with 2 smaller, similar sample files. In the end it was quicker to do the job manually, but it's sure to crop up again, so I am now armed with some code. Hopefully they'll believe that I am doing the work manually; instead I'll run this code in Python and sit back and watch cat videos on YouTube or *similar*.
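Here is a minimal sketch of the merge with two small sample tables, in the spirit of the test files mentioned above. The column names (`device`, `model`, `current_firmware`, `recommended_firmware`), the data, and the function name `merge_firmware` are all hypothetical, and the choice of a left merge on `model` is an assumption rather than the exact code from the video.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the two real CSV files.
INVENTORY_CSV = """device,model,current_firmware
sw-01,X100,1.2.0
sw-02,X200,3.0.1
sw-03,X300,2.5.0
"""

RECOMMENDED_CSV = """model,recommended_firmware
X100,1.4.0
X200,3.0.1
"""


def merge_firmware(inventory_src, recommended_src):
    """Left-merge the recommended versions onto the inventory.

    dtype=str keeps version strings such as "3.0.1" as plain text.
    how="left" keeps every inventory row, even without a recommendation.
    """
    inventory = pd.read_csv(inventory_src, dtype=str)
    recommended = pd.read_csv(recommended_src, dtype=str)
    return inventory.merge(recommended, on="model", how="left")


merged = merge_firmware(io.StringIO(INVENTORY_CSV), io.StringIO(RECOMMENDED_CSV))
print(merged)
```

With real files, the `io.StringIO(...)` arguments would be replaced by file paths, and `merged.to_csv("merged.csv", index=False)` would write the result out. Rows with no match in the recommended list end up with a missing value in the `recommended_firmware` column.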




Video duration: 10 minutes 5 seconds. Viewed 950 times, liked by 5 visitors.