Python, Pandas, extracting text from PDF and MERGE 2 CSV files

Published: 28 February 2020
Channel: Python 360
Views: 950
Likes: 5

So I had to combine data from a PDF with the data in a CSV to send back to the powers that be.
They foolishly sent me a PDF rather than a CSV of recommended firmware versions.
In this video I convert the PDF using pdftotext to create a CSV, which I can then merge using the Pandas module in Python.
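
As a rough illustration of the conversion step, here is a minimal sketch that calls the pdftotext command-line tool from Python. It assumes pdftotext (from Poppler or Xpdf) is installed and on the PATH. The function names and the file names are hypothetical, and the video may well have run the tool by hand rather than from a script. The `-layout` option asks pdftotext to keep the physical layout of the page, which tends to keep table columns lined up.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for a layout-preserving pdftotext call."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path):
    """Convert a PDF to plain text; raise if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path)


# Hypothetical usage (file names are placeholders):
# pdf_to_text("firmware.pdf", "firmware.txt")
```

The output is plain text rather than a true CSV, so the columns still have to be separated afterwards.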


There was a snag: the data in the CSV from the PDF was not delimited with commas or spaces, and I had no way of organising the columns based on the number of spaces between column one and column two.


The cheat was to do this visually with Notepad++. Not ideal, but it would have been extremely difficult to code it for repeatability otherwise.
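
For tidier files than the one described here, the column split can sometimes be scripted. The sketch below is only that: it assumes every gap between columns is at least two spaces wide and that the values themselves contain single spaces at most, which may not have been true of the file in the video. The helper names and the sample text are made up for illustration.

```python
import csv
import io
import re


def split_columns(line):
    """Split one line of text on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def text_to_csv(text):
    """Turn space-aligned text into CSV text, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():
            writer.writerow(split_columns(line))
    return out.getvalue()


# Made-up sample data, not taken from the video.
sample = (
    "Model        Recommended firmware\n"
    "Switch A     1.2.3\n"
    "Access Point B   4.5\n"
)
csv_text = text_to_csv(sample)
print(csv_text)
```

If a value ever contains a double space, or two columns are separated by a single space, this split goes wrong, which is the kind of case where editing by eye ends up being quicker.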


Once I had the 2 files I tested the code with Jupyter, which allowed me to view the tables (DataFrames) nicely. Then I took a step back and made the code work with 2 smaller, similar sample files. In the end it was quicker to do the job manually, but it's sure to crop up again, so I am now armed with some code. Hopefully they'll believe that I am doing the work manually; instead I'll run this code in Python and sit back and watch cat videos on YouTube or *similar*.
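
The merge on two small sample files can be sketched roughly as follows. The column names and the data are invented for illustration and are not the ones used in the video. The CSV text is read from in-memory strings so the example runs on its own. A left merge is an assumption too: it keeps every row of the first table, even where no recommended version exists.

```python
import io

import pandas as pd

# Invented sample data standing in for the two CSV files.
inventory_csv = (
    "device,model,current_firmware\n"
    "sw1,Switch A,1.2.0\n"
    "ap1,Access Point B,4.5\n"
    "sw2,Switch C,2.0\n"
)
recommended_csv = (
    "model,recommended_firmware\n"
    "Switch A,1.2.3\n"
    "Access Point B,4.5\n"
)

# dtype=str stops version strings such as "4.5" being parsed as numbers.
inventory = pd.read_csv(io.StringIO(inventory_csv), dtype=str)
recommended = pd.read_csv(io.StringIO(recommended_csv), dtype=str)

# Left merge on the shared "model" column; unmatched rows get NaN.
merged = inventory.merge(recommended, on="model", how="left")
print(merged)

# To save the result: merged.to_csv("merged.csv", index=False)
```

With real files, `pd.read_csv` takes a file path in place of the `io.StringIO` object.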



Video length: 10 minutes 05 seconds.