Table of contents
Introduction
Data management lies at the heart of many professional activities. However, centralizing and processing information from diverse sources like PDFs or images can quickly become a time-consuming nightmare. In our modern times, there are efficient solutions that can help make your life easier.
In this article, we’ll explore how to extract data from PDFs or images into Excel. Whether you’re an analyst, a consultant, or a project manager, you’ll discover practical use cases that will transform your workflow.
How to convert data from a PDF to an Excel table
The problem: PDF data won't come formatted as a table when pasted to Excel
We've all faced this situation more than once: we have a great PDF with a lot of useful information shown as a table, but we cannot just copy and paste this information into Excel.
We can try all we want, but data from a table in the PDF just won't come as a table when pasted in the Excel grid. Does this mean we have to spend hours manually copying and pasting each individual cell value from the table into the corresponding Excel table? Hell no!
How to extract data from PDF to Excel
Here are the steps to extract the data from a PDF to Excel:
From Excel, go to the 'Data' tab
In the 'Get & Transform data' group, click 'Get data' > 'From file' > 'From PDF'
Browse and select your PDF
A window will open where Excel shows the list of pages inside your PDF. For each page, a preview shows the tables with the data you can extract.
Click the page containing the data to extract, and then validate with 'Load'.
That's it! The data is now shown in a new Excel table and you can start working on it.
This extraction method is actually leveraging one of the most underutilized feature of Excel, called PowerQuery.
Power Query is a powerful data connection and transformation tool available in Microsoft Excel and Power BI. It allows users to easily import, clean, and reshape data from various sources without needing complex coding skills. With its intuitive interface, Power Query simplifies the process of transforming raw data into usable insights, making it an essential tool for data analysis and reporting.
How to convert data from a picture to Excel
Extracting data from an image into Excel (for tables)
Even more impressive than a PDF, Excel can also extract data from pictures. This is great when all you have is a picture or photo you've taken of a table, and you need to put it in Excel.
Here are the steps to extract the data from a picture to Excel:
From Excel, go to the 'Data' tab.
In the 'Get & Transform data' group, click 'From picture', and then either 'Picture from file' or 'Picture from clipboard'.
Browse and select the picture containing the data.
Excel will open a task pane at the right of the screen, showing the characters recognized by Excel.
Review the cells highlighted in light red, they are the cells where Excel is not sure of the result. If you click them, you can manually enter the correct value.
When you're done, click "Insert data" and the values will be imported in the Excel grid.
This feature is powered by OCR (Optical Character Recognition) technology, which allows Excel to recognize and extract text from images, making it possible to work with data in pictures without re-entering everything manually.
Example with a regular table
Browse and select the picture containing the data, review the characters recognized in the task pane (paying close attention to cells highlighted in light red where Excel may need confirmation), manually correct any errors, and finally click "Insert Data" to import the values seamlessly into the Excel grid.
Example with a handwritten table
Now what happens if your picture is not that clean, but instead you're using a picture of handwritten data? How will Excel's OCR perform in this situation?
When we imported a picture with handwritten data, Excel asked us to review a significant number of the values from the table, reducing the interest of a supposedly automatic solution.
When inserting the data without reviewing some of the cells, the result came as shown below:
As you can see, the result is far from perfect.
OCR technology is not always perfect. The quality of the recognition can depend on several factors, such as the clarity of the picture, the font used, or the layout of the data. There are known limitations, and Excel definitely struggles with handwritten text. You might often find errors that need to be corrected manually, so be aware that while it's a great feature, it may not always be reliable.
Example with a handwritten table, using ChatGPT instead
We compared the Excel OCR feature with the free alternative of ChatGPT. With ChatGPT, we can extract the handwritten data in a digital table with a much higher level of precision and quality. Then we can import into Excel.
The example above highlights how ChatGPT offers a more accurate solution for data extraction compared to traditional Excel OCR technology. Its advanced capabilities in understanding handwritten and structured data ensure greater precision and efficiency.
Extracting data from an image into Excel (for charts)
Have you ever found a chart image online, where you actually needed to extract the data from the chart and put it into Excel? The reason could be that you had to work on the data, or maybe you just wanted to re-create a new, cleaner chart matching your brand colors.
In any case, getting the data from a picture of a chart into Excel can seem like a complicated and tedious task. And in fact, there is no simple way to do this in Excel except manually reading and typing the values.
However, there is a third-party web tool called WebPlotDigitizer by Automeris that can actually do that for you.
Go to https://automeris.io/ and create an account. Then upload your chart and follow the guided steps to define what type of chart this is and what the axis.
Once ready, you can extract the data points, and then download the data as a .csv you can then import in Excel.