top of page

How to extract data from a PDF or picture into Excel


Table of contents


Introduction


Data management lies at the heart of many professional activities. However, centralizing and processing information from diverse sources like PDFs or images can quickly become a time-consuming nightmare. In our modern times, there are efficient solutions that can help make your life easier.


In this article, we’ll explore how to extract data from PDFs or images into Excel. Whether you’re an analyst, a consultant, or a project manager, you’ll discover practical use cases that will transform your workflow.


How to convert data from a PDF to an Excel table


The problem: PDF data won't come formatted as a table when pasted to Excel


We've all faced this situation more than once: we have a great PDF with a lot of useful information shown as a table, but we cannot just copy and paste this information into Excel.


We can try all we want, but data from a table in the PDF just won't come as a table when pasted in the Excel grid. Does this mean we have to spend hours manually copying and pasting each individual cell value from the table into the corresponding Excel table? Hell no!


How to extract data from PDF to Excel


Here are the steps to extract the data from a PDF to Excel:

  • From Excel, go to the 'Data' tab

  • In the 'Get & Transform data' group, click 'Get data' > 'From file' > 'From PDF'

  • Browse and select your PDF

  • A window will open where Excel shows the list of pages inside your PDF. For each page, a preview shows the tables with the data you can extract.

  • Click the page containing the data to extract, and then validate with 'Load'.

  • That's it! The data is now shown in a new Excel table and you can start working on it.

Excel extract data from a PDF with PowerQuery
Get data > From file > From PDF
Import a PDF document to PowerQuery
Import your PDF Document
PowerQuery select the table to transform from PDF to Excel table
Select the table you want to transform
Excel PowerQuery select the table to transform
Click on "Close & Load" to generate the Excel spreadsheet
Close & Load to import data from PDF to Excel table
Well done ! You have successfully extracted all the data from page 50 of the PDF into your Excel spreadsheet.

This extraction method is actually leveraging one of the most underutilized feature of Excel, called PowerQuery.


Power Query is a powerful data connection and transformation tool available in Microsoft Excel and Power BI. It allows users to easily import, clean, and reshape data from various sources without needing complex coding skills. With its intuitive interface, Power Query simplifies the process of transforming raw data into usable insights, making it an essential tool for data analysis and reporting.



How to convert data from a picture to Excel


Extracting data from an image into Excel (for tables)


Even more impressive than a PDF, Excel can also extract data from pictures. This is great when all you have is a picture or photo you've taken of a table, and you need to put it in Excel.


Here are the steps to extract the data from a picture to Excel:

  • From Excel, go to the 'Data' tab.

  • In the 'Get & Transform data' group, click 'From picture', and then either 'Picture from file' or 'Picture from clipboard'.

  • Browse and select the picture containing the data.

  • Excel will open a task pane at the right of the screen, showing the characters recognized by Excel.

  • Review the cells highlighted in light red, they are the cells where Excel is not sure of the result. If you click them, you can manually enter the correct value.

  • When you're done, click "Insert data" and the values will be imported in the Excel grid.


Excel import data from Picture to Excel table with OCR

This feature is powered by OCR (Optical Character Recognition) technology, which allows Excel to recognize and extract text from images, making it possible to work with data in pictures without re-entering everything manually.


Example with a regular table


Browse and select the picture containing the data, review the characters recognized in the task pane (paying close attention to cells highlighted in light red where Excel may need confirmation), manually correct any errors, and finally click "Insert Data" to import the values seamlessly into the Excel grid.


Excel review data from picture
Regular table with Data
Insert and review your data for transformation
Insert and review your data before it is imported

The data from your picture is then imported into Excel
The data from your picture is then imported

Example with a handwritten table


Now what happens if your picture is not that clean, but instead you're using a picture of handwritten data? How will Excel's OCR perform in this situation?


Excel review data from handwritten picture with OCR

When we imported a picture with handwritten data, Excel asked us to review a significant number of the values from the table, reducing the interest of a supposedly automatic solution.


When inserting the data without reviewing some of the cells, the result came as shown below:

The result: only part of the data was imported

As you can see, the result is far from perfect.


OCR technology is not always perfect. The quality of the recognition can depend on several factors, such as the clarity of the picture, the font used, or the layout of the data. There are known limitations, and Excel definitely struggles with handwritten text. You might often find errors that need to be corrected manually, so be aware that while it's a great feature, it may not always be reliable.



Example with a handwritten table, using ChatGPT instead


We compared the Excel OCR feature with the free alternative of ChatGPT. With ChatGPT, we can extract the handwritten data in a digital table with a much higher level of precision and quality. Then we can import into Excel.


Extracting data from a table into Excel using ChatGPT

The example above highlights how ChatGPT offers a more accurate solution for data extraction compared to traditional Excel OCR technology. Its advanced capabilities in understanding handwritten and structured data ensure greater precision and efficiency.



Extracting data from an image into Excel (for charts)


Have you ever found a chart image online, where you actually needed to extract the data from the chart and put it into Excel? The reason could be that you had to work on the data, or maybe you just wanted to re-create a new, cleaner chart matching your brand colors.


In any case, getting the data from a picture of a chart into Excel can seem like a complicated and tedious task. And in fact, there is no simple way to do this in Excel except manually reading and typing the values.


However, there is a third-party web tool called WebPlotDigitizer by Automeris that can actually do that for you.


Go to https://automeris.io/ and create an account. Then upload your chart and follow the guided steps to define what type of chart this is and what the axis.


Once ready, you can extract the data points, and then download the data as a .csv you can then import in Excel.

1 view
bottom of page