
EUDI: Extract Tabular Data From Any File

The PDF format is a de facto standard for exchanging and archiving documents, since PDF files are not meant to be edited and the document layout is preserved.

For this reason, a lot of statistical data, such as corporate annual reports, stock performance and sales figures, is available only in the form of PDF files.

But a common problem with PDF files is that it is very difficult to extract data from them. Though there are some workarounds and tools to extract text out of PDF files, they fail to preserve the original document layout.

Say the accounts department sends you the annual salary sheet as a PDF that you want to export to Excel for some analysis. Or the sales people send their weekly sales report in a plain text email that you would like to represent as charts in Excel or even Microsoft Word.

Cogniview, an Israel-based company, has released EUDI 1.0, a data extraction program that works with PDF files and with any Windows application that has a Print feature.

Like Acrobat, EUDI (End User Data Integrator) installs as a printer on your machine. When you want to extract data from PDF files, Windows CHM help files, Notepad files or even web-based email, just print the file or browser window using the EUDI printer.

The document then opens inside the EUDI PDF to Excel conversion software, where you use the mouse to mark the area that you want to extract. EUDI is smart enough to recognize the table layout and formatting, and will automatically split the selected area into rows and columns.

It draws a marquee around the detected rows and columns. If the automatic detection makes a mistake, you can correct it manually, much as you would with cell selections in Microsoft Excel: merge cells, split rows or columns, or even delete them.
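To give a feel for what splitting a selected area into rows and columns can involve, here is a minimal Python sketch. It is not EUDI's algorithm, which is not documented here; it assumes the selected text is laid out in a fixed-width font and treats any run of character positions that is blank on every line as a column gap. The function names `find_column_spans` and `split_table`, the `min_gap` parameter and the sample figures are all made up for illustration.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of the columns in fixed-width text.

    A position counts as blank only if it is a space on every line.
    A run of at least `min_gap` blank positions ends the current column.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    spans, start, i = [], None, 0
    while i < width:
        if not blank[i]:
            if start is None:
                start = i          # a new column begins here
            i += 1
            continue
        j = i
        while j < width and blank[j]:
            j += 1                 # measure the run of blank positions
        if start is not None and (j - i >= min_gap or j == width):
            spans.append((start, i))
            start = None
        i = j
    if start is not None:
        spans.append((start, width))
    return spans


def split_table(text, min_gap=2):
    """Split fixed-width text into a list of rows, each a list of cell strings."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    spans = find_column_spans(lines, min_gap)
    return [[ln[a:b].strip() for a, b in spans] for ln in lines]


# Simulate a printed report: a left-aligned label and two right-aligned numbers.
sample_rows = [("Region", "Q1", "Q2"), ("North", "120", "135"), ("South West", "98", "101")]
text = "\n".join("{:<12}{:>4}{:>6}".format(*r) for r in sample_rows)
table = split_table(text)
```

Requiring a gap of at least two blank positions keeps a label such as "South West" in a single cell. A heuristic like this will still guess wrong on ragged layouts, which is where manual merge and split controls earn their keep.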

The extracted text is shown in real time in the bottom pane. Once you are satisfied with the adjusted layout, you can export it to Microsoft Word or Excel, or just copy it to the clipboard. The file can also be saved in an EUDI-specific format for editing and exporting later.
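If you already have the rows and columns in hand and only need to get them into a spreadsheet, the export step is easy to approximate with Python's standard `csv` module. This is a sketch of the general idea rather than anything EUDI does internally; the function name `rows_to_delimited` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_delimited(rows, delimiter=","):
    """Serialize rows of cell strings as delimited text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)  # cells containing the delimiter are quoted automatically
    return buf.getvalue()


rows = [["Region", "Q1", "Q2"], ["South, West", "98", "101"]]
csv_text = rows_to_delimited(rows)          # comma-separated, for saving as a .csv file
tsv_text = rows_to_delimited(rows, "\t")    # tab-separated, convenient for pasting into cells
```

Writing `csv_text` to a file with a `.csv` extension gives you something Excel can open directly, while the tab-separated variant is the form spreadsheets usually accept from the clipboard.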



EUDI's user interface is very intuitive, and you can get started immediately without even reading the help manual. There's also a nice video screencast on their website if you would like to see the software in action.

Download EUDI - This Windows-only software supports Office 97, 2000, XP and 2003. Product Activation is mandatory for using the software. It can be done online or via Fax/E-mail.

Don't confuse EUDI with OCR software like OmniPage or ABBYY FineReader. EUDI will extract only the data that was embedded as text; it won't interpret a graphic image, such as a logo that contains text.


Digital Inspiration

Digital Inspiration is a popular tech blog by Amit Agarwal. Our popular Google Scripts include Gmail Mail Merge (send personalized emails with Gmail), Document Studio (generate PDFs from Google Forms) and File Upload Forms (receive files in Google Drive). Also see Reverse Image Mobile Search, Online Speech Recognition and Website Screenshots, the most useful websites on the Internet.
