When Microsoft introduced Office 2007 they also revealed that they would now store their documents in an XML-based file format. To show the difference between the “old” format and the new, files stored in the new format get an ‘x’ added to the file extension (.docx, .xlsx, .pptx).
Retrieving images from a Word document (or any other for that matter) has never been a simple walk in the park. In earlier Office versions it was not exactly easy, but most people managed with copy and paste. You could also save the document as an HTML file and retrieve the images from the folder it created.
With the “new” XML-based file formats it’s actually gotten way easier…
The document is an archive
What many of us didn’t know is that the new file format is not a single XML file. XML itself is only a tag-based markup format, much like HTML. The document you save is really a ZIP archive that holds several XML files together with the resources they depend on.
A normal Word document with images stores the images alongside various XML documents containing the formatting, text etc. These resources can be extracted much the same way as you would unpack any ZIP or RAR file.
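To see what such a document holds before extracting anything, the entries can be listed with Python’s standard zipfile module. This is only a small sketch: the function name list_package_parts and the file name name.docx are made up for illustration.

```python
import zipfile


def list_package_parts(path):
    """Return the names of all entries stored inside an Office document."""
    # A .docx/.xlsx/.pptx file can be opened directly as a ZIP archive;
    # no renaming is needed when reading it from code.
    with zipfile.ZipFile(path) as package:
        return package.namelist()


# Hypothetical usage:
# for name in list_package_parts("name.docx"):
#     print(name)
```

For a Word document with pictures, the listing typically contains entries such as word/document.xml next to the image files in word/media/.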
Rename the document
First, let me say it’s important that the document is stored in the 2007/2010 format (.docx rather than .doc) for this to work. In this example I’m using a Word document, but the same steps apply to any Office document.
- Make a copy of your document and rename the extension to ZIP (e.g. name.docx -> name.zip)
- Open or extract the “ZIP file” using your preferred archive tool
- In the archive you will find a folder named word (xl for Excel, ppt for PowerPoint)
- Inside that folder you will find one called media. Open it.
- There are your media files
The media files have been renamed using consecutive numbering (image1, image2 …). The advantage is that the files themselves haven’t been altered, so there is no data loss as you could experience with the “old” Office file format.
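If you have many documents to go through, the manual steps above can also be scripted. The following is a minimal sketch using Python’s standard library, assuming the images sit in a media folder one level below the top of the archive, as described above. The function name extract_media and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_media(document_path, output_dir):
    """Copy every file in a <app>/media/ folder out of an Office document."""
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(document_path) as package:
        for name in package.namelist():
            parts = PurePosixPath(name).parts
            # Match word/media/<file>, xl/media/<file> or ppt/media/<file>.
            if len(parts) == 3 and parts[1] == "media":
                target = output / parts[2]
                # The bytes are written unchanged, so the image is not re-encoded.
                target.write_bytes(package.read(name))
                extracted.append(target)
    return extracted


# Hypothetical usage:
# extract_media("name.docx", "extracted_images")
```

Because the script reads the document directly as a ZIP archive, there is no need to make a copy or change the extension first.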
About Thomas
Computer geek from the age of 7, which amounts to 30 years of computer experience. From the early days (when every computer company had their own OS) of DOS, Windows 1.0 through Seven...