Thread: How To Extract Images From A PDF
-
June 12th, 2018, 20:05 #21
I did not have any image files. I ran it in a VM (Virtual Machine). I have no idea why the image files were not written to disk.
I tried again. In the VM, I opened a command prompt (cmd.exe) and ran the batch file from there so I could see any error messages. There were two, one for missing *.jpg files and one for missing *.ppm files (both unimportant, see below). My test .pdf was large (~30 MB, >700 images). As near as I can tell, all images were extracted this time.
I also tried running the batch file outside the VM, again from a command prompt. Again all images seem to have been extracted, and the same two error messages were reported.
As another test, I skipped both the VM and the command prompt and simply double-clicked the batch file in Windows Explorer. Another success.
So to summarize, everything seems to be working fine. The results of my first attempt must have been an aberration.
The following will be unimportant to most people reading this thread, except those with a particular "techie" inquisitive nature or those having difficulties.
The error messages were a conundrum until I started digging. They are unimportant, since they come from the two erase commands in the .bat file, which complain when there are no matching files to delete. In addition to the .jpg image files, .pbm, .pgm and .ppm files are created, but not one-to-one with the .jpg files.
I went to the "superuser.com" web site (part of Stack Exchange), so I feel comfortable with the information I got there. The .ppm, .pbm and .pgm files have to do with the image type within the .pdf. I'm not sure that I care enough to explore this further, but someone else might. One item of interest is the -list command line parameter, which gives an image-by-image report. It might be helpful if an image you want from the .pdf doesn't extract.
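For anyone curious about the extra files, here is a minimal Python sketch (not part of the posted batch file) that tallies pdfimages output by extension. The extension-to-type mapping is my understanding of the Xpdf documentation (PBM for monochrome, PGM for grayscale, PPM for color, JPG for images stored as JPEG inside the PDF), so treat it as an assumption. The function name and sample file names are made up for illustration.

```python
from collections import Counter
from pathlib import PurePath

# Assumed mapping, based on my reading of the Xpdf pdfimages documentation.
KIND_BY_EXTENSION = {
    ".jpg": "JPEG (stored as JPEG inside the PDF)",
    ".pbm": "monochrome bitmap",
    ".pgm": "grayscale",
    ".ppm": "color",
}

def tally_extracted(filenames):
    """Count extracted files by image kind; unknown extensions go under 'other'."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        counts[KIND_BY_EXTENSION.get(ext, "other")] += 1
    return counts

if __name__ == "__main__":
    # Illustrative file names only, not real output from the test PDF.
    sample = ["images-0000.jpg", "images-0001.ppm", "images-0002.pbm", "images-0003.jpg"]
    for kind, n in tally_extracted(sample).items():
        print(f"{n:4d}  {kind}")
```

Pointing it at the real contents of the images folder (for example via os.listdir) would show how many images came out in each format.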
Colin. Thanks for your help.
-
June 12th, 2018, 23:57 #22
- Join Date
- Mar 2014
- Location
- Staffordshire, UK
- Posts
- 337
Wow ... a significant amount of in-depth testing.
It is painful to extract a lot of images from a PDF and the program does it (generally) quite well and very, very fast.
All I can take credit for is the batch file, which simplifies the process for the average user.
The true credit should go to the developers over at https://www.xpdfreader.com/ who developed the routine in the first instance.
Anyways, I hope the images extracted are of sufficient quality for use. If not, I am sure you are aware of a variety of programs to resize them; one I like is https://www.xnview.com/en/xnconvert/ as it's very good at batch resizing.
Ultimate License
UK Time Zone (GMT/BST)
DM'ing since 1977 (Basic D&D)
Currently Playing:
Empire of the Ghouls 5E Campaign
Tales from the Yawning Portal 5E Campaign
Rise of the Runelords Pathfinder 1e
Amazing Adventures 5E Campaign
"Some are born to move the world, to live their fantasies
But most of us just dream about the things we'd like to be."
Rush - Losing It
Currently DM'ing
Princes of the Apocalypse 5E Campaign
Waterdeep: Mad Mage 5E Campaign
The Blight 5E Campaign
-
June 13th, 2018, 01:06 #23
Actually, Colin, after reading the information on the "superuser" site, it is my belief that one can't get better quality. The routines extract each image in the format in which it was embedded in the .pdf file. With many of the DriveThruRPG and DMs Guild PDFs, the WotC guidelines specify that the PDFs should be constructed using JPGs. If I understand all of this correctly, it doesn't get any better than that.
When I first saw the images extracted as JPGs I thought to myself, "too bad it doesn't extract them in an uncompressed format; extracting to JPGs might not be totally 'lossless'". But after further reading my ignorance has been assuaged. I'll probably do a few more extraction comparisons and report back.
But right now, I'd recommend the methodology you posted as the #1 avenue for extraction of images from PDFs. Especially since file size is irrelevant, as is password protection of PDF properties (like watermarks). The PDF I extracted from was a DriveThruRPG watermarked purchased adventure.
Keep in mind I'm not the #1 authority on this, but I have spent just about two weeks now searching the forums, trying various methodologies and trying to digest it all into one easy to eat package.
-
June 14th, 2018, 08:20 #24
- Join Date
- Jun 2018
- Posts
- 37
When I run it I get an I/O error message saying it can't open the pdf file. I've tried more than one pdf file, none could be opened. I ran the batch file and I also ran the exe from a command prompt, both as administrator. I've tried the pdf files with different names including z.pdf and they were in the images subfolder. I have Windows 8.1. I've also checked the pdf files for security issues, but they look normal.
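One thing worth ruling out: as far as I know, a batch file started with "Run as administrator" begins in the Windows System32 folder rather than its own folder, so `cd images` and the relative path to the PDF may point at the wrong place. That is a guess at the cause, not a diagnosis. Below is a small Python sketch of a pre-flight check (the function name is made up) that shows where a path actually resolves and whether the file starts with a PDF header.

```python
import os
from pathlib import Path

def check_pdf(path):
    """Return a description of the first problem found, or None if the file looks usable."""
    p = Path(path)
    if not p.is_file():
        # Show the absolute path so a wrong working folder is obvious.
        return f"not found: {p.resolve()} (current folder is {os.getcwd()})"
    try:
        with p.open("rb") as fh:
            head = fh.read(1024)
    except OSError as exc:
        return f"cannot read: {exc}"
    # PDF files normally carry a "%PDF-" marker at (or very near) the start.
    if b"%PDF-" not in head:
        return "no %PDF- header in the first 1024 bytes"
    return None

if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print(arg, "->", check_pdf(arg) or "looks OK")
```

If the check reports "not found" with an unexpected folder, the problem is likely the working folder rather than the PDF itself.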
-
June 14th, 2018, 08:42 #25
The batch file from my last run looks like this:
@echo off
cd images
erase *.jpg
erase *.ppm
..\pdfimages.exe -j ..\Strange_Tales_of_the_Century.pdf images
cd ..
pause
Worked fine.
-
June 14th, 2018, 08:48 #26
- Join Date
- Jun 2018
- Posts
- 37
The batch file in the zip looks like this...
@echo off
cd images
erase *.jpg
erase *.ppm
rename *.pdf z.pdf
..\pdfimages.exe -j z.pdf images
rem erase z.pdf
cd ..
Is that right, or is there something missing or set wrong?
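As far as I can tell, the `rename *.pdf z.pdf` line only works cleanly when the images folder holds exactly one PDF and no leftover z.pdf; with two or more, the later renames collide with the first. A hedged Python sketch that finds the single PDF instead of renaming it (the function name is hypothetical and not part of the zip):

```python
from pathlib import Path

def find_single_pdf(folder):
    """Return the only PDF in `folder`; raise ValueError if there are none or several."""
    pdfs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if len(pdfs) != 1:
        names = ", ".join(p.name for p in pdfs) or "none"
        raise ValueError(f"expected exactly one PDF in {folder}, found: {names}")
    return pdfs[0]
```

The returned path could then be handed to pdfimages directly, which avoids the rename step altogether.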
-
June 14th, 2018, 11:14 #27
Hi Swifty0x0
I had trouble with the rename part of the script so I ran it on specified files each time.
-
June 14th, 2018, 18:00 #28
Proud Ultimate License Holder
Also have bought a lot of other 5E and other , incl. Savage Worlds, Mutants & Masterminds, and Pathfinder 2e
Central Time Zone (living in the USA, although born on the eastern shores of Canada)
Have Played All D&D Editions except for 3/3.5 (am familiar with those rules, tho)
-
June 15th, 2018, 12:44 #29
- Join Date
- Mar 2014
- Location
- Staffordshire, UK
- Posts
- 337
I was working on several files and tried scripting the batch file to process each pdf and write out to its own folder, but I became fed up with how long it was taking me (I seem to recall it was a problem with the .exe not working in sub folders), so I changed the process to use a single file called z.pdf. After all, how often do you process a file compared to creating the module and editing the images (damn them secret doors !!!)
All I then had to do was simply copy the pdf into the folder and rename it, run the batch and move the extracted files. Rinse and repeat....
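The one-folder-per-PDF idea can be sketched in Python rather than batch. This is only a sketch under assumptions: it uses the same -j option as the batch file, assumes pdfimages.exe can be found at the path you give it, and passes absolute paths so that the working folder should not matter. The function names and folder layout are made up for illustration.

```python
import subprocess
from pathlib import Path

def build_jobs(pdf_dir, out_root, exe="pdfimages.exe"):
    """Return (command, output_folder) pairs, one per PDF found in pdf_dir."""
    jobs = []
    for pdf in sorted(Path(pdf_dir).resolve().glob("*.pdf")):
        out_dir = Path(out_root).resolve() / pdf.stem
        # The last argument is the image root; output names start with it.
        command = [str(exe), "-j", str(pdf), str(out_dir / "images")]
        jobs.append((command, out_dir))
    return jobs

def run_jobs(jobs):
    """Create each output folder, then run pdfimages; stop on the first failure."""
    for command, out_dir in jobs:
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Something like `run_jobs(build_jobs("pdfs", "extracted"))` would then leave one sub folder per PDF under "extracted"; both folder names are placeholders.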
-
June 17th, 2018, 05:09 #30
- Join Date
- Jun 2018
- Posts
- 37
Thanks for the replies guys. I ended up downloading a free version of PDFMate PDF Converter and that had no problem working on the same PDF files I couldn't get to work with the pdfimages.exe file.