Break PDFs into chunks bigger than 1 page? #128

AbeHandler · 2015-02-24T23:56:35Z

I just got a very large PDF. I want to break it into smaller PDFs, but not thousands and thousands of them. Would you be open to a pull request that adds this feature to the pages command? Something like:

$ docsplit pages big.pdf --pages 1-1000 --numoutput 1  # breaks the first 1000 pages into a single file

page_extractor.rb

# Burst a list of pdfs into single pages, as `pdfname_pagenumber.pdf`.
def extract(pdfs, opts)
  extract_options opts
  [pdfs].flatten.each do |pdf|
    pdf_name = File.basename(pdf, File.extname(pdf))
    page_path = ESCAPE[File.join(@output, pdf_name)] + "_%d.pdf"
    FileUtils.mkdir_p @output # no-op if the directory already exists

    cmd = if DEPENDENCIES[:pdftailor] # prefer pdftailor, but keep pdftk for backwards compatibility
      "pdftailor unstitch --output #{page_path} #{ESCAPE[pdf]} 2>&1"
    else
      "pdftk #{ESCAPE[pdf]} burst output #{page_path} 2>&1"
    end
    result = `#{cmd}`.chomp
    # pdftk burst leaves a doc_data.txt report in the working directory
    FileUtils.rm('doc_data.txt') if File.exist?('doc_data.txt') # File.exists? was removed in Ruby 3.2
    raise ExtractionFailed, result unless $?.success?
    result
  end
end
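The "not thousands and thousands of them" part of the request boils down to turning a page count into a handful of ranges. A minimal sketch, assuming a hypothetical `chunk_ranges` helper that is not part of Docsplit and only does the arithmetic (the actual splitting would still have to go through pdftk or PDFtailor):

```ruby
# Hypothetical helper (not part of Docsplit): split a document of
# `total_pages` pages into consecutive 1-based page ranges of at most
# `chunk_size` pages each.
def chunk_ranges(total_pages, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  # each_slice groups the page numbers; keep only each group's first and last page
  (1..total_pages).each_slice(chunk_size).map { |pages| pages.first..pages.last }
end

# A 2,500-page PDF in chunks of 1,000 pages gives three output files.
p chunk_ranges(2500, 1000)
# => [1..1000, 1001..2000, 2001..2500]
```

The last chunk is simply shorter when the page count is not a multiple of the chunk size.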


AbeHandler · 2015-02-25T00:12:01Z

Seems like pdftk can do this: https://charlieharvey.org.uk/page/howto_breaking_pdfs_up_into_mutiple_pages
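For reference, pdftk's `cat` operation takes page ranges directly (`pdftk big.pdf cat 1-1000 output part.pdf`), unlike `burst`, which always emits one file per page. A minimal sketch of driving it from Ruby, assuming hypothetical helpers `pdftk_range_commands` and `split_with_pdftk` (neither is Docsplit code) and an output naming scheme of `name_first-last.pdf` chosen only for illustration:

```ruby
require "fileutils"

# Hypothetical sketch (not Docsplit code): build one pdftk `cat` command per
# page range, e.g. `pdftk big.pdf cat 1-1000 output out/big_1-1000.pdf`.
def pdftk_range_commands(pdf, ranges, output_dir)
  base = File.basename(pdf, File.extname(pdf))
  ranges.map do |range|
    out = File.join(output_dir, "#{base}_#{range.first}-#{range.last}.pdf")
    ["pdftk", pdf, "cat", "#{range.first}-#{range.last}", "output", out]
  end
end

# Runs each command as an argument list (no shell), so file names need no escaping.
def split_with_pdftk(pdf, ranges, output_dir)
  FileUtils.mkdir_p(output_dir)
  pdftk_range_commands(pdf, ranges, output_dir).each do |cmd|
    system(*cmd) or raise "pdftk failed: #{cmd.join(' ')}"
  end
end

pdftk_range_commands("big.pdf", [1..1000, 1001..1500], "out").each do |cmd|
  puts cmd.join(" ")
end
# pdftk big.pdf cat 1-1000 output out/big_1-1000.pdf
# pdftk big.pdf cat 1001-1500 output out/big_1001-1500.pdf
```

Passing an argument list to `system` sidesteps the manual `ESCAPE[...]` quoting used in `page_extractor.rb`.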

knowtheory · 2015-02-25T13:45:21Z

Hey @AbeHandler,

Yep, you're right. Adding page ranges to Docsplit will also require adding them to PDFtailor, since PDFtailor just splits a PDF into all of its constituent pages. If you are interested in adding page ranges to PDFtailor, a pull request would be more than welcome!

I'm not so down with --numoutput, though. My feeling is that if you want pages, use the pages subcommand; if you want PDFs, we should be talking about the pdf command.

I'm more comfortable with something like

$ docsplit pdf source.pdf --pages 1-5 10-20 30-37

or maybe even

$ docsplit pdf source.pdf --split 1-5 10-20 30-37
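The proposed syntax takes several space-separated ranges, each of which would become one output PDF. A minimal sketch of parsing those arguments, assuming a hypothetical `parse_page_ranges` method (Docsplit defines no such parser; both `--pages` for the pdf command and `--split` are only proposals in this thread):

```ruby
# Hypothetical parser for the proposed `--pages 1-5 10-20 30-37` arguments.
# Each argument is either a single page "N" or an inclusive range "N-M".
def parse_page_ranges(args)
  args.map do |arg|
    match = /\A(\d+)(?:-(\d+))?\z/.match(arg)
    raise ArgumentError, "invalid page range: #{arg.inspect}" unless match

    first = match[1].to_i
    last = (match[2] || match[1]).to_i # a single page "N" becomes N..N
    raise ArgumentError, "range runs backwards: #{arg}" if last < first
    raise ArgumentError, "pages are numbered from 1: #{arg}" if first < 1

    first..last
  end
end

p parse_page_ranges(%w[1-5 10-20 30-37])
# => [1..5, 10..20, 30..37]
```

Each resulting range could then be handed to whatever does the splitting, one output file per range.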

pickhardt · 2023-05-06T01:47:25Z

Hi, just checking: have page ranges been added? I want to be able to do Docsplit.extract_text(filepath, start_page: 20, end_page: 25), for example.
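Until something like that exists, text for a page range can be pulled straight from poppler's `pdftotext`, whose `-f` and `-l` flags set the first and last page to convert (an output file of `-` sends the text to stdout). A minimal sketch, assuming hypothetical `pdftotext_command` and `extract_text_range` methods that are not part of Docsplit's API:

```ruby
require "open3"

# Hypothetical helper (not Docsplit's API): pdftotext command for pages
# start_page..end_page, writing the text to stdout ("-").
def pdftotext_command(path, start_page:, end_page:)
  ["pdftotext", "-f", start_page.to_s, "-l", end_page.to_s, path, "-"]
end

# Runs pdftotext without a shell and returns the extracted text.
def extract_text_range(path, start_page:, end_page:)
  cmd = pdftotext_command(path, start_page: start_page, end_page: end_page)
  text, err, status = Open3.capture3(*cmd)
  raise "pdftotext failed: #{err}" unless status.success?

  text
end

puts pdftotext_command("report.pdf", start_page: 20, end_page: 25).join(" ")
# pdftotext -f 20 -l 25 report.pdf -
```

`extract_text_range("report.pdf", start_page: 20, end_page: 25)` would then return pages 20 through 25 as a single string, provided `pdftotext` is installed.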

AbeHandler changed the title from "Break PDFs into" to "Break PDFs into chunks bigger than 1 page?" on Feb 24, 2015.
