Command-Line Arguments for Reading Images and PDFs: Argparse with Python

Easy to use and easy to write

May 1, 2021

Reading images and PDFs from the command line

Photo by Madan Maram

What are command-line arguments in Python 3?

Command-line arguments are input parameters passed to a Python script when it is executed.
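A minimal sketch of what "passed to the script" means, using only the standard library: Python stores the raw arguments in sys.argv as a list of strings, with the script name first. The function name main and the file name demo.py are illustrative only.

```python
import sys


def main(argv):
    # argv[0] is the script's own name; the rest is what the user typed,
    # split on whitespace and kept as plain strings (no type conversion here)
    script, *params = argv
    return script, params


if __name__ == "__main__":
    print(main(sys.argv))
```

Called as main(["demo.py", "-i", "sample1.jpg"]), it returns ("demo.py", ["-i", "sample1.jpg"]). By default, argparse's parse_args() reads sys.argv[1:] and turns that same list into named options.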

Run the script from the command prompt like this, where <script>.py stands for your script's file name:

usage: python <script>.py -i sample1.jpg -p pdf/sample1.pdf


import argparse
import os
import re
import sys

import cv2
import numpy as np
import pandas as pd
import pytesseract
import tabula
from pdf2image import convert_from_path
from PIL import Image

# Windows install location of Tesseract; change it to match your machine
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--imgpath", help="image path")
ap.add_argument("-p", "--pdfpath", help="pdf path")

args = vars(ap.parse_args())

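def func_name(imagepath, pdfpath):
    # load the image in grayscale (flag 0 is cv2.IMREAD_GRAYSCALE)
    image = cv2.imread(imagepath, 0)

    # tabula.read_pdf returns a list of DataFrames, one per detected table
    list1 = []
    tables = tabula.read_pdf(pdfpath, pages='all')
    for table in tables:
        # collect every row of every table into one flat list
        for info in table.values:
            list1.append(info)
    df1 = pd.DataFrame(list1)
    return image, df1


# run the function on the paths given on the command line
image, df1 = func_name(args["imgpath"], args["pdfpath"])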
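To see how the parser maps the command line onto names without opening a terminal, parse_args() can be given an explicit list of strings. A small sketch with the same two options: argparse derives each key from the long option (--imgpath becomes imgpath), vars() turns the resulting Namespace into a dict, and an option that is not supplied comes back as None.

```python
import argparse

# the same two options as the script above
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--imgpath", help="image path")
ap.add_argument("-p", "--pdfpath", help="pdf path")

# parse_args() accepts a list of strings, so no terminal is needed to try it
args = vars(ap.parse_args(["-i", "sample1.jpg", "-p", "pdf/sample1.pdf"]))
print(args)     # {'imgpath': 'sample1.jpg', 'pdfpath': 'pdf/sample1.pdf'}

# an option that is left out falls back to its default, which is None
partial = vars(ap.parse_args(["--imgpath", "sample1.jpg"]))
print(partial)  # {'imgpath': 'sample1.jpg', 'pdfpath': None}
```

If the script should refuse to run without both files, add_argument() also accepts required=True, which makes argparse exit with an error message when the option is missing.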


Our help text is now displayed when we pass -h (or --help) on the command line:

C:\Users\Madan\argparse>python <script>.py -h
usage: <script>.py [-h] [-i IMGPATH] [-p PDFPATH]

optional arguments:
  -h, --help            show this help message and exit
  -i IMGPATH, --imgpath IMGPATH
                        image path
  -p PDFPATH, --pdfpath PDFPATH
                        pdf path
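The help screen is built entirely from the help= strings passed to add_argument(). As a sketch, format_help() returns the same text as a string, which makes it easy to inspect; demo.py is a hypothetical program name set through prog=. On Python 3.10 and later the section heading reads "options:" instead of "optional arguments:".

```python
import argparse

# prog= sets the name shown in the usage line; "demo.py" is a placeholder
ap = argparse.ArgumentParser(prog="demo.py")
ap.add_argument("-i", "--imgpath", help="image path")
ap.add_argument("-p", "--pdfpath", help="pdf path")

# the same text that -h prints, returned as a string instead of printed and exiting
help_text = ap.format_help()
print(help_text)
```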

Read files from a folder:

usage: python <script>.py path pdf1

import argparse
import glob
import os

parser = argparse.ArgumentParser(description='Read in a file or set of files, and return the result.', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('path', nargs='+', help='Path of a file or a folder of files.')
parser.add_argument('-e', '--extension', default='', help='File extension to filter by.')
args = parser.parse_args()

# Parse paths
full_paths = [os.path.join(os.getcwd(), path) for path in args.path]
files = set()
for path in full_paths:
    if os.path.isfile(path):
        # a single file: keep it as it is
        files.add(path)
    elif os.path.isdir(path):
        # a folder: collect every entry ending with the requested extension
        files |= set(glob.glob(os.path.join(path, '*' + args.extension)))
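A self-contained way to try the folder logic, as a sketch under assumptions: it wraps the path handling in a hypothetical helper collect_files() that keeps single files as they are and expands folders with glob, builds a throwaway folder with tempfile, and feeds the parser an explicit argument list instead of the real command line.

```python
import argparse
import glob
import os
import tempfile

parser = argparse.ArgumentParser(description='Read in a file or set of files, and return the result.',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('path', nargs='+', help='Path of a file or a folder of files.')
parser.add_argument('-e', '--extension', default='', help='File extension to filter by.')


def collect_files(paths, extension=''):
    """Hypothetical helper: keep files as they are, expand folders with glob."""
    files = set()
    for path in paths:
        full_path = os.path.join(os.getcwd(), path)
        if os.path.isfile(full_path):
            files.add(full_path)
        elif os.path.isdir(full_path):
            files |= set(glob.glob(os.path.join(full_path, '*' + extension)))
    return files


with tempfile.TemporaryDirectory() as folder:
    # create three empty files in a temporary folder
    for name in ('a.pdf', 'b.pdf', 'notes.txt'):
        open(os.path.join(folder, name), 'w').close()

    args = parser.parse_args([folder, '-e', '.pdf'])
    found = sorted(os.path.basename(f) for f in collect_files(args.path, args.extension))
    everything = sorted(os.path.basename(f) for f in collect_files(args.path))

print(found)       # ['a.pdf', 'b.pdf']
print(everything)  # ['a.pdf', 'b.pdf', 'notes.txt']
```

With -e .pdf only the PDFs are picked up; with the default empty extension the pattern is just *, so every entry in the folder matches.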

I don't like padding a post with filler, so I went straight to the point.