Does PDFNet support extracting table and list info from PDFs?
PDFNet supports extraction of all content available in a PDF document. However, the PDF standard does not directly represent abstract constructs such as paragraphs, columns, or tables. Because this logical structure is usually missing from the document, the target application needs to analyze the underlying content available through PDFNet and reconstruct the logical structure itself. Note that the PDF standard does support marked content and so-called ‘tagged PDF’, and PDFNet can be used to extract marked content and any existing logical structure. Unfortunately, many PDF files are missing tags and logical structure.
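As a rough illustration of the kind of analysis an application might perform on untagged content, the sketch below groups positioned words into table-like rows by comparing their vertical positions. It is only a minimal heuristic under stated assumptions: the Word class, the group_into_rows function, and the y_tolerance parameter are hypothetical names invented for this example and are not part of the PDFNet API. The word positions are assumed to have already been obtained from whatever text extraction output the application uses.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for one extracted word; not a PDFNet type.
    text: str
    x: float  # left edge in page units
    y: float  # baseline in page units (the PDF y axis points upward)


def group_into_rows(words, y_tolerance=2.0):
    """Cluster words into rows when their baselines are close together.

    y_tolerance is an assumed threshold; real documents may need tuning.
    """
    rows = []
    # Visit words from the top of the page down (largest y first).
    for word in sorted(words, key=lambda w: -w.y):
        # Compare against the first word of the current row.
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order cells left to right.
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


sample = [
    Word("Qty", 200.0, 700.0),
    Word("Item", 50.0, 700.5),
    Word("Bolt", 50.0, 680.0),
    Word("12", 200.0, 679.2),
]
table = group_into_rows(sample)
```

With the sample data above, table holds two rows, each ordered left to right. Detecting columns, merged cells, or list markers would need further heuristics (for example, clustering x positions), which is why recovering structure from untagged PDFs is left to the application.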