Print2CAD 2021 - Reference Manual

Optimization 3:

Native Text Conversion


Optimization 3:  Part 2 - Native Text Recognition

Recognition of Native PDF Text

Text in PDF files can be stored in four ways: as native PDF text, as text decomposed into lines, as text decomposed into hatches, or as text contained in raster images.

To recognize text that is not stored as native PDF text, the program uses artificial intelligence methods of OCR (Optical Character Recognition) and Symbol Recognition.

Automatic OCR Text Recognition of Non Native PDF Text

Text Presented in Raster Pictures

Text Deconstructed in Hatches

Text Deconstructed in Lines

Conversion of Native PDF Texts

Native text in PDF files can be stored as strings or as individual characters. The best way to find out whether your PDF file contains real text is to analyze it with the analysis function of Print2CAD and check whether any text entities are reported.

Another method is to open the PDF file in a PDF reader and zoom in on the text as far as possible. If the letters still have smooth edges, your PDF file most likely contains real text. If the edges of the letters are not smooth, Print2CAD will not convert the “text” to real text unless the OCR function is activated.

Parameter: Convert Native PDF Texts into Editable CAD Texts

In PDF files, text is usually defined as separate characters or groups of characters with their own insertion points. With the help of special internal methods, Print2CAD merges characters into strings and places these strings as texts in the DWG or DXF drawings.
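
The idea of merging separately placed characters into strings can be illustrated with a minimal Python sketch. It is not Print2CAD's internal method, which is not documented here. The `Glyph` class, the `merge_glyphs` function, and the tolerances `y_tol` and `max_gap` are hypothetical names chosen for illustration. The sketch assumes horizontal text and that each character carries an insertion point and an advance width.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str     # the character itself
    x: float      # insertion point, x coordinate
    y: float      # insertion point, baseline y coordinate
    width: float  # advance width of the character

def merge_glyphs(glyphs, y_tol=0.5, max_gap=1.0):
    """Group glyphs by baseline, then join neighbours that nearly touch.

    Returns a list of (x, y, text) tuples, one per merged string.
    y_tol and max_gap are illustrative tolerances, not product settings.
    """
    # Bucket glyphs whose baselines fall into the same y_tol-sized band.
    lines = {}
    for g in glyphs:
        lines.setdefault(round(g.y / y_tol), []).append(g)

    strings = []
    for key in sorted(lines):
        run = []
        for g in sorted(lines[key], key=lambda g: g.x):
            # A large horizontal gap ends the current string.
            if run and g.x - (run[-1].x + run[-1].width) > max_gap:
                strings.append((run[0].x, run[0].y, "".join(c.char for c in run)))
                run = []
            run.append(g)
        if run:
            strings.append((run[0].x, run[0].y, "".join(c.char for c in run)))
    return strings

glyphs = [
    Glyph("A", 0, 10, 5), Glyph("B", 5, 10, 5),  # adjacent on one baseline
    Glyph("C", 30, 10, 5),                       # same baseline, far away
    Glyph("D", 0, 20, 5),                        # different baseline
]
print(merge_glyphs(glyphs))
```

Each resulting tuple keeps the insertion point of its first character, which is where the merged string would be placed in the drawing.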

Parameter: Convert Native PDF Texts into Hatches

Print2CAD converts all texts into polylines with filled areas (hatches).

It is not always possible to extract text from a PDF, especially when the Unicode map is missing or “user defined”. Many construction drawings use this technique to stop people from extracting the data.

If you cannot copy and paste the correct text from Acrobat, you will have very little chance of converting the text yourself. If Acrobat cannot extract it, it is very unlikely that Print2CAD can extract the text correctly.

The only ways to handle this kind of text are to convert it into hatches or to apply the OCR functions to it.

Parameter: Visualization of a Text with Corrupt Codec

If the font codec and encoding table were created manually, Print2CAD uses artificial intelligence methods to determine the correct character codes.

Parameter: Sort Text Onto Separate Layer

When this function is activated, all native or recognized text is sorted onto a predetermined layer. If there is no real text, but only polylines, hatches, or raster images, the letters will not be recognized as text.

Parameter: Replace All Fonts With a SHX or TTF Font

When this option is enabled, all text styles are assigned the same selected SHX or TTF font.

Parameter: Scale Factors for Blank Space Width

Text in PDF files is often placed as single letters. In this case, the PDF contains no space characters.

When Print2CAD transforms letters into text, blank spaces are detected with the help of a substitute space width equal to the width of the letter “a”.

If the space detection does not work properly, increase or reduce the substitute space factor by trial and error, according to the graphic below:

 

Parameter: Scale Factors for Blank Space Width
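
The effect of the substitute space factor can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual algorithm. The function `join_with_spaces` and its parameters are hypothetical. The rule used here, inserting a space whenever the gap between two letters is at least the width of “a” multiplied by the factor, is an assumed interpretation of the parameter.

```python
def join_with_spaces(glyphs, a_width, space_factor=1.0):
    """Join single letters on one baseline, inserting blanks at wide gaps.

    glyphs: list of (char, x, width) tuples, sorted from left to right.
    a_width: width of the letter "a" in the current font (substitute space width).
    space_factor: scale factor applied to the substitute space width.
    """
    threshold = a_width * space_factor
    parts = []
    prev_end = None  # right edge of the previous letter
    for char, x, width in glyphs:
        # A gap at least as wide as the scaled substitute width counts as a blank.
        if prev_end is not None and x - prev_end >= threshold:
            parts.append(" ")
        parts.append(char)
        prev_end = x + width
    return "".join(parts)

letters = [("N", 0, 6), ("O", 6, 6), ("T", 17, 6), ("E", 23, 6)]
print(join_with_spaces(letters, a_width=5, space_factor=1.0))  # NO TE
print(join_with_spaces(letters, a_width=5, space_factor=1.5))  # NOTE
```

In this sketch, a larger factor makes the detection stricter, so fewer blanks are inserted, while a smaller factor inserts blanks at narrower gaps.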

Parameter: Scale Factors for Text Width and Height

If Print2CAD can’t find the fonts used in the PDF in the Windows system, Print2CAD will select a similar font. In doing so, the text width may change.

A workaround for this is the use of scale factors for the text width and height. The text will be scaled by the given factor and placed left-aligned in the CAD drawing.
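
As a rough illustration of left-aligned scaling, the following Python sketch shows how a width factor changes the extent of a text while its insertion point stays fixed. The functions `width_factor_for` and `scale_text` are hypothetical helpers. Estimating the factor from the measured widths is an assumption made for this example; in Print2CAD the factors are entered as parameters.

```python
def width_factor_for(pdf_width, substitute_width):
    """Estimate a width factor that restores the text width seen in the PDF."""
    return pdf_width / substitute_width

def scale_text(insert_x, width, height, width_factor=1.0, height_factor=1.0):
    """Scale a left-aligned text: the left edge stays put, the right edge moves.

    Returns (left_x, right_x, new_height).
    """
    new_width = width * width_factor
    return insert_x, insert_x + new_width, height * height_factor

# The text is 40 units wide in the PDF, but 50 units wide in the substitute font.
factor = width_factor_for(40, 50)
print(factor)                             # 0.8
print(scale_text(100, 50, 2.5, factor))   # (100, 140.0, 2.5)
```

Because the text is left-aligned, only the right end of the text shifts when the factor changes, so neighboring entities to the left are unaffected.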

Fonts in PDF files are usually embedded, so you do not need them installed in your Windows system to display the PDF files.

In DWG or DXF files, fonts cannot be embedded. All fonts used in a DWG or DXF file must be installed in your Windows system.

Print2CAD is not able to extract PDF embedded fonts into your Windows system.


© Copyright 2020 BackToCAD Technologies, LLC. All rights reserved. Kazmierczak® is a registered trademark of Kazmierczak Software GmbH. Print2CAD, AzubiCAD, and CAD2Print are Trademarks of BackToCAD Technologies LLC. CADconv is a Trademark of Expert Robotics Inc.. DWG is the name of Autodesk’s proprietary file format and technology used in AutoCAD® software and related products. Autodesk, the Autodesk logo, AutoCAD, DWG are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. This website is independent of Autodesk, Inc., and is not authorized by, endorsed by, sponsored by, affiliated with, or otherwise approved by Autodesk, Inc. The material and software have been placed on this Internet site under the authority of the copyright owner for the sole purpose of viewing of the materials by users of this site. Users, press or journalists are not authorized to reproduce any of the materials in any form or by any means, electronic or mechanical, including data storage and retrieval systems, recording, printing or photocopying.