DCR

Document Content Recognition

Preprocessor

  • Identifying scanned-image PDF documents using PyMuPDF.

  • Converting scanned-image PDF documents to a series of JPEG or PNG files using pdf2image and Poppler.

  • Converting bmp, gif, jp2, jpeg, png, pnm, tif, tiff or webp type documents to PDF format using Tesseract OCR.

  • Converting csv, docx, epub, html, odt, rst or rtf type documents to PDF format using Pandoc and TeX Live.
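
The routing implied by the list above can be sketched roughly as follows. This is a minimal illustration, not DCR's actual code: the names choose_step and looks_scanned and the step labels are hypothetical, and the "no extractable text means scanned" rule is an assumed heuristic rather than the project's documented check.

```python
from pathlib import Path

# Extension groups taken from the preprocessor list above.
IMAGE_TYPES = {"bmp", "gif", "jp2", "jpeg", "png", "pnm", "tif", "tiff", "webp"}
PANDOC_TYPES = {"csv", "docx", "epub", "html", "odt", "rst", "rtf"}


def choose_step(filename: str) -> str:
    """Return a (hypothetical) label for the step a file would be routed to."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext == "pdf":
        return "inspect_pdf"       # decide with PyMuPDF whether it is a scanned image
    if ext in IMAGE_TYPES:
        return "tesseract_to_pdf"  # image formats go to Tesseract OCR
    if ext in PANDOC_TYPES:
        return "pandoc_to_pdf"     # text formats go to Pandoc and TeX Live
    return "unsupported"


def looks_scanned(pdf_path: str) -> bool:
    """Assumed heuristic: a PDF with no extractable text on any page is scanned."""
    import fitz  # PyMuPDF, imported lazily so choose_step works without it

    with fitz.open(pdf_path) as doc:
        return all(not page.get_text().strip() for page in doc)
```

For example, choose_step("invoice.tiff") yields "tesseract_to_pdf", while a PDF would first be passed to looks_scanned to decide whether it needs conversion to images via pdf2image.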

Natural Language Processing (NLP)
