Synthesis connected Pipeline

Overview

This pipeline represents the final architecture of the synthesis system, it contains multiple stages fron extraction, analyzing, and synthesizing the content of RL papers. The system combines OCR, computer vision (VLM ), natural language processing, and graph analysis to create comprehensive paper summaries and knowledge representations.

Research Paper Analysis Pipeline Diagram — Pipeline architecture showing data flow from PDF upload to final visualization

Pipeline Components

OCR Processing (ocr.py)

The OCR module handles the initial document processing:

Converts PDF documents to images using pdf2image
Uses Tesseract OCR for text extraction
Implements layout analysis using LayoutParser with the PubLayNet model
Detects and extracts tables and figures from the document
Outputs: JSON file containing page text, table locations, and figure locations

Configuration requirements:

Tesseract OCR installation
Poppler installation for PDF processing
PubLayNet model for layout detection

2. Vision Analysis (vision.py) The vision module analyzes extracted visual elements:

Uses NVIDIA AI models for image analysis
The model used is “microsoft/phi-3.5-vision-instruct”
Processes both tables and figures
Extracts context from surrounding text
Generates descriptions for visual elements
Outputs: Enhanced JSON with visual element descriptions

API Dependencies:

NVIDIA AI API access
Base URL: https://integrate.api.nvidia.com/v1

Content Processing (processor.py)

The processor module handles the semantic analysis:

Analyzes media items for relevance
Creates synthesis plans
Uses LLaMA model for content analysis
Determines essential visual elements
Outputs: Structured JSON with analysis results

Synthesis Generation (synthesis.py)

The synthesis module creates the final paper summary:

Generates section-by-section content Integrates visual elements Creates a cohesive narrative Outputs: Markdown file with complete synthesis

Knowledge Graph Analysis (knowledge_graph.py)

The knowledge graph module creates semantic representations:

Extracts key concepts and relationships
Creates graph visualizations
Uses NetworkX for graph processing
Outputs: JSON files with graph data and PNG visualization

Integration

Data Flow

PDF → OCR Results → Vision Analysis → Processing → Synthesis
Knowledge graph generation occurs in parallel after processing

File Formats

Input: PDF files
Intermediate: JSON files for data exchange
Output:
- Markdown synthesis
- JSON knowledge graph data
- PNG graph visualizations

Deployment

Dependencies

pip install -r r.txt

Required external software:

Tesseract OCR
Poppler
Python 3.8+

API Configuration

Set the following environment variables:

NVIDIA_API_KEY

Other API keys as needed

Usage

Web Interface

Quick Analysis Interface (app.py)

streamlit run app.py

This interface provides:

Summarized synthesis of the paper
Extraction of key entities and relationships
Interactive knowledge graph visualization
Entity and relationship downloads
Streamlined analysis process

App Interface Upload Screen — Initial upload and document synthesis of the quick analysis interface

App Interface Synthesis View — Paper synthesis and entity extraction view

App Interface Knowledge Graph — RL Paper to knowledge graph visualization

the graph can be downloaded, as png and as json (entities.json & relationships.json)

Comprehensive Analysis Interface (main.py)

streamlit run main.py

This interface provides:

Detailed, comprehensive synthesis

Step-by-step pipeline visualization

Full OCR and vision analysis results

In-depth figure and table analysis

Complete processing results and artifacts

Knowledge Graph

Customization

The pipeline can be customized through:

Model selection in vision analysis
Synthesis templates
Knowledge graph parameters
Output format modifications