Documents

Note: Engramic currently supports only PDF documents.

flowchart TD
    %% Define node styles
    classDef process fill:#f9f9f9,stroke:#333,stroke-width:1px,rounded:true
    classDef io fill:#e8f4ff,stroke:#4a86e8,stroke-width:1px,rounded:true
    classDef external fill:#f0fff0,stroke:#2d862d,stroke-width:1px,rounded:true

    %% Input and external processes
    prompt([User Prompt]):::io
    stream([User Stream]):::io
    sense[Sense]:::external

    %% Core processes in learning loop
    subgraph "Engramic Learning Loop"
      direction RL
      consolidate[Consolidate]:::process
      retrieve[Retrieve]:::process
      respond[Respond]:::process
      codify[Codify]:::process

      consolidate --> retrieve
      retrieve --> respond
      respond --> codify
      codify --> consolidate
    end

    %% External connections
    prompt --> retrieve
    sense --> consolidate
    respond --> stream

PDF parsing is part of the Sense service. When a document is parsed, it is sent to the Consolidate service, which processes it and passes it to Retrieve for storage in a vector database. It then reaches Respond whenever it is semantically matched to a prompt.
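To make "stored in a vector database" and "matched semantically" concrete, here is a toy in-memory vector store. It is a minimal sketch under stated assumptions: embed, cosine, and ToyVectorStore are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and the 0.2 similarity threshold is arbitrary. It does not show how Engramic's retrieval service or its vector database actually work.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term count (a real system uses a learned model).
    return Counter(re.findall(r'[a-z]+', text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for the vector database used by retrieval."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Consolidated content is embedded once and stored alongside its vector.
        self._items.append((text, embed(text)))

    def query(self, prompt: str, threshold: float = 0.2) -> list[str]:
        # Return stored texts whose similarity to the prompt meets the threshold, best first.
        q = embed(prompt)
        scored = [(cosine(q, vec), text) for text, vec in self._items]
        return [text for score, text in sorted(scored, reverse=True) if score >= threshold]
```

Real vector databases replace the linear scan with an approximate nearest-neighbor index, but the store-then-match-by-similarity flow is the same.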

Example Code Walkthrough

The full code is available in the source tree at /engramic/examples/document/document.py. The files for this exercise can be downloaded from https://www.engramic.org/assets-page.

Let's walk through how this example works, step by step:

1. Setting Up the Environment

The example code creates a TestService class that demonstrates how to:

Submit a document for processing
Listen for document processing completion
Query the system about the processed document

2. Document Submission Process

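# In TestService.start():
# Look up the running SenseService instance from the host.
sense_service = self.host.get_service(SenseService)
# Reference a PDF packaged in the engramic.resources.rag_document resource package.
document = Document(
    Document.Root.RESOURCE.value,
    'engramic.resources.rag_document',
    'IntroductiontoQuantumNetworking.pdf',
)
# Remember the ID so the DOCUMENT_INSERTED handler can recognize this document later.
self.document_id = document.id
# Hand the document to SenseService for parsing.
sense_service.submit_document(document)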

This code:

Gets a reference to the SenseService
Creates a Document object using a PDF from the resources directory
Saves the document ID for later reference
Submits the document to the SenseService for processing

3. Document Processing Flow

When a document is submitted, the following happens:

Sense Service

Convert PDF pages to PNG images
Extract metadata from the first few pages
Convert each image into annotated text
Summarize the annotated text for the Meta object
Parse the annotated text into Engrams
Package the results into an observation (Meta + Engrams)
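The data flow of these steps can be sketched as follows. Every name here (Meta, Engram, Observation, sense_document) is a hypothetical stand-in for Engramic's own classes, and each step is reduced to a trivial string operation; the real service works from rendered page images rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Meta:
    # Hypothetical stand-in for the document-level Meta object.
    source: str
    summary: str


@dataclass
class Engram:
    # Hypothetical stand-in for one unit of knowledge parsed from annotated text.
    content: str


@dataclass
class Observation:
    # An observation bundles the Meta with the Engrams extracted from the document.
    meta: Meta
    engrams: list[Engram] = field(default_factory=list)


def sense_document(source: str, pages: list[str]) -> Observation:
    """Toy pipeline: `pages` stands in for the annotated text of each page image."""
    # Metadata step: use the first line of each of the first few pages as hints.
    hints = [p.strip().splitlines()[0] for p in pages[:3] if p.strip()]
    # Summary step: a real summary comes from the annotated text; join the hints instead.
    meta = Meta(source=source, summary=' / '.join(hints))
    # Parse step: treat each non-empty paragraph of annotated text as one Engram.
    engrams = [
        Engram(content=para.strip())
        for page in pages
        for para in page.split('\n\n')
        if para.strip()
    ]
    # Package step: the observation is Meta + Engrams.
    return Observation(meta=meta, engrams=engrams)
```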

Event Handling

The TestService subscribes to two key events:

self.subscribe(Service.Topic.MAIN_PROMPT_COMPLETE, self.on_main_prompt_complete)
self.subscribe(Service.Topic.DOCUMENT_INSERTED, self.on_document_inserted)

DOCUMENT_INSERTED: Triggered when document processing is complete
MAIN_PROMPT_COMPLETE: Triggered when a response to a prompt is ready

4. Querying the Document

When the document is fully processed (DOCUMENT_INSERTED event), the code automatically sends a query:

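def on_document_inserted(self, message_in: dict[str, Any]) -> None:
    document_id = message_in['id']
    # Other documents may also finish processing; react only to the one submitted in start().
    if self.document_id == document_id:
        retrieve_service = self.host.get_service(RetrieveService)
        # Ask a question that should be answered from the newly inserted document.
        prompt = Prompt('Do you have any files about quantum networking? What is it about?')
        retrieve_service.submit(prompt)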

This:

Checks if the completed document is the one we submitted
Gets a reference to the RetrieveService
Creates a prompt asking about quantum networking
Submits the prompt to the RetrieveService
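The subscribe-and-filter pattern used by TestService can be sketched with a toy publish/subscribe bus. EventBus and DocumentWatcher are hypothetical classes, the topic names are plain strings standing in for Service.Topic values, and appending to a list stands in for submitting a Prompt to RetrieveService. This is not Engramic's messaging implementation.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Toy publish/subscribe bus (hypothetical; not Engramic's implementation)."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a callback for a topic, like self.subscribe(...) in TestService.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict[str, Any]) -> None:
        # Deliver the message to every handler registered for this topic.
        for handler in self._handlers[topic]:
            handler(message)


class DocumentWatcher:
    """Mirrors TestService: reacts only to the document it submitted."""

    def __init__(self, bus: EventBus, document_id: str) -> None:
        self.document_id = document_id
        self.prompts: list[str] = []
        bus.subscribe('DOCUMENT_INSERTED', self.on_document_inserted)

    def on_document_inserted(self, message_in: dict[str, Any]) -> None:
        # Ignore insert events for documents this watcher did not submit.
        if message_in['id'] != self.document_id:
            return
        # Stand-in for retrieve_service.submit(Prompt(...)).
        self.prompts.append('Do you have any files about quantum networking? What is it about?')
```

Filtering on the ID inside the handler keeps the subscription simple: every handler sees every event on its topic and decides for itself whether the event is relevant.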

5. Handling the Response

When the response is ready (MAIN_PROMPT_COMPLETE event), the code logs it:

def on_main_prompt_complete(self, message_in: dict[str, Any]) -> None:
    response = Response(**message_in)
    logging.info('\n\n================[Response]==============\n%s\n\n', response.response)
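Python's root logger defaults to the WARNING level, so a logging.info call produces no output unless logging has been configured. If you adapt this handler in your own script and nothing is printed, a minimal configuration looks like this (the response text is a placeholder):

```python
import logging

# The root logger defaults to WARNING; lower it to INFO so logging.info() output appears.
# force=True (Python 3.8+) replaces any handlers that were configured earlier.
logging.basicConfig(level=logging.INFO, format='%(message)s', force=True)

# Placeholder text standing in for response.response.
logging.info('\n\n================[Response]==============\n%s\n\n', 'example response text')
```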

Document Submission Options

To submit a document for processing, you can call the submit_document method of SenseService (as shown in the example) or send a Document.Topic.SUBMIT_DOCUMENT message.

When submitting documents that may have been processed before, you can use the overwrite parameter to force reprocessing:

# Force reprocessing, even if the document was processed before
repo_service.submit_ids([document_id1], overwrite=True)

# Without overwrite, the cached version is used if available
repo_service.submit_ids([document_id2])
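The overwrite behavior can be illustrated with a small cache. DocumentCache and its process callback are hypothetical and only model the semantics described above (reuse a cached result unless overwrite=True); they are not the repository service's actual implementation.

```python
from typing import Callable


class DocumentCache:
    """Hypothetical illustration of overwrite semantics; not Engramic's repo service."""

    def __init__(self, process: Callable[[str], str]) -> None:
        self._process = process  # the expensive step, e.g. parsing a PDF
        self._results: dict[str, str] = {}
        self.process_calls = 0

    def submit_ids(self, ids: list[str], overwrite: bool = False) -> list[str]:
        results = []
        for doc_id in ids:
            # Reprocess when forced, or when there is no cached result yet.
            if overwrite or doc_id not in self._results:
                self._results[doc_id] = self._process(doc_id)
                self.process_calls += 1
            results.append(self._results[doc_id])
        return results
```

The first submission of an ID always processes it; later submissions reuse the stored result unless overwrite=True is passed.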

Loading From Data Directory

The example above references a file in the resources directory, which is packaged with the distribution (src/engramic/resources). To load a file that isn't a packaged resource, pass Document.Root.DATA.value as the first argument to Document; this sets the base directory to the value of the REPO_ROOT environment variable.

# Loading from local data directory
document = Document(
    Document.Root.DATA.value,
    '/path/to/document/folder',
    'document.pdf'
)

Example of setting the REPO_ROOT environment variable:

REPO_ROOT = "~/.local/share/engramic/"
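A minimal sketch of how a REPO_ROOT value like the one above could be expanded and combined with a folder and file name. resolve_data_path is a hypothetical helper, not part of Engramic, and Engramic's own path resolution may differ. Note that Python's os.path.join discards the base directory when a later component is an absolute path, so the folder in this sketch is relative.

```python
import os


def resolve_data_path(folder: str, file_name: str) -> str:
    """Hypothetical helper: build a file path under REPO_ROOT."""
    repo_root = os.environ.get('REPO_ROOT')
    if repo_root is None:
        raise RuntimeError('REPO_ROOT is not set')
    # "~" is not expanded automatically when read from an environment variable.
    base = os.path.expanduser(repo_root)
    return os.path.join(base, folder, file_name)
```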