Load And Query a Repo
flowchart LR
sense --> repo
repo --> sense
classDef green fill:#b2f2bb;
class sense green
A repo is collection of files located at a root folder. The RepoService provides the following abilities:
- Returns the set of all root folders.
- Returns all files within each root folder.
- Enables prompting filtering by repo.
Example Code Walkthrough
The full code is available in the source code at /engramic/examples/repo/repo.py
.
You can download the files for this exercise at https://www.engramic.org/assets-page
Let's walk through how this example works step-by-step:
1. Setting Up the Environment
The example code creates a TestService
class that demonstrates how to:
- Scan and discover repository folders
- Submit documents for processing
- Listen for repository and document events
- Query the system with repository filtering
2. Initializing Required Services
def main() -> None:
host = Host(
'standard',
[
MessageService,
SenseService,
RetrieveService,
ResponseService,
StorageService,
ConsolidateService,
CodifyService,
RepoService, #<-- This is the key service for repository functionality
ProgressService,
TestService,
],
)
This code sets up all necessary services, with RepoService
being essential for the repository functionality.
3. Repository Discovery Process
# In TestService.start():
def start(self):
super().start()
self.subscribe(Service.Topic.MAIN_PROMPT_COMPLETE, self.on_main_prompt_complete)
self.subscribe(Service.Topic.REPO_FOLDERS, self._on_repo_folders)
self.subscribe(Service.Topic.REPO_FILES, self._on_repo_files)
self.subscribe(Service.Topic.DOCUMENT_INSERTED, self.on_document_inserted)
repo_service = self.host.get_service(RepoService)
repo_service.scan_folders()
self.run_task(self.submit_documents())
This code:
- Subscribes to repository-related events
- Gets a reference to the RepoService
- Initiates a scan of repository folders
- Launches a task to submit documents for processing
4. Repository Event Handling
The TestService subscribes to key events related to repositories:
REPO_FOLDERS
: Triggered when repository folders are discoveredREPO_FILES
: Triggered after folders are discovered, providing file informationDOCUMENT_INSERTED
: Triggered when a document has been processedMAIN_PROMPT_COMPLETE
: Triggered when a response to a prompt is ready
5. Processing Repository Folders
def _on_repo_folders(self, message_in: dict[str, Any]) -> None:
if message_in['repo_folders'] is not None:
self.repos = message_in['repo_folders']
self.repo_id1 = next(
(key for key, value in self.repos.items() if value['name'] == 'QuantumNetworking'), None
)
self.repo_id2 = next((key for key, value in self.repos.items() if value['name'] == 'ElysianFields'), None)
else:
logging.info('No repos found. You can add a repo by adding a folder to home/.local/share/engramic')
This code:
- Stores the discovered repository folders
- Looks up specific repository IDs by their folder names (accessing the 'name' key from the repository dictionary)
- Provides feedback if no repositories are found
6. Processing Repository Files
def _on_repo_files(self, message_in: dict[str, Any]) -> None:
# Only logging information about the files
logging.info('Repo: %s, Files received: %d', message_in['repo'], len(message_in['files']))
for file in message_in['files']:
status = 'previously scanned' if file['percent_complete_document'] else 'unscanned'
logging.info('File: %s - %s', file['file_name'], status)
This code:
- Logs information about the repository and number of files found
- Iterates through each file and checks its scanning status using the 'percent_complete_document' field
- Logs each file's name and whether it has been previously scanned
7. Document Submission Process
async def submit_documents(self) -> None:
repo_service = self.host.get_service(RepoService)
self.document_id1 = '97a1ae1b8461076cdc679d6e0a5f885e' # 'IntroductiontoQuantumNetworking.pdf'
self.document_id2 = '9c9f0237620b77fa69e2ca63e40a9f27' # 'Elysian_Fields.pdf'
repo_service.submit_ids([self.document_id1], overwrite=True)
repo_service.submit_ids([self.document_id2])
This code:
- Defines document IDs to be processed
- Submits the first document with overwrite=True to force reprocessing
- Submits the second document without overwrite, using cached version if available
8. Querying with Repository Filters
When documents are fully processed (DOCUMENT_INSERTED
events), the code sends queries with different repository filters:
def on_document_inserted(self, message_in: dict[str, Any]) -> None:
document_id = message_in['id']
if document_id in {self.document_id1, self.document_id2}:
self.count += 1
if self.count == TestService.DOCUMENT_COUNT:
retrieve_service = self.host.get_service(RetrieveService)
prompt1 = Prompt(
'This is prompt 1. Briefly tell me about IntroductiontoQuantumNetworking.pdf and Elysian_Fields.pdf. Start with prompt number.',
repo_ids_filters=[self.repo_id1, self.repo_id2],
)
retrieve_service.submit(prompt1)
prompt2 = Prompt(
'This is prompt 2. Briefly tell me about IntroductiontoQuantumNetworking.pdf and Elysian_Fields.pdf. Start with prompt number.',
repo_ids_filters=[self.repo_id1],
)
retrieve_service.submit(prompt2)
prompt3 = Prompt(
'This is prompt 3. Briefly tell me about IntroductiontoQuantumNetworking.pdf and Elysian_Fields.pdf. Start with prompt number.',
repo_ids_filters=None,
)
retrieve_service.submit(prompt3)
This code:
- Tracks when all documents are processed
- Creates three different prompts with varying repository filters:
- Prompt 1: Filters by both repositories
- Prompt 2: Filters by only the first repository
- Prompt 3: Uses no repository filtering (null repo)
- Submits the prompts to the RetrieveService
Repository Filtering Options
Access Multiple Repositories
This prompt will access files from both repo_id1 and repo_id2.
prompt1 = Prompt(
'This is prompt 1. Tell me about document content.',
repo_ids_filters=[self.repo_id1, self.repo_id2],
)
Filter by a Single Repository
This prompt will only access files from repo_id1.
prompt2 = Prompt(
'This is prompt 2. Tell me about document content.',
repo_ids_filters=[self.repo_id1],
)
The 'Null' Repository
Documents are usable without using the repo system. In this case, the file is associated with a default repo known as the 'null' repo.
prompt3 = Prompt(
'This is prompt 3. Tell me about document content.',
repo_ids_filters=None,
)
Note: If you have previously run the document example, you may have IntroductiontoQuantumNetworking.pdf in the 'null' repo. To remove it you can delete the database (you can simply delete the sql lite file or run the following hatch command).
hatch shell dev
hatch run delete_dbs
Empty List (invalid)
❌ Invalid - Empty list repo filters are not allowed and will throw an error.
# The following would throw an exception
# Prompt('This is prompt 4. Tell me about document content.', repo_ids_filters=[])
Important Considerations
- In order to run the code, you must add the REPO_ROOT environment variable. For example, you can set repo root to a path like the following:
REPO_ROOT = "~/.local/share/engramic/"
. - For security purposes, there is no ability to access all repos without explicitly providing each repo id as a filter.
- If you load a document under a repo, it is forever in that repo. For example, if you first scan a document under the 'null' repo and then scan that document as part of a repo it will be detected and not scanned. Because of this, when you query the repo, it will not display results as part of the repo because it is part of the 'null' repo.