Architecture Design ==================== Overview -------- MIMI's architecture is designed around efficient mass spectrometry data processing and analysis. The system follows a modular design pattern with clear separation of concerns between data handling, processing, and analysis components. .. image:: _static/Fig1_MIMI_architecture.png :width: 600 :align: center :alt: MIMI Architecture Overview Detailed Flow Diagram ---------------------- .. image:: _static/Fig2_MIMI_workflow_detail.png :width: 600 :align: center :alt: MIMI Detailed Flow Diagram Core Components --------------- 1. Data Input Layer ~~~~~~~~~~~~~~~~~~~ - **Sample Processor**: Handles mass spectrometry data input from ASC files - **Database Connectors**: Interfaces with KEGG and HMDB metabolite databases - **Cache Manager**: Manages binary cache files for efficient data retrieval 2. Processing Engine ~~~~~~~~~~~~~~~~~~~~ - **Formula Parser**: Parses chemical formulas into atomic components - **Mass Calculator**: Computes molecular masses with ionization adjustments - **Isotope Handler**: Manages isotope patterns and abundance calculations - **PPM Calculator**: Handles mass tolerance calculations for matching 3. Analysis Layer ~~~~~~~~~~~~~~~~~ - **Pattern Matcher**: Matches mass patterns between samples and reference data - **Isotope Validator**: Validates isotope patterns for compound identification - **Result Formatter**: Organizes and formats analysis results Data Flow --------- 1. **Preprocessing Phase** - Chemical formulas are parsed from databases (KEGG, HMDB) - Molecular masses are calculated for each compound - Isotope variants are computed and stored in cache files 2. **Analysis Phase** - Mass spectrometry data is loaded from sample files - Sample masses are compared against cached molecular masses - Matches are verified using isotope patterns - Results are filtered based on PPM tolerance settings 3. **Output Phase** - Matched compounds are organized by confidence level - Results are formatted into tabular output - Detailed information is provided for each match Key Design Principles --------------------- 1. **Efficiency**: Hash-based indexing for fast mass lookups 2. **Flexibility**: Support for multiple ionization modes and isotope labeling 3. **Precision**: PPM-based matching for high accuracy identification 4. **Scalability**: Batch processing capabilities for multiple samples 5. **Modularity**: Clear separation between components for maintainability Implementation Details ---------------------- - **Atom Module**: Handles atomic data and isotope information - **Molecule Module**: Processes molecular formulas and calculates masses - **Analysis Module**: Coordinates the analysis workflow - **Cache Creation**: Precomputes molecular data for faster analysis - **Database Connectors**: Extract compound information from external sources