`science_live.pipeline.question_processor`

Science Live Pipeline: Question Processing

First step of the pipeline that parses and preprocesses natural language questions.

Responsibilities:

Clean and normalize input text
Classify question type (what, who, where, etc.)
Extract key phrases and potential entities
Assess intent confidence

Author: Science Live Team
Version: 1.0.0

Module Contents

Classes

`QuestionProcessor`	Parse and preprocess natural language questions

Functions

`is_valid_question`	Check if a question is valid for processing
`preprocess_question_batch`	Preprocess a batch of questions

Data

`__all__`
`__version__`
`__author__`
`__description__`

API

science_live.pipeline.question_processor.__all__ = ['QuestionProcessor']

class science_live.pipeline.question_processor.QuestionProcessor(config: Dict[str, Any] = None)

Bases: science_live.pipeline.common.PipelineStep

Parse and preprocess natural language questions.

This is the first step in the pipeline that takes raw natural language questions and prepares them for entity extraction and further processing.

Features:

Question type classification
Text cleaning and normalization
Key phrase extraction
Potential entity identification
Intent confidence assessment

Initialization

_initialize_patterns() → Dict[str, List[str]]: Initialize question type classification patterns

_initialize_stop_words() → set: Initialize stop words for key phrase extraction

async process(question: str, context: science_live.pipeline.common.ProcessingContext) → science_live.pipeline.common.ProcessedQuestion

Process a natural language question.

Args:
question: Raw natural language question
context: Processing context with user info and preferences

Returns: ProcessedQuestion with classified and preprocessed information

Raises: ValueError: If question is empty or invalid
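The call pattern for `process` can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchQuestionProcessor` and the two dataclasses are hypothetical stand-ins whose fields are assumed, since the real `ProcessingContext` and `ProcessedQuestion` constructors are not documented here. It only shows that `process` is awaited and that empty input is expected to raise `ValueError`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ProcessingContext:
    """Hypothetical stand-in; the real class lives in science_live.pipeline.common."""
    preferences: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProcessedQuestion:
    """Hypothetical stand-in with assumed field names."""
    original_text: str
    cleaned_text: str


class SketchQuestionProcessor:
    """Illustrative stand-in that mirrors the documented process() signature."""

    async def process(self, question: str, context: ProcessingContext) -> ProcessedQuestion:
        # Reject empty or whitespace-only input, as the Raises section describes.
        if not question or not question.strip():
            raise ValueError("Question cannot be empty")
        # Collapse whitespace as a placeholder for the real preprocessing.
        return ProcessedQuestion(original_text=question,
                                 cleaned_text=" ".join(question.split()))


async def main() -> ProcessedQuestion:
    processor = SketchQuestionProcessor()
    context = ProcessingContext()
    try:
        await processor.process("   ", context)
    except ValueError as exc:
        print(f"rejected: {exc}")
    return await processor.process("  Who wrote   this paper? ", context)


if __name__ == "__main__":
    print(asyncio.run(main()).cleaned_text)
```

Because `process` is a coroutine, callers outside an event loop need `asyncio.run` (or an equivalent) to drive it.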

_clean_question(question: str) → str: Clean and normalize the question text
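The exact normalization rules are not documented. As a rough sketch, assuming cleaning means collapsing whitespace and repeated trailing question marks, a stand-alone version (the function name `clean_question` is hypothetical) might look like this:

```python
import re


def clean_question(question: str) -> str:
    """Hypothetical sketch of question cleaning; the real rules may differ."""
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    cleaned = re.sub(r"\s+", " ", question).strip()
    # Collapse repeated trailing question marks ("??" becomes "?").
    cleaned = re.sub(r"\?+$", "?", cleaned)
    return cleaned


print(clean_question("  What   cites\tthis paper??  "))
```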

_is_interrogative(text: str) → bool: Check if text is an interrogative sentence
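One plausible heuristic for this check, shown only as an assumption about how such a test could work, is to accept text that ends in a question mark or starts with a common question word. The word list and the name `is_interrogative` below are illustrative, not taken from the module.

```python
# Assumed list of words that commonly open a question.
QUESTION_STARTERS = (
    "what", "who", "where", "when", "why", "how", "which",
    "is", "are", "does", "do", "can",
)


def is_interrogative(text: str) -> bool:
    """Hypothetical heuristic: trailing '?' or a leading question word."""
    stripped = text.strip()
    if not stripped:
        return False
    if stripped.endswith("?"):
        return True
    first_word = stripped.split()[0].lower()
    return first_word in QUESTION_STARTERS


print(is_interrogative("how does it work"))
print(is_interrogative("List all papers."))
```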

_classify_question_type(question: str) → Tuple[science_live.pipeline.common.QuestionType, float]: Classify the type of question and assess confidence
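Pattern-based classification of this kind can be sketched as below. The `QuestionType` members, the regular expressions, and the confidence values 0.9 and 0.3 are all assumptions chosen for illustration. Only the overall shape (a `Dict[str, List[str]]` of patterns in, a `(type, confidence)` tuple out) follows the signatures documented on this page.

```python
import re
from enum import Enum
from typing import Dict, List, Tuple


class QuestionType(Enum):
    """Hypothetical stand-in for science_live.pipeline.common.QuestionType."""
    WHAT = "what"
    WHO = "who"
    WHERE = "where"
    GENERAL = "general"


# Assumed patterns, keyed by question type name.
PATTERNS: Dict[str, List[str]] = {
    "what": [r"^what\b", r"^which\b"],
    "who": [r"^who\b", r"^whose\b"],
    "where": [r"^where\b"],
}


def classify_question_type(question: str) -> Tuple[QuestionType, float]:
    """Return the first matching type with an illustrative confidence."""
    text = question.strip().lower()
    for name, patterns in PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return QuestionType(name), 0.9
    # No pattern matched: fall back to a generic type with low confidence.
    return QuestionType.GENERAL, 0.3


print(classify_question_type("Who wrote this paper?"))
```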

_extract_key_phrases(question: str) → List[str]: Extract key phrases from the question
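A common way to use a stop-word set for key phrase extraction is to split the token stream at stop words and keep the remaining runs. The sketch below assumes that approach. The stop-word list and the function name `extract_key_phrases` are illustrative and may not match what `_initialize_stop_words` returns.

```python
import re
from typing import List, Set

# Assumed stop words; the module's actual set is not documented here.
STOP_WORDS: Set[str] = {
    "what", "who", "where", "is", "are", "the", "a", "an",
    "of", "in", "on", "by", "for", "to",
}


def extract_key_phrases(question: str) -> List[str]:
    """Group consecutive non-stop-word tokens into phrases."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", question.lower())
    phrases: List[str] = []
    current: List[str] = []
    for token in tokens:
        if token in STOP_WORDS:
            # A stop word closes the phrase being built, if any.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(extract_key_phrases("What is the impact of climate change on coral reefs?"))
```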

_identify_potential_entities(question: str) → List[str]: Identify potential entities in the question
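Which entity kinds the method recognizes is not stated. Purely as an assumption, the sketch below looks for ORCID iDs, DOIs, and double-quoted strings, which are typical identifiers in scholarly questions. The patterns and the name `identify_potential_entities` are hypothetical.

```python
import re
from typing import List

# Assumed entity patterns, tried in order.
ENTITY_PATTERNS = [
    r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b",  # ORCID iD
    r"\b10\.\d{4,9}/\S+",                 # DOI
    r'"([^"]+)"',                         # double-quoted phrase
]


def identify_potential_entities(question: str) -> List[str]:
    """Collect candidate entities, without duplicates, in pattern order."""
    entities: List[str] = []
    for pattern in ENTITY_PATTERNS:
        for match in re.finditer(pattern, question):
            # Use the capture group when the pattern defines one.
            text = match.group(1) if match.lastindex else match.group(0)
            # Drop sentence punctuation that was swept up at the end.
            text = text.rstrip(".,;?!")
            if text and text not in entities:
                entities.append(text)
    return entities


print(identify_potential_entities(
    "Who is 0000-0002-1825-0097 and what cites 10.1000/xyz123?"
))
```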

get_question_complexity(processed_question: science_live.pipeline.common.ProcessedQuestion) → int

Assess question complexity on a 1-5 scale.

Args: processed_question: The processed question to analyze

Returns: Complexity score from 1 (simple) to 5 (very complex)
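The scoring criteria are not documented. One way to arrive at a bounded 1-5 score, given here only as an assumed heuristic, is to start at 1 and add a point per complexity signal. The `ProcessedQuestion` fields, the thresholds, and the signals themselves are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessedQuestion:
    """Hypothetical stand-in with assumed field names."""
    cleaned_text: str
    key_phrases: List[str] = field(default_factory=list)
    potential_entities: List[str] = field(default_factory=list)


def get_question_complexity(processed_question: ProcessedQuestion) -> int:
    """Score complexity from 1 (simple) to 5 (very complex)."""
    text = processed_question.cleaned_text.lower()
    score = 1
    if len(text.split()) > 10:                          # long question
        score += 1
    if len(processed_question.key_phrases) > 3:         # many key phrases
        score += 1
    if len(processed_question.potential_entities) > 1:  # several entities
        score += 1
    if " and " in text or " or " in text:               # compound question
        score += 1
    # Clamp so the result stays on the documented 1-5 scale.
    return max(1, min(score, 5))


simple = ProcessedQuestion("Who wrote this?")
print(get_question_complexity(simple))
```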

suggest_improvements(processed_question: science_live.pipeline.common.ProcessedQuestion) → List[str]

Suggest improvements to make the question more processable.