science_live.pipeline.question_processor

Science Live Pipeline: Question Processing

First step of the pipeline that parses and preprocesses natural language questions.

Responsibilities:

  • Clean and normalize input text

  • Classify question type (what, who, where, etc.)

  • Extract key phrases and potential entities

  • Assess intent confidence

Author: Science Live Team

Version: 1.0.0

Module Contents

Classes

QuestionProcessor

Parse and preprocess natural language questions.

Functions

is_valid_question

Check if a question is valid for processing.

preprocess_question_batch

Preprocess a batch of questions.

Data

API

science_live.pipeline.question_processor.__all__ = ['QuestionProcessor']

class science_live.pipeline.question_processor.QuestionProcessor(config: Dict[str, Any] = None)

Bases: science_live.pipeline.common.PipelineStep

Parse and preprocess natural language questions.

This is the first step in the pipeline that takes raw natural language questions and prepares them for entity extraction and further processing.

Features:

  • Question type classification

  • Text cleaning and normalization

  • Key phrase extraction

  • Potential entity identification

  • Intent confidence assessment

Initialization

_initialize_patterns() → Dict[str, List[str]]

Initialize question type classification patterns.

_initialize_stop_words() → set

Initialize stop words for key phrase extraction.

async process(question: str, context: science_live.pipeline.common.ProcessingContext) → science_live.pipeline.common.ProcessedQuestion

Process a natural language question.

Args:

  • question: Raw natural language question.

  • context: Processing context with user info and preferences.

Returns: ProcessedQuestion with classified and preprocessed information.

Raises: ValueError if the question is empty or invalid.
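Because `process` is a coroutine, it has to be awaited or driven by an event loop. The sketch below shows that calling pattern only. `StubContext`, `StubProcessedQuestion`, and `StubQuestionProcessor` are hypothetical stand-ins, since this page does not show the constructor arguments of `ProcessingContext` or the fields of `ProcessedQuestion`; the whitespace cleanup inside the stub is likewise an assumption.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StubContext:
    """Hypothetical stand-in for ProcessingContext (real fields not shown here)."""
    preferences: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StubProcessedQuestion:
    """Hypothetical stand-in for ProcessedQuestion."""
    original: str
    cleaned: str


class StubQuestionProcessor:
    """Mimics only the documented contract: async call, ValueError on empty input."""

    async def process(self, question: str, context: StubContext) -> StubProcessedQuestion:
        if not question or not question.strip():
            raise ValueError("Question cannot be empty")
        # Assumed cleanup step: collapse runs of whitespace and trim the ends.
        cleaned = " ".join(question.split())
        return StubProcessedQuestion(original=question, cleaned=cleaned)


async def main() -> StubProcessedQuestion:
    processor = StubQuestionProcessor()
    return await processor.process("  What   is CRISPR? ", StubContext())


result = asyncio.run(main())
print(result.cleaned)
```

With the real classes, the same pattern would presumably apply: build a `QuestionProcessor`, build a `ProcessingContext`, and `await processor.process(question, context)` inside a coroutine.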

_clean_question(question: str) → str

Clean and normalize the question text.
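The page does not say which normalization steps `_clean_question` applies. As a rough illustration of what "clean and normalize" could mean, the sketch below applies three assumed steps: Unicode NFKC normalization, whitespace collapsing, and trimming of repeated trailing punctuation. The function name `clean_question` is hypothetical.

```python
import re
import unicodedata


def clean_question(question: str) -> str:
    """Illustrative cleanup; not the library's actual implementation."""
    # Fold compatibility characters (e.g. non-breaking spaces) into plain forms.
    text = unicodedata.normalize("NFKC", question)
    # Collapse any run of whitespace into a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Reduce repeated trailing punctuation such as "???" to a single mark.
    text = re.sub(r"([?!.])\1+$", r"\1", text)
    return text


print(clean_question("  What\tis   DNA???  "))
```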

_is_interrogative(text: str) → bool

Check if text is an interrogative sentence.
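One plausible heuristic for an interrogative check is to accept text that ends in a question mark or begins with a question word or auxiliary verb. The sketch below assumes exactly that rule; the word list and the name `is_interrogative` are illustrative and may differ from what the library does.

```python
# Assumed list of words that commonly open a question.
INTERROGATIVE_STARTERS = (
    "what", "who", "whom", "whose", "where", "when", "why", "which", "how",
    "is", "are", "was", "were", "do", "does", "did",
    "can", "could", "should", "would", "will",
)


def is_interrogative(text: str) -> bool:
    """Heuristic check: trailing '?' or a leading question word."""
    stripped = text.strip()
    if not stripped:
        return False
    if stripped.endswith("?"):
        return True
    first_word = stripped.split()[0].lower()
    return first_word in INTERROGATIVE_STARTERS


print(is_interrogative("Where is the lab"))
print(is_interrogative("Papers about CRISPR."))
```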

_classify_question_type(question: str) → Tuple[science_live.pipeline.common.QuestionType, float]

Classify the type of question and assess confidence.
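The return type pairs a question type with a float confidence, and `_initialize_patterns` suggests the classification is pattern-driven. A minimal sketch of that idea follows, assuming regular-expression patterns per type and a confidence equal to the fraction of a type's patterns that matched. The `QType` enum, its members, the patterns, and the scoring rule are all hypothetical; the real `QuestionType` members are not listed on this page.

```python
import re
from enum import Enum
from typing import Dict, List, Tuple


class QType(Enum):
    """Hypothetical stand-in for QuestionType."""
    WHAT = "what"
    WHO = "who"
    WHERE = "where"
    GENERAL = "general"


# Assumed patterns; each type maps to a list of regular expressions.
PATTERNS: Dict[QType, List[str]] = {
    QType.WHAT: [r"^what\b", r"\bdefine\b"],
    QType.WHO: [r"^who\b", r"\bauthor(s)? of\b"],
    QType.WHERE: [r"^where\b", r"\blocated\b"],
}


def classify(question: str) -> Tuple[QType, float]:
    """Pick the type with the most matching patterns; confidence is the match ratio."""
    text = question.lower().strip()
    best_type, best_hits = QType.GENERAL, 0
    for qtype, patterns in PATTERNS.items():
        hits = sum(1 for p in patterns if re.search(p, text))
        if hits > best_hits:
            best_type, best_hits = qtype, hits
    if best_hits == 0:
        return QType.GENERAL, 0.0
    return best_type, best_hits / len(PATTERNS[best_type])


print(classify("Who is the author of this paper?"))
```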

_extract_key_phrases(question: str) → List[str]

Extract key phrases from the question.
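Since the class builds a stop-word set specifically for key phrase extraction, one reasonable reading is that stop words act as phrase boundaries. The sketch below assumes that approach: tokenize, then join consecutive non-stop-word tokens into phrases. The stop-word list and the name `extract_key_phrases` are illustrative, not taken from the library.

```python
import re
from typing import List

# Assumed stop words; the library's actual set is not shown on this page.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "of", "in", "on", "to", "and",
    "what", "who", "where", "how", "do", "does",
}


def extract_key_phrases(question: str) -> List[str]:
    """Split on stop words and keep the runs of remaining tokens as phrases."""
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", question.lower())
    phrases: List[str] = []
    current: List[str] = []
    for token in tokens:
        if token in STOP_WORDS:
            # A stop word closes the phrase being built, if any.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(extract_key_phrases("What is the role of gene editing in cancer therapy?"))
```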

_identify_potential_entities(question: str) → List[str]

Identify potential entities in the question.
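The page does not describe how candidate entities are detected. As an illustration only, the sketch below assumes three surface cues: double-quoted phrases, DOI-like identifiers, and runs of two or more capitalized words. All three patterns and the name `identify_potential_entities` are hypothetical.

```python
import re
from typing import List

QUOTED = re.compile(r'"([^"]+)"')                       # text inside double quotes
DOI = re.compile(r"\b10\.\d{4,9}/\S+")                  # DOI-like identifier
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)+\b")


def identify_potential_entities(question: str) -> List[str]:
    """Collect candidate entity strings in pattern order, without duplicates."""
    found: List[str] = []
    for pattern in (QUOTED, DOI, CAPITALIZED_RUN):
        for match in pattern.finditer(question):
            # Use the capture group when the pattern defines one.
            text = match.group(1) if pattern.groups else match.group(0)
            text = text.rstrip("?.,;")
            if text and text not in found:
                found.append(text)
    return found


print(identify_potential_entities('Who cites "protein folding" in the Marie Curie archive?'))
```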

get_question_complexity(processed_question: science_live.pipeline.common.ProcessedQuestion) → int

Assess question complexity on a 1-5 scale.

Args:

  • processed_question: The processed question to analyze.

Returns: Complexity score from 1 (simple) to 5 (very complex).
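The factors behind the 1-5 score are not documented here. To make the scale concrete, the sketch below assumes an additive scheme based on word count, entity count, and the presence of a conjunction, clamped to the documented range. It takes plain values rather than a `ProcessedQuestion`, because that class's fields are not shown; the thresholds and the name `question_complexity` are hypothetical.

```python
def question_complexity(word_count: int, entity_count: int, has_conjunction: bool) -> int:
    """Illustrative additive score, clamped to the documented 1-5 range."""
    score = 1
    if word_count > 10:
        score += 1
    if word_count > 20:
        score += 1
    if entity_count > 2:
        score += 1
    if has_conjunction:
        score += 1
    return max(1, min(5, score))


print(question_complexity(word_count=12, entity_count=1, has_conjunction=False))
```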

suggest_improvements(processed_question: science_live.pipeline.common.ProcessedQuestion) → List[str]

Suggest improvements to make the question more processable.

Args:

  • processed_question: The processed question to analyze.

Returns: List of suggestion strings.

science_live.pipeline.question_processor.is_valid_question(question: str) → bool

Check if a question is valid for processing.

science_live.pipeline.question_processor.preprocess_question_batch(questions: List[str]) → List[str]

Preprocess a batch of questions.
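The two module-level helpers share the signatures `str → bool` and `List[str] → List[str]`, but their rules are not spelled out. The sketch below shows one way they could fit together, under assumed rules: a question is valid if it is a string of at least three characters containing a letter, and batch preprocessing collapses whitespace and drops invalid entries. The `_sketch` names mark these as hypothetical; the real functions may, for example, keep invalid entries.

```python
from typing import List


def is_valid_question_sketch(question: str) -> bool:
    """Assumed rule: a non-trivial string containing at least one letter."""
    if not isinstance(question, str):
        return False
    stripped = question.strip()
    return len(stripped) >= 3 and any(ch.isalpha() for ch in stripped)


def preprocess_question_batch_sketch(questions: List[str]) -> List[str]:
    """Collapse whitespace in each valid question; drop invalid ones (an assumption)."""
    return [" ".join(q.split()) for q in questions if is_valid_question_sketch(q)]


print(preprocess_question_batch_sketch(["  What is   DNA? ", "", "??", "Who?"]))
```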

science_live.pipeline.question_processor.__version__ = '1.0.0'

science_live.pipeline.question_processor.__author__ = 'Science Live Team'

science_live.pipeline.question_processor.__description__ = 'Question processing step for Science Live pipeline'