science_live.pipeline.entity_extractor
EntityExtractorLinker - Extract and link entities to URIs (FIXED FINAL VERSION)
Module Contents
Classes
EntityExtractorLinker | Extract and link entities to URIs with proper filtering and punctuation handling
API
- class science_live.pipeline.entity_extractor.EntityExtractorLinker(endpoint_manager, config: Dict[str, Any] = None)
Extract and link entities to URIs with proper filtering and punctuation handling
Initialization
- _initialize_function_words() → set
Initialize function words that are definitely not entities
- async extract_and_link(processed_question: science_live.pipeline.common.ProcessedQuestion, context: science_live.pipeline.common.ProcessingContext) → science_live.pipeline.common.LinkedEntities
Extract and link entities from processed question
- async _extract_entities(processed_question: science_live.pipeline.common.ProcessedQuestion) → List[science_live.pipeline.common.ExtractedEntity]
Extract entities with type classification and punctuation cleaning
- _extract_parenthetical_examples(text: str) → List[science_live.pipeline.common.ExtractedEntity]
Extract examples from parentheses
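The page does not show how this method is implemented. As a rough illustration only, the sketch below pulls comma-separated examples out of parentheses with a regular expression. The `Span` class is a hypothetical stand-in for `ExtractedEntity`, and dropping lead-ins such as "e.g." is an assumption, not documented behavior.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for ExtractedEntity: text plus character offsets."""
    text: str
    start: int
    end: int


def extract_parenthetical_examples(text: str) -> List[Span]:
    """Return each comma- or semicolon-separated item found inside parentheses."""
    spans: List[Span] = []
    for match in re.finditer(r"\(([^()]*)\)", text):
        inner = match.group(1)
        offset = match.start(1)  # position of the first character inside "("
        for piece in re.finditer(r"[^,;]+", inner):
            raw = piece.group(0)
            # Skip leading whitespace and an optional lead-in such as "e.g."
            lead = re.match(r"\s*(?:e\.g\.|i\.e\.|such as)?\s*", raw, flags=re.IGNORECASE)
            body = raw[lead.end():].rstrip()
            if body:
                start = offset + piece.start() + lead.end()
                spans.append(Span(body, start, start + len(body)))
    return spans


question = "Which metals (e.g. copper, zinc) conduct heat?"
examples = extract_parenthetical_examples(question)
```

Keeping character offsets alongside the text lets a later step compare candidates by position.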
- _extract_clean_noun_phrases(text: str) → List[science_live.pipeline.common.ExtractedEntity]
Extract clean noun phrases with proper boundaries
- _extract_meaningful_words(text: str) → List[science_live.pipeline.common.ExtractedEntity]
Extract meaningful single words
- _clean_phrase_boundaries(phrase: str) → str
Clean phrase boundaries by removing function words at edges
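A minimal sketch of edge trimming, assuming a small hard-coded function-word set. The actual set built by `_initialize_function_words()` is not listed on this page, so `FUNCTION_WORDS` and the punctuation handling below are illustrative assumptions.

```python
# Hypothetical function-word list; the real one is not shown in this reference.
FUNCTION_WORDS = frozenset({
    "the", "a", "an", "of", "in", "on", "for", "to", "and", "or",
    "is", "are", "what", "which", "by", "with",
})
PUNCTUATION = ".,;:!?\"'()"


def clean_phrase_boundaries(phrase: str, function_words=FUNCTION_WORDS) -> str:
    """Drop function words from both ends of a phrase; interior words are kept."""
    words = phrase.split()
    while words and words[0].strip(PUNCTUATION).lower() in function_words:
        words.pop(0)
    while words and words[-1].strip(PUNCTUATION).lower() in function_words:
        words.pop()
    # Strip punctuation left hanging on the outer edges of the phrase.
    return " ".join(words).strip(PUNCTUATION)


trimmed = clean_phrase_boundaries("the effects of climate change on")
```

Only the edges are trimmed, so an interior "of" survives and multi-word phrases stay intact.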
- _clean_and_filter_entities(entities: List[science_live.pipeline.common.ExtractedEntity]) → List[science_live.pipeline.common.ExtractedEntity]
Remove duplicates, overlaps and filter low-quality entities
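One way such a filter could work, sketched under assumptions: candidates carry character offsets and a confidence score, longer spans win over shorter overlapping ones, and anything below a threshold is dropped. The `Candidate` class, the `min_confidence` default, and the longest-first policy are all hypothetical; the page does not state the actual rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for ExtractedEntity."""
    text: str
    start: int
    end: int
    confidence: float


def clean_and_filter(cands: List[Candidate], min_confidence: float = 0.3) -> List[Candidate]:
    """Greedy filter: prefer longer spans, then higher confidence."""
    kept: List[Candidate] = []
    seen = set()
    ranked = sorted(cands, key=lambda c: (-(c.end - c.start), -c.confidence))
    for cand in ranked:
        key = cand.text.lower()
        if cand.confidence < min_confidence or key in seen:
            continue  # low quality, or a case-insensitive duplicate
        if any(cand.start < k.end and k.start < cand.end for k in kept):
            continue  # overlaps a span that was already accepted
        seen.add(key)
        kept.append(cand)
    return sorted(kept, key=lambda c: c.start)  # restore text order


candidates = [
    Candidate("climate change", 15, 29, 0.8),
    Candidate("climate", 15, 22, 0.6),
    Candidate("change", 23, 29, 0.6),
    Candidate("sea level", 33, 42, 0.2),
    Candidate("Arctic", 50, 56, 0.9),
]
filtered = clean_and_filter(candidates)
```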
- _entities_overlap(entity1: science_live.pipeline.common.ExtractedEntity, entity2: science_live.pipeline.common.ExtractedEntity) → bool
Check if two entities overlap in text position
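The standard interval test can be sketched as follows, assuming each entity exposes half-open character offsets `[start, end)`. The field names on `ExtractedEntity` are not shown here, so the function takes plain integers.

```python
def spans_overlap(start1: int, end1: int, start2: int, end2: int) -> bool:
    """True when two half-open intervals [start, end) share at least one character."""
    return start1 < end2 and start2 < end1
```

With half-open offsets, two spans that merely touch, such as `[0, 5)` and `[5, 8)`, do not count as overlapping.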
- async _link_entities(entities: List[science_live.pipeline.common.ExtractedEntity]) → List[science_live.pipeline.common.ExtractedEntity]
Link entities to URIs
- async _link_via_external_services(text: str, entity_type: science_live.pipeline.common.EntityType) → Optional[Dict[str, Any]]
Link entity via external services
- _classify_entities(entities: List[science_live.pipeline.common.ExtractedEntity], processed_question: science_live.pipeline.common.ProcessedQuestion) → Tuple[List[science_live.pipeline.common.ExtractedEntity], List[science_live.pipeline.common.ExtractedEntity]]
Classify entities as potential subjects or objects
- _calculate_linking_confidence(entities: List[science_live.pipeline.common.ExtractedEntity]) → float
Calculate overall linking confidence
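The formula behind this score is not documented on this page. One plausible definition, shown purely as an assumption, averages the per-entity confidences and counts unlinked entities (represented here as `None`) as zero, so that missing links pull the overall score down.

```python
from typing import Optional, Sequence


def overall_linking_confidence(scores: Sequence[Optional[float]]) -> float:
    """Mean per-entity confidence; None (unlinked) counts as 0.0."""
    if not scores:
        return 0.0  # nothing to link, so no confidence to report
    return sum(s or 0.0 for s in scores) / len(scores)


# Two linked entities and one that could not be linked.
score = overall_linking_confidence([0.9, None, 0.6])
```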