Extract Contents
Extract text content from book files (EPUB, PDF) as structured sections. Automatically uses AI to identify chapter boundaries in large documents.
How It Works
For EPUB files, the task parses the document structure to extract individual sections with their titles and content. When the EPUB includes structural metadata, each section is automatically assigned a section type (e.g., “titlepage”, “dedication”, “chapter”, “epilogue”, “glossary”). Each section is also marked as front matter, body, or back matter based on its type. The contents viewer displays each section’s type and front matter status.
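In EPUB 3, this structural metadata usually takes the form of `epub:type` attributes on the content document’s `<body>` or `<section>` elements. The sketch below shows one way a section type could be read from that attribute and mapped to front, body, or back matter. It uses only the standard library. The `FRONT_MATTER` and `BACK_MATTER` sets are an assumed mapping for illustration and are not the task’s actual table. `section_type` and `matter` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces used in EPUB 3 XHTML content documents.
XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{EPUB_NS}}}type"

# Partition terms describe the matter, not the section type, so they are skipped.
PARTITIONS = {"frontmatter", "bodymatter", "backmatter"}

# ASSUMED mapping for illustration only; anything not listed is treated as body.
FRONT_MATTER = {"cover", "titlepage", "halftitlepage", "copyright-page",
                "dedication", "epigraph", "foreword", "preface", "toc"}
BACK_MATTER = {"afterword", "appendix", "glossary", "index",
               "bibliography", "colophon", "acknowledgments"}


def section_type(xhtml: str) -> Optional[str]:
    """Return the first non-partition epub:type token on <body> or a
    top-level <section>, or None when the document carries no such metadata."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return None
    # Check <body> first, then its direct <section> children.
    for element in [body, *body.findall(f"{{{XHTML_NS}}}section")]:
        for token in element.get(EPUB_TYPE, "").split():
            if token not in PARTITIONS:
                return token
    return None


def matter(kind: Optional[str]) -> str:
    """Map a section type to "front", "body", or "back" (assumed mapping)."""
    if kind in FRONT_MATTER:
        return "front"
    if kind in BACK_MATTER:
        return "back"
    return "body"
```

For example, a content document whose `<body>` has `epub:type="dedication"` yields `"dedication"` and maps to `"front"`. A document with no `epub:type` at all yields `None`. Those are the sections that would need the AI classification step.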
For PDF files, AI is used to detect chapter boundaries within the continuous text.
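Once chapter boundaries have been detected in a PDF’s continuous text, the text still has to be cut into titled sections. The sketch below assumes the detector returns a list of `(character_offset, title)` pairs. That output shape, the `Section` class, and `split_at_boundaries` are all hypothetical, and the sketch does not describe how the task’s AI actually reports boundaries. Any text before the first boundary is kept as an untitled section so nothing is lost.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str


def split_at_boundaries(text: str, boundaries: list[tuple[int, str]]) -> list[Section]:
    """Slice continuous text at the given (offset, title) boundaries.

    `boundaries` is a HYPOTHETICAL model output. Out-of-range offsets are
    ignored, and duplicate offsets keep the last title seen.
    """
    points = sorted({off: title for off, title in boundaries
                     if 0 <= off <= len(text)}.items())
    sections: list[Section] = []

    # Keep leading text (e.g. a preface) that precedes the first boundary.
    first_start = points[0][0] if points else len(text)
    lead = text[:first_start].strip()
    if lead:
        sections.append(Section("", lead))

    # Each section runs from its boundary to the next one (or the end of text).
    for i, (start, title) in enumerate(points):
        end = points[i + 1][0] if i + 1 < len(points) else len(text)
        sections.append(Section(title, text[start:end].strip()))
    return sections
```

Sorting and de-duplicating the offsets up front means the slicing loop does not depend on the order in which a model happens to list the boundaries.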
AI Section Classification
Some EPUB files don’t include the structural metadata needed to identify section types automatically. In those cases, AI determines what each remaining section is, such as a chapter, dedication, epilogue, or acknowledgements. Sections already identified from the file’s metadata are left as-is. This step runs automatically and is included in the base cost.
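The rule that metadata-derived types are kept and only unidentified sections go to AI can be expressed as a small merge step. In the sketch below, sections are plain dicts with `"title"`, `"text"`, and `"type"` keys. `classify(title, text)` is a hypothetical stand-in for the AI call. Both the dict shape and the callback are assumptions, not the task’s actual interface.

```python
from typing import Callable


def fill_missing_types(
    sections: list[dict],
    classify: Callable[[str, str], str],
) -> list[dict]:
    """Return a new list of section dicts with every missing "type" filled in.

    Sections whose "type" is already set (e.g. from EPUB metadata) keep it;
    only sections with type None are passed to the HYPOTHETICAL `classify`.
    The input dicts are not modified.
    """
    result = []
    for section in sections:
        updated = dict(section)
        if updated.get("type") is None:
            updated["type"] = classify(updated["title"], updated["text"])
        result.append(updated)
    return result
```

Calling `classify` once per unidentified section keeps the sketch simple. A real implementation might instead batch all unidentified sections into a single request.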
When to Use
Use this task early in a pipeline to convert book files into the StructuredText format required by most AI analysis tasks.
Reference
Input: Book file (EPUB or PDF) to extract content from.
Output: Sections with titles and text content.