Module Description
Synopsis
This module adds a new computed field to the File entity: "File extractor: extracted file".
This field gives access to the content of the file:
* in webservices like JSON:API
* in a field formatter (file field)
* in Search API
The module provides the following extraction methods:
* Docconv binary
* Pdftotext binary
* Python Pdf2txt binary
* Solr built-in extractor (Search API Solr)
* Tika App JAR
* Tika Server JAR
History
This project is a fork of Search API Attachments. More information on the module's origins is available in #3126845: Version 2.0.0
Requirements
Each extractor plugin may require different modules or libraries. If its requirements are not satisfied, the plugin does not appear in the settings.
Each extractor plugin may also require a different binary on your server. When you configure the extraction, a test runs to check that the extraction works. See the module documentation for installation instructions for each extractor plugin.
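Before configuring a binary-based extractor, it can help to confirm that the binary is reachable from PHP. The following is a minimal sketch, not the test the module itself runs: the helper name find_binary() is hypothetical, the executable names (docconv, pdftotext, pdf2txt.py) are assumptions about how these tools are typically installed, and the lookup assumes a POSIX shell is available to PHP.

```php
<?php

/**
 * Returns the resolved path of a binary on the PATH, or NULL if not found.
 *
 * Hypothetical helper for illustration; not part of the File Extractor module.
 */
function find_binary(string $name): ?string {
  // `command -v` is the POSIX way to resolve a command name to its path.
  $path = shell_exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null');
  $path = is_string($path) ? trim($path) : '';
  return $path !== '' ? $path : NULL;
}

// Assumed executable names; adjust them to match your server.
foreach (['docconv', 'pdftotext', 'pdf2txt.py'] as $binary) {
  $path = find_binary($binary);
  echo $binary . ': ' . ($path ?? 'not found') . PHP_EOL;
}
```

A binary reported as "not found" here may still work if you give the module its absolute path, so treat the output only as a quick first check.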
Installation
Starting from version 4.0.0, Composer 2 is required to install the module with Composer.
Configuration
* Enable the File Extractor module on your site.
* Go to the configuration page (/admin/config/media/file-extractor) and configure the extraction settings.
Starting from version 3.0.0, the module provides its own cache bin, 'file_extractor', so you can override the cache backend for this bin in your settings.php file. For example, to use the File Cache module:
$settings['cache']['bins']['file_extractor'] = 'cache.backend.file_system';
Maintainers
* Florent Torregrosa (Grimreaper)
Module Link
Project Usage
211
Security Covered
Covered By Security Advisory
Version Available
Production
Module Summary
This module adds a new computed field on File entity for accessing file content in various ways, using different extraction methods like Docconv binary, Pdftotext binary, Python Pdf2txt binary, Solr built-in extractor, Tika App JAR, and Tika Server JAR.
Data Name
file_extractor