Command Reference ( extract mode )
List of extract mode commands.
Delete extraction configuration. : cmdbox -m extract -c del <Option>
Deletes extraction configuration.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. |
|
--extract_name <name> |
Yes |
Specify the name of the extraction configuration to delete. |
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. If less than 0 is specified, reconnection is forever. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |
List extraction configuration. : cmdbox -m extract -c list <Option>
Lists saved extraction configurations.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. If omitted, server is used. |
|
--kwd <keyword> |
Partial match filter for extraction configuration names. |
|
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. If less than 0 is specified, reconnection is forever. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |
Load extraction configuration. : cmdbox -m extract -c load <Option>
Loads extraction configuration from the specified file.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. |
|
--extract_name <name> |
Yes |
Specify the name of the extraction configuration to load. |
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |
Save extraction configuration. : cmdbox -m extract -c save <Option>
Saves settings for extracting text from the specified file.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. |
|
--extract_name <name> |
Yes |
Specify the name of the extraction configuration. |
--extract_cmd <command name> |
Yes |
Specify the name of the extraction command setting. |
--extract_type <type> |
Yes |
Specify the type of extraction. Available values: file. |
--scope <scope> |
Specify the reference scope. Available scopes: client, server. |
|
--client_data <path> |
Specify the path of the data folder when local is referenced. |
|
--loadpath <path> |
Yes |
Specify the source path. |
--loadregs <pattern> |
Yes |
Specifies a load regular expression pattern. |
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |
Extract text using pdfplumber. : cmdbox -m extract -c pdfplumber <Option>
Extracts text from the specified PDF document file using pdfplumber.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. |
|
--scope <scope> |
Specify the reference scope. Available scopes: client, current, server. |
|
--loadpath <path> |
Yes |
Specify the source file path. |
--client_data <path> |
Specify the path of the data folder when local is referenced. |
|
--chunk_table <method> |
Specifies how to chunk tables in the PDF file. Available values: none, table, row_with_header. Default: table. |
|
--chunk_table_header <name> |
Replaces existing header items by specifying the names of table header items (multiple values allowed). |
|
--chunk_exclude <pattern> |
A regular expression specifying a string that should not be included in the chunk (multiple values allowed). |
|
--chunk_size <size> |
Specifies the chunk size. Default: 1000. |
|
--chunk_overlap <size> |
Specifies the overlap size of the chunk. Default: 50. |
|
--chunk_separator <separator> |
Specifies the delimiter character for chunking (multiple values allowed). |
|
--chunk_spage <page> |
Specifies the starting page of the embedding range. Default: 0. |
|
--chunk_epage <page> |
Specifies the ending page of the embedding range. Default: 9999. |
|
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |
Extract text using chunklet. : cmdbox -m extract -c chunklet <Option>
Extracts text from the specified document file using chunklet.
Option |
Required |
Description |
|---|---|---|
--host <IP address or host name> |
Specify the service host of the Redis server. |
|
--port <port number> |
Specify the service port of the Redis server. |
|
--password <password> |
Specify the access password of the Redis server (optional). If omitted, password is used. |
|
--svname <Service Name> |
Specify the service name of the inference server. |
|
--scope <scope> |
Specify the reference scope. Available scopes: client, current, server. |
|
--loadpath <path> |
Yes |
Specify the source file path. |
--client_data <path> |
Specify the path of the data folder when local is referenced. |
|
--chunk_lang <language> |
Specify the language of the text to be chunked. Available values: auto, ja, en. Default: auto. |
|
--chunk_max_sentences <count> |
Specify the maximum number of sentences (not characters) for chunking text. Default: 4. |
|
--chunk_overlap_percent <percent> |
Specifies the overlap percentage of the chunk. Default: 20. |
|
--retry_count <Number of retries> |
Specifies the number of reconnections to the Redis server. |
|
--retry_interval <Retry Interval> |
Specifies the number of seconds before reconnecting to the Redis server. |
|
--timeout <time-out> |
Specify the maximum waiting time until the server responds. |