Command Reference ( extract mode )

List of extract mode commands.

extract ( chunklet ) : `cmdbox -m extract -c chunklet <Option>`

Extracts text from the specified document file.

Option	Type	Multi	Required	Default	Choices	Description
--host <host>	str		required	localhost		Specify the service host of the Redis server.
--port <port>	int		required	6379		Specify the service port of the Redis server.
--password <password>	passwd		required	password		Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str		required	cmdbox		Specify the service name of the inference server.
--scope <scope>	str		required	current	client \| current \| server	Specifies the scope to be referenced. When omitted, 'client' is used.
--fwpath <fwpath>	file	multi	required			Specify the path to determine whether the specified path is out of bounds. If it is not under this path, it will result in an error.
--loadpath <loadpath>	file		required			Specify the source file path.
--client_data <client_data>	str					Specify the path of the data folder when local is referenced.
--chunk_lang <chunk_lang>	str			auto	auto \| ja \| en	Specify the language of the text to be chunked. If auto is specified, the language will be automatically detected.
--chunk_max_token_counter <chunk_max_token_counter>	str			gpt-4o		Specify the token counter (model name) used to count tokens when chunking text.
--chunk_max_tokens <chunk_max_tokens>	int			1024		Specify the maximum number of tokens for chunking text.
--chunk_max_sentences <chunk_max_sentences>	int			4		Specify the maximum number of sentences (not characters) for chunking text.
--chunk_overlap_percent <chunk_overlap_percent>	int			20		Specifies the overlap percentage of the chunk.
--retry_count <retry_count>	int			3		Specifies the number of reconnections to the Redis server.
--retry_interval <retry_interval>	int			5		Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int			120		Specify the maximum waiting time until the server responds.
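
As a rough illustration of how the options above combine, the following sketch assembles a chunklet command line in Python without running it. The helper `build_chunklet_argv` and the file paths are hypothetical, and repeating `--fwpath` once per value is an assumption about how "multi" options are passed.

```python
import shlex


def build_chunklet_argv(loadpath, fwpaths, **options):
    """Build an argument list for `cmdbox -m extract -c chunklet` (hypothetical helper)."""
    argv = ["cmdbox", "-m", "extract", "-c", "chunklet", "--loadpath", str(loadpath)]
    # Assumption: a "multi" option is given by repeating the flag for each value.
    for fwpath in fwpaths:
        argv += ["--fwpath", str(fwpath)]
    # Remaining options map directly to the option names in the table.
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_chunklet_argv(
    "docs/manual.txt",  # hypothetical source file
    ["docs"],           # hypothetical boundary path
    scope="client",
    chunk_lang="en",
    chunk_max_tokens=512,
    chunk_overlap_percent=10,
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` on a machine where cmdbox is installed; options left out fall back to the defaults listed in the table.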

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "file": "<class 'pathlib.Path'>",
    "data": [
      {
        "content": "string",
        "metadata": {}
      }
    ]
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.file	Path \| str \| null	no	null	File path
success.data	list[ContentRecord] \| null	no	null	Processing result data
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag
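
A minimal sketch of reading this output in Python, assuming the result is available as a JSON string shaped like the schema above. The payload values and the `page` metadata key are made up for illustration, and `iter_chunks` is a hypothetical helper, not part of cmdbox.

```python
import json

# Illustrative payload shaped like the schema above; all values are made up.
raw = """
{
  "success": {
    "file": "docs/manual.txt",
    "data": [
      {"content": "First chunk.", "metadata": {"page": 1}},
      {"content": "Second chunk.", "metadata": {"page": 2}}
    ]
  },
  "end": false
}
"""


def iter_chunks(result):
    """Yield (content, metadata) pairs from a Result-shaped dict."""
    # Every top-level field is nullable, so check before descending.
    if result.get("error"):
        raise RuntimeError(f"extract failed: {result['error']}")
    success = result.get("success") or {}
    for record in success.get("data") or []:
        yield record["content"], record.get("metadata") or {}


result = json.loads(raw)
chunks = list(iter_chunks(result))
for content, metadata in chunks:
    print(metadata, content)
```

Because `success`, `success.data`, and `error` may each be null, the helper treats a missing value as "no chunks" rather than indexing into it directly.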

extract ( del ) : `cmdbox -m extract -c del <Option>`

Delete the extraction configuration.

Option	Type	Required	Default	Description
--host <host>	str	required	localhost	Specify the service host of the Redis server.
--port <port>	int	required	6379	Specify the service port of the Redis server.
--password <password>	passwd	required	password	Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str	required	cmdbox	Specify the service name of the inference server.
--retry_count <retry_count>	int		3	Specifies the number of reconnections to the Redis server.
--retry_interval <retry_interval>	int		5	Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int		120	Specify the maximum waiting time until the server responds.
--extract_name <extract_name>	str	required		Specify the name of the extraction configuration to delete.

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "data": "string"
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.data	str \| null	no	null	Processing result data
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag

extract ( list ) : `cmdbox -m extract -c list <Option>`

Display a list of saved extraction settings.

Option	Type	Required	Default	Description
--host <host>	str	required	localhost	Specify the service host of the Redis server.
--port <port>	int	required	6379	Specify the service port of the Redis server.
--password <password>	passwd	required	password	Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str	required	cmdbox	Specify the service name of the inference server. If omitted, server is used.
--retry_count <retry_count>	int		3	Specifies the number of reconnections to the Redis server. If a value less than 0 is specified, reconnection is retried indefinitely.
--retry_interval <retry_interval>	int		5	Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int		60	Specify the maximum waiting time until the server responds.
--kwd <kwd>	str			Specify the name you want to search for. Searches for partial matches.
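
The interplay of `--retry_count` and `--retry_interval` can be pictured with the sketch below. It is not cmdbox's actual reconnection code: `connect_with_retry` and the `flaky_connect` stand-in are hypothetical, and the loop only mirrors the documented semantics (a fixed number of reconnections, or unlimited when the count is negative).

```python
import time


def connect_with_retry(connect, retry_count=3, retry_interval=5, sleep=time.sleep):
    """Call connect(); on ConnectionError, reconnect up to retry_count times.

    A negative retry_count retries forever.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once the allowed number of reconnections is used up.
            if retry_count >= 0 and attempt >= retry_count:
                raise
            attempt += 1
            sleep(retry_interval)


# Demo with a stand-in connection that fails twice, then succeeds.
calls = {"n": 0}
waits = []


def flaky_connect():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("server not ready")
    return "connected"


# The sleep function is replaced so the demo does not actually wait.
status = connect_with_retry(flaky_connect, retry_count=3, retry_interval=5, sleep=waits.append)
print(status, waits)
```

With the defaults of 3 reconnections at 5-second intervals, this model waits roughly 15 seconds before the failure is reported.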

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "data": [
      {
        "name": "string",
        "path": "<class 'pathlib.Path'>"
      }
    ]
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.data	list[NamePath]	no	(required)	Processing result data
success.data.name	str	yes	(required)	Name
success.data.path	Path \| str \| null	no	null	Path
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag

extract ( load ) : `cmdbox -m extract -c load <Option>`

Loads settings for extracting text from the specified file.

Option	Type	Required	Default	Description
--host <host>	str	required	localhost	Specify the service host of the Redis server.
--port <port>	int	required	6379	Specify the service port of the Redis server.
--password <password>	passwd	required	password	Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str	required	cmdbox	Specify the service name of the inference server.
--retry_count <retry_count>	int		3	Specifies the number of reconnections to the Redis server.
--retry_interval <retry_interval>	int		5	Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int		120	Specify the maximum waiting time until the server responds.
--extract_name <extract_name>	str	required		Specify the name of the extraction configuration to load.

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "extract_name": "string",
    "extract_type": "string",
    "extract_cmd": "string",
    "scope": "string",
    "client_data": "string",
    "loadpath": "string",
    "loadregs": "string"
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.extract_name	str \| null	no	null	Extraction configuration name
success.extract_type	str \| null	no	null	Extraction type
success.extract_cmd	str \| null	no	null	Extraction command
success.scope	str \| null	no	null	Scope
success.client_data	str \| null	no	null	Client data
success.loadpath	str \| null	no	null	Load path
success.loadregs	str \| null	no	null	Load regular expression
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag

extract ( pdfplumber ) : `cmdbox -m extract -c pdfplumber <Option>`

Extracts text from the specified document file.

Option	Type	Multi	Required	Default	Choices	Description
--host <host>	str		required	localhost		Specify the service host of the Redis server.
--port <port>	int		required	6379		Specify the service port of the Redis server.
--password <password>	passwd		required	password		Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str		required	cmdbox		Specify the service name of the inference server.
--scope <scope>	str		required	current	client \| current \| server	Specifies the scope to be referenced. When omitted, 'client' is used.
--loadpath <loadpath>	file		required			Specify the source file path.
--fwpath <fwpath>	file	multi	required			Specify the path to determine whether the specified path is out of bounds. If it is not under this path, it will result in an error.
--client_data <client_data>	str					Specify the path of the data folder when local is referenced.
--chunk_table <chunk_table>	str			table	none \| table \| row_with_header	Specifies how to chunk tables in the PDF file. none: do not chunk by table; table: chunk by table; row_with_header: chunk by row (with header).
--chunk_table_header <chunk_table_header>	str	multi				Replaces existing header items by specifying the names of the table header items in the PDF file, from left to right.
--chunk_exclude <chunk_exclude>	str	multi				A regular expression specifying a string that should not be included in the chunk. If this specification is matched, embedding will not be performed.
--chunk_size <chunk_size>	int			1000		Specifies the chunk size.
--chunk_overlap <chunk_overlap>	int			50		Specifies the overlap size of the chunk.
--chunk_separator <chunk_separator>	str	multi				Specifies the delimiter character for chunking.
--chunk_spage <chunk_spage>	int			0		Specifies the starting page of the embedding range.
--chunk_epage <chunk_epage>	int			9999		Specifies the ending page of the embedding range.
--retry_count <retry_count>	int			3		Specifies the number of reconnections to the Redis server.
--retry_interval <retry_interval>	int			5		Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int			120		Specify the maximum waiting time until the server responds.
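
To make the relationship between `--chunk_size` and `--chunk_overlap` concrete, here is a sketch of a fixed-size sliding window. It assumes both values are measured in characters, which the table does not state, and `split_with_overlap` is a hypothetical function rather than the splitter the command actually uses.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=50):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


pieces = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(pieces)
```

In this model each chunk repeats the tail of its predecessor, so with the defaults every new chunk contributes 950 fresh characters.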

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "file": "<class 'pathlib.Path'>",
    "data": [
      {
        "content": "string",
        "metadata": {}
      }
    ]
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.file	Path \| str \| null	no	null	File path
success.data	list[ContentRecord] \| null	no	null	Processing result data
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag

extract ( save ) : `cmdbox -m extract -c save <Option>`

Saves settings for extracting text from the specified file.

Option	Type	Required	Default	Choices	Description
--host <host>	str	required	localhost		Specify the service host of the Redis server.
--port <port>	int	required	6379		Specify the service port of the Redis server.
--password <password>	passwd	required	password		Specify the access password of the Redis server (optional). If omitted, password is used.
--svname <svname>	str	required	cmdbox		Specify the service name of the inference server.
--retry_count <retry_count>	int		3		Specifies the number of reconnections to the Redis server.
--retry_interval <retry_interval>	int		5		Specifies the number of seconds before reconnecting to the Redis server.
--timeout <timeout>	int		120		Specify the maximum waiting time until the server responds.
--extract_name <extract_name>	str	required			Specify the name of the extraction configuration.
--extract_cmd <extract_cmd>	str	required			Specify the name of the extraction command setting.
--extract_type <extract_type>	str	required	file	file	Specify the type of extraction.
--scope <scope>	str		client	client \| server	Specify the reference scope. The available scopes are client and server.
--client_data <client_data>	str				Specify the path of the data folder when local is referenced.
--loadpath <loadpath>	dir	required			Specify the source path.
--loadregs <loadregs>	str	required	.*		Specifies a load regular expression pattern.
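
The effect of a `--loadregs` pattern can be previewed with Python's `re` module, as in the sketch below. Whether cmdbox applies the pattern with match, search, or full-match semantics is not stated here, so the use of `re.search` is an assumption; `select_files` and the file names are hypothetical.

```python
import re


def select_files(paths, loadregs=".*"):
    """Return the paths whose string form matches the loadregs pattern."""
    pattern = re.compile(loadregs)
    # Assumption: the pattern may match anywhere in the path (re.search).
    return [p for p in paths if pattern.search(str(p))]


candidates = ["a.pdf", "b.txt", "sub/c.pdf"]  # hypothetical files under loadpath
pdf_only = select_files(candidates, r"\.pdf$")
everything = select_files(candidates)
print(pdf_only, everything)
```

The default pattern `.*` matches every path, so a saved configuration that leaves `--loadregs` untouched selects all files under `--loadpath`.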

Output Schema

This command implements output_schema() returning Result model.

{
  "success": {
    "performance": [
      {
        "key": "string",
        "value": null
      }
    ],
    "data": "string"
  },
  "warn": {},
  "error": {},
  "schema": {},
  "end": false
}

Field	Type	Required	Default	Description
success	Data \| null	no	null	Result on success
success.performance	list[KeyVal] \| null	no	null	List of performance information
success.data	str \| null	no	null	Processing result data
warn	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are warnings
warn.performance	list[KeyVal] \| null	no	null	List of performance information
error	dict[str, any] \| Data \| str \| bool \| null	no	null	Result when there are errors
error.performance	list[KeyVal] \| null	no	null	List of performance information
schema	dict[str, any] \| null	no	null	Schema information
end	bool \| null	no	null	End flag
