Skip to main content
Skip to main content

Formats for input and output data

ClickHouse supports most of the known text and binary data formats. This allows easy integration into almost any working data pipeline to leverage the benefits of ClickHouse.

Input formats

Input formats are used for:

  • Parsing data provided to INSERT statements
  • Performing SELECT queries from file-backed tables such as File, URL, or HDFS
  • Reading dictionaries

Choosing the right input format is crucial for efficient data ingestion in ClickHouse. With over 70 supported formats, selecting the most performant option can significantly impact insert speed, CPU and memory usage, and overall system efficiency. To help navigate these choices, we benchmarked ingestion performance across formats, revealing key takeaways:

  • The Native format is the most efficient input format, offering the best compression, lowest resource usage, and minimal server-side processing overhead.
  • Compression is essential - LZ4 reduces data size with minimal CPU cost, while ZSTD offers higher compression at the expense of additional CPU usage.
  • Pre-sorting has a moderate impact, as ClickHouse already sorts efficiently.
  • Batching significantly improves efficiency - larger batches reduce insert overhead and improve throughput.

For a deep dive into the results and best practices, read the full benchmark analysis. For the full test results, explore the FastFormats online dashboard.

Output formats

Formats supported for output are used for:

  • Arranging the results of a SELECT query
  • Performing INSERT operations into file-backed tables

Formats overview

The supported formats are:

FormatInputOutput
TabSeparated
TabSeparatedRaw
TabSeparatedWithNames
TabSeparatedWithNamesAndTypes
TabSeparatedRawWithNames
TabSeparatedRawWithNamesAndTypes
Template
TemplateIgnoreSpaces
CSV
CSVWithNames
CSVWithNamesAndTypes
CustomSeparated
CustomSeparatedWithNames
CustomSeparatedWithNamesAndTypes
SQLInsert
Values
Vertical
JSON
JSONAsString
JSONAsObject
JSONStrings
JSONColumns
JSONColumnsWithMetadata
JSONCompact
JSONCompactStrings
JSONCompactColumns
JSONEachRow
PrettyJSONEachRow
JSONEachRowWithProgress
JSONStringsEachRow
JSONStringsEachRowWithProgress
JSONCompactEachRow
JSONCompactEachRowWithNames
JSONCompactEachRowWithNamesAndTypes
JSONCompactEachRowWithProgress
JSONCompactStringsEachRow
JSONCompactStringsEachRowWithNames
JSONCompactStringsEachRowWithNamesAndTypes
JSONCompactStringsEachRowWithProgress
JSONObjectEachRow
BSONEachRow
TSKV
Pretty
PrettyNoEscapes
PrettyMonoBlock
PrettyNoEscapesMonoBlock
PrettyCompact
PrettyCompactNoEscapes
PrettyCompactMonoBlock
PrettyCompactNoEscapesMonoBlock
PrettySpace
PrettySpaceNoEscapes
PrettySpaceMonoBlock
PrettySpaceNoEscapesMonoBlock
Prometheus
Protobuf
ProtobufSingle
ProtobufList
Avro
AvroConfluent
Parquet
ParquetMetadata
Arrow
ArrowStream
ORC
One
Npy
RowBinary
RowBinaryWithNames
RowBinaryWithNamesAndTypes
RowBinaryWithDefaults
Native
Null
Hash
XML
CapnProto
LineAsString
LineAsStringWithNames
LineAsStringWithNamesAndTypes
Regexp
RawBLOB
MsgPack
MySQLDump
DWARF
Markdown
Form

You can control some format processing parameters with the ClickHouse settings. For more information read the Settings section.

Format schema

The file name containing the format schema is set by the setting format_schema. It's required to set this setting when it is used one of the formats Cap'n Proto and Protobuf. The format schema is a combination of a file name and the name of a message type in this file, delimited by a colon, e.g. schemafile.proto:MessageType. If the file has the standard extension for the format (for example, .proto for Protobuf), it can be omitted and in this case, the format schema looks like schemafile:MessageType.

If you input or output data via the client in interactive mode, the file name specified in the format schema can contain an absolute path or a path relative to the current directory on the client. If you use the client in the batch mode, the path to the schema must be relative due to security reasons.

If you input or output data via the HTTP interface the file name specified in the format schema should be located in the directory specified in format_schema_path in the server configuration.

Skipping errors

Some formats such as CSV, TabSeparated, TSKV, JSONEachRow, Template, CustomSeparated and Protobuf can skip broken row if parsing error occurred and continue parsing from the beginning of next row. See input_format_allow_errors_num and input_format_allow_errors_ratio settings. Limitations:

  • In case of parsing error JSONEachRow skips all data until the new line (or EOF), so rows must be delimited by \n to count errors correctly.
  • Template and CustomSeparated use delimiter after the last column and delimiter between rows to find the beginning of next row, so skipping errors works only if at least one of them is not empty.