COPY INTO <table>
This command loads data into Databend from files in a variety of locations.
See Also: COPY INTO location
Supported File Locations
Your data files must be located in one of these locations for COPY INTO to work:
- Named internal stage: Databend internal named stages. Files can be staged using the PUT to Stage API.
- Named external stage: Stages created in Supported Object Storage Solutions (see the sketch after this list).
- External location:
- Buckets created in Supported Object Storage Solutions.
- Remote servers from which you can access the files by their URL (starting with "https://...").
- IPFS.
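As a brief illustration of the stage options above, a named external stage pointing at an S3 bucket might be created like this before running COPY INTO. This is only a sketch: the stage name, bucket, and credentials are placeholders, and the exact options depend on your Databend version; see the CREATE STAGE documentation for the full syntax.
-- Sketch only: stage name, bucket, and credentials are placeholders.
CREATE STAGE my_external_s1
URL = 's3://mybucket'
CONNECTION = (
  ACCESS_KEY_ID = '<your-access-key-ID>'
  SECRET_ACCESS_KEY = '<your-secret-access-key>');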
Syntax
COPY INTO [<database>.]<table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML } [ formatTypeOptions ] ) ]
[ copyOptions ]
Where:
internalStage
internalStage ::= @<internal_stage_name>[/<path>]
externalStage
externalStage ::= @<external_stage_name>[/<path>]
externalLocation
AWS S3 Compatible Object Storage Service
externalLocation ::=
's3://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
SESSION_TOKEN = '<your-session-token>'
REGION = '<region-name>'
ENABLE_VIRTUAL_HOST_STYLE = 'true|false'
)
Parameter | Description | Required |
---|---|---|
s3://<bucket>[<path>] | External files located at an AWS S3 compatible object storage service. | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCESS_KEY_ID | Your access key ID for connecting to the AWS S3 compatible object storage service. If not provided, Databend will access the bucket anonymously. | Optional |
SECRET_ACCESS_KEY | Your secret access key for connecting to the AWS S3 compatible object storage service. | Optional |
SESSION_TOKEN | Your temporary credential for connecting to the AWS S3 service. | Optional |
REGION | AWS region name. For example, us-east-1. | Optional |
ENABLE_VIRTUAL_HOST_STYLE | If you use virtual hosting to address the bucket, set it to "true". | Optional |
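For illustration, the following sketch combines SESSION_TOKEN and REGION with the other connection parameters when loading with temporary credentials; all values are placeholders:
-- Sketch only: endpoint, credentials, bucket, and region are placeholders.
COPY INTO mytable
FROM 's3://mybucket/data.csv'
CONNECTION = (
  ENDPOINT_URL = 'https://<endpoint-URL>'
  ACCESS_KEY_ID = '<your-access-key-ID>'
  SECRET_ACCESS_KEY = '<your-secret-access-key>'
  SESSION_TOKEN = '<your-session-token>'
  REGION = '<region-name>')
FILE_FORMAT = (type = 'CSV' skip_header = 1);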
Azure Blob storage
externalLocation ::=
'azblob://<container>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCOUNT_NAME = '<your-account-name>'
ACCOUNT_KEY = '<your-account-key>'
)
Parameter | Description | Required |
---|---|---|
azblob://<container>[<path>] | External files located at Azure Blob Storage. | Required |
ENDPOINT_URL | The container endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCOUNT_NAME | Your account name for connecting to Azure Blob Storage. If not provided, Databend will access the container anonymously. | Optional |
ACCOUNT_KEY | Your account key for connecting to Azure Blob Storage. | Optional |
Google Cloud Storage
externalLocation ::=
'gcs://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
CREDENTIAL = '<your-credential>'
)
Parameter | Description | Required |
---|---|---|
gcs://<bucket>[<path>] | External files located at Google Cloud Storage. | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Optional |
CREDENTIAL | Your credential for connecting to Google Cloud Storage. If not provided, Databend will access the bucket anonymously. | Optional |
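For illustration, a minimal sketch for loading from Google Cloud Storage; the bucket, path, and credential are placeholders, and https://storage.googleapis.com is assumed as the endpoint:
-- Sketch only: bucket, path, credential, and endpoint are placeholders/assumptions.
COPY INTO mytable
FROM 'gcs://mybucket/data.csv'
CONNECTION = (
  ENDPOINT_URL = 'https://storage.googleapis.com'
  CREDENTIAL = '<your-credential>')
FILE_FORMAT = (type = 'CSV' skip_header = 1);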
Huawei Object Storage
externalLocation ::=
'obs://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
Parameter | Description | Required |
---|---|---|
obs://<bucket>[<path>] | External files located at Huawei Object Storage (OBS). | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCESS_KEY_ID | Your access key ID for connecting to OBS. If not provided, Databend will access the bucket anonymously. | Optional |
SECRET_ACCESS_KEY | Your secret access key for connecting to OBS. | Optional |
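For illustration, a minimal sketch for loading from Huawei OBS; the bucket, region endpoint, and keys are placeholders:
-- Sketch only: bucket, endpoint, and keys are placeholders.
COPY INTO mytable
FROM 'obs://mybucket/data.csv'
CONNECTION = (
  ENDPOINT_URL = 'https://obs.<region>.myhuaweicloud.com'
  ACCESS_KEY_ID = '<your-access-key-id>'
  SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' skip_header = 1);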
Remote Files
externalLocation ::=
'https://<url>'
You can use glob patterns to specify more than one file. For example:
- Use ontime_200{6,7,8}.csv to represent ontime_2006.csv, ontime_2007.csv, and ontime_2008.csv.
- Use ontime_200[6-8].csv to represent ontime_2006.csv, ontime_2007.csv, and ontime_2008.csv.
IPFS
externalLocation ::=
'ipfs://<your-ipfs-hash>'
CONNECTION = (ENDPOINT_URL = 'https://<your-ipfs-gateway>')
FILES = ( 'file_name' [ , 'file_name' ... ] )
Specifies a list of one or more file names (separated by commas) to be loaded.
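For illustration, a minimal sketch that loads two explicitly named files from a named stage; the file names here are hypothetical:
-- Sketch only: the file names are hypothetical.
COPY INTO mytable
FROM @my_internal_s1
FILES = ('books-2021.csv', 'books-2022.csv')
FILE_FORMAT = (type = 'CSV' skip_header = 1);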
PATTERN = 'regex_pattern'
A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. See Loading Data with Pattern Matching below for an example. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.
formatTypeOptions
formatTypeOptions ::=
RECORD_DELIMITER = '<character>'
FIELD_DELIMITER = '<character>'
SKIP_HEADER = <integer>
QUOTE = '<character>'
ESCAPE = '<character>'
NAN_DISPLAY = '<string>'
ROW_TAG = '<string>'
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
TYPE = 'CSV'
Comma-Separated Values format (RFC 4180).
Notes:
- A string field that contains Quote, Escape, RECORD_DELIMITER, or FIELD_DELIMITER must be quoted.
- No character is escaped except Quote inside a quoted string.
- No space is allowed between FIELD_DELIMITER and Quote.
- No trailing FIELD_DELIMITER for a record.
- An Array/Struct field is serialized to a string as in SQL, and the resulting string is then output to CSV in quotes.
- If you generate CSV programmatically, we highly recommend using the CSV library of your programming language.
- For text files unloaded from MySQL, the default format is TSV in Databend. It is valid CSV only if ESCAPED BY is empty and ENCLOSED BY is not empty.
RECORD_DELIMITER = '<character>'
Description: One character that separates records in an input file.
Supported Values: \r\n, or one character, including escaped characters: \b, \f, \r, \n, \t, \0, \xHH
Default: \n
FIELD_DELIMITER = '<character>'
Description: One character that separates fields in an input file.
Supported Values: One character only, including escaped characters: \b, \f, \r, \n, \t, \0, \xHH
Default: , (comma)
QUOTE = '<character>'
Description: One character to quote strings in a CSV file.
For data loading, quoting is not necessary unless a string contains Quote, Escape, RECORD_DELIMITER, or FIELD_DELIMITER.
Supported Values: \' or \"
Default: \"
ESCAPE = '<character>'
Description: One character to escape quotes in quoted strings.
Supported Values: \', \", or \\
Default: \"
SKIP_HEADER = <integer>
Use: Data loading only.
Description: Number of lines at the start of the file to skip.
Default: 0
NAN_DISPLAY = '<string>'
Supported Values: Must be literal 'nan' or 'null' (case-insensitive)
Default: 'NaN'
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
Description: String that represents the compression algorithm.
Default: NONE
Supported Values:
Values | Notes |
---|---|
AUTO | Auto detect compression via file extensions |
GZIP | |
BZ2 | |
BROTLI | Must be specified if loading/unloading Brotli-compressed files. |
ZSTD | Zstandard v0.8 (and higher) is supported. |
DEFLATE | Deflate-compressed files (with zlib header, RFC1950). |
RAW_DEFLATE | Deflate-compressed files (without any header, RFC1951). |
XZ | |
NONE | Indicates that the files have not been compressed. |
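Putting several of these CSV options together, the following sketch loads a pipe-delimited, CRLF-terminated file from a stage; the file name is hypothetical:
-- Sketch only: the file name is hypothetical.
COPY INTO mytable
FROM @my_internal_s1
FILES = ('pipe_data.csv')
FILE_FORMAT = (type = 'CSV' field_delimiter = '|' record_delimiter = '\r\n' skip_header = 1);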
TYPE = 'TSV'
Notes:
- These characters are escaped: \b, \f, \r, \n, \t, \0, \\, \', RECORD_DELIMITER, FIELD_DELIMITER.
- Quoting/enclosing is not supported yet.
- An Array/Struct field is serialized to a string as in SQL, and the resulting string is then output to TSV.
- Null is serialized as \N.
RECORD_DELIMITER = '<character>'
Description: One character that separates records in an input file.
Supported Values: \r\n, or one character, including escaped characters: \b, \f, \r, \n, \t, \0, \xHH
Default: \n
FIELD_DELIMITER = '<character>'
Description: One character that separates fields in an input file.
Supported Values: One character only, including escaped characters: \b, \f, \r, \n, \t, \0, \xHH
Default: \t (TAB)
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
Same as COMPRESSION in TYPE = 'CSV'.
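For illustration, a minimal sketch loading a gzip-compressed TSV file and letting Databend detect the compression from the file extension; the file name is hypothetical:
-- Sketch only: the file name is hypothetical.
COPY INTO mytable
FROM @my_internal_s1
FILES = ('data.tsv.gz')
FILE_FORMAT = (type = 'TSV' compression = 'AUTO');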
TYPE = 'NDJSON'
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
Same as COMPRESSION in TYPE = 'CSV'.
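For illustration, a minimal sketch loading an NDJSON file from a stage; the file name is hypothetical:
-- Sketch only: the file name is hypothetical.
COPY INTO mytable
FROM @my_internal_s1
FILES = ('events.ndjson')
FILE_FORMAT = (type = 'NDJSON');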
TYPE = 'XML'
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
Same as COMPRESSION in TYPE = 'CSV'.
ROW_TAG = '<string>'
Description: Used to select the XML elements to be decoded as records.
Default: 'row'
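For illustration, a minimal sketch loading an XML file where each record is wrapped in a hypothetical <item> element; the file name is also hypothetical:
-- Sketch only: the file name and row tag are hypothetical.
COPY INTO mytable
FROM @my_internal_s1
FILES = ('data.xml')
FILE_FORMAT = (type = 'XML' row_tag = 'item');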
TYPE = 'PARQUET'
No options are available at this time.
copyOptions
copyOptions ::=
[ SIZE_LIMIT = <num> ]
[ PURGE = <bool> ]
[ FORCE = <bool> ]
Parameters | Description | Required |
---|---|---|
SIZE_LIMIT = <num> | Specifies the maximum number of rows to be loaded for a given COPY statement. Defaults to 0, meaning no limit. | Optional |
PURGE = <bool> | If True, the command will purge the files in the stage after they are loaded successfully into the table. Default: False. | Optional |
FORCE = <bool> | Defaults to False, meaning the command will skip duplicate files in the stage when copying data. If True, duplicate files will not be skipped. | Optional |
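For illustration, the following sketch re-loads Parquet files even if they were loaded before and then removes them from the stage; the pattern is hypothetical:
-- Sketch only: the pattern is hypothetical.
COPY INTO mytable
FROM @my_internal_s1
PATTERN = '.*[.]parquet'
FILE_FORMAT = (type = 'PARQUET')
FORCE = true
PURGE = true;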
Examples
Loading Data from an Internal Stage
COPY INTO mytable FROM @my_internal_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');
Loading Data from an External Stage
COPY INTO mytable FROM @my_external_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');
Loading Data from External Locations
AWS S3 compatible object storage services
This example reads 10 rows from a CSV file and inserts them into a table:
COPY INTO mytable
FROM 's3://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;
This example reads 10 rows from a CSV file compressed as GZIP and inserts them into a table:
COPY INTO mytable
FROM 's3://mybucket/data.csv.gz'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 compression = 'GZIP') size_limit=10;
This example loads data from a CSV file without specifying the endpoint URL:
COPY INTO mytable
FROM 's3://mybucket/data.csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;
This example loads data from a Parquet file:
COPY INTO mytable
FROM 's3://mybucket/data.parquet'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'PARQUET');
Azure Blob storage
This example reads data from a CSV file and inserts it into a table:
COPY INTO mytable
FROM 'azblob://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<account_name>.blob.core.windows.net'
ACCOUNT_NAME = '<account_name>'
ACCOUNT_KEY = '<account_key>'
)
FILE_FORMAT = (type = 'CSV');
Remote Files
This example reads data from three remote CSV files and inserts it into a table:
COPY INTO mytable
FROM 'https://repo.databend.rs/dataset/stateful/ontime_200{6,7,8}_200.csv'
FILE_FORMAT = (type = 'CSV');
IPFS
This example reads data from a CSV file on IPFS and inserts it into a table:
COPY INTO mytable
FROM 'ipfs://<your-ipfs-hash>' connection = (endpoint_url = 'https://<your-ipfs-gateway>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Loading Data with Pattern Matching
This example uses pattern matching to only load from CSV files containing sales in their names:
COPY INTO mytable
FROM 's3://mybucket/'
PATTERN = '.*sales.*[.]csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Where .* is interpreted as zero or more occurrences of any character. The square brackets escape the period character (.) that precedes a file extension.
If you want to load from all the CSV files, use PATTERN = '.*[.]csv':
COPY INTO mytable
FROM 's3://mybucket/'
PATTERN = '.*[.]csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Tutorials
Here are some tutorials to help you get started with COPY INTO:
- Tutorial: Load from an internal stage: In this tutorial, you will create an internal stage, stage a sample file, and then load data from the file into Databend with the COPY INTO command.
- Tutorial: Load from an Amazon S3 bucket: In this tutorial, you will upload a sample file to your Amazon S3 bucket, and then load data from the file into Databend with the COPY INTO command.
- Tutorial: Load from a remote file: In this tutorial, you will load data from a remote sample file into Databend with the COPY INTO command.