Batch-ingesting S3 data with Druid SQL-based ingestion

📅 April 16, 2023 · ☕ 1 min read

#Apache Druid
#S3

This post shows how to batch-ingest .csv.gz data from S3 into Druid. Environment and references:

Apache Druid: 26.0.0
Reference documentation:
- SQL-based ingestion
- S3 input source

REPLACE all data

REPLACE INTO <target table>
OVERWRITE ALL
< SELECT query >
PARTITIONED BY <time granularity>
[ CLUSTERED BY <column list> ]
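
As a concrete illustration of the template above, here is a minimal sketch that reads a gzipped CSV object from S3 through the `EXTERN` table function and replaces the whole target table. The table name `events`, the S3 URI, and the columns `ts`, `user_id`, and `action` are hypothetical placeholders. The sketch also assumes the file has a header row, that `ts` holds ISO 8601 timestamps, and that the S3 and multi-stage query extensions are loaded on the cluster. Druid is expected to pick the decompression from the `.gz` extension, so no compression setting is shown.

```sql
-- Sketch only: "events", the S3 URI, and all column names are hypothetical.
REPLACE INTO "events"
OVERWRITE ALL
SELECT
  TIME_PARSE("ts") AS __time,  -- assumes "ts" holds ISO 8601 timestamps
  "user_id",
  "action"
FROM TABLE(
  EXTERN(
    -- S3 input source: replace the URI with your own object
    '{"type": "s3", "uris": ["s3://example-bucket/path/events.csv.gz"]}',
    -- CSV input format: column names are taken from the header row
    '{"type": "csv", "findColumnsFromHeader": true}',
    -- Row signature: the columns to read and their types
    '[{"name": "ts", "type": "string"}, {"name": "user_id", "type": "string"}, {"name": "action", "type": "string"}]'
  )
)
PARTITIONED BY DAY
CLUSTERED BY "user_id"
```

`PARTITIONED BY DAY` creates one time chunk per day, and the optional `CLUSTERED BY` adds secondary partitioning on the listed column within each chunk. Because of `OVERWRITE ALL`, any existing data in the target table is replaced by the query result.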

REPLACE specific time ranges

REPLACE INTO <target table>
OVERWRITE WHERE __time >= TIMESTAMP '<lower bound>' AND __time < TIMESTAMP '<upper bound>'
< SELECT query >
PARTITIONED BY <time granularity>
[ CLUSTERED BY <column list> ]
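
The time-range variant can be sketched in the same way. The example below replaces a single day of a hypothetical `events` table with data from one S3 object. Table name, URI, columns, and dates are all placeholders. Two assumptions are worth stating: the `OVERWRITE WHERE` bounds should line up with the `PARTITIONED BY` granularity (whole days here), and the `SELECT` filters its rows to the same range, since rows outside the overwritten interval are expected to make the query fail.

```sql
-- Sketch only: table, S3 URI, columns, and dates are hypothetical.
REPLACE INTO "events"
OVERWRITE WHERE __time >= TIMESTAMP '2023-04-01 00:00:00' AND __time < TIMESTAMP '2023-04-02 00:00:00'
SELECT
  TIME_PARSE("ts") AS __time,  -- assumes "ts" holds ISO 8601 timestamps
  "user_id",
  "action"
FROM TABLE(
  EXTERN(
    '{"type": "s3", "uris": ["s3://example-bucket/path/events-2023-04-01.csv.gz"]}',
    '{"type": "csv", "findColumnsFromHeader": true}',
    '[{"name": "ts", "type": "string"}, {"name": "user_id", "type": "string"}, {"name": "action", "type": "string"}]'
  )
)
-- Keep only rows inside the interval being overwritten
WHERE TIME_PARSE("ts") >= TIMESTAMP '2023-04-01 00:00:00'
  AND TIME_PARSE("ts") < TIMESTAMP '2023-04-02 00:00:00'
PARTITIONED BY DAY
```

Data outside the one-day interval stays untouched, which makes this form a reasonable fit for re-running a daily load without rebuilding the whole table.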

Author

Molly Wang

The self-cultivation of a data product person