I found out that AWS Glue supports creating Streaming ETL jobs with self-managed Kafka as a data source. I quickly deployed a test Kafka server on EC2, created a simple producer, set up a connection, database, and table in Glue, and created a job that simply writes the data to S3, as described in this guide.
Then I ran the job for 10 minutes, and after checking the contents of the S3 target I found only a folder named checkpoint/
, while I expected the following structure: year=<yyyy>/month=<mm>/day=<dd>/hour=<HH>/
. Does anyone have experience with this? What could potentially be wrong?
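For context, my producer was along these lines. This is a minimal sketch using the kafka-python package; the broker address, topic name (test-topic), and record fields are illustrative placeholders, not the exact values from my setup, and the Glue table schema would need to match the JSON fields being sent:

```python
import json
import time


def build_record(i: int) -> dict:
    # Hypothetical payload shape; the Glue table's schema
    # must match these field names and types.
    return {"id": i, "ts": int(time.time() * 1000), "value": f"msg-{i}"}


def run_producer(bootstrap: str = "localhost:9092", topic: str = "test-topic") -> None:
    # Requires the kafka-python package and a reachable broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for i in range(100):
        producer.send(topic, build_record(i))
        time.sleep(0.1)
    producer.flush()
```

Calling run_producer() with the real broker address sends a steady trickle of JSON records, which is what the Glue streaming job was consuming during the 10-minute run.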