EMR version : 5.32
Sqoop version : 1.4.7
I am using `sqoop import` to pull data from a MySQL table.
The table structure in MySQL is shown below.
Client
- The ID is a randomly generated identifier that serves as the primary key.
| ID       | Title | Active Status |
|----------|-------|---------------|
| pd-31-ed | Test1 | true          |
| ps-34-tr | Test2 | true          |
- The table contains no duplicates, but after running sqoop import the last row appears twice in the imported data.
- I queried MySQL directly to check for duplicates, and there are none.
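For reference, the kind of duplicate check described above could look like the sketch below. It only builds the query text; `dup_check_query` is a hypothetical helper name, and `$MYSQL_HOST` in the usage comment is an assumed variable that does not appear in my actual setup.

```shell
# Build a query that lists every key value occurring more than once.
# $1 = table name, $2 = key column (both supplied by the caller).
dup_check_query() {
  printf 'SELECT %s, COUNT(*) AS cnt FROM %s GROUP BY %s HAVING COUNT(*) > 1' "$2" "$1" "$2"
}

# Example usage (assumes a reachable MySQL server; $MYSQL_HOST is hypothetical):
#   mysql -h "$MYSQL_HOST" -u "$MYSQL_DB_USER" -p Client -e "$(dup_check_query Client ID)"
```

An empty result from this query means the source table has no repeated IDs, which matches what I see.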
This is the sqoop import command I am running:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect $MYSQL_URL/Client --username $MYSQL_DB_USER --password $MYSQL_DB_PASSWORD --table Client --columns ID,Title,ActiveStatus --hive-import --hive-overwrite --hive-table $HIVE_DB.tmp_client --delete-target-dir --target-dir $TARGET_DIR/tmp_client --num-mappers 10 --boundary-query 'SELECT MIN(ID), MAX(ID) FROM Client' --split-by ID --fetch-size=500 --direct --map-column-hive ActiveStatus=boolean
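To see which IDs are repeated on the imported side, a rough check over the imported text files could look like this. It is a sketch under assumptions: that the files are plain text, that fields are separated by Hive's default `\001` delimiter, and that the ID is the first field. `dup_ids` is a hypothetical helper name and `$HIVE_TABLE_DIR` in the usage comment is an assumed placeholder for wherever the table's files end up.

```shell
# Read Hive-delimited rows from stdin and print each ID (first field)
# that appears more than once.
dup_ids() {
  sep=$(printf '\001')              # Hive's default field delimiter (Ctrl-A)
  cut -d "$sep" -f1 | sort | uniq -d
}

# Example usage ($HIVE_TABLE_DIR is a hypothetical path to the table's files):
#   hadoop fs -cat "$HIVE_TABLE_DIR"/* | dup_ids
```

Running the same pipeline against one part file at a time would show whether the repeated row comes from a single mapper's output or from two different splits.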