Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mysql - sqoop import with -Dorg.apache.sqoop.splitter.allow_text_splitter=true duplicates records

EMR version : 5.32
Sqoop version : 1.4.7

I am using sqoop import to load data from a table in MySQL.

Below is the table structure in MySQL.

Client

  • The ID is a randomly generated value and is the primary key.

| ID       | Title | Active Status |
|----------|-------|---------------|
| pd-31-ed | Test1 | true          |
| ps-34-tr | Test2 | true          |

  • Although the table contains no duplicates, after the sqoop import the last row appears twice in the imported data.
  • I queried MySQL to check for duplicates, and there are none in the source table.
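The same duplicate check can also be run on the imported side. Below is a minimal Python sketch, assuming the imported rows have been exported as tuples with the ID in the first position. The names `find_duplicate_ids` and `imported_rows` are hypothetical, and the third row is made up to mimic the repeated last row described above.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def find_duplicate_ids(rows: Iterable[Sequence[str]], id_index: int = 0) -> Dict[str, int]:
    """Return {id: count} for every ID that appears more than once."""
    counts = Counter(row[id_index] for row in rows)
    return {row_id: n for row_id, n in counts.items() if n > 1}


# Hypothetical rows as they might look after the import,
# with the last row repeated (illustrative data only).
imported_rows = [
    ("pd-31-ed", "Test1", "true"),
    ("ps-34-tr", "Test2", "true"),
    ("ps-34-tr", "Test2", "true"),
]

duplicates = find_duplicate_ids(imported_rows)
print(duplicates)
```

An empty result on the MySQL side together with a non-empty result on the Hive side would suggest that the duplication is introduced during the import rather than being present in the source.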

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect $MYSQL_URL/Client \
  --username $MYSQL_DB_USER \
  --password $MYSQL_DB_PASSWORD \
  --table Client \
  --columns ID,Title,ActiveStatus \
  --hive-import \
  --hive-overwrite \
  --hive-table $HIVE_DB.tmp_client \
  --delete-target-dir \
  --target-dir $TARGET_DIR/tmp_client \
  --num-mappers 10 \
  --boundary-query 'SELECT MIN(ID), MAX(ID) FROM Client' \
  --split-by ID \
  --fetch-size=500 \
  --direct \
  --map-column-hive ActiveStatus=boolean
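One possible cause, not confirmed for this case: when the split column is textual, Sqoop computes split boundaries itself, but the database evaluates the range comparisons using its own collation. If the two orderings disagree (for example, a case-insensitive collation), the ranges handed to different mappers can overlap, so a row may be read twice. The Python sketch below is a simplified model of that effect, not Sqoop's actual splitter. The boundaries, the helper names, and the range shape (`low <= ID < high`, with the final range inclusive) are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def split_ranges(boundaries: List[str]) -> List[Tuple[str, str, bool]]:
    """Pair consecutive boundaries into (low, high, is_last) ranges."""
    last = len(boundaries) - 2
    return [
        (boundaries[i], boundaries[i + 1], i == last)
        for i in range(len(boundaries) - 1)
    ]


def matching_ranges(value: str, boundaries: List[str],
                    key: Callable[[str], str]) -> List[int]:
    """Return the indexes of all ranges that would select `value`.

    `key` models how the database compares strings
    (identity = binary order, str.casefold = case-insensitive order).
    """
    hits = []
    for i, (low, high, is_last) in enumerate(split_ranges(boundaries)):
        if is_last:
            upper_ok = key(value) <= key(high)  # final range is inclusive
        else:
            upper_ok = key(value) < key(high)
        if key(low) <= key(value) and upper_ok:
            hits.append(i)
    return hits


# Hypothetical split points, ascending by code point ("A" < "Z" < "a" < "z").
boundaries = ["A", "Z", "a", "z"]

binary_hits = matching_ranges("m", boundaries, key=lambda s: s)
case_insensitive_hits = matching_ranges("m", boundaries, key=str.casefold)
print(binary_hits, case_insensitive_hits)
```

Under binary comparison the value `"m"` falls into exactly one range, while under case-insensitive comparison it falls into two, which is how a single source row could end up imported by two mappers. If this is what is happening, running the import with `--num-mappers 1` or splitting on an integral column would be ways to test the theory.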


1 Answer

Waiting for an expert to reply.
