Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How to QUICKLY download a tar file, untar it, and upload the contents to Azure Blob storage using Python?

I have the following Python code that runs in Jupyter Notebook. It downloads a tar file from the source location, untars it and uploads to Azure Blob storage.

import os
import tarfile
from azure.storage.blob import BlobClient

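def upload_folder(local_path):
    """Extract each file from the tar archive and upload it to Azure Blob storage."""
    connection_string = "XXX"
    container_name = "mycontainername"

    with tarfile.open(local_path, "r") as file:
        for member in file.getmembers():
            # Skip directories and other non-file entries; open() would fail on them.
            if not member.isfile():
                continue
            each = member.name
            print(each)
            file.extract(member)
            blob = BlobClient.from_connection_string(connection_string,
                                                     container_name=container_name,
                                                     blob_name=each)

            # Upload the extracted file, then delete the local copy.
            with open(each, "rb") as f:
                blob.upload_blob(f, overwrite=True)
            os.remove(each)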


# MAIN
!wget https://path/to/myarchive.tar.gz

local_path = "myarchive.tar.gz"

upload_folder(local_path)

!rm -rf myarchive.tar.gz
!rm -rf myarchive

The myarchive.tar.gz file is about 1 GB, which corresponds to approximately 4 GB of uncompressed data. The problem is that this code takes far too long to run, even for such a relatively small volume of data: around 5-6 hours.

What am I doing wrong? Is there any way to optimise my code to run it faster?



1 Answer


You can treat the upload of each file as a separate task and use multiprocessing to create a process pool. The pool can then run several upload tasks at the same time, which should speed up the script. For more details, please refer to here and here
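As a rough illustration of running the uploads concurrently, here is a minimal sketch. Note that it swaps in a thread pool (concurrent.futures.ThreadPoolExecutor) for the process pool mentioned above, on the assumption that the uploads are mostly network-bound. The helper names extract_files, upload_to_azure and upload_archive, and the default of 8 workers, are hypothetical and not taken from the question. The archive is extracted once into a temporary directory, and each file is then uploaded as its own task using the same BlobClient calls as in the question.

```python
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor


def extract_files(archive_path, dest_dir):
    """Extract the archive once; return the names of its regular files."""
    with tarfile.open(archive_path, "r:*") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
        archive.extractall(dest_dir)
    return names


def upload_to_azure(local_file, blob_name, connection_string, container_name):
    """Upload one file with the same BlobClient calls used in the question."""
    # Imported here so the rest of the sketch runs without the Azure SDK.
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(connection_string,
                                             container_name=container_name,
                                             blob_name=blob_name)
    with open(local_file, "rb") as f:
        blob.upload_blob(f, overwrite=True)


def upload_archive(archive_path, upload_one, workers=8):
    """Extract archive_path, then call upload_one(local_file, blob_name)
    for every regular file, running up to `workers` uploads at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        names = extract_files(archive_path, tmp)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(upload_one, os.path.join(tmp, name), name)
                       for name in names]
            for future in futures:
                future.result()  # re-raises the first upload error, if any
    return names
```

With the values from the question, this could be called as upload_archive("myarchive.tar.gz", functools.partial(upload_to_azure, connection_string="XXX", container_name="mycontainername")) after importing functools. The best number of workers depends on the network and on the file sizes, so it is worth trying a few values.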

