DNAnexus download and upload

Summary

This post shows how to download files from St. Jude Cloud with the DNAnexus dx-toolkit, which is much faster than wget.

Login account setup

Note

You only need to do this once.

CREATE YOUR API TOKEN

Follow these 3 steps to get an API token. Consider emailing the token to yourself, because you will only see it once. If you lose it, you will need to create a new one (which is quite easy to do).

../../_images/DNAnexusAPI1.PNG ../../_images/DNAnexusAPI2.PNG ../../_images/DNAnexusAPI3.PNG
module load dx-toolkit

dx login --token [YOUR TOKEN]
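If you would rather not type the token directly on the command line, one option is to keep it in an environment variable. The sketch below assumes a hypothetical variable named MY_DX_TOKEN that you set yourself; it is not something dx-toolkit reads on its own. It then runs dx whoami to confirm which user is logged in.

```shell
# Sketch: log in with a token held in an environment variable.
# MY_DX_TOKEN is a hypothetical name; set it yourself beforehand.
dx_login_from_env() {
    if [ -z "${MY_DX_TOKEN:-}" ]; then
        echo "MY_DX_TOKEN is not set" >&2
        return 1
    fi
    # Same login command as above, followed by a check of the logged-in user.
    dx login --token "$MY_DX_TOKEN" && dx whoami
}

# Usage on the HPC:
#   module load dx-toolkit
#   read -rs MY_DX_TOKEN     # bash: paste the token without echoing it
#   dx_login_from_env
```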

Download a folder

You can find the projectID by going to the folder and clicking Settings. See the example below:

../../_images/dx_download.png

The general usage is shown below.

module load dx-toolkit

# replace projectID with your own project ID

dx cd projectID

dx ls

dx download -r -f folder_name

## or

dx download -f *.gz

# The -f option overwrites local files that have the same file name.

# St. Jude Cloud sometimes contains files with the same file name,

# and this can raise an error when downloading the files.
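One thing to watch with the *.gz form: your local shell expands an unquoted pattern before dx ever sees it, so if the current dir on the HPC already holds .gz files, dx receives those local names instead of the pattern. The small demonstration below does not call dx at all; it only shows the shell behaviour, using a throwaway temp dir and a made-up file name.

```shell
# Demonstrate how the local shell treats an unquoted vs. quoted pattern.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch already_here.gz              # a local file that happens to match

unquoted=$(printf '%s\n' *.gz)     # expanded by the shell to the local file name
quoted=$(printf '%s\n' '*.gz')     # quotes pass the pattern through unchanged

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# So, on the HPC, quoting is the safer form:
#   dx download -f '*.gz'
```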

For example, suppose all your data is stored in a folder inside the chengLab dir, say RNA_seq_example00. Running dx cd chengLab-projectID puts you virtually inside the chengLab dir, just like the Linux cd command. Next, dx download -r RNA_seq_example00 downloads the whole folder to your current dir on the HPC. If you want RNA_seq_example00 to end up inside your sequencing folder, first go to that dir on the HPC and then run dx download.
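The "go to the local dir first, then download" step can be wrapped in a small helper. This is only a sketch: the function name dx_download_into and its three arguments are made up for illustration, and it uses the same dx cd and dx download -r -f commands shown above.

```shell
# Sketch: download a remote folder into a chosen local directory.
# Arguments (all placeholders): project ID, remote folder, local target dir.
dx_download_into() {
    project=$1
    remote_folder=$2
    local_dir=$3
    mkdir -p "$local_dir" || return 1
    (
        # The subshell keeps the caller's local working directory unchanged.
        cd "$local_dir" || exit 1
        dx cd "$project" && dx download -r -f "$remote_folder"
    )
}

# Usage on the HPC:
#   module load dx-toolkit
#   dx_download_into chengLab-projectID RNA_seq_example00 ~/sequencing
```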

Note

Each project has its own projectID, which is the root of your directory tree. If your target folder is at the third level or deeper, you need more than one cd command.
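For deeper folders, the repeated cd commands can be generated from a slash-separated path. The helper below is a sketch with a made-up name (dx_cd_levels); it simply issues one dx cd per level, which matches the manual approach described in the note.

```shell
# Sketch: issue one `dx cd` per level of a remote path.
# Example call: dx_cd_levels projectID Banana/seq_data2
dx_cd_levels() {
    project=$1
    remote_path=$2
    dx cd "$project" || return 1

    # Split the path on "/" into positional parameters.
    old_ifs=$IFS
    IFS=/
    set -f                       # no local glob expansion while splitting
    set -- $remote_path
    set +f
    IFS=$old_ifs

    for level in "$@"; do
        [ -n "$level" ] || continue   # skip empty parts from extra slashes
        dx cd "$level" || return 1
    done
}
```

After it returns, dx ls should list the contents of the innermost folder.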

The following example will assume that the directory structure is:

.. image:: ../../images/stj_cloud_example_tree.png

Example 1 - download seq_data1

Option 1: This option will create a seq_data1 folder inside your current dir on the HPC.

module load dx-toolkit

dx cd projectID

dx download -r -f seq_data1

Option 2: This option will download every .gz file in the current remote folder directly into your current dir on the HPC.

module load dx-toolkit

dx cd projectID

dx download -f *.gz

Example 2 - download seq_data2

Option 1: This option will create a seq_data2 folder inside your current dir on the HPC.

module load dx-toolkit

dx cd projectID

dx cd Banana

dx download -r -f seq_data2

Option 2: This option will download every .gz file in the current remote folder directly into your current dir on the HPC.

module load dx-toolkit

dx cd projectID

dx cd Banana

dx download -f *.gz

Upload a dir

../../_images/stj_cloud_tree.png

In this example, my root dir is Share_with_PSU and my sub-dir is test. The test dir contains a test1 folder. On the HPC, I have created a folder called test2 and I want to upload it to test.

The first step is to go to the root dir:

module load dx-toolkit

dx cd projectID

Then go to the test dir:

dx cd test

Finally, upload your test2 folder:

dx upload -r test2

Note that dx upload -r test2/ will upload all files in test2. If you want to upload the dir itself, do not include the trailing slash.
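Since tab completion often adds a slash at the end of a directory name, a small wrapper can strip it before uploading. This is a sketch only; dx_upload_dir is a made-up helper name, and it calls the same dx upload -r command as above.

```shell
# Sketch: upload a local directory as a directory, even if the
# argument was typed with a slash at the end (e.g. via tab completion).
dx_upload_dir() {
    dir=${1%/}                   # drop one "/" from the end if present
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    dx upload -r "$dir"
}

# Usage on the HPC, after dx cd into the target remote folder:
#   dx_upload_dir test2/
```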
