Author Manuscript Collection

The PMC Author Manuscript Collection (“Collection”) consists of articles in author manuscript form that have been made available in PMC in compliance with the NIH Public Access Policy or similar policies of other funders. The text of manuscripts in the Collection may be downloaded in XML and plain text formats. These files are available for text mining. They may also be used consistent with the principles of applicable copyright law.

The files can be accessed using PMC’s FTP service. The URL of the Collection on the FTP site is

The Collection files are packaged by PMCID. For example, an author manuscript with a PMCID of PMC3947720 is packaged in the file PMC003XXXXXX.xml.tar.gz. As of October 2015, all author manuscripts have PMCIDs that fall in the range of PMC002XXXXXX to PMC004XXXXXX. Note that these files are quite large (up to 4 GB).
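The naming rule above can be sketched in Python. This is a minimal illustration, not an official NCBI tool. It assumes the rule generalizes from the single example given: the numeric part of the PMCID is zero-padded to nine digits and the leading three digits select the package. The function names `package_name` and `iter_members` and the `extension` parameter are hypothetical. Because the packages are large, the second function reads an archive as a stream instead of loading it whole.

```python
import re
import tarfile
from typing import IO, Iterator, Tuple


def package_name(pmcid: str, extension: str = "xml") -> str:
    """Return the package file name assumed to hold the given PMCID.

    Assumption: the pattern seen for PMC3947720 (-> PMC003XXXXXX.xml.tar.gz)
    applies to every PMCID in the Collection.
    """
    match = re.fullmatch(r"PMC(\d{1,9})", pmcid.strip())
    if match is None:
        raise ValueError(f"not a PMCID: {pmcid!r}")
    # Zero-pad the numeric part to nine digits, then keep the leading three.
    padded = match.group(1).zfill(9)
    return f"PMC{padded[:3]}XXXXXX.{extension}.tar.gz"


def iter_members(fileobj: IO[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, content) pairs from a .tar.gz stream, one at a time."""
    # Mode "r|gz" reads the archive sequentially, so a multi-gigabyte
    # package is never held in memory all at once.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                extracted = archive.extractfile(member)
                if extracted is not None:
                    yield member.name, extracted.read()
```

For instance, `package_name("PMC3947720")` returns `PMC003XXXXXX.xml.tar.gz`, matching the example in the text. A downloaded package could then be opened in binary mode and passed to `iter_members` to process one manuscript at a time.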

The files that contain the XML of all of the articles are:

The plain text files containing the extracted full text are:

These files are updated twice a week, on Mondays and Thursdays.

Suggested FTP client configuration

After a series of experiments using FTP clients with NCBI's FTP server, we've found that the configuration of FTP clients can seriously affect performance. NCBI recommends setting the TCP buffer size to 32 MB. For more information, please see

Last updated: Wed, 07 Jan 2015