Package for downloading things from a string URL using a variety of protocols.
go-getter is a library for Go (golang) for downloading files or directories from various sources using a URL as the primary form of input.
The power of this library is its flexibility: it can download from a number of different sources (file paths, Git, HTTP, Mercurial, etc.) using a single string as input. This relieves the implementer of the burden of knowing how to download from each of these sources.
Detectors automatically turn invalid URLs into proper URLs. For example, "github.com/hashicorp/go-getter" would turn into a Git URL, and "./foo" would turn into a file URL. Detectors are extensible.
This library is used by Terraform for downloading modules and Nomad for downloading binaries.
Package documentation can be found on GoDoc.
Installation can be done with a normal go get:
$ go get github.com/hashicorp/go-getter
go-getter also has a command you can use to test URL strings:
$ go install github.com/hashicorp/go-getter/cmd/go-getter
...
$ go-getter github.com/foo/bar ./foo
...
The command is useful for verifying URL structures.
go-getter uses a single string URL as input to download from a variety of protocols. go-getter has various "tricks" with this URL to do certain things. This section documents the URL format.
Protocols are used to download files/directories using a specific mechanism. Example protocols are Git and HTTP.
Detectors are used to transform a valid or invalid URL into another URL if it matches a certain pattern. Example: "github.com/user/repo" is automatically transformed into a fully valid Git URL. This allows go-getter to be very user friendly.
go-getter supports the following protocols out of the box. Additional protocols can be added at runtime by implementing the Getter interface.
Local files
Git
Mercurial
HTTP
Amazon S3
Google Cloud Storage (GCS)
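In addition to the above protocols, go-getter has what are called "detectors." These take a URL and attempt to automatically choose the best protocol for it, which might even involve changing the protocol. The detection built in by default includes, among others:
GitHub URLs such as "github.com/user/repo", which are turned into Git URLs
S3 bucket URLs such as "bucket.s3.amazonaws.com/foo", which are turned into S3 URLs
Local paths such as "./foo", which are turned into file URLs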
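To make the detector idea concrete, here is a minimal standard-library sketch of a detector chain. The function names (`detect`, `detectGitHub`, `detectFile`) are hypothetical and are not go-getter's API; rewriting a GitHub path into a forced Git-over-HTTPS URL is an assumption chosen for illustration, and real detectors also handle subdirectories, query strings and many more hosts.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectGitHub is a hypothetical, simplified detector: it rewrites
// "github.com/user/repo" into a forced Git URL over HTTPS.
func detectGitHub(src string) (string, bool) {
	if !strings.HasPrefix(src, "github.com/") {
		return "", false
	}
	parts := strings.Split(src, "/")
	if len(parts) < 3 {
		return "", false
	}
	// Keep only host/user/repo; this sketch ignores anything after that.
	return "git::https://" + strings.Join(parts[:3], "/") + ".git", true
}

// detectFile turns a relative or absolute local path into a file:// URL,
// resolving relative paths against the working directory pwd.
func detectFile(src, pwd string) (string, bool) {
	rel := strings.HasPrefix(src, "./") || strings.HasPrefix(src, "../")
	if !rel && !filepath.IsAbs(src) {
		return "", false
	}
	if rel {
		src = filepath.Join(pwd, src)
	}
	return "file://" + filepath.ToSlash(src), true
}

// detect runs the detectors in order and returns the first match,
// leaving the input unchanged when no detector claims it.
func detect(src, pwd string) string {
	if u, ok := detectGitHub(src); ok {
		return u
	}
	if u, ok := detectFile(src, pwd); ok {
		return u
	}
	return src
}

func main() {
	for _, src := range []string{"github.com/hashicorp/go-getter", "./foo", "https://example.com/x.zip"} {
		fmt.Printf("%-32s => %s\n", src, detect(src, "/work"))
	}
}
```

Ordering matters in a chain like this: the first detector that recognizes the string wins, which is why more specific detectors would normally run before generic ones.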
In some cases, the protocol to use is ambiguous depending on the source URL. For example, "http://github.com/mitchellh/vagrant.git" could reference an HTTP URL or a Git URL. Forced protocol syntax is used to disambiguate this URL.
Forced protocol can be done by prefixing the URL with the protocol followed by double colons. For example:
git::http://github.com/mitchellh/vagrant.git would download the given HTTP URL using the Git protocol.
Forced protocols will also override any detectors.
In the absence of a forced protocol, detectors may be run on the URL, transforming the protocol anyway. The above example would have used the Git protocol either way, since the Git detector would have detected that it was a GitHub URL.
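The forced-protocol prefix can be split off with a few lines of string handling. This is a hedged sketch: `splitForced` is a hypothetical helper, and restricting the prefix to ASCII letters and digits is an assumption made so that a "::" inside an IPv6 host is not mistaken for a forced protocol.

```go
package main

import (
	"fmt"
	"strings"
)

// splitForced separates a forced-protocol prefix such as "git::" from the
// rest of the source string. It returns an empty protocol when none is forced.
func splitForced(src string) (protocol, rest string) {
	idx := strings.Index(src, "::")
	if idx <= 0 {
		return "", src
	}
	prefix := src[:idx]
	// Accept only simple alphanumeric protocol names, so "::" appearing
	// later in the string (e.g. "http://[::1]/x") is not misread.
	for _, r := range prefix {
		if !(r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' || r >= '0' && r <= '9') {
			return "", src
		}
	}
	return prefix, src[idx+2:]
}

func main() {
	p, rest := splitForced("git::http://github.com/mitchellh/vagrant.git")
	fmt.Printf("protocol=%q rest=%q\n", p, rest)
}
```

Because the forced prefix is removed before anything else looks at the string, it naturally takes precedence over detectors.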
Each protocol can support protocol-specific options to configure that protocol. For example, the git protocol supports specifying a ref query parameter that tells it which ref to check out for that Git repository.
The options are specified as query parameters on the URL (or URL-like string) given to go-getter. Using the Git example above, the URL below is a valid input to go-getter:
github.com/hashicorp/go-getter?ref=abcd1234
The protocol-specific options are documented below the URL format section. But because they are part of the URL, we point it out here so you know they exist.
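Since options are ordinary query parameters, Go's net/url can read them even from scheme-less inputs like the example above. A minimal sketch; `gitRef` is a hypothetical helper, and treating an empty result as "use the default branch" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
)

// gitRef extracts the "ref" option from a go-getter style source string.
// An empty result means no ref was given.
func gitRef(src string) (string, error) {
	u, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	return u.Query().Get("ref"), nil
}

func main() {
	ref, err := gitRef("github.com/hashicorp/go-getter?ref=abcd1234")
	fmt.Println(ref, err)
}
```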
If you want to download only a specific subdirectory from a downloaded directory, you can specify a subdirectory after a double-slash //. go-getter will first download the URL specified before the double-slash (as if you didn't specify a double-slash), but will then copy the path after the double-slash into the target directory.
For example, if you're downloading this GitHub repository but only want to download the testdata directory, you can do the following:
https://github.com/hashicorp/go-getter.git//testdata
If you downloaded this to the /tmp directory, then the file /tmp/archive.gz would exist. Notice that this file is in the testdata directory in this repository, but because we specified a subdirectory, go-getter automatically copied only that directory's contents.
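The source/subdirectory split can be sketched as follows. `splitSubdir` is a hypothetical helper; keeping the query string attached to the source half (so options like ref still reach the downloader) and skipping the "scheme://" separator are assumptions about how such a split should behave.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSubdir splits "source//subdir" into its two halves. The query string
// stays attached to the source, and a "scheme://" separator is skipped so it
// is not mistaken for the subdirectory marker.
func splitSubdir(src string) (source, subdir string) {
	query := ""
	if i := strings.Index(src, "?"); i >= 0 {
		src, query = src[:i], src[i:]
	}
	offset := 0
	if i := strings.Index(src, "://"); i >= 0 {
		offset = i + 3
	}
	i := strings.Index(src[offset:], "//")
	if i < 0 {
		return src + query, ""
	}
	i += offset
	return src[:i] + query, src[i+2:]
}

func main() {
	s, d := splitSubdir("https://github.com/hashicorp/go-getter.git//testdata?ref=abcd1234")
	fmt.Printf("source=%s subdir=%s\n", s, d)
}
```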
Subdirectory paths may also use filesystem glob patterns. The path must match exactly one entry or go-getter will return an error. This is useful if you're not sure of the exact directory name but it follows a predictable naming structure.
For example, the following URL would also work:
https://github.com/hashicorp/go-getter.git//test-*
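The "exactly one match" rule can be sketched with path/filepath.Glob. `resolveSubdirGlob` and `makeTree` are hypothetical helpers written for this illustration; they are not go-getter functions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveSubdirGlob expands a glob-style subdirectory against the root of
// a downloaded tree and insists on exactly one match.
func resolveSubdirGlob(root, pattern string) (string, error) {
	matches, err := filepath.Glob(filepath.Join(root, pattern))
	if err != nil {
		return "", err // malformed pattern
	}
	if len(matches) != 1 {
		return "", fmt.Errorf("subdir %q matched %d entries, want exactly 1", pattern, len(matches))
	}
	return matches[0], nil
}

// makeTree creates a temporary directory containing the named subdirectories,
// standing in for a freshly downloaded repository.
func makeTree(names ...string) (string, error) {
	root, err := os.MkdirTemp("", "getter-glob")
	if err != nil {
		return "", err
	}
	for _, n := range names {
		if err := os.Mkdir(filepath.Join(root, n), 0o755); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := makeTree("test-fixtures", "docs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	dir, err := resolveSubdirGlob(root, "test-*")
	fmt.Println(filepath.Base(dir), err)
}
```

Rejecting multiple matches, rather than picking one, keeps the result deterministic when a pattern is too loose.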
For file downloads of any protocol, go-getter can automatically verify a checksum for you. Note that checksumming only works for downloading files, not directories, but checksumming will work for any protocol.
To checksum a file, append a checksum query parameter to the URL. go-getter will parse out this query parameter automatically and use it to verify the checksum. The parameter value can be in the format of type:value or just value, where type is "md5", "sha1", "sha256", "sha512" or "file". The value should be the actual checksum value or, for "file", the download URL of a checksum file. When the type part is omitted, the type will be guessed based on the length of the checksum string. Examples:
./foo.txt?checksum=md5:b7d96c89d09d9e204f5fedc4d5d55b21
./foo.txt?checksum=b7d96c89d09d9e204f5fedc4d5d55b21
./foo.txt?checksum=file:./foo.txt.sha256sum
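Parsing the checksum value and guessing its type from the hex length (32, 40, 64 and 128 characters for MD5, SHA-1, SHA-256 and SHA-512) can be sketched with the standard library. `parseChecksum` and `verify` are hypothetical helpers, not go-getter's internals.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
	"strings"
)

// parseChecksum splits a checksum option into its type and value. Without a
// "type:" prefix, the type is guessed from the length of the hex string.
func parseChecksum(v string) (kind, value string, err error) {
	if i := strings.Index(v, ":"); i >= 0 {
		return v[:i], v[i+1:], nil
	}
	switch len(v) {
	case 32:
		return "md5", v, nil
	case 40:
		return "sha1", v, nil
	case 64:
		return "sha256", v, nil
	case 128:
		return "sha512", v, nil
	}
	return "", "", fmt.Errorf("cannot guess checksum type for %q", v)
}

// verify hashes data with the requested algorithm and compares it to want.
func verify(data []byte, kind, want string) (bool, error) {
	var h hash.Hash
	switch kind {
	case "md5":
		h = md5.New()
	case "sha1":
		h = sha1.New()
	case "sha256":
		h = sha256.New()
	case "sha512":
		h = sha512.New()
	default:
		return false, fmt.Errorf("unsupported checksum type %q", kind)
	}
	h.Write(data)
	return hex.EncodeToString(h.Sum(nil)) == strings.ToLower(want), nil
}

func main() {
	kind, value, err := parseChecksum("b7d96c89d09d9e204f5fedc4d5d55b21")
	if err != nil {
		fmt.Println(err)
		return
	}
	ok, err := verify([]byte("file contents"), kind, value)
	fmt.Println(kind, ok, err)
}
```

The "file" type only yields a location here; the referenced checksum file still has to be downloaded and searched for the matching entry.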
When checksumming from a file (for example, with checksum=file:url), go-getter will get the file linked in the URL after file: using the same configuration. For example, with file:http://releases.ubuntu.com/cosmic/MD5SUMS, go-getter will download a checksum file from that URL using the HTTP protocol. All protocols supported by go-getter can be used. The checksum file will be downloaded to a temporary file and then parsed. The location of the temporary file can be changed by setting system-specific environment variables: TMPDIR on Unix; TMP, TEMP or USERPROFILE on Windows. Read the godoc of os.TempDir for more information on how the temporary directory is selected. The contents of checksum files are expected to be in BSD or GNU style. Once go-getter is done with the checksum file, it is deleted.
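Looking up one file's entry in a BSD- or GNU-style checksum file can be sketched as below. `findChecksum` is a hypothetical helper; it assumes file names contain no whitespace and compares names exactly, both simplifications of what a production parser would need.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findChecksum scans checksum-file content for the entry matching filename.
// It accepts GNU style ("<hex>  <name>", optionally "<hex> *<name>") and
// BSD style ("SHA256 (<name>) = <hex>") lines.
func findChecksum(content, filename string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// BSD style: ALGO (name) = hex
		if open := strings.Index(line, " ("); open > 0 {
			if closeIdx := strings.LastIndex(line, ") = "); closeIdx > open {
				if line[open+2:closeIdx] == filename {
					return line[closeIdx+4:], true
				}
				continue
			}
		}
		// GNU style: hex, whitespace, optional '*' binary marker, name
		fields := strings.Fields(line)
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == filename {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	sums := "5d41402abc4b2a76b9719d911017c592  hello.txt\nSHA256 (other.txt) = abc123\n"
	fmt.Println(findChecksum(sums, "other.txt"))
}
```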
The checksum query parameter is never sent to the backend protocol implementation. It is used at a higher level by go-getter itself.
If the destination file already exists and its checksum matches, the download will be skipped.
go-getter will automatically unarchive files into a file or directory based on the extension of the file being requested (over any protocol). This works for both file and directory downloads.
go-getter looks for an archive query parameter to specify the format of the archive. If this isn't specified, go-getter will use the extension of the path to see if it appears archived. Unarchiving can be explicitly disabled by setting the archive query parameter to false.
The following archive formats are supported:
tar.gz and tgz
tar.bz2 and tbz2
tar.xz and txz
zip
gz
bz2
xz
An example URL is shown below:
./foo.zip
This will automatically be inferred to be a ZIP file and will be extracted. You can also be explicit about the archive type:
./some/other/path?archive=zip
And finally, you can disable archiving completely:
./some/path?archive=false
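The decision between an explicit archive value, archive=false and extension-based inference can be sketched as follows. `archiveFormat` is a hypothetical helper; preferring the longest matching extension (so "tar.gz" beats "gz") is an assumption made to keep compound extensions intact.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// archiveExts lists the supported extensions, longest first, so that
// "tar.gz" is matched before "gz".
var archiveExts = []string{"tar.bz2", "tar.gz", "tar.xz", "tbz2", "tgz", "txz", "zip", "bz2", "gz", "xz"}

// archiveFormat decides how a source would be unarchived: an explicit
// archive= value wins, archive=false disables unarchiving (returns ""),
// and otherwise the path extension is consulted.
func archiveFormat(src string) (string, error) {
	u, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	if v := u.Query().Get("archive"); v != "" {
		if b, err := strconv.ParseBool(v); err == nil && !b {
			return "", nil
		}
		return v, nil
	}
	for _, ext := range archiveExts {
		if strings.HasSuffix(u.Path, "."+ext) {
			return ext, nil
		}
	}
	return "", nil
}

func main() {
	for _, src := range []string{"./foo.zip", "./some/other/path?archive=zip", "./some/path?archive=false"} {
		f, err := archiveFormat(src)
		fmt.Printf("%-30s => %q %v\n", src, f, err)
	}
}
```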
You can combine unarchiving with the other features of go-getter such as checksumming. The special archive query parameter will be removed from the URL before going to the final protocol downloader.
This section documents the protocol-specific options that can be specified for go-getter. These options should be appended to the input as normal query parameters (HTTP headers are an exception to this, however). Depending on the usage of go-getter, applications may provide alternate ways of inputting options. For example, Nomad provides a nice options block for specifying options rather than in the URL.
The options below are available to all protocols:
archive - The archive format to use to unarchive this file, or false to disable unarchiving. For more details, see the complete section on archive support above.
checksum - Checksum to verify the downloaded file or archive. See the entire section on checksumming above for format and more details.
filename - When in file download mode, allows specifying the name of the downloaded file on disk. Has no effect in directory mode.
Local Files (file)
None
Git (git)
ref - The Git ref to check out. This is a ref, so it can point to a commit SHA, a branch name, etc. If it is a named ref such as a branch name, go-getter will update it to the latest on each get.
sshkey - An SSH private key to use during clones. The provided key must be a base64-encoded string. For example, to generate a suitable sshkey from a private key file on disk, you would run base64 -w0 followed by the path to the key file.
Note: Git 2.3+ is required to use this feature.
depth - The Git clone depth. The provided number specifies the last n revisions to clone from the repository.
The git getter accepts both URL-style SSH addresses like git::ssh://[email protected]/foo/bar, and "scp-style" addresses like git::[email protected]/foo/bar. In the latter case, omitting the git:: force prefix is allowed if the username prefix is exactly [email protected].
The "scp-style" addresses cannot be used in conjunction with the ssh:// scheme prefix, because in that case the colon is used to mark an optional port number to connect on, rather than to delimit the path from the host.
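The difference between the two address forms can be illustrated by converting an scp-style address into its URL-style equivalent. This is a sketch under assumptions: `scpToSSH` is a hypothetical helper, and example.com is a placeholder host. Inputs that already carry a scheme are rejected because, as noted above, the colon would then denote a port.

```go
package main

import (
	"fmt"
	"strings"
)

// scpToSSH rewrites an scp-style address "user@host:path" into the
// equivalent "ssh://user@host/path" URL. Inputs with a scheme are refused.
func scpToSSH(addr string) (string, bool) {
	if strings.Contains(addr, "://") {
		return "", false
	}
	at := strings.Index(addr, "@")
	colon := strings.Index(addr, ":")
	if at <= 0 || colon < at {
		return "", false
	}
	host, path := addr[at+1:colon], addr[colon+1:]
	if host == "" || path == "" {
		return "", false
	}
	return "ssh://" + addr[:at] + "@" + host + "/" + strings.TrimPrefix(path, "/"), true
}

func main() {
	fmt.Println(scpToSSH("git@example.com:foo/bar"))
}
```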
Mercurial (hg)
rev - The Mercurial revision to check out.
HTTP (http)
To use HTTP basic authentication with go-getter, simply prepend username:password@ to the hostname in the URL, such as https://Aladdin:[email protected]/index.html. All special characters, including those in the username and password, must be URL-encoded.
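Rather than percent-encoding credentials by hand, net/url can build the URL and escape the userinfo for you. A minimal sketch; `basicAuthURL` is a hypothetical helper, and the host and password values are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// basicAuthURL builds an HTTP(S) source with embedded credentials, letting
// net/url percent-encode special characters in the username and password.
func basicAuthURL(scheme, user, pass, host, path string) string {
	u := url.URL{
		Scheme: scheme,
		User:   url.UserPassword(user, pass),
		Host:   host,
		Path:   path,
	}
	return u.String()
}

func main() {
	// A space in the password becomes %20; '@' and ':' would also be escaped.
	fmt.Println(basicAuthURL("https", "Aladdin", "open sesame", "www.example.com", "/index.html"))
}
```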
Optional request headers can be added by supplying them in a custom HttpGetter (not as query parameters like most other options). These headers will be sent out on every request the getter in question makes.
S3 (s3)
S3 takes various access configurations in the URL. Note that it will also read these from standard AWS environment variables if they're set. S3-compliant servers like Minio are also supported. If the query parameters are present, they take priority.
aws_access_key_id - AWS access key.
aws_access_key_secret - AWS access key secret.
aws_access_token - AWS access token, if one is being used.
aws_profile - Use this profile from the local ~/.aws/ config. Takes priority over the other three.
If you use go-getter and want to use an EC2 IAM Instance Profile to avoid using credentials, simply omit these parameters and the instance profile, if available, will be used automatically.
If you use go-getter with Minio, note the following parameters:
aws_access_key_id (required) - Minio access key.
aws_access_key_secret (required) - Minio access key secret.
region (optional, defaults to us-east-1) - Region identifier to use.
version (optional, defaults to Minio default) - Configuration file format.
S3 has several addressing schemes used to reference your bucket. These are listed here: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html#access-bucket-intro
Some examples for these addressing schemes:
s3::https://s3.amazonaws.com/bucket/foo
s3::https://s3-eu-west-1.amazonaws.com/bucket/foo
bucket.s3.amazonaws.com/foo
bucket.s3-eu-west-1.amazonaws.com/foo/bar
"s3::http://127.0.0.1:9000/test-bucket/hello.txt?aws_access_key_id=KEYID&aws_access_key_secret=SECRETKEY&region=us-east-2"
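The path-style and virtual-hosted-style AWS examples above can be told apart by where the bucket name appears. A hedged sketch covering only those AWS host shapes: `parseS3` and `s3Location` are hypothetical, defaulting to us-east-1 when the host carries no region is an assumption, and Minio-style hosts are deliberately rejected rather than handled.

```go
package main

import (
	"fmt"
	"strings"
)

// s3Location is the result of parsing one of the AWS addressing schemes.
type s3Location struct {
	Bucket, Region, Key string
}

// parseS3 handles path-style ("s3[-REGION].amazonaws.com/bucket/key") and
// virtual-hosted-style ("bucket.s3[-REGION].amazonaws.com/key") addresses,
// with or without an "s3::https://" prefix.
func parseS3(addr string) (s3Location, error) {
	addr = strings.TrimPrefix(addr, "s3::")
	if i := strings.Index(addr, "://"); i >= 0 {
		addr = addr[i+3:]
	}
	host, path, _ := strings.Cut(addr, "/")
	if !strings.HasSuffix(host, ".amazonaws.com") {
		return s3Location{}, fmt.Errorf("not an AWS S3 host: %q", host)
	}
	labels := strings.Split(strings.TrimSuffix(host, ".amazonaws.com"), ".")
	loc := s3Location{Region: "us-east-1"} // assumed default when the host has no region
	s3Label := labels[len(labels)-1]
	if r, ok := strings.CutPrefix(s3Label, "s3-"); ok {
		loc.Region = r
	} else if s3Label != "s3" {
		return s3Location{}, fmt.Errorf("unrecognised S3 host: %q", host)
	}
	if len(labels) > 1 {
		// Virtual-hosted style: the bucket is the leading host label(s).
		loc.Bucket = strings.Join(labels[:len(labels)-1], ".")
		loc.Key = path
	} else {
		// Path style: the bucket is the first path segment.
		loc.Bucket, loc.Key, _ = strings.Cut(path, "/")
	}
	return loc, nil
}

func main() {
	loc, err := parseS3("bucket.s3-eu-west-1.amazonaws.com/foo/bar")
	fmt.Printf("%+v %v\n", loc, err)
}
```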
GCS (gcs)
To access GCS, authentication credentials must be provided. More information can be found here.
The tests for get_gcs.go require you to have GCP credentials set in your environment. These credentials can have any level of permissions to any project; they just need to exist. This means setting GOOGLE_APPLICATION_CREDENTIALS="~/path/to/credentials.json" or GOOGLE_CREDENTIALS="{stringified-credentials-json}". Due to this configuration, get_gcs_test.go will fail for external contributors in CircleCI.
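One way a test suite could detect missing credentials up front is to check the two environment variables named above. This is a hypothetical helper, not something the project's tests are stated to contain; in a real _test.go file the false branch would typically call t.Skip instead of printing.

```go
package main

import (
	"fmt"
	"os"
)

// gcsCredentialsConfigured reports whether either credentials environment
// variable is set, so GCS tests could skip early instead of failing.
func gcsCredentialsConfigured(getenv func(string) string) bool {
	return getenv("GOOGLE_APPLICATION_CREDENTIALS") != "" || getenv("GOOGLE_CREDENTIALS") != ""
}

func main() {
	if !gcsCredentialsConfigured(os.Getenv) {
		fmt.Println("skipping GCS tests: no credentials in the environment")
		return
	}
	fmt.Println("GCS credentials found")
}
```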