
Running a static Iglu repository on AWS S3

While setting up a Snowplow analytics system, I had to set up a private Iglu repository. The main idea is described at https://github.com/snowplow/iglu/wiki/Setting-up-an-Iglu-repository. That manual omits several steps that are essential for building an Iglu repository on AWS infrastructure, and I spent a lot of time trying to figure them out. So here they are:

  1. Upload your schemas to S3 in the layout described at https://github.com/snowplow/iglu/wiki/Static-repo
  2. Enable static website hosting on the S3 bucket. This can be done from the Properties menu of the bucket.
  3. Amend the bucket policy to allow public read access (replace bucket-name with the name of your bucket). It is located in the Permissions section, under the Bucket Policy submenu.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucket-name/*"
        }
    ]
}
  4. Create a CORS policy. It is also located in the Permissions section, under the CORS configuration submenu.
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
  5. Update the Iglu resolver config for the enricher (https://github.com/snowplow/iglu/wiki/Iglu-client-configuration).
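
For the last step, here is a minimal sketch of what the updated resolver config might look like, generated from Python so it is easy to adapt. The field names follow the resolver-config 1-0-0 layout as I understand it from the Iglu client configuration page linked above, so check them against that page. The repository name, the priority value and the http://iglu.example.com URI are hypothetical placeholders, not values from a real setup. Note that the URI points at the root of the repository, not at the schemas folder.

```python
import json


def build_resolver_config(repo_name, repo_uri, vendor_prefixes, cache_size=500):
    """Build a resolver config with Iglu Central plus one private static repo.

    Assumption: the key names below match the resolver-config 1-0-0 schema;
    verify them against the Iglu client configuration wiki before use.
    """
    return {
        "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
        "data": {
            "cacheSize": cache_size,
            "repositories": [
                {
                    "name": "Iglu Central",
                    "priority": 0,
                    "vendorPrefixes": ["com.snowplowanalytics"],
                    "connection": {"http": {"uri": "http://iglucentral.com"}},
                },
                {
                    # The private repository hosted on S3.
                    "name": repo_name,
                    "priority": 5,  # hypothetical value, adjust as needed
                    "vendorPrefixes": list(vendor_prefixes),
                    "connection": {"http": {"uri": repo_uri.rstrip("/")}},
                },
            ],
        },
    }


if __name__ == "__main__":
    config = build_resolver_config(
        repo_name="My private repo",         # hypothetical name
        repo_uri="http://iglu.example.com",  # hypothetical bucket endpoint
        vendor_prefixes=["com.yourcompany"],
    )
    print(json.dumps(config, indent=2))
```

Running the script prints JSON that can be saved as the resolver file passed to the enricher.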

You can check that everything works with a simple wget request:
wget http://your-prefix.s3-amazon-region-prefix.amazonaws.com/schemas/com.yourcompany/schema_name/jsonschema/1-0-0
(assuming you created your own schema referenced as iglu:com.yourcompany/schema_name/jsonschema/{schema_version} and the schema version is 1-0-0). You should now be able to download your schema.
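
The same check can be scripted. The sketch below maps an Iglu schema reference to its path in a static repository and then downloads it. It assumes the schemas/vendor/name/format/version layout used in the wget URL above; the function names and the http://iglu.example.com endpoint are made up for illustration.

```python
import json
import re
import urllib.request

# iglu:<vendor>/<name>/<format>/<MODEL-REVISION-ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.\-]+)/(?P<name>[A-Za-z0-9_\-]+)/"
    r"(?P<fmt>[A-Za-z0-9_\-]+)/(?P<version>\d+-\d+-\d+)$"
)


def schema_path(iglu_uri):
    """Return the path of a schema inside a static repo, relative to its root."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError("not a valid Iglu schema reference: %r" % iglu_uri)
    return "schemas/{vendor}/{name}/{fmt}/{version}".format(**match.groupdict())


def schema_url(repo_root, iglu_uri):
    """Join the repository root URL and the schema path."""
    return repo_root.rstrip("/") + "/" + schema_path(iglu_uri)


def fetch_schema(repo_root, iglu_uri, timeout=10):
    """Download the schema and parse it as JSON (raises on HTTP errors)."""
    with urllib.request.urlopen(schema_url(repo_root, iglu_uri), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your own bucket URL.
    root = "http://iglu.example.com"
    print(schema_url(root, "iglu:com.yourcompany/schema_name/jsonschema/1-0-0"))
```

If fetch_schema raises an HTTP 403 error, the bucket policy is the first thing to recheck; a 404 usually means the uploaded layout does not match the expected path.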