Amazon Macie
Discover how to detect sensitive data in an S3 bucket using Amazon’s Macie service.
Amazon Macie is a data security service from AWS that uses machine learning and pattern matching to discover sensitive data, detect risks to that data, and help automate protection against those risks. Macie monitors our account’s S3 buckets, evaluates their access level and potential security risks, and checks for sensitive data in S3 objects.
When Macie detects a potential risk, such as a publicly accessible bucket, or detects sensitive data in an object, it generates a finding. A finding contains the details of the sensitive data or the security issue that Macie detected. Amazon Macie generates two categories of findings:
Sensitive data finding: This finding includes data about the sensitive information Macie detects in S3 objects.
Policy finding: This finding includes data about potential risks in the policies associated with our S3 buckets or other security and access control risks.
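As a rough illustration of the two categories, the sketch below sorts findings by the prefix of their type string. It assumes finding types are named in the form SensitiveData:... and Policy:..., and the function name finding_category is hypothetical.

```python
def finding_category(finding_type: str) -> str:
    """Map a Macie finding type string to one of the two finding categories."""
    # Assumption: the text before the first ":" identifies the category.
    prefix = finding_type.split(":", 1)[0]
    if prefix == "SensitiveData":
        return "sensitive data finding"
    if prefix == "Policy":
        return "policy finding"
    return "unknown"
```

For example, a type such as SensitiveData:S3Object/Personal would be treated as a sensitive data finding, while Policy:IAMUser/S3BucketPublic would be treated as a policy finding.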
Monitoring S3 buckets with Macie
Once Amazon Macie is enabled for an AWS account, it creates an IAM service-linked role, which permits it to monitor the resources inside our account. Macie then builds an inventory of all the S3 buckets in our account and starts monitoring them to detect security risks. For each bucket, it maintains data that includes the following:
General information about the bucket, such as its name, ARN, and tags
Block public access settings of the bucket
Bucket-level permissions
Bucket sharing and replication settings
Object count and object settings
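The bucket data listed above can also be read programmatically. The following is a minimal sketch, assuming a client object that exposes a describe_buckets call shaped like the one in the boto3 macie2 client (a buckets list plus an optional nextToken). The helper name list_bucket_summaries and the selection of fields are illustrative choices, and the response field names should be checked against the API reference.

```python
def list_bucket_summaries(macie_client):
    """Collect a few inventory fields for every bucket Macie monitors."""
    summaries = []
    kwargs = {}
    while True:
        page = macie_client.describe_buckets(**kwargs)
        for bucket in page.get("buckets", []):
            summaries.append({
                "name": bucket.get("bucketName"),
                "object_count": bucket.get("objectCount"),
                # Assumed field: whether the bucket is effectively public.
                "public": bucket.get("publicAccess", {}).get("effectivePermission"),
                # Assumed field: whether the bucket is shared with other accounts.
                "shared": bucket.get("sharedAccess"),
            })
        token = page.get("nextToken")
        if not token:
            return summaries
        # Request the next page of results.
        kwargs = {"nextToken": token}
```

With boto3 installed and credentials configured, this could be called as list_bucket_summaries(boto3.client("macie2")).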
Detect sensitive data in S3 objects
Besides detecting security risks in S3 buckets, we can use Amazon Macie to detect sensitive data, such as PII (personally identifiable information) or our AWS credentials, in S3 objects. This can be done by using one of the following methods:
Automated sensitive data detection
In this method, Macie performs automated evaluations on the objects in our S3 buckets on a daily basis. Macie doesn’t evaluate every object in this method; rather, it uses sampling techniques to test a subset of the available objects. It does this by grouping objects that have similar metadata and are therefore likely to have similar contents. For example, objects that share the same bucket name, storage class, prefix, etc., are grouped together, and a representative object from each group is selected and evaluated for sensitive data.
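The grouping idea can be sketched in a few lines. This is only an illustration of sampling by shared metadata, not Macie's actual algorithm: the dictionary keys (bucket, storage_class, key) and the rule for picking a representative are assumptions made for the example.

```python
from collections import defaultdict

def pick_representatives(objects):
    """Group objects by shared metadata and return one object key per group."""
    groups = defaultdict(list)
    for obj in objects:
        # Prefix = everything before the last "/" in the key ("" if there is none).
        prefix = obj["key"].rpartition("/")[0]
        groups[(obj["bucket"], obj["storage_class"], prefix)].append(obj["key"])
    # Hypothetical rule: the lexicographically first key represents its group.
    return {group: min(keys) for group, keys in groups.items()}
```

Only the returned representatives would then be inspected, which is what keeps the automated method cheaper than scanning every object.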
Creating sensitive data detection jobs
If we want a more in-depth analysis of a single S3 bucket, we can create sensitive data detection jobs in Amazon Macie. In these jobs, we can define custom criteria to detect sensitive data in specific S3 objects, and we have the option to run them once or on a defined schedule. Once a job runs successfully, Macie creates detailed reports containing the results about the sensitive data it finds in our S3 objects.
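To make the run-once versus scheduled choice concrete, here is a minimal sketch that builds the request for such a job. It assumes the request shape of the boto3 macie2 create_classification_job call (clientToken, name, jobType, s3JobDefinition, scheduleFrequency). The helper name build_job_request and the example values are hypothetical, and the field names should be verified against the API reference before use.

```python
import uuid

def build_job_request(name, account_id, bucket_names, daily=False):
    """Build keyword arguments for a one-time or daily detection job."""
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "name": name,
        "jobType": "SCHEDULED" if daily else "ONE_TIME",
        "s3JobDefinition": {
            # Assumed shape: the buckets to scan, grouped by owning account.
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }
    if daily:
        # Assumed shape: an empty dailySchedule requests a daily run.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request
```

The resulting dictionary could then be passed as boto3.client("macie2").create_classification_job(**request).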
While creating a job, Macie gives us the option of using managed data identifiers provided by Amazon or creating custom data identifiers. Data identifiers are a set of rules and patterns Amazon Macie uses to detect sensitive data in an object; for example, we can specify the pattern for phone numbers in a specific country.
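A custom data identifier is essentially a regular expression, optionally combined with keywords that must appear near a match. The sketch below imitates that behavior locally with Python's re module. The phone number format, the keywords, and the 30-character distance are all assumptions chosen for the example, and Macie's own matching engine may behave differently.

```python
import re

# Hypothetical pattern: numbers written as +1-XXX-XXX-XXXX.
PHONE_PATTERN = re.compile(r"\+1-\d{3}-\d{3}-\d{4}")
# Hypothetical keywords that must appear shortly before a match.
KEYWORDS = ("phone", "mobile")
MAX_DISTANCE = 30  # characters to look back from the start of a match

def find_phone_numbers(text):
    """Return pattern matches that have a keyword within MAX_DISTANCE before them."""
    hits = []
    lowered = text.lower()
    for match in PHONE_PATTERN.finditer(text):
        window = lowered[max(0, match.start() - MAX_DISTANCE):match.start()]
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits
```

Requiring a nearby keyword is one way to reduce false positives, since a digit sequence that only looks like a phone number is ignored unless the surrounding text suggests it really is one.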
In the illustration given above, Amazon Macie looks for sensitive data in S3 buckets and publishes a finding event to Amazon EventBridge when it finds sensitive data in one of the buckets. An EventBridge rule that matches this event then triggers a Lambda function, which can take action on the sensitive data found.
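The Lambda function at the end of this flow might look like the following minimal sketch. It assumes the EventBridge event carries the finding under detail, with a source of aws.macie and the affected bucket and object under resourcesAffected. These field names reflect the Macie finding event format as understood here and should be checked against a real event. The handler only summarizes the finding; any remediation is left as a comment.

```python
def lambda_handler(event, context):
    """Summarize a Macie finding delivered by an EventBridge rule."""
    if event.get("source") != "aws.macie":
        # Ignore events that did not come from Macie.
        return {"handled": False}
    detail = event.get("detail", {})
    resources = detail.get("resourcesAffected", {})
    summary = {
        "handled": True,
        "finding_type": detail.get("type"),
        "severity": detail.get("severity", {}).get("description"),
        "bucket": resources.get("s3Bucket", {}).get("name"),
        "object_key": resources.get("s3Object", {}).get("key"),
    }
    # A real function would act here, for example by notifying a team
    # or restricting access to the affected object.
    print(summary)
    return summary
```

Returning the summary makes the handler easy to exercise locally with a hand-written sample event before it is deployed.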
Pricing in Macie
Amazon Macie typically combines fixed-fee and usage-based pricing, and we are charged based on the following:
The total number of S3 buckets evaluated by Amazon Macie
The total number of S3 objects evaluated
The overall quantity of data inspected by Macie in automated and targeted data discovery
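The three billing dimensions above can be combined into a rough estimate. The sketch below is only arithmetic: the rates are parameters that we must supply from the current AWS pricing page, and the function name and the numbers in the usage note are placeholders, not real prices.

```python
def estimate_monthly_cost(buckets, objects, data_gb,
                          rate_per_bucket, rate_per_object, rate_per_gb):
    """Add up the bucket, object, and data-volume charges for one month."""
    bucket_cost = buckets * rate_per_bucket    # buckets evaluated
    object_cost = objects * rate_per_object    # objects evaluated
    data_cost = data_gb * rate_per_gb          # data inspected, in GB
    return bucket_cost + object_cost + data_cost
```

With placeholder rates of 0.5 per bucket, 0.25 per object, and 2 per GB, a call such as estimate_monthly_cost(10, 4, 3, 0.5, 0.25, 2) adds 5 + 1 + 6 for a total of 12.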