learn.cantrill.io
Welcome to this video where I'm going to very briefly talk about AWS Transfer Family. This will be an introduction video, and if the topic you're studying requires any additional information, either theory or practical, then there will be follow-up videos. If not, then this is all that you need to know. I want to keep this video as brief as possible, so let's jump in and get started.
AWS Transfer Family is a managed file transfer service that allows you to transfer files to or from S3 and EFS. The reason it's managed is that AWS runs the servers for you, and those servers support various protocols. The product lets you upload and download data to and from S3 and EFS using protocols other than the native interfaces of those two services. If you need to access S3 or EFS without using them natively, then Transfer Family is the product that allows this, letting you interact with both services using a number of common protocols.
The first protocol is FTP, or File Transfer Protocol, which is unencrypted and has been around for decades. Due to its lack of encryption, its usage is relatively niche. The next is FTPS, or File Transfer Protocol Secure, which is FTP with TLS encryption added. Then we have SFTP, or SSH File Transfer Protocol, which is file transfer running over the top of SSH. Lastly, we have AS2, or Applicability Statement 2, a protocol used for transferring structured business-to-business data. While this is relatively niche, it was added to the product because it's common in certain industries, particularly for workflows with compliance requirements around data protection and security. AS2 might be used for supply chain logistics, payment workflows, or business-to-business transactions and integrations with enterprise resource planning (ERP) and customer relationship management (CRM) systems. Using AS2 is something you'll typically do in specific situations, and you'll know if you need this protocol. FTP, FTPS, and SFTP are much more common and are found in a wide variety of industries.
Transfer Family also supports a wide variety of identity providers. The service can have built-in identities, you can use AWS's Directory Service, or you can use custom identity providers, utilizing Lambda or API Gateway. A really important feature of Transfer Family is the Managed File Transfer Workflows (MFTW), which you can think of as a serverless file workflow engine. When files are uploaded to the product, you can define a workflow for what happens to that file as it gets uploaded, such as notification or tagging. This can be especially effective if you need to integrate Transfer Family into other process-based workflows within your business.
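To make the workflow idea more concrete, here's a minimal boto3 sketch, an assumption-laden illustration rather than part of the lesson's demo, that defines a workflow with a single tagging step and attaches it to an existing server. The server ID, IAM role ARN, and tag values are hypothetical placeholders.

```python
import boto3

transfer = boto3.client("transfer")

# Define a workflow that tags each file once its upload completes.
workflow = transfer.create_workflow(
    Description="Tag uploads for downstream processing",
    Steps=[
        {
            "Type": "TAG",
            "TagStepDetails": {
                "Name": "tag-upload",
                "Tags": [{"Key": "source", "Value": "transfer-family"}],
            },
        }
    ],
)

# Attach the workflow to an existing Transfer Family server so it runs on every upload.
transfer.update_server(
    ServerId="s-1234567890abcdef0",  # hypothetical server ID
    WorkflowDetails={
        "OnUpload": [
            {
                "WorkflowId": workflow["WorkflowId"],
                "ExecutionRole": "arn:aws:iam::111122223333:role/transfer-workflow-role",
            }
        ]
    },
)
```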
With Transfer Family, you gain access to all of these different file transfer protocols via a Transfer Family server within AWS, without needing to manage any server infrastructure. If you need to interact with S3 or EFS using existing workflows where you can't change your applications, then you can use Transfer Family. Now let's take a look visually at the architecture of the product. At a high level, this is how Transfer Family works: we have an AWS environment, and inside, we have S3 and EFS. We want to grant access to either or both of these to an external user, for example, a medical user who wants to use the SFTP protocol.
To do this, we'd configure Transfer Family and create servers that are enabled for one or more protocols. These servers are configured to communicate with the backend storage resources using IAM roles for permissions. Once configured, our user can access these resources, potentially using a custom DNS name and a protocol that they support, rather than the native S3 or EFS access methods. Transfer Family also supports a range of authentication methods, including built-in identities, AWS's Directory Service, or a custom identity store.
That's how the service works from a high-level architecture perspective. But there is additional information that you'll need for real-world applications and the AWS exams, where this product features. Let's now explore how we can connect to the service over different networking architectures. Within Transfer Family, you create servers, which act as the front-end access points to your storage. These servers present the supporting backend storage (S3 and EFS) via one or more supported protocols, such as SFTP, FTPS, FTP, and AS2. How you access these depends on how you configure the service's endpoints, with three options available: Public, VPC with Internet access, and VPC internal only.
For the Public endpoint, the endpoint runs in the AWS public zone, making it accessible from the public internet. This means there is nothing to configure in terms of networking, and no worries about VPCs or other networking components. However, it comes with some limitations: the only supported protocol is SFTP, the endpoint has a dynamic IP managed by AWS (so DNS should be used to access it), and you can't control access using features like network access control lists or security groups.
Next, the endpoint types that run in a VPC share some similarities. Both run inside a VPC, but the VPC Internet type allows you to use SFTP, FTPS, and AS2. The internal-only VPC endpoint type allows the use of SFTP, FTPS, FTP, and AS2. FTP, being unencrypted, is best suited for internal use and should not run over the public internet. Both VPC endpoint types can be accessed from connected business networks, such as Direct Connect and VPNs, so anything with connectivity to the VPC can access the service as if it were inside the VPC. Additionally, Transfer Family provides static IPs for both types, and both can be secured using network access control lists or security groups, providing a security benefit compared to the public endpoint type.
The main difference is that the VPC Internet type is allocated an Elastic IP, which is static and accessible over the public internet, from within the VPC, and from connected corporate networks. This makes it a more secure and flexible option than the public endpoint type.
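As a rough sketch of how these options map to the API, here's what creating a simple SFTP server with a public endpoint and the built-in identity store might look like in boto3. Treat the parameter choices as illustrative assumptions rather than a recommended configuration.

```python
import boto3

transfer = boto3.client("transfer")

# Create an SFTP-only server with a public endpoint, backed by S3, using the
# service-managed (built-in) identity store.
response = transfer.create_server(
    Domain="S3",                             # backing storage: S3 or EFS
    Protocols=["SFTP"],                      # public endpoints only support SFTP
    EndpointType="PUBLIC",                   # use "VPC" for the VPC endpoint types
    IdentityProviderType="SERVICE_MANAGED",  # built-in identities
)
print(response["ServerId"])
```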
To summarize, this is the foundational knowledge you need. If your course requires more theory or practical knowledge, follow-up videos will be provided. Otherwise, this covers everything you'll need to understand for now. A few final points: AWS Transfer Family is multi-AZ, and the cost is based on the provisioned server per hour plus the data transferred. There are no upfront costs, and you only pay for what you use. For FTP and FTPS, only Directory Service or custom identity providers are supported, and FTP can only be used internally within a VPC; it cannot be used with the public or VPC Internet endpoint types. As for AS2, it requires a VPC Internet or internal-only endpoint type and cannot use the public endpoint type.
You would use this product if you need access to S3 or EFS through existing protocols, especially when integrating it into your existing workflows where the protocol cannot be changed, or you might use the Managed File Transfer Workflow feature to create new workflows. As mentioned earlier, this is the core knowledge needed for AWS exams that feature this product. If you require further information, additional theory or practical videos will follow this one. But for now, that's everything I want to cover in this video. Complete the video, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back! In this lesson, I want to talk about another file system provided by FSx, which is FSx for Lustre. This is a file system designed for various high-performance computing workloads. It is important for the exam that you understand exactly what it provides and how it's architected. We have a lot to cover, so let's jump in and take a look.
In the exam, you won't need to know Lustre in detail. It's one of those relatively niche products, but you'll need to distinguish between scenarios where you might use FSx for Windows versus FSx for Lustre. FSx for Windows is a Windows-native file system that is accessed over SMB. It's used for Windows-native environments within AWS. FSx for Lustre, on the other hand, is a managed implementation of the Lustre file system, which is designed specifically for high-performance computing. It supports Linux-based instances running in AWS, and as a key concept to track for the exam, it also supports POSIX-style permissions. Lustre is designed for use cases such as machine learning, big data, or financial modeling—anything that needs to process large amounts of data with a high level of performance.
FSx for Lustre can scale to hundreds of gigabytes per second of throughput, with sub-millisecond latency when accessing that storage. This is the level of performance required for high-performance computing across many clients or instances. FSx for Lustre can be provisioned using two different deployment types. If you need the absolute best performance for short-term workloads, you can pick the Scratch deployment type. Scratch is optimized for really high-end performance, but it doesn't provide much in the way of resilience or high availability. If you need a persistent file system or high availability for your workload, then you can choose the Persistent option. This option is great for longer-term storage, offering high availability, but it's important to note that it provides high availability within only one availability zone.
Lustre is a single availability zone file system because it needs to deliver high-end performance. Therefore, the high availability provided by the persistent deployment type is only within one availability zone, and it also offers self-healing. If any hardware fails as part of the file system, it will be automatically replaced by AWS. This is the deployment type to choose if you need resilience and high availability for the data running on the file system. While you won’t need to know this level of detail for the exam, I’ve included a link attached to this lesson that details the differences between Scratch and persistent deployments in more detail. I find it useful to at least know the high-level differences between these two deployment types.
As with FSx for Windows, FSx for Lustre is available over a VPN or Direct Connect from on-premises locations. Of course, you will need a substantial amount of bandwidth to benefit from Lustre's performance, but it is available as an option. Now, it's important for the exam that you have an understanding of how FSx for Lustre works, so let's have a look at that next. Before we dive into the architecture, I want to talk conceptually about what FSx for Lustre provides. What do you actually do when you use this file system? The product gives you a managed file system that you create, which is accessible from within a VPC and from anything connected to that VPC via private networking. So, in terms of connectivity, it's much like EFS or FSx for Windows in that sense—you can access it from the VPC or from anything connected to it with private networking.
The file system is where the data lives, where it's being analyzed or processed by your applications. When you create a file system, you can associate it with a repository, and in this case, the repository is an S3 bucket. If you do this when the file system is created, the objects within the S3 bucket are visible in the file system. However, at this stage, they’re not actually stored within the Lustre file system. When the data is first accessed by any clients connected to the Lustre file system, it is lazy-loaded into the file system from the S3 repository. After that first load, it remains within the file system. So, it’s important to understand that while objects initially appear to be within the file system when using an S3 repository, they’re only truly present in the file system when they’re first accessed. There isn’t actually any built-in synchronization. Conceptually, the Lustre file system is separate, and it can use an S3 repository as a foundation.
You can export any changes made in the file system back to the S3 repository using the hsm_archive command. What I want you to understand conceptually is that the Lustre file system is completely separate. It can be configured to lazy-load data from S3 and write it back, but it's not automatically in sync. The Lustre file system is where the processing of data occurs.
Now that I've covered the conceptual elements, let's take a look at how the product is architected. Before doing that, there are a few key points to discuss. Lustre splits data up when storing it on disks. There are different types or elements of data stored within the file system. The first is the metadata, which includes things like file names, timestamps, and permissions. This is stored on a metadata target (MDT), and the Lustre file system has one of these. Then, we have the data itself, which is split across multiple object storage targets (OSTs), each of which is 1.17 TiB in size. By splitting the data across these OSTs, Lustre achieves its high performance levels.
The product provides a baseline performance level based on the size of the file system. The size of the file system starts with a minimum of 1.2 TiB, and you can grow it in increments of 2.4 TiB. For the Scratch deployment type, you get a baseline performance of 200 megabytes per second per TiB of storage. For the Persistent deployment type, there are three baseline performance levels: 50, 100, and 200 megabytes per second per TiB of storage. For both deployment types, you can burst up to 1,300 megabytes per second per TiB of storage. This is based on a credit system, where you earn credits when you're using a performance level below your baseline and consume those credits when you burst above the baseline. It shares many characteristics with EBS volumes, but at a much higher scale and with a more parallel architecture.
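To show how these sizes and throughput levels fit together, here's a hedged boto3 sketch that creates a persistent Lustre file system at the minimum 1.2 TiB size, linked to a hypothetical S3 bucket as its data repository. The subnet, bucket name, and throughput value are assumptions for illustration only.

```python
import boto3

fsx = boto3.client("fsx")

# Create a persistent FSx for Lustre file system linked to an S3 repository.
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,                    # GiB; the minimum size, grown in 2,400 GiB steps
    SubnetIds=["subnet-0123456789abcdef0"],  # a single subnet: Lustre is one-AZ
    LustreConfiguration={
        "DeploymentType": "PERSISTENT_1",           # or "SCRATCH_2" for short-lived workloads
        "PerUnitStorageThroughput": 200,            # MB/s per TiB baseline
        "ImportPath": "s3://example-lustre-repo",   # lazy-load source (hypothetical bucket)
        "ExportPath": "s3://example-lustre-repo",   # target for hsm_archive exports
    },
)
print(response["FileSystem"]["FileSystemId"])
```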
Let’s look at this architecture visually. Any FSx architecture uses a client-managed VPC—something that you design and implement. Inside this client-managed VPC, there are some clients, typically Linux EC2 instances with the Lustre software installed so they can read and interact with the Lustre file system. At the other end of the architecture, you create a Lustre file system and optionally an S3 repository for that file system. Depending on the size of storage that you configure within Lustre, the product deploys a number of storage servers. These servers handle the storage requests placed against the file system, and each one provides an in-memory cache to allow faster access to frequently used data.
At a high level, the more storage you provision, the more servers and the more aggregate throughput and IOPS that FSx for Lustre can deliver into your VPC. This performance is delivered into your VPC using a single elastic network interface (ENI). Lustre runs from one availability zone, so you’ll have one ENI within your client-managed VPC, which is used to access the product. From a performance perspective, any writes to Lustre will go through the ENI and be written directly to disk. This depends on the disk throughput and IO characteristics. Likewise, if data is read directly from disk, it’s based on the performance characteristics of the underlying disks. For frequently accessed data, it can use in-memory caching, and at that point, it’s based on the performance characteristics of the networking connecting the clients to the Lustre servers.
At a high level, this is the architecture that the FSx for Lustre product uses. Now, let’s look at some key points that you need to be aware of for the product. When you're creating an FSx for Lustre file system, you get to create it using one of two deployment types. I mentioned these earlier in this lesson. The first one is Scratch, which is designed for when you want pure performance. If you’re deploying short-term or temporary workloads, and all you care about is pure performance, the Scratch deployment type is the way to go. However, it’s crucial to understand that this doesn’t provide any high availability or replication. If there’s a hardware failure, any data stored on that hardware is lost and not available to the file system. This doesn’t mean other data is at risk, as any other data continues to be available as part of the Lustre file system, but you need to understand from a file system planning perspective that larger file systems generally mean more servers, more disks, and a higher chance of failure.
Choosing the persistent deployment type means you have replication, but this is only within a single availability zone. All hardware and data are replicated within a single availability zone, which protects you against hardware failure but not against the failure of an entire availability zone. Using the persistent deployment type means the product will auto-heal any hardware failure, and data won’t be lost, but remember, this is only within one availability zone. If an entire availability zone fails, data could be lost because hardware isn’t recoverable outside of that zone. However, with both deployment types, you can use the backup functionality of the product to back up that data to S3, and you can perform manual or automatic backups. Automatic backups have anywhere from zero to 35 days of retention, with zero meaning that automatic backups are disabled.
At a high level, this is how the FSx for Lustre product works. It’s similar to FSx for Windows and EFS in terms of architecture. It uses elastic network interfaces injected into a VPC, which can be accessed from the VPC or from any other network connected to that VPC using private networking. For the exam, if you see Windows or SMB mentioned, it’s FSx for Windows and not FSx for Lustre. If you see any mention of Lustre, high-performance computing, POSIX, machine learning, big data, or similar scenarios, it’s FSx for Lustre. If you see machine learning or SageMaker and need a high-performance file system, it could be FSx for Lustre.
With that being said, that’s everything I wanted to cover in this lesson. Go ahead and complete the lesson, and when you're ready, I’ll look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson, I want to cover the FSx products, specifically FSx for Windows File Server. FSx is a shared file system product, but it handles the implementation in a very different way than, say, EFS, which we've covered earlier in the course. FSx for Windows File Server is one of the core components of the range of services that AWS provides to support Windows environments. For a fair amount of AWS's history, its support for Windows environments was pretty poor; it just didn't seem to be a priority. This changed with FSx for Windows File Server, which provides fully managed, native Windows file servers or, more specifically, file shares. File shares are your unit of consumption. The servers themselves are hidden, which is similar to how RDS is architected, but instead of databases, you get file shares.
Now, it's a product designed for integration with Windows environments. It's a native Windows file system; it's not an emulated file server. It can integrate with either managed Active Directory or self-managed Active Directory, and this can be running inside AWS or on-premises. This is a critical feature for enterprises who already have their own Active Directory provision. It is a resilient and highly available system, and it can be deployed in either single or multi-AZ mode. Picking between the two controls the network interfaces available and used to access the product. It uses elastic network interfaces inside the VPC. The backend, even in single AZ mode, uses replication within that availability zone to ensure that it's resilient to hardware failure. However, if you pick multi-AZ, then you get a fully multi-AZ, highly available solution.
It can also perform a full range of different types of backups, which include both client-side and AWS-side features. I'll talk about that later in the lesson. From an AWS side, it can perform both automatic and on-demand backups. Now, file systems that are created inside the FSx product are accessible within a VPC. But also, and this is how more complex environments are supported, they can be accessed over peering connections, VPN connections, and even accessed over physical direct connects. So if you're a large enterprise with a dedicated private link into a VPC, you can access FSx file systems over Direct Connect.
Now, in the exam, when you’re faced with any questions that talk about shared file systems, you need to be looking to identify any Windows-related keywords. Look for things like native Windows file systems, look for things like Active Directory or Directory Service integration, and look for any of the more advanced features, which I’ll talk about over the remainder of this lesson. Essentially, your job in the exam is to pick when to use FSx versus EFS because these are both network shared file systems that you’ll find on the exam. Generally, EFS tends to be used for shared file systems for Linux EC2 instances as well as Linux on-premises servers, whereas FSx is dedicated to Windows environments, so that's the main distinction between these two different services.
So let's have a look visually at how a typical implementation of FSx for Windows File Server might look for an organization like Animals for Life. We start with a familiar architecture. We have a VPC on the left and a corporate network on the right, and these networks are connected with Direct Connect or VPN, with some on-premises staff members. Inside the VPC, we have two availability zones (A and B), and in each of those availability zones, we have two different private subnets. FSx uses Active Directory for its user store, so logically, we start with a directory, which can either be a managed directory delivered as a service from AWS or something that is on-premises.
Now, this is important: FSx can integrate with both, and it doesn't actually need an Active Directory defined inside the Directory Service product. Instead, it can connect directly to Active Directory running on-premises. This is critical to understand because it means it can integrate with a completely normal implementation of Active Directory that most large enterprises already have. As I already mentioned, FSx can be deployed either in single-AZ or multi-AZ mode, and in both of those, it needs to be connected to some form of directory for its user store. Once deployed, you can create a network share using FSx, and this can be accessed in the normal way using the \\DNS-name\share notation that you'll be familiar with if you use Windows environments. For example, \\fs-xxxxxxxxxx.animalsforlife.org\catpics, where the first part is the file system's DNS name and "catpics" is the actual share.
Using this access path, the file system can be accessed from other AWS services that use Windows-based storage. An example of this is Workspaces, which is a virtual desktop service, similar to Citrix, available inside AWS. When you deploy Workspaces into a VPC, not only does it require a directory service to function, but for any shared file system needs it can also use FSx. The most important thing to remember about FSx is that it is a native Windows file system. It supports features like deduplication and the Distributed File System (DFS), which is a way Windows can group file shares together and scale out for a more manageable file share structure at scale. It supports at-rest encryption using KMS, and it also lets you enforce encryption in transit. Shares are accessed using the SMB protocol, which is standard in Windows environments, and FSx even allows for volume shadow copies. In this context, volume shadow copies allow users to see multiple file versions and initiate restores from the client side.
So that’s really important to understand: if you’re utilizing an FSx share from a Windows environment, you can right-click on a file or folder, view previous versions, and initiate file-level restores without having to use AWS or engage with a system administrator. That’s something that’s provided along with the FSx product as long as it’s integrated with Windows environments—you get that capability. Now, from a performance perspective, FSx is highly performant. The performance delivered can range from anywhere from 8 megabytes per second to 2 gigabytes per second. It can deliver hundreds of thousands of IOPS and less than one millisecond latency, so it can scale up to whatever performance requirements your organization has.
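If it helps to see the shape of the API, here's a minimal boto3 sketch creating a multi-AZ FSx for Windows File Server file system joined to an existing managed directory. The directory ID, subnets, and capacity figures are hypothetical placeholders, not values from this lesson.

```python
import boto3

fsx = boto3.client("fsx")

# Create a multi-AZ FSx for Windows file system joined to a Directory Service directory.
response = fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=300,                                 # GiB
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],    # one per AZ for multi-AZ
    SecurityGroupIds=["sg-0123456789abcdef0"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",             # managed AD to join
        "DeploymentType": "MULTI_AZ_1",                  # or "SINGLE_AZ_2"
        "PreferredSubnetId": "subnet-aaaa1111",
        "ThroughputCapacity": 32,                        # MB/s
    },
)
print(response["FileSystem"]["DNSName"])  # use this DNS name in the \\dnsname\share path
```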
Now, for the exam, you don't need to be aware of the implementation details. I’m trying to focus really on the topics and services that you need for the exam in this course. So when things do occur, I want to teach you more information than you may require for the exam, but there are a lot of topics or features of different services that you only require a high-level overview of, and this is one of those topics. So, what I want to do now is go through some keywords or features that you should be on the lookout for when you see any exam questions that you think might be related to FSx.
The first of these is VSS, or volume shadow copies, a Windows feature that allows users to perform file- and folder-level restores. This is one of the features provided by FSx, meaning that if you have any users of Workspaces and they use files and folders on an FSx share, they can right-click, view previous versions, and restore files themselves without having to engage a system administrator. Another thing to be aware of is that FSx provides native Windows file systems that are accessible over SMB. If you see SMB mentioned in the exam, FSx is probably going to be the default correct answer. Remember, the EFS file system uses the NFS protocol and is only accessible from Linux EC2 instances or Linux on-premises servers. If you see any mention of SMB, then you can be almost certain that it's a Windows environment question and involves FSx.
Another key feature provided by FSx is that it uses the Windows permission model, so if you're used to managing permissions for folders or files on Windows file systems, you'll be used to exactly how FSx handles permissions. This is provided natively by the product specifically to support Windows environments in AWS. Next is that the product supports DFS, the distributed file system. If you see that mentioned, either its full name or DFS, then you know that this is going to be related to FSx. DFS is a way that you can natively scale out file systems inside Windows environments. You can either group file shares together in one enterprise-wide structure or use DFS for replication or scaling out performance. It’s a really capable distributed file system.
Now, if you see any questions that talk about provisioning a native Windows file server, but where the admin overhead of running a self-managed EC2 instance with something like Windows Server is not ideal, then you know that it's going to be FSx. FSx gives you the ability to provision a native Windows file server with file shares but without the admin overhead of managing that server yourself. Lastly, the product is unique in the sense that the file shares it delivers can be integrated with either Directory Service or your own Active Directory directly. These are really important things to remember for the exam, and they'll help you select between FSx and other products.
Again, I don’t expect you to get many questions on FSx. I do know of at least one or two unique questions in the exam, but even if it only gets you that one extra mark, it can be the difference between a pass and a fail. So try your best to remember all the key features I’ve explained throughout this lesson. But at that point, that is everything I wanted to cover in this theory-only lesson. Go ahead, complete this video, and then when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back! In this lesson, I want to talk about an AWS service which you will use in the real world as a solutions architect, and it's also one that starts to feature more and more in the exam. The product I’m referring to is AWS DataSync. We’ve got a lot to cover, so let’s jump in and get started.
AWS DataSync currently tends to feature in the exam in a very light way. You might be lucky and not even have a question on it, but I do know that it features in at least two unique questions that I’m aware of. So, you do need to be aware of what it is, what it does, and the type of situations where you might use it.
DataSync is a data transfer service that allows you to move data into or out of AWS. Historically, many of the transfer tasks involving AWS have either been manual uploads or downloads or have used a physical device like the Snowball or Snowball Edge series of transfer devices. DataSync, however, is a service that manages this process end-to-end.
DataSync tends to be used for workloads like data migrations into AWS, or when you need to transfer data into AWS for processing and then back out again, or when you need to archive data into AWS to take advantage of cost-effective storage. It can even be used as part of disaster recovery or business continuity planning.
As a product, it’s designed to work at huge scales. Each agent—and I’ll introduce the concept of an agent later in this lesson—can handle 10 gigabits per second of data transfer, and each job—I'll also introduce the concept of jobs within this lesson—can handle 50 million files. This is obviously huge scale. Very few transfer jobs will require that level of capacity or performance, but in addition to that scale, it also handles the transfer of metadata such as permissions and timestamps, which are both essential for complex data structure migrations.
And finally, and this is a huge benefit for some scenarios, DataSync includes built-in data validation. Imagine if you're transferring huge numbers of medical records or scans into AWS. You need to make sure that the data, as it arrives in AWS, matches the original data, and DataSync includes this functionality by default.
Now, in terms of the key features of the product, it is really scalable. Each agent can handle 10 gigabits per second of data transfer, which equates to around 100 terabytes per day, and you can add additional agents assuming you have the bandwidth to support it. You can use bandwidth limiters to avoid the saturation of internet links, thus reducing the customer impact of transferring the data. The product supports incremental and scheduled transfers, and it supports compression and encryption.
If you're transferring huge amounts of data and have concerns over reliability, DataSync also supports automatic recovery from transit errors. It integrates with AWS services such as S3, EFS, and FSx for Windows File Server. For some services, it supports service-to-service transfer, such as moving data from EFS to EFS inside AWS, even across regions. Best of all, it's a pay-as-you-use service, so there is a per-gigabyte cost for any data that's moved by the product.
Let’s quickly look at the architecture visually to help you understand exactly how it gets used. This is going to be useful for the exam. In this example architecture, we have a corporate on-premises environment on the left and an AWS region on the right. The business premises have an existing SAN or NAS storage device with data that we want to move into AWS. To facilitate this, we install the DataSync agent on the business’s on-premises VMware platform. This agent is capable of communicating with the NAS or SAN using either the NFS or SMB protocols. Most SANs, NASs, or other storage devices support either one or both of these protocols. So, the DataSync agent is capable of integrating with nearly all local on-premises storage.
Once DataSync is configured, the agent communicates with the DataSync endpoint running within AWS, and from there, it can store the data in a number of different types of locations. Examples include various S3 storage classes or VPC resources such as Elastic File System (EFS) or FSx for Windows File Server. You can configure a schedule for the transfer, targeting or avoiding certain time periods. If you have any link-speed performance issues, you can set a bandwidth limit to throttle the rate at which DataSync syncs the data between your on-premises environment and AWS.
For the exam, you just need to understand the architecture. You won't need to be aware of the implementation details. So, at a very high level, be aware that you need to have the DataSync agent installed locally within your on-premises environment. Be aware that it communicates over NFS or SMB with on-premises storage and then transfers that data through to AWS. It can recover from failures, it can use schedules, and it can throttle the bandwidth between on-premises and AWS. From there, it can store data into S3, the Elastic File System, or FSx for Windows File Server.
If you see any exam questions that discuss the reliable transfer of large quantities of data, that need to integrate with EFS, FSx, or S3, and that mention bidirectional, incremental, or scheduled transfers, then AWS DataSync is likely to be the right answer.
Now, let’s finish up by reviewing the main architectural components of the DataSync product. First, we have the task. A task within DataSync is essentially a job. A job defines what is being synced, how quickly, any schedules that need to be used, and any bandwidth throttling that needs to take place. It also defines the two locations that are involved in that job—where the data is being copied from and where it is being copied to.
Next, we have the agent. As I’ve already mentioned, this is the software used to read or write to on-premises data stores. It uses NFS or SMB and is used to pull data off that store and move it into AWS, or vice versa.
Lastly, we've got a location. Every task has two locations—the from location and the to location. Examples of valid locations are Network File System (NFS) and Server Message Block (SMB) shares—both very common corporate file-sharing protocols. NFS is typically used with Linux or Unix systems, and SMB is very popular in Windows environments. Other valid locations include AWS storage services such as EFS, FSx, and Amazon S3.
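Pulling those three components together, here's a hedged boto3 sketch that defines an on-premises NFS location (via an agent), an S3 location, and a task with a schedule and a bandwidth limit. All ARNs, hostnames, and paths are hypothetical placeholders.

```python
import boto3

datasync = boto3.client("datasync")

# Source location: an on-premises NFS export reached through a deployed agent.
source = datasync.create_location_nfs(
    ServerHostname="nas.example.internal",
    Subdirectory="/exports/archive",
    OnPremConfig={
        "AgentArns": ["arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0"]
    },
)

# Destination location: an S3 bucket, accessed via an IAM role.
destination = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::example-ingest-bucket",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::111122223333:role/datasync-s3-role"},
)

# The task ties the two locations together with validation, throttling, and a schedule.
task = datasync.create_task(
    SourceLocationArn=source["LocationArn"],
    DestinationLocationArn=destination["LocationArn"],
    Options={
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # built-in data validation
        "BytesPerSecond": 125000000,               # ~1 Gbps bandwidth cap
    },
    Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},  # nightly at 02:00 UTC
)
print(task["TaskArn"])
```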
That’s all the information you need for the exam. I just wanted to introduce the service because, as I mentioned at the start, I am aware that there are at least two questions involving this product on the new version of the exam. I want to make sure that you go into that exam understanding the high-level architecture, so if you do see DataSync mentioned, you can at least identify whether it’s an appropriate use of that technology or not.
You’ll find that those questions aren’t asking you to interpret different features of DataSync. You’ll be asked to select between DataSync and another product or method of getting data into AWS. So, in this lesson, I focused on exactly when you would use DataSync. If you need to use an electronic method and Snowball or Snowball Edge aren’t appropriate, if you need something that can transfer data in and out of AWS, if the product needs to support schedules, bandwidth throttling, automatic retries, compression, and can handle huge-scale transfers with various AWS and traditional file transfer protocols, then it’s likely DataSync that you need to pick.
With that being said, that’s everything you need to know for any DataSync questions on the exam. Go ahead and complete this video, and when you’re ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson, I want to cover the AWS Directory Service. This is another service which I think is often overlooked and undervalued. It provides a managed directory, a store of users, objects, and other configuration. Now, it's delivered as a managed service, and it has a few versions and lots of use cases. So we do have a lot to cover in this architecture lesson. Let’s jump in and take a look.
Before we start, I want to talk about directories in general. What are they, and what do they do? Well, directories store identity and asset-related information. So things like users, groups, computer objects, server objects, and file shares. They hold all of these objects in a structure that is hierarchical, like an inverted tree. This is often referred to as a domain. But regardless of its name, it's essentially an inverted tree structure that holds identity-related objects.
Now, multiple directories, each of which provides a tree structure, can be grouped together into what's called a forest. Directories are commonly used within larger corporate Windows environments. You can join devices to a directory, such as laptops, desktops, and servers. Directories provide centralized management and authentication, which means you can sign in to multiple devices with the same username and password. It allows corporate IT staff to centrally manage all of the identity and asset information in one single data store.
One of the most common types of directory in large corporate environments is Microsoft Active Directory, specifically Active Directory Domain Services (AD DS). But there are alternatives. Another common one is Samba, an open-source implementation designed as an alternative to Active Directory, but it only provides partial compatibility with it. This is something you need to be aware of when it comes to picking the mode that Directory Service will operate in.
Now, let's look at Directory Service specifically. Directory Service is to directories what RDS is to databases: an AWS-managed implementation. Using it means you avoid the admin overhead of running your own directory servers, and that overhead is often significant. Directory Service runs within a VPC; it's a private service. So to use it, services either need to be within that VPC, or you need to configure private connectivity to that VPC.
It provides high availability by deploying into multiple subnets in multiple availability zones within AWS. Now, there are certain AWS services, such as EC2, that can optionally use a directory. For example, Windows EC2 instances can be configured to join the directory, allowing you to use identities inside that directory to log in to the EC2 instance. You can also configure a directory for centralized management to various Windows features running on Windows EC2 instances.
Certain services within AWS require a directory. An example is Amazon Workspaces, which is a virtual desktop product where you can get a virtual operating system on which to run applications. If you've ever used Citrix or something similar, Amazon Workspaces is AWS's version of this. But it needs a directory, and that directory needs to be registered with AWS, so it requires the Directory Service product. To join EC2 instances to a domain via the AWS tools, you also need to have a registered directory inside AWS.
The Directory Service product is an AWS-supported and registered directory service within AWS that other AWS products can utilize for identity and management purposes. When you create a directory, you'll be doing so with a number of different architectures in mind. It might be an isolated directory, meaning inside AWS only and independent of any other directory that you might have, or it can be integrated with an existing on-premises directory, almost like a partner directory. Alternatively, you can use the Directory Service in what's called connector mode, which proxies connections back to your on-premises system. This essentially allows you to use your existing on-premises directory with AWS services that require a registered directory service.
I want to quickly step through each of these different architectures visually before we finish up. For the exam, you only need to have an awareness of the architecture, and I find that by looking at it visually, it helps keep it in your memory for when you sit the exam. Let’s do that next and step through each of the different modes that the Directory Service can run in.
First, we’ll look at the Directory Service running in simple AD mode. This is the cheapest and simplest way that the product can run inside a VPC. We start with a VPC and say we want to run Amazon Workspaces within this VPC. These Workspaces will be used by some Animals for Life users. Since Workspaces as a product requires a directory service, when you log into a workspace, you're not logging into a local user; you're logging in using a user of that directory. Therefore, it needs some type of directory registered within AWS, and one option is deploying the Directory Service in simple AD mode.
Simple AD is an open-source directory based on Samba 4, providing as much compatibility with Microsoft Active Directory as possible but in a lightweight way. If you see any mention of open source or Samba 4, think Simple AD. You can create users and other objects within Simple AD, and it can integrate with Workspaces. Simple AD comes in two sizes: small, supporting up to 500 users, and large, supporting up to 5,000 users. It integrates with many AWS services such as Amazon Chime, Amazon Connect, Amazon QuickSight, Amazon RDS, WorkDocs, WorkMail, WorkSpaces, and even the AWS Management Console, allowing you to sign in with users of the directory.
There are other services such as EC2, which can also utilize the Directory Service, either from the console or by manually configuring the operating system of the EC2 instance. When you deploy a simple AD Directory Service, you’re deploying a highly available version of SAMBA, so anything that can join this SAMBA directory is capable of joining the Directory Service running in simple AD mode.
The critical thing to understand about simple AD mode is that it's designed to be used in isolation. It’s not designed to integrate with on-premises systems, nor is it a full implementation of something like Microsoft Active Directory. If you need something bigger and more feature-rich, you can opt for Managed Microsoft AD mode. This mode is for when you want a direct presence inside AWS and also have an existing on-premises environment.
Using this mode, you can create an instance of the Directory Service inside AWS. Architecturally, it’s similar to simple AD, where you can create users within the directory service hosted inside AWS. Once created, services inside AWS can integrate directly with the directory service. Additionally, you can create a trust relationship with your existing on-premises directory, and this connection requires private networking such as a direct connect or VPN connection.
The benefit of this mode is that the primary location is in AWS, and it trusts your on-premises directory. Even if the VPN fails, the AWS services that rely on the directory can still function. When you deploy Directory Service in Microsoft AD mode, it’s a fully fledged directory service in its own right, not reliant on any on-premises infrastructure. It’s also an actual implementation of Microsoft Active Directory, specifically the 2012 R2 version, supporting applications requiring Microsoft AD features like schema extensions, such as Microsoft SharePoint and Microsoft SQL Server-based applications.
So, if you encounter exam questions requiring an actual implementation of Microsoft Active Directory, complete with trust relationships with an on-premises Microsoft Active Directory, then you need to use the Managed Microsoft AD mode, not Simple AD.
Lastly, there’s the AD Connector mode. Consider a scenario where you only want to use one specific AWS service that has a directory service requirement, like Amazon Workspaces. In this example, you already have an on-premises directory and don’t want to create a new one just to use this one product. AD Connector provides a solution. To use AD Connector, you need to establish private network connectivity between your AWS account and your on-premises network, such as via a VPN. Once the VPN is established, you can create the AD Connector and point it back at your on-premises directory.
It's critical to understand that the AD Connector is just a proxy. It exists solely to integrate with AWS services, so any AWS services that need a directory will see the AD Connector and know they have access to an active directory instance, but it doesn’t provide any authentication of its own. It simply proxies the requests back to your on-premises environment.
If the private network connectivity fails, the AD Connector stops working, and any services using it could experience issues. This means that AD Connector is best used when you already have a directory on-premises and just want to use AWS products and services requiring a directory without deploying a new one.
One important thing for the exam is knowing when to pick between the different modes for Directory Service. Start with simple AD. It's your default, designed for simple requirements, if you need an isolated directory within AWS and don't need connectivity with on-premises. Use it to support AWS products and services that require a directory. If you need an actual implementation of Microsoft Active Directory or have applications expecting it, use Microsoft AD. If you just need AWS services to access a directory but don’t want to manage a directory in the cloud, use AD Connector.
Remember that the AD Connector doesn't provide functionality of its own. It simply proxies the requests to your on-premises environment. It requires connectivity to your on-premises environment, and that environment must be fully functional. If either of those things is not true, the AD Connector will fail.
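For a rough sense of how the three modes differ at the API level, here's a hedged boto3 sketch showing one call per mode. Domain names, passwords, subnets, and DNS addresses are placeholders, and in practice you would normally create only one of these.

```python
import boto3

ds = boto3.client("ds")
vpc_settings = {"VpcId": "vpc-0123456789abcdef0", "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"]}

# Simple AD: an isolated, Samba-based directory inside AWS.
simple_ad = ds.create_directory(
    Name="corp.animalsforlife.internal",
    Password="Placeholder-Passw0rd!",  # admin password placeholder
    Size="Small",                      # up to 500 users; "Large" supports up to 5,000
    VpcSettings=vpc_settings,
)

# Managed Microsoft AD: a real Active Directory that can trust an on-premises AD.
managed_ad = ds.create_microsoft_ad(
    Name="ad.animalsforlife.org",
    Password="Placeholder-Passw0rd!",
    Edition="Standard",
    VpcSettings=vpc_settings,
)

# AD Connector: a proxy back to an existing on-premises directory over private networking.
connector = ds.connect_directory(
    Name="onprem.animalsforlife.org",
    Password="Placeholder-Passw0rd!",  # password of the on-premises service account
    Size="Small",
    ConnectSettings={
        "VpcId": vpc_settings["VpcId"],
        "SubnetIds": vpc_settings["SubnetIds"],
        "CustomerDnsIps": ["10.0.0.10"],       # on-premises DNS servers
        "CustomerUserName": "connector-svc",   # service account in the on-premises domain
    },
)
print(simple_ad["DirectoryId"], managed_ad["DirectoryId"], connector["DirectoryId"])
```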
A major difference between the older SAA-C01 exam and the newer SAA-C02 exam is that there are more questions about Windows environments. That's why I've included this lesson on Directory Service. You need to be aware of all the products and services that can support and implement Windows environments within AWS and enable hybrid operations with your on-premises environments.
For the associate-level exam, the questions won’t be very challenging. They’ll focus on the high-level architecture, so you won’t need to know implementation details. That’s why this is a theory-only lesson. This lesson provides all the information you need to answer any directory service questions you might encounter on the exam.
At this point, that’s everything I wanted to cover. So go ahead, complete this video, and when you’re ready, I’ll look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson, I want to talk about a really effective set of services which AWS provides for moving data between on-premises and AWS. These services are Snowball, Snowball Edge, and the AWS Snowmobile. Now we've got a lot to cover, so let's jump in and get started.
At a high level, the Snowball series—Snowball, Snowball Edge, and Snowmobile—are designed to move large amounts of data in or out of AWS. In many cases, especially with smaller projects, you can just upload data over an internet connection or, in some cases, if you have specific requirements, you can use Direct Connect. But there are situations where the amount of data that you need to move makes this impractical. This could be because of the speed of your internet connection making a transfer of this size impractical, or because you need to get huge amounts of data into AWS as quickly as possible.
Now, the devices in this product series are physical storage units—either suitcase-sized or the size of a truck, literally carried on trucks inside a custom-built shipping container. You can either order them empty, load them up with data, and then return them to AWS, or the reverse, so you can order them with data, receive them, empty off the data, and return the appliances to AWS.
The key things that you need to know for the associate exam are not specifically the product specifics, but rather when and where you need to use the products. So you don't need to be aware of the implementation details, just the architecture. That's what I want to cover in this lesson, and we're going to start with Snowball.
When you're using the Snowball product, it's actually a physical process—you're interacting physically with AWS. You log a job, order a device from AWS, and the device is delivered to you, so it's not an instant process. Any data stored on a Snowball is encrypted at rest using KMS; anything persistently stored on the device is encrypted.
There are two types of Snowball devices. You can get a 50 TB device or an 80 TB device. In terms of network connectivity, you can connect to the Snowball using either 1 gig or 10 gig networking. That's important to make sure that you have that physical, cabled network connectivity wherever you get the Snowball delivered to because you will need to use physical networking.
While the capacity of the normal Snowball device is either 50 TB or 80 TB, the economical range for using Snowball is generally within the region of 10 TB to 10 PB of data. So if the amount of data that you need to transfer in or out of AWS is in that range, then it tends to be economical to use Snowball rather than transferring the data across the internet, Direct Connect, or any other connection technology. For large amounts of data, physical transfer using Snowball is often the most economical.
One of the benefits of using Snowball or Snowball Edge (which I'll cover next) is that you can order multiple devices. While it has a 50 or 80 TB capacity, and the economical range is 10 TB to 10 PB, you can order multiple devices and get them sent to multiple business premises. This is a really important architectural benefit: you could order 10 Snowballs and have one deployed into each of 10 business premises, using those devices to ingest data, send it back to AWS, and get that data accessible inside your AWS environment. So remember for the exam, multiple devices, multiple premises, and try your best to remember the economical range—10 TB to 10 PB.
Now, Snowball, as a device, only includes storage. It's only a storage device, and it doesn't include any compute capability. This is important because it's in contrast to the next product that I want to talk about, which is Snowball Edge. Snowball Edge comes with both storage and compute, so it's like the Snowball product, but in addition, it comes with compute capability. So architecturally, you tend to use Snowball when you've got large amounts of data to ingest or get out of AWS. With Snowball Edge, you’ve got some other architectural patterns that you can use that involve compute, and we’ll talk about that in a second.
Another benefit of Snowball Edge is that it has a larger capacity than the Snowball product, and it also offers faster networking. You've got 10 gigabits over RJ45, 10 or 25 gigabits over SFP, and 40, 50, or 100 gigabits over QSFP+, so whatever local connection technology you have, you can generally achieve a faster connection to Snowball Edge versus Snowball.
There are actually three different types of Snowball Edge. First, we’ve got storage optimized, but there’s also a slight variant of the storage optimized series which includes EC2 capability. By default, the storage optimized version of Snowball Edge comes with 80 TB of storage, 24 vCPUs, and 32 GiB of memory, but if you include the EC2 capability option, it also comes with 1 terabyte of local SSD. So you can actually run EC2 instances on top of this using that 1 terabyte of SSD.
Now, we've also got the compute optimized variant, which includes 100 TB of storage plus 7.68 TB of NVMe storage. This is storage directly attached to the PCI bus; it's super fast and super low latency, which is beneficial when you've got aggressive compute requirements that you want to run on the Snowball Edge. The compute optimized variant also includes 52 vCPUs and 208 GiB of memory. Then finally, there's the compute optimized with GPU variant, which, in addition to all of those resources, also includes GPU capability. So for any scientific analysis, modeling, or parallel activities that benefit from a GPU, you can get the Snowball Edge with GPU capability.
In terms of when you’d use the Snowball Edge versus the Snowball: Snowball is an older generation, purely for storage. Snowball Edge includes compute, so if you've got any remote sites where you need to perform data processing on data as it's ingested, then you should use the Snowball Edge. If you’ve got higher capacity requirements, or need faster networking, or if you have a lot of data and need to make sure that you load it onto the device as quickly as possible, then by having the faster networking provided by the Snowball Edge, the turnaround time can be quicker.
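Ordering a device is done by creating a job. Here's a hedged boto3 sketch of an import job for a storage-optimized Snowball Edge; the bucket, address ID, and role are hypothetical placeholders you'd have created beforehand.

```python
import boto3

snowball = boto3.client("snowball")

# Create an import job: AWS ships a storage-optimized Snowball Edge, you load it with
# data destined for the named S3 bucket, and then return the device to AWS.
job = snowball.create_job(
    JobType="IMPORT",
    SnowballType="EDGE_S",                 # storage-optimized Snowball Edge
    SnowballCapacityPreference="T80",
    Resources={"S3Resources": [{"BucketArn": "arn:aws:s3:::example-ingest-bucket"}]},
    AddressId="ADID1234ab12-3eec-4eb3-9be6-9374c10eb51b",  # shipping address created earlier
    RoleARN="arn:aws:iam::111122223333:role/snowball-import-role",
    ShippingOption="SECOND_DAY",
    Description="Site A data migration",
)
print(job["JobId"])
```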
The last piece of this product set is the Snowmobile, and the Snowmobile is a portable data center within a shipping container on a truck. I literally mean this: it is a truck that's delivered to your business premises, and on the back of this truck is a custom-designed shipping container. In that container is a portable data center, so this is a product that needs to be specially ordered from AWS. It's not something that's available in high volume and it's not available everywhere. It's generally only used when you have a single location with huge amounts of data that you need to ingest into AWS, such as when you’re dealing with large enterprises or performing a traditional data center migration of anything over 10 petabytes of data—that’s the type of scenario where you might use a Snowmobile.
Snowmobiles can store up to 100 petabytes of data per Snowmobile. The process is that you order a Snowmobile, it's driven to your data center location, and the back of the shipping container is opened up. It needs to be connected to data center-grade power and networking. Essentially, it's driven to your location, you plug it into your data center, and you use it to transfer data into AWS.
For the exam, remember it's a single truck. Why this matters is that it's not economical for less than 10 petabytes of data or for multi-site migrations. You have to remember that this is one single device. So, if you have, for example, 10 petabytes of data spread across lots of different sites, then logically you would use the Snowball Edge product. If you've got all of that data in one single site, then potentially it's more economical to use the Snowmobile. You can't physically split a Snowmobile into multiple trucks, and you can't do a road trip around your different data centers. A Snowmobile drives out to one location, it's plugged in and used to transfer data, and then it drives back to AWS.
So if your requirement is multi-site, then unless it really is a huge scale migration, you would not look to use a Snowmobile. These are the three different products: the Snowball, the Snowball Edge, and the Snowmobile. It will really benefit you in the exam to understand the exact set of requirements when you would and wouldn't use each of these products. I don't expect you to get any questions that test your knowledge in detail. You certainly won’t get any questions where you need to physically know the process of ordering and transferring, but from an architectural perspective, you will gain massive benefit from knowing the details of each of these products. That's what I've tried to do in this lesson.
With that being said, that is everything I wanted to cover in this lesson. Go ahead and complete the video, and then when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this final part of the Storage Gateway series, I want to talk about Storage Gateway running in file mode. So far, I've covered volume mode, where Storage Gateway handles raw block volumes, and VTL mode or tape mode, where Storage Gateway pretends to be a physical tape backup system. Running in file mode, as you can guess from the name, Storage Gateway manages files. Now, we have a lot to cover because this is one of the most feature-rich modes of Storage Gateway, so let's jump in and get started.
I want to stress one thing right at the start: for any Storage Gateway questions in the exam, if you see volumes mentioned, you should default to volume gateway. If you see tapes mentioned, default to VTL mode. If you see files mentioned, then default to file mode and only move away if you see something that eliminates any of those options. File Gateway bridges on-premises file storage and S3, linking local file storage to an S3 bucket. With a file gateway, you create one or more mount points or shares, which are available via two protocols: NFS (generally used for Linux servers and workstations) or SMB (a Windows network file-sharing protocol). These are another pair of keywords that will help you distinguish between volume gateway, tape gateway, and file gateway. So, if you see any mention of NFS or SMB with a Storage Gateway question that concerns files, you know it's going to be the file gateway.
These file shares or mount points you create within the file gateway map directly to one S3 bucket, which is in your account. You manage this S3 bucket and have visibility of it. This means that when you store files onto a mount point over SMB or NFS, they appear in the S3 bucket as objects. If you store objects into an S3 bucket, they’re visible on the corresponding mount point on-premises. This is essentially the key benefit of using Storage Gateway running in file mode: it translates between on-premises files and AWS-based S3 objects, which is super powerful from an architecture perspective. Like the other storage gateways, it typically runs on-premises, and to ensure performance, it also does read and write caching. This caching ensures that the performance achieved is comparable to anything else running on a local area network.
The file gateway isn't an overly complex product; essentially, what you see on screen is what it does. But where the power comes from is how it integrates with S3 and how you can take advantage of S3 features to implement some really useful architectures. Over the remainder of this lesson, I want to step through those architectures so that you can get some idea of how you can use it effectively in production and answer any questions relating to the file gateway that you might experience in the exam.
A typical architecture with file gateway starts with business premises on the left and AWS on the right. File gateway runs as a virtual appliance in most cases, and it has local storage that it uses as a read and write cache. This gives the data managed by the storage gateway near local area network performance. On the storage gateway end (on-premises), we create file shares, and each of these file shares is linked with a single S3 bucket running in your account. This link between a file share and an S3 bucket creates what's known as a bucket share. These file shares can be accessed from any local servers using NFS for Linux servers and SMB for Windows servers. If you're using a Windows share, you can also use Active Directory authentication for even better integration with a Windows environment.
The reason why file gateway is so powerful is because the file shares and the buckets are linked together. This means that S3 objects are visible in the file share and vice versa. Files on-premises map directly onto objects running in AWS, so there's a mapping between the file name and the object name. The structure of on-premises file shares is preserved by building that structure into the object name within S3, much like how S3 emulates a nested file system structure within a flat object storage system by building it into the object name. For example, a file called "winky.jpeg" on the left will be represented by an object called "winky.jpeg" inside the S3 bucket. Another file called "ruffle.png" inside the "omg wow" folder will be represented by an object called "/omg wow/ruffle.png," showing how a structure is emulated by building it into the object name.
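To make the file-to-object mapping concrete, here's a minimal sketch using boto3 that creates an NFS file share on an existing File Gateway and links it to an S3 bucket. The gateway ARN, IAM role, and bucket below are placeholder assumptions.

```python
import boto3
import uuid

storagegateway = boto3.client("storagegateway")

# Hypothetical ARNs - substitute your own gateway, role, and bucket.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:111111111111:gateway/sgw-EXAMPLE"
ROLE_ARN = "arn:aws:iam::111111111111:role/example-filegateway-role"
BUCKET_ARN = "arn:aws:s3:::example-file-share-bucket"

# Create an NFS file share; files written to the share appear as objects
# in the linked bucket, with the folder structure encoded in the key name.
response = storagegateway.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),
    GatewayARN=GATEWAY_ARN,
    Role=ROLE_ARN,
    LocationARN=BUCKET_ARN,
    DefaultStorageClass="S3_STANDARD",
    ClientList=["10.0.0.0/16"],  # on-premises clients allowed to mount the share
)
print(response["FileShareARN"])
```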
You can have up to 10 of these shares per storage gateway, and crucially, the primary data is held on S3. The only thing stored locally is within the local cache, which holds data written or read from the storage gateway and is designed to improve performance to levels near those achieved from any other resources running on a local area network. Because the objects are stored within S3, you have the ability to integrate with other AWS services. You can use S3 events and Lambda, as well as other AWS services such as Athena. Anything that can use S3 as a source location will have access to any files stored on S3 indirectly using the file gateway.
At a high level, this is the architecture of a file gateway: it allows you to extend your local file storage into AWS using S3. If you see the keyword "file" in the exam, possibly with the keyword "extension," a possible answer is the Storage Gateway running in file mode. However, the product goes far beyond this. The reason it’s my favorite mode of Storage Gateway is because it enables some really cool hybrid architectures. The architecture shown on screen now is a simple two-site hybrid architecture with on-premises on the left and AWS on the right.
In this architecture, we still have the Storage Gateway in the on-premises environment on the left, and there's still the one-to-one relationship between the files presented by the Storage Gateway and the objects in the S3 bucket. But we can add another on-premises environment at the bottom, and this environment also presents the same set of objects from the same bucket as files. There are some concerns to keep in mind. First, when you update a file on a local storage gateway, that update is copied into S3 automatically. But to save on resource usage and avoid unnecessary S3 listings, there's no automatic version of that in reverse. When you list the file share on-premises, you're listing the most recent copy of the S3 bucket that the gateway is aware of. For example, if you added a new cat picture to the top storage gateway, that would create a new object immediately in the S3 bucket. However, if you then listed the file share in the bottom on-premises environment, the new object wouldn't show up until that gateway refreshed its view of the bucket, for example using the RefreshCache operation.
There’s a feature of Storage Gateway called "notify when uploaded." I'll make sure to include a link to the lesson detailing this functionality, but at a high level, it sends an event using CloudWatch Events to inform other Storage Gateways when an upload has occurred. However, this needs to be designed into your solution; it doesn't happen by default. Another point to be aware of is that File Gateway doesn’t support any form of object locking. This means that if two users are editing the "winky.jpeg" file in the top and bottom environments and they both write, there isn’t any form of checking or control over this access. This can result in data loss, where one update overwrites another. So either make sure that one of the shares is read-only and the others are read-write, or implement some form of control over who accesses files and when.
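Both behaviours mentioned here map onto Storage Gateway API operations. Here's a minimal sketch, assuming an existing file share ARN (a placeholder below): RefreshCache makes a gateway re-read the bucket contents, and NotifyWhenUploaded raises an event once files written to a share have been uploaded to S3.

```python
import boto3

storagegateway = boto3.client("storagegateway")

# Hypothetical file share ARN on the gateway that needs to see new objects.
FILE_SHARE_ARN = "arn:aws:storagegateway:us-east-1:111111111111:share/share-EXAMPLE"

# Ask the gateway to refresh its cached listing of the linked S3 bucket
# so objects created elsewhere (another gateway, direct S3 uploads) appear.
storagegateway.refresh_cache(
    FileShareARN=FILE_SHARE_ARN,
    FolderList=["/"],
    Recursive=True,
)

# Request a notification (via CloudWatch Events / EventBridge) once all
# files written to this share so far have been uploaded to S3.
storagegateway.notify_when_uploaded(FileShareARN=FILE_SHARE_ARN)
```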
Another powerful architecture supported by Storage Gateway in file mode is replication. Given this architecture, where we have two customer sites linked to the S3 bucket in US East 1, we could create another bucket in, say, US West 1 and then configure cross-region replication of that data between the two buckets. This gives us a nice way to implement multi-region disaster recovery without any significant changes to infrastructure or much in the way of additional costs. We can also use File Gateway and S3 lifecycle management together.
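The replication itself is just standard S3 cross-region replication configured on the bucket behind the file share. A minimal sketch, assuming both buckets already have versioning enabled and a suitable replication role exists (all names here are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical source bucket, destination bucket, and replication role.
SOURCE_BUCKET = "example-file-share-us-east-1"
DEST_BUCKET_ARN = "arn:aws:s3:::example-file-share-us-west-1"
REPLICATION_ROLE = "arn:aws:iam::111111111111:role/example-s3-replication-role"

# Replicate every object written via the file gateway into the DR bucket.
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE,
        "Rules": [
            {
                "ID": "replicate-file-gateway-data",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": DEST_BUCKET_ARN},
            }
        ],
    },
)
```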
Let’s say we have this architecture with on-premises on the left and AWS on the right. Using File Gateway means that the files and objects remain in sync because the primary data is in S3. There are different storage classes available for objects, such as S3 Standard, S3 Standard-Infrequent Access, and S3 Glacier. When you create a file share, you specify the default storage class to use, usually S3 Standard. On top of this default, you can create lifecycle policies within the S3 bucket, for example configuring them to move objects from Standard to Standard-IA after 30 days. Behind the scenes, objects are moved automatically between these classes, and because it's an automatic process, it repeats as additional objects reach that age. Multiple steps are allowed, so in addition to the 30-day move to Standard-IA, you could also have a 90-day move to Glacier, meaning objects move to cheaper storage over time. This happens in the background automatically, making it cost-effective, and because the primary copy of all data is in S3, any on-premises locations automatically benefit from this cost-effective storage.
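Here's a minimal sketch of the lifecycle configuration described above, applied with boto3 to a placeholder bucket: objects transition to Standard-IA after 30 days and to Glacier after 90.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket backing the file share.
BUCKET = "example-file-share-bucket"

# Move objects to cheaper storage classes as they age.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-aging-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```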
File Gateway is a really cool product, but to fully appreciate it, it helps to experience it in practice. For the exam, though, this lesson has covered everything you’ll need. I’ve covered all the main features and architectural patterns of File Gateway. For the exam, try to make sure that you're familiar with when you'd use each type of Storage Gateway—when it makes sense to use volume stored or when it makes sense to use volume cached, what situations VTL mode is useful in, and the same question for File Gateway. If you’re in doubt, come and talk to me on techstudieslack.com, and we can discuss this in more detail. But with that being said, thanks for watching. Go ahead, complete the video, and when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson of the Storage Gateway series, I wanted to cover Storage Gateway running in Tape mode, also known as VTL mode, which stands for Virtual Tape Library. Now, try your best as you go through this lesson to ignore the fact that the logo looks like a toilet roll. It's one of those things where once you've seen it, it can't easily be unseen. I also often forget that there might be some of my students who don't actually know what tape backup is, so I want to quickly cover that before I show you how Storage Gateway can integrate tape backups with AWS. So, let's jump in and take a look.
Large enterprise backups in recent times tend to run in one of three ways: backup to tape, backup to disk, or off-site backup to a remote facility over a network link. For this lesson, we're focusing on tape backup. Now, there are various different types of tapes, and one popular type is called LTO, which stands for Linear Tape-Open. This is an example of some LTO tapes. They come in various different generations, and one of the more recent is LTO-9, which is capable of storing 18 terabytes of uncompressed data per tape. If you add compression, that increases to around 45 terabytes per tape.
All tape backup solutions have at least one tape drive. A tape drive can logically be empty or have a tape inserted. Assuming it does have a tape inserted, the tape drive can read from or write to the tape, and access is sequential, not random access like disk or SSD-based storage. To find data, you need to seek through the tape to its location and then read it. Data stored on tape can't easily be updated; modifying data in place isn't really possible, so you generally have to overwrite what's already there. Essentially, tape is designed as a medium that you write as a whole and read as a whole.
Now, as well as a tape drive, you have what's known as a Tape loader, sometimes called a Tape robot. Think of this as a literal robot arm, which can insert tapes, remove tapes, or swap tapes between a drive and somewhere else. Now, a tape library is a piece of equipment that often fits in a rack or is the size of one or more racks. It contains one or more tape drives, one or more tape loaders (the robots which move tapes), and a collection of slots, and these slots can store tapes when they're not in the drive. So, a tape library could have room for eight tapes, 32 tapes, hundreds of tapes, or even thousands of tapes.
So, when we're discussing tape backup, we've got a number of different components. We've got the drive itself, where tapes go to be read from or written to. Next, we've got the library, and the library consists of a tape drive or tape drives, the robots, and a number of slots that can store tapes when they're not in the drives themselves. And then, thirdly, we've got a tape shelf. Now, this is a throwback to the physical world where tapes were stored on shelves. If you see a tape shelf mentioned, it's anywhere that isn't the library, so another location that could be in the same building or a different physical site entirely.
In a traditional tape backup situation, this is what the architecture might look like. On the left, we've got a business premises, and inside it, we've got a number of application and data storage servers, which aren't shown, together with a backup server and a tape library. Now, the backup server connects to the tape library using a protocol known as iSCSI, and this iSCSI connection exposes a few devices, the most important ones being one or more tape drives and a media changer. A media changer is just another name for a tape loader or a tape robot.
Now, the important thing to realize is that everything has a cost. The equipment costs money to buy—the tapes, the software, and the library—and these are capital expenditure costs. Then, to operate it, there are ongoing costs such as licensing, maintenance, and staffing. It's not cheap to run an enterprise-grade backup architecture such as the one that's on screen now. In addition to this, we have the other side of the architecture—off-site tape storage, which is often managed by a third party. This location is often a decent distance away from your primary business premises, and this is done to provide resilience in the event of a disaster.
Now, only tapes that are in the library itself can be used for backups, and to keep data safe, any tapes that aren't being actively used are moved from your primary premises to off-site tape storage, and this transport costs money and takes time. So, we have a few main problems with this architecture: we have the cost to purchase, the cost to maintain, the cost to operate, and the time and cost to move things around between our primary premises and our off-site tape storage. Storage Gateway running in VTL or tape mode fixes many of these problems, so let's take a look at how.
Using Storage Gateway in VTL mode, much of the architecture changes. About the only thing that stays the same is that we still have a business premises and a backup server. Instead of the on-premises tape library, we use Storage Gateway in tape or VTL mode, and this presents itself to the backup server in the same way, using iSCSI. This is actually part of the design: the backup server sees this as a normal tape library; it doesn't know the difference between this and the previous physical tape library. There are few, if any, software changes required beyond connecting the backup software to the Storage Gateway.
The Storage Gateway presents a media changer and the tape drives, but it looks like a normal physical tape library. However, it has an upload buffer and a local cache, much like Volume Gateway running in cache mode. It uses these to store data that is being actively used, essentially virtual tapes, and I'll talk about these in a second. Instead of using physical tapes, which are present in a local tape library, the Storage Gateway communicates with the Storage Gateway endpoint within AWS. This is much like the Volume Gateway. The Storage Gateway endpoint then presents two main capabilities: the Virtual Tape Library (VTL), which is backed by S3, and the Virtual Tape Shelf (VTS), which is backed by either Glacier or Glacier Deep Archive.
Now, conceptually, the Virtual Tape Library is an AWS-hosted version of a tape library, which the Storage Gateway uses, and the Virtual Tape Shelf is for virtual tapes that have been logically moved out of that Virtual Tape Library. So, the on-premises Storage Gateway communicates with the Virtual Tape Library, caches stuff locally, and uploads in the background to the Virtual Tape Library. So, backups occur at LAN speeds to the local storage that the Storage Gateway uses, and then in the background, any backup data gets uploaded to the Storage Gateway endpoint running inside AWS, specifically the Virtual Tape Library.
Now, a virtual tape can be anywhere from 100 gig to 5 terabytes. If you have a good memory, you might notice that the 5 terabyte maximum size for a virtual tape is also the maximum size of an S3 object, and that's because these virtual tapes are stored using S3 in an AWS-managed Storage Gateway bucket. Now, the Storage Gateway can handle a total of one PB of data across 1500 virtual tapes within the Virtual Tape Library. Remember, this is the part which uses S3.
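Creating virtual tapes is an API call rather than a physical purchase. Here's a minimal sketch, assuming an existing tape gateway (the ARN and barcode prefix are placeholders), that creates five 2.5 TiB virtual tapes in the Virtual Tape Library.

```python
import boto3
import uuid

storagegateway = boto3.client("storagegateway")

# Hypothetical tape gateway ARN.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:111111111111:gateway/sgw-EXAMPLE"

# Create five 2.5 TiB virtual tapes in the Virtual Tape Library (backed by S3).
response = storagegateway.create_tapes(
    GatewayARN=GATEWAY_ARN,
    TapeSizeInBytes=2560 * 1024**3,   # 2.5 TiB, within the 100 GiB to 5 TiB range
    ClientToken=str(uuid.uuid4()),
    NumTapesToCreate=5,
    TapeBarcodePrefix="DEMO",         # 1-4 uppercase letters
)
print(response["TapeARNs"])
```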
When virtual tapes aren't being used for anything, they can be exported within the backup software, and this marks them as not being within the library. Now, in the physical architecture, which I showed you on the previous screen, this would then mean ejecting them from the physical tape library and moving them to off-site storage. An exported tape simply means that it's not in the library. Now, when you export a virtual tape using Storage Gateway in VTL mode, it archives it from the Virtual Tape Library (VTL) into the Virtual Tape Shelf (VTS), and this moves the data from S3 into Glacier or Glacier Deep Archive, which offers unlimited storage.
The 1 PB limit applies only to tapes that are stored in the Virtual Tape Library. Anything which you archive into the Virtual Tape Shelf benefits from unlimited capacity. Now, Glacier is generally used for archival when there's an expectation that at some point you will need access to the data, so anything that's infrequently accessed can be archived into the Virtual Tape Shelf using Glacier. Glacier Deep Archive is used for longer-term data retention, where you might never need to access the data again, but you do need to keep it, maybe for legal reasons. In either case, if you need to access the data again, then it can be retrieved from the Virtual Tape Shelf into the Virtual Tape Library, and then it can be accessed again by the Storage Gateway.
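Retrieving an archived tape is also an API operation. A minimal sketch with placeholder ARNs: the tape is moved from the Virtual Tape Shelf back into the Virtual Tape Library of a specific gateway, after which the backup software can read it again.

```python
import boto3

storagegateway = boto3.client("storagegateway")

# Hypothetical ARNs for the archived virtual tape and the target gateway.
TAPE_ARN = "arn:aws:storagegateway:us-east-1:111111111111:tape/TESTTAPE001"
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:111111111111:gateway/sgw-EXAMPLE"

# Move the tape from the Virtual Tape Shelf (Glacier / Deep Archive)
# back into the Virtual Tape Library (S3) on the named gateway.
response = storagegateway.retrieve_tape_archive(
    TapeARN=TAPE_ARN,
    GatewayARN=GATEWAY_ARN,
)
print(response["TapeARN"])
```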
Now, I'm hoping by this point that you understand what this mode of the Storage Gateway provides. It essentially pretends to be an iSCSI tape library, an iSCSI changer, and an iSCSI drive, and it uses a combination of S3, Glacier, and Glacier Deep Archive to support the backup and restore functionality of a physical tape library. Now, this gives you a few interesting capabilities. You can use it to maintain your existing backup system, which you might need to keep, but replace much of the expensive parts with economical AWS storage. It also allows you to extend an additional backup system by using capacity within AWS, so you can use this together with existing backup software and extend any limited on-premises capacity into AWS by using S3 and Glacier.
Or, it also lets you do a migration. So, let's say that you're migrating everything into AWS, but you need to maintain a historical set of tape backups. Well, you can migrate those tape backups onto Storage Gateway using VTL mode, decommission the physical tape hardware, and then even run the Storage Gateway appliance and the backup server from within AWS for any data retrieval needs. So, this type of Storage Gateway presents some really interesting architectural possibilities.
For the exam, if a question involves traditional tape backup architecture, then Storage Gateway running in VTL mode is likely to be the correct answer. VTL mode is a really powerful product, which I've used a few times in real-world projects, generally as part of data center extensions or migrations of backup platforms. But at this point, that's everything that I wanted to cover about Storage Gateway running in VTL mode. So, thanks for watching. Go ahead and complete the video, and I'll look forward to you joining me in the next part of this Storage Gateway series, where I'll be covering the file gateway, which is probably my favorite Storage Gateway mode.
learn.cantrill.io
Welcome back.
Over the next few lessons, I'm going to be covering Storage Gateway in more depth, focusing on the types of architectures it can support. The key to exam success when it comes to Storage Gateway is understanding when you would use each of the modes, as each has its own specific situation where it should or shouldn't be used. In this lesson, I'll start off with the Storage Gateway running in Volume Stored mode and Volume Cached mode—so let's jump in and get started.
Storage Gateway normally runs as a virtual machine on-premises, although it can be ordered as a hardware appliance. However, it's much more common to use the virtual machine version of this product. It acts as a bridge between storage that exists on-premises or in a data center and AWS. Locally, it presents storage using iSCSI (a block storage protocol used by SANs), NFS (commonly used by Linux environments to share storage over a network), and SMB (used within Windows environments). On the AWS side, it integrates with EBS, S3, and the various types of Glacier.
As a product, Storage Gateway is used for tasks such as migrations from on-premises to AWS, extending a data center into AWS, and addressing storage shortages by leveraging AWS storage. It can implement storage tiering, assist with disaster recovery, and replace legacy tape media backup solutions. For the exam, you need to identify the correct type of Storage Gateway for a given scenario—and that's what I want to help you with in this set of lessons.
As a quick visual refresher, a Storage Gateway is typically deployed as a virtual appliance on-premises. Architecturally, you might also have some Network Attached Storage (NAS) or a Storage Area Network (SAN) running on-premises. These storage systems are used by a collection of servers—also running on-premises. The servers probably have their own local disks, but for primary storage, they're likely to connect to the SAN or NAS equipment.
These storage systems (SANs or NASs) generally use the iSCSI protocol, which presents raw block storage over the network as block devices. The servers see them as just another type of storage device to create a file system on and use normally. This is a traditional architecture in many businesses. What's also common, especially for smaller businesses, is limited funding for backups or effective disaster recovery, prompting them to consider AWS as a solution to rising operational costs or as an alternative to maintaining their own data centers.
So how does Storage Gateway work? Volume Gateway works in two different modes: Cached mode and Stored mode. They are quite different and offer distinct advantages. First, let's look at Stored mode. In this mode, the virtual appliance presents volumes over iSCSI to servers running on-premises, functioning similarly to NAS or SAN hardware. These volumes appear just like those presented by NAS or SAN devices, allowing servers to create file systems on top of them as they normally would.
In Gateway Stored mode, these volumes consume local capacity. The Storage Gateway has local storage, which serves as the primary location for all the volumes it presents over iSCSI. This is a critical point for the exam—when you're using Storage Gateway in Volume Stored mode, everything is stored locally. All volumes presented to servers are stored on on-premises local storage.
In this mode, Storage Gateway also has a separate area called the upload buffer. Any data written to the local volumes is temporarily written to this buffer and then asynchronously copied into AWS via the Storage Gateway endpoint—a public endpoint accessible over a normal internet connection or a public VIF using Direct Connect. The data is copied into S3 in the form of EBS snapshots. Conceptually, these are snapshots of the on-premises volumes, occurring constantly in the background without human intervention. That's the architecture of Storage Gateway running in Volume Stored mode. Think about the architecture and what it enables, because this is what's important for the exam.
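Here's a minimal sketch of creating a stored-mode volume with boto3, assuming the gateway and its local disk already exist (the ARN, disk ID, and network interface are placeholders). The volume's primary data lives on that local disk, and backups are taken into S3 as EBS snapshots.

```python
import boto3

storagegateway = boto3.client("storagegateway")

# Hypothetical identifiers for an existing volume gateway in stored mode.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:111111111111:gateway/sgw-EXAMPLE"
LOCAL_DISK_ID = "pci-0000:03:00.0-scsi-0:0:0:0"   # a local disk on the appliance
GATEWAY_IP = "10.0.0.10"                          # appliance network interface

# Create a stored volume: the local disk is the primary copy,
# and the gateway backs it up to AWS as EBS snapshots.
response = storagegateway.create_stored_iscsi_volume(
    GatewayARN=GATEWAY_ARN,
    DiskId=LOCAL_DISK_ID,
    PreserveExistingData=False,
    TargetName="example-stored-volume",   # becomes part of the iSCSI target name
    NetworkInterfaceId=GATEWAY_IP,
)
print(response["TargetARN"])
```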
This mode is excellent for doing full disk backups of servers. You're using raw volumes on the on-premises side, and by asynchronously backing them up as EBS snapshots, you get a reliable full disk backup solution with strong RPO and RTO characteristics. Volume Gateway in Stored mode is also great for disaster recovery, since EBS snapshots can be used to create new EBS volumes. In theory, you could provision a full copy of an on-premises server in AWS using just these snapshots.
However—and this is important for the exam—this mode doesn't support extending your data center capacity. The primary location for data using this mode is on-premises. For every volume presented, there's a full copy of the data stored locally. If you're facing capacity issues, this mode won't help. But if you need low-latency data access, this mode is ideal, as the data resides locally. It also works well for full disk backups or disaster recovery scenarios.
I emphasize “full disk” here because in the next lessons, I’ll cover other Storage Gateway modes that also help with backups. Volume Gateway deals in volumes—raw disks presented over iSCSI. Some key facts worth knowing (though not required to memorize for the exam): in Volume Stored mode, you can have 32 volumes per gateway, with up to 16 TB per volume, for a total of 512 TB per gateway.
Now let’s turn to Volume Gateway in Cached mode, which suits different scenarios. Cached mode shares the same basic architecture: the Storage Gateway still runs as a virtual appliance (or physical in some cases), local servers are still presented with volumes via iSCSI, and the Gateway still communicates with AWS via the Storage Gateway endpoint, which remains a public endpoint using either internet or Direct Connect.
The major difference is the location of the primary data. In Cached mode, the main storage location is AWS—specifically S3—rather than on-premises. The Storage Gateway now only has local cache, while the primary data for all presented volumes resides in S3. This distinction is crucial: in Volume Stored mode, the data is stored locally; in Cached mode, it’s stored in AWS and only cached locally.
Importantly, when we say the data is in S3, it's actually in an AWS-managed area of S3, visible only through the Storage Gateway console. You can’t browse it in a regular S3 bucket because it stores raw block data, not files or objects. You can still create EBS snapshots from it, just like in Stored mode.
So the key difference between Stored and Cached modes is the location of the data. Stored mode keeps everything on-premises, using AWS only for backups. Cached mode stores data in S3, caching only the frequently accessed portions locally. This offers substantial architectural benefits: since only cached data is stored locally, you can manage hundreds of terabytes through the gateway while using only a small local cache. This enables an architecture called data center extension.
For example, imagine an on-premises facility with limited space and rising storage needs. Instead of investing in more hardware, the business can extend into AWS. Storage in AWS appears local, but it's actually hosted in the cloud. While Volume Stored and Cached modes are similar in using raw volumes and supporting EBS snapshots, only Cached mode enables extending data center capacity.
Stored mode is for backups, DR, and migration. It ensures local LAN-speed access, but requires full data storage locally. Cached mode allows AWS to act as primary storage, storing frequently accessed data locally, enabling cost-effective capacity extension while maintaining low-latency access for hot data. Less frequently accessed data may load more slowly, but it allows huge scalability. In Cached mode, a single gateway can handle up to 32 volumes at 32 TB each—up to 1 PB of data.
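For comparison, here's a minimal sketch of creating a cached-mode volume, again with placeholder identifiers. Note that you specify a volume size rather than a local disk, because the primary data lives in S3 and only hot data is cached locally.

```python
import boto3
import uuid

storagegateway = boto3.client("storagegateway")

# Hypothetical identifiers for an existing volume gateway in cached mode.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:111111111111:gateway/sgw-EXAMPLE"
GATEWAY_IP = "10.0.0.10"   # appliance network interface

# Create a 16 TiB cached volume: data is stored in S3, cached locally.
response = storagegateway.create_cached_iscsi_volume(
    GatewayARN=GATEWAY_ARN,
    VolumeSizeInBytes=16 * 1024**4,       # 16 TiB
    TargetName="example-cached-volume",
    NetworkInterfaceId=GATEWAY_IP,
    ClientToken=str(uuid.uuid4()),
)
print(response["TargetARN"])
```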
In summary, both modes work with volumes (raw block storage), but Stored mode stores everything locally and uses AWS only for backups, while Cached mode stores data in AWS and caches hot data locally, supporting data center extension. For the exam, if you see the keyword “volume” in a Storage Gateway question, you’re dealing with Volume mode. Deciding between Stored and Cached will depend on whether the scenario focuses on backup/DR/migration or on extending capacity.
That wraps up the theory for this lesson. In the next lesson, I’ll cover another mode of Storage Gateway: Tape mode, also known as VTL mode. Go ahead and complete this lesson, and when you’re ready, I look forward to having you join me in the next.
learn.cantrill.io
Welcome back and in this lesson I want to talk about the AWS Transit Gateway known as TGW. Now this is a product that I remember being super excited about because at the time there was a massive hole in the hybrid network capability on the AWS platform. Transit Gateway answered a lot of the complexity issues which plagued AWS in this space. Now understanding why the Transit Gateway is such a valuable product means focusing both on the features it provides as well as how it can help evolve network architectures and that's something I'll attempt to do in this lesson. Now we do have a lot to cover so let's jump in and get started.
The Transit Gateway is a network transit hub which connects VPCs to each other and to on-premises networks using site-to-site VPNs and Direct Connect. It's designed to reduce the complexity of network architectures within AWS. It's a network gateway object so another one that you need to remember and like other network gateway objects it's highly available and scalable. The architecture is that you create attachments which is how Transit Gateway connects to other network objects within AWS and it's how it connects to on-premises networks. Currently valid attachments include VPC attachments, site-to-site VPN attachments and Direct Connect gateway attachments.
Now to understand why Transit Gateway is required, it's useful to compare a moderately complex network architecture and see how Transit Gateway affects that complexity. So let's do that. Let's use Animals for Life as an example and let's assume that the Animals for Life network has evolved and they have four VPCs—A, B, C and D—as well as a corporate office. Now we want to connect these together so that every point has connectivity to every other point in a fully highly available way. We can use VPC peering connections to connect the four VPCs and these are already highly available and scalable. But as you know they don't support transitive routing, and so we need to create a full mesh between all VPCs. For four VPCs that's six peering connections, rather than the three a transitive design would need. That's already six connections that we need to plan, implement, manage and support, but that's not everything that we need for this architecture.
We need the corporate office to be connected to all of the VPCs and because VPN routing isn't transitive we need a full mesh here as well. This means a customer gateway on the customer side and VPN connections between the VPCs and that customer gateway. So that's a total of eight tunnels, two tunnels from each of the VPCs to the customer gateway of the Animals for Life corporate office. This ensures high availability at the AWS side. But remember we need this architecture to be fully, highly available and we currently have a single point of failure: the customer gateway on the customer side. While we have multiple availability zones at the AWS side all connecting back to the customer premises, they all do so via this single customer gateway. And so to make this architecture fully, highly available we need to add another customer gateway on the customer premises—ideally using a separate internet connection and ideally in a separate premises entirely—and then have that customer gateway connected back to all four of these VPCs. So that adds another eight tunnel connections.
Now I'm hoping at this point that you start to see the problem. A full mesh network is complex. It has a lot of admin overhead to implement and to maintain. And what's more, it scales really badly: the more networks are involved, the more new connections have to be added each time another network joins. So not only does the architecture get more complex, the rate at which complexity grows also increases the more that you scale. This is not a solution that we can use much beyond this point. If we add additional VPCs or additional customer premises, it gets really complex really quickly. And that's where the Transit Gateway becomes really useful.
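The scaling problem is easy to quantify: a full mesh of n networks needs n(n-1)/2 connections, whereas a Transit Gateway needs one attachment per network. A tiny sketch to illustrate:

```python
def full_mesh_connections(n: int) -> int:
    """Number of point-to-point links needed to fully mesh n networks."""
    return n * (n - 1) // 2

for n in (4, 5, 10, 20):
    # Full mesh grows quadratically; Transit Gateway attachments grow linearly.
    print(f"{n} networks: full mesh = {full_mesh_connections(n)}, TGW attachments = {n}")
# e.g. 4 networks: 6 links vs 4 attachments; 20 networks: 190 links vs 20 attachments
```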
Now using Transit Gateway, we start with the same basic architecture: four VPCs—A, B, C and D—and then the corporate premises with the two customer gateway routers. This time though, we have a Transit Gateway which we create within the Animals for Life AWS account. A Transit Gateway can use a site-to-site VPN attachment meaning it becomes the AWS side termination point for the VPN. So rather than having to have connections between the corporate office and every VPC terminated on a virtual private gateway, instead each customer gateway only has to be connected back to this single Transit Gateway. So we have the same level of high availability. We still have the four tunnels connected from different availability zones at the AWS side to different customer gateways at the on-premises side. So we don't lose any high availability with this solution, but we do reduce the number of VPN tunnels that are required.
With the Transit Gateway, you can also create attachments to VPCs and just like VPC interface endpoints that you used earlier in the course, you need to specify a subnet in each availability zone inside the VPCs that you want to use the Transit Gateway with. And when you do that, it acts as a highly available inter-VPC router. So one single Transit Gateway can route traffic between lots of different VPCs and this is a transitive routing capable device. So we only need attachments from the Transit Gateway to VPC A, B, C and D, and then all of the VPCs can talk to each other through the Transit Gateway. In addition to allowing full connectivity between all of the VPCs, because we have the VPN attachment, it means that all of the VPCs can communicate with the on-premises environment as well as the on-premises environment being able to communicate with all of the individual VPCs using this single network routing device—the Transit Gateway.
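Here's a minimal sketch of creating a Transit Gateway and attaching one VPC to it using boto3. The VPC and subnet IDs are placeholders; in practice you'd repeat the attachment for VPCs A, B, C and D.

```python
import time
import boto3

ec2 = boto3.client("ec2")

# Create the Transit Gateway with default route table association and
# propagation, so attached networks can route to each other without
# additional route table work.
tgw = ec2.create_transit_gateway(
    Description="Animals for Life transit hub",
    Options={
        "DefaultRouteTableAssociation": "enable",
        "DefaultRouteTablePropagation": "enable",
        "DnsSupport": "enable",
    },
)
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Wait for the gateway to become available before attaching anything to it.
while True:
    state = ec2.describe_transit_gateways(TransitGatewayIds=[tgw_id])[
        "TransitGateways"][0]["State"]
    if state == "available":
        break
    time.sleep(15)

# Hypothetical VPC and subnet IDs - one subnet per AZ the TGW should use.
attachment = ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0example1111111111",
    SubnetIds=["subnet-0exampleaaaaaaaaa", "subnet-0examplebbbbbbbbb"],
)
print(attachment["TransitGatewayVpcAttachment"]["TransitGatewayAttachmentId"])
```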
And in addition, you can also peer Transit Gateways with other Transit Gateways in other accounts in other regions. So you can use this to peer with networks that themselves are connected to the peer Transit Gateway and those can be in other AWS regions and even across accounts. And this is a really useful feature to create a global network within AWS. Any cross-region data uses the AWS global network and so benefits from the more predictable latency rather than using the public internet. And in addition, you can also attach a Transit Gateway to Direct Connect gateways if your business uses Direct Connect. And so this allows you to use the Transit Gateway as well with physical, private networking connections into your business premises. And I'll be covering this specific feature in a dedicated lesson elsewhere in the course.
Transit Gateways come with a default route table, which is how traffic is routed between attachments. But you can create a complex routing topology by using multiple route tables. So before we finish this theory lesson, just some important considerations that you should be aware of for the Transit Gateway. It does support transitive routing, which means that you don't need to create this full mesh topology. You can have a single Transit Gateway with multiple attachments and it will orchestrate the routing of traffic between any of those attachments as long as appropriate routing is in place. So you need route tables with routes on them in order for the Transit Gateway to route traffic between its different attachments. But assuming you do, then it's capable of transitive routing.
Transit Gateway can be used to create global networks. I've just talked about that. You can peer different Transit Gateways. And again, you need to be aware of this for the exam. You can share Transit Gateways between different AWS accounts using AWS RAM or Resource Access Manager. I've not covered that product yet in the course, but it's an AWS service which allows you to share products and services or components of products and services between different AWS accounts. Again, you can peer Transit Gateways with different regions in the same or cross accounts. So remember that one for the exam. It doesn't have any limitations in terms of region or accounts. You really can use it to create really complex, highly efficient routing topologies.
Now, overall, the thing to remember about the Transit Gateway is it offers much less complexity in terms of network architectures than without Transit Gateway. So instead of having to use multiple VPC peers and then a full mesh topology in terms of VPN connections, you can use this one single resilient, scalable, highly available device to perform transitive routing between all of your different networks. And that results in a massive reduction of network complexity.
Now, there's going to be a demo lesson in this section of the course where you'll get the opportunity to implement a Transit Gateway inside your AWS account. You'll get to experience using a Transit Gateway instead of VPC peering connections to link different VPCs together. But with that being said, that's all of the theory that I wanted to cover in this lesson. So go ahead, complete this video, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson I want to talk about how you can use Direct Connect, Public VIFs, and IPsec VPNs to provide end-to-end encrypted access to private VPC networks across Direct Connect. Let's just jump in and get started covering this really useful feature of Direct Connect and virtual private gateways.
Now, using a VPN gives you an encrypted and authenticated tunnel, and this is true whether you use the public internet or run a VPN over Direct Connect. By running a VPN over Direct Connect, though, you get that plus low latency and consistent latency. You get great performance together with great security.
Now architecturally, this uses a public VIF, and many students get confused by this because it provides access to a private VPC—so why not use a private VIF? Well, remember I said that a private VIF gives access to private IPs only. A public VIF gives access to public zones, meaning public IP addresses owned by AWS. Well, with a VPN, what you're connecting to are public IPs which belong to a virtual private gateway or transit gateway, and so to access these public IPs, you need a public VIF. When you're thinking about VIFs, focus on what it is that you're trying to access, and that should inform you whether you should use a public VIF or a private VIF.
Now a VPN is great because it's transit agnostic. You can connect using a VPN to a virtual private gateway or a transit gateway over the public internet or over a public VIF. The VPN configuration is the same—it's just the transit which differs: public internet or public VIF. A VPN is end-to-end encryption from a customer gateway through to a transit or virtual private gateway.
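The transit-agnostic point shows up in the API: whether the tunnels will run over the internet or over a public VIF, the VPN itself is created the same way. A minimal sketch with placeholder IDs, terminating the VPN on a Transit Gateway:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical IDs for an existing customer gateway and Transit Gateway.
CUSTOMER_GATEWAY_ID = "cgw-0example1111111111"
TRANSIT_GATEWAY_ID = "tgw-0example1111111111"

# Create the site-to-site VPN. Nothing here says whether the IPsec tunnels
# will transit the public internet or a Direct Connect public VIF; that is
# purely a question of how traffic reaches the VPN endpoints' public IPs.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=CUSTOMER_GATEWAY_ID,
    TransitGatewayId=TRANSIT_GATEWAY_ID,
    Options={"StaticRoutesOnly": False},  # use BGP for dynamic routing
)
print(vpn["VpnConnection"]["VpnConnectionId"])
```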
Compare this to MACsec, which I've covered in another lesson of the course, and which is single-hop, operating between the AWS DX router and whatever your cross-connect is terminated into. I say this so that you understand that a VPN and MACsec are not competitors—they do different things. A VPN provides an end-to-end encrypted tunnel between a customer gateway and an AWS virtual private gateway or transit gateway, whereas a MACsec connection encrypts a single hop between two devices on the same layer 2 link.
VPNs have wide vendor support. Getting a router which supports IPsec VPNs is easier than getting a switch which supports MACsec, although this will change over time. VPNs also have more cryptographic overhead than MACsec, meaning VPN speeds tend to be limited by the hardware that you use. MACsec is much faster and designed for terabit and above network speeds.
Now one very common pattern for using an IPsec VPN and Direct Connect together is that a VPN can be provided immediately in minutes using software, whereas Direct Connect takes time. This means you can start with a VPN, get your traffic flows working over that, and then add a Direct Connect later—either leaving the VPN in place to encrypt primary traffic over the Direct Connect, or using it as a backup to the Direct Connect, or both.
A common form of resilience in many of my clients is to have a normal internet connection; over this you also run an IPsec VPN into AWS. Then you have a Direct Connect also running another IPsec VPN tunnel. This way, traffic is always encrypted. The Direct Connect is the primary, so the organization benefits from great performance, and you have a backup in the form of a completely separate network connection and a completely separate IPsec tunnel.
Now visually, the architecture of using a VPN and Direct Connect looks like this. So this is a typical architecture: we have two AWS regions on the left—US East 1 at the top and AP Southeast 2 in the middle. We also have a Direct Connect location in AP Southeast 2 in the middle of your screen and a business premises on the right. When using a VPN with AWS—in this case, let's assume that we're going to use a virtual private gateway—what actually happens is that two VPN endpoints are created within the AWS public zone in that region, so one in each of two availability zones, and these endpoints have public addressing.
This means that you can either connect from the customer site to these endpoints using the public internet for transit, which means lots of hops as well as fairly variable latency, or you can create the IPsec VPN across the public VIF, which means you'll benefit from Direct Connect's low and consistent latency. In both cases, it's the same encrypted tunnel between the customer on-premises gateway and the AWS VPN endpoints. Using a Direct Connect simply means using a public VIF as the transit, so you'll benefit from the performance improvements that Direct Connect provides.
You can even connect to VPN gateways in other AWS regions using the same public VIF across the AWS global network, and this is often a great way to provide global encrypted transit between VPCs and your business network. What I want you to take away from this lesson is that IPsec running over a Direct Connect doesn't compete with MACsec. A VPN is for when you need end-to-end encryption of data between AWS and on-premises networks, using something which works equally well over the public internet and a public VIF. It also means that you can connect to VPCs in remote regions using the same grade of equipment.
Focus on understanding why we're using a public VIF with this architecture: because we're connecting from the customer gateway to a public IP or public IPs which are provided by the virtual private gateway or transit gateway. Because we're connecting to public IPs, we need to use a public VIF. With that being said, that's everything I wanted to cover about this topic. Go ahead and complete this video, and when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back. As an architect, you need to understand the impacts of failures at various points within the DX architecture and the way that it integrates with your on-premises networks. So let's jump in and get started because we've got a lot to cover.
Now before we start looking at how resilience works with Direct Connect, let's review an architecture which is not resilient. Now, worryingly, this is actually the way I've seen most people provision Direct Connect. A typical DX deployment has a number of physical components. First, the AWS region. Think of this as the actual infrastructure that AWS uses to deliver services in that region. Secondly, a Direct Connect or DX location, which is typically a data center in the geographic area that the region is in. Now there are often multiple DX locations in a single region and these are generally located within a major data center in a metro area. And the last typical component in most architectures is the customer premises. So your office buildings or self-managed data centers.
Now architecturally, I want you to think of the AWS region as a separate thing in terms of Direct Connect architecture. So an AWS region is connected to all of the DX locations in that region using multiple high-speed network connections. Now you can assume that this part is always highly available. So while the region and DX location are conceptually different, they're always connected with high performance, highly available networking. Now inside the DX location, which remember is a data center, are a collection of AWS DX routers and these conceptually are connected to the AWS region. So picture a DX router, which is the exit point of the AWS network.
When you order a DX connection within your AWS account, what you actually receive is a port on a DX router at a DX location. Now ideally, you'll also have equipment at this DX location and this is referred to as the customer DX router. If not, you'll need to purchase a service from a communications provider and use one of their routers, which is also going to be at the DX location. But in either case, there's another router inside the DX location, the customer or provider DX router. So when you order a DX connection, you're allocated a port on the AWS DX router and you need to arrange a connection between this port and a port on your customer DX router or provider DX router. And this process, this cable is called a cross connect and it's a single cable between both of these routers.
In addition to this, generally you'll also want to connect the DX location back to your company network. If you're a large company, you might have lots of capacity at the same data center that the DX location uses. But if not, you'll need to extend this back into your own on-premises network. Generally, you'll have a customer premises router and you'll pay a carrier or a telco provider to extend the direct connect from the customer or provider DX router all the way back through to your on-premises network. So to summarize, the AWS region is linked in a highly available way to one or more DX locations. The DX location houses the AWS DX routers in addition to your or a provider's DX router and you cross connect from the AWS DX router into your DX router with a physical cable. And then all of this is extended to your on-premises environment.
So what can go wrong with this architecture? Well, there are actually seven single points of failure on this diagram. There are more, but seven that I would be directly concerned about if I were implementing Direct Connect. The first is that the entire DX location could fail: power failures happen, and buildings can and do fail. The AWS DX router could fail, either in isolation or along with the building. The same goes for the cross connect: it's just a cable, and cables do fail. Your DX router could fail, and what if you don't have an on-site spare? Then there's the extension from the DX location to your on-premises environment. It's just a cable. It probably runs under roads or across above-ground cabling, and any engineering works are potentially a risk to the stability of that cable. You might also have a failure of your customer premises environment itself, or the customer premises router within that environment could have a hardware failure.
When people think about direct connect for some reason, they have a perception that it's a resilient product. It's actually not resilient in any way by default. It's a product which is based on lots of physical components which each depend on each other. But it's also a product that's designed to be flexible. So it can actually be made into a super resilient service. Let's look at that and now that you know about the architecture details, I can speed up and zoom out a little.
Now we can improve the resilience of the architecture by provisioning multiple DX ports. So we start with the same architecture at a high level. The AWS region on the left, the DX location in the middle and the customer premises on the right. Now there are actually multiple points of connection within each DX location. If you have multiple routers in the DX location, you can configure multiple cross-connects into multiple DX ports. And from there you can extend those into multiple customer premises routers using extensions.
So this architecture has two AWS DX routers, two customer DX routers and two customer premises routers. So when you're ordering direct connect, if you order two direct connects into the same DX location, then AWS will provision these onto separate DX routers and that gives you a level of resilience. Now this architecture adds significant benefits because we have the two independent DX ports, two DX routers and two customer routers. The architecture can tolerate a failure of a router in either of those two paths or a failure of one of the extensions and still continue operating.
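One practical way to sanity-check this is to inspect the awsDeviceV2 attribute of your connections: if both connections report the same device, they terminate on the same AWS DX router. A minimal sketch with placeholder connection IDs:

```python
import boto3

dx = boto3.client("directconnect")

# Hypothetical connection IDs for two DX connections at the same location.
CONNECTION_IDS = ["dxcon-exampleaa", "dxcon-examplebb"]

devices = {}
for connection_id in CONNECTION_IDS:
    conn = dx.describe_connections(connectionId=connection_id)["connections"][0]
    # awsDeviceV2 identifies the physical AWS endpoint the port lives on.
    devices[connection_id] = conn["awsDeviceV2"]

print(devices)
if len(set(devices.values())) == 1:
    print("Warning: both connections terminate on the same AWS device.")
```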
The design does have some problems though. Some single points of failure do still exist. If the DX location itself fails, then connectivity is lost and since all connectivity goes via a single customer location, if that single customer location fails, then connectivity to the AWS platform is also lost. So we still have two fairly major single points of failure, the two locations that are involved in this private networking. We also have a potentially hidden single point of failure with this architecture and this occurs because the extensions between the DX location and the single customer premises location could in theory travel via the same physical cable route. So if you order multiple connections between two physical locations, then generally the cable route for both of those connections could be the same. This isn't always the case but it's something to be very aware of.
I've seen many businesses build this type of architecture. They assume a high level of resilience only to find that both of their independent cables enter their building under the same sidewalk and both of those cables were broken by roadworks occurring on that same sidewalk. So to prevent this, we need another step of evolution in terms of resilience. We can't have any single points of failure if the private networking is important to our organization. So let's look at a better version of this architecture, another evolution in terms of resilience.
So at a high level, we still have the AWS region on the left but now we have two independent customer premises on the right. So two completely different buildings, ideally two buildings that are spread out geographically. In addition, we have two different DX locations. In each DX location, we're going to provision one DX port that's cross-connected into one customer DX router and each of these is going to be extended into a different customer router in a different customer premises.
If we architect the solution this way, it offers much better resilience than the previous architecture. We've got two different DX locations, meaning the architecture can tolerate the failure of one of these locations and still continue to operate. Because there are two customer premises and two customer routers, it can also tolerate the failure of hardware and location at the customer side. The only risk of outage is if an entire location fails and then the hardware in the remaining path also fails. So with this architecture, if we had hardware failure in DX location one, the solution would continue to run. If the entire DX location one failed, the solution would continue to run. If both DX location one and the customer premises one failed, the solution would still provide connectivity. Only if we had all of that fail and then in addition, if we had hardware failure in DX location two or customer premises two, would the connectivity be interrupted.
But we can take this one step further and implement an architecture designed for extreme levels of resilience. This next design offers maximum resilience. We still have the AWS region on the left, the two DX locations in the middle and the two customer premises environments on the right. However, this time in each of the DX locations, we have two DX ports on separate equipment. And then in each location, we also have two customer DX routers. This provides a level of resilience within each location covering hardware failure and resilience against the failure of an entire location.
So at each location, we have a pair of cross-connects between the AWS DX router and the customer DX router. And then these are all extended to dual customer routers at each of the customer premises locations. So this architecture gives us extreme levels of resilience. We have high availability from a direct connect location perspective. One can fail and the connectivity is still active because we have infrastructure in both. Inside each DX location, we have a pair of both AWS DX routers and customer DX routers. So even if one DX location fails, we could still lose one pair of those in the remaining DX location and still the connectivity would be fine.
The extensions from the DX location to the customer premises because they're going between two completely different locations at both sides will generally follow two completely separate routes. So this is because the starting and the end points of those connections are to different physical buildings, which means the extensions themselves and the customer premises are going to be highly available. And inside each of the customer premises because we're using multiple customer routers, these are also highly available.
Now this is relatively complex network architecture, and the thing that I want you to take away from this lesson, from an exam perspective and if you intend to use Direct Connect in real-world projects, is that Direct Connect is a physical technology, and so it is not resilient in any way unless you architect it that way. So if you just provision a single Direct Connect, it will be a single port on a single DX router at a single DX location. You'll cross connect it to a single customer DX router and extend it to a single customer premises with a single customer router. Each step of that process will be a single point of failure.
If you want something that is highly available and you need to use Direct Connect, then you need to consider one of the more resilient solutions that I've demonstrated in this lesson. Now, you can also use site-to-site VPN as a backup for Direct Connect, and I've explained exactly how this works in another lesson in this section of the course. But generally, if your design does need to rely on Direct Connect alone, then you should definitely review these highly available architectures rather than provisioning a simple single Direct Connect. With that being said, that's all of the theory and the architecture that I wanted to talk about in this lesson, so go ahead, complete the lesson, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back. In this lesson, I want to talk about AWS Direct Connect. A Direct Connect (DX) is a physical connection into an AWS region. If you order this via AWS, the connection is either 1 gig, 10 gig, or 100 gig at the time of creating this lesson. There are other ways to provision slower speeds, but I'll be covering those in a dedicated lesson later in this section of the course. The connection is between a business premises, a Direct Connect (DX) location, and finally an AWS region. I’ll show this architecture visually on the next screen.
Conceptually, think of three different physical locations: your business premises, where you have a customer premises router; a DX location, where you also have other equipment such as a DX router and maybe some servers; and finally an AWS region, such as US East 1. When you order a DX connection, what you're actually ordering is a network port at the DX location. AWS provides a port allocation and authorizes you to connect to that port, which I’ll detail soon. However, a Direct Connect ordered directly from AWS doesn’t actually provide a connection of any kind—it’s just a physical port. It’s up to you to connect to this directly or arrange the connection to be extended via a third-party communications provider.
The port has two costs: an hourly cost based on the DX location and the speed of the port, and a charge for outbound data transfer. Inbound data transfer is free of charge. There are a couple of important things to keep in mind about Direct Connect. First is the provisioning time—AWS will take time to allocate a port, and once allocated, you’ll need to arrange the connection into that port at the DX location. If you haven’t already connected the DX location to your business network, you might be looking at weeks or months of extra time for the physical laying of cables between the DX location and your business premises. Keep that in mind.
Since it’s a physical cable, there’s no built-in resilience—if the cable is cut, it’s cut. You can design in resilience by using multiple Direct Connects, but that’s something you have to layer on top. Direct Connect provides low latency because data isn’t transiting across the public internet like with a VPN. It also provides consistent latency, as you’re using a single physical cable at best or a small number of private networking links at worst. If you need low and consistent latency for an application, Direct Connect is the way to go. In addition, it’s also the best way to achieve the highest speeds for hybrid networking within AWS. As mentioned, it can be provisioned with 1, 10, or 100 gigabit speeds, and since it’s a dedicated port, you’re very likely to achieve the maximum possible speed.
Compare that to an IPsec VPN, which uses encryption and therefore incurs processing overhead while transiting over the public internet. Direct Connect will give you higher, more consistent speeds. Lastly, Direct Connect can be used to access both AWS private services running in a VPC and AWS public services. However, it cannot be used to access the public internet unless you add a proxy or another networking appliance to handle that for you.
Visually, the architecture of Direct Connect starts on the right with your business premises, where you'll have some kind of customer premises router or firewall. This might be the same router connected to your internet connection or a new, dedicated DX-capable router, which I’ll explain more about in an upcoming lesson. Additionally, you’ll have some staff, in this case, Bob and Julie. In the middle, we have a DX location. This is often confusing, as it’s not a location actually owned by AWS—it’s not an AWS building. It’s usually a large regional data center where AWS rents space, and your business might also rent space alongside other businesses.
Inside this DX location is an AWS cage—an area owned by AWS containing one or more DX routers known as AWS DX routers, which are the endpoints of the Direct Connect service. You might also rent space in this DX location, known as the customer cage. If you’re a large organization, you might rent this space directly, housing some of your infrastructure and a router known as the customer DX router. If you’re a smaller organization, this cage might belong to a communications partner—this is called the comms partner cage. If you don’t have space in a DX location, the communications partner does and can extend connections from this DX location to your business premises.
The key thing to understand about Direct Connect is that it's a port allocation. When you order a Direct Connect from AWS to a specific DX location, you’re allocated a DX port. This must be physically connected using a fiber optic cable to another port in the DX location—either your router in your cage or a communications partner’s router in the same DX location. In either case, you’ll have a corresponding port within the DX location, whether on your own equipment or that of a comms provider. Between these two ports, you’ll need to order a cross connect.
The cross connect is a physical connection between the AWS DX port in the AWS cage and your or your provider’s port within the DX location. This concept is crucial, whether you have equipment in the DX location or purchase access through a communications partner. From the partner, you'll be allocated a port within the DX location, and it is to this port that the cross connect is linked. This is the cable that connects the AWS DX port to your router or a communications partner’s router. If you're using a communications partner, this link can then be extended to your customer premises. But in all cases, you must have a port within either a customer cage or comms partner cage at the DX location to establish a cross connect with AWS’s DX port.
On the left side, we have an AWS region—such as AP Southeast 2—with a VPC containing a private subnet and services. We also have the AWS public zone and example services such as SQS, Elastic IP addresses, and S3. The AWS region is AWS-owned infrastructure, which may or may not be in the same facility as the DX location but is always connected with multiple high-speed resilient network connections. Conceptually, you can think of the region as always being connected to one or more local DX locations.
That’s the physical architecture, and I’ll go into more detail in upcoming lessons elsewhere in the course. Logically, we configure virtual interfaces—called VIFs—over this single physical connection. There are three types of VIFs. First are transit VIFs, which have specific use cases that I’ll explain in detail later. Second are public VIFs, used to access AWS public space services. A public VIF runs over the full Direct Connect path—from your customer router to your DX router, then into the AWS DX router, and finally into the public AWS region. Third are private VIFs, which also run over Direct Connect but connect into virtual private gateways attached to a VPC, giving you access to private AWS services.
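To make that flow a little more concrete, here's a minimal sketch using boto3. The DX location code, names, VLAN and gateway ID are placeholder assumptions, not values from this lesson: it orders the dedicated port allocation and then, once the connection is available and cross connected, layers a private VIF over it so traffic can reach a VPC via its virtual private gateway.

import boto3

dx = boto3.client("directconnect", region_name="ap-southeast-2")

# Step 1 - order the port allocation at a DX location. This is just the port;
# the cross connect and any extension to your premises are arranged separately.
connection = dx.create_connection(
    location="SYD1",                 # placeholder DX location code
    bandwidth="1Gbps",               # dedicated ports: 1Gbps, 10Gbps or 100Gbps
    connectionName="a4l-dx-primary",
)

# Step 2 - once the connection is available, layer a private VIF over it so
# traffic can reach a VPC via an attached virtual private gateway.
vif = dx.create_private_virtual_interface(
    connectionId=connection["connectionId"],
    newPrivateVirtualInterface={
        "virtualInterfaceName": "a4l-private-vif",
        "vlan": 101,                                  # VLAN agreed with the provider
        "asn": 65000,                                 # customer-side BGP ASN (private range)
        "virtualGatewayId": "vgw-0123456789abcdef0",  # placeholder VGW ID
    },
)
print(vif["virtualInterfaceState"])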
That’s everything I wanted to cover in this lesson. Go ahead and complete it, and when you're ready, I look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I'm going to be stepping through the architecture of AWS site-to-site VPN. For the exam and if you're designing solutions for real-world production usage, understanding VPNs and how they can be used within AWS is essential. They offer the quickest way to create a network link between an AWS environment and something that's not AWS. And this might be on-premises, another cloud environment or a data centre.
Now we've got a lot to cover so let's jump in and get started. A site-to-site VPN is a logical connection between a VPC or virtual private cloud and an on-premises network. And this connection is encrypted in transit using IPsec, which is important because it runs over the public internet in most cases. Now there's a common exception to this when you run a VPN over the top of a direct connect and we'll be covering this later in this section. But assume, unless written otherwise, that a VPN is running over the public internet.
Now a site-to-site VPN can be fully highly available, assuming that you design and implement it correctly. This is really important to understand for the exam and real-world usage, so I'm going to be covering it as a priority in this lesson. There are a few points of failure within the VPN architecture, and you need to understand them all.
Site-to-site VPNs are also quick to provision. Assuming you understand all the steps and have all the skills, you can have a site-to-site VPN up and running in less than an hour. And try and remember this because it's in contrast to the long provisioning times for physical connections like direct connect, which we'll talk about later in this section.
Now there are a few components involved in creating a VPN connection that you need to be aware of. First, and this goes without saying, the VPC. VPNs connect VPCs and private on-premises networks, so it's logical that the VPC is one important building block of the wider connection architecture. Second, we've got the virtual private gateway, or VGW, which is another type of logical gateway object. It's something that you create and associate with a single VPC, and it can be the target on one or more route tables.
Next, we've got the customer gateway or CGW and this can actually refer to two different things. It's often used to refer to both the logical piece of configuration within AWS and the thing that that configuration represents, a physical on-premises router which the VPN connects to. So when you see CGW mentioned, it's either the logical configuration in AWS or it's the physical device that this logical configuration represents. And then the last component is the VPN connection itself which stores the configuration and it's linked to one virtual private gateway and one customer gateway. So this is how we create the network virtual connection between these two locations.
Now, later in this section, I'm going to be showing you how to create a VPN connection, but for now, I want to focus on the architecture and the theory. So let's look visually at how VPNs are architected. First, I want to cover a simple implementation of a site-to-site VPN so that you're comfortable with the architecture.
So on the left, we have the Animals for Life VPC. It's a simplified version with three private subnets in availability zone A, B and C. Next, we've got the AWS Public Zone where AWS public services operate from. The public internet directly connected to that and then finally on the right, the Animals for Life corporate office using an IP range of 192.168.10.0/24.
Now, step one for creating a VPN connection is to gather all of the required information. We need the IP address range of the VPC that will be connecting to the on-premises network and we'll also need the IP range of the on-premises network itself and the IP address of the physical router on the customer premises. Once we have all of this information, then we can create a virtual private gateway and attach it to the Animals for Life VPC. The virtual private gateway is a logical gateway object within AWS and so it can be the target of routes just like any other gateway object.
Now, within our on-premises environment, we're going to have a customer premises router and this will have an external IP address. And for this router, we create a customer gateway object within AWS. This is a logical configuration entity that represents this physical device on our customer premises. In this case, we need to define its public IP address so that the logical CGW entity matches the physical router.
Behind the scenes, the virtual private gateway is actually a highly available gateway object, just like the internet gateways you configured earlier in the course, where all you had to do was create the gateway and associate it with a VPC. A virtual private gateway actually has physical endpoints. These are devices in different availability zones, each with public IP version 4 addresses, and this means that the virtual private gateway is fully highly available by design. An availability zone can fail, and if it affects one of the physical endpoints, the other will still function. But that doesn't mean that the whole thing is highly available, and I'll detail why during the remainder of this lesson.
The next step is that we need to create a VPN connection inside AWS and there are different types of VPN connection. There are static and dynamic VPNs. For now, we're going to create a static one and I'll be explaining the differences between the two later in this lesson. Now, when creating a VPN connection, you need to link it to a virtual private gateway and this means that it can use the endpoints which that virtual private gateway provides. You also need to specify a customer gateway to use and when you do, two VPN tunnels are created. One between each endpoint and the physical on-premises router.
A VPN tunnel is an encrypted channel through which data can flow between the VPC and the on-premises network or vice versa. As long as at least one of these VPN tunnels is active, the two networks are connected. So in this particular case, we've got one VPN connection that's using the two endpoints of the virtual private gateway, and both of those are connected back to our one customer gateway. So we have a partially highly available design. If one of the AZs on the AWS side fails, then the other endpoint will continue functioning, so at least one of these tunnels will be active.
Now, because this is a static VPN, it means that we have to statically configure the VPN connection with IP addressing information. So we have to tell the AWS side about the network range that's in use within the on-premises network and we have to configure the on-premises side so that it knows the IP address range that the AWS side uses. And this means that traffic can flow from the VPC via the VPC router through the virtual private gateway over the tunnels to the on-premises network and back again.
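As a rough illustration of how these pieces fit together, here's a minimal boto3 sketch of the static configuration just described. The VPC ID, router IP and CIDR are placeholder assumptions: it creates and attaches the virtual private gateway, creates the customer gateway object representing the on-premises router, creates a static VPN connection, and adds the on-premises network as a static route on that connection.

import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")

# Virtual private gateway - create it and attach it to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(VpcId="vpc-0123456789abcdef0",
                       VpnGatewayId=vgw["VpnGatewayId"])

# Customer gateway - the logical representation of the physical on-premises router.
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.10",   # placeholder public IP of the customer router
    BgpAsn=65000,              # required field, not used by a static VPN
)["CustomerGateway"]

# Static VPN connection - AWS creates two tunnels, one per VGW endpoint.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    VpnGatewayId=vgw["VpnGatewayId"],
    CustomerGatewayId=cgw["CustomerGatewayId"],
    Options={"StaticRoutesOnly": True},
)["VpnConnection"]

# Tell the AWS side which network exists on-premises.
ec2.create_vpn_connection_route(
    VpnConnectionId=vpn["VpnConnectionId"],
    DestinationCidrBlock="192.168.10.0/24",
)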
Now, as I just mentioned, this design from an overall perspective is actually not fully highly available, because there is still one single point of failure, and that single point of failure is the customer on-premises router. If this fails, then the whole VPN connection fails. Even though the AWS side is highly available, all of the connections currently terminate into the single customer on-premises router. At the AWS side, there are two tunnels to separate hardware in separate availability zones, but this doesn't matter if the customer side fails, because all of the tunnels terminate into the same single point of failure. This is known as partial high availability: it's highly available on the AWS side, but suffers from a single point of failure on the customer side. So it's not a fully highly available solution.
It's actually pretty simple to resolve this and modify the design so that it is fully highly available, so let's look at that next. Moving to a fully highly available solution means adding another on-premises customer router using a second internet connection, and ideally doing all of this in a separate building. Once you have this second resilient connectivity method, then you can create an additional VPN connection at the AWS side.
Now, behind the scenes, this actually creates two more physical endpoints which are managed by the virtual private gateway. And each of those has their own public IP addressing. So this new VPN connection, it would be linked to the same virtual private gateway at the AWS side, but to the new customer gateway. So it would establish another pair of VPN tunnels between the two new endpoints and that additional customer gateway.
Now, this means architecturally, we're in the situation where the virtual private gateway is highly available as a logical entity. It's using endpoints which are located in different availability zones, and so it can withstand physical failure. But now, in addition, at the customer side we're using multiple pieces of hardware with multiple internet connections, ideally in separate buildings, and that means this is a fully highly available solution. It's got two VPN tunnels connecting each customer premises router to two virtual private gateway endpoints in separate availability zones, and that configuration is repeated again with a second customer gateway. So either of the customer gateways can fail, either of the availability zones can fail, and still the VPC and the on-premises network will have connectivity.
Now, before we finish, I just want to talk about the differences between static and dynamic VPNs. Now, a dynamic VPN uses a protocol called BGP known as the border gateway protocol. And that's important because if your customer router doesn't support BGP, then you can't use dynamic VPNs. So conceptually, this is a traditional VPN architecture. A VPC subnet on the left, a VPC router middle left, a virtual private gateway middle right, and then an on-premises environment with a customer router on the right. This architecture is the same based on both types of VPNs, so static VPNs and dynamic VPNs. At a high level, the difference is how routes are communicated.
So a static VPN uses static networking configuration. Static routes are added to the route tables, and static networks have to be identified on the VPN connection. The benefit of using a static VPN is that it's simple: it just uses IPsec, and because of that, it works almost anywhere with any combination of routers. However, you are really restricted on things like load balancing and multi-connection failover. So if you need any advanced high availability, if you need to use multiple connections, or if the VPNs need to work with Direct Connect, which we'll talk about later in this section, then you really do need to be using dynamic VPNs.
With dynamic VPNs, as I just mentioned, you're using a protocol called BGP or the Border Gateway Protocol. Now this is a protocol which lets routers exchange networking information. And so if you have a dynamic VPN, you're creating a relationship between the virtual private gateway and the customer router. Over this relationship, they can both exchange information on which networks are at the AWS side and which are at the customer side. In addition, they can communicate the state of links and adjust routing on the fly. And that allows for multiple links to be used at once between the same locations. So this is why dynamic VPNs are able to use really high end, highly available VPN architectures because they can communicate the state of the links between the virtual private gateway and the customer gateway.
Now with dynamic VPNs, routes can still be added to the route tables statically. Or you can make the entire solution fully dynamic by enabling a feature called route propagation on the route tables in the VPC. When you enable route propagation, it means that while any VPNs are active, any networks those VPNs become aware of, so the on-premises networks in this example, are automatically added as dynamically learned routes on the route tables. So instead of having to statically enter routes on each and every route table, if you enable route propagation, the route tables will learn these routes from the virtual private gateway whenever any VPNs are active. Any routes which a virtual private gateway learns using BGP can be automatically and dynamically added to route tables, on a per route table basis, if you enable route propagation.
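A minimal sketch of that last piece, assuming placeholder IDs: enabling route propagation on a route table so it automatically picks up any routes the virtual private gateway learns over BGP.

import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")
ec2.enable_vgw_route_propagation(
    RouteTableId="rtb-0123456789abcdef0",
    GatewayId="vgw-0123456789abcdef0",
)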
Now whichever method you decide on, there are a number of key considerations that you need to be aware of. First, there is a speed cap for VPNs. A single VPN connection with two tunnels has a maximum throughput of 1.25 gigabits per second. Now this is an AWS limit. You would also need to check the speed supported by your customer router, because VPNs use encryption, there's a processing overhead on encrypting and decrypting data, and at high speeds this overhead can be significant. The speed limit is something that you should remember for the exam, because it's often something which makes selecting between VPNs and something else significantly easier. So if you need more than 1.25 gigabits per second, then you can't use VPNs.
Now you also need to be aware that there is a cap for the virtual private gateway as a whole, so for all VPN connections connecting to that virtual private gateway, and that's also 1.25 gigabits per second. Another consideration for VPNs is latency. The VPN connection transits over the public internet, and depending on the quality of your internet connection, there may be many hops between you and the AWS VPN endpoints. Each hop adds latency and variability. So if these matter as a priority, maybe because you're running an application which is really latency sensitive, you might want to look at something else like Direct Connect, which we'll be covering later in this section. Again, latency is often a selection criterion to pick between VPNs and something else in the exam.
Now in terms of costs, VPNs have an hourly cost to operate. There's a data transfer charge to transfer data out. And because you're using the internet to transit data, your on-premises internet connection is also going to be used by the VPN. So if you have any data caps on your internet connection, this is something to keep in mind, especially if you're going to be using a VPN to transfer lots of data.
Now one of the benefits of VPNs, and this is important for the exam, is that they are very quick to set up, sometimes taking hours or less, because it's all software defined, and IPsec is supported on pretty much all hardware at this point, even consumer grade routers. It is worth keeping in mind though that to use dynamic VPNs you will need BGP support, which is much less common. But when comparing VPNs to anything else, VPNs are almost always quicker to set up versus other private connection technologies.
VPNs can also be used as a backup for a physical connection such as Direct Connect. Rather than needing to provision two physical connections for true high availability, you can use one physical connection as the primary and use VPNs as the secondary. VPNs can also be used alongside physical technologies such as Direct Connect. You can use a VPN at the start, because they're quick to set up and provision, and then lodge a request to provision a Direct Connect which is added later. A Direct Connect can take much longer to provision, sometimes weeks or months, but you can use a VPN to set up that initial connectivity and then either replace it further down the line as the Direct Connect comes online, or run both of them together for high availability.
Now VPNs can also be used over the top of Direct Connect to add a layer of encryption, but I'll be covering that in the Direct Connect lesson. For now though that's all of the theory that I wanted to cover, so go ahead complete the video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover IPsec fundamentals. So I want to talk about what IPsec is, why it matters, and how IPsec works at a fundamental level. Now we have a lot of theory to cover, so let's jump in and get started. At a foundational level, IPsec is a group of protocols which work together. Their aim is to set up secure networking tunnels across insecure networks, for example, connecting two secure networks, or more specifically their routers (called peers), across the public internet. You might use this if you're a business with multiple sites spread around geographically and want to connect them together, or if you have infrastructure in AWS or another cloud platform and want to connect to that infrastructure.
IPsec provides authentication so that only peers which are known to each other and can authenticate with each other can connect. Any traffic which is carried by the IPsec protocols is encrypted, which means that to onlookers, the data being carried is ciphertext: it can't be viewed and it can't be altered without being detected. Architecturally, it looks like this: we have the public internet, which is an insecure network full of goblins looking to steal your data. Over this insecure network, we create IPsec tunnels between peers. These tunnels exist as they're required. Within IPsec VPNs there's the concept of "interesting traffic", which is simply traffic that matches certain rules. These could be based on network prefixes or match more complex traffic types. Regardless of the rules, if data matches any of those rules, it's classified as interesting traffic, and a VPN tunnel is created to carry that traffic through to its destination. If there's no interesting traffic, then tunnels are eventually torn down, only to be reestablished when the system next detects interesting traffic. The key thing to understand is that even though those tunnels use the public internet for transit, any data within the tunnels is encrypted while transiting over that insecure network, so it's protected.
Now, to understand the nuance of what IPsec does, we need to refresh a few key pieces of knowledge. In my fundamentals section, I talked about the different types of encryption. I mentioned symmetric and asymmetric encryption. Symmetric encryption is fast, generally really easy to perform on any modern CPU, and it has pretty low overhead. But exchanging keys is a challenge—since the same keys are used to encrypt and decrypt, how can you get the key from one entity to another securely? Do you transmit it in advance over a different medium, or do you encrypt it? If so, you run into a catch-22: how do you securely transmit the encrypted key? That's why asymmetric encryption is really valuable. It's slower, so we don't want to be using it all the time, but it makes exchanging keys really simple because different keys are used for encryption and decryption. A public key is used to encrypt data, and only the corresponding private key can decrypt that data. This means that you can safely exchange the public key while keeping the private key private. So the aim of most protocols which handle the encryption of data over the internet is to start with asymmetric encryption, use this to securely exchange symmetric keys, and then use those for ongoing encryption. I mention that because it will help you understand exactly how IPsec VPN works.
So let's go through it. IPsec has two main phases. If you work with VPNs, you're going to hear a lot of talk about phase one and phase two, and it'll make sense why these are needed by the end of this lesson. Understand that there are two phases in setting up a given VPN connection. The first is known as IKE Phase One. IKE, or Internet Key Exchange, as the name suggests, is a protocol for how keys are exchanged, in this context within a VPN. There are two versions: IKE version one and IKE version two. Version one is older; version two is newer and comes with more features. You don't need to know all the details right now, just understand that the protocol is about exchanging keys. IKE Phase One is the slow and heavy part of the process. It's where you initially authenticate using a pre-shared key (a password of sorts) or a certificate. It's where asymmetric encryption is used to agree on, create, and share the symmetric keys used in Phase Two. The end of this phase is what's known as an IKE Phase One tunnel, or a Security Association (SA). There's a lot of jargon being thrown around, and I'll be showing you how this all works visually in just a moment. But at the end of Phase One, you have a Phase One tunnel, and the heavy work of moving toward symmetric keys which can be used for encryption has been completed.
The next step is IKE Phase Two, which is faster and much more agile because much of the heavy lifting has been done in Phase One. Technically, the Phase One keys are used as a starting point for Phase Two. Phase Two is built on top of Phase One and is concerned with agreeing on the encryption methods and the keys used for the bulk transfer of data. The end result is an IPsec Security Association, a Phase Two tunnel which runs over Phase One. The phases are split because it's possible for Phase One to be established, then a Phase Two tunnel created, used, and torn down when no more interesting traffic occurs, while the Phase One tunnel stays. This means establishing a new Phase Two tunnel is much faster and less work. It's an elegant and well-designed architecture.
Let's look at how this all works together visually. This is IKE Phase One. The architecture is simple: two business sites, Site One on the left with the user Bob, and Site Two on the right with the user Julie, with the public internet in the middle. The very first step of this process is that the routers, the two peers at either side of this architecture, need to authenticate, essentially prove their identity, and this is done either using certificates or pre-shared keys. It's important to understand that this isn't yet about encryption; it's about proving identity, proving that both sides agree that the other should be part of this VPN. No keys are exchanged; it's just about identity. Once the identity has been confirmed, we move on to the next stage of IKE Phase One.
In this stage, we use a process called Diffie-Hellman key exchange. Again, sorry about the jargon, but try your best to remember Diffie-Hellman, known as DH. Each side creates a Diffie-Hellman private key. This key is used to decrypt data and to sign things, and you should remember this from the encryption fundamentals lesson. Each side also derives a corresponding public key from its private key. The public key can be used to encrypt data that only the private key can decrypt. These public keys are exchanged, so Bob has Julie's public key, and Julie has Bob's. The public keys are not sensitive and can only be used to encrypt data for decryption by the corresponding private key. Then comes the mathematically complex part: each side uses its own private key and the other side's public key to derive a shared Diffie-Hellman key. This key is the same on both sides, even though it's been independently generated. It's used to exchange other key material and agreements, essentially a negotiation. Each side then independently uses this DH key, plus the exchanged material, to generate a final Phase One symmetric key, which is used to encrypt anything passing through the Phase One tunnel, known as the IKE Security Association.
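To make the key exchange less abstract, here's a minimal illustrative sketch using Python's third-party cryptography package. This isn't IKE itself, just the underlying Diffie-Hellman idea: both sides derive the same shared secret from their own private key and the other side's public key, and then derive a symmetric key from it.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import dh
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Generating DH parameters is slow; real systems use well-known groups.
params = dh.generate_parameters(generator=2, key_size=2048)

bob_private = params.generate_private_key()      # stays at Site One
julie_private = params.generate_private_key()    # stays at Site Two

# Only the public keys cross the insecure network.
bob_shared = bob_private.exchange(julie_private.public_key())
julie_shared = julie_private.exchange(bob_private.public_key())
assert bob_shared == julie_shared   # same secret, never transmitted

# Each side independently derives a symmetric key from the shared secret.
phase1_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                  info=b"ike-phase-1").derive(bob_shared)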
Now, that Diffie-Hellman process is slow and heavy, both complex and elegantly simple at the same time, but it means both sides have the same symmetric key without ever directly passing it between them. The phase ends with this Security Association in place, and it can now be used in Phase Two. In Phase Two, we now have a few components: the DH key on both sides, the Phase One symmetric key on both sides, and the established Phase One tunnel. In this phase, both peers want to agree on how the VPN will be constructed. Phase One was about allowing this: exchanging keys and allowing the peers to communicate. IKE Phase Two is about getting the VPN running and being ready to encrypt data, agreeing on how, when, and what. The symmetric key is used to encrypt and decrypt these agreements and to pass more key material between the peers. One peer informs the other about the cipher suites it supports, the encryption methods it can perform. The other peer picks the best shared one and lets the first know, and this becomes the agreed method of communication.
Then, using the DH key and the exchanged key material, both peers create a new symmetric IPsec key. This key is designed for large-scale data transfer. It’s an efficient and secure algorithm, and the specific one is based on the earlier negotiation. This IPsec key is used to encrypt and decrypt the "interesting traffic" across the VPN tunnel. Across each Phase One tunnel, there is actually a pair of Security Associations—one from right to left and one from left to right. These are used to transfer data between networks at either side of a VPN.
Now, there are two types of VPNs you need to understand: policy-based VPNs and route-based VPNs. The difference lies in how they match interesting traffic—the traffic sent over a VPN. In policy-based VPNs, rules match traffic, and based on the rule, traffic is sent over a pair of Security Associations—one for each direction. This allows different rules for different types of traffic, which is great for rigorous security environments. In contrast, route-based VPNs match traffic based on prefix—for example, "send traffic for 192.168.0.0/24 over this VPN." This uses a single pair of Security Associations per network prefix, meaning all traffic types between those networks use the same SA pair. It’s less functional but simpler to set up.
To illustrate the differences between route-based and policy-based VPNs, let's look visually at the architectures. In a simple route-based VPN, a Phase One tunnel is established using a Phase One key. Assuming we use a route-based VPN, a single pair of Security Associations is created—one in each direction—using a single IPsec key. This gives us essentially a single Phase Two tunnel running over the Phase One tunnel. That Phase Two or IPsec tunnel (the pair of SAs) can be dropped when there’s no interesting traffic and recreated again on top of the same Phase One tunnel when new traffic is detected. The key point is there’s one Phase One tunnel and one Phase Two tunnel based on routes.
Running a policy-based VPN is different. We still have the same Phase One tunnel, but over the top of this, each policy match uses an SA pair with a unique IPsec key. This allows us to have, for the same network, different security settings for different types of traffic—for example, infrastructure at the top, CCTV in the middle, and financial systems at the bottom. Policy-based VPNs are more difficult to configure but offer greater flexibility for using different security settings for different traffic types.
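Here's a purely illustrative Python sketch of that difference (not any vendor's real configuration syntax, and all names are assumptions): route-based matching keys everything off the destination prefix and shares one SA pair, while policy-based matching can map different traffic types between the same networks onto different SA pairs with different keys and settings.

import ipaddress

REMOTE_NETWORK = ipaddress.ip_network("192.168.10.0/24")   # assumed remote prefix

def route_based_sa(dst_ip):
    # One SA pair per prefix - all traffic types between the networks share it.
    if ipaddress.ip_address(dst_ip) in REMOTE_NETWORK:
        return "sa-pair-default"
    return None   # not interesting traffic, so it isn't tunnelled

def policy_based_sa(dst_ip, traffic_type):
    # Policy rules can pick different SA pairs (and keys) for different traffic.
    if ipaddress.ip_address(dst_ip) not in REMOTE_NETWORK:
        return None
    return {"cctv": "sa-pair-cctv",
            "finance": "sa-pair-finance"}.get(traffic_type, "sa-pair-infrastructure")

print(route_based_sa("192.168.10.50"))              # sa-pair-default
print(policy_based_sa("192.168.10.50", "finance"))  # sa-pair-finance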
Now, that at a very high level is how VPNs function—the security architecture of how everything interacts. Elsewhere in my course, you'll be learning how AWS uses VPNs within their product set. But for now, that’s everything I wanted to cover. Go ahead and complete this video, and then, when you’re ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I'm going to be talking about the border gateway protocol known as BGP. Now BGP is a routing protocol and that means that it's a protocol which is used to control how data flows from point A through points B and C and arrives at the destination point D. Now BGP is a complex topic that goes far beyond the scope of this exam but as a solutions architect you need to be aware of how it works at a high level because AWS products such as Direct Connect and Dynamic VPNs both utilize BGP. So let's jump in and get started.
BGP as a system is made up of lots of self managing networks known as autonomous systems or AS. Now an AS could be a large network, it could be a collection of routers but in either case they're controlled by one single entity. From a BGP perspective it's viewed as a black box, an abstraction away from the detail which BGP doesn't need. Now you might have an enterprise network with lots of routers and complex internal routing but all BGP needs to be aware of is your network as a whole. So your autonomous systems are black boxes which abstract away from the detail and only concern themselves with network routing in and out of your autonomous system.
Now each autonomous system is allocated a number by IANA, the Internet Assigned Numbers Authority. The ASNs are 16 bits in length and range from 0 through to 65,535. Now most of that range are public ASNs which are directly allocated by IANA. However the range from 64,512 to 65,534 are private and can be utilized within private peering arrangements without being officially allocated. Now ASNs or Autonomous System Numbers are the way that BGP identifies different entities within the network, so different peers. So that's the way that BGP can distinguish between your network or your ASN and my network.
BGP is designed to be reliable and distributed and it operates over TCP using port 179 and so it includes error correction and flow control to ensure that all parties can communicate reliably. It isn't however automatic, you have to manually create a peering relationship, a BGP relationship between two different autonomous systems and once done those two autonomous systems can communicate what they know about network topology. Now a given autonomous system will learn about networks from any of the peering relationships that it has and anything that it learns it will communicate out to any of its other peers. And so because of the peering relationship structure you rapidly build up a larger BGP network where each individual autonomous system is exchanging network topology information. And that's how the internet functions from a routing perspective. All of the major core networks are busy exchanging routing and topology information between each other.
Now BGP is what's known as a path vector protocol and this means that it exchanges the best path to a destination between peers. It doesn't exchange every path only the best path that a given autonomous system is aware of and that path is known as an AS path, an autonomous system path. Now BGP doesn't take into account link speed or condition, it focuses on paths. For example, can we get from A to D using A, B, C and D or is there a direct link between A and D? It's BGP's responsibility to build up this network topology map and allow the exchange between different autonomous systems.
Now while working with AWS or integrating AWS networks with more complex hybrid architectures, you might see the terms IBGP or EBGP. Now IBGP focuses on routing within an autonomous system and EBGP focuses on routing between autonomous systems. And this lesson will focus on BGP as it relates to routing between autonomous systems because that's the type that tends to be used most often with AWS. Now I need to stress at this point that this lesson is not a deep dive into BGP. All I need you to understand at this point is the high level architecture so that you can make sense of how it's used within AWS. So let's look at this visually and hopefully it will make more sense.
So I want to step through an example of a fairly common BGP style topology. So this is Australia, the land of crocodiles and kangaroos, and in this example we have three major metro areas. We have Brisbane on the east coast, which has an IP address range of 10.16.0.0/16, a router using the IP 10.16.0.1, and an autonomous system number of 200. We have Adelaide on the south coast, using a network range of 10.17.0.0/16, with a router using 10.17.0.1 and an autonomous system number of 201. And then finally, between the two in the middle of Australia, we have Alice Springs, using the network 10.18.0.0/16, with a router using 10.18.0.1 and an autonomous system number of 202. Now between Brisbane and Adelaide, and between Adelaide and Alice Springs, is a one gigabit fiber link. And then connecting Brisbane and Alice Springs is a five megabit satellite connection with an unlimited data allowance.
BGP at its foundation is designed to exchange network topology, and it does this by exchanging paths between autonomous systems. So let's step through an example of how this might look using this network structure. We start at the top right with Brisbane, and this is how the route table for Brisbane might look at this point. The route table contains the destination; in this case we only have the one route, and it's the local network for Brisbane. The next column in the route table is the next hop, so which IP address is the first or next hop needed to reach that network, and 0.0.0.0 in this case means that it's locally connected, because it's the local network that exists in the Brisbane site. And then finally we have the AS path, which is the autonomous system path, and this shows the path, or the way to get from one autonomous system to another. The I in this case means that it's the origin, so it's this network.
Now the two other locations will have a similar route table at this stage. So Adelaide will have one for 10.17.0.0/16 and Alice Springs will have one for 10.18.0.0/16, and both of those will have 0.0.0.0 as the next hop and I for the AS path, because they're all local networks. Now each of these autonomous systems, so 200, 201 and 202, can have peering relationships configured. So let's assume that we've linked all three: Brisbane and Alice Springs, Alice Springs and Adelaide, and then finally Adelaide and Brisbane. Each of those peers will exchange the best paths that they have to a destination with each other. So Adelaide will send Brisbane the networks that it knows about, and at this point it's only itself. And when it exchanges or advertises this, it prepends its AS number onto the path.
So Brisbane now knows that to get to the 10.17.0.0/16 network it needs to send the data to 10.17.0.1, and because of the AS path it knows that it goes through autonomous system 201, which is Adelaide, and then it reaches the origin, or I. So it knows that the data only has to go through one autonomous system to reach its final destination. Now in addition to this, Brisbane will also receive an additional path advertised from Alice Springs, in this case over the satellite connection, and Alice Springs prepends its AS number, 202, onto that path. So Brisbane knows that to get to the 10.18.0.0/16 network, the next hop is 10.18.0.1, which is the Alice Springs router, and it needs to go via autonomous system 202, which belongs to Alice Springs. So at this point Brisbane knows about both of the other autonomous systems and it's able to reach both of them from a routing perspective.
Now in addition to that, Adelaide will also learn about the Brisbane autonomous system, because it has a peering relationship with it, and in the same way Adelaide will also learn about the network in Alice Springs, because it also has a peering relationship with the Alice Springs ASN 202. And then finally, because Alice Springs also has BGP peering relationships with both of the other autonomous systems, it will also learn about the Brisbane autonomous system and the Adelaide autonomous system. So at this point all three networks are able to route traffic to the other two. If we look at the route table for Alice Springs, it knows how to get to the 10.16 and 10.17 networks via ASNs 200 and 201 respectively. All three autonomous systems can talk to both of the others, and this has all been configured automatically once those BGP peering relationships were set up between each of the autonomous systems.
But it doesn't stop there. This is a ring network, and so there are two ways to get to every other network: clockwise and anti-clockwise. Adelaide is aware of how to get to Alice Springs, so ASN 202, because it's directly connected to it, and so it will advertise this to Brisbane, prepending its own ASN onto the AS path. Brisbane can now also reach Alice Springs via Adelaide, using the 201 and then 202 AS path. Notice how the next hop for the route given to Brisbane is the Adelaide router, so 10.17.0.1, and so if we used this route table entry, the traffic would go first to Adelaide and then be forwarded on to Alice Springs. Likewise, Adelaide is aware of Brisbane, and so it will advertise that to Alice Springs, prepending its own ASN onto the AS path. So notice how this new route on the Alice Springs route table, the one for 10.16.0.0/16, is going via Adelaide, so 10.17.0.1. The AS path is 201, which is Adelaide, then 200, which is Brisbane, and then the origin.
Now lastly, Adelaide will also learn an additional route to Alice Springs, but this time via Brisbane, and Brisbane would prepend its own ASN onto the AS path. So in this case we've got the additional route at the bottom for 10.18.0.0/16, but the next hop is Brisbane, 10.16.0.1, and the AS path is 200, which is Brisbane, then 202, which is Alice Springs, and then we've got the origin. Autonomous systems advertise the shortest route that they're aware of to any other autonomous systems that they have peering relationships with. At this point, we're in a situation where we actually have a fully highly available network with paths to every single network. If any of these three sites failed, then BGP would be aware of the route to the working sites. Notice that the indirect routes that I've highlighted in blue at the bottom of each route table have a longer AS path. These are non-preferred, because they're not the shortest path to the destination. So if Brisbane, for example, was sending traffic to Alice Springs, it would use the shorter path, the direct satellite connection. By default, BGP always uses the shortest path as the preferred one.
Now there are situations where you want to influence which path is used to reach a given network. Imagine that you're the network administrator for the Alice Springs network. That autonomous system has two networking connections: the fiber connection coming from Adelaide, and the satellite connection between it and Brisbane. Ideally you want to ensure that the satellite connection is only ever used as a backup when absolutely required, and that's for two reasons. Firstly, it's a slower connection, it only operates at 5 megabits, and also, because it's a satellite connection, it will suffer from significantly higher latencies than the fiber connection between Alice Springs and Adelaide and then Adelaide and Brisbane. Now, because BGP doesn't take into account performance or condition, the satellite connection, being the shortest path, will always be used for any communications between Alice Springs and Brisbane.
But you are able to use a technique called AS path prepending, which means that you can configure BGP at Alice Springs to make the satellite link look worse than it actually is, and you do this by adding additional autonomous system numbers to the path. You make it appear to be longer than it physically is. Remember, BGP decides everything based on path length, and so by artificially lengthening the path between Alice Springs and Brisbane, Brisbane will learn a new route, the old one will be removed, and so the new shortest path between Brisbane and Alice Springs will be the one highlighted in blue at the bottom of the Brisbane route table. This one will be seen as shorter than the artificially extended one using AS path prepending, and so now all of the data between Brisbane and Alice Springs will go via the fiber link, from Brisbane through Adelaide and finally to Alice Springs. BGP thinks that the path from Brisbane to Alice Springs directly over the satellite connection has three hops, versus the two hops for the fiber connection via Adelaide, and so the fiber path will always be preferred.
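The effect of prepending can be illustrated with a tiny Python sketch. This is not a real BGP implementation, just the path-length comparison: with the satellite path artificially lengthened, the fiber path via Adelaide becomes the shortest AS path from Brisbane.

def best_path(advertisements):
    # BGP-style selection in miniature: prefer the shortest AS path.
    return min(advertisements, key=lambda adv: len(adv["as_path"]))

routes_to_alice_springs = [
    # Direct satellite link, artificially lengthened by AS path prepending.
    {"next_hop": "10.18.0.1", "as_path": [202, 202, 202]},
    # Fiber via Adelaide (201) and then Alice Springs (202).
    {"next_hop": "10.17.0.1", "as_path": [201, 202]},
]

print(best_path(routes_to_alice_springs))
# {'next_hop': '10.17.0.1', 'as_path': [201, 202]} - traffic now goes via Adelaide.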
So, in summary, a BGP autonomous system advertises the shortest path to a destination that it's aware of to all of the other BGP routers that it's peered with. It might be aware of more paths, but it only advertises the shortest one, and it means that all BGP networks work together to create a dynamic and ever-changing topology of all interconnected networks. It's how many large enterprise networks function, it's how the internet works, and it's how routes are learned and communicated when using Direct Connect and dynamic VPNs within AWS. Now that's everything that I wanted to cover. This has just been a high level introduction to how BGP works, and it's a protocol that you'll need to understand in order to architect more complex or hybrid networks between AWS and on-premises. That's all of the theory that I wanted to cover in this lesson, so go ahead, finish this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to introduce a product that is becoming more important for the exam. As a solutions architect, it's an essential one to understand. That product is the AWS Global Accelerator, which is designed to optimize the flow of data from your users to your AWS infrastructure. We have a fair amount to cover, so let's jump in and take a look.
To understand why the Global Accelerator is required, let's review a typical problem when using AWS. Let's say you've created an application and initially choose to host it in the US. This application, involving cats, becomes really popular with users based in the US, who generally have a great user experience. However, over time, the application’s notoriety increases, and it becomes popular with global users, who experience a much less optimal performance. The reason for this suboptimal experience is that the traffic between their locations and the infrastructure in North America is less direct. Conceptually, every flow of data over the internet can take a different route. For example, communication between a laptop and a VPC is not direct; the data moves through various routers and different hops. Each hop adds delay, variability, and potential failure. What’s worse is that the route can vary between customers and even for the return flow of traffic for the same customer. Each of these hops is suboptimal, so the fewer the hops, the better and more consistent the experience. Generally, customers further away from your infrastructure will go through more internet-based hops, resulting in a lower quality of connection. The internet is designed to be distributed, autonomous, and highly resilient, and while speed is important, it's less important than these other priorities.
The AWS Global Accelerator product isn’t that difficult to understand architecturally. In many ways, its architecture is similar to CloudFront, and one of the key things in the exam is being able to determine when to use CloudFront and when to use the Global Accelerator. Both improve performance, but they do so in different ways and for different reasons. Global Accelerator starts with two Anycast IP addresses, which are a special type of IP address. Normal IP addresses are referred to as Unicast IP addresses, and these refer to one specific network device. If two devices on a network share the same Unicast IP address, bad things can happen. In contrast, Anycast IPs allow multiple devices to use the same IP address, which is advertised to the public internet. Internet core routers route traffic to the device closest to the source. In our example, we have Anycast IP addresses, such as 1.2.3.4 and 4.3.2.1, which map to three Global Accelerator edge locations. While there are more in reality, I’m keeping the diagram simple. The key thing to understand is that all three Global Accelerator edge locations use these Anycast IP addresses, so any traffic destined for either of these IP addresses can be serviced by any of the edge locations.
If we have two users, one based in London and one in Australia, and they both browse to these Anycast IP addresses, their traffic will be routed to the closest edge location, using the public internet. This part of the connection is still subject to the variability that the public internet can cause, but it’s now limited to the part between the customer and the Global Accelerator edge location. What we’ve essentially done is move the AWS network closer to the customer. Once traffic arrives at the edge location, it’s transmitted over the AWS global network. AWS controls its own dedicated network with fiber links between all of its regions. They handle capacity and performance, and if performance is anything but optimal, you can complain to them.
For the exam, you only need to understand the architecture: customers will arrive at one of the Global Accelerator edge locations because they’re using one of the Anycast IP addresses assigned to us when we create a Global Accelerator. So when we create a Global Accelerator, we’re allocated these two Anycast IP addresses. If customers use these, their connections will be routed to the closest Global Accelerator edge location. The part indicated in red in the diagram occurs over the public internet, and this can be affected by the variability that the internet experiences. However, once traffic enters the Global Accelerator product, it’s transmitted over the AWS global network to our infrastructure, resulting in substantially improved performance. There are fewer hops, and AWS generally maintains the network to a higher standard, specifically for the transit of data between AWS regions.
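For a sense of how little setup is involved, here's a minimal boto3 sketch (names and the endpoint ARN are placeholder assumptions): it creates an accelerator, which returns the two anycast IP addresses, then adds a TCP listener and an endpoint group pointing at infrastructure in a specific region. Note that the Global Accelerator API is called in us-west-2 regardless of where your endpoints live.

import uuid
import boto3

ga = boto3.client("globalaccelerator", region_name="us-west-2")

acc = ga.create_accelerator(
    Name="a4l-accelerator",
    IpAddressType="IPV4",
    Enabled=True,
    IdempotencyToken=str(uuid.uuid4()),
)["Accelerator"]
print(acc["IpSets"])   # contains the two anycast IP addresses allocated to you

listener = ga.create_listener(
    AcceleratorArn=acc["AcceleratorArn"],
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],
    IdempotencyToken=str(uuid.uuid4()),
)["Listener"]

# Point the listener at regional infrastructure, for example an ALB.
ga.create_endpoint_group(
    ListenerArn=listener["ListenerArn"],
    EndpointGroupRegion="us-east-1",
    EndpointConfigurations=[{"EndpointId": "arn:aws:elasticloadbalancing:placeholder",
                             "Weight": 128}],
    IdempotencyToken=str(uuid.uuid4()),
)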
Now, let’s talk about when and where to use Global Accelerator. This product is very much like CloudFront, so it’s understandable if you’re confused. Both products aim to move the AWS network closer to your customers, but they do so in different ways. CloudFront specifically moves content closer by caching it at the edge locations, while Global Accelerator moves the actual AWS network as close to your customers as possible. The goal with Global Accelerator is to get your customers onto the AWS global network as quickly and as close to their location as possible, and this is done using the Anycast IP addresses. Once the traffic reaches the edge, it’s transmitted over the AWS global network to the appropriate location.
Global Accelerator is capable of routing traffic to the closest infrastructure location to the customer, so it can direct connections from London to local infrastructure based in Europe, rather than the US. The key thing that Global Accelerator does is get data from your customer to an application endpoint as quickly as possible, with the best performance possible. One key difference between Global Accelerator and CloudFront is that Global Accelerator is a network product. It works on any TCP or UDP applications, including web apps, whereas CloudFront only caches HTTP and HTTPS content. If you see questions mentioning caching, it’s likely to be CloudFront that is the right answer, but if the question involves TCP or UDP network optimization and global performance, then Global Accelerator might be the correct answer.
Global Accelerator doesn’t cache anything; it doesn’t cache content or network data, and it doesn’t provide any HTTP or HTTPS-level capabilities. It’s strictly a network product. So if a question involves transiting network data (TCP or UDP) as quickly and efficiently as possible through a global network, it’s likely to be Global Accelerator. If it involves content delivery, caching, pre-signed URLs, or other CloudFront services, then CloudFront is probably the correct answer. Although you might initially find it difficult to distinguish between the two products, it should now be clear that they’re separate. If you need to do caching, deal with web or secure web content, or manipulate content, it’s CloudFront. If you need global TCP or UDP network optimization, it’s Global Accelerator.
That’s everything you’ll need to know for now. Go ahead and complete this video, and when you’re ready, I’ll look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover something that starts to feature much more in AWS exams, and that's CloudFront Lambda at Edge. You don't need to have any experience implementing it, but you do need to know how it's architected and what it's capable of doing. So let's jump in and get started.
Lambda at Edge is a feature of CloudFront that allows you to run lightweight Lambda functions at CloudFront Edge locations. These Lambda functions can adjust traffic between the viewer and the origin in a number of interesting ways, and we'll talk about some of those soon. However, there are some limitations that you need to be aware of because the Lambda functions are running at the Edge. They don't have the full Lambda feature set. Currently, only Node.js and Python are supported as runtimes. You can't access any VPC-based resources since the functions are running in the AWS public zone, and additionally, Lambda layers are not supported. Lastly, the functions have different size and execution time limits compared to normal Lambda.
Now, you don't need to memorize all of these facts. What's more important is a good understanding of the architecture. So let's look at that next. Lambda at the Edge starts with the traditional CloudFront architecture. So, customers on the left, the Edge locations in the middle, and the origins on the right. Any interaction between a customer, Edge location, and origin consists of four individual parts of that communication: the connection between the customer and the Edge location, which is known as the viewer request; the connection between the Edge location and the origin, known as the origin request; when the origin responds, there's a connection between the origin and the CloudFront Edge, known as the origin response; and finally, the connection between the Edge location and the customer, known as the viewer response.
Now, with Lambda at the Edge, each of these individual components of the wider communication can run a Lambda function, and this Lambda function can influence the traffic as part of that connection. A viewer request Lambda function runs after the CloudFront Edge location receives a request from a viewer. An origin request function runs before CloudFront forwards that request onto the origin. An origin response function runs after CloudFront receives a reply from the origin, and then finally, a viewer response function runs before the response is forwarded back to the viewer.
Now there are limits on how the Lambda functions can run in each part of the architecture. At the viewer side, a Lambda at Edge function has a limit of 128 MB for memory allocation and a function timeout of five seconds. At the origin side, the memory limits are the same as a normal Lambda, but with a 30-second timeout. Again, don't worry too much about these limits. For now, focus at a high level on what these limits mean, what types of architectures they suit, and what types of adjustments these Lambda functions can perform on each of the different components of the flow of data between a viewer and an origin and back again.
Now, let's look at some example solutions which involve Lambda at the Edge. I've included a link attached to this lesson which gives a few examples of situations where you would use Lambda at the Edge. I can't give an exhaustive list in this lesson because, just like Lambda itself, you can pretty much do anything that you want as long as you can code it within a Lambda environment. But there are a couple of common examples that you might find useful for the exam. First, you can use Lambda at the Edge to perform A/B testing, and this is generally done with a viewer request function. In an A/B testing scenario, the function lets you present two different versions of an image without creating redirects or changing the URL that the viewer sees. The function inspects the viewer request and modifies the request URI based on which version of the image you want the viewer to receive, using any logic that you can define inside a Lambda function, such as a percentage-based split or simple random chance.
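To make that concrete, here's a minimal sketch of what a viewer request function for this kind of A/B test might look like using the Python runtime. The object paths and the ten percent split are placeholder assumptions for illustration only; the event structure is the standard CloudFront event passed to Lambda at Edge functions.

```python
import random

# Placeholder object paths and split ratio for this illustration.
ORIGINAL_URI = '/images/whiskers.jpg'
EXPERIMENT_URI = '/images/whiskers_experiment.jpg'
EXPERIMENT_RATIO = 0.1  # roughly 10% of viewers see the experimental image


def lambda_handler(event, context):
    # A viewer request function receives the request before CloudFront
    # checks its cache, so whatever URI we return becomes the cache key.
    request = event['Records'][0]['cf']['request']

    if request['uri'] == ORIGINAL_URI and random.random() < EXPERIMENT_RATIO:
        # Serve the alternate version without a redirect and without the
        # viewer seeing a different URL.
        request['uri'] = EXPERIMENT_URI

    # Returning the request tells CloudFront to carry on processing it.
    return request
```

Because the URI is rewritten before CloudFront checks its cache, each variant ends up cached as a separate object at the edge.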
Another scenario which you might use Lambda at the Edge for is running a function as part of the origin request. This can be used to perform a gradual migration between different S3 origins. You can use a function running in this part of the architecture to gradually transfer traffic from an existing S3 origin over to a new one, and you can do so in a controlled way. For example, based on a weighted value, which represents a percentage of the traffic of your application, you can increase the percentage of traffic which goes to the new S3 origin versus the old. And all of this can be done in a controlled way without updating the CloudFront distribution.
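As a rough illustration, an origin request function for this kind of gradual migration might look something like the sketch below. The bucket domain name, region, and the 25 percent weighting are made-up values; the pattern of updating the origin details and the Host header follows the general shape of CloudFront origin request events.

```python
import random

# Placeholder origin details for the new bucket and the traffic weighting.
NEW_ORIGIN_DOMAIN = 'new-media-bucket.s3.eu-west-1.amazonaws.com'
NEW_ORIGIN_REGION = 'eu-west-1'
NEW_ORIGIN_WEIGHT = 0.25  # send roughly 25% of origin fetches to the new bucket


def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']

    if random.random() < NEW_ORIGIN_WEIGHT:
        # Point this particular origin fetch at the new S3 origin.
        request['origin']['s3']['domainName'] = NEW_ORIGIN_DOMAIN
        request['origin']['s3']['region'] = NEW_ORIGIN_REGION
        # The Host header needs to match the origin CloudFront connects to.
        request['headers']['host'] = [
            {'key': 'Host', 'value': NEW_ORIGIN_DOMAIN}
        ]

    return request
```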
You can also use Lambda at the Edge to customize behavior based on the type of device that your customer has. Given a particular type of device or a particular capability of that device, you can display different objects. This might include different sizes of objects or objects with different quality levels. For example, if you have a device which has a high DPI screen, so a higher dots per inch, then you might want to display objects which themselves have a higher DPI value and save lower DPI objects for devices which can't support high DPI. So again, that's something that you can use Lambda at the Edge for rather than making any changes to the CloudFront distribution.
You can also use Lambda at the Edge to vary the content displayed by country. So a Lambda function that's running in the origin request component of the communication can be used to adjust what gets displayed based on the country of the customer. Now, the link that I've included attached to this lesson gives a lot more examples of the types of scenarios that would benefit from Lambda at the Edge. In addition to the scenarios, most of these include example Lambda function code that you can implement in your own environments.
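Here's a brief sketch of how an origin request function might vary content by country, assuming the CloudFront-Viewer-Country header has been included in the headers forwarded to the origin. The /gb path prefix is purely an example layout.

```python
def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']
    headers = request['headers']

    # CloudFront adds this header when it's forwarded to the origin;
    # header keys are lower-cased in the Lambda at Edge event.
    country_header = headers.get('cloudfront-viewer-country')
    if country_header:
        country = country_header[0]['value']  # e.g. 'GB' or 'US'
        if country == 'GB':
            # Example layout only: serve UK-specific objects from /gb.
            request['uri'] = '/gb' + request['uri']

    return request
```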
So this is going to be something that's outside of the scope of this course. You only need to be aware of the architecture of Lambda at the Edge for this certification. But if you do want to experiment with this in your own time, then you can use the examples contained on that page, and again, the URL is attached to the lesson, so you can do your own experimentation. But at this point, that's all of the architecture that I wanted to cover in this lesson. I just want you to be aware of exactly what Lambda at Edge can be used for because you might get a question on it in the exam. So make sure that you're comfortable with all of the examples that I've given in this lesson and all of the examples included in the link attached to this lesson.
Again, you won't need to remember all of the different facts and the execution limits and the memory amounts. It's all about the architecture. So if you familiarize yourself with all of the examples that are on screen now and the ones that are in the link attached to this lesson, then you'll have all of the information that you need to answer questions about this topic in the exam. But at this point, that's all of the theory that I wanted to cover in this lesson. So go ahead, complete the lesson, and then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this video, I want to step through how you can deliver private content from CloudFront using behaviors. Additionally, I want to step through the differences between signed URLs and signed cookies, which are ways to deliver private content to your end users. We’ve got a lot to cover, so let’s jump in and get started.
CloudFront can run in two security modes when it comes to content. The first and the default is public. In this mode, any content that is distributed via CloudFront is public and can be accessed by any viewer. This is the mode that you've probably experienced so far, but there's also private. In this mode, any requests made to CloudFront need to be made with a signed cookie or signed URL, or they'll be denied. CloudFront distributions are created with a single behavior, and in this state, the whole behavior—and so the whole distribution—is either public or private. Generally, though, you're going to have multiple behaviors, and part will be public and part private. This allows you to redirect any unauthenticated access attempts against the private behavior to the public one, for example, to start a login process.
Now, there are two ways to configure private behaviors in CloudFront: the old way and the new preferred way. In both cases, you require a signer, and a signer is an entity or entities which can create signed URLs or signed cookies. Once a signer is added to a behavior, that behavior is now private, and only signed URLs and cookies can be used to access content. With the old way, you first had to create a CloudFront key to use, and this is something that an account root user had to create and manage. This is a special key that's tied to an AWS account rather than a specific identity within that account, and once a CloudFront key exists in an account, that account can be added as a trusted signer to a distribution, specifically a behavior in that distribution. For real-world usage and for the exam, while this is the legacy method, you do need to remember the term "trusted signer." If you see it, you'll know that a private distribution or a private behavior is involved.
The new and preferred method is to create trusted key groups and assign those as signers. The key groups determine which keys can be used to create signed URLs and signed cookies. There are a few reasons why you should use trusted key groups versus the old architecture. First, you don't need to use the AWS account root user to manage public keys for CloudFront signed URLs and signed cookies. If you use trusted key groups, then you can manage these in a much more flexible way. You can manage these key groups and the configuration using the CloudFront API, and you can associate a higher number of public keys with your distribution, more specifically with your behavior, giving you more flexibility in how you use and manage those keys. So it’s absolutely preferred to use this new method of trusted key groups versus the old method of a CloudFront key being added to an AWS account and that account being added as a trusted signer. So there's the old way and the new way, and absolutely you should prefer the new way for any new deployments.
Now at this point, I want to quickly step through the differences between signed URLs and signed cookies so you know some of the situations where you might use one versus the other. Signed URLs provide access to one object and one object only. That's really critical—remember that one for the exam because it can be a really easy way to pick between the two. Now, this is not really valid at this point, but historically RTMP distributions couldn't use signed cookies, so this was a legacy reason to pick signed URLs over cookies, but it isn't really applicable anymore. You should use signed URLs if the clients don't support using cookies. Not everything does, so if your client doesn't support cookies, then you can only use signed URLs. Cookies can provide access to groups of objects, so you could use a signed cookie to provide access to groups of files or all files of a particular type. For example, all cat GIFs—this is a really common requirement in applications. Finally, when you use signed URLs with CloudFront, the URL itself changes to include the signature. If you want to keep presenting your application's normal URLs to users, then you need to use signed cookies, so that's another point of differentiation.
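As an illustration of the signed URL side, here's a minimal sketch using the CloudFrontSigner helper from botocore and the third-party rsa library. The key pair ID, private key file, and object URL are placeholder values; the expiry and signing approach follow the general pattern AWS documents for canned policy signed URLs.

```python
import datetime

import rsa  # third-party library used here for the RSA SHA-1 signing step
from botocore.signers import CloudFrontSigner

# Placeholder values - use your own key pair ID, private key, and object URL.
KEY_PAIR_ID = 'K2EXAMPLEKEYID'
PRIVATE_KEY_FILE = 'private_key.pem'
OBJECT_URL = 'https://cdn.catagram.io/images/whiskers1.jpeg'


def rsa_signer(message):
    # CloudFront signed URLs are signed with the private half of the key
    # pair registered with the distribution's trusted key group.
    with open(PRIVATE_KEY_FILE, 'rb') as f:
        private_key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, private_key, 'SHA-1')


signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)

# A canned policy signed URL: access to this one object until the expiry time.
expires = datetime.datetime.utcnow() + datetime.timedelta(hours=1)
signed_url = signer.generate_presigned_url(OBJECT_URL, date_less_than=expires)
print(signed_url)
```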
Now visually, this is how the architecture looks. We start with our customers who are using the Catagram application—this time, the new iPhone application. Because of the popularity of the existing web application, the new mobile application has been developed to use CloudFront. The application has images which are public, and then some more sensitive ones, for example, cats baring all for a belly rub. So, there needs to be some method of distributing private content. Within the CloudFront distribution, there are two behaviors: a public behavior, which is the default, and this handles all non-sensitive application operations, and then a private one, which handles access to all of the sensitive cat GIFs. All of the infrastructure runs within an AWS account, and it's a serverless application, so the back end consists of API Gateway, Lambda for the compute functionality, and S3 to store the media.
The application flow starts when the application connects through to the distribution, which for the default behavior uses the API Gateway as an origin, which uses Lambda for the serverless compute. Let's assume that the mobile app is using ID Federation, so a Google, Twitter, or Facebook Identity for logins. The application communicates with the default behavior, uses API Gateway, logs in, and accesses some images which are private. The Lambda signer function checks the application's access to the image, and if everything is good because we've added trusted key groups on the distribution, specifically the behavior, the Lambda function is able to generate a signed cookie which grants access to a selection of images belonging to this specific application user. That cookie, together with information on access URLs, is returned to the mobile application. The mobile application, all behind the scenes, uses the access information to access the images, and it supplies the cookie along with this request. The cookie is checked by CloudFront, and assuming everything checks out, an origin fetch occurs. The cat images are retrieved and returned back to the application.
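And for the signed cookie side, this is roughly the kind of thing the signer function could return. It's a sketch only: the key pair ID, private key, domain, and resource path are all hypothetical, and it builds the three CloudFront cookies from a custom policy signed with the RSA private key.

```python
import base64
import json
import time

import rsa

KEY_PAIR_ID = 'K2EXAMPLEKEYID'        # hypothetical public key ID
PRIVATE_KEY_FILE = 'private_key.pem'  # matching private key


def _cf_base64(data: bytes) -> str:
    # CloudFront swaps the characters + = / for - _ ~ in its base64 values.
    return (base64.b64encode(data).decode('utf-8')
            .replace('+', '-').replace('=', '_').replace('/', '~'))


def make_signed_cookies(resource: str, expires_in: int = 3600) -> dict:
    # A custom policy granting access to everything matching the resource
    # pattern (for example, one user's images) until the expiry time.
    policy = json.dumps({
        'Statement': [{
            'Resource': resource,
            'Condition': {
                'DateLessThan': {'AWS:EpochTime': int(time.time()) + expires_in}
            }
        }]
    }, separators=(',', ':'))

    with open(PRIVATE_KEY_FILE, 'rb') as f:
        private_key = rsa.PrivateKey.load_pkcs1(f.read())
    signature = rsa.sign(policy.encode('utf-8'), private_key, 'SHA-1')

    # These three cookies are what the application sends with each request.
    return {
        'CloudFront-Policy': _cf_base64(policy.encode('utf-8')),
        'CloudFront-Signature': _cf_base64(signature),
        'CloudFront-Key-Pair-Id': KEY_PAIR_ID,
    }


cookies = make_signed_cookies('https://cdn.catagram.io/private/user123/*')
```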
Private behaviors are an excellent way to secure content, but you need to make sure that the origin is also secure. In this case, the S3 origin needs to be configured using an origin access identity so that it only accepts connections from the CloudFront distribution, and that will avoid the security issue where CloudFront gets bypassed. Now, at this point, that’s all of the theory I wanted to cover in this video. Thanks for watching, go ahead and complete the video, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In the next few lessons, I want to talk about how to secure the content delivery path when using CloudFront. When content is being delivered globally using CloudFront, there are a few zones that you need to think about. First, on the left are the origins, which are the locations where content is hosted. In the middle, we've got the CloudFront network, which consists of the network itself and the edge locations, and then, on the right, the public internet and our consumers of content. For the next two lessons, I want to focus on the security of this path. So first, in this lesson, I'll focus on the security of the origin fetch side, which is the transfer of data into the CloudFront network from the origins and then through to the edge locations. Then, in the next lesson, I'll be covering the security of the customer or viewer side, which involves getting content through to the consumer in a safe and secure way. For this lesson, I'll be focusing on the origin side security, specifically on how we can ensure that only CloudFront gets access to the data on those origins, essentially preventing a clever customer from bypassing CloudFront and accessing the origins directly.
So, let's get started and explore this part of the delivery path. Now, before we start, I want to reiterate something I covered in an earlier lesson. You can use S3 as an origin for CloudFront, but you can do it in two different ways. If you just use S3 as an origin, then it's known as an S3 origin. However, if you utilize the static web hosting feature of S3 and use this with CloudFront, then the S3 bucket is treated the same as any non-S3 origin, and this is known as a custom origin. For this lesson, when I'm covering OAIs or origin access identities, this is only applicable for S3 origins, not when using the static website feature of S3.
So what is an OAI? Well, it's a type of identity. It's not the same as an IAM user or an IAM role, but it does share some of the characteristics of both. It can be associated with CloudFront distributions, and when a distribution with an OAI associated accesses an S3 origin, in essence the distribution becomes that origin access identity. This means that when a CloudFront distribution is accessing an S3 origin, the identity can be used within bucket policies, either explicit allows or denies. Generally, the common pattern is to lock an S3 origin down to only being accessible via CloudFront. This uses the implicit default deny to apply to everything except the origin access identity. So, the origin access identity is explicitly allowed access to the bucket, and everything else is implicitly denied. Visually, this looks like this: we start with a CloudFront distribution already configured for Animals for Life. This architecture uses an S3 origin, a few edge locations, and two customers, Julie and Moss, and we want to allow access via CloudFront to this S3 origin and deny any direct access. To do that, we create an origin access identity and associate that origin access identity with the CloudFront distribution.
The effect of doing this means that the edge locations gain this identity, the origin access identity. Then, we can create or adjust the bucket policy on the S3 bucket. We add an explicit allow for the origin access identity, and in its most secure form, we remove all other access, leaving the implicit deny. At this point, any access from the edge locations is actually from the origin access identity, the virtual identity that we've created and associated with the CloudFront distribution. Therefore, access from the edge locations is allowed because the origin access identity is explicitly allowed via the bucket policy. However, direct access, for example, from our Moss user, would not have the origin access identity associated with them, and because of this, it's implicitly denied from accessing the bucket. So, with this configuration, we've explicitly allowed the origin access identity and not explicitly allowed anything else. Therefore, what remains is the implicit deny that applies to everything. These identities can be created and used on many CloudFront distributions and many buckets at the same time. Generally, though, I find it easier to manage if you create one origin access identity for use with one CloudFront distribution because, long-term, this makes it easier to manage permissions.
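A bucket policy that implements this might look something like the following sketch, applied with boto3. The bucket name and OAI ID are placeholders; the principal ARN format is the one CloudFront uses for origin access identities.

```python
import json

import boto3

# Placeholder bucket name and origin access identity ID.
BUCKET = 'animals4life-media'
OAI_ID = 'E2EXAMPLE1OAI'

policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AllowCloudFrontOAIReadOnly',
        'Effect': 'Allow',
        'Principal': {
            # The identity the distribution assumes during origin fetches.
            'AWS': 'arn:aws:iam::cloudfront:user/'
                   'CloudFront Origin Access Identity ' + OAI_ID
        },
        'Action': 's3:GetObject',
        'Resource': f'arn:aws:s3:::{BUCKET}/*',
    }],
}

# With no other allow statements on the bucket, everything except the OAI
# is left with the implicit deny.
boto3.client('s3').put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```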
So that's how we handle origin security for S3 origins, but what about non-S3 origins, custom origins? Is there a way to secure those? Let's take a look at that next. For this architecture, let's say that we have two custom origins, two CloudFront edge locations, and a customer, and what we want to do is prevent the customer from accessing the origins directly. Now, remember, these are not S3 origins, and so we can't use origin access identities to control access. For custom origins, we have two ways that we can implement a more secure architecture. First, we can utilize custom headers. The way this works is that our users use HTTPS to communicate with the edge locations, and we can insist on this by configuring the viewer protocol policy. Now, HTTPS is actually just HTTP running inside a secure tunnel, which has the advantage of protecting the contents of that tunnel. We can also use the same protocol between the edge location and the origin, which is known as the origin protocol policy. But in addition to this, we configure CloudFront to add a custom header, which is sent along with the request to the origin. The origin is then configured to require this header to be present, or it won't service the request. These are called custom headers, and they can be configured within CloudFront. Because the entire stream utilizes HTTPS, no one can observe the headers in transit and fake them. The custom headers are injected at the edge location, and these allow our custom origin to know for sure that the request is coming from a CloudFront edge location. If this header isn't present, the origin will simply refuse to service any of the requests.
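On the origin side, the check can be as simple as refusing any request that arrives without the expected header. This sketch uses a tiny WSGI app purely for illustration, and the header name and secret value are assumptions you'd replace with whatever you configure on the distribution.

```python
from wsgiref.simple_server import make_server

# Hypothetical header name and secret - whatever you configure as an
# origin custom header on the CloudFront distribution.
EXPECTED_HEADER = 'HTTP_X_ORIGIN_VERIFY'   # X-Origin-Verify in WSGI form
EXPECTED_VALUE = 'some-long-random-secret'


def app(environ, start_response):
    # Refuse any request that didn't arrive via CloudFront, i.e. any
    # request missing the injected custom header.
    if environ.get(EXPECTED_HEADER) != EXPECTED_VALUE:
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Direct access is not allowed']

    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello from the custom origin']


if __name__ == '__main__':
    make_server('', 8080, app).serve_forever()
```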
Now, that's one way to handle it, but we do have another way, and that's via traditional security methods. AWS actually publishes the IP addresses of all of their services, so we can easily determine the IP ranges used by the CloudFront edge locations. If we have the IP ranges that are used by CloudFront, then we can use a traditional firewall around the custom origin. This firewall is then configured to allow connections from the edge locations and deny anything else. This is another solution, which means the origin is essentially private to anything but CloudFront and can't be bypassed. You can use either of these approaches or even both of them in combination. By doing so, you ensure that the secure and private distribution of content cannot be bypassed by accessing the origins directly.
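If you want to build that firewall allow list, AWS publishes its address ranges as JSON, and a short script can pull out just the CloudFront prefixes. This is a sketch; how you feed the resulting CIDRs into your firewall is up to you.

```python
import json
import urllib.request

# AWS publishes its IP ranges at this well-known URL.
IP_RANGES_URL = 'https://ip-ranges.amazonaws.com/ip-ranges.json'

with urllib.request.urlopen(IP_RANGES_URL) as response:
    data = json.load(response)

# Keep only the ranges tagged for CloudFront; these are the CIDRs you'd
# allow through the firewall sitting in front of a custom origin.
cloudfront_ranges = sorted(
    item['ip_prefix']
    for item in data['prefixes']
    if item['service'] == 'CLOUDFRONT'
)

for cidr in cloudfront_ranges:
    print(cidr)
```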
All of these methods—origin access identity, custom headers, and traditional IP blocks—secure the first part of the content delivery path. In the next lesson, we're going to look at how to use CloudFront to secure the point between the edge location and our customer. So thanks for watching, go ahead and complete this lesson, and when you're ready, I look forward to speaking to you in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back! In this lesson, I want to go into a little bit more detail about origin types and origin architecture within CloudFront. You need to understand the types of origins, how they differ, and the features that each of them provides. This is going to be another lesson where it's easier to show you the differences rather than talk about them, so I'm going to move over to my console UI and explain the key points that you need to know for the exam and real-world usage.
Okay, let's take a look at some origins. I’ll click on the services dropdown and type CloudFront to move to the CloudFront console. Now, I have two CloudFront distributions already set up. One of them is a production one, so this one is blurred out, and the other is just one that I'm getting ready for production usage. I’ll open up this distribution and click on origins. Architecturally, origins are where CloudFront goes to get content. If an edge location receives a request from a customer and that object isn't cached at the edge, then an origin fetch occurs to the relevant origin.
Now, origin groups—though I don't have any configured—allow you to add resiliency. If you have two or more origins created within a distribution, you can create an origin group, group those origins together, and have that origin group used by a behavior. Remember, origins themselves are selected by behaviors. If I go to behaviors, we only have the one default behavior. If I select it and click edit, it's here where I can pick an origin or an origin group. For this behavior, it will direct any requests, if an origin fetch is required, to the origin or origin group specified in this dropdown. This distribution only has the single origin, but if we had an origin group, it would provide resilience across those origins. This is a really cool way to add resilience.
Moving back to origins, there are actually a few categories of origins, and it’s important to understand all of them and exactly which features they provide, as well as the situations where you’d use one versus another. For origins, you can have Amazon S3 buckets, AWS MediaPackage channel endpoints, AWS MediaStore container endpoints, and then everything else, which means web servers. We haven’t covered MediaPackage or MediaStore yet, but I’ll be touching upon those elsewhere in the course. The split between S3 buckets, MediaPackage, MediaStore, and everything else is important. The "everything else" refers to web servers, known as custom origins, and these have different features and restrictions compared to S3 buckets. It's also important to note that an S3 bucket has one set of features, but if you configure static website hosting on that S3 bucket and use it as an origin, CloudFront views it as a web server, so a custom origin, and the feature set available is different.
S3 origins are the simplest to integrate because they’re designed to work directly with CloudFront. Let’s look at exactly what we can configure with an S3 origin. I’ve already got one prepared, so I’ll select it and click on edit. For the origin domain name, this points directly at an S3 bucket. The origin path allows CloudFront to use a particular path inside that origin. By default, any requests made to the default behavior, which points to this origin, will apply to the top level of the bucket. If we wanted to look inside a particular path in that bucket—say, images—we could specify that in the origin path box. Since this is an S3 origin, we have access to various advanced features that we wouldn’t have if we were using custom origins. One of these advanced features is origin access. This is the ability to restrict access to an S3 origin so that it’s only accessible via a CloudFront distribution.
There are two ways of doing this. We have origin access identity, which is the legacy method. Since this is an older CloudFront distribution, I have legacy access identities selected. The new and recommended way is origin access control, and I’ll demonstrate this elsewhere in the course, where you'll get a chance to experience it in practice. For now, we don’t need to worry too much about it. Just know that it means you can restrict an S3 origin so that it's only accessible via the CloudFront distribution. Another important point to realize when using S3 origins is that whichever protocol is used between the customer and the edge location—the viewer protocol policy—is also used between CloudFront and the S3 origin, called the origin protocol policy. These protocols are matched. If you’re using an S3 origin, you have the same viewer-side and origin-side protocol, whether you're using HTTP or HTTPS. Lastly, you can pass through origin custom headers. If you have any headers you want to pass through to the origin, you can do so in the origin settings. That’s pretty much all that can be customized when using an S3 origin.
Everything’s handled for you, but the complex configuration comes when you're using a custom origin. Let’s look at that next. I’ll click on cancel and then click on create origin. For the origin domain name, I’ll type a placeholder, such as catagram.io. CloudFront is smart enough to realize that this isn’t an S3 bucket, so now I get the additional options that we can configure for custom origins. We still have the ability to specify an origin path, which works in the same way as it does for S3 origins. If we want our behavior to point at an origin but instead of using the top level of that origin, look at a sub-path, we can specify that in the origin path box.
Because this is a custom origin, we can be much more granular with some of the configuration options. For example, we can specify a minimum origin SSL protocol. This configures the minimum protocol level that CloudFront will use when it establishes a connection with your origin. Best practice is always to select the latest version supported by the custom origin to ensure maximum security. You can also configure the origin protocol policy. For S3 origins, the viewer protocol and the origin protocol are matched. When you're using a custom origin, you're able to select from one of three options: HTTP only, HTTPS only, or match the viewer protocol policy. If the viewer protocol policy is HTTP, and this option is selected, CloudFront will connect to your custom origin using HTTP. You can explicitly set either insecure or secure protocols, or you can match it, so you need to choose whichever is appropriate for your particular use case.
With custom origins, you also have the ability to pick the HTTP and HTTPS port to use for CloudFront connections between the edge location and the origin. When you're using S3, you can’t configure this because S3 doesn’t have configurable ports. However, when running a custom origin, you might have different services bound to different ports, and you can select the port for HTTP and HTTPS. The default is 80 for insecure HTTP and 443 for secure HTTPS, but you can change these if you're using different ports on your custom origin. These are really important points to remember, especially for the exam. If you see any exam questions discussing custom ports or the ability to configure the origin protocol policy or the minimum SSL protocol versions, then you’ll need to be using a custom origin.
You also still have the ability to pass through custom headers. In this section of the course, I’ll explain how you can secure origins to ensure they’re only accessible via CloudFront. If you’re not using an S3 origin, you won’t be able to use origin access identities or origin access control. Instead, to secure custom origins, you can pass in a custom header that only you’re aware of and have your custom origin check for that header. This allows you to configure your custom origin to only accept connections from CloudFront. This is an important point to remember if you’re aiming for maximum security with custom origins.
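If you were scripting a distribution rather than using the console, the custom origin options discussed here appear as fields in the distribution config. The sketch below shows roughly how that fragment might look for boto3; the domain, path, and header values are placeholders, and it's only the origin portion of a much larger config.

```python
# Only the origin fragment of a full DistributionConfig, as it might be
# passed to boto3's create_distribution or update_distribution calls.
custom_origin = {
    'Id': 'catagram-web',
    'DomainName': 'catagram.io',      # not an S3 bucket, so a custom origin
    'OriginPath': '/app',             # optional sub-path on the origin
    'CustomHeaders': {
        'Quantity': 1,
        'Items': [{
            # Hypothetical secret header that the origin checks for.
            'HeaderName': 'X-Origin-Verify',
            'HeaderValue': 'some-long-random-secret',
        }],
    },
    'CustomOriginConfig': {
        'HTTPPort': 80,               # change if the origin listens elsewhere
        'HTTPSPort': 443,
        'OriginProtocolPolicy': 'https-only',  # or 'http-only' / 'match-viewer'
        'OriginSslProtocols': {'Quantity': 1, 'Items': ['TLSv1.2']},
    },
}
```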
That’s everything I wanted to cover in this lesson. I just wanted to give you a quick walkthrough of some important configuration options available for S3 and custom origins. It's fairly common to see CloudFront distributions use S3 origins, as it’s a popular way to deliver static content. If you need to integrate custom origins, you should be aware of those advanced configuration options. For the exam, make sure you’re aware of the different options for custom and S3 origins because these topics come up in multiple questions. With that being said, that covers everything about origins. Go ahead and complete this lesson, and when you’re ready, I look forward to you joining me in the next!
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to focus on how CloudFront works with SSL. Each CloudFront distribution receives a default domain name when it's created. It's the default way that you access a distribution, and it's a CNAME DNS record. Now, it looks something like this: it starts with a random part and is always followed by cloudfront.net. You can enable HTTPS access to your distribution by default with no additional requirements as long as you use this address. CloudFront is supplied with a default SSL certificate, which uses star.cloudfront.net as the name. So, it covers all of the CloudFront distributions that use that default domain name.
Most of the time, though, you want to use your own custom name with a CloudFront distribution. For example, cdn.catagram.whatever. This is allowed via the alternate domain name feature, where you specify different names that will be used to access a CloudFront distribution. Once these are added and active, you can point that custom name at your CloudFront distribution using a DNS provider such as Route 53. To use HTTPS, you need a certificate applied to the distribution which matches that name. And even if you don't use HTTPS, you need a way of verifying that you own and control the domain. That way is by adding an SSL certificate that matches the name you're adding to the CloudFront distribution. The result is, whether you want to use HTTPS or not, you need to add a cert to the distribution that matches the alternate domain name you're trying to add.
To do this, you either need to generate or import an SSL certificate using the AWS Certificate Manager, known as ACM. Now, this is a regional service. Normally, you need to add a certificate in the same region as the service you're using. So, a load balancer located in AP Southeast 2 would also need a certificate created within ACM, also in the AP Southeast 2 region. However, exceptions to this exist for global services, and one such global service is CloudFront. For these services, the certificate needs to always be created or added in US East 1. Remember this for the exam, it will come up. For CloudFront, if you're wanting to add any certificates, they always need to be in US East 1, which is the Northern Virginia region.
There are a few options that you can set on a CloudFront behavior in terms of how to handle HTTP versus HTTPS. First, you can allow both HTTP and HTTPS, so no restrictions. You will allow the customer to make the choice about the protocol, whether it's insecure or secure. Second, you can redirect any incoming HTTP connections to HTTPS, which is the option many people use to encourage the use of secure HTTP. Finally, you can restrict a behavior within CloudFront to only allow HTTPS, but this does mean that any HTTP connections will fail entirely.
If you choose to use HTTPS with CloudFront, then you have to have the appropriate certificates that match the name you're using for that distribution. For the exam, understanding certificates is really important, which is what I want to cover in detail in the remainder of this lesson. There are actually two sets of connections when any individual is using CloudFront: first, you've got the connection between the viewer and the CloudFront edge location, and second, the connection between CloudFront and the origin that's being used. These are known as the viewer and origin protocols. For the exam, and really try to commit this one to memory, both of those connections need valid public certificates as well as any intermediate certificates in the chain. The key part here is public. Self-signed certificates, if you see that term, will not work with CloudFront. They need to be publicly trusted certificates.
Now that I've covered the basics of HTTPS with CloudFront, let's quickly move on and touch on something else. One really important thing to understand about CloudFront and SSL for the exam is how it's charged. To understand that and why it is this way, you need a little understanding of how SSL has worked over time. Historically, before 2003, every SSL-enabled website needed its own dedicated IP address. Understanding why that was the case is key if you want to truly understand how CloudFront handles and charges for SSL. SSL and TLS are often used interchangeably, but in this context, I just mean encryption that happens over a network connection.
The problem is that when encryption is being used as part of HTTPS, that encryption happens at the TCP layer, which is much lower level than HTTP, which is an application layer protocol. You might be aware that a single web server can actually host many websites using different names, all using one single IP address. For example, I could host Catergram and DogoGram on the same server using the same IP address. When using HTTP, this works because your browser tells the server which website you're trying to access through something called a host header. Essentially, your browser tells my server that you're requesting a page from Catergram or DogoGram, and so my server knows which website to serve. This happens at the application layer, Layer 7, after the connection has been established.
TLS, or the encrypted part of HTTPS, happens before this point. When you're establishing the encrypted connection between your device and an IP address, the web server identifies itself. If you don't have a way of telling the server which site you're trying to access, it won't know which certificate to use. This is why historically, it wasn't possible to host multiple HTTPS sites on a single IP address, because each site needed its own certificate, and there was no way for the server to determine which certificate to use for a given request.
In 2003, an extension called SNI, or Server Name Indication, was added to TLS. This feature allows a client to tell a server which domain name it's attempting to access during the TLS handshake, before HTTP even gets involved. With SNI, the client can tell the server it wants to access Catergram, and the server can respond with the Catergram certificate, allowing one IP address to host many HTTPS websites, each with their own certificate. However, not all browsers support SNI. If you want to use CloudFront and support HTTPS connections with a custom certificate and custom domain name, then CloudFront needs to provide dedicated IP addresses for older browsers that do not support SNI.
When using CloudFront, you can either choose to use SNI mode, which is free as part of the service, or you can choose to use a dedicated IP at the edge location, which costs money. As of the time of this lesson, this costs $600 per month per distribution. If you only need to support modern browsers that support SNI, it’s free, and you don’t need to pay any extra. You just need to install your SSL certificate. For older browsers that do not support SNI, you need to pay $600 per month for a dedicated IP address.
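In the CloudFront API, this choice shows up in the distribution's viewer certificate settings. As a rough sketch, with a hypothetical ACM certificate ARN, SNI-only mode looks something like this:

```python
# A rough sketch of the viewer certificate settings in a distribution
# config, using a hypothetical ACM certificate ARN from us-east-1.
viewer_certificate = {
    'ACMCertificateArn': 'arn:aws:acm:us-east-1:111122223333:certificate/abcd1234-example',
    'SSLSupportMethod': 'sni-only',        # 'vip' means dedicated IPs at extra cost
    'MinimumProtocolVersion': 'TLSv1.2_2021',
}
```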
With that said, one last thing before we finish is to look at the architecture of SSL in CloudFront visually. So, let’s move on and take a look at that next. Architecturally, this is how it looks: we have a CloudFront edge location in the middle and three origins on the right—an S3 bucket, an application load balancer, and a custom origin, which could either be an EC2 instance or an on-premises web server. On the left, we have customers, with one using a modern web browser at the top, and at the bottom, we have customers with slightly older, pre-2003 browsers.
What you need to understand for the exam is the certificate requirements to support this and how the older browsers influence things. If we have a mixture of clients, some of which include older browsers, with this architecture, we would need to use a dedicated IP, which costs extra. If we only had to support modern browsers that support SNI, then we could use the one shared IP address. In either case, our customers use these IP addresses to connect to one or more edge locations, and this is called the viewer protocol or viewer connection—the connection between the viewers or customers and the edge location.
The key consideration here is that the certificate used by the edge location has to be a publicly trusted certificate, trusted by the web browsers our customers use. This generally means something from major certificate authorities such as Comodo, DigiCert, and Symantec, or from AWS Certificate Manager. If you use AWS Certificate Manager, then, just to reiterate, the certificate must be created in US East 1. I'm going to keep stressing this point throughout any CloudFront-related lessons in the course because it's a frequent exam topic. For anything CloudFront-related, where it integrates with a regional service, such as logging or certificates, assume you have to interact with it in US East 1.
Any public certificate needs to match the name of the CloudFront distribution it’s applied to. If you add a custom domain name, the DNS needs to point to CloudFront, and the certificate needs to match the DNS name you're using. So, that’s the viewer side secured, and a really important point to stress is that you cannot use self-signed certificates. Only publicly trusted certificates can be applied to CloudFront distributions. That’s another really important point for the exam.
On the other side is the connection between the edge location and the origin or origins, known as the origin protocol. The rules for certificates on this side are similar to the viewer side. They need to use publicly trusted certificates, and again, no self-signed certificates. If your origin is S3, you don't need to worry about anything else because S3 handles this natively. You don’t need to apply certificates to your S3 bucket, and indeed, you can’t change the certificate on an S3 bucket. So, if you're using S3 origins, it's really simple—you just point your CloudFront distribution at the origin, and everything works.
If you're using an application load balancer, it needs a publicly trusted certificate, and you can either use one that’s generated externally or use AWS Certificate Manager to generate a managed one. For custom origins like EC2 instances or on-premise servers, you also need a publicly trusted certificate, but these services are not supported by ACM, so you can’t use ACM to manage the certificate on your behalf. Instead, you need to apply the certificates manually.
In all cases for origins, the certificate needs to match the DNS name of the origin. So, in order for SSL to work as an architecture, the certificate applied to CloudFront needs to match the DNS name of whatever your customers are using to access CloudFront. Then, at the origin side, the certificate installed on any of your origins needs to match the DNS name that CloudFront uses to contact the origin. That’s really important.
Now, that’s everything I wanted to cover for this lesson. Thanks for watching. Go ahead and complete it, and when you're ready, I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about AWS Certificate Manager, or ACM. This is something essential to understand for almost all of the AWS certifications and most real-world projects. You need to know a little bit at the associate level, more at the pro level, and different things matter in each of the different streams: architect, developer, and operations. Now, let's just jump in and get started.
Let's start with the basics to put ACM into context. HTTP, or the Hypertext Transfer Protocol, was initially created without the need for much in the way of security, so no server identity authentication or transport encryption. As HTTP evolved from just text to complex web applications, security vulnerabilities became an issue. For example, if somebody could spoof a website’s DNS name, they could direct users to another potentially compromised web server. Users would be unaware of this because the address displayed by the browser would appear normal, just like they expect. This could be used to gain access to credentials and potentially sniff data in transit.
HTTPS, or Hypertext Transfer Protocol Secure, is the evolution of HTTP designed to address these problems. It uses either the SSL or TLS protocols to create a secure tunnel through which normal HTTP can be transferred. In effect, the data is encrypted in transit and can't be read by an outside observer. Now, HTTPS also allows servers to prove their identity. Using SSL and TLS, servers can be authenticated by using digital certificates. These certificates can be digitally signed by one of the certificate authorities trusted by the web client. Since your web client trusts the CA, you trust the certificates it signs, and that's why you trust the site itself. It makes it harder to spoof.
If the site claiming to be Netflix.com actually has the Netflix.com certificate, which is signed by a trusted certificate authority, then it's almost certain to actually be Netflix.com. To be viewed as secure, a website picks a DNS name like animalsforlife.org, generates a certificate or has one generated for it, has that certificate signed by a certificate authority, and uses it to prove its identity. So, the DNS name and the certificate are tied together.
Now, ACM can function both as a public certificate authority, generating certificates that are trusted by public web browsers and devices, or as a private certificate authority. This has the same architecture but is something private to your organization, often used by large corporates. With private certificate authorities, you need to configure clients so that they trust the private certificate authority. This is generally done manually by adding this trust into your client laptop and desktop builds or automatically by adding a policy to configure this trust.
In public mode, browsers trust a list of certificate authorities provided by the vendor of that operating system, and these trusted providers can themselves trust other providers, which establishes this chain of trust. With ACM, you can either generate or import certificates. The product can make certificates for you, which just need DNS or email verification to prove that you own the domain. If ACM generates the certificates, it can automatically renew them on your behalf, and you won't have any ongoing issues with expired certificates. If you import certificates generated from another external source, then you are responsible for renewing them, which generally means renewing them with the external source and then importing them again into ACM.
Now, these are really important points to remember for the exam. If ACM generates the certificate for you, it can automatically renew it. If it imports it, you're going to be responsible. Another important point for the exam is that ACM can only deploy certificates to supported services. The certificates are always stored encrypted within the product and deployed in a managed and secure way to those supported services within AWS. However, not all services are supported. In fact, this is generally only CloudFront and load balancers. EC2, for example, which is a self-managed compute service, is not supported because AWS has no way of securing the transfer and deployment. If you manage an EC2 instance and have root access, there will always be a way to access the certificate, and the whole point of ACM is to secure the storage and deployment of those certificates.
That's going to be tested in AWS exams all the time, so be aware that not all AWS services are supported, specifically EC2. Just to stress it again for the exam: you cannot use ACM with EC2. A few more important things to know before we look at the architecture visually: ACM is a regional service. There is an isolated ACM in US East One, AP Southeast Two, and every other AWS region. If you import a certificate into a region or generate a certificate in that region, those certificates cannot leave that region. Once inside, they're locked to that particular region.
Now, and this is really critical for the AWS exams, I cannot stress this enough. If you want to use a certificate within a service, for example, a load balancer in AP Southeast Two, then the certificate needs to be inside ACM in that same region. I’m going to repeat this because it’s that important: To use a certificate from ACM inside a load balancer within a particular region, such as AP Southeast Two, that certificate needs to be within ACM in AP Southeast Two.
There is one exception to this, and it's not really an exception once you understand why. For CloudFront, that service, while being global, should be viewed as running in US East One from an ACM perspective. For CloudFront, conceptually, think about it as the distribution, which is the unit of configuration for CloudFront, being within US East One. And so, you always use US East One for CloudFront certificates. So, one last time: for most services, the certificate needs to be in the same region where the service is located. For CloudFront, always use US East One with ACM. If you generate a certificate in any other region, you won’t be able to deploy it using CloudFront.
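For example, if you were requesting a certificate for a CloudFront distribution with boto3, you'd explicitly target US East 1, something like the sketch below. The domain names are placeholders, and DNS validation is used so ACM can renew the certificate automatically.

```python
import boto3

# The certificate for a CloudFront distribution has to live in us-east-1,
# regardless of where the rest of your infrastructure runs.
acm = boto3.client('acm', region_name='us-east-1')

response = acm.request_certificate(
    DomainName='cdn.catagram.io',               # placeholder domain
    ValidationMethod='DNS',                     # DNS validation enables auto-renewal
    SubjectAlternativeNames=['*.catagram.io'],  # placeholder SAN
)
print(response['CertificateArn'])
```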
Now, let’s look at this visually because everything should start to click. Visually, this is how ACM looks. On the right, we’ve got three regions: US West One at the top, US East One in the middle, and AP Southeast Two at the bottom. Inside each of these regions, we have regionally isolated instances of ACM. Then, also in each region, we have application load balancers together with associated EC2 instances. We also have a CloudFront distribution, associated edge locations, and an S3 origin. For this lesson, the region of the S3 bucket doesn’t matter because we’re focusing on ACM, and S3 does not use ACM for certificates.
On the left, we have some globally distributed users, and on the right, our security specialist. Step number one is that our security specialist will interact with ACM in each of these regions, generate a certificate, and then deploy it out to ACM in each region where service is required. This means that for supported services in those regions, such as application load balancers, those certificates can be used from ACM to services in those regions. What we can't do is deploy cross-region. In this example, from ACM located in US West One to a load balancer located in US East One, cross-region deployment is not supported. Nor can we deploy to unsupported services such as EC2.
For CloudFront, as I mentioned earlier, conceptually think about it as the distribution, the main unit of configuration, being located in US East One. This means when deploying certificates to CloudFront using ACM, the certificate needs to be located in US East One. Once the certificate is linked to the distribution, the distribution can then take the certificate and deploy it out to the edge locations, no matter what regions they're located in.
Once the certificates are deployed onto supported services, they can be used to establish trust from customers to the application load balancer, as with the top example. The same architecture is true for edge locations at the bottom. Once the ACM certificate is deployed to the distribution and then passed out to the edge locations, customers can make secure connections to those edge locations. Now, once again, ACM isn’t used for S3, which handles its own interaction with CloudFront within this architecture, so S3 does not use ACM for any certificates.
Now, that’s all of the architecture I wanted to cover about ACM. It’s everything that you need for the associate or professional level AWS certifications, and enough to get started with the product for any real-world projects. ACM often comes up in the exams focused on diagnosing errors around which certificates can be used in which regions. So now you know that certificates can only be used in the same regions they’re deployed into. Cross-region deployments are not supported, EC2 is not supported, and for CloudFront, from the point of view of ACM, always think of it as being in US East One.
Now, you have all you need with everything I’ve covered in this lesson. Thanks for watching! Go ahead and complete this lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about CloudFront TTL as well as CloudFront invalidations. Both of these features can be used to influence how long objects are stored at edge locations and when they're ejected. We've got a lot to cover, so let's jump in and get started.
Now let's step through in detail exactly what happens with caching one image at one edge location. A simple architecture consists of an S3 origin on the left, an edge location in the middle, and three users at the top, middle, and bottom right. Let's say we have a photo of Whiskers the cat, and he's crying a little, so it's not his best-ever photo, but that's the one uploaded into the S3 bucket by the Catagram application. Our first customer makes a request for the picture of Whiskers. Since this image is not stored on the edge location, an origin fetch happens where the image is retrieved from the origin and placed on the edge location before being returned to the customer, which is the response.
Now, let's say the image of Whiskers is replaced in the origin. We take away the picture of Whiskers the cat crying and replace it with a much better one, showing a much happier-looking picture of Whiskers. However, now that this image has been replaced on the origin, the copy on the edge location is still the old version. What happens when another customer makes a request to the same edge location? This time, the older or bad image of Whiskers is returned. Why? Because it's the one that's cached at the edge location, and from an object perspective, it's still viewed as valid. Even though the origin has a new version, it’s never checked because the cached copy in the edge location is viewed as valid by CloudFront.
There are ways to influence this, which I'll explain later in the lesson, but for now, you need to understand that this architecture can be problematic. At some point, every object cached by CloudFront will expire. When that happens, it doesn't get immediately discarded, but it's viewed as stale, meaning it's no longer current. If another customer requests a copy of this object, the edge location doesn't immediately return the object. Instead, as with step two, it forwards the request to the origin, and the origin will respond in one of two ways. This depends on the version of the object cached at the edge location versus the one stored in the origin. It's either current, or the origin has an updated object.
If the object is current, then a 304 "Not Modified" response is returned, and the object is delivered directly from the edge location to the customer, marking the copy in the edge location as current again. If there is a difference between the edge location and the origin—meaning the origin has a newer version—then a 200 "OK" message is returned along with the new version of the object, which replaces the one cached at the edge location. This is how an edge location behaves within the CloudFront network.
An object stays in the cache ideally for the entire time that it's valid. However, if the edge location faces capacity issues, it could eject the object early. Even when an object expires, the next time it's accessed, assuming it's still within the edge location cache, the edge location checks the version of the object it has versus the one in the origin. If they're the same, a 304 code is returned, and the object is not updated but marked as current again. A new version is only transferred if the communication between the edge location and the origin determines that the version of the object in the origin has been updated.
The problem to understand is step four on this diagram. The middle user received an old version of the object, even though a newer version was stored in the origin. This is an issue because even though the updated copy existed in the origin, the user received a copy of the old object. This is something to be aware of, as there are ways to influence this, which we will explore further in the lesson.
Now, let's talk about object validity. An edge location views an object as not expired when it's within its TTL (Time to Live) period. Before we go further, it's important to understand that the more often the edge location delivers objects directly to your customer (a cache hit), the lower the load on your origin, which results in better performance for the user. So, where possible, we want to avoid edge locations needing to perform origin fetches, as that would degrade CloudFront's performance.
Objects cached by CloudFront have a default validity period of 24 hours, defined on a behavior within a distribution. The default TTL is 24 hours, meaning any objects cached by CloudFront using this behavior will have a TTL of 24 hours. After this time, the object is viewed as expired. Additionally, you can set two other values: the minimum TTL and the maximum TTL. These values, on their own, do not influence caching behavior but set lower and upper bounds for the TTL of individual objects.
It's possible to define per-object TTL values. If you don't specify an object TTL, the default TTL attached to the behavior is used (the 24-hour default). An origin, whether an S3 bucket or a custom origin, can direct CloudFront to use object-specific TTL values using headers. These headers include Cache-Control: s-maxage and Cache-Control: max-age, both of which are set in seconds. They tell CloudFront to apply a TTL value in seconds for a particular object, and once that TTL has passed, the object is viewed as expired. We also have the Expires header, which works differently. Instead of specifying a number of seconds, this header specifies a date and time when the object should be viewed as expired. For both the headers specifying seconds and the Expires header, the minimum and maximum TTL values set on the behavior act as limiters. If a per-object TTL is lower than the minimum TTL for the behavior, the minimum TTL is used, and if it's higher than the maximum TTL, the maximum TTL is applied.
It's important to understand this architecture. The default TTL on a behavior is 24 hours, and this applies to any object without a per-object TTL set. You can also set minimum and maximum TTL values that act as limiters for any per-object TTLs specified using the Cache-Control and Expires headers. These headers can be set by custom origins or by S3. If you're using custom origins, the headers can be injected by your application or web server. If you're using S3, these headers are defined in the object metadata and can be set via the API, command line, or console UI.
Let's now cover one last topic before we finish the lesson: cache invalidations. Cache invalidations are performed on a distribution, and whatever invalidation pattern you specify is applied to all edge locations within that distribution. Invalidation is not immediate, but it expires any matching objects regardless of their TTL, based on the pattern you define. Examples include invalidating a specific object using a specific path (e.g., /images/whiskers1.jpeg), using a wildcard to invalidate objects starting with a pattern (e.g., /images/whiskers*), or invalidating all objects in a given path (e.g., /images/*). There is also the option to invalidate all objects cached by a distribution with the wildcard /*, affecting every edge location. Keep in mind that there is a cost to perform cache invalidations, and this cost is the same regardless of the number of objects matched by the pattern. Cache invalidation should only be used to correct errors, and if you're frequently updating or invalidating individual files, using versioned file names might be a better solution.
For example, using versioned file names like whiskers1_v1.jpeg and replacing it with whiskers1_v2.jpeg allows you to update the file without needing an invalidation. This approach avoids the cost of invalidations, ensures that even cached copies in browsers won't interfere with the updated version, and provides better logging and consistency across edge locations. Remember, versioned file names differ from S3 object versioning, which involves different data stored under the same name. Using versioned file names means different file names for each version of an object, ensuring they are cached independently on each edge location. This approach is cost-effective and consistent, making it the preferred choice when you're frequently updating objects.
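To tie the two halves of this lesson together, here's a sketch showing both approaches with boto3: setting a per-object TTL via Cache-Control metadata when uploading to S3, and creating an invalidation against a distribution. The bucket name, key, and distribution ID are placeholders.

```python
import time

import boto3

s3 = boto3.client('s3')
cloudfront = boto3.client('cloudfront')

# Placeholder bucket, object key, and distribution ID.
BUCKET = 'catagram-media'
DISTRIBUTION_ID = 'E2EXAMPLEDISTID'

# Per-object TTL: S3 stores Cache-Control as object metadata, and CloudFront
# honours it within the behavior's minimum and maximum TTL limits.
with open('whiskers1_v2.jpeg', 'rb') as f:
    s3.put_object(
        Bucket=BUCKET,
        Key='images/whiskers1_v2.jpeg',
        Body=f,
        ContentType='image/jpeg',
        CacheControl='max-age=86400',  # one day, in seconds
    )

# Invalidation: expire matching objects at every edge location regardless of TTL.
cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/images/*']},
        'CallerReference': str(time.time()),  # must be unique per request
    },
)
```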
That's all the theory I wanted to cover in this lesson. Thanks for watching. Go ahead and complete the lesson, and when you're ready, I look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to quickly step through the architecture of CloudFront behaviors. Now, I've introduced them earlier in this section, so you need to have a good grasp of what options are set at the distribution level and what is configured within a behavior. As I've already introduced behaviors, it's going to be easier for me to show you rather than tell you the details of behaviors and how they fit into the wider CloudFront components. So, I'm going to switch over to my console and step through all the features and exactly how things work.
Okay, so let's take a look at some behaviors. I'll go ahead and move to the CloudFront console. I'll click on services and type CloudFront, then select it from the list. Now, I've got two CloudFront distributions—one of them is production, and one of them is for my testing. The production one is blurred out, so I'm just going to go ahead and move into my testing distribution. Distributions are the unit of configuration within CloudFront, and you'll see that there are lots of high-level options configured at a distribution level. It's here where all of the main important configuration options for the distribution are configured. So, let's take a look at some of them.
First, we've got the price class. The price class determines which edge locations your distribution is deployed to. You can make CloudFront slightly cheaper by selecting to only deploy the distribution to US, Canada, and Europe-based edge locations. This will result in a slightly reduced level of performance for any users not in those regions. You can elect to use all of the edge locations, which provides the best performance, or use the option in between, which deploys to only the US, Canada, Europe, Asia, the Middle East, and Africa. Normally, I like to deploy out to all edge locations because I prioritize customer performance, but you do have the ability to narrow this down and select only the US, Canada, and Europe for deployment. Just keep that in mind.
It's also at the distribution level that you're able to associate a web application firewall. AWS WAF is a layer seven firewall product available within AWS. I'll be covering this elsewhere in the course, but it's at the distribution level that you configure this integration: you create a web ACL within the WAF product and then associate it with a CloudFront distribution. It's also at the distribution level that you can configure alternate domain names for your CloudFront distribution. Notice how my CloudFront distribution has this default domain name, which is a random string unique to this distribution, followed by CloudFront.net. This is the default, but I did configure an alternate domain name, which is labs-bucket.cantrill.io, and this is added at the distribution level.
It's also at the distribution level that you can configure the type of SSL certificate that you want to use with the CloudFront distribution. I'll be talking about this elsewhere in this section, where I focus specifically on CloudFront and SSL, but you're able to use the default certificate as long as you're using that default DNS name. If you want to use an alternate domain name and use that with HTTPS, you need to use a custom SSL certificate. And when you're using a custom SSL certificate, that's defined at the distribution level. It uses ACM, and you have to pick between SNI and non-SNI. I'll be talking about exactly what that means in a dedicated lesson elsewhere in this section.
For the exam, this is important: you can select the security policy to use. There are various different security policies, and AWS updates these periodically. This is generally a trade-off because if you pick a more recent security policy, you can potentially prevent customers with older browsers from accessing your distribution. You need to pick the one that's the best balance of security and accessibility for your users. As well as that, other things configured at the distribution level include the supported HTTP versions and whether you want logging on or off, which I'll cover in another lesson elsewhere in the course. These are all the high-level things that can be configured at the distribution level.
There's also a lot which can be configured from a behavior perspective. A single distribution can have multiple behaviors, and I've mentioned that there's always going to be the one default behavior. Let's open that up, select it, and then click on edit. The way that behaviors work is that the path of any request coming into an edge location is matched against the path patterns of the behaviors for that distribution. The default behavior has a wildcard or star path pattern, so it matches anything which is not matched by another, more specific behavior. Once a path pattern is matched against an incoming request, the request is then subject to the options specified within that behavior. The most important one is which origin or origin group to use, but you can also select the viewer protocol policy, which defines the policy used between the viewer and the edge location. The options are to allow both insecure HTTP and secure HTTPS, to redirect insecure requests to HTTPS, or to accept only HTTPS. These options are configurable on a per-behavior basis, which is important to understand.
You can also select which HTTP methods to allow, and again, that's configured on a behavior. You can configure field-level encryption, which I talk about in a dedicated lesson elsewhere in this section, so I won't go over it here in detail. This allows you to encrypt data from the point that it enters the edge location onwards through the CloudFront network, and again, this is configured on a per-behavior basis. You're also able to set all the cache directives within a behavior. You can do this either using legacy cache settings (which mine is configured with because this is an older distribution) or using the newer cache policy and origin request policy settings, which are recommended by AWS. These settings define which methods are cached as well as the cache and origin request configuration. You're able to set whether you want to cache based on request headers, with the options being none, a whitelist, or all headers. Again, this is per behavior.
The minimum TTL, maximum TTL, and default TTL are all set on a per-behavior basis. I'll talk about this in much more detail elsewhere in this section of the course. An important one for the exam is that you're also able to restrict viewer access to a behavior. This is different from restricting access to an S3 origin. This option sets the entire behavior to be restricted or private. If you select this, you need to specify the trusted authorization type, which is either trusted key groups or trusted signers. Key groups are the new way of doing this, while signers are the legacy way. For the exam, if you see trusted key groups or trusted signers, you know that it is set to restrict viewer access, and you need things like signed cookies or signed URLs to access the content. I'll be discussing this elsewhere in the course if appropriate.
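To help separate distribution-level from behavior-level settings, here's a trimmed, hedged sketch of a single cache behavior as it might appear in a DistributionConfig when driving CloudFront via the API with boto3. The field names follow the API, but the values are placeholders and the structure is deliberately incomplete; it's only meant to show which of the options just discussed live on a behavior:

# One entry from the CacheBehaviors section of a DistributionConfig (illustrative, not complete)
private_images_behavior = {
    "PathPattern": "images/private/*",            # matched before the default (*) behavior
    "TargetOriginId": "my-s3-origin",             # which origin this behavior sends requests to
    "ViewerProtocolPolicy": "redirect-to-https",  # or "allow-all" / "https-only"
    "AllowedMethods": {"Quantity": 2, "Items": ["GET", "HEAD"]},
    "Compress": True,                             # automatic object compression
    "TrustedKeyGroups": {                         # restrict viewer access: signed URLs/cookies required
        "Enabled": True,
        "Quantity": 1,
        "Items": ["<key-group-id>"],              # placeholder key group ID
    },
    "MinTTL": 0,                                  # legacy TTL settings; newer configs use CachePolicyId
    "DefaultTTL": 86400,
    "MaxTTL": 31536000,
}

Everything in that sketch can differ between behaviors in the same distribution, which is exactly the point being made in this walkthrough.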
Many distributions in the real world will have some behaviors which are non-restricted (for example, a sign-on page) and some behaviors which are restricted, and the restricted ones control access to sensitive content. It's also on a behavior that you can choose to compress objects automatically. Additionally, on a per-behavior basis, you can associate Lambda@Edge functions with CloudFront. I talk about Lambda@Edge elsewhere in this section of the course, but it's at the behavior level that you associate these Lambda functions.
From a real-world perspective, you'll always have access to Google and the console, so you can easily remind yourself of exactly which options are set on a per-behavior basis versus the distribution. For the exam, do your best to commit these to memory. Specifically, you need to understand that all of the caching controls are set on a behavior, as well as the restrict viewer access. If you remember that those are behavior-based, you'll also remember that you can have different settings for those options for different behaviors in the same distribution. This will help you answer some of the more complex exam questions you might encounter.
With that being said, that's everything I wanted to cover in this lesson. I just wanted to give you a quick walkthrough to help you understand the different components. Go ahead, complete this lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I want to either introduce or refresh your memory on the high-level architecture of CloudFront. We’ll cover what it does, the components it has, and some of the important terminology. This lesson will serve as an introduction or a refresher, so let’s jump in and get started.
CloudFront is a content delivery network (CDN). Its job is to improve the delivery of content from its original location to the viewers of that data, and it does so by caching and using an efficient global network. To illustrate, let’s consider an example where I’m running an application from Australia. The application becomes so successful that it has global users, like Bob on the west coast of the US and Julie in the UK. When they access the application, the data has to travel from Australia to them, but in reality, the route the data takes is often less direct than shown here. Whatever the route, the data travels long distances, which introduces two problems: higher latencies and slower transfer speeds. Both of these impact user experience. Essentially, data is transferred globally every time it is requested, and CloudFront helps us with this challenge.
Before we dive into the architecture of CloudFront, I want to introduce a few key concepts. You might be familiar with some of these terms if you've studied for the Associate Level Solutions Architect Certification. If so, consider this a refresher for some of the more advanced topics that will be covered in later lessons. First, we have an origin, which is the original location of your content. An origin can be an S3 bucket or a custom origin, which refers to anything that runs a web server with a publicly routable IPv4 address. The origin is where your content lives and is served from, and we will discuss more about origins as we move through the section. For one CloudFront configuration, you can have one or more origins.
Next is a distribution, which is the unit of configuration within CloudFront. To use CloudFront, you create a distribution, and this distribution gets deployed to the CloudFront network. Almost everything is configured within a distribution, either directly or indirectly. So, when I mentioned that a CloudFront configuration can have multiple origins, they are all configured inside a distribution, though indirectly, as we’ll discuss later in this lesson. Then, we have edge locations, which are the pieces of global infrastructure where your content is cached. AWS has regions located globally, generally in all major markets, but edge locations are more numerous and distributed closer to your customers. As of the time of creating this lesson, there are over 200 edge locations, with a good chance one is near you. Edge locations are smaller than AWS regions, typically consisting of one or more racks in a third-party data center, with about 90% of storage and a bit of compute for certain AWS services. These edge locations are primarily used for caching data, so you cannot deploy EC2 instances directly to them.
The last term I want to introduce is a regional edge cache, which is much bigger than edge locations and fewer in number. These caches are designed to hold more data, including content that is accessed less frequently but still benefits from being cached closer to customers. Regional edge caches are especially useful in large, global deployments of CloudFront. Now, the upcoming diagram will clarify how regional edge caches and edge locations are related, so let’s take a look at that next.
CloudFront is not that complex architecturally. At a high level, Bob uploads content to an S3 bucket, which will be the origin for his CloudFront distribution. Bob also creates a CloudFront distribution, which is the configuration for CloudFront. On the distribution, he configures the S3 bucket as the origin, and then on the other side of the architecture are the edge locations—global locations that cache the content and distribute it. These edge locations are spread out globally to be as close to customers as possible. Along with the distribution, a domain name is created, usually ending in CloudFront.net and unique to each distribution. You can also use your own domain name for the distribution, such as animalsforlife.org. Once the distribution is configured, it is deployed to the CloudFront network, pushing the configuration to all the chosen edge locations, making them available to customers.
Architecturally, between the edge locations and the origin are the regional edge caches, which are larger and support multiple local edge locations in the same geographic area. Now let’s assume we have two customers, Julie and Moss, located in different cities but within the same continent. When they access the website animalsforlife.org, they are directed to their closest edge location. If Julie accesses the content first, her local edge location is checked for the requested object, such as whiskers.jpg. If it’s cached locally, the object is returned quickly, resulting in a cache hit—this is a good thing. If the object is not cached locally, it’s a cache miss, and the edge location checks the regional edge cache. If the object is in the regional cache, it’s returned from there, and the edge location caches it. If it’s not in the regional edge cache, the content is fetched from the origin and cached at both the regional and local edge locations. When Moss accesses the same content, the process is similar. His edge location checks for the cached object, and if it’s in the regional edge cache from Julie’s earlier request, it’s delivered to Moss.
By deploying CloudFront, you reduce the load on origins and improve performance globally. There are two key things to know about CloudFront: first, it integrates with AWS Certificate Manager (ACM), so you can use SSL certificates with CloudFront. Second, CloudFront is only for download operations; uploads go directly to the origin for processing, as CloudFront performs no write caching. This distinction is important for exam questions that test your knowledge about CloudFront’s caching capabilities.
I want to elaborate on one more point, which will help you understand the following lessons. I mentioned earlier that distributions are the main configuration entity in CloudFront, but they don’t directly store a lot of the important configuration. That’s actually contained within behaviors, which are sub-configurations inside distributions. Behaviors are linked to origins, and they control things like TTL (time-to-live), caching policies, and whether content is public or private. Each distribution has at least one behavior, the default behavior, which matches everything with a wildcard. You can define additional, more specific behaviors, which take priority. For example, if you have private images, you could create a behavior that matches a specific path pattern (like IMG/*), so that different content types can have different configurations. Behaviors allow for more granular control over how content is cached and delivered.
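Purely as a conceptual illustration (this is not how CloudFront is implemented, just a way to picture path pattern precedence), you can think of behavior matching as checking the more specific patterns first, with the default wildcard catching everything else:

import fnmatch

# Conceptual only: behavior path patterns checked in precedence order, default (*) last
behaviors = [
    ("IMG/*", "private-images-behavior"),
    ("*",     "default-behavior"),
]

def match_behavior(request_path):
    # Return the name of the first behavior whose path pattern matches the request path
    for pattern, name in behaviors:
        if fnmatch.fnmatchcase(request_path, pattern):
            return name
    return "default-behavior"

print(match_behavior("IMG/whiskers.jpg"))   # private-images-behavior
print(match_behavior("index.html"))         # default-behavior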
This concludes the quick refresher. In the remaining lessons of this section, I’ll be focusing on key features and functionality provided by CloudFront. For now, complete this lesson, and when you’re ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this video, I want to briefly talk about Amazon AppFlow at a high level. In this video, I'll be covering the basics. If you need any other knowledge for the course that you're studying, there will be additional videos. If you only see this one, don't worry, it's everything that you'll need to know. Now let's jump in and get started.
AppFlow is an interesting service in that if you’ve worked in this space and had this specific problem, you’ll immediately see the value that the product provides. If not, you might not get the point. AppFlow is a fully managed integration service, and you can think of it like middleware. It allows you to exchange data between applications using flows. Applications are connected using connectors, and the main unit of configuration in the product is a flow.
A flow consists of a source connector and a destination connector, along with other optional components. But at a high level, the job of the product is to exchange data. Examples of this might include syncing data across applications or aggregating data from different sources together to avoid data silos within your organization. By default, the service uses public endpoints, which allows it to interact with public SaaS applications, but it can also work using PrivateLink to access private sources.
It comes with functionality for connecting to many of the most popular applications by default, but you can also use the custom connector SDK to build your own. Some examples where you might use the product are to sync contact records from Salesforce to Redshift for analysis, or to copy your support tickets from something like Zendesk into S3 for storage or analysis. AppFlow is one of those services that can do a lot, and if you have a need for this type of functionality, you will immediately understand how awesome it is.
Visually, this is how the architecture might look. We start with a flow, and into this, we configure source and destination connections. In this case, the source is Slack, and the destination is Redshift. Connections store the configuration and credentials to access an application. It's important to understand that connections are defined separately from flows so they can be reused across many different flows. It’s using connections, with this example, that the product knows how to connect to Slack and Redshift and what authentication details to use for both of these applications.
Next, within the flow, we define source and destination field mappings, as well as any optional data transformation configurations. This is what, in this example, tells AppFlow what fields we’re interested in from Slack and where to write them in Redshift. We can also define optional filtering and validation within the flow to control what data we want to copy and what checks, if any, should be performed en route. And that, at a high level, is AppFlow. It’s designed to enable you to exchange data between applications in a managed way.
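If you want to see what this looks like programmatically, boto3 exposes an appflow client. As a small hedged sketch, with a hypothetical flow name and response keys read defensively, you could list the flows configured in an account and trigger one on demand:

import boto3

appflow = boto3.client("appflow")

# List the flows configured in this account and region, showing their status
for flow in appflow.list_flows().get("flows", []):
    print(flow.get("flowName"), flow.get("flowStatus"))

# Trigger an on-demand run of a flow (flow name is hypothetical)
appflow.start_flow(flowName="slack-to-redshift-sync")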
A basic architectural understanding is enough knowledge for most of the AWS exams, and if you need to know more, I'll include additional videos. If you only see this one, this is everything you need to know. At this point, that's everything I’m going to cover in this video. So go ahead and complete the video, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to very briefly touch on Amazon MQ. Amazon MQ is a product that is almost like a merge between SQS and SNS but using open standards. Now, it's something that you need to understand for the exam, so let's jump in and get started, and I'm going to keep this as brief as possible.
To understand when you would use Amazon MQ, it's good to put it into context versus the other AWS products that are similar. SNS and SQS are AWS services that utilize AWS APIs. SNS provides topics, which are one-to-many communication channels, and SQS provides queues, which are one-to-one communication channels. While queues can have multiple compute things adding to the queue and removing from it, conceptually, it's the same worker group at each side, so it's one-to-one communication. Queues are generally used to allow different components of an application to be decoupled.
Now, both of these services, SNS and SQS, are public services within AWS, meaning they can be accessed from anywhere with network connectivity to the public endpoint for those services. They're also both highly scalable and integrated with AWS from an API perspective. Other AWS products can directly use them as well. The problem, however, is that larger organizations might already use topics and queues, meaning they might already have an on-premise messaging system. This on-premise messaging or queuing system might already use certain industry standards. That organization might want to migrate that existing system into AWS, and in that case, SNS and SQS won't work without application modification.
To migrate an existing messaging or queuing system into AWS without application modification, we need a standards-compliant solution. Amazon MQ is a managed message broker service based on open-source software; specifically, it provides a managed implementation of Apache ActiveMQ, one of the most common enterprise message broker solutions. If you need a system that supports the JMS API or protocols such as AMQP, MQTT, OpenWire, or STOMP, then that means you need Amazon MQ.
The product provides both queues and topics, so it supports both one-to-one and one-to-many messaging architectures, and it does so within the same product. While Amazon MQ is a managed service, it’s not managed in the same way that SNS and SQS are. With Amazon MQ, you're provided with message broker servers, which can either be a single instance for test development or a highly available pair for production usage. One critical thing to understand about Amazon MQ is that, unlike SQS and SNS, it's not a public service. It runs in a VPC, meaning private networking or holes in a firewall are required for anyone who needs to access it. It also doesn't have native AWS integration, so you can't use it with other AWS products and services in the same way as SNS and SQS because other services expect to use SNS and SQS.
So, you do have to keep in mind both the strengths and limitations of this product. In the exam, you will be expected to identify the types of situations when you would select Amazon MQ, and I'll be touching upon that towards the end of this lesson. This is an architecture of SNS, which you've seen before. Publishers add messages to a topic, and subscribers get those messages delivered. SNS as a service runs in the public AWS zone, so it's accessible anywhere with a network connection.
This is how SQS functions. Again, messages can be added to a queue and received at the other side of that queue. Once again, it's an AWS public service, meaning it’s accessible anywhere with network connectivity to the public SQS endpoint. Now, visually, this is how a typical Amazon MQ deployment might look. You might have an existing on-premises environment with an existing messaging infrastructure. In this example, a message producer interacts with an on-premise implementation of Active MQ.
If you want to migrate this into AWS or begin a period of coexistence, then you need an AWS environment. Let’s say we have one with two availability zones, and let's say you provision a highly available pair of Amazon MQ servers in this environment. These servers would deploy a primary and standby and use EFS for shared storage between the two by default. This means data is replicated between availability zones and brokers.
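As a rough sketch of provisioning a pair like this with boto3, where the broker name, engine version, instance size, subnets, security group, and credentials are all placeholders and the full set of options should be checked against the current API:

import boto3

mq = boto3.client("mq")

mq.create_broker(
    BrokerName="migration-broker",
    EngineType="ACTIVEMQ",
    EngineVersion="5.17.6",                        # placeholder engine version
    HostInstanceType="mq.m5.large",                # placeholder instance size
    DeploymentMode="ACTIVE_STANDBY_MULTI_AZ",      # highly available pair across two AZs
    PubliclyAccessible=False,                      # the broker lives inside the VPC
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],   # one subnet per AZ (placeholders)
    SecurityGroups=["sg-0123456789abcdef0"],       # placeholder security group
    Users=[{"Username": "admin", "Password": "ChangeMe-UseSecretsManager1"}],  # placeholder credentials
    AutoMinorVersionUpgrade=True,
)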
The crucial thing to understand for the exam is that Amazon MQ is not a public service, and this means you need a private network connection between your on-premises environment and AWS. This could be a virtual private network (VPN) or a direct connect. This private networking ensures that the on-premises broker and the AWS managed pair can communicate over this connection. This allows any migrated application to communicate with those brokers using standard protocols and integrate with the on-premises implementation.
Before we finish, I want to go through a few considerations you should be aware of for the exam. Your default position should be to use SNS or SQS for most new implementations where you require topics or queues. You should always select SNS or SQS if you need topics or queues and AWS integration. For example, if you want to take advantage of other AWS services for logging, permissions, encryption, or if you're using other AWS services that expect SNS and SQS to exist, this is a good reason to pick SNS or SQS.
You should choose Amazon MQ if you need to migrate from an existing system with little to no application change, especially if you need to utilize APIs such as JMS or protocols like AMQP, MQTT, OpenWire, and Stomp. Remember, though, if you do decide to use Amazon MQ—and this is really important, as I’ve seen it in several exam questions—you need to make sure that you have the appropriate private networking configured. Amazon MQ is not a public service. It occupies a VPC, and anything accessing the service needs to have access to that networking inside the VPC.
Again, you don't need extensive knowledge of this product for the exam, but I have started to see more and more questions dealing with hybrid-style scenarios where an existing system exists on-premises, and you need to migrate from it into AWS or establish coexistence with that existing system. For both of these architectures, Amazon MQ is an excellent solution. With that being said, that's everything I wanted to cover in this lesson. Go ahead and complete the lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about AWS Glue. Glue is an interesting product that starts to feature more in the AWS exams and in real-world projects, which I've been exposed to. Now, I'm only going to be talking about it in terms of the architectural theory in this lesson because anything more is well beyond the scope of this course. So let's just jump in and take a look.
AWS Glue is a serverless ETL (Extract, Transform, and Load) system. There's another product within AWS called Data Pipeline, which can also handle ETL processes, but this uses compute within your account. Specifically, it creates EMR clusters to perform the tasks. Glue, on the other hand, is serverless, meaning AWS provides and manages all of the resources as part of the managed service.
At a high level, Glue is used to move and transform data between a source and destination. These sources and destinations can include databases, streams, or other stores of data, such as S3. If you want to take source data and restructure or enrich it, you can use a Glue job to handle that in a serverless way. Glue also crawls data sources and generates the AWS Glue Data Catalog, which I'll cover in more detail next. Glue supports a range of data locations, such as source data stores like S3, RDS, and any JDBC-compatible databases (e.g., Redshift or others), and DynamoDB. Additionally, Glue can work with source streams like Kinesis Data Streams and Apache Kafka, and data targets including S3, RDS, and again, any JDBC-compatible databases.
Now, let's quickly focus on the Data Catalog. AWS Glue provides a data catalog, which is a collection of metadata combined with data management and search tools. Essentially, it's persistent metadata about data sources within a region. The AWS Glue Data Catalog provides one unique data catalog in every region of every AWS account, and it helps avoid data silos. Rather than data being hidden away somewhere, managed by a particular team and not visible to others in the organization, it makes this metadata and data structure available to be browsed and brought into other systems using the ETL features of Glue. This helps improve the visibility of data across an organization.
Various AWS data-related products can use Glue for ETL and catalog services, such as Athena, Redshift Spectrum, EMR, and AWS Lake Formation. They all use the Data Catalog in some way, and the way data is discovered is by configuring crawlers, giving them credentials, and then pointing them at sources and letting them go to work. Visually, this is how the components of Glue fit together. Let’s start with the Data Catalog functionality.
So, we have some data sources on the left: S3, RDS, maybe some JDBC-compatible stores, DynamoDB, Kinesis, Kafka, and more. We configure data crawlers, which connect to these stores, determine schemas, create metadata, and all of this information goes into a data catalog. This means that rather than those data stores being siloed, we now have visibility of them across the organization. The data catalog can be connected to by users of the AWS account, allowing all members of the business to get value from all of the data by using it in areas beyond where it was initially gathered. Essentially, it publicizes data from across an organization and makes it visible, allowing teams like finance to use data that was gathered by different teams within the organization.
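As a hedged example of configuring one of those crawlers with boto3, where the crawler name, IAM role, catalog database, and S3 path are all placeholders:

import boto3

glue = boto3.client("glue")

# Create and run a crawler that catalogs data sitting in S3
glue.create_crawler(
    Name="sales-data-crawler",
    Role="GlueCrawlerRole",                        # IAM role with access to the data source (placeholder)
    DatabaseName="sales_catalog",                  # catalog database the discovered tables are written into
    Targets={"S3Targets": [{"Path": "s3://example-sales-data/raw/"}]},
)
glue.start_crawler(Name="sales-data-crawler")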
The other components of Glue are Glue jobs, and the Data Catalog is also used as part of Glue jobs. Glue jobs are extract, transform, and load (ETL) jobs. Data is extracted from a source and then loaded into a destination, with Glue performing transformations in between using a script that you create. Since Glue is serverless, you don't need to manage the compute used to perform the transformation. AWS maintains a pool of resources, and these are used to perform transformation tasks when required, with billing based only on the resources consumed.
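And as a similarly hedged sketch, starting one of those ETL jobs on demand with boto3 looks something like this, with the job name and arguments being placeholders:

import boto3

glue = boto3.client("glue")

# Start a pre-defined Glue ETL job and capture the run ID for monitoring
run = glue.start_job_run(
    JobName="transform-sales-data",
    Arguments={"--target_path": "s3://example-sales-data/processed/"},
)
print(run["JobRunId"])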
Glue jobs can be started manually or invoked in an event-driven way using events from other sources or scheduled events within EventBridge. That's pretty much what you need to understand about Glue for the exam. It's an extract, transform, and load (ETL) service and a data catalog service that is serverless and forms part of the data and analytics services provided by AWS. Historically, the ETL part of this has been done using Data Pipeline, so in exam questions, you will generally only see one or the other: either Data Pipeline or Glue. If you see both, look for keywords such as serverless, ad hoc, or cost-effective, and if you see these, you should pick Glue rather than Data Pipeline.
Data Pipeline does offer some additional functionality compared to Glue, but over time, my expectation is that the Glue product will replace the functionality offered by Data Pipeline. At this point, though, that's everything I wanted to cover in this lesson. Go ahead and complete the lesson, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to go into some more depth about Amazon Cognito, which is one of the core identity products available within AWS. Now, we do have a lot to cover, so let's jump in and get started. This is going to be one of the most important non-graphical screens of information in the entire course. I want to make sure that you understand the terrible naming within the Cognito product. Cognito provides two main pieces of functionality. Both are very different, but both are essential to understand.
The service as a whole provides authentication, authorization, and user management for web and/or mobile applications. Authentication means to log in to verify credentials, authorization means to manage access to services, and user management means to allow the creation and management of a serverless user database. Now, there are two parts of Cognito: user pools and identity pools, and the naming on these is terrible. This is why most students struggle to understand the detail of how Cognito works. The end goal of a user pool is to allow you to sign in, and if successful, you get a JSON web token, known as a JWT. This JWT can be used for authentication with applications, and certain AWS products such as API Gateway can even accept it directly. But, and this is crucial to understand, most AWS services cannot use JWTs. To access most AWS services, you need actual AWS credentials.
Now, user pools do not grant access to AWS services; their job is to control sign-in and deliver a JWT. So they do things like sign-up and sign-in services. They also provide a built-in customizable web user interface to sign in users. They provide certain security features such as multi-factor authentication, check for compromised credentials, and offer account takeover protection, as well as phone and email verification. You can also implement customized workflows and user migration by using Lambda triggers, and we'll talk about that if applicable elsewhere in the course.
Now, where it gets confusing is that user pools, as well as allowing sign-in for built-in users, also allow social sign-in using identities provided by Facebook, Google, Amazon, and Apple, as well as offering sign-in using other identity types such as SAML identity providers. But the important thing to understand is that this is about offering a joined-up user management experience. At no point can a user pool be used to directly access most AWS resources. When you think of user pools, imagine a database of users, which can include external identities. They sign in and they get a JWT. That's it. I'm stressing this point because it's really important to conceptually separate this from an identity pool, which is coming up next.
Now, the aim of an identity pool is to exchange a type of external identity for a set of temporary AWS credentials, which can then be used to access AWS resources. Now, one option is unauthenticated identities, which can be used to offer guest access to AWS resources. Imagine you have a mobile application and want to allow high scores to be stored in a leaderboard, which is hosted using DynamoDB, and you want to offer this without a user having to sign up, and this is one way to do that. Identity pools can also be used to swap an external identity for temporary AWS credentials, and this means things like Google identity, Facebook, Twitter, SAML 2.0 for corporate logins, and even user pool identities. So from an identity pool perspective, user pools are just treated as another form of identity.
Now, all of these are examples of authenticated identities. If another identity provider, which we trust, says that they have authenticated successfully, then identity pools will exchange that identity for temporary AWS credentials. Now, I hope at this point that you do see the difference. User pools are about offering a joined-up sign-up or sign-in experience with user directory and profile management services. So it's about login and about managing user identities. Identity pools are about swapping either an unauthenticated or authenticated identity for AWS credentials, and one possible type of identity is actually a user pool identity. And this is a reason why these two different components of Cognito are often difficult to separate because they can operate together.
Now, identity pools work by assuming an IAM role on behalf of the identity. That assumption generates temporary credentials, and they're provided back in return in most cases to a mobile or web application. These IAM roles are configured within identity pools, and there's going to be a demo coming up very soon where you can experience that. Now, for the rest of this lesson, let's just step through a few architecture overviews, and I think doing this visually will help you to understand how the product works. First, let's step through an architecture that just uses user pools. Remember, user pools are about user management, sign-in, sign-up, and anything associated with that process.
So we start with a web and mobile application and a Cognito user pool with both internal identities and social sign-in. So anyone can sign in to the pool with any type of identity, and the result is a Cognito user pool token, also known as a JSON web token or JWT. This user pool token proves that the identity has been used to sign in, and it now represents a Cognito user pool user. Whether an internal user is used or a social identity, the authenticated identity is now a user pool identity. And this token can then be used to access self-managed resources such as applications running on servers that you manage or accessing databases which you also manage. It can also be used with an API Gateway, which is capable of accepting user pool tokens directly. Remember, these are known as JWTs. An API Gateway is capable of accepting JWTs for authentication.
So let's focus for a second on what's just happened. A user pool is a collection of identities of users. It's used to allow sign-up and sign-in both for internal users and social sign-ins. The tokens which are generated as a result can be used for self-managed systems, and the tokens can be used to authenticate for API Gateway. But, and this is the single biggest thing to remember about Cognito, these tokens cannot be used to access AWS resources. In general, that requires AWS credentials, and AWS credentials can be handed out via identity pools. So let's look at those next.
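As a minimal hedged sketch of that sign-in step with boto3 (the app client ID, username, and password are placeholders, and the USER_PASSWORD_AUTH flow has to be enabled on the user pool app client):

import boto3

idp = boto3.client("cognito-idp")

# Sign in a user pool user and receive JSON web tokens (JWTs)
response = idp.initiate_auth(
    ClientId="1example23456789",                   # placeholder app client ID
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "bob", "PASSWORD": "example-password"},
)

tokens = response["AuthenticationResult"]
id_token = tokens["IdToken"]           # the user pool token (JWT) discussed above
access_token = tokens["AccessToken"]
# These JWTs prove sign-in, but they are not AWS credentials.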
So we have the same web and mobile application. This time, though, we aren't using user pools. We're allowing customers to log in directly using external identities. How this works is as follows. We start with a collection of supported external identities, and these include the same social identities which I've demonstrated previously with user pools. Our application allows users to sign in with any of those external identities. So when they click on a sign-in button within our application, they're directed to an external ID provider sign-in page. You might have experienced one of these before; this is an example of the sign-in with Google page. The customer authenticates with their Google credentials, and it's worth pointing out that we never have access to those credentials, because the sign-in takes place with the external identity provider. What we receive as a result is a Google token, but it could be a Facebook token, an Amazon token, or a token from whichever external ID provider is used. Crucially, it can be one of many different types.
If we want to support many different external identity providers, then we need to configure that support. But now that we have this external ID provider token, this proves that a user has logged in with an external ID provider. This token can't be used to access AWS resources, and that's where identity pools come in handy. Our application takes this token and passes it to an identity pool that we've configured. We've configured this to support every external identity that we want to allow logins from. This is a key thing to keep in mind. If we want to support five external ID providers, we need five different configurations, five different types of tokens to be supported.
What happens next is that Cognito is configured with roles, at least one for authenticated identities and one for unauthenticated or guest identities. In this case, we have an authenticated identity, the Google token. And so, on our behalf, Cognito assumes a role and generates temporary AWS credentials, which are then passed back to the application. The application can then use these credentials to access AWS resources. Once they expire, the application renews them again using Cognito, and the process continues. The permissions the application has are based on the roles' permissions. At no point does the application store any credentials within code or any credentials permanently. So the process is that an external identity provider authenticates a user, Cognito identity pools swap the external ID token for temporary credentials, and these are used to authorize access to AWS resources.
So once again, focus on the fact that user pools are about sign-in and sign-up for users, and identity pools are about swapping identity tokens from an external ID provider for temporary AWS credentials. These are two very different and isolated tasks. Now, you can use user pools and identity pools together to fix one small lingering problem. With this configuration, your application has to be able to deal with many different ID tokens from many different external providers. Now, one option is that we could use user pools to handle the many different types of identity, and then we can use identity pools to swap the Cognito user pool token for AWS credentials.
Now, the swapping of any external ID provider token for AWS credentials is known as Web Identity Federation, and you're going to experience that term both in the real world and in the exam. So let's quickly step through the final architecture, which combines both user pools and identity pools. We start with a user pool, and this is configured to support external identities and its internal store of users. Whatever is used, whichever identity type is used to log in, the identity that's authenticated is now a Cognito user pool user. So there's only one type of token which is generated, whether sign-in is using internal users or social sign-in. This is the user pool token or JWT.
By using a user pool, we've abstracted away from all of the configuration of many different external ID providers. We have conceptually one user store to manage, one set of user profiles all provided via a Cognito user pool. So if you log in with a user pool user or if you log in via the user pool but using Google credentials, the outcome is the same. A user pool token is returned to the application, so the user pool simplifies the management of identity tokens. Next, the application can pass this user pool token into an identity pool, and this assumes an IAM role defined in the identity pool, which generates temporary AWS credentials, and these temporary credentials are returned to the application. The benefit to this approach is that the identity pool need only be configured with a single external identity provider, the user pool. But otherwise, the process is the same as using an identity pool directly, just with less admin overhead. The application can then use those AWS credentials to access AWS resources, and that's pretty much everything I wanted to cover.
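And here is a hedged sketch of that exchange with boto3. The identity pool ID, region, and user pool ID are placeholders, and the Logins key is the user pool's provider name in the form cognito-idp.<region>.amazonaws.com/<user-pool-id>:

import boto3

identity = boto3.client("cognito-identity", region_name="us-east-1")

id_token = "<user pool ID token (JWT) returned from sign-in>"   # placeholder

logins = {
    # user pool provider name mapped to the user pool ID token from sign-in
    "cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE": id_token,
}

# Obtain (or look up) an identity within the identity pool for this login
identity_id = identity.get_id(
    IdentityPoolId="us-east-1:11111111-2222-3333-4444-555555555555",  # placeholder pool ID
    Logins=logins,
)["IdentityId"]

# Swap the token for temporary AWS credentials via the identity pool's IAM role
creds = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins=logins,
)["Credentials"]

# Temporary AWS credentials that can now be used to access AWS resources
print(creds["AccessKeyId"], creds["SecretKey"], creds["SessionToken"])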
Now, in summary, user pools manage user sign-up and user sign-in, either internal or using social logins. And what you get as a result is a user pool token, also known as a JSON web token or JWT, and that is the output of any form of sign-in using user pools. Now, identity pools swap external identity tokens for AWS credentials. This process is called federation. External identity tokens can be direct external identity tokens, such as Google, Amazon, Facebook, and many others, or they can be user pool tokens, which can themselves represent external ID logins. Once an application uses an identity pool to gain access to temporary AWS credentials, it can access AWS resources.
Now, this process allows for a near-unlimited number of users, which is far more than the 5,000 IAM user limit, and that means it's great for web-scale applications. You're going to get experience of identity pools in an upcoming advanced demo. For now, though, that's everything I wanted to cover. Really try to focus on understanding the two different parts of Cognito well; I promise it will be helpful for both the exam and the real world. At this point, that's everything I wanted to cover, so thanks for watching. Go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson I want to cover another product within the Kinesis family, Kinesis Video Streams. Now, this product is a little bit different from the others in the family; it's still used for streams, but this time for video data. So let's jump in and take a look. Kinesis Video Streams allows you to ingest live video streaming data from producers, and producers can be security cameras, smartphones, cars, or drones; it can also ingest non-video but time-serialized data such as audio, thermal, depth, or even radar data. Now, once the media is inside AWS, consumers can access the data frame by frame or as needed to perform further analysis, and on the next screen, I'll be demonstrating an architecture involving Rekognition, which is a service I cover elsewhere in the course.
Kinesis Video Streams can persist data and encrypt data both in transit and at rest, and it does this as a managed service. Now, you can't directly access the data that's ingested by Kinesis Video Streams, and that's really critical to understand for the exam. It's not stored in its original format; it's all indexed and stored in a structured way inside the product, so don't let any exam question fool you into thinking that you can access the data directly on storage such as EBS, S3, or EFS. It's not possible; you have to go via the product itself. Now, Kinesis Video Streams integrates with other AWS services, and two really common examples are Rekognition for deep-learning-based analytics on live streams, for example facial recognition, and something like Amazon Connect for voice or other audio streaming.
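To reinforce that access goes via the service APIs rather than the underlying storage, here's a hedged boto3 sketch of consuming media (the stream name is a placeholder): you first ask the Kinesis Video Streams control plane for a data endpoint, then call GetMedia against that endpoint.

import boto3

kvs = boto3.client("kinesisvideo")

# Ask for the endpoint that serves GetMedia for this stream
endpoint = kvs.get_data_endpoint(
    StreamName="front-door-camera",               # placeholder stream name
    APIName="GET_MEDIA",
)["DataEndpoint"]

media = boto3.client("kinesis-video-media", endpoint_url=endpoint)

# Read media fragments starting from the live point of the stream
response = media.get_media(
    StreamName="front-door-camera",
    StartSelector={"StartSelectorType": "NOW"},
)
payload = response["Payload"]                     # streaming body of media fragments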
Now let's step through a fairly common style of architecture which uses Kinesis Video Streams and Rekognition, because this should help you understand all the different components and how they can be used for the exam. Now let's say we have a smart home, so we have a cat, a doggo, and three video cameras, and we want a solution where we can detect any known or unknown faces in the house and alert us if anything is concerning. So those three cameras stream their video feeds into AWS, specifically three Kinesis Video Streams, one per camera. Now this means we don't need any processing in the house, no hardware designed to perform complex analysis on the video, and it means that we've got a location to store the video data in some form outside of the property.
So we configure those three video streams to integrate with Rekognition Video. This is a product I've talked about elsewhere in the course, which provides deep-learning-based intelligence for images and video. One of the things that it can do is facial recognition on live video streams, so the Kinesis Video Streams are integrated with Rekognition, and we also define a face collection, which is data on some known faces we expect in the house. So we've got Bob the homeowner, Julie his friend, another random friend, and Whiskers' small group of cat sitters for when his human minion isn't around. Rekognition then analyzes the streamed data and outputs its analysis to a Kinesis data stream. The analysis includes details of any faces detected in the video stream, and in addition to that list of detected faces, it can identify any of those which it has a level of confidence match one of the faces in the face collection.
Now we can configure a Lambda function to be invoked based on records in the Kinesis data stream, so the Lambda function is invoked and can analyze every record in the stream. Then it can make some logic-based decisions based on whether it detects known or unknown faces, and if a face is detected which shouldn't be there, then the Lambda function can utilize the Simple Notification Service (SNS) to send Bob or Julie a notification. This is an example of a very simple architecture using Kinesis Video Streams and Rekognition that can be used for an event-driven video analytics workflow. Now, the product is capable of doing so much more, but in the exam, if you see any mention of live video streaming and analytics that needs to be performed on that video stream, or any mention of GStreamer or RTSP, then you can probably think about using Kinesis Video Streams as your default answer.
With that being said, though, I don't expect it to feature in the exam in a detailed way, so that's everything that you need to know to cover you for any exam questions. So go ahead, complete this video, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to cover Amazon Kinesis Data Analytics. This is a real-time data processing product, and it's critical that you understand its features together with when you should and shouldn't use it for the exam. Before I start talking about Kinesis Data Analytics, I want to position the product relative to everything else. Kinesis data streams are used to allow the large-scale ingestion of data into AWS and the consumption of that data by other compute resources known as consumers. Kinesis Data Firehose provides delivery services. It accepts data in and then delivers it to supported destinations in near real-time and it can also use Lambda to perform transformation of that data as it passes through. Kinesis Data Analytics is a service that provides real-time processing of data which flows through it using the structured query language known as SQL. Data inputs at one side, queries run against that data in real-time, and then data is output to destinations at the other.
The product ingests from either Kinesis data streams or Kinesis Firehose and can optionally pull in static reference data from S3, but I'll show you how that works visually in a moment. Now after data is processed, it can be sent on in real-time to destinations, and currently, the supported destinations are Firehose and indirectly any of the destinations which Firehose supports. But keep in mind, if you're using Firehose, then the data becomes near real-time rather than real-time. The product also directly supports AWS Lambda as a destination, as well as Kinesis Data Streams, and in both of those cases, the data delivery is real-time. So you only have near real-time if you choose Firehose or any of those indirect destinations. If you use Lambda or Kinesis Data Streams, then you keep the real-time nature of the data. Conceptually, the product fits between two streams of data: input streams and output streams, and it allows you, in real-time, to use SQL queries to adjust the data from the input to the output.
Now let's look at it visually because it will be easier to see how all of the various components fit together. So on the left, we start with the inputs, the source streams, and this can be Kinesis Streams or Kinesis Firehose. In the middle, we create a Kinesis Analytics application; this is a real-time product, and I'll explain what that means in a second. The Kinesis Analytics application can also take data in from a static reference source, an S3 bucket, and then the Kinesis Analytics application will output to destination streams on the right, so Kinesis Streams or Kinesis Firehose. Remember, all of these are external sources or destinations; they exist outside of Kinesis Data Analytics. Kinesis Data Analytics doesn't actually modify the sources in any way. What actually happens is this: inside the analytics application, you define sources and destinations known as inputs and outputs.
So conceptually, what happens is for the input side, objects called in-application input streams are created based on the inputs. Now you can think of these like normal database tables, but they contain a constantly updated stream of data from the input sources, the actual Kinesis Streams or Firehose. These exist inside the analytics application, but they always match what's happening on the streams which are outside of the application. Now the reference table is a table which matches data contained within an S3 bucket and it can be used to store static data which can enrich the data coming in over the streams. Consider the example of a popular online game where a Kinesis Stream has all of the data about player scores and player activities. In this particular case, the reference table might contain data on player information which can augment the stuff coming in via the stream. So if the stream only contains the raw score and activity data, then the reference data will contain other metadata about those players, so maybe player names, certain items the player has, or awards, and these can all be used to enrich the data that's coming in real-time from Kinesis Streams.
Now the core to the Kinesis Analytics application is the application code, and this is coded using the structured query language, or SQL. It processes inputs and it produces outputs. So in this case, it operates on data in the in-application input stream table and the reference table, and any output from the SQL statement is added to in-application output streams. Again, think of these like tables which exist within the Kinesis Analytics application; only these tables map onto real external streams, so any data that's outputted into those tables by the Kinesis Analytics application is entered onto the Kinesis Stream or Kinesis Firehose, and then these will feed into any consumers of the stream or destinations of the firehose. Additionally, any errors generated by the SQL query can be added to an in-application error stream, and all of this happens in real time. So data is captured from the source streams via the in-application input stream, the virtual tables. It's manipulated by the analytics application using the SQL query, and then stored into the in-application output streams which put that data into either the external Kinesis Stream or external Kinesis Firehose.
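To make that concrete, here is a hedged illustration of the kind of SQL application code such an application might run, held in a Python string as you would supply it when creating the application. SOURCE_SQL_STREAM_001 is the default name of the first in-application input stream, while the output stream, pump, and column names are invented for this example:

# Illustrative application code for a SQL-based Kinesis Data Analytics application.
# The column names (player_name, score) are made up; real names come from the input schema.
application_code = """
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("player_name" VARCHAR(32), "total_score" INTEGER);

CREATE OR REPLACE PUMP "SCORE_PUMP" AS
  INSERT INTO "OUTPUT_STREAM"
    SELECT STREAM "player_name", SUM("score")
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY "player_name",
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
"""
# The pump continuously reads from the in-application input stream and inserts the
# aggregated results into the in-application output stream, which maps to the external
# Kinesis Stream or Firehose configured as the destination.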
All of this just to stress it again happens in real time, and if the output data is delivered into a Kinesis Stream, then it stays real-time. If the output data is delivered into a Kinesis Firehose, then it becomes near real-time, delivering to all of those supported destinations. Now, you only pay for the data processed by the application, but it is not cheap, so you should only use it for scenarios which really fit this type of need. Before we finish this lesson, let's talk about the scenarios where you might choose to use Kinesis Data Analytics. There are some particular use cases or scenarios which fit using Kinesis Data Analytics. At a high level, this is anything which uses streaming data that needs real-time SQL-based processing, so things like time series analytics, maybe election data and e-sports, things like real-time dashboards for games, high score tables or leaderboards, and even things like real-time metrics for security and response teams. Anything which needs real-time stream-based SQL processing is an ideal candidate for Kinesis Data Analytics.
Now, I mentioned in the previous lesson that Data Firehose can also support transformation of data using Lambda, but remember the key differentiator is that Data Firehose is not a real-time product, and using Lambda you're restricted to relatively simple manipulations of data. Using Kinesis Data Analytics, you can create complex SQL queries and use those queries to manipulate input data into whatever format you want for the output data. So it has a lot more in terms of features than Data Firehose, so if you're dealing with any exam questions which need really complex manipulation of data in real-time, then Kinesis Data Analytics is the product to choose. Okay, so with that being said, that's everything that I wanted to cover in this theory lesson. Go ahead and complete the lesson, and then when you're ready, I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk in detail about Amazon Kinesis Data Firehose. This is one product out of the Kinesis product set, which together is designed to cope with the ingestion, consumption, and management of large amounts of streaming data within AWS. It's important for the exam that you really understand the different situations in which you would use each of the Kinesis family of products. So let's jump in and explore Data Firehose in detail.
You learned in the last lesson that Kinesis Data Streams is a product that provides a way for producers to send huge quantities of data into AWS, storing that data for a window of time, and then allowing multiple consumers to consume that data at different rates. Producers need to be designed to put data into Kinesis, and consumers need to be designed to consume data from Kinesis. What Kinesis by default doesn't offer is a way to persist that data. Once records in Kinesis age past the end of the rolling window, then they're gone forever. Kinesis Data Firehose is a fully managed service to deliver data to supported services, including S3, which lets data be persisted beyond the rolling window of Kinesis Data Streams. Data Firehose is also used to load data into data lake products, data stores, and analytics services within AWS.
Data Firehose scales automatically. It's fully serverless, and it's resilient. Firehose accepts data and offers near-real-time delivery of that data to destinations. Now, this is key for the exam: It is not a real-time product; it is a near-real-time product. Generally, the delay is around the 60-second mark, so it's not like Kinesis, which offers consumers fully real-time access to data that is ingested. Firehose is near real-time, so remember that one for the exam.
Firehose also supports the transformation of data on the fly using Lambda. Anything that you can define in a Lambda function can be done to data being handled by Firehose, but be aware that it can add latency depending on the complexity of the processing. Firehose is a pay-as-you-go service, and you'll be billed based on the data volume passing through the service. It's a really cost-effective service that handles the delivery of data through to supported destinations.
Let’s look at the architecture of Firehose visually. We start with Kinesis Data Firehose in the middle. The end result of Firehose is to deliver incoming data through to a number of supported destinations. Now these are important for you to remember for the exam. You need to be able to pick if Firehose is a valid solution, and for that, you need to know the valid destinations for the service. So, it can deliver data to HTTP endpoints, which means it can deliver to third-party providers. It directly supports delivery to Splunk, Redshift, Elasticsearch, and finally, it can deliver data into S3.
Firehose can directly accept data from producers, or that data can be obtained from a Kinesis Data Stream. So, if you already have a set of producers adding data into a Kinesis Data Stream, you might want to integrate that with Firehose. Remember, these producers are adding data into the Kinesis Stream. That data is available in real-time by any consumers of that stream, but Kinesis offers no way to persist that data anywhere or deliver it natively to any other services. But what we can do is integrate the Kinesis Data Stream with the Kinesis Firehose delivery stream. That data is delivered into Firehose in real-time.
Producers can also send data directly into Firehose if you have no need for the features that Kinesis Data Streams provide or you just want to use Firehose directly. In any case, Firehose receives the data in real-time, but this is where that changes. So, even though Firehose receives data in real-time, Firehose itself is not a real-time service. Kinesis Data Streams are real-time, but Firehose is known as a near-real-time service. What this means in practice is that any data being handled by the service is buffered for delivery. Firehose waits for one MB of data or 60 seconds. These can be adjusted, but these are the general minimums of the product.
For low-volume producers, Firehose will generally wait for the full 60 seconds and then deliver that data through to the destinations. For high-volume producers, it will deliver every MB of data that's injected into the product. So, even though Firehose gets the data in real-time, it doesn't deliver it to the destination in real-time, and that's essential to remember for the exam. If there are any answers involving Firehose, it cannot be a real-time solution. It can only be near real-time. From an AWS perspective, something in the range of 200 milliseconds would be a real-time product, but something in the range of 60 seconds would be classified as near real-time, and you need to get a feel for the differences between those two and which products fit into which categories.
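To make the buffering behaviour concrete, here's a minimal boto3 sketch, assuming hypothetical stream, bucket, and role names, of a direct-put delivery stream configured with the 1 MB / 60 second buffering hints just described, plus a producer sending a record straight into it. It's illustrative only, not the course demo.

```python
import boto3

firehose = boto3.client("firehose")

# Hypothetical names and ARNs for illustration only.
firehose.create_delivery_stream(
    DeliveryStreamName="example-delivery-stream",
    DeliveryStreamType="DirectPut",  # producers call PutRecord directly
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
        "BucketARN": "arn:aws:s3:::example-destination-bucket",
        # Buffer until 1 MB accumulates or 60 seconds pass, whichever comes first.
        "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
    },
)

# A producer sending a record directly into the delivery stream.
firehose.put_record(
    DeliveryStreamName="example-delivery-stream",
    Record={"Data": b'{"sensor": "s1", "temp": 21.4}\n'},
)
```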
Now, Firehose can actually transform the data passing through it using Lambda. So, source records added to Firehose are sent to a Lambda function, and functions can be created from blueprints to perform common tasks. Transformed records are then sent back for delivery, but this can add to the latency of data flowing through the product. If you decide to do a transform, you can optionally store the unmodified data in a backup bucket that you define. Once the size or time buffer threshold is reached, data is passed into the final destinations, so transformed records can be sent into S3, Elasticsearch, Splunk, or HTTP endpoints. The only exception to this architecture for delivery is when you're using Redshift.
What happens with Redshift is that it uses an intermediate S3 bucket and then runs a Redshift copy to bring the data from S3 into the product. So, even though conceptually it's direct when used, you're actually copying data to an intermediate location, an S3 bucket, and then you're running the copy command to pull that data into Redshift, and that's handled all end-to-end by the Data Firehose product.
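Coming back to the Lambda transformation mentioned a moment ago, this is a minimal sketch of the shape a Firehose transformation function takes: each incoming record carries a recordId and base64-encoded data, and the function must return every recordId with a result of Ok, Dropped, or ProcessingFailed. The uppercase transform and the message field are just illustrative assumptions.

```python
import base64
import json


def lambda_handler(event, context):
    """Minimal Firehose transformation sketch: uppercase a 'message' field."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["message"] = payload.get("message", "").upper()

        output.append({
            "recordId": record["recordId"],  # must match the incoming record
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode()
            ).decode(),
        })
    return {"records": output}
```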
There are a few common situations where Firehose will be used. You might use it to provide persistence to data coming into a Kinesis stream, providing permanent storage of data that comes into a stream so it's not lost when it exits the rolling window that Kinesis Data Streams provide. Or, you might use it if you want to store data in a different format because Firehose can transform it using Lambda. Or you might want to deliver data that comes either directly into Firehose or via a data stream into one of the supported products. But just keep in mind, though, it is not real-time. I need to stress that for the exam—it's only near real-time.
You trade the fact that you don't have to build this yourself, so you don't need to put in the effort to build this solution, but what you lose is the real-time nature of Kinesis. If you need a solution that handles data in real-time, then you need to stick to Kinesis and use something like a Lambda function to handle what to do with that data, say, delivering it in real-time to Elasticsearch.
Now, with that being said, that is everything I wanted to cover in this lesson. For the exam, you just need a good architectural overview of how the Firehose product works and some of the scenarios in which it might be used. So, really try to focus on these core concepts. Exactly what Firehose does, try to commit to memory that it's only a near real-time product, and make sure that you remember all of the supported destinations.
Now, thanks for watching. Go ahead and complete this lesson, and then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this video, I want to talk about another product within AWS, Kinesis Data Streams. So, let's just jump in and get started.
Kinesis is a service that a lot of people I talk to confuse with SQS. There are even exam questions which test your ability to select between the two. Now, this shouldn't be a difficult thing to do because they're actually very different products designed for different situations. Kinesis is a scalable streaming service. Now, what I mean by this is that it's designed to ingest data, lots of data from lots of devices or applications. Producers send data into a Kinesis stream. The stream is the basic entity of Kinesis, and it can scale from low levels of data throughput to near infinite amounts of data. Now, Kinesis is a public service and it's highly available in a region by design. You don't need to worry about replication or providing access from a network perspective like other application services within AWS. All of this is handled as a service.
Kinesis Streams provide a level of persistence. You have a default 24-hour rolling window, so when data is ingested by a Kinesis stream from a producer, it's accessible for 24 hours by default from that point. So, data which is 24 hours and one second old is discarded. Now, as a product, it includes storage for that amount of data, so however much you ingest within that 24-hour period, the storage is included. And this window can be increased up to a maximum of 365 days for additional costs. Kinesis supports lots of producers pushing data into a stream, but also multiple consumers reading data from that same stream. And consumers can access data from anywhere within the rolling window, for example, the default 24 hours. And each of these consumers might access this data at different levels of granularity, so maybe looking at data every second, or looking at data points once per minute or once per hour. This makes Kinesis great for things like analytics and dashboards.
Now, visually the product architecture looks like this. On the right, we have producers, and these might be things like EC2 instances, on-premises servers, mobile applications or devices, and even things like IoT sensors. On the left, we have consumers. Again, these could be on-premises servers running software to access Kinesis, EC2 instances, or even Lambda functions which can be configured to invoke when data is added to the stream. In the middle is the stream itself, and it's into this that producers send data, so the stream ingests this data. And it's from this that consumers read the data.
Now, the way that a Kinesis stream scales is by using a shard architecture. A stream starts off with one shard, and as additional scale is required, shards are added to the stream. Now, each shard provides its own capacity: one MB per second of ingestion capacity and two MB per second of consumption. The more shards a stream has, the more expensive it is and the more performance that it provides. Now, what also impacts the price is the data window. As I mentioned previously, by default, a stream provides a 24-hour window, and this can be increased up to 365 days for additional cost. And remember, the window is also persistent, so a 365-day window means 365 days' worth of data stored by Kinesis.
The way that the data is stored on a stream is via Kinesis data records, and these have a maximum size of one MB. Kinesis data records are stored across shards, meaning the performance scales in a linear way based on the number of shards. Kinesis also has a related product called the Kinesis Data Firehose, and there will be a separate video discussing this in more detail. This connects to a Kinesis stream and can move the data which arrives onto a stream en masse into another AWS service, an example being S3. So, if you have a fleet of sensors which stream data into Kinesis and that's used for real-time analysis, but if you also need to store this longer term, maybe to analyze it using a different AWS product such as EMR, which is a big data analytics tool, then you can put that data into S3 using Kinesis Firehose.
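As a rough illustration of how producers and consumers interact with a stream, here's a hedged boto3 sketch using a hypothetical stream name: the partition key spreads records across shards on the way in, and a shard iterator lets a consumer read from anywhere within the rolling window on the way out.

```python
import boto3

kinesis = boto3.client("kinesis")
stream = "example-stream"  # hypothetical stream name

# Producer side: the partition key determines which shard receives the record,
# so spreading keys spreads load across shards (1 MB/s in, 2 MB/s out per shard).
kinesis.put_record(
    StreamName=stream,
    Data=b'{"device": "sensor-42", "reading": 18.2}',
    PartitionKey="sensor-42",
)

# Consumer side: read from the oldest data still inside the rolling window
# for the first shard of the stream.
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]
```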
Now, I just want to spend a few moments more comparing SQS and Kinesis so that you understand the differences from a conceptual level. Now, one of the common areas of confusion is this difference between these two products. When should you pick SQS versus Kinesis? Well, if you're in the exam and you're reviewing one particular question, then you need to review it through this lens. Is the question about the ingestion of data, or is it about worker pools, decoupling, or does it mention asynchronous communications? Well, if it's about the ingestion of data, it's going to be Kinesis. If it's about any of the others, then assume it's SQS first and only change your mind if you have strong reasons to do so.
SQS generally has one thing or one group of things sending messages to the queue. This might be something like a web tier inside an auto-scaling group. Generally, you won't have hundreds or thousands of sensors sending to an SQS queue. It's not designed for that type of workflow. Additionally, you'll generally only have one consumer or group of consumers reading from the queue, generally a worker tier. SQS queues are generally used for decoupling application components. They allow asynchronous communications where the sender and receiver don't need to be aware of each other and don't care about each other. SQS also doesn't really provide the concept of persistence. Messages on a queue are temporary. Once they're received and processed, the next step is deletion, at which point they're gone forever. There's no concept of a time window within SQS queues.
Now, contrast this to Kinesis. It's designed for huge scale ingestion of data. Lots of things sending data into a stream at potentially super high data rates. And it's designed for multiple consumers, each of which might be consuming data at different rates. Kinesis is designed for ingestion, analytics, monitoring, application clicks or mobile click streams. If you think for a minute about the two products, they aren't all that similar, either in function or in terms of the ideal architecture. Try and make sure that before you go into the exam, you really clearly see the distinction between these two products. There's always going to be one or two questions asking you about either of these products and generally one which asks you to pick between them for a given scenario.
Now, that's everything that I wanted to cover in this video at the high level about Kinesis Data Streams. Thanks for watching, go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about dead letter queues, which is another piece of SQS functionality that you need to be aware of. So, let's just jump in and get started.
Dead letter queues are designed to help you handle recurring failures while processing messages which are within an SQS queue. So, let's say that you have a queue, and inside this queue is a single message, and let's say that this particular message is problematic. Something about it is causing errors while processing it. So, the first time that it's received, it's invisible for the duration of the visibility timeout. Then, once the visibility timeout expires, it appears again in the queue, assuming that it hasn't been successfully processed and then explicitly deleted. But imagine that this process happens again and again. The message is received, processing fails, and eventually, the message appears again after the visibility timeout. This process could continue forever, and it's this issue which dead letter queues aim to fix.
Every time the message is received, the receive count attribute is incremented: initially 1, then 2, then 3, then 4, then 5, and so on. What we can do is define a redrive policy. So, this defines the source queue, the dead letter queue to use, and the conditions where the message will be moved into this dead letter queue, and it defines a variable called max receive count. So, how this works is that when the receive count on a given message is more than the max receive count, and when the message isn't explicitly deleted, it's moved to the dead letter queue.
Setting up a dead letter queue gives you some really useful pieces of functionality. It allows you to configure an alarm for any messages which are delivered to a dead letter queue, so this could automatically notify you if you have any problematic messages. It's a separate area, which allows you to perform separate isolated diagnostics, so you can examine logs for a particular message to determine why it's repeatedly failed processing. You can analyze the contents of messages which are delivered to a dead letter queue to diagnose what's causing the issue, and it also allows you to test or apply separate processing which can be used for problematic messages.
Now, one really important thing to keep in mind when you're using dead letter queues in the real world is that all SQS queues have retention periods for messages. So, if a message ages past a certain point and hasn't been processed, then that message is dropped. Now, the way that this works is that when a message is added to a queue, it has an enqueue timestamp, so the timestamp of the point that it was sent into the queue. Now, when you're moving a message from a normal queue to a dead letter queue, this enqueue timestamp is not adjusted, so it remains the same. The timestamp is maintained, and it's the date and time when it was added to the original queue. So, you have to be really careful when a message is moved into a dead letter queue. If a message, for example, has been in a source queue for one day, and the retention period on a dead letter queue is two days, the message will only remain in the dead letter queue for one additional day because this original enqueue timestamp is used rather than the date and time that the message was moved into the dead letter queue. So, generally, the retention period of dead letter queues should be longer than source queues, and this takes into account that the enqueue timestamp is not updated when the message is moved between queues.
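To make the redrive and retention points concrete, here's a minimal boto3 sketch, with hypothetical queue names, that creates a dead letter queue with a longer retention period than the source queue and then attaches a redrive policy with a max receive count of five.

```python
import json

import boto3

sqs = boto3.client("sqs")

# Dead letter queue with a longer retention period (14 days) than the
# source queue (4 days), since the enqueue timestamp is preserved on redrive.
dlq_url = sqs.create_queue(
    QueueName="orders-dlq",
    Attributes={"MessageRetentionPeriod": str(14 * 24 * 3600)},
)["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

source_url = sqs.create_queue(
    QueueName="orders",
    Attributes={
        "MessageRetentionPeriod": str(4 * 24 * 3600),
        # Move a message to the DLQ once it has been received 5 times
        # without being explicitly deleted.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": "5",
        }),
    },
)["QueueUrl"]
```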
So, dead letter queues are a really useful architecture, which allows you to build additional rigor into any processes surrounding queues. It allows you to define this dead letter queue, which helps with diagnostics, and you can add additional processing features which allow problematic messages to be processed, and many other use cases. And finally, a single dead letter queue can be used for multiple source queues, so that's also something to keep in mind.
Now, that's everything I wanted to cover in this lesson. So, thanks for watching, go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover a feature of SQS called delay queues. And this is going to be a quick topic. It's just something that you'll need to understand for the exam and it might come in useful for the real world. So let's just jump in and get started.
Delay queues, at a high level, allow you to postpone the delivery of messages to consumers. Now, as a refresher, by now you'll understand the concept of a visibility timeout. The concept is simple enough, but we'll use this as an opportunity for a quick refresher. So we start with an SQS queue, and inside this queue, we send a single message, which is added to the queue using the send message operation. Once a message is in the queue, messages can be polled using receive message. And while the message is being processed, the visibility timeout takes effect. During this time, any further receive message calls will return no results. During this processing period, either the process will complete and the message will be explicitly deleted or not. If not, this suggests a failure in processing, and the message will reappear on the queue.
Now, the visibility timeout period is configurable. The default is 30 seconds, and the valid range is 0 seconds through to 12 hours. This value can be changed on a per queue or per message basis, in which case it's changed with the change message visibility operation. Now, the critical thing to understand about visibility timeout is that messages need to appear on the queue and be received before this visibility timeout occurs. So, this is used to allow automatic reprocessing. So, you receive messages from a queue and you begin processing. If that processing fails and the application doing it crashes, it might not be in a position where it can tell the queue that processing has failed. And so, visibility timeout means that after a certain configurable duration, that message will reappear in the queue and can be processed again. So, visibility timeout is generally used for error correction and automatic reprocessing.
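As a small illustration of the per-message option, this hedged boto3 sketch receives a message and then extends its visibility timeout using the change message visibility operation; the queue URL is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue"  # placeholder

messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1).get("Messages", [])
for msg in messages:
    # Processing is taking longer than expected, so extend this message's
    # visibility timeout to 5 minutes instead of letting it reappear early.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=300,
    )
```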
Now, a delay queue is significantly different. With a delay queue, we configure a value called delay seconds on that queue. Now, this means that messages which are added to the queue will start off in an invisible state for that period of time. So, when messages are added, they're conceptually parked or invisible for that duration of time. They're not available on the queue. During this delay seconds period, any receive messages operation will return nothing. Once the period expires, the message will be visible on the queue. Now, the default is zero, and for a queue to be a delay queue, it needs to be set to a non-zero value, and the maximum is 15 minutes. You can also use message timers to configure this on a per message basis, and this has the same minimum of zero and maximum of 15 minutes. But it is important to know that you can't use this per message setting on FIFO queues. It's not supported.
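And this is roughly how a delay queue and a per-message timer look in boto3; the queue name and message body are just illustrative assumptions.

```python
import boto3

sqs = boto3.client("sqs")

# Queue-level delay: every message is invisible for 5 minutes after being sent.
delay_queue_url = sqs.create_queue(
    QueueName="example-delay-queue",        # hypothetical name
    Attributes={"DelaySeconds": "300"},     # 0 (default) through to 900 seconds
)["QueueUrl"]

# Per-message timer: overrides the queue setting for this one message.
# Not supported on FIFO queues.
sqs.send_message(
    QueueUrl=delay_queue_url,
    MessageBody='{"action": "follow-up-email"}',
    DelaySeconds=120,
)
```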
Delay queues, in a way, are similar to visibility timeouts because both features make messages unavailable to consumers for a specific period of time. But the difference between the two is that for delay queues, a message is hidden automatically when it's first added to the queue. Using visibility timeouts, a message is initially visible, and it's only hidden after it's consumed from the queue and automatically reappears if that message isn't deleted. So, delay queues are generally used when you need to build in a delay in processing into your application. Maybe you need to perform a certain set of tasks before you begin processing a message, or maybe you want to add a certain amount of time between an action that a customer takes and the processing of the message that represents that action. Visibility timeouts are used to support automatic reprocessing of problematic messages. So, it's important to understand that these two are completely different features.
Now, with that being said, that is everything I wanted to cover in this lesson. I'll make sure I include some links attached to this lesson which provide additional information, but this is what you'll need to understand for the exam and to get started in the real world. At this point, though, you can go ahead and complete the video, and when you're ready, I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to quickly step through the differences between standard SQS queues and FIFO SQS queues. So let's quickly jump in and get started. To get started with understanding some of the architectural differences between standard and FIFO queues, I want you to think about FIFO queues as single-lane highways and then think about standard queues as multi-lane highways. Imagine the messages as cars driving along these highways.
What this means is that the performance of a FIFO queue, so the number of cars per second in this analogy and the number of messages per second in reality, is limited by the width of the road. FIFO queues can handle 300 messages per second without batching and 3,000 with batching. Now this is actually 300 transactions per second to the SQS API when using FIFO mode. Each transaction is one message, but with batching, it means that each transaction can contain 10 messages. Now it's worth mentioning at this point that there is a high throughput mode for FIFO, but at the time of creating this lesson, it's only available in preview.
Standard queues, so multi-lane highways in this analogy, don't suffer from any real performance issues and can scale to a near-infinite number of transactions per second. FIFO queues, as the name suggests, guarantee order. They're first in, first out, so what you're trading is performance for this preserved order. They also guarantee exactly once processing, removing the chance of duplicate message delivery. Now another odd restriction is that FIFO queues have to have a .fifo suffix in order to be a valid FIFO queue. Now remember that one because I've seen it come up in the exam many times before.
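For reference, creating and using a FIFO queue might look something like the following boto3 sketch; the queue name and message group are hypothetical, and note the mandatory .fifo suffix.

```python
import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end with the .fifo suffix.
fifo_url = sqs.create_queue(
    QueueName="commands.fifo",                # hypothetical name
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",  # hash the body instead of supplying
    },                                        # a deduplication ID per message
)["QueueUrl"]

# Messages with the same group ID are delivered strictly in order.
sqs.send_message(
    QueueUrl=fifo_url,
    MessageBody='{"command": "restart-service"}',
    MessageGroupId="admin-session-1",
)
```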
Now FIFO queues are great for workflow-based order processing, command ordering, so if you've got a system administrator who's entering commands into a processing system and you need the order of those commands to be maintained, then FIFO queues are ideal, as well as any sequential iterative price adjustment calculations for sales order workflows. Now standard queues, so the multi-lane highways of queues, they're faster. Conceptually, think of this as multiple messages being carried on the highway at the same time, so the multi-lane part of this analogy. But because of this, there are a few important trade-offs. First, there's no rigid preservation of message ordering, it's best efforts only. And second, what's guaranteed is only at least once message delivery, meaning in theory messages can be delivered more than once, so any applications that use standard queues need to be able to accommodate the potential for multiple of the same messages to be delivered.
Now standard queues are ideal for decoupling application components, for worker pool architectures, or to batch together items for future processing, so all of these are ideal use cases for standard SQS queues. Now that's everything I wanted to cover, I just wanted to make sure that for the exam, you understand the difference in architecture between these two different queue types. Thanks for watching, go ahead and complete the lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover the architecture of another really important product within AWS. It's something I've already mentioned in other lessons, and it's the Simple Queue Service, or SQS. So let's jump in and explore what the product provides and exactly how it works. Simple Queue Service, or SQS, provides managed message queues. And it's a public service, so it's accessible anywhere with access to the AWS public space endpoints, including private VPCs if they have connectivity to the services. It's fully managed, so it's delivered as a service. You create a queue, and the service delivers that queue as a service. Now, queues are highly available and highly performant by design, so you don't need to worry about replication and resiliency. It happens within a region by default.
Now, queues come in one of two types: standard queues and FIFO queues. FIFO queues guarantee an order, so if messages one, two, and three are added in that order to a FIFO queue, then when you receive messages, you'll also see them in order—so one, two, and then three. With a standard queue, this is best efforts, but there's always the possibility that messages could be received out of order. Now, FIFO queues do come with some other considerations, but more on that later. Now, the messages that are added to a queue can be up to 256 kilobytes in size. If you need to deal with any data that is larger, then you can store it on something like S3 and link to that object inside the message. Architecturally, though, ideally, you want to keep messages small because they're easier to process and manage at scale.
Now, the way that a queue works is that clients can send messages to that queue, and other clients can poll the queue. Polling is the process of checking for any messages on a queue, and when a client polls and receives messages, those messages aren't actually deleted from the queue. They're actually hidden for a period of time—the visibility timeout. The visibility timeout is the amount of time that a client can take to process a message in some way. So, if the client receives messages from the queue, and if it finishes processing whatever workload that message represents, then it can explicitly delete that message from the queue, and that means that it's gone forever. But if a client doesn't explicitly delete that message, then after the visibility timeout, the message will reappear in the queue. Architecturally, this is a great way of ensuring fault tolerance because it means that if a client fails when it's processing a job, or maybe even fails completely, then the queue handles the default action to put the message back in the queue, which makes that message available for processing by a different client. So, the visibility timeout is really important, and it's something that features regularly on the exam. Just be aware that the visibility timeout is the amount of time that a message is hidden when it's received, and if it's not explicitly deleted, then it appears back in the queue to be processed again.
Now, SQS also has the concept of a dead letter queue, and this is a queue where problem messages can be moved to. For example, if a message is received five or more times and never successfully deleted, then one possible outcome of that can be to move the message to the dead letter queue. Dead letter queues allow you to do different sets of processing on messages that can be problematic. So, if messages are being added to the queue in a corrupt way, or if there's something specific about these messages that means different styles of processing are required, then you can have different workloads looking at the dead letter queue. Now, I've already talked about how queues can be used to decouple application components. One component adds things to the queue, another reads from the queue, and neither component needs to be aware of or worry about the other. But queues are also great for scaling. Auto-scaling groups can scale based on the length of the queue, and lambdas can be invoked when messages appear on a queue, and this allows you to build complex worker-pool-style architectures.
Now, this is a pretty common style of architecture that you might see, which involves a queue. So you might have two auto-scaling groups: one on the right is the web application pool, and the one on the left is a worker pool. So, a customer might upload a master video to the web application pool via a web app, and the master video is taken by this web application pool, and it's stored in a master video bucket, and a message is also added to an SQS queue. Now, the message itself has a link to the master video, so the S3 location that the master video is located at, and this avoids having to deal with unwieldy message sizes. At this point, that's all that the web pool needs to do, and that's where the responsibility ends for this particular part of the application. Now, the web pool is controlled by an auto-scaling group, and its scaling is based on the CPU load of the instances inside that auto-scaling group, meaning that it grows out as the load on the system increases. The scaling of the worker pool is based on the length of the SQS queue, so the number of messages in the queue. As the number of messages on the queue increases, the auto-scaling group scales out based on this number of messages. So, it adds additional EC2 instances to cope with the additional processing. So, instances inside this auto-scaling group, they all poll the queue and receive messages. These messages are linked to the master video, which is stored in the master bucket, which they also retrieve. Now, they perform some processing on that video, in this example generating different sizes of videos, and they store them in a different bucket, and then the original message that was on the queue is deleted.
Now, if the processing fails, or even if an instance fails, then it will be reprovisioned automatically by the auto-scaling group, and the message that it was working on will automatically reappear on the queue after the visibility timeout has expired. As the queue empties, the number of worker instances scales back in, all the way to zero if no processing workloads exist. So, the auto-scaling group that's running the worker pool is constantly looking for the length of the queue. When messages appear on the queue, the auto-scaling group for the worker pool scales out, adds additional instances, those instances poll the queue, retrieve the messages, download the master video from the master bucket, perform the transcode operations, store that in the transcode bucket, delete the message from the queue, and the size of the worker pool auto-scaling group will scale back in as that workload decreases. Now, this is an example of a worker-pool elastic architecture that's using an SQS queue. At this point, the responsibilities of the worker pool have finished. It doesn't have any visibility of or care about the health of the web pool. It purely responds to messages that appear inside the SQS queue. So, the effect of the SQS queue is to decouple these different components of this application and allow each of them to scale independently. Once the worker pool has finished its processing, then the web pool can retrieve the videos of different sizes from the transcode bucket and then present these to the user of our application.
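A worker in that pool might look roughly like this hedged sketch: it long-polls the queue, follows the S3 link in the message, does its processing, stores the output, and only deletes the message once everything has succeeded. The queue URL, bucket names, and the transcode step are all placeholders.

```python
import json

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/transcode-jobs"  # placeholder


def transcode(src, dst):
    # Placeholder for the real transcoding work (for example, invoking ffmpeg).
    pass


def process_one_batch():
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in response.get("Messages", []):
        job = json.loads(msg["Body"])  # e.g. {"bucket": "...", "key": "cat.mp4"}
        s3.download_file(job["bucket"], job["key"], "/tmp/master.mp4")

        transcode("/tmp/master.mp4", "/tmp/output-720p.mp4")
        s3.upload_file("/tmp/output-720p.mp4", "example-transcode-bucket", "720p/" + job["key"])

        # Delete only after successful processing; if the worker crashes before this,
        # the message reappears after the visibility timeout and another worker retries it.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```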
Now, this video processing architecture is one that's generally used to illustrate exactly how queues function. So, a multi-part application where one part produces a workload and the other part scales automatically to perform some processing of that workload. It's actually a simplified version of the architecture that would generally be used in a production implementation of this. For workloads like this, where one job is logged and multiple different outputs are needed, generally we would use a more complicated version, which looks something like this. It has a similar architecture, but it uses SNS and SQS fan-out. And the way that that works is, once the master video is uploaded from our application user and placed into the master video bucket, a message is sent, but instead of the message going directly onto an SQS queue, the message is added onto an SNS topic. Now, this SNS topic has a number of subscribers. For each different video size required, there is one independent SQS queue configured as a subscriber to that topic. So, in this example, one for 480p, one for 720p, and one for 1080p. So, each size has its own queue and its own auto-scaling group, which scales based on the length of that individual queue. And this means that if the different workload types need different sizes or capabilities of instances, then they can independently scale.
S3 buckets are capable of generating an event when an object is uploaded to that bucket, but it can only generate one event. So, in order to take that one event and create multiple different events that can be used independently, you'll use this fan-out design. So, you'll take one single SNS topic with multiple subscribers, generally multiple SQS queues, and then that message will be added into each of those queues, allowing for multiple jobs to be started per object upload. Now, I want you to really remember this one for the exam. You can't see me right now, but I'm winking as much as I can. Really, really remember this one for the exam, this fan-out architecture, because it will come in handy for the exam, I promise you. But at this point, let's move on to the last few points that I want to cover before we finish this theory lesson.
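Here's a rough boto3 sketch of that fan-out wiring, using hypothetical topic and queue names; note that in a real deployment each queue would also need a queue policy allowing the SNS topic to deliver to it, which is omitted here for brevity.

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="new-master-video")["TopicArn"]  # hypothetical name

# One queue per output size, each subscribed to the same topic.
for size in ["480p", "720p", "1080p"]:
    queue_url = sqs.create_queue(QueueName=f"transcode-{size}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# Publishing once delivers a copy of the message to every subscribed queue.
sns.publish(TopicArn=topic_arn, Message='{"bucket": "master-videos", "key": "cat.mp4"}')
```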
I mentioned at the start of the lesson that there are two types of queues: standard and FIFO. It's important that you understand the differences, benefits, and limitations of both of these. So, think of standard queues like a multi-lane highway, and think of FIFO queues like a single-lane road with no opportunity to overtake. Standard queues guarantee at least once delivery, and they make no guarantees on the order of that delivery. FIFO queues both guarantee the order and guarantee exactly once delivery, and that's a critical difference. With standard queues, you could get the same message delivered twice on two different polls, and the order can be different. FIFO queues guarantee exactly once delivery, and also they guarantee to maintain the order of messages in the same order as they were added, so first in, first out.
Now, because FIFO queues are single-lane roads, their performance is limited: 3,000 messages per second with batching, and 300 per second without. So, FIFO queues don't offer exceptional levels of scaling because standard queues are more like multi-lane highways, and they can scale to a near-infinite level, because you can just continue adding additional lanes to that multi-lane highway. So, standard queues scale in a much more linear and fluid way. With SQS, you're billed on requests, and a request is not the same as a message. A request is a single request that you make to SQS. So, one single request can receive anywhere from zero to ten messages, and anywhere up to 64 kilobytes of data in total. So, SQS is actually less efficient and less cost-effective the more frequently that you make requests, because you're billed based on requests, and requests can actually return zero messages. The more frequently that you poll an SQS queue, the less cost-effective the service is.
Now, why this matters is there are actually two ways to poll an SQS queue. You have short polling and long polling. Short polling uses one request, and it can receive zero or more messages. But if the queue has zero messages on it, then it still consumes a request, and it immediately returns zero messages. This means that if you only use short polling, keeping a queue close to zero length would require an almost constant stream of short polls, each of which consumes a request and each of which is a billable item. Now, long polling, on the other hand, is where you can specify wait time seconds, and this can be up to 20 seconds. If messages are available on the queue when you lodge the request, then they will be received. Otherwise, it will wait for messages to arrive. Up to ten messages and 64 kilobytes will be counted as a single request, and it will wait for up to 20 seconds until messages do arrive on the queue. Long polling is how you should poll SQS because it uses fewer requests. It will sit waiting for messages to arrive on the queue if none currently exist.
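In boto3 terms, the difference is just the wait time on the receive call; this small sketch, with a placeholder queue URL, shows short polling versus long polling.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue"  # placeholder

# Short polling: returns immediately, even with zero messages, and still bills a request.
short = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=0)

# Long polling: a single request waits up to 20 seconds for messages to arrive,
# so an otherwise idle consumer makes far fewer billable requests.
long_poll = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
```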
One final point, because messages can live in an SQS queue for some time, anywhere up to 14 days, the product supports encryption at rest using KMS. So, this is server-side encryption. It's encryption of the data as it's stored persistently on disk. Now, data by default is encrypted in transit between SQS and any clients, but you need to understand the difference between encryption at rest and encryption in transit. They're not the same thing. Now, access to a queue is based on identity policies, or you can also use a queue policy. So, identity policies or queue policies can be used to control access to a queue from the same account, but only queue policies can allow access from external accounts. And a queue policy is just a resource policy, just like the ones that you've used earlier in the course on S3 buckets or SNS topics.
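As an illustration of both points, this hedged sketch sets a KMS key for encryption at rest and attaches a simple queue policy granting another (hypothetical) account permission to send messages.

```python
import json

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue"  # placeholder

sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        # Server-side encryption at rest using a KMS key (alias is hypothetical).
        "KmsMasterKeyId": "alias/example-sqs-key",
        # Resource (queue) policy allowing another AWS account to send messages.
        "Policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
                "Action": "sqs:SendMessage",
                "Resource": "arn:aws:sqs:us-east-1:111122223333:example-queue",
            }],
        }),
    },
)
```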
Now that's all of the theory that I wanted to cover about SQS queues. Thanks for watching. Go ahead and complete this video, and then once you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to go into a little bit more depth about API Gateway. Now we've got a lot to cover in a single lesson so let's jump in and get started. API Gateway is a service which lets you create and manage APIs. Now an API is an application programming interface, it's a way that applications communicate with each other. So for example if you run the Netflix application on your TV then it's using an API to communicate with the Netflix back-end services.
API Gateway acts as an endpoint or an entry point for applications looking to talk to your services, and architecturally it sits between applications which utilize APIs and the integrations which are the back-end services which provide the functionality of that API. Now API Gateway is highly available and scalable so you don't have to worry about either, it's delivered as a managed service. It handles authorization so you can define who can access your APIs using the API Gateway, it can be configured to handle throttling so how often individuals can use APIs, it can perform caching to reduce the amount that your back-end services are called as part of the usage of your API, it supports CORS so you can control security of cross-domain calls within browsers, and it supports transformations, and all of this within the API Gateway product.
It also supports the OpenAPI spec, which makes it easy to create definition files for APIs so APIs can be imported into API Gateway, and it also supports direct integration with AWS services. So for things like writing into DynamoDB, starting a step function or anything through to sending messages to SNS topics, you might not even need any back-end compute, so it's capable of directly integrating with a range of AWS services.
Now API Gateway is a public service and so it can act as the front-end for services running within AWS or on-premises, and it can also be an effective migration product to provide a consistent front-end while the backing services are being moved from on-premises into AWS or even re-architected, moving from monolithic compute services such as virtual servers through to serverless architectures using Lambda. Now lastly it can provide APIs that use HTTP, REST or even WebSocket-based APIs.
Now visually this is how the high-level architecture of API Gateway looks. We have API Gateway in the middle here, and this is acting as the endpoint for the consumers of our API, and these could be mobile applications, other applications which consume the APIs, or even web applications loaded from static hosting within an S3 bucket. In any case these all connect to the API running on the API Gateway using the endpoint DNS name.
Now it's actually the API Gateway's job to act as an intermediary between clients and what are called integrations and these are the back-end services which provide the functionality to API Gateway. API Gateway is capable of connecting to HTTP endpoints running in AWS or on-premises, it can use Lambda for compute and this is something that's typically used within serverless architectures and as I mentioned previously it can even directly integrate with some AWS services such as DynamoDB, SNS and step functions.
Now there are three phases in most API Gateway interactions: the request phase which is where the client makes a request to the API Gateway and then this is moved through API Gateway to the service provided by the integrations, and then finally the response phase where the response is provided back to the client. The request phase at a high level does three things: it authorizes, validates and transforms incoming requests from the client into a form that the integration can handle and then the response takes the output from the integration, it transforms it, prepares it and then returns it through to the client.
API Gateway also integrates with CloudWatch to store logging and metric-based data for request and response side operations and it also provides a cache which improves performance for clients and also reduces the number of requests made to the back-end integrations. So that's the high-level architecture and through the remainder of this lesson I want to touch on a number of the pieces of functionality in a little bit more detail and we'll start with authentication.
API Gateway supports a range of authentication methods. Now you can allow APIs to be complete open access so no authentication is required but there are different types of authentication which are supported by the product and let's use the example of the Categorum application which is now serverless. API Gateway can use Cognito user pools for authentication, this is one of the supported methods. If this method is used then the client authenticates with Cognito and receives a Cognito token in return assuming a successful authentication, it passes that token in with the request to API Gateway and because of the tight integration which API Gateway has with Cognito it can natively validate the token.
So that's Cognito, but API Gateway can also be extended to use Lambda-based authorization, which used to be called custom authorization. With this flow we assume that the client has some form of bearer token, something which asserts an identification, and it passes this into API Gateway with the request. Now at this point, API Gateway, not knowing how to natively validate this authentication or authorization, calls a Lambda authorizer, and it's the job of this function to validate the request. So it either does some custom compute, maybe checking a local user store, or it calls an ID provider, an external provider of identification, to check the ID. If this all comes back okay and the Lambda function is happy, it returns to API Gateway an IAM policy and a principal identifier. API Gateway then evaluates the policy and it either sends the request on to a Lambda function, so invoking the Lambda function, or it returns a 403 access denied error if the access is denied.
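To give you a feel for what a Lambda authorizer returns, here's a minimal sketch of a token-based authorizer; the token check and principal ID are purely illustrative assumptions, but the shape of the response, a principalId plus an IAM policy document scoped to execute-api:Invoke, is what API Gateway evaluates.

```python
def lambda_handler(event, context):
    """Minimal Lambda (custom) authorizer sketch for a REST API TOKEN authorizer."""
    token = event.get("authorizationToken", "")

    # Hypothetical check; a real authorizer would validate the bearer token
    # against an identity provider or a local user store.
    effect = "Allow" if token == "expected-token" else "Deny"

    return {
        "principalId": "user-123",  # identifier for the caller
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```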
Now IAM can also be used to authenticate and authorize with API Gateway by passing credentials in the headers but this level of detail is beyond what's required for the exam. I just think it's useful to give you the architecture visually so you can picture how all the components fit together.
At this point let's move on and talk about endpoint types. With API Gateway it's possible to configure a number of different endpoint types for your APIs. First we've got edge optimized, and with edge optimized endpoint types any incoming requests are routed to the nearest CloudFront POP, or point of presence. We've also got regional endpoints, and these are used when you have clients in the same region, so this doesn't deploy out using the CloudFront network; instead you get a regional endpoint which clients can connect into, so this is relatively low overhead, it doesn't use the CloudFront network, and this is generally suitable when you have users or other services which consume your APIs in the same AWS region. Lastly we have private endpoint types, and these are endpoints which are only accessible within a VPC via an interface endpoint, so this is how you can deploy completely private APIs if you use the private endpoint type.
The next concept I want to talk about are API Gateway stages. When you deploy an API configuration in API Gateway you do so to a stage for example you might have the prod and dev stage for the Categorum application. Most things within API Gateway are defined based on a stage so in this case you could have the production application connecting to the prod stage and developers testing new additions via the dev stage. Each of these stages has its own unique endpoint URL as well as its own settings. Each of these stages can be deployed onto individually so you might have version one of the API configuration deployed into production and this uses version one of a lambda function as a backing integration and then we might have version two which is currently under development deployed into the dev stage and this also could use a separate backing lambda function containing the new code.
Now you can roll back deployments on a stage so they can be used for some pretty effective isolation and testing but what you can also do with API Gateway stages is to enable canary deployments on stages. What this means is that when enabled any new deployments which you make to that stage are actually deployed on a sub part of that stage the canary part of that stage and not the stage itself. So traffic distribution can be altered between the base stage and the canary based on a user configurable value and eventually the canary can be promoted to be the base stage and the process repeated.
In this example it means that version two of the API configuration can be tested by the development team, and then canary can be enabled on production. Version two can be deployed onto production, and this will be deployed into the canary because canary is enabled on the production stage. We can adjust the distribution of traffic between the main production stage and its canary until we're completely happy, and then we can promote the canary to be the full base stage, and this process of development and production cycles can then continue. If you're not happy with how a canary is performing, if it's got bugs or if it's negative in terms of performance, then you can always remove it and return back to the base stage.
Now at this point I have to apologize I hate getting you to remember facts and figures but for the exam I genuinely think these facts and figures might help so do your best to note them down and remember them even if you only do it at a high level even if you only get the basics I think it will help you answer certain exam questions quicker and with less thought.
So to start with error codes generated by API gateway are generally in one of two categories: first we have 400 series error codes and these are client errors this suggests that something is wrong on the client side so something wrong either on the client or in terms of how it's making a request through to API gateway maybe permissions are wrong maybe headers are malformed anything that's on the client side, then we have 500 series errors and these are server errors so this indicates that there's a valid request but there's a back-end issue.
Now inside both of these categories there are a number of important error codes that you need to remember, and I want to step through these in this part of the lesson. So first we've got 400, and this is one that's really hard to diagnose because it can actually have many different root causes, but if you do see a 400 error then you should at least be aware that it's a generic client-side error. Then we've got 403, and this suggests an access denied error, so either the authorizer has executed and then indicated to API Gateway that the request should be denied, or the request has been filtered by something like the Web Application Firewall.
Next we've got a 429 error code, and this is an indication that throttling is occurring. I mentioned earlier that API Gateway can be configured to throttle requests, so if you're getting a 429 error it means that you've exceeded a configured throttling amount, so 429, associate that with throttling. Now if you get a 502 error, this is a bad gateway exception, and this indicates that a bad output has been returned by whatever is providing the backing services, so if you've got a Lambda function servicing requests for your API, then a 502 error suggests that that Lambda is returning something that's invalid. A 503 error indicates service unavailable, so this could indicate that the backing endpoint is offline or you're having some form of major service issues, so 503 is definitely one to remember, I have seen that come up in the exam. 504 indicates an integration failure. Now there is a limit of 29 seconds for any requests to API Gateway, so even though Lambda has a timeout of 15 minutes, if Lambda is providing backing compute for an API Gateway API, then if that request takes longer than 29 seconds this can generate a 504 error, so you need to make sure that any Lambda functions that are backing your APIs are capable of responding within that 29-second limit, otherwise you might get 504 errors.
And I've included a link that's attached to this lesson which details all of the error codes as well as a little bit more detail if you do want to use it for extra reading. Now one final thing before we finish up with this in-depth lesson for API gateway and that's to talk about caching. Now you should be familiar with the general concept of caching at this point in the course as it relates to API gateway. We start in the middle with an API gateway stage and this is important because caching is configured per stage, this matters both for the exam and if you're developing this infrastructure for production situations.
Now what happens without a cache is that any users of the application make requests to the API Gateway stage, and there are some back-end integrations which service those requests. Without a cache those services would be used on each and every request. With caching though, you define a cache on that stage. It can be anywhere from 500 MB to 237 GB in size, it caches things by default for 300 seconds, and this can be configured from zero, meaning disabled, through to a maximum of 3,600 seconds, and a point that you should know for the exam is that this cache can be encrypted.
Now using a cache means that calls will only be made to the back end when there's a cache miss and this means reduced load, reduced cost and improved performance because of the lower latency that caching provides. Okay so that's everything I wanted to cover in this in-depth lesson on API gateway, this is definitely a service where you need to be aware of much more in the developer and operation streams of AWS certifications. At this point though that is everything that I'm going to be talking about so go ahead complete this lesson and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I'm going to be covering AWS Step Functions. To understand why Step Functions exist, we need to look at some of the problems with Lambda that it addresses. Step Functions address some of the limitations of Lambda, or not so much limitations, but design decisions that have been made with the Lambda product. No product is perfect, and it's important to understand the product limitations or the design decisions which have been implemented as a product has been created.
Now you know by now that Lambda is a function-as-a-service product, and the best practice is to create functions which are small, focused, and do one thing very well. What you should never be doing with Lambda is trying to put a full application inside a Lambda function, A because it's bad practice, and B because there's an execution duration limit of 15 minutes. A Lambda function cannot run past this 15-minute limit for its execution duration. Now you can, in theory, chain Lambda functions together so one Lambda function reaches its end and it directly invokes another, and by doing this, in theory, you can get another 15 minutes, but this gets messy at scale.
What you're doing is building a chain of functions in an attempt to create a long-running flow, and this isn't what Lambda's designed for. It's made worse due to the fact that Lambda runtime environments are stateless; each environment is isolated, cleaned each time, and any data needs to be transferred between the environments if you want to maintain any form of state, which is why you can't hold a state through different Lambda functions or different Lambda function invocations.
Imagine an example where you might have an order processing system — you can upload a picture of your pet, maybe a cat or a dog or a lizard, and have it printed on different types of material, maybe glass, metal or high-quality paper. This process can take more than 15 minutes, and it will involve lots of decision points, potentially manual human intervention. There's a state — the order, the process — it's all data that needs to persist, and doing it by chaining together lots of Lambda functions is really, really messy.
Step Functions as a service let you create what are known as state machines. Think of a state machine as a workflow — it has a start point and it has an end point, and in between there are states. States you can think of as things which occur inside the state machine — states can do things, they can decide things, and they all take in data, modify data, and output data. So states are the things inside these workflows. Conceptually, the state machine is designed to perform an activity or perform a flow which consists of lots of individual components and maintain the idea of data between those states.
Imagine that you're ordering something from an online retailer such as Amazon.com — so you complete the purchase, and behind the scenes, between you completing the purchase and you receiving your goods, lots of things happen behind the scenes. Your stock is located, it's physically picked, it's packed and verified, postage is booked, and when it's dispatched your order is flagged as being dispatched, and that's an example of a long-running order flow. With Amazon it might only take a few hours to move through this flow from beginning to end, but with something more bespoke it could take longer, and that's why the maximum duration for state machine executions within Step Functions is one year.
Now there are actually two types of workflows available within Step Functions — we've got Standard and Express. When you create a state machine, you need to choose between the two, and it influences some of the features — so the speed and the maximum duration. For the exam, you only need to remember that at a high level, Standard is the default and it has a one-year execution limit. Express — that's designed for high-volume event processing workloads such as IoT, streaming data processing and transformation, mobile application backends, or any of those types of workloads — and these can run for up to five minutes, so you would use Standard for anything that's long-running and Express for things that are highly transactional and need much more in terms of processing guarantees.
Now, state machines can be started in lots of different ways — a few examples are using API Gateway, IoT rules, you might use EventBridge if you're wanting to use event-driven architectures, Lambda can initiate state machines, and you can even do it manually. Generally, state machines are used for backend processing, so something in your application will initiate a state machine execution. State machines are defined using a JSON-based language called Amazon States Language, or ASL, and once a state machine is configured to your liking you can export its definition and use it as a template to create others. You'll use ASL yourself during the demo lesson which is coming up later in this section.
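To make this concrete, here's a minimal sketch of starting an execution with the Step Functions API via boto3. The state machine ARN, execution name, and input payload are hypothetical placeholders, not values from the demo.

```python
# A minimal sketch of starting a state machine execution with boto3.
# The state machine ARN, execution name, and input payload are placeholders.
import json
import boto3

sfn = boto3.client("stepfunctions")

response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:ExampleMachine",
    name="order-12345",                      # optional, must be unique per state machine
    input=json.dumps({"orderId": "12345"}),  # JSON string passed to the first state
)

print(response["executionArn"])
```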
Now, state machines, like any other AWS services, are provided with permissions to interact with other AWS services by using IAM roles. The state machine assumes the role while running, and it gets credentials to interact with any AWS services that it needs to. Before we look at the state machine architecture visually, I want to focus on states themselves — I want you to understand the type of states that exist, so let's look at that next.
As a reminder, states are the things inside a workflow — the things which occur — so let's step through what states we have available. First we've got the Succeed and Fail states, and basically if the process through a state machine ever reaches one of these states, then it succeeds or it fails depending on which of these states it arrives at — that's nice and easy.
Next we've got the Wait state, and the Wait state will wait for a certain period of time or it will wait until a specific date and time. It's provided with this information as an input, and it holds or pauses the processing of the state machine workflow until the duration is passed or until that specific point in time.
Next we've got Choice — and Choice is a state which allows the state machine to take a different path depending on an input, and it's useful if you want a different set of behavior based on that input. For example, you might want a state machine to react differently depending on the stock levels of an item in an order, so the Choice state allows you to have a choice inside a state machine, and you'll be using the Choice state as part of the demo later in this section.
Next we've got the Parallel state, and the Parallel state allows you to create parallel branches within a state machine — so you might want to take a certain set of actions depending on an input, and that might use the Choice state, but one of those choices might be to perform multiple sets of things at the same time. So you might have one of the choices of a Choice state leading to the Parallel state, and that's exactly what you're going to implement in the demo lesson at the end of this section.
Next we've got the Map state, and a Map state accepts a list of things — an example might be a list of orders — and for each item in that list, the Map state performs an action or a set of actions based on that particular item. So if you have 10 items being ordered inside an order, you might have a Map state that performs a certain set of things 10 times — one for each of those items on that order.
Now these are all examples of states, but they are states which control the flow of things through a state machine. The last type of state that I want to talk about is a Task state, and a Task state represents a single unit of work performed by a state machine. So it's the Task states themselves that allow you to perform actions — it allows the state machine to actually do things.
So a Task state can be integrated with lots of different services — so things like Lambda, AWS Batch, DynamoDB, the Elastic Container Service, SNS, SQS, Glue, SageMaker, EMR, and lots of other AWS services — and when you configure this integration, that's how a state machine can actually perform work. The state machine itself coordinates the work occurring: it has flow-control states which direct progress through the state machine, and it has Task states which integrate with other external services to perform the actual work.
Now let's look at how all of this fits together visually because it will make a lot more sense. For this example, we're going to use the scenario that we're going to look at in the demo lesson at the end of this section. The scenario that we have is a serious one — Bob has a cat called Whiskers who can never get enough cuddles. It's become so bad that poor Whiskers has had to design a Step Functions-powered serverless application to remind his human minion Bob every time a cuddle is required.
Whiskers wants to be in full control of the frequency of the cuddles, and there are times when Whiskers might need a cuddle within a few minutes, but sometimes it could be more than 15 minutes or even hours away. He wants to be able to notify his human minion when the next cuddle is needed however far away he is, and so there needs to be multiple ways of reminding Bob. Bob isn't always around, and so the reminder method needs to be flexible.
We need email reminders so that if Bob is at a computer he can receive the reminder, and we also need an SMS reminder so if Bob isn't at home he can immediately rush home to cuddle Whiskers. Now because Whiskers is a cat, and because he's fussy, the time between cuddles could be longer than 15 minutes, so we can't use Lambda — so we're going to use Step Functions. Step Functions work with a base entity called a state machine, and the Pet Cuddle-A-Tron will use one state machine.
Inside the state machine are a number of states — first we've got a Wait state called Timer, and Timer waits for a predefined amount of time, the time period that you set until the next cuddle is required. Then we have a Choice state, and the state machine is pretty flexible — it allows you to decide on three methods of notification: email-only notification, SMS-only notification, or both. The Choice state has three paths that it can direct progress down depending on which option is chosen.
The choices are three Task states — we've got Email Only, we have SMS Only, and we've got Email and SMS, and there are two Lambda functions — Email Reminder and SMS Reminder. Depending on the choice taken, one or both of these Lambda functions are invoked as part of the state machine execution. If the Email Only choice is taken, then logically this only invokes the Email Reminder Lambda, and this uses the Simple Email Service to send an email to Bob demanding a cuddle.
If the SMS Only choice is taken, then this performs the same action but for SMS only — so Bob will receive a text message with Whiskers' cuddle-based demands. If the Email and SMS choice is taken, then this performs both actions — it invokes both Lambda functions so Bob receives both an email and an SMS reminder.
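If it helps to see what this looks like as a definition, here's a hedged sketch of an ASL-style definition for a Wait, then Choice, then Task workflow like the one just described, expressed as a Python dictionary which you'd serialize to JSON. The state names, the input field names, and the Lambda ARNs are illustrative assumptions rather than the exact definition you'll build in the demo.

```python
# A hedged sketch of an ASL definition (as a Python dict) for a Wait -> Choice -> Task
# workflow. Names, input fields, and ARNs are illustrative assumptions.
import json

definition = {
    "StartAt": "Timer",
    "States": {
        "Timer": {
            "Type": "Wait",
            "SecondsPath": "$.waitSeconds",   # wait time supplied in the execution input
            "Next": "ChooseReminderType",
        },
        "ChooseReminderType": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.reminderType", "StringEquals": "email", "Next": "EmailOnly"},
                {"Variable": "$.reminderType", "StringEquals": "sms", "Next": "SMSOnly"},
            ],
            "Default": "EmailAndSMS",
        },
        "EmailOnly": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:email_reminder",
            "End": True,
        },
        "SMSOnly": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sms_reminder",
            "End": True,
        },
        "EmailAndSMS": {
            "Type": "Parallel",   # one choice leads to a Parallel state which runs both tasks
            "Branches": [
                {"StartAt": "Email", "States": {"Email": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:email_reminder",
                    "End": True}}},
                {"StartAt": "SMS", "States": {"SMS": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sms_reminder",
                    "End": True}}},
            ],
            "End": True,
        },
    },
}

# json.dumps(definition) is the JSON you'd supply as the state machine definition
print(json.dumps(definition, indent=2))
```

The point isn't the exact syntax; it's that the whole workflow, including the data that flows between states, is described declaratively and then handed to Step Functions to run.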
Now the back end of this application is provided by the Step Function service in the form of a state machine, but the whole application end to end is actually implemented as a serverless application. Bob has a laptop and it downloads the client-side web application from an S3 bucket — so inside this S3 bucket we have HTML and JavaScript, and the JavaScript lets Bob's browser connect to a managed API hosted by the API Gateway.
The API Gateway is what Bob's browser communicates with, and this is backed up by a Lambda function, and the Lambda function is the thing that behind the scenes provides the compute service necessary to interact with the JavaScript running on Bob's laptop. The combination of both of these allows Bob's laptop to initiate the execution of the state machine every time he sets a cuddle reminder. So the state machine is actually invoked by Bob clicking on a button on a web page that's provided by this serverless application.
Now you'll see this when you open the Pet Cuddle-A-Tron application — it will just be an HTML page that's loaded from an S3 bucket, but it will ask you for a number of pieces of input. You'll get asked for the number of seconds until the next cuddle, as well as a custom message, and depending on the notification method that you pick, you'll need to enter either an email address or a phone number or both.
When you've entered all of the required information based on which method of notification you'll select, you'll click on one of three buttons — one for Email Only, one for SMS Only, and one for Both — and clicking on that button generates an event. This communicates with the API Gateway, it causes an invocation of the API Lambda function, and the API Lambda function passes all of the information entered on this serverless web app all the way through to the state machine.
The state machine begins its execution based on the options that you've selected — it waits for a certain period of time, then it makes a choice based on your selected notification method, and then it invokes one or both Lambda functions, which will send you an email, an SMS, or both.
And this is the application that you're going to implement in the Pet Cuddle-A-Tron demo lesson in this section of the course. Now if this looks complicated, don't worry, because we'll be implementing this piece by piece, bit by bit, together. I'll be around every step of the way to guide you on exactly how to implement this fairly complex architecture inside AWS.
I promise you by the end of the demo lesson it will make complete sense. In summary, Step Functions let you create state machines, and state machines are long-running serverless workflows — they have a start and an end, and in between they have states, and states can be directional decision points, or they can be tasks which actually perform things on behalf of the state machine. And by using them, you can build complex workflows which integrate with lots of different AWS services. But at this point, that's it for the theory — so go ahead and complete this lesson, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to cover the theory and architecture for the Simple Notification Service, or SNS. SNS is a key component of many architectures within AWS so it's one which you need to fully understand. So let's jump in and get started because I really want to make sure that you understand SNS end to end.
The Simple Notification Service, or SNS, is a highly available, durable, secure, pub/sub messaging service. It's a public AWS service, meaning to access it you need network connectivity with the public AWS endpoints, but the benefit of this is that it's accessible from anywhere that has that network connectivity. What it does at a high level is coordinate the sending and delivery of messages, and messages are payloads which are up to 256 kilobytes in size. Now I'm mentioning the size not because you need to know it exactly for the exam, but so that you understand that you can't send an entire cat movie using the service. Architecturally, messages are not designed for large binary files.
Now the base entity of SNS is the SNS topic, and it's on these topics where permissions are controlled as well as where most of the configuration for SNS is defined. SNS has the concept of a publisher, and a publisher is the architectural name for something which sends messages to a topic. With a pub/sub architecture, publishers send things into a topic. Now as well as publishers, each topic can have subscribers and these by default receive all of the messages which are sent to the topic.
Now subscribers can come in many different forms. We've got things like HTTP and HTTPS endpoints, email addresses which can receive the message, SQS queues where each message is added to the queue as it's sent to the topic. Topics can also be configured with mobile push notification systems as subscribers so that messages sent to a topic are delivered to mobile phones as push notifications or as SMS messages. Even Lambda functions can be subscribed to a topic so that that Lambda function is invoked as messages are sent into the topic.
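To make the publisher and subscriber roles a little more tangible, here's a minimal sketch using boto3. The topic name, email address, and SQS queue ARN are hypothetical.

```python
# A minimal sketch of creating a topic, adding subscribers, and publishing a message.
# The topic name, email address, and queue ARN are placeholders.
import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="example-topic")["TopicArn"]

# Subscribers: an email address (requires confirmation) and an SQS queue
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="bob@example.com")
sns.subscribe(TopicArn=topic_arn, Protocol="sqs",
              Endpoint="arn:aws:sqs:us-east-1:123456789012:example-queue")

# Publishers send messages into the topic; by default every subscriber receives them
sns.publish(TopicArn=topic_arn, Subject="Job finished", Message="Processing complete")
```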
SNS is used across AWS products and services. CloudWatch uses it when alarms change state, CloudFormation uses it when stacks change state, and Auto Scaling groups can even be configured to send notifications to a topic when a scaling event occurs. SNS is one of those subjects which is a lot easier to understand if you look at it visually, so let's move on to an architecture diagram.
Now the SNS service as I just mentioned is a public space AWS service. It operates from the AWS public zone, and because of that the service can be accessed from the public internet assuming the entity trying to access it has the relevant AWS permissions. And assuming that a VPC is configured to be able to access public AWS endpoints, then SNS can be accessed from a VPC as well. SNS as a service runs from this public zone and you can create topics inside of SNS. For each topic, a wide variety of producers — so external APIs running on the public internet or CloudWatch or EC2 or Auto Scaling groups or CloudFormation stacks and many other AWS services — can publish messages into a topic.
The topic has subscribers, and things can be subscribers and producers at the same time, for example APIs. Any subscribers by default will receive all of the messages sent to the topic by publishers, but it's possible to apply a filter onto a subscriber which means that subscriber will only receive messages which are relevant to its particular functionality.
Another interesting architecture that I want to comment on just briefly is the fan-out architecture, and this is when you have a single SNS topic with multiple SQS queues as subscribers — and we'll be covering SQS queues later in this section — but this is a way that you can create multiple related workloads. So if, for example, a message is sent to an SNS topic when a processing job arrives, that message can be added to multiple SQS queues which are configured as subscribers for that SNS topic. Now each of these queues might perform the same related processing, but using PetTube as an example, each might work on a different variant, so there might be processing of a different bitrate or a different video size. So using fan-out is a great way of sending a single message to an SNS topic representing a single processing workload, and then fanning that out to multiple SQS queues to process that workload in slightly different and isolated ways.
Now the functionality that's offered by SNS is pretty important. It really is a foundational service for developing application architectures within the AWS platform. So SNS offers delivery status, so with a number of different types of subscribers you can confirm the status of delivery of messages to those subscribers. Examples of subscriber types which do support delivery status are HTTP or HTTPS endpoints, Lambda, and SQS. As well as delivery status, SNS also supports delivery retries, so you've got the concept of reliable delivery within SNS.
SNS is also a highly available and scalable service within a region. So SNS is a regionally resilient service, so all the data that's sent to SNS is replicated inside a region. It's scalable inside that region so it can cope with a range of workloads from nothing all the way up to highly transactional workloads, and it's also highly available. So if particular availability zones fail, then an SNS topic will continue to function.
SNS is also capable of server-side encryption or SSE, which means that any data that needs to be stored persistently on disk can be done so in an encrypted form, so this is important if you do have any requirements which mandate the use of on-disk encryption. Now SNS topics are also capable of being used cross-account just like S3 buckets. You can apply a resource policy — in the case of an SNS topic this is a topic policy — it's exactly the same. It's a resource policy that you apply to the resource, the SNS topic, and you can configure from a resource perspective exactly what identities have access to that topic.
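Here's a hedged sketch of applying a topic policy via the API. The account IDs and topic ARN are made up, and the policy simply allows a second account to publish into the topic.

```python
# A hedged sketch of applying a topic (resource) policy which lets another AWS
# account publish to the topic. Account IDs and the topic ARN are placeholders.
import json
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:111111111111:example-topic"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCrossAccountPublish",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": "sns:Publish",
        "Resource": topic_arn,
    }],
}

sns.set_topic_attributes(TopicArn=topic_arn, AttributeName="Policy",
                         AttributeValue=json.dumps(policy))
```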
Now you'll need to know all of this architecture relating to SNS for the exam. And SNS really is one of those topics that you'll be using extensively as you deploy projects inside AWS, but for now that's everything that you need to know about SNS. You will, as we go through the course, get more and more experience with the products, but I wanted to illustrate all of the important pieces of architecture at this point. So go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover the serverless architecture. Serverless is a type of architecture which is relatively commonplace within AWS, mainly because AWS includes many products and services which support its use. The key thing to understand about the serverless architecture, aside from the fact that there are really servers running behind the scenes, is that it's not one single thing, and while serverless is an architecture, it's more a software architecture than a hardware architecture.
The aim with the serverless architecture—and where its name comes from—is that as a developer, architect, or administrator, you're aiming to manage few, if any, servers, because servers are things which carry overhead, such as cost, administration, and risk, and the serverless architecture aims to remove as much of that as possible. In many ways, serverless takes the best bits from a few different architectures, mostly microservices and event-driven architectures, and within serverless you break an application down into as many tiny pieces as possible, even beyond microservices, into collections of small and specialized functions.
These functions start up, do one thing really, really well, and then they stop. In AWS, logically, because of this, Lambda is used, but there are other platforms such as Microsoft Azure which has their own equivalent, namely Azure Functions, and from an architecture perspective, the actual technology which is used is less relevant. These functions which make up your application run in stateless and ephemeral environments, and why this matters is because if the application is architected to assume a clean and empty environment, then these functions can run anywhere.
Every time they run, they obtain the data that they need, they do something, and then optionally, they store the result persistently somehow or deliver that output to something else. The reason why Lambda is cheap is because it's scalable, each environment is easy to provision, and each environment is the same, so the serverless architecture uses this to its advantage, where each function that runs does so in an ephemeral and stateless environment.
Another key concept within serverless is that generally everything is event-driven, which means that nothing is running until it's required, and any function code that your application uses is only running on hardware when it's processing a system or customer interaction — an event. Serverless environments should use function-as-a-service products such as Lambda for any general processing needs, since Lambda as a service is billed based on execution duration, and functions only run when some form of execution is happening.
Because serverless is event-driven, it means that while not being used, a serverless architecture should be very close to zero cost until something in that environment generates an event, so serverless environments generally have no persistent usage of compute within that system. Now, where you need other systems beyond normal compute, a serverless environment should use, where possible, managed services—it shouldn't reinvent the wheel.
Examples are using S3 for any persistent object storage, or DynamoDB (which we haven't covered yet) for any persistent data storage, and third-party identity providers such as Google, Twitter, Facebook, or even corporate identities such as Active Directory, instead of building your own. Other services that AWS provides, such as Elastic Transcoder, can be used to convert media files or manipulate those files in other ways.
With the serverless architecture, your aim should be to consume as a service whatever you can, code as little as possible, use function as a service for any general-purpose compute needs, and then use all of those building blocks together to create your application. Now, let's look at this visually, because I think an architecture diagram might make it easier to understand exactly what a serverless architecture looks like, so let's step through a simple serverless architecture, and we're going to do so visually.
I want your default position to be that unless we state otherwise, you're not using any self-managed compute—so no servers and no EC2 instances—unless we discuss otherwise, and that should be your starting position. At each step throughout this architecture, I'll highlight exactly why the parts are serverless and why it matters.
Now, we're going to use a slightly more inclusive example—this time, we're going to use PetTube, which was rebranded to be a little more inclusive after an uproar about it only being for cats. To start with, we've got Julie using her laptop, and she wants to upload some woofy holiday videos, so she browses to an S3 bucket that's running as a static website for the PetTube application, downloads some HTML, and that HTML has some JavaScript included within it.
One crucial part of the serverless architecture is that modern web browsers are capable of running client-side JavaScript inside the browser, and this is what actually provides the front end for the PetTube application—JavaScript that's running in the browser of the user that's downloaded from a static website S3 bucket. So at this point, the application has no self-managed compute that's being used—we've simply downloaded HTML from an S3 bucket with some included JavaScript that's now running in Julie's web browser.
Now, PetTube uses third-party identity providers for its authentication; like all good serverless applications, it doesn't use its own store of identity or its own store of users, which results in lower admin overhead, and also avoids the limit on the number of IAM users that can exist inside one AWS account, which is 5,000 IAM users per account. So if we used IAM users for authentication, then PetTube would be limited to 5,000 users, and each user of the application would need one additional account—one additional username and one additional password.
Instead of doing that, we use a third-party identity provider, and one that our users are already likely to have an account inside, so that reduces the number of accounts that our users are required to maintain. The JavaScript that's running in Julie's browser communicates with the third-party identity provider, and we're going to assume that we're using Google, and you'll have seen the screen that's generated if you've ever logged into Gmail or anything that uses Gmail logins—but this could just as easily be Twitter, Facebook, or any other third-party identity provider.
The key thing to understand is that Julie logs into this identity provider, and it's this identity provider that validates that the user claiming to be Julie is in fact Julie, so it checks her username and password, and if it's happy with the username and password combination that Julie's provided, then it returns to Julie an identity token, which proves that she's authenticated with the Google identity provider.
Now, AWS can't directly use third-party identities, and so the JavaScript that's running in Julie's browser communicates with an AWS service called Cognito, and Cognito swaps this Google identity token for temporary AWS credentials, which can be used to access AWS resources. So the JavaScript in Julie's browser now has available some temporary AWS credentials that it can use to interact with AWS, and so it uses these temporary credentials to upload a video of Woofy to an S3 bucket—this is the original bucket of our application, the bucket where the master videos go that our customers upload.
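To make the token swap a little more concrete, here's a hedged sketch of the equivalent API calls using boto3 and a Cognito identity pool. In the real application this logic runs as JavaScript in the browser, and the region, identity pool ID, and Google token shown here are placeholders rather than values from PetTube.

```python
# A hedged sketch of swapping a Google identity token for temporary AWS
# credentials via a Cognito identity pool. The pool ID and token are placeholders.
import boto3

cognito = boto3.client("cognito-identity", region_name="us-east-1")

google_id_token = "<ID token returned by the Google identity provider>"

identity = cognito.get_id(
    IdentityPoolId="us-east-1:11111111-2222-3333-4444-555555555555",
    Logins={"accounts.google.com": google_id_token},
)

creds = cognito.get_credentials_for_identity(
    IdentityId=identity["IdentityId"],
    Logins={"accounts.google.com": google_id_token},
)["Credentials"]

# Temporary AWS credentials which can now be used, for example, to upload to S3
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)
```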
Notice that so far in this process, no self-managed compute or servers have been used to provision this service—we've performed all of these activities without using any compute servers or compute instances that we need to manage or design as solutions architects, since it's all delivered by using managed services such as S3, Cognito, and the Google identity provider.
Now, when the Woofy video arrives inside the originals bucket, that bucket is configured to generate an event which contains the details of the object that was uploaded, and it's set to send that event to and invoke a Lambda function to process that video. That Lambda function takes in the event and creates jobs within the Elastic Transcoder service, which is a managed service offered by AWS that can take in media and manipulate that media—one of the things it can do is transcode the media, generating media of different sizes from one master video file.
Multiple jobs get created, one for each size of video that's required, and the Elastic Transcoder gets the location of the original video as part of the initiation of the job and loads in that video at the start of each job processing cycle, so each job outputs an object to a transcoder bucket—one object for each different size of the original video—and in addition, details on each of the new videos are added to a database, in this case DynamoDB.
Again at this stage, notice that we still have no self-managed servers—the only resources that are consumed are storage space in S3 and DynamoDB, and any processing time used for the Lambda function and Elastic Transcoder jobs. With this architecture so far, we've allowed a customer to upload a master video, transcoded it into different video sizes, and at no point have we consumed any self-managed compute, EC2 instances, or any other long-running compute services—it’s all managed services or compute that’s used in Julie’s browser.
Now the last part of the architecture is where Julie, by clicking another part of the client site that's running inside her browser, can interact with another Lambda function—we’ll call this My Media—and this Lambda function will load data from the database, identify which objects in the transcode bucket are Julie’s, and return URLs for Julie to access, and this is how Julie can load up a web page which shows all of the videos that she’s uploaded to the PetTube application.
Now this is a simplified diagram—in reality, it’s a little bit more complex—for example, API Gateway would generally be used between any client-side processing and the Lambda functions, but conceptually this is actually how it works. We’ve got no self-managed servers, no self-managed database servers, and little, if any, costs that are incurred for base usage—it’s a fully consumption-based model that consumes compute only when it’s being used, when events are generated either from a system-side or a client-side, and it uses third-party services as much as possible.
Now there are many third-party services to choose from, and you can never expect to know them all end-to-end, but the key thing to understand about serverless is the way to do things, and I've covered that in this lesson. Later in the section, you'll experience how to implement a serverless application within the demo lesson called Pet Cuddle-A-Tron, and this will show you how to implement a serverless application just like the one that's on screen — it's slightly less complex, but it's one that uses many of the same architectural fundamentals, and it should start to really cement the theory that you're learning right now.
Now before we move on to this demo, there are a few more services that I need to cover which the Pet Cuddle-A-Tron demo lesson will utilize. So for now, that's it for this lesson — thanks for watching, go ahead and complete this video, and then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I want to cover CloudWatch Events. We've covered CloudWatch earlier in the course, which focused on metrics and monitoring, and we've also covered CloudWatch Logs, which focused on the ingestion and management of logging data.
CloudWatch Events delivers a near real-time stream of system events, and these events describe changes in AWS products and services. When an instance is terminated, started or stopped, these actions generate an event, and when any AWS products and services which are supported by CloudWatch Events perform actions, they generate events that the product has visibility of.
EventBridge is the service which is replacing CloudWatch Events, and it can perform all of the same functionality that CloudWatch Events provides, as it's got a superset of its functionality. In addition, EventBridge can also handle events from third parties as well as custom applications.
They do both share the same basic underlying architecture, but AWS are now starting to encourage a migration from CloudWatch Events over to EventBridge. We've got a lot of architecture to cover, so let's jump in and get started.
Both EventBridge and CloudWatch Events perform, at a high level, the same basic task; they allow you to implement an architecture which can observe "if X happens, or at a certain time Y, then do Z". X is a supported service which generates an event, so it's a producer of an event; Y is a certain time or time period, and this is specified using the Unix cron format, which is a flexible format letting you specify one or more times when something should occur; and Z is a supported target service to deliver the event to.
EventBridge is basically CloudWatch Events version two; it uses the same underlying APIs and it has the same basic architecture, but AWS recommend that for any new deployments you should use EventBridge because it has a superset of the features offered by CloudWatch Events. Things created in one are visible in the other for now, but this could change in the future, so as a general best practice you should start using EventBridge by default for anything that you would have used CloudWatch Events for.
Now, both of these services actually operate using a default entity, which is known as an event bus, and both of them actually have a default event bus for a single AWS account. A bus in this context is a stream of events which occur from any supported service inside that AWS account.
Now, in CloudWatch Events there is only one event bus available, and it's implicit — it isn't exposed in the UI as a visible thing, it just exists. You interact with it by looking for events and then sending those events to targets when you want something to occur.
In EventBridge, you can create additional buses, either for your applications or for third-party products and services, and you can interact with these buses in the same way as the account default event bus.
Now, with CloudWatch Events and EventBridge, you create rules, and these rules pattern match events which occur on the buses, and when they see an event which matches, they deliver that event to a target. Alternatively, you also have schedule-based rules, which are essentially pattern-matching rules that match a certain date and time or ranges of dates and times, so if you're familiar with the Unix cron system, this is similar.
For a schedule rule, you define a Cron expression, and the rule executes whenever this matches and delivers this to a particular target. So the rule matches an event, and it routes that event to one or more targets which you define on that rule, and an example of one target is to invoke a specific Lambda function.
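Here's a minimal sketch of both types of rule being created via boto3. The rule names, the Lambda target ARN, and the exact event pattern are illustrative assumptions, and the target Lambda function would also need a resource policy allowing EventBridge to invoke it.

```python
# A minimal sketch of a pattern-matching rule and a scheduled rule on the
# default event bus. Rule names, pattern details, and the target ARN are placeholders.
import json
import boto3

events = boto3.client("events")
target_arn = "arn:aws:lambda:us-east-1:123456789012:function:example-target"

# Pattern-matching rule: match EC2 instance state-change events
events.put_rule(
    Name="ec2-state-change",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    }),
)
events.put_targets(Rule="ec2-state-change",
                   Targets=[{"Id": "invoke-lambda", "Arn": target_arn}])

# Scheduled rule: run every day at 09:00 UTC using a cron expression
events.put_rule(Name="daily-at-9", ScheduleExpression="cron(0 9 * * ? *)")
events.put_targets(Rule="daily-at-9",
                   Targets=[{"Id": "invoke-lambda", "Arn": target_arn}])
```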
Now, architecturally, at the heart of EventBridge is the default account event bus, which is a stream of events generated by supported services within the AWS account. Now, EC2 is an example of a supported service, and let's say in this case we've got Bob changing the state of an EC2 instance, and he's changing the state from stopped to running.
When the instance changes state, an event gets generated which runs through the event bus. EventBridge, which sits over the top of any event buses that it has exposure to, monitors all of the events which pass through this event bus.
Now, within EventBridge or CloudWatch Events — which I'm going to start calling just EventBridge from now on because it makes things easier — we have rules. Now, rules are created, and these are linked to a specific event bus, and the default is the account default event bus.
The two types of rules are pattern matching rules, and these match particular patterns of the events themselves as they pass through the event bus. We've also got scheduled rules which match particular cron-formatted times or ranges of times, and when this cron-formatted expression matches a particular time, the rule is executed, and in both of these cases, when a rule is executed, the rule delivers the particular event that it's matched through to one or more targets.
And of course, as I just mentioned, examples of these targets could be to invoke a Lambda function. Now, events themselves are just JSON structures, and the data in the event structure can be used by the targets.
So in the example of a state change of an EC2 instance, the Lambda function will receive the event JSON data, which includes which instance has changed state, what state it's changed into, as well as other things like the date and time when the change occurred.
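As an illustration, here's a hedged sketch of the kind of event JSON a target Lambda function receives for an EC2 state change. The account ID, instance ID, and timestamps are made up, but the overall shape is representative.

```python
# A hedged sketch of an EC2 instance state-change event as received by a target
# Lambda function. All IDs and timestamps are made up.
example_event = {
    "version": "0",
    "id": "12345678-1234-1234-1234-123456789012",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2023-01-01T12:00:00Z",
    "region": "us-east-1",
    "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"],
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

def lambda_handler(event, context):
    # The target can use fields from the event, e.g. which instance changed state
    print(event["detail"]["instance-id"], event["detail"]["state"])

lambda_handler(example_event, None)
```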
So that's the theory of both CloudWatch Events and EventBridge, and both of these products are used as a central point for managing events generated inside an AWS account and controlling what to do with those events.
So at this point, that is everything that I wanted to cover. Go ahead and complete this lesson, and then when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in part three of this series, I want to finish off and talk about some advanced elements of Lambda. Now we've got a lot to cover, so let's jump in and get started.
First, I want to talk about the ways a Lambda function can be invoked. We've got three different methods for invoking a Lambda function: we've got synchronous invocation, asynchronous invocation, and invocation using event source mappings. And I want to step through each of them visually so that you can understand in detail how they work because this is essential for the exam.
So let's start off with synchronous invocation of Lambda. With this model, you might start off with a command line or API directly invoking a Lambda function; the Lambda function is provided with some data and it executes using that data. Now all this time, the command line or API is waiting for a response because it's synchronous — it needs to wait here until the Lambda function completes its execution. So the Lambda function finishes and it returns that data, whether it's a success or a failure.
Now synchronous invocation also happens if Lambda is used indirectly via API Gateway, which is the use case for many serverless architectures. So we might have some clients using a web application via API Gateway, and this proxies through to one or more Lambda functions; again, the Lambda function performs some processing all the while the client is waiting for a response within their web application. And then when the Lambda function responds, this goes back via API Gateway and back through to the client.
The common factors with both of these approaches is that the client sends a request which invokes Lambda and the result, be it a success or failure, is returned during that initial request — the client is waiting for any data to be returned. Another implication of a synchronous invocation is that any errors or retries have to be handled within the client. The Lambda function runs once, it returns something, and then it stops; if there's a problem or data isn't processed correctly, then the client needs to rerun that request — and this happens at the client side.
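Here's a minimal sketch of a synchronous invocation using boto3, where the function name and payload are placeholders. Because the call blocks until the function returns, it's the client that deals with any errors or retries.

```python
# A minimal sketch of a synchronous invocation from a client.
# The function name and payload are placeholders.
import json
import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="example-function",
    InvocationType="RequestResponse",   # synchronous: wait for the result
    Payload=json.dumps({"orderId": "12345"}),
)

result = json.loads(response["Payload"].read())
print(result)
```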
So synchronous invocation is generally used when it's a human directly or indirectly invoking a Lambda function. Next, let's look at asynchronous invocation, and this is typically used when AWS services invoke Lambda functions on your behalf. Let's use an example: an S3 bucket with S3 events enabled — so we upload a new image of whiskers to this S3 bucket, which causes an event to be generated and sent through to Lambda, and this is an asynchronous invocation.
So S3 isn't waiting around for any kind of response — it basically just forgets about it at this point. Once it's sent that event through to Lambda, it doesn't continue waiting; it doesn't worry about this event at all. Now maybe as part of processing this image, Lambda is generating a thumbnail or performing some kind of analysis and storing that data into DynamoDB, but again, S3 isn't waiting around for any of this — it's asynchronous. Lambda is responsible for any reprocessing in the event that there's a failure, and the number of automatic retries is configurable between zero and two.
Now a key requirement for this is that the function code needs to be idempotent — and this is important. If you've never heard this term before, let me explain. Let's say that you had $10 in your bank account and I wanted to increase this value to $20. Now there are two ways that I could do this if I operated the bank: I could simply add $10 to your balance, increasing it from 10 to 20, or I could explicitly set the balance to 20.
Now if I set the balance to 20 and this operation failed at some undetermined point in this process, then I could simply rerun the process, safe in the knowledge that even running it again on your balance would only at worst set the value to $20 again — this is known as an idempotent operation. You can run it as many times as you want and the outcome will be the same.
Now if I performed the operation where I added $10 to your account and the operation failed, it could have failed before it added the $10 or after; if it failed after and I rerun the operation, well now you'd have $30 — and this is an example of something which is not idempotent.
When Lambda retries an operation it doesn't really provide any other information — the function just reruns. So logically in this example you would need to make sure that your function code isn't additive or subtractive — it just needs to perform its intended task; with this example it needs to set your balance to $20.
Generally when designing a Lambda function which is used in this way, the Lambda function needs to finish with a desired state — it needs to make something true. If you're using Lambda functions which are designed in a non-idempotent way, you can end up with some questionable results.
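Here's a hedged sketch of that difference in code, using the bank balance example. The 'table' object is just a stand-in for some persistent store such as a DynamoDB table, not a real API.

```python
# A hedged sketch of idempotent vs non-idempotent function code.
# 'table' stands in for some persistent store such as a DynamoDB table.

def non_idempotent_handler(event, table):
    # Adds to the balance: a retry after a partial failure could apply the
    # change twice and leave the balance at $30 instead of $20.
    balance = table.get(event["account_id"])
    table.put(event["account_id"], balance + 10)

def idempotent_handler(event, table):
    # Sets the balance to the desired end state: running this once or five
    # times leaves the account in exactly the same state, so retries are safe.
    table.put(event["account_id"], 20)
```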
Now Lambda can be configured to send any events which it can't process after those automatic retries to a dead letter queue, which can be used for diagnostic processing. And a new feature of Lambda is the ability to create destinations — so events processed by Lambda functions can be delivered to another destination such as SQS, SNS, another Lambda function, and even EventBridge; and separate destinations can be configured based on successful processing or failures.
So this is asynchronous invocation — it's generally used by AWS services which are capable of generating events and sending those events to Lambda. It means that Lambda can automatically reprocess failed events and the original source of the event isn't waiting for processing to complete.
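To make the retry and destination options concrete, here's a hedged sketch of configuring them for asynchronous invocations via boto3. The function name and queue ARNs are placeholders.

```python
# A hedged sketch of configuring retries and destinations for asynchronous
# invocations of a function. Function name and ARNs are placeholders.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.put_function_event_invoke_config(
    FunctionName="example-function",
    MaximumRetryAttempts=2,            # automatic retries: 0, 1 or 2
    MaximumEventAgeInSeconds=3600,     # discard events older than an hour
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-queue"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failure-queue"},
    },
)
```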
But there is a third type of invocation. The last type of invocation is known as Event Source Mapping, and this is typically used on streams or queues which don't generate events — so things where some kind of polling is required. Let's look at an example: let's say that we have a Kinesis data stream, and into this stream, a fleet of producer vans driving around scanning with LIDAR and imaging equipment are all producing data which is being put into a Kinesis stream.
Now Kinesis is a stream based product — generally consumers can read from a stream but it doesn't generate events when data is added, so historically this wouldn't have been an ideal fit for Lambda which is an event driven service. So what happens is that we have a hidden component called an event source mapping which is polling queues or streams looking for new data and getting back source batches — so batches of source data from this data source.
Now these source batches are then broken up as required based on a batch size and sent into a Lambda function as event batches. Now a single Lambda function invocation could in theory receive hundreds of events in a batch — it depends on how long each event takes to process. Remember Lambda has a 15 minute timeout so you need to carefully control this event batch size to ensure that the Lambda function doesn't terminate before completing this batch.
Now there's one really important thing that you need to understand about event source mapping. With synchronous and asynchronous invocations, an event is pushed to Lambda from the source, and Lambda doesn't need permissions to the source service unless it actually wants to read more data from that source — for example, if an object is added to an S3 bucket, then S3 generates and delivers an event which contains details of that event (so which object was uploaded and perhaps some other metadata).
But unless you need to read additional data from S3 maybe to get the actual object, well then the Lambda function doesn't need S3 permissions. With event source mapping invocation, the source service isn't delivering an event — the event source mapping is reading from that source. And so the event source mapping uses permissions from the Lambda execution role to access the source service.
And this is really important to know because it does come up in the exam. So even if a Lambda function receives an event batch containing Kinesis data, even though the Lambda function doesn't directly read from Kinesis, the execution role needs Kinesis permissions because the event source mapping uses them on its behalf to retrieve that data.
Now any batches which consistently fail can be sent to an SQS queue or an SNS topic for further processing or analysis. Now that's the third type of invocation — this is event source mapping invocation, and that's the method used when Lambda functions are processing SQS queues, Kinesis streams, DynamoDB streams and even Amazon Managed Streaming for Apache Kafka (MSK). And this last one is something that we won't be covering within the course, but it's important to know all of the different types of products that use event source mapping based invocation.
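Here's a minimal sketch of creating an event source mapping for a Kinesis stream via boto3. The stream and function names are placeholders, and remember that it's the function's execution role which needs the Kinesis read permissions.

```python
# A minimal sketch of creating an event source mapping for a Kinesis stream.
# The stream and function names are placeholders.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    FunctionName="example-function",
    StartingPosition="LATEST",   # required for Kinesis and DynamoDB streams
    BatchSize=100,               # how many records are delivered per event batch
)
```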
With that being said that's all of the three types of invocation I wanted to cover — so let's move on to a different topic, this time Lambda versions. With Lambda functions it's possible to define specific versions of Lambda functions — so you could have different versions of the given function, for example, version one, version two and version three.
Now as it relates to Lambda, a version of a function is actually the code plus the configuration of that Lambda function — so the resources and any environment variables in addition to any other configuration information. Now when you publish a version, that version is immutable — it never changes once it's published, and it even has its own Amazon resource name. So once you publish a version you can no longer change that version.
There's also the concept of $LATEST, and $LATEST points at the latest version of a Lambda function — now this can obviously change as you publish later and later versions of the function, so this is not immutable. You can also create aliases — so for example, dev, stage and prod — and these can point at a particular version of a Lambda function, and these can be changed — so these aliases are not immutable.
So generally with large scale deployments of Lambda you'd be producing Lambda function versions for all of the major changes, and using aliases so that different components of your serverless application can point at those specific immutable version numbers — so that's important to know for the exam.
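Here's a hedged sketch of publishing a version and pointing an alias at it via boto3. The function and alias names are placeholders.

```python
# A hedged sketch of publishing an immutable version and pointing an alias at it.
# Function and alias names are placeholders.
import boto3

lambda_client = boto3.client("lambda")

version = lambda_client.publish_version(FunctionName="example-function")["Version"]

# Point the prod alias at the newly published, immutable version
lambda_client.create_alias(FunctionName="example-function",
                           Name="prod",
                           FunctionVersion=version)

# Later, promote a newer version by repointing the alias rather than changing callers:
# lambda_client.update_alias(FunctionName="example-function", Name="prod", FunctionVersion="3")
```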
So the last thing I want to talk about is Lambda startup times, and to understand that you need to understand how Lambda functions are actually executed. Lambda code runs inside a runtime environment — and this is also referred to as an execution context; think of this as a small container which is allocated an amount of resource which runs your Lambda code.
When a Lambda function is first invoked — let's say by receiving an S3 event — this execution context needs to be created and configured, and this takes time. First the environment itself is created and this requires physical hardware; then any runtimes which are required are downloaded and installed — let's say this is for Python 3.8; then the deployment package is downloaded and then installed — and this takes time.
Now this process is known as a cold start, and all in, this process can take hundreds of milliseconds or more — which can be significant if a Lambda function is performing a task which touches a human who is expecting a response. Now if this is an S3 event, then maybe this extra time isn't such a big deal — but you need to be aware that this cold start occurs because an execution context is being created and configured, any prerequisites are being downloaded and installed, the deployment package is being downloaded and installed — and that's all before the function itself can execute.
Now if the same Lambda function is invoked again without too much of a gap, then it's possible that Lambda will use the same execution context — and this is known as a warm start. It doesn't need to set up the environment or download the deployment package because all of that is already contained within the execution context; this time the context just receives the event and immediately begins processing.
A warm start means the code can be running within milliseconds because there's no lengthy build process. A Lambda function which invokes again fairly soon after a cold start can reuse an execution context — but if too long a time period goes between invocations, then the context can be deleted which results in another cold start.
Also, one function invocation runs at a time per context — so if you need 20 invocations of a function at once, then this can result in 20 cold starts. Now you can make this process more efficient — you can use a feature known as provisioned concurrency, where you inform AWS in advance and execution contexts are created and kept ready for your Lambda invocations before they're needed.
You might use these when you know that you have periods of high load on a serverless application or if you're preparing for a new production release of a serverless application and want to pre-create all of these execution environments. Now there are also other things that you can do to improve performance — you can use the /tmp space to pre-download things within an execution context.
For example, maybe you're using some animal images as part of your processing — well, if another function uses the same execution context, then it too will have access to those same animal images without having to download them a second time. Now you do need to be careful because your functions need to be able to cope with the environment being new and clean every time — they can never assume the presence of anything.
From a code perspective, you can create other things like database connections outside of the Lambda function handler code. So when you create a Lambda function, generally most things go within the Lambda function handler — but if you create anything outside of the Lambda function handler, then these will be made available for any future function invocations in the same context.
So anything that you define within a Lambda function handler is limited to that one specific invocation of that Lambda function, but for anything which you anticipate there being a potential for reuse, you can declare that outside of the Lambda function handler — and in theory, that will be available for any other invocations of the Lambda function which occur within that same execution context.
But again, you need to make sure that your function doesn't require or expect that — every single time a function invokes, it should be absolutely fine with recreating everything. You should by default assume that execution contexts are stateless and any invocation of a Lambda function is going to be operating in a completely freshly created environment, but if you want to be efficient, your functions should also be able to reuse common aspects that persist through different function invocations.
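Here's a hedged sketch of that pattern. The S3 client and cache are created outside the handler so they can be reused on warm starts, but the handler never assumes they're populated. The bucket name and event fields are illustrative.

```python
# A hedged sketch of declaring reusable objects outside the handler. The bucket
# name and event fields are illustrative; boto3 clients, database connections,
# and files downloaded to /tmp all follow the same pattern.
import boto3

# Created once per execution context (cold start) and reused on warm starts
s3 = boto3.client("s3")
cache = {}

def lambda_handler(event, context):
    # Anything created here exists for this one invocation only
    key = event["object_key"]
    if key not in cache:   # the cache may or may not be populated: never rely on it
        obj = s3.get_object(Bucket="example-bucket", Key=key)
        cache[key] = obj["Body"].read()
    return {"size": len(cache[key])}
```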
Now again, these are all deep dive things that you need to be aware of for the exam. I've covered a lot of these elements across all three parts of this Lambda deep dive mini series, but at this point, that's everything I wanted to cover in part three — and this is the last part of this mini series. So thanks for watching. Go ahead and complete this video, and when you're ready, I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back to part two of this lesson series going into a little bit more depth on Lambda. In this part of the series I'm going to be talking about Lambda networking, Lambda permissions and Lambda monitoring. Now this is a lot to cover in one lesson so let's jump in and get started.
Lambda has two networking modes and you need to be aware of both of them for the exam. First we have public which is the default and then second we have VPC networking. Now you need to understand the architecture of both of them so let's step through them in a little bit more detail.
For public networking we start with an AWS environment and inside it a single Lambda function. Now this is part of a wider application, let's say the Catergram Enterprise application running in a VPC, which uses Aurora for the database, EC2 for compute, and the Elastic File System for shared file storage. Now this is the default configuration for Lambda where it's running in the public AWS network, so Lambda using this configuration can access public space AWS services such as SQS and DynamoDB or internet-based services such as IMDB if the Lambda function wanted to fetch the latest details of cat-themed movies and TV shows.
So Lambda running by default using public networking means that it has network connectivity to public space AWS services and the public internet; it can connect to both of those from a networking perspective, and as long as it has the required methods of authentication and authorization then it can access all of those services. Now public networking offers the best performance for Lambda because no customer-specific networking is required—Lambda functions can run on shared hardware and networking with nothing specific to one particular customer—but this does mean that any Lambda functions running with this default have no access to services running within a VPC unless those services are configured with public addressing as well as security rules to allow external access, so this is a big limitation that you need to understand for the exam.
So the architecture on screen now—this Lambda function could not access Aurora, EC2, or the Elastic File system unless they had public addressing and the security was configured to allow that access, so in this example without configuration changes the Lambda function could access public services but would have no access to anything running inside the VPC. Now in most cases in my experience Lambda is used with this public networking model, but there are situations where this isn't enough and for those situations Lambda can be configured to run inside a VPC.
Let's look at how. This time we have the same architecture—so a VPC running within AWS—but this time the Lambda function is configured to run inside a private subnet at the bottom. Now this is the same subnet where the Catergram Enterprise infrastructure is running from, and for the exam specifically the key thing to understand about Lambdas running inside a VPC is that they obey all of the same rules as anything else running in a VPC because they're actually running within that VPC.
So to start with, this means that Lambda functions running inside a VPC can freely access other VPC-based resources assuming any network ACLs and security groups allow that access, but the flip side of this means they can't access things outside of the VPC unless networking configuration exists within the VPC to allow this external access. So by default with this architecture the Lambda function couldn't access DynamoDB or any internet-based endpoints such as with this example IMDB.
Now if you face any exam questions or you need to design any solutions which involve Lambda functions running within a VPC, then just treat them like anything else running in that VPC. So this means that you could use a VPC endpoint, for example a gateway endpoint, to provide access to DynamoDB; because the Lambda function is running within the VPC it could utilize a gateway endpoint to access DynamoDB, or in the case that the Lambda function needed access to AWS public services or the internet, you could deploy a NAT gateway in a public subnet and then attach an internet gateway to the VPC.
Remember Lambda running within a VPC behaves like any other VPC-based service — the same gateways and configurations are needed to allow VPC-based Lambda functions to communicate with the AWS public zone and the public internet. Now you also need to give your Lambda functions EC2 networking permissions via the execution role, which I'll cover very soon, because the Lambda service needs to create network interfaces within your VPC and it requires these permissions to do so. This architecture of using network interfaces within a VPC is what I want to quickly cover now.
Now there used to be disadvantages to running Lambda in a VPC—significant disadvantages—and the reason was the networking architecture that Lambda used. VPC-based Lambda functions don't actually run within your VPC; the way they work is similar to Fargate, so we have AWS and there's a Lambda service VPC and a customer VPC.
Now let's keep things simple and say that we only have three Lambda functions. Now the way that this historically worked is that each of these Lambda functions when invoked would create an elastic network interface within the customer VPC and traffic would flow between this service VPC and the customer VPC. Now the problem is that configuring these elastic network interfaces on a per-function, per-invocation basis would take time and add delay to the execution of the Lambda function code.
In addition, this architecture doesn't scale well because parallel function executions or concurrency required additional elastic network interfaces, and the more popular a system became the worse the problem became—with larger systems you had more and more performance issues and more and more issues with keeping VPC capacity available for larger and larger numbers of ENIs.
Now luckily this is the old architecture—this is the way that Lambda used to handle this private networking—it's not how it works anymore. With the new way, instead of requiring an elastic network interface per function execution, AWS analyzes all of the functions running in a region in an account and builds up a set of unique combinations of security groups and subnets.
So for every unique one of those, one ENI is required in the VPC. So if all your functions used a collection of subnets but the same security groups, then one network interface would be required per subnet; if they all used the same subnet and all used the same security group, then all of your Lambda functions could use the single elastic network interface. So a single connection between the Lambda Service VPC and your VPC is created for every unique combination of security groups and subnets used by your Lambda functions.
Now the network interfaces using this architecture are created when you configure the Lambda function, and typically this might take 90 seconds, but this is done once — so when you create the function or when you update its networking configuration, these interfaces are created or updated — and that means it isn't required every single time a Lambda function is invoked, so it doesn't delay your function invocations.
Now this means that you can use private networking at scale without increasing the number of elastic network interfaces required, so where it used to be a bad idea performance-wise to use VPC-based Lambdas, this is no longer the case.
So that's networking—so this is how you configure Lambda functions if you need them to have access to private VPC services—and it's important that you understand both the public and VPC networking model especially for the exam, because you will face questions on the exam about executing Lambda functions within a VPC.
Again one really important hint that I will provide is just treat Lambda functions running in a VPC like any other VPC-based resource, and by now you should know how to architect a VPC so that services running in that VPC have access to everything that they need, so just treat Lambda functions in the same way.
Now let's look at the security of Lambda functions. When it comes to Lambda permissions, there are actually two key parts of the permissions model that you need to understand—one of them is pretty well known and that's covered at the associate level, the other not so much.
Now let's start with a typical Lambda environment. This is a runtime environment, the environment within which your Lambda function executes. It runs a specific runtime, in this case Python 3.8, it's allocated some resources, and the code loads and runs within this environment.
Now for this environment to access any AWS products and services, it needs to be provided with an execution role. This is a role which is assumed by Lambda, and by doing so the code within the environment gains the permissions defined in that role's permissions policy. So a role is created with a trust policy which trusts Lambda, and the role's permissions policy is used to generate the temporary credentials that the Lambda function uses to interact with other resources.
So in many ways this is just the same as an EC2 instance role—so this governs what permissions the function receives, which might be something like loading data from DynamoDB and storing output data into S3.
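As a rough sketch of what that looks like in practice, here's how an execution role could be created with boto3; the role name and the example permissions (DynamoDB reads and S3 writes) are hypothetical and kept deliberately broad for brevity.

```python
# Hedged sketch: creating a Lambda execution role with boto3.
# The trust policy allows the Lambda service to assume the role; the
# permissions policy defines what the function's code can then do.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dynamodb:GetItem", "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:PutObject", "Resource": "*"},
    ],
}

role_arn = iam.create_role(
    RoleName="demo-lambda-execution-role",  # hypothetical
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)["Role"]["Arn"]

iam.put_role_policy(
    RoleName="demo-lambda-execution-role",
    PolicyName="demo-permissions",
    PolicyDocument=json.dumps(permissions_policy),
)

print("execution role:", role_arn)
```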
Now this is the most well-known aspect of Lambda permissions, but there is another part—Lambda actually has resource policies. Now this in many ways is like a bucket policy on S3; it controls who can interact with a specific Lambda function.
It's this resource policy which can be used to allow external accounts to invoke a Lambda function, or to allow certain services such as SNS or S3 to use a Lambda function. The resource policy is changed automatically when you integrate other services with Lambda, and you can also change it manually via the CLI or the API; unless something's changed between the creation of this lesson and when you're watching it, it currently can't be edited using the console UI, so it's only something which can be manipulated using the CLI or the API.
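Here's a hedged example of what manipulating the resource policy via the API looks like using boto3; the function name, bucket, and account ID are placeholders.

```python
# Hedged sketch: modifying a function's resource policy with boto3.
# add_permission adds a statement allowing S3 (for a specific bucket)
# to invoke the function.
import boto3

lam = boto3.client("lambda")

lam.add_permission(
    FunctionName="vpc-demo",                     # hypothetical function
    StatementId="allow-s3-invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::example-upload-bucket",
    SourceAccount="123456789012",
)

# get_policy returns the current resource policy as a JSON string.
print(lam.get_policy(FunctionName="vpc-demo")["Policy"])
```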
So that's how security works within a Lambda function. Now one more thing that I want to cover before finishing up with part two is logging.
So Lambda uses CloudWatch, CloudWatch Logs, and X-Ray for various aspects of its logging and monitoring. Any logging information generated by Lambda executions goes into CloudWatch Logs: the output of Lambda functions, any messages that you write to the log, any errors, and details on the duration of the execution are all stored in CloudWatch Logs.
Any metrics, such as invocation success or failure counts, retries, and anything to do with latency, are stored in CloudWatch, because CloudWatch is the thing that stores metrics. This is important to understand: logging goes into CloudWatch Logs, while details on the number of invocations, successes or failures, and anything else metric-related goes straight into CloudWatch.
Now Lambda can also be integrated with X-Ray, which I cover elsewhere in the course, and this can be used to add distributed tracing capability. So if you need to trace the path of a user or a session through a serverless application which uses Lambda, then you can use the X-Ray service.
Now I don't expect this to feature heavily on the exam but just remember the terms X-Ray and distributed tracing because that might come in handy for one or two exam questions if these topics do crop up.
Now one really important thing to remember for the exam is that for Lambda to be able to write to CloudWatch Logs and store the output of any executions, you need to give Lambda permissions via the execution role. There's actually a pre-built policy and role within AWS specifically designed to give Lambda functions the basic permissions that they require to log information into CloudWatch Logs.
And one really common exam scenario is where you're trying to diagnose why a Lambda function is not working—there's nothing in CloudWatch Logs—and one possible answer is that it doesn't have the required permissions via the execution role.
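As a quick sketch, this is how you could attach AWS's pre-built basic logging policy to an execution role with boto3; the role name is a placeholder.

```python
# Hedged sketch: giving a Lambda function permission to write to CloudWatch Logs
# by attaching AWS's pre-built managed policy to its execution role.
import boto3

iam = boto3.client("iam")

iam.attach_role_policy(
    RoleName="demo-lambda-execution-role",  # hypothetical
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)
# This managed policy covers the CreateLogGroup, CreateLogStream and
# PutLogEvents actions that function logging requires; without it,
# invocations run but nothing appears in CloudWatch Logs.
```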
Now that's everything I wanted to cover in part two of this Lambda in-depth mini series—so we've covered networking, both public and private, we've covered security, and we've covered logging—so go ahead and complete this lesson and when you're ready I look forward to you joining me in part three.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this multi-part lesson mini-series, I want to talk about AWS Lambda. Lambda is a Function as a Service, or FaaS, product. This means that you provide specialized, short-running, and focused code to Lambda and it takes care of running it and billing you only for what you consume. So a Lambda function is a piece of code which Lambda runs, and every Lambda function uses a supported runtime. An example of a supported runtime is Python 3.8. So when you create a Lambda function, you need to define which runtime that piece of code uses. Now, when you provide your code to Lambda, it's loaded into and executed within a runtime environment. And this runtime environment is specifically created to run code using a certain runtime, a certain language. So when you create a Lambda function that uses the Python 3.8 runtime, then the runtime environment that's created is itself specifically designed to run Python 3.8 code.
Now, when you create a Lambda function, you also define the amount of resources that the runtime environment is provided with. You directly allocate a certain amount of memory, and based on that amount of memory, a certain amount of virtual CPU is allocated, but this is indirect. You don't get to choose the amount of CPU; it's based on the amount of memory. Now, the key thing to understand about Lambda as a service, because it's a Function as a Service product designed for short-running and focused functions, is that you're only billed for the duration that a function runs. So the amount of resource allocated to an environment and the duration that the function runs for per invocation determine how much you're billed for the Lambda product. You're billed for the duration of function executions.
Now, Lambda is a key part of serverless architectures running within AWS. And over this section of the course, you're going to get some experience of how you can use Lambda to create serverless or event-driven architectures. Architecturally, the way that Lambda works is this. You define a Lambda function. Now, you can think of a Lambda function as a unit of configuration. Yes, you can also use the term Lambda function to describe the actual code. But when you think of a Lambda function, think of it as the code plus all the associated wrappings and configuration. Your Lambda function at its most basic is a deployment package which Lambda executes. So when you create a Lambda function, you define the language which the function is written in. You provide Lambda with a deployment package and you set some resources. And whenever the Lambda function is invoked, what actually happens is the deployment package is downloaded and executed within this runtime environment.
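To make the idea of a deployment package less abstract, here's about the smallest possible Python function that Lambda could run; the handler name is whatever you configure, commonly in the form module.function.

```python
# A minimal Python Lambda function: the deployment package just needs a
# module containing this handler, referenced as e.g. "handler.lambda_handler".
def lambda_handler(event, context):
    # "event" carries the invocation data; "context" carries runtime metadata
    # such as the remaining execution time and the function ARN.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"hello {name}"}
```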
Now, Lambda supports lots of different runtimes. Some of the common ones are various different versions of Python. We also have Ruby. We've got Java. We've also got Go and there's also C# as well as various versions of Node.js. Now, you can also create custom ones using Lambda layers. And many of these are created by the community. For the exam though, one really important point is that if you see or hear the term Docker, consider this to mean not Lambda. So Docker is an anti-pattern for Lambda. Now, Lambda does now support using Docker images, but this is distinct from the word Docker. If you hear the term Docker in the exam, then it generally will be referring to traditional containerized computing. So that's using a specific Docker image to spin up a container and use it in a containerized compute environment such as ECS.
Now, you can also use container images with Lambda. Now, that's a different process. That means that you're using your existing container build processes, the same ones that you use to create Docker images. But instead, you're creating specific images designed to run inside the Lambda environment. So don't confuse Docker container images and Docker with images used for Lambda. They're two different things. The only thing that they share is that you can use your existing build processes to build Lambda images. Now, custom runtimes could allow languages such as Rust, which is a very popular community-based language to work within the product. So if you search using Google or any other popular search engine, you'll be able to find lots of languages which have been added by the community using the Lambda layer functionality. And I'll be talking about that elsewhere in the course.
Now, you select the runtime to use when creating the function, and this determines the components which are available inside the runtime environment. So Python code, for instance, requires Python of a certain version to be installed in addition to various Python modules. Conceptually, think about it like this. Every time a Lambda function is invoked, which means to execute that function, a new runtime environment is created with all of the components that that Lambda function needs. Let's say, for example, a Python 3.8-based Lambda function. So the code loads, it's executed, and then it terminates. Next time, a new clean environment is created, it does the same thing, and then it terminates. Lambda functions are stateless, which means no data is left over from a previous invocation. Every time a function is invoked, it's a brand new invocation, a brand new environment. Now, I'm going to be talking about this in part 3 of this series, because this isn't always the case, but you have to assume that it is architecturally. So your code running within Lambda needs to be able to work 100% of the time if it's a new environment. Lambda runtime environments have no state. Now, there are some situations where a function might be invoked multiple times within the same environment. And I'll be talking about that in part 3 of this series. But as a base level, a default, assume that every time a Lambda function is invoked, it's inside a brand new runtime environment.
Now, you also define the resources that Lambda functions use, and this determines how much resource the runtime environment gets. You directly define the memory, anywhere from 128 MB to 10,240 MB in 1 MB steps. You don't directly control the amount of virtual CPU; this scales with the memory. So 1,769 MB of memory gives you one vCPU of allocation, and it's linear: less memory means less virtual CPU, and more memory means additional vCPU capacity. The runtime environment also has some disk space allocated: 512 MB is mounted as /tmp within the runtime environment. This is the default amount, but it can scale up to 10,240 MB. Now, you can use this, but keep in mind that you have to assume it's blank every single time a Lambda function is invoked. This should only be viewed as temporary space.
Lambda functions can run for up to 900 seconds or 15 minutes. And this is known as the function timeout. This is important because for anything beyond 15 minutes, you can't use Lambda directly. And that's a really important figure to know for the exam. You know by now I'm not a fan of people memorizing facts and figures, but this is definitely one that you need to remember for the exam. So 15 minutes is a critical amount of time for a Lambda function. You can use other things, such as step functions, to create longer running workflows, but one invocation of one function has a maximum of 15 minutes or 900 seconds.
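Here's a small boto3 sketch of setting those limits on an existing function; the function name is a placeholder, and the values simply mirror the figures mentioned above.

```python
# Hedged sketch: setting the resource limits discussed above with boto3.
# Memory is chosen directly (vCPU scales with it), the timeout is capped
# at 900 seconds, and /tmp (ephemeral storage) can be sized from 512 MB up.
import boto3

lam = boto3.client("lambda")

lam.update_function_configuration(
    FunctionName="vpc-demo",            # hypothetical
    MemorySize=1769,                    # roughly 1 vCPU worth of allocation
    Timeout=900,                        # the 15-minute maximum
    EphemeralStorage={"Size": 2048},    # /tmp size in MB (512 to 10240)
)
```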
Now, we're going to be covering security in more detail in part two, as well as networking. But the security for a Lambda function is controlled using execution roles. And these are IAM roles, assumed by the Lambda function, which provides permissions to interact with other AWS products and services. So any permissions which a Lambda function needs to be provided with are delivered by creating an execution role and attaching that to a specific Lambda function.
Now, just a few final things before we finish up: some common uses of Lambda. Lambda forms a core part of the delivery of serverless applications within AWS, and generally this uses products such as S3, API Gateway, and Lambda; these three together are often used to deliver serverless applications. Lambda can also be used for file processing, using S3, S3 events, and Lambda. A very common example that's used in training is watermarking images: images are uploaded to S3, an S3 event is generated, and a Lambda function is invoked which applies a watermark and then terminates, and you're only billed for the compute resources used during those Lambda function invocations. You can also use Lambda for database triggers, using DynamoDB, DynamoDB streams, and Lambda. Lambda can be invoked any time data is inserted, modified, or deleted from a DynamoDB table with streams enabled, and this is another powerful architecture.
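As an illustration of the file-processing pattern, here's a hedged sketch of a Python handler invoked by an S3 event; the output bucket and the processing step itself are placeholders.

```python
# Hedged sketch of the S3 file-processing pattern: S3 invokes the function
# with an event describing the uploaded object; the function fetches it,
# does its work (the watermarking here is just a placeholder), and writes output.
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        obj = s3.get_object(Bucket=bucket, Key=key)
        data = obj["Body"].read()
        processed = data  # placeholder for real processing, e.g. watermarking
        s3.put_object(Bucket="example-output-bucket", Key=key, Body=processed)
```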
You can also use Lambda to implement a form of serverless cron. You can use EventBridge or CloudWatch Events to invoke Lambda functions at certain times of day, or on certain days of the week, to perform scripted activities. This is something that traditionally you would need to run on something like an EC2 instance, but using Lambda means that you're only billed for the time that these functions are executing, so this is another really common use case. And then finally, you can perform real-time stream data processing. Lambda functions can be configured to be invoked whenever data is added to a Kinesis stream, and this can be useful because Lambda is really scalable, so it can scale with the amount of data being streamed into a Kinesis stream. Again, this is another really common architecture for any businesses that are streaming large quantities of data into AWS and require some form of real-time processing.
Now that's everything that I wanted to cover in part one of this series. Remember, it's a three-part mini-series; part two and part three are going to introduce some more advanced concepts, specifically things that you'll need for the exam. But at this point, go ahead, complete this lesson, and then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. This is part two of this lesson, and we’re going to continue immediately from the end of part one. So let's get started.
Now, the previous architecture can be evolved by using queues. A queue is a system that accepts messages. Messages are sent onto a queue and can be received or polled off the queue. In many queues, there's ordering, meaning that in most cases, messages are received off the queue in a first-in, first-out (FIFO) architecture, though it's worth noting that this isn't always the case.
Using a queue-based decoupled architecture, CatTube would look something like this: Bob would upload his newest video of whiskers laying on the beach to the upload component. Once the upload is complete, instead of passing this directly onto the processing tier, it does something slightly different. It stores the master 4K video inside an S3 bucket and adds a message to the queue detailing where the video is located, as well as any other relevant information, such as what sizes are required. This message, because it’s the first message in the queue, is architecturally at the front of the queue. At this point, the upload tier, having uploaded the master video to S3 and added a message to the queue, finishes this particular transaction. It doesn’t talk directly to the processing tier and doesn't know or care if it’s actually functioning. The key thing is that the upload tier doesn't expect an immediate answer from the processing tier. The queue has decoupled the upload and processing components.
It's moved from a synchronous style of communication where the upload tier expects and needs an immediate answer and waits for that answer, to asynchronous communications. Here, the upload tier sends the message and can either wait in the background or just continue doing other things while the processing tier does its job. While this process is going on, the upload component is probably getting additional videos being uploaded, and they’re added to the queue along with the whiskers video processing job. Other messages that are added to the queue are behind the whiskers job because there is an order in this queue: it is a FIFO queue.
At the other side of the queue, we have an auto-scaling group, which has been configured with a minimum size of 0, a desired size of 0, and a maximum size of 1,337. Currently, it has no instances provisioned, but it has auto-scaling policies that provision or terminate instances based on what's called the queue length, which is the number of items in the queue. Because there are messages on the queue added by the upload tier, the auto-scaling group detects this and increases the desired capacity from 0 to 2. As a result, instances are provisioned by the auto-scaling group. These instances start polling the queue and receive messages that are at the front of the queue. These messages contain the data for the job and the location of the S3 bucket and the object in that bucket. Once these jobs are received from the queue by these processing instances, they can retrieve the master video from the S3 bucket.
The jobs are processed by the instances, and once they are completed, the messages are deleted from the queue, leaving only one job in the queue. At this point, the auto-scaling group may decide to scale back because of the shorter queue length, so it reduces the desired capacity from 2 to 1, which terminates one of the processing instances. The instance that remains polls the queue and receives the last message. It completes the processing of that message, performs the transcoding on the videos, and leaves zero messages in the queue. The auto-scaling group realizes this and scales back the desired capacity from 1 to 0, resulting in the termination of the last processing EC2 instance.
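If you want to picture what one of those processing instances is actually doing, here's a hedged Python sketch of a worker polling an SQS queue; the queue URL and the processing logic are placeholders.

```python
# Hedged sketch of the worker side of this pattern using Amazon SQS:
# poll the queue, process each job, and delete the message only once the
# work has completed successfully.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # placeholder

def process(job):
    # Placeholder for the real work, e.g. fetching the master video from S3
    # and producing the 1080p/720p/480p versions.
    print("processing", job)

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,   # long polling
    )
    for message in resp.get("Messages", []):
        process(json.loads(message["Body"]))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```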
Using a queue architecture to place a queue between two application tiers decouples those tiers. One tier adds jobs to the queue and doesn’t care about the health or the state of the other tier. The other tier can read jobs from the queue, and it doesn't care how they got there. This is unlike the previous example where application load balancers were used between tiers. While this did allow for high availability and scaling, the upload tier in the previous example still synchronously communicated with one instance of the processing tier. With the queue architecture, no communication happens directly between the components. The components are decoupled and can scale independently and freely. In this case, the processing tier uses a worker fleet architecture that can scale anywhere from zero to a near-infinite number of instances based on the length of the queue.
This is a really powerful architecture because of the asynchronous communications it uses. It's an architecture commonly used in applications like CatTube, where customers upload things for processing, and you want to ensure that a worker fleet behind the scenes can scale to perform that processing. You might be asking why this matters in the context of event-driven architectures, and I’m getting there, I promise.
If you continue breaking down a monolithic application into smaller and smaller pieces, you'll eventually end up with a microservice architecture, which is a collection of, as the name suggests, microservices. Microservices do individual things very well. In this example, we have the upload microservice, the processing microservice, and the store and manage microservice. A full application like CatTube might have hundreds or even thousands of these microservices. They might be different services, or there might just be many copies of the same service, like in this example, which is fortunate because it's much easier to diagram. The upload service is a producer, the processing node is a consumer, and the data store and manage microservice performs both roles.
Logically, producers produce data or messages, and consumers, as the name suggests, consume data or messages. There are also microservices that can do both things. The things that services produce and consume architecturally are events. Queues can be used to communicate events, as we saw with the previous example, but larger microservices architectures can get complex quickly. Services need to exchange data between partner microservices, and if we do this with a queue architecture, we'll logically have many queues. While this works, it can be complicated. Keep in mind that a microservice is just a tiny self-sufficient application. It has its own logic, its own store of data, and its own input/output components.
Now, if you hear the term "event-driven architecture," I don’t want you to be too apprehensive. Event-driven architectures are simply a collection of event producers, which might be components of your application that directly interact with customers, parts of your infrastructure like EC2, or systems monitoring components. These are bits of software that generate or produce events in reaction to something. If a customer clicks submit, that might be an event. If an error occurs during the upload of the whiskers holiday video, that's an event. Producers are things that produce events, and the inverse of this is consumers—pieces of software that are ready and waiting for events to occur. When they see an event they care about, they take action. This might involve displaying something for a customer, dispatching a human to resolve an order packing issue, or retrying an upload.
Components or services within an application can be both producers and consumers. Sometimes a component might generate an event, for example, a failed upload, and then consume events to force a retry of that upload. The key thing to understand about event-driven architectures is that neither the producers nor the consumers are sitting around waiting for things to occur. They're not constantly consuming resources or running at 100% CPU load, waiting for things to happen. Producers generate events when something occurs, such as when a button is clicked, an upload works, or when it doesn’t work. These producers produce events, but consumers aren’t waiting around for those events. They have those events delivered, and when they receive an event, they take an action, then stop. They're not constantly consuming resources.
Applications would be really complex if every software component or service needed to be aware of every other component. If every application component required a queue between it and every other component to put events into and access them from, the architecture would be really complicated. Best practice event-driven architectures have what's called an event router, a highly available central exchange point for events. The event router has an event bus, which you can think of as a constant flow of information. When events are generated by producers, they're added to this event bus, and the router can deliver them to event consumers.
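As a small, hedged illustration using EventBridge as the event router, here's what a producer putting an event onto a custom event bus might look like with boto3; the bus name, source, and detail are invented for the example.

```python
# Hedged sketch: a producer putting an event onto an EventBridge event bus.
# Consumers would be targets of rules that match events like this one.
import json
import boto3

events = boto3.client("events")

events.put_events(
    Entries=[{
        "EventBusName": "cattube-events",  # hypothetical custom bus
        "Source": "cattube.upload",
        "DetailType": "VideoUploaded",
        "Detail": json.dumps({"videoId": "whiskers-beach", "quality": "4k"}),
    }]
)
```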
The WordPress system we’ve used so far has been running on an EC2 instance, which is essentially a consistent allocation of resources. Whether the WordPress system is under low load or large load, we’re still billed for that EC2 instance, consuming resources. Now, imagine a system with lots of small services all waiting for events. If events are received, the system springs into action, allocating resources and scaling components as needed. It deals with those events, then returns to a low or no resource usage state, which is the default. Event-driven architectures only consume resources when needed. There’s nothing constantly running or waiting for things to happen. We don’t constantly poll, hoping for something to happen. We have producers that generate events when something happens. For example, on Amazon.com, when you click "order," it generates an event, and actions are taken based on that event. But Amazon.com doesn’t constantly check your browser every second to see if you've clicked "submit."
So, in summary, a mature event-driven architecture only consumes resources while handling events. When events are not occurring, it doesn’t consume resources. This is one of the key components of a serverless architecture, which I’ll talk about more later in this section.
I know this has been a lot of theory, but I promise you, as you continue through the course, it will really make sense why I introduced this theory in detail at this point. It will help you with the exam, too. In the rest of this section, we’ll be covering more AWS-specific and practical topics, but they’ll all rely on your knowledge of this evolution of systems architecture.
Thanks for watching this video. You can go ahead and finish it off, and when you’re ready, I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this first technical lesson of this section of the course, we'll be stepping through what an event-driven architecture is and comparing it to other architectures available within AWS. As a solutions architect, this matters because you're the one who needs to design a solution using a specific architecture around a given set of business requirements, so you need to have a good base level understanding of all of the different types of architectures available to you within AWS. You can't build something unless you fully understand the architectures, so let's jump in and get started because we've got a lot to cover.
Now, to help illustrate how an event-driven architecture works, let's consider an example. And the example that I want to use is a popular online video sharing platform that you've all probably heard of. Yes, that's right, it's CatTube. One of the popular ways that CatTube is used is for people to upload holiday videos of their cats. So Bob uploads a 4k quality video of whiskers on holiday to CatTube. Now at this point, CatTube begins some processing and it generates lots of different versions of that video at various different quality levels, for example, 1080p, 720p, and 480p. Now this is only part of the application, but it happens to be the most intensive in terms of resource usage. The website also needs to display videos, manage playlists and channels, and store and retrieve data to and from a database.
Now, there are a few ways that we could architect this solution. Historically, the most popular systems architecture was known as a monolithic architecture. Now think of this as a single black box with all of the components of the application within it. So in this example, I'm just showing a subset, but we've got the upload component where Bob uploads his collection of videos where whiskers is on holiday, the processing component which does the conversion of videos, and then we have the store and manage component which interacts with the underlying persistent storage. Now this architecture has a number of considerations, a number of important things to keep in mind. Because it's all one entity, it fails together as an entity. If one component fails, it impacts the whole thing end to end. If uploading fails, it could also affect processing as well as store and manage. Logically, you know that they're separate things, you know that uploading is different than processing, which is different than store and manage, but if they're all contained in a single monolithic architecture, one code base, one big monolithic component, then the failure of any part of that monolith can affect everything else.
The other thing to consider when talking about monoliths is that they also scale together. They're highly coupled. All of the components generally expect to be on the same server, directly connected, and sharing the same code base. You can't scale one without the other. Generally, with monolithic architectures, you need to vertically scale the system because everything expects to be running on the same piece of compute hardware. And finally, and this is one of the most important aspects of monolithic architectures that you need to be aware of, they're generally billed together. All of the components of a monolithic architecture are always running and because of that, they always incur charges. Even if the processing engine is doing nothing, even if no videos are being uploaded, the system capacity has to be enough to run all of them. And so they always have allocated resources, even if they aren't consuming them. So using a monolithic architecture tends to be one of the least cost-effective ways to architect systems, ranging from small to enterprise scale.
Now we've seen earlier in the course how we can evolve a monolithic design into a tiered one. With a tiered architecture, the monolith is broken apart. What we have now is a collection of different tiers and each of these tiers can be on the same server or different servers. With this architecture, the different components are still completely coupled together because each of the tiers connects to a single endpoint of another tier. The upload tier needs to be able to send data directly at the processing tier, and again, this could be on the same server or a different server. With the WordPress example that you looked at earlier in the course, we separated the database component of the monolithic application onto its own RDS instance and left the EC2 instance running the Apache web server and the WordPress application. But both of those services still needed to communicate with each other. They were very tightly coupled.
Now, the immediate benefit of a tiered architecture versus a monolith is that these individual tiers can be vertically scaled independently. Put simply, you can increase the size of the server that's running each of these application tiers. What this means is that if the processing tier, for example, requires more CPU capacity, then it can be increased in size to cope with that additional load without having to increase the size of the upload or the store and manage tiers. But this architecture can be evolved even more. Instead of each tier directly connecting to each other tier, we can utilize load balancers located between each of the tiers. Remember in the previous section I mentioned internal load balancers. This is an example of when internal load balancers are useful. It means that in this example the upload tier is no longer communicating with a specific instance of the processing tier. And it means that the store and manage tier is not communicating with a specific instance of the processing tier. Both of them are going via a load balancer. And if you remember from the section of the course where I talked about load balancers, this means it's abstracted. It allows for horizontal scaling, meaning additional processing tier instances can be added. Communication occurs via the load balancers, so the upload and store and manage tiers have no exposure to the architecture of the processing tier, whether it's one instance or a hundred. This means that the processing tier is now able to be scaled horizontally by adding additional instances, and it's now highly available. If one instance fails, the load balancer just redistributes the connections across the working instances. So by abstracting away from the individual instance architecture for the individual tiers, using load balancers now means we can scale each tier independently, either vertically or horizontally.
Now, this architecture isn't perfect for two main reasons. First, the tiers are still coupled. The upload tier, for example, expects and requires the processing tier to exist and to respond. While the load balancer means that we can have multiple instances for the processing tier, for example, the processing tier has to exist. If it fails completely, then the upload tier itself will fail because the upload tier expects at least one instance of the processing tier to answer it. If there's a backlog in processing, if the processing tier slows down and it starts to take longer to accept jobs for processing, then that can also impact the upload tier and the customer experience. The other issue with this architecture is that even if there's no jobs to be processed, the processing tier has to have something running. Otherwise, there'll be a failure when the upload tier attempts to add an upload job. So it's not possible to scale the individual tiers of the application back down to zero because the communication is synchronous. The upload tier expects to perform a synchronous communication with the processing tier. It expects to ask for a job to be entered and it requires an answer. So while the tiered architecture improves things, it doesn't solve all of the problems.
Okay, so this is the end of part one of this lesson. It was getting a little bit on the long side and so I wanted to add a break. It's an opportunity just to take a rest or grab a coffee. Part two will be continuing immediately from the end of part one. So go ahead, complete the video, and when you're ready, join me in part two.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back! In this lesson, I want to talk in a little bit of detail about gateway load balancers. These are a relatively new addition to the load balancer family and are designed for very specific sets of use cases, which I'll cover in this lesson. We have a lot of architectural theory to cover, so let's jump in and get started straight away.
Before we talk about gateway load balancers, I want to step through the type of situation where you might choose to use one. Consider this architecture: the Categorum Application Server in a public subnet communicating with the public internet. Now, what's missing here is some kind of inspection-based security appliance, something which can check data for any exploits to protect our application server. If this is an important application, which it is because it involves cats, we can improve the architecture by adding a security appliance, which would be a transparent security device. It would sit in the flow of traffic inbound and outbound, transparently reviewing traffic as it enters the application from the public internet, protecting the application against any known exploits, and then filtering any traffic on the way back out. For example, detecting and preventing any information leakage.
This works well assuming that we don't really have to think about scaling. The issue comes when we grow or shrink our application. Remember, AWS pushes the concept of elasticity, where applications can grow and shrink based on increasing and decreasing load on a system. When you need to deal with a growing and shrinking number of application instances and where this growth is extreme, you need an appropriate number of security appliances. This can be complex and prone to failures.
Now, this solution is tightly coupled, where the application and security instances are tied together. The failure of one can impact the other. It doesn't scale well even in a single application environment, and it's even more complex if you're trying to build multi-tenant applications. It's in this type of situation where you need to use some kind of security appliance at scale and have flexibility around network architecture. That's when you might choose to use a gateway load balancer. Over the remainder of this lesson, I want to step through what they do, how they function, and how to use them.
A gateway load balancer is a product that AWS has developed to help you run and scale third-party security appliances, such as firewalls, intrusion detection systems, intrusion prevention systems, or even data analysis tools. You might use these, for example, to perform inbound and outbound transparent traffic inspection or protection. AWS has a lot of awesome networking products, but there are many large businesses that use third-party security and networking products. You might do this because you have existing skills with those products and want to use them inside AWS, or you might have a formal requirement to use those products or a specific feature or set of features that only one specific vendor can deliver.
In those cases, you'll need to use a third-party appliance. To do that at scale in a manageable way, you'll need to use a gateway load balancer. At a high level, a gateway load balancer has two major components. First, gateway load balancer endpoints, which run from a VPC where the traffic you want to monitor, filter, or control originates from or is destined to. In the example I'll be using, this would be the VPC where the Categorum Application Instance is hosted. Gateway load balancer endpoints are much like interface endpoints within VPCs, which you've experienced so far, but with some key improvements that I'll talk about in a second.
The second component is the gateway load balancer (GWLB) itself. This load balancer sends packets across multiple back-end instances, which are just normal EC2 instances running security software. In order for this type of architecture to work, the gateway load balancer needs to forward packets without any alteration. The security appliance needs to review packets as they're sent or received; after all, that's the whole point. These packets have source and destination IP addresses that might be okay on the original network but might not work on the network where the security appliances are hosted. So, gateway load balancers use a tunneling protocol called GENEVE. A tunnel is created between the gateway load balancer and the back-end instances (the security appliances). Packets are encapsulated and sent through this tunnel to the appliances.
Now, let's review this visually, which should help you understand how all the components fit together. Let's say that we have a laptop, and this is accessing the Categorum application. I'm keeping this simple for now and not including any VPC boundaries. I'll show you this in a moment. Traffic leaves the source laptop, moves into a VPC through an internet gateway, and arrives at a gateway load balancer endpoint. Gateway load balancer endpoints are like a normal VPC interface endpoint, with one major difference: they can be added to a route table as the next hop, allowing them to be part of traffic flows controlled by that route table.
So, traffic via a route table is directed at this endpoint, and the traffic flows through to the gateway load balancer. Gateway load balancers work similarly to a network load balancer but integrate with gateway load balancer endpoints and encapsulate all traffic that they handle using the GENEVE protocol. This means that packets are unaltered: they have the same source IP, destination IP, source port, destination port, and contents as when they were created and sent. This allows the security appliances to scan the packets, review them for any security issues, block them as required, perform analysis, or adjust them as needed. When finished, the packets are returned over the same tunnel, encapsulated back to the load balancer, where the GENEVE encapsulation is removed, and the packets move back to the gateway load balancer endpoint and through to the intended destination.
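To ground this, here's a hedged boto3 sketch of the main building blocks: a gateway load balancer, a GENEVE target group for the appliances, an endpoint service, and a gateway load balancer endpoint in the application VPC. All names and IDs are placeholders.

```python
# Hedged sketch of the main GWLB building blocks with boto3; all IDs are
# placeholders. The target group uses the GENEVE protocol on port 6081,
# which is how the appliances receive the encapsulated packets.
import boto3

elbv2 = boto3.client("elbv2")
ec2 = boto3.client("ec2")

gwlb = elbv2.create_load_balancer(
    Name="security-gwlb",
    Type="gateway",
    Subnets=["subnet-sec-a", "subnet-sec-b"],
)["LoadBalancers"][0]

tg = elbv2.create_target_group(
    Name="security-appliances",
    Protocol="GENEVE",
    Port=6081,
    VpcId="vpc-security",
    TargetType="instance",
)["TargetGroups"][0]

elbv2.create_listener(
    LoadBalancerArn=gwlb["LoadBalancerArn"],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)

# Expose the GWLB as an endpoint service, then consume it from the
# application VPC as a gateway load balancer endpoint.
svc = ec2.create_vpc_endpoint_service_configuration(
    GatewayLoadBalancerArns=[gwlb["LoadBalancerArn"]],
    AcceptanceRequired=False,
)["ServiceConfiguration"]

ec2.create_vpc_endpoint(
    VpcEndpointType="GatewayLoadBalancer",
    ServiceName=svc["ServiceName"],
    VpcId="vpc-application",
    SubnetIds=["subnet-app-a"],
)
```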
The benefits of this architecture are that gateway load balancers will load balance across security appliances, so you can horizontally scale. The gateway load balancer manages flow stickiness, so one flow of data will always use one appliance. This is useful because it allows that appliance to monitor the state of flows through a system. It provides abstraction, meaning you can use multiple security appliances to provide resilience. If one fails, packets are just moved over to another available security appliance. In this way, it's much like other load balancers within AWS.
Just before we finish up with this lesson, I wanted to provide a more detailed typical architecture where you might use a gateway load balancer. This is the Categorum application, running in a pair of private subnets at the bottom, behind an application load balancer which is running in a pair of public subnets. Off to the right, we have a separate VPC running a set of security appliances inside an auto-scaling group, so this can grow and shrink based on load to the application. What I want to do now is step through the traffic flow through this architecture and show you how the gateway load balancer works to ensure we can scale this security platform.
We start at the top with a client accessing the web application. This flow will arrive at the application load balancer, which uses public addressing. Logically, it first hits the internet gateway. The internet gateway is configured with an ingress route table, also known as a gateway route table, which influences what happens as traffic arrives at the VPC. In this case, our packets are destined for the public IP that the application load balancer on the right is using. The internet gateway first translates the destination public IP address to the corresponding private IP of that application load balancer, which will be running inside the 10.16.96.0/20 subnet.
The third route is used because it's the most specific route for the 10.16.96.0/20 range, and traffic is sent toward the gateway load balancer endpoint in the right availability zone. Gateway load balancer endpoints are like interface endpoints, but they can be the targets within routes. The gateway load balancer endpoint receives these packets and moves them to the gateway load balancer itself, running in the security VPC. At this point, the packets still have the original IP addressing, and this would normally be a problem, since the security VPC might be using a different or conflicting IP range. However, while the source and destination addressing remain the same, the packets are encapsulated using the GENEVE protocol and sent through unaltered to the security appliance chosen by the gateway load balancer.
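Here's a hedged sketch of that ingress routing with boto3: a route table edge-associated with the internet gateway, with a route that targets the gateway load balancer endpoint. The IDs are placeholders; the CIDR matches the example range above.

```python
# Hedged sketch: an ingress (gateway) route table that sends traffic arriving
# via the internet gateway for the ALB subnet range to the GWLB endpoint.
# All IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

ingress_rt = ec2.create_route_table(VpcId="vpc-application")["RouteTable"]

# Edge-associate the route table with the internet gateway so it applies
# to traffic entering the VPC.
ec2.associate_route_table(
    RouteTableId=ingress_rt["RouteTableId"],
    GatewayId="igw-0123456789abcdef0",
)

# Direct traffic destined for the ALB subnets at the GWLB endpoint.
ec2.create_route(
    RouteTableId=ingress_rt["RouteTableId"],
    DestinationCidrBlock="10.16.96.0/20",
    VpcEndpointId="vpce-0a1b2c3d4e5f67890",
)
```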
Once the analysis is complete, the packets are returned encapsulated to the gateway load balancer. The encapsulation is stripped, and they are returned via the endpoint to the Categorum VPC. Since the original IP addressing is maintained, the route table on the top public subnet is used, which has a local route for the VPC-side range. The most specific route is used, and packets flow through to the application load balancer and from there to the chosen application instance. The logic for this is decided by the application load balancer.
The return path uses the same logic. Data leaves the application instance in response to the initial communication from the laptop and will return via the application load balancer. The load balancer is in a subnet with a local route, but the default route goes toward the gateway load balancer endpoint in the same availability zone. Since traffic is going back to the client device that originally accessed the Categorum application, it will have a public destination IP address, so the default route will be used. This means the packets will flow back to the gateway load balancer endpoint and then through to the gateway load balancer, where they'll be encapsulated, passed through to the appliances, then back to the load balancer, de-encapsulated, and passed back to the gateway load balancer endpoint.
Once they’re back at the gateway load balancer endpoint, the subnet has the internet gateway as the default route. This will be used, and traffic moves through the internet gateway, where its source IP will be changed to the corresponding public one of the application load balancer, then sent to the original client device.
And that's it: transparent inline network security done in a scalable, resilient, and abstracted way. I'll be talking more about some of the more nuanced features in other lessons, but for now, that's the basics of gateway load balancers. In terms of architecture, they share many elements with network and application load balancers, including the target group architecture, but they have a very specific purpose: network security at scale. With that said, that's everything I wanted to cover in this video. Go ahead and complete the lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this brief lesson, I want to cover two features of the Elastic Load Balancer series of products, and those features are SSL offload and session stickiness. Now, you'll need to be aware of the architecture of both of these for the exam. The implementation details aren't required; the theory of the architecture is what matters, so let's jump in and get started. Now, there are three ways that a load balancer can handle secure connections, and these three ways are bridging, pass-through, and offload. Each of these comes with their pros and cons, and for the exam and to be a good solutions architect, you need to understand the architecture and the positives and negatives of them all, so let's step through each of these in turn.
So, first, we've got bridging mode, and this is actually the default mode of an application load balancer. With bridging mode, one or more clients make one or more connections to a load balancer, and that load balancer is configured so that its listener uses HTTPS, and this means that SSL connections occur between the client and the load balancer. So, they're decrypted, known as terminated, on the load balancer itself, and this means that the load balancer needs an SSL certificate which matches the domain name that the application uses, and it also means, in theory, that AWS does have some level of access to that certificate, and that's important if you have strong security frameworks that you need to stay inside of. So, if you're in a situation where you need to be really careful about where your certificates are stored, then potentially, you might have a problem with bridged mode. Once the secure connection from the client has been terminated on the load balancer, the load balancer makes second connections to the backend compute resources (EC2 instances in this example). Remember, HTTPS is just HTTP with a secure wrapper. So, when the SSL connection comes from the client to the front-facing listener side of the load balancer, it gets terminated, which essentially means that the SSL wrapper is removed from the unencrypted HTTP that's inside. So, the load balancer has access to the HTTP, which it can understand and use to make decisions. So, the important thing to understand is that an application load balancer in bridging mode can actually see the HTTP traffic. It can take actions based on the contents of HTTP, and this is the reason why this is the default mode for the application load balancer. And it's also the reason why the application load balancer requires an SSL certificate because it needs to decrypt any data that's being encrypted by the client, it needs to decrypt it first, then interpret it, then create new encrypted sessions between it and the backend EC2 instances.
Now, this also means that the EC2 instances will need matching SSL certificates, so certificates that match the domain name that the application is using. The Elastic Load Balancer will re-encrypt the HTTP within a secure wrapper and deliver this to the EC2 instances, which will use the SSL certificate to decrypt that encrypted connection. So the EC2 instances need both the SSL certificates installed on them and the compute capacity to perform those cryptographic operations. In bridging mode (which is the default), every EC2 instance at the backend needs to perform cryptographic operations, and for high-volume applications, the overhead of performing these operations can be significant. The positive of this method is that the Elastic Load Balancer gets to see the unencrypted HTTP and can take actions based on what's contained in this plain-text protocol. The method does have negatives, though: the certificate needs to be stored on the load balancer itself, which is a risk; the EC2 instances also need a copy of that certificate, which is an admin overhead; and they need the compute to perform the cryptographic operations. Those are pretty important negatives that can play a part in which connection method you select for any architectures that you design.
Now, next we have SSL pass-through, and this architecture is very different. With this method, the client connects, but the load balancer just passes that connection along to one of the backend instances; it doesn't decrypt it at all. The connection encryption is maintained between the client and the backend instances. The instances still need to have the SSL certificates installed, but the load balancer doesn't. Specifically, it's a network load balancer which is able to perform this style of connection architecture. The load balancer is configured to listen using TCP, and this is important. It means that it can see the source and destination IP addresses and ports, so it can make basic decisions about which instance to send traffic to (i.e., perform the load balancing), but it never touches the encryption. The encrypted connection exists as one encrypted tunnel between the client all the way through to one of the backend instances. Now, using this method means that AWS never needs to see the certificate that you use; it's managed and controlled entirely by you. You can even use a CloudHSM appliance (which I'll talk about later in the course) to make this even more secure. The negative, though, is that you don't get to perform any load balancing based on the HTTP part because that's never decrypted. It's never exposed to the network load balancer, and the instances still need to have the certificates and still need to perform the cryptographic operations, which uses compute.
Now, the last method that we have is SSL offload, and with this architecture, clients connect to the load balancer in the same way using HTTPS. The connections use HTTPS and are terminated on the load balancer, and so it needs an SSL certificate which matches the name that's used by the application, but the load balancer is configured to connect to the backend instances using HTTP, so the connections are never encrypted again. What this means is that from a customer perspective, data is encrypted between them and the load balancer, so at all times while using the public internet, data is encrypted, but it transits from the load balancer to the EC2 instances in plain-text form. It means that while a certificate is required on the load balancer, it's not needed on the EC2 instances. The EC2 instances only need to handle HTTP traffic, and because of that, they don't need to perform any cryptographic operations, which reduces the per-instance overhead and also potentially means you can use smaller instances. The downside is that data is in plain-text form across AWS's network, but if this isn't a problem, then it's a very effective solution.
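As a rough illustration of the difference, here's a boto3 sketch of an HTTPS listener in front of an HTTP target group, which is the offload pattern; bridging would simply use an HTTPS target group instead. The ARNs and IDs are placeholders.

```python
# Hedged sketch: the listener is HTTPS in both bridging and offload modes;
# what differs is the protocol used towards the backend target group.
import boto3

elbv2 = boto3.client("elbv2")

# Offload: terminate TLS on the load balancer and speak plain HTTP to the instances.
offload_tg = elbv2.create_target_group(
    Name="app-http",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0a1b2c3d4e5f67890",
)["TargetGroups"][0]

# Bridging would instead use Protocol="HTTPS" here, and the instances would
# need their own certificates to terminate that second connection.

elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/demo-alb/50dc6c495c0c9188",
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/11111111-2222-3333-4444-555555555555"}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": offload_tg["TargetGroupArn"]}],
)
```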
So, now that we've talked about the different connection architectures, now let's quickly talk about stickiness. Connection stickiness is a pretty important concept to understand for anybody designing a scalable solution using load balancers. Now, let's look at an example architecture. We have our customer Bob, a load balancer, and a set of backend EC2 instances. If we have no session stickiness, then for any sessions which Bob or anyone else makes, they're distributed across all of the backend instances based on fair balancing and any health checks. So, generally, this means a fairly equal distribution of connections across all backend instances. The problem with this approach, though, is that if the application doesn't handle sessions externally, every time Bob lands on a new instance, it would be like he's starting again. He would need to log in again and fill his shopping cart again. Applications need to be designed to handle state appropriately; an application which uses stateless EC2 instances where the state is handled in, say, DynamoDB can use this non-sticky architecture and operate without any problems. But if the state is stored on a particular server, then you can't have sessions being fully load balanced across all of the different servers because every time a connection moves to a different server, it will impact the user experience.
Now, there is an option available within Elastic Load Balancers called session stickiness, and within an application load balancer, this is enabled on a target group. Now, what this means is that if enabled, the first time that a user makes a request, the load balancer generates a cookie called AWSALB, and this cookie has a duration which you define when enabling the feature, and a valid duration is anywhere between 1 second and 7 days. If you enable this option, it means that every time a single user accesses this application, the cookie is provided along with the request, and it means that for this one particular cookie, sessions will be sent always to the same backend instance. So, in this case, all connections will go to EC2-2 for this one particular user. Now, this situation of sending sessions to the same server will happen until one of two things occur. The first thing is that if we have a server failure, so in this example, if EC2-2 fails, then this one particular user will be moved over to a different EC2 instance. And the second thing which can occur to change this session stickiness is that the cookie can expire. As soon as the cookie expires and disappears, the whole process will repeat over again, and the user will receive a new cookie and be allocated a new backend instance.
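Here's a hedged boto3 sketch of enabling that stickiness on a target group; the ARN is a placeholder and the duration is set to one day for the example.

```python
# Hedged sketch: enabling load-balancer-generated cookie stickiness on a
# target group. The duration can be anywhere from 1 second to 7 days
# (604800 seconds).
import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app-http/0123456789abcdef",
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},  # the AWSALB cookie
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},  # 1 day
    ],
)
```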
Session stickiness is designed to allow an application to function using a load balancer if the state of the user session is stored on an individual server. The problem with this method is that it can cause uneven load on backend servers because a single user, even if he or she is causing significant amounts of load, will only ever use one single server. Where possible, applications should be designed to use stateless servers, so holding the session or user state somewhere else, so not on the EC2 instance, but somewhere else like DynamoDB. And if you do that, if you host the session externally, it means that the EC2 instances are completely stateless, and load balancing can be performed automatically by the load balancer without using cookies in a completely fair and balanced way.
So, that's everything I wanted to cover about connection stickiness, and that's now the end of this lesson. I just wanted to quickly cover two pretty important techniques that you might need to be aware of for the exam. So, at this point, go ahead and complete the video, and when you're ready, as always, I'll look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about Auto Scaling Groups and health checks. Now, this is going to be fairly brief, but it's something that's really important for the exam, so let's jump in and get started. Auto Scaling Groups assess the health of instances within that group using health checks, and if an instance fails a health check, it is replaced within the Auto Scaling Group, which is a method of automatically healing the instances within the group. Now, there are three different types of health checks that can be used with Auto Scaling Groups: EC2, which is the default, ELB checks that can be enabled on an Auto Scaling Group, and custom health checks.
With EC2 checks, which are the default, any of these statuses is viewed as unhealthy: essentially, anything but the instance running is considered unhealthy. So, if the instance is stopping, stopped, terminated, shutting down, or impaired (meaning it doesn't have two out of two status checks), it is viewed as unhealthy. We also have the option of using load balancer health checks, and for an instance to be viewed as healthy when this option is used, the instance needs to be both running and passing the load balancer health check. This is important because if you're using an application load balancer, these checks can be application-aware. You can define a specific page of the application to be used as a health check, and you can do text pattern matching, which can be checked using an application load balancer. So, when you integrate this with an Auto Scaling Group, the checks that the Auto Scaling Group is capable of performing become much more application-aware.
Finally, we have custom health checks, where an external system can be integrated to mark instances as healthy or unhealthy. This allows you to extend the functionality of the Auto Scaling Group health checks by implementing a process specific to your business or using an external tool. Now, I also want to introduce the concept of a health check grace period. By default, this is 300 seconds or 5 minutes, and this is essentially a configurable value that needs to expire before health checks take effect on a specific instance. So, in this particular case, if you select 300 seconds, it means that the system has 5 minutes to launch, perform any bootstrapping, and complete any application startup procedures or configuration before it can fail a health check. This is really useful if you're performing bootstrapping with your EC2 instances that are launched by the Auto Scaling Group.
Now, this is an important concept because it does come up on the exam and is often a cause of an Auto Scaling Group continuously provisioning and then terminating instances. If you don't have a sufficiently long health check grace period, you can be in a situation where the health checks start taking effect before the applications have finished configuring, and at that point, the instance will be viewed as unhealthy, terminated, and a new instance will be provisioned, and that process will repeat over and over again. So, you need to know how long your application instances take to launch, bootstrap, and perform any configuration processes, and that's how long you need to set your health check grace period to be.
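To make this concrete, here's a hedged boto3 sketch showing how an existing Auto Scaling Group might be switched to ELB health checks with a five-minute grace period, plus how an external system could report an instance as unhealthy for the custom health check approach; the group name, instance ID, and 300-second value are placeholder assumptions.

import boto3

autoscaling = boto3.client("autoscaling")

# Use load balancer (ELB) health checks and give instances 5 minutes to
# bootstrap before health checks take effect.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="example-asg",
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
)

# Custom health checks: an external process marks an instance unhealthy,
# and the Auto Scaling Group replaces it.
autoscaling.set_instance_health(
    InstanceId="i-0123456789abcdef0",
    HealthStatus="Unhealthy",
)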
Now, that's everything I wanted to cover in this brief theory lesson. I just wanted to make sure you understand the options that you have available for health checks within Auto Scaling Groups. With that being said, go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back, and in this lesson, I want to quickly cover a pretty advanced feature of Auto Scaling Groups, and that's Auto Scaling Group Lifecycle Hooks. Let's jump in and take a look at what these are and how they work. Lifecycle hooks allow you to configure custom actions that can occur during Auto Scaling Group actions, meaning you can define actions during either instance launch transitions or instance terminate transitions. When an Auto Scaling Group scales out or in, it will either launch or terminate instances, and normally, this process is completely under the control of the Auto Scaling Group. As soon as it makes a decision to provision or terminate an instance, this process happens with no ability for you to influence the outcome. However, with lifecycle hooks, when you create them, instances are paused within the launch or terminate flow, waiting in this state until either a configurable timeout expires (the default being 3600 seconds), after which the process continues or is abandoned, or you explicitly resume the process using the "complete lifecycle action" after performing the desired activity. Additionally, lifecycle hooks can be integrated with EventBridge or SNS notifications, enabling your systems to perform event-driven processing based on the launch or termination of EC2 instances within an Auto Scaling Group.
Visually, let's start with a simple Auto Scaling Group. If we configure instance launch and terminate hooks, this is what it might look like: normally, when an Auto Scaling Group scales out, an instance will be launched and initially placed in a pending state. Once it completes, it moves to the "in service" state, but this doesn’t give us any opportunity to perform custom activities. If we hook into this transition with a lifecycle hook, the instance would move from "pending" to "pending wait," staying in this state until the custom actions are performed. An example might be loading or indexing data, which could take some time, and during this time, the instance remains in the "pending wait" state. Once the process is done, the instance moves to "pending proceed," and then to "in service." This is the process when configuring a lifecycle hook for the instance launch transition. The same happens in reverse when we define an instance terminate hook: normally, the instance would move from "terminating" to "terminated," with no opportunity for custom actions. However, with a lifecycle hook, the instance moves from "terminating" to "terminating wait," where it stays for a timeout period (default 3600 seconds), or until we explicitly call the "complete lifecycle action" operation. This period could be used to back up data, logs, or clean up the instance prior to termination. Once the timeout expires or we call the "complete lifecycle action," the instance moves from "terminating wait" to "terminating proceed," and then to "terminated." Finally, lifecycle hooks integrate with SNS for transition notifications, and EventBridge can also be used to initiate other processes in an event-driven way. That’s everything I wanted to cover about lifecycle hooks, so at this point, go ahead and complete this lesson, and when you're ready, I look forward to you joining me in the next one.
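As a rough boto3 sketch of that flow (the hook name, group name, instance ID, and timeout are placeholder assumptions), a launch hook is created on the group, and once the custom work is done the lifecycle action is completed so the instance can move on to in service.

import boto3

autoscaling = boto3.client("autoscaling")

# Pause newly launched instances in Pending:Wait until we resume them,
# or until the heartbeat timeout expires (default 3600 seconds), at which
# point the DefaultResult applies.
autoscaling.put_lifecycle_hook(
    LifecycleHookName="on-launch-index-data",
    AutoScalingGroupName="example-asg",
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    HeartbeatTimeout=3600,
    DefaultResult="ABANDON",
)

# Later, once the custom activity (for example loading or indexing data) has
# finished, resume the launch so the instance moves to Pending:Proceed and In Service.
autoscaling.complete_lifecycle_action(
    LifecycleHookName="on-launch-index-data",
    AutoScalingGroupName="example-asg",
    InstanceId="i-0123456789abcdef0",
    LifecycleActionResult="CONTINUE",
)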
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to cover in a little bit more detail something I've touched on earlier and that's Auto Scaling Group Scaling Policies, so let's just jump in and get started.
One thing many students get confused over is whether scaling policies are required on an Auto Scaling Group, and you'll see in demos elsewhere in the course that this is not the case; Auto Scaling Groups can be created without any scaling policies and they work just fine. When created without any scaling policies, an Auto Scaling Group simply has static values for min size, max size, and desired capacity. If you hear the term manual scaling, that refers to manually adjusting these values, which is useful in testing or urgent situations, or when you need to hold capacity at a fixed number of instances, for example as a cost control measure.
Now in addition to manual scaling, we also have different types of dynamic scaling which allow you to scale the capacity of your Auto Scaling Group in response to changing demand. There are a few different types of dynamic scaling and I want to introduce them here and then cover them in a little bit more detail. At a high level, each of these adjusts the desired capacity of an Auto Scaling Group based on certain criteria. First, we have simple scaling, where you define actions which occur when an alarm moves into an alarm state. For example, you might add one instance if CPU utilization is above 40%, or remove one instance if CPU utilization is below 40%, which helps infrastructure scale out and in based on demand. The problem is that this scaling is inflexible; it's adding or removing a static amount based on the state of an alarm, so it's simple but not all that efficient.
Step scaling increases or decreases the desired capacity based on a set of scaling adjustments known as step adjustments that vary based on the size of the alarm breach. So you define upper and lower bounds; for example, you can pick a CPU level which you want, say 50%, and you can say that if the actual CPU is between 50 and 60%, then do nothing; if the CPU is between 60 and 70%, then add one instance; if the CPU is between 70 and 80%, add two instances; and finally, if the CPU is between 80 and 100%, add three instances. You can define the same step changes in reverse for when CPU is below 50%, only removing rather than adding instances. Generally, step scaling is better than simple scaling because it allows you to adjust better to changing load patterns on the system.
Next we have target tracking, which comes with a predefined set of metrics. Currently, this is CPU utilization, average network in, average network out, and ALB request count per target. Now the premise is simple enough: you define an ideal value, so the target that you want to track against for that metric, for example, you might say that you want 50% CPU on average. The auto scaling group then calculates the scaling adjustment based on the metric and the target value all automatically. The auto scaling group keeps the metric at the value that you want and it adjusts the capacity as required to make that happen, so the further away the actual value of the metric is from your target value, the more extreme the action, either adding or removing compute. Then lastly, it's possible to scale based on an SQS queue and this is a common architecture for a workload where you can increase or decrease capacity based on the approximate number of messages visible, so as more messages are added to the queue, the auto scaling group increases in capacity to process messages, and then as the queue empties, the group scales back to reduce costs.
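For illustration, a target tracking policy might be attached roughly like this with boto3 (the group and policy names are placeholder assumptions); the Auto Scaling Group then calculates and applies the adjustments itself to hold average CPU at the target value.

import boto3

autoscaling = boto3.client("autoscaling")

# Track average CPU across the group and keep it at roughly 50%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-asg",
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)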
Now one really common area of confusion is the difference between simple scaling and step scaling. AWS recommends step scaling versus simple at this point in time but it's important to understand why, so let's take a look visually. Let's start with some simple scaling and I want to explain this using the same auto scaling group but at three points in time. The auto scaling group is initially configured with a minimum of one, a maximum of four, and a desired of one, and that means right now we're going to have one out of a maximum of four instances provisioned and operational, and let's also assume that the current average CPU is 10%. Now with simple scaling, we create or use an existing alarm as a guide. Let's say that we decide to use the average CPU utilization, so we create two different scaling rules. The first, which says that if average CPU is above 50%, then add two instances, and another which removes two instances if the CPU is below 50%. With this type of scaling, if the CPU suddenly jumped to say 60%, then the top rule would apply and this rule would add two instances, changing the desired capacity from one to three. This value is still within the minimum of one and the maximum of four, and so two additional instances would be provisioned with room for a fourth. If the CPU usage dropped to say 10%, then the second rule would apply and the desired capacity would be reduced by two or set to the minimum, so in this case, it would change from three to one. Two instances would be terminated and the auto scaling group would be running with one instance and a capacity for three more as required. Now this works but it's not very flexible. Whatever the load, whether it's 1% over what you want or 50% over, two instances are added, and the same is used in reverse. Whether it's 1% below what you want or 50% below, the same two instances are always removed, so with simple scaling, you're adding or removing the same amount no matter how extreme the increases and decreases in the metric that you're monitoring.
With step scaling, it's more flexible. You're still checking an alarm, but for step scaling, you can define rules with steps, so you can define an alarm which alarms when the CPU is above 50% and one which alarms when the CPU is below 50%, and you can create steps which adjust capacity based on how far away from that value it is. So in this case, if the CPU usage is between 50 and 59%, do nothing; between 60 and 69%, add one; between 70 and 79%, add two; and between 80 and 100%, add three. And the same in reverse: between 40 and 49%, do nothing; between 30 and 39%, remove one; between 20 and 29%, remove two; and between 0 and 19%, remove three. So let's say that we had an auto scaling group at six points in time. We start with the auto scaling group on the left with 5% load, and with the same minimum of one and maximum of four as the previous example. The policy would try to remove three instances at this level of CPU, but as the auto scaling group has a minimum of one, it starts with one instance and a capacity for a further three as required. If our application receives a massive amount of incoming load, let's say that the CPU usage increases to 100% (an extreme example), then based on the scaling policy, this would add three instances, taking us to the maximum value of four, so our auto scaling group now has four instances running, which is also the maximum value for that group. Now at this point, with the same amount of incoming load, the increased number of instances is probably going to reduce the average CPU. Let's say that it reduces it to 55%; this causes no change in instances, none added or removed, because anything in the range of 40 to 59% means zero change. Next, say that the load on the system reduces, so CPU drops to 5%, and this removes three instances, dropping the desired capacity down to one with the option for a further three instances as required. Next, the average CPU stays at 5%, but the minimum of the auto scaling group is one, so the number of instances stays the same even though the step scaling rule would attempt to remove three instances at this level; we always have at least the minimum number of instances as defined within the minimum value of the auto scaling group. Now maybe we end the day with some additional load on the system, let's say the CPU usage goes to 60%, and this adds one additional instance. So you should be able to see by now that step scaling is great for variable load where you need to control how systems scale out and in. It allows you to handle large increases and decreases in load much better than simple scaling, because how extreme the increase or decrease is determines how many units of compute are added or removed; it's not static like simple scaling, and that's the main difference between simple and step: the ability to scale in different ways based on how extreme the load changes are.
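To show what this looks like in configuration terms, here's a hedged boto3 sketch of a scale-out step policy roughly matching the bands above; it assumes a separate CloudWatch alarm with a 60% average CPU threshold, whose action would be the returned policy ARN, and the intervals below are offsets from that threshold (all names and numbers are illustrative assumptions).

import boto3

autoscaling = boto3.client("autoscaling")

# Scale out in steps: bigger breaches of the alarm threshold add more instances.
response = autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-asg",
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[
        # Bounds are relative to the assumed 60% CPU alarm threshold.
        {"MetricIntervalLowerBound": 0,  "MetricIntervalUpperBound": 10, "ScalingAdjustment": 1},  # 60-70% CPU: add 1
        {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},  # 70-80% CPU: add 2
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},                                  # 80%+ CPU: add 3
    ],
)

# response["PolicyARN"] would then be set as the alarm action of the CloudWatch
# alarm; a mirror-image policy and alarm would handle scaling in.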
With that being said though, that's everything I wanted to cover in this lesson. Go ahead and complete the video, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back, and in this lesson I'm going to be covering EC2 auto scaling groups, which are how we can configure EC2 to scale automatically based on demand placed on the system. Auto scaling groups are generally used together with elastic load balancers and launch templates to deliver elastic architectures. Now we've got a lot to cover, so let's jump in and get started.
Auto scaling groups do one thing — they provide auto scaling for EC2. Strictly speaking, they can also be used to implement a self-healing architecture as part of that scaling or in isolation. Auto scaling groups make use of configuration defined within launch templates or launch configurations, and that's how they know what to provision. An auto scaling group uses one launch configuration or one specific version of a launch template which is linked to it. You can change which of those is associated, but it's one of them at a time, and so all instances launched using the auto scaling group are based on this single configuration definition, either defined inside a specific version of a launch template or within a launch configuration.
Now, an auto scaling group has three super important values associated with it — the minimum size, the desired capacity, and the maximum size — and these are often referred to as min, desired, and max, and can often be expressed as x, y, z. For example, 1, 2, 4 means 1 minimum, 2 desired, and 4 maximum. An auto scaling group has one foundational job which it performs — it keeps the number of running EC2 instances the same as the desired capacity, and it does this by provisioning or terminating instances. So, the desired capacity always has to be equal to or somewhere between the minimum and maximum sizes.
If you have a desired capacity of 2 but only one running EC2 instance, then the auto scaling group provisions a new instance. If you have a desired capacity of 2 but have three running EC2 instances, then the auto scaling group will terminate an instance to make these two values match. You can keep an auto scaling group entirely manual so there's no automation and no intelligence — you just update values and the auto scaling group performs the necessary scaling actions.
Normally though, scaling policies are used together with auto scaling groups. Scaling policies can update the desired capacity based on certain criteria, for example CPU load, and if the desired capacity is updated, then as I've just mentioned, it will provision or terminate instances.
Visually, this is how it looks: we have an auto scaling group, and these run within a VPC across one or more subnets. The configuration for EC2 instances is provided either using launch templates or launch configurations, and then on the auto scaling group we specify a minimum value — in this case 1 — and this means there will always be at least one running EC2 instance, in this case the cat pictures blog. We can also set a desired capacity, in this example 2, and if the desired capacity is set higher than the current number of instances, instances are added to make the numbers match. Finally, we could set the maximum size — in this case to 4 — which means that two additional instances could be provisioned, but they won't be immediately, because the desired capacity is only set to 2 and there are currently two running instances.
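As a rough boto3 sketch of those three values in practice (the names, subnets, and launch template are placeholder assumptions), the group below would immediately provision two instances and could scale between one and four.

import boto3

autoscaling = boto3.client("autoscaling")

# Min 1, desired 2, max 4: the group keeps running instances equal to desired.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="catpics-asg",
    LaunchTemplate={"LaunchTemplateName": "catpics-template", "Version": "$Latest"},
    MinSize=1,
    DesiredCapacity=2,
    MaxSize=4,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # one subnet per availability zone
)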
We could manually adjust the desired capacity up or down to add or remove instances which would automatically be built based on the launch template or launch configuration. Alternatively, we could use scaling policies to automate that process and scale in or out based on sets of criteria.
Architecturally, auto scaling groups define where instances are launched. They're linked to a VPC, and subnets within that VPC are configured on the auto scaling group. Whatever subnets are configured will be used to provision instances into. When instances are provisioned, there's an attempt to keep the number of instances within each availability zone even. So in this case, if the auto scaling group was configured with three subnets and the desired capacity was also set to three, then it's probable each subnet would have one EC2 instance running within it — but this isn't always the case. The auto scaling group will try and level capacity where available.
Scaling policies are essentially rules — rules which you define which can adjust the values of an auto scaling group — and there are three ways that you can scale auto scaling groups. The first is not really a policy at all — it's just to use manual scaling, and I just talked about doing that. This is where you manually adjust the values at any time and the auto scaling group handles any provisioning or termination that's required.
Next there's scheduled scaling, which is great for sale periods where you can scale out the group when you know there's going to be additional demand or when you know a system won't be used so you can scale in outside of business hours. Scheduled scaling adjusts the desired capacity based on schedules, and this is useful for any known periods of high or low usage. For the exam, if you have known periods of usage, then scheduled scaling is going to be a great potential answer.
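A scheduled action might look roughly like this with boto3 (the group name, schedule, and sizes are placeholder assumptions); a second action would typically scale back in after the busy period.

import boto3

autoscaling = boto3.client("autoscaling")

# Scale out ahead of known business-hours demand: 08:00 UTC, Monday to Friday.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="catpics-asg",
    ScheduledActionName="business-hours-scale-out",
    Recurrence="0 8 * * 1-5",  # cron format, evaluated in UTC by default
    MinSize=2,
    DesiredCapacity=4,
    MaxSize=8,
)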
Then we have dynamic scaling, and there are three subtypes. What they all have in common is they are rules which react to something and change the values on an auto scaling group. The first is simple scaling — and this, well, it's simple. This is most commonly a pair of rules — one to provision instances and one to terminate instances. You define a rule based on a metric, and an example of this is CPU utilization. If the metric, for example CPU utilization, is above 50% then adjust the desired capacity by adding one, and if the metric is below 50% then remove one from the desired capacity. Using this method you can scale out (meaning adding instances) or scale in (meaning terminating instances) based on the value of a metric.
Now, this metric isn't limited to CPU — it can be many other metrics including memory or disk input/output. Some metrics need the CloudWatch agent to be installed. You can also use some metrics not on the EC2 instances — for example, maybe the length of an SQS queue (which we'll cover elsewhere in the course) or a custom performance metric within your application such as response time.
We also have step scaling, which is similar, but you define more detailed rules, and this allows you to act depending on how out of normal the metric value is. So maybe add one instance if the CPU usage is above 50%, but if you have a sudden spike of load maybe add three if it's above 80%, and the same could happen in reverse. Step scaling allows you to react quicker the more extreme the change in conditions. Step scaling is almost always preferable to simple — except when your only priority is simplicity.
And then lastly we have target tracking, and this takes a slightly different approach — it lets you define an ideal amount of something, say 40% aggregate CPU, and then the group will scale as required to stay at that level, provisioning or terminating instances to maintain that desired amount or that target amount. Not all metrics work for target tracking, but some examples of ones that are supported are average CPU utilization, average network in, average network out, and the one that's relevant to application load balancers — request count per target.
Now lastly there's a configuration on an auto scaling group called a cooldown period, and this is a value in seconds. It controls how long to wait at the end of a scaling action before starting another. It gives the auto scaling group time to let a chaotic metric settle rather than reacting to every change, and it can avoid costs associated with constantly adding or removing instances — because remember, there is a minimum billable period, so you'll be billed for at least that minimum time every time an instance is provisioned, regardless of how long you use it for.
Now auto scaling groups also monitor the health of instances that they provision. By default, this uses the EC2 status checks. So if an EC2 instance fails, EC2 detects this, passes this on to the auto scaling group, and then the auto scaling group terminates the EC2 instance — then it provisions a new EC2 instance in its place. This is known as self-healing, and it will fix most problems isolated to a single instance. The same would happen if we terminated an instance manually — the auto scaling group would simply replace it.
Now there's a trick with EC2 and auto scaling groups — if you create a launch template which can automatically build an instance, then create an auto scaling group using that template, set the auto scaling group to use multiple subnets in different availability zones, then set the auto scaling group to use a minimum of one, a maximum of one, and a desired of one, then you have simple instance recovery. The instance will recover if it's terminated or if it fails. And because auto scaling groups work across availability zones, the instance can be reprovisioned in another availability zone if the original one fails. It's cheap, simple, and effective high availability.
Now auto scaling groups are really cool on their own, but their real power comes from their ability to integrate with load balancers. Take this example: Bob is browsing to the cat blog that we've been using so far, and he's now connecting through a load balancer. The load balancer has a listener configured for the blog and points at a target group. Instead of statically adding instances or other resources to the target group, you can use an auto scaling group configured to integrate with that target group.
As instances are provisioned within the auto scaling group, then they're automatically added to the target group of that load balancer. And then as instances are terminated by the auto scaling group, then they're removed from that target group. This is an example of elasticity because metrics which measure load on a system can be used to adjust the number of instances. These instances are effectively added as load balancer targets, and any users of the application, because they access via the load balancer, are abstracted away from the individual instances and they can use the capacity added in a very fluid way.
And what's even more cool is that the auto scaling group can be configured to use the load balancer health checks rather than EC2 status checks. Application load balancer checks can be much richer — they can monitor the state of HTTP or HTTPS requests. And because of this, they're application aware, which simple status checks which EC2 provides are not.
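For reference, a hedged boto3 sketch of that wiring might look like this (the group name and target group ARN are placeholder assumptions); instances launched by the group become targets automatically, and the HealthCheckType of ELB shown earlier switches the group to the load balancer's application-aware checks.

import boto3

autoscaling = boto3.client("autoscaling")

# Attach the auto scaling group to a load balancer target group: launched
# instances are registered as targets, terminated instances are deregistered.
autoscaling.attach_load_balancer_target_groups(
    AutoScalingGroupName="catpics-asg",
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/catpics/abc123"],
)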
Be careful though — you need to use an appropriate load balancer health check. If your application has some complex logic within it and you're only testing a static HTML page, then the health check could respond as okay even though the application might be in a failed state. And the inverse of this — if your application uses databases and your health check checks a page with some database access requirements — well, if the database fails then all of your health checks could fail, meaning all of your EC2 instances will be terminated and reprovisioned when the problem is with the database, not the instances. And so you have to be really careful when it comes to setting up health checks.
Now the next thing I want to talk about is scaling processes within an auto scaling group. So you have a number of different processes or functions performed by the auto scaling group, and these can be set to either be suspended or they can be resumed.
So first we've got launch and terminate, and if launch is set to suspend, then the auto scaling group won't scale out if any alarms or schedule actions take place. And the inverse is if terminate is set to suspend, then the auto scaling group will not terminate any instances. We've also got add to load balancer, and this controls whether any instances provisioned are added to the load balancer. Next we've got alarm notification, and this controls whether the auto scaling group will react to any CloudWatch alarms. We've also got AZ rebalance, and this controls whether the auto scaling group attempts to redistribute instances across availability zones.
We've got health check, and this controls whether instance health checks across the entire group are on or off. We've also got replace unhealthy, which controls whether the auto scaling group will replace any instances marked as unhealthy. We've got scheduled actions, which controls whether the auto scaling group will perform any scheduled actions or not. And then in addition to those, you can set a specific instance to either be standby or in service, and this allows you to suspend any activities of the auto scaling group on a specific instance.
So this is really useful — if you need to perform maintenance on one or more EC2 instances, you can set them to standby, and that means they won't be affected by anything that the auto scaling group does.
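As a hedged boto3 sketch of those controls (the names and instance ID are placeholder assumptions), you might suspend launch and terminate, put an instance into standby for maintenance, and then reverse both afterwards.

import boto3

autoscaling = boto3.client("autoscaling")

# Stop the group from launching or terminating instances while we work.
autoscaling.suspend_processes(
    AutoScalingGroupName="catpics-asg",
    ScalingProcesses=["Launch", "Terminate"],
)

# Put a specific instance into standby; decrementing desired capacity stops
# the group from launching a replacement while it's out of service.
autoscaling.enter_standby(
    AutoScalingGroupName="catpics-asg",
    InstanceIds=["i-0123456789abcdef0"],
    ShouldDecrementDesiredCapacity=True,
)

# ...perform maintenance, then return the instance to service and resume the processes.
autoscaling.exit_standby(
    AutoScalingGroupName="catpics-asg",
    InstanceIds=["i-0123456789abcdef0"],
)
autoscaling.resume_processes(
    AutoScalingGroupName="catpics-asg",
    ScalingProcesses=["Launch", "Terminate"],
)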
Now before we finish, I just want to talk about a few final points, and these are really useful for the exam. Auto scaling groups are free — the only costs are for the resources created by the auto scaling group, and to avoid excessive costs, use cooldowns within the auto scaling group to avoid rapid scaling.
To be cost-effective, you should also think about using more smaller instances, because this means you have more granular control over the amount of compute and therefore costs that are incurred by your auto scaling group. So if you have two larger instances and you need to add one, that's going to cost you a lot more than if you have 20 smaller instances and only need to add one. Smaller instances mean more granularity, which means you can adjust the amount of compute in smaller steps, and that makes it a more cost-effective solution.
Now auto scaling groups are used together with application load balancers for elasticity, so the load balancer provides the level of abstraction away from the instances provisioned by the auto scaling group, so together they're used to provision elastic architectures.
And lastly, an auto scaling group controls the when and the where — so when instances are launched and which subnets they're launched into. Launch templates or launch configurations define the what — so what instances are launched and what configuration those instances have.
Now at this point, that's everything I wanted to cover in this lesson — it's been a huge amount of theory for one lesson, but these are really essential concepts that you need to understand for the exam. So go ahead and complete this lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to cover two features of EC2, launch configurations and launch templates; now they both perform a similar thing, but launch templates came after launch configurations and include extra features and capabilities. Now I want this lesson to be fairly brief, as launch configurations and launch templates are actually relatively easy to understand, and what we're going to be covering in the next lesson is auto scaling groups which utilize either launch configurations or launch templates. So I'll try to keep this lesson as focused as possible, but let's jump in and get started.
Launch configurations and launch templates at a high level perform the same task: they allow the configuration of EC2 instances to be defined in advance, with documents which let you configure things like the AMI to use, the instance type and size, the configuration of the storage which instances use, and the key pair which is used to connect to that instance. They also let you define the networking configuration and security groups that an instance uses, configure the user data which is provided to the instance, and the IAM role which is attached to the instance used to provide the instance with permissions. Everything which you usually define at the point of launching an instance, you can define in launch configurations and launch templates.
Now both of these are not editable; you define them once and that configuration is locked. Launch templates, as the newer of the two, allow you to have versions, but for launch configurations, versions aren't available. Launch templates also have additional features and allow you to control features of the newer types of instances, such as T2 or T3 unlimited CPU options, placement groups, capacity reservations, and elastic graphics.
AWS recommend using launch templates at this point in time because they're a superset of launch configurations; they provide all of the features that launch configurations provide and more. Architecturally, launch templates also offer more utility. Launch configurations have one use: they're used as part of auto scaling groups, which we'll be talking about later in this section. Auto scaling groups offer automatic scaling for EC2 instances, and launch configurations provide the configuration of those EC2 instances which will be launched by auto scaling groups. And as a reminder, they're not editable, nor do they have any versioning capability; if you need to adjust the configuration inside a launch configuration, you need to create a new one and use that new launch configuration.
Now launch templates—they can also be used for the same thing, so providing EC2 configuration which is used within auto scaling groups—but in addition, they can also be used to launch EC2 instances directly from the console or the CLI, so good old Bob can define his instance configuration in advance and use that when launching EC2 instances.
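As a rough boto3 sketch (the AMI ID, instance type, key name, and security group are placeholder assumptions), a launch template is defined once and can then be used both by an auto scaling group and to launch an instance directly.

import boto3

ec2 = boto3.client("ec2")

# Define the instance configuration once, in advance.
ec2.create_launch_template(
    LaunchTemplateName="catpics-template",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.micro",
        "KeyName": "example-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# Unlike launch configurations, launch templates can also launch instances directly.
ec2.run_instances(
    LaunchTemplate={"LaunchTemplateName": "catpics-template", "Version": "$Latest"},
    MinCount=1,
    MaxCount=1,
)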
Now you'll get the opportunity to create and use launch templates in the series of demo lessons later in this section; for now, I just wanted to cover all of the theory back to back so you can appreciate how it all fits together.
That's everything though that I wanted to cover in this lesson about launch configurations and launch templates. In the next lesson, I'll be talking about auto scaling groups which are closely related; both of them work together to allow EC2 instances to scale in response to the incoming load on a system. But for now, go ahead and finish this video and when you're ready, I look forward to speaking to you in the next.
-
-
learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover application and network load balancers in a little bit more detail. It's critical for the exam that you understand when to pick application load balancers and when to pick network load balancers, as they're used for massively different situations. Now, we do have a lot to cover, so let's jump in and get started.
I want to start by talking about consolidation of load balancers. Historically, when using classic load balancers, you connected instances directly to the load balancer or you integrated an auto scaling group directly with that load balancer, an architecture which looked something like this: a single domain name, categor.io, using a single classic load balancer with an attached single SSL certificate for that domain, and then an auto scaling group attached to that, with the classic load balancer distributing connections over those instances.
The problem is that this doesn't scale because classic load balancers don't support SNI, and you can't have multiple SSL certificates per load balancer, meaning every single unique HTTPS application that you have requires its own classic load balancer, which is one of the many reasons that classic load balancers should be avoided. In this example, we have Catergram and Dogagram, both of which are HTTPS applications, and the only way to use these is to have two different classic load balancers.
Compare this to the same application architecture, with both applications—Catergram and Dogagram—this time using a single application load balancer. This one handles both applications, and we can use listener-based rules, which I’ll talk about later in the lesson, where each of these listener-based rules can have an SSL certificate handling HTTPS for both domains. Then we can have host-based rules which direct incoming connections at multiple target groups that forward these on to multiple auto scaling groups, which is a two-to-one consolidation—halving the number of load balancers required to deliver these two different applications.
But imagine how this would look if we had a hundred legacy applications and each of these used a classic load balancer; moving from version one to version two offers significant advantages, and one of those is consolidation. So now I just want to focus on some of the key points about application load balancers—things which are specific to the version two or application load balancer.
First, it's a true layer seven load balancer and it's configured to listen on either HTTP or HTTPS protocols, which are layer seven application protocols that an application load balancer understands and can interpret information carried using both. Now, the flip side is that the application load balancer can't understand any other layer seven protocols—so things such as SMTP, SSH, or any custom gaming protocols are not supported by a layer seven load balancer like the application load balancer, and that's important to understand.
Additionally, the application load balancer has to listen using HTTP or HTTPS listeners; it cannot be configured to directly listen using TCP, UDP, or TLS, and that does have some important limitations and considerations that you need to be aware of, which I’ll talk about later on in this lesson.
Because it's a layer seven load balancer, it can understand layer seven content—so things like the type of the content, any cookies used by your application, custom headers, user location, and application behavior—meaning the application load balancer is able to inspect all of the layer seven application protocol information and make decisions based on that, something that the network load balancer cannot do. It has to be a layer seven load balancer, like the application load balancer, to understand all of these individual components.
An important consideration about the application load balancer is that any incoming connections—HTTP or HTTPS (and remember HTTPS is just HTTP transiting using SSL or TLS)—in all of these cases, whichever type of connection is used, are terminated on the application load balancer. This means that you can't have an unbroken SSL connection from your customer through to your application instances—it’s always terminated on the load balancer, and then a new connection is made from the load balancer through to the application.
This matters to security teams, and if your business operates in a strict security environment, this might be very important and, in some cases, can exclude using an application load balancer. It can't do end-to-end unbroken SSL encryption between a client and your application instances, and it also means that all application load balancers which use HTTPS must have SSL certificates installed on the load balancer, because the connection has to be terminated there and then a new connection made to the instances.
Application load balancers are also slower than network load balancers because additional levels of the networking stack need to be processed, and the more levels involved, the more complexity and the slower the processing. So if you're facing any exam questions that are really strict on performance, you might want to look at network load balancers instead.
A benefit of application load balancers is that, because they're layer seven, they can evaluate the application health at layer seven—in addition to just testing for a successful network connection, they can make an application layer request to the application to ensure that it's functioning correctly.
Application load balancers also have the concept of rules, which direct connections arriving at a listener—so if you make a connection to a load balancer, what it does with that connection is determined by rules, which are processed in priority order. You can have many rules affecting a given set of traffic, and they’re processed in priority order, with the last one being the default catch-all rule, though you can add additional rules, each of which can have conditions.
Conditions inside rules include checking host headers, HTTP headers, HTTP request methods, path patterns, query strings, and even source IP, meaning these rules can take different actions depending on the domain name requested (like categor.io or dogogram.io), the path (such as images or API), the query string, or even the source IP address of any customers connecting to that application load balancer.
Rules can also have actions—these are the things the rules do with traffic: they can forward that traffic to a target group, redirect it to something else (maybe another domain name), provide a fixed HTTP response (like an error or success code), or perform authentication using OpenID or Cognito.
Visually, this is how it looks: a simple application load balancer deployment with a single domain, categor.io, using one host-based rule with an attached SSL certificate. The rule uses host header as a condition and forward as an action, forwarding any connections for categor.io to the target group for the categor application.
If you want additional functionality, let’s say that you want to use the same application load balancer for a corporate client trying to access categor.io—maybe users of Bowtie Incorporated using the 1.3.3.7 IP address are attempting to access it, and you want to present them with an alternative version of the application. You can handle that by defining a listener rule where the condition is the source IP address of 1.3.3.7, and the action forwards traffic to a separate target group—an auto scaling group handling a second set of instances dedicated to this corporate client.
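As a hedged boto3 sketch of those two rules (the listener and target group ARNs, rule priorities, and the /32 source range are placeholder assumptions):

import boto3

elbv2 = boto3.client("elbv2")

LISTENER_ARN = "arn:aws:elasticloadbalancing:us-east-1:111122223333:listener/app/example/abc123/def456"

# Host-based rule: forward categor.io requests to the main target group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,
    Conditions=[{"Field": "host-header", "HostHeaderConfig": {"Values": ["categor.io"]}}],
    Actions=[{"Type": "forward", "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/categor/111111"}],
)

# Source IP rule (lower priority number, so evaluated first): send the corporate
# client at 1.3.3.7 to a dedicated target group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=5,
    Conditions=[{"Field": "source-ip", "SourceIpConfig": {"Values": ["1.3.3.7/32"]}}],
    Actions=[{"Type": "forward", "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/categor-corp/222222"}],
)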
Because the application load balancer is a layer seven device, it can see inside the HTTP protocol and make decisions based on anything in that protocol or up to layer seven. Also, the connection from the load balancer to the instances for target group two will be a separate set of connections—highlighted by a slightly different color—because HTTP connections from enterprise users are terminated on the load balancer, with a separate connection to the application instances. There’s no option to pass through encrypted connections to the instances—it must be terminated—so if you need unbroken encrypted connections, you must use a network load balancer.
Since it’s a layer seven load balancer, you can use rules that work on layer seven protocol elements, like routing based on paths or headers, or redirecting traffic at the HTTP level. For example, if this ALB also handles traffic for dogogram.io, you could define a rule that matches dogogram.io and, as an action, configure a redirect toward categor.io—the obviously superior website. These are just a small subset of features available within the application load balancer, and because it's layer seven, you can perform routing decisions based on anything observable at that level, making it a really flexible product.
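And the redirect example might look roughly like this, again as a sketch with placeholder ARNs and priority:

import boto3

elbv2 = boto3.client("elbv2")

# Host-based redirect: send any dogogram.io request to categor.io with an HTTP 301.
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:listener/app/example/abc123/def456",
    Priority=20,
    Conditions=[{"Field": "host-header", "HostHeaderConfig": {"Values": ["dogogram.io"]}}],
    Actions=[{
        "Type": "redirect",
        "RedirectConfig": {
            "Protocol": "HTTPS",
            "Host": "categor.io",
            "Port": "443",
            "Path": "/#{path}",
            "Query": "#{query}",
            "StatusCode": "HTTP_301",
        },
    }],
)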
Before finishing, let’s take a look at network load balancers. They function at layer four, meaning they can interpret TCP, TLS, and UDP protocols, but have no visibility or understanding of HTTP or HTTPS. They can't interpret headers, see cookies, or handle session stickiness from an HTTP perspective, as that requires cookie awareness—which a layer four device doesn’t have.
Network load balancers are incredibly fast, capable of handling millions of requests per second with about 25% of the latency of application load balancers, since they don't deal with upper layers of the stack. They’re ideal for non-HTTP or HTTPS protocols—such as SMTP (email), SSH, game servers, or financial applications that don’t use web protocols.
If exam questions refer to non-web or non-secure web traffic that doesn’t use HTTP/HTTPS, default to network load balancers. One downside is that health checks only verify ICMP and basic TCP handshaking, not application awareness, so no detailed health checking is possible.
A key benefit is that they can be allocated static IPs, which is useful for white-listing—corporate clients can white-list NLB IPs to let them pass through firewalls, which is great for strict security environments. They can also forward TCP directly to instances, and because network layers build on top of each other, the network load balancer doesn’t interrupt any layers above TCP, allowing unbroken encrypted channels from clients to application instances.
This is essential to remember for the exam—using network load balancers with TCP listeners is how you achieve end-to-end encryption. They're also used for PrivateLink to provide services to other VPCs—another crucial exam point.
To wrap up, let’s do a quick comparison. I find it easier to remember when to use a network load balancer, and if it’s not one of those cases, default to application load balancers for their added flexibility. Use network load balancers if you need unbroken encryption between client and instance, static IPs for white-listing, the best performance (millions of RPS and low latency), non-HTTP/HTTPS protocols, or PrivateLink.
For everything else, use application load balancers—their functionality is often valuable in most scenarios. That’s everything I wanted to cover about application and network load balancers for the exam. Go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back. This is part two of this lesson, and we're going to continue immediately from the end of part one, so let's get started.
This time, we have a typical multi-tiered application. We start with a VPC and inside that two availability zones. On the left, we also have an Internet-facing Load Balancer. Then we have a Web Instance Auto Scaling Group providing the front-end capability of the application. Then we have another Load Balancer, this time an internal Load Balancer, with only private IP addresses allocated to the nodes. Next, we have an Auto Scaling Group for the application instances. These are used by the web servers for the application. Then on the right, we have a pair of database instances. In this case, let's assume they're both Aurora database instances. So we have three tiers: Web, Application, and Database.
Now, without Load Balancers, everything would be tied to everything else. Our user, Bob, would have to communicate with a specific instance in the web tier; if this failed or scaled, then Bob's experience would be disrupted. The instance that Bob is connected to would itself connect to a specific instance in the application tier, and if that instance failed or scaled, then again, Bob's experience would be disrupted.
What we can do to improve this architecture is to put Load Balancers between the application tiers to abstract one tier from another. And how this changes things is that Bob actually communicates with an ELB node, and this ELB node sends this connection through to a particular web server. But Bob has no knowledge of which web server he's actually connected to because he's communicating via a Load Balancer. If instances are added or removed, then he would be unaware of this fact because he's abstracted away from the physical infrastructure by the Load Balancer.
Now, the web instance that Bob is using would need to communicate with an instance of the application tier, and it would do this via an internal Load Balancer. And again, this represents an abstraction of communication. So in this case, the web instance that Bob is connected to isn't aware of the physical deployment of the application tier; it's not aware of how many instances exist, nor which one it's actually communicating with. And then at this point, to complete this architecture, the application server that's being used would use the database tier for any persistent data storage needs.
Now, without using Load Balancers with this architecture, all the tiers are tightly coupled together. They need an awareness of each other. Bob would be connecting to a specific instance in the web tier; this would be connecting to a specific instance in the application tier, and all of these tiers would need to have an awareness of each other. Load Balancers remove some of this coupling; they loosen the coupling. And this allows the tiers to operate independently of each other because of this abstraction. Crucially, it allows the tiers to scale independently of each other.
In this case, for example, it means that if the load on the application tier increased beyond the ability of two instances to service that load, then the application tier could grow independently of anything else—in this case scaling from two to four instances. The web tier could continue using it with no disruption or reconfiguration because it's abstracted away from the physical layout of this tier, because it's communicating via a Load Balancer. It has no awareness of what's happening within the application tier.
Now, we're going to talk about these architectural implications in depth later on in this section of the course. But for now, I want you to be aware of the architectural fundamentals. And one other fundamental that I want you to be completely comfortable with is cross zone load balancing, and this is a really essential feature to understand.
So let's look at an example visually: Bob accessing a WordPress blog, in this case, The Best Cats. And we can assume because this is a really popular and well-architected application that it's going to be using a load balancer. So Bob uses his device and browsers to the DNS name for the application, which is actually the DNS name of the load balancer.
We know now that a load balancer by default has at least one node per availability zone that it's configured for. So in this example, we have a cut down version of the Animals for Life VPC, which is using two availability zones. So in this case, an application load balancer will have a minimum of two nodes, one in each availability zone. And the DNS name for the load balancer will direct any incoming requests equally across all of the nodes of the load balancer.
So in this example, we have two nodes, one in each availability zone. Each of these nodes will receive a portion of incoming requests based on how many nodes there are. For two nodes, it means that each node gets 100% divided by two, which represents 50% of the load that's directed at each of the load balancer nodes.
Now, this is a simple example. In production situations, you might have more availability zones being used, and at higher volume, so higher throughput, you might have more nodes in each availability zone. But this example keeps things simple. So however much incoming load is directed at the load balancer DNS name, each of the load balancer nodes will receive 50% of that load.
Now, originally load balancers were restricted in terms of how they could distribute the connections that they received. Initially, the way that it worked is that each load balancer node could only distribute connections to instances within the same availability zone.
Now, this might sound logical, but consider this architecture where we have four instances in availability zone A and one instance in availability zone B. This would mean that the load balancer node in availability zone A would split its incoming connections across all instances in that availability zone, which is four ways. And the node in availability zone B would also split its connections up between all the instances in the same availability zone. But because there's only one, that would mean 100% of its connections to the single EC2 instance.
Now, with this historic limitation, it means that node A would get 50% of the overall connections and would further split this down four ways, which means each instance would be allocated 12.5% of the overall load. Node B would also receive 50% of the overall load, and normally it would split that down across all instances also in that same availability zone. But because there's only one, that one instance would get 100% of that 50%. So all of the instances in availability zone A would receive 12.5% of the overall load, and the instance in availability zone B would receive 50% of the overall load.
So this represents a substantially uneven distribution of the incoming load because of this historic limitation of how load balancer nodes could distribute traffic. And the fix for that was a feature known as cross zone load balancing.
Now, the name gives away what this does. It simply allows every load balancer node to distribute any connections that it receives equally across all registered instances in all availability zones. So in this case, it would mean that the node in availability zone A could distribute connections to the instance in AZB, and the node in AZB could distribute connections to instances in AZA. And this represents a much more even distribution of incoming load. And this is known as cross zone load balancing, the ability to distribute or load balance across availability zones.
Now, this is a feature which originally was not enabled by default. But if you're deploying an application load balancer, this comes enabled as standard. But you still need to be aware of it for the exam because it's often posed as a question where you have a problem—an uneven distribution of load—and you need to fix it by knowing that this feature exists. So it's really important that you understand it for the exam.
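For reference, here's a hedged sketch of turning this on where it isn't enabled by default, for example on a network load balancer (the ARN is a placeholder; application load balancers already have cross-zone load balancing enabled).

import boto3

elbv2 = boto3.client("elbv2")

# Let every node distribute connections across registered targets in all
# availability zones, not just its own.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/example-nlb/abc123",
    Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
)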
So before we finish up with this lesson, I just want to reconfirm the most important architectural points about elastic load balancers. If there are only a few things that you take away from this lesson, these are the really important points.
Firstly, when you provision an elastic load balancer, you see it as one device which runs in two or more availability zones, specifically one subnet in each of those availability zones. But what you're actually creating is one elastic load balancer node in one subnet in each availability zone that that load balancer is configured in. You're also creating a DNS record for that load balancer which spreads the incoming requests over all of the active nodes for that load balancer. Now you start with a certain number of nodes, let's say one node per availability zone, but it will scale automatically if additional load is placed on that load balancer.
Remember that cross-zone load balancing means nodes can distribute requests across to other availability zones. Historically this was disabled by default, meaning connections could be relatively imbalanced, but for application load balancers, cross-zone load balancing is enabled by default.
Now load balancers come in two types. Internet facing, which just means that the nodes are allocated with public IP version 4 addresses. That's it. It doesn't change where the load balancer is placed, it just influences the IP addressing for the nodes of that load balancer. Internal load balancers are the same, only their nodes are only allocated private IP addresses.
Now one of the most important things to remember about load balancers is that an internet facing load balancer can communicate with public instances or private instances. EC2 instances don't need public IP addressing to work with an internet facing load balancer. An internet facing load balancer has public IP addresses on its nodes, it can accept connections from the public internet and balance these across both public and private EC2 instances. That's really important to understand for the exam, so you don't actually need public instances to utilize an internet facing load balancer.
Now load balancers are configured via listener configuration, which as the name suggests controls what those load balancers listen to. And again, I'll be covering this in much more detail later on in this section of the course.
And then lastly, remember the confusing part about load balancers: they require eight or more free IP addresses per subnet that they get deployed into. Strictly speaking, this means that a /28 subnet would be enough, but the AWS documentation suggests a /27 in order to allow scaling.
For now, that's everything that I wanted to cover, so go ahead and complete this lesson. And then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson, I want to talk about the architecture of elastic load balancers. Now I'm going to be covering load balancers extensively in this part of the course, so I want to use this lesson as a sort of foundation. I'm going to cover the high level logical and physical architecture of the product and either refresh your memory on some things or introduce some of the finer points of load balancing for the first time, and both of these are fine.
Now, before we start, it's the job of a load balancer to accept connections from customers and then to distribute those connections across any registered backend compute; it means that the user is abstracted away from the physical infrastructure, and it means that the amount of infrastructure can change, so increase or decrease in number without affecting customers. And because the physical infrastructure is abstracted, it means that infrastructure can fail and be repaired, all of which is hidden from customers.
So with that quick refresher done, let's jump in and get started covering the architecture of elastic load balancers. Now I'm going to be stepping through some of the key architectural points visually, so let's start off with a VPC, which uses two availability zones, AZA and AZB, and then in those availability zones, we've got a few subnets, two public and some private. Now let's add a user, Bob, together with a pair of load balancers.
Now, as I just mentioned, it's the job of a load balancer to accept connections from a user base and then distribute those connections to backend services; for this example, we're going to assume that those services are long running compute or EC2, but as you'll see later in this section, that doesn't have to be the case, as elastic load balancers, specifically application load balancers, support many different types of compute services—it's not only EC2.
Now, when you provision a load balancer, you have to decide on a few important configuration items: the first, you need to pick whether you want to use IP version four only or dual stack, and dual stack just means using IP version four and the newer IP version six. You also need to pick the availability zones which the load balancer will use; specifically, you're picking one subnet in two or more availability zones.
Now, this is really important because this leads in to the architecture of elastic load balancers, so how they actually work; based on the subnets that you pick inside availability zones, when you provision a load balancer, the product places into these subnets one or more load balancer nodes. So what you see as a single load balancer object is actually made up of multiple nodes, and these nodes live within the subnets that you pick.
So when you're provisioning a load balancer, you need to select which availability zones it goes into, and the way you do this is by picking one and one only subnet in each of those availability zones. So in the example that's on screen now, I've picked to use the public subnet in availability zone A and availability zone B, and so the product has deployed one or more load balancer nodes into each of those subnets.
Now when a load balancer is created, it actually gets created with a single DNS record; it's an A record, and this A record actually points at all of the elastic load balancer nodes that get created with the product. So any connections that are made using the DNS name of the load balancer are actually made to the nodes of that load balancer, and the DNS name resolves to all of the individual nodes.
It means that any incoming requests are distributed equally across all of the nodes of the load balancer, and these nodes are located in multiple availability zones and they scale within that availability zone, and so they're highly available. If one node fails, it's replaced; if the incoming load to the load balancer increases, then additional nodes are provisioned inside each of the subnets that the load balancer is configured to use.
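You can actually observe this yourself by resolving a load balancer's DNS name; as a small illustration using only the Python standard library (the hostname below is a made-up placeholder), name resolution returns one IP address per active node.

```python
import socket

# Hypothetical ALB DNS name; substitute the DNS name of your own load balancer.
alb_dns_name = "example-alb-1234567890.us-east-1.elb.amazonaws.com"

# gethostbyname_ex returns (hostname, alias_list, ip_address_list); each address
# corresponds to an active load balancer node, typically one per availability zone,
# with more appearing as the load balancer scales.
_, _, node_ips = socket.gethostbyname_ex(alb_dns_name)
print(node_ips)
```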
Now another choice that you need to make when creating a load balancer, and this is really important for the exam, is to decide whether that load balancer should be internet facing or whether it should be internal; this choice, so whether to use internet facing or internal, controls the IP addressing for the load balancer nodes. If you pick internet facing, then the nodes of that load balancer are given public addresses and private addresses; if you pick internal, then the nodes only have private IP addresses.
So that's the only difference; otherwise, they're the same architecturally—they have the same nodes and the same load balancer features, and the only difference between internet facing and internal is whether the nodes are allocated public IP addresses. Now, how the connections from our customers are handled when they arrive at the load balancer nodes is controlled by the listener configuration; as the name suggests, this configuration controls what the load balancer is listening to—so which protocols and ports will be accepted at the listener or front side of the load balancer.
Now there's a dedicated lesson coming up later in this section which focuses specifically on the listener configuration; at this point, I just wanted to introduce it. So at this point, Bob has initiated connections to the DNS name associated with the load balancer, and that means that he's made connections to load balancer nodes within our architecture.
Now at this point, the load balancer nodes can then make connections to instances that are registered with this load balancer, and the load balancer doesn't care about whether those instances are public EC2 instances—so allocated with a public IP address—or they're private EC2 instances—so instances which reside in a private subnet and only have private addressing.
I want to keep reiterating this because it's often a point of confusion for students who are new to load balancers: an internet-facing load balancer—and remember this means that it has nodes that have public addresses so it can be connected to from the public internet—can connect both to public and private EC2 instances. Instances that are used do not have to be public.
Now this matters because in the exam, when you face certain questions which talk about how many subnets or how many tiers are required for an application, it does test your knowledge that an internet-facing load balancer does not need public instances—it can work with both public and private instances. The only requirement is that the load balancer nodes can communicate with the back-end instances, and this can happen whether the instances have public addressing allocated or whether they're private-only instances.
The important thing is that if you want a load balancer to be reachable from the public internet, it has to be an internet-facing load balancer because logically it needs to be allocated with public addressing. Now load balancers in order to function need eight or more free IP addresses in the subnets that they're deployed into; now strictly speaking, this means a /28 subnet, which provides a total of 16 IP addresses, but minus the five reserved by AWS, this leaves 11 free per subnet.
But AWS suggests that you use a /27 or larger subnet to deploy an elastic load balancer in order that it can scale. Keep in mind that, strictly speaking, both /28 and /27 are correct in their own ways as the minimum subnet size for a load balancer; AWS do suggest in their documentation that you need a /27, but they also say you need a minimum of eight free IP addresses.
Now logically, a /28, which leaves 11 free, won't give you the room to deploy a load balancer and back end instances, so in most cases, I try to remember /27 as the correct value for the minimum for a load balancer. But if you do see any questions which show a /28 and don't show a /27, then /28 is probably the right answer.
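If you want to sanity-check those numbers, here's a quick sketch using Python's ipaddress module; the subnet ranges are arbitrary examples.

```python
import ipaddress

AWS_RESERVED = 5  # AWS reserves 5 addresses in every subnet

for cidr in ("10.0.0.0/28", "10.0.0.0/27"):
    subnet = ipaddress.ip_network(cidr)
    total = subnet.num_addresses      # 16 for a /28, 32 for a /27
    free = total - AWS_RESERVED       # 11 for a /28, 27 for a /27
    print(f"{cidr}: {total} total addresses, {free} free after AWS reservations")
```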
Now internal load balancers are architecturally just like internet facing load balancers, except they only have private IPs allocated to their nodes, and so internal load balancers are generally used to separate different tiers of applications. So in this example, our user Bob connects via the internet facing load balancer to the web server, and then this web server can connect to an application server via an internal load balancer, and this allows us to separate application tiers and allow for independent scaling.
Okay, so this is the end of part one of this lesson; it was getting a little bit on the long side and I wanted to give you the opportunity to take a small break, maybe stretch your legs or make a coffee. Now part two will continue immediately from this point, so go ahead, complete this video, and when you're ready, I'll look forward to you joining me in part two.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to spend a few minutes just covering the evolution of the Elastic Load Balancer product; it's important for the exam and real world usage that you understand its heritage and its current state. Now this is going to be a super quick lesson because most of the detail I'm going to be covering in dedicated lessons which are coming up next in this section of the course, so let's jump in and take a look.
Now there are currently three different types of Elastic Load Balancers available within AWS; if you see the term ELB or Elastic Load Balancers then it refers to the whole family, all three of them. Now the load balancers are split between version 1 and version 2; you should avoid using the version 1 load balancer at this point and aim to migrate off it onto the version 2 products, which should be preferred for any new deployments, and there are no scenarios at this point where you would choose to use a version 1 load balancer over one of the version 2 types.
Now the load balancer product started with the classic load balancer known as CLB which is the only version 1 load balancer and this was introduced in 2009, so it's one of the older AWS products. Now classic load balancers can load balance HTTP and HTTPS as well as lower level protocols but they aren't really layer 7 devices, they don't really understand HTTP and they can't make decisions based on HTTP protocol features; they lack much of the advanced functionality of the version 2 load balancers and they can be significantly more expensive to use.
One common limitation is that classic load balancers only support one SSL certificate per load balancer, which means for larger deployments you might need hundreds or thousands of classic load balancers, and these could be consolidated down to a single version 2 load balancer. So I can't stress this enough: for any questions or any real-world situations, you should default to not using classic load balancers.
Now this brings me on to the new version 2 load balancers; the first is the application load balancer or ALB and these are truly layer 7 devices, so application layer devices; they support HTTP, HTTPS and the WebSocket protocol, and they're generally the type of load balancer that you'd pick for any scenarios which use any of these protocols.
There's also network load balancers or NLBs which are also version 2 devices but these support TCP, TLS which is a secure form of TCP and UDP protocols, so network load balancers are the type of load balancer that you would pick for any applications which don't use HTTP or HTTPS; for example if you wanted to load balance email servers or SSH servers or a game which used a custom protocol so didn't use HTTP or HTTPS then you would use a network load balancer.
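To make the ALB versus NLB distinction concrete, here's a hedged boto3 sketch of creating a network load balancer with a TCP listener; the name, subnets and target group ARN are placeholders, and this mirrors the earlier application load balancer example in this section.

```python
import boto3

elbv2 = boto3.client("elbv2")

# A network load balancer for a non-HTTP workload, e.g. email servers or a custom TCP protocol.
nlb = elbv2.create_load_balancer(
    Name="example-nlb",
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
    Scheme="internet-facing",
    Type="network",             # "application" would create an ALB instead
)

# NLB listeners use TCP, TLS or UDP; ALB listeners use HTTP or HTTPS.
elbv2.create_listener(
    LoadBalancerArn=nlb["LoadBalancers"][0]["LoadBalancerArn"],
    Protocol="TCP",
    Port=25,                    # e.g. load balancing SMTP email servers
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/example-tcp/0123456789abcdef",  # placeholder
    }],
)
```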
In general version 2 load balancers are faster and support target groups and rules, which allow you to use a single load balancer for multiple things or handle the load balancing differently based on which customers are using it. Now I'm going to be covering the capabilities of each of the version 2 load balancers separately as well as talking about rules, but I wanted to introduce them now as a feature.
Now for the exam you really need to be able to pick between network load balancers and application load balancers for a specific situation, so that's what I want to work on over the coming lessons; for now though, this has just been an introduction lesson that talks about the evolution of these products, and that's everything that I wanted to cover in this lesson, so go ahead and complete this lesson, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I want to talk about the regional and global AWS architecture, so let's jump in and get started. Now throughout this lesson, I want you to think about an application that you're familiar with, which is global, and for this example, I'll be talking about Netflix, because this is an application that most people have at least heard of. Now, Netflix can be thought of as a global application, but it's also a collection of smaller regional applications which make up the Netflix global platform, so these are discrete blocks of infrastructure which operate independently and are duplicated across different regions around the world.
As a solutions architect, when we're designing solutions, I find that there are three main types of architectures: small scale architectures which will only ever exist in one region or one country, then systems which also exist in one region or country but where there's a DR requirement so if that region fails for some reason, then it fails over to a second region, and then lastly, systems that operate within multiple regions and need to operate through failure in one or more of those regions.
Now, depending on how you architect systems, there are a few major architectural components which will map onto AWS products and services, so at a global level, first we have global service location and discovery—when you type Netflix.com into your browser, what happens and how does your machine discover where to point at?
Next, we've got content delivery—how does the content or data for an application get to users globally, and are there pockets of storage distributed globally or is it pulled from a central location?
Lastly, we've got global health checks and failover—detecting if infrastructure in one location is healthy or not and moving customers to another country as required, so these are the global components.
Next, we have regional components starting with the regional entry point, then we have regional scaling and regional resilience and then the various application services and components, so as we go through the rest of the course, we're going to be looking at specific architectures and as we do, I want you to think about them in terms of global and regional components— which parts can be used for global resilience and which parts are local only.
So let's take a look at this visually starting with the global elements, and let's keep using Netflix as an example and say that we have a group of users who are starting to settle down for the evening and want to watch the latest episode of Ozarks, so the Netflix client will use DNS for the initial service discovery, and Netflix will have configured the DNS to point at one or more service endpoints.
Let's keep things simple at this point and assume that there is a primary location for Netflix in a US region of AWS, maybe US East One, and this will be used as the primary location and if this fails, then Australia will be used as a secondary. Now, another valid configuration would be to send customers to their nearest location—in this case, sending our TV fans to Australia—but in this case, let's just assume we have a primary and a secondary region.
So this is the DNS component of this architecture and Route 53 is the implementation within AWS, and because of its flexibility, it can be configured to work in any number of ways. The key thing for this global architecture, though, is that it has health checks, so it can determine if the US region is healthy and direct all sessions to the US while this is the case or direct sessions to Australia if there are problems with the primary region.
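As a hedged sketch of that primary and secondary failover pattern in Route 53 (the hosted zone ID, domain, health check ID and IP addresses are all placeholders), the record configuration might look something like this.

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical hosted zone; the health check referenced below would monitor the
# primary (US) region's entry point, e.g. an HTTPS health endpoint.
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "primary-us",
                    "Failover": "PRIMARY",            # used while its health check passes
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                    "HealthCheckId": "11111111-2222-3333-4444-555555555555",  # placeholder
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "secondary-au",
                    "Failover": "SECONDARY",          # only returned if the primary is unhealthy
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.20"}],
                },
            },
        ]
    },
)
```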
Now regardless of where infrastructure is located, a content delivery network can be used at the global level, and this ensures that content is cached locally as close to customers as possible, and these cache locations are located globally and they all pull content from the origin location as required.
So just to pause here briefly, this is a global perspective—the function of the architecture at this level is to get customers through to a suitable infrastructure location, making sure any regional failures are isolated and sessions moved to alternative regions, and it attempts to direct customers at a local region, at least if the business has multiple locations, and lastly, it attempts to improve caching using a content delivery network such as CloudFront.
If this part of our architecture works well, customers will be directed towards a region that has infrastructure for our application, and let's assume this region is one of the US ones. At this point, the traffic is entering one specific region of the AWS infrastructure, and depending on the architecture, this might be entering into a VPC or using public space AWS services, but in either case, now we're architecturally zoomed in, and so we have to think about this architecture now in a regional sense.
The most effective way to think about systems architecture is as a collection of regions making up a whole—if you think about AWS products and services, very few of them are actually global; most of them run in a region, and many of those regions together make up AWS.
Now it's efficient to think in this way, and it makes designing a large platform much easier. For the remainder of this course, we're going to be covering architecture in depth—how things work, how things integrate, and what features products provide.
Now the environments that you will design will generally have different tiers, and tiers in this context are high level groupings of functionality or different zones of your application.
Initially, communications from your customers will generally enter at the web tier—generally this will be a regional based AWS service such as an application load balancer or API gateway, depending on the architecture that the application uses.
The purpose of the web tier is to act as an entry point for your regional based applications or application components, and it abstracts your customers away from the underlying infrastructure, meaning that the infrastructure behind it can scale or fail or change without impacting customers.
Now the functionality provided to the customer via the web tier is provided by the compute tier, using services such as EC2, Lambda, or containers which use the elastic container service, so in this example, the load balancer will use EC2 to provide compute services through to our customers.
Now we'll talk throughout the course about the various different types of compute services which you can and should use for a given situation, but the compute tier though will consume storage services, another part of all AWS architectures, and this tier will use services such as EBS, which is the elastic block store, EFS, which is the elastic file system, and even S3 for things like media storage.
You'll also find that many global architectures utilize CloudFront, the global content delivery network within AWS, and CloudFront is capable of using S3 as an origin for media, so Netflix might store movies and TV shows on S3 and these will be cached by CloudFront.
Now all of these tiers are separate components of an application and can consume services from each other, and so CloudFront can directly access S3 in this case to fetch content for delivery to a global audience.
Now in addition to file storage, most environments require data storage and within AWS this is delivered using products like RDS, Aurora, DynamoDB and Redshift for data warehousing, but in order to improve performance, most applications don't directly access the database—instead, they go via a caching layer, so products like ElastiCache for general caching or DynamoDB Accelerator known as DAX when using DynamoDB.
This way, reads to the database can be minimized, and applications will instead consult the cache first and only if the data isn't present in the cache will the database be consulted and the contents of the cache updated.
Now caching is generally in memory, so it's cheap and fast—databases tend to be expensive based on the volume of data required versus cache and normal data storage, so where possible, you need to offload reads from the database into the caching layer to improve performance and reduce costs.
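That read path is commonly called the cache-aside pattern. Here's a minimal sketch of the logic; the cache object is assumed to expose a redis-style get/set interface (as ElastiCache for Redis clients do), and the database object is a hypothetical stand-in for whatever data access layer you use.

```python
import json

def get_item(item_id, cache, database, ttl_seconds=300):
    """Cache-aside read: consult the cache first, fall back to the database,
    then populate the cache so later reads avoid the database entirely."""
    cached = cache.get(item_id)                 # e.g. a Redis GET against ElastiCache
    if cached is not None:
        return json.loads(cached)

    item = database.fetch(item_id)              # hypothetical database read, e.g. an RDS query
    cache.set(item_id, json.dumps(item), ex=ttl_seconds)  # expire so stale entries age out
    return item
```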
Now lastly, AWS have a suite of products designed specifically to provide application services—so things like Kinesis, Step Functions, SQS and SNS, all of which provide some type of functionality to applications, either simple functionality like email or notifications, or functionality which can change an application's architecture such as when you decouple components using queues.
Now as I mentioned at the start of this lesson, you're going to be learning about all of these components and how you can use them together to build platforms. For now, just think of this as an introduction lesson—I want you to get used to thinking of architectures from a global and regional perspective as well as understanding that application architecture is generally built using components from all of these different tiers: the web tier, the compute tier, caching, storage, the database tier and application services.
Now at this point, that's all of the theory that I wanted to go through—remember, this is just an introduction lesson, so go ahead, finish this lesson and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video, I want to talk at a high level about the AWS Backup product. Now, this is something that you need to have an awareness of for most of the AWS exams and to get started in the real world, but it's not something that you need to understand in depth for all of the AWS certifications. So let's jump in and cover the important points of the product.
So AWS Backup is a fully managed data protection service. At this level of study, you can think about it as a backup and restore product, but it also includes much more in the way of auditing and management oversight. The product allows you to consolidate the management and storage of your backups in one place, across multiple accounts and multiple regions if you configure it that way, so that's important to understand: the product is capable of being configured to operate across multiple accounts. It utilizes services like Control Tower and AWS Organizations to allow this, and it's also capable of copying data between regions to provide extra data protection. But the main day-to-day benefit that the product provides is this consolidation of management and storage within one place, so instead of having to configure backups of RDS in one place, DynamoDB in another, and organize some kind of script to take regular EBS snapshots, AWS Backup can do all this on your behalf.
Now, AWS Backup is capable of interacting with a wide range of AWS products, so many AWS services are fully supported—various compute services, so EC2 and VMware running within AWS, block storage such as EBS, file storage products such as EFS and the various different types of FSX, and then most of the AWS database products are supported such as Aurora, RDS, Neptune, DynamoDB and DocumentDB—and then even object storage is supported using S3. Now, all of these products can be centrally managed by AWS Backup, which means both the storage and the configuration of how the backup and retention operates.
Now, let's step through some of the key concepts and components of the AWS Backup product. First, we have one of the central components, and that's backup plans; it's on these where you can configure the frequency of backups, so how often backups are going to occur: every hour, every 12 hours, daily, weekly or monthly. You can also use a cron expression to create backups on a custom schedule, as frequently as hourly. Now, if you have any business backup experience, you might recognize this—if you select weekly, you can specify which days of the week you want backups to be taken, and if you specify monthly, you can choose a specific day of the month.
Now, you can also enable continuous backups for certain supported products, and this allows you to use a point-in-time restore feature, so if you've enabled continuous backups, then you can restore a supported service to a particular point in time within a window. Now, you can configure the backup window as well within backup plans, so this controls the time that backups begin and the duration of that backup window. You can configure life cycles, which define when a backup is transitioned to cold storage and when it expires—when you transition a backup into cold storage, it needs to be stored there for a minimum of 90 days.
Backup plans also set the vault to use—and more on this in a second—and they allow you to configure region copy, so you can copy backups from one region to another. Next, we have backup resources, and these are logically what is being backed up—so whether you want to back up an S3 bucket or an RDS database, that's what a resource is: the thing you want to back up.
Next, we have vaults, and you can think of vaults as the destination for backups; it's here where all the backup data is stored, and you need to configure at least one of these. Now, vaults by default are read and write, meaning that backups can be deleted, but you can also enable AWS Backup vault lock, and this is not to be confused with Glacier vault lock or S3 object locking. AWS Backup vault lock enables a write-once-read-many (WORM) mode for the vault—once enabled, you get a 72-hour cool-off period, but once fully active, nobody, including AWS, can delete anything from the vault, and this is designed for compliance-style situations.
Now, any data retention periods that you set still apply, so backups can age out, but setting this means that it's not possible to bypass or delete anything early, and the product is also capable of performing on-demand backups as required, so you're not limited to only using backup plans. Some services also support a point-in-time recovery method, and examples of this include S3 and RDS, and this means that you can restore to the state of that resource at a specific date and time within the retention window.
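Pulling those concepts together, here's a hedged boto3 sketch of a vault, a simple daily plan with a cold-storage transition, and a tag-based resource selection; the names, IAM role and schedule are placeholders.

```python
import boto3

backup = boto3.client("backup")

# The vault is the destination where backup data is stored.
backup.create_backup_vault(BackupVaultName="example-vault")

# The plan sets frequency, lifecycle and the target vault.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "example-daily-plan",
        "Rules": [{
            "RuleName": "daily-0500-utc",
            "TargetBackupVaultName": "example-vault",
            "ScheduleExpression": "cron(0 5 * * ? *)",   # daily at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": 30,
                "DeleteAfterDays": 120,   # must be at least 90 days after the cold storage transition
            },
        }],
    }
)

# Resources are selected separately, here by tag, using an IAM role AWS Backup assumes.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-resources",
        "IamRoleArn": "arn:aws:iam::111111111111:role/example-backup-role",  # placeholder
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": "backup",
            "ConditionValue": "true",
        }],
    },
)
```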
Now, with all of these features, the product is constantly evolving, and rather than have this video be out of date the second something changes, I've attached a few links which detail the current state of many of these features, and I'd encourage you to take a look if you want to understand the product's up-to-date capabilities at the time you're watching this video.
Now, this is all you need to understand as a base foundation for AWS Backup for all of the AWS exams. If you need additional knowledge—so more theory detail in general, perhaps more specialized deep-dive knowledge on the security elements of the product, or maybe some practical knowledge—then there will be additional videos. These will only be present if you need this additional knowledge for the particular course that you're studying—if you only see this video, don't worry, it just means that this is all you need to know.
At this point, though, that is everything I wanted to cover, so go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I'm going to be covering a really useful product within AWS, the Elastic File System, or EFS. It's a product which can prove useful for most AWS projects because it provides network-based file systems which can be mounted within Linux EC2 instances and used by multiple instances at once. For the Animals for Life WordPress example that we've been using throughout the course so far, it will allow us to store the media for posts outside of the individual EC2 instances, which means that the media isn't lost when instances are added and removed, and that provides significant benefits in terms of scaling as well as self-healing architecture. In summary, we're moving the EC2 instances to a point where they're closer to being stateless.
So let's jump in and step through the EFS architecture. The EFS service is an AWS implementation of a fairly common shared storage standard called NFS, the Network File System, specifically version 4 of the Network File System. With EFS, you create file systems which are the base entity of the product, and these file systems can be mounted within EC2 Linux instances. Linux uses a tree structure for its file system; devices can be mounted into folders in that hierarchy, and an EFS file system, for example, could be mounted into a folder such as /nfs/media. What's more impressive is that EFS file systems can be mounted on many EC2 instances, so the data on those file systems can be shared between lots of EC2 instances.
Now keep this in mind as we talk about evolving the architecture of the Animals for Life WordPress platform. Remember, it has a limitation that the media for posts, so images, movies, audio, they're all stored on the local instance itself; if the instance is lost, the media is also lost. EFS storage exists separately from an EC2 instance, just like EBS exists separately from EC2. Now EBS is block storage, whereas EFS is file storage, but Linux instances can mount EFS file systems as though they are connected directly to the instance.
EFS is a private service by default; it's isolated to the VPC that it's provisioned into. Architecturally, access to EFS file systems is via mount targets, which are network interfaces that run within subnets inside a VPC, but more on this next when we step through the architecture visually. Now even though EFS is a private service, you can access EFS file systems via hybrid networking methods that we haven't covered yet, so if your VPC is connected to other networks, then EFS can be accessed over those, using VPC peering, VPN connections, or AWS Direct Connect, which is a physical private networking connection between a VPC and your existing on-premises networks. Now don't worry about those hybrid products, I'll be covering all of them in detail later in the course.
For now though, just understand that EFS is accessible outside of a VPC using these hybrid networking products as long as you configure this access. So let's look at the architecture of EFS visually. Architecturally, this is how it looks: EFS runs inside a VPC, in this case the Animals for Life VPC. Inside EFS, you create file systems and these use POSIX permissions. If you don't know what this is, I've included a link attached to the lesson which provides more information. Super summarized though, it's a standard for interoperability which is used in Linux, so a POSIX permissions file system is something that all Linux distributions will understand.
Now the EFS file system is made available inside a VPC via mount targets, and these run from subnets inside the VPC. The mount targets have IP addresses taken from the IP address range of the subnet that they're inside, and to ensure high availability, you need to put mount targets in multiple availability zones; just like NAT gateways, for a fully highly available system you need to have a mount target in every availability zone that the VPC uses. Now it's these mount targets that instances use to connect to the EFS file systems.
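As a hedged boto3 sketch of that architecture (the creation token, subnet IDs and security group are placeholders), you create the file system once and then one mount target per availability zone that the VPC uses.

```python
import boto3

efs = boto3.client("efs")

# Create the file system; general purpose and bursting are the defaults discussed shortly.
fs = efs.create_file_system(
    CreationToken="a4l-wordpress-media",        # hypothetical idempotency token
    PerformanceMode="generalPurpose",
    ThroughputMode="bursting",
)

# One mount target per availability zone, each in a different subnet, so instances
# in every AZ can reach the file system via a local IP address.
for subnet_id in ["subnet-az-a", "subnet-az-b", "subnet-az-c"]:   # placeholder subnet IDs
    efs.create_mount_target(
        FileSystemId=fs["FileSystemId"],
        SubnetId=subnet_id,
        SecurityGroups=["sg-0123456789abcdef0"],  # placeholder group allowing NFS (TCP 2049)
    )
```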
Now it's also possible, as I touched on on the previous screen, that you might have an on-premises network and this generally would be connected to a VPC using hybrid networking products such as VPNs or Direct Connect, and any Linux-based server that's running on this on-premises environment can use this hybrid networking to connect through to the same mount targets and access EFS file systems.
Now before we move on to a demo where you'll get the practical experience of creating a file system and accessing it from multiple EC2 instances, there are a few things about EFS which you should know for the exam. First, EFS is for Linux-only instances; from an official AWS perspective, it's only supported using Linux instances. EFS offers two performance modes, general purpose and max IO. General purpose is ideal for latency-sensitive use cases such as web servers and content management systems, and it can be used for home directories or even general file serving, as long as you're using Linux instances. Now general purpose is the default and that's what we'll be using in this section of the course within the demos.
Max IO can scale to higher levels of aggregate throughput and operations per second but it does have a trade-off of increased latencies, so max IO mode suits applications that are highly parallel; so if you've got any applications or any generic workloads such as big data, media processing, scientific analysis, anything that's highly parallel, then it can benefit from using max IO — but for most use cases just go with general purpose.
There are also two different throughput modes, bursting and provisioned. Bursting mode works like GP2 volumes inside EBS, so it has a burst pool, and the throughput of this type scales with the size of the file system, so the more data you store in the file system, the better the performance you get. With provisioned, you can specify throughput requirements separately from the amount of data you store, so this is like the comparison between GP2 and IO1; it's more flexible, but it's not what's used by default, so generally you should pick bursting.
Now for the exam, you don't need to remember the raw numbers but I have linked some in the lesson description if you want additional information, so you can see the different performance characteristics of all of these different options.
Now Amazon EFS file systems have two storage classes available: we've got infrequent access, or IA, which is a lower cost storage class designed for storing things that are infrequently accessed; so if you need to store data in a cost effective way but you don't intend to access it often, then you can use infrequent access. Next we've got standard, and the standard storage class is used to store frequently accessed files; it's also the default and you should consider it the default when picking between the different storage classes. Conceptually, these mirror the trade-offs of the S3 object storage classes: use standard for data which is used day to day and infrequent access for anything which isn't used on a consistent basis. And just like S3, you have the ability to use life cycle policies which can be used to move data between classes.
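If you want to automate that movement between classes, a lifecycle policy is applied per file system; here's a minimal hedged sketch with a placeholder file system ID, transitioning files to infrequent access after 30 days without access.

```python
import boto3

efs = boto3.client("efs")

# Move files that haven't been accessed for 30 days to the Infrequent Access class.
efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",                    # placeholder file system ID
    LifecyclePolicies=[{"TransitionToIA": "AFTER_30_DAYS"}],
)
```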
Okay, so that's the theory of EFS. It's not all that difficult a product to understand but you do need to understand it architecturally for the exam, and so to help with that it's now time for a demo. I want you to really understand how EFS works — it's something that you probably will use if you use AWS for any real world projects. Now the best way to understand it is to use it, and so that's what we're going to do in the next lesson which is a demo. You're going to have the opportunity to create an EFS file system, provision some EC2 instances, and then mount that file system within both EC2 instances, create a test file and see that that's accessible from both of those instances, proving that EFS is a shared network file system.
But at this point, that's all of the theory that I wanted to cover, so go ahead and finish up this video, and when you're ready, I'll look forward to you joining me in the demo lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to cover a service which starts to feature more and more on the exam: the database migration service known as DMS. Now, this lesson is an extension of my lesson from the Associate Architect course, so even if you've taken that course and watched that lesson, you should still watch this lesson fully. Now, this product is something which, as well as being on the exam, if you're working as a Solutions Architect in the AWS space and if your projects involve databases, you will extensively use this product. It's something that you need to be aware of regardless, so let's jump in and get started.
Database migrations are complex things to perform. Normally, if we exclude the vendor tooling which is available, it's a manual process end to end. It usually involves setting up replication, which is pretty complex, or it means taking a point-in-time backup and restoring this to the destination database. But how do you handle changes which occur between taking that backup and when the new database is live? How do you handle migrations between different databases? These are all things where DMS comes in handy. It's essentially a managed database migration service. The concept is simple enough: it starts with a replication instance, which runs on EC2. This instance runs one or more replication tasks. You need to define source and destination endpoints, which point at the source and target databases, and the only real restriction with the service is that one of the endpoints must be running within AWS. You can't use the product for migrations between two on-premises databases. Now, you don't actually need to have any experience using the product, but there will be a demo lesson elsewhere in this section which gives you some practical exposure. For this theory lesson though, we need to focus on the architecture, so let's continue by reviewing that visually.
Using DMS is simple enough architecturally. You start with a source and target database, and one of those needs to be within AWS. The databases themselves can use a range of compatible engines, such as MySQL, Aurora, Microsoft SQL, MariaDB, MongoDB, PostgreSQL, Oracle, Azure SQL, and many more. Now, in between these, conceptually is the database migration service, known as DMS, which uses a replication instance—essentially an EC2 instance with migration software and the ability to communicate with the DMS service. Now on this instance, you can define replication tasks, and each of these replication instances can run multiple replication tasks. Tasks define all of the options relating to the migration, but architecturally, two of the most important things are the source and destination endpoints, which store the replication information so that the replication instance and task can access the source and target databases. So a task essentially moves data from the source database, using the details in the source endpoint, to the target database using the details stored in the destination endpoint configuration.
The value from DMS comes in how it handles those migrations. Now, jobs can be one of three types. We have full load migrations, and these are used to migrate existing data. So, if you can afford an outage long enough to copy your existing data, then this is a good one to choose. This option simply migrates the data from your source database to your target database, and it creates the tables as required. Next, we have full load plus CDC, and this stands for change data capture, and this migrates existing data and replicates any ongoing changes. This option performs a full load migration, and at the same time, it captures any changes occurring on the source. After the full load migration is complete, then captured changes are also applied to the target. Eventually, the application of changes reaches a steady state, and at this point, you can shut down your applications, let the remaining changes flow through to the target, and then restart your applications and point them at the new target database. Finally, we've got CDC only, and this is designed to replicate only data changes. In some situations, it might be more efficient to copy existing data using a method other than AWS DMS. Also, certain databases, such as Oracle, have their own export and import tools, and in these cases, it might be more efficient to use those tools to migrate the initial data and then use DMS simply to replicate the changes starting at the point when you do that initial bulk load. So, CDC only migrations are actually really effective if you need to bulk transfer the data in some way outside of DMS.
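To make those moving parts concrete, here's a hedged boto3 sketch showing a source endpoint, a target endpoint, and a full load plus CDC task tying them to a replication instance; all hostnames, credentials and ARNs are placeholders, and in practice you'd use Secrets Manager rather than inline passwords.

```python
import json
import boto3

dms = boto3.client("dms")

# Source endpoint: an on-premises MySQL server (placeholder connection details).
source = dms.create_endpoint(
    EndpointIdentifier="onprem-mysql-source",
    EndpointType="source",
    EngineName="mysql",
    ServerName="db.onprem.example.com",
    Port=3306,
    Username="dms_user",
    Password="example-password",
)

# Target endpoint: an RDS MySQL instance inside AWS (placeholder connection details).
target = dms.create_endpoint(
    EndpointIdentifier="rds-mysql-target",
    EndpointType="target",
    EngineName="mysql",
    ServerName="example.abcdefghij.us-east-1.rds.amazonaws.com",
    Port=3306,
    Username="admin",
    Password="example-password",
)

# The task links a replication instance to both endpoints and sets the migration type:
# "full-load", "cdc", or "full-load-and-cdc".
dms.create_replication_task(
    ReplicationTaskIdentifier="example-migration",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111111111111:rep:EXAMPLE",  # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```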
Now lastly, DMS doesn't natively support any form of schema conversion, but there is a dedicated tool in AWS, known as the schema conversion tool, or SCT, and the sole purpose of this tool is to perform schema modifications or schema conversions between different database versions or different database engines. So, this is a really powerful tool that often goes hand-in-hand with migrations which are being performed by DMS. Now, DMS is a great tool for migrating databases from on-premises to AWS. It's a tool that you will get to use for most larger database migrations, so as a solutions architect, it's another tool which you need to understand end-to-end in the exam. If you see any form of database migration scenario, as long as one of the databases is within AWS and as long as there are no weird databases involved which aren't supported by the product, then you can default to using DMS. It's always a safe default option for any database migration questions. If the question talks about a no-downtime migration, then you absolutely should default to DMS.
Now, at this point, let's talk in a little bit more detail about a few aspects of DMS which are important. First, I want to talk about the schema conversion tool, or SCT, in a little bit more detail. So, this is actually a standalone application, which is only used when converting from one database engine to another. It can be used as part of migrations where the engines being migrated from and to aren't compatible, and another use case is that it can be used for larger migrations when you need to have an alternative way of moving data between on-premises and AWS, rather than using a data link. Now, SCT is not used—and this is really important—it's not used for movements of data between compatible database engines. For example, if you're performing a migration from an on-premises MySQL server to an AWS-based RDS MySQL server, then the engines are the same. Even though the products are different, the engines are the same, and so SCT would not be used. SCT works with OLTP databases such as MySQL, Microsoft SQL, and Oracle, and also OLAP databases such as Teradata, Oracle, Vertica, and even Green Plum. Now, examples of the types of situations where the schema conversion tool would be used include things like on-premises Microsoft SQL through to AWS RDS MySQL migrations because the engine changes from Microsoft SQL to MySQL, and then we could also use SCT for an on-premises Oracle to AWS-based Aurora database migration, again because the engines are changing.
Now, there is another type of situation where DMS can be used in combination with SCT, and that's for larger migrations. So, DMS can often be involved with large-scale database migrations, so things which are multi-terabytes in size. And for those types of projects, it's often not optimal to transfer the data over the network. It takes time, and it consumes network capacity that might be used heavily for normal business operations. So, DMS is able to utilize the Snowball range of products which are available for bulk transfer of data into and out of AWS. So, you can use DMS in combination with Snowball, and this actually uses the schema conversion tool. So, this is how it works. So, step one, you use the schema conversion tool to extract the data from the database and store it locally and then move this data to a Snowball device which you've ordered from AWS. Step two is that you ship that device back to AWS, they load that data into an S3 bucket, and then DMS migrates the data from S3 into the target store, so the target database. If you decide to use change data capture, then you can also migrate any changes which have occurred since the initial bulk transfer; these also use S3 as an intermediary before being written to the target database by DMS. So, DMS normally will transfer the data over the network; it can transfer over direct connect or a VPN or even a VPC peer. But if the data volumes that you're migrating are bigger than you can practically transfer over your network link, then you can order a Snowball and use DMS together with SCT to make that transfer much quicker and more effective.
Now, the rule to remember for the exam is that SCT is only used for migrations when the engine is changing, and the reason why SCT is used here is because you're actually migrating a database into a generic file format, which can be moved using Snowballs. And so, this doesn't break the rule of only doing it when the database engine changes because you are essentially changing the database. You're changing it from whatever engine the source uses, and you're storing it in a generic file format for transfer through to AWS on a Snowball device. Now, that's everything that I wanted to cover in this lesson, and this has been an extension of the coverage which I did at the Associate Architect level. You are going to get the chance to experience this product practically in a demo, but in this lesson, I just wanted to cover the theory. So, thanks for watching. Go ahead and complete this lesson, and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this video, I want to talk about a feature of RDS called RDS proxy, which is something important to know in its own right but also supports many other architectures involving RDS. Now we've got a lot to cover, so let's jump in and get started. Before we talk about how RDS proxy works, let's step through why you might want to use the product. First, opening and closing connections to databases takes time and consumes resources, which is often the bulk of many smaller database operations. If you only want to read and write a tiny amount, the overhead of establishing a connection can be significant, and this can be especially obvious when using serverless because if you have a lot of lambda functions invoking or accessing an RDS database, for example, that's a lot of connections to constantly open and close, especially when you're only billed for the time that you're using compute, as with lambda.
Now, another important element is that handling failure of database instances is hard. How long should you wait for the connection to work? What should your application do while waiting? When should it consider it a failure? How should it react? And then how should it handle the failover to the standby instance in the case of RDS? Doing all of this within your application adds significant overhead and risk. A database proxy is something that can help, but maybe you don't have any database proxy experience, and even if you do, can you manage them at scale? Well, that's where RDS proxy adds value. At a high level, what RDS proxy does, or indeed any database proxy, is change your architecture. Instead of your application connecting to a database every time it needs to use it, it connects to a proxy, and the proxy maintains a pool of connections to the database that are open for the long term. Then, any connections to the proxy can use this already established pool of database connections. It can also do multiplexing, where requests arriving over many client connections to the proxy are carried over a smaller pool of connections between the proxy and the database, so you can have fewer actual connections to the database than there are connections to the database proxy, and this is especially useful for smaller database instances where resources are at a premium.
So, in terms of how an architecture might look using RDS proxy, let's start with this: a VPC in US East One with three availability zones and three subnets in each of those availability zones. In AZB, we have a primary RDS instance replicating to a standby running in AZC. Then we have Categoram, our application, running in the web subnets in the middle here, and the application makes use of some lambda functions, which are configured to use VPC networking and run from the subnet in availability zone B, and so there's a lambda ENI in that subnet. Without RDS proxy, the Categoram application servers will be connecting directly to the database every time they need to access data, and additionally, every time one of those lambda functions is invoked, they would need to directly connect to the database, which would significantly increase their running time. With RDS proxy, though, things change. The proxy is a managed service, and it runs only from within a VPC, in this case, across all availability zones A, B, and C. Now, the proxy maintains a long-term connection pool, in this case, to the primary node of the database running in AZB. These connections are created and maintained over the long term, and they're not created and terminated based on individual application needs or lambda function invocations. Our clients, in this case, the Categoram EC2 instances and lambda functions, connect to the RDS proxy rather than directly to the database instances, and these connections are quick to establish and place no load on the database server because they're between the clients and the proxy.
Now, at this point, the connections between the RDS proxy and database instances can be reused, meaning that even if we have constant lambda function invocations, they can reuse the same set of long-running connections to the database instances. More so, multiplexing is used so that a smaller number of database connections can be used for a larger number of client connections, and this helps reduce the load placed on the database server even more. RDS proxy even helps with database failure or failover events because it abstracts these away from the application. The clients we have can connect to the RDS proxy instances and wait even if the connection to the backend database isn't operational, and this is a situation that might occur during failover events from the primary to the standby. In the event that there is a failure, the RDS proxy can establish new connections to the new primary in the background, and the clients stay connected to the same endpoint, the RDS proxy, and they just wait for this to occur.
So, that's a high-level example architecture. Let's look at when you might want to use RDS proxy, and this is more for the exam, but you need to have an appreciation for the types of scenarios where RDS proxy will be useful. You might decide to use it when you have errors such as too many connection errors, because RDS proxy helps reduce the number of connections to a database, and this is especially important if you're using smaller database instances such as T2 or T3, so anything small or anything burst-related. Additionally, it's useful when using AWS Lambda because you're not having the per-invocation database connection setup, usage, and termination. It can reuse a long-running pool of connections maintained by the RDS proxy, and it can also use existing IAM authentication, which the Lambda functions have access to via their execution role.
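As a hedged sketch of that Lambda pattern (the proxy endpoint, database name and user are placeholders, and pymysql is just one client library you might package with the function), connecting to the proxy endpoint with an IAM auth token looks roughly like this.

```python
import boto3
import pymysql  # packaged with the function or provided via a Lambda layer

PROXY_ENDPOINT = "example-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com"  # placeholder
DB_USER = "app_user"
DB_NAME = "appdb"
REGION = "us-east-1"

rds = boto3.client("rds", region_name=REGION)

def handler(event, context):
    # Generate a short-lived IAM auth token instead of storing a database password;
    # the Lambda execution role needs rds-db:connect permission for this database user.
    token = rds.generate_db_auth_token(
        DBHostname=PROXY_ENDPOINT,
        Port=3306,
        DBUsername=DB_USER,
        Region=REGION,
    )

    # Connect to the proxy endpoint; the proxy reuses its long-lived pool of
    # connections to the database behind the scenes. TLS is required for IAM auth.
    connection = pymysql.connect(
        host=PROXY_ENDPOINT,
        user=DB_USER,
        password=token,
        database=DB_NAME,
        port=3306,
        ssl={"ca": "/opt/rds-ca-bundle.pem"},   # placeholder path to the RDS CA bundle
        connect_timeout=5,
    )
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return {"ok": cursor.fetchone()[0] == 1}
    finally:
        connection.close()
```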
Now, RDS proxy is also useful for long-running applications such as SaaS apps where low latency is critical. So, rather than having to establish database connections every time a user interaction occurs, they can use this existing long-running connection pool. RDS proxy is also really useful where resilience to database failure is a priority. Remember, your clients connect to the proxy, and the proxy connects to the backend databases, so it can significantly reduce the time for a failover event and make it completely transparent to your application. This is a really important concept to grasp because your clients are connected to the single RDS proxy endpoint, even if a failover event happens in the background. Instead of having to wait for the database C name to move from the primary to the standby, your applications are transparently connected to the proxy, and they don't realize it's a proxy; they think they're connecting to a database. The proxy, though, is handling all of the interaction between them and the backend database instances.
Now, before we finish up, I want to cover some key facts about RDS proxy—think of these as the key things that you need to remember for the exam. RDS proxy is a fully managed database proxy that's usable with RDS and Aurora. It's auto-scaling and highly available by default, so you don't need to worry about it, and this represents a much lower admin overhead versus managing a database proxy yourself. Now, it provides connection pooling, which significantly reduces database load, and this is for two main reasons. Firstly, we don't have the constant opening and closing of database connections, which does put unnecessary stress on the database, but in addition, we can also multiplex to use a lower number of connections between the proxy and the database relative to the number of connections between the clients and the proxy, so this is really important.
Now, RDS proxy is only accessible from within a VPC, so you can't access this from the public internet; it needs to occur from a VPC or from private VPC-connected networks. Access to the RDS proxy uses a proxy endpoint, and this is just like a normal database endpoint; it's completely transparent to the application. An RDS proxy can also enforce SSL/TLS connections, so it can enforce these to ensure the security of your applications, and it can reduce failover time by over 60% in the case of Aurora. This is somewhere in the region of a 66 to 67% improvement versus connecting to Aurora directly. Critically, it abstracts the failure of a database away from your application, so the application connected to an RDS proxy will just wait until the proxy makes a connection to the other database instance. So, during a failover event, where we're failing over from the primary to the standby, the RDS proxy will wait until it can connect to the standby and then just continue fulfilling requests from client connections, and so it abstracts away from underlying database failure.
Now, at this point, that is everything I wanted to cover in this high-level lesson on RDS proxy, so go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to talk about an advanced feature of Amazon Aurora, multi-master writes. This feature allows an Aurora cluster to have multiple instances capable of performing both reads and writes, which is in contrast with the default mode for Aurora that only allows one writer and many readers. So let's get started and look at the architecture.
Just to refresh where we are, the default Aurora mode is known as single master, and this equates to one read-write instance, so one database instance that can perform read and write operations, and then in addition, it can also have zero or more read-only replicas. An Aurora cluster running in the default mode of single master has a number of endpoints which are used to interact with the database. We've got the cluster endpoint, which can be used for read or write operations, and then we've got another endpoint, a read endpoint, that's used for load balancing reads across any of the read-only replicas inside the cluster.
An important consideration with an Aurora cluster running in single master mode is that failover takes time. For a failover to occur, a replica needs to be promoted from read-only mode to read-write mode. In multi-master mode, all of the instances by default are capable of both read and write operations, so there isn't this concept of a lengthy failover if one of the instances fails in a multi-master cluster. At a high level, a multi-master Aurora cluster might seem similar to a single master one, with the same cluster structure, the same shared storage, and multiple Aurora provisioned instances existing in the cluster. The differences start with the fact that there is no cluster endpoint to use; an application is responsible for connecting to instances within the cluster. There's no load balancing across instances with a multi-master cluster; the application connects to one or all of the instances in the cluster and initiates operations directly.
So that's important to understand—there is no concept of a load-balanced endpoint for the cluster. An application can initiate connections to one or both of the instances inside a multi-master cluster. Now, the way that this architecture works is that when one of the read-write nodes inside a multi-master cluster receives a write request from the application, it immediately proposes that data be committed to all of the storage nodes in that cluster. It proposes that the data it receives to write is committed to storage. At this point, each node that makes up the cluster either confirms or rejects the proposed change. It rejects it if it conflicts with something that's already in flight, for example, another change from another application writing to another read-write instance inside the cluster.
What the writing instance is looking for is a quorum of nodes to agree, a quorum of nodes that allow it to write that data, at which point it can commit the change to the shared storage. If the quorum rejects it, then it cancels the change with the application and generates an error. Now, assuming that it can get a quorum to agree to the write, then that write is committed to storage and it's replicated across every storage node in the cluster just as it is with a single master cluster, but—and this is the major difference with a multi-master cluster—that change is then replicated to other nodes in the cluster. This means that those other writers can add the updated data into their in-memory caches, and this means that any reads from any other instances in the cluster will be consistent with the data that's stored on shared storage because instances' cached data needs to be updated inside any in-memory caches of any other instances within the multi-master cluster.
So that's what this replication does. Once the instance on the right has got agreement to commit that change to the cluster's shared storage, it replicates that change to the instance on the left, the instance on the left updates its in-memory cache, and then if that instance is used for any read operations, it's always got access to the up-to-date data.
Now, to understand some of the benefits of multi-master mode, let's look at a single master failover situation. In this scenario, we have an Aurora single master cluster with one primary instance performing reads and writes and one replica which is only performing read operations. Now, Bob is using an application, and this application connects to this Aurora cluster using the cluster endpoint. The cluster endpoint at this stage points to the primary instance, which is the one that's used for read and write operations and always points at the primary instance.
If the primary instance fails, then access to the cluster is interrupted, so immediately we know that this application cannot be fault-tolerant because access to the database is now disrupted. At this point, though, the cluster will realize that there is a failure event and it will change the cluster endpoint to point at the replica, which the cluster decides will be the new primary instance, but this failover process takes time. It's quicker than normal RDS because each replica shares the cluster storage and there can be more replicas, but it still takes time. The configuration change to make one of the other replicas the new primary instance inside the cluster is not an immediate change, and it causes disruption.
Now, let's contrast this with multi-master. With multi-master, both instances are able to write to the shared storage; they're both writers. The application can connect with one or both of them, and let's assume at this stage that it connects to both. Both instances are capable of read and write operations, so the application could maintain connections to both and be ready to act if one of them fails. When one writer fails, the application can immediately send 100 percent of any future data operations to the writer which is still working, so there would be little if any disruption. If the application is designed in this way, designed to operate through this failure, it could almost be described as fault-tolerant. So, an Aurora multi-master cluster is one component that is required in order to build a fault-tolerant application. It's not a guarantee of fault tolerance on its own, but it is the foundation of being able to build a fault-tolerant application, because the application can maintain connections to multiple writers at the same time.
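To make that application-side failover idea more concrete, here's a minimal Python sketch of an application that can write through either instance in a multi-master cluster. The instance endpoints, credentials, table, and the pymysql driver are all illustrative assumptions rather than anything from this lesson, and a real application would reuse connections rather than opening one per write.

```python
# Minimal sketch: application-side failover across the two writers in an
# Aurora multi-master cluster. Endpoints, credentials and the pymysql driver
# are illustrative assumptions; adapt them to your own cluster.
import pymysql

WRITER_ENDPOINTS = [
    "writer-1.cluster-xxxx.us-east-1.rds.amazonaws.com",  # hypothetical
    "writer-2.cluster-xxxx.us-east-1.rds.amazonaws.com",  # hypothetical
]

def execute_write(sql, params=None):
    """Try each writer in turn; move to the next writer only on failure."""
    last_error = None
    for endpoint in WRITER_ENDPOINTS:
        try:
            conn = pymysql.connect(
                host=endpoint,
                user="app_user",
                password="app_password",
                database="appdb",
                connect_timeout=3,
            )
            try:
                with conn.cursor() as cur:
                    cur.execute(sql, params)
                conn.commit()
                return
            finally:
                conn.close()
        except pymysql.MySQLError as err:
            # Writer unavailable or write rejected, so try the other writer.
            last_error = err
    raise RuntimeError(f"All writers failed: {last_error}")

execute_write("INSERT INTO orders (item) VALUES (%s)", ("cat-food",))
```

The key point is that the failover decision lives in the application: if one writer fails, the application simply retries against the other writer, with no cluster-level failover needed.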
In terms of the high-level benefits, it offers better availability and much faster failover. Failover events can be handled inside the application, and they don't even need to disrupt traffic between the application and the database, because the application can immediately start sending any write operations to another writer. It can be used to implement fault tolerance, but the application logic needs to manually load balance across the instances; it's not something that's handled by the cluster. With that being said, though, that's everything I wanted to cover in this lesson. It's not something I expect to immediately feature in detail on the exam, so we can keep it relatively brief. Go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
Welcome back, and in this lesson, I want to quickly cover the Aurora Global Database product, which allows you to create global-level replication using Aurora from a master region to up to five secondary AWS regions. The name probably gives away the function. This is one of the things which you just need an awareness of for the exam; I don't expect it to feature heavily, but I want you to be aware of exactly what functionality Aurora Global Database provides.
So, to keep this lesson as brief as possible, let's quickly jump in and look at the architecture first. This is a common architecture that you might find when using Aurora Global Databases. We've got an environment here which operates from two or more regions. We've got a primary region, US East 1 in this example, on the left. This primary region offers similar functionality to a normal Aurora cluster, with one read and write instance and up to 15 read-only replicas in that cluster.
Global databases introduce the concept of secondary regions, and the example that's on screen is AP Southeast 2, the Sydney region, on the right of your screen. Secondary regions can have up to 16 replicas, and the entire secondary cluster is read-only during normal operations, so in this example, all 16 replicas would be read-only replicas. Now, the replication from the primary region to secondary regions occurs at the storage layer, and replication is typically within one second from the primary to all of the secondaries. Applications can use the primary instance in the primary region for write operations and then the replicas in the primary or the replicas in the secondary regions for read operations.
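As a rough illustration of how this is put together, here's a hedged boto3 sketch that wraps an existing Aurora cluster in a global database and adds a read-only secondary cluster in the Sydney region. All of the identifiers, the account number, the engine, and the instance class are placeholder assumptions, not values from this lesson.

```python
# Minimal sketch: turn an existing Aurora cluster into the primary of a
# global database, then add a secondary cluster in another region.
# Identifiers, account number, engine and instance class are assumptions.
import boto3

rds_primary = boto3.client("rds", region_name="us-east-1")
rds_secondary = boto3.client("rds", region_name="ap-southeast-2")

# Wrap the existing primary cluster in a global database.
rds_primary.create_global_cluster(
    GlobalClusterIdentifier="global-db",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:111122223333:cluster:primary-cluster",
)

# Add a read-only secondary cluster in the Sydney region.
rds_secondary.create_db_cluster(
    DBClusterIdentifier="secondary-cluster",
    Engine="aurora-mysql",
    GlobalClusterIdentifier="global-db",
)

# Secondary clusters still need at least one replica instance to serve reads.
rds_secondary.create_db_instance(
    DBInstanceIdentifier="secondary-replica-1",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
    DBClusterIdentifier="secondary-cluster",
)
```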
So, that's the architecture, but what's perhaps more important for the exam is when you would use global databases. So let's have a look at that next. Aurora Global Databases are great for cross-region disaster recovery and business continuity. You can create a global database, set up multiple secondary regions, and then if you do have a disaster that affects an entire AWS region, you can promote one of these secondary clusters to act as a primary cluster so that it can handle read and write operations. It offers a great solution for cross-region disaster recovery and business continuity, and because of the roughly one-second replication time between the primary region and secondary regions, both the RPO and RTO values can be kept really low if you do perform a cross-region failover.

Global databases are also great for global read scaling. If you want to offer low latency to any international areas where you have customers, remember that low latency generally equates to really good performance. If you want to offer low-latency performance improvements to international customers, you can create lots of secondary regions replicated from a primary region, and then the application can sit in those secondary regions and just perform read operations against the secondary clusters, providing your customers with great performance.
Again, it's important to understand that with Aurora Global Databases, replication occurs at the storage layer, and it's generally around one second or even less from the primary region to all secondary regions. It's also important to understand that this is one-way replication from the primary to the secondary regions; it is not bidirectional replication. Replication has no impact on database performance because it occurs at the storage layer, so no additional CPU usage is required on the database instances to perform the replication tasks.
Secondary regions can have up to 16 replicas. If you think about Aurora normally, it can have one read-write primary instance and then up to 15 read replicas, for a total of 16 instances. Because secondary regions don't have a read-write primary instance, all of the instances inside a secondary cluster can be read replicas, so a secondary region can have a total of 16 replicas, and all of these can be promoted to read-write if you do have any disaster situations. Currently, there is a maximum of five secondary regions, though just like most things in AWS, this is likely to change.
Now again, for the exam, I don't expect this particular product to feature extensively, but I do want you to have an awareness so that when it does begin to be mentioned in the exam or if you need to use it in production, you have a starting point by understanding the architecture. With that being said, though, that is everything that I wanted to cover in this theory lesson. So, go ahead, complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
Welcome back, and in this lesson, I want to cover Aurora Serverless. Aurora Serverless is a service which is to Aurora what Fargate is to ECS. It provides a version of the Aurora database product where you don't need to statically provision database instances of a certain size or worry about managing those database instances. It's another step closer to a database as a service product. It removes one more piece of admin overhead, the admin overhead of managing individual database instances. From now on, when you're referring to the Aurora product that we've covered so far in the course, you should refer to it as Aurora provisioned versus Aurora Serverless, which is what we'll cover in this lesson.
With Aurora Serverless, you don't need to provision resources in the same way as you did with Aurora provisioned. You still create a cluster, but Aurora Serverless uses the concept of ACUs or Aurora Capacity Units. Capacity units represent a certain amount of compute and a corresponding amount of memory. For a cluster, you can set minimum and maximum values, and Aurora Serverless will scale between those values, adding or removing capacity based on the load placed on the cluster. It can even go down to zero and be paused, meaning that you're only billed for the storage that the cluster consumes.
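Here's a minimal boto3 sketch of what creating a cluster like this might look like, assuming the original Aurora Serverless (v1) serverless engine mode; the identifiers, engine, credentials, and ACU range are placeholder assumptions.

```python
# Minimal sketch: an Aurora Serverless (v1) cluster that scales between 1 and
# 16 ACUs and pauses after 5 minutes of no load. Identifiers, engine and
# credentials are illustrative assumptions.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="serverless-cluster",
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="example-password-change-me",
    ScalingConfiguration={
        "MinCapacity": 1,              # smallest ACU allocation allowed
        "MaxCapacity": 16,             # largest ACU allocation allowed
        "AutoPause": True,             # allow the cluster to pause to zero
        "SecondsUntilAutoPause": 300,  # pause after 5 minutes of no load
    },
)
```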
Now, billing is based on the resources that you use on a per-second basis, and Aurora Serverless provides the same levels of resilience as you're used to with Aurora provisioned. So, you get cluster storage that's replicated across six storage nodes across multiple availability zones. Now, some of the high-level benefits of Aurora Serverless: it's much simpler, removing much of the complexity of managing database instances and capacity; it's easier to scale, seamlessly scaling the compute and memory capacity in the form of ACUs as needed with no disruption to client connections, and you'll see how that works architecturally on the next screen. It's also cost-effective. When you use Aurora Serverless, you only pay for the database resources that you consume on a per-second basis, unlike with Aurora provisioned, where you have to provision database instances in advance and you're charged for the resources they provide, whether you're utilizing them or not.
The architecture of Aurora Serverless has many similarities with Aurora provisioned, but it also has crucial differences, so let's review both of those, the similarities and the differences. The Aurora cluster architecture still exists, but it's in the form of an Aurora Serverless cluster. Now, this has the same cluster volume architecture which Aurora provisioned uses. In an Aurora Serverless cluster, though, instead of using provisioned servers, we have ACUs, which are Aurora Capacity Units. These capacity units are actually allocated from a warm pool of Aurora capacity units which are managed by AWS. The ACUs are stateless, they're shared across many AWS customers, and they have no local storage, so they can be allocated to your Aurora Serverless cluster rapidly when required. Now, once these ACUs are allocated to an Aurora Serverless cluster, they have access to the cluster storage in the same way that a provisioned Aurora instance would have access to the storage in a provisioned Aurora cluster. It's the same thing; it's just that these ACUs are allocated from a shared pool managed by AWS.
Now, if the load on an Aurora Serverless cluster increases beyond the capacity units which are being used, and assuming the maximum capacity setting of the cluster allows it, then more ACUs will be allocated to the cluster. And once the compute resource, which represents this new, potentially bigger ACU, is active, then any old compute resources representing unused capacity can be deallocated from your Aurora Serverless cluster. Now, because of the ACU architecture, because the number of ACUs are dynamically increased and decreased based on load, the way that connections are managed within an Aurora Serverless cluster has to be slightly more complex versus a provisioned cluster. In an Aurora Serverless cluster, we have a shared proxy fleet which is managed by AWS. Now, this happens transparently to you as a user of an Aurora Serverless cluster, but if a user interacts with the cluster via an application, it actually goes via this proxy fleet. Any of the proxy fleet instances can be used, and they will broker a connection between the application and the Aurora Capacity Units.
Now, this means that because the client application never directly connects to the compute resource that provides an ACU, the scaling can be fluid and can scale in or out without causing any disruption to applications while it's occurring, because you're not directly connecting to an ACU. You're connecting via an instance in this proxy fleet. So, the proxy fleet is managed by AWS on your behalf. The only thing you need to worry about for an Aurora Serverless cluster is picking the minimum and maximum ACU values, and you're only billed for the amount of ACU that you're using at a particular point in time, as well as the cluster storage. So that makes Aurora Serverless really flexible for certain types of use cases.
Now, a couple of examples of types of applications which really do suit Aurora Serverless. The first is infrequently used applications, maybe a low-volume blog site such as "The Best Cats," where connections are only attempted for a few minutes several times per day, or maybe on really popular days of the week. With Aurora Serverless, if you were using the product to run the "Best Cat Pics" blog, which you'll experience in the demo lesson, then you'd only pay for resources for the Aurora Serverless cluster as you consume them on a per-second basis. Another really good use case is new applications. If you're deploying an application where you're unsure about the levels of load that will be placed on it, then you'll also be unsure about the size of the database instance that you'll need. With Aurora provisioned, you would still need to provision that in advance and potentially change it, which could cause disruption. If you use Aurora Serverless, you can create the Aurora Serverless cluster and have the database autoscale based on the incoming load.
It's also really good for variable workloads. If you're running a normally lightly used application which has peaks, maybe 30 minutes out of an hour or on certain days of the week during sale periods, then you can use Aurora Serverless and have it scale in and out based on that demand. You don't need to provision static capacity based on the peak or average as you would do with Aurora provisioned. It's also really good for applications with unpredictable workloads, so if you're really not sure about the level of workload at a given time of day, you can't predict it, you don't have enough data, then you can provision an Aurora Serverless cluster and initially set a fairly large range of ACUs so the minimum is fairly low and the maximum is fairly high, and then over the initial period of using the application, you can monitor the workload. If it really does stay unpredictable, then potentially Aurora Serverless is the perfect database product to use because if you're using anything else, say an Aurora provisioned cluster, then you always have to have a certain amount of capacity statically provisioned. With Aurora Serverless, you can, in theory, leave an unpredictable application inside Aurora Serverless constantly and just allow the database to scale in and out based on that unpredictable workload.
It's also great for development and test databases because Aurora Serverless can be configured to pause itself during periods of no load, and while the database is paused, you're only billed for the storage. So, if you do have systems which are only used as part of your development and test processes, then they can scale back to zero and only incur storage charges during periods when they're not in use, and that's really cost-effective for this type of workload. It's also great for multi-tenant applications. If you've got an application where you're billing a user a set dollar amount per month per license to the application, then your incoming load is directly aligned to your incoming revenue, and it makes perfect sense to use Aurora Serverless. You don't mind if a database supporting your product scales up and costs you more if you also get more customer revenue, so it makes perfect sense to use Aurora Serverless for multi-tenant applications where the scaling is fairly aligned between infrastructure size and incoming revenue.
So, these are some classic examples of when Aurora Serverless makes perfect sense. Now, this is a product I don't yet expect to feature extensively on the exam. It will feature more and more as time goes on, and so by learning the architecture at this point, you get a head start and you can answer any questions which might feature on the exam about Aurora Serverless and comparing it to the other RDS products, which is often just as important. But at this point, that's all of the theory that I wanted to cover, all of the architecture. So go ahead, finish up this video, and when you're ready, I look forward to joining you in the next lesson.
-
Welcome back, and in this lesson, I'm going to be covering the architecture of the Amazon Aurora managed database product from AWS. I mentioned earlier that Aurora is officially part of RDS, but from my perspective, I've always viewed it as its own distinct product. The features that it provides and the architecture it uses to deliver those features are so radically different than normal RDS that it needs to be treated as its own product. So, we've got a lot to cover, so let's jump in and get started.
As I just mentioned, the Aurora architecture is very different from normal RDS. At its very foundation, it uses the base entity of a cluster, which is something that other engines within RDS don’t have. A cluster is made up of a number of important things. Firstly, from a compute perspective, it's made up of a single primary instance and then zero or more replicas. Now, this might seem similar to how RDS works with the primary and the standby replica, but it’s actually very different. The replicas within Aurora can be used for reads during normal operations, so it’s not like the standby replica inside RDS. The replicas inside Aurora can actually provide the benefits of both RDS multi-AZ and RDS read replicas. So, they can be inside a cluster and can be used to improve availability, but they can also be used for read operations during the normal operation of a cluster.
Now, that alone would be worth the move to Aurora since you don’t have to choose between read scaling and availability. Replicas inside Aurora can provide both of those benefits. Now, the second major difference in the Aurora architecture is its storage. Aurora doesn’t use local storage for the compute instances. Instead, an Aurora cluster has a shared cluster volume. This is storage that is shared and available to all compute instances within a cluster. This provides a few benefits, such as faster provisioning, improved availability, and better performance.
A typical Aurora cluster looks something like this. It functions across a number of availability zones, in this example A, B, and C. Inside the cluster is a primary instance and optionally a number of replicas. Again, these function as failover options if the primary instance fails, but they can also be used during the normal functioning of the cluster for read operations from applications. Now, the cluster has shared storage, which is SSD-based and has a maximum size of 128 TiB, and this cluster volume is made up of six storage nodes spread across multiple availability zones. When data is written to the primary DB instance, Aurora synchronously replicates that data across all six of these storage nodes, which are associated with your cluster. All instances inside your cluster, so the primary and all of the replicas, have access to all of these storage nodes.
The important thing to understand, though, from a storage perspective, is that this replication happens at the storage level. So, no extra resources are consumed on the instances or the replicas during this replication process. By default, the primary instance is the only instance able to write to the storage, and the replicas and the primary can perform read operations. Because Aurora maintains multiple copies of your data in three availability zones, the chances of losing data as a result of any disk-related failure are greatly minimized. Aurora automatically detects failures in the disk volumes that make up the cluster shared storage. When a segment or a part of a disk volume fails, Aurora immediately repairs that area of the disk. When Aurora repairs that area of disk, it uses the data inside the other storage nodes that make up the cluster volume and it automatically recreates that data. It ensures that the data is brought back into an operational state with no corruption. As a result, Aurora avoids data loss and reduces any need to perform point-in-time restores or snapshot restores to recover from disk failures.
So, the storage subsystem inside Aurora is much more resilient than that which is used by the normal RDS database engines. Another powerful difference between Aurora and the normal RDS database engines is that with Aurora, you can have up to 15 replicas, and any of them can be the failover target for a failover operation. So, rather than just having the one primary instance and the one standby replica of the non-Aurora engines, with Aurora, you’ve got 15 different replicas that you can choose to fail over to. And that failover operation will be much quicker because it doesn’t have to make any storage modifications.
Now, as well as the resiliency that the cluster volume provides, there are a few other key elements that you should be aware of. The cluster shared volume is based on SSD storage by default, so it provides high IOPS and low latency. It's high-performance storage by default; you don't get the option of using magnetic storage. Now, the billing for that storage is very different than with the normal RDS engines. With Aurora, you don't have to allocate the storage that the cluster uses. When you create an Aurora cluster, you don't specify the amount of storage that's needed. Storage is simply based on what you consume: as you store data, up to the 128 TiB limit, you're billed on consumption.
Now, the way that this consumption works is that it's based on a high watermark. So, if you consume 50 GiB of storage, you're billed for 50 GiB of storage. If you free up 10 GiB of data (so move down to 40 GiB of consumed data), you're still billed for that high watermark of 50 GiB, but you can reuse any storage that you free up. What you're billed for is the high watermark, the maximum storage that you've consumed in a cluster. If you go through a process of significantly reducing storage and you need to reduce storage costs, then you need to create a brand new cluster and migrate data from the old cluster to the new cluster.
Now, it is worth mentioning that this high watermark architecture is being changed by AWS, and this no longer is applicable for the more recent versions of Aurora. I’m going to update this lesson once this feature becomes more widespread, but for now, you do still need to assume that this high watermark architecture is being used. Now, because the storage is for the cluster and not for the instances, it means replicas can be added and removed without requiring storage provisioning or removal, which massively improves the speed and efficiency of any replica changes within the cluster. Having this cluster architecture also changes the access method versus RDS.
Aurora clusters, like RDS clusters, use endpoints. These are DNS addresses that are used to connect to the cluster. Unlike RDS, Aurora clusters have multiple endpoints that are available for an application. As a minimum, you have the cluster endpoint and the reader endpoint. The cluster endpoint always points at the primary instance, and that's the endpoint that can be used for read and write operations. The reader endpoint will also point at the primary instance if that's all that there is, but if there are replicas, then the reader endpoint will load balance across all of the available replicas, and this can be used for read operations.
Now, this makes it much easier to manage read scaling using Aurora versus RDS, because as you add additional replicas which can be used for reads, this reader endpoint is automatically updated to load balance across the new replicas. You can also create custom endpoints, and in addition to that, each instance, so the primary and any of the replicas, has its own unique endpoint. So, Aurora allows for a much more custom and complex architecture versus RDS.
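If it helps to see the endpoints rather than just hear about them, here's a short, hedged boto3 sketch that reads a cluster's cluster and reader endpoints and then creates a custom reader endpoint covering specific replicas; the cluster and instance identifiers are placeholder assumptions.

```python
# Minimal sketch: inspect an Aurora cluster's endpoints and add a custom
# endpoint. The cluster and instance identifiers are illustrative assumptions.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

cluster = rds.describe_db_clusters(DBClusterIdentifier="a4l-aurora")["DBClusters"][0]
print("Cluster (read/write) endpoint:", cluster["Endpoint"])
print("Reader (load balanced) endpoint:", cluster["ReaderEndpoint"])

# Custom endpoint that load balances across two specific replicas only,
# for example to dedicate them to reporting queries.
rds.create_db_cluster_endpoint(
    DBClusterIdentifier="a4l-aurora",
    DBClusterEndpointIdentifier="a4l-aurora-reporting",
    EndpointType="READER",
    StaticMembers=["a4l-aurora-replica-2", "a4l-aurora-replica-3"],
)
```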
So, let's move on and talk about costs. With Aurora, one of the biggest downsides is that there isn't actually a free tier option. You can't use Aurora within the free tier, because Aurora doesn't support the micro instances that are available inside the free tier. But for any instance beyond an RDS single-AZ micro-sized instance, Aurora offers much better value. For any compute that you use, there's an hourly charge, and you're billed per second with a 10-minute minimum. For storage, you're billed on a GB-month consumed metric, of course taking into account the high watermark, so this is based on the maximum amount of storage that you've consumed during the lifetime of that cluster. As well as that, there is an I/O cost per request made to the cluster shared storage.
Now, in terms of backups, you're given 100% of the storage consumption for the cluster as free backup allocation. So, if your database cluster is 100 GiB, then you're given 100 GiB of storage for backups as part of what you pay for that cluster. For most low or medium usage situations, unless you've got high turnover in data or you keep backups for long retention periods, you'll find that the backup costs are effectively included in the charge that you pay for the database cluster itself.
Now, Aurora provides some other really exciting features. In general, though, backups in Aurora work in much the same way as they do in RDS. So, for normal backup features, for automatic backups, for manual snapshot backups, this all works in the same way as any other RDS engine, and restores will create a brand-new cluster. So, you've experienced this in the previous demo lesson where you created a brand-new RDS instance from a snapshot, and this architecture by default doesn't change when you use Aurora.
But you've also got some advanced features, which can change the way that you do things. One of those is backtrack, and this is something that needs to be enabled on a per-cluster basis. It will allow you to roll back your database to a previous point in time. So, consider the scenario where you've got major corruption inside an Aurora cluster, and you can identify the point at which that corruption occurred. Well, rather than having to do a restore to a brand-new database at a point in time before that corruption, if you enable backtrack, you can simply roll back in place your existing Aurora cluster to a point before that corruption occurred. And that means you don’t have to reconfigure your applications; you simply allow them to carry on using the same cluster—it's just the data is rolled back to a previous state before the corruption occurred.
You need to enable this on a per-cluster basis, and you can adjust the window that backtrack will work for, but this is a really powerful feature that's exclusive, at the time of creating this lesson, to Aurora. You also have the ability to create what's known as a fast clone, and a fast clone allows you to create a brand-new database from an existing database. But crucially, it doesn't make a one-for-one copy of the storage for that database. What it does is it references the original storage, and it only stores any differences between those two.
Now, differences can be either that you update the storage in your cloned database, or it can also be that data is updated in the original database, which means that your clone needs a copy of that data before it was changed on the source. So, essentially, your cloned database only uses a tiny amount of storage—it only stores data that's changed in the clone or changed in the original after you make the clone, and that means that you can create clones much faster than if you had to copy all of the data bit by bit. It also means that these clones don’t consume anywhere near the full amount of data—they only store the changes between the source data and the clone.
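To illustrate the difference between these two features, here's a hedged boto3 sketch showing a backtrack operation on a cluster that already has backtrack enabled, followed by a fast clone created as a copy-on-write restore; identifiers and the timestamp are placeholder assumptions.

```python
# Minimal sketch: backtrack an existing cluster and create a fast clone.
# Identifiers and the timestamp are illustrative assumptions, and backtrack
# is assumed to already be enabled on the cluster (via its backtrack window).
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

# Roll the existing cluster back in place to a point before the corruption.
rds.backtrack_db_cluster(
    DBClusterIdentifier="a4l-aurora",
    BacktrackTo=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
)

# Fast clone: a new cluster that references the source storage and only
# stores differences (copy-on-write), rather than a full copy of the data.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="a4l-aurora",
    DBClusterIdentifier="a4l-aurora-clone",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)
```

Note that backtrack changes the existing cluster in place, while the clone is a separate new cluster with its own endpoints.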
So, I know that’s a lot of architecture to remember. I’ve tried to quickly step through all of the differences between Aurora and the other RDS engines. You’ll have lessons upcoming later in this section, which deep dive into a little bit more depth of specific Aurora features that I think you will need for the exam, but in this lesson, I just wanted to provide a broad-level overview of the differences between Aurora and the other RDS engines.
So, in the next demo lesson, you're going to get the opportunity to migrate the data for our WordPress application stack from the RDS MariaDB engine into the Aurora engine. So, you'll get some experience of creating an Aurora cluster and interacting with it with some data that you've migrated, but at this point, that's all of the theory that I wanted to cover. So, go ahead, complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
Welcome back and in this video I want to talk about a specific feature of RDS called RDS Custom. Now this is a really niche topic. I've yet to see it used in the real world and for the exams you really only need to have the most surface level understanding so I'm going to keep this really brief. So RDS Custom fills the gap between the main RDS product and then EC2 running a database engine. RDS is a fully managed database server as a service product. Essentially it gives you access to databases running on a database server which is fully managed by AWS and so any OS or engine access is limited using the main RDS product.
Now databases running on EC2 are self-managed but this has significant overhead because done in this way you're responsible for everything from the operating system upwards. So RDS Custom bridges this gap; it gives you the ability to occupy a middle ground where you can utilize RDS but still get access to some of the customizations that you have access to when running your own DB engine on EC2. Now currently RDS Custom works for MS SQL and Oracle and when you're using RDS Custom you can actually connect using SSH, RDP and session manager and actually get access to the operating system and database engine.
Now RDS Custom, unlike RDS, is actually running within your AWS account. If you're utilizing normal RDS then if you look in your account you won't see any EC2 instances or EBS volumes or any backups within S3. That's because they're all occurring within an AWS managed environment. With RDS, the networking works by injecting elastic network interfaces into your VPC. That's how you get access to the RDS instance from a networking perspective, but with RDS Custom everything is running within your AWS account so you will see an EC2 instance, you will see EBS volumes and you will see backups inside your AWS account.
Now, if you do need to perform any type of customization of RDS Custom, then you need to look at the database automation settings to ensure that there are no disruptions caused by the RDS automation while you're performing those customizations. You pause database automation, perform your customizations, and then resume full automation, which makes sure that the database is ready for production usage. Now again, I'm skipping through a lot of these facts and talking only at a high level because realistically you're probably never going to encounter this in production, and if you do have any exposure to it on the exam, just knowing that it exists will be enough.
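As a hedged sketch of what that pause-and-resume step can look like via the API, the instance modification call accepts an automation mode for RDS Custom instances; the instance identifier here is a placeholder assumption.

```python
# Minimal sketch: pause RDS Custom automation before customizing, then resume.
# The instance identifier is an illustrative assumption.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Pause full automation for up to the requested number of minutes while
# you make OS or database engine level changes.
rds.modify_db_instance(
    DBInstanceIdentifier="custom-oracle-01",
    AutomationMode="all-paused",
    ResumeFullAutomationModeMinutes=120,
)

# ...perform your customizations over SSH, RDP or Session Manager...

# Resume full automation so monitoring, backups and recovery are re-enabled.
rds.modify_db_instance(
    DBInstanceIdentifier="custom-oracle-01",
    AutomationMode="full",
)
```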
Now from a service model perspective, this is how using RDS Custom changes things. So on this screen anything that you see in blue is customer managed, anything that you see in orange is AWS managed and then anything that has a gradient is a shared responsibility. So if you're using a database engine running on-premises then you're responsible for everything as the customer. So application optimization, scaling, high availability, backups, any DB patches, operating system patches, operating system install and management of the hardware. End-to-end that's your responsibility.
Now if you migrate to using RDS, this is how it looks where AWS has responsibility for everything but application optimization. Now if for whatever reason you can't use RDS, then historically your only other option was to use a database engine running on EC2 and this was the model in that configuration. So AWS handled the hardware but from an operating system installation perspective, operating system patches, database patches, backups, HA, scaling and application optimization were still the responsibility of the customer. So you only gained a tiny amount of benefit versus using an on-premises system.
With RDS Custom we have this extra option where the hardware is AWS's responsibility, the application optimization is the customer responsibility but everything else is shared between the customer and AWS. So this gives you some of the benefits of both. It gives you the ability to use the RDS product and benefit from the automation while at the same time allowing you an increased level of customization and the ability to connect into the instance using SSH, session manager or RDP.
Now once again for the exam this is everything that you'll need to understand. It only currently works for Oracle and MS SQL and for the real world you probably won't encounter this outside of very niche scenarios. With that being said though that is everything I wanted to cover in this video, so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
Welcome back and in this lesson I want to talk about data security within the RDS product. I want to focus on four different things: Authentication, so how users can log into RDS; Authorization, how access is controlled; Encryption in transit between clients and RDS; and then encryption at rest, so how data is protected when it's written to disk. Now we've got a lot to cover so let's jump in and get started.
With all of the different engines within RDS you can use encryption in transit, which means data between the client and the RDS instance is encrypted via SSL or TLS, and this can actually be set to mandatory on a per-user basis. Encryption at rest is supported in a few different ways depending on the database engine. By default, it's supported using KMS and EBS encryption. So this is handled by the RDS host and the underlying EBS-based storage; as far as the RDS database engine knows, it's just writing unencrypted data to storage. The data is encrypted by the host that the RDS instance is running on. KMS is used, and so you select a customer master key, or CMK, to use, either a customer managed CMK or an AWS managed CMK, and this CMK is used to generate data encryption keys, or DEKs, which are used for the actual encryption operations.
Now when using this type of encryption, the storage, the logs, the snapshots, and any replicas are all encrypted using the same customer master key and importantly encryption cannot be removed once it's added. Now these are features supported as standard with RDS. In addition to KMS EBS based encryption, Microsoft SQL and Oracle support TDE. Now TDE stands for transparent data encryption and this is encryption which is supported and handled within the database engine. So data is encrypted and decrypted within the database engine itself not by the host that the instance is running on, and this means that there's less trust. It means that you know data is secure from the moment it's written out to disk by the database engine.
In addition to this, RDS Oracle supports TDE using CloudHSM, and with this architecture the encryption process is even more secure, with even stronger key controls, because CloudHSM is managed by you with no key exposure to AWS. It means that you can implement encryption where there is no trust chain which involves AWS, and for many demanding regulatory situations this is really valuable. Visually, this is how the encryption architecture looks. Architecturally, let's say that we have a VPC and inside this a few RDS instances running on a pair of underlying hosts, and these instances use EBS for underlying storage. Now we'll start off with Oracle on the left, which uses TDE, and so CloudHSM is used for key services. Because TDE is native and handled by the database engine, the data is encrypted from the engine all the way through to the storage, with AWS having no exposure, outside of the RDS instance, to the encryption keys which are used.
With KMS-based encryption, KMS generates and allows usage of CMKs, which themselves can be used to generate data encryption keys, known as DEKs. These data encryption keys are loaded onto the RDS hosts as needed and are used by the host to perform the encryption or decryption operations. This means the database engine doesn't need to natively support encryption or decryption; it has no encryption awareness. From its perspective, it's writing data as normal, and it's encrypted by the host before being sent on to EBS in its final encrypted format. Data that's transferred between replicas, as with MySQL in this example, is also encrypted, as are any snapshots of the RDS EBS volumes, and these use the same encryption key. So that's at-rest encryption, and there's one more thing that I want to cover before we finish this lesson, and that's IAM authentication for RDS.
Normally logins to RDS are controlled using local database users. These have their own usernames and passwords, they're not IAM users and are outside of the control of AWS. One gets created when you provision an RDS instance but that's it. Now you can configure RDS to allow IAM user authentication against a database and this is how. We start with an RDS instance on which we create a local database user account configured to allow authentication using an AWS authentication token. How this works is that we have IAM users and roles, in this case an instance role, and attached to those roles and users are policies. These policies contain a mapping between that IAM entity, so the user or role, and a local RDS database user. This allows those identities to run a generate DB auth token operation which works with RDS and IAM and based on the policies attached to the IAM identities, it generates a token with a 15-minute validity. This token can then be used to log in to the database user within RDS without requiring a password.
So this is really important to understand: by associating a policy with an IAM user or an IAM role, it allows either of those two identities to generate an authentication token which can be used to log into RDS instead of a password. Now, one really important thing to understand going into the exam is that this is only authentication. This is not authorization. The permissions over the RDS database inside the instance are still controlled by the permissions on the local database user. So authorization is still handled internally. This process is only for authentication, which involves IAM, and only if you specifically enable it on the RDS instance.
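Here's a hedged Python sketch of that authentication flow, assuming a MySQL-based RDS instance with IAM authentication enabled, an IAM policy granting rds-db:connect for the mapped database user, and the pymysql driver; the hostname, user names, and certificate path are placeholder assumptions.

```python
# Minimal sketch: IAM authentication to a MySQL-based RDS instance.
# Assumes IAM DB auth is enabled on the instance, a local database user is
# configured for AWS authentication, and the calling IAM identity has
# rds-db:connect permission for that user. Hostname, user names and the CA
# bundle path are illustrative assumptions.
import boto3
import pymysql

HOST = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"  # hypothetical endpoint
PORT = 3306
DB_USER = "app_user"  # local DB user mapped via the IAM policy

rds = boto3.client("rds", region_name="us-east-1")

# Generate a short-lived (15 minute) token instead of using a password.
token = rds.generate_db_auth_token(
    DBHostname=HOST,
    Port=PORT,
    DBUsername=DB_USER,
    Region="us-east-1",
)

# TLS is required for IAM authentication; the CA bundle path is assumed.
conn = pymysql.connect(
    host=HOST,
    port=PORT,
    user=DB_USER,
    password=token,
    database="appdb",
    ssl={"ca": "/opt/rds-ca-bundle.pem"},
)
with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_USER()")
    print(cur.fetchone())
conn.close()
```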
Now that's everything I wanted to cover about encryption in transit, encryption at rest as well as RDS IAM based authentication. So thanks for watching, go ahead and complete this video and when you're ready, I look forward to you joining me in the next.
-
Welcome back. In this video I want to talk about RDS read replicas. Now read replicas provide a few main benefits to us as solutions architects or operational engineers; they provide performance benefits for read operations, they help us create cross-region failover capability, and they provide a way for RDS to meet really low recovery time objectives, just as long as data corruption isn't involved in a disaster scenario. Now let's step through the key concepts and architectures because they're going to be useful for both the exam and the real world.
Read replicas, as the name suggests, are read-only replicas of an RDS instance. Unlike MultiAZ, where you can't by default use the standby replica for anything, you can use read replicas but only for read operations. Now MultiAZ running in cluster mode, which is the newer version of MultiAZ, is like a combination of the old MultiAZ instance mode together with read replicas. But, and this is really important, you have to think of read replicas as separate things; they aren't part of the main database instance in any way. They have their own database endpoint address and so applications need to be adjusted to use them. An application, say WordPress, using an RDS instance will have zero knowledge of any read replicas by default. Without application support, read replicas do nothing. They aren't functional from a usage perspective. There's no automatic failover, they just exist off to one side.
Now they're kept in sync using asynchronous replication. Remember, MultiAZ uses synchronous replication, and that means that when data is written to the primary instance, at the same time as storing that data on disk on the primary, it's replicated to the standby. Conceptually, think of this as a single write operation on both the primary and the standby. With asynchronous replication, data is written to the primary first, at which point it's viewed as committed. Then, after that, it's replicated to the read replicas, and this means in theory there could be a small lag, maybe seconds, but it depends on network conditions and how many writes occur on the database. For the exam, for any RDS questions (excluding Aurora for now), remember that synchronous means MultiAZ and asynchronous means read replicas.
Read replicas can be created in the same region as the primary database instance or they can be created in other AWS regions known as cross region read replicas. If you create a cross region read replica, then AWS handle all of the networking between regions and this occurs transparently to you and it's fully encrypted in transit.
Now why do read replicas matter? Well, there are two main areas of importance that I want you to think about. First is read performance and read scaling for a database instance. You can create five direct read replicas per database instance, and each of these provides an additional instance of read performance, so this offers a simple way of scaling out your read performance on a database. Now, read replicas themselves can also have their own read replicas, but because asynchronous replication is used, there can be a lag between the main database instance and any read replicas, and if you then create read replicas of read replicas, this lag becomes more of a problem. So while you can use multiple levels of read replicas to scale read performance even further, lag does start to become even more of a problem, and you need to take that into consideration.
Additionally, read replicas can help you with global performance improvements for read workloads. So if you have read workloads in other AWS regions, then these workloads can directly connect to read replicas and not impact the performance of the primary instance in any way. In addition, read replicas benefit us in terms of recovery point objectives and recovery time objectives. Snapshots and backups improve RPOs: the more frequently snapshots occur, the lower the maximum amount of data which can be lost, so they offer improved recovery point objectives. But they don't really help us with recovery time objectives, because restoring snapshots takes a long time, especially for large databases.
Now, read replicas offer a near-zero RPO, and that's because the data that's on the read replica is synced from the main database instance, so there's very little potential for data loss, assuming we're not dealing with data corruption. Read replicas can also be promoted quickly, which gives a very low RTO. So, in a disaster scenario where you have a major problem with your RDS instance, you can promote a read replica, and this is a really quick process. But, and this is really important, you should only look at using read replicas during disaster recovery scenarios when you're recovering from failure. If you're recovering from data corruption, then logically the read replica will probably have a replica of that corrupted data. So read replicas are great for achieving low RTOs, but only for failure and not for data corruption.
Now, read replicas are read-only until they're promoted, and when they're promoted, you're able to use them as a normal RDS instance. They also provide a really simple way to achieve global availability improvements and global resilience, because you can create a cross-region read replica in another AWS region and use this as a failover region if AWS ever has a major regional issue.
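As a rough sketch of both of those ideas, the following hedged boto3 example creates a cross-region read replica and then promotes it; the instance identifiers, regions, and account number are placeholder assumptions.

```python
# Minimal sketch: create a cross-region read replica, then promote it during
# a failure scenario. Identifiers, regions and account number are assumptions.
import boto3

# The read replica is created by calling the *destination* region.
rds_sydney = boto3.client("rds", region_name="ap-southeast-2")

rds_sydney.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-sydney",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:app-db",
    SourceRegion="us-east-1",  # lets boto3 presign the cross-region request
)

# Later, during a failure of the source instance or region, promote the
# replica so it becomes a standalone read/write instance (this breaks
# replication and the application must be repointed at its endpoint).
rds_sydney.promote_read_replica(DBInstanceIdentifier="app-db-replica-sydney")
```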
Now at this point that's everything I wanted to cover about read replicas. If appropriate for the exam that you're studying I might have another lesson which goes into more technical depth or a demo lesson which allows you to experience this practically. If you don't see either of these then don't worry they're not required for the exam that you're studying. At this point though, that's everything I'm going to cover so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
Welcome back, and in this video, I want to talk about how RDS can be backed up and restored, as well as covering the different methods of backup that we have available. Now we do have a lot to cover, so let's jump in and get started. Within RDS, there are two types of backup-like functionality: automated backups and snapshots. Both of these are stored in S3, but they use AWS-managed buckets, so they won't be visible to you within your AWS console. You can see backups in the RDS console, but you can't move to S3 and see any form of RDS bucket, which exists for backups. Keep this in mind because I've seen questions on it in the exam.
Now, the benefit of using S3 is that any data contained in backups is regionally resilient, because it's stored in S3, which replicates data across multiple AWS availability zones within that region. RDS backups, when they do occur, are in most cases taken from the standby instance if you have multi-AZ enabled. So, while they do cause an I/O pause, this occurs on the standby instance, and so there won't be any application performance issues. If you don't use multi-AZ, for example with test and development instances, then the backups are taken from the only available instance, so you may have pauses in performance.
Now, I want to step through how backups work in a little bit more detail, and I'm going to start with snapshots. Snapshots aren't automatic; they're things that you run explicitly or via a script or custom application. You have to run them against an RDS database instance. They're stored in S3, which is managed by AWS, and they function like the EBS snapshots that you've covered elsewhere in the course. Snapshots and automated backups are taken of the instance, which means all the databases within it, rather than just a single database. The first snapshot is a full copy of the data stored within the instance, and from then on, snapshots only store data which has changed since the last snapshot.
When any snapshot occurs, there is a brief interruption to the flow of data between the compute resource and the storage. If you're using single AZ, this can impact your application. If you're using multi-AZ, this occurs on the standby, and so won't have any noticeable effect. Time-wise, the initial snapshot might take a while; after all, it's a full copy of the data. From then on, snapshots will be much quicker because only changed data is being stored. Now, the exceptions to this are instances where there's a lot of data change. In this type of scenario, snapshots after the initial one can also take significant amounts of time. Snapshots don't expire; you have to clear them up yourself. This means that snapshots live on past the deletion of the RDS instance. Again, they're only deleted when you delete them manually or via some external process. Remember that one because it matters for the exam.
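Because manual snapshots are something you run explicitly or via a script, here's a minimal, hedged sketch of a scripted snapshot, for example run from cron or a scheduled Lambda; the instance identifier is a placeholder assumption.

```python
# Minimal sketch: a scripted manual snapshot with a timestamped identifier.
# The instance identifier is an illustrative assumption.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

snapshot_id = "wordpress-db-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")

rds.create_db_snapshot(
    DBSnapshotIdentifier=snapshot_id,
    DBInstanceIdentifier="wordpress-db",
)

# Snapshots never expire on their own, so a script like this usually also
# removes snapshots older than your chosen retention period, for example
# via describe_db_snapshots and delete_db_snapshot.
```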
Now you can run one snapshot per month, one per week, one per day, or one per hour. The choice is yours because they're manual. And one way that lower recovery point objectives can be met is by taking more frequent snapshots. The lower the time frame between snapshots, the lower the maximum data loss that can occur when you have a failure. Now, this is assuming we only have snapshots available, but there is another part to RDS backups, and that's automated backups. These occur once per day, but the architecture is the same. The first one is a full, and any ones which follow only store changed data. So far, you can think of them as though they're automated snapshots, because that's what they are. They occur during a backup window which is defined on the instance. You can allow AWS to pick one at random or use a window which fits your business. If you're using single AZ, you should make sure that this happens during periods of little to no use, as again there will be an I/O pause. If you're using multi-AZ, this isn't a concern, as the backup occurs from the standby.
In addition to this automated snapshot, every five minutes, database transaction logs are also written to S3. Transaction logs store the actual operations which change the data, so operations which are executed on the database. And together with the snapshots created from the automated backups, this means a database can be restored to a point in time with a five-minute granularity. In theory, this means a five-minute recovery point objective can be reached. Now automated backups aren't retained indefinitely; they're automatically cleared up by AWS, and for a given RDS instance, you can set a retention period from zero to 35 days. Zero means automated backups are disabled, and the maximum is 35 days. If you use a value of 35 days, it means that you can restore to any point in time over that 35-day period using the snapshots and transaction logs, but it means that any data older than 35 days is automatically removed.
When you delete the database, you can choose to retain any automated backups, but, and this is critical, they still expire based on the retention period. The way to maintain the contents of an RDS instance past this 35-day max retention period is that if you delete an RDS instance, you need to create a final snapshot, and this snapshot is fully under your control and has to be manually deleted as required. Now, RDS also allows you to replicate backups to another AWS region, and by backups, I mean both snapshots and transaction logs. Now, charges apply for both the cross-region data copy and any storage used in the destination region, and I want to stress this really strongly. This is not the default. This has to be configured within automated backups. You have to explicitly enable it.
Now let's talk a little bit about restores. The way RDS handles restores is really important, and it's not immediately intuitive. It creates a new RDS instance when you restore an automated backup or a manual snapshot. Why this matters is that you will need to update applications to use the new database endpoint address, because it will be different than the existing one. When you restore a manual snapshot, you're restoring the database to a single point in time. It's fixed to the time that the snapshot was created, which means it influences the RPO. Unless you created a snapshot right before a failure, chances are the RPO is going to be suboptimal. Automated backups are different. With these, you can choose a specific point to restore the database to, and this offers substantial improvements to RPO. You can choose to restore to a time which was minutes before a failure.
The way that it works is that backups are restored from the closest snapshot, and then transaction logs are replayed from that point onwards, all the way through to your chosen time. What's important to understand though is that restoring snapshots isn't a fast process. If appropriate for the exam that you're studying, I'm going to include a demo where you'll get the chance to experience this yourself practically. It can take a significant amount of time to restore a large database, so keep this in mind when you think about disaster recovery and business continuity. The RDS restore time has to be taken into consideration.
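Here's a hedged boto3 sketch of a point-in-time restore; the instance identifiers and timestamp are placeholder assumptions, and the key point is that the result is a brand-new instance with a brand-new endpoint.

```python
# Minimal sketch: restore an instance to a point in time (roughly 5 minute
# granularity). Identifiers and the timestamp are illustrative assumptions.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="wordpress-db",
    TargetDBInstanceIdentifier="wordpress-db-restored",
    RestoreTime=datetime(2024, 5, 1, 9, 55, tzinfo=timezone.utc),
)

# The restore creates a NEW instance, so the application must be repointed
# at the new endpoint once the instance becomes available.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="wordpress-db-restored")

new = rds.describe_db_instances(DBInstanceIdentifier="wordpress-db-restored")
print(new["DBInstances"][0]["Endpoint"]["Address"])
```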
Now, in another video elsewhere in this course, I'm going to be covering read replicas, and these offer a way to significantly improve RPO when you need to recover from failure. So, RDS automated backups are great for recovering from failure, or as a restoration method for any data corruption, but they take time to perform a restore, so account for this within your RTO planning. Once again, if appropriate for the exam that you're studying, you're going to get the chance to experience a restore in a demo lesson elsewhere in the course, which should reinforce the knowledge that you've gained within this theory video. If you don't see this, then don't worry, it's not required for the exam that you're studying.
At this point though, that is everything I wanted to cover in this video, so go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
Welcome back, and in this video, I want to talk through the ways in which RDS offers high availability. Historically, there was one way: multi-AZ. Over time, RDS has been improved, and now there are multi-AZ instance deployments and multi-AZ cluster deployments, each offering different benefits and trade-offs. In this video, I want to step through the architecture and functionality of both. Now, we do have a lot to cover, so let's jump in and get started straight away.
Historically, the only method of providing high availability that RDS had was multi-AZ, which is now called multi-AZ instance deployment. With this architecture, RDS has a primary database instance containing any databases that you create. When you enable multi-AZ mode, this primary instance is configured to replicate its data synchronously to a standby replica running in another availability zone, meaning the standby also has a copy of your databases.
In multi-AZ instance mode, this replication occurs at the storage level, which is less efficient than the cluster multi-AZ architecture, but more on this later in the video. The exact method that RDS uses for this replication depends on the database engine you pick. MariaDB, MySQL, Oracle, and PostgreSQL use Amazon failover technology, whereas Microsoft's SQL instances use SQL server database mirroring or always-on availability groups. In any case, this is abstracted away, and all you need to understand is that it's a synchronous replica.
Architecturally, all accesses to the databases are via the database CNAME, which is a DNS name that by default points at the primary database instance. With multi-AZ instance architecture, you always access the primary database instance, with no access to the standby, even for reads. Its job is to simply sit there until a failure scenario occurs with the primary instance. Other things, such as backups, can occur from the standby, so data is moved into S3 and then replicated across multiple availability zones in that region, placing no extra load on the primary because it's occurring from the standby.
Remember, all accesses, both reads and writes, from this multi-AZ architecture occur to and from the primary instance. If anything happens to the primary instance, RDS detects it, and a failover occurs. This can be done manually if you're testing or need to perform maintenance, but generally, this is an automatic process. What happens in this scenario is that the database CNAME changes from pointing at the primary to pointing at the standby, which becomes the new primary. Since this is a DNS change, it generally takes between 60 to 120 seconds for this to occur, meaning there can be brief outages.
This can be reduced by removing any DNS caching in your application for this specific DNS name. If you do remove this caching, then as soon as RDS has finished the failover and updated the DNS name, your application will use that name, which now points at the new primary instance. So, this is the architecture when you're using the older multi-AZ instance architecture.
I want to cover a few key points of this architecture before we look at how multi-AZ cluster architecture works. So, just to summarize, replication between primary and standby is synchronous, meaning that data is written to the primary and then immediately replicated to the standby before being viewed as committed. Now, multi-AZ does not come within the free tier because of the extra cost for the standby replica that's required. And multi-AZ with the instance architecture means that you only have one standby replica, and that's important. It's only one standby replica, and this standby replica cannot be used for reads or writes. Its job is to simply sit there and wait for failover events.
A failover event can take anywhere from 60 to 120 seconds to occur, and multi-AZ mode can only be within the same region, meaning different availability zones within the same AWS region. Backups can be taken from the standby replica to improve performance, and failovers will occur for various reasons such as availability zone outage, the failure of the primary instance, manual failover, instance type change (when you change the type of the RDS instance), and even when you're patching software. So, you can use failover to move any consumers of your database onto a different instance, patch the instance with no consumers, and then flip it back. So, it does offer some great features that can help you maintain application availability.
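As a rough illustration of triggering one of those manual failovers, here's a minimal boto3 sketch; the instance identifier is hypothetical.

```python
# A minimal sketch, assuming a multi-AZ instance deployment called "prod-db"
# (a hypothetical name). Rebooting with ForceFailover=True triggers a manual
# failover, which is a common way to test that your application tolerates
# the 60 to 120 second DNS switch to the standby.
import boto3

rds = boto3.client("rds")
rds.reboot_db_instance(
    DBInstanceIdentifier="prod-db",
    ForceFailover=True,  # fail over to the standby instead of a simple reboot
)
```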
Now, next, I want to talk about multi-AZ using a cluster architecture. When you watch the Aurora video, you might be confused between this architecture and Amazon Aurora. So, I'm going to stress the differences between multi-AZ cluster for RDS and Aurora in this video, and this is to prepare you for when you watch the Aurora video. It's really critical for you to understand the differences between multi-AZ cluster mode for RDS and Amazon Aurora.
We start with a similar VPC architecture, but now in addition to the single client device on the left, I'm adding two more. In this mode, RDS is capable of having one writer replicate to two reader instances, which is a key difference between this and Aurora. With this mode of RDS multi-AZ, you can have two readers only. These are in different availability zones than the writer instance, but there will only be two, whereas with Aurora, you can have more. The difference between this mode of multi-AZ and the instance mode is that these readers are usable. You can think of the writer like the primary instance within multi-AZ instance mode in that it can be used for both writes and read operations.
The reader instances, unlike multi-AZ instance mode, can be utilized while they're in this state. They can be used only for read operations, which will need application support since your application needs to understand that it can't use the same instance for reads and writes. But it means you can use this multi-AZ mode to scale your read workloads, unlike multi-AZ instance mode.
In terms of replication between the writer and the readers, data is sent to the writer and is viewed as committed when at least one of the readers confirms that it has been written; at that point, it's resilient across multiple availability zones within that region. The cluster that RDS creates to support this architecture is different in some ways and similar in others compared to Aurora. In RDS multi-AZ mode, each instance still has its own local storage, which, as you'll see elsewhere in this course, is different than Aurora. Like Aurora, though, you access the cluster using a few endpoint types.
The first is the cluster endpoint, which you can think of like the database CNAME in the previous multi-AZ architecture. It points at the writer and can be used for both reads and writes against the database or administration functions. Then, there's a reader endpoint, which points at any available reader within the cluster, and in some cases, this includes the writer instance. Remember, the writer can also be used for reads. In general operation, though, this reader endpoint will be pointing at the dedicated reader instances, and this is how reads within the cluster scale.
So, applications can use the reader endpoint to balance their read operations across readers within the cluster. Finally, there are instance endpoints, and each instance in the cluster gets one of these. Generally, it's not recommended to use them directly, as it means any operations won't be able to tolerate the failure of an instance because they don't switch over to anything if there's an instance failure. So, you generally only use these for testing and fault finding. This is the multi-AZ cluster architecture.
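To show how an application might use these endpoints, here's a minimal sketch assuming a MySQL-compatible cluster, the PyMySQL driver, hypothetical endpoint names and credentials, and an existing orders table; writes go via the cluster endpoint and reads via the reader endpoint.

```python
# A minimal sketch with hypothetical endpoints, credentials and an assumed
# "orders" table. The cluster endpoint points at the writer; the reader
# endpoint spreads reads across the available reader instances.
import pymysql

CLUSTER_ENDPOINT = "mydb.cluster-abc123.us-east-1.rds.amazonaws.com"    # writer
READER_ENDPOINT = "mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com"  # readers

writer = pymysql.connect(host=CLUSTER_ENDPOINT, user="admin",
                         password="example-password", database="app")
reader = pymysql.connect(host=READER_ENDPOINT, user="admin",
                         password="example-password", database="app")

with writer.cursor() as cur:
    cur.execute("INSERT INTO orders (product, size) VALUES (%s, %s)", ("widget", "L"))
writer.commit()

with reader.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders")  # read workload scales across readers
    print(cur.fetchone())
```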
Before I finish up with this video, I just want to cover a few key points about this specific type of multi-AZ implementation. And don't worry, you're going to get the chance to experience RDS practically in other videos in this part of the course. First, RDS using multi-AZ in cluster mode means one writer and two reader DB instances in different availability zones. This gives you a higher level of availability versus instance mode because you have this additional reader instance versus the single standby instance in multi-AZ instance mode.
In addition, multi-AZ cluster mode runs on much faster hardware, using Graviton architecture and local NVMe SSD storage. So, any writes are written first to local superfast storage and then flushed through to EBS. This gives you the benefit of local superfast storage, in addition to the availability and resilience benefits of EBS. Furthermore, when multi-AZ uses cluster mode, readers can be used to scale read operations against the database. So, if your application supports it, you can set read operations to use the reader endpoint, which frees up capacity on the writer instance and allows your RDS implementation to scale to high levels of performance versus any other mode of RDS.
And again, you'll see when you're watching the Aurora video, Aurora as a database platform can scale even more. I'll detail exactly how in that separate video. When using multi-AZ in cluster mode, replication is done using transaction logs, which is much more efficient. This also allows for faster failover. In this mode, failover can occur in as little as 35 seconds, plus any time required to apply the transaction logs to the reader instances. But in any case, this will occur much faster than the 60 to 120 seconds needed when using multi-AZ instance mode.
And just to confirm, when running in this mode, writes are viewed as committed when they've been sent to the writer instance and stored, then replicated to at least one reader, which has confirmed that it's written that data. So, as you can see, these are completely different architectures, and in my opinion, multi-AZ in cluster mode adds significant benefits over instance mode. You'll see how this functionality is extended again when I talk about Amazon Aurora, but for now, that's everything I wanted to cover in this video. Thanks for watching, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this video, which is the first of this series, I'm going to step through the architecture of the relational database service known as RDS. Now, this video will focus on the architecture of the product, with upcoming videos going into specific features in more depth. Now, we do have a lot to cover, so let's jump in and get started.
Now, I've heard many people refer to RDS as a database as a service, or DBaaS, product. Now, details are important, and you need to understand why this is not the case. A database as a service product is where you pay money, and in return, you get a database; this isn't what RDS does. With RDS, you pay for and receive a database server, so it would be more accurate to call it a database server as a service product. Now, this matters because it means that on this database server or instance, which RDS provides, you can have multiple databases.
RDS provides a managed version of a database server that you might have on-premises, only with RDS, you don't have to manage the hardware, the operating system, or the installation, as well as much of the maintenance of the DB engine, and RDS, of course, runs within AWS.
Now, with RDS, you have a range of database engines to use, including MySQL, MariaDB, and PostgreSQL, and then commercial databases such as Oracle and Microsoft SQL Server. Some of these are open source, and some are commercial, and so there will be licensing implications; and if appropriate for the exam that you're working towards, there will be a separate video on this topic.
Now, there's one specific term that I want you to disassociate from RDS, and that's Amazon Aurora. You might see Amazon Aurora discussed commonly along with RDS, but this is actually a different product. Amazon Aurora is a custom database engine and product created by AWS, which has compatibility with some of the above engines, but it was designed entirely by AWS. Many of the features I'll step through while talking about RDS are different for Aurora, and most of these are improvements, so in your mind, separate Aurora from RDS.
So, in summary, RDS is a managed database server as a service product. It provides you with a database instance—so a database server—which is largely managed by AWS. Now, you don't have access to the operating system or SSH access. Now, I have a little asterisk here because there is a variant of RDS called RDS Custom where you do have some more low-level access, but I'll be covering that in a different video if required. In general, when you think about RDS, think no SSH access and no operating system access.
Now, what I think might help you at this point is to look at a typical RDS architecture visually, and then over the remaining videos in this series, I'll go into more depth on certain elements of the product.
So, RDS is a service which runs within a VPC, so it's not a public service like S3 or DynamoDB; it needs to operate in subnets within a VPC in a specific AWS region, and for this example, let's use US East 1, and to illustrate some cross-region bits of this architecture, our second region will be AP Southeast 2. And then we're going to have within US East 1 a VPC, and let's use three availability zones here: A, B, and C.
Now, the first component of RDS which I want to introduce is an RDS subnet group. This is something that you create, and you can think of this as a list of subnets which RDS can use for a given database instance or instances. So, in this case, let's say that we create one which uses all three of the availability zones; in reality, this means adding any subnets in those three availability zones which you want RDS to use, and in this example, I'm going to actually create another one. We're going to have two database subnet groups, and you'll see why in a second.
In the top database subnet group, let's say I add two public subnets, and in the bottom database subnet group, let's say three private subnets. So, when launching an RDS instance, whether you pick to have it highly available or not—and I'll talk about how this works in an upcoming video—you need to pick a DB subnet group to use.
So, let's say that I picked the bottom database subnet group and launched an RDS instance, and I chose to pick one with high availability. So, it would pick one subnet for the primary instance and another for the standby. It picks at random unless you indicate a specific preference, but it will put the primary and standby within different availability zones. Now, because these database instances are within private subnets, it means that they would be accessible from inside the VPC or from any connected networks such as on-premises networks connected using VPNs or Direct Connect, or any other VPCs which are peered with this one, and I'll cover all of those topics elsewhere in the course if I haven't already done so.
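As a rough sketch of that deployment using boto3, with hypothetical subnet IDs, names and credentials, creating the DB subnet group and then launching a multi-AZ instance into it might look something like this.

```python
# A minimal sketch of the private deployment described above, using boto3 and
# hypothetical subnet IDs and names. The DB subnet group lists the private
# subnets RDS may use; MultiAZ=True makes RDS place the primary and standby
# in different availability zones from that list.
import boto3

rds = boto3.client("rds")

rds.create_db_subnet_group(
    DBSubnetGroupName="private-db-subnets",
    DBSubnetGroupDescription="Private subnets in AZ A, B and C",
    SubnetIds=["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],  # hypothetical IDs
)

rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,
    MasterUsername="admin",
    MasterUserPassword="example-password",
    DBSubnetGroupName="private-db-subnets",
    MultiAZ=True,               # primary and standby in different AZs
    PubliclyAccessible=False,   # keep it private; public access is generally frowned upon
)
```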
Now, I could also launch another set of RDS instances using the top database subnet group, and the same process would be followed; assuming that I picked to use multi-AZ, RDS would pick two different subnets in two different availability zones to use.
Now, because these are public subnets, we could also, if we really wanted to, elect to make these instances accessible from the public internet by giving them public addressing, and this is something which is really frowned upon from a security perspective, but it's something that you need to know is an option when deploying RDS instances into public subnets.
Now, you can use a single DB subnet group for multiple instances, but then you're limited to using the same defined subnets. If you want to split databases between different sets of subnets, as with this example, then you need multiple DB subnet groups, and generally, as a best practice, I like to have one DB subnet group for one RDS deployment—I find it gives me the best overall flexibility.
Okay, so another few important aspects of RDS which I want to cover: first, RDS instances can have multiple databases on them; second, every RDS instance has its own dedicated storage provided by EBS, so if you have a multi-AZ pair—primary and standby—each has their own dedicated storage. Now, this is different than how Amazon Aurora handles storage, so try to remember this architecture for RDS—each instance has its own dedicated EBS-provided storage.
Now, if you choose to use multi-AZ, as in this architecture, then the primary instances replicate to the standby using synchronous replication. Now, this means that the data is replicated to the standby as soon as it's received by the primary; it means the standby will have the same set of data as the primary—so the same databases and the same data within those databases.
Now, you can also decide to have read replicas; I'll be covering what these are and how they work in another dedicated video, but in summary, read replicas use asynchronous replication, and they can be in the same region but also other AWS regions. These can be used to scale read load or to add layers of resilience if you ever need to recover in a different AWS region.
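As a quick illustration, creating a cross-region read replica with boto3 might look something like this; the identifiers, account number and regions are hypothetical, and the call is made in the destination region with the source referenced by its ARN.

```python
# A minimal sketch of a cross-region read replica using boto3 and hypothetical
# identifiers. Replication to a read replica is asynchronous.
import boto3

rds_sydney = boto3.client("rds", region_name="ap-southeast-2")

rds_sydney.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-syd",
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:us-east-1:111122223333:db:app-db"   # hypothetical source ARN
    ),
    DBInstanceClass="db.t3.micro",
)
```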
Now, lastly, we also have backups of RDS. There is a dedicated video covering backups later on in this section of the course, but just know that backups occur to S3—it’s to an AWS-managed S3 bucket, so you don't see the bucket within your account, but it does mean that data is replicated across multiple availability zones in that region. So, if you have an AZ failure, backups will ensure that your data is safe. If you use multi-AZ mode, then backups occur from the standby instance, which means no negative performance impact.
Now, this is the basic product architecture; I'll be expanding on all of these key areas in dedicated videos, as well as giving you the chance to get practical experience via some demos and mini-projects if appropriate.
For now, let's cover one final thing before we finish this video, and that's the cost architecture of RDS. Because RDS is a database server as a service product, you're not really billed based on your usage. Instead, like EC2, which RDS is loosely based on, you're billed for resource allocation, and there are a few different components to RDS's cost architecture.
First, you've got the instance size and type—logically, the bigger and more feature-rich the instance, the greater the cost, and this follows a similar model to how EC2 is billed; the fee that you see is an hourly rate, but it's billed per second.
Next, we have the choice of whether multi-AZ is used or not—because multi-AZ means more than one instance, there's going to be additional cost. Now, how much more cost depends on the multi-AZ architecture, which I'll be covering in detail in another video.
Next is a per-gig monthly fee for storage, which means the more storage you use, the higher the cost, and certain types of storage such as provisioned IOPS cost more; and again, this is aligned to how EBS works because the storage is based on EBS.
Next is the data transfer costs, and this is a cost per gig of data transfer in and out of your DB instance from or to the internet and other AWS regions.
Next, we have backups and snapshots—you get snapshot storage equal to the amount of storage you've allocated for the database instance for free. So, if you have 2 TB of storage, then that means 2 TB of snapshots for free. Beyond that, there is a cost, and it's charged per GB-month of storage, so the more data is stored, the more it costs, and the longer it's stored, the more it costs—1 TB for one month is the same cost as 500 GB for two months, so it's a per-GB-month cost.
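Just to make the GB-month idea concrete, here's a tiny sketch using a made-up price purely for illustration—real snapshot pricing varies by region.

```python
# A tiny sketch of the GB-month idea, using an assumed example price
# (not a real quote).
price_per_gb_month = 0.095

one_tb_for_one_month = 1024 * 1 * price_per_gb_month
half_tb_for_two_months = 512 * 2 * price_per_gb_month

print(one_tb_for_one_month == half_tb_for_two_months)  # True: same GB-months, same cost
```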
And then finally, we have any extra costs based on using commercial DB engine types, and again, I'll be covering this if appropriate in a dedicated video elsewhere in the course.
Okay, so at this point, that is everything I wanted to cover in this video. As I mentioned at the start, this is just an introduction to RDS architecture; we're going to be going into more detail on specific key points in upcoming videos, but for now, that's everything I wanted to cover. So, go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson I want to cover something which is arguably bad practice inside AWS, and that's running databases directly on EC2. As you'll find out in this section of the course, there are lots of AWS products which provide database services, so running any database on EC2 at best requires some justification.
In this lesson I want to step through why you might want to run databases directly on EC2 and why it's also a bad idea; it's actually always a bad idea to run databases on EC2, and the real argument is whether the benefits to you or your business outweigh that fact.
So let's jump in and take a look at some of the reasons why you should and shouldn't run databases on EC2.
Generally when people think about running databases on EC2 they picture one of two things: first, a single instance and on this instance you're going to be running a database platform, an application of some kind and perhaps a web server such as Apache, or you might picture a simple split architecture where the database is separated from the web server and application, so you'll have two instances, probably smaller instances than the single large one.
And architecturally I hope this makes sense so far, since so far in the course with the Animals for Life WordPress application stack example we've used the architecture on the left—a single EC2 instance with all of the application tiers or components on one single instance, crucially one single instance running within a single availability zone.
Now if you have a split architecture like on the right you can either have both EC2 instances inside the same availability zone or you could split the instances across two—so AZA and AZB.
Now when you change the architecture in this way, when you split up the components into separate instances, whether you decide to put those both in the same availability zone or split them, you need to understand that you've introduced a dependency into the architecture—the dependency that you've introduced is that there needs to be reliable communication between the instance running the application and the database instance, if not the application won't work.
And if you do decide to split these instances across multiple availability zones then you should also be aware that there is a cost for data when it's transiting between different availability zones in the same region—it's small but it does exist.
Now that's in contrast to where communications between instances using private IPs in the same availability zone is free, so that's a lot to think about from an architectural perspective, but that's what we mean when we talk about running databases on EC2—this is the architecture, generally one or more EC2 instances with at least one of them running the database platform.
Now there are some reasons why you might want to run databases on EC2 in your own environment—you might need access to the operating system of the database server, and the only way that you can have this level of access is to run on EC2, because AWS's managed database products don't give you OS-level access.
This is one of those things though that you should really question if a client requests it because there aren't many situations where OS level access is really a requirement—do they need it, do they want it, or do they only think that they want it?
So if you have a client or if your business states that they do need OS level access the first thing that you should do is question that statement.
Now there are some database tuning things which can only be done with root level access and because you don't have this level of access with managed database products then these values or these configuration options won't be tuneable.
But in many cases and you'll see this later on in this section AWS does allow you to control a lot of these parameters that historically you would need root access for without having root access.
So again this is one of those situations where you need to question any suggestion that you need database root access—it's worth noting that it's often an application vendor demanding this level of access, not the business itself, but again it's often the case that you need to delve into the justifications.
This level of access is often not required and a lot of software vendors now explicitly support AWS's managed database products, so again verify any suggestion of this level of access.
Now something that is often justified is that you might need to run a database or a database version which AWS don't provide—this is certainly possible and more so with emerging types of databases or databases with really niche use cases.
You might actually need to implement an application with a particular database that is not supported by AWS and any of its managed database products, and in that case the only way of servicing that demand is to install that database on EC2.
So that's one often justified reason for running databases on EC2, or it might be that a particular project that you're working on has really, really detailed and specific requirements and you need a very specific version of an OS and a very specific version of a DB in combination which AWS don't provide.
Or you might need or want to implement an architecture which AWS also don't provide—certain types of replication done in certain ways or at certain times.
Or it could be something as simple as the decision makers in your organization just want a database running on EC2—you could argue that they're being unreasonable to just demand a database running on EC2 but in many cases you might not have a choice.
So it can always be done—you can run databases on EC2 as long as you're willing to accept the negatives, so these are all valid, some of them I would question or fight or ask for justification but situations certainly do exist which require you to use databases on EC2.
And I'm stressing these because I've seen tricky exam questions where the right answer is to use a database on EC2, so I want to make sure that you've got fresh in your mind some of the styles of situations where you might actually want to run a database on EC2.
But now let's talk about why you really shouldn't put a database product on EC2—even with the previous screen in mind, even with all of those justifications, you need to be aware of the negatives.
And the first one is the admin overhead—the admin overhead of managing the EC2 instance as well as the database host, the database server, both of these require significant management effort.
Don't underestimate the effort required to keep an EC2 instance patched or keep a database host running at a certain compatible level with your application—you might not be able to upgrade or you might have to upgrade and keep the database version running in a very narrow range in order to be compatible with the application.
And whenever you perform upgrades or whenever you're fault finding, you need to do it out of core usage hours, which could mean additional time, stress and cost for staff to maintain both of these components.
Also don't forget about backups and disaster recovery management—so if your business has any disaster recovery planning, running databases on EC2 adds a lot of additional complexity.
And in this area, when you're thinking about backups and DR, many of AWS's managed database products we'll talk about throughout this section include a lot of automation to remove a lot of this admin overhead.
Perhaps one of the most serious limitations though is that you have to keep in mind that EC2 is running in a single availability zone—so if you're running on an EC2 instance, keep in mind you're running on an EBS volume in an EC2 instance, both of those are within a single availability zone.
If that zone fails, access to the database could fail and you need to worry about taking EBS snapshots or taking backups of the database inside the database server and putting those on storage somewhere, maybe S3—again, it's all admin overhead and risk that your business needs to be aware of.
Another issue is features—some of AWS's database products genuinely are amazing, a lot of time and effort and money have been put in on your behalf by AWS to make these products actually better than what you can achieve by installing database software on EC2.
So by limiting yourself to running databases on EC2, you're actually missing out on some of the advanced features and we'll be talking about all of those throughout this section of the course.
Another aspect is that EC2 is either on or off—EC2 has no concept of serverless, because it is, explicitly, a server—so you're not going to be able to scale down easily or keep up with bursty demand.
There are some AWS managed database products we'll talk about in this section which can scale up or down rapidly based on load, and by running a database product on EC2, you do limit your ability to scale and you do set a base minimum cost of whatever the hourly rate is for that particular size of EC2 instance—so keep that in mind.
So again, if you're being asked to implement this by your business, you should definitely fight this fight and get the business to justify why they want the database product on EC2 because they're missing out on some features and they're committing themselves to costs that they might not need to.
There's also replication—so if you've got an application that does need replication, there are the skills to set this up, the setup time, the monitoring and checking for its effectiveness and all of this tends to be handled by a lot of AWS's managed database products, so again, there's a lot of additional admin overhead that you need to keep in mind.
And lastly, we've got performance—this relates in some way to when I talked about features moments ago.
AWS do invest a considerable amount of time into optimization of their database products and implementing performance based features, and if you simply take an off the shelf database product and implement it on EC2, you're not going to be able to take advantage of these advanced performance features, so keep that in mind.
If you do run database software directly on EC2, you're limiting the performance that you can achieve.
But with that out of the way, that's all of the theory and logic that I wanted to cover in this lesson, so now you have an idea about why you should and why you shouldn't run your own database on an EC2 instance.
In the next lesson, which is a demo, we're going to take the single instance WordPress deployment that we've been using so far in the course, and we're going to evolve it into two separate EC2 instances—one of these is going to be running Apache and WordPress, so it's going to be the application server, and the other is going to be running the database server, MariaDB.
Now this kind of evolution is best practice, at least as much as it can ever be best practice to run a self-managed database platform.
Now the reason we're doing this is we want to split up our single monolithic application stack—we want to get it to the point so the database is not running on the same instance as the application itself, because once we've done that, we can move that database into one of AWS's managed database products later in this section.
And that will allow us to take advantage of these features and performance that these products deliver.
It's never a good idea to have a single monolithic application stack when you can avoid it, so the way that we're running WordPress at the moment is not best practice for an enterprise application.
So by splitting up the application from the database, as we go through the course, it will allow us to scale each of these independently and take advantage independently of different AWS products and services, which can help us improve each component of our application.
So with that being said, go ahead and finish up this video, and then when you're ready, you can join me in the next lesson, which is going to be a demo where we're going to split up this monolithic WordPress architecture into two separate compute instances.
learn.cantrill.io
Welcome to this lesson where I want to provide a really quick theoretical introduction to ACID and BASE, which are two database transaction models that you might encounter in the exam and in the real world. Now this might seem a little abstract, but it does feature on the exam, and I promise, in real-world usage, knowing this is a database superpower. So let's jump in and get started.
ACID and BASE are both acronyms, and I'll explain what they stand for in a moment, but they are both database transaction models. They define a few things about transactions to and from a database, and this governs how the database system itself is architected. At a real foundational level, there's a computer science theorem called the CAP theorem, which stands for consistency, availability, and partition tolerance.
Now let's explore each of these quickly because they really matter. Consistency means that every read to a database will receive the most recent write or it will get an error. On the other hand, availability means that every request will receive a non-error response, but without the guarantee that it contains the most recent write, and that's important. Partition tolerance means that the system can be made of multiple network partitions, and the system continues to operate even if there are a number of dropped messages or errors between these network nodes.
Now the CAP theorem states that any database product is only capable of delivering a maximum of two of these different factors. One reason for this is that if you imagine that you have a database with many different nodes, all of these are on a network, imagine if communication fails between some of the nodes or if any of the nodes fail. Well you have two choices if somebody reads from that database: you can cancel the operation and thus decrease the availability but ensure the consistency, or you can proceed with the operation and improve the availability but risk the consistency. So as I just mentioned, it's widely regarded as impossible to deliver a database platform which provides more than two of these three different elements.
So if you have a database system which has multiple nodes and if a network is involved, then you generally have a choice to provide either consistency or availability, and the transaction models of ACID and BASE choose different trade-offs. ACID focuses on consistency and BASE focuses on availability. Now there is some nuance here and some additional detail but this is a high-level introduction. I'm only covering what's essential to know for the exam.
So let's quickly step through the trade-offs which each of these makes and we're going to start off with ACID. ACID means that transactions are atomic, transactions are also consistent, transactions are also isolated, and then finally, transactions are durable. And let's get the exam power-up out of the way: generally if you see ACID mentioned, then it's probably referring to any of the RDS databases. These are generally ACID-based and ACID limits the ability of a database to scale and I want to step through some of the reasons why.
Now I'm going to keep this high-level but I've included some links attached to this lesson if you want to read about this in additional detail. In this lesson though I'm going to keep it to what is absolutely critical for the exam. So let's step through each of these individually.
Atomic means that for a transaction either all parts of a transaction are successful or none of the parts of a transaction are successful. Consider if you run a bank and you want to transfer $10 from account A to account B; that transaction will have two parts: part one will remove $10 from account A and part two will add $10 to account B. Now you don't want a situation where the first part or the second part of that transaction can succeed on its own and the other part can fail — either both parts of a transaction should be successful or no parts of the transaction should be applied and that's what atomic means.
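Here's a minimal sketch of that atomicity guarantee using Python's built-in sqlite3 module and a made-up accounts table; either both parts of the transfer are committed, or neither is.

```python
# A minimal sketch of atomicity using Python's built-in sqlite3 module and a
# made-up accounts table. Either both UPDATE statements commit, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE name = 'B'")
    conn.commit()        # both parts succeed together...
except sqlite3.Error:
    conn.rollback()      # ...or neither is applied

print(conn.execute("SELECT * FROM accounts").fetchall())  # [('A', 90), ('B', 10)]
```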
Now consistent means that transactions applied to the database move the database from one valid state to another — nothing in between is allowed. In databases such as relational databases there may well be links between tables where an item in one table must have a corresponding item in another where values might need to be in certain ranges, and this element just means that all transactions need to move the database from one valid state to another as per the rules of that database.
Isolated means that because transactions to a database are often executed in parallel, they must not interfere with each other; isolation ensures that concurrent executions of transactions leave the database in the same state that would have been obtained if the transactions were executed sequentially. So this is essential for a database to be able to run lots of different transactions at the same time, maybe from different applications or different users—each of them needs to execute in full, as it would do if it were the only transaction running on that database, without interfering with the others.
And then finally we have durable which means that once a transaction has been committed it will remain committed even in the case of a system failure — once the database tells the application that the transaction is complete and committed once it's succeeded that data is stored somewhere that system failure or power failure or the restart of a database server or node won't impact the data.
Now most relational database platforms use acid-based transactions — it's why financial institutions generally use them because it implements a very rigid form of managing data and transactions on that data, but because of these rigid rules it does limit scalability.
Now next we have BASE, and BASE stands for basically available, it also stands for soft state, and then lastly it stands for eventually consistent — and again this is super high level and I've included some links attached to this lesson with more information.
Now it's also going to sound like I'm making fun of this transaction model because some of these things seem fairly odd, but just stick with me and I'll explain all of the different components. Basically available means that read and write operations are available as much as possible, but without any consistency guarantees—so reads and writes are best effort, more of a "maybe" than a guarantee. Essentially, rather than enforcing immediate consistency, BASE-modeled NoSQL databases will ensure availability of data by spreading and replicating that data across all of the different nodes of that database. There isn't really an aim within the database to guarantee anything to do with consistency—it does its best to be consistent, but there's no guarantee.
Now soft state is another one which is a tiny bit laughable in a way — it means that BASE breaks off with the concept of a database which enforces its own consistency; instead it delegates that responsibility to developers. Your application needs to be aware of consistency and state and work around the database if you need immediate consistency — so if you need a read operation to always have access to all of the writes which occurred before it immediately, and if the database optionally allows it, then your application needs to specifically ask for it. Otherwise, your application has to tolerate the fact that what it reads might not be what another instance of that application has previously written — so with soft state databases your application needs to deal with the possibility that the data that you're reading isn't the same data that was written moments ago.
Now all of these are fairly fuzzy and do overlap, but lastly we have the fact that BASE does not enforce immediate consistency—consistency might happen eventually; if we wait long enough, then what we read will eventually match what was previously written.
Now this is important to understand because generally by default a BASE transaction model means that any reads to a database are eventually consistent — so applications do need to tolerate the fact that reads might not always have the data for previous writes. Many databases are capable of providing both eventually consistent and immediately consistent reads, but again the application has to have an awareness of this and explicitly ask the database for consistent reads.
Now it sounds like BASE transactions are pretty bad right? Well not really — databases which use BASE are actually highly scalable and can deliver really high performance because they don't have to worry about all the pesky annoying things like consistency within the database — they offload that to the applications.
Now DynamoDB within AWS is an example of a database which normally works in a BASE-like way — it offers both eventually and immediately consistent reads but your application has to be aware of that. Now DynamoDB also offers some additional features which offer ACID functionality such as DynamoDB transactions, so that's something else to keep in mind.
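As a small illustration of that application-side choice, here's a boto3 sketch against a hypothetical DynamoDB table called user-profiles with a partition key of user_id; by default the read is eventually consistent, and the application has to explicitly ask for a strongly consistent read.

```python
# A minimal sketch using boto3 and a hypothetical table "user-profiles" with a
# partition key of "user_id". Reads are eventually consistent by default;
# ConsistentRead=True explicitly requests a strongly consistent read.
import boto3

table = boto3.resource("dynamodb").Table("user-profiles")

table.put_item(Item={"user_id": "greg", "plan": "premium"})

eventual = table.get_item(Key={"user_id": "greg"})                      # may briefly lag the write
strong = table.get_item(Key={"user_id": "greg"}, ConsistentRead=True)   # reflects the write

print(strong.get("Item"))

# For ACID-style multi-item writes, DynamoDB transactions are available via
# transact_write_items on the low-level client.
```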
Now for the exam specifically I have a number of useful defaults: if you see the term BASE mentioned then you can safely assume that it means a NoSQL style database; if you see the term ACID mentioned then you can safely assume as a default that it means an RDS database — but if you see NoSQL or DynamoDB mentioned together with ACID then it might be referring to DynamoDB transactions and that's something to keep in mind.
Now that's everything I wanted to cover in this high-level lesson about the different transaction models. This topic is relatively theoretical and pretty deep and there's a lot of extra reading, but I just wanted to cover the essentials of what you need for the exam, so I've covered all of those facts in this lesson and at this point it is the end of the lesson — so thanks for watching, go ahead and complete the video, and then when you're ready I look forward to you joining me in the next.
learn.cantrill.io
Welcome back. This is part two of this lesson, and we're going to continue immediately from the end of part one, so let's get started.
Now, there are other types of database platforms—NoSQL platforms—and this doesn't represent one single way of doing things, so I want to quickly step through some of the common examples of NoSQL databases or non-relational databases.
The first type of database in the NoSQL category that I want to introduce is key-value databases, and the title gives away the structure: key-value databases consist of sets of keys and values. There's generally no concept of structure; it's just a list of keys and value pairs. In this case, it's a key-value database for one of the animals for life rescue centers, and it stores the date and time and a sensor reading from a feeding sensor, recording the number of cookies removed from the feeder during the previous 60 minutes. So essentially, the key on the left stores the date and time, and on the right is the number of cookies eaten as detected by the sensor during the previous 60 minutes.
So that's it for this type of database—it's nothing more complex than that—it's just a list of key-value pairs. As long as every single key is unique, then the value doesn't matter. It has no real schema, nor does it have any real structure, because there are no tables or table relationships. Some key-value databases allow you to create separate lists of keys and values and present them as tables, but they're only really used to divide data—there are no links between them.
This makes key-value databases really scalable because sections of this data could be split onto different servers. In general, key-value databases are just really fast; it's simple data with no structure, and there isn't much that gets in the way between giving the data to the database and it being written to disk. For key-value databases, only the key matters—you write a value to a key and you read a value from a key.
The value is opaque to the database; it could be text, it could be JSON, it could be a cat picture—it doesn't matter. In the exam, look out for any question scenarios which present simple requirements or mention data which is just names and values or pairs or keys and values—look out for questions which suggest no structure. If you see any of these types of scenarios, then key-value stores are generally really appropriate.
Key-value stores are also used for in-memory caching, so if you see any questions in the exam that talk about in-memory caching, then key-value stores are often the right way to go, and I'll be introducing some products later in the course which do provide in-memory key-value storage.
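As a tiny illustration, here's what that key-value interaction might look like using the redis-py client against a Redis-compatible in-memory store such as ElastiCache; the hostname is hypothetical, and the keys and values mirror the feeding-sensor example above.

```python
# A minimal sketch using redis-py against a Redis-compatible in-memory store
# (for example ElastiCache); the hostname is hypothetical.
# Each entry is just a unique key and an opaque value - nothing more.
import redis

r = redis.Redis(host="my-cache.example.internal", port=6379)

r.set("2024-01-15T13:00", 4)   # key: date/time, value: cookies eaten in the previous hour
r.set("2024-01-15T14:00", 7)

print(r.get("2024-01-15T13:00"))  # b'4'
```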
Okay, so let's move on, and the next type of database that I want to talk about is actually a variation of the previous model—a variation on key-value—and it's called a wide column store.
Now, this might look familiar to start with. Each row or item has one or more keys; generally, one of them is called the partition key, and then optionally, you can have additional keys as well as the partition key. In DynamoDB, which is an AWS example of this type of database, this secondary key is called the sort key or the range key. It differs depending on the database, but most examples of wide column stores generally have one key as a minimum, which is the partition key, and then optionally, every single row or item in that database can have additional keys.
Now that's really the only rigid part of a wide column store—every item in a table has to have the same key layout, so that's one key or more keys, and they just need to be unique to that table. Wide column stores offer groupings of items called tables, but they're still not the same type of tables as in relational database products—they're just groupings of data.
Every item in a table can also have attributes, but—and this is really important—they don't have to be the same between items. Remember how in relational database management systems, every table had attributes, and then every row in that table had to have a value for every one of those attributes? That is not the case for most NoSQL databases and specifically wide column stores—because that's what we're talking about now.
In fact, every item can have any attribute—it could have all of the attributes, so all of the same attributes between all of the items, it could have a mixture, so mix and matching attributes on different items, or an item could even have no attributes. There is no schema, no fixed structure on the attribute side—it’s normally partially opaque for most database operations.
The only thing that matters in a wide column store is that every item inside a table has to use the same key structure, and it has to have a unique key—so whether that's a single partition key or whether it's a composite key (a partition key and something else). If it's a single key, it has to be unique; if it's a composite key, the combination of both of those values has to be unique. That's the only rule for placing data into a table using wide column stores.
Now DynamoDB inside AWS is an example of this type of database, so DynamoDB is a wide column store. Now this type of database has many uses—it’s very fast, it’s super scalable, and as long as you don’t need to run relational operations such as SQL commands on the database, it often makes the perfect database product to take advantage of, which is one of the reasons why DynamoDB features so heavily amongst many web-scale or large-scale projects.
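Here's a minimal boto3 sketch of that flexibility, assuming a hypothetical DynamoDB table called sensor-data with a partition key of sensor_id and a sort key of timestamp; every item must supply both keys, but beyond that each item can carry completely different attributes.

```python
# A minimal sketch using boto3 and a hypothetical table "sensor-data" with a
# partition key "sensor_id" and a sort key "timestamp". Every item needs both
# keys, but the other attributes can differ completely between items.
import boto3

table = boto3.resource("dynamodb").Table("sensor-data")

table.put_item(Item={"sensor_id": "feeder-1", "timestamp": "2024-01-15T13:00",
                     "cookies_eaten": 4, "battery": "ok"})
table.put_item(Item={"sensor_id": "door-2", "timestamp": "2024-01-15T13:05",
                     "opened_by": "whiskers"})                                    # different attributes
table.put_item(Item={"sensor_id": "feeder-1", "timestamp": "2024-01-15T14:00"})   # keys only

# Reads address items by their key structure.
item = table.get_item(Key={"sensor_id": "feeder-1", "timestamp": "2024-01-15T13:00"})
print(item.get("Item"))
```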
Okay, so let's move on. And next I want to talk about a document database, and this is a type of NoSQL database that's designed to store and query data as documents. Documents are generally formatted using a structure such as JSON or XML, but often the structure can be different between documents in the same database.
You can think of a document database almost like an extension of a key-value store, where each document is interacted with via an ID that’s unique to that document, but the value—the document contents—are exposed to the database, allowing you to interact with it. Document databases work best for scenarios like order databases or collections or contact-style databases—situations where you generally interact with the data as a document.
Document databases are also great when you need to interact with deep attributes—so nested data items within a document structure. The document model works well with use cases such as catalogs, user profiles, and lots of different content management systems where each document is unique, but it changes over time, so it might have different versions; documents might be linked together in hierarchical structures or when you're linking different pieces of content in a content management system.
For any use cases like this, document-style databases are perfect. Each document has a unique ID, and the database has access to the structure inside the document. Document databases provide flexible indexing, so you can run really powerful queries against the data that could be nested deep inside a document.
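As a rough sketch of that document interaction, here's an example using the pymongo driver against a MongoDB-compatible document store (Amazon DocumentDB is one example); the connection string and data are hypothetical, and the point is the query on a nested attribute.

```python
# A minimal sketch using pymongo against a MongoDB-compatible document store;
# the connection string and data are hypothetical. The database can index and
# query attributes nested deep inside each document.
from pymongo import MongoClient

client = MongoClient("mongodb://user:password@docdb.example.internal:27017")
profiles = client["app"]["user_profiles"]

profiles.insert_one({
    "name": "Natalie",
    "profile": {"city": "Glasgow", "interests": ["cats", "databases"]},
})

# Query on a nested attribute using dot notation.
print(profiles.find_one({"profile.city": "Glasgow"}))
```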
Now let's move on. Column databases are the next type of database that I want to discuss, and understanding the power of these databases requires knowing the limitations of their counterpart—row-based databases—which is what most SQL-based databases use. Row-based databases are where you interact with data based on rows.
So in this example, we have an orders table; it has order ID, the product ordered, color, size, and price. For every order, we have a row, and those rows are stored on disk together. If you needed to read the price of one order from the database, you read the whole row from disk. If you don’t have indexes or shortcuts, you’ll have to find that row first, and that could mean scanning through rows and rows of data before you reach the one that you want to query.
Now if you want to do a query which operates over lots of rows—for example, you wanted to query all the sizes of every order—then you need to go through all of the rows, finding the size of each. Row-based databases are ideal when you operate on rows—creating a row, updating a row, or deleting rows. Row-based databases are often called OLTP or Online Transaction Processing Databases, and they are ideal, as the name suggests, for systems which are performing transactions—so order databases, contact databases, stock databases, things which deal in rows and items where these rows and items are constantly accessed, modified, and removed.
Now column-based databases handle things very differently. Instead of storing data in rows on disk, they store it based on columns. The data is the same, but it’s grouped together on disk based on column—so every order value is stored together, every product item, every color, size, and price, all grouped by the column that the data is in.
Now this means two things. First, it makes it very, very inefficient for transaction-style processing, which is generally operating on whole rows at a time. But this very same aspect makes column databases really good for reporting, so if your queries relate to just one particular column, because that whole column is stored on disk grouped together, then that’s really efficient.
You could perform a query to retrieve all products sold during a period, or perform a query which looks for all sizes sold in total ever and looks to build up some intelligence around which are sold most and which are sold least. With column store databases, it’s really efficient to do this style of querying—reporting style querying.
An example of a column-based database in AWS is Redshift, which is a data warehousing product, and that name gives it away. Generally what you’ll do is take the data from an OLTP database—a row-based database—and you’ll shift that into a column-based database when you’re wanting to perform reporting or analytics. So generally, column store databases are really well suited to reporting and analytics.
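To make that reporting style concrete, here's a minimal sketch of the kind of query involved; Redshift is PostgreSQL wire-compatible, so a driver such as psycopg2 can be used, and the connection details and orders table are hypothetical.

```python
# A minimal sketch of a reporting-style query; connection details and the
# "orders" table are hypothetical.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="awsuser", password="example-password")

with conn.cursor() as cur:
    # Touches only the "size" column, which is exactly what columnar storage is good at.
    cur.execute("SELECT size, COUNT(*) FROM orders GROUP BY size ORDER BY COUNT(*) DESC")
    for size, total in cur.fetchall():
        print(size, total)
```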
Now lastly I want to talk about graph-style databases. Earlier in the lesson I talked about tables and keys and how relational database systems handle the relationships by linking the keys of different tables. Well with graph databases, relationships between things are formally defined and stored in the database itself along with the data—they’re not calculated each and every time you run a query—and this makes them great for relationship-driven data, for example social media or HR systems.
Consider this data: three people, two companies, and a city—these are known as nodes inside a graph database. Nodes are nouns—so objects or things. Nodes can have properties, which are simple key-value pairs of data, and these are attached to the nodes.
So far, this looks very much like a normal database—nothing is new so far. But with graph databases, there are also relationships between the nodes, which are known as edges. Now these edges have a name and a direction—so Natalie works for XYZ Corp, and Greg works for both XYZ Corp and Acme Widgets. Relationships themselves can also have attached data—so name-value pairs. In this particular example, we might want to store the start date of any employment relationship.
A graph database can store a massive amount of complex relationships between data or between nodes inside a database—and that’s what’s key. These relationships are actually stored inside the database as well as the data. A query to pull up details on all employees of XYZ Corp would run much quicker than on a standard SQL database because that data of those relationships is just being pulled out of the database just like the actual data.
With a relational-style database, you'd have to retrieve the data, and the relationships between the tables are computed when you execute the query—so it's a really inefficient process with relational database systems, because those relationships have to be computed each and every time a query is run.
With a graph-based database, those relationships are fluid, dynamic—they’re stored in the database along with the data—and it means when you’re interacting with data and looking to take advantage of these fluid relationships, it’s much more efficient to use a graph-style database.
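As a simple illustration of the structure—not of any particular graph database product—here's a plain Python sketch of nodes with properties and named, directed edges which carry their own data; because the relationships are stored, finding everyone who works for XYZ Corp is just a lookup rather than a computed join.

```python
# A minimal, plain-Python sketch of the structure described above: nodes with
# properties, plus named, directed edges which can carry their own properties.
# The start dates are made up for illustration.
nodes = {
    "natalie": {"type": "person"},
    "greg": {"type": "person"},
    "xyz":  {"type": "company", "name": "XYZ Corp"},
    "acme": {"type": "company", "name": "Acme Widgets"},
}

edges = [
    {"from": "natalie", "to": "xyz",  "label": "WORKS_FOR", "start_date": "2020-02-01"},
    {"from": "greg",    "to": "xyz",  "label": "WORKS_FOR", "start_date": "2019-06-15"},
    {"from": "greg",    "to": "acme", "label": "WORKS_FOR", "start_date": "2021-09-01"},
]

# Who works for XYZ Corp? The relationships are already stored, so this is a lookup.
employees_of_xyz = [e["from"] for e in edges
                    if e["label"] == "WORKS_FOR" and e["to"] == "xyz"]
print(employees_of_xyz)  # ['natalie', 'greg']
```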
Now using graph databases is very much beyond the scope of this course, but I want you to be aware of it because you might see questions in the exam which mention the technology, and you need to be able to identify or eliminate answers based on the scenario—based on the type of database that the question is looking to implement.
So if you see mention of social media in an exam or systems with complex relationships, then you should think about graph databases first.
Now that's all I wanted to cover in this lesson. I know it's been abstract and high level—I wanted to try and make it as brief as possible. I know I didn’t really succeed because we had a lot to cover, but I want this to be a foundational set of theory that you can use throughout the databases section, and it will help you in the exam.
For now though, that's everything I wanted to cover in this lesson, so go ahead complete the video and when you're ready you can join me in the next.
learn.cantrill.io
Welcome back, and in this first technical lesson of this section of the course, I wanted to provide a quick fundamentals lesson on databases. If you already have database experience, then you can play me on super fast speed and think of this lesson as a good confirmation of the skills that you already have. If you don't have database experience, though, that's okay, as this lesson will introduce just enough knowledge to get you through the course, and I'll include additional reading material to get you up to speed with databases in general.
Now we do have a fair amount to get through so let's jump in and get started: databases are systems which store and manage data, but there are a number of different types of database systems and crucial differences between how data is physically stored on disk and how it's managed on disk and in memory, as well as how the systems retrieve data and present it to the user.
Database systems are very broadly split into relational and non-relational. Relational systems are often referred to as SQL or sequel systems, though strictly speaking this is wrong, because SQL—the Structured Query Language—is a language used to store, update and retrieve data, and it's a feature of most relational database platforms.
Strictly speaking, it's different than the term relational database management system, but most people use the two interchangeably, so if you see or hear the term SQL or RDBMS which is relational database management system, they're all referring to relational database platforms and most people use them interchangeably.
Now one of the key identifiable characteristics of relational database systems is that they have a structure to their data, so that's inside and between database tables and I'll cover that in a moment: the structure of a database table is known as a schema and with relational database systems it's fixed or rigid, meaning it's defined in advance before you put any data into the system.
A schema defines the names of things, valid values of things and the types of data which are stored and where; more importantly, with relational database systems there's also a fixed relationship between tables, so that's fixed and also defined in advance before any data is entered into the system.
Now NoSQL, on the other hand—well, let's start by making something clear: NoSQL isn't one single thing. NoSQL, as the name suggests, is everything which doesn't fit into the SQL mold, everything which isn't relational, but that represents a large set of alternative database models which I'll cover in this lesson.
One common major difference which applies to most NoSQL database models is that generally there is a much more relaxed concept of a schema—generally they all have weak schemas or no schemas at all—and relationships between tables are also handled very differently.
Both of these impact the situations that a particular model is right for and that's something that you need to understand at a high level for the exam and also when you're picking a database model for use in the real world.
Before I talk about the different database models, I want to visually give you an idea of how relational database management systems, known as RDBMSs or SQL systems, conceptualize the data that you store within them. Consider an example of a simple pet database where you have three humans, and for those three humans you want to record the pets that those humans are owned by.
The key component of any SQL based database system is a table, and every table has columns and these are known as attributes; the column has a name—its attribute name—and then within each row of that table each column has to have a value, and this is known as the attribute value.
So in this table, for example, the columns are fname (first name), lname (last name) and age, and then for each of the rows 1, 2 and 3, each of those columns has an attribute value.
Now generally the way that data is modeled in a relational database management system or a SQL database system is that data which relates together is stored within a table, so in this case all of the data on the humans is stored within one table.
Every row in the table has to be uniquely identifiable and so we define something that's known as a primary key, which is unique in the table and every row of that table has to have a unique value for this attribute.
So note in this table how every row has a unique value, 1, 2 and 3, for this primary key. Now with this database model we've also got a similar table for the animals, so we've got Whiskers and Woofy, and they also have a primary key that's been defined, which is the animal ID or AID.
And this primary key on this table also has to have a unique value in every row on the table, so in this case whiskers is animal ID 1 and woofy is animal ID 2.
Each table in a relational database management system can have different attributes, but for a particular table every row in that table needs to have a value stored for every attribute in that table, so see how the animals table has name and type whereas the human table has first name, last name and age.
But note how in both tables for every row every attribute has to have a value, and because SQL systems are relational we generally define relationships between the tables.
To model that relationship we add a third table, a join table, which makes it easy to have many-to-many relationships, so a human could have many animals and each animal can have many human minions.
A join table has what's known as a composite key, which is a key formed of two parts, and the two parts together have to be unique. So notice how the second and third rows have the same animal ID; that's fine because the human ID is different, and as long as the composite key in its entirety is unique, that's also fine.
Now the keys in different tables are how the relationships between the tables are defined, so in this example the human table has a relationship with the join table, and it allows each human to have multiple animals and each animal to have multiple humans.
In this example the animal ID of 2, which is Woofy, is linked to human IDs 2 and 3, which are Julie and James; they're both Woofy's minions because that doggo needs double the amount of tasty treats.
Now all these keys and the relationships are defined in advance and this is done using the schema—it's fixed and it's very difficult to change after the first piece of data goes in.
The fact that this schema is so fixed and has to be declared in advance makes it difficult for a SQL or relational system to store any data which has rapidly changing relationships, and a good example of this is a social network such as Facebook, where relationships change all the time.
So this is a simple example of a relational database system—it generally has multiple tables, a table stores data which is related, so humans and animals, tables have fixed schemas, they have attributes, they have rows, each row has a unique primary key value and has to contain some value for all of the attributes in the table, and in those tables they have relationships between each other which are also fixed and defined in advance.
So this is SQL; this is relational database modelling.
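If you want to see that same pet database expressed as actual code, here's a minimal sketch using Python's built-in sqlite3 module. The schema mirrors the tables described above, but the first human's name and all of the ages are made-up placeholder values.

```python
import sqlite3

# In-memory database purely for illustration; a full RDBMS such as MySQL or
# PostgreSQL works the same way conceptually.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Fixed schema, defined in advance, before any data goes into the system.
cur.executescript("""
CREATE TABLE human (
    hid   INTEGER PRIMARY KEY,   -- primary key: unique value per row
    fname TEXT NOT NULL,
    lname TEXT NOT NULL,
    age   INTEGER NOT NULL
);

CREATE TABLE animal (
    aid  INTEGER PRIMARY KEY,    -- primary key: unique value per row
    name TEXT NOT NULL,
    type TEXT NOT NULL
);

-- Join table: composite primary key, the two parts together must be unique.
CREATE TABLE human_animal (
    hid INTEGER NOT NULL REFERENCES human(hid),
    aid INTEGER NOT NULL REFERENCES animal(aid),
    PRIMARY KEY (hid, aid)
);
""")

cur.executemany("INSERT INTO human VALUES (?, ?, ?, ?)",
                [(1, "Bob", "Smith", 40),      # placeholder person
                 (2, "Julie", "Jones", 35),    # placeholder surname and age
                 (3, "James", "Bond", 50)])    # placeholder surname and age
cur.executemany("INSERT INTO animal VALUES (?, ?, ?)",
                [(1, "Whiskers", "Cat"), (2, "Woofy", "Dog")])
cur.executemany("INSERT INTO human_animal VALUES (?, ?)",
                [(1, 1), (2, 2), (3, 2)])      # Woofy (aid 2) has two human minions

# Which humans does Woofy own?
cur.execute("""
SELECT h.fname FROM human h
JOIN human_animal ha ON ha.hid = h.hid
WHERE ha.aid = 2
""")
print(cur.fetchall())   # -> Julie and James
```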
Okay so this is the end of part one of this lesson—it was getting a little bit on the long side and so I wanted to add a break; it's an opportunity just to take a rest or grab a coffee, and part two will be continuing immediately from the end of part one, so go ahead complete the video and when you're ready join me in part two.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk through how we implement DNSSEC using Route 53. Now if you haven't already watched my DNS and DNSSEC fundamentals video series you should pause this video and watch those before continuing. Assuming that you have let's jump in and get started.
Now you should be familiar with this architecture; this is how Route 53 works normally. In this example I'm using the animalsforlife.org domain, so a query against this would start with our laptop, go to a DNS resolver, then to the root servers looking for details of the .org top-level domain, and then it would go to the .org top-level domain name servers looking for animalsforlife.org, and then it would proceed to the four name servers which are hosting the animalsforlife.org zone using Route 53.
On the right-hand side here we have an AWS VPC using the plus two address, which is the Route 53 resolver, and those instances can query the animalsforlife.org domain from inside the VPC.
Now enabling DNSSEC on a Route 53 hosted zone is done from either the Route 53 console UI or the CLI, and once initiated the process starts with KMS. This part can either be done separately or as part of enabling DNSSEC signing for the hosted zone, but in either case, an asymmetric key pair is created within KMS, meaning a public part and a private part. Now you can think of these conceptually as the key signing keys or KSKs, but in actual fact, the KSK is created from these keys; these aren't the actual keys, but this is a nuance which isn't required at this level.
So these keys are used to create the public and private key signing keys which Route 53 uses, and these keys need to be in the US East 1 region—that's really important, so keep that in mind. Next, Route 53 creates the zone signing keys internally; this is really important to understand—both the creation and the management of the zone signing keys is handled internally within Route 53, and KMS isn't involved.
Next, Route 53 adds the key signing key and the zone signing key public parts into a DNS key record within the hosted zone, and this tells any DNSSEC resolvers which public keys to use to verify the signatures on any other records in this zone. Next, the private key signing key is used to sign those DNS key records and create the RRSIG DNS key record, and these signatures mean that any DNSSEC resolver can verify that the DNS key records are valid and unchanged.
Now at this point that's signing within the zone configured, which is step one. Next, Route 53 has to establish the chain of trust with the parent zone—the parent zone needs to add a DS record or delegated signer record, which is a hash of the public part of the key signing key for this zone, and so we need to make this happen.
Now how we do this depends on if the domain is registered via Route 53; if so, the registered domains area of the Route 53 console or the equivalent CLI command can be used to make this change, and Route 53 will liaise with the appropriate top-level domain and add the delegated signer record.
Now if we didn't register the domain using Route 53 and are instead just using it to host the zone, then we're going to need to perform this step manually. Once done, the top-level domain—in this case .org—will trust this domain via the delegated signer record, which, as I mentioned, is a hash of the domain's public key signing key, and the domain zone will sign all records within it either using the key signing or zone signing keys.
As part of enabling this, you should also make sure to configure CloudWatch alarms—specifically, create alarms for DNSSEC internal failure and DNSSEC key signing keys needing action, as both of these indicate a DNSSEC issue with the zone which needs to be resolved urgently, either an issue with the key signing key itself or a problem interacting with KMS.
Lastly, you might want to consider enabling DNSSEC validation for VPCs; this means for any DNSSEC enabled zones, if any records fail validation due to a mismatch signature or otherwise not being trusted, they won't be returned—this doesn't impact non-DNSSEC enabled zones, which will always return results, and this is how to work with the Route 53 implementation of DNSSEC.
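If you wanted to script the signing side of this rather than clicking through the console, a rough sketch with the AWS SDK for Python (boto3) might look like the following. The hosted zone ID is a placeholder, the KSK name is just illustrative, and note that in a real setup the KMS key policy would also need to allow the Route 53 DNSSEC service to use the key.

```python
import time
import boto3

# The key material for DNSSEC signing must be an asymmetric ECC key in us-east-1.
kms = boto3.client("kms", region_name="us-east-1")
route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"   # placeholder hosted zone ID

# 1. Create the asymmetric KMS key used for the key signing key (KSK).
#    In practice the key policy must also grant the Route 53 DNSSEC service
#    permission to use this key; that's omitted here for brevity.
key = kms.create_key(
    KeySpec="ECC_NIST_P256",
    KeyUsage="SIGN_VERIFY",
    Description="Key material for the hosted zone's DNSSEC KSK",
)
kms_arn = key["KeyMetadata"]["Arn"]

# 2. Create the KSK for the hosted zone from that KMS key.
route53.create_key_signing_key(
    CallerReference=str(time.time()),   # any unique string
    HostedZoneId=HOSTED_ZONE_ID,
    KeyManagementServiceArn=kms_arn,
    Name="A4L_KSK",                     # illustrative KSK name
    Status="ACTIVE",
)

# 3. Enable DNSSEC signing for the zone; Route 53 manages the ZSK internally.
route53.enable_hosted_zone_dnssec(HostedZoneId=HOSTED_ZONE_ID)

# 4. Check the signing status of the zone.
print(route53.get_dnssec(HostedZoneId=HOSTED_ZONE_ID)["Status"])
```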
What I wanted to do now is step you through an actual implementation of DNSSEC for a hosted zone within Route 53, and to do that we're going to need to move across to my AWS console. Okay, so now we're at the AWS console, and I'm going to step you through an example of enabling DNSSEC on a Route 53 domain, and to get started I'm going to make sure I'm in an AWS account where I have admin permissions.
In this case, I'm logged in as the IAM admin user, which is an IAM identity with admin permissions. As always, I'm going to make sure that I have the Northern Virginia region selected, and once I've done that, I'm going to go ahead and open Route 53 in a new tab.
In my case, it's already inside recently visited services; if it's not, you can just search for it in the search box at the top, but I'm going to go ahead and open Route 53. So I'm going to go to the Route 53 console and click on hosted zones, and in my case, I've got two hosted zones—animalsforlife.org and animalsforlife1337.org—and I'm going to go ahead and DNSSEC enable animalsforlife.org, so I'm going to go ahead and go inside this hosted zone.
Now if I just go ahead and move across to my command prompt and if I run this command—so dig animalsforlife.org DNSKEY +dnssec—this will query this domain attempting to look for any DNSKEY records using DNSSEC, and as you can see, there are no DNSSEC results returned, which is logical because this domain is not enabled for DNSSEC.
So moving back to this console, I'm going to click on DNSSEC signing under the domain and then click on enable DNSSEC signing. Now if this were a production domain, the order of these steps really matters, and you need to make sure that you wait for certain time periods before conducting each of these steps—specifically, you need to make sure that you're making changes taking into consideration the TTL values within your domain.
I'll include a link attached to this video which details all of these time-critical prerequisites that you need to consider before enabling DNSSEC signing. In my case, I don't need to worry about that because this is a fresh, empty domain.
The first thing we're going to do is create a key signing key, and as I mentioned earlier in this video, this is done using a KMS key. So the first thing I'm going to do is to specify a KSK name—a key signing key name—and I'm going to call it A4L-KSK for Animals for Life key signing key.
Next you'll need to decide which key to use within KMS to create this key signing key. Now unfortunately the user interface is a little bit inconsistent, because AWS have decided to rename CMKs to KMS keys, so you might see the interface looking slightly different when you're following along with this video.
Regardless, you need to create a KMS key, so check the box saying create customer managed CMK or create KMS key depending on what state the user interface is in, and you'll need to give a name to this key—again, this is creating an asymmetric KMS key—so I'm going to call it A4L-KSK-KMS-Key, and once I've done that, I can go ahead and click on create KSK and enable signing.
Now behind the scenes this is creating an asymmetric KMS key and using this to create the key signing key pair that this hosted zone is going to use. Now this part of the process can take a few minutes, and so I'm going to skip ahead until this part has completed.
Okay so that's completed, and that means that we now have an active key signing key within this hosted zone, and that means it's also created a zone signing key within this hosted zone.
If I go back to my terminal and I rerun this same command and press enter, you'll see that we still get the same empty results, and this can be because of caching—so I need to wait a few minutes before this will update. If I run it again now, we can see for the same query it now returns DNSKEY records—so one for 256 which represents the zone signing key (the public part of that key pair), and one for 257 which represents the key signing key (again the public part of that key pair), and then we have the corresponding RRSIG DNSKEY record which is a signature of these using the private key signing key.
So now internally we've got DNSSEC signing enabled for this hosted zone, and what we need to do now is create the chain of trust with the parent zone—in this case the .org top-level domain.
Now to do that, because I've also registered this domain using Route 53, I can do that from the registered domains area of the console, so I'll open that in a brand new tab. I'm going to go there, and then I'm going to go to the animalsforlife.org registered domain, and this is the area of the console where I can make changes, and Route 53 will liaise with the .org top-level domain and enter those changes into the .org zone.
Now the area that I'm specifically interested in is the DNSSEC status—currently this is set to disabled. What I'm going to do is to click on manage keys, and it's here where I can enter the public key—specifically, it's going to be the public key signing key of the animalsforlife.org zone—and I'm going to enter it so that it creates the delegated signer record in the .org domain which establishes this chain of trust.
So first I'm going to change the key type to KSK, then I'm going to go back to our hosted zone and I'm going to click on view information to create DS record, then I'm going to expand establish a chain of trust, and depending on what type of registrar you used you either need to follow the bottom instructions or these if you used Route 53—and I did, so I can go ahead and use the Route 53 registrar details.
Now the first thing you need to do is to make sure that you're using the correct signing algorithm—so it's ECDSAP256SHA256—so I'm going to go ahead and move back to the registered domains console, click on the algorithm drop-down, and select the matching signing algorithm: ECDSAP256SHA256.
Next I'll go back and I'll need to copy the public key into my clipboard—so remember a delegated signer record is just a hash of the public part of the key signing key—so this is what I'm copying into my clipboard, this is the public key of this key signing key.
So I'm going to go back and paste this in and then click on add. Now this is going to initiate a process where Route 53 are going to make the changes to the animalsforlife.org part of the .org top-level domain zone.
So specifically, in the .org top-level domain zone, there's going to be an entry for animalsforlife.org—by default, for normal DNS, this is going to contain name server records which delegate through to Route 53—what this process is going to do is also add a DS record which is a delegated signer record and it is going to contain a hash of this public key.
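If you'd rather pull these values programmatically instead of copying them from the console, a small boto3 sketch like this (the hosted zone ID is a placeholder) returns the key signing key details for the zone, including the public key you paste into the registered domains console and a ready-made DS record value for registrars that want the DS directly.

```python
import boto3

route53 = boto3.client("route53")

resp = route53.get_dnssec(HostedZoneId="Z0123456789EXAMPLE")  # placeholder zone ID

for ksk in resp["KeySigningKeys"]:
    print("Name:              ", ksk["Name"])
    print("Signing algorithm: ", ksk["SigningAlgorithmMnemonic"])  # e.g. ECDSAP256SHA256
    print("Key tag:           ", ksk["KeyTag"])
    print("Public key:        ", ksk["PublicKey"])   # value to give to the registrar
    print("DS record value:   ", ksk["DSRecord"])    # pre-computed DS for the parent zone
```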
So now we have an end-to-end chain of trust from the DNS root all the way through to the records in this hosted zone. This means any DNSSEC-enabled resolver can verify not just the zone's DNSKEY and RRSIG records, but also any individual records that you create and sign within the zone, ensuring integrity and authenticity from top to bottom.
That’s everything I wanted to cover in this video. I just wanted to give you a comprehensive overview of how to implement DNSSEC within Route 53—covering both the theoretical background and practical implementation steps to get it up and running. At this point, that’s the end of the video, so go ahead and complete the video, and when you’re ready, I’ll look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to cover Route 53 interoperability, and what I mean by that is using Route 53 to register domains or to host zone files when the other part of that is not with Route 53. Generally both of these things are performed together by Route 53 but it's possible for Route 53 just to do one or the other, so let's start by stepping through exactly what I mean.
When you register a domain using Route 53 it actually does two jobs at the same time, and while these two jobs are done together, conceptually they're two different things. Route 53 acts as a domain registrar and it provides domain hosting, so it can do both which is what happens initially when you register a domain, or it can be a domain registrar alone or it can host domains alone. It might only do one of them if, for example, you register a domain elsewhere and you want to use it with Route 53, and I want to take the time in this video to explain those edge case scenarios.
So let's quickly step through what happens when you register a domain using Route 53: first it accepts your money, the domain registration fee, which is a one-off fee, or more specifically a once-a-year or once-every-three-year fee, for actually registering the domain. Next it allocates four Route 53 DNS servers called name servers, then it creates a zone file which it hosts on the four name servers that I've just talked about, so that's the domain hosting part: allocating those servers and creating and hosting the zone file. If you hear me mention domain hosting, that's what it means.
Then once the domain hosting is sorted, Route 53 communicates with the registry for the specific top level domain that you're registering your domain within, so they have a relationship with the registry. So Route 53 is acting as the domain registrar, the company registering the domain on your behalf with the domain registry, and the domain registry is the company or entity responsible for the specific top level domain. So Route 53 gets the registry to add an entry for the domain, say for example animalsforlife.org, and inside this entry it adds four name server records and it points these records at the four name servers that I've just been talking about—this is the domain registrar part.
So the registrar registers the domain on your behalf, that's one duty, and then another entity provides DNS or domain hosting, and that's another duty. Often these are both provided by Route 53 but they don't have to be, so fix in your mind these two different parts: the registrar, which is the company who registers the domain on your behalf, and the domain hosting, which is how you add and manage records within hosted zones.
So let's step through this visually, looking at a few different options. First we have a traditional architecture where you register and host a domain using Route 53, so on the left conceptually we have the registrar role and this is within the registered domains area of the Route 53 console, and on the right we have the DNS hosting role and this is managed in the public hosted zone part of the Route 53 console.
So step one is to register a domain within Route 53; let's assume that it's the animalsforlife.org domain, so you liaise with Route 53 and you pay the fee required to register a domain, which is a per year or per three year fee. Now assuming nobody else has registered the domain before, the process continues: first the Route 53 registrar liaises with the Route 53 DNS hosting entity and it creates a public hosted zone, which allocates four Route 53 name servers to host that zone, and these are then returned to the registrar.
I want to keep driving home that conceptually the registrar and the hosting are separate functions of Route 53 because it makes everything easier to understand. Once the registrar has these four name servers, it passes all of this along through to the .org top level domain registry, and the registry is the manager of the .org top level domain zone file, and it's via this entity that records are created in the top level domain zone for the animalsforlife.org domain. So entries are added for our domain which point at the four name servers which are created and managed by Route 53, and that's how the domain becomes active on the public DNS system.
At this point we've paid once for the domain registration to the registrar, which is Route 53, and we also have to pay a monthly fee to host the domain, so the hosted zone, and with this architecture this is also paid to Route 53. So this is a traditional architecture and this is what you get if you register and host a domain using Route 53, and this is a default configuration. So when you register a domain, while you might see it as one step, it's actually two different steps done by two different conceptual entities: the registrar and the domain hoster, and it's important to distinguish between these two whenever you think about DNS.
But now let's have a look at two different configurations where we're not using Route 53 for both of these different components. This time Route 53 is acting as a registrar only, so we still pay Route 53 for the domain, they still liaise on our behalf with the registry for the top level domain, but this time a different entity is hosting the domain, so the zone file and the name servers, and let's assume for this example it's a company called Hover.
This architecture involves more manual steps because the registrar and the DNS hosting entity are separate, so as the DNS admin you would need to create a hosted zone. The third party provider would generally charge a monthly fee to host that zone on name servers that they manage, and you would need to get the details of those servers once it's been created and pass those details on to Route 53, and Route 53 would then liaise with the .org top level domain registry and set those name server records within the domain to point at the name servers managed in this case by Hover.
With this configuration, which I'll admit I don't see all that often in the wild, the domain is managed by Route 53 but the zone file and any records within it are managed by the third party domain hosting provider, in this case Hover. Now the reason why I don't think we see this all that often in the wild is the domain registrar functionality that Route 53 provides—it's nothing special. With this architecture you're not actually using Route 53 for domain hosting, and domain hosting is the part of Route 53 which adds most of the value. If anything, this is the worst way to manage domains.
Let's look at another more popular architecture which I see fairly often in the wild, and that's using Route 53 for domain hosting only. Now you might see this either when a business needs to register domains via a third party provider—maybe they have an existing business deal or business discount—or you might have domains which have already been historically registered with another provider and where you want to get the benefit that Route 53 DNS hosting provides.
With this architecture the domain is registered via a third party domain registrar, in this case Hover, so it's the registrar in this example who liaises with the top level domain registry, but we use Route 53 to host the domain. So at some point, either when the domain is being created or afterwards, we have to create a public hosted zone within Route 53; this creates the zone and the name servers to host the zone, obviously for a monthly fee.
So once this has been created we pass those details through to the registrar, who liaises with the registry for the top level domain, and then those name server records are added to the top level domain, meaning the hosted zone is now active on the public internet. Now it's possible to do this when registering the domain, so you could register the domain with Hover and immediately provide Route 53 name servers, or you might have a domain that's been registered years ago and you now want to use Route 53 for hosting and record management.
So you can use this architecture either while registering a domain or after the fact by creating the public hosted zone and then updating the name server records in the domain via the third party registrar and then the dot org registry. Now I know that this might seem complex, but if you just keep going back to basics and thinking about Route 53 as two things, then it's much easier.
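As a rough illustration of the hosting-only pattern, this boto3 sketch just creates the public hosted zone and prints the four allocated name servers, which you'd then enter at the third-party registrar; the domain name reuses the course example and the comment text is just illustrative.

```python
import time
import boto3

route53 = boto3.client("route53")

# Create the public hosted zone; Route 53 allocates four name servers for it.
zone = route53.create_hosted_zone(
    Name="animalsforlife.org",
    CallerReference=str(time.time()),   # any unique string
    HostedZoneConfig={"Comment": "Hosted in Route 53, registered elsewhere"},
)

# These four name servers are what you give to the third-party registrar
# (Hover in this example), who passes them on to the .org registry.
for ns in zone["DelegationSet"]["NameServers"]:
    print(ns)
```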
Route 53 offers a component which registers the domain, so this is the registrar, and it also offers a component which hosts the zone files and provides managed DNS name servers. Those are two different things: when you register a domain using Route 53 in the normal way, both of them are used, a hosted zone is created for you, and then via the registrar part its name servers are added to the domain's entry by the top level domain registry. If you see these as two completely different components, then it's easy to understand how you can use Route 53 for only one of them and a separate third-party company for the other.
Now generally I think Route 53 is one of the better DNS providers on the market, and so generally for my own domains I will use Route 53 for both the registrar and the domain hosting components, but depending on your architecture, depending on any legacy configuration, you might have a requirement to use different entities for these different parts, and that's especially important if you're a developer looking at writing applications that take advantage of DNS, or if you're an engineer looking to implement or fault find these type of architectures.
Now with that being said, that's everything I wanted to cover in this theory video—I just wanted to give you a brief overview of some of the different types of scenarios that you might find in more complex Route 53 situations. At this point, go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to talk about Geoproximity routing which is another routing policy available within Route 53. So let's just jump in and get started.
Geoproximity aims to provide records which are as close to your customers as possible. If you recall, latency based routing provides the record which has the lowest estimated latency between your customer and the region that the record is in. Geoproximity aims to calculate the distance between a customer and a record and answer with the lowest distance. Now it might seem similar to latency, but this routing policy works on distance and also provides a few key benefits which I'll talk about in this video.
When using Geoproximity, you define rules—so you define the region that a resource is created in if it's an AWS resource, or provide the latitude and longitude coordinates if it's an external resource. You also define a bias, but more on that in a second.
Let's say that you have three resources: one in America, one in the UK, and one in Australia. Well, we can define rules which mean that requests are routed to those resources. If these were resources in AWS, we could define the region that the resources were located in—so maybe US East 1 or AP South East 2. If the resources were external, so non-AWS resources, we could define their location based on coordinates, but in either case Route 53 knows the location of these resources. It also knows the location of the customers making the requests, and so it will direct those requests at the closest resource.
Now we're always going to have some situations where customers in countries without any resources are using our systems—in this case Saudi Arabia, which is over 10,000 kilometers away from Australia and about 6,700 kilometers away from the UK. Under normal circumstances, this would mean that the UK resource would be returned for any users in Saudi Arabia. What geo proximity allows us to do though is to define a bias, so rather than just using the actual physical distance, we can adjust how Route 53 handles the calculation.
We can define a plus or minus bias. So for example, with the UK we might define a plus bias, meaning the effective area of service for the UK resource is increased larger than it otherwise would be. And we could do the same for the Australian resource but maybe providing a much larger plus bias. Now routing is distance based but it includes this bias, so in this case we can influence Route 53 so that customers from Saudi Arabia are routed to the Australian resource rather than the UK one.
Geo proximity routing lets Route 53 route traffic to your resources based on the geographic location of your users and your resources, but you can optionally choose to route more traffic or less traffic to a given resource by specifying a value. The value is called a bias. A bias expands or shrinks the size of a geographic region that is used for traffic to be routed to, so even in the example of the UK where it's just a single relatively small country, by adding a plus bias we can effectively make the size larger so that more surrounding countries route towards that resource.
In the case of Australia, by adding an even larger bias, we can make it so that countries even in the Middle East route towards Australia rather than the closer resource in the UK. So geo proximity routing is a really flexible routing type that not only allows you to control routing decisions based on the locations of your users and resources, it also allows you to place a bias on these rules to influence those routing decisions.
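As a rough sketch only, and assuming a recent enough SDK, because geoproximity on ordinary record sets is a newer capability that previously required Traffic Flow policies, this is approximately how the three records and their biases could be defined with boto3. The hosted zone ID, IP addresses, coordinates and bias values are all placeholders.

```python
import boto3

route53 = boto3.client("route53")

def geoproximity_record(set_id, ip, region=None, coordinates=None, bias=0):
    """Build a geoproximity A record change for www.animalsforlife.org."""
    location = {"Bias": bias}
    if region:                        # AWS-hosted resource: just name the region
        location["AWSRegion"] = region
    if coordinates:                   # external resource: latitude/longitude strings
        location["Coordinates"] = coordinates
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "SetIdentifier": set_id,
            "GeoProximityLocation": location,
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        geoproximity_record("america", "1.1.1.1", region="us-east-1"),
        geoproximity_record("uk", "2.2.2.2",
                            coordinates={"Latitude": "51.50", "Longitude": "-0.12"},
                            bias=10),    # small plus bias for the UK
        geoproximity_record("australia", "3.3.3.3",
                            region="ap-southeast-2",
                            bias=50),    # much larger plus bias for Australia
    ]},
)
```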
So this is a really important one to understand and it will come in really handy for a certain set of use cases. Now thanks for watching. That's everything that I wanted to cover in this video. Go ahead and complete the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to talk about geolocation routing which is another routing policy available within Route 53. Now this is going to be a pretty brief video so let's jump in and get started.
In many ways geolocation routing is similar to latency, only instead of latency, the location of customers and the location of resources are used to influence resolution decisions. With geolocation routing, when you create records you tag the records with the location. Now this location is generally a country, so using ISO standard country codes, it can be continents—again using ISO continent codes such as SA for South America in this case—or records can be tagged with default. Now there's a fourth type which is known as a subdivision; in America you can tag records with the state that the record belongs to.
Now when a user is making a resolution request, an IP check verifies the location of the user. Depending on the DNS system, this can be the user directly or the resolver server, but in most cases these are one and the same in terms of the user's location. So we have the location of the user and we have the location of the records. What happens next is important because geolocation doesn't return the closest record, it only returns relevant records.
When a resolution request happens, Route 53 takes the location of the user and it starts checking for any matching records. First, if the user doing the resolution request is based in the US, then it checks the state of the user and it tries to match any records which have a state allocated to them. If any records match, they're returned and the process stops. If no state records match, then it checks the country of the user. If any records are tagged with that country, then they're returned and the process stops. Then it checks the continent; if any records match the continent that the user is based in, then they're returned and the process stops.
Now you can also define a default record which is returned if no record is relevant for that user. If nothing matches though—so there are no records that match the user's location and there's no default record—then a no answer is returned. So to stress again, this type of routing policy does not return the closest record, it only returns any which are applicable or the default, or it returns no answer.
So geolocation is ideal if you want to restrict content—for example, providing content for the US market only. If you want to do that, then you can create a US record and only people located in the US will receive that record as a response for any queries. You can also use this policy type to provide language specific content or to load balance across regional endpoints based on customer location.
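As a rough boto3 sketch (the hosted zone ID and IP addresses are placeholders), this is approximately how records tagged with a subdivision, a country, a continent and a default would be created.

```python
import boto3

route53 = boto3.client("route53")

def geo_record(set_id, geolocation, ip):
    # One geolocation record with the same name, tagged with a location.
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "SetIdentifier": set_id,
            "GeoLocation": geolocation,
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        geo_record("texas", {"CountryCode": "US", "SubdivisionCode": "TX"}, "1.1.1.1"),
        geo_record("us-only", {"CountryCode": "US"}, "2.2.2.2"),
        geo_record("south-america", {"ContinentCode": "SA"}, "3.3.3.3"),
        geo_record("default", {"CountryCode": "*"}, "4.4.4.4"),   # optional catch-all
    ]},
)
```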
Now one last time, because this is really important for the exam and for real world usage: this routing policy type is not about the closest record—geolocation returns relevant locations only. You will not get a Canadian record returned if you're based in the UK and no closer records exist. The smallest type of record is a subdivision which is a US state, then you have country, then you have continent, and finally optionally a default record. Use the geolocation routing policy if you want to route traffic based on the location of your customers.
Now it's important that you understand—which is why I've stressed this so much—that geolocation isn't about proximity, it's about location. You only have records returned if the location is relevant. So if you're based in the US but are based in a different state than a record, you won't get that record. If you're based in the US and there is a record which is tagged as the US as a country, then you will get that record returned. If there isn't a country specific record but there is one for the continent that you're in, you'll get that record returned, and then the default is a catchall. It's optional; if you choose to add it, then it's returned if your user is in a location where you don't have a specific record tagged to that location.
Now that's everything that I wanted to cover in this video. Thanks for watching. Go ahead and complete the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to talk about latency based routing which is yet another routing policy available within Route 53. So let's jump in and get started.
Latency based routing should be used when you're trying to optimize for performance and user experience, when you want Route 53 to return records which can provide better performance. So how does it work? Well it starts with a hosted zone within Route 53 and some records with the same name. So in this case www, three of those records, they're A records and so they point at IP addresses. In addition, for each of the records you can specify a record region—so US East 1, US West 1, and AP Southeast 2 in this example. Latency based routing supports one record with the same name for each AWS region. The idea is that you're specifying the region where the infrastructure for that record is located.
Now in the background AWS maintains a database of latencies between different regions of the world. So when a user makes a resolution request it will know that that user is in Australia in this example. It does this by using an IP lookup service, and because it has a database of latencies, it will know that a user in Australia will have a certain latency to US East 1, a certain latency to US West 1, and hopefully the lowest latency to a record which is tagged to be in the Asia Pacific region—so AP Southeast 2. So that record is selected and it's returned to the user and used to connect to resources.
Latency based routing can also be combined with health checks. If a record is unhealthy, then the next lowest latency is returned to the client making the resolution request. This type of routing policy is designed to improve performance for global applications by directing traffic towards infrastructure with the best, so lowest latency, for users accessing that application.
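As a rough boto3 sketch (placeholder hosted zone ID and IP addresses), this is approximately how the three latency records from the example would be created, one per region with the same name.

```python
import boto3

route53 = boto3.client("route53")

def latency_record(region, ip):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "SetIdentifier": region,          # unique identifier per record of the same name
            "Region": region,                 # region the record's infrastructure lives in
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
            # "HealthCheckId": "...",         # optional: skip unhealthy records
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        latency_record("us-east-1", "1.1.1.1"),
        latency_record("us-west-1", "2.2.2.2"),
        latency_record("ap-southeast-2", "3.3.3.3"),
    ]},
)
```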
It's worth noting though that the database which AWS maintain isn't real time. It's updated in the background and doesn't really account for any local networking issues, but it's better than nothing and can significantly help with performance of your applications.
Now that's all of the theory that I wanted to cover about latency based routing, so go ahead and complete the video, and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this video I want to talk about weighted routing, which is another routing policy available within Route 53. So let's jump in and get started straight away.
Weighted routing can be used when you're looking for a simple form of load balancing or when you want to test new versions of software. Like all other types of routing policy, it starts with a hosted zone and in this hosted zone— you guessed it—records. In this case, three www records. Now these are all A records and so they point at IP addresses, and let's assume that these are three EC2 instances.
With weighted routing, you're able to specify a weight for each record and this is called the record weight. Let's assume 40 for the top record, 40 for the middle, and 20 for the record at the bottom. Now how this record weight works is that for a given name—www in this case—the total weight is calculated. So 40 plus 40 plus 20 for a total of 100. Each record then gets returned based on its weighting versus the total weight. So in this example, it means that the top record is returned 40% of the time, the middle also 40% of the time, and the bottom record gets returned 20% of the time.
Setting a record weight to zero means that it's never returned, so you can do this if temporarily you don't want a particular record to be returned—unless all of the records are set to zero, in which case they're all returned. So any of the records with the same name are returned based on its weight versus the total weight. Now I've kept this example simple by using record weights that total 100 so it makes it easy to view them as percentages, but the same formula is used regardless. An individual record is returned based on its weight versus the total weight.
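As a rough boto3 sketch (placeholder hosted zone ID and IP addresses), here's approximately how the 40/40/20 example would look.

```python
import boto3

route53 = boto3.client("route53")

def weighted_record(set_id, weight, ip):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "SetIdentifier": set_id,   # must be unique per record of the same name
            "Weight": weight,          # returned weight / total weight of the time
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        weighted_record("server-1", 40, "1.1.1.1"),   # returned roughly 40% of the time
        weighted_record("server-2", 40, "2.2.2.2"),   # returned roughly 40% of the time
        weighted_record("server-3", 20, "3.3.3.3"),   # returned roughly 20% of the time
    ]},
)
```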
Now you can combine weighted routing with health checks, and if you do so, when a record is selected based on the above weight calculation, if that record is unhealthy then the process repeats. It's skipped over until a healthy record is selected and then that one's returned. Health checks don't remove records from the calculation and so don't adjust the total weight. The process is followed normally, but if an unhealthy record is selected to be returned, it's just skipped over and the process repeats until a healthy record is selected.
Now weighted routing, as I mentioned at the start, is great for very simple load balancing or when you want to test new software versions. If you want to have 5% of resolution requests go to a particular server which is running a new version of Catergram, then you have that option. So weighted routing is really useful when you have a group of records with the same name and want to control the distribution—so the amount of time that each of them is returned in response to queries.
Now that's everything I wanted to cover in this video, so go ahead, finish the video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this video, I want to talk about multivalue routing, which is another routing policy available within Route 53. So let's jump in and get started.
Multivalue routing, in many ways, is like a mixture between simple and failover, taking the benefits of each and merging them into one routing policy. With multivalue routing, we start with a hosted zone, and you can actually create many records all with the same name. In this case, we have three www records, and each of those records in this example is an A record, which maps onto an IP address.
Each of the records, when using this routing type, can have an associated health check. When queried, up to eight healthy records are returned to the client. If you have more than eight records, then eight are selected at random. At that point, the client picks one of those values and uses it to connect to the resource.
Because each of the records is health checked, any of the records which fail the check—such as the bottom record in this example—won’t be returned to the client and won’t be selected when connecting to resources. This helps improve reliability and ensures that only healthy endpoints are used.
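As a rough boto3 sketch (the hosted zone ID, IP addresses and health check IDs are placeholders), this is approximately how three multivalue records with the same name would be created.

```python
import boto3

route53 = boto3.client("route53")

def multivalue_record(set_id, ip, health_check_id):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "SetIdentifier": set_id,
            "MultiValueAnswer": True,
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,   # unhealthy records aren't returned
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        multivalue_record("web-1", "1.1.1.1", "11111111-1111-1111-1111-111111111111"),
        multivalue_record("web-2", "2.2.2.2", "22222222-2222-2222-2222-222222222222"),
        multivalue_record("web-3", "3.3.3.3", "33333333-3333-3333-3333-333333333333"),
    ]},
)
```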
Multivalue routing aims to improve availability by allowing a more active-active approach to DNS. You can use it if you have multiple resources which can all service requests and you want to select one at random. It’s not a substitute for a load balancer, which handles the actual connection process from a network perspective, but the ability to return multiple health-checkable IP addresses is a way to use DNS to improve the availability of an application.
To summarize: simple routing has no health checks and is generally used for a single resource such as a web server. Failover is used for active-backup architectures, commonly with an S3 bucket as a backup. Multivalue is used when you have many resources that can all handle requests and you want them all health checked and returned at random. Any healthy records will be returned to the client, and if you have more than eight, they’ll be returned randomly.
OK, so that’s everything for this type of routing policy. Go ahead and complete the video when you’re ready, and I’ll look forward to you joining me in the next video.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to talk about the second Route 53 routing policy that I'm going to be covering in this series of videos, and that's failover routing. Now let's just jump in and get started straight away.
With failover routing we start with a hosted zone, and inside this hosted zone a www record. However, with failover routing we can add multiple records of the same name: a primary and a secondary.
Each of these records points at a resource, and a common example is an out-of-band failure architecture where you have a primary application endpoint such as an EC2 instance and a backup or failover resource using a different service such as an S3 bucket.
The key element to failover routing is the inclusion of a health check. The health check generally occurs on the primary record. If the primary record is healthy, then any queries to www in this case resolve to the value of the primary record, which is the EC2 instance running Catergram in this example. If the primary record fails its health check, then the secondary value of the same name is returned, in this case the S3 bucket.
The use case for failover routing is simple: use it when you need to configure active-passive failover, where you want to route traffic to a resource when that resource is healthy, or to a different resource when the original resource is failing its health check.
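As a rough boto3 sketch (the hosted zone ID, IP addresses and health check ID are placeholders, and the secondary is shown as a plain A record for simplicity), this is approximately how a primary and secondary pair would be created.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [
        {   # Primary: the EC2 instance, health checked.
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.animalsforlife.org",
                "Type": "A",
                "SetIdentifier": "primary",
                "Failover": "PRIMARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": "1.1.1.1"}],
                "HealthCheckId": "11111111-1111-1111-1111-111111111111",
            },
        },
        {   # Secondary: only returned if the primary's health check fails.
            # In the lesson's architecture this would usually be an alias record
            # pointing at an S3 static website rather than a plain A record.
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.animalsforlife.org",
                "Type": "A",
                "SetIdentifier": "secondary",
                "Failover": "SECONDARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": "2.2.2.2"}],
            },
        },
    ]},
)
```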
Now this is a fairly simple concept that you'll be experiencing yourself in a demo video which is coming up very soon but at this point that's everything that I wanted to cover in this video.
So go ahead complete the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this video, I want to cover the Health Check feature within Route 53. Health checks support many of the advanced architectures of Route 53, and so it's essential that you understand how they work as an architect, developer, or engineer. So let's jump in and get started.
First, let's quickly step through some high-level concepts of Health checks. Health checks are separate from but are used by records inside Route 53. You don't create the checks within records. Health checks exist separately. You configure them separately. They evaluate something's health, and they can be used by records within Route 53.
Health checks are performed by a fleet of health checkers, which are distributed globally. This means that if you're checking the health of systems which are hosted on the public internet, then you need to allow these checks to occur from the health checkers. If you think they're bots or exploit attempts and block them, then it will cause false alarms. Health checks, as I just indicated, are not just limited to AWS targets. You can check anything which is accessible over the public internet. It just needs an IP address.
The checks occur every 30 seconds by default, or this can be increased to every 10 seconds at an additional cost. The checks can be TCP checks, where Route 53 tries to establish a TCP connection with the endpoint, and this needs to be successful within 10 seconds. You can have HTTP checks, where Route 53 must be able to establish a TCP connection with the endpoint within 4 seconds, and in addition, the endpoint must respond with an HTTP status code in the 200 range or 300 range within 2 seconds after connecting. And this is more accurate for web applications than a simple TCP check.
And finally, with HTTP and HTTPS checks, you can also perform string matching. Route 53 must be able to establish a TCP connection with the endpoint within 4 seconds, and the endpoint must respond with an HTTP status code in the 200 or 300 range within 2 seconds, and Route 53 health checker, when it receives the status code, it must also receive the response body from the endpoint within the next 2 seconds. Route 53 searches the response body for the string that you specify. The string must appear entirely in the first 5,120 bytes of the response body, or the endpoint fails the health check. This is the most accurate because not only do you check that the application is responding using HTTP or HTTPS, but you can also check the content of that response versus what the application should do in normal circumstances.
Based on these health checks, an endpoint is either healthy or unhealthy, and it moves between those states based on the checks conducted. Now lastly, the checks themselves can be one of three types. You can have endpoint checks, which assess the health of an actual endpoint that you specify. You can use CloudWatch alarm checks, which react to CloudWatch alarms that can be configured separately and can involve some detailed in-OS or in-app tests if you use the CloudWatch agent, which we cover elsewhere in the course. Finally, checks can be what's known as calculated checks, so checks of other checks, meaning you can create health checks which assess application-wide health made up of lots of individual components.
Now, you're going to get the opportunity to actually implement a health check in a demo lesson, which is coming up very shortly in this section of the course. But what I want to do before that is to just give you an overview of exactly how the console looks when you're creating a health check. So let's move across to the console.
Okay, so we're at the AWS console, logged in to the general account in the Northern Virginia region. So to create a health check, we need to move to the Route 53 console, so I'm going to go ahead and do that. Remember how earlier in the theory component of this lesson, I mentioned how health checks are created externally from records? So rather than going into a hosted zone, selecting a record, and configuring a health check there, to create a health check, we go to the menu on the left and click on health checks. Then we'll click on create health check, and this is where we enter the information required to create the health check.
First, we need to give it a name, so let's just say that we use the example of test health check. I mentioned that there are three different types of health checks. We've got an endpoint health check, and this checks the health of the particular endpoint. We can use the status of other health checks, so this is a calculated health check, and as I mentioned, this allows you to create a health check which monitors the application as a whole and involves the health status of individual application components, and then finally we can use the status of a CloudWatch alarm to form the basis of this health check.
If we select endpoint for now, then you're able to pick either IP address or domain name. So you can specify the domain name of an application endpoint or you can use an IP address. If you pick domain name, then what this configures is that all of the Route 53 health checkers will resolve this domain name first and then perform a health check on the resulting IP address.
Now, in either case, you've got the option of either picking TCP, which does a simple TCP check, in which case you need to specify either the IPv4 or IPv6 address together with a port number. If you choose to use the more extensive HTTP or HTTPS health check, then you're asked to specify the same IP address and port number, so that will be used to establish the TCP connection. You can also specify the host name, and if you specify that, it will pass this value to the endpoint as a host header. So if you've got lots of different virtual hosts configured, then this is how you can specify a particular host that the website should deliver.
You're also able, because this is HTTP, to specify a path to use for this health check: either the root path or a particular path to check. If you change this to HTTPS, then all of this information is the same, only this time it will use secure HTTP rather than normal HTTP.
Now, if we scroll down and expand advanced configuration, it's here where you can select the request interval, so the default is every 30 seconds, or you can specify fast and have the checks occur every 10 seconds. Now, this is a check every 10 seconds from every health checker involved within this health check, so the actual frequency of the health checks occurring on the endpoint will be much more frequent. This is one check every 10 seconds from every health checker.
You can specify the failure threshold, so this is the number of consecutive health checks that an endpoint must pass or fail for Route 53 to change the current status. So if you want to allocate a buffer and allow for the opportunity of the odd fail check not to influence the health state, then you can specify a suitable value in this box. It's here where you can specify a simple check, so HTTP or HTTPS, or you can elect to use string matching to do more rich checks of application health. So if you know that your application should deliver a certain string in the request body, then you can specify that here.
Now, you can also configure a number of advanced options. One of them is the latency graph, so you can show the latency of the checks against this endpoint. You can invert the health check status, so if the health check of an application is unhealthy, you can invert it to healthy and vice versa. So this is a fairly situational option that I haven't found much use for.
You also have the option of disabling the health check. This might be useful if you're performing temporary maintenance on an application, and if you check this box, then even if the application endpoint reports as unhealthy, it's considered healthy. You also get the option of specifying the health checker regions. You can use the recommended suggestion, and the health checkers will come from these locations, or you can select customize and pick the particular regions that you want to use. In most cases, you would use the recommended options.
Now, let's just go ahead and enter some sample values here: I'm going to use 1.1.1.1, I'm going to leave the host name blank, I'm going to set the port number to 80, and then I'll scroll down and just enter a search string. Again, we're not actually going to create this, so just enter a placeholder, click on next, and it's here where you can configure what happens when the health check fails.
Now, this is completely optional. We can use health checks within resource records only; we don't have to configure any notification, but if we do want to configure a notification, then we can create an alarm, and we can send this to either an existing or new SNS topic, and this is a method of how we can integrate this with other systems so we can have other AWS services configured to respond to notifications on this topic or we could integrate external systems so that when a health check fails, external action is taken. But this is what I wanted to show you. I just wanted to give you an overview of how it looks creating a health check within the console UI.
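If you'd rather create the same sort of health check programmatically, a rough boto3 sketch using the same sample values might look like this; the search string is just a placeholder.

```python
import time
import boto3

route53 = boto3.client("route53")

check = route53.create_health_check(
    CallerReference=str(time.time()),   # any unique string
    HealthCheckConfig={
        "Type": "HTTP_STR_MATCH",       # HTTP check with string matching
        "IPAddress": "1.1.1.1",
        "Port": 80,
        "ResourcePath": "/",            # root path
        "SearchString": "Catergram is up",  # placeholder string expected in the body
        "RequestInterval": 30,          # standard; 10 = fast, at additional cost
        "FailureThreshold": 3,          # consecutive failures before the status changes
    },
)
print(check["HealthCheck"]["Id"])       # reference this ID from records
```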
Now, don't worry, you're actually going to be doing this in a demo lesson, which is coming up elsewhere in this section, but I wanted to give you that initial exposure to how the console looks when creating a health check. At this point, let's go ahead and finish up the theory component of this lesson by returning to the architecture. Now you've seen how a health check is created architecturally, health checks look something like this: let's assume that somewhere near the UK we have an application Catergram, and we point a Route 53 record at this application, so let's assume that this is Catergram.io. What we can do is to associate a health check with this resource record, and doing so means that our application will be health checked by a globally distributed set of health checkers. So each of these health checkers performs a periodic check of our application, and based on this check, they report the resource as healthy or unhealthy.
If more than 18 percent of the health checkers report as healthy, then the health check overall is healthy, otherwise it's reported as unhealthy, and in most cases, records which are unhealthy are not returned in response to queries. Now, you're going to see throughout this section of the course and the wider course itself how health checks can be used to influence how DNS responds to queries and how applications can react to component failure. So Route 53 is an essential design and operational tool that you can use to influence how resolution requests occur and how they're routed through to your various different application components, and so understanding health checks is essential to be able to design Route 53 infrastructure, integrate this with your applications, and then manage it day to day as an operational engineer. So it's really important that you understand this topic end to end, no matter which stream of the AWS certifications that you're currently studying for.
Now, that's everything that I wanted to cover in this video. Go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this video I want to cover the first of a range of routing policies available within Route 53. We're going to start with the default, and as the name suggests, it's the simplest. This video is going to be pretty quick, so let's jump in and get started straight away.
Simple routing starts with a hosted zone. Let's assume it's a public hosted zone called animalsforlife.org. With simple routing, you can create one record per name. In this example, WWW, which is an A record type. Each record using simple routing can have multiple values, which are part of that same record.
When a client makes a request to resolve WWW and simple routing is used, all of the values are returned in the same query in a random order. The client chooses one of the values and then connects to that server based on the value, in this case, 1.2.3.4. Simple routing is simple, and you should use it when you want to route requests towards one single service, in this example, a web server.
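As a rough boto3 sketch (placeholder hosted zone ID and IP addresses), a simple routing record with several values looks approximately like this.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",   # placeholder
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.animalsforlife.org",
            "Type": "A",
            "TTL": 60,
            # One record, several values; all are returned in a random order and the
            # client picks one. No health checks are involved with simple routing.
            "ResourceRecords": [
                {"Value": "1.2.3.4"},
                {"Value": "1.2.3.5"},
                {"Value": "1.2.3.6"},
            ],
        },
    }]},
)
```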
The limitation of simple routing is that it doesn't support health checks, and I'll be covering what health checks are in the next video. But just remember, with simple routing, there are no checks that the resource being pointed at by the record is actually operational, and that's important to understand because all of the other routing types within Route 53 offer some form of health checking and routing intelligence based on those health checks.
Simple routing is the one type of routing policy which doesn't support health checks, and so it is fairly limited, but it is simple to implement and manage. So that's simple routing; again, as the name suggests, it's simple, it's not all that flexible, and it doesn't really offer any exciting features, but don't worry, I'll be covering some advanced routing types over the coming videos.
For now, just go ahead and complete this video, and then when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back. In this video I want to quickly step through a topic which confuses people who are new to DNS and Route 53, and that's the difference between CNAME records and alias records. Now I've seen exam questions which test your understanding of when to use one versus the other, so let's quickly go through the key things which you need to know.
Now let's start by describing the problem that we have if we only use CNAMEs. In DNS, an A record maps a name to an IP address, for example the name Categor.io to the IP address 1.3.3.7; by now that should make sense. A CNAME, on the other hand, maps a name to another name, so if you had the above A record for Categor.io then you could create a CNAME record for www.categor.io pointing at Categor.io; it's a way to create an alternative name for something within DNS.
The problem is that you can't use a CNAME for the apex of a domain, also known as the naked domain, so you couldn't have a CNAME record for Categor.io pointing at something else; it just isn't supported within the DNS standard. Now this is a problem because many AWS services such as Elastic Load Balancers don't give you an IP address to use, they give you a DNS name, and this means that if you only use CNAMEs, pointing the naked Categor.io at an Elastic Load Balancer wouldn't be supported.
You could point www.categor.io at an Elastic Load Balancer, because using a CNAME for a normal DNS record is fine, but you can't use a CNAME for the domain apex, also known as the naked domain, and this is the problem which alias records fix. So for anything that's not the naked domain, where you want to point a name at another name, CNAME records are fine; they might not be optimal, as I'll talk about in a second, but they will work. For the naked domain, known as the apex of a domain, if you need to point at another name such as an Elastic Load Balancer, you can't use CNAMEs.
But let's go through the solution: alias records. An alias record generally maps a name onto an AWS resource; it has other functions, but at this level let's focus on the AWS resource part. Alias records can be used for both the naked domain, known as the domain apex, and for normal records. For normal records such as www.categor.io, you could use CNAMEs or alias records in most cases, but for naked domains, known as the domain apex, you have to use alias records if you want to point at AWS resources.
For AWS resources, AWS tries to encourage you to use alias records, and they do this by making queries free where an alias record points at an AWS resource, so generally, in most production situations and for the exam, default to picking alias records for anything in a domain where you're pointing at AWS resources.
Now an alias is actually a subtype: you can have an A record alias and a CNAME record alias, and this is confusing at first, but the way I think about this is that both of them are alias records, and you need to match the alias type with the type of the record you're pointing at. So take the example of an Elastic Load Balancer: with an ELB, you're given a DNS name which resolves as an A record, a name which points at IP addresses, so you have to create an A record alias if you want to point at the DNS name provided by the Elastic Load Balancer.
If the record that the resource provides is an A record, then you need to use an A record alias. So you're going to use alias records when you're pointing at AWS services such as API Gateway, CloudFront, Elastic Beanstalk, Elastic Load Balancers, Global Accelerator, and even S3 buckets, and you're going to experience this last one in a demo lesson which is coming up very soon.
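As an illustration of that record-type matching, here's a hedged boto3 sketch which creates an A record alias at the apex of categor.io pointing at a hypothetical Application Load Balancer; the hosted zone IDs and the load balancer DNS name are assumptions (the ELB's own hosted zone ID varies by region and load balancer type).

```python
import boto3

r53 = boto3.client("route53")

r53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",      # placeholder: the categor.io hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "categor.io",        # the apex / naked domain
                "Type": "A",                 # A record alias - matches the record type the ELB provides
                "AliasTarget": {
                    "HostedZoneId": "Z35SXDOTRQ7X7K",  # the ELB's zone ID (region/type specific - check before use)
                    "DNSName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",  # hypothetical ALB DNS name
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    },
)
```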
Now it's going to make a lot more sense when you see it in action elsewhere in the course. For now, I just want to make sure that you understand the theory of both the limitations of C name records and the benefits that alias records provide. Now the alias is a type of record that's been implemented by AWS and it's outside of the usual DNS standard, so it's something that in this form you can only use if Route 53 is hosting your domains.
Keep that in mind as I talk about more of the features of Route 53 as we move through this section of the course. But at this point, that's everything that I wanted to cover — so go ahead, complete this video and when you're ready, I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this video I want to talk about the other type of hosted zone available within Route 53, and that's the private hosted zone, so let's jump in and get started straight away.
A private hosted zone is just like a public hosted zone in terms of how it operates, only it's not public — instead of being public it's associated with VPCs within AWS and it's only accessible within VPCs that it's associated with. You can associate a private hosted zone with VPCs in your account using the console UI, CLI and API, and even in different accounts if you use the CLI and API only. Everything else is the same — you can use them to create resource records and these are resolvable within VPCs.
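As a rough illustration of that, here's a hedged boto3 sketch which creates a private hosted zone associated with one VPC and then associates a second VPC; the zone name, region and VPC IDs are placeholders, and cross-account association would additionally need an authorization created in the account which owns the zone.

```python
import uuid
import boto3

r53 = boto3.client("route53")

# Create a private hosted zone, associated with an initial VPC
zone = r53.create_hosted_zone(
    Name="animalsforlife.org",
    CallerReference=str(uuid.uuid4()),
    HostedZoneConfig={"PrivateZone": True},
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},  # placeholder VPC
)
zone_id = zone["HostedZone"]["Id"].split("/")[-1]

# Associate an additional VPC so resources inside it can resolve the zone
r53.associate_vpc_with_hosted_zone(
    HostedZoneId=zone_id,
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0fedcba9876543210"},  # placeholder VPC
)
```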
It's even possible to use a technique called split view or split horizon DNS, which is where you have public and private hosted zones of the same name, meaning that you can have a different variant of a zone for private users versus public. You might do this if you want your company intranet to run on the same address as your website and have your users be presented with your intranet when internal, but the public website when anyone accesses from outside of your corporate network, or if you wanted certain systems to be accessible via your business's DNS but only within your environment.
Now let's quickly step through how private hosted zones work visually so that you have more of an idea of the end-to-end architecture. So we start with a private hosted zone and as with public zones we can create records within this zone. Now from the public internet our users can do normal DNS queries, so for things like Netflix.com and Categor.io, but the private hosted zone is inaccessible from the public internet — it can be made accessible though from VPCs.
Let's assume all three of these VPCs have services inside them and use the Route 53 resolver, so the VPC plus two address. Any VPCs which we associate with the private hosted zone will be able to access that zone via the resolver, and any VPCs which aren't associated will face the same problem as the user on the public internet on the left: access isn't available. So private hosted zones are great when you need to provide records using DNS, but maybe they're sensitive and need to be accessible only from internal VPCs. Just remember, to be able to access a private hosted zone, the service needs to be running inside a VPC, and that VPC needs to be associated with the private hosted zone.
Now before I finish up this short lesson, let's talk about split view or split horizon DNS. Consider this scenario: you have a VPC running an Amazon WorkSpace and, to support some business applications, a private hosted zone with some records inside it. The private hosted zone is associated with VPC 1 on the right, meaning the WorkSpace could use the Route 53 resolver to access the private hosted zone, for example to access the accounting records stored within the private hosted zone.
Now the private hosted zone is not accessible from the public internet, but what split view allows us to do is to create a public hosted zone with the same name — this public hosted zone might only have a subset of records that the private hosted zone has, so from the public internet, access to the public hosted zone would work in the same way as you would expect: via the ISP resolver server, then through to the DNS root servers, from there to the .org TLD servers, and from there through to the animals for life name servers provided by Route 53.
Any records inside the public hosted zone would be accessible, but records in the private hosted zone which are not in the public hosted zone — so accounting in this example — would be inaccessible from the public internet, and this is a common architecture where you want to use the same domain name for public access and internal access, but with a different set of records available to each. It's something that you'll need to be comfortable with as an architect designing solutions, a developer integrating DNS into your applications, or an engineer implementing this within AWS.
Now that's everything I want to cover on the theory of private hosted zones, so go ahead and complete this video, and when you're ready I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this video, I want to talk about Route 53 public hosted zones. There are two types of DNS zones in Route 53, public and private.
To start with, let's cover off some general facts and then we can talk specifically about public hosted zones. A hosted zone is a DNS database for a given section of the global DNS database, specifically for a domain such as AnimalsForLife.org.
Route 53 is a globally resilient service; the name servers which host these zones are distributed globally and have the same dataset, so whole regions can be affected by outages and Route 53 will still function. Hosted zones are created automatically when you register a domain using Route 53, and you saw that earlier in the course when I registered the AnimalsForLife.org domain, but they can also be created separately if you want to register a domain elsewhere and use Route 53 to host it.
There's a monthly fee to host each hosted zone and a fee for the queries made against that hosted zone. A hosted zone, whether public or private, hosts DNS records, examples of these being A records and their IP version 6 equivalent (AAAA records), MX records, NS records, and TXT records, and I've covered these at an introductory level earlier in the course.
In summary, hosted zones are databases which are referenced via delegation using name server records, and a hosted zone when referenced in this way is authoritative for a domain such as AnimalsForLife.org. When you register a domain, name server records for that domain are entered into the top level domain zone; these point at your name servers, and then your name servers and the zone that they host become authoritative for that domain.
A public hosted zone is a DNS database — a zone file — which is hosted by Route 53 on public name servers, and this means it's accessible from the public internet and within VPCs using the Route 53 resolver. Architecturally, when you create a public hosted zone, Route 53 allocates four public name servers, and it's on those name servers that the zone file is hosted.
To integrate it with the public DNS system, you change the name server records for that domain to point at those four Route 53 name servers. Inside a public hosted zone, you create resource records which are the actual items of data which DNS uses.
You can, and I'll cover this in an upcoming video, use Route 53 to host zone files for externally registered domains. So for example, you can use Hover or GoDaddy to register a domain, then create the public hosted zone in Route 53, get the four name servers which are allocated to that hosted zone, and then via the Hover or GoDaddy interface, you can add those name servers into the DNS system for your domain.
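Here's a hedged boto3 sketch of that flow: create the public hosted zone, then read back the four allocated name servers which you'd enter at the external registrar. The domain name is just the course example; nothing registrar-specific is shown because that part happens in the registrar's own interface.

```python
import uuid
import boto3

r53 = boto3.client("route53")

zone = r53.create_hosted_zone(
    Name="animalsforlife.org",
    CallerReference=str(uuid.uuid4()),   # idempotency token
)

# The four Route 53 name servers allocated to this public hosted zone -
# these are what you'd add as NS entries via the external registrar
print(zone["DelegationSet"]["NameServers"])
```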
I'll cover how this works in detail in a future video. Visually, this is how a public hosted zone looks and functions.
We start by creating a public hosted zone, and for this example, it's animalsforlife.org — creating this allocates four Route 53 name servers for this zone, and those name servers are all accessible from the public internet. They're also accessible from AWS VPCs using the Route 53 resolver, which, assuming DNS is enabled for the VPC, is directly accessible from an internal IP address of that VPC.
Inside this hosted zone, we can create some resource records, in this case a www A record, two MX records for email, and a TXT record. Within the VPC, the access method is direct: the VPC resolver using the VPC plus two address, and this is accessible from any instances inside the VPC which use this as their DNS resolver, so they can query the hosted zone as they can any public DNS zone, using the Route 53 resolver.
From a public DNS perspective, the architecture is the same in that the same zone file is used, but the mechanics are slightly different. DNS starts with the DNS root servers, and these are the first servers queried by our user's resolver server — so Bob is using a laptop talking to his ISP DNS resolver server, which queries the root servers.
The root servers have information on the .org top level domain, and so the ISP resolver server can then query the .org servers. These servers host the .org zone file, and this zone file has an entry for AnimalsForLife.org, which has four name servers, and these all point at the Route 53 public name servers for the public hosted zone for Animals For Life.
This process is called "walking the tree," and this is how any public internet host can access the records inside a public hosted zone using DNS. And that's how public hosted zones work — they're just a zone file which is hosted on four name servers provided by Route 53.
This public hosted zone can be accessed from the public internet or any VPCs which are configured to allow DNS resolution. There's a monthly cost for hosting this public hosted zone and a tiny charge for any queries made against it — almost nothing in the grand scheme of things, but for larger volume sites, it's something to keep in mind.
So that's public hosted zones — that's everything I wanted to cover in this video on the theory side of things. So go ahead and complete this video, and then when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back and in this video, I want to talk at a very basic level about the Elastic Kubernetes Service known as EKS. Now this is AWS's implementation of Kubernetes as a service. If you haven't already done so, please make sure that you've watched my Kubernetes 101 video because I'll be assuming that level of knowledge so I can focus more in this video about the EKS specific implementation. Now this video is going to stay at a very high level and if required for the topic that you're studying, there are going to be additional deep dive videos and/or demos on any of the relevant subject areas. Now let's just jump in and get started straight away.
So EKS is an AWS managed implementation of Kubernetes. That's to say, AWS have taken the Kubernetes system and added it as a service within AWS. It's the same Kubernetes that you'll see anywhere else just extended to work really well within AWS. And that's the key point here. Kubernetes is cloud agnostic. So if you need containers, but don't want to be locked into a specific vendor, or if you already have containers implemented, maybe using Kubernetes, then that's a reason to choose EKS.
Now EKS can be run in different ways. It can run on AWS itself. It can run on AWS Outposts, which conceptually is like running a tiny version of AWS on-premises. It can run using EKS anywhere, which basically allows you to create EKS clusters on-premises or anywhere else. And AWS even release the EKS product as open source via the EKS distro. Generally though, and certainly for this video, you can assume that I mean the normal AWS deployment mode of EKS, so running EKS within AWS as a product.
So the Kubernetes control plane is managed by AWS and scales based on load and also runs across multiple availability zones, and the product integrates with other AWS services in the way that you would expect an AWS product to do so. So it can use the Elastic Container Registry or ECR, it uses Elastic Load Balancers anywhere where Kubernetes needs load balancer functionality, IAM is integrated for security, and it also uses VPCs for networking.
An EKS cluster means the EKS control plane, so that's the part that's managed by AWS, as well as the EKS nodes, and I'll talk more about those in a second. etcd, remember, is the key value store which Kubernetes uses; this is also managed by AWS and distributed across multiple availability zones.
Now in terms of nodes, you have a few different ways that these can be handled. You can do self-managed nodes running in a group, so these are EC2 instances which you manage and you're billed for based on normal EC2 pricing. Then we have managed node groups which are still EC2, but this is where the product handles the provisioning and lifecycle management. Finally, you can run pods on Fargate.
With Fargate, you don't have to worry about provisioning, configuring, or scaling groups of instances, and you also don't need to choose the instance type or decide when to scale or optimize cluster packing. Instead, you define Fargate profiles which mean that pods can start on Fargate, and in general, this is similar to ECS Fargate which I've already covered elsewhere.
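To show how those node options surface in the API, here's a hedged boto3 sketch creating a managed node group and a Fargate profile for a hypothetical cluster; the cluster name, subnets, role ARNs and namespace selector are all assumptions.

```python
import boto3

eks = boto3.client("eks")

# Managed node group - EKS provisions and lifecycle-manages the EC2 instances
eks.create_nodegroup(
    clusterName="a4l-cluster",                                   # hypothetical cluster
    nodegroupName="general-workloads",
    subnets=["subnet-0aaa111", "subnet-0bbb222"],                # placeholder subnets
    nodeRole="arn:aws:iam::111122223333:role/A4LNodeRole",       # placeholder role
    instanceTypes=["t3.medium"],
    scalingConfig={"minSize": 2, "maxSize": 4, "desiredSize": 2},
)

# Fargate profile - pods matching the selector run on Fargate instead of EC2
eks.create_fargate_profile(
    fargateProfileName="serverless-pods",
    clusterName="a4l-cluster",
    podExecutionRoleArn="arn:aws:iam::111122223333:role/A4LPodExecutionRole",  # placeholder role
    subnets=["subnet-0aaa111", "subnet-0bbb222"],
    selectors=[{"namespace": "serverless"}],                     # placeholder namespace
)
```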
Now one super important thing to keep in mind: deciding between self-managed nodes, managed node groups or Fargate is based on your requirements. So if you need Windows pods, GPU capability, Inferentia, Bottlerocket, Outposts, or Local Zones, then you need to check the node type that you're going to use and make sure it's capable of each of these features. I've included a link attached to this lesson with an up-to-date list of capabilities, but please be really careful on this one because I've seen it negatively impact projects.
Now lastly, remember from the Kubernetes 101 video where I mentioned that storage by default is ephemeral. Well, for persistent storage, EKS can use EBS, EFS, and FSx as storage providers, and these can be used to provide persistent storage when required.
Now that's everything about the key elements of the EKS product. Let's quickly take a look visually at how a simple EKS architecture might look. Conceptually, when you think of an EKS deployment, you're going to have two VPCs. The first is an AWS managed VPC, and it's here where the EKS control plane will run from, across multiple availability zones. The second VPC is a customer managed VPC, in this case the Animals for Life VPC.
Now, if you're going to be using EC2 worker nodes, then these will be deployed into the customer VPC. Normally the control plane will communicate with these worker nodes via elastic network interfaces which are injected into the customer VPC. So the kubelet service running on the worker nodes connects to the control plane either using these ENIs which are injected into the VPC, or via a public control plane endpoint. Any administration via the control plane can also be done using this public endpoint, and any consumption of the EKS services is via ingress configurations which start from the customer VPC.
Now, at a high level, that's everything that I wanted to cover about the EKS product. Once again, if you're studying a course which needs any further detail, there will be additional theory and demo lessons. But at this point, that's everything I want you to do in this video, so go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back and in this fundamentals video I want to briefly talk about Kubernetes which is an open source container orchestration system, and you use it to automate the deployment, scaling and management of containerized applications. At a super high level, Kubernetes lets you run containers in a reliable and scalable way, making efficient use of resources and lets you expose your containerized applications to the outside world or your business. It's like Docker, only with robots to automate it and super intelligence for all of the thinking. Now Kubernetes is a cloud agnostic product so you can use it on-premises and within many public cloud platforms. Now I want to keep this video to a super high level architectural overview but that's still a lot to cover, so let's jump in and get started.
Let's quickly step through the architecture of a Kubernetes cluster. A cluster in Kubernetes is a highly available cluster of compute resources, and these are organized to work as one unit. The cluster starts with the cluster control plane, which is the part which manages the cluster; it performs scheduling, application management, scaling and deployment and much more. Compute within a Kubernetes cluster is provided via nodes, and these are virtual or physical servers which function as a worker within the cluster; these are the things which actually run your containerized applications. Running on each of the nodes is software, and at minimum this is containerd or another container runtime, which is the software used to handle your container operations, and next we have the kubelet, which is an agent to interact with the cluster control plane. The kubelet running on each of the nodes communicates with the cluster control plane using the Kubernetes API. Now this is the top level functionality of a Kubernetes cluster: the control plane orchestrates containerized applications which run on nodes.
But now let's explore the architecture of control planes and nodes in a little bit more detail. On this diagram I've zoomed in a little: we have the control plane at the top and a single cluster node at the bottom, complete with the minimum software running, a container runtime such as Docker or containerd plus the kubelet, for control plane communications. Now I want to step through the main components which might run within the control plane and on the cluster nodes. Keep in mind this is a fundamentals level video, it's not meant to be exhaustive; Kubernetes is a complex topic, so I'm just covering the parts that you need to understand to get started. The cluster will also likely have many more nodes; it's rare that you only have one node unless this is a testing environment.
First I want to talk about pods, and pods are the smallest unit of computing within Kubernetes; you can have pods which have multiple containers and which provide shared storage and networking for those containers, but it's very common to see a one-container-one-pod architecture, which, as the name suggests, means each pod contains only one container. Now when you think about Kubernetes, don't think about containers, think about pods; you're going to be working with pods and you're going to be managing pods, and the pods handle the containers within them. Architecturally, you would generally only run multiple containers in a pod when those containers are tightly coupled, require close proximity and rely on each other in a very tightly coupled way. Additionally, although you'll be exposed to pods, you'll rarely manage them directly; pods are non-permanent things, and in order to get the maximum value from Kubernetes you need to view pods as temporary things which are created, do a job and are then disposed of. Pods can be deleted when finished, evicted for lack of resources or if the node itself fails; they aren't permanent and aren't designed to be viewed as highly available entities. There are other things linked to pods which provide more permanence, but more on that elsewhere.
So now let's talk about what runs on the control plane. Firstly, I've already mentioned this one, the API, known formally as kube-apiserver; this is the front end for the control plane, it's what everything generally interacts with to communicate with the control plane, and it can be scaled horizontally for performance and to ensure high availability. Next we have etcd, and this provides a highly available key value store, so a simple database running within the cluster which acts as the main backing store for data for the cluster. Another important control plane component is kube-scheduler, and this is responsible for constantly checking for any pods within the cluster which don't have a node assigned, and then it assigns a node to that pod based on resource requirements, deadlines, affinity or anti-affinity, data locality needs and any other constraints. Remember, nodes are the things which provide the raw compute and other resources to the cluster, and it's this component which makes sure the nodes get utilized effectively.
Next we have an optional component, the cloud controller manager, and this is what allows Kubernetes to integrate with cloud providers. It's common that Kubernetes runs on top of other cloud platforms such as AWS, Azure or GCP, and it's this component which allows the control plane to closely interact with those platforms. Now it is entirely optional, and if you run a small Kubernetes deployment at home you probably won't be using this component.
Now lastly in the control plane is the kube-controller-manager, and this is actually a collection of processes: we've got the node controller, which is responsible for monitoring and responding to any node outages; the job controller, which is responsible for running pods in order to execute jobs; the endpoint controller, which populates endpoints in the cluster (more on this in a second, but this is something that links services to pods; again, I'll be covering this very shortly); and then the service account and token controller, which is responsible for account and API token creation. Now again, I haven't spoken about services or endpoints yet; just stick with me, I will in a second.
Now lastly, on every node is something called kube-proxy, and this runs on every node and coordinates networking with the cluster control plane; it helps implement services and configures rules allowing communications with pods from inside or outside of the cluster. You might have a Kubernetes cluster, but you're going to want some level of communication with the outside world, and that's what kube-proxy provides.
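To make the kube-apiserver's role a little more tangible, here's a small sketch using the official Kubernetes Python client; it assumes you have a kubeconfig locally (the same credentials kubectl uses) and simply asks the API server which pods exist and where the scheduler placed them.

```python
# pip install kubernetes
from kubernetes import client, config

# Load credentials from the local kubeconfig - the same file kubectl uses
config.load_kube_config()

v1 = client.CoreV1Api()

# Ask the kube-apiserver for all pods and show which node each was scheduled onto
for pod in v1.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.spec.node_name, pod.status.phase)
```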
Now that's the architecture of the cluster and nodes in a little bit more detail, but I want to finish this introduction video with a few summary points of the terms that you're going to come across. So let's talk about the key components. We start with the cluster, and conceptually this is a deployment of Kubernetes; it provides management, orchestration, healing and service access. Within a cluster we've got the nodes, which provide the actual compute resources, and pods run on these nodes; a pod is one or more containers and is the smallest admin unit within Kubernetes, and often, as I mentioned previously, you're going to see the one-container-one-pod architecture; simply put, it's cleaner. Now a pod is not a permanent thing, it's not long lived; the cluster can and does replace them as required.
Services provide an abstraction from pods so the service is typically what you will understand as an application — an application can be containerized across many pods but the service is the consistent thing, the abstraction — service is what you interact with if you access a containerized application. Now we've also got a job and a job is an ad hoc thing inside the cluster — think of it as the name suggests as a job — a job creates one or more pods, runs until it completes, retries if required and then finishes — now jobs might be used as back end isolated pieces of work within a cluster.
Now something new that I haven't covered yet, and that's ingress; ingress is how something external to the cluster can access a service. So you have external users, they come into an ingress, that's routed through the cluster to a service, and the service points at one or more pods which provide the actual application. So an ingress is something that you will have exposure to when you start working with Kubernetes. And next is an ingress controller, and that's a piece of software which actually arranges for the underlying hardware to allow ingress; for example, there is an AWS Load Balancer ingress controller which uses application and network load balancers to allow the ingress, but there are also other controllers such as NGINX and others for various cloud platforms.
Now finally, and this one is really important: generally it's best to architect things within Kubernetes to be stateless from a pod perspective. Remember, pods are temporary; if your application has any form of long running state, then you need a way to store that state somewhere. Now state can be session data, but also data in the more traditional sense. Any storage in Kubernetes by default is ephemeral, provided locally by a node, and thus if a pod moves between nodes then that storage is lost. Conceptually, think of this like instance store volumes running on AWS EC2. Now you can configure persistent storage, known as persistent volumes or PVs, and these are volumes whose life cycle lives beyond any one single pod which is using them, and this is how you would provision normal long running storage to your containerised applications. Now the details of this are a little bit beyond this introduction level video, but I wanted you to be aware of this functionality.
Ok so that's a high level introduction to Kubernetes — it's a pretty broad and complex product but it's super powerful when you know how to use it. This video only scratches the surface. If you're watching this as part of my AWS courses then I'm going to have follow up videos which step through how AWS implements Kubernetes with their EKS service. If you're taking any of the more technically deep AWS courses then maybe other deep dive videos into specific areas that you need to be aware of. So there may be additional videos covering individual topics at a much deeper level. If there are no additional videos then don't worry because that's everything that you need to be aware of. Thanks for watching this video, go ahead and complete the video and when you're ready I look forward to you joining me in the next.
learn.cantrill.io
Welcome back and in this lesson I want to quickly cover the theory of the Elastic Container Registry or ECR. Now I want to keep the theory part brief because you're going to get the chance to experience this in practice elsewhere in the course and this is one of those topics which is much easier to show you via a demo versus covering the theory. So I'm going to keep this as brief as possible so let's jump in and get started.
Well let's first look at what the Elastic Container Registry is. Well it's a managed container image registry service. It's like Docker Hub but for AWS so this is a service which AWS provide which hosts and manages container images and when I talk about container images I mean images which can be used within Docker or other container applications, so think things like ECS or EKS.
Now within the ECR product we have public and private registries and each AWS account is provided with one of each so this is the top level structure. Inside each registry you can have many repositories so you can think of these like repos within a source control system, so think of Git or GitHub—you can have many repositories. Now inside each repository you can have many container images and container images can have several tags and these tags need to be unique within your repository.
Now in terms of the security architecture the differences between public and private registries are pretty simple. First, a public registry means that anyone can have read-only access to anything within that registry but read-write requires permissions. The other side is that a private registry means that permissions are required for any read or any read-write operations, so this means with a public registry anyone can pull but to push you need permissions and for a private registry permissions are required for any operations.
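As a quick hedged sketch of that structure, here's boto3 creating a private repository (with scan-on-push enabled) and fetching the temporary authorization token that a Docker client would use to log in; the repository name is just an assumption.

```python
import boto3

ecr = boto3.client("ecr")

# Create a private repository inside this account's private registry
repo = ecr.create_repository(
    repositoryName="container-of-cats",                 # hypothetical repository name
    imageScanningConfiguration={"scanOnPush": True},     # basic scanning on push
)
print(repo["repository"]["repositoryUri"])

# Temporary credentials which a Docker client can use to authenticate with the registry
token = ecr.get_authorization_token()
print(token["authorizationData"][0]["proxyEndpoint"])
```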
So that's the high level architecture, let's move on and talk about some of the benefits of the elastic container registry. Well first and foremost it's integrated with IAM and this is logically for permissions, so this means that all permissions controlling access to anything within the product are controlled using IAM.
Now ECR offers security scanning on images, and this comes in two different flavors, basic and enhanced; enhanced is a relatively new type of scanning which uses the Amazon Inspector product. Now this can scan looking for issues with both the operating system and any software packages within your containers, and this works on a layer by layer basis, so enhanced scanning is a really good piece of additional functionality that the product provides.
Now logically like many other AWS products, ECR offers near real-time metrics and these are delivered into CloudWatch. Now these metrics are for things like authentication or push or pull operations against any of the container images. ECR also logs all API actions into CloudTrail and then also it generates events which are delivered into EventBridge and this can form part of an event-driven workflow which involves container images.
Now lastly ECR offers replication of container images and this is both cross region and cross account, so these are all important features provided by ECR.
Now as I mentioned at the start of this lesson all I wanted to do is to cover the high-level theory of this product. It's far easier to gain an understanding of the product by actually using it. So elsewhere in the course you're going to get the chance to use ECR in some container-based workflows, so you'll get the chance to push some container images into the product and pull them when you're deploying your container-based applications.
Now that's everything I wanted to cover in this video, so go ahead and complete the video and when you're ready I look forward to you joining me in the next.
learn.cantrill.io
Welcome back, and in this lesson I want to briefly discuss the two different cluster modes that you can use when running containers within ECS, so that's EC2 mode and Fargate mode. The cluster mode defines a number of things, but one of them is how much of the admin overhead of running a set of container hosts you manage versus how much AWS manages. So the technical underpinnings of both of these are important, but one of the main differentiating factors is which parts you're responsible for managing and which parts AWS manages. There are some cost differences we'll talk about, and certain scenarios which favor EC2 mode and others which favor Fargate mode, and we'll talk about all of that inside this lesson.
At this level, it's enough to understand the basic architecture of both of these modes and the situations where you would pick one over the other. So we've got a lot to cover, so let's jump in and get started.
The first cluster mode available within ECS is EC2 mode, and using EC2 mode we start with the ECS management components, so these handle high level tasks like scheduling, orchestration, cluster management and the placement engine which handles where to run containers, so which container hosts. Now these high level components exist in both modes, so that's EC2 mode and Fargate mode, and with EC2 mode an ECS cluster is created within a VPC inside your AWS account.
Because an EC2 mode cluster runs within a VPC, it benefits from the multiple availability zones which are available within this VPC—for this example let's assume we have two, AZA and AZB. With EC2 mode, EC2 instances are used to run containers, and when you create the cluster you specify an initial size which controls the number of container instances, and this is handled by an auto scaling group. We haven't covered auto scaling groups yet in the course, but there are ways that you can control horizontal scaling for EC2 instances, so adding more instances when requirements dictate and removing them when they're not needed, but for this example let's say that we have four container instances.
Now these are just EC2 instances—you will see them in your account, you'll be billed for them, you can even connect to them, so it's important to understand that when these are provisioned you will be paying for them regardless of what containers you have running on them. So with EC2 cluster mode you are responsible for these EC2 instances that are acting as container hosts, and now ECS provisions these EC2 container hosts, but there is an expectation that you will manage them generally through the ECS tooling. So ECS using EC2 mode is not a serverless solution—you need to worry about capacity and availability for your cluster.
ECS uses container registries, and these are where your container images are stored. Remember in a previous lesson I showed you how to store the container of cats images on Docker Hub, and that's an example of a container registry. AWS of course have their own which is called ECR—I've previously mentioned that—and you can choose to use that or something public like Docker Hub.
Now in the previous lesson I spoke about tasks and services which are how you direct ECS to run your containers. Well, tasks and services use images on container registries, and via the task and service definitions inside ECS, container images are deployed onto container hosts in the form of containers.
Now in EC2 mode ECS will handle certain elements of this, so ECS will handle the number of tasks that are deployed if you utilize services and service definitions, but at a cluster level you need to be aware of and manage the capacity of the cluster because the container instances are not something that's delivered as a managed service—they are just EC2 instances.
So ECS using EC2 mode offers a great middle ground if you want to use containers in your infrastructure but you need to manage the container hosts' capacity and availability yourself. EC2 mode is for you in that case because it uses EC2 instances: if your business has reserved instances then you can use those, and you can use EC2 spot pricing, but you need to manage all of this yourself.
It's important to understand that with EC2 mode, even if you aren't running any tasks or any services on your EC2 container hosts, you are still paying for them while they're in a running state, so you're expected to manage the number of container hosts inside an EC2 based ECS cluster. So whilst ECS as a product takes away a lot of the management overhead of using containers, in EC2 cluster mode you keep some of that overhead and some flexibility, so it's a great middle ground.
Fargate mode for ECS removes even more of the management overhead of using containers within AWS. With Fargate mode you don't have to manage EC2 instances for use as container hosts. As much as I hate using the term serverless, Fargate is a cluster mode which means you have no servers to manage, and because of this you aren't paying for EC2 instances regardless of whether you're using them or not.
Fargate mode uses the same surrounding technologies, so you still have the same management components handling scheduling and orchestration, cluster management and placement, and you still use registries for the container images as well as task and service definitions to define tasks and services. What differs is how containers are actually hosted.
Core to the Fargate architecture is the fact that AWS maintain a shared Fargate infrastructure platform. This shared platform is offered to all users of Fargate, but much like how EC2 isolates customers from each other, so does Fargate. You gain access to resources from a shared pool, just like you can run EC2 instances on shared hardware, but you have no visibility of other customers.
With Fargate you use the same task and service definitions, and these define the image to use, the ports, and how much resources you need, but with Fargate these are then allocated to the shared Fargate platform. You still have your VPC—a Fargate deployment still uses a cluster and a cluster uses a VPC which operates in availability zones, in this example AZA and AZB.
Where it starts to differ though is for ECS tasks—which remember, they're now running on the shared infrastructure—but from a networking perspective they're injected into your VPC. Each of the tasks is injected into the VPC and it gets given an elastic network interface, which has an IP address within the VPC.
At that point they work just like any other VPC resource and they can be accessed from within that VPC and from the public internet if the VPC is configured that way. So this is really critical for you to be aware of—tasks and services are actually running from the shared infrastructure platform and then they're injected into your VPC, they're given network interfaces inside a VPC, and it's using these network interfaces in that VPC that you can access them.
So if the VPC is configured to use public subnets which automatically allocate an IPv4 address, then tasks and services can be given public IPv4 addressing. Fargate offers a lot of customizability—you can deploy exactly how you want into either a new VPC or a custom VPC that you have designed and implemented in AWS.
With Fargate mode, because tasks and services are running from the shared infrastructure platform, you only pay for the containers that you're using based on the resources that they consume. So you have no visibility of any host costs, you don't need to manage hosts, provision hosts or think about capacity and availability—that's all handled by Fargate, and you simply pay for the container resources that you consume.
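As a hedged sketch of that model, here's boto3 launching a single Fargate task; the cluster, task definition, subnets and security group are placeholder assumptions, and it's the awsvpcConfiguration block which causes the task's elastic network interface to be injected into your VPC.

```python
import boto3

ecs = boto3.client("ecs")

ecs.run_task(
    cluster="a4l-cluster",                         # placeholder cluster name
    taskDefinition="container-of-cats:1",          # placeholder task definition and revision
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0aaa111", "subnet-0bbb222"],   # placeholder subnets
            "securityGroups": ["sg-0ccc333"],                  # placeholder security group
            "assignPublicIp": "ENABLED",     # only meaningful in a public subnet
        }
    },
)
```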
Now one final thing before we move to a demo where we're going to implement a container inside a Fargate architecture. For the exam, and as a solutions architect in general, you should be able to advise when a business or a team should use ECS.
There are actually three main options: using EC2 natively for an application (so deploying an application as a virtual machine), using ECS in EC2 mode (so using a containerized application but running in ECS using an EC2 based cluster), or using a containerized application running in ECS but in Fargate mode. So there's a number of different options.
Picking between using EC2 and ECS should in theory be easy—if you use containers, then pick ECS. If you're a business which already uses containers for anything, then it makes sense to use ECS. In the demo that we did earlier in this section, we used an EC2 instance and Docker to create a Docker image—that's an edge case though.
If you're wanting to just quickly test containers, you can use EC2 as a Docker host, but for any production usage, it's almost never a good idea to do that. The normal options are generally to run an application inside an operating system inside EC2, or to utilize ECS in one of these two different modes.
Containers in general make sense if you're just wanting to isolate your applications, applications which have low usage levels, applications which all use the same OS, or applications where you don't need the overhead of virtualization. You would generally pick EC2 mode when you have a large workload and your business is price conscious.
If you care about price more than effort, you'll want to look at using spot pricing or reserved pricing or make use of reservations that you already have. Running your own fleet of EC2-based ECS hosts will probably be cheaper—but only if you can minimize the admin overhead of managing them, so scaling, sizing, as well as correcting any faults.
So if you have a large consistent workload, if you're heavily using containers, but if you are a price-conscious organization, then potentially pick EC2 mode. If you're overhead-conscious, even with large workloads, then Fargate makes more sense. Using Fargate is much less management overhead versus EC2 mode because you don't have any exposure to container hosts or their management.
So even large workloads—if you care about minimizing management overhead—then use Fargate. For small or burst-style workloads, Fargate makes sense because with Fargate you only pay for the container capacity that you use. Having a fleet of EC2-based container hosts running constantly for non-consistent workloads just makes no sense—it's wasting the capacity.
The same logic is true for batch or periodic workloads. Fargate means that you pay for what you consume. EC2 mode would mean paying for the container instances even when you don't use them.
Okay, I hope this starts to make sense. I hope the theory that we've covered starts to give you an impression for when you would use ECS, and then when you do use the product, how to distinguish between scenarios which suit EC2 mode versus Fargate mode.
So next up we have a demo lesson, and I'm going to get you to take the container of cats Docker image that we created together earlier in this section and run it inside an ECS Fargate cluster. By configuring it practically, it's going to help a lot of the facts and architecture points that we've discussed through this section stick, and these facts sticking will be essential to being able to answer any container-based questions in the exam.
I know the container of cats is a simple example, but the steps that you'll be performing will work equally well in something that's a lot more complex. As we go through the course, we're going to be revisiting ECS—it will feature in some architectures for the Animals for Life business later in the course.
For now though, I want you to just be across the fundamentals—enough to get started with ECS and enough for the associate AWS exams. So go ahead, complete this lesson, and when you're ready you can join me in the next lesson which will be an ECS Fargate demo.
learn.cantrill.io
Welcome back and in this lesson I want to introduce the Elastic Container Service or ECS. In the previous lesson you created a Docker image and tested it by running up a container, all using the Docker container engine running on an EC2 instance, and this is always something that you can do with AWS. But ECS is a product which allows you to use containers running on infrastructure which AWS fully manage or partially manage, and it takes away much of the admin overhead of self managing your container hosts.
ECS is to containers what EC2 is to virtual machines, and ECS uses clusters which run in one of two modes: EC2 mode which uses EC2 instances as container hosts (and you can see these inside your account as just normal EC2 hosts running the ECS software), and Fargate mode which is a serverless way of running Docker containers where AWS manage the container host part and just leave you to define and architect your environment using containers.
Now in this lesson I'll be covering the high level concepts of the product which apply to both of those modes, and then in the following lesson I'll talk in a little bit more detail about EC2 mode and Fargate mode, so let's jump in and get started.
ECS is a service that accepts containers and some instructions that you provide and it orchestrates where and how to run those containers; it's a managed container based compute service. I mentioned this a second ago but it runs in two modes, EC2 and Fargate, which radically changes how it works under the surface, but for what I need to talk about in this lesson we can be a little abstract and say that ECS lets you create a cluster. I'll cover the different types of cluster architectures in the following lesson, but for now it's just a cluster; clusters are where your containers run from.
You provide ECS with a container image and it runs that in the form of a container in the cluster based on how you want it to run, but let's take this from the bottom up architecturally and just step through how things work. First you need a way of telling ECS about container images that you want to run and how you want them to be run; containers are all based on container images as we talked about earlier in this section, and these container images will be located on a container registry somewhere, and you've seen one example of that with the Docker Hub.
Now AWS also provide a registry, it's called the Elastic Container Registry or ECR, and you can use that if you want; ECR has a benefit of being integrated with AWS so all of the usual permissions and scalability benefits apply, but at its heart it is just a container registry—you can use it or use something else like Docker Hub.
To tell ECS about your container images you create what's known as a container definition; the container definition tells ECS where your container image is—logically it needs that—it tells ECS which port your container uses (remember in the demo we exposed port 80 which is HTTP), and so this is defined in the container definition as well.
The container definition provides just enough information about the single container that you want to define. Then we have task definitions, and a task in ECS represents a self-contained application; a task could have one container defined inside it or many. A very simple application might use a single container, just like the container of cats application that we demoed in the previous lesson, or it could use multiple containers, maybe a web app container and a database container.
A task in ECS represents the application as a whole and so it stores whatever container definitions are used to make up that one single application; I remember the difference by thinking of the container definition as just a pointer to where the container is stored and what port is exposed, and the rest is defined in the task definition.
At the associate level this is easily enough detail but if you do want extra detail on what's stored in the container definition versus the task definition I've included some links attached to this lesson which give you an overview of both.
Task definitions store the resources used by the task—so CPU and memory—they store the networking mode that the task uses, they store the compatibility (so whether the task will work on EC2 mode or Fargate), and one of the really important things which the task definition stores is the task role.
A task role is an IAM role that a task can assume and when the task assumes that role it gains temporary credentials which can be used within the task to interact with AWS resources; task roles are the best practice way of giving containers within ECS permissions to access AWS products and services, and remember that one because it will come up in at least one exam question.
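To make the relationship concrete, here's a hedged boto3 sketch registering a task definition whose single container definition is nested inside it; the family name, image location, role ARNs and resource sizes are illustrative assumptions only.

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="container-of-cats",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",                       # task-level resources
    memory="512",
    taskRoleArn="arn:aws:iam::111122223333:role/A4LTaskRole",                # permissions the containers assume
    executionRoleArn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",  # used by ECS to pull images / ship logs
    containerDefinitions=[                                                   # the container definition lives inside
        {
            "name": "web",
            "image": "docker.io/example/containerofcats:latest",  # placeholder image location
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "essential": True,
        }
    ],
)
```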
When you create a task definition within the ECS UI you actually create a container definition along with it, but from an architecture perspective I definitely wanted you to know that they're actually separate things.
This is further confused by the fact that a lot of tasks that you create inside ECS will only have one container definition, and that's going to be the case with the container of cats demo that we're going to be doing at the end of this section when we deploy our Docker container into ECS. But tasks and containers are separate things: a task can include one or more containers, and a lot of tasks do include just one container, which doesn't help with the confusion.
Now a task in ECS doesn't scale on its own, and it isn't by itself highly available, and that's where the last concept that I want to talk to you about in this lesson comes in handy, and that's called an ECS service, and you configure that via a service definition.
A service definition defines a service, and a service is how, within ECS, we define how we want a task to scale and how many copies we'd like to run. It can add capacity and it can add resilience, because we can have multiple independent copies of our task running, and you can deploy a load balancer in front of a service so the incoming load is distributed across all of the tasks inside a service.
So for tasks that you're running inside ECS that are long running and business critical, you would generally use a service to provide that level of scalability and high availability; it's the service that lets you configure replacing failed tasks or scaling or how to distribute load across multiple copies of the same task.
Now we're not going to be using a service when we demo the container of cats demo at the end of this section because we'll only be wanting to run a single copy—you can run a single copy of a task on its own—but it's the service wrapper that you use if you want to configure scaling and high availability.
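For completeness, here's a hedged boto3 sketch of a service asking for five copies of the task behind a load balancer target group; the cluster, task definition, subnets and target group ARN are placeholder assumptions.

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="a4l-cluster",                       # placeholder cluster
    serviceName="container-of-cats-service",
    taskDefinition="container-of-cats:1",        # placeholder task definition
    desiredCount=5,                              # five copies of the task for scaling and HA
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0aaa111", "subnet-0bbb222"],   # placeholder subnets
            "assignPublicIp": "ENABLED",
        }
    },
    loadBalancers=[{                             # distribute incoming load across the tasks
        "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/cats/0123456789abcdef",
        "containerName": "web",
        "containerPort": 80,
    }],
)
```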
And it's tasks or services that you deploy into an ECS cluster and this applies equally to whether it's EC2 based or Fargate based; I'll be talking about the technical differences between those two in the next lesson, but for now the high level building blocks are the same—you create a cluster and then you deploy tasks or services into that cluster.
Now just to summarize a few of the really important points that I've talked about in this lesson: first is the container definition—this defines the image and the ports that will be used for a container—it basically points at a container image that's stored on a container registry and it defines which ports are exposed from that container.
It does other things as well and I've included a link attached to this lesson which gives you a full overview of what's defined in the container definition, but at the associate level all you need to remember is a container definition defines which image to use for a container and which ports are exposed.
A task definition applies to the application as a whole—it can be a single container (so a single container definition) or multiple containers and multiple container definitions—but it's also the task definition where you specify the task role, so the security that the containers within a task get: what can they access inside AWS?
It's an IAM role that's assumed and the temporary credentials that you get are what the containers inside the task can use to access AWS products and services.
So task definitions include this task role, the containers themselves, and you also specify at a task definition level the resources that your task is going to consume, and I'll be showing you that in the demo lesson at the end of this section.
The task role, obviously I've just talked about, is the IAM role that's assumed by anything that's inside the task—so the task role can be used by any of the containers running as part of a task—and that's the best practice way that individual containers can access AWS products and services.
And then finally we've got services and service definitions and this is how you can define how many copies of a task you want to run, and that's both for scaling and high availability.
So you can use a service and define that you want, say, five copies of a task running—you can put a load balancer in front of those five individual tasks and distribute incoming load across those—so it's used for scaling, it's used for high availability, and you can control other things inside a service such as restarts and the certain monitoring features that you've got access to in there.
And services are generally what you use if it's a business critical application or something in production that needs to cope with substantial incoming load.
In the demo that's at the end of this section, we won't be using a service—we'll be just deploying a task into our ECS cluster.
With that being said, though, that's all the high level ECS concepts that I wanted to talk about in this lesson; it's just enough to get you started so the next lesson makes sense, and then so when you do the demo and get some practical experience with ECS, everything will start to click.
At this point, though, go ahead, complete this video, and when you're ready, you can join me in the next lesson where I'll be talking about the different ECS cluster modes.
learn.cantrill.io
Welcome back. This section will be focusing on another type of compute, container computing. To understand the benefits of the AWS products and services which relate to containers, you'll need to understand what containers are and what benefits container computing provides. In this lesson, I aim to teach you just that. It's all theory in this lesson, but immediately following this is a demo lesson where you'll have the chance to make a container yourself. We've got a lot to get through though, so let's jump in and get started.
Before we start talking about containers, let's set the scene. What we refer to as virtualization should really be called operating system or OS virtualization. It's the process of running multiple operating systems on the same physical hardware. I've already covered the architecture earlier in the course, but as a refresher, we've got an AWS EC2 host running the Nitro hypervisor, and running on this hypervisor, we have a number of virtual machines.
Part of this lesson's objectives is to understand the difference between operating system virtualization and containers, and so the important thing to realize about these virtual machines is that each of them is an operating system with associated resources. What's often misunderstood is just how much of a virtual machine is taken up by the operating system alone. If you run a virtual machine with say 4GB of RAM and a 40GB disk, the operating system can easily consume 60 to 70% of the disk and much of the available memory, leaving relatively little for the applications which run in those virtual machines as well as the associated runtime environments.
So with the example on screen now, it's obvious that the guest operating system consumes a large percentage of the resources allocated to each virtual machine. Now what's the likelihood, with the example on screen, that many of the operating systems are actually the same? Think about your own business servers: how many run Windows, how many run Linux, and how many do you think share the same major operating system version? This is duplication. In this example, if all of these guest operating systems are the same or similar, that's wasted resources—it's duplication.
And what's more, with these virtual machines, the operating system consumes a lot of system resources, so every operation that relates to these virtual machines, every restart, every stop, every start is having to manipulate the entire operating system. If you think about it, what we really want to do with this example is to run applications one through to six in separate isolated protected environments. To do this, do we really need six copies of the same operating system taking up disk space and host resources? Well, the answer is no, not when we use containers.
Containerization handles things much differently. We still have the host hardware, but instead of virtualization, we have an operating system running on this hardware. Running on top of this is a container engine, and you might have heard of a popular one of these called Docker. A container in some ways is similar to a virtual machine in that it provides an isolated environment which an application can run within, but where virtual machines run a whole isolated operating system on top of a hypervisor, a container runs as a process within the host operating system.
It's isolated from all of the other processes, but it can use the host operating system for a lot of things like networking and file I/O. For example, if the host operating system was Linux, it could run Docker as a container engine. Linux plus the Docker container engine can run a container. That container would run as a single process within that operating system, potentially one of many. But inside that process, it's like an isolated operating system. It has its own file system isolated from everything else and it can run child processes inside it, which are also isolated from everything else.
So a container could run a web server or an application server and do so in a completely isolated way. What this means is that architecturally, a container would look something like this, something which runs on top of the base OS and container engine, but consumes very little memory. In fact, the only consumption of memory or disk is for the application and any runtime environment elements that it needs—so libraries and dependencies. The operating system could run lots of other containers as well, each running an individual application.
So using containers, we achieve this architecture, which looks very much like the architecture used in the previous example, which used virtualization. We're still running the same six applications, but the difference is that because we don't need to run a full operating system for each application, the containers are much lighter than the virtual machines. And this means that we can run many more containers on the same hardware versus using virtualization. This density, the ability to run more applications on a single piece of hardware, is one of the many benefits of containers.
Let's move on and look at how containers are architected. I want you to start off by thinking about what an EC2 instance actually is, and what it is is a running copy of its EBS volumes, its virtual disks. An EC2 instance's boot volume is used to boot it, and using this, you end up with a running copy of an operating system running in a virtualized environment. A container is no different in this regard. A container is a running copy of what's known as a Docker image.
Docker images are really special, though. One of the reasons why they're really cool technology-wise is they're actually made up of multiple independent layers. So Docker images are stacks of these layers and not a single monolithic disk image, and you'll see why this matters very shortly. Docker images are created initially by using a Docker file, and this is an example of a simple Docker file which creates an image with a web server inside it ready to run.
So this Docker file creates this Docker image. Each line in a Docker file is processed one by one and each line creates a new file system layer inside the Docker image that it creates. Let's explore what this means and it might help to look at it visually. All Docker images start off being created either from scratch or they use a base image, and this is what this top line controls. In this case, the Docker image we're making uses CentOS 7 as its base image.
Now this base image is a minimal file system containing just enough to run an isolated copy of CentOS. All this is is a super thin image of a disk—it just has the basic minimal CentOS 7 base distribution. And so that's what the first line of the Docker file does—it instructs Docker to create our Docker image using as a basis this base image. So the first layer of our Docker image, the first file system layer is this basic CentOS 7 distribution.
The next line performs some software updates and it installs our web server, Apache in this case, and this adds another layer to the Docker image. So now our image is two layers—the base CentOS 7 image and a layer which just contains the software that we've just installed. This is critical in Docker—the file system layers that make up a Docker image are normally read only. So every change you make is layered on top as another layer, and each layer contains the differences made when creating that layer.
So then we move on in our Docker file and we have some slight adjustments made at the bottom. It's adding a script which creates another file system layer for a total of three. And this is how a Docker image is made—it starts off either from scratch or using a base layer and then each set of changes in the Docker file adds another layer with just those changes in, and the end result is a Docker image that we can use which consists of individual file system layers.
Now strictly speaking, the layers in this diagram are upside down—a Docker image consists of layers stacked on each other starting with the base layer. So the layer in red at the bottom and then the blue layer which includes the system updates and the web server should be in the middle and the final layer of customizations in green should be at the top. It was just easier to diagram it in this way but in actuality it should be reversed.
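To make the layering concrete, here's a minimal sketch of the kind of Docker file being described; the file names (index.html) and the Apache entrypoint line are assumptions for illustration rather than the exact file from the lesson:

cat > Dockerfile <<'EOF'
# Layer 1: start from the minimal CentOS 7 base image
FROM centos:7
# Layer 2: system updates plus the Apache web server
RUN yum -y update && yum -y install httpd
# Layer 3: our customisation (assumes an index.html exists in the build directory)
COPY index.html /var/www/html/index.html
# Run Apache in the foreground when a container is started from this image
ENTRYPOINT ["/usr/sbin/httpd", "-DFOREGROUND"]
EOF

# Each instruction above becomes one read-only filesystem layer in the resulting image.
docker build -t webserver .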
Now let's look at what images are actually used for—a Docker image is how we create a Docker container. In fact, a Docker container is just a running copy of a Docker image with one crucial difference—a Docker container has an additional read-write file system layer. The file system layers that make up a Docker image are read only by default; they never change after they're created, and so this special read-write layer is added, which is what allows containers to run.
If log files are generated or if an application generates or reads data, that's all stored in the read write layer of the container. Each layer is differential and so it stores only the changes made against it versus the layers below. Together all stacked up they make what the container sees as a file system. But here is where containers become really cool—because we could use this image to create another container, container two.
This container is almost identical—it uses the same three base layers. So the CentOS 7 layer in red beginning AB, the web server and updates that are installed in the middle blue layer beginning 8-1 and the final customization layer in green beginning 5-7. They're both the same in both containers—the same layers are used so we don't have any duplication. They're read only layers anyway and so there's no potential for any overwrites.
The only difference is the read write layer which is different in both of these containers. That's what makes the container separate and keeps things isolated. Now in this particular case if we're running two containers using the same base image then the difference between these containers could be tiny. So rather than virtual machines which have separate disk images which could be tens or hundreds of gigs, containers might only differ by a few meg in each of their read write layers—the rest is reused between both of these containers.
Now this example has two containers but what if it had 200? The reuse architecture that's offered by the way containers do their disk images scales really well. Disk usage when you have lots of containers is minimized because of this layered architecture, and the base layers, the operating systems, are generally made available by the operating system vendors via something called a container registry, and a popular one of these is known as Docker Hub.
The function of a container registry is almost revealed in the name—it's a registry or a hub of container images. As a developer or architect you make or use a Docker file to create a container image and then you upload that image to a private repository or a public one such as the Docker Hub, and for public hubs other people will likely do the same including vendors of the base operating system images such as the CentOS example I was just talking about.
From there these container images can be deployed to Docker hosts which are just servers running a container engine—in this case Docker. Docker hosts can run many containers based on one or more images and a single image can be used to generate containers on many different Docker hosts. Remember, a container image is a single thing—you or I could both take that container image and use it to generate a container—so that's one container image which can generate many containers, and each of these is completely unique because of this read-write layer that a container gets the sole use of.
Now you can use the Docker Hub to download container images but also upload your own. Private registries can require authentication but public ones are generally open to the world. Now I have to admit I have a bad habit when it comes to containers—I'm usually all about precision in the words that I use but I've started to use Docker and containerization almost interchangeably. In theory, a Docker container is one type of container, a Docker host is one type of container host, and the Docker Hub is a type of container hub or a type of container registry operated by the company Docker.
Now even if I start to use these terms interchangeably (I'll try not to), because of the popularity of Docker and Docker containers you will tend to find that people say Docker when they actually mean containers—so keep an eye out for that one. Now the last thing before we finish up and go to the demo: I just want to cover some container key concepts, just as a refresher.
You've learned that Docker files are used to build Docker images and Docker images are these multi-layer file system images which are used to run containers. Containers are a great tool for any solutions architect because they're portable and they always run as expected. If you're a developer and you have an application, if you put that application and all of its libraries into a container, you know that anywhere that there is a compatible container host that that application can run exactly as you intended with the same software versions.
Portability and consistency are two of the main benefits of using containerized computing. Containers and images are super lightweight—they use the host operating system for the heavy lifting but are otherwise isolated. Layers used within images can be shared and images can be based off other images. Layers are read only and so an image is basically a collection of layers grouped together which can be shared and reused.
If you have a large container environment, you could have hundreds or thousands of containers which are using a smaller set of container images, and each of those images could be sharing these base file system layers to really save on capacity—so if you've got larger environments, you could significantly save on capacity and resource usage by moving to containers.
Containers only run what's needed—so the application and whatever the application itself needs. Containers run as a process in the host operating system and so they don't need to be a full operating system. Containers use very little memory and as you will see, they're super fast to start and stop, and yet they provide much of the same level of isolation as virtual machines—so if you don't really need a full and isolated operating system, you should give serious thought to using containerization because it has a lot of benefits, not least is the density that you can achieve using containers.
Containers are isolated and so anything running in them needs to be exposed to the outside world—so containers can expose ports such as TCP port 80 which is used for HTTP, and so when you expose a container port, the services that that container provides can be accessed from the host and the outside world. It's important to understand that some more complex application stacks can consist of multiple containers—you can use multiple containers in a single architecture either to scale a specific part of the application or when you're using multiple tiers, so you might have a database container, you might have an application container, and these might work together to provide the functionality of the application.
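As a quick illustrative sketch (assuming the 'webserver' image built earlier), running two containers from one image shows both ideas: each container gets its own read-write layer on top of the shared read-only image layers, and each exposes container port 80 on a different host port.

# Two isolated containers from the same image; the image layers are shared, the read-write layers are not.
docker run -d --name web1 -p 8080:80 webserver
docker run -d --name web2 -p 8081:80 webserver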
Okay so that's been a lot of foundational theory and now it's time for a demo. In order to understand AWS's container compute services, you need to understand how containers work. This lesson has been the theory, and the following demo lesson is where you will get some hands-on time by creating your own container image and container. It's a fun way to give you some experience, so I can't wait to step you through it. At this point, go ahead and finish this video, and when you're ready you can join me in the demo lesson.
-
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to cover two important EC2 optimisation topics, Enhanced Networking and EBS Optimised Instances; both of these are important on their own, both provide massive benefits to the way EC2 performs, and they support other performance features within EC2 such as placement groups. As a solutions architect understanding their architecture and benefits is essential, so let's get started.
Now let's start with Enhanced Networking, which is a feature designed to improve the overall performance of EC2 networking and is required for any high-end performance features such as cluster placement groups. Enhanced Networking uses a technique called SR-IOV, or Single Root I/O Virtualisation, and I've mentioned this earlier in the course.
At a high level, it makes it so that a physical network interface inside an EC2 host is aware of virtualisation. Without Enhanced Networking this is how networking looks on an EC2 host architecturally: in this example we have two EC2 instances, each of them using one virtual network interface, and both of these virtual network interfaces talk back to the EC2 host and each of them use the host's single physical network interface.
The crucial thing to understand here is that the physical network interface card isn't aware of virtualisation, and so the host has to sit in the middle controlling which instance has access to the physical card at one time. It's a process taking place in software so it's slower and it consumes a lot of host CPU, and when the host is under heavy load—so CPU or IO—it can cause drops in performance, spikes in latency, and changes in bandwidth; it's not an efficient system.
Enhanced Networking or SRIOV changes things: using this model, the host has network interface cards which are aware of virtualisation, and instead of presenting themselves as single physical network interface cards which the host needs to manage, it offers what you can think of as logical cards—multiple logical cards per physical card.
Each instance is given exclusive access to one of these logical cards and it sends data to this the same as it would do if it did have its own dedicated physical card, and the physical network interface card handles this process end to end without consuming mass amounts of host CPU.
This means a few things which matter to us as solutions architects: first, in general it allows for higher IO across all instances on the host and lower host CPU as a result, because the host CPU doesn't have the same level of involvement as when no enhanced networking is used.
What this translates into directly is more bandwidth—it allows for much faster networking speeds because it can scale and it doesn't impact the host CPU. Also, because the process occurs directly between the virtual interface that the instance has and the logical interface that the physical card offers, you can achieve higher packets per second, or PPS, and this is great for applications which rely on networking performance, specifically those which need to shift lots of small packets around the network.
And lastly, because the host CPU isn't really involved—because it's offloaded to the physical network interface card—you get low latency and perhaps more importantly consistent low latency.
Enhanced networking is a feature which is either enabled by default or available for no charge on most modern EC2 instance types; there's a lot of detail in making sure that you have it enabled, but for the solutions architect stream none of that is important. As always though, I'll include some links attached to the lesson if you do want to know how to implement it operationally.
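If you do want a quick way to check, here's an illustrative sketch using the AWS CLI (the instance ID is a placeholder); on modern instance types enhanced networking shows up as the enaSupport attribute:

# Check whether ENA-based enhanced networking is enabled on an instance.
aws ec2 describe-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute enaSupport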
Okay, so that's enhanced networking, now let's move on to EBS optimized instances. Whether an instance is EBS optimized or not depends on an option that's set on a per instance basis—it's either on or it's off.
To understand what it does, it's useful to appreciate the context: what we know already is that EBS is block storage for EC2, which is delivered over the network. Historically, networking on EC2 instances was actually shared, with the same network stack being used for both data networking and EBS storage networking, and this resulted in contention and limited performance for both types of networking.
Simply put, an instance being EBS optimized means that some stack optimizations have taken place and dedicated capacity has been provided for that instance for EBS usage; it means that faster speeds are possible with EBS and the storage side of things doesn't impact the data performance and vice versa.
Now, on most instances that you'll use at this point in time, it's supported and enabled by default at no extra charge, and disabling it has no effect because the hardware now comes with the capability built in. On some older instances, it's supported but enabling it costs extra.
EBS optimization is something that's required on instance types and sizes which offer higher levels of performance—so things which offer high levels of throughput and IOPS—especially when using the GP2 and IO1 volume types which promise low and consistent latency as well as high input output operations per second.
So that's EBS optimization; it's nothing complex—it essentially just means adding dedicated capacity for storage networking to an EC2 instance, and at this point in time it's generally enabled and comes with all modern types of instances, so it's something you don't have to worry about, but you do need to know that it exists.
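Again purely as a sketch with a placeholder instance ID, you can check whether an instance is EBS optimized in the same way:

# Check the EBS optimization flag on an instance.
aws ec2 describe-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute ebsOptimized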
Now, that's the theory that I wanted to cover—I wanted to keep it brief. There's a lot more involved in using both of these and understanding the effects that they can have, but this is an architecture lesson for this stream; you just need to know that both features exist and what they enable you to do—what features they provide at a high level.
So thanks for watching, go ahead, finish this video, and when you're ready you can join me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson I want to cover EC2 dedicated hosts, a feature of EC2 which allows you to gain access to hosts dedicated for your use, which you can then use to run EC2 instances. Now I want to keep it brief because for the exam you just need to know that the feature exists, and it tends to have a fairly narrow use case in the real world. So let's just cover the really high-level points and exactly how it works architecturally. So let's jump in and get started.
An EC2 dedicated host, as the name suggests, is an EC2 host which is allocated to you in its entirety, so allocated to your AWS account for you to use. You pay for the host itself, which is designed for a specific family of instances, for example A1, C5, M5 and so on. Because you're paying for the host, there are no charges for any instances which are running on the host; the host has a capacity, and you're paying for that capacity in its entirety, so you don't pay for instances running within that capacity.
Now you can pay for a host in a number of ways: either on demand, which is good for short-term or uncertain requirements, or once you understand long-term requirements and patterns of usage, you can purchase reservations with the same one or three-year terms as the instances themselves, and this uses the same payment method architecture—so all upfront, partial upfront or no upfront.
The host hardware itself comes with a certain number of physical sockets and cores, and this is important for two reasons: number one, it dictates how many instances can be run on that host, and number two, software which is licensed based on physical sockets or cores can utilize this visibility of the hardware. Some enterprise software is licensed based on the number of physical sockets or cores in the server; imagine if you're running some software on a small EC2 instance but you have to pay for the software licensing based on the total hardware in the host that that instance runs on, even though you can't use any of that extra hardware without paying for more instance fees.
With dedicated hosts, you pay for the entire host, so you can license based on that host which is available and dedicated to you, and then you can use instances on that host free of charge after you've paid the dedicated host fees. So the important thing to realize is you pay for the host; once you've paid for that host, you don't have any extra EC2 instance charges, you're covered for the consumption of the capacity on that host.
Now the default way that dedicated hosts work is that the hosts are designed for a specific family and size of instance, so for example an A1 dedicated host comes with one socket and 16 cores. All but a few types of dedicated hosts are designed to operate with one specific size at a time, so you can get an A1 host which can run 16 a1.medium instances, or 8 a1.large, or 4 a1.xlarge, or 2 a1.2xlarge, or 1 a1.4xlarge; all of these options consume the 16 cores available, and all but a few types of dedicated hosts require you to set that in advance—so they require you to set that one particular host can only run, say, 8 a1.large instances, or 4 a1.xlarge, or 2 a1.2xlarge, and you can't mix and match.
Newer types of dedicated hosts, so those running the Nitro virtualization platform, offer more flexibility; an example of this is an R5 dedicated host which offers 2 sockets and 48 cores. Because this is Nitro-based, you can use different sizes of instances at the same time up to the core limit of that dedicated host—so one host might be running 1 r5.12xlarge, 1 r5.4xlarge and 4 r5.2xlarge, which consumes the 48 cores of that dedicated host; another host might use a different configuration, maybe 4 r5.4xlarge and 4 r5.2xlarge, which also consumes 48 cores.
With Nitro-based dedicated hosts, there's a lot more flexibility allowing a business to maximize the value of that host, especially if they have varying requirements for different sizes of instances. Now this is a great link which I've included in the lesson text which details the different dedicated host options available—so you've got different dedicated hosts for different families of instance, for example the A1 instance family; this offers 1 physical socket and 16 physical cores and offers different configurations for different sizes of instances.
Now if you scroll all the way down, it also gives an overview of some of the Nitro-based dedicated hosts which support this mix-and-match capability—so we've got the R5 dedicated host that I just talked about on the previous screen; we've also got the C5 dedicated host, and this gives two example scenarios. In scenario 1 you've got 1 c5.9xlarge, 2 c5.4xlarge and 1 c5.xlarge, for a total of 36 cores consumed; there's also another scenario where you've got 4 c5.4xlarge, 1 c5.xlarge and 2 c5.large—the same core consumption but a different configuration of instances. And again, I'll make sure this is included in the lesson description; it also gives the on-demand pricing for all of the different types of dedicated host.
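For reference, allocating a dedicated host and then launching an instance onto it looks roughly like this with the AWS CLI; the availability zone, AMI, instance type and host ID are placeholders, so treat this as a sketch rather than a tested recipe:

# Allocate a dedicated host for the R5 instance family in a specific availability zone.
aws ec2 allocate-hosts --instance-family r5 --availability-zone us-east-1a --quantity 1

# Launch an instance onto that host by targeting its host ID with host tenancy.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type r5.2xlarge \
  --placement "Tenancy=host,HostId=h-0123456789abcdef0"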
Now there are some limitations that you do need to keep in mind for dedicated hosts; the first one is AMI limits—you can't use RHEL, SUSE Linux or Windows AMIs with dedicated hosts, they are simply not supported. You cannot use Amazon RDS instances—again, they're not supported. You can't utilize placement groups—they're not supported on dedicated hosts, and there's a lesson in this section which talks in depth about placement groups, but in this context, as it relates to dedicated hosts, you cannot use placement groups with dedicated hosts—it's not supported.
Now with dedicated hosts, they can be shared with other accounts inside your organization using the RAM product, which is the resource access manager—it's a way that you can share certain AWS products and services between accounts; we haven't covered it yet, but we will do later in the course. You're able to share a dedicated host with other accounts in your organization, and other AWS accounts in your organization can then create instances on that host.
Those other accounts which have a dedicated host shared into them can only see instances that they create on that dedicated host; they can't see any other instances, and you, as the person who owns the dedicated host, you can see all of the instances running on that host, but you can't control any of the instances running on your host created by any accounts you share that host with—so there is a separation: you can see all of the instances on your host, you can only control the ones that you create, and then other accounts who get that host shared with them—they can only see instances that they create, so there's a nice security and visibility separation.
Now that's all of the theory that I wanted to cover around the topic of dedicated hosts; you don't need to know anything else for the exam, and if you do utilize dedicated hosts for any production usage in the real world, it is generally going to be around software licensing. Generally using dedicated hosts, there are restrictions—obviously they are specific to a family of instance, so it gives you less customizability, it gives you less flexibility on sizing, and you generally do it if you've got licensing issues that you need solved by this product.
In most cases, in most situations, it's not the approach you would take if you just want to run EC2 instances. But with that being said, go ahead, complete this video, and when you're ready, I'll look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about an important feature of EC2 known as placement groups. Normally when you launch an EC2 instance its physical location is selected by AWS placing it on whatever EC2 host makes the most sense within the availability zone that it's launched in. Placement groups allow you to influence placement ensuring that instances are either physically close together or not. As a Solutions Architect understanding how placement groups work and why you would use them is essential so let's jump in and get started.
There are currently three types of placement groups for EC2. All of them influence how instances are arranged on physical hardware but each of them do it for different underlying reasons. At a high level we have cluster placement groups and these are designed to ensure that any instances in a single cluster placement group are physically close together. We've got spread placement groups which are the inverse ensuring that instances are all using different underlying hardware and then we've got partition placement groups and these are designed for distributed and replicated applications which have infrastructure awareness, so where you want groups of instances but where each group is on different hardware. So I'm going to cover each of them in detail in this lesson once we talk about each of them they'll all make sense.
Now cluster and spread tend to be pretty easy to understand. Partition is less obvious if you haven't used the type of application which they support but it will be clear once I've explained it and once you've finished with this lesson. Now let's start with cluster placement groups.
Cluster placement groups are used when you want to achieve the absolute highest level of performance possible within EC2. With cluster placement groups you create the group and best practice is that you launch all of the instances which will be in the group all at the same time. This ensures that AWS allocate capacity for everything that you require. So for example if you launch with nine instances imagine that AWS place you in a location with the capacity for 12. If you want to double the number of instances you might have issues. Best practice is to use the same type of instance as well as launching them all at the same time because then AWS will place all of them in a suitable location with capacity for everything that you need.
Now cluster placement groups, because of their performance focus, have to be launched into a single availability zone. Now how this works is that when you create the placement group you don't specify an availability zone. Instead, when you launch the first instance or instances into that placement group, it will lock that placement group to whichever availability zone that instance is launched into. The idea with cluster placement groups is that all of the instances within the same cluster placement group generally use the same rack, and often the same EC2 host. All of the instances within a placement group have fast direct bandwidth to all other instances inside the same placement group, and when transferring data between instances within that cluster placement group they can achieve single-stream transfer rates of 10 Gbps versus the usual 5 Gbps which is achievable normally.
Now this is single-stream transfer rates; while some instances do offer significantly faster networking, you're always going to be limited to the speed that a single stream of data, a single connection, can achieve, and inside a cluster placement group this is 10 Gbps versus the 5 Gbps which is achievable normally. Now the connections between these instances, because of the physical placement, have the lowest latency possible and the maximum packets per second possible within AWS. Now obviously, to achieve these levels of performance you need to be using instances with high-performance networking (i.e. more bandwidth than the 10 Gbps single stream) and you should also use enhanced networking on all instances; to achieve the low latency and maximum packets per second you definitely do need to use enhanced networking.
So cluster placement groups are used when you really need performance. They're needed to achieve the highest levels of throughput and the lowest consistent latencies within AWS but the trade-off is because of the physical location if the hardware that they're running on fails logically it could take down all of the instances within that cluster placement group. So cluster placement groups offer little to no resilience.
Now some key points which you need to be aware of for the exam: you cannot span availability zones with cluster placement groups; this is locked when launching the first instance. You can span VPC peers, but this does significantly impact performance in a negative way. Cluster placement groups are not supported on every type of instance; they require a supported instance type. And while it's not mandatory, it is strongly recommended that you use the same type of instance and launch all of the instances at the same time to get the best results.
Now cluster placement groups offer 10 Gbps of single-stream performance, and the type of use cases where you would use them are any workloads which demand performance, so fast speeds and low latency. So this might be things like high-performance compute or other scientific analysis which demand fast node-to-node speeds and low, consistent latency.
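As a rough sketch with hypothetical names, creating a cluster placement group and launching instances into it looks like this (remember the group locks to the availability zone of the first instances launched into it):

# Create the cluster placement group.
aws ec2 create-placement-group --group-name hpc-cluster --strategy cluster

# Launch all of the instances at the same time, into the same group.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5n.18xlarge \
  --count 9 \
  --placement "GroupName=hpc-cluster"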
Now the next type of placement group I want to talk about is spread placement groups and these are designed to ensure the maximum amount of availability and resilience for an application. So spread placement groups can span multiple availability zones in this case availability zone A and availability zone B. Instances which are placed into a spread placement group are located on separate isolated infrastructure racks within each availability zone so each instance has its own isolated networking and power supply separate from any of the other instances also within that same spread placement group. This means if a single rack fails either from a networking or power perspective the fault can be isolated to one of those racks.
Now with spread placement groups there is a limit of seven instances per availability zone; because each instance is in a completely separate infrastructure rack, and because there are limits on the number of these within each availability zone, you have that limit of seven instances per availability zone for spread placement groups. Now the more availability zones in a region, logically the more instances can be part of each spread placement group, but remember the limit of seven instances per availability zone in that region.
Now again, just some points that you should know for the exam: spread placement groups provide infrastructure isolation, so you're guaranteed that every instance launched into a spread placement group will be entirely separated from every other instance that's also in that spread placement group. Each instance runs from a different rack, each rack has its own network and power source, and, just to stress again, there is a hard limit of seven instances per availability zone. Now with spread placement groups you can't use dedicated instances or hosts; they're not supported. In terms of use cases, spread placement groups are used when you have a small number of critical instances that need to be kept separated from each other, so maybe mirrors of a file server or different domain controllers within an organization; anywhere you've got a specific application and you need to ensure the highest possible availability for each member of that application, where you want a separate blast radius for each of the servers and, if one fails, the smallest possible chance that any of the other instances will fail. You have to keep in mind the limit of seven instances per availability zone, but if you want to maximize the availability of your application, this is the type of placement group to choose.
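Creating one is a one-liner; the group name here is just a placeholder:

# Create a spread placement group; instances launched into it are placed on separate racks.
aws ec2 create-placement-group --group-name critical-spread --strategy spread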
Now lastly we've got partition placement groups and these have a similar architecture to spread placement groups which is why they're often so difficult to understand fully and why it's often so difficult to pick between partition placement groups and spread placement groups. Partition placement groups are designed for when you have infrastructure where you have more than seven instances per availability zone but you still need the ability to separate those instances into separate fault domains.
Now a partition placement group can be created across multiple availability zones in a region in this example az a and az b and when you're creating the partition placement group you specify a number of partitions with a maximum of seven per availability zone in that region. Now each partition inside the placement group has its own racks with isolated power and networking and there is a guarantee of no sharing of infrastructure between those partitions.
Now so far this sounds like spread placement groups except with partition placement groups you can launch as many instances as you need into the group and you can either select the partition explicitly or have EC2 make that decision on your behalf. With spread placement groups remember you had a maximum of seven instances per availability zone and you knew 100% that each instance within that spread placement group was separated from every other instance in terms of hardware. With partition placement groups each partition is isolated but you get to control which partition to launch instances into. If you launch 10 instances into one partition and it fails you lose all 10 instances. If you launch seven instances and put one into each separate partition then it behaves very much like a spread placement group.
Now the key to understanding the difference is that partition placement groups are designed for huge scale parallel processing systems where you need to create groupings of instances and have them separated. You as the designer of a system can have control over which instances are in the same and different partitions so you can design your own resilient architecture. Partition placement groups offer visibility into the partitions. You can see which instances are in which partitions and you can share this information with topology aware applications such as HDFS, HBase and Cassandra. Now these applications use this information to make intelligent data replication decisions.
Imagine that you had an application which used 75 EC2 instances. Each of those instances had its own storage, and that application replicated data three times across those 75 instances. So each piece of data was replicated on three instances, and so essentially you had three replication groups, each with 25 instances. If you didn't have the ability to use partition placement groups, then in theory all of those 75 instances could be on the same hardware, and so you wouldn't have that resiliency. With partition placement groups, if the application is topology aware, then it becomes possible to replicate data across different EC2 instances knowing that those instances are in separate partitions, and so it allows more complex applications to achieve the same types of resilience as you get with spread placement groups, only it has an awareness of that topology and it can cope with more than seven instances.
So the difference between spread and partition placement is that with spread placement it's all handled for you but you have that seven instance per availability zone limit but with partition placement groups you can have more instances but you or your application which is topology aware needs to administer the partition placement. For larger scale applications that support this type of topology awareness this can significantly improve your resilience.
Now some key points for the exam around partition placement groups: again, seven partitions per availability zone; instances can be placed into a specific partition, or you can allow EC2 to automatically control that placement. Partition placement groups are great for topology-aware applications such as HDFS, HBase and Cassandra, and partition placement groups can help a topology-aware application to contain the impact of a failure to a specific part of that application. So by the application and AWS working together using partition placement groups, it becomes possible for large-scale systems to achieve significant levels of resilience and effective replication between different components of the application.
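As an illustrative sketch with placeholder names, creating a partition placement group and launching an instance into a specific partition looks like this:

# Create a partition placement group with seven partitions.
aws ec2 create-placement-group --group-name big-data --strategy partition --partition-count 7

# Launch an instance into a specific partition (or omit PartitionNumber and let EC2 decide).
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type r5.2xlarge \
  --placement "GroupName=big-data,PartitionNumber=3"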
Now it's essential that you understand the difference between all three for the exam so make sure before moving on in the course you are entirely comfortable about the differences between spread placement groups and partition placement groups and then the different situations where you would choose to use cluster, spread and partition. With that being said though that's everything I wanted to cover so go ahead and complete this lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. So far in the course you've had a brief exposure to CloudWatch and CloudWatch logs, and you know that CloudWatch monitors certain performance and reliability aspects of EC2, but crucially only those metrics that are available on the external face of an EC2 instance.
There are situations when you need to enable monitoring inside an instance, so you have access to certain performance counters of the operating system itself, such as the ability to look at the processes running on an instance, the memory consumption of those processes, and generally access certain operating system level performance metrics that you cannot see outside the instance.
You also might want to allow access to system and application logging from within the EC2 instance, meaning both application logs and system logs from within the operating system of an EC2 instance.
So in this lesson I want to step through exactly how this works and what you need to use to achieve it, so let's get started.
Now, a quick summary of where we're at so far in the course relevant to this topic: I just mentioned you know now that CloudWatch is the product responsible for storing and managing metrics within AWS, and you also know that CloudWatch logs is a subset of that product aimed at storing, managing, and visualizing any logging data, but neither of those products can natively capture any data or logs happening inside of an EC2 instance.
The products aren't capable of getting visibility inside of an EC2 instance natively, and the inside of an instance is opaque to CloudWatch and CloudWatch logs by default.
To provide this visibility, the CloudWatch agent is required, and this is a piece of software which runs inside an EC2 instance, running on the operating system to capture OS-visible data and send it into CloudWatch or CloudWatch logs, so that you can then use it and visualize it within the console of both of those products.
Logically, for the CloudWatch agent to function, it needs to have the configuration and permissions to be able to send that data into both of those products, so in summary, in order for CloudWatch and CloudWatch logs to have access inside of an EC2 instance, there's some configuration and security work required in addition to having to install the CloudWatch agent, and that's what I want to cover over the remainder of this lesson and the upcoming demo lesson.
Architecturally, the CloudWatch agent is pretty simple to understand: we've got an EC2 instance on its own—for example, the animals for life WordPress instance from the previous demos—and it's incapable of injecting any logging into CloudWatch logs without the agent being installed.
To fix that, we need to install the CloudWatch agent within the EC2 instance, and the agent will need some configuration; it needs to know exactly what information to inject into CloudWatch and CloudWatch logs, so we need to configure the agent and supply the configuration information so that the agent knows what to do.
The agent also needs some way of interacting with AWS—some permissions—and we know now that it's bad practice to add long-term credentials to an instance, so we don't want to do that; but that aside, it's also difficult to manage that at scale.
So best practice for using this type of architecture is to create an IAM role with permissions to interact with CloudWatch logs, and then we can attach this IAM role to the EC2 instance, providing the instance—or more specifically, anything running on the instance—with access to the CloudWatch and CloudWatch logs service.
Now, the agent configuration also needs to be set up to configure the metrics and the logs that we want to capture, and these are all injected into CloudWatch using log groups; we'll configure one log group for every log file that we want to inject into the product, and then within each log group, there'll be a log stream for each instance performing this logging.
So that's the architecture: one log group for each individual log that we want to capture, and then one log stream inside that log group for every EC2 instance that's injecting that logging data.
To get this up and running for a single instance, you can do it manually—you can log into the instance, install the agent, configure it, attach a role, and start injecting the data—but at scale, you'll need to automate the process, and potentially you can use CloudFormation to include that agent configuration for every single instance that you provision.
Now, CloudWatch agent comes with a number of ways to obtain and store the configuration that it will use to send this data into CloudWatch logs, and one of those ways is to actually use the parameter store and store the agent configuration as a parameter.
Because we've just learned about parameter store, I thought it would be beneficial—along with demonstrating how to install and configure the CloudWatch agent—to also utilize the parameter store to store that configuration, and so that's what we're going to do together in the next demo lesson.
We're going to install and configure the CloudWatch agent and set it up to collect logging information for three different log files: we’re going to set it up to collect and inject logging for /var/log/secure, which shows any events relating to secure logins to the EC2 instance, and we’re also going to collect logging information for the access log and the error log, which are both log files generated by the Apache web server that's installed on the EC2 instance.
By using these three different log files, it should give you some great practical experience on how to configure the CloudWatch agent and how to use the parameter store to store configuration at scale.
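To give a flavour of what the demo will involve, here is a rough sketch only: it assumes an Amazon Linux 2 instance, uses a hypothetical parameter name, and names the log groups after the file paths. It covers installing the agent, a config for those three log files, and fetching that config from Parameter Store.

# Install the CloudWatch agent.
sudo yum install -y amazon-cloudwatch-agent

# A minimal agent configuration collecting the three log files mentioned above.
cat > /tmp/cw-agent-config.json <<'EOF'
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          { "file_path": "/var/log/secure", "log_group_name": "/var/log/secure", "log_stream_name": "{instance_id}" },
          { "file_path": "/var/log/httpd/access_log", "log_group_name": "/var/log/httpd/access_log", "log_stream_name": "{instance_id}" },
          { "file_path": "/var/log/httpd/error_log", "log_group_name": "/var/log/httpd/error_log", "log_stream_name": "{instance_id}" }
        ]
      }
    }
  }
}
EOF

# Store the config in Parameter Store so it can be reused across instances...
aws ssm put-parameter --name AmazonCloudWatch-agent-config --type String \
  --value file:///tmp/cw-agent-config.json --overwrite

# ...then have the agent fetch it from Parameter Store and start.
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-agent-config -s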
So that's it for the theory for now; you can go ahead and finish off this video, and then when you're ready, you can join me in the next demo lesson where we'll be installing and configuring the CloudWatch agent.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I want to cover the Systems Manager parameter store, a service from AWS which makes it easy to store various bits of system configuration—so strings, documents, and secrets—and store those in a resilient, secure, and scalable way.
So let's step through the product’s architecture, including how to best make use of it. If you remember earlier in this section of the course, I mentioned that passing secrets into an EC2 instance using user data was bad practice because anyone with access to the instance could access all that data; well, parameter store is a way that this can be improved.
Parameter store lets you create parameters, and these have a parameter name and a parameter value, with the value being the part that stores the actual configuration. Many AWS services integrate with the parameter store natively—CloudFormation offers integrations that you've already used, which I'll explain in a second and in the upcoming demo lessons—and you can also use the CLI tooling on an EC2 instance to get access to the service.
So when I'm talking about configuration and secrets, parameter store offers the ability to store three different types of parameters: we've got strings, string lists, and secure strings. Using these three different types of parameters, you can store things inside the product such as license codes, database connection strings (so host names, ports), and you can even store full configs and passwords.
Now, parameter store also allows you to store parameters using a hierarchical structure and also stores different versions of parameters—so just like we've got object versioning in S3, inside parameter store we can also have different versions of parameters.
Parameter store can also store plain text parameters, which is suitable for things like DB connection strings or DB users, but we can also use cipher text, which integrates with KMS to allow you to encrypt parameters; this is useful if you're storing passwords or other sensitive information that you want to keep secret.
So when you encrypt using cipher text, you use KMS, and that means you need permissions on KMS as well, so there's that extra layer of security. The parameter store's also got the concept of public parameters—these are parameters publicly available and created by AWS.
You've used these earlier in the course—an example is when you've used CloudFormation to create EC2 instances, you haven't had to specify a particular AMI to use because you've consulted a public parameter made available by AWS, which is the latest AMI ID for a particular operating system in that particular region—and I'll be demonstrating exactly how that works now in the upcoming demo lessons.
Now, the architecture of the parameter store is simple enough to understand—it's a public service, so anything using it needs to either be an AWS service or have access to the AWS public space endpoints.
Different types of things can use the parameter store, so this might be things like applications, EC2 instances, all the things running on those instances, and even Lambda functions, and they can all request access to parameters inside the parameter store.
As parameter store is an AWS service, it's tightly integrated with IAM for permissions, so any accesses will need to be authenticated and authorized—and that might use long-term credentials like access keys or those credentials might be passed in via an IAM role.
And if parameters are encrypted, then KMS will be involved and the appropriate permissions to the CMK inside KMS will also be required.
Now, parameter store allows you to create simple or complex sets of parameters—for example, you might have something simple like myDB password, which stores your database password in an encrypted form, but you can also create hierarchical structures.
So something like /wordpress/ and inside there we might have something called dbUser, which could be accessed either by using its full name or requesting the WordPress part of the tree; we could also have dbPassword, which again, because it's under the WordPress branch of the tree, could be accessed along with the dbUser by pulling the whole WordPress tree or accessed individually by using its full name, so /wordpress/dbPassword.
Now, we might also have applications which have their own part of the tree, for example, my-cat-app, or you might have functional division in your organization, so giving your dev team a branch of the tree to store their passwords.
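As a quick sketch using the /wordpress/ tree from the example above (the values are obviously placeholders), creating and reading parameters from the CLI looks like this:

# Create a plain-text parameter and an encrypted (SecureString) parameter under the /wordpress/ branch.
aws ssm put-parameter --name /wordpress/dbUser --type String --value "wpuser"
aws ssm put-parameter --name /wordpress/dbPassword --type SecureString --value "SuperSecretPassword"

# Read a single parameter, decrypting the SecureString...
aws ssm get-parameter --name /wordpress/dbPassword --with-decryption

# ...or pull the whole /wordpress/ branch of the tree in one call.
aws ssm get-parameters-by-path --path /wordpress/ --with-decryption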
Permissions are flexible and they can be set either on individual parameters or whole trees. Everything supports versioning and any changes that occur to any parameters can also spawn events—and these events can start off processes in other AWS services, and I'll introduce this later.
I just want to mention it now so you understand that parameter store parameter changes can initiate events that occur in other AWS products.
Now, parameter store isn't a hugely complex product to understand, and so at this point, I've covered all of the theory that you'll need for the associate level exam.
What I want to do now is to finish off this theory lesson, and immediately following this is a demo where I want to show you how you can interact with the parameter store via the console UI and the AWS command line tools.
Now, it will be a relatively brief demo, and so you're welcome to just watch me perform the steps, or of course, as always, you can follow along with your own environment—and I'll be providing all the resources that you need to do that inside that demo lessons folder in the course GitHub repository.
So at this point, go ahead, finish this video, and when you're ready, you can join me in the next demo lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I'm going to be covering a topic which is probably slightly beyond what you need for the Solutions Architect Associate exam, but additional understanding of CloudFormation is never a bad thing and it will help you answer any automation-style questions in the exam, and so I'm going to talk about it anyway. CloudFormation Init is a way that you can pass complex bootstrapping instructions into an EC2 instance, and it's much more complex than the simple user data example that you saw in the previous lesson.
Now we do have a lot to cover, so let's jump in and step through the theory before we move to another demo lesson. In the previous lesson I showed you how CloudFormation handled user data, and it works in a similar way to the console UI where you pass in base 64 encoded data into the instance operating system and it runs as a shell script. Now there's another way to configure EC2 instances, a way which is much more powerful.
It's called cfn-init, and it's officially referred to by AWS as a helper script which is installed on EC2 operating systems such as Amazon Linux 2. Now cfn-init is actually much more than a simple helper script — it's much more like a simple configuration management system. User data is what's known as procedural; it's a script, it's run by the operating system line by line. Now cfn-init can also be procedural — it can be used to run commands just like user data — but it can also be desired state, where you direct it how you want something to be.
You tell it what the desired state of an EC2 instance is, and it will perform whatever is required to move the instance into that desired state. So for example, you can tell cfn-init that you want a certain version of the Apache web server to be installed, and if that's already the case — if Apache is already installed and it's the same version — then nothing is done. However, if Apache is not installed then cfn-init will install it, or it will update any older versions to that version.
cfn-init can do lots of pretty powerful things — it can make sure packages are installed even with an awareness of versions, it can manipulate operating system groups and users, it can download sources and extract them onto the local instance even using authentication, it can create files with certain contents, permissions, and ownerships, it can run commands and test that certain conditions are true after the commands have run, and it can even control services on an instance — so ensuring that a particular service is started or enabled to start on boot of the OS.
cfn-init is executed like any other command by being passed into the instance as part of the user data, and it retrieves its directives from the CloudFormation stack. You define this data in a special part of each logical resource inside CloudFormation templates called AWS::CloudFormation::Init — and don't worry, you'll get a chance to see this very soon in the demo.
So the instance runs cfn-init, it pulls this desired state data from the CloudFormation stack that you put in there via the CloudFormation template, and then it implements the desired state that's specified by you in that data. So let's quickly look at this architecture visually — the way that cfn-init works is probably going to be easier to understand if we do take a look at it visually, and once you see the individual components it's a lot simpler than I've made it sound on the previous screen.
It all starts off with a CloudFormation template, and this one creates an EC2 instance, and you'll see this in action yourself very soon. Now the template has a logical resource inside it called EC2 instance, which is to create an EC2 instance, and it has this new special component: Metadata and AWS::CloudFormation::Init, and this is where the cfn-init configuration is stored. The cfn-init command itself is executed from the user data that's passed into that instance.
So the CloudFormation template is used to create a stack which itself creates an EC2 instance, and the cfn-init line in the user data at the bottom here is executed by the instance. This should make sense now — anything in the user data is executed when the instance is first launched. Now if you look at the command for cfn-init, you'll notice that it specifies a few variables — specifically a stack ID and a region.
Remember, this instance is being created using CloudFormation, and so these variables are actually replaced with the actual values before this ends up inside the EC2 instance. So the region will be replaced with the actual region that the stack is created in, and the stack ID is the actual stack ID that's being created by this template, and these are all passed in to cfn-init. This allows cfn-init to communicate with the CloudFormation service and receive its configuration, and it can do that because of those variables passed into the user data by CloudFormation.
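Now just to make this more concrete, here's a simplified sketch of what this might look like inside a template. It's not the exact template from the demo; the logical resource name of Instance and the AMI ID are just placeholders:

    Resources:
      Instance:                              # placeholder logical resource name
        Type: AWS::EC2::Instance
        Metadata:
          AWS::CloudFormation::Init:         # desired state configuration read by cfn-init
            config:
              packages:
                yum:
                  httpd: []                  # make sure Apache is installed
              services:
                sysvinit:
                  httpd:
                    enabled: true            # enable the service to start on boot
                    ensureRunning: true      # make sure it's running right now
        Properties:
          ImageId: ami-0123456789abcdef0     # placeholder AMI ID
          InstanceType: t3.micro
          UserData:
            Fn::Base64: !Sub |
              #!/bin/bash -xe
              # cfn-init pulls its directives from the AWS::CloudFormation::Init block above;
              # CloudFormation substitutes the real stack ID and region at stack creation time
              /opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource Instance --region ${AWS::Region}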
Once cfn-init has this configuration, then because it's a desired state system, it can implement the desired state that's specified by you inside the CloudFormation template. And another amazing thing about this process, or about cfn-init and its associated tools, is that it can also work with stack updates.
Remember that the user data works once, while cfn-init can be configured to watch for updates to the metadata on an object in a template, and if that metadata changes, then cfn-init can be executed again and it will update the configuration of that instance to the desired state specified inside the template — it's really powerful. Now this is not something that user data can do — user data only works the once when you launch the instance.
Now in the demo lesson which immediately follows this one, you're going to experience just how cool this cfn-init process is. The WordPress CloudFormation template that you used in the previous demo, which included some user data, I've updated, and I've supplied a new version which uses this CloudFormation Init process, cfn-init, so you'll get to see how it's different and exactly how that looks when you apply it into your AWS account.
Now there's one more really important feature of CloudFormation which I want to cover — as you start performing more advanced bootstrapping, it will start to matter more and more. This feature is called CloudFormation creation policies and CloudFormation signals — so let's look at that next.
In the previous example there was another line passed into the user data, the bottom line: cfn-signal. Without this, the resource creation process inside CloudFormation is actually pretty dumb. You have a template which is used to create a stack which creates an EC2 instance — let's say you pass in some user data, this runs, and then the instance is marked as complete.
The problem though is we don't actually know if the resource actually completed successfully — CloudFormation has created the resource and passed in the user data, but I've already said that CloudFormation doesn't understand the user data, it just passes it in. So if the user data has a problem — if the instance bootstrapping process fails and from a customer perspective the instance doesn't really work — CloudFormation won't know. The instance is going to be marked as complete regardless of the state of the configuration inside that instance.
Now this is fine when we're creating resources like a blank EC2 instance when there is no post-launch configuration — if EC2 reports to CloudFormation that it's successfully provisioned an instance then we can rely on that; if we're creating an S3 bucket and S3 reports to CloudFormation that it's worked okay then it's worked okay. But what if there's extra configuration happening inside the resource such as this bootstrapping process?
We need a better way — a way that the resource itself, the EC2 instance in this case, can inform CloudFormation whether it has been configured correctly or not. This is how creation policies work, and this is a creation policy. A creation policy is something which is added to a logical resource inside a CloudFormation template — you create it and you supply a timeout value.
This one has 15 minutes, and this is used to create a stack which creates an instance. So far the process is the same — but at this point CloudFormation waits. It doesn't move the instance into a create complete status when EC2 signals that it's been created successfully — instead it waits for a signal, a signal from the resource itself.
So even though EC2 has launched the instance, even though its status checks pass and it's told CloudFormation that the instance is provisioned and ready to go, CloudFormation waits. It waits for a signal from the resource itself. The cfn-signal command at the bottom is given the stack ID, the resource name, and the region, and these are passed in by the CloudFormation stack when the resource is created.
So the cfn-signal command understands how to communicate with the specific CloudFormation stack that it's running inside. The -e $? part of that command passes in the exit code of the previous command, so in this case the cfn-init command is going to perform this desired state configuration, and if that command exits successfully, then an OK is sent as a signal by cfn-signal.
If cfn-init reports an error code, then this is sent using cfn-signal to the CloudFormation stack, so cfn-signal is reporting the success or failure of the cfn-init bootstrapping to the CloudFormation stack. If it's a success code, so if cfn-init worked as intended, then the resource is moved into a create complete state; if cfn-signal reports an error, the resource in CloudFormation shows an error.
If nothing happens for 15 minutes, the timeout value, then CloudFormation assumes the configuration has failed and doesn't let the stack create successfully; the resource will generate an error. Now you'll see creation policies feature in more complex CloudFormation templates, either within EC2 instance resources or within auto scaling groups that we'll be covering later in the course.
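To make that concrete, a creation policy and the two helper script calls might look roughly like this in a template. Again, this is a simplified sketch with placeholder values, not the exact demo template:

    Resources:
      Instance:
        Type: AWS::EC2::Instance
        CreationPolicy:
          ResourceSignal:
            Timeout: PT15M                   # wait up to 15 minutes for a signal from the instance
        Properties:
          ImageId: ami-0123456789abcdef0     # placeholder AMI ID
          InstanceType: t3.micro
          UserData:
            Fn::Base64: !Sub |
              #!/bin/bash -xe
              /opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource Instance --region ${AWS::Region}
              # -e $? passes cfn-init's exit code through as the success or failure signal
              /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource Instance --region ${AWS::Region}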
Now you won't need to know the technical implementation details of this for the Solutions Architect Associate exam, but I do expect the knowledge of this architecture will help you in any automation related questions. And now it's time for a quick demonstration — I just want you to have some experience in using a template which uses CFN init and also one which uses the creation policy, so I hope this theory has been useful to you and when you're ready for the demo, go ahead and complete this video and you can join me in the next.
-
Welcome back, and in the first real lesson of the advanced EC2 section of the course, I want to introduce EC2 Bootstrapping. Now, this is one of the most powerful features of EC2 available for us to use as solutions architects because it's what allows us to begin adding automation to the solutions that we design. Bootstrapping is a process where scripts or other bits of configuration can be run when an instance is first launched, meaning that an instance can be brought into service in a certain pre-configured state. So, unlike just launching an instance with an AMI and having it be in its default state, we can bootstrap in a certain set of configurations or software installs.
Now, let's look at how this works from a theory perspective, and then you'll get a chance to implement this yourself in the following demo lesson. Now, bootstrapping is a process which exists outside EC2 — it's a general term. In systems automation, bootstrapping is a process which allows a system to self-configure or perform some self-configuration steps. In EC2, it allows for build automation — some steps which can occur when you launch an instance to bring that instance into a configured state. Rather than relying on a default AMI or an AMI with a pre-baked configuration, it allows you to direct an EC2 instance to do something when launched, so perform some software installations and then some post-installation configuration.
With EC2, bootstrapping is enabled using EC2 user data, and this is injected into the instance in the same way that metadata is. In fact, it's accessed using the metadata IP address, so 169.254.169.254, also known as 169.254 repeating, but instead of /latest/meta-data, it's /latest/user-data. The user data is a piece of data, a piece of information that you can pass into an EC2 instance, and anything that you pass in is executed by the instance's operating system. And here's the important thing to remember — it's executed only once at launch time. If you update the user data and restart an instance, it's not executed again, so it's only the once. User data applies only to the first initial launch of the instance — it's for launch time configuration only.
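So for example, from inside a running instance, a command roughly like this (assuming the curl utility is available) would show you the raw user data that was delivered at launch:

    # query the metadata IP address for the user data passed in at launch
    curl http://169.254.169.254/latest/user-data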
Now, another important aspect is that EC2 as a service doesn't validate this user data, it doesn't interpret it in any way. You can tell EC2 to pass in some random data and it will, you can tell EC2 to pass in commands which will delete all of the data on the boot volume and the instance will do so. EC2 doesn't interpret the data — it just passes the data into the instance via user data, and there's a process on the operating system which runs this as the root user. So, in summary, the instance needs to understand what you pass in because it's just going to run it.
Now, the bootstrapping architecture is pretty simple to understand — an AMI is used to launch an EC2 instance in the usual way and this creates an EBS volume which is attached to the EC2 instance, and that's of course based on the block device mapping inside the AMI. This part we understand already. Where it starts to differ is that now the EC2 service provides some user data through to the EC2 instance, and there's software within the operating system running on EC2 instances which is designed to look at the metadata IP for any user data, and if it sees any user data then it executes this on launch of that instance.
Now, this user data is treated just like any other script that the operating system runs — it needs to be valid, and at the end of running the script the EC2 instance will either be in a running state and ready for service, meaning that the instance has finished its startup process, the user data ran and it was successful, and the instance is in a functional and running state. Or, the worst case is that the user data errors in some way, so the instance would still be in a running state because the user data is separate from EC2 — EC2 just delivers it into the instance. The instance would still pass its status checks, and assuming you didn't run anything which deleted mass amounts of OS data, you could probably still connect to it, but the instance would likely not be configured as you want — it would be a bad configuration. So that's critical to understand — the user data is just passed in in an opaque way to the operating system, it's up to the operating system to execute it, and if executed correctly the instance will be ready for service. If there's a problem with the user data, you will have a bad config — this is one of the key elements of user data to understand. It's one of the powerful features but also one of the risky ones — you pass the instance user data as a block of data, it runs successfully or it doesn't, and from EC2's perspective it's simply opaque data, it doesn't know or care what happens to it.
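To make this concrete, here's a minimal, hypothetical example of a user data script. It's not the WordPress script from the demo, just a sketch of the kind of thing you'd pass in on an Amazon Linux 2 instance:

    #!/bin/bash -xe
    # minimal hypothetical bootstrap: install Apache and write a test page
    yum -y install httpd
    systemctl enable httpd
    systemctl start httpd
    echo "Bootstrapped by user data" > /var/www/html/index.html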
Now, user data is also not secure — anyone who can access the instance operating system can access the user data, so don't use it for passing in any long-term credentials, at least not ideally. Now in the demo, we'll be doing just that — we'll be doing bad practice by passing into the instance using the user data some long-term credentials, but this is intentional — it's part of your learning process. As we move through the course, we'll evolve the design and implementations to use more AWS services, and some of these include better ways to handle secrets inside EC2, so I need to show you the bad practice before I can compare it to the good.
Now, user data is limited to 16 kilobytes in size — for anything more complex than that, you would need to pass in a script which would download that larger data. User data can be modified — if you shut down the instance, change the user data and start it up again, then new data is available inside the instance's user data, but the contents are only executed once when you initially launch that instance. So after the launch stage, user data is only really useful for passing data in, and there are better ways of doing that. So keep in mind for the exam — user data is generally used the once for the post-launch configuration of an instance — it's only executed the one initial time.
Now, one of the question types that you'll often face in the exam relates to how quickly you can bring an instance into service. There's actually a metric — boot time to service time — how quickly after you launch an instance is it ready for service, ready to accept connections from your customers, and this includes the time that AWS requires to provision the EC2 instance and the time taken for any software updates, installations or configurations to take place within the operating system. For an AWS-provided AMI, that time can be measured in minutes — from launch time to service time it's generally only minutes. But what if you need to do some extra configuration, maybe install an application?
Remember when you manually installed WordPress after launching an instance — this is known as post-launch time, the time required after launch for you to perform manual configuration or automatic configuration before the instance is ready for service. If you do it manually, this can be a few minutes or even as long as a few hours for things which are significantly more complex. Now, you can shorten this post-launch time in a few ways. The topic of this very lesson is bootstrapping, and bootstrapping as a process automates installations after the launch of an instance, and this reduces the amount of time taken to perform these steps, and you'll see that demoed in the next lesson.
Now, alternatively, you can also do the work in advance by AMI baking — with this method, you're front-loading the work, doing it in advance and creating an AMI with all of that work baked in. Now, this removes the post-launch time, but it means you can't be as flexible with the configuration because it has to be baked into the AMI. Now, the optimal way is to combine both of these processes — so AMI baking and bootstrapping — you'd use AMI baking for any part of the process which is time intensive. So, if you have an application installation process which is 90% installation and 10% configuration, you can AMI bake in the 90% part and then bootstrap the final configuration. That way, you reduce the post-launch time and thus the boot time to service time, but you also get to use bootstrapping which gives you much more configurability.
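As a rough sketch of how that combination might look with the CLI (the instance ID, AMI ID, and script file name here are just placeholders):

    # 1) bake: create an AMI from an instance with the time-intensive 90% already installed
    aws ec2 create-image --instance-id i-0123456789abcdef0 --name "app-baked-v1" --no-reboot

    # 2) bootstrap: launch from the baked AMI, passing only the final 10% of configuration as user data
    aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --user-data file://final-config.sh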
And I'll be demonstrating this architecture later in the course when I cover scaling and high availability, but I wanted to introduce these concepts now so you can keep mulling them over and understand them when I mention them. But now, it's time to finish up this lesson — and this has been the theory component of EC2 bootstrapping. In the next lesson, which is a demo, you're going to have the chance to use the EC2 user data feature.
Remember earlier in the course where we built an AMI together, we installed WordPress to the point when it was ready to install, and we massively improved the login banner of the EC2 instance to be something more animal related with Cowsay? In the next demo lesson, you're going to implement the same thing, but you're going to be using user data. You'll see how much quicker this process is compared to when you had to manually launch the instance and run each command one by one. It's going to be a good, valuable demo — I can't wait to get started, so go ahead, finish this video, and when you're ready, you can join me for some practical time.
-
Welcome back. In this lesson, I want to talk about a really important feature of EC2 called Instance Metadata. It's a very simple architecture, but it's one that's used in many of EC2's more powerful features, so it's essential that you understand its architecture fully. It features in nearly all of the AWS exams, and you will use it often if you design and implement AWS solutions in the real world, so let's jump in and get started.
The EC2 Instance Metadata is a service that EC2 provides to instances. It's data about the instance that can be used to configure or manage a running instance. It's a way the instance or anything running inside the instance can access information about the environment that it wouldn't be able to access otherwise, and it's accessible inside all instances using the same access method. The IP address to access the instance metadata is 169.254.169.254. Remember that IP, it comes up all the time in exams, so make sure it sticks. I'll repeat it as often as I can throughout the course, but it's unusual enough that it tends to stick pretty well.
Now, the way that I've remembered the IP address from when I started with AWS is just to keep repeating it. Repetition always helps, and I remember this one as a little bit of a rhyme: 169.254 repeated. And if you just keep repeating that over and over again, then the IP address will stick. So 169.254 repeated equals 169.254.169.254. And then for the next part of the URL, I always want the latest meta-data. If you remember 169.254 repeated and you always want the latest meta-data, it will tend to stick in your mind, at least it did for me.
Now, I've seen horrible exam questions which make you actually select the exact URL for this metadata, so this is one of those annoying facts that I just need you to memorize. I promise you it will help you with exam questions in the exam, so try to memorize the IP and latest meta-data. If you remember both of those, keep repeating them over and over again until it gets annoying, and write them on flashcards. It will help you in the exam.
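And just to reinforce it, querying the metadata from inside an instance looks roughly like this, assuming the curl utility is installed; you'll run something very similar in the demo part of this lesson:

    # list the top-level instance metadata categories
    curl http://169.254.169.254/latest/meta-data/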
Now, the metadata allows anything running on the instance to query it for information about that instance, and that information is divided into categories, for example host name, events, security groups and much more: all information about the environment that the instance is in. The most common things which can be queried though are information on the networking, and I'll show you this in the demo part of this lesson. While the operating system of an instance can't see any of its public IPv4 addressing, the instance metadata can be used by applications running on that instance to get access to that information, and I'll show you that soon.
You can also gain access to authentication information. We haven't covered EC2 instance roles yet, but instances can be themselves given permissions to access AWS resources, and the meta-data is how applications on the instance can gain access to temporary credentials generated by assuming the role. The meta-data service is also used by AWS to pass in temporary SSH keys. So when you connect to an instance using EC2 instance connect, it's actually passing in an SSH key behind the scenes that's used to connect. The meta-data service is also used to grant access to user data, and this is a way that you can make the instance run scripts to perform automatic configuration steps when you launch an instance.
Now one really important fact for the exam, and I've seen questions come up on this one time and time again, the meta-data service has no authentication, and it's not encrypted. Anyone who can connect to an instance and gain access to the Linux command line shell can by default access the meta-data. You can restrict it with local firewall rules, so blocking access to the 169.254 repeated IP address, but that's extra per instance admin overhead. In general, you should treat the meta-data as something that can and does get exposed.
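As an illustration of that per-instance admin overhead, a local firewall rule on a Linux instance might look something like this; treat this as a sketch rather than a recommended configuration:

    # run as root: drop traffic to the metadata IP from any local user other than root
    iptables -A OUTPUT -m owner ! --uid-owner root -d 169.254.169.254 -j DROP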
Okay, well that's the architecture, it's nice and simple, but this is one of the things inside AWS which is much easier to show you than to tell you about. So it's time for a demo, and we're going to perform a demo together which uses the instance meta-data of an EC2 instance. So let's switch over to the console and get started.
Now, if you do want to follow along with this in your own environment, then you'll need to apply some infrastructure. Before you do that, just make sure that you're logged in to the general AWS account, so the management account of the organization, and make sure as always that you have the Northern Virginia region selected. Now this lesson has a one-click deployment link attached to it, so go ahead and click that link. This will take you to the quick create stack screen. You should see that the stack name is called meta-data, just scroll down to the bottom, check this box and click on create stack. Now, this will automatically create all of the infrastructure which we'll be using, so you'll need to wait for this stack to move into a create complete state.
We're also going to be using some commands within this demo lesson, and also attached to this lesson is a lesson commands document which includes all of the commands that you'll be using. So this will help you avoid errors. You can either type these out manually or copy and paste them as I do them in the demo. So at this point go ahead and open that link as well. It should look something like this. There's not that many commands that we'll be using, but they are relatively long, and so by using this document we can avoid any typos.
Now just refresh this stack. Again, it will need to be in a create complete state, so go ahead and pause this video, wait for the stack to move into create complete, and then we're good to continue. Okay, so now the stack's moved into a create complete state, and if you just go ahead and click on resources, you can see that it's created a selection of resources. Now the one that we're concerned with is public EC2, which is an EC2 instance running in a public subnet with public IP addressing. So we're going to go ahead and interact with this instance. So click on services and then go ahead and move to the EC2 console. You can either select it in all services, recently visited if you've used this service before, or you can type EC2 into the search box and then open it in a new tab. Once you're at the EC2 console go ahead and click on instances running, and you should see this single EC2 instance. Go ahead and select it, and I just want to draw your attention to a number of key pieces of information which I want you to note down.
So first you'll be able to see that the instance has a private IP version 4 address. Yours may well be different if you're doing this within your own environment. You'll also see that the instance has a public IP version 4 address, and again if you're doing this in your environment, yours will be different. Now, if you click on networking, you'll be able to see additional networking information including the IP version 6 address that's allocated to this instance. Now the IP version 6 address is always public, and so there's no concept of public and private IP version 6 addresses, but you'll be able to see that address under the networking tab.
Now just to make this easier, just go ahead and note down the IP version 6 address as well as the public IP version 4 DNS which is listed, as well as the public IP version 4 address which is listed at the top, and then the private IP version 4 address. And once you've got all these noted down, we're going to go ahead and connect to this instance. So right-click, select connect, we're going to use EC2 Instance Connect, so make sure that the username is ec2-user and then connect to this instance. Now once we're connected, straight away we'll be able to see how even the prompt of the instance makes visible the private IP version 4 address of this EC2 instance, and if we run the Linux ifconfig command and press enter, we'll get an output of the network interfaces within this EC2 instance.
Now we'll be able to see the private IP version 4 address listed within the configuration of this network interface inside the EC2 instance, and if you're performing this in your own environment, notice how it's exactly the same as the private IP version 4 address that you just noted down which was visible inside the console UI. So in my case you'll be able to see these two IP addresses match perfectly. So this IP address that's visible in the console UI is the same as this private IP address configured on the network interface inside the instance. The same is true of the IP version 6 IP address. This is also visible inside the operating system on the network configuration for this network interface and again that's the same IP version 6 address which is visible on the networking tab inside the console UI. So that's the same as this address. What isn't visible inside the instance operating system on the networking configuration is the public IP version 4 address.
It's critical to know that at no point ever during the life cycle of an EC2 instance is a public IP version 4 address configured within the operating system. The operating system has no exposure to the public IP version 4 address. That is performed by the internet gateway. The internet gateway translates the private address into a public address. So while IP version 6 is configured inside the operating system, IP version 4 public addresses are not. The only IP version 4 addresses that an instance has are the private IP addresses and that's critical to understand.
Now as I talked about in the theory component of this lesson, the EC2 metadata service is a service which runs behind all of the EC2 instances within your account and it's accessible using the metadata IP address. Now we can access this by using the curl utility. Now curl is installed on the EC2 instance that we're using for this demo. Now we're going to query the metadata service for one particular attribute and that attribute is the public IP version 4 address of this instance. So because the instance operating system has no knowledge of the public IP address, we can use the metadata service to provide any scripts or applications running on this instance with visibility of this public IP version 4 address, and we do that using this command.
So this uses curl to query the metadata service, which is 169.254.169.254. I refer to this as 169.254 repeating. So it queries this IP address and then /latest/meta-data, and this is the metadata URL: this entire part, the IP address, then latest, then meta-data. Then at the end we specify the attribute which we want to retrieve, which is public-ipv4, and if we press enter then curl is going to contact the metadata service and retrieve the public IP version 4 address of this EC2 instance. So in my case this is the IP address, and if I go back to the console this matches the address that's visible within the console UI.
So if I just clear the screen to make it easier to see, we can also use the same command structure again, but this time query for the public hostname of this EC2 instance. We use the same URL, so IP address and path, but this time we query for public-hostname, and this will give us the IPv4 public DNS of this EC2 instance. So again I'm going to clear the screen to make it easier to see.
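For reference, the two curl queries used so far look roughly like this; they're also in the lesson commands document:

    # public IPv4 address of this instance
    curl http://169.254.169.254/latest/meta-data/public-ipv4

    # public IPv4 DNS name of this instance
    curl http://169.254.169.254/latest/meta-data/public-hostname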
Now we can make this process even easier. We can use the AWS instance metadata query tool, and to download it we use this command, so enter it and press enter. This just downloads the tool directly, so if we do a listing to list the current folder, we can see the ec2-metadata tool. Because this is Linux, we need to make this tool executable. We do that with the chmod command, so enter that and press enter, and then we can run the ec2-metadata tool, and we can use --help to display help for this product. So this shows all the different information that we can use this tool to query for, and this just makes it easier to query the metadata service, especially if the query is being performed by users working interactively on that EC2 instance.
So for example, we could run ec2-metadata -a to show the AMI ID that's used to launch this instance, and in this case it's the AMI for Amazon Linux 2 inside the us-east-1 region, at least at the time of creating this demo video. If we need to show the availability zone that this instance is in, we could use ec2-metadata -z; in this case the instance is in us-east-1a. And we can even use ec2-metadata -s to show any security groups which were launched with this instance. Now you can carry on exploring this tool if you want, there are plenty of other pieces of information which are accessible using the metadata tool. I just wanted to give you a brief introduction, show you how to download it, how to make it executable and how to run some of the basic options.
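For reference, the tool-related commands look roughly like this; the download URL is the one used at the time of writing, so check the lesson commands document attached to this lesson if it has moved:

    # download the instance metadata query tool, make it executable, and view its help
    wget http://s3.amazonaws.com/ec2metadata/ec2-metadata
    chmod u+x ec2-metadata
    ./ec2-metadata --help

    # examples: AMI ID, availability zone, and security groups
    ./ec2-metadata -a
    ./ec2-metadata -z
    ./ec2-metadata -s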
Now at this point that's everything I wanted to cover in this brief demo component of this lesson. I wanted to give you some exposure to how you can interact with the metadata service, which I covered from a theory perspective earlier in this lesson. Now at this point we need to clear up all of the infrastructure that we've used for this demo component, so close down this tab, go back to the AWS console, move to CloudFormation, select the meta-data stack, select delete and then confirm it, and that will clear up all of the infrastructure that we've used and return the account to the same state as it was at the start of this demo component of this video.
Now at that point that's everything I wanted to cover. You've learned about the theory of the metadata service as well as experienced how to interact with it from a practical perspective. So go ahead and complete this video and when you're ready I'll look forward to you joining me.
-
Welcome back. In this lesson, I want to cover a little bit more theory, something which you'll need to understand from now on in the course because the topics that we'll discuss and the examples that we'll use to fully understand those topics will become ever more complex. Horizontal and vertical scaling are two different ways that a system can scale to handle increasing or, in some cases, decreasing load placed on that system, so let's quickly step through the difference and look at some of the pros, cons, and requirements of each. Scaling is what happens when systems need to grow or shrink in response to increases or decreases of load placed upon them by your customers. From a technical perspective, you're adding or removing resources to a system. A system can in some cases be a single compute device such as an EC2 instance, but in other cases could be hundreds, thousands, tens of thousands, hundreds of thousands, or even more individual devices.
Vertical scaling is one way of achieving this increase in capacity, so this increase of resource allocation. The way it works is simple. Let's say, for example, we have an application and it's running on an EC2 instance, and let's say that it's a T3.large, which provides two virtual CPUs and eight GiB of memory. The instance will service a certain level of incoming load from our customers, but at some point, assuming the load keeps increasing, the size of this instance will be unable to cope, and the experience for our customers will begin to decrease. Customers might experience delays, unreliability, or even outright system crashes. So the commonly understood solution is to use a bigger server. In the virtual world of EC2, this means resizing the EC2 instance. We have lots of sizes to choose from. We might pick a T3.xlarge, which doubles the virtual CPU and memory, or if the rate of increase is significant, we could go even further and pick another size up, a T3.2xlarge, which doubles that again to eight virtual CPUs and 32 GiB of memory.
Let's talk about a few of the finer points though of vertical scaling. When you're actually performing vertical scaling with EC2, you're resizing an EC2 instance, and because of that, there's downtime, often a restart during the resize process, which can potentially cause customer disruption. But it goes beyond this: because of this disruption, it means that generally you can only scale during pre-agreed times, so within outage windows. If incoming load on a system changes rapidly, then this restriction of only being able to scale during outage windows limits how quickly you can react, how quickly you can respond to these changes by scaling up or down. Now as load increases on a system, you can scale up, but larger instances often carry a price premium. So the increasing cost of going larger and larger is often not linear towards the top end. And because you're scaling individual instances, there's always going to be an upper cap on performance. And this cap is the maximum instance size. While AWS are always improving EC2, there will always be a maximum possible instance size, and so with vertical scaling, this will always be the cap on the scaling of an individual compute resource.
Now there are benefits of vertical scaling. It's really simple and it doesn't need any application modification. If an application can run on an instance, then it can run on a bigger instance. Vertical scaling works for all applications, even monolithic ones, where the whole code base is one single application because it all runs on one instance and that one instance can increase in size. Horizontal scaling is designed to address some of the issues with vertical scaling, so let's have a look at that next. Horizontal scaling is still designed to cope with changes to incoming load on a system, but instead of increasing the size of an individual instance, horizontal scaling just adds more instances. The original one instance turns into two; as load increases, maybe two more are added. Eventually, maybe eight instances are required. As the load increases on a system, horizontal scaling just adds additional capacity.
The key thing to understand with horizontal scaling is that with this architecture, instead of one running copy of your application, you might have two or 10 or hundreds of copies, each of them running on smaller compute instances. This means they all need to work together, all need to take their share of incoming load placed on the system by customers, and this generally means some form of load balancer. A load balancer is an appliance which sits between your servers, in this case instances, and your customers. When customers attempt to access the system, all that incoming load is distributed across all of the instances running your application. Each instance gets a fair amount of the load and for a given customer, every mouse click, every interaction with the application, could be on the same instance or randomized across any of the available instances.
Horizontal scaling is great, but there are a few really important things that you need to be aware of as a solutions architect. When you think about horizontal scaling, sessions are everything. When you log into an application, think about YouTube, about Netflix, about your email. The state of your interaction with that application is called a session. You're using this training site right now and if I deleted your session right at this moment, then the next time you interacted with the site, you would be logged out. You might lose the position of the video that you're currently watching. On amazon.com or your home grocery shopping site, the session stores what items are in your cart. With a single application running on a single server, the sessions of all customers are generally stored on that server. With horizontal scaling, this won't work. If you're shopping on your home grocery site and you add some cat cookies to your cart, this might be using instance one. When you add your weekly selection of donuts, you might be using instance 10. Without changes, every time you moved between instances for a horizontally scaled application, you would have a different session or no session. You would be logged out, the application, put simply, would be unusable. With horizontal scaling, you can be shifting between instances constantly. That's one of the benefits. It evens out the load. And so horizontal scaling needs either application support or what's known as off-host sessions.
If you use off-host sessions, then your session data is stored in another place, an external database. And this means that the servers are what's called stateless. They're just dumb instances of your application. The application doesn't care which instance you connect to because your session is externally hosted somewhere else. That's really the key consideration with horizontal scaling. It requires thought and design so that your application supports it. But if it does support it, then you get all of the benefits. The first one of those benefits is that you have no disruption while you're scaling because all you're doing is just adding instances. The existing ones aren't being impacted. So customer connections remain unaffected. Even if you're scaling in, so removing instances because sessions should be off-host, so externally hosted, connections can be moved between instances, leaving customers unaffected. So that's a really powerful feature of having externally hosted sessions together with horizontal scaling. It means all of the individual instances are just dumb instances. It doesn't matter to which instance a particular customer connects to at a particular time because the sessions are hosted externally. They'll always have access to their particular state in the application.
And there's no real limits to horizontal scaling because you're using lots of smaller, more common instances. You can just keep adding them. There isn't the single instance size cap which vertical scaling suffers from. Horizontal scaling is also often less expensive. You're using smaller commodity instances, not the larger ones which carry a premium. So it can be significantly cheaper to operate a platform using horizontal scaling. And finally, it can allow you to be more granular in how you scale. With vertical scaling, if you have a large instance and go to an extra-large, which is one step above it, you're pretty much doubling the amount of resources allocated to that system. With horizontal scaling, if you're currently using five small instances and you add one more, then you're scaling by around 20%. The smaller instances that you use, the better granularity that you have with horizontal scaling.
Now, there's a lot more to this. Later in the course, in the high availability and scaling section, I'll introduce elasticity and how we can use horizontal scaling as a component of highly available and fault-tolerant designs. But for now, I'll leave you with a visual exam power-up. Visuals often make things easier to understand, and they help especially with memory recall. So when it comes to remembering the different types of scaling methods, picture this, two types of scaling. First, horizontal scaling, and this adds and removes things. So if we're scaling Bob, one of our regular course guest stars, then scaling Bob in a horizontal way would mean moving to two Bob, which is scary enough. But if the load required it, we might even have to move to four Bob. And if we needed huge amounts of capacity, if four Bob wasn't enough, if you needed more and more and more, and even more Bob, then horizontal scaling has you covered. There isn't really a limit. We can scale Bob infinitely. In this case, we can have so many Bobs. We can scale Bob up to a near-infinite level as long as we're using horizontal scaling. Scaling Bob in a vertical way, that starts off with a small Bob, then moves to a medium Bob, and if we really need more Bob, then we can scale to a large Bob. In the exam, when you're struggling to remember the difference between horizontal scaling and vertical scaling, picture this image. I guarantee with this, you will not forget it. But at this point, that's all of the theory that I wanted to cover. Go ahead, complete the video, and when you're ready, you can join me in the next.
-
Welcome back. This lesson will be a pretty brief one as we're going to be covering instance status checks and EC2 auto recovery. They're both pretty simple features, but let's quickly step through exactly what both of them do and what capabilities they offer because they're great things to understand. Every instance within EC2 has two high-level per instance status checks. When you initially launch an instance, you might see these listed as initializing, and then you might see only one of two passing, but eventually, all instances should move into the two out of two passed checks, which indicate that all is well within the instance. If not, you have a problem. Each of the two checks represents a separate set of tests, and so a failure of either of them suggests a different set of underlying problems.
The first status check is the system status, and the second is the instance status. A failure of the system status check could indicate one of a few major problems, such as the loss of system power, loss of network connectivity, or software or hardware issues with the EC2 host. So, this check is focused on issues impacting the EC2 service or the EC2 host. The second check focuses on instances, so a failure of this one could indicate things like a corrupt file system or incorrect networking on the instance itself. For example, maybe you've statically set a public IPv4 address on the internal interface of the operating system, which you now know won't work ever, or maybe the instance is having operating system kernel issues preventing it from correctly starting up.
Assuming that you haven't just launched an instance, anything but two out of two checks represents an issue that needs to be resolved. One way to handle it is manually, such as manually stopping and starting an instance, restarting it, or terminating and recreating an instance. These are all manual activities that you could perform, but EC2 comes with a feature allowing you to recover automatically from any system status check issues. You can ask EC2 to stop a failed instance, reboot it, terminate it, or you can ask EC2 to perform auto recovery. Auto recovery moves the instance to a new host, starts it up with exactly the same configuration as before, so all IP addressing is maintained, and if software on the instance is set to auto start, this process could mean that the instance, as the name suggests, automatically recovers fully from any failed status check issues.
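If you prefer the command line, you can see both of these checks for an instance with something like this (the instance ID is a placeholder):

    # shows SystemStatus (the EC2 host/service check) and InstanceStatus (the instance check)
    aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0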
Now, it's far easier to show you exactly how this works, so let me quickly switch over to my console. I'm currently logged in to the general AWS account, the management account of the organization. I'm using the IAM admin user, and I've currently got the Northern Virginia region selected. Now, the demo part of this lesson is going to be really brief, and so it's probably not worth you following along in your own environment. If you do want to, though, there is a one-click deployment attached to this lesson, so if you're following along, click that link, scroll to the bottom, check the acknowledgement box, and click create stack. It'll need to be in a create complete state before you continue, so if you are following along, pause the video, wait for your stack to move into create complete, and then you're good to continue.
So now we're in a create complete state, let's go ahead and move to the EC2 console. This one-click deployment link has created a single instance, so that should already be in place. Go ahead and select it and then click on the status checks tab. So, if we select the status checks tab for this particular instance, you'll see the system status checks and the instance status checks, and for both of these, my EC2 instance has passed, so the system reachability check has passed and the instance reachability check has passed, so that's good.
Now, if we wanted to create a process that will be capable of auto recovery on this instance, to do that, we'd click on the actions dropdown and then create status check alarm. Go ahead and do that, and that will open this dialogue. So, this is an alarm inside CloudWatch that will alarm if this instance fails any of these status checks. What we can do as a default is to have it send a notification to an SNS topic, and the conditions mean this notification will be sent whenever any status checks fail, so either of the two, for at least one consecutive period of five minutes. So, it will need to fail either of these status checks for one period in five minutes, and then this alarm will be triggered.
Now, we can also select this box, which means that action will be taken in addition to a notification being sent, and we can have it reboot the instance, which is just the equivalent of an operating system reboot. We can have it stop the instance, which is useful if we want to perform any diagnostics. We can have it terminate the instance. Now, this is useful if you have any sort of high availability configured, which I'll be demonstrating later in the course because what this means is if you terminate an instance, you can configure EC2 to automatically reprovision a brand new instance in its place. If we do this in isolation on a single instance, it will simply terminate the instance, and it won't be replaced, but what I want to focus on is this option, which is recover this instance. This uses the auto recovery feature of EC2. This feature will attempt to recover this instance; it will take a set of actions. It could be a simple restart, or it could be to migrate the instance to a whole new EC2 host, but importantly, it would need to be in the same availability zone. Remember, EC2 is an AZ-based service, and so logically this won't protect you against an entire AZ failure. It will only take action for an isolated failure, either the host or the instance.
Now, this feature does rely on having spare EC2 host capacity, so in the case of a major failure in multiple availability zones in a region, there is a potential that this won't work if there is not spare capacity. You also need to be using modern types of instances, such as A1, C4, C5, M4, M5, R3, R4, and R5. I'll make sure I include a link in the lesson text that gives a full overview of all of the supported types, and also, this feature won't work if you're using instance store volumes, so it'll only work on instances that solely have EBS volumes attached.
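Behind the scenes, the status check alarm you create in the console is just a CloudWatch alarm, so creating an equivalent recovery alarm with the CLI looks roughly like this; the instance ID is a placeholder and the region is assumed to be us-east-1:

    # alarm on one 5-minute period of a failed system status check and trigger EC2 auto recovery
    aws cloudwatch put-metric-alarm \
        --alarm-name ec2-auto-recover \
        --namespace AWS/EC2 \
        --metric-name StatusCheckFailed_System \
        --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
        --statistic Maximum \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --alarm-actions arn:aws:automate:us-east-1:ec2:recover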
It's a simple way that EC2 adds some automation, which can attempt to recover an instance and avoid waking up a sysadmin. It's not designed to automatically recover against large scale or complex system issues, so do keep that in mind. It's a very simple feature, which answers a very narrow set of error-based scenarios. It's not something that's going to fix every problem in EC2. There are other ways of doing that, which we'll talk about later in the course.
Now, we are at the end of everything that I wanted to cover in this demo lesson, but we're not going to clear up the infrastructure that we used because I'll be using it in the following demo lesson to demonstrate EC2 termination protection. So, if you are following along with these in your own environment, then don't delete the one-click deployment that you used at the start of this lesson because we'll be using it in the following demo lesson. So that's it though, that's everything that I wanted to cover in this lesson. I did promise it would be brief. If you go ahead and complete the video, and when you're ready, I'll see you in the next.
-
Welcome back. In the previous lesson, I talked about EC2 launch types, and one of those types was reserved. Historically, there was only one type of reserved purchase option, but over time, AWS has added newer, more flexible options, and so reserved has become known as standard reserved. As I talked about in the previous lesson, these are great for known long-term consistent usage. If you need access to the cheapest EC2 running 24/7/365 every day for one or three years, then you would pick standard reserved, but you do have a number of other, more flexible options.
Scheduled reserved instances are pretty situational, but when faced with those situations, they offer some great advantages. They're great for when you have long-term requirements, but that requirement isn't something which needs to run constantly. Let's say you have some batch processing that needs to run daily for five hours. You know this usage is required every day, so it is long-term, and it's known usage, so you're comfortable with the idea of locking in this commitment, but a standard reserved instance won't work because it's not running 24/7/365. A scheduled reserved instance is a commitment; you specify the frequency, the duration, and the time. In this case, 2300 hours daily for five hours. You reserve the capacity and get that capacity for a slightly cheaper rate versus on-demand, but you can only use that capacity during that time window.
Other types of situations might be weekly data, so sales analysis which runs every Friday for a 24-hour period, or you might have a larger analysis process which needs 100 hours of EC2 capacity per month, and so you purchase a scheduled reservation to cover that requirement. You reserve capacity and get it slightly cheaper. There are some restrictions; it doesn't support all instance types or regions, and you need to purchase a minimum of 1200 hours per year. While you are reserving partial time blocks, the commitment that you make to AWS is at least one year, so the minimum period that you can buy for scheduled reserved is one year. This is ideal for long-term known consistent usage, but where you don't need it for the full time period. If you only need access to EC2 for specific hours in a day, specific days of the week, or blocks of time every month, then you should look at scheduled reserved instances.
Now let's move on because I want to discuss capacity reservations. I mentioned earlier in the course that certain events such as major failures can result in a situation where there isn't enough capacity in a region or an availability zone. In that situation, there's an order to things, a priority order which AWS uses to deliver EC2 capacity. First, AWS delivers on any commitments in terms of reserved purchases, and then once they've been delivered, they can satisfy any on-demand request, so this is priority number two, and then after both of those have been delivered, any leftover capacity can be used via the spot purchase option.
So capacity reservations can be useful when you have a requirement for some compute which can't tolerate interruption. If it's a business-critical process which you need to guarantee that you can launch at a time that you need that compute requirement, then you definitely need to reserve the capacity. Now, capacity reservation is different from reserved instance purchase, so there are two different components. There's a billing component and a capacity component, and both of these can be used in combination or individually. There are situations where you might need to reserve some capacity, but you can't justify a long-term commitment to AWS in the form of a reserved instance purchase.
So let's step through a couple of the different options that we have available. To illustrate this, we'll start with an AWS region and two availability zones, AZA and AZB. Now, when it comes to instance reservation and capacity, we have a few options. First, we could purchase a reservation but make it a regional one, and this means that we can launch instances into either AZ in that region, and they would benefit from the reservation in terms of billing. So by purchasing a regional reservation, you get billing discounts for any valid instances launched into any availability zone in that region. So while regional reservations are flexible, they don't reserve capacity in any specific availability zone, and so when you're launching instances, even if you have a regional reservation, you're launching them with the same priority as on-demand instances.
Now, with reservations, you can be more specific and pick a zonal reservation. Zonal reservations give you the same billing discount that is delivered using regional reservations, but they also reserve capacity in a specific availability zone. But they only apply to that one specific availability zone, meaning if you launch instances into another availability zone in that region, you get neither benefit. You pay the full price and don't get capacity reservations. Whether you pick regional or zonal reservations, you still need to commit for either a one or three-year term to AWS, and there are just some situations where you're not able to do that. If the usage isn't known, or if you're not sure about how long the usage will be required, then often you can't commit to AWS for a one or three-year reserved term purchase, but you still need to reserve the capacity.
So there is another option. You can choose to use on-demand capacity reservations. With an on-demand capacity reservation, you're booking capacity in a specific availability zone, and you always pay for that capacity regardless of whether you consume it. So in the example that's on screen now, we're booking capacity within AZB for two EC2 instances, and we're billed for that capacity whether we consume it or not. So right now, with the example as on screen, we're billed for two EC2 instances. What we can do is launch an instance into that capacity, and now we're still billed for two, but we're using one. So if we don't consume all of that capacity, in this case if we only have the one EC2 instance, we're wasting the billing for an entire EC2 instance. So with capacity reservations, you still need to do a planning exercise and plan exactly what capacity you require, because if you do book the capacity and don't use it, you will still incur the charge.
Now, capacity reservations don't have the same one or three-year commitment requirements that you need for reserved instances. You're not getting any billing benefit when using capacity reservations. You're just, as the name suggests, reserving the capacity. So at any point, you can book a capacity reservation if you know you need some EC2 capacity without worrying about the one or three-year term commitments, but you don't benefit from any cost reduction. So if you're using capacity reservations for something that's consistent, you should look at a certain point to evaluate whether reserved instances are going to be more economical.
Now one last thing that I want to talk about before I finish this lesson is a feature called a savings plan. And you can think of a savings plan as kind of like a reserved instance purchase, but instead of focusing on a particular type of instance in an availability zone or a region, you're making a one or three-year commitment to AWS in terms of hourly spend. So you might make a commitment to AWS that you're going to spend $20 per hour for one or three years, and in exchange for doing that, you get a reduction on the amount that you're paying for resources.
Now savings plans come in two main types. You can make a reservation for general compute dollar amounts, and if you elect to create a general compute savings plan, then you can save up to 66% versus the normal on-demand price of various different compute services. Or you can choose an EC2 savings plan which has to be used for EC2, but offers better savings up to 72% versus on-demand. Now, a general compute savings plan is valid for various different compute services, currently EC2, Fargate, and Lambda. Now the way that this works is products have their normal on-demand rate, so this is true for EC2, Fargate, and Lambda, but those products also have a savings plan rate. And the way that this works is when you're spending money on an hourly basis, if you have a savings plan, you get the savings plan rate up to the amount that you commit. So if you've made a commitment of $20 per hour and you consume EC2, Fargate, and Lambda, you'll get access to all three of those services at the savings plan rate until you've consumed that $20 per hour commitment. And after that, you start using the normal on-demand rate.
So a savings plan is an agreement between you and AWS where you commit to a minimum spend, and in return, AWS gives you cheaper access to any of the applicable resources. If you go above your savings plan, then you begin to consume the normal on-demand rate. And over time, generally, you'd continually evaluate your resource usage within the account and adjust your savings plan usage as appropriate. Now, out of all the compute services available in AWS, if you only consume EC2, then you will get better savings by looking at an EC2 savings plan. But if you're the type of organization that's evaluating how you can use emerging architectures such as containerization or serverless, then you can pick a general savings plan, commit to a certain hourly spend, and then utilize that over the full range of supported AWS compute services.
And as a real-world hint and for the exam, this could allow an organization that's migrating away from EC2-based compute towards these emerging architectures to get cost-effective access to resources. So they'd use a general compute savings plan providing access to EC2, Fargate, and Lambda, and over time migrate away from EC2 towards Fargate and then over the long term, potentially from Fargate through to Lambda and fully serverless architectures. Now, for the exam, you only need to be aware that savings plans exist and exactly how they work, but in the real world, you should definitely do a bit of extra reading around savings plans because they're a really powerful feature that can help you achieve significant cost savings. With that being said, though, that's everything I wanted to cover in this theory lesson. So go ahead, complete the lesson, and then when you're ready, I'll look forward to you joining me in the next.
-
Welcome back. This is part two of this lesson, and we're going to continue immediately from the end of part one, so let's get started.
Okay, so next let's look at reserved instances, which are really important and form a part of most larger deployments within AWS. Where on demand is generally used for unknown or short-term usage which can't tolerate interruption, reserved is for long-term consistent usage of EC2.
Now, reservations are simple enough to understand. They're a commitment made to AWS for long-term consumption of EC2 resources. So, this is a normal instance, a T3 instance, and if you utilize this instance, you'll be billed the normal per-second rate because you don't have any reservations purchased which apply to this instance. Now, if this instance is something that you know that you need long term, if it's a core part of your system, then one option is to purchase a reservation, and I'll talk about exactly what this means in a second. But the effect of a reservation would be to reduce the per-second cost or remove it entirely depending on what type of reservation you purchase. As long as the reservation matches the instance, it would apply to that instance, either reducing or removing the per-second price for that instance.
Now, you need to be sure that you plan reservations appropriately because it's possible to purchase them and not use them. In this case, if you have an unused reservation, you still pay for that reservation, but the benefit is wasted. Reservations can be purchased for a particular type of instance and locked to an availability zone specifically or to a region. Now, if you lock a reservation to an availability zone, it means that you can only benefit when launching instances into that availability zone, but it also reserves capacity, which I'll talk about in another lesson. If you purchase reservations for a region, it doesn't reserve capacity but it can benefit any instances which are launched into any availability zone in that region.
Now, it's also possible that reservations can have a partial effect. So, in the event that you have a reservation for, say, a T3.large instance and you provision a T3 instance which is larger than this, it would have a partial effect, so you'd get a discount of a partial component of that larger instance. Reservations, at a high level, are where you commit to AWS that you will use resources for a length of time. In return for that commitment, you get those resources cheaper. The key thing to understand is that once you've committed, you pay whether you use those resources or not. And so, use them wisely for parts of your infrastructure which are always there and never change.
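As a rough sketch of that partial effect, the snippet below uses the instance size normalization factors that AWS publishes for size-flexible regional reservations; treat the factor values here as something to verify against the current documentation rather than as definitive.

```python
# Size normalization factors (quoted from memory of the AWS docs; verify before relying on them).
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
    "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
}

def coverage_fraction(reserved_size: str, running_size: str) -> float:
    """Fraction of the running instance's cost that the reservation discounts."""
    return min(1.0, NORMALIZATION[reserved_size] / NORMALIZATION[running_size])

# A t3.large reservation applied to a running t3.xlarge covers half of it;
# the remaining half bills at the normal on-demand rate.
print(coverage_fraction("large", "xlarge"))  # 0.5
```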
Now, there are some choices in the way that you pay for reservations. First, the term: you can commit to AWS either for one year or for three years. The discounts that you receive if you commit for three years are greater than those for one year, but so is the risk, because over three years there's more chance that your requirements will change, and so you have to be really careful when picking between these different term lengths. Now, there are also different payment structures. The first is no upfront, and with this method, you agree to a one or three-year term and simply pay a reduced per-second fee, and you pay this whether the instance is running or not. It's nice and clean; it doesn't impact cash flow, but it also offers the least discount of the three options that I'm showing in this lesson.
You've also got the ability to pay all upfront, and this means the whole cost of the one or three year term in advance when you purchase the reservation. If you do this, there's no per second fee for the instance, and this method offers the greatest discount. So, if you decide to purchase a three year reservation and do so using all upfront, this offers the greatest discount of all the reserved options within AWS. Now, there's also a middle ground, partial upfront, where you pay a smaller lump sum in advance in exchange for a lower per second cost. So, this is a good middle ground. You have lower per second costs than no upfront and less upfront costs than all upfront. So, it's an excellent compromise if you want good cost reductions, but don't want to commit for cash flow reasons to paying for everything in advance.
So that's reserved. You purchase a commitment to use AWS resources, and reserved instances are ideal for components of your infrastructure which have known usage, require consistent access to compute, and require this on a long-term basis. So, you'd use this for any components of your infrastructure that you require the lowest cost, require consistent usage, and can't tolerate any interruption.
Now, there are some other elements to reservations that I want to talk about, such as capacity reservations, conversion, and scheduled reservations, but I'll be doing that in a dedicated lesson. At this point, let's move on.
Next, I want to talk about dedicated hosts. As the name suggests, a dedicated host is an EC2 host, which is allocated to you in its entirety. So, you pay for the host itself, which is designed for a specific family of instances, for example, A, C, R, and so on. Now, these hosts come with all of the resources that you'd expect from a physical machine, a physical EC2 host. So, a number of cores and CPUs, as well as memory, local storage, and network connectivity. Now, the key thing here is that you pay for the host. Any instances which run on that host have no per second charge, and you can launch various different sizes of instances on the host, consuming all the way up to the complete resource capacity of that host.
Now, logically, you need to manage this capacity. If the dedicated hosts run out of capacity, then you can't launch any additional instances onto those hosts. Now, the reason that you would normally use dedicated hosts is that you might have software which is licensed based on sockets or cores in a physical machine. This type of licensing does still exist, and while it seems crazy, it doesn't change the fact that for certain applications, you're licensing based on the amount of resources in a physical machine, not the resources that are allocated to a virtual machine or an instance within AWS. Dedicated hosts also have a feature called host affinity, linking instances to certain EC2 hosts. So, if you stop and start the instance, it remains on the same host, and this too can have licensing implications.
Now, you also gain another benefit that only your instances will ever run on dedicated hosts, but normally in real situations and in the exam, the reason to use this purchase option is for the socket and core licensing requirements.
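If you wanted to do this programmatically, a minimal boto3 sketch might look like the following; the instance type, availability zone, and AMI ID are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allocate one dedicated host for a specific instance type in a specific AZ. You pay for
# the host itself; instances launched onto it have no separate per-second charge.
hosts = ec2.allocate_hosts(
    InstanceType="r5.large",      # hypothetical; fixes the instance size the host accepts
    AvailabilityZone="us-east-1a",
    Quantity=1,
    AutoPlacement="on",           # lets matching launches land on the host automatically
)
host_id = hosts["HostIds"][0]

# Launching onto the host explicitly (the AMI ID is a placeholder).
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="r5.large",
    MinCount=1, MaxCount=1,
    Placement={"Tenancy": "host", "HostId": host_id},
)
```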
Now, one last thing that I want to talk about in this lesson before we move on, and that's dedicated instances, and I want to use this as an opportunity to summarize a couple of aspects of these different purchase options. Visually, this is how EC2 hosts look using the various models. On the left is the default or shared model, which on-demand and reserved instances use. So, EC2 hosts are shared: you will have some instances, other customers of AWS will have some instances, and in addition, there's also likely to be some unused capacity. So, with this model, there's no real exposure to EC2 hosts. You're billed per second (obviously depending on whether you use reservations), and there's no capacity to manage, but you share EC2 hosts with other customers. In 99% of cases, that's okay, but there are some situations when it's not.
In situations when it's not, you can also choose to use dedicated hosts. Now I've just talked about them, you pay for the host, and so only your instances run on these hosts. Any unused capacity is wasted, and you have to manage the capacity, both in terms of the underutilization, but you also need to be aware that because they're physical hosts, they have a physical capacity, and so there's going to be a maximum number of instances that you can launch on these dedicated hosts. So, keep that in mind because you have to monitor both resource underconsumption, as well as the maximum capacity of the EC2 hosts.
Now, the other option which I haven't discussed yet is dedicated instances, and this is a middle ground. With dedicated instances, your instances run on an EC2 host with other instances of yours, and no other customers use the same hardware. Now crucially, you don't pay for the host, and you don't share the host with other customers; the hardware is dedicated to your instances. So, you launch instances, they're allocated to a host, and AWS commits to not running instances from other customers on that same hardware. Now, there are some extra fees that you need to be aware of with this purchase option. First, you pay a one-off hourly fee for any regions where you're using dedicated instances, and this is regardless of how many you're utilizing, and then there's a fee for the dedicated instances themselves.
Now, dedicated instances are common in sectors of the industry where you have really strict requirements, which mean that you can't share infrastructure. So, you can use this method to benefit from the features of EC2, safe in the knowledge that you won't be sharing physical underlying hardware with other AWS customers.
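For comparison, launching a dedicated instance is just a tenancy setting at launch time; here's a hedged boto3 sketch with placeholder IDs.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a dedicated instance: you don't manage a host, but AWS guarantees the
# underlying hardware is not shared with other customers. IDs are placeholders.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5.large",
    MinCount=1, MaxCount=1,
    Placement={"Tenancy": "dedicated"},
)
```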
So, the default or shared model is used for on-demand, for spot, and for reserved instances. Dedicated hosts offer a method where you can pay for the entire host, so you pay a charge for the host, you don't incur any charges for any instances which are launched onto that host, but you have to manage capacity. Now, dedicated hosts are generally used when you have strict licensing requirements, and then a middle ground is dedicated instances where you have requirements not to share hardware, but you don't want to manage the host itself. So, with dedicated instances, you can pay a cost premium and always guarantee that you will not share underlying hardware with any other AWS customers.
So, these are all of the different purchase options that you'll need to be aware of for the exam. For the exam, you should focus on on-demand, reserved, and spot. So, make sure that you've watched the earlier parts of this lesson really carefully, and you understand the different types of use cases where you would use spot, on-demand, and reserved. With that being said, that's all of the theory that I wanted to cover in this lesson. So, go ahead, finish the lesson, and then when you're ready, I'll look forward to you joining me in the next.
-
Welcome back, and in this lesson, I want to cover EC2 purchase options. EC2 purchase options are often referred to as launch types, but the official way to refer to them from AWS is purchase options, and so to be consistent, I think it's worth focusing on that name. So, EC2 purchase options. Let's step through all of the main types with a focus on the situations where you would and wouldn't use each of them. So, let's jump in and get started.
The first purchase option that I want to talk about is the default, which is on demand, and on demand is simple to explain because it's entirely unremarkable in every way. It's the default because it's the average of everything, with no specific pros or cons. Now, the way that it works, let's start with two EC2 hosts. Obviously, AWS has more, but it's easy to diagram with just the two. Now, instances of different sizes, launched by different AWS customers using on demand, all run mixed up on this shared pool of EC2 hosts. So, even though instances are isolated and protected, different AWS customers launch instances which share the same pool of underlying hardware. This means that AWS can efficiently allocate resources, which is why the starting price for on demand in EC2 is so reasonable.
In terms of the price, on demand uses per-second billing, and this happens while instances are running, so you're paying for the resources that you consume. If you shut an instance down, logically you don't pay for those resources. Other associated services such as storage, which do consume resources regardless of whether the instance is running or in a shutdown state, do charge constantly while those resources are being consumed. So, remember this: while instances only charge while in the running state, other associated resources may charge regardless. This is how on demand works, but what types of situations should it be used for? Well, it's the default purchase option, and so you should always start your evaluation process by considering on demand as your default. For all projects, assume on demand and move to something else only if you can justify that alternative purchase option.
With on demand, there are no interruptions. You launch an instance, you pay a per second charge, and barring any failures, the instance runs until you decide otherwise. You don't receive any capacity reservations with on demand. If AWS has a major failure and capacity is limited, the reserved purchase option receives highest provisioning priority on whatever capacity remains, and so if something is critical to your business, then you should consider an alternative rather than using on demand. So, on demand does not give you any priority access to remaining capacity if there are any major failures.
On demand offers predictable pricing, it's defined upfront, you pay a constant price, but there are no specific discounts. This consistent pricing applies to the duration that you use instances. So, on demand is suitable for short term workloads. Anything which you just need to provision, perform a workload and then terminate is ideal for on demand. If you're unsure about the duration or the type of workload, then again, on demand is ideal. And then lastly, if you have short term or unknown workloads, which definitely can't tolerate any interruption, then on demand is the perfect purchase option.
Next, let's talk about spot pricing, and spot is the cheapest way to get access to EC2 capacity. Let's look at how this works visually. Let's start with the same two EC2 hosts. On the left, we have A and on the right B. Then, on these EC2 hosts, we're currently running four EC2 instances, two per host. And let's assume for this example that all of these four instances are using the on demand purchase option. So, right now, with what you see on screen, the hosts are wasting capacity. Enough capacity for four additional instances on each host is being wasted. Spot pricing is AWS selling that spare capacity at a discounted rate.
The way that it works is that within each region for each type of instance, there is a given amount of free capacity on EC2 hosts at any time. AWS tracks this and it publishes a price for how much it costs to use that capacity, and this price is the spot price. Now, you can offer to pay more than the spot price, but this is a maximum. You'll only ever pay the current spot price for the type of instance in the specific region where you provision services. So, let's say that there are two different customers who want to provision four instances each. The first customer sets a maximum price of four gold coins, and the other customer sets a maximum price of two gold coins. Now, obviously, AWS doesn't charge in gold coins, and there are more than two EC2 hosts, but it's just easier to represent it in this way.
Now, because the current spot price set by AWS is only two gold coins, then both customers are only paying two gold coins a second for their instances. Even though customer one has offered to pay more, this is their maximum and they only ever pay the current spot price. So, let's say now that the free capacity is getting a little bit on the low side. AWS are getting nervous, they know that they need to free up capacity for the normal on demand instances, which they know are about to launch, and so they up the spot price to four gold coins. Now, customer one is fine because they've set a maximum price of four coins, and so now they start paying four coins because that's what the current spot price is. Customer two, they've set their maximum price at two coins, and so their instances are terminated.
If the spot price goes above your maximum price, then any spot instances which you have are terminated. That's the critical part to understand because spot instances should not be viewed as reliable. At this point in our example, maybe another customer decides to launch four on demand instances. AWS sell that capacity at the normal on demand rates, which are higher, and no capacity is wasted. Spot pricing offers up to a 90% reduction versus the price of on demand, and there are some significant trade offs that you need to be aware of.
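Here's a minimal boto3 sketch of launching an instance using the spot purchase option with a maximum price; the AMI ID, instance type, and the MaxPrice value are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch an instance using the spot purchase option with a maximum price.
# If the spot price rises above MaxPrice, the instance is interrupted.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t3.large",
    MinCount=1, MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.04",                      # your maximum, in USD per hour
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```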
You should never use the spot purchase option for workloads which can't tolerate interruptions. No matter how well you manage your maximum spot price, there are going to be periods when instances are terminated. If you run workloads where that's a problem, don't use spot. This means that workloads such as domain controllers, mail servers, traditional websites, or even flight control systems are all bad fits for spot instances. The types of scenarios which are good fits for using spot instances are things which are not time critical. Since the spot price changes throughout each day and throughout days of the week, if you're able to process workloads around this, then you can take advantage of the maximum cost benefits for using spot. Anything which can tolerate interruption and just rerun is ideal for spot instances.
So, if you have highly parallel workloads which can be broken into hundreds or thousands of pieces, maybe scientific analysis, and if any parts which fail can be rerun, then spot is ideal. Anything which has a bursty capacity need, maybe media processing, image processing, any cost sensitive workloads which wouldn't be economical to do using normal on-demand instances, assuming they can tolerate interruption, these are ideal for spot. Anything which is stateless where the state of the user session is not stored on the instances themselves, meaning they can handle disruption, again, ideal for using spot. Don't use spot for anything that's long-term, anything that requires consistent, reliable compute, any business critical things, or things which cannot tolerate disruption. For those type of workloads, you should not use spot. It's an anti-pattern.
OK, so this is the end of part one of this lesson. It was getting a little bit on the long side, and I wanted to give you the opportunity to take a small break, maybe stretch your legs or make a coffee. Now, part two will continue immediately from this point, so go ahead, complete this video, and when you're ready, I look forward to you joining me in part two.
-
Welcome back. In this lesson, I want to go into a little bit more depth on Amazon Machine Images (AMIs). Many people in the AWS community would have you think that there's some kind of argument or disagreement over how to pronounce AMI, with some people pronouncing it a slightly different way that I'm not going to repeat here. In my view, there's no argument. Those people are just misguided. AMIs are the images of EC2. They're one way that you can create a template of an instance configuration and then use that template to create many instances from that configuration. AMIs are actually used when you launch EC2 instances. You're launching those instances using AWS-provided AMIs, but you can create your own, and that's what I want to focus on in this lesson. So what that means, what happens when you create your own AMIs and how to do it effectively.
So just a few key points before I talk about the life cycles and the flow around AMIs. AMIs can be used to launch EC2 instances. I've mentioned that a second ago. They're actually used by the console UI. When you launch an EC2 instance, when you select to use Amazon Linux 2, you're actually using an Amazon Linux 2 AMI to launch that instance. Now the AMIs that you usually use to launch instances with, they can be AWS or community-provided. So certain vendors that make their own distribution of Linux, they produce community AMIs that can be used to launch that distribution of Linux inside EC2. So companies such as Red Hat, distributions such as CentOS and Ubuntu, they're available inside AWS. You can use those AMIs to launch EC2 instances with those distributions. You can also launch instances from marketplace-provided AMIs, and these can include commercial software. So you're able to launch an instance and have that instance cost the normal hourly rate plus an extra amount for that commercial software. And you can do that by going to the marketplace, picking a commercial AMI from the marketplace, launching it on a specific instance. And with that architecture, it's generally the instance cost as well as an extra cost for the AMI, which includes the licenses to that commercial software.
Now AMIs are regional. So there are different AMIs for the same thing in different regions. For Amazon Linux 2, there will be an AMI in US East 1. There will be an AMI in US West 1. There will be another AMI in the Sydney region. Each individual region has its own set of AMIs, and each AMI has a unique ID with the format on screen now: "ami-" followed by a random set of numbers and letters. And an AMI can only be used in the region that it's in. So there will be different AMI IDs for the same distribution of an operating system in each individual region. An AMI also controls permissions. So by default, an AMI is set so that only your account can use it. So one of the permissions models is only your account. You can set an AMI to be public so that everybody can access it, or you can add specific AWS accounts onto that AMI. So those are the three options you have for permissions.
Now the flow that you've experienced so far is to take an AMI and use it to create an instance. But you can also do the reverse. You can create an AMI from an existing EC2 instance to capture the current configuration of that instance, creating a template of an existing instance that can be used to make more instances. Now I want you to think about the life cycle of an AMI as having four phases. We've got launch, configure, create image, and launch again. And that second launch is intentional; I'll explain more as we step through this life cycle model. A large number of people who interact with AWS only ever experience the first phase, and this is where you use an AMI to launch an EC2 instance. And we've done that together a few times so far in the course. Now in one of the previous demo lessons in this section of the course, you started to experience a little more of EBS and saw that volumes which were attached to EC2 instances are actually separate logical devices. EBS volumes are attached to EC2 instances using block device IDs. The boot volume is usually /dev/xvda, and as we saw in a previous demo lesson, an extra volume was called /dev/xvdf. So these are device IDs, and device IDs are how EBS volumes are presented to an instance. Now if this is all you ever use AMIs for, that's fine. There are more ways to provision things inside AWS than creating your own AMIs. So it's 100% fine if you don't use custom AMIs much beyond using them to launch things. But if you choose to, you can take the instance that you provisioned during the launch phase, so that's the instance and its attached EBS volumes, and then maybe you can decide to apply some customization to bring your instance into a state where it's perfectly set up for your organization. This might be an OS with certain applications and services installed, or it might be an instance with a certain set of volumes attached of a certain size, or it might be an instance with a full application suite installed and configured to your exact bespoke business requirements ready to use.
Now an instance that's in this heavily customized state, you can take this and actually create your own AMI using that configuration. So a customized configuration of an EC2 instance, architecturally, will just be the instance and any volumes that are attached to that instance, but you can take that configuration and you can use it to create an AMI.

Now this AMI will contain a few things, and it's the exact details of those few things that are important. We've already mentioned that the AMI contains permissions, so who can use the AMI? Is it public? Is it private just to your account, or do you give access to that AMI to other AWS accounts? That's stored inside the AMI. Think of an AMI as a container. It's just a logical container which has associated information. It's an AMI, it's got an AMI ID, and it's got the permissions restricting who can use it. But what really matters is that when you create an AMI, for any EBS volumes which are attached to that EC2 instance, we have EBS snapshots created from those volumes. And remember, EBS snapshots are incremental, but the first one that occurs is a full copy of all the data that's used on that EBS volume. So when you make an AMI, the first thing that happens is the snapshots are taken, and those snapshots are actually referenced inside the AMI using what's known as a block device mapping.

Now the block device mapping is essentially just a table of data. It links the snapshot IDs that were just created when making that AMI, and for each one of those snapshots it has the device ID that the original volume had on the EC2 instance. So visually, with what's on screen now, the snapshot on the right side of the screen matches the boot volume of the instance, so the block device mapping will contain the ID of the right snapshot, and it will contain the block device of the original volume, so /dev/xvda. The left snapshot on screen references the data volume, which is /dev/xvdf. So the block device mapping in this case will have two lines. It will reference each of the snapshots and then the device ID that the original volumes had. Now what this does is it means that when this AMI is used to create a new instance, this instance will have the same EBS volume configuration as the original. When you launch an instance using an AMI, what actually happens is the snapshots are used to create new EBS volumes in the availability zone that you're launching that instance into, and those volumes are attached to that new instance using the same device IDs that are contained in that block device mapping.
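To see those pieces programmatically, here's a hedged boto3 sketch that creates an AMI from an instance and then prints its block device mappings; the instance ID and AMI name are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an AMI from a configured instance (the instance ID is a placeholder).
# Behind the scenes this snapshots the attached EBS volumes and records them
# in the AMI's block device mappings.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="my-baked-webserver-v1",
    Description="Instance configuration captured as a template",
)
image_id = image["ImageId"]
ec2.get_waiter("image_available").wait(ImageIds=[image_id])

# Inspect the block device mappings: each entry links a device ID such as
# /dev/xvda to the snapshot that was created from the original volume.
described = ec2.describe_images(ImageIds=[image_id])
for mapping in described["Images"][0]["BlockDeviceMappings"]:
    print(mapping["DeviceName"], mapping.get("Ebs", {}).get("SnapshotId"))
```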
So AMIs are a regional construct. So you can take an AMI from an instance that's in availability zone A, and that AMI as an object is stored in the region. The snapshots, remember, are stored on S3 so they're already regional, and you can use that AMI to deploy instances back into the same AZ as the source instance or into other availability zones in that region. So just make sure that you understand this architecture. In the following demo lesson, you're going to get a chance to actually do this to create an AMI, but it's a lot easier to understand exactly how this works if you've got a picture of the architecture in your mind. So an AMI itself does not contain any real data volume. An AMI is a container. It references snapshots that are created from the original EBS volumes together with the original device IDs. And so you can take that AMI and use it to provision brand new instances with exactly the same data and exactly the same configuration of volumes.
Now, before we finish this lesson and move on to the demo, I do have some exam power-ups that I want to step through. AMIs do feature on the exam, and the architecture of AMIs is especially important for the Solutions Architect Associate exam. So I'm just going to step through a few really key points. AMIs are in one region, so you create an AMI in a particular region. It can only be used in that region, but it can be used to deploy instances into all of the availability zones in that region. There's a term that you might see in documentation or hear other AWS professionals talk about, which is AMI baking. And AMI baking is the concept of taking an EC2 instance, installing all of the software, doing all the configuration changes, and then baking all of that into an AMI. If you imagine what we did in the last lesson where we installed WordPress manually on an EC2 instance, imagine how easy that would have been if we'd have performed that installation once—so installed and configured all the software and then created an AMI of that configuration that we could then use to deploy tens of instances or hundreds of instances, all pre-configured with WordPress. Well, that is a scenario that you can use AMIs for. So create an AMI with a custom configuration of an EC2 instance for a particular bespoke requirement in your business and then use it to stamp out lots of EC2 instances. And that process is known as AMI baking. You're baking the configuration of that instance into an AMI.
Another important thing to understand is that an AMI cannot be edited. If you want to adjust the configuration of an existing AMI, then you should take the AMI, use it to launch an instance, update the configuration, and then create a brand new AMI. You cannot update an existing AMI. That's critical to understand. Another important thing to understand is that AMIs can be copied between AWS regions. Now, remember, the default permissions on an AMI is that it's accessible only in your account. You can change it by adding additional accounts explicitly. So you can add different accounts in your organization, you could add partner accounts, or you can make the AMI completely public. Those are the three options. It can be private, it can be public, or you can explicitly grant access to individual AWS accounts.
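A minimal boto3 sketch of copying an AMI between regions and sharing it with another account might look like this; the AMI ID and the account ID are placeholder assumptions.

```python
import boto3

# Copy an AMI from us-east-1 into ap-southeast-2 by calling the destination region.
destination = boto3.client("ec2", region_name="ap-southeast-2")
copy = destination.copy_image(
    Name="my-baked-webserver-v1-sydney",
    SourceImageId="ami-0123456789abcdef0",   # placeholder source AMI ID
    SourceRegion="us-east-1",
)
print(copy["ImageId"])

# Share the original AMI with one specific AWS account (account ID is a placeholder).
source = boto3.client("ec2", region_name="us-east-1")
source.modify_image_attribute(
    ImageId="ami-0123456789abcdef0",
    LaunchPermission={"Add": [{"UserId": "111122223333"}]},
)
```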
Now in terms of billing for AMIs, an AMI does contain EBS snapshots, and so you are going to be billed for the capacity used by those snapshots. Remember, though, that snapshots only store the data used in EBS volumes. So even if you do have instances with fairly large volume allocations, if those volumes only use a small percentage of the data, then the snapshots will be much smaller than the size allocated for the EBS volumes. But for an AMI, you do need to be aware that it does have a cost, and those costs are the storage capacity used by the EBS snapshots that that AMI references.
Now at this point, I think that's enough theory. So that's what we're going to finish off this lesson. In the next lesson, which is a demo lesson, you're going to get some practical experience of launching an instance, performing some custom configuration, and then creating your own AMI and using that AMI to create new instances. So I'm hoping that by having this theory lesson, which introduces all of the architecture and important details, it will help you understand the demo lesson where you'll implement this in your own environment. And that demo lesson will help all of this theory and architecture stick because it will be important to remember for the exam. At this point, though, go ahead, complete this video, and when you're ready, you can join me in the next.
-
Welcome back, and in this lesson, I want to cover some networking theory related to EC2 instances, focusing on network interfaces, instance IPs, and instance DNS. EC2 is a feature-rich product, and there's a lot of nuance in the way that you can connect to it and interact with it, so it's important that you understand exactly how interfaces, IPs, and DNS work. Let's get started and take a look.
Architecturally, this is how EC2 looks: We have an EC2 instance, and it always starts off with one network interface, called an ENI, or Elastic Network Interface, and every EC2 instance has at least one, which is the primary interface or primary ENI. Optionally, you can attach one or more secondary elastic network interfaces, which can be in separate subnets, but everything needs to be within the same availability zone. Remember, EC2 is isolated in one availability zone, so this is important. An instance can have different network interfaces in separate subnets, but all of those subnets need to be in the same availability zone, in this case, availability zone A.
These network interfaces have a number of attributes or things attached to them, and this is important because when you're looking at an EC2 instance using the console UI, these are often presented as being attached to the instance itself, so you might see things like IP addresses or DNS names, and they appear to be attached to the instance, but when you're interacting with the instance from a networking perspective, you're often seeing elements of the primary network interface. For example, when you launch an instance with security groups, those security groups are actually on the network interface, not the instance.
Let me expand on this a little bit and highlight some of the things that are actually attached to the network interfaces. First, network interfaces have a MAC address, which is the hardware address of the interface, and it's visible inside the operating system, so it can be used for things like software licensing. Each interface also has a primary IP version 4 private address that's from the range of the subnet that the interface is created in, so when you select a VPC and a subnet for an EC2 instance, you're actually picking the VPC and the subnet for the primary network interface. You can have zero or more secondary private IP addresses also on the interface, and you can have zero or one public IP addresses associated with the interface itself, and you can also have one elastic IP address per private IP version 4 address.
Elastic IP addresses are public IP version 4 addresses, and these are different than normal public IP version 4 addresses where it's one per interface. With elastic IP addresses, you can have one public elastic IP address per private IP address on this interface, and you can have zero or more IP version 6 addresses per interface, and remember, these are by default publicly routable, so with IP version 6, there's no definition of public or private addresses; they're all public addresses. You can also have security groups, and security groups are applied to network interfaces, so a security group that's applied to a particular interface will impact all IP addresses on that interface. That's really important architecturally because if you're ever in a situation where you need different IP addresses for an instance impacted by different security groups, then you need to create multiple interfaces with those IP addresses separated and then apply different security groups to each of those different interfaces.
Security groups are attached to interfaces, and finally, per interface, you can also enable or disable the source and destination check, meaning if traffic is on the interface, it’s going to be discarded if it’s not from one of the IP addresses on the interface as a source or destined to one of the IP addresses on the interface as a destination. If this is enabled, traffic is discarded if it doesn’t match one of those conditions. Recall when I talked about NAT instances; this is the setting that you need to disable for an EC2 instance to work as a NAT instance. So this check needs to be switched off.
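As a sketch of those interface-level settings, here's some hedged boto3 code that creates and attaches a secondary interface with its own security group and then disables the source/destination check; all of the IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Security groups attach to network interfaces, not instances, so a secondary
# interface can carry its own security group (IDs below are placeholders).
eni = ec2.create_network_interface(
    SubnetId="subnet-0123456789abcdef0",
    Groups=["sg-0123456789abcdef0"],
    Description="management interface",
)
ec2.attach_network_interface(
    NetworkInterfaceId=eni["NetworkInterface"]["NetworkInterfaceId"],
    InstanceId="i-0123456789abcdef0",
    DeviceIndex=1,
)

# Disable the source/destination check, which is required for NAT-style instances.
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    SourceDestCheck={"Value": False},
)
```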
Now, depending on the type of EC2 instance that you provision, you can have additional secondary interfaces, and the exact number depends on the instance, but at a high level, the capabilities of the secondary interfaces are the same as the primary, except that you can detach secondary interfaces and move them to other EC2 instances, which is a critical difference that brings additional capabilities. Let's explore some of these ENI attributes and attachments in a little more detail. Let’s assume for this example that this instance receives a primary IP version 4 private IP address of 10.16.0.10, which is static and doesn’t change for the lifetime of the instance.
Now, the instance is also given a DNS name that's associated with this private address. It has a logical format, so it starts off with "IP," followed by a hyphen, then the private IP address separated by hyphens rather than periods, and then ".ec2.internal." This IP is only resolvable inside the VPC and always points at this private IP address, so you can use this private DNS name for internal communications only inside the VPC. Assuming the instance is either manually set to receive a public IP version 4 address or launched with default settings into a subnet configured to automatically allocate IP version 4 public addresses, it will get one, but this is a dynamic IP; it's not fixed. If you stop and start an instance, its public IP address will change, as when you stop an instance, the public IP version 4 address is deallocated, and when you start the instance again, a brand new public IP version 4 address is allocated, and it will be different.
If you just restart the instance (not stop and start), the IP address won’t change because it's only stopping and starting the instance again that will cause the change, but anything that makes the instance change between EC2 hosts will also cause an IP change. For this public IP version 4 address, EC2 instances are also allocated a public DNS name, generally following this format: "EC2" followed by a hyphen, then the IP address with hyphens rather than dots, and then something similar to "compute-1.amazonaws.com." This might differ slightly, but generally, the public DNS follows this format.
What’s special about this public DNS name is that inside the VPC, it will resolve to the primary private IP version 4 address of the instance (the primary network interface). Remember how VPC works? The public IP version 4 address is not directly attached to the instance or any of the interfaces; it's associated with it, and the internet gateway handles that translation. So, in order to allow instances in a VPC to use the same DNS name and ensure they're always using the private addresses inside the VPC, it always resolves to the private address. Outside of the VPC, the DNS resolves to the public IP version 4 address of that instance. This simplifies the discoverability of your instances by allowing you to specify one single DNS name for an instance and have that traffic resolve to an internal address inside AWS and an external IP outside AWS.
Now, elastic IP addresses are something I want to introduce now, and in the next demo lesson, you’ll get to experiment with them. Elastic IP addresses are something that’s allocated to your AWS account. When you allocate an elastic IP, you can associate the elastic IP with a private IP, either on the primary interface or a secondary interface. If you associate it with the primary interface, as soon as you do that, the normal (non-elastic) public IP version 4 address that the instance had is removed, and the elastic IP becomes the instance’s new public IP version 4 address. If you assign an elastic IP to an instance, under most circumstances, the instance will lose its non-elastic public address, and if you remove the elastic IP, it will gain a new public IP version 4 address. That’s a question that comes up in the exam all the time: If an instance has a non-elastic public IP and you assign an elastic IP and then remove it, is there any way to get that original IP back? The answer is no, you can’t.
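Here's a minimal boto3 sketch of allocating and associating an elastic IP; the instance ID is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allocate an elastic IP to the account, then associate it with an instance.
# The instance's existing non-elastic public IPv4 address is released when the
# association happens (the instance ID is a placeholder).
eip = ec2.allocate_address(Domain="vpc")
ec2.associate_address(
    AllocationId=eip["AllocationId"],
    InstanceId="i-0123456789abcdef0",
)

# Removing the association later gives the instance a brand new dynamic public IP;
# the original non-elastic address cannot be recovered.
# ec2.disassociate_address(AssociationId="...")
# ec2.release_address(AllocationId=eip["AllocationId"])
```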
I know this is a lot of theory, but it's really important from a networking perspective, so you need to try and become really clear on what I've talked about. Instances have one or more network interfaces, a primary and optionally secondary. For each network interface, make sure you’re clear on what IP addressing it has: a primary private IP address, secondary private IP addresses, optionally one public IP version 4 address, and optionally one or more elastic IP addresses. Become familiar with what these mean. In the next demo lesson, you’ll get a chance to experiment and understand exactly how they work.
Before we move on, I want to talk about some exam power-ups. This is an important area at AWS, and there are a number of hints and tips that I can give you for the exam. My first tip is about secondary elastic network interfaces and MAC addresses. A lot of legacy software is licensed using a MAC address, and a MAC address is seen as something static that doesn’t change, but because EC2 is a virtualized environment, we can swap and change elastic network interfaces. If you provision a secondary elastic network interface on an instance and use that secondary network interface’s MAC address for licensing, you can detach that secondary interface and attach it to a new instance, moving that licensing between EC2 instances. This is really powerful.
Something else to keep in mind is that multiple interfaces can be used for multi-homed systems. An instance with an ENI in two different subnets might use one for management and one for data, giving you some flexibility. You might use multiple interfaces rather than just multiple IPs because security groups are attached to interfaces. If you need different security groups for different IPs or different rules for different types of access based on IPs your instance has, then you need multiple elastic network interfaces with different security groups on each. When you interact with an instance and apply security groups, if you're doing it at the instance level, you generally interact with the primary elastic network interface, and in many ways, you can almost think of the primary interface as the instance, but they are separate things.
One important point about EC2 IP addressing that I keep stressing for the exam is that the operating system never sees the IP version 4 public address. This is provided by a process called NAT, which is performed by the Internet Gateway. As far as the operating system is concerned, you always configure the private IP version 4 address on the interface. Inside the OS, it has no visibility on the public IP address networking configuration, and you will never be in a situation where you need to configure Windows or Linux with the IP version 4 public address. Now, IP version 6 is different because they're all public, but for the exam, remember that you can never configure a network interface inside an operating system with a public IP version 4 address inside AWS.
The normal IP version 4 public IP address that EC2 instances are provided with is dynamic; if you stop an instance, that IP is deallocated. If you start an instance again, a new public IP version 4 address is allocated. If you start an instance, it’s fine, but if you stop and start, or if there's a forced migration of an instance between hosts, the normal IP version 4 public IP address will change. To avoid this, you need to allocate and assign an elastic IP address.
Finally, the public DNS given to the instance for the public IP version 4 address can resolve to the primary private IP version 4 address from within the VPC, ensuring that instance-to-instance communication using this address inside the VPC never leaves the VPC. Everywhere else, this public DNS resolves to the public IP version 4 address. Remember this for the exam, and remember it later in the course when I'm talking about technologies like VPC peering, because you'll need to know exactly how this works: Inside the VPC, the public DNS resolves to the private IP, and outside the VPC, it resolves to the public IP address.
I know this has been a lot of theory, but don’t worry; as we continue moving through the course, these theoretical concepts will start to click when you start using the technology. We’ve already experienced this a little bit when we started provisioning EC2 instances or using NAT gateways. You've seen how some of the theory is applied by AWS products and services, so don’t worry, it will click as we move through the course. It's my job to make sure the information sticks, but I do need to teach you some raw theory occasionally. This has been one of those lessons. Do your best to remember it, but it will start sticking when we get practical exposure. At this point, that’s everything I wanted to cover, so go ahead, mark this video as complete, and when you're ready, you can join me in the next.
-
Welcome back. In this video, we're going to be covering EBS encryption, something that's really important for the real world and for most AWS exams. Now, EBS volumes, as you know by now, are block storage devices presented over the network, and these volumes are stored in a resilient, highly available way inside an availability zone. But at an infrastructure level, they're stored on one or more physical storage devices. By default, no encryption is applied, so the data is persisted to disk exactly as the operating system writes it. If you write a cat picture to a drive or a mount point inside your instance, the plain text of that cat picture is written to one or more raw disks. Now, this obviously adds risk and a potential physical attack vector for your business operations, and EBS encryption helps to mitigate this risk. EBS encryption provides at-rest encryption for volumes and for snapshots.
So, let's take a look at how it works architecturally. EBS encryption isn't all that complex an architecture when you understand KMS, which we've already covered. Without encryption, the architecture looks at a basic level like this: we have an EC2 host running in a specific availability zone, and running on this host is an EC2 instance using an EBS volume for its boot volume. Without any encryption, the instance generates data, and this is stored on the volume in its plain text form. So, if you're storing any cat or chicken pictures on drives or mount points inside your EC2 instance, then by default, that plain text is stored at rest on the EBS volumes.
Now, when you create an encrypted EBS volume initially, EBS uses KMS and a KMS key, which can either be the EBS default AWS managed key, called aws/ebs (following the aws/service-name format), or it can be a customer-managed KMS key that you create and manage. That key is used by EBS when an encrypted volume is created. Specifically, it's used to generate an encrypted data encryption key, known as a DEK, and this occurs with the GenerateDataKeyWithoutPlaintext API call. So, you just get the encrypted data encryption key, and this is stored with the volume on the raw storage. It can only be decrypted using KMS, assuming that the entity doing so has permissions to decrypt the data encryption key using the corresponding KMS key.
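EBS makes this KMS call on your behalf, but as a sketch of the operation involved, here's the equivalent boto3 call against a hypothetical customer-managed key alias.

```python
import boto3

kms = boto3.client("kms", region_name="us-east-1")

# EBS performs this call for you when an encrypted volume is created; the sketch just
# shows the KMS operation involved, using a hypothetical customer-managed key alias.
dek = kms.generate_data_key_without_plaintext(
    KeyId="alias/my-ebs-cmk",
    KeySpec="AES_256",
)

# Only the encrypted form of the data encryption key is returned; it can be stored
# alongside the volume and later decrypted by KMS for the EC2 host that needs it.
print(len(dek["CiphertextBlob"]))
```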
Remember, initially, a volume is empty. It's just an allocation of space, so there's nothing yet to encrypt. When the volume is first used, either mounted on an EC2 instance by you or when an instance is launched, EBS asks KMS to decrypt the data encryption key that's used just for this one volume. That key is loaded into the memory of the EC2 host which will be using it. The key is only ever held in this decrypted form in memory on the EC2 host which is using the volume currently. So, the key is used by the host to encrypt and decrypt data between an instance and the EBS volume, specifically the raw storage that the EBS volume is stored on. This means the data stored onto the raw storage used by the volume is ciphertext, and it's encrypted at rest. Data only exists in an unencrypted form inside the memory of the EC2 host. What's stored on the raw storage is the ciphertext version, the encrypted version of whatever data is written by the instance operating system.
Now, when the EC2 instance moves from this host to another, the decrypted key is discarded, leaving only the encrypted version with the disk. For that instance to use the volume again, the encrypted data encryption key needs to be decrypted and loaded into another EC2 host. If a snapshot is made of an encrypted volume, the same data encryption key is used for that snapshot, meaning the snapshot is also encrypted. Any volumes created from that snapshot are themselves also encrypted using the same data encryption key, and so they're also encrypted. Now, that's really all there is to the architecture. It doesn't cost anything to use, so it's one of those things which you should really use by default.
Now, I've covered the architecture in a little detail, and now I want to step through some really important summary points which will help you within the exam. The exam tends to ask some pretty curveball questions around encryption, so I'm going to try and give you some hints on how to interpret and answer those. AWS accounts can be configured to encrypt EBS volumes by default. You can set the default KMS key to use for this encryption, or you can choose a KMS key to use manually each and every time. The KMS key isn't used to directly encrypt or decrypt volumes; instead, it's used to generate a per-volume, unique data encryption key.
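As a sketch of those account-level and per-volume options, here's some hedged boto3 code; the key alias is a placeholder for a customer-managed KMS key.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Turn on account-level default EBS encryption for this region and set the default key
# (the key alias is a placeholder for a customer-managed KMS key).
ec2.enable_ebs_encryption_by_default()
ec2.modify_ebs_default_kms_key_id(KmsKeyId="alias/my-ebs-cmk")

# Or encrypt an individual volume explicitly at creation time.
ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=10,                    # GiB
    VolumeType="gp3",
    Encrypted=True,
    KmsKeyId="alias/my-ebs-cmk",
)
```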
Now, if you do make snapshots or create new volumes from those snapshots, then the same data encryption key is used, but for every single time you create a brand new volume from scratch, it uses a unique data encryption key. So, just to restress this because it's really important that the data encryption key is used for that one volume, and any snapshots you take from that volume which are encrypted, and any future volumes created from that snapshot. So, that's really important to understand. I'm going to stress it again. I know you're getting tired of me saying this: Every time you create an EBS volume from scratch, it uses a unique data encryption key. If you create another volume from scratch, it uses a different data encryption key. But if you take a snapshot of an existing encrypted volume, it uses the same data encryption key, and if you create any further EBS volumes from that snapshot, it also uses the same data encryption key.
Now, there's no way to remove the encryption from a volume or a snapshot. Once it's encrypted, it's encrypted. There are ways that you can manually work around this by cloning the actual data from inside an operating system to an unencrypted volume, but this isn't something that's offered from the AWS console, the CLI, or the APIs. Remember, inside an operating system, it just sees plain text, and so this is the only way that you have access to the plain text data and can clone it to another unencrypted volume. And that's another really important point to understand. The OS itself isn't aware of any encryption. To the operating system, it just sees plain text because the encryption is happening between the EC2 host and the volume. It's encrypted using AES-256, so between the EC2 host and the EBS system itself. If you face any situations where you need the operating system to encrypt things, that's something that you'll need to configure on the operating system itself.
If you need to hold the keys, if you need the operating system to hold the keys rather than EC2, EBS, and KMS, then you need to configure volume encryption within the operating system itself. This is commonly called software disk encryption, and this just means that the operating system does the encryption and stores the keys. Now, you can use software disk encryption within the operating system and EBS encryption at the same time. This doesn't really make sense for most use cases, but it can be done. EBS encryption is really efficient though. You don't need to worry about keys. It doesn't cost anything, and there's no performance loss for using it. Now, that is everything I wanted to cover in this video, so thanks for watching. Go ahead and complete the video, and when you're ready, I look forward to you joining me in the next.
-
Welcome back, and in this lesson, I'm going to be discussing EBS snapshots, which provide a few really useful features for a solutions architect. First, they're an efficient way to back up EBS volumes to S3, and by doing this, you protect the data on those volumes against availability zone issues or local storage system failure in that availability zone, and they can also be used to migrate the data that's on EBS volumes between availability zones using S3 as an intermediary. So let's step through the architecture first through this lesson, and then in the next lesson, which will be a demo, we'll get you into the AWS console for some practical experience.
Snapshots are essentially backups of EBS volumes which are stored on S3. EBS volumes are only resilient within a single availability zone, which means that they're vulnerable to any issues which impact that entire availability zone. Because snapshots are stored on S3, the data that snapshots store becomes region resilient, and so we're improving the resiliency level of our EBS volumes by taking a snapshot and storing it on S3. Now, snapshots are incremental in nature, and that means a few very important things. It means that the first snapshot to be taken of a volume is a full copy of all of the data on that volume. Now, I'm stressing the word "data" because a snapshot just copies the data used. So, if you use 10 GB of a 40 GB volume, then that initial snapshot is 10 GB, not the full 40 GB. The first snapshot, because it's a full one, can take some time depending on the size of the data. It's copying all of the data from a volume onto S3. Now, your EBS performance won't be impacted during this initial snapshot, but it just takes time to copy in the background. Future snapshots are fully incremental; they only store the difference between the previous snapshot and the state of the volume when the snapshot is taken, and because of that, they consume much less space and they're also significantly quicker to perform.
Now, you might be concerned at this point hearing the word "incremental." If you've got any existing backup system or backup software experience, it was always a risk that if you lost an incremental backup, then no further backups between that point and when you next took the full backup would work, so there was a massive risk of losing an incremental backup. You don't have to worry about that with EBS. It's smart enough so that if you do delete an incremental snapshot, it makes sure that the data is moved so that all of the snapshots after that point still function, so each snapshot, even though it is incremental, can be thought of as self-sufficient.
Now, when you create an EBS volume, you have a few choices. You can create a blank volume, or you can create a volume that's based on a snapshot. So, snapshots offer a great way to clone a volume. Because S3 is a regional service, the volume you create from a snapshot can be in a different availability zone from the original, which means snapshots can be used to move EBS volumes between availability zones. But also, snapshots can be copied between AWS regions, so you can use snapshots for global DR processes or as a migration tool to migrate the data on volumes between regions. Snapshots are really flexible.
Visually, this is how snapshot architecture looks. So here we've got two AWS regions, US East 1 and AP Southeast 2. We have a volume in availability zone A in US East 1, and that's connected to an EC2 instance in the same availability zone. Now, snapshots can be taken of this volume and stored in S3. And the first snapshot is a full copy, so it stores all of the data that's used on the source volume. The second one is incremental, so this only stores the changes since the last snapshot. So, at the point that you create the second snapshot, only the changes between the original snapshot and now are stored in this incremental, and these are linked, so the incremental references the initial snapshot for any data that isn't changed. Now, the snapshot can be used to create a volume in the same availability zone, it can be used to create a volume in another availability zone in the same region, and that volume could then be attached to another EC2 instance, or the snapshot could be copied to another AWS region and used to create another volume in that region. So, that's the architecture, that's how snapshots work, and there's nothing overly complex about it, but I did want to cover a few final important points before we finish up.
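To make that workflow a little more concrete, here's a minimal boto3 sketch of the snapshot, cross-AZ restore, and cross-region copy steps described above. Treat it as an illustration under assumptions: the volume ID, AZ names, and volume type are placeholders, not values from this course.

```python
import boto3

# Placeholder IDs and names - substitute your own.
ec2_use1 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot an existing volume (EBS stores the snapshot data in S3).
snap = ec2_use1.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Example snapshot for cross-AZ / cross-region restore",
)
ec2_use1.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Restore it into a different AZ in the same region.
ec2_use1.create_volume(
    AvailabilityZone="us-east-1b",
    SnapshotId=snap["SnapshotId"],
    VolumeType="gp3",
)

# 3. Copy the snapshot to another region, then restore there.
ec2_apse2 = boto3.client("ec2", region_name="ap-southeast-2")
copy = ec2_apse2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Description="Cross-region copy",
)
ec2_apse2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])
ec2_apse2.create_volume(
    AvailabilityZone="ap-southeast-2a",
    SnapshotId=copy["SnapshotId"],
    VolumeType="gp3",
)
```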
As a solutions architect, there are some nuances of snapshot and volume performance that you need to be aware of. These can significantly impact projects that you design and deploy, and this does come up in the exam. Now, first, when you create a new EBS volume without using a snapshot, the performance is available immediately. There's no need to do any form of initialization process. But if you restore a volume from a snapshot, it does the restore lazily. What this means is that if you restore a volume right now, it will transfer the data from the snapshot on S3 to the new volume in the background over time, and this process takes a while. If you attempt to read data which hasn't been restored yet, it will be pulled from S3 immediately, but that achieves lower levels of performance than reading from EBS directly. So, you have a number of choices. You can force a read of every block of the volume, and this is done in the operating system using tools such as dd on Linux. This reads every block one by one on the new EBS volume, and it forces EBS to pull all the snapshot data from S3 into that volume. This is generally something that you would do immediately when you restore the volume, before moving that volume into production usage. It just ensures full performance as soon as your customers start using that data.
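For illustration only, here's a rough Python equivalent of that "read every block" idea, the same thing a dd pass achieves. The device path is an assumption (attach point varies) and this would need to run as root on the instance.

```python
# Minimal sketch of dd-style volume initialization, assuming the restored
# volume is attached as /dev/xvdf (hypothetical device name).
DEVICE = "/dev/xvdf"
CHUNK = 1024 * 1024  # read 1 MiB at a time

with open(DEVICE, "rb") as dev:
    total = 0
    while True:
        data = dev.read(CHUNK)
        if not data:  # end of device reached
            break
        total += len(data)

print(f"Read {total / (1024 ** 3):.1f} GiB - all blocks now pulled from S3")
```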
Now, historically that was the only way to force this rapid initialization of the volume, but now there's a feature called Fast Snapshot Restore or FSR. This is an option that you can set on a snapshot which makes it instantly restore. You can create 50 of these fast snapshot restores per region, and when you enable it on a snapshot, you pick the snapshot specifically and the availability zones that you want to be able to do instant restores to. Each combination of that snapshot and an AZ is classed as one fast snapshot restore set, and you can have 50 of those per region. So, one snapshot configured to restore to four availability zones in a region represents four out of that 50 limit of FSRs per region, so keep that in mind.
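As a sketch of how you'd switch FSR on with boto3 (the snapshot ID and AZ names are placeholders), enabling one snapshot for two availability zones consumes two of the 50 snapshot-and-AZ combinations allowed per region:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder snapshot ID and AZs. One snapshot x two AZs = two FSR entries
# out of the 50 allowed per region.
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=["us-east-1a", "us-east-1b"],
    SourceSnapshotIds=["snap-0123456789abcdef0"],
)
```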
Now, FSR costs extra, and keep this in mind: it can get expensive, especially if you have lots of different snapshots. You can always achieve the same end result by forcing a read of every block manually using dd or another tool in the operating system. But if you really don't want the admin overhead, then you've got the option of using FSR. Now, I haven't talked about EBS volume encryption yet. That's coming up in a lesson soon within this section, but encryption also influences snapshots. Don't worry though, I'll be covering all of that end to end when I talk about volume encryption.
Now, snapshots are billed using a gigabyte-month metric. So, a 10 GB snapshot stored for one month represents 10 GB-months. A 20 GB snapshot stored for half a month represents the same 10 GB-months, and that's how you're billed: there's a certain cost for every gigabyte-month that you use for snapshot storage. Now, just to stress this, because it's an awesome feature specifically from a cost perspective: this is used data, not allocated data. You might have a volume which is 40 GB in size, but if you only use 10 GB of that, then the first full snapshot is only 10 GB. EBS doesn't charge for unused areas in volumes when performing snapshots. You're charged for the full allocated size of an EBS volume, but that's because it's allocated. For snapshots, you're only billed for the data that's used on the volumes, and because snapshots are incremental, you can perform them really regularly. Only the data that's changed is stored, so doing a snapshot every five minutes won't necessarily cost more than doing one per hour.
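If it helps to see the gigabyte-month arithmetic written down, here's a tiny sketch. The price per GB-month is a made-up placeholder, not a real AWS rate.

```python
# Back-of-the-envelope GB-month arithmetic from this lesson.
PRICE_PER_GB_MONTH = 0.05  # hypothetical $/GB-month, purely illustrative

def snapshot_cost(gb_stored: float, fraction_of_month: float) -> float:
    return gb_stored * fraction_of_month * PRICE_PER_GB_MONTH

# 10 GB for a full month and 20 GB for half a month bill identically.
print(snapshot_cost(10, 1.0))   # 0.5
print(snapshot_cost(20, 0.5))   # 0.5
```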
Now, on the right, this is visually how snapshots look. On the left, we have a 10 GB volume using 10 GB of data, so it's 100% consumed. The first snapshot, logically, will consume 10 GB of space on S3, because it's a full snapshot and it consumes whatever data is used on the volume. In the middle column, we're changing 4 GB of data out of that original 10 GB, the bit in yellow at the bottom. The next snapshot references the unchanged 6 GB of data and only stores the changed 4 GB, so the second snapshot is only billed for 4 GB of data, the changed data. On the right, we've got 2 GB of data that's added to that volume, so the volume is now 12 GB. The next snapshot references the original 6 GB of data, so that's not stored in this snapshot. It also references the previous snapshot for the 4 GB of changed data, so that's also not stored in this new snapshot. The new snapshot simply adds the new 2 GB of data, so this snapshot is only billed for 2 GB. At each stage, a new snapshot only stores the data which is new or changed, and it references previous snapshots for anything which isn't changed. That's why they're all incremental, and that's why each new snapshot is only billed for the changed data.
Okay, that's enough theory for now, time for a demonstration. So, in the next demo lesson, we're going to experiment with EBS volumes and snapshots and just experience practically how we can interact with them. It's going to be a simple demo, but I always find that by doing things, you retain the theory that you've learned, and this has been a lot of theory. So go ahead, complete this video, and when you're ready, we can start the demo lesson.
-
-
learn.cantrill.io
-
Welcome to this lesson where I want to briefly cover some of the situations where you would choose to use EBS rather than instance store volumes, and also where instance store is more suitable than EBS, as well as those situations where it depends, because there are always going to be situations where either or neither could work. Now we've got a lot to cover, so let's jump in and get started.
Now I want to apologize right at the start. You know by now I hate lessons where I just talk about facts and figures, numbers and acronyms. I almost always prefer diagrams, teaching using real-world architecture and implementations. Sometimes though we just need to go through numbers and facts, and this is one of those times. I'm sorry, but we have to do it. So in this lesson I'm going to be covering some useful scenario points, some useful minimums and maximums, and situations which will help you decide between using instance store volumes versus EBS, and these are going to be useful both for the exam and real-world usage.
Now first, as a default rule, if you need persistent storage then you should default to EBS, or more specifically, default away from instance store volumes. Instance store volumes are not persistent. There are many reasons why data can be lost: hardware failure, instances rebooting, maintenance, anything which moves instances between hosts can impact instance store volumes, and this is critical to understand for the exam and for the real world. If you need resilient storage you should avoid instance store volumes and default to EBS. Again, if hardware fails, instance store volumes can be lost. If instances move, if hosts fail, anything of this nature can cause loss of data on instance store volumes, because they're just not resilient. EBS provides hardware which is resilient within an availability zone and you also have the ability to snapshot volumes to S3, and so EBS is a much better product if you need resilient storage.
Next, if you have storage which you need to be isolated from instance life cycles, then use EBS. So if you need a volume which you can attach to one instance, use it for a while, detach it and then reattach it to something else, then EBS is what you need. These are the scenarios where it makes much more sense to use EBS. For any of the things I've mentioned it's pretty clear cut: use EBS, or to put it another way, avoid instance store volumes.
Now there are some scenarios where it's just not as clear cut, and you need to be on the lookout for these within the exam. Imagine that you need resilience, but your application supports built-in replication. Well, then you can use lots of instance store volumes on lots of instances, and that way you get the performance benefits of using instance store volumes but without the resilience risk. Another situation where it depends is if you need high performance. Up to a point, and I'll cover these different levels of performance soon, both EBS and instance store volumes can provide high performance. For super high performance though, you will need to default to using instance store volumes, and I'll be qualifying exactly what these performance levels are on the next screen. Finally, instance store volumes are included with the price of many EC2 instances, and so it makes sense to utilize them. If cost is a primary concern then you should look at using instance store volumes.
Now these are the high-level scenarios, and these key facts will serve you really well in the exam. They will help you to pick between instance store volumes and EBS for most of the common exam scenarios. But now I want to cover some more specific facts and numbers that you need to be aware of. If you see questions in the exam which are focused purely on cost efficiency and where you think you need to use EBS, then you should default to ST1 or SC1 because they're cheaper. They're mechanical storage and so they're going to be cheaper than using the SSD-based EBS volumes. Now if the question mentions throughput or streaming then you should default to ST1, unless the question mentions boot volumes, which excludes both of them: you can't use either of the mechanical storage types, ST1 or SC1, to boot EC2 instances, and that's a critical thing to remember for the exam.
Next I want to move on to some key performance levels. So first we have GP2 and GP3 and both of those can deliver up to 16,000 IOPS per volume. So with GP2 this is based on the size of the volume. With GP3 you get 3000 IOPS by default and you can pay for additional performance. But for either GP2 or GP3 the maximum possible performance per volume is 16,000 IOPS and you need to keep that in mind for any exam questions. Now IO1 and IO2 can deliver up to 64,000 IOPS so if you need between 16,000 IOPS and 64,000 IOPS on a volume then you need to pick IO1 or IO2. Now I've included the asterisks here because there is a new type of volume known as IO2 block express and this can deliver up to 256,000 IOPS per volume. But of course you need to keep in mind that these high levels of performance will only be possible if you're using the larger instance types. So these are specifically focused around the maximum performance that's possible using EBS volumes but you need to make sure that you pair this with a good sized EC2 instance which is capable of delivering those levels of performance.
Now one option that you do have and this comes up relatively frequently in the exam, you can take lots of individual EBS volumes and you can create a RAID 0 set from those EBS volumes and that RAID 0 set then gets up to the combined performance of all of the individual volumes but this is up to 260,000 IOPS because this is the maximum possible IOPS per instance. So no matter how many volumes you combine together you always have to worry about the maximum performance possible on an EC2 instance and currently the highest performance levels that you can achieve using EC2 and EBS is 260,000 IOPS and to achieve that level you need to use a large size of instance and have enough EBS volumes to consume that entire capacity. So you need to keep in mind the performance that each volume gives and then the maximum performance of the instance itself and there is a maximum currently of 260,000 IOPS. So that's something to keep in mind.
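Purely as a memory aid, here's a small sketch that collects the EBS figures quoted in this lesson into one place. These mirror the lesson's numbers; figures change over time, so treat the AWS documentation as the source of truth.

```python
# Per-volume and per-instance maximum IOPS quoted in this lesson.
MAX_IOPS_PER_VOLUME = {
    "gp2": 16_000,
    "gp3": 16_000,
    "io1": 64_000,
    "io2": 64_000,
    "io2 Block Express": 256_000,
}

MAX_IOPS_PER_INSTANCE = 260_000  # ceiling for a single EC2 instance using EBS

def volumes_to_saturate(volume_type: str) -> int:
    """How many volumes of this type, at their per-volume maximum, are needed
    to reach the per-instance ceiling (e.g. via a RAID 0 set in the OS)."""
    per_volume = MAX_IOPS_PER_VOLUME[volume_type]
    return -(-MAX_IOPS_PER_INSTANCE // per_volume)  # ceiling division

print(volumes_to_saturate("io1"))  # 5 x 64,000 IOPS volumes to cover 260,000
print(volumes_to_saturate("gp3"))  # 17 x 16,000 IOPS volumes
```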
Now if you need more than 260,000 IOPS and your application can tolerate storage which is not persistent then you can decide to use instance store volumes. Instance store volumes are capable of delivering much higher levels of performance and I've detailed that in the lesson specifically focused on instance store volumes. You can gain access to millions of IOPS if you choose the correct instance type and then use the attached instance store volumes but you do always need to keep in mind that this storage is not persistent. So you're trading the lack of persistence for much improved performance.
Now once again, I don't like doing this, but my suggestion is that you try your best to remember all of these figures. I'm going to make sure that I include this slide as a learning aid on the course GitHub repository. So print it out, take a screenshot, or include it in your electronic notes; whatever study method you use, you need to remember all of these facts and figures from this entire lesson, because if you remember them it will make answering performance-related questions in the exam much easier. Now again, I don't like suggesting that students remember raw facts and figures, it's not normally conducive to effective learning, but this is the one exception within AWS. So try your best to remember all of these different performance levels and what technology you need to achieve each of the different levels.
Now at this point that's everything that I wanted to cover in this lesson, I hope it's been useful. Go ahead and complete the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to talk through another type of storage, this time instance store volumes. It's essential for all of the AWS exams and real-world usage that you understand the pros and cons of this type of storage, as it can save money, improve performance, or cause significant headaches, so you have to appreciate all of the different factors. So let's just jump in and get started because we've got a lot to cover.
Instance store volumes provide block storage devices—raw volumes which can be attached to an instance, presented to the operating system on that instance, and used as the basis for a file system which can then in turn be used by applications. So far they're just like EBS, only local instead of being presented over the network. These volumes are physically connected to one EC2 host, and that's really important; each EC2 host has its own instance store volumes and they're isolated to that one particular host. Instances which are on that host can access those volumes, and because they're locally attached they offer the highest storage performance available within AWS, much higher than EBS can provide, and more on why this is relevant very soon.
They're also included in the price of any instances which they come with. Different instance types come with different selections of instance store volumes, and for any instances which include instance store volumes, they're included in the price of that instance, so it comes down to use it or lose it. One really important thing about instance store volumes is that you have to attach them at launch time; unlike EBS, you can't attach them afterwards. I've seen questions come up a few times in various AWS exams about adding new instance store volumes after instance launch, and it's important that you remember that you can't do this—it's launch time only. Depending on the instance type you're going to be allocated a certain number of instance store volumes; you can choose to use them or not, but if you don't, you can't adjust this later.
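As a rough illustration of mapping instance store volumes at launch, here's a boto3 sketch. The AMI ID and instance type are placeholders, and exactly how instance store is exposed varies by instance family, so treat this as a pattern rather than a recipe.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder AMI and instance type - pick a type that actually includes
# instance store volumes. The VirtualName mapping exposes the first
# instance store (ephemeral) volume at launch; it cannot be added later.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="d2.xlarge",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    ],
)
```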
This is how instance store architecture looks: each instance can have a collection of volumes which are backed by physical devices on the EC2 host which that instance is running on. So in this case, host A has three physical devices and these are presented as three instance store volumes, and host B has the same three physical devices. Now in reality, EC2 hosts will have many more, but this is a simplified diagram. On host A, instances 1 and 2 are running—instance 1 is using one volume and instance 2 is using the other two volumes, and the volumes are named ephemeral 0, 1, and 2. Roughly the same architecture is present on host B, but instance 3 is the only instance running on that host, and it's using the ephemeral 1 and ephemeral 2 volumes.
Now these are ephemeral volumes—they're temporary storage, and as a solutions architect, developer, or engineer, you need to think of them as such. If instance 1 stored some data on ephemeral volume 0 on EC2 host A—let's say a cat picture—and then for some reason the instance migrated from host A through to host B, then it would still have access to an ephemeral 0 volume, but it would be a new physical volume, a blank block device. So this is important: if an instance moves between hosts, then any data that was present on the instance store volumes is lost, and instances can move between hosts for many reasons. If they're stopped and started, this causes a migration between hosts, or as another example, if host A was undergoing maintenance, then instances would be migrated to a different host.
When instances move between hosts, they're given new blank ephemeral volumes; data on the old volumes is lost—they're wiped before being reassigned, but the data is gone. Even if you do something like change an instance type, this will cause an instance to move between hosts, and that instance will no longer have access to the same instance store volumes. This is another risk to keep in mind: you should view all instance store volumes as ephemeral. The other danger to keep in mind is hardware failure—if a physical volume fails, say the ephemeral 1 volume on EC2 host A, then instance 2 would lose whatever data was on that volume. These are ephemeral volumes—treat them as such; they're temporary storage and they should not be used for anything where persistence is required.
Now the size of instance store volumes and the number of volumes available to an instance vary depending on the type and size of the instance. Some instance types don't support instance store volumes, different instance types have different types of instance store volumes, and as you increase in size you're generally allocated larger numbers of these volumes, so that's something that you need to keep in mind. One of the primary benefits of instance store volumes is performance—you can achieve much higher levels of throughput and more IOPS by using instance store volumes versus EBS. I won't consume your time by going through every example, but some of the higher-end figures that you need to consider are things like: if you use a D3 instance, which is storage optimized, then you can achieve 4.6 GB per second of throughput, and this instance type provides large amounts of storage using traditional hard disks, so it's really good value for large amounts of storage. It provides much higher levels of throughput than the maximums available when using HDD-based EBS volumes.
The I3 series, which is another storage optimized family of instances, provides NVMe SSDs and this provides up to 16 GB per second of throughput, and this is significantly higher than even the most high performance EBS volumes can provide—and the difference in IOPS is even more pronounced versus EBS, with certain I3 instances able to provide 2 million read IOPS and 1.6 million write IOPS when optimally configured. In general, instance store volumes perform to a much higher level versus the equivalent storage in EBS. I'll be doing a comparison of EBS versus instance store elsewhere in this section, which will help you in situations where you need to assess suitability, but these are some examples of the raw figures.
Now before we finish this lesson, just a number of exam power-ups: instance store volumes are local to an EC2 host, so if an instance does move between hosts, you lose access to the data on that volume. You can only add instance store volumes to an instance at launch time; if you don't add them, you cannot come back later and add additional instance store volumes, and any data on instance store volumes is lost if that instance moves between hosts, if it gets resized, or if you have either host failure or specific volume hardware failure.
Now in exchange for all these restrictions, of course instance store volumes provide high performance—it's the highest data performance that you can achieve within AWS, you just need to be willing to accept all of the shortcomings around the risk of data loss, its temporary nature, and the fact that it can't survive through restarts or moves or resizes. It's essentially a performance trade-off: you're getting much faster storage as long as you can tolerate all of the restrictions. Now with instance store volumes, you pay for it anyway—it's included in the price of an instance, so generally when you're provisioning an instance which does come with instance store volumes, there is no advantage to not utilizing them; you can decide not to use them inside the OS, but you can't physically add them to the instance at a later date.
Just to reiterate—and I'm going to keep repeating this throughout this section of the course—instance store volumes are temporary, you cannot use them for any data that you rely on or data which is not replaceable, so keep that in mind. It does give you amazing performance, but it is not for the persistent storage of data. But at this point that's all of the theory that I wanted to cover—so that's the architecture and some of the performance trade-offs and benefits that you get with instance store volumes. Go ahead and complete this video and when you're ready, join me in the next, which will be an architectural comparison of EBS and instance store, which will help you in exam situations to pick between the two.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to talk about the Hard Disk Drive or HDD-based volume types provided by EBS. HDD-based means they have moving bits, platters which spin, little robot arms known as heads which move across those spinning platters—moving parts means slower, which is why you'd only want to use these volume types in very specific situations.
Now let's jump straight in and look at the types of situations where you would want to use HDD-based storage. There are two types of HDD-based storage within EBS—well, that's not quite true, there are actually three, but one of them is legacy—so I'll be covering the two which are in general usage, and those are ST1, which is throughput optimized HDD, and SC1, which is cold HDD.
So think about ST1 as the fast hard drive—not very agile but pretty fast—and think about SC1 as cold. ST1 is cheap, it's less expensive than the SSD volumes, which makes it ideal for any larger volumes of data; SC1 is even cheaper, but it comes with some significant trade-offs.
Now ST1 is designed for data which is sequentially accessed—because it's HDD-based it's not great at random access—it's more designed for data which needs to be written or read in a fairly sequential way, for applications where throughput and economy is more important than IOPS or extreme levels of performance. ST1 volumes range from 125 GB to 16 TB in size and you have a maximum of 500 IOPS—but, and this is important, IO on HDD-based volumes is measured as 1 MB blocks, so 500 IOPS means 500 MB per second.
Now those are maximums—HDD-based storage works in a similar way to how GP2 volumes work, with a credit bucket, only with HDD-based volumes it's measured in MB per second rather than IOPS. So with ST1 you have a baseline performance of 40 MB per second for every 1 TB of volume size, and you can burst to a maximum of 250 MB per second for every TB of volume size, obviously up to the maximums of 500 IOPS and 500 MB per second.
ST1 is designed for when cost is a concern but you need frequent access storage for throughput-intensive sequential workloads—so things like big data, data warehouses, and log processing. SC1, on the other hand, is designed for infrequent workloads—it's geared towards maximum economy when you just want to store lots of data and don't care about performance.
So SC1 offers a maximum of 250 IOPS—again this is with a 1 MB IO size—so this means a maximum of 250 MB per second of throughput, and just like with ST1, this is based on the same credit pool architecture. It has a baseline of 12 MB per second per TB of volume size and a burst of 80 MB per second per TB of volume size.
So you can see that this offers significantly less performance than ST1, but it's also significantly cheaper, and just like with ST1, volumes can range from 125 GB to 16 TB in size. This storage type is the lowest cost EBS storage available—it's designed for less frequently accessed workloads, so if you have colder data, archives, or anything which requires less than a few loads or scans per day, then this is the type of storage volume to pick.
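To see how those per-TB figures play out for a given volume size, here's a quick calculator built purely from the numbers quoted in this lesson; the example sizes are arbitrary.

```python
# ST1/SC1 throughput figures from this lesson: baseline and burst scale per TB,
# with a hard per-volume ceiling.
HDD_PROFILES = {
    "st1": {"base_mbps_per_tb": 40, "burst_mbps_per_tb": 250, "max_mbps": 500},
    "sc1": {"base_mbps_per_tb": 12, "burst_mbps_per_tb": 80, "max_mbps": 250},
}

def hdd_throughput(volume_type, size_tb):
    p = HDD_PROFILES[volume_type]
    base = min(p["base_mbps_per_tb"] * size_tb, p["max_mbps"])
    burst = min(p["burst_mbps_per_tb"] * size_tb, p["max_mbps"])
    return base, burst  # (baseline MB/s, burst MB/s)

print(hdd_throughput("st1", 2))  # (80, 500)  - a 2 TB ST1 volume
print(hdd_throughput("sc1", 2))  # (24, 160)  - a 2 TB SC1 volume
```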
And that's it for HDD-based storage—both of these are lower cost and lower performance versus SSD, designed for when you need economy of data storage. Picking between them is simple—if you can tolerate the trade-offs of SC1 then use that, it's super cheap and for anything which isn't accessed day-to-day it's perfect—otherwise choose ST1.
And if you have a requirement for anything IOPS based, then avoid both of these and look at SSD based storage. With that being said though, that's everything that I wanted to cover in this lesson—thanks for watching, go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to continue my EBS series and talk about provisioned IOPS SSD, so that means IO1 and IO2. Let's jump in and get started straight away because we do have a lot to cover. Strictly speaking, there are now three types of provisioned IOPS SSD—two which are in general release, IO1 and its successor IO2, and one which is in preview, which is IO2 Block Express.
Now they all offer slightly different performance characteristics and different prices, but the common factor is that IOPS are configurable independently of the size of the volume, and they're designed for super high performance situations where low latency and consistency of that low latency are both important characteristics. With IO1 and IO2 you can achieve a maximum of 64,000 IOPS per volume—that's four times the maximum for GP2 and GP3—and with IO1 and IO2 you can achieve 1,000 MB per second of throughput, which is the same as GP3 and significantly more than GP2.
Now IO2 Block Express takes this to another level—with Block Express you can achieve 256,000 IOPS per volume and 4,000 MB per second of throughput per volume. In terms of the volume sizes that you can use with provisioned IOPS SSDs, with IO1 and IO2 it ranges from 4 GB to 16 TB, and with IO2 Block Express you can use larger volumes, up to 64 TB.
Now I mentioned that with these volumes you can allocate IOPS performance values independently of the size of the volume—this is useful when you need extreme performance for smaller volumes, or when you just need extreme performance in general, but there is a maximum size-to-performance ratio. For IO1 it's 50 IOPS per GB of size, which is more than the 3 IOPS per GB for GP2; for IO2 this increases to 500 IOPS per GB of volume size, and for Block Express it's 1,000 IOPS per GB of volume size.
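As a rough boto3 illustration of provisioning IOPS independently of size (the AZ is a placeholder and the numbers are just to show the ratio), a small io2 volume can carry IOPS that an equivalently sized io1 or GP2 volume couldn't:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A 100 GB io2 volume provisioned with 50,000 IOPS - a 500:1 ratio, which
# this lesson says io2 allows (io1 would cap at 50 IOPS per GB, i.e. 5,000
# IOPS for a volume this size).
ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder AZ
    Size=100,                       # GB
    VolumeType="io2",
    Iops=50_000,
)
```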
Now these are all maximums, and with these types of volumes you pay for both the size and the provisioned IOPS that you need. Now because with these volume types you're dealing with extreme levels of performance, there is also another restriction that you need to be aware of, and that's the per instance performance—there is a maximum performance which can be achieved between the EBS service and a single EC2 instance.
Now this is influenced by a few things: the type of volumes (so different volumes have a different maximum per instance performance level), the type of the instance, and then finally the size of the instance. You'll find that only the most modern and largest instances support the highest levels of performance, and these per instance maximums will also be more than one volume can provide on its own, and so you're going to need multiple volumes to saturate this per instance performance level.
With IO1 volumes you can achieve a maximum of 260,000 IOPS per instance and a throughput of 7,500 MB per second—it means you'll need just over four volumes of performance operating at maximum to achieve this per instance limit. Oddly enough, IO2 is slightly less at 160,000 IOPS for an entire instance and 4,750 MB per second, and that's because AWS have split these new generation volume types—they've added Block Express, which can achieve 260,000 IOPS and 7,500 MB per second for an instance maximum.
So it's important that you understand that these are per instance maximums, so you need multiple volumes all operating together, and think of this as a performance cap for an individual EC2 instance. Now these are the maximums for the volume types, but you also need to take into consideration any maximums for the type and size of the instance, so all of these things need to align in order to achieve maximum performance.
Now keep these figures locked in your mind—it's not so much about the exact numbers but having a good idea about the levels of performance that you can achieve with GP2 or GP3 and then IO1, IO2, and IO2 Block Express will really help you in real-world situations and in the exam. Instance store volumes, which we're going to be covering elsewhere in this section, can achieve even higher performance levels, but this comes with a serious limitation in that it's not persistent—but more on that soon.
Now as a comparison, the per instance maximums for GP2 and GP3 is 260,000 IOPS and 7,000 MB per second per instance. Again don't focus too much on the exact numbers, but you need to have a feel for the ranges that these different types of storage volumes occupy versus each other and versus instance store.
Now you'll be using provisioned IOPS SSD for anything which needs really low latency or sub-millisecond latency, consistent latency, and higher levels of performance. One common use case is when you have smaller volumes but need super high performance, and that's only achievable with IO1, IO2, and IO2 Block Express.
Now that's everything that I wanted to cover in this lesson—again if you're doing the SysOps or Developer streams there's going to be a demo lesson where you'll experience the storage performance levels. For the Architecture stream this theory is enough.
At this point though, thanks for watching, that's everything I wanted to cover—go ahead and complete the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to talk about two volume types available within AWS: GP2 and GP3. Now GP2 is the default general purpose SSD-based storage provided by EBS, and GP3 is a newer storage type which I want to include because I expect it to feature on all of the exams very soon. Now let's just jump in and get started.
General Purpose SSD storage provided by EBS was a game changer when it was first introduced; it's high performance storage for a fairly low price. Now GP2 was the first iteration and it's what I'm going to be covering first because it has a simple but initially difficult to understand architecture, so I want to get this out of the way first because it will help you understand the different storage types.
When you first create a GP2 volume it can be as small as 1 GB or as large as 16 TB, and when you create it the volume is created with an I/O credit allocation. Think of this like a bucket. An I/O is one input/output operation, and an I/O credit is a 16 KB chunk of data, so one I/O credit covers one 16 KB chunk in one second. If you're transferring a 160 KB file, that represents 10 blocks of 16 KB, and if you do that all in one second, that's 10 credits consumed in one second, so 10 IOPS.
When you aren't using the volume much you aren't using many IOPS and you aren't using many credits, but during periods of high disk load you're going to be pushing a volume hard, and because of that it's consuming more credits—for example during system boots, backups, or heavy database work. Now if you have no credits in this I/O bucket, you can't perform any I/O on the disk.
The I/O bucket has a capacity of 5.4 million I/O credits, and it fills at the baseline performance rate of the volume. So what does this mean? Well, every volume has a baseline performance based on its size, with a minimum—so streaming into the bucket at all times is a 100 I/O credits per second refill rate. This means that as an absolute minimum, regardless of anything else, you can consume 100 I/O credits per second, which is 100 IOPS.
Now the actual baseline rate which you get with GP2 is based on the volume size—you get 3 I/O credits per second per GB of volume size. This means that a 100 GB volume gets 300 I/O credits per second refilling the bucket. Anything below 33.33 recurring GB gets the 100 IOPS minimum, and anything above 33.33 recurring GB gets 3 times the size of the volume as its baseline performance rate.
Now you aren't limited to only consuming at this baseline rate—by default GP2 can burst up to 3,000 IOPS, so you can do up to 3,000 input/output operations of 16 KB in one second, and that's referred to as your burst rate. It means that if you have heavy workloads which aren't constant, you aren't limited by your baseline performance rate of 3 times the GB size of the volume, so you can have a small volume which has periodic heavy workloads and that's OK.
What's even better is that the credit bucket starts off full—so 5.4 million I/O credits—and this means that you could run at 3,000 IOPS, so 3,000 I/O per second, for a full 30 minutes, and that assumes that your bucket isn't filling up with new credits, which it always is. So in reality you can run at full burst for much longer, and this is great if your volumes are used for any really heavy workloads initially, because this initial allocation is a great buffer.
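Here's a small worked sketch of that credit arithmetic, using only the figures from this lesson (3 credits per second per GB, a 100 IOPS floor, a 16,000 IOPS cap, a 3,000 IOPS burst, and a 5.4 million credit bucket); the 100 GB example is arbitrary.

```python
# GP2 credit bucket arithmetic from this lesson.
def gp2_baseline_iops(size_gb: int) -> int:
    # 3 I/O credits per second per GB, with a 100 IOPS floor and a 16,000 cap.
    return min(max(3 * size_gb, 100), 16_000)

def full_burst_seconds(size_gb: int, burst_iops: int = 3_000) -> float:
    # How long a freshly created volume can sustain the 3,000 IOPS burst,
    # starting from the full 5.4 million credit bucket while refilling at baseline.
    drain_rate = burst_iops - gp2_baseline_iops(size_gb)
    if drain_rate <= 0:
        return float("inf")  # baseline >= burst, so the bucket never depletes
    return 5_400_000 / drain_rate

print(gp2_baseline_iops(100))        # 300 IOPS baseline for a 100 GB volume
print(full_burst_seconds(100) / 60)  # ~33 minutes of full burst with refill
```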
The key takeaway at this point is that if you're consuming more I/O credits than the rate at which your bucket is refilling, then you're depleting the bucket—so if you burst up to 3,000 IOPS and your baseline performance is lower, then over time you're decreasing your credit bucket. If you're consuming less than your baseline performance, then your bucket is replenishing, and one of the key factors of this type of storage is the requirement that you manage the credit buckets of all of your volumes, so you need to ensure that they're staying replenished and not depleting down to zero.
Now because every volume is credited with 3 I/O credits per second for every GB in size, volumes up to 1 TB in size use this I/O credit architecture, but volumes larger than 1 TB have a baseline equal to or exceeding the burst rate of 3,000—and so they will always achieve their baseline performance as standard; they don't use this credit system. The maximum IOPS for GP2 is currently 16,000, so any volume above 5.33 recurring TB in size achieves this maximum rate constantly.
GP2 is a really flexible type of storage which is good for general usage—at the time of creating this lesson it's the default but I expect that to change over time to GP3 which I'm going to be talking about next. GP2 is great for boot volumes, for low latency interactive applications or for dev and test environments—anything where you don't have a reason to pick something else. It can be used for boot volumes and as I've mentioned previously it is currently the default; again over time I expect GP3 to replace this as it's actually cheaper in most cases but more on this in a second.
You can also use the elastic volume feature to change the storage type between GP2 and all of the others, and I'll be showing you how that works in an upcoming lesson if you're doing the SysOps or Developer Associate courses. If you're doing the Architecture stream then this architecture theory is enough.
At this point I want to move on and explain exactly how GP3 is different. GP3 is also SSD-based, but it removes the credit bucket architecture of GP2 for something much simpler. Every GP3 volume, regardless of size, starts with a standard 3,000 IOPS—so 3,000 16 KB operations per second—and it can transfer 125 MB per second. That's standard regardless of volume size, and just like GP2, volumes can range from 1 GB through to 16 TB.
Now the base price for GP3, at the time of creating this lesson, is 20% cheaper than GP2, so if you only intend to use up to 3,000 IOPS then it's a no brainer—you should pick GP3 rather than GP2. If you need more performance then you can pay for up to 16,000 IOPS and up to 1,000 MB per second of throughput, and even with those extras it generally works out to be more economical than GP2.
GP3 offers a higher maximum throughput as well, so you can get up to 1,000 MB per second versus the 250 MB per second maximum of GP2—so GP3 is just simpler to understand for most people versus GP2, and I think over time it's going to be the default. For now though, at the time of creating this lesson, GP2 is still the default.
In summary GP3 is like GP2 and IO1—which I'll cover soon—had a baby; you get some of the benefits of both in a new type of general purpose SSD storage. Now the usage scenarios for GP3 are also much the same as GP2—so virtual desktops, medium sized databases, low latency applications, dev and test environments and boot volumes.
You can safely swap GP2 to GP3 at any point but just be aware that for anything above 3000 IOPS the performance doesn't get added automatically like with GP2 which scales on size. With GP3 you would need to add these extra IOPS which come at an extra cost and that's the same with any additional throughput—beyond the 125 MB per second standard it's an additional extra, but still even including those extras for most things this storage type is more economical than GP2.
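As a rough boto3 sketch of that in-place swap using the elastic volume feature (the volume ID is a placeholder, and the extra IOPS and throughput values are just examples of paying above the GP3 baseline):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Switch an existing volume from gp2 to gp3 in place. Iops and Throughput
# are only needed if you want more than the 3,000 IOPS / 125 MB/s that gp3
# includes as standard - and they're billed as extras.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # placeholder volume ID
    VolumeType="gp3",
    Iops=6_000,       # optional: extra provisioned IOPS
    Throughput=250,   # optional: extra throughput in MB/s
)
```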
At this point that's everything that I wanted to cover about the general purpose SSD volume types in this lesson—go ahead, complete the lesson and then when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back and in this lesson I want to quickly step through the basics of the Elastic Block Store service known as EBS. You'll be using EBS directly or indirectly, constantly as you make use of the wider AWS platform and as such you need to understand what it does, how it does it and the product's limitations. So let's jump in and get started straight away as we have a lot to cover.
EBS is a service which provides block storage. Now you should know what that is by now — it's storage which can be addressed using block IDs. So EBS takes raw physical disks and it presents an allocation of those physical disks and this is known as a volume and these volumes can be written to or read from using a block number on that volume.
Now volumes can be unencrypted, or you can choose to encrypt the volume using KMS, and I'll be covering that in a separate lesson. Now when you attach a volume to an EC2 instance, the instance sees a block device—raw storage—and it can use this to create a file system on top of it, such as EXT3, EXT4, XFS and many more in the case of Linux, or alternatively NTFS in the case of Windows.
The important thing to grasp is that EBS volumes appear just like any other storage device to an EC2 instance. Now storage is provisioned in one availability zone — I can't stress enough the importance of this — EBS in one availability zone is different than EBS in another availability zone and different from EBS in another AZ in another region. EBS is an availability zone service — it's separate and isolated within that availability zone. It's also resilient within that availability zone so if a physical storage device fails there's some built-in resiliency but if you do have a major AZ failure then the volumes created within that availability zone will likely fail as will instances also in that availability zone.
Now with EBS you create a volume and you generally attach it to one EC2 instance over a storage network. With some storage types you can use a feature called Multi-Attach which lets you attach it to multiple EC2 instances at the same time and this is used for clusters — but if you do this the cluster application has to manage it so you don't overwrite data and cause data corruption by multiple writes at the same time.
You should by default think of EBS volumes as things which are attached to one instance at a time but they can be detached from one instance and then reattached to another. EBS volumes are not linked to the instance lifecycle of one instance — they're persistent. If an instance moves between different EC2 hosts then the EBS volume follows it. If an instance stops and starts or restarts the volume is maintained. An EBS volume is created, it has data added to it and it's persistent until you delete that volume.
Now even though EBS is an availability zone based service you can create a backup of a volume into S3 in the form of a snapshot. Now I'll be covering these in a dedicated lesson but snapshots in S3 are now regionally resilient so the data is replicated across availability zones in that region and it's accessible in all availability zones. So you can take a snapshot of a volume in availability zone A and when you do so EBS stores that data inside a portion of S3 that it manages and then you can use that snapshot to create a new volume in a different availability zone — for example availability zone B — and this is useful if you want to migrate data between availability zones.
Now don't worry I'll be covering how snapshots work in detail including a demo later in this section — for now I'm just introducing them. EBS can provision volumes based on different physical storage types — SSD based, high performance SSD and volumes based on mechanical disks — and it can also provision different sizes of volumes and volumes with different performance profiles — all things which I'll be covering in the upcoming lessons. For now again this is just an introduction to the service.
The last point which I want to cover about EBS is that you'll be billed using a gigabyte-month metric—so the price of one gig for one month would be the same as two gig for half a month, and the same as half a gig for two months. Now there are some extras for certain types of volumes and certain enhanced performance characteristics, but I'll be covering that in the dedicated lessons which are coming up next.
For now, before we finish this service introduction, let's take a look visually at how this architecture fits together. So we're going to start with two regions—in this example that's US-EAST-1 and AP-SOUTHEAST-2—and then in those regions we've got some availability zones—AZ A and AZ B—and then another availability zone in AP-SOUTHEAST-2, and then finally the S3 service, which is running in all availability zones in both of those regions.
Now EBS, as I keep stressing and I will stress this more, is availability zone based — so in the cut-down example which I'm showing in US-EAST-1 you've got two availability zones and so two separate deployments of EBS, one in each availability zone — and that's just the same architecture as you have with EC2 — you have different sets of EC2 hosts in every availability zone.
Now visually let's say that you have an EC2 instance in availability zone A — you might create an EBS volume within that same availability zone and then attach that volume to the instance — so critically both of these are in the same availability zone. You might have another instance which this time has two volumes attached to it and over time you might choose to detach one of those volumes and then reattach it to another instance in the same availability zone — and that's doable because EBS volumes are separate from EC2 instances — it's a separate product with separate life cycles.
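For illustration, here's a minimal boto3 sketch of that detach-and-reattach pattern within one availability zone; the volume and instance IDs and the device name are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

VOLUME_ID = "vol-0123456789abcdef0"   # placeholder IDs
OLD_INSTANCE = "i-0aaaaaaaaaaaaaaaa"
NEW_INSTANCE = "i-0bbbbbbbbbbbbbbbb"  # must be in the same AZ as the volume

# Detach the volume from one instance...
ec2.detach_volume(VolumeId=VOLUME_ID, InstanceId=OLD_INSTANCE)
ec2.get_waiter("volume_available").wait(VolumeIds=[VOLUME_ID])

# ...and reattach it to another instance in the same availability zone.
ec2.attach_volume(VolumeId=VOLUME_ID, InstanceId=NEW_INSTANCE, Device="/dev/sdf")
```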
Now you can have the same architecture in availability zone B where volumes can be created and then attached to instances in that same availability zone. What you cannot do — and I'm stressing this for the 57th time (small print: it might not actually be 57 but it's close) — what I'm stressing is that you cannot communicate cross availability zone with storage — so the instance in availability zone B cannot communicate with and so logically cannot attach to any volumes in availability zone A — it's an availability zone service so no cross AZ attachments are possible.
Now EBS replicates data within an availability zone so the data on a volume — it's replicated across multiple physical devices in that AZ — but, and this is important again, the failure of an entire availability zone is going to impact all volumes within that availability zone. Now to resolve that you can snapshot volumes to S3 and this means that the data is now replicated as part of that snapshot across AZs in that region — so that gives you additional resilience and it also gives you the ability to create an EBS volume in another availability zone from this snapshot.
You can even copy the snapshot to another AWS region—in this example AP-SOUTHEAST-2—and once you've copied the snapshot, it can be used in that other region to create a volume, and that volume can then be attached to an EC2 instance in that same availability zone in that region.
So that at a high level is the architecture of EBS. Now depending on what course you're studying there will be other areas that you need to deep dive on — so over the coming section of the course we're going to be stepping through the features of EBS which you'll need to understand and these will differ depending on the exam — but you will be learning everything you need for the particular exam that you're studying for. At this point that's everything I wanted to cover so go ahead finish this lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io
-
Welcome back. Over the next few lessons and the wider course, we'll be covering storage a lot, and the exam expects you to know the appropriate type of storage to pick for a given situation. So before we move on to the AWS-specific storage lessons, I wanted to quickly do a refresher. So let's get started.
Let's start by covering some key storage terms. First is direct attached or local attached storage. This is storage, so physical disks, which are connected directly to a device, so a laptop or a server. In the context of EC2, this storage is directly connected to the EC2 hosts and it's called the instance store. Directly attached storage is generally super fast because it's directly attached to the hardware, but it suffers from a number of problems. If the disk fails, the storage can be lost. If the hardware fails, the storage can be lost. If an EC2 instance moves between hosts, the storage can be lost.
The alternative is network attached storage, which is where volumes are created and attached to a device over the network. In on-premises environments, this uses protocols such as iSCSI or Fibre Channel. In AWS, it uses a product called Elastic Block Store, known as EBS. Network storage is generally highly resilient and is separate from the instance hardware, so the storage can survive issues which impact the EC2 host.
The next term is ephemeral storage and this is just temporary storage, storage which doesn't exist long-term, storage that you can't rely on to be persistent. And persistent storage is the next point, storage which exists as its own thing. It lives on past the lifetime of the device that it's attached to, in this case, EC2 instances. So an example of ephemeral storage, so temporary storage, is the instance store, so the physical storage that's attached to an EC2 host. This is ephemeral storage. You can't rely on it, it's not persistent. An example of persistent storage in AWS is the network attached storage delivered by EBS.
Remember that, it's important for the exam. You will get questions testing your knowledge of which types of storage are ephemeral and persistent. Okay, next I want to quickly step through the three main categories of storage available within AWS. The category of storage defines how the storage is presented either to you or to a server and also what it can be used for.
Now the first type is block storage. With block storage, you create a volume, for example, inside EBS and the red object on the right is a volume of block storage and a volume of block storage has a number of addressable blocks, the cubes with the hash symbol. It could be a small number of blocks or a huge number, that depends on the size of the volume, but there's no structure beyond that. Block storage is just a collection of addressable blocks presented either logically as a volume or as a blank physical hard drive.
Generally when you present a unit of block storage to a server, so a physical disk or a volume, on top of this, the operating system creates a file system. So it takes the raw block storage, it creates a file system on top of this, for example, NTFS or EXT3 or many other different types of file systems and then it mounts that, either as a C drive in Windows operating systems or the root volume in Linux.
Now block storage comes in the form of spinning hard disks or SSDs, so physical media that's block storage or delivered as a logical volume, which is itself backed by different types of physical storage, so hard disks or SSDs. In the physical world, network attached storage systems or storage area network systems provide block storage over the network and a simple hard disk in a server is an example of physical block storage. The key thing is that block storage has no inbuilt structure, it's just a collection of uniquely addressable blocks. It's up to the operating system to create a file system and then to mount that file system and that can be used by the operating system.
So with block storage in AWS, you can mount a block storage volume, so you can mount an EBS volume and you can also boot off an EBS volume. So most EC2 instances use an EBS volume as their boot volume and that's what stores the operating system, and that's what's used to boot the instance and start up that operating system.
Now next up, we've got file storage and file storage in the on-premises world is provided by a file server. It's provided as a ready-made file system with a structure that's already there. So you can take a file system, you can browse to it, you can create folders and you can store files on there. You access the files by knowing the folder structure, so traversing that structure, locating the file and requesting that file.
You cannot boot from file storage because the operating system doesn't have low-level access to the storage. Instead of accessing tiny blocks and being able to create your own file system as the OS wants to, with file storage, you're given access to a file system normally over the network by another product. So file storage in some cases can be mounted, but it cannot be used for booting. So inside AWS, there are a number of file storage or file system-style products. And in a lot of cases, these can be mounted into the file system of an operating system, but they can't be used to boot.
Now lastly, we have object storage and this is a very abstract system where you just store objects. There is no structure, it's just a flat collection of objects. And an object can be anything, it can have attached metadata, but to retrieve an object, you generally provide a key and in return for providing the key and requesting to get that object, you're provided with that object's value, which is the data back in return.
And objects can be anything—they can be binary data, they can be images, they can be movies, they can be cat pictures, like the one in the middle here of Whiskers. They can be any data really that's stored inside an object. The key thing about object storage though is that it is just flat storage. It's flat, it doesn't have a structure. You just have a container—in AWS's case, it's an S3 bucket—and inside that S3 bucket, you have objects. The benefit of object storage is that it's super scalable. It can be accessed by thousands or millions of people simultaneously, but it's generally not mountable inside a file system and it's definitely not bootable.
So it's really important that you understand the differences between these three main types of storage. Generally, in the on-premises world and in AWS, if you want to utilize storage to boot from, it will be block storage. If you want to utilize high performance storage inside an operating system, it will also be block storage. If you want to share a file system across multiple different servers or clients, or have it accessed by different services, that can often be file storage. If you want to read and write object data at scale—say you're making a web-scale application storing the biggest collection of cat pictures in the world—that is ideal for object storage, because it is almost infinitely scalable.
Now let's talk about storage performance. There are three terms which you'll see when anyone's referring to storage performance. There's the IO or block size, the input output operations per second, pronounced IOPS, and then the throughput. So the amount of data that can be transferred in a given second, generally expressed in megabytes per second.
Now these things cannot exist in isolation. You can think of IOPS as the speed at which the engine of a race car runs, the revolutions per second. You can think of the IO or block size as the size of the wheels of the race car. And then you can think of the throughput as the end speed of the race car. So the engine of a race car spins at a certain number of revolutions—there may be some transmission that affects that slightly—but that power is delivered to the wheels, and based on their size, that causes you to go at a certain speed.
In theory in isolation, if you increase the size of the wheels or increase the revolutions of the engine, you would go faster. For storage and the analogy I just provided, they're all related to each other. The possible throughput a storage system can achieve is the IO or the block size multiplied by the IOPS.
As we talk about these three performance aspects, keep in mind that a physical storage device, a hard disk or an SSD, isn't the only thing involved in that chain of storage. When you're reading or writing data, it starts with the application, then the operating system, then the storage subsystem, then the transport mechanism to get the data to the disk—the network or the local storage bus, such as SATA—and then the storage interface on the drive, the drive itself and the technology that the drive uses. These are all components of that chain. Any point in that chain can be a limiting factor, and it's the lowest common denominator of that entire chain that controls the final performance.
Now IO or block size is the size of the blocks of data that you're writing to disk. It's expressed in kilobytes or megabytes and it can range from pretty small sizes to pretty large sizes. An application can choose to write or read data of any size and it will either take the block size as a minimum or that data can be split up over multiple blocks as it's written to disk. If your storage block size is 16 kilobytes and you write 64 kilobytes of data, it will use four blocks.
Now IOPS measures the number of IO operations the storage system can support in a second. So how many reads or writes that a disk or a storage system can accommodate in a second? Using the car analogy, it's the revolutions per second that the engine can generate given its default wheel size. Now certain media types are better at delivering high IOPS versus other media types and certain media types are better at delivering high throughput versus other media types. If you use network storage versus local storage, the network can also impact how many IOPS can be delivered. Higher latency between a device that uses network storage and the storage itself can massively impact how many operations you can do in a given second.
Now throughput is the rate at which a storage system can transfer data to or from a particular piece of storage, either a physical disk or a volume. Generally this is expressed in megabytes per second, and it's related to the IO block size and the IOPS, but it can have a limit of its own. If you have a storage system which stores data using 16 kilobyte block sizes and it can deliver 100 IOPS at that block size, then it can deliver a throughput of 1.6 megabytes per second. If your application only stores data in 4 kilobyte chunks and the 100 IOPS is a maximum, then you can only achieve 400 kilobytes per second of throughput.
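As a quick aside that isn't part of the original lesson, here's a minimal Python sketch of that relationship between block size, IOPS and throughput; the function name is just an illustration, and it uses decimal megabytes to match the figures above.

```python
# Minimal sketch: throughput = block size x IOPS.
# Uses decimal megabytes (1000 KB per MB) to match the lesson's figures.

def throughput_mb_per_s(block_size_kb: float, iops: float) -> float:
    """Theoretical throughput in megabytes per second."""
    return (block_size_kb * iops) / 1000

print(throughput_mb_per_s(16, 100))  # 16 KB blocks x 100 IOPS -> 1.6 MB/s
print(throughput_mb_per_s(4, 100))   # 4 KB blocks x 100 IOPS  -> 0.4 MB/s (400 KB/s)
```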
Achieving maximum throughput relies on using the right block size for that storage and then maximizing the number of IOPS you push into the storage system. All of these things are related: if you want to maximize throughput, you need to use the right block size and then maximize the IOPS, and if any of the three is limited, it can impact the other two. With the example on screen, if you changed the 16 kilobyte block size to one megabyte, it might seem logical that you could now achieve 100 megabytes per second, so one megabyte times 100 IOPS. But that's not always how it works. A system might have a throughput cap, for example, or as you increase the block size, the IOPS you can achieve might decrease.
As we talk about the different AWS types of storage, you'll become much more familiar with all of these different values and how they relate to each other. So you'll start to understand the maximum IOPS and the maximum throughput levels that different types of storage in AWS can deliver. And you might face exam questions where you need to answer what type of storage you will pick for a given level of performance demands. So it's really important as we go through the next few lessons that you pay attention to these key levels that I'll highlight.
It might be, for example, that a certain type of storage can only achieve 1000 IOPS or 64000 IOPS. Or it might be that certain types of storage cap at certain levels of throughput. And you need to know those values for the exam so that you can know when to use a certain type of storage.
Now, this is a lot of theory and I'm talking in the abstract and I'm mindful that I don't want to make this boring and it probably won't sink in and you won't start to understand it until we focus on some AWS specifics. So I am going to end this lesson here. I wanted to give you the foundational understanding, but over the next few lessons, you'll start to be exposed to the different types of storage available in AWS and you will start to paint a picture of when to pick particular types of storage versus others.
So with that being said, that's everything I wanted to cover. I know this has been abstract, but it will be useful if you do the rest of the lessons in this section. I promise you this is going to be really valuable for the exam. So thanks for watching. Go ahead and complete the video. When you're ready, you can join me in the next.
-
-
learn.cantrill.io
-
Welcome back—this is part two of this lesson, and we're going to continue immediately from the end of part one, so let's get started.
Now, this is an overview of all of the different categories of instances, and then for each category, the most popular or current generation types that are available; I created this with the hope that it will help you retain this information.
This is the type of thing that I would generally print out or keep an electronic copy of and refer to constantly as we go through the course—by doing so, whenever we talk about a particular size, type and generation of instance, if you refer to the details in the notes column, you'll be able to start making a mental association between the type and the additional features you get.
So, for example, if we look at the general purpose category, we've got three main entries in that category: we've got the A1 and M6G types, and these are a specific type of instance that are based on ARM processors—so the A1 uses the AWS-designed Graviton ARM processor, and the M6G uses the generation 2, so Graviton 2 ARM-based processor.
And using ARM-based processors, as long as you've got operating systems and applications that can run under the architecture, they can be very efficient—so you can use smaller instances with lower cost and achieve really great levels of performance.
The T3 and T3A instance types are burstable instances, so the assumption with those types of instances is that your normal CPU load will be fairly low, and you have an allocation of burst credits that allows you to burst up to higher levels occasionally but then return to that normally low CPU level.
So this type of instance—T3 and T3A—are really good for machines which have low normal loads with occasional bursts, and they're a lot cheaper than the other types of general purpose instances.
Then we've got M5, M5A, and M5N—so M5 is your starting point, M5A uses the AMD architecture whereas normal M5s just use Intel, and these are your steady-state general instances.
So if you don't have a burst requirement and you're running a certain type of application server which requires consistent steady-state CPU, then you might use the M5 type—maybe a heavily used Exchange email server that runs normally at 60% CPU utilization might be a good candidate for M5.
But if you've got a domain controller or an email relay server that normally runs maybe at 2%, 3% with occasional bursts up to 20%, 30%, or 40%, then you might want to run a T-type instance.
Now, I'm not going to go through all of these in detail, but we've got the compute optimized category with the C5 and C5N, and these are aimed at media encoding, scientific modeling, gaming servers and general machine learning.
For memory optimized, we start off with R5 and R5A; if you want to run really large in-memory applications, you've got the X1 and the X1E; if you want the highest memory of any AWS instances, you've got the high memory series; and you've got the Z1D, which comes with large memory and NVMe storage.
Then, Accelerated Computing—these are the ones that come with these additional capabilities, so the P3 type and G4 type come with different types of GPUs: the P type is great for parallel processing and machine learning, while the G type is kind of okay for machine learning and much better for graphics-intensive requirements.
You've got the F1 type, which comes with field programmable gate arrays, which is great for genomics, financial analysis, and big data—anything where you want to program the hardware to do specific tasks.
You've got the Inf1 type, which is relatively new, custom-designed for machine learning—so recommendation forecasting, analysis, voice conversation, anything machine learning-related, look at using that type.
And then, storage-optimized instances—these come with high-speed local storage, and depending on the type you pick, you can get high throughput or maximum I/O or somewhere in between.
So, keep this somewhere safe, print it out, keep it electronically, and as we go through the course and use the different types of instances, refer to this and start making the mental association between what a category is, what instance types are in that category, and then what benefits they provide.
Now again, don't worry about memorizing all of this for the exam—you don't need it—I'll draw out anything specific that you need as we go through the course, but just try to get a feel for which letters are in which categories.
If that's the minimum that you can do—if I can give you a letter like the T type, or the C type, or the R type—and you can try and understand the mental association with which category that goes into, that will be a great step.
And there are ways we can do this—we can make these associations—so C stands for compute, R stands for RAM (which is a way of describing memory), we've got I which stands for I/O, D which stands for dense storage, G which stands for GPU, P which stands for parallel processing; there are lots of different mind tricks and mental associations that we can use, and as we go through the course, I'll try and help you with that.
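As a small study aid, here's one way to capture those associations in code; this mapping just reflects the mnemonics above and the categories covered in this lesson, it isn't an official AWS reference.

```python
# A simple study aid mapping instance family letters to their broad category.
# These associations follow the mnemonics in the lesson; they're a memory aid,
# not an official AWS mapping.
FAMILY_CATEGORY = {
    "T": "general purpose (burstable)",
    "M": "general purpose (steady state)",
    "C": "compute optimized",        # C for Compute
    "R": "memory optimized",         # R for RAM
    "X": "memory optimized (large in-memory)",
    "I": "storage optimized",        # I for I/O
    "D": "storage optimized",        # D for dense storage
    "G": "accelerated computing",    # G for GPU
    "P": "accelerated computing",    # P for parallel processing
    "F": "accelerated computing",    # F for FPGA
}

print(FAMILY_CATEGORY["C"])  # compute optimized
```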
But as a minimum, either print this out or store it somewhere safe and refer to it as we go through the course.
The key thing to understand, though, is that picking an instance type is specific to a particular type of computing scenario—so if you've got an application that requires maximum CPU, look at compute optimized; if you need memory, look at memory optimized; if you need a specific type of acceleration, look at accelerated computing; otherwise, start off in the general purpose instance types and only move out from there when you've got a particular requirement.
Now before we finish up, I did want to demonstrate two really useful sites that I refer to constantly—I'll include links to both of these in the lesson text.
The first one is the Amazon documentation site for Amazon EC2 instance types—this gives you a full overview of all the different categories of EC2 instances.
You can look in a category, a particular family and generation of instance—so T3—and then in there you can see the use cases that this is suited to, any particular features, and then a list of each instance size and exactly what allocation of resources that you get and then any particular notes that you need to be aware of.
So this is definitely something you should refer to constantly, especially if you're selecting instances to use for production usage.
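If you'd rather pull those same specifications programmatically, here's a hedged boto3 sketch; it assumes you have AWS credentials and a default region configured, and the instance type queried is just an example.

```python
# Sketch: query instance type details programmatically with boto3.
# Assumes AWS credentials and a default region are already configured.
import boto3

ec2 = boto3.client("ec2")

response = ec2.describe_instance_types(InstanceTypes=["t3.large"])
for itype in response["InstanceTypes"]:
    print(itype["InstanceType"])
    print("  vCPUs:     ", itype["VCpuInfo"]["DefaultVCpus"])
    print("  Memory MiB:", itype["MemoryInfo"]["SizeInMiB"])
    print("  Network:   ", itype["NetworkInfo"]["NetworkPerformance"])
```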
This other website is something similar—it’s EC2instances.info—and it provides a really great sortable list which can be filtered and adjusted with different attributes and columns, which give you an overview of exactly what each instance provides.
So you can either search for a particular type of instance—maybe a T3—and then see all the different sizes and capabilities of T3; as well as that, you can see the different costings for those instance types—so Linux on-demand, Linux reserved, Windows on-demand, Windows reserved—and we’ll talk about what this reserved column is later in the course.
You can also click on columns and show different data for these different instance types, so if I scroll down, you can see which offer EBS optimization, you can see which operating systems these different instances are compatible with, and you've got a lot of options to manipulate this data.
I find this to be one of the most useful third-party sites—I always refer back to this when I’m doing any consultancy—so this is a really great site.
And again, it will go into the lesson text, so definitely as you’re going through the course, experiment and have a play around with this data, and just start to get familiar with the different capabilities of the different types of EC2 instances.
With that being said, that’s everything I wanted to cover in this lesson—you’ve done really well, and there’s been a lot of theory, but it will come in handy in the exam and real-world usage.
So go ahead, complete this video, and when you’re ready, you can join me in the next.
-
-
learn.cantrill.io
-
Welcome back. In this lesson, I'm going to talk about the various different types of EC2 instances. I've described an EC2 instance before as an operating system plus an allocation of resources. Well, by selecting an instance type and size, you have granular control over that resource configuration, and picking appropriate resource amounts and instance capabilities can mean the difference between a well-performing system and one which causes a bad customer experience.
Don't expect this lesson to give you all the answers, though; understanding instance types is something which will guide your decision-making process. Given the same situation, two AWS people might select two different instance types for the same implementation. The key takeaway from this lesson is that you shouldn't make any bad decisions, and that you have an awareness of the strengths and weaknesses of the different types of instances.
Now, I've seen this occasionally feature on the exam in a form where you're presented with a performance problem and one answer is to change the instance type, so as a minimum with this lesson, I'd like you to be able to answer that type of question. So, know for example whether a C type instance is better in a certain situation than an M type instance. If that's what we want to achieve, we've got a lot to get through, so let's get started.
At a really high level, when you choose an EC2 instance type, you're doing so to influence a few different things. First, logically, the raw amount of resources that you get: virtual CPU, memory, local storage capacity and the type of that storage. But beyond the raw amounts, it's also the ratios—some types of instances give you more of one and less of another. Instance types suited to compute applications, for instance, might give you more CPU and less memory for a given dollar spend, and an instance designed for in-memory caching might be the reverse—it prioritizes memory and gives you lots of it for every dollar that you spend.
Picking instance types and sizes, of course, influences the raw amount that you pay per minute, so you need to keep that in mind. I'm going to demonstrate a number of tools that will help you visualize how much something's going to cost, as well as what features you get with it, so look at that at the end of the lesson.
The instance type also influences the amount of network bandwidth for storage and data networking capability that you get, so this is really important. When we move on to talking about elastic block store, for example, that's a network-based storage product in AWS, and so for certain situations, you might provision volumes with a really high level of performance, but if you don't select an instance appropriately and pick something that doesn't provide enough storage network bandwidth, then the instance itself will be the limiting factor.
So, you need to make sure you're aware of the different levels of performance that you'll get from different instances. Picking an instance type also influences the architecture of the hardware that the instance is run on, and potentially the vendor, so you might be choosing between an ARM architecture and an x86 architecture, and you might be picking an instance type that provides Intel-based CPUs or AMD CPUs. Instance type selection can influence, in a very nuanced and granular way, exactly what hardware you get access to.
Picking an appropriate type of instance also influences any additional features and capabilities that you get with that instance, and this might be things such as GPUs for graphics processing or FPGAs, which are field-programmable gate arrays—you can think of these as a special type of chip whose hardware you can program to perform exactly how you want, a super customizable piece of compute hardware. Certain types of instances come with these additional capabilities, so an instance might come with an allocation of GPUs or a certain capacity of FPGAs, and some instance types come with neither—you need to learn which to pick for a given type of workload.
EC2 instances are grouped into five main categories which help you select an instance type based on a certain type of workload. The first is general purpose, and this is and always should be your starting point; instances which fall into this category are designed for your default steady-state workloads, and they've got fairly even resource ratios, so resources are generally assigned in an appropriate way.
So, for a given type of workload, you get an appropriate amount of CPU and a certain amount of memory which matches that amount of CPU, so instances in the general purpose category should be used as your default and you only move away from that if you've got a specific workload requirement.
We've also got the compute optimized category; instances in this category are designed for media processing, high-performance computing, scientific modeling, gaming and machine learning. They provide access to the latest high-performance CPUs, and they generally offer a ratio where more CPU than memory is offered for a given price point.
The memory optimized category is logically the inverse of this, so offering large memory allocations for a given dollar or CPU amount; this category is ideal for applications which need to work with large in-memory data sets, maybe in-memory caching or some other specific types of database workloads.
The accelerated computing category is where these additional capabilities come into play, such as dedicated GPUs for high-scale parallel processing and modeling, or the custom programmable hardware, such as FPGAs; now, these are niche, but if you're in one of the situations where you need them, then you know you need them, so when you've got specific niche requirements, the instance type you need to select is often in the accelerated computing category.
Finally, there's the storage optimized category, and instances in this category generally provide large amounts of superfast local storage, either designed for high sequential transfer rates or to provide massive amounts of IO operations per second, and this category is great for applications with serious demands on sequential and random IO, so things like data warehousing, Elasticsearch, and certain types of analytic workloads.
Now, one of the most confusing things about EC2 is the naming scheme of the instance types—this is an example of a type of EC2 instance; while it might initially look frustrating, once you understand it, it's not that difficult to understand.
So, while our friend Bob is a bit frustrated at the difficulty of understanding exactly what this means, by the end of this part of the lesson you will understand how to decode EC2 instance types. The whole thing, end to end—R5dn.8xlarge—is known as the instance type; the whole thing is the instance type.
If a member of your operations team asks you what instance you need, or what instance type you need, and you use the full instance type, you unambiguously communicate exactly what you need. It's a mouthful to say R5dn.8xlarge, but it's precise, and we like precision, so when in doubt, always give the full instance type as an answer to any question.
The letter at the start is the instance family—now, there are lots of examples of this: the T family, the M family, the I family, and the R family; there's lots more, but each of these are designed for a specific type or types of computing. Nobody expects you to remember all the details of all of these different families, but if you can start to try to remember the important ones—I'll mention these as we go through the course—then it will put you in a great position in the exam.
If you do have any questions where you need to identify if an instance type is used appropriately or not, as we go through the course and I give demonstrations which might be using different instance families, I will be giving you an overview of their strengths and their weaknesses.
The next part is the generation, so the number 5 in this case is the generation; AWS iterate often. So, if you see an instance type starting with R5 or C4, as two examples, the C or the R, as you now know, is the instance family, and the number is the generation—so the C4, for example, is the fourth generation of the C family of instance.
That might be the current generation, but then AWS come along and replace it with the C5, which is generation five, the fifth generation, which might bring with it better hardware and better price to performance. Generally, with AWS, always select the most recent generation—it almost always provides the best price to performance option.
The only real reasons not to immediately use the latest generation are if it's not available in your particular region, or if your business has fairly rigorous testing processes that need to be completed before you get approval to use a particular new type of instance.
So, that's the R part covered, which is the family, and the 5 part covered, which is the generation. Now, across to the other side, we've got the size—in this case, 8xlarge, or 8 extra large—and this is the instance size.
Within a family and a generation, there are always multiple sizes of that family and generation, which determine how much memory and how much CPU the instance is allocated with. Now, there's a logical and often linear relationship between these sizes, so depending on the family and generation, the starting point can be anywhere as small as the nano.
Next to the nano, there's micro, then small, then medium, large, extra large, 2x large, 4x large, 8x large, and so on. Now, keep in mind, there's often a price premium towards the higher end, so it's often better to scale systems by using a larger number of smaller instance sizes—but more on that later when we talk about high availability and scaling.
Just be aware, as far as this section of the course goes, that for a given instance family and generation, you're able to select from multiple different sizes.
Now, the bit which is in the middle can vary—there might be no letters between the generation and size, but there's often a collection of letters which denote additional capabilities. Common examples include a lowercase a, which signifies AMD CPUs, a lowercase d, which signifies NVMe storage, a lowercase n, which signifies network optimized, and a lowercase e, for extra capacity, which could be RAM or storage.
So, these additional capabilities are not things that you need to memorize, but as you get experience using AWS, you should definitely try to mentally associate them in your mind with what extra capabilities they provide—because time is limited in an exam, the more that you can commit to memory and know instinctively, the better you'll be.
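To make the naming breakdown a little more tangible, here's a rough Python sketch that splits an instance type string into the parts we've just discussed; the regular expression and the capability letters are simplifications based on this lesson, not an exhaustive or official parser.

```python
# Rough sketch: decode an EC2 instance type string into its parts.
# Follows the naming breakdown from the lesson and is illustrative only;
# it doesn't cover every family or capability letter AWS uses.
import re

CAPABILITIES = {"a": "AMD CPU", "d": "NVMe storage", "n": "network optimized", "e": "extra capacity"}

def decode_instance_type(instance_type: str) -> dict:
    prefix, size = instance_type.split(".")             # e.g. "r5dn", "8xlarge"
    family, generation, extras = re.match(r"([a-z]+)(\d+)([a-z]*)", prefix).groups()
    return {
        "family": family.upper(),
        "generation": int(generation),
        "capabilities": [CAPABILITIES.get(c, c) for c in extras],
        "size": size,
    }

print(decode_instance_type("r5dn.8xlarge"))
# {'family': 'R', 'generation': 5, 'capabilities': ['NVMe storage', 'network optimized'], 'size': '8xlarge'}
```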
Okay, so this is the end of part one of this lesson. It was getting a little bit on the long side, and so I wanted to add a break. It's an opportunity just to take a rest or grab a coffee—part two will be continuing immediately from the end of part one, so go ahead, complete the video, and when you're ready, join me in part two.
-
-
learn.cantrill.io
-
Welcome back. In this lesson, now that we've covered virtualization at a high level, I want to focus on the architecture of the EC2 product in more detail. EC2 is one of the services you'll use most often in AWS, and it's one which features on a lot of exam questions, so let's get started.
First, let's cover some key, high-level architectural points about EC2. EC2 instances are virtual machines, so that means an operating system plus an allocation of resources such as virtual CPU, memory, potentially some local storage, maybe some network storage, and access to other hardware such as networking and graphics processing units. EC2 instances run on EC2 hosts, and these are physical server hardware which AWS manages. These hosts are either shared hosts or dedicated hosts.
Shared hosts are hosts which are shared across different AWS customers, so you don't get any ownership of the hardware, and you pay for the individual instances based on how long you run them for and what resources they have allocated. It's important to understand, though, that every customer using a shared host is isolated from every other; there's no visibility of it being shared, and there's no interaction between different customers, even if they're using the same shared host. Shared hosts are the default.
With dedicated hosts, you're paying for the entire host, not the instances which run on it. It's yours, it's dedicated to your account, and you don't have to share it with any other customers. So if you pay for a dedicated host, you pay for that entire host, you don't pay for any instances running on it, and you don't share it with other AWS customers.
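If you ever do need a dedicated host, the allocation looks roughly like this hedged boto3 sketch; the availability zone and instance type are placeholder values.

```python
# Sketch: allocate a dedicated host with boto3 (illustrative values only).
# Assumes AWS credentials and a default region are already configured.
import boto3

ec2 = boto3.client("ec2")

response = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",   # dedicated hosts live in a single AZ
    InstanceType="m5.large",         # the instance type this host will run
    Quantity=1,
    AutoPlacement="off",             # require explicit host targeting at launch
)
print(response["HostIds"])
```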
EC2 is an availability zone resilient service. The reason for this is that hosts themselves run inside a single availability zone, so if that availability zone fails, the hosts inside that availability zone could fail, and any instances running on any hosts that fail will themselves fail. So as a solutions architect, you have to assume if an AZ fails, then at least some and probably all of the instances that are running inside that availability zone will also fail or be heavily impacted.
Now let's look at how this looks visually. This is a simplification of the US-East-1 region: I've only got two AZs represented, AZ-A and AZ-B, and in AZ-A I've represented two subnets, subnet A and subnet B. Now inside each of these availability zones is an EC2 host. These EC2 hosts run within a single AZ—I'm going to keep repeating that because it's critical for the exam and for when you're thinking about EC2 in the exam.
Keep thinking about it being an AZ resilient service, if you see EC2 mentioned in an exam, see if you can locate the availability zone details because that might factor into the correct answer. Now EC2 hosts have some local hardware, logically CPU and memory, which you should be aware of, but also they have some local storage called the instance store. The instance store is temporary, if an instance is running on a particular host, depending on the type of the instance, it might be able to utilize this instance store, but if the instance moves off this host to another one, then that storage is lost.
And they also have two types of networking, storage networking and data networking. When instances are provisioned into a specific subnet within a VPC, what's actually happening is that a primary elastic network interface is provisioned in a subnet, which maps to the physical hardware on the EC2 host. Remember, subnets are also in one specific availability zone. Instances can have multiple network interfaces, even in different subnets, as long as they're in the same availability zone. Everything about EC2 is focused around this architecture, the fact that it runs in one specific availability zone.
Now EC2 can make use of remote storage: an EC2 host can connect to the Elastic Block Store, which is known as EBS. The Elastic Block Store service also runs inside a specific availability zone, so the service running inside availability zone A is separate from the one running inside availability zone B, and you can't access them cross-zone. EBS lets you allocate volumes, and volumes are portions of persistent storage which can be allocated to instances in the same availability zone, so again, it's another area where the availability zone matters.
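To illustrate that same-AZ relationship in code, here's a hedged boto3 sketch; the IDs, size and device name are placeholders.

```python
# Sketch: create an EBS volume and attach it to an instance in the same AZ.
# IDs and values are placeholders; a volume can only be attached to an
# instance located in the same availability zone it was created in.
import boto3

ec2 = boto3.client("ec2")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # the volume is locked to this AZ
    Size=10,                        # GiB
    VolumeType="gp3",
)

# Wait until the volume is available before attaching it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # must be running in us-east-1a
    Device="/dev/sdf",
)
```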
What I'm trying to do by repeating availability zone over and over again is to paint a picture of a service which is very reliant on the availability zone it's running in. The host is in an availability zone, the network is per availability zone, the persistent storage is per availability zone, and if an availability zone in AWS experiences major issues, it impacts all of those things.
Now an instance runs on a specific host, and if you restart the instance, it will stay on that host. Instances stay on a host until one of two things happens: firstly, the host fails or is taken down for maintenance for some reason by AWS; or secondly, the instance is stopped and then started—and that's different from just restarting, so I'm focusing on an instance being stopped and then being started, not just a restart. If either of those things happens, then the instance will be relocated to another host, but that host will also be in the same availability zone.
Instances cannot natively move between availability zones. Everything about them, their hardware, networking and storage is locked inside one specific availability zone. Now there are ways you can do a migration, but it essentially means taking a copy of an instance and creating a brand new one in a different availability zone, and I'll be covering that later in this section where I talk about snapshots and AMIs.
What you can never do is connect network interfaces or EBS storage located in one availability zone to an EC2 instance located in another. EC2 and EBS are both availability zone services, they're isolated, you cannot cross AZs with instances or with EBS volumes. Now instances running on an EC2 host share the resources of that host. And instances of different sizes can share a host, but generally instances of the same type and generation will occupy the same host.
And I'll be talking in much more detail about instance types, sizes and generations in a lesson that's coming up very soon. But when you think about an EC2 host, think of it as being from a certain year, with a certain class of processor, a certain type of memory and a certain type and configuration of storage. Instances are also created as different generations, different versions that use specific types of CPU, memory and storage, so it's logical that if you provision two different types of instances, they may well end up on two different types of hosts.
So a host generally has lots of different instances from different customers of the same type, but different sizes. So before we finish up this lesson, I want to answer a question. That question is what's EC2 good for? So what types of situations might you use EC2 for? And this is equally valuable when you're evaluating a technical architecture while you're answering questions in the exam.
So first, EC2 is great when you've got a traditional OS and application compute need. If you've got an application that needs to run on a certain operating system, with a certain runtime and certain configuration—maybe your internal technical staff are used to that configuration, or maybe your vendor has a certain set of support requirements—EC2 is a perfect use case for this type of scenario.
And it's also great for any long running compute needs. There are lots of other services inside AWS that provide compute services, but many of these have got runtime limits, so you can't leave these things running consistently for one year or two years. With EC2, it's designed for persistent, long running compute requirements. So if you have an application that runs constantly 24/7, 365, and needs to be running on a normal operating system, Linux or Windows, then EC2 is the default and obvious choice for this.
If you have any applications which are server-style applications, so traditional applications that expect to be running in an operating system, waiting for incoming connections, then again, EC2 is a perfect service for this. And it's perfect for any applications or services that have burst requirements or steady-state requirements. There are different types of EC2 instances which are suitable for low levels of normal load with occasional bursts, as well as for steady-state load.
So again, if your application needs an operating system, whether it has bursty needs or a consistent steady-state load, then EC2 should be the first thing that you review. EC2 is also great for monolithic application stacks, so if your monolithic application requires certain components—a stack, maybe a database, maybe some middleware, maybe other runtime-based components—and especially if it needs to be running on a traditional operating system, EC2 should be the first thing that you look at.
And EC2 is also ideally suited for migrating application workloads, so application workloads, which expect a traditional virtual machine or server style environment, or if you're performing disaster recovery. So if you have existing traditional systems which run on virtual servers, and you want to provision a disaster recovery environment, then EC2 is perfect for that.
In general, EC2 tends to be the default compute service within AWS. There are lots of niche requirements that you might have, and if you do have those, there are other compute services such as the elastic container service or Lambda. But generally, if you've got traditional style workloads, or you're looking for something that's consistent, or if it requires an operating system, or if it's monolithic, or if you migrated into AWS, then EC2 is a great default first option.
Now in this section of the course, I'm covering the basic architectural components of EC2, so I'm gonna be introducing the basics and let you get some exposure to it, and I'm gonna be teaching you all the things that you'll need for the exam.
-
-
learn.cantrill.io
-
Welcome back, and in this first lesson of the EC2 section of the course, I want to cover the basics of virtualization as briefly as possible. EC2 provides virtualization as a service; it's an infrastructure-as-a-service, or IaaS, product. To understand all the value it provides and why some of the features work the way that they do, understanding the fundamentals of virtualization is essential. So that's what this lesson aims to do.
Now, I want to be super clear about one thing. This is an introduction level lesson. There's a lot more to virtualization than I can talk about in this brief lesson. This lesson is just enough to get you started, but I will include a lot of links in the lesson description if you want to learn more. So let's get started.
We do have a fair amount of theory to get through, but I promise when it comes to understanding how EC2 actually works, this lesson will be really beneficial. Virtualization is the process of running more than one operating system on a piece of physical hardware, a server. Before virtualization, the architecture looked something like this. A server had a collection of physical resources, so CPU and memory, network cards and maybe other logical devices such as storage. And on top of this runs a special piece of software known as an operating system.
That operating system runs with a special level of access to the hardware. It runs in privileged mode—or more specifically, a small part of the operating system, known as the kernel, runs in privileged mode. The kernel is the only part of the operating system, the only piece of software on the server, that's able to directly interact with the hardware. Some of the operating system doesn't need this privileged level of access, but some of it does. Now, the operating system can allow other software to run, such as applications, but these run in user mode, or unprivileged mode. They cannot directly interact with the hardware; they have to go through the operating system.
So if Bob or Julie are attempting to do something with an application, which needs to use the system hardware, that application needs to go through the operating system. It needs to make a system call. If anything but the operating system attempts to make a privileged call, so tries to interact with the hardware directly, the system will detect it and cause a system-wide error, generally crashing the whole system or at minimum the application. This is how it works without virtualization.
Virtualization is how this was changed into this: a single piece of hardware running multiple operating systems. Each operating system is separate, and each runs its own applications. But there's a problem: a CPU, at least at this point in time, could only have one thing running as privileged. A privileged process, remember, has direct access to the hardware. And all of these operating systems, if they're running in their unmodified state, expect to be running on their own in a privileged state. They contain privileged instructions, and so trying to run three or four or more different operating systems in this way will cause system crashes.
Virtualization was created as a solution to this problem, allowing multiple different privileged applications to run on the same hardware. But initially, virtualization was really inefficient, because the hardware wasn't aware of it. Virtualization had to be done in software, and it was done in one of two ways. The first type was known as emulated virtualization or software virtualization. With this method, a host operating system still ran on the hardware and included additional capability known as a hypervisor. The software ran in privileged mode, and so it had full access to the hardware on the host server.
Now, around the multiple other operating systems, which we'll now refer to as guest operating systems, were wrapped a container of sorts called a virtual machine. Each virtual machine was an unmodified operating system, such as Windows or Linux, with a virtual allocation of resources such as CPU, memory and local disk space. Virtual machines also had devices mapped into them, such as network cards, graphics cards and other local devices such as storage. The guest operating systems believed these to be real. They had drivers installed, just like physical devices, but they weren't real hardware. They were all emulated, fake information provided by the hypervisor to make the guest operating systems believe that they were real.
The crucial thing to understand about emulated virtualization is that the guest operating systems still believed they were running on real hardware, and so they still attempted to make privileged calls. They tried to take control of the CPU, and they tried to directly read and write to what they thought of as their memory and their disk, which weren't real—they were just areas of physical memory and disk that had been allocated to them by the hypervisor. Without special arrangements, the system would at best crash, and at worst, all of the guests would be overwriting each other's memory and disk areas.
So the hypervisor, it performs a process known as binary translation. Any privileged operations which the guests attempt to make, they're intercepted and translated on the fly in software by the hypervisor. Now, the binary translation in software is the key part of this. It means that the guest operating systems need no modification, but it's really, really slow. It can actually halve the speed of the guest operating systems or even worse. Emulated virtualization was a cool set of features for its time, but it never achieved widespread adoption for demanding workloads because of this performance penalty.
But there was another way that virtualization was initially handled, and this is called para-virtualization. With para-virtualization, the guest operating systems are still running in the same virtual machine containers with virtual resources allocated to them, but instead of the slow binary translation which is done by the hypervisor, another approach is used. Para-virtualization only works on a small subset of operating systems, operating systems which can be modified. Because with para-virtualization, there are areas of the guest operating systems which attempt to make privileged calls, and these are modified. They're modified to make them user calls, but instead of directly calling on the hardware, they're calls to the hypervisor called hypercalls.
So areas of the operating system which would traditionally make privileged calls directly to the hardware are modified: the source code of the operating system is modified to call the hypervisor rather than the hardware. The operating systems now need to be modified specifically for the particular hypervisor that's in use; it's no longer generic virtualization—the operating systems are modified for the particular vendor performing the para-virtualization. By modifying the operating system this way, and by using para-virtual drivers in the operating system for network cards and storage, the operating system becomes almost virtualization aware, and this massively improved performance. But it was still a set of software processes designed to trick the operating system and/or the hardware into believing that nothing had changed.
The major improvement in virtualization came when the physical hardware started to become virtualization aware. This allows for hardware virtualization, also known as hardware assisted virtualization. With hardware assisted virtualization, hardware itself has become virtualization aware. The CPU contains specific instructions and capabilities so that the hypervisor can directly control and configure this support, so the CPU itself is aware that it's performing virtualization. Essentially, the CPU knows that virtualization exists.
What this means is that when guest operating systems attempt to run any privileged instructions, they're trapped by the CPU, which knows to expect them from these guest operating systems, so the system as a whole doesn't halt. But these instructions can't be executed as is because the guest operating system still thinks that it's running directly on the hardware, and so they're redirected to the hypervisor by the hardware. The hypervisor handles how these are executed. And this means very little performance degradation over running the operating system directly on the hardware.
The problem, though, is that while this method helps a lot, what actually matters for a virtual machine tends to be the input/output operations, so network transfer and disk I/O. The virtual machines have what they think is physical hardware, for example a network card, but these cards are just logical devices using a driver which actually connects back to a single physical piece of hardware sitting in the host—the hardware everything is running on.
Unless you have a physical network card per virtual machine, there's always going to be some level of software getting in the way, and when you're performing highly transactional activities such as network I/O or disk I/O, this really impacts performance, and it consumes a lot of CPU cycles on the host.
The final iteration that I want to talk about is where the hardware devices themselves become virtualization aware, such as network cards. This process is called S-R-I-O-V, single root I/O virtualization. Now, I could talk for hours about exactly what it does and how it works, because it's a very complex and feature-rich set of standards. But at a very high level, it allows a network card, or any other add-on card, to present itself not as one single card, but as several mini-cards.
Because this is supported in hardware, these are fully unique cards as far as the hardware is concerned, and they are directly presented to the guest operating systems as real cards dedicated for their use. This means no translation has to happen in the hypervisor; the guest operating system can use its card directly whenever it wants. The physical card which supports S-R-I-O-V handles this process end-to-end: it makes sure that when the guest operating systems use their logical mini network cards, they get access to the physical network connection when required.
In EC2, this feature is called enhanced networking, and it means that the network performance is massively improved. It means faster speeds. It means lower latency. And more importantly, it means consistent lower latency, even at high loads. It means less CPU usage for the host CPU, even when all of the guest operating systems are consuming high amounts of consistent I/O.
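If you want to check whether a specific instance has enhanced networking enabled, here's a hedged boto3 sketch; the instance ID is a placeholder, and enaSupport and sriovNetSupport are the attributes I'd expect the EC2 API to expose for this.

```python
# Sketch: check enhanced networking support on an instance (placeholder ID).
# enaSupport reports the newer ENA driver; sriovNetSupport reports the older
# Intel 82599 VF based enhanced networking.
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder

ena = ec2.describe_instance_attribute(InstanceId=instance_id, Attribute="enaSupport")
sriov = ec2.describe_instance_attribute(InstanceId=instance_id, Attribute="sriovNetSupport")

print("ENA enabled:", ena.get("EnaSupport", {}).get("Value"))
print("SR-IOV (82599 VF):", sriov.get("SriovNetSupport", {}).get("Value"))
```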
Many of the features that you'll see EC2 using are actually based on AWS implementing some of the more advanced virtualization techniques that have been developed across the industry. AWS do have their own hypervisor stack now called Nitro, and I'll be talking about that in much more detail in an upcoming lesson, because that's what enables a lot of the higher-end EC2 features.
But that's all the theory I wanted to cover. I just wanted to introduce virtualization at a high level and get you to the point where you understand what S-R-I-O-V is, because S-R-I-O-V is used for enhanced networking right now, but it's also a feature that can be used outside of just network cards. It can help hardware manufacturers design cards, which, whilst they're a physical single card, can be split up into logical cards that can be presented to guest operating systems. It essentially makes any hardware virtualization aware, and any of the advanced EC2 features that you'll come across within this course will be taking advantage of S-R-I-O-V.
At this point, though, we've completed all of the theory I wanted to cover, so go ahead and complete this lesson, and when you're ready, you can join me in the next.
-
-
learn.cantrill.io
-
Welcome back, and in this lesson I'm going to be going into a little bit more depth about DNS within a VPC and Route 53 DNS endpoints. This will be essential if you're involved with any complex hybrid network projects that involve DNS, so let's jump in and get started because we've got a lot to cover.
At the associate level you were introduced to how DNS functions within a VPC. How in every VPC there's an IP address that's reserved for the VPC DNS. And that's the VPC.2 or VPC+2 address. For the Animals for Life VPC which has a VPC side range of 10.16.0.0/16, this VPC+2 address would be 10.16.0.2. This is the address which all VPC based resources can use for DNS. Now additionally in every subnet the .2 or +2 address is also reserved. And this .2 address is now referred to as the Route 53 resolver. It's via this address that VPC resources can access Route 53 public hosted zones and any associated private hosted zones. So this address provides a full range of DNS services to any VPC based resources.
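As a tiny illustration of that +2 address, here's a Python sketch using the standard library's ipaddress module with the Animals for Life CIDR from this lesson.

```python
# Sketch: compute the VPC +2 (Route 53 resolver) address for a CIDR range.
import ipaddress

vpc = ipaddress.ip_network("10.16.0.0/16")
print(vpc[2])  # 10.16.0.2 -> the reserved VPC DNS / Route 53 resolver address
```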
Now you can deploy your own DNS infrastructure within a VPC but by default Route 53 handles this functionality. Now historically Route 53 did have its limitations. The Route 53 resolver is only accessible from within the VPC. You can't access it over site to site VPNs or via Direct Connect. And this means that hybrid network integration is problematic both inbound and outbound.
DNS plays a huge part in how most applications work, and if you can't easily integrate your AWS and on-premises DNS infrastructures, then you will experience problems. At best this means significantly more admin and technical overhead. Ideally, what you want when dealing with hybrid networking is a joined-up DNS platform: you want your AWS resources to be able to discover and resolve on-premises services, and you want your on-premises services to work well with AWS products and services. At the same time, you don't always want DNS records for private applications to be available publicly, because that's often how systems are enumerated before a network attack.
So the problem that we have is how to effectively integrate the often separate DNS systems that exist inside AWS and on-premises. Now let's review this problem architecturally before we look at some solutions.
So the main historical problem with DNS in a hybrid AWS and on-premises environment is that it's been disjointed. On the AWS side we have instances inside a VPC, and these use the Route 53 resolver, so the .2 address in every VPC, to perform their DNS resolution. This handles any Route 53 based public zones and private zones, and for anything else the queries are forwarded to the public DNS platform. The problem historically is that the Route 53 resolver had no way to forward queries for any on-premises DNS zones to on-premises DNS servers. There was no conditional forwarding functionality, which meant that the AWS side resources had no visibility of the internal on-premises resources from a DNS perspective.
On the on-premises side the DNS resolver would generally handle any local DNS zones and it too would forward anything that it didn't know about to the public DNS system. The problem with the on-premises side is that as I mentioned earlier the Route 53 resolver isn't accessible outside the VPC and so on-premises resources couldn't access it and because of this using this architecture they can't resolve any non-public DNS records within AWS and so their ability to resolve and discover VPC based private services is impacted. Now this has the overall effect of creating a DNS boundary between the two systems. AWS on the left, on-premises on the right, neither capable of doing private DNS resolution between them.
And this was the problem originally with hybrid DNS involving AWS and on-premises environments and many solutions were designed to address this problem. The most common was the idea of a VPC based DNS forwarder running on EC2. And let's look at how that changes the architecture.
We start with a similar architecture, AWS on the left, on-premises on the right. The AWS Route 53 resolver will handle any Route 53 private or public hosted zones and it will otherwise pass any unknown queries out to the global public DNS. Now the way that we resolve the split DNS architecture that I just spoke about is by adding a DNS forwarder that's running within the VPC on the left. Now this is configured using DHCP option sets inside the VPC. So this DNS forwarder server is set as the DNS server for any resources inside the VPC.
When the forwarder receives any queries, it identifies if they're for corporate DNS zones and if so, it forwards them to the on-premises resolver. Otherwise the default is to forward them onto the Route 53 resolver where they're dealt with in the normal way. The effect of this is that the AWS resources can still use the functionality provided by the AWS Route 53 resolver but can also fully integrate with the on-premises DNS.
So AWS resources as well as being able to resolve any private hosted zones or any public hosted zones in Route 53 can now also query any zones that are hosted internally on the corporate DNS infrastructure. Now within the corporate environment the on-premises resolver is used as the DNS server for all corporate devices and it can directly answer any queries for private and locally hosted DNS zones. For anything else it can forward the queries through to the DNS forwarder within the VPC which can then communicate with the Route 53 resolver because it's located inside the VPC. Essentially the forwarder is acting as an intermediary allowing the on-premises resolver to communicate with the Route 53 resolver.
Until the release of Route 53 endpoints this was one example of best practice architecture for hybrid DNS, and it's something you might still find implemented within your clients. Now, to understand why Route 53 endpoints provide a significantly better architecture, it's necessary to understand how things worked before Route 53 endpoints. So now that you know that, let's look at the theory, features and architecture of Route 53 endpoints; you're also going to have the opportunity to use Route 53 endpoints and all of their features within a demo lesson in this section of the course.
Route 53 endpoints are delivered as VPC interfaces, so ENIs, which are accessible over a VPN or a Direct Connect, meaning they're accessible from outside of the VPC they're located in, and they're tightly integrated with the Route 53 resolver, as I'll talk about in a second. Now, endpoints come in two different types: inbound endpoints and outbound endpoints.
Inbound endpoints are used by your on-premises DNS infrastructure to forward requests to, so they work just like the EC2 forwarder that we just stepped through, only they're delivered as a service by AWS. When you provision them, you select two different subnets and you get two different IP addresses, and your on-premises DNS infrastructure can be configured to forward any queries that are not for locally hosted DNS zones—so zones that are not stored within your on-premises environment—to these inbound endpoint IP addresses. So that handles your on-premises infrastructure accessing your AWS-based DNS without using an EC2-based forwarder.
Now the reverse of this are outbound endpoints and these are presented in a similar way, interfaces in multiple subnets but in the case of outbound endpoints these are used to contact your on-premises DNS. Now the way that this works is that you define a rule and you associate that with an outbound endpoint. Let's use an example of corp.animals4life.org which is a private zone hosted within the on-premises DNS infrastructure of the Animals for Life organization. We define a rule saying for any queries that are looking for records inside corp.animals4life.org use these outbound endpoints and send that query to your on-premises DNS infrastructure.
So when you're setting up these outbound endpoints you have to specify the details of your on-premises DNS infrastructure and then based on these rules when any queries are occurring for a particular DNS zone, for example corp.animals4life.org, these outbound endpoints are used and the queries are forwarded on to your on-premises DNS infrastructure. Now because these outbound endpoints have unique IP addresses you can whitelist them on-premises as needed so if you need these IP addresses to be able to bypass any filtering or corporate firewalls you can do that because you directly control their IP addresses when you're provisioning them within your VPC.
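To make that concrete, here's a hedged boto3 sketch of creating an outbound endpoint plus a forwarding rule for corp.animals4life.org; every ID, subnet and IP address here is a placeholder.

```python
# Sketch: outbound Route 53 Resolver endpoint plus a forwarding rule.
# All IDs, subnets and on-premises IPs are placeholders.
import boto3

r53r = boto3.client("route53resolver")

endpoint = r53r.create_resolver_endpoint(
    CreatorRequestId="a4l-outbound-1",
    Name="a4l-outbound",
    Direction="OUTBOUND",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    IpAddresses=[
        {"SubnetId": "subnet-aaaa1111"},
        {"SubnetId": "subnet-bbbb2222"},
    ],
)

rule = r53r.create_resolver_rule(
    CreatorRequestId="a4l-corp-rule-1",
    Name="corp-animals4life-org",
    RuleType="FORWARD",
    DomainName="corp.animals4life.org",
    TargetIps=[{"Ip": "192.168.10.10", "Port": 53}],  # on-premises DNS server
    ResolverEndpointId=endpoint["ResolverEndpoint"]["Id"],
)

# Associate the rule with the VPC so its resources use the forwarder.
r53r.associate_resolver_rule(
    ResolverRuleId=rule["ResolverRule"]["Id"],
    VPCId="vpc-0123456789abcdef0",
)
```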
So using a combination of inbound and outbound endpoints allows you to configure a hybrid DNS platform using AWS and on-premises environments. Now before we finish let's take a look at how this architecture looks visually.
We start with a similar architecture, AWS on the left with a VPC containing two subnets and the Route 53 resolver. On the right we've got the on-premises environment with two DNS servers and a collection of servers, client devices and some humans thrown into the mix. Between the two environments is a dedicated private connection in the form of a direct connect between the AWS environment and the animals4life on-premises data center.
Now the simple part of this architecture is that when VPC resources are performing queries and these queries are for any hosted zones that are not hosted in AWS or not on-premises these go out to public DNS in the normal way. The first step to implementing a hybrid DNS architecture using Route 53 endpoints starts at the AWS side. So we create two inbound Route 53 endpoints within the VPC and these are just network interfaces which are part of the Route 53 resolver and these are accessible from the on-premises network.
The on-premises DNS servers can be set to forward queries for any non-locally hosted DNS zones to these endpoints and this occurs over the direct connect and also over a site to site VPN for organizations who can't justify the investment of a direct connect. The inbound endpoints then allow access to the Route 53 resolver which means that the on-premises side can now fully communicate with AWS and perform DNS resolution for any AWS hosted zones.
Now we can also integrate in the other direction by creating Route 53 outbound endpoints. These are also interfaces within the VPC, and they're configured to point at the DNS servers which run within the on-premises environment. We can attach rules to these endpoints which configure forwarding for specific domains—in this example, corp.animals4life.org. This means that when a VPC based resource queries for any records within this domain, so when it matches one of the rules, the query is forwarded via the outbound endpoints, across the Direct Connect or VPN, and into the DNS servers within the on-premises environment. The AWS resources can now resolve the on-premises DNS zones, and the result is a fully integrated DNS environment which spans both AWS and on-premises environments.
Now Route 53 endpoints are delivered as a service. They're highly available and they scale automatically based on load. They can handle around 10,000 queries per second per endpoint so keep that in mind and plan your infrastructure deployment accordingly. But with that being said that's all of the theory that you need to be aware of relating to Route 53 endpoints. Go ahead complete the lesson and when you're ready I look forward to you joining me in the next lesson.
learn.cantrill.io
Welcome back and in this lesson I want to talk about a feature enhancement to site to site VPNs which is called amazingly enough accelerated site to site VPN. Now the name might give away what the feature does but for clarity it's a performance enhancement to the normal site to site VPN product which uses the AWS global network. The same global network that the global accelerator product uses to improve transit performance. So let's jump in and take a look at this architecture evolution and examine exactly what benefits accelerated site to site VPN provides.
Now just to quickly summarize, historically site to site VPNs have used a virtual private gateway known as VGW and this is attached to a VPC. They also use a customer gateway object which represents your customer router and between these two gateway objects a VPN connection is created and this allocates two resilient public space endpoints which are hosted in separate availability zones and this protects against availability zone failure. Now these endpoints are used to establish two IP sec tunnels between your customer gateway and the AWS VPN infrastructure.
The point I want to focus on in this lesson is that normally those IP sec tunnels transit data over the public internet. So between your business premises and the AWS network will be your ISP, maybe another ISP, some other networks and then the data reaches AWS. As an example right now if I attempt to connect to an AWS VPN endpoint in Australia there are four networks in between my current location and the AWS network. If I attempt to connect to a VPN in the US there are significantly more. The result of this can be lower levels of performance so lower speeds and higher levels of latency in addition to inconsistency with both of these metrics.
Now for larger companies one option is to run the VPN over a direct connect public virtual interface. Because the VPN endpoints are themselves public space AWS endpoints a public VIF can be used to reach them and this offers much better performance so more consistent speeds as well as improved and consistent latencies. But for many businesses this isn't an option because of the cost and this is where accelerated site to site VPN improves things.
Now before we review the improvements offered by accelerated site to site VPN let's look at how the original VPN architecture looked. On the left we have the animals for life VPC and on the right the animals for life business premises. Now on the left we have a virtual private gateway or VGW attached to the VPC and on the right we have a single customer router or customer gateway that's within the on premises environment and between both of those we have a pair of IP sec tunnels.
Now logically we view this as a single direct connection between the AWS VPC and the customer premises but physically the data flows through a fairly indirect route over the public internet, crossing many different networks between the source and destination, possibly even over a different route for the original traffic and the reply traffic. Both routes cross many different networks which introduces different levels of network performance and different amounts of performance variability, so by using the public internet as transit you open yourself to lots of different points which can impact the performance of the connection between AWS VPCs and your on-premises environments.
Now another way that you can potentially implement site-to-site VPNs is by using the transit gateway, and transit gateways, as you learned at the associate level, significantly simplify VPN and multi VPC architectures. With a transit gateway we still have the on-premises environment on the right but this time the dual tunnel VPN connection is between the customer gateway and a transit gateway using a VPN attachment. This means that a single dual tunnel VPN can be used to connect multiple VPCs and on-premises environments but, and this is really important to understand, the transit of data is still moving across the public internet, meaning it suffers from variable latency and inconsistent speeds, and this is where accelerated site-to-site VPN comes in handy.
With accelerated site-to-site VPN the architecture is slightly different. We still have the animals for life business premises on the right, but instead of connecting directly to a virtual private gateway or transit gateway, when you're using accelerated mode the AWS global accelerator network is utilized. This is a network of edge locations positioned globally acting as entry points into the AWS global network. So when you create a VPN connection you get two IP addresses and each of those IP addresses is allocated to all of the edge locations, and data is routed to the closest edge location to the customer gateway. This means that the public internet is only used for a minimal amount of time, just to get to the closest edge location, and this results in lower latency, higher throughput and less jitter, and jitter is the variance in latency. Essentially the quality of the connection is better because it's using the public internet less.
So this process gets your data to the edge and so far there are two important factors to be aware of. First, acceleration can only be enabled when creating a transit gateway VPN attachment, not when using a virtual private gateway. VGWs do not support accelerated site to site VPN and that's critical to understand for the exam and when you're implementing real-world architectures. When you do enable this feature there is a fixed accelerator fee and a data transit fee. Don't focus so much on the specific price, just be aware that this cost architecture exists, so the fixed fee for the accelerator plus a transfer fee.
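As a minimal boto3 sketch of creating an accelerated VPN against an existing transit gateway, where the transit gateway ID, customer router public IP and ASN are placeholder assumptions:

```python
import boto3

ec2 = boto3.client("ec2")

# Customer gateway object representing the on-premises router (placeholder IP and ASN).
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,
    PublicIp="203.0.113.12",
    Type="ipsec.1",
)["CustomerGateway"]

# Accelerated VPN: the connection targets a transit gateway, not a VGW,
# and acceleration is switched on at creation time.
vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    TransitGatewayId="tgw-0123456789abcdef0",  # placeholder transit gateway
    Type="ipsec.1",
    Options={"EnableAcceleration": True},
)["VpnConnection"]

print(vpn["VpnConnectionId"])
```

Acceleration can't be toggled on an existing connection, so a non-accelerated VPN would need to be replaced rather than modified.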
Now once data transits from the customer gateway to the edge location that's where things start to change. Because we're using the global accelerator network architecturally the AWS network has been extended to be closer to your location. Now this sounds strange but essentially what it means is that the distance data has to travel over the public internet is reduced. Without using accelerated site to site VPN your data would transit over the public internet from the point that it exited the customer gateway all the way through to the VPN endpoints. But with accelerated site to site VPN you only have to use the public internet between the customer gateway and these edge locations and these edge locations are generally going to be significantly closer to your location than normal VPN endpoints.
Now once it reaches the edge of the global accelerator network it transits inside and then it's moving across an optimized network through to the transit gateway and then once there it's using the transit gateway to reach its final VPC destination. What we're doing here is combining three different products the site to site VPN the transit gateway and the global accelerator. The result is that your data gets to the closest edge location and from that point onward it's using a high-performing and efficient global network to transit through to its final destination.
So this is simply just an option that you need to enable on a VPN connection as long as you're using transit gateway but in doing so it offers significant reductions to the variance in latency known as jitter, lower overall latency and improvements to transit speeds. So for any real-world applications my recommendation is to use this feature by default and this will mean that for any VPN deployments you should, where possible, prefer using the transit gateway and attaching the VPN to that transit gateway versus using a virtual private gateway. Virtual private gateway based VPNs are going to start missing out on more and more advanced features that are released by AWS because transit gateway should now be the preferred product.
So remember that for the exam and for real-world usage if you're deploying site to site VPNs where possible use the transit gateway and make sure you enable accelerated site to site VPN. With that being said that's everything that I wanted to cover from an architecture theory perspective within this lesson so go ahead complete the lesson and when you're ready I look forward to you joining me in the next.
learn.cantrill.io
Welcome back. This is part two of this lesson. We're going to continue immediately from the end of part one. So let's get started.
So this is a pretty typical routing architecture, public routing via an internet gateway using a default route, private routing using a default route via a NAT gateway, and then access to on-premises networks using a more specific route and a virtual private gateway. But there are some situations where you might have to have something a little bit more complex. And let's look at that next.
With this architecture, we have VPC A on the left using 10.16.0.0/16, VPC B on the top right using 10.20.0.0/16, and VPC C on the bottom right also using 10.20.0.0/16. Now there's peering configured between VPC A and VPC B, as well as between VPC A and VPC C, and this is allowed because there's no overlapping CIDR space between A and B or between A and C. What this also means though is you can't create a peering connection between VPC B and VPC C because there is a CIDR overlap and that prevents us from creating a VPC peer between those two VPCs.
Now let's say that we have some services in VPC B and VPC C, in this case two database platforms running on EC2, and we also have services within VPC A which need to be able to access those database instances. Now one option that we could do is to apply a route table onto both of the subnets in VPC A, and this means that all traffic from VPC A will move to 10.20.0.0/16 via the VPC peer between VPC A and VPC B.
So using one route table on both of these subnets will mean that the traffic goes to VPC B whenever 10.20.0.0/16 IP addresses are contacted from VPC A. But what this also means is that VPC A cannot communicate with anything in VPC C because the route will always send data to VPC B. So the peering connection between VPC A and C is not used, and this means that VPC C is unreachable because it uses the same IP address range as VPC B and there's no route to it.
So there's no way that VPC A can communicate with VPC C, but it also means that VPC C won't be able to communicate with VPC A because there's no route back for the return traffic. Handling any form of routing when you have CIDR overlaps is a problem, and it's one reason why I always suggest not to have overlapping address space within AWS and any other networks external to AWS.
Now if you do find yourself in this type of situation, there are a couple of easy ways that you can handle it, and that's what I want to talk about over the next few screens.
One option to allow access to both of the database instances in VPC B and VPC C is to split the routing inside VPC A. So use a route table per subnet. The top route table applies to the top subnet, and the bottom route table applies to the bottom subnet, and this means that the bottom instance will now use the bottom route table so it can now access the services inside VPC C, whereas the top instance in the top subnet will access VPC B.
So by splitting the routing, VPC B is accessible only from the top subnet of VPC A, and VPC C is accessible only from the bottom subnet of VPC A. So remember, route table associations are always per subnet, so in this case we have two route tables that are using the same destination 10.20.0.0/16, but each of the route tables uses a different target, so a different peering connection between the VPCs.
So the top route table uses the A/B peering connection, and the bottom route table uses the A/C peering connection. So this means that if the top instance in VPC A attempts to access 10.20.0.10, it will go via the A/B peer and access VPC B. If the bottom instance attempts to access 10.20.0.20, it will go via the A/C peer and access VPC C, but this does mean that the top instance in VPC A will not be able to access the database instance in VPC C, and the bottom instance in VPC A will not be able to access the top database instance in VPC B, because we have two different route tables that point at different VPC peers that connect to different VPCs.
So this architecture means that you have to be very careful about where you deploy instances, because with this architecture, the top subnet in VPC A will always be limited to VPC B, and likewise, the bottom subnet in VPC A will always be limited to VPC C.
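Here's a minimal boto3 sketch of that split-routing approach; the subnet, VPC and peering connection IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")
VPC_A = "vpc-0123456789abcdef0"  # placeholder

# One route table per subnet in VPC A.
rt_top = ec2.create_route_table(VpcId=VPC_A)["RouteTable"]["RouteTableId"]
rt_bottom = ec2.create_route_table(VpcId=VPC_A)["RouteTable"]["RouteTableId"]

ec2.associate_route_table(RouteTableId=rt_top, SubnetId="subnet-aaaa1111")     # top subnet
ec2.associate_route_table(RouteTableId=rt_bottom, SubnetId="subnet-bbbb2222")  # bottom subnet

# Same destination CIDR, different targets: the top subnet uses the A-B peer,
# the bottom subnet uses the A-C peer.
ec2.create_route(RouteTableId=rt_top, DestinationCidrBlock="10.20.0.0/16",
                 VpcPeeringConnectionId="pcx-0000000000aaaaaaa")  # peer A-B
ec2.create_route(RouteTableId=rt_bottom, DestinationCidrBlock="10.20.0.0/16",
                 VpcPeeringConnectionId="pcx-0000000000bbbbbbb")  # peer A-C
```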
Now we can do it slightly differently again. Instead of using two route tables, we can stick to using the one, and this route table applies to both of the subnets within VPC A, and it has two routes contained in that route table.
The first route has a destination of 10.20.0.0, and it's a /16 route, and the target is peer A/B, so this points at VPC B. And this means that, if nothing more specific matches, traffic from either of the subnets in VPC A destined for any IP address within the 10.20.0.0/16 network will be directed towards VPC B.
But we also have another route, the bottom one in pink, and this is more specific. It has a longer prefix. It's a /32 route, which means a single IP address. So both of these routes match the database instance in VPC C, so 10.20.0.20. So this IP address is contained within the network that the route in blue matches, and it's also the IP address that's directly matched by the route in pink.
And so the result of this is the top peer is used for everything in the 10.20.0.0/16 network, which leads to the VPC B network, except the one specific IP, 10.20.0.20/32, and this uses the more specific route. Remember the priority order, longest prefix wins.
And so using this method means that you can point specific IP addresses over the bottom VPC peer toward VPC C, and leave the default being the top VPC peer, which goes to VPC B. Both of the methods are valid, this one and split routing, and picking between them is an architectural choice. But you should at least have an awareness of this for the exam.
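The same idea sketched with boto3 using a single route table; again the route table and peering connection IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")
rt = "rtb-0123456789abcdef0"  # the one route table used by both VPC A subnets (placeholder)

# Broad route: everything in 10.20.0.0/16 goes via the A-B peer by default.
ec2.create_route(RouteTableId=rt, DestinationCidrBlock="10.20.0.0/16",
                 VpcPeeringConnectionId="pcx-0000000000aaaaaaa")

# More specific /32 route: only the VPC C database IP goes via the A-C peer.
# Longest prefix wins, so this overrides the /16 for 10.20.0.20 alone.
ec2.create_route(RouteTableId=rt, DestinationCidrBlock="10.20.0.20/32",
                 VpcPeeringConnectionId="pcx-0000000000bbbbbbb")
```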
So if you face any questions which talk about routing and overlapping CIDRs, you now know two strategies which you can use to overcome that problem.
Now one more thing that I want to cover before we finish up with this lesson is a relatively new feature, which is called ingress routing. Normally within a VPC, route tables control outgoing or egress routing.
So in this example, the application subnet has a default route, which sends all of its outgoing traffic that isn't for the VPC CIDR range to a security appliance. Now this security appliance is contained within the public subnet, and this also has an attached route table.
This has a default route which sends all unmatched traffic out via the internet gateway, and anything that's destined for the corporate network, so 192.168.10.0/24 through the VGW. So without having the ability to control ingress routing, so without using gateway route tables, any return traffic would arrive at the internet gateway, which would forward that directly back to the service where it originated from.
Ingress routing, so using gateway route tables, allows us to assign a route table to gateway objects like virtual private gateways or internet gateways. In this example, a route on the internet gateway would allow us to control ingress routing as it arrived at the internet gateway.
So we could configure the internet gateway so no matter what the destination of IP traffic was, to forward that traffic through to the security appliance where it would be inspected and then forwarded through to its intended destination.
So gateway route tables, they can be attached to internet gateways or virtual private gateways, and they can be used to direct that gateway to take actions based on inbound traffic, such as in this example forwarding it through to a security appliance, no matter what the actual destination was.
So gateway route tables allow us to implement this type of architecture, which allows us to inspect traffic as it flows in and out of the network, so in a bi-directional way. And before we had the ability to assign gateway route tables, we couldn't control it in this way. We could have the traffic flowing through the security appliance on its outbound leg, but couldn't influence how that traffic would be routed when it was returning into the VPC.
So this is a really powerful feature that's relatively new to VPCs that you definitely need to be aware of for the exam. So a normal route table is allocated to a subnet and controls traffic as it leaves that subnet. A gateway route table is applied to a gateway object, so an internet gateway or a virtual private gateway, and it's used to influence how traffic is handled on its way back inside the VPC.
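As a rough boto3 sketch of the ingress routing setup, assuming placeholder IDs for the VPC, internet gateway, application subnet CIDR and the security appliance's network interface:

```python
import boto3

ec2 = boto3.client("ec2")

# Gateway route table: a normal route table, but associated with the internet gateway
# (an edge association) rather than a subnet.
grt = ec2.create_route_table(VpcId="vpc-0123456789abcdef0")["RouteTable"]["RouteTableId"]
ec2.associate_route_table(RouteTableId=grt, GatewayId="igw-0123456789abcdef0")

# Traffic arriving at the IGW destined for the application subnet is redirected
# to the security appliance's network interface for inspection.
ec2.create_route(
    RouteTableId=grt,
    DestinationCidrBlock="10.16.32.0/20",        # placeholder application subnet CIDR
    NetworkInterfaceId="eni-0123456789abcdef0",  # placeholder appliance interface
)
```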
Okay, so that's everything I wanted to cover within this lesson. I just wanted to give you a reintroduction to the routing architecture within AWS and just provide you with some more complex routing architecture examples, which will be useful to know for the real world and for the exam.
Now, don't worry, there is a demo lesson that's coming up elsewhere in this section, where you'll get to experience exactly how routing works from an advanced perspective within a VPC and using gateway route tables to control ingress routing. So that will be coming up elsewhere in this section of the course, but this is all of the theory that I wanted to cover. So go ahead, complete this lesson, and when you're ready, I'll look forward to you joining me in the next.
learn.cantrill.io
Welcome back and in this lesson I want to discuss some advanced routing concepts which become important when dealing with complex hybrid networking. Now let's quickly refresh our knowledge before we move on to some of the more advanced routing topics.
So subnets are associated with one route table only, no more and no less. And that's either an implicit association with the main route table of the VPC or a custom route table that you explicitly associate with a subnet. So you can create an explicit route table and you can associate it with a subnet and when you do that the main route table is disassociated. If you don't explicitly associate a custom route table with a subnet or you remove an explicit association then the main route table is associated again with that subnet.
Now route tables can also be associated with an internet gateway or virtual private gateway and this allows them to be used to control traffic flow entering a VPC either from the internet or from on-premises locations and I'll be talking about that in more detail later in this lesson.
Another important thing to understand about route tables is that IP version 4 and IP version 6 are dealt with separately both in terms of default routes and more specific routes. At a high level routes have two main components, a destination and a target. Now the destination can either be a default destination, a network in CIDR notation or a specific IP address also in CIDR notation, and the destination is used by the VPC router or a virtual private gateway or internet gateway when it's evaluating traffic. And in addition to the destination there's the target and this configures where traffic should be sent to if it matches the destination.
Now a route table has a default limit of 50 static routes and 100 dynamic routes known as propagated routes and this is per route table. So 50 routes that you statically add to a route table and 100 routes which are propagated onto that route table if you enable that option. Whenever traffic arrives at a VPC router interface or internet gateway or a virtual private gateway interface it's matched against routes in the relevant route table. All routes in a route table which match are all evaluated and there's a priority order and the one which matches and has the highest priority is used to control where traffic is directed towards.
Now visually this is how route tables work from a subnet perspective. We start with a VPC and it's a simple one with four subnets. Now by default a VPC is created with a main route table and this is implicitly attached to all subnets within that VPC. Now you can create other custom route tables which can be assigned to subnets within a VPC. Let's assume the top right one. When you associate a custom route table with a subnet the main route table association is removed and the new custom route table is explicitly associated with that one single subnet. Now subnets always have one and only one route table associated with them. If the custom route table is ever disassociated then the implicit main route table association is re-added. It's not possible to have a subnet without a route table association. The default is that the main route table of the VPC is implicitly associated with all subnets.
Now when route tables are associated with subnets this controls how traffic is handled when it arrives at the VPC router. Route tables can contain two different types of routes. We've got static routes and propagated routes. Static routes are added manually by you to a route table and propagated routes are added when you enable this option via a virtual private gateway. So any routes that the virtual private gateway learns of will be added as dynamic routes onto the route table if you enable route propagation on that route table. So it's an option on a per route table basis that you can either enable or disable. And if you enable it then that route table will be populated with any routes that the virtual private gateway becomes aware of. So these might be routes learned from Direct Connect or site to site VPNs, either using BGP or statically defined within the VPN configuration.
So at a high level whenever traffic exits a subnet that subnet's route table is evaluated. The routes are all analyzed looking for any which match the destination of the IP traffic and for any which do match an evaluation process is started. Now if there's only one valid route which matches the destination of the traffic then that route is used to control where to send that traffic to. So the target that's nice and simple. If there's more than one route which can apply then the first rule applies and that's longest prefix wins. So a route with a /32 wins over a route with a /24 or a /16 or a /0. The higher the number after the / the more specific the route is and a /32 represents one single IP address. So more specific routes always win regardless of how they ended up on that route table. And if one route can be selected just based on this prefix then that route is used.
Now if you have multiple routes which could apply and they all have the same prefix length then the next step of priority is that statically added routes take priority over propagated ones. Static routes remember are ones that you add to a route table. And the logic to this is that if you've added something explicitly to a route table then it is important and it should be there and it should be preferred over anything which is dynamically learned from other entities in AWS. So if you have a route table which has a static route to one destination and a virtual private gateway which learns that same route with that same prefix and you have route propagation enabled on that route table you will have two routes to the same destination with the same prefix. And in this case the static route will be selected as the higher priority and that one will be used. So this level of selection will always prioritize static routes.
But there are situations where you might have multiple routes both using the same prefix length and both dynamically learned via route propagation. Well there's another level of prioritization. For any routes learned via a virtual private gateway the next priority order is that routes learned from a Direct Connect are used first, then routes learned via a static VPN, then routes learned via a BGP based VPN. And if you still have multiple valid routes at this point, so routes that are all learned via propagation, all learned via BGP and all with the same prefix length, then AS path is used as a decider. AS path is a BGP attribute which represents the path between two different autonomous systems, so effectively the distance between them, and so logically routes with a shorter AS path are preferred over those with a longer AS path.
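To make the selection order concrete, here's a toy Python sketch of the priority logic just described; it isn't an AWS API, just an illustration of longest prefix, then static over propagated, then origin, then AS path length:

```python
import ipaddress

ORIGIN_RANK = {"direct-connect": 0, "static-vpn": 1, "bgp-vpn": 2}

def select_route(routes, destination):
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    # Longest prefix first, then static beats propagated, then origin, then shortest AS path.
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,
            0 if r["static"] else 1,
            ORIGIN_RANK.get(r.get("origin"), 3),
            r.get("as_path_len", 0),
        ),
    )

routes = [
    {"prefix": "192.168.0.0/16",  "static": False, "origin": "bgp-vpn",        "as_path_len": 3},
    {"prefix": "192.168.10.0/24", "static": False, "origin": "direct-connect", "as_path_len": 2},
    {"prefix": "192.168.10.0/24", "static": True},
]

# The static /24 wins: same prefix length as the propagated Direct Connect route,
# but static routes take priority.
print(select_route(routes, "192.168.10.40"))
```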
So this is an example of basic VPC routing. We have an AWS region with a VPC inside it two availability zones and two subnets in each private and public. Then we have two gateway attachments, an internet gateway and then a virtual private gateway connecting to an on-premises environment on the left. Each subnet in the VPC has one route table attached to it using the priority system which I've just discussed. In the subnets we've got some resources so a private instance, a public instance and a NAT gateway.
So let's talk about routing and this is a pretty typical architecture. On the public subnet we have a route table which uses a default route of 0.0.0.0/0 and a target of the internet gateway and this means that any traffic not identified by any other route is forwarded to the internet gateway. Now assuming that both the NAT gateway and the public instance both have public IP version 4 addresses this means that their data goes out via the internet gateway.
Now the private subnets also have a route table with a default route pointing at the NAT gateway as a target and this means that for any traffic not otherwise matched the flow goes via the NAT gateway where it's translated and sent on to the internet via the internet gateway. The route table on the private subnets also has a more specific route and this route is for the 192.168.10.0/24 network and it has a longer prefix than the default 0.0.0.0/0 route and so it has a higher priority. And this means that it's used for any traffic which has a destination of 192.168.10.0 rather than using the default 0.0.0.0/0 route and this means that data for the on-premises network will leave via the virtual private gateway.
Okay so this is the end of part one of this lesson. It was getting a little bit on the long side and I wanted to give you the opportunity to take a small break, maybe stretch your legs or make a coffee. Now part two will continue immediately from this point so go ahead complete this video and when you're ready I'll look forward to you joining me in part two.
learn.cantrill.io
Welcome back, and in this lesson I'm going to be covering another important piece of networking functionality, VPC peering. I want to cover the theory and architecture quickly and then move on to a demo so you can experience exactly how it works. So let's jump in and get started.
VPC peering is a service that lets you create a private and encrypted network link between two VPCs. One peering connection links two and only two VPCs—remember that, no more than two; it's important for the exam. A peering connection can be created between VPCs in the same region or cross region, and the VPCs can be in the same account or between different AWS accounts. Now, there are some limitations when running a VPC peering connection between VPCs in different regions, but it still can be accomplished.
When you create a VPC peer, you can enable an option so that public host names of services in the peered VPCs resolve to the private internal IP addresses, and this means that you can use the same DNS names to locate services whether they're in peered VPCs or not. If a VPC peer exists between one VPC and another and this option is enabled, then if you attempt to resolve the public DNS host name of an EC2 instance, it will resolve to the private IP address of that EC2 instance.
And if your VPCs are in the same region, then they can reference each other by using security group ID, and so you can do the same efficient referencing and nesting of security groups that you can do if you're inside the same VPC. This is a feature that only works with VPC peers inside the same region. In different regions, you can still utilize security groups, but you'll need to reference IP addresses or IP ranges. If the VPC peers are in the same region, then you can do the logical referencing of an entire security group, and that massively improves the efficiency of the security of VPC peers.
Now, if you can take away just two important facts from this theory lesson about VPC peers, it's that VPC peering connections connect two VPCs and only two—one VPC peer connects two VPCs—and the second fact that I want you to take away is that this connection is not transitive. Now what I mean by that—and I'll show you it visually on the next screen—is that if you have VPC A peered to VPC B and you have VPC B peered to VPC C, that does not mean that there is a connection between A and C.
If you want VPC A, B, and C to all communicate with each other, then you need a total of three peers: one between A and B, one between B and C, and one between A and C. So you need to make sure that for any connectivity requirements that you have, there is always a peering connection between every VPC pair that you want to connect. You can't route through interconnected VPCs, and you'll see exactly how that looks visually on the next screen.
Now, when you create a VPC peering connection between two VPCs, what you're actually doing is creating a logical gateway object inside of both of those VPCs, and to fully configure connectivity between those VPCs, you need to configure routing—so route tables with routes on them pointing at the remote VPC IP address range and using the VPC peering connection gateway object as the target—and don't worry, you'll get to see exactly how this works when you implement it in the next demo lesson.
I do want you to keep in mind that as well as creating the VPC peering connection and configuring routing, you also need to make sure that traffic is allowed to flow between the two VPCs by configuring any security groups or network ACLs as appropriate. So let's look at the architecture visually before we move on to a demo lesson where you'll get the chance to implement VPC peering between a number of different VPCs.
So architecturally, let's say that we have three VPCs belonging to animals for life. We've got VPC A which is using a CIDR of 10.16.0.0/16, we've got VPC B at the bottom which is using 10.17.0.0/16, and then VPC C on the right which is using 10.18.0.0/16. By default, each of these VPCs is an isolated network, so no communication is allowed between any of the VPCs.
Now, to allow communications, we can create a peering connection between VPC A and VPC B, and we can add another peering connection between VPC B and VPC C. Now, what that would do, as I mentioned on the previous screen, is establish a networking link and create a logical gateway object inside each VPC. So step two would be to configure routing tables within each VPC and associate these with subnets, and these routing tables have the remote VPC CIDR as the destination and, as the target, the VPC peering connection or the gateway object that's created when we create the VPC peering connection.
Now, this would mean that the VPC router in VPC A would know to send traffic destined for the IP range of VPC B toward the VPC peering logical gateway object. That configuration would be needed on all subnets at both sides of all peering connections, assuming we wanted to allow completely open communications.
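A minimal boto3 sketch of creating a peer between VPC A and VPC B and adding the routes on both sides; this assumes both VPCs are in the same account and region, and all IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Request the peering connection from VPC A (10.16.0.0/16) to VPC B (10.17.0.0/16), then accept it.
peer = ec2.create_vpc_peering_connection(
    VpcId="vpc-0000000000aaaaaaa",
    PeerVpcId="vpc-0000000000bbbbbbb",
)["VpcPeeringConnection"]
peer_id = peer["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peer_id)

# Routing: each side needs a route to the remote CIDR with the peering connection as the target.
ec2.create_route(RouteTableId="rtb-0000000000aaaaaaa",  # VPC A route table
                 DestinationCidrBlock="10.17.0.0/16",
                 VpcPeeringConnectionId=peer_id)
ec2.create_route(RouteTableId="rtb-0000000000bbbbbbb",  # VPC B route table
                 DestinationCidrBlock="10.16.0.0/16",
                 VpcPeeringConnectionId=peer_id)
```

Security groups and network ACLs still need to allow the traffic; the peer and routes only provide the network path.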
Now, something to understand for the exam (it does come up in questions at an associate level) is that the IP address ranges of the VPCs, so the VPC CIDRs, cannot overlap if you want to create VPC peering connections. So this is another reason why right at the start of the course I cautioned against ever using the same IP address ranges. If you want to allow VPCs to communicate with each other using VPC peers, you cannot have overlapping IP addresses.
Now, assuming that you have followed best practice and don't have any overlapping CIDR ranges inside your VPCs, then you will have connectivity between your isolated networks. But one really, really important thing to understand, both for production usage and the exam, is that with the architecture that you see now, VPC A and B have one peering relationship, and VPC B and C have another peering relationship, but there is no link between VPC A and VPC C.
And while it might seem logical to assume that they could communicate through VPC B as an intermediary, that's not the case. Routing isn't transitive. What this means is that you cannot communicate through an intermediary—you need to have a VPC peer created between all of the VPCs that you want to be able to communicate with each other. At least if you only use VPC peers. There is a product called the transit gateway which I'll talk about later in the course, which is a little bit more feature rich, but for VPC peers you need to make sure that you have one peering connection between all VPCs that you want to communicate.
So in this example, for VPC A to communicate with VPC C, they would need their own independent peering connection created between those two VPCs. Now, with VPC peering, any data that's transferred between VPCs is encrypted, and if you're utilizing a cross region VPC peer, then the data transits over AWS's global secure network—so you get secure transit and you gain the performance from using the global AWS transit network versus the public internet.
Okay, so that's it for the features and architecture of VPC peering, that's everything that I wanted to cover in this lesson. Next, you're going to be doing a demo where you'll have the chance to implement this within your own AWS environment, so thanks for watching, go ahead and complete this video, and then when you're ready I look forward to you joining me in the next lesson.
learn.cantrill.io
Welcome back and in this lesson I want to talk about another type of endpoint available within a VPC, and that's an interface endpoint. These do a similar job to gateway endpoints, but the way that they accomplish it is very different, and you need to be aware of the difference. So let's jump in and get started.
Just like gateway endpoints, interface endpoints provide private access to AWS public services, so private instances or instances which are in fully private VPCs. Interface endpoints historically have been used to provide access to all services apart from S3 and DynamoDB; historically both of these services were only available using gateway endpoints, and interface endpoints were used for everything else. Recently though, AWS have enabled the use of S3 using interface endpoints, so at the time of creating this lesson, you have the option to use either gateway endpoints or interface endpoints, but currently DynamoDB is still only available using gateway endpoints.
Now one crucial difference between gateway endpoints and interface endpoints is that interface endpoints are not highly available by default; they're interfaces inside a VPC which are added to specific subnets inside that VPC, so one subnet as you now know means one availability zone. One interface endpoint is in one availability zone, meaning if that availability zone fails, then the functionality provided by the interface endpoint also fails. To make sure that you have a highly available service, you need to add one interface endpoint in one subnet in each availability zone that you use inside a VPC; so if you use two availability zones you need two interface endpoints, and if you use three then you'll need three interface endpoints.
Now because interface endpoints are just interfaces inside a VPC, you're able to use security groups to control access to that interface endpoint from a networking perspective, and that's something that you can't do with gateway endpoints. You do still have the option of using endpoint policies with interface endpoints in just the same way as with gateway endpoints, and these can be used to restrict what can be accessed using that interface endpoint. Another aspect of interface endpoints that you should be aware of is they currently only support the TCP protocol and only IP version 4; now IP version 4 is probably the most important of those two things that you need to know. I've not seen it come up in the exam yet, but it will make its way there eventually, and it's probably something that you should be aware of regardless.
Now behind the scenes, interface endpoints use PrivateLink, which is a product that allows external services to be injected into your VPC, either from AWS or from third parties. So if you see any mention of PrivateLink, it's a technology that allows AWS services or third-party services to be injected into your VPC and be given network interfaces inside your VPC subnet. PrivateLink is how interface endpoints operate, but it's also how you can deploy third-party applications or services directly into your VPC, and this is especially useful if you're in a heavily regulated industry but want to provide access to third-party services inside private VPCs. You can do it without creating any additional infrastructure—you just use PrivateLink and inject that service’s network interfaces directly into subnets inside your VPC.
Now interface endpoints don't work in the same way that gateway endpoints do; it's a completely different way of providing a similar type of functionality. Gateway endpoints used a prefix list, which was a logical representation of a service, and this was added to route tables—that's how traffic flows to the gateway endpoint from VPC subnets. Now interface endpoints primarily use DNS; interface endpoints are just network interfaces inside your VPC, and they have a private IP within the range which the subnet uses that they're placed inside.
The way that this works is that when you create an interface endpoint in a particular region for a particular service, you get a new DNS name for that service—an endpoint-specific DNS name—and that name can be used to directly access the service via the interface endpoint. This is an example of a DNS name that you might get for the SNS service inside the US East 1 region. This name resolves to the private IP address of the interface endpoint, and if you can update your applications to use this endpoint-specific DNS name, then you can directly use it to access the service via the interface endpoint and not require public IP addressing.
Now interface endpoints are actually given a number of DNS names. First, we've got the regional DNS name, which is one single DNS name that works whatever AZ you're using to access the interface endpoint—it’s good for simplicity and for high availability. Also, each interface in each AZ gets a zonal DNS, which resolves to that one specific interface in that one specific availability zone; now either of these two types of DNS endpoints can be used by applications to directly and immediately utilize interface endpoints.
But interface endpoints also come with a feature known as private DNS, and what private DNS does is associate a Route 53 private hosted zone with your VPC. This private hosted zone carries a replacement DNS record for the default service endpoint DNS name—it essentially overrides the default service DNS with a new version that points at your interface endpoint, and this option, which is now enabled by default, means that your applications can use interface endpoints without being modified. So this makes it much easier for applications running in a VPC to utilize interface endpoints.
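Here's how creating an interface endpoint for SNS with private DNS could look in boto3; the VPC, subnet and security group IDs are placeholders, and one subnet per availability zone is supplied for high availability:

```python
import boto3

ec2 = boto3.client("ec2")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.sns",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],  # one subnet per AZ in use
    SecurityGroupIds=["sg-0123456789abcdef0"],         # controls who can reach the endpoint
    PrivateDnsEnabled=True,                            # override the default service DNS name
)["VpcEndpoint"]

# Regional and zonal endpoint-specific DNS names are returned here.
print(endpoint["DnsEntries"])
```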
Without using interface endpoints, accessing a service like SNS from within a VPC would work like this: the instance using SNS would resolve the default service endpoint, which is sns.us-east-1.amazonaws.com, to a public space IP address, and the traffic would be routed via the VPC router, then the internet gateway, and out to the service. Private instances would also attempt to do the same—they would also try to resolve this default service address—but without having access to a public IP address, they wouldn't be able to get their traffic flow past the internet gateway, so it would fail.
But if we change this architecture and we add an interface endpoint, if private DNS isn't used, then services which continue to use the service default DNS would leave the VPC via the internet gateway and connect with the service in the normal way. Now for services which choose to use the endpoint-specific DNS name, they would resolve that name to the interface endpoint’s private IP address. The endpoint is a private interface to the service that it's configured for—in this case SNS—and so the traffic could then flow via the interface endpoint to the service without requiring any public addressing. It’s as though SNS, in this example, has been injected into the VPC and is being accessed in a more secure way.
Now if we utilize private DNS, it makes it even easier. Private DNS replaces the service's default DNS, so even clients which haven't been reconfigured to use the endpoint-specific DNS—so they keep using the service default DNS name—will now go via the interface endpoint. So in this example, using private DNS overrides the default SNS service endpoint name, sns.us-east-1.amazonaws.com; when you use private DNS, rather than that resolving to a public IP address belonging to the SNS service, it's overridden so it now resolves to the private IP address of the interface endpoint. So using private DNS means that even services or applications which can't be modified to use the endpoint-specific DNS name will also utilize the interface endpoint.
So for the exam, I want you to try and remember a few really important things. Gateway endpoints work using prefix lists and route tables, so they never require changes to the applications—essentially the application thinks that it's communicating directly with S3 or DynamoDB, and all we're doing by using a gateway endpoint is influencing the route that that traffic flow uses. Instead of going via the internet gateway and requiring public IP addresses, it goes via a gateway endpoint and can use private IP addressing.
Interface endpoints use DNS and a private IP address for the interface endpoint; you've got the option of either using the endpoint-specific DNS names or you can enable private DNS, which overrides the default and allows unmodified applications to access the services using the interface endpoint. Interface endpoints don't use routing—they use DNS—so the DNS name is resolved, it resolves to the private IP address of the interface endpoint, and that is used for connectivity with the service.
Now gateway endpoints, because they're a VPC logical gateway object, are highly available by design, but interface endpoints, because they use normal VPC network interfaces, are not. When you're designing an architecture, if you're utilizing multiple availability zones, then you need to put interface endpoints in every availability zone that you use inside that VPC.
But at this point, thanks for watching—we’ve finished everything that I wanted to cover, so go ahead, finish up this video, and when you're ready, I'll look forward to you joining me in the next lesson.
learn.cantrill.io
Welcome back and in the next two lessons I'll be stepping you through two types of VPC endpoint. Now in this lesson I'll be talking about gateway endpoints and in the next I'll be covering interface endpoints. Now they're both used in roughly the same way, they provide the same functionality but they're used for different AWS services and the way that they achieve this functionality from a technical point is radically different. So let's get started and in this lesson I want to cover gateway endpoints.
So at a high level, gateway endpoints provide private access to supported services, and at the time of creating this lesson the services that work with gateway endpoints are S3 and DynamoDB. So what I mean when I say private access in the context of this lesson is that they allow a private only resource inside a VPC, or any resource inside a private only VPC, to access S3 and DynamoDB. Remember that both of these are public services.
Normally when you want to access AWS public services from within a VPC you need infrastructure and configuration. Normally this is an internet gateway that you need to create and attach to the VPC and then for the resources inside that VPC you need to grant them either a public IP version 4 address or an IP version 6 address, or you need to implement one or more NAT gateways which allow instances with private IP addresses to access these public services. So these services exist outside of the VPC and so normally public IP addressing is required and a gateway endpoint allows you to provide access to these services without implementing that public infrastructure.
Now the way that this works is that you create a gateway endpoint and these are created per service per region. So let's use an example of S3 in the US East 1 or Northern Virginia region. So you create this gateway endpoint for S3 in US East 1 and you associate it with one or more subnets in a particular VPC. Now a gateway endpoint doesn't actually go into VPC subnets. What happens is that when you allocate the gateway endpoint to particular subnets something called a prefix list is added to the route tables for those subnets and this prefix list uses the gateway endpoint as a target.
Now a prefix list is just like what you would find on a normal route but it's an object, it's a logical entity which represents these services. So it represents S3 or DynamoDB. Imagine this is a list of IP addresses that those services use but where the list is kept updated by AWS. So this prefix list is added to the route table. The prefix list is used as the destination and the target is the gateway endpoint. And this means in this example that any traffic destined for S3 as it exits these subnets it goes via the gateway endpoint rather than the internet gateway.
Now it is important for the exam to remember that a gateway endpoint does not go into a particular subnet or an availability zone, it's highly available across all availability zones in a region by default. Like an internet gateway it's associated with a VPC but with a gateway endpoint you just set which subnets are going to be used with it and it automatically configures this route on the route tables for those subnets with this prefix list. So it's just something that's configured on your behalf by AWS.
A gateway endpoint is a VPC gateway object, it is highly available, it operates across all availability zones in that VPC, it does not go into a particular subnet. So remember that for the exam because that is different than interface endpoints which we'll be covering next. Now when you're implementing gateway endpoints you can configure endpoint policies and an endpoint policy allows you to control what things can be connected to by that gateway endpoint. So we can apply an endpoint policy to our gateway endpoint and only allow it to connect to a particular subset of S3 buckets.
And this is great if you run a private only high security VPC and you want to grant resources inside that VPC access to certain S3 buckets but not the entire S3 service so you can use an endpoint policy to restrict it to particular S3 buckets. Now gateway endpoints can only be used to access services in the same region. So you can't for example access an S3 bucket which is located in the AP Southeast 2 region from a gateway endpoint in the US East 1 region, it's in the same region only.
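As a rough boto3 sketch, creating a gateway endpoint for S3, attaching it to a route table and restricting it to a single placeholder bucket with an endpoint policy could look like this:

```python
import json
import boto3

ec2 = boto3.client("ec2")

# Endpoint policy limiting the endpoint to one (placeholder) bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::a4l-software-updates",
            "arn:aws:s3:::a4l-software-updates/*",
        ],
    }],
}

# Gateway endpoint for S3; AWS adds a prefix-list route to the listed route tables.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
    PolicyDocument=json.dumps(policy),
)
```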
So in summary gateway endpoints support two main use cases. First you might have a private VPC and you want to allow that private VPC to access public resources in this case S3 or DynamoDB. Maybe you have software or application updates stored in S3 and want to allow a super secure VPC to be able to access them without allowing other public access or access to other S3 buckets. Now the second type of architecture that gateway endpoints can help support is the idea of private only S3 buckets.
Gateway endpoints can help prevent leaky buckets. S3 buckets as you know by now can be locked down by creating a bucket policy and applying it to that S3 bucket. So you could configure a bucket policy to only accept operations coming from a specific gateway endpoint. And because S3 is private by default for anything else the implicit deny would apply. So if you allow operations only from a specific gateway endpoint you implicitly deny everything else. And that means that the S3 bucket is a private only bucket.
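A sketch of that bucket policy applied with boto3, using a placeholder bucket name and endpoint ID; note that a policy like this also blocks console and CLI access from outside the VPC, so it needs to be applied carefully:

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny anything that doesn't arrive via the gateway endpoint; combined with S3's
# implicit deny this makes the bucket private to that VPC.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnlessViaGatewayEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::a4l-private-bucket",
            "arn:aws:s3:::a4l-private-bucket/*",
        ],
        "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-0123456789abcdef0"}},
    }],
}

s3.put_bucket_policy(Bucket="a4l-private-bucket", Policy=json.dumps(policy))
```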
One limitation of gateway endpoints that you should be aware of for the exam is that they're only accessible from inside that specific VPC. They are logical gateway objects and you can only access logical gateways created inside a VPC from that VPC.
So before we finish up with this theory lesson let's quickly look at the architecture visually because it will probably help you understand exactly how all of the components fit together. Without using gateway endpoints this is the type of architecture that you've been using so far in the course. Two availability zones, each with two subnets, one public in green on the right and one private in blue on the left. Resources in the public subnets on the right can be given public IP version 4 addresses and so access public space resources using those addresses through the VPC router via the internet gateway into the public space and then through to the public resource S3 in this example.
Now private instances can't do this they still go via the VPC router but they need to use a NAT gateway which provides them with a NATed public IP version 4 address to use and then this public address that's owned by the NAT gateway is used via the internet gateway and finally through to the public resource again S3. The problem with this architecture is that the resources have public internet access either directly for public resources or via the NAT gateway for private only EC2 instances.
If you want instances inside the VPC to be able to access S3 but not the public internet then it's problematic. If you work in a heavily regulated industry and you need to create VPCs which are private only with no internet connectivity then that is almost impossible to do without using gateway endpoints.
Using gateway endpoints we can change this architecture. Architecturally, to use gateway endpoints we create one inside a VPC and when creating it we associate it with one or more subnets, and this means that a prefix list is added to the route tables for those subnets. This means that any traffic which leaves the private instances inside those subnets now has a route to the public service, so it will go via the gateway endpoint and they won't need public addresses to talk to that service. Imagine the gateway endpoint as being inside your VPC but having a tunnel to the public service, and that way data can flow from private services inside the VPC through the gateway endpoint to the public service without needing any public addressing.
Note how this VPC has no internet gateway and no NAT gateway. The private instance has no access to anything else outside the VPC only S3 and that's only because we've created the gateway endpoint. We could even go one step further using a bucket policy on the S3 bucket and denying any access which doesn't come via the gateway endpoint.
Now a couple of important things to remember for the exam gateway endpoints are highly available by design. You don't need to worry about AZ placement just like internet gateways that's all handled for you by the VPC service. For the exam just know that gateway endpoints are not accessible outside of the VPC that they're associated with and in terms of access control endpoint policies can be used on gateway endpoints to control what the endpoint can be used to access.
So if you did want to allow access to one or two S3 buckets only rather than the entire service then that's something which can be controlled by using an endpoint policy on the gateway endpoint.
Now that's everything that I wanted to cover in this lesson about the theory and architecture of gateway endpoints. In the next lesson we're going to be covering interface endpoints, which offer similar functionality to gateway endpoints but, and this is critical, they're implemented in a very different way from an architecture perspective, and that difference really does matter for the exam and if you intend to use these products in real world production implementations. But at this point thanks for watching, we've finished everything that I wanted to cover, so go ahead, finish up this video and when you're ready I look forward to you joining me in the next lesson.
learn.cantrill.io
Welcome back and in this lesson I want to talk about another type of gateway object available within VPCs, the egress only internet gateway. The name gives away its function, it's an internet gateway which only allows connections to be initiated from inside a VPC to outside. Let's step through the key concepts and architecture and you'll get the chance to implement this yourself in the demo lesson later in this section.
To understand why egress only internet gateways are required, it's useful to look at the differences between IPv4 and v6 inside of AWS. With IPv4, addresses are private or public. The connectivity profile of an instance using IPv4 is easy to control: private instances cannot communicate with the public internet or public AWS services, at least not directly. Public instances have a publicly routable IP address which works in both directions, and in the absence of any security filtering, public instances can communicate with the public internet and be communicated with from the public internet.
For private IPv4 addresses, the NAT gateway provides two pieces of functionality which are easy to confuse into one. First, the NAT gateway provides private IPv4 IPs with a way to access the public internet or public AWS services but, and this is the important thing in the context of this lesson, it does so in a way which doesn't allow any connections from the internet to be initiated to the private instance. So NAT as a process allows private EC2 instances to connect out to the public internet and receive responses back but doesn't allow the public internet to connect into that private instance.
Now NAT as a process exists because of the limitations of IPv4, it doesn't work with IPv6 and so we have a problem because all IPv6 addresses in AWS are publicly routable. It means that an internet gateway will allow all IPv6 instances to connect out to the public space AWS services and the public internet but will also allow networking connectivity back in. So anything on the public internet from a networking perspective will be allowed to initiate connections to IPv6 enabled EC2 instances. In the absence of any other filtering the IPv6 instance will be exposed to the public internet.
So since NAT isn't usable with IPv6 we have a functionality hole, the ability to connect out but not allow networking connectivity to be initiated in an inbound direction and that's what egress only internet gateways provide for IPv6. They allow connections to be initiated out and response traffic back in but they don't allow any externally initiated connections to reach our IPv6 enabled EC2 instances. With normal internet gateways all IPv6 instances from a networking perspective can connect out and things are capable of connecting into them. With egress only internet gateways then IPv6 instances can initiate connections out and receive responses back but things cannot initiate connections to them in an inbound way.
And architecturally that looks something like this. So this is a common architecture, a VPC with two subnets in two availability zones and inside these subnets we've got two IPv6 enabled EC2 instances. Now the first step just like with a normal internet gateway is to create it and attach it to the VPC. Just like a normal internet gateway it's highly available by design across all of the AZs that the VPC uses and it scales based on the traffic flowing through it. So for any exam questions where you're asked about the architecture of egress only internet gateways it is exactly the same as a normal internet gateway. It's just the way that you use it which differs. The architecture is exactly the same.
Now once we've created and attached it to the VPC then we need to focus on the route tables in the subnets. We need to add a default IPv6 route of ::/0 and use the egress only internet gateway as a target. This means that IPv6 traffic will flow to the egress only internet gateway via the VPC router and from there out to the destination service, let's say a software update server. Any response traffic will be allowed to flow back in because all types of internet gateway understand the state of traffic, they're stateful devices. What wouldn't be allowed in is any inbound traffic, so traffic that's initiated from the public internet. This will fail, it won't be allowed to pass through the egress only internet gateway and reach our IPv6 enabled EC2 instances.
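A minimal boto3 sketch of that setup, with placeholder VPC and route table IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Create the egress-only internet gateway attached to the VPC.
eigw = ec2.create_egress_only_internet_gateway(
    VpcId="vpc-0123456789abcdef0"
)["EgressOnlyInternetGateway"]

# Default IPv6 route (::/0) in the subnet route table pointing at the egress-only gateway.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",
    DestinationIpv6CidrBlock="::/0",
    EgressOnlyInternetGatewayId=eigw["EgressOnlyInternetGatewayId"],
)
```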
And that's it. It's not really a complex architecture. It's just like an internet gateway, only it's designed for IPv6 traffic and it only allows outgoing connections and their responses; it does not allow externally initiated incoming connections. Now you can use a normal internet gateway for both IPv4 instances with a public IPv4 IP and IPv6 enabled instances, and in that case traffic is allowed out and in in a bidirectional way. If you need to implement a VPC where you only want IPv6 instances to be able to connect out and receive responses, in many ways like the outgoing-only architecture that NAT provides for IPv4, then you use an egress only internet gateway.
Now you're going to get the chance to implement one of these yourself in an upcoming demo lesson later in this section and that will help really cement the knowledge that you've learned in this theory lesson. For now though just go ahead and complete the lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back to this lesson where I want to talk briefly about VPC Flow Logs, which are a useful networking feature of AWS VPCs, providing details of traffic flow within the private network. The most important thing to know about VPC Flow Logs is that they only capture packet metadata; they don't capture packet contents. If you need to capture the contents of packets, then you need a packet sniffer, something which you might install on an EC2 instance. So just to be really clear on this point, VPC Flow Logs only capture metadata, which means things like the source IP, the destination IP, the source and destination ports, packet size, and so on — anything which conceptually you could observe from outside, anything to do with the flow of data through the VPC.
Now Flow Logs work by attaching virtual monitors within a VPC and these can be applied at three different levels. We can apply them at the VPC level, which monitors every network interface in every subnet within that VPC; at the subnet level, which monitors every interface within that specific subnet; and directly to interfaces, where they only monitor that one specific network interface.
Now Flow Logs aren't real time — there's a delay between traffic entering or leaving monitored interfaces and showing up within VPC Flow Logs. This often comes up as an exam question, so this is something that you need to be aware of: you can't rely on Flow Logs to provide real-time telemetry on network packet flow, as there's a delay between that traffic flow occurring and that data showing up within the Flow Logs product.
Now Flow Logs can be configured to go to multiple destinations — currently this is S3 and CloudWatch Logs. It's a preference thing, and each of these comes with their own trade-offs. If you use S3, you're able to access the log files directly and can integrate that with either a third-party monitoring solution or something that you design yourself. If you use CloudWatch Logs, then obviously you can integrate that with other products, stream that data into different locations, and access it either programmatically or using the CloudWatch Logs console. So that's important — that distinction you need to understand for the exam.
You can also use Athena if you want to query Flow Logs stored in S3 using a SQL-like querying method. This is important if you have an existing data team and a more formal, rigorous review process of your Flow Logs. You can use Athena to query those logs in S3 and only pay for the amount of data read. Athena, remember, is an ad hoc querying engine which uses a schema-on-read architecture, so you're only billed for the data as it's read through the product and the data that's stored on S3 — that's critical to understand.
Now visually, this is how the Flow Logs product is architected. We start with a VPC with two subnets, a public one on the right in green and a private one on the left in blue. This architecture is running the Catagram application and this specific implementation has an application server in the public subnet, which is accessed by our user Bob. The application uses a database within the private subnet, which has a primary instance as well as a replicated standby instance.
Flow Logs can be captured, as I just mentioned, at a few different points — at the VPC level, at the subnet level, and directly on specific elastic network interfaces — and it's important to understand that Flow Logs capture from that point downwards. So any Flow Logs enabled at the VPC level will capture traffic metadata from every network interface in every subnet in that VPC; anything enabled at the subnet level is going to capture metadata for any network interfaces in that specific subnet, and so on.
Flow Logs can be configured to capture metadata on only accepted connections, only on rejected connections, or on all connections. Visually, this is an example of a Flow Log configuration at the network interface level — it captures metadata from the single elastic network interface of the application instance within the public subnet. If we created something at the subnet level, for example the private subnet, then metadata from both of the database instances is captured as part of that configuration. Anything captured can be sent to a destination, and the current options are S3 and CloudWatch Logs.
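To make that concrete, here's a minimal sketch using Python and boto3 which enables Flow Logs at the subnet level, captures all traffic, and delivers the records to S3; the subnet ID and bucket name are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Enable Flow Logs on a single subnet, capturing both accepted and rejected
# traffic, and deliver the records to an S3 bucket.
ec2.create_flow_logs(
    ResourceIds=["subnet-0123456789abcdef0"],                 # placeholder subnet ID
    ResourceType="Subnet",                                    # or "VPC" / "NetworkInterface"
    TrafficType="ALL",                                        # or "ACCEPT" / "REJECT"
    LogDestinationType="s3",                                  # or "cloud-watch-logs"
    LogDestination="arn:aws:s3:::example-flow-logs-bucket",   # placeholder bucket ARN
)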
Now I'm going to be discussing this in detail in a moment, but the Flow Logs product captures what are known as Flow Log Records, and architecturally these look something like this. I'm going to be covering this next in detail — I'm going to step through all of the different fields just to give you a level of familiarity before you get the experience practically in a demo lesson. A VPC Flow Log is a collection of rows and each row has the following fields. All of the fields are important in different situations, but I've highlighted the ones that I find are used most often — source and destination IP address, source and destination port, the protocol, and the action.
Consider this example: Bob is running a ping against an application instance inside AWS. Bob sends a ping packet to the instance and it responds — this is a common way to confirm connectivity and to assess the latency, so this is a good indication of the performance between two different internet-connected services. The Flow Log for this particular interaction might look something like this — I've highlighted Bob's IP address in pink and the server's private IP address in blue. This shows outward traffic from Bob to the EC2 instance — remember the order: source and destination, and that’s for both the IP addresses and the port numbers. Normally you would have a source and destination port number directly after that, but this is ping, so ICMP, which doesn't use ports, so that’s empty.
The one highlighted in pink is the protocol number — ICMP is 1, TCP is 6, and UDP is 17. Now you don't really need to know this in detail for the exam, but it definitely will help you if you use VPC Flow Logs day to day, and it might feature as a point of elimination in an exam question, so do your best to remember the number for ICMP, TCP, and UDP.
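If it helps to see the structure of a record, here's a small Python sketch which splits a default-format Flow Log record into named fields; the record values themselves are made up for illustration.

# Field names used by the default VPC Flow Log record format.
FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

PROTOCOLS = {"1": "ICMP", "6": "TCP", "17": "UDP"}

# An example record with made-up values (an HTTPS request that was accepted).
record = "2 123456789012 eni-0123456789abcdef0 119.18.36.73 10.16.16.10 41310 443 6 10 840 1620000000 1620000060 ACCEPT OK"

parsed = dict(zip(FIELDS, record.split()))
print(parsed["srcaddr"], "->", parsed["dstaddr"],
      "dst port", parsed["dstport"],
      PROTOCOLS.get(parsed["protocol"], parsed["protocol"]),
      parsed["action"])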
The second to last item indicates if the traffic was accepted or rejected — this indicates if it was blocked or not by a security group or a network access control list. If it's a security group, then generally only one line will show in the Flow Logs — remember security groups are stateful, so if the request is allowed, then the response is automatically allowed in return. What you might see is something like this, where you have one Flow Log record which accepts traffic and then another which rejects the response to that conversation.
If you have an EC2 instance inside a subnet where the instance has a security group allowing pings from an external IP address, then the response will be automatically allowed. But if you have a network ACL on that instance's subnet which allows the ping inbound but doesn't allow it outbound, then it can cause a second line — a reject. It's important that you look out for both of these types of things in the exam, so if you see an accept and then a reject, and these look to be for the same flow of traffic, then you're going to be able to tell that both a security group and a network ACL are used and they're potentially restricting the flow of traffic between the source and the destination.
Flow Logs show the results of traffic flows as they're evaluated — security groups are stateful and so they only evaluate the conversation itself, which includes the request and the response, while network ACLs are stateless and consider traffic flows as two separate parts, request and response, both of which are evaluated separately, so you might see two log entries within VPC Flow Logs.
Now one thing before I finish up with this lesson: VPC Flow Logs don't log all types of traffic — there are some things which are excluded. This includes things such as the metadata service (so any accesses to the metadata service running inside the EC2 instance), time server requests, any DHCP requests which are running inside the VPC, and any communications with the Amazon Windows license server — obviously this applies only for Windows EC2 instances — so you need to be aware that certain types of traffic are not actually recorded using Flow Logs.
Now we are going to have some demos elsewhere in the course where you are going to get some practical experience of working with Flow Logs, but this is all of the theory which I wanted to introduce within this lesson. At this point go ahead and complete this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I want to cover the differences between stateful and stateless firewalls, and to do that I need to refresh your knowledge of how TCP and IP function, so let's just jump in and get started.
In the networking fundamentals videos I talk about how TCP and IP work together; you might already know this if you have networking experience in the real world, but when you make a connection using TCP, what's actually happening is that each side is sending IP packets to each other, and these IP packets have a source and destination IP and are carried across local networks and the public internet.
Now TCP is a layer 4 protocol which runs on top of IP, and it adds error correction together with the idea of ports, so HTTP runs on TCP port 80 and HTTPS runs on TCP port 443 and so on, so keep that in mind as we continue talking about the state of connections.
So let's say that we have a user here on the left, Bob, and he's connecting to the Catagram application running on a server on the right; what most people imagine in this scenario is a single connection between Bob's laptop and the server, so Bob's connecting to TCP port 443 on the server and in doing so he gets information back, in this case lots of cat pictures.
Now you know that below the surface at layer 3 this single connection is handled by exchanging packets between the source and the destination; conceptually though, you can think of each connection, in this case an outgoing connection from Bob's laptop to the server, as actually being made up of two different parts.
First we've got the request part where the client requests some information from a server, in this case from Catagram, and then we have the response part where that data is returned to the client; now these are both parts of the same interaction between the client and server, but strictly speaking you can think of these as two different components.
What actually happens as part of this connection setup is this: first the client picks a temporary port and this is known as an ephemeral port, and typically this port has a value between 1024 and 65535, but this range is dependent on the operating system which Bob's laptop is using; then once this ephemeral port is chosen the client initiates a connection to the server using a well-known port number.
Now a well-known port number is a port number which is typically associated with one specific popular application or protocol; in this case TCP port 443 is HTTPS, so this is the request part of the connection, it's a stream of data to the server—you're asking for something, some cat pictures or a web page.
Next the server responds back with the actual data; the server connects back to the source IP of the request part, in this case Bob's laptop, and it connects to the source port of the request part, which is the ephemeral port which Bob's laptop has chosen—this part is known as the response.
So the request is from Bob's laptop using an ephemeral port to a server using a well-known port, and the response is from the server on that well-known port to Bob's laptop on the ephemeral port; now it's these values which uniquely identify a single connection—so that's a source port and source IP, and a destination IP and a destination port.
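As a quick sketch of that idea in Python, the request and the response of a single connection share the same four values, just reversed; the addresses and ports here are only examples.

from collections import namedtuple

# A single connection is identified by these four values.
Flow = namedtuple("Flow", ["src_ip", "src_port", "dst_ip", "dst_port"])

# Request: Bob's laptop (ephemeral port) -> server (well-known port 443).
request = Flow("119.18.36.73", 41310, "1.3.3.7", 443)

# Response: the same four values with source and destination swapped.
response = Flow(request.dst_ip, request.dst_port, request.src_ip, request.src_port)

print(request)
print(response)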
Now I hope that this makes sense so far; if not, then you need to repeat this first part of the video again because this is really important to understand. If it does make sense, then let's carry on.
Now let's look at this example in a little bit more detail; this is the same connection that we looked at on the previous screen, we have Bob's laptop on the left and the Catagram server on the right; obviously the left is the client and the right is the server.
I also introduced the correct terms on the previous screen so request and response, so the first part is the client talking to the server asking for something and that's the request, and the second part is the server responding and that's the response.
But what I want to get you used to is that the directionality depends on your perspective and let me explain what I mean; so in this case the client initiates the request and I've added the IP addresses on here for both the client and the server, so what this means is the packets will be sent from the client to the server and these will be flowing from left to right.
These packets are going to have a source IP address of 119.18.36.73, which is the IP address of the client—so Bob's laptop—and they will have a destination IP of 1.3.3.7, which is the IP address of the server; now the source port will be a temporary or ephemeral port chosen by the client and the destination port will be a well-known port—in this case we're using HTTPS so TCP port 443.
Now if I challenge you to take a quick guess, would you say that this request is outbound or inbound?
If you had to pick, if you had to define a firewall rule right now, would you pick inbound or outbound?
Well this is actually a trick question because it's both; from the client perspective this request is an outbound connection, so if you're adding a firewall rule on the client you would be looking to allow or deny an outbound connection.
From the server perspective though it's an inbound connection, so you have to think about perspective when you're working with firewalls; but then we have the response part from the server through to the client, and this will also be a collection of packets moving from right to left.
This time the source IP on those packets will be 1.3.3.7, which is the IP address of the server; the destination IP will be 119.18.36.73, which is the IP address of the client—so Bob's laptop—the source port will be TCP port 443, which is the well-known port of HTTPS and the destination port will be the ephemeral port chosen originally by the client.
Now again I want you to think about the directionality of this component of the communication—is it outbound or inbound?
Well again it depends on perspective; the server sees it as an outbound connection from the server to the client, and the client sees it as an inbound connection from the server to itself.
Now this is really important because there are two things to think about when dealing with firewall rules: the first is that each connection between a client and a server has two components, the request and the response—so the request is from a client to a server and the response is from a server to a client.
The response is always the inverse direction to the request, but the direction of the request isn't always outbound and isn't always inbound—it depends on what that data is together with your perspective, and that's what I want to talk about a bit more on the next screen.
Let's look at this more complex example; we still have Bob and his laptop along with the Catagram server, but now we also have a software update server on the bottom left.
Now the Catagram server is inside a subnet which is protected by a firewall, and specifically this is a stateless firewall; a stateless firewall means that it doesn't understand the state of connections.
What this means is that it sees the request connection from Bob's laptop to Catagram and the response from Catagram to Bob's laptop as two individual parts, and you need to think about allowing or denying them as two parts; you need two rules, in this case one inbound rule for the request and one outbound rule for the response.
This is obviously more management overhead; two rules are needed for each thing which you as a human see as one connection. But it gets slightly more confusing than that.
For connections to the Catagram server, so for example when Bob's laptop is making a request, that request is inbound to the Catagram server, and the response, logically enough, is outbound, sending data back to Bob's laptop. But it's also possible to have the inverse.
Consider the situation where the Catagram server is performing software updates; in this situation the request will be from the Catagram server to the software update server, so outbound, and the response will be from the software update server to the Catagram server, so this is inbound.
So when you're thinking about this, start with the request—is the request coming to you or going to somewhere else?—the response will always be in the reverse direction.
So this situation also requires two firewall rules—one outbound for the request and one inbound for the response.
Now there are two really important points I want to make about stateless firewalls. First, for any servers which both accept connections and initiate connections, which is common with web servers that need to accept connections from clients but also need to perform software updates, you'll have to deal with two rules for each of these, and they will need to be the inverse of each other.
So get used to thinking that outbound rules can be both the request and the response, and inbound rules can also be the request and the response; it's initially confusing, but just remember, start by determining the direction of the request, and then always keep in mind that with stateless firewalls you're going to need an inverse rule for the response.
Now the second important thing is that the request component is always going to be to a well-known port; if you're managing the firewall for the Catagram application, you'll need to allow connections to TCP port 443.
The response though is always from the server to a client, but this always uses a random ephemeral port; because the firewall is stateless, it has no way of knowing which specific port is used for the response, so you'll often have to allow the full range of ephemeral ports to any destination.
This makes security engineers uneasy, which is why stateful firewalls, which I'll be talking about next, are much better.
Just focus on these two key elements—that every connection has a request and a response, and together with those keep in mind the fact that they can both be in either direction, so a request can be inbound or outbound, and a response will always be the inverse to the directionality of the request.
Also keep in mind that any rules you create for the response will often need to allow the full range of ephemeral ports; that's not a problem with stateful firewalls, which I want to cover next.
So we're going to use the same architecture; we've got Bob's laptop on the top left, the Catagram server on the middle right, and the software update server on the bottom left.
A stateful firewall is intelligent enough to identify the response for a given request; since the ports and IPs are the same, it can link one to the other, and this means that for a specific request from Bob's laptop to the Catagram server, the firewall automatically knows which data is the response. The same is true for software updates; for a given request to a software update server, the firewall is smart enough to recognise the response, the return data coming from the software update server back to the Catagram server.
And this means that with a stateful firewall, you'll generally only have to allow the request or not, and the response will be allowed or not automatically.
This significantly reduces the admin overhead and the chance for mistakes, because you just have to think in terms of the directionality and the IPs and ports of the request, and it handles everything else.
In addition, you don't need to allow the full ephemeral port range, because the firewall can identify which port is being used, and implicitly allow it based on it being the response to a request that you allow.
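Here's a deliberately simplified Python sketch of the difference; nothing AWS-specific, just the idea that a stateless firewall needs explicit rules in both directions (with the response rule covering the whole ephemeral range), while a stateful firewall remembers the request and implicitly allows its response.

# Stateless: two explicit rules, and the response rule must allow the full
# ephemeral range because the firewall can't know which port the client chose.
stateless_rules = [
    ("inbound", "tcp", 443),                  # the request to the web server
    ("outbound", "tcp", range(1024, 65536)),  # the response, ephemeral ports
]

# Stateful: track the request, then allow its reverse automatically.
tracked = set()

def allow_request(src_ip, src_port, dst_ip, dst_port):
    tracked.add((src_ip, src_port, dst_ip, dst_port))

def response_allowed(src_ip, src_port, dst_ip, dst_port):
    # A response is allowed if it is the exact reverse of a tracked request.
    return (dst_ip, dst_port, src_ip, src_port) in tracked

allow_request("119.18.36.73", 41310, "1.3.3.7", 443)
print(response_allowed("1.3.3.7", 443, "119.18.36.73", 41310))  # True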
Okay, so that's how stateless and stateful firewalls work. Now this has been a little bit abstract, but that's been intentional, because I want you to understand how they work conceptually before I go into more detail with regards to how AWS implements both of these different types of firewall.
Now at this point, I've finished with the abstract description, so go ahead and finish this video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, and in this lesson, I want to continue the theme of VPC networking in AWS by covering VPC subnets.
Now subnets are what services run from inside VPCs, and they're how you add structure, functionality, and resilience to VPCs, so they're an important thing to get right, both for production deployment and to do well in the exam.
So let's not waste time, let's jump in and get started.
In this lesson, we'll be starting off with this architecture, which is exactly how we left off at the end of the previous lesson—a framework VPC, a skeleton—and what we'll be doing is creating an internal structure using subnets, turning this into this.
Now if you compare this diagram to the one that I've linked previously, you might notice that the web tier subnets on the right are blue on this diagram instead of green on the diagram that I previously linked; and with AWS diagrams, blue means private subnets and green means public subnets.
Subnets inside a VPC start off entirely private and they take some configuration to make them public, so at this point, the subnets which will be created on the right—the web tier—will be created as private subnets, and in the following lessons, we'll change that together; so for now, this diagram showing them as private subnets is correct.
So what exactly is a subnet?
It's an AZ-resilient feature of a VPC: a subnetwork of the VPC, a part of the VPC that's inside a specific availability zone. It's created within one availability zone and that can never be changed, because it runs inside of an availability zone.
If that availability zone fails, then the subnet itself fails, and so do any services that are only hosted in that one subnet, and as AWS Solutions Architects, when we design highly available architectures, we're trying to put different components of our system into different availability zones to make sure that if one fails, our entire system doesn't fail.
And the way that we do that is to put these components of our infrastructure into different subnets, each of which are located in a specific availability zone.
The relationship between subnets and availability zones is that one subnet is created in a specific availability zone in that region, it can never be changed, and a subnet can never be in multiple availability zones—that's the important one to remember for the exam: one subnet is in one availability zone, and a subnet can never be in more than one availability zone.
Logically, though, one availability zone can have zero or lots of subnets—so one subnet is in one availability zone, but one availability zone can have many subnets.
Now, the subnet by default uses IP version 4 networking and it's allocated an IPv4 CIDR, which is a subset of the VPC CIDR block and has to be within the range that's allocated to the VPC.
What's more, the CIDR the subnet uses cannot overlap with any other subnets in that VPC; they have to be non-overlapping, and that's another topic which tends to come up all the time in the exam.
Now, a subnet can optionally be allocated an IPv6 CIDR block as long as the VPC is also enabled for IP version 6, and the range that's allocated to individual subnets is a /64 range, which is a subset of the VPC's /56 range; a /56 IPv6 range has enough space for 256 /64 ranges, one for each subnet that needs one.
Now, subnets inside the VPC can, by default, communicate with other subnets in that same VPC, since the isolation of the VPC is at the perimeter of the VPC, and internally, there is free communication between subnets by default.
Now, we spoke in previous lessons about sizing—so sizes of networks are based on the prefix.
For example, a /24 network allows values from 0 to 255 in the fourth octet, so that's 256 possible IPs, but inside a subnet you don't get to use them all because some IPs inside every VPC subnet are reserved.
So let's look at those next.
There are five IP addresses within every VPC subnet that you can't use, so whatever the size of the subnet, the usable IPs are five less than you would expect.
Let's assume, for example, that the subnet we're talking about is 10.16.16.0/20, so this has a range of 10.16.16.0 to 10.16.31.255.
The first address which is unusable is the network address—the first address of any subnet represents the network, the starting address of the network, and it can't be used.
This isn't specific to AWS; it's the case for other IP networks as well: nothing uses the first address on a network.
Next is what's known as the network plus one address, the first IP after the network address, and in AWS this is used by the VPC router—the logical network device which moves data between subnets and in and out of the VPC if it's configured to allow that.
The VPC router has a network interface in every subnet, and it uses this network plus one address.
Next is another AWS-specific IP address which can't be used, called the network plus two address.
In a VPC, the second usable address of the VPC range is used for DNS, and AWS reserves the network plus two address in every subnet.
So I've put DNS and an asterisk here because I refer to this reservation as the DNS reservation, but strictly speaking, it's the second address in a VPC which is used for DNS—so that's the VPC range plus two—but AWS does reserve the network plus two address in every single subnet, so you need to be aware of that.
And there's one more AWS-specific address that you can't use, and you guessed it, it's a network plus three address, which doesn't have a use yet, but is reserved for future requirements—this is the network plus three, and this is 10.16.16.3.
And then lastly, the final IP address that can't be used in every VPC subnet is the network broadcast address—broadcasts are not supported inside a VPC, but the last IP address in every subnet is reserved regardless, so you cannot use this last address.
So this makes a total of five IP addresses in every subnet that you can't use—three AWS-specific ones, and then the network and broadcast addresses.
So if the subnet should have 16 IPs, it actually has 11 usable IPs, so keep this in mind, especially when you're creating smaller VPCs and subnets because this can quickly eat up IP addresses, especially if you use small VPCs with lots of subnets.
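If you want to see those reservations worked through for the 10.16.16.0/20 example, here's a short sketch using Python's ipaddress module.

import ipaddress

subnet = ipaddress.ip_network("10.16.16.0/20")

reserved = {
    "network address": subnet.network_address,       # 10.16.16.0
    "VPC router (+1)": subnet.network_address + 1,   # 10.16.16.1
    "DNS (+2)":        subnet.network_address + 2,   # 10.16.16.2
    "future use (+3)": subnet.network_address + 3,   # 10.16.16.3
    "broadcast":       subnet.broadcast_address,     # 10.16.31.255
}

for name, ip in reserved.items():
    print(f"{name:16} {ip}")

print("usable addresses:", subnet.num_addresses - 5)  # 4096 - 5 = 4091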
Now a VPC has a configuration object applied to it called a DHCP option set—DHCP stands for Dynamic Host Configuration Protocol, and it's how computing devices receive IP addresses automatically.
Now there's one DHCP option set applied to a VPC at one time, and this configuration flows through to subnets; it controls things like DNS servers, NTP servers, NetBIOS servers, and a few other things.
If you've ever managed a DHCP server, this will be familiar—so for every VPC, there's a DHCP option set that's linked to it and that can be changed.
You can create option sets, but you cannot edit them, so keep in mind: if you want to change the settings, you need to create a new one and then change the VPC allocation to this new one.
On every subnet, you can also define two important IP allocation options.
The first option controls if resources in a subnet are allocated a public IP version 4 address in addition to their private subnet address automatically.
Now I'm going to be covering this in a lesson on routing and internet gateway, because there's some additional theory that you need to understand about public IP version 4 addresses, but this is one of the steps that you need to do to make a subnet public—so it's on a per-subnet basis that you can set auto-assign public IP version 4 addresses.
Another related option defined at a subnet level is whether resources deployed into that subnet are also given an IP version 6 address, and logically, for that to work, the subnet has to have an allocation—as does the VPC—but both of these options are defined at a subnet level and flow onto any resources inside that subnet.
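As a minimal sketch of where those two per-subnet options live, here's how they might be set with Python and boto3; the subnet ID is a placeholder, and each attribute has to be modified in its own call.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
subnet_id = "subnet-0123456789abcdef0"   # placeholder subnet ID

# Auto-assign a public IPv4 address to resources launched into this subnet.
ec2.modify_subnet_attribute(
    SubnetId=subnet_id,
    MapPublicIpOnLaunch={"Value": True},
)

# Auto-assign an IPv6 address on creation (the subnet needs an IPv6 CIDR).
ec2.modify_subnet_attribute(
    SubnetId=subnet_id,
    AssignIpv6AddressOnCreation={"Value": True},
)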
Okay, so now it's time for a demo—that's all the theory that I wanted to cover in this VPC subnet lesson.
So in the demo lesson, we're going to implement the structure inside VPC together—we're essentially going to change this skeleton VPC into a multi-tier VPC that's configured with all of these subnets.
Now it's going to be a fairly detailed demo lesson—you’re going to have to create all of these 12 subnets manually one by one—and out of all the lessons, the detail really matters on this one.
We need to make sure that you configure this exactly as required so you don't have any issues in future.
Now if you do make any mistakes, I'm going to make sure that I supply a CloudFormation template with the next lesson that allows you to configure this in future automatically.
But the first time that you do this lesson, I do want you to do it manually because you need to get used to the process of creating subnets—controlling what the IP ranges are, being able to select which availability zone they go in, and knowing how to assign IP version 6 ranges to those subnets.
So it is worthwhile investing the time to create each of these 12 subnets manually, and that's what we're going to do in the next demo lesson.
But at this point, go ahead and complete this lesson, and then when you've got the time, I'll see you in the next demo lesson where we'll complete the configuration of this VPC by creating the subnets.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. Over the remaining lessons in this section, you're going to learn how to build a complex, multi-tier, custom VPC step by step. One of the benefits of the VPC product is that you can start off simple and layer components in piece by piece. This lesson will focus on just the VPC shell, but by the end of this section, you'll be 100% comfortable building a pretty complex private network inside AWS. So let's get started.
Now, don't get scared off by this diagram, but this is what we're going to implement together in this section of the course. Right now, it might look complicated, but it's like building a Lego project: we'll start off simple and add more and more complexity as we go through the section. This is a multi-tier, custom VPC. If you look at the IP plan document that I linked in the last lesson, it's using the first range of US Region 1 for the general account, so 10.16.0.0/16, and the VPC will be configured to use that range. Inside the VPC, there'll be space for four tiers running in four availability zones for a total of 16 possible subnets.
Now, we'll be creating all four tiers, so reserved, database, app, and web, but only three availability zones, A, B, and C. We won't be creating any subnets in the capacity reserved for the future availability zone, so that's the part at the bottom here. In addition to the VPC that we'll create in this lesson and the subnets that we'll create in the following lessons, as we work through this section of the course we'll also be creating an internet gateway, which will give resources in the VPC public access. We'll be creating NAT gateways which will give private instances outgoing-only access, and we'll be creating a bastion host, which is one way that we can connect into the VPC.
Now, using bastion hosts is frowned upon and isn't best security practice for getting access to AWS VPCs, but it's important that you understand how not to do something in order to appreciate good architectural design. So I'm going to step you through how to implement a bastion host in this part of the course, and as we move through later sections of the course, you'll learn more secure alternatives. Finally, later on in the section, we'll also be looking at network access control lists, known as NACLs, which can be used to secure the VPC, as well as data transfer costs for any data that moves in and around the VPC.
Now, this might look intimidating, but don't worry, I'll be explaining everything every step of the way. To start with though, we're going to keep it simple and just create the VPC. Before we do create a VPC, I want to cover some essential architectural theory, so let's get started with that.
VPCs are a regionally isolated and regionally resilient service. A VPC is created in a region and it operates from all of the AZs in that region. It allows you to create isolated networks inside AWS, so even in a single region in an account, you can have multiple isolated networks. Nothing is allowed in or out of a VPC without a piece of explicit configuration. It's a network boundary and it provides an isolated blast radius. What I mean by this is if you have a problem inside a VPC, so if one resource or a set of resources are exploited, the impact is limited to that VPC or anything that you have connected to it.
I talked earlier in the course about the default VPC being set up by AWS using the same static structure of one subnet per availability zone, using the same IP address ranges, and requiring no configuration from the account administrator. Well, custom VPCs are pretty much the opposite of that. They let you create networks with almost any configuration, which can range from a simple VPC to a complex multi-tier one such as the one that we're creating in this section. Custom VPCs also support hybrid networking, which lets you connect your VPC to other cloud platforms as well as on-premises networks, and we'll cover that later on in the course.
When you create a VPC, you have the option of picking default or dedicated tenancy. This controls whether the resources created inside the VPC are provisioned on shared hardware or dedicated hardware. So be really careful with this option. If you pick default, then you can choose on a per-resource basis later on, when you provision resources, whether they go on shared hardware or dedicated hardware. If you pick dedicated tenancy at a VPC level, then that's locked in: any resources that you create inside that VPC have to be on dedicated hardware. So you need to be really careful with this option because dedicated tenancy comes at a cost premium, and my rule on this is unless you really know that you require dedicated, then pick default, which is the default option.
Now, a VPC can use IPv4 private and public IPs. The private CIDR block is the main method of IP communication for the VPC, so by default, everything uses these private addresses. Public IPs are used when you want to make resources public, when you want them to communicate with the public internet or the AWS public zone, or you want to allow communication to them from the public internet. Now, a VPC is allocated one mandatory private IPv4 CIDR block; this is configured when you create the VPC, which you'll see in a moment when we actually create a VPC.
Now, this primary block has two main restrictions: it can be at its smallest a /28 prefix, meaning the entire VPC has 16 IP addresses (and some of those can't be used; more on that in the next lesson when I talk about subnets), and at the largest, a VPC can use a /16 prefix, which is 65,536 IPs. Now, you can add secondary IPv4 CIDR blocks after creation; by default, at the time of creating this lesson, there's a maximum of five of those, but that can be increased using a support ticket. But generally, when you're thinking conceptually about a VPC, just imagine that it's got a pool of private IPv4 addresses, and optionally, it can use public addresses.
Now, another optional configuration is that a VPC can be configured to use IPv6 by assigning a /56 IPv6 CIDR to the VPC. Now, this is a feature set which is still evolving, so not everything works with the same level of features as it does for IPv4, but with the increasing worldwide usage of IPv6, in most circumstances, you should start looking at applying an IPv6 range as a default. An important thing about IPv6 is that the range is either allocated by AWS, as in you have no choice on which range to use, or you can select to use your own IPv6 addresses, addresses which you own. You can't pick a block like you can with IPv4: either let AWS assign it or you use addresses that you own.
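To make that concrete, here's a minimal sketch in Python with boto3 which creates a VPC with a /16 primary IPv4 CIDR, default tenancy, and an Amazon-provided /56 IPv6 range; the region is just an example.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc = ec2.create_vpc(
    CidrBlock="10.16.0.0/16",          # primary IPv4 CIDR, between /28 and /16
    AmazonProvidedIpv6CidrBlock=True,  # request an Amazon-assigned /56 IPv6 range
    InstanceTenancy="default",         # "dedicated" would lock the whole VPC to dedicated hardware
)
print(vpc["Vpc"]["VpcId"])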
Now, IP version 6 IPs don't have the concept of private and public—the range of IP version 6 addresses that AWS uses are all publicly routable by default. But if you do use them, you still have to explicitly allow connectivity to and from the public internet. So don't worry about security concerns—it just removes an admin overhead because you don't need to worry about this distinction between public and private.
Now, AWS VPCs also have fully featured DNS. It's provided by Route 53, and inside the VPC, it's available on the base IP address of the VPC plus 2. So if the VPC is 10.0.0.0, the DNS IP will be 10.0.0.2. Now, there are two options which are critical for how DNS functions in a VPC, so I've highlighted both of them. The first is a setting called enable DNS hostnames, and this indicates whether instances with public IP addresses in a VPC are given public DNS host names. So if this is set to true, then instances do get public DNS host names. If it's not set to true, they don't.
The second option is enable DNS support, and this indicates whether DNS is enabled or disabled in the VPC—so DNS resolution. If it is enabled, then instances in the VPC can use the DNS IP address, so the VPC plus 2 IP address. If this is set to false, then this is not available. Now, why I mention both of these is if you do have any questions in the exam or any real-world situations where you're having DNS issues, these two should be the first settings that you check, switched on or off as appropriate. And in the demo part of this lesson, I'll show you where to access those.
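Here's a short sketch in Python with boto3 of checking and setting those two options; the VPC ID is a placeholder, and note that each attribute is modified in a separate call.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = "vpc-0123456789abcdef0"   # placeholder VPC ID

# Enable DNS resolution and DNS hostnames for the VPC.
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

# Checking the current values is a sensible first troubleshooting step.
print(ec2.describe_vpc_attribute(VpcId=vpc_id, Attribute="enableDnsSupport"))
print(ec2.describe_vpc_attribute(VpcId=vpc_id, Attribute="enableDnsHostnames"))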
Speaking of which, it's now time for the demo component of this lesson, and we're going to implement the framework of VPC for the Animals for Life organization together inside our AWS account. So let's go ahead and finish the theory part of this lesson right now, and then in the next lesson, the demo part will implement this VPC together.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back, this is part two of this lesson, and we're going to continue immediately from the end of part one, so let's get started. That's a good starting point for our plan, but before I elaborate more on that plan though, let's think about VPC sizing and structure.
AWS provides some useful pointers on VPC sizing, which I'll link to in the lesson text, but I also want to talk about it briefly in this lesson. They define micro as a /24 VPC with eight subnets inside it, each subnet being a /27, which gives 27 usable IP addresses per subnet and a total of 216. This goes all the way through to extra large, which is a /16 VPC with 16 subnets inside, each of which is a /20, offering 4,091 usable IP addresses per subnet, for a total of just over 65,000.
When deciding which to use, there are two important questions: first, how many subnets will you need in each VPC? And second, how many IP addresses will you need in total, and how many IP addresses in each subnet?
Now deciding how many subnets to use, there's actually a method that I use all the time, which makes it easier, so let's look at that next. So this is the shell of a VPC, but you can't just use a VPC to launch services into—that's not how it works in AWS. Services use subnets, which are where IP addresses are allocated from; VPC services run from within subnets, not directly from the VPC.
And if you remember, all the way back at the start of the course where I introduced VPCs and subnets, I mentioned that a subnet is located in one availability zone. So the first decision point that you need to think about is how many availability zones your VPC will use. This decision impacts high availability and resilience, and it depends somewhat on the region that the VPC is in, since some regions are limited in how many availability zones they have.
So step one is to pick how many availability zones your VPC will use. Now I'll spoil this and make it easy: I always start with three as my default. Why? Because it will work in almost any region, and I also always add a spare, because we all know at some point things grow, so I aim for at least one spare. And this means a minimum of four availability zones: A, B, C, and the spare. If you think about it, that means we have to split the VPC into at least four smaller networks, so if we started with a /16, we would now have four /18s.
As well as the availability zones inside a VPC, we also have tiers, and tiers are the different types of infrastructure that are running inside that VPC. We might have a web tier, an application tier, and a database tier; that makes three, and you should always add a buffer. So my default is to start with four tiers: web, application, database, and a spare. Now the tiers your architecture needs might be different, but my default for most designs is to use three plus a spare: web, application, database, and then a spare for future use.
If you only used one availability zone, then each tier would need its own subnet, meaning four subnets in total. But we also have four availability zones, and since we want to take full advantage of the resiliency provided by these AZs, we need the same base networking duplicated in each availability zone. So each tier has its own subnet in each availability zone: four web subnets, four app subnets, four database subnets, and four spare subnets, for a total of 16 subnets.
So if we chose a /16 for the VPC, that would mean that each of the 16 subnets would need to fit into that /16. So a /16 VPC split into 16 subnets results in 16 smaller network ranges, each of which is a /20. Remember, each time the prefix is increased—from 16 to 17, it creates two networks; from 16 to 18, it creates four; from 16 to 19, it creates eight; from 16 to 20, it creates 16 smaller networks.
Now that we know that we need 16 subnets, we could start with a /17 VPC, and then each subnet would be a /21, or we could start with a /18 VPC, and then each subnet would be a /22, and so on. Now that you know the number of subnets, and because of that, the size of the subnets in relation to the VPC prefix size, picking the size of the VPC is all about how much capacity you need. Whatever prefix you pick for the VPC, the subnets will be four prefix steps smaller.
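If you'd like to sanity-check that arithmetic, here's a quick sketch with Python's ipaddress module splitting a /16 VPC into its sixteen /20 subnets.

import ipaddress

vpc = ipaddress.ip_network("10.16.0.0/16")

# Splitting a /16 into 16 subnets means moving the prefix four steps, to /20.
subnets = list(vpc.subnets(new_prefix=20))

print(len(subnets))                  # 16
print(subnets[0], subnets[-1])       # 10.16.0.0/20 10.16.240.0/20
print(subnets[0].num_addresses - 5)  # 4091 usable once AWS reserves 5 per subnet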
So let's move on to the last part of this lesson, where we're going to be deciding exactly what to use. Now, Animals for Life is a global organization already, but with what's happening environmentally around the world, the business could grow significantly, and so when designing the IP plans for the business, we need to assume a huge level of growth.
We've talked about a preference for the 10 range, but avoiding the common networks and avoiding Google gives us 10.16 through 10.127 to use as /16 networks. We have five regions that we're going to be assuming the business will use: three to be chosen in the US, one in Europe, and one in Australia. So if we start at 10.16 and break this down into segments, we could choose to use 10.16 to 10.31 as US Region 1, 10.32 to 10.47 as US Region 2, 10.48 to 10.63 as US Region 3, 10.64 to 10.79 as Europe, and 10.80 to 10.95 as Australia; that is a total of 16 /16 network ranges for each region.
Now, we have a total of three accounts right now (general, prod, and dev), and let's add one more as a buffer, so that's four total accounts. So if we break down those ranges that we've got for each region into four, one for each account, then each account in each region gets four /16 ranges, enough for four VPCs per region per account.
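Here's a small Python sketch which enumerates that allocation: five regions, four accounts, four /16 ranges each, starting at 10.16. The region and account names are just illustrative labels, not AWS identifiers.

import ipaddress

regions = ["us-region-1", "us-region-2", "us-region-3", "europe", "australia"]
accounts = ["general", "prod", "dev", "reserved"]

plan = {}
second_octet = 16                      # start the plan at 10.16.0.0
for region in regions:
    for account in accounts:
        plan[(region, account)] = [
            ipaddress.ip_network(f"10.{second_octet + i}.0.0/16") for i in range(4)
        ]
        second_octet += 4

print(plan[("us-region-1", "general")])   # 10.16 - 10.19
print(plan[("australia", "reserved")])    # 10.92 - 10.95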
So I've created a PDF which is attached to this lesson and is also in this lesson's folder on the course GitHub repository. If you go into VPC-basics, in there is a folder called VPC-Sizing and Structure, and in that folder is a document called A4L, for Animals for Life, underscore IP plan, dot PDF, and this is that document. I've just tried to document here exactly what we've done with these different ranges: starting at the top, we've blocked off all of the networks which are common ranges to avoid, and we're starting at 10.16 for Animals for Life. Then, starting at 10.16, I've blocked off 16 /16 networks for each region, so US Region 1, Region 2, Region 3, Europe, and Australia, and the ranges we're left with after that remain unused and are reserved.
After that, from 10.128 onwards, that's reserved for the Google Cloud usage, which we're uncertain about, so all the way to the end, that's blocked off. And then within each region, we've got the three Animals for Life AWS accounts that we know about (general, prod, and dev) and then one set reserved for future use. So in each region, each of those accounts has four Class B-sized networks, enough for four non-overlapping VPCs.
So feel free to look through this document; I've included the PDF and the original source document, so feel free to use it, adjust it for your own network, and just experiment with some IP planning. This is the type of document that I'd use as a starting point for any large AWS deployment. I'm going to be using this throughout this course to plan the IP address ranges whenever we're creating a VPC; we obviously won't be using all of them, but we will be using this as a foundation.
Now based on that plan, that means we have a /16 range to use for each VPC in each account in each region, and these are non-overlapping. Now I'm going to be using the VPC structure that I demonstrated earlier in this lesson, so we'll be assuming the usage of three availability zones plus a spare, and three application tiers plus a spare. This means that each VPC is broken down into a total of 16 subnets, and each of those subnets is a /20, which represents 4,091 usable IP addresses per subnet.
Now this might seem excessive, but we have to assume the highest possible growth potential for Animals for Life; we've got the potential growth of the business, the current situation with the environment, and the rising profile of animal welfare globally, so there is a potential that this business could grow rapidly.
This process might seem vague and abstract, but it's something that you'll need to do every time you create a well-designed environment in AWS. You'll consider the business needs, you'll avoid the ranges that you can't use, you'll allocate the remainder based on your business's physical or logical layout, and then you'll decide upon and create the VPC and subnet structure from there. You'll always work either top-down or bottom-up: you can start with the minimum subnet size that you need and work up, or start with the business requirements and work down.
When we start creating VPCs and services from now on in the course, we will be using this structure, and so I will be referring back to this lesson and that PDF document constantly, so you might want to save it somewhere safe or print it out—make sure you've got a copy handy because we will be referring back to it constantly as we're deciding upon our network topology throughout the course.
With that being said, though, that's everything I wanted to cover in this lesson. I hope it's been useful. I know it's been a little bit abstract, but I wanted to step you through the process that a real-world solutions architect would use when deciding on the size of VPCs and subnets, as well as the structure these network components have in relation to each other within an IP plan. At this point, that is it with the abstract theory; from this point onward in this section of the course, we're going to start talking about the technical aspects of AWS private networking, starting with VPCs and VPC subnets. So go ahead, complete this video, and when you're ready, you can move on to the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson, I'm going to cover a topic that many courses don't bother with — how to design a well-structured and scalable network inside AWS using a VPC. Now, this lesson isn't about the technical side of VPC; it's about how to design an IP plan for a business, which includes how to design an individual network within that plan, which when running in AWS means designing a VPC. So let's get started and take a look, because this is really important to understand, especially if you're looking to design real-world solutions or if you're looking to identify any problems or performance issues in exam questions.
Now, during this section of the course, you'll be learning about and creating a custom VPC, a private network inside AWS. When creating a VPC, one of the first things you'll need to decide on is the IP range that the VPC will use, the VPC CIDR. You can add more than one, but if you take architecture seriously, you need to know what range the VPC will use in advance; even if that range is made up of multiple smaller ranges, you need to think about this stuff in advance. Deciding on an IP plan and VPC structure in advance is one of the most critically important things you will do as a solutions architect, because it's not easy to change later and it will cause you a world of pain if you don't get it right.
Now, when you start this design process, there are a few things that you need to keep in mind. First, what size should the VPC be? This influences how many things, how many services, can fit into that VPC; each service has one or more IPs and they occupy space inside a VPC. Secondly, you need to consider all the networks that you'll use or that you'll need to interact with. In the previous lesson, I mentioned that overlapping or duplicate ranges would make network communication difficult, so choosing wisely at this stage is essential. Be mindful of ranges that other VPCs use, ranges which are utilized in other cloud environments, on on-premises networks, and even by partners and vendors; try to avoid ranges used by other parties that you might need to interact with, and be cautious: if in doubt, assume the worst.
You should also aim to predict what could happen in the future; what the situation is now is important, but we all know that things change, so consider what things could be like in the future. You also need to consider the structure of the VPC: for a given IP range that we allocate to a VPC, it will need to be broken down further. Every IT network will have tiers; web tier, application tier, and database tier are three common examples, but there are more, and these will depend on your exact IT architecture. Tiers are things which separate application components and allow different security to be applied, for example.
Modern IT systems also have different resiliency zones, known as Availability Zones in AWS — networks are often split, and parts of that network are assigned to each of these zones. These are my starting points for any systems design. As you can see, it goes beyond the technical considerations, and rightfully so — a good solid infrastructure platform is just as much about a good design as it is about a good technical implementation.
So since this course is structured around a scenario, what do we know about the Animals for Life organization so far? We know that the organization has three major offices — London, New York and Seattle — that will be three IP address ranges which we know are required for our global network. We don't know what those networks are yet, but as Solutions Architects, we can find out by talking to the IT staff of the business. We know that the organization has field workers who are distributed globally, and so they'll consume services from a range of locations — but how will they connect to the business? Will they access services via web apps? Will they connect to the business networks using a virtual private network or VPN? We don't know, but again, we can ask the question to get this information.
What we do know is that the business has three networks which already exist — 192.168.10.0/24, which is the business's on-premise network in Brisbane; 10.0.0.0/16, which is the network used by an existing AWS pilot; and finally, 172.31.0.0/16, which is used in an existing Azure pilot. These are all ranges our new AWS network design cannot use and also cannot overlap with. We might need to access data in these networks, we might need to migrate data from these networks, or in the case of the on-premises network, it will need to access our new AWS deployment, so we have to avoid these three ranges. And this information that we have here is our starting point, but we can obtain more by asking the business.
Based on what we already know, we have to avoid 192.168.10.0/24, we have to avoid 10.0.0.0/16, and we have to avoid 172.31.0.0/16; these are confirmed networks that are already in use. Let's also assume that we've contacted the business and identified the other on-premises networks which are in use: 192.168.15.0/24 is used by the London office, 192.168.20.0/24 is used by the New York office, and 192.168.25.0/24 is used by the Seattle office. We've also received some disturbing news: the vendor who previously helped Animals for Life with their Google Cloud proof of concept cannot confirm which networks are in use in Google Cloud, but what they have told us is that the default range is 10.128.0.0/9, and this is a huge amount of IP address space; it starts at 10.128.0.0 and runs all the way through to 10.255.255.255, and so we can't use any of that if we're trying to be safe, which we are.
So this list would be my starting point — when I'm designing an IP addressing plan for this business, I would not use any of this IP address space. Now I want you to take a moment — pause the video if needed — and make sure you understand why each of these ranges can't be used. Start trying to become familiar with how the network address and the prefix map onto the range of addresses that the network uses — you know that the IP address represents the start of that range. Can you start to see how the prefix helps you understand the end of that range?
Now with the bottom example for Google, remember that a /8 is one fixed value for the first octet of the IP and then anything else — Google's default uses /9, which is half of that, so it starts at 10.128 and uses the remainder of that 10. space, so 10.128 through to 10.255. And also, an interesting fact — the Azure network is using the same IP address range as the AWS default VPC uses, so 172.31.0.0, and that means that we can't use the default VPC for anything production, which is fine because as I talked about earlier in the course, as architects, where possible, we avoid using the default VPC.
So at this point, if this was a production process, if we were really designing this for a real organization, we'd be starting to get a picture of what to avoid — so now it's time to focus on what to pick. Now, there is a limit on VPC sizing in AWS — a VPC can be at the smallest /28 network, so that's 16 IP addresses in total, and at most, it can be a /16 network, which is just over 65,000 IP addresses. Now, I do have a personal preference, which is to use networks in the 10 range — so 10.x.y.z — and given the maximum VPC size, this means that each of these /16 networks in this range would be 10.1, 10.2, 10.3, all the way through to 10.255.
I also find it important to avoid common ranges — in my experience, this is logically 10.0, because everybody uses that as a default, and 10.1, because as human beings, everybody picks that one to avoid 10.0. I'd also avoid anything up to and including 10.10 to be safe, and just because I like base 2 numbers, I would suggest a starting point of 10.16.
With this starting point in mind, we need to start thinking about the IP plan for the Animals for Life business — we need to consider the number of networks that the business will need, because we'll allocate these networks starting from this 10.16 range. Now, the way I normally determine how many ranges a business requires is to start by thinking about how many AWS regions the business will operate in — be cautious here and think of the highest possible number of regions that a business could ever operate in, and then add a few as a buffer. At this point, we're going to be pre-allocating things in our IP plan, so caution is the term of the day.
I suggest ensuring that you have at least two ranges which can be used in each region in each AWS account that your business uses. For Animals for Life, we really don't yet know how many regions the business will be operating in, but we can make an educated guess and then add some buffer to protect us against any growth — let's assume that the maximum number of regions the business will use is three regions in the US, one in Europe, and one in Australia. That's a total of five regions; we want to have two ranges in each region, so that's a total of five times two — 10 ranges. And we also need to make sure that we've got enough for all of our AWS accounts, so I'm going to assume four AWS accounts — that's two ranges in each of five regions, so 10, and then 10 in each of four accounts, for a total of ideally 40 IP ranges.
So to summarise where we are — we're going to use the 10 range, we're going to avoid 10.0 to 10.10 because they're far too common, we're going to start at 10.16 because that's a nice, clean, base-2 number, and we can't use 10.128 through to 10.255 because potentially that's used by Google Cloud. So that gives us a range of possibilities from 10.16 to 10.127 inclusive, which we can use to create our networks — and that's plenty.
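To make this concrete, here's a minimal sketch using Python's standard ipaddress module. It encodes the ranges from this scenario that we've decided to avoid (treating everything below 10.16 as one excluded block) and lists the /16 candidates that remain; the specific ranges are the ones discussed above, so adjust them for your own environment.

```python
import ipaddress

# Ranges the scenario tells us to avoid: existing networks, the Google Cloud
# default, and the "too common" 10.0.x.x - 10.15.x.x space below our 10.16 start.
reserved = [ipaddress.ip_network(n) for n in (
    "192.168.10.0/24",   # Brisbane on-premises
    "10.0.0.0/16",       # existing AWS pilot
    "172.31.0.0/16",     # existing Azure pilot (also the AWS default VPC range)
    "192.168.15.0/24",   # London office
    "192.168.20.0/24",   # New York office
    "192.168.25.0/24",   # Seattle office
    "10.128.0.0/9",      # Google Cloud default (10.128.0.0 - 10.255.255.255)
    "10.0.0.0/12",       # 10.0.0.0 - 10.15.255.255, excluded as too common
)]

# Candidate /16 networks inside 10.0.0.0/8 that avoid every reserved range.
candidates = [net for net in ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=16)
              if not any(net.overlaps(r) for r in reserved)]

print(len(candidates), "usable /16 ranges")   # 112
print(candidates[0], "to", candidates[-1])    # 10.16.0.0/16 to 10.127.0.0/16
```

Running this confirms the same conclusion as the lesson: 10.16.0.0/16 through 10.127.0.0/16 remain available, which is far more than the 40 ranges we estimated needing.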
Okay, so this is the end of part one of this lesson — it's getting a little bit on the long side, and so I wanted to add a break. Go ahead and complete this video, and when you're ready, part two will continue from this exact point.
Welcome back and in this lesson I want to quickly touch on a feature of S3 known as S3 Requester Pays. Now it will be far easier to show you visually rather than talk about it, so let's jump into an architecture visual and get started.
Now to illustrate how this works I want to step through a scenario; let's call it the tale of two buckets. We have a normal bucket and a Requester Pays bucket. Now the normal bucket belongs to Julie and the Requester Pays bucket belongs to Mike. Julie and Mike are both intending to host large data sets of animal pictures for some machine learning projects, and so they upload data into their S3 buckets.
Now regardless of whether this is a normal bucket or a Requester Pays bucket, both Mike and Julie would be responsible for any cost of this activity, but as transfer into S3 is free of charge, neither Mike nor Julie is charged anything for this activity by AWS.
Now they are both storing large amounts of data in their buckets at this point, and so both of them receive a per-gigabyte-per-month charge for data storage within their buckets, but S3 is pretty economical and so this isn't a huge charge, even for large quantities of data.
Now this is where things change; this is where Julie becomes less happy and Mike can relax. Mike has changed a bucket setting for Requester Pays, and he's changed the value from owner to requester. Now this is a per-bucket setting, and enabling this option means that Mike now has a number of considerations, the main one being that he can no longer use static website hosting or BitTorrent, because to achieve the benefit of Requester Pays he needs authenticated identities to use the bucket, and with BitTorrent and static website hosting, people accessing the bucket aren't using any form of authentication.
Now let's assume at this point that for both Mike and Julie the animal data set is really popular, and so it's used by lots of people. Now in Julie's case this might be a problem: for every session accessing the data there's going to be a small charge. Individually this might not seem like a big problem (in this case it's only a few accesses), but what about 400, or 400 million? Each session might only have a tiny charge, but because the owner pays for this bucket, Julie is responsible for the data transfer charges out of AWS, and for popular data sets with lots of data and many users this charge can be significant, especially for smaller businesses or those using personal AWS accounts.
Now Mike has chosen Requester Pays, and so he doesn't have this problem. Any sessions downloading data from Mike's bucket need to be authenticated for this to work; unauthenticated access is not supported, and the reason for this is that AWS allocates the costs to the identities making the request, so each of the users will be allocated the costs for their individual session, their download of this data set. The result is that individual users might be slightly less happy, but Mike will have zero download costs.
Now two things are needed to ensure that this works. The first is that the users downloading need to be authenticated users, and second, the identities downloading the data need to supply the x-amz-request-payer header to confirm the payment responsibility. So to access objects in this bucket, you need to include this header as part of the request, and if you do, it means you will be charged via your identity inside your AWS account rather than the bucket owner having to pay all of those transfer charges. And that, at a high level, is how S3 Requester Pays works, and this is a feature that you're going to need to understand for the exam.
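If you wanted to see roughly what this looks like programmatically, here's a hedged boto3 sketch; the bucket name and object key are hypothetical. The owner enables Requester Pays on the bucket, and an authenticated requester passes RequestPayer when downloading, which is how boto3 sends the x-amz-request-payer header on your behalf.

```python
import boto3

s3 = boto3.client("s3")

# Bucket owner (Mike) enables Requester Pays on the bucket - a per-bucket setting.
s3.put_bucket_request_payment(
    Bucket="animal-pictures-dataset",                 # hypothetical bucket name
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# A requester downloading an object must be authenticated and must explicitly
# acknowledge payment responsibility; RequestPayer="requester" makes boto3 add
# the x-amz-request-payer header to the request.
obj = s3.get_object(
    Bucket="animal-pictures-dataset",
    Key="images/cat0001.jpg",                         # hypothetical object key
    RequestPayer="requester",
)
data = obj["Body"].read()
```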
It's relatively simple: it essentially just shifts the responsibility for paying for the data transfer charges out of AWS, and for any object access, through to the person making that request, rather than this being the responsibility of the bucket owner.
Now that's everything that I wanted to cover in this lesson it's been relatively brief but I just wanted to visually cover this architecture. At this point though go ahead and complete this video and when you're ready I look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about a web security feature which is used within various AWS products called Cross-Origin Resource Sharing, otherwise known as CORS. Now this is critical to understand if you're an architect, developer or engineer working in the AWS space. So let's quickly jump in and get started.
So what is CORS? Well, let's start with this. It's the Categorum application, with added doggos, running in a browser on a mobile phone, and I want to introduce the concept of an origin. So when we open the web browser on the phone and browse to Categorum.io, this is the origin; the site you visit is your first origin. The browser establishes this first origin when you make the initial connection, so the site that you visit, in this case Categorum.io, is the origin. So the browser in this case is going to make some web calls to Categorum.io, which in this example is an S3 bucket, and the request is for index.html, servlist.js and Categorum.png. Now the requests get returned without any security issues, and this is because this is called a same-origin request.
What's actually just happened, the architecture of this communication, is that the browser initially gets the index.html web page, and this index.html has references to the servlist.js file and the Categorum.png file. Now these are all on the same domain, so even though the index.html file is calling to this S3 bucket, the same domain is used, the same origin as the original one, and because of this it's called a same-origin request, and this is always allowed. This always happens the first time you make any request to a website. When you open netflix.com or you browse to this very training website, you're making that initial origin request, and the index.html document, or whichever is the default root object, is going to reference lots of different files, and they could be on the same domain or, alternatively, as I'm about to talk about in a second, they could be on different domains.
Now to load this application we need to make some additional calls. First, an API call is made to an API gateway to get additional application information and pull some image metadata that the users of the application have access to, and then, based on this API response, an image, casperandpixel.png, is loaded from yet another bucket. Now both of these are known as cross-origin requests because they're made to different domains, different origins. One is categorum-img.io and the other is an AWS domain for API gateway. Now by default, cross-origin requests are restricted; they aren't always going to work, but this can be influenced by using a CORS configuration.
CORS configurations are defined on the other origins, in this case the categorum-img.io bucket and the API gateway, and if defined, these resources will provide directives to allow these cross-origin requests, so resources can define which origins they allow requests from. Now your original origin always allows connections to it, because it's the first origin that your request goes to, but if the request that you make to the original origin downloads an HTML file which references content on any other origins, those references are known as cross-origin requests, and those other origins need to approve them. So in this example we would need CORS configurations on the images bucket in the middle and the API gateway at the bottom; otherwise we would experience security alerts and potentially application failures.
So this is the same architecture, and what we would need is a CORS configuration. This is defined in this case in JSON, and AWS now requires CORS configurations on S3 buckets to be defined using JSON, though historically this could use XML. Now we have two statements in this CORS configuration. The bottom one means that the bucket will accept requests from any origin as long as it's using a GET method; the star is a wildcard meaning all origins. The part at the top allows PUT, POST and DELETE methods from the Categorum.io domain. Now CORS configurations are processed in order and the first matching rule is used. This configuration would allow our application to access the categorum-img.io origin as a cross-origin request because we've added it within this CORS configuration. Any application which uses services on different domains is going to require a CORS configuration to operate correctly, and as you'll see with the Pet Cuddle-o-Tron serverless application advanced demo, which we'll use elsewhere in the course, this is required specifically on the API gateway because this is used as part of the application.
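As an illustration, a configuration along these lines could be applied with boto3 as shown below; the bucket name and origin are placeholders based on this example, and the exact rules you define will depend on your application.

```python
import boto3

s3 = boto3.client("s3")

# Mirrors the two-rule configuration described above: rules are evaluated in
# order, and the first matching rule is used.
cors_rules = [
    {   # Allow modifying methods only from the application's own origin.
        "AllowedOrigins": ["https://categorum.io"],    # hypothetical origin
        "AllowedMethods": ["PUT", "POST", "DELETE"],
        "AllowedHeaders": ["*"],
    },
    {   # Allow simple GET requests from any origin.
        "AllowedOrigins": ["*"],
        "AllowedMethods": ["GET"],
        "MaxAgeSeconds": 3000,  # how long browsers may cache preflight results
    },
]

s3.put_bucket_cors(
    Bucket="categorum-img.io",                         # hypothetical bucket name
    CORSConfiguration={"CORSRules": cors_rules},
)
```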
Now there are two different types of requests which we'll be making to a resource that requires a CORS configuration. The first type is simple requests, and I've included a link attached to this lesson which details exactly what constitutes a simple request. With the simple type of request you can go ahead and directly access a different origin using a cross-origin request and you don't need to do anything special; essentially, as long as the other origin is configured to allow requests from the original origin, then it will work.
The other type of request that you can make is what's known as a pre-flighted request. If a request is more complicated than a simple request, you need to perform what's known as a pre-flight, which is essentially a check done in advance against the other origin, the origin that the cross-origin request is going to. Your browser first sends an HTTP request to that other origin to determine whether the request you're actually making is safe to send. So in certain situations you need to perform a pre-flighted request for anything that's more complicated than a simple request, and again I've included a link attached to this lesson which gives you all of the detail. You won't need to know this level of detail for any of the exams, but I want to give you that background knowledge.
Now there are a number of components which will be part of a CORS configuration and part of the response that the other origin sends to your web browser. The first of these is Access-Control-Allow-Origin, which will either contain a star (a wildcard) or a particular origin which is allowed to make requests. Then we have Access-Control-Max-Age, and this header indicates how long the results of a pre-flight request can be cached; for example, if you do a pre-flight request, this determines how long you can communicate with the other origin before you need to do another pre-flight. Then we have Access-Control-Allow-Methods, which is either a wildcard or a list of methods that can be used for cross-origin requests, for example GET, PUT and DELETE or any other valid methods.
Next we have Access-Control-Allow-Headers, which can be contained in a CORS configuration and within the response to a pre-flight request, and this is used to indicate which HTTP headers can be used within the actual request. So for the exams you need to have an awareness of all of these different elements of a CORS configuration and the things which can be included in responses to pre-flight checks. These are all important, and you need to understand what each of them does. I'm just covering these at a high level because for the exams you just need that basic awareness, but the link which I've included in this lesson contains much more information.
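To make these concrete, here's an illustrative (not exhaustive) example of the kind of headers the other origin might return in response to a pre-flight request, assuming a configuration similar to the one above; the origin and values are placeholders.

```python
# Illustrative only: headers a pre-flight (OPTIONS) response from the other
# origin might contain, based on the hypothetical configuration above.
preflight_response_headers = {
    "Access-Control-Allow-Origin": "https://categorum.io",   # or "*" for any origin
    "Access-Control-Allow-Methods": "GET, PUT, POST, DELETE",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    "Access-Control-Max-Age": "3000",  # seconds the pre-flight result may be cached
}

for name, value in preflight_response_headers.items():
    print(f"{name}: {value}")
```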
Now at a high level, essentially when a web browser accesses any web application this defines the original origin (the Categorum.io origin in this example). If you make any request to that same origin, it's a same-origin request, and by default that's allowed. If you make any requests which are cross-origin requests, so they're going to different domains, different origins, then you need to keep in mind that you will require some form of CORS configuration, and you will see this in the advanced demo, the Pet Cuddle-o-Tron demo, which you'll be doing elsewhere in the course. But at this point that's all you need to cover for the exam, so go ahead and complete this lesson, and when you're ready, I'll look forward to you joining me in the next.
Welcome to this very brief lesson where I want to step through the features of S3 Inventory, give you a really quick overview in my console of how to set the feature up and then together explore how the first inventory looks once it's generated. So let's quickly step through the features and use cases before we move to the console.
So S3 Inventory, as the name suggests, helps you manage your storage within S3 buckets at a high level; it can inventory objects together with various optional fields. Now these optional fields include things like encryption, the size of an object, the last modification date of an object, which storage class that object uses, and a version ID if you have multiple versions of an object within a bucket (logically, the bucket will need versioning enabled). If you're using replication you can optionally include the replication status of an object, and if you use the object lock feature you can include additional information about the object lock status of individual objects. Now there are many more optional fields that you can include, and I'll detail these once I move through to my console.
Now the S3 inventory feature is configured to generate inventory reports and these can be generated either daily or weekly and it's really important to understand for the exam that this can't be forced. You can't generate an inventory whenever you want. You have to create the configuration, specify whether you want daily or weekly and then that process will run in the background based on the frequency that you set and initially when you configure the feature it can take up to 48 hours to get that first inventory. So that's important to understand it is not a service that you can explicitly run whenever you need the information.
Now the reports themselves will generate an output in one of these three formats so there's a CSV or comma separated values and then two different Apache output formats and the one that you'll pick depends on what type of integration you want to use with this reporting. You can configure multiple inventories and each of these can be configured to inventory an entire bucket or a certain prefix within a bucket and these reports go through to a target bucket which can be in the same account or a different account but in either case a bucket policy needs to be applied to the target bucket also known as the destination bucket in order to give the service the ability to perform this background processing.
So this is a fairly common feature throughout AWS where anything which operates on your behalf needs to be provided with permissions and that generally occurs either using a role or using resource policies and in this case it's a bucket policy which is applied to the target bucket also known as the destination bucket.
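For reference, a destination bucket policy along these lines is what gets applied. This is a hedged sketch based on the documented pattern, with placeholder bucket names and account ID; the policy S3 generates for you automatically may differ slightly, so treat this as illustrative rather than definitive.

```python
import json
import boto3

s3 = boto3.client("s3")

# A minimal sketch of the destination (target) bucket policy that lets the S3
# inventory service write reports. Bucket names and account ID are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowS3InventoryReports",
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::ac-inventory-target/*",
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:::ac-inventory-source"},
            "StringEquals": {
                "aws:SourceAccount": "111111111111",
                "s3:x-amz-acl": "bucket-owner-full-control",
            },
        },
    }],
}

s3.put_bucket_policy(Bucket="ac-inventory-target", Policy=json.dumps(policy))
```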
Now from a use case perspective you're going to be using S3 inventory for anything involving auditing, compliance, cost management or any specific industry regulations so these are things that you'll use in the background regularly to provide you with an overview of all of the objects in all of your buckets and lots of optional metadata about those objects. Now this is a topic which will be much easier to demonstrate rather than talk about so at this point I want to move across to my console and demonstrate two things. Firstly what it looks like to set up the inventory feature and then secondly what an actual inventory report looks like. Now we'll be skipping ahead in this video because it can take up to 48 hours to generate this first report so I'll record the first part of this immediately and then skip ahead right through to the point to when the first report is generated.
So I do recommend that you just watch this rather than doing this in your own environment. If you do do it in your own environment you need to be aware that it can take up to 48 hours to get this first report. So let's go ahead and switch across to my AWS console.
Okay, so I'm just going to step through creating an inventory on an S3 bucket, so I'll need to move to the S3 console. Just note that I'm logged in as the IAM admin user of the general AWS account, so that's the management account of the organization, and I have the Northern Virginia region selected. So I'll type S3 into the search box and then click to move to the console. Now the inventory feature works by inventorying a source bucket and storing the results into a target bucket, so I need to create both of those.
So I'm going to go ahead and click on create bucket. I'm going to call it AC-inventory-target so this is going to be my target bucket for my inventory data. I can leave everything else as default, scroll down and click on create bucket. That'll take a few seconds to create and once it's created I'm going to create the source bucket. So again create bucket. This time I'll call the bucket AC-inventory-source and again I'll accept all of the defaults and then create bucket.
Then I'm going to go ahead and go into the source bucket and I'm going to upload some objects. So I'll click on upload and then add files and then I have four images to upload each of them is one of my cats. So I'm going to start with Penny so I'll select that object and click on open and I'm going to be picking different random settings for each of these objects. So let's scroll down and expand additional upload options. I'll be picking the standard storage class for this object and I'll be enabling server-side encryption using SSE-S3. So upload that object, scroll down to the bottom and click on upload. That's going to take a few seconds, click on exit and then I'm going to do the next one.
So upload again, add files. This time I'll choose raffle, click on open, scroll down, expand additional upload options. This time I'll choose Standard-Infrequent Access and I won't encrypt the object. I'll scroll down and click on upload, click on exit, upload again, add files. Now I'll pick truffles and click on open; this is Truffles, my cat. Scroll down, expand additional upload options. This time I'll pick One Zone-IA, and again I'll be picking SSE-S3 for encryption. Scroll down, click on upload and then exit, upload again, add files. I'll pick winky, the last of my four cats, click open, scroll down, expand additional upload options, scroll down again. This time we're going to put the object into intelligent tiering and we won't use any encryption. So scroll down and click on upload and then exit.
So that's the objects uploaded to the source bucket. So next I'll enable inventory. So I'll click on the management tab and I'll need to create an inventory configuration. So I'll scroll down, click on create inventory configuration and it's here where I can set up all of the options about the inventory. So I'm going to give this a name AC inventory and remember you can set up multiple inventory configurations. Each of them can have different settings to perform slightly different tasks.
So the first thing is to define an inventory scope. You can inventory an entire bucket or specify an optional prefix by using this box. We'll leave this blank to do the entire bucket and you can also specify to inventory the current version only or include all versions and I'm going to use all versions.
For the report details this is where you specify where you want the inventory report to be placed after it's generated. You can choose to use this account or a different account. If you specify a different account you need to provide the account ID and then the destination. If you choose this account then you can directly select the bucket from a list. So I can click on browse S3 and select a bucket from my account. So AC-inventory-target and then click on choose path.
Now the inventory service requires permissions on the destination bucket, and as part of configuring this, those permissions will be automatically added onto the destination bucket. The bucket policy allows the S3 service to perform the s3:PutObject action on the inventory target bucket. So this policy as a whole just provides the required permissions so that the reports can be stored into the target bucket. This is all that's required, and it will be added to the destination bucket either automatically, or you can choose to do it manually.
Now the last few options that you're allowed to pick from you can choose the frequency of this inventory either daily or weekly. The first run will be delivered within 48 hours so this is not immediate and you can't force this to run whenever you want. If you choose daily logically it will be run once a day. If you choose weekly then the reports will be delivered on Sundays. So I'm going to choose daily.
For output format you've got three different options. You can choose comma separated values known as CSV or either of these two Apache formats. So this is important to remember. Generally you'd pick the most suitable depending on what you want to integrate this inventory reporting with. So if you want to import this into something like Microsoft Excel you can choose CSV. If you have another system which uses either of these two formats then logically you'll pick the most appropriate. I'm going to use CSV which will make it easier to explore this when the inventory report has been generated.
Now you can create inventory configurations either enabled or disabled and I'm going to pick it to be enabled by default. You're able to select server side encryption for this output report. In this case to keep things simple I'm going to leave this as disabled and then lastly you can pick additional fields which you want to be included in the report. So I'm going to select a couple of these. I'll pick size, last modified. I want to know about the storage classes of my objects. I want to know whether they're encrypted or not and which tier they're in if they're using intelligent tiering. So I'll select all of these. We're not using replication on this bucket though I could check this box if I wanted to get an overview of the replication status and if we're using any object lock configurations which I'll talk about elsewhere in the course if applicable to the course that you're taking then you could check this box and gain additional information about the object lock configuration of all of your different objects but for this demonstration I'm going to pick size, last modified, storage class, encryption and then intelligent tiering access tier.
Now that's everything I need to configure so I'm going to go ahead and click on create. So that's the inventory configuration created and again just to reiterate it could take up to 48 hours to deliver the first report. Now if I just go back to the main S3 console and go into the target bucket you'll be able to see it currently doesn't have any reporting in that bucket. Again it could take up to 48 hours but if I go to permissions, scroll down and then look at the bucket policy you'll see that S3 has automatically added the relevant policy required to give it permissions to perform an inventory and then store the data in this bucket.
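If you wanted to create the same configuration programmatically rather than through the console, a boto3 sketch might look like the following; the bucket names and account ID are placeholders, and the optional field names shown are the ones selected in this demo.

```python
import boto3

s3 = boto3.client("s3")

# A sketch of the configuration created in the console: daily CSV reports for
# all object versions, delivered to the target bucket, with selected optional
# fields. Bucket names and the account ID are placeholders.
s3.put_bucket_inventory_configuration(
    Bucket="ac-inventory-source",
    Id="ac-inventory",
    InventoryConfiguration={
        "Id": "ac-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": "Daily"},      # or "Weekly"
        "Destination": {
            "S3BucketDestination": {
                "AccountId": "111111111111",
                "Bucket": "arn:aws:s3:::ac-inventory-target",
                "Format": "CSV",                 # or an Apache ORC/Parquet format
            }
        },
        "OptionalFields": [
            "Size", "LastModifiedDate", "StorageClass",
            "EncryptionStatus", "IntelligentTieringAccessTier",
        ],
    },
)
```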
So that's how the process works end-to-end you configure it on one or more source buckets and have them deliver the inventory into a target bucket. Now at this point it could take up to 48 hours for this first report to be generated and placed into this target bucket so I'm going to skip ahead with this video all the way through to when our first report is generated and then we can explore it together.
Okay, so this is around 24 hours after I initially configured this inventory, so now let's go into the AC-inventory-target bucket, and we'll see that we now have a folder structure inside this bucket. So I'm going to go inside AC-inventory-source, which is the name of the source bucket, and then inside AC-inventory we have more folders. I'm going to go inside the data folder, and inside here is a compressed file (that's what the GZ extension indicates): a compressed comma-separated values data file which contains the inventory that I previously configured.
So I'm going to go ahead and download this, uncompress it and then open it in an editor. And there we go, I've just gone ahead and opened this comma-separated values file inside an editor, and we can see all the details. So we have penny.jpeg, raffle.jpeg, truffles.jpeg and winky.jpeg, and then we have other details such as the last modified date, the storage class that's being used, whether we're using any form of encryption, and then, in the case of winky.jpeg, which is using intelligent tiering, which underlying storage tier is being used; in this case, frequent access.
Now this is a feature which works equally well with four objects, as in my example, or with significantly more, and it's a feature you'll definitely use in real-world situations, especially those with larger numbers of objects. Now at this point that's everything I wanted to cover, so I'm going to go ahead and clean up this account. So from the S3 console I'm going to select the inventory target bucket and empty it; I'll need to confirm that and click on empty, and then exit. Then delete it, confirm the name and click on delete, and do the same with the source bucket: select it, empty it, confirm that, click on exit, then delete that bucket and click on delete bucket. Once that's done, everything's in the same state as it was prior to this demonstration, and that's everything I wanted to cover.
So go ahead and complete this video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to cover S3 object storage classes. Now this is something which is equally as important at the associate and the professional level. You need to understand the costs relative to each other, the technical features and compromises, as well as the types of situations where you would and wouldn't use each of the storage classes. Now we've got a lot to cover so let's jump in and get started.
The default storage class available within S3 is known as S3 Standard. So with S3 Standard when Bob stores his cat pictures on S3 using the S3 API, the objects are stored across at least three availability zones. And this level of replication means that S3 Standard is able to cope with multiple availability zone failure while still safeguarding data. So start with this as a foundation when comparing other storage classes because this is a massively important part of the choice between different S3 storage classes.
Now this level of replication means that S3 Standard provides 11 nines of durability and this means if you store 10 million objects within an S3 bucket, then on average you might lose one object every 10,000 years. The replication uses MD5 checksums together with cyclic redundancy checks known as CRCs to detect and resolve any data issues. Now when objects which are uploaded to S3 have been stored durably, S3 responds with a HTTP 1.1 200 OK status. This is important to remember for the exam if you see this status, if S3 responds with a 200 code then you know that your data has been stored durably within the product.
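As a quick illustration of that durable-storage acknowledgement, here's a hedged boto3 sketch; the bucket and file names are hypothetical. It uploads an object with a base64-encoded MD5 checksum so S3 can verify integrity, and then checks for the 200 response that indicates the object has been stored durably.

```python
import base64
import hashlib
import boto3

s3 = boto3.client("s3")

with open("whiskers.jpg", "rb") as f:    # hypothetical local file
    body = f.read()

# Supplying a base64-encoded MD5 lets S3 verify the upload integrity; a 200 OK
# response means the object has been stored durably.
response = s3.put_object(
    Bucket="example-cat-pictures",       # hypothetical bucket name
    Key="whiskers.jpg",
    Body=body,
    ContentMD5=base64.b64encode(hashlib.md5(body).digest()).decode(),
)
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
```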
With S3 Standard there are a number of components to how you'll be billed for the product. You'll be billed a gigabyte-per-month fee for data stored within S3, a per-gigabyte charge for transfer of data out of S3 (transfer into S3 is free), and finally a price per 1,000 requests made to the product. There are no specific retrieval fees, no minimum duration for objects stored and no minimum object sizes. Now this isn't true for the other storage classes, so this is something to focus on as a solutions architect and in the exam.
With S3 Standard you aren't penalized in any way. You don't get any discounts but it's the most balanced class of storage when you look at the dollar cost versus the features and compromises. Now S3 Standard makes data accessible immediately. It has a first byte latency of milliseconds and this means that when data is requested it's available within milliseconds and objects can be made publicly available. This is either using S3 permissions or if you enable static website hosting and make all of the contents of the bucket available to the public internet. If you're doing that then S3 Standard supports both of these access architectures.
So for the exam the critical point to remember is that S3 Standard should be used for frequently accessed data which is important and non-replaceable. It should be your default and you should only investigate moving to other storage classes when you have a specific reason to do so.
Now let's move on and look at another storage class available within S3, and the next class I want to cover is S3 Standard-Infrequent Access, known as S3 Standard-IA. So Standard-Infrequent Access shares most of the architecture and characteristics of S3 Standard. Data is still replicated over at least three availability zones in the region, the durability is the same, the availability is the same, the first byte latency is the same, and objects can still be made publicly available.
You also have the same basic cost model, starting with a storage cost, but the storage costs for this class are much cheaper than S3 Standard, about half the price at the time of creating this lesson. So it's much more cost effective to store data using Standard-Infrequent Access. You also have a per-request charge and a data transfer out cost, which are the same as S3 Standard, and like other AWS services, data transfer in is free of charge.
So this reduction in storage cost is a substantial benefit of using Infrequent Access, but in exchange for this benefit there are some compromises. First, Standard-Infrequent Access has a new cost component, which is a retrieval fee. For every gigabyte of data retrieved from objects stored using this storage class, there is a cost to retrieve that data, and that's in addition to the transfer fee. So while the costs of storage for this class are much less than S3 Standard, that cost effectiveness is reduced the more you access the data, which is why this class is designed for infrequently accessed data.
Now additionally, there is a minimum duration charge for objects using this class. However long you store objects, you'll be billed for a minimum duration of 30 days, and however small the objects that you store within this class, you'll be billed for a minimum of 128 KB per object. So this class is cost effective for data as long as you don't access the data very often, you don't need to store it short term, and you don't need to store lots of tiny objects.
For the exam remember this: S3 Standard-Infrequent Access should be used for long-lived data which is important or irreplaceable, but where data access is infrequent. Don't use it for lots of small files, don't use it for temporary data, don't use it for data which is constantly accessed, and don't use it for data which isn't important or which can be easily replaced, because there's a better, cheaper option for that, and that's what we're going to be covering next.
The next storage class which I want to talk about is S3 One Zone Infrequent Access and this is similar to Standard Infrequent Access in many ways. The starting point is that it's cheaper than S3 Standard or S3 Infrequent Access and there is a significant compromise for that cost reduction which I'll talk about soon.
Now this storage class shares many of the minimums and other considerations of S3 Standard-Infrequent Access. There's still the retrieval fee, there's still the minimum 30-day billed storage duration, and there's still the 128 KB minimum capacity charge per object.
The big difference between S3 Infrequent Access and One Zone Infrequent Access and you can probably guess this from the name is that data stored using this class is only stored in one availability zone within the region so it doesn't have the replication across those additional availability zones. So you get cheaper access to storage but you take on additional risk of data loss if the AZ that the data is stored in fails.
Now oddly enough you do get the same level of durability so 11 nines of durability but that's assuming that the availability zone that your data is stored in doesn't fail during that time period. Data is still replicated within the availability zone so you have multiple copies of the data but only crucially within one availability zone.
Now for the exam, this storage class should be used for long-lived data, because you still have the size and duration minimums. It should be used for data which is infrequently accessed, because you still have the retrieval fee, and (this part is specific to this class) for data which is non-critical or which can be easily replaced. So this means things like replica copies: if you're using same-region or cross-region replication then you can use this class for your replicated copy, or if you're generating intermediate data that you can afford to lose, then this storage class offers great value.
Don't use this for your only copy of data because it's too risky. Don't use this for critical data because it's also too risky. Don't use this for data which is frequently accessed, frequently changed or temporary data because you'll be penalized by the duration and size minimums that this storage class is affected by.
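To tie these classes together, here's a small hedged boto3 sketch showing that the storage class is simply a per-object choice made at upload time (or changed later via a copy or a lifecycle rule); the bucket, keys and class assignments are illustrative only.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and keys; each object gets the class that matches how it
# will be used, following the guidance above.
uploads = [
    ("originals/cat0001.jpg", "STANDARD"),     # frequently accessed, critical data
    ("archive/cat0001.jpg",   "STANDARD_IA"),  # long-lived, infrequently accessed
    ("replica/cat0001.jpg",   "ONEZONE_IA"),   # easily replaceable copy, one AZ only
]

for key, storage_class in uploads:
    s3.put_object(
        Bucket="example-animal-pictures",
        Key=key,
        Body=b"...image bytes...",
        StorageClass=storage_class,
    )
```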
Okay so this is the end of part one of this lesson. It was getting a little bit on the long side and I wanted to give you the opportunity to take a small break. Maybe stretch your legs or make a coffee. Now part two will continue immediately from this point so go ahead complete this video and when you're ready I look forward to you joining me in part two.
Welcome back and in this lesson I want to cover something which is a little bit situational. I want to talk about how you can revoke IAM role temporary security credentials.
Now before we step through the architecture I want to refresh your memory on a few key points about roles and how the temporary credentials work. So I want you to imagine the situation where an IAM role is used to access an S3 bucket inside an AWS account. Now the role can be assumed by many different identities. Whoever is defined in the trust policy of the role and can perform STS assume role operations can assume the role. Everybody who assumes a role gets access to the same set of permissions. In this example, permissions over an S3 bucket.
You can't really be granular, at least not in a scalable and manageable way. A role is designed to grant a set of permissions to do a certain job or a certain set of tasks to one or more identities. It's not really designed to grant different permissions based on who that identity is.
Now the permissions that a role grants are given via temporary credentials, and these temporary credentials have an expiration. They can't be canceled. It's not possible to cancel or manually expire a set of temporary credentials. They're valid until they expire. All assumptions of a given role get permissions based on that role's permissions policy.
Now the credentials that STS generates whenever a role is assumed are temporary, but they can last a long time. Depending on the type of role assumption, it can range from minutes to hours. And a really important question to understand the answer to is what happens if those credentials are leaked?
Remember you can't cancel them, so how do you limit the access that a particular set of temporary credentials has? Deleting the role impacts all of the assumers of that role. And if you change the permissions on a role, then all of the assumptions are impacted, current and future. What we need is a way of locking down a particular set of temporary credentials without impacting the ability of valid applications to continue using that role. And let's review how that works architecturally.
In this example, we have three staff accessing an AWS resource using an IAM role. They all perform an STS assume role operation, and this means that they receive temporary credentials from STS complete with permissions that are based on the role's permissions policy. These credentials can be used to access the AWS resource. They are temporary, but they can be renewed after they expire, and STS will provide new credentials as long as each of these identities has permissions to assume the role.
But let's say that one of our users commits these credentials accidentally to a GitHub repository, meaning that they can be obtained by a bad actor — Woofy the dog in this example. Now this is what's known as a credential leak, and the problem is that now that Woofy has access to these temporary credentials, he can access the S3 bucket with the same permissions as the three legitimate users.
Now it might make logical sense just to change the trust policy on the role, but this is only effective when a role is being assumed. Right now Woofy has no need to assume the role because he already has valid credentials, which might not be due to expire for some time. Changes to a trust policy are ineffective to deal with this immediate problem of a credential leak.
Adding to that, remember Woofy didn't actually assume the role — he isn’t able to. The only reason he has the credentials is because they were leaked by a valid user of this IAM role. So any changes to the trust policy would have no effect in this scenario.
Now we could, though, change the permissions policy that's attached to the role, and this would impact everyone. Because all credentials gained from assuming a role would immediately change to use these new permissions. So if you change the permissions policy on a role, then every single set of credentials which were generated by assuming that role have these new access rights. You could deny everything by updating this permissions policy, but that would impact other legitimate users of this IAM role.
Now the key to resolving this problem is understanding that the three legitimate users of the role are able to assume the role whenever they require to. So they can always assume the role. They're on the trust policy of the role. But the bad actor, Woofy the dog in this case, he is unable to assume the role. He only has access to the current set of temporary credentials.
And so one potential action that we can perform is to revoke all of the existing sessions. What this does is apply a new inline policy to the role which contains an explicit deny. This denies all operations on all AWS resources, but it's conditional on the point at which the role was assumed. For any role assumption that happened before right now, or before the point when we run this revoke-sessions operation, the inline policy denying all operations on all resources applies. For any role assumptions which happen after this point in time, this deny-all policy does not apply.
It means that as soon as this is done, because all of the existing credentials were based on roles which were assumed before the current date and time, then any accesses to AWS will be denied because the role was assumed in the past — this deny-all policy will apply.
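Under the hood, revoking sessions is just attaching a conditional deny as an inline policy to the role. Here's a hedged sketch of doing the equivalent with boto3; the role name is a placeholder and the policy name is illustrative, while the console's "Revoke active sessions" button generates something very similar for you.

```python
import json
from datetime import datetime, timezone
import boto3

iam = boto3.client("iam")

# Deny everything, but only for sessions whose credentials were issued before
# "now" - exactly the conditional deny described above.
revoke_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

deny_older_sessions = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"DateLessThan": {"aws:TokenIssueTime": revoke_time}},
    }],
}

iam.put_role_policy(
    RoleName="s3-access-role",                 # hypothetical role name
    PolicyName="AWSRevokeOlderSessions",       # illustrative policy name
    PolicyDocument=json.dumps(deny_older_sessions),
)
```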
Now our three legitimate users are free to assume the role again, and when they do that, it will update their assumption time and mean that the conditional deny inline policy will no longer apply to them. Because it's conditional on the assumption of the role occurring before a certain date and time.
Because Woofy stole the credentials, he cannot assume the role. He isn’t in the trust policy, which means he can never update the assumption date. And so his credentials are useless.
Now the really important thing to understand is that technically Woofy's credentials are still valid — you can still use them to interact with AWS. The key part of that is when we revoked the sessions, we left the original permissions policy attached to the role, which allowed access to the bucket. But now there's a deny-all policy which is conditional on when the role was assumed.
Now an explicit deny always overrules allows — remember: deny, allow, deny. Woofy still gets access to the same original permissions policy, so on paper he's allowed access to the bucket, but in addition he's also affected by the deny inline policy. And because he isn't able to assume the role (he isn't in the trust policy), he can never update the assumption date and time, and so in effect he is denied access to everything and is no longer able to access AWS products and services.
Now I understand that this is a very niche area to cover in a lesson — it's a small topic, but it often comes up in the exam. Just remember: you cannot manually invalidate temporary credentials; they only expire when they expire. And while changing a role's permissions policy affects everyone who has assumed it, you can add a conditional element to it, denying access to anyone who assumed the role before a certain date and time. And that's how we revoke role sessions: by simply adding a conditional deny policy to an existing role's permissions.
Now that's everything that I wanted to cover in this lesson. It's something that will feature on the exam, and so it's worth spending the time really making sure that you understand it.
Now there is going to be a demo lesson in this section where you will get the chance to revoke sessions on a role — so don't worry, you will get some practical exposure.
For now though, that's all of the theory that I wanted to cover in this lesson. So go ahead and complete the lesson, and when you're ready, I'll look forward to you joining me in the next.
Welcome back. In this lesson, I'm going to cover AWS Single Sign-On, known as SSO, which allows you to centrally manage Single Sign-On access to multiple AWS accounts, as well as external business applications. In many ways, the product replaces the historical use cases for SAML 2.0-based federation. And AWS now recommend that any new deployments of this type, so any workforce-style identity federation requirements, are met using AWS Single Sign-On.
So let's get started. Let's jump in and explore the product's features, the capabilities, and the architecture.
AWS Single Sign-On enables you to centrally manage SSO access and user permissions for all of your AWS accounts managed using AWS Organizations. So this is the product's main functionality. But it also provides Single Sign-On for external applications. AWS Single Sign-On conceptually is a combination of both an extension of AWS Organizations and an integration and evolution of previous ways of handling identity federation, all delivered as a highly available managed service.
Now the product starts with a flexible identity store system. The identity store, as the name suggests, is where your identities are stored. Normally when you log into AWS, you're going to be using IAM users. But if you use identity federation, like we talked about in the previous lesson, then you have the option of utilizing external identities, but these need to be swapped for temporary AWS credentials.
The problem with using manual identity federation is that you have to configure it manually, and each type of federation is implemented slightly differently. Now the first benefit of AWS SSO is that you define an identity store, and from that point onward, the exact implementation is abstracted. It's all handled in the same way. So you configure an identity store and whichever store is utilized, the functionality of SSO is the same for all of the different types of identity store.
Now SSO as a product supports a number of different identity stores. Firstly, there is the built-in store. Now this might seem a little bit odd to use a built-in store for a product which is designed to handle external identities. But SSO provides benefits in terms of permissions management across multiple AWS accounts. And so even using the built-in identity store, you still get those benefits.
Now as well as the built-in store, you can also use AWS managed Microsoft Active Directory via the directory service product, or you can utilize an existing on-premises Microsoft AD, either using a two-way trust between an existing implementation and SSO, or by using the AD connector. And then finally, you're also able to utilize an external identity provider using the SAML 2.0 standard.
For the exam, understand that AWS SSO is preferred by AWS for any workforce identity federation, versus the traditional direct SAML 2.0 based identity federation. So if you're in an exam situation or any real-world deployments and if you have a choice, if there's nothing to exclude it, then by default you should select AWS SSO. There aren't really any genuine reasons at this point for not utilizing the SSO product. And the existing ways for handling workforce-based or enterprise-based identity federation are generally only really there for legacy and compatibility reasons.
For any new deployments, you should default to utilizing AWS Single Sign-On. And this is because in addition to handling the identity federation, the product also handles permissions across all of the accounts inside your organization and also external applications. So it provides a significant reduction in the admin overhead associated with identity management.
Architecturally, AWS SSO operates from within an AWS account. It's designed to manage the SSO functionality and security of any AWS accounts within an organization. And this includes controlling access to both the console UI and the AWS command line, specifically version 2. Now at the core of the product is the concept of identity sources, because it manages Single Sign-On, there has to be logically a single source of identities which support the SSO process.
Now the product is flexible in that it supports a wide range. As I talked about on the previous screen, it can use its own internal store of identities, or it can integrate with an on-premises directory system.
Now I covered SAML 2.0 based identity federation in a previous lesson, but the functionality provided by SSO is much more advanced. You have the ability to automatically import users and groups from within the identity provider, and use them within SSO to manage permissions across AWS resources in all of the different AWS accounts in your organization. It's a massive evolution of the functionality set that we have available as solutions architects.
Now for the identity that SSO manages, it provides two core sets of functionality within AWS. Firstly, it allows for Single Sign-On, meaning the identities can be used to interact with all of the AWS accounts within the AWS organization. But also, it provides centralized management of permissions, so users and groups can be used as the basis for controlling what an entity can do within AWS.
Now SSO extends this though, and it delivers Single Sign-On for business applications such as Dropbox, Slack, Office 365, and Salesforce as well as custom business applications, which themselves utilize SAML.
AWS SSO, just to stress this again, is the preferred identity solution for managing workplace identities. This is a familiar pattern with AWS. They try new features, they evolve them, they package them up into things which are more reliable, more performant, and delivered as a service. And AWS Single Sign-On is this process for identities. So they've taken all of the different previous architectures and methods for handling identity and packaged it up into one single product. And this is why this is now the preferred option.
AWS SSO, where you have the option, should be picked for any workplace identity federation needs within AWS. This goes for real world usage and for any exam questions. Only use something like SAML 2.0 directly when you have a very specific reason to do so. And to be honest, I can't think of any at this point. It's mainly a legacy architecture. So preference, SSO, and only use anything else when you absolutely have to.
Now, one tip I will give you for the exam is to focus on the question and look at whether the scenario is talking about workplace identities or customer identities. If it's customer identities, so web applications using Twitter, Google, Facebook, or any other web identity, then it's not going to be AWS Single Sign-On that's used. It's going to be a product such as Cognito that I'll also be talking about in this section of the course.
But if the question or the scenario focuses on enterprise or workplace identities, then it's likely to be AWS SSO if that's one of the answers.
Now, I know that this has been a lot of theory. So immediately following this lesson is a demo lesson where you'll get the chance to implement AWS Single Sign-On within your AWS account structure. So we're going to use it and implement it within our scenario account, so the Animals for Life scenario. And we're going to step through together how it can be used to provide fine-grained granular and role-based permissions to users created inside the product across all of the different AWS accounts within the Animals for Life organization.
Now, we can do this because the product is free. There are no charges for using the AWS Single Sign-On product. And so in most cases, it should become your default or preferred way of managing identities inside AWS.
So go ahead, complete this lesson, and when you're ready, I'll look forward to you joining me in the next.
Welcome back and this lesson will be one of a number of lessons in this section of the course where I'll be covering identity federation within AWS. Now identity federation is the process of using an identity from another identity provider to access AWS resources.
In AWS though, this can't be direct. Only AWS credentials can be used to access AWS resources. And so some form of exchange is required. And that's what I'll be covering in this lesson. So let's jump in and get started.
Now before we talk about the architecture of SAML 2.0 Identity Federation, let's talk about SAML itself. So SAML stands for Security Assertion Markup Language. And SAML 2.0 is version two of this standard.
Now it's an open standard which is used by many identity providers such as Microsoft with their Active Directory Federation Services known as ADFS, but many other on-premise identity providers utilize the SAML standard. And SAML 2.0 based federation allows you to indirectly use on-premises identities to access the AWS console and AWS command line interface.
Now I want to stress the important point here and that's the word indirectly. You can't access AWS resources using anything but AWS credentials. And so the process of federation within AWS involves exchanging or swapping these external identities for valid AWS credentials.
For the exam it's important that you both know how the architecture works as well as when you should select it versus the other identity federation options available within AWS.
Now SAML 2.0 based identity federation is used when you currently use an enterprise-based identity provider which is also SAML 2.0 compatible. Both of these need to be true. You wouldn't use it with a Google identity provider and you wouldn't use it with one which isn't SAML 2.0 compatible. So focus on those selection criteria for the exam.
Secondly, if you have an existing identity management team and you want to maintain that function allowing them to manage access to AWS as well, then SAML 2.0 based federation is ideal. Or if you're looking to maintain a single source of identity truth within your business and/or you have more than 5,000 users, then you should also look at using SAML 2.0 based federation.
Now if you read a question in the exam which mentions Google, Facebook, Twitter, Web or anything which suggests that SAML 2.0 is not supported, then this is not the right type of identity federation to use. So keep that in mind when you're reviewing exam questions. If it mentions any of those terms, you should probably assume that SAML 2.0 identity federation is not the right thing to select.
Now federation within AWS uses IAM roles and temporary credentials and in the case of SAML 2.0 based identity federation, these temporary credentials generally have up to a 12-hour validity. So keep that in mind for the exam.
Now at this point I want to step through the architectural flow of exactly how SAML 2.0 based identity federation works, both when you're accessing using the API or CLI as well as the console UI. So let's do that next and we'll start with API or CLI access.
So we start with an on-premises environment on the left using a SAML 2.0 compatible identity provider and identity store. And on the right is an AWS environment configured to allow SAML 2.0 based identity federation.
What this actually means from an infrastructure perspective is that an identity provider exists on the left side and a SAML identity provider has been created within IAM on the right and a two-way trust has been established between the two. So the instance of IAM on the right has been configured to trust the identity provider on-premises within the environment on the left.
Then we have an application, in this case the enterprise version of the Categorum application. And this is something that's developed internally within the Animals for Life organization and it initiates this process by communicating with the identity provider to request access.
Once this process is initiated, the identity provider accesses the identity store and authenticates the request, pulling a list of roles which the identity used by the Categorum application has access to. Inside the identity provider you've mapped identities onto one or more roles and one identity might be able to use multiple different roles. And so the Categorum application will have a selection from which to pick.
Now once this authentication process completes and the Categorum application has selected a role, what it gets back is what's known as a SAML assertion. And think of this as a token proving that the identity has authenticated. This SAML assertion is trusted by AWS.
Now the key concept to realize so far is that the application has initiated this process by communicating with the identity provider. This is an application-initiated process.
The next step is that the application communicates with AWS, specifically STS, using the AssumeRoleWithSAML operation, and it passes in the SAML assertion with that request. Now STS accepts the call and allows the role assumption. And this generates temporary AWS credentials which are returned to the application.
And this is another critical stage because what's happened here is the SAML assertion has essentially been exchanged for valid temporary AWS credentials. And remember only AWS credentials can be used to access AWS resources. And so the Categorum application can now use these temporary credentials to interact with AWS services such as DynamoDB.
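To make this flow concrete, here's a minimal sketch using boto3, the AWS SDK for Python. The role ARN, the SAML provider ARN and the way the assertion is obtained are placeholder assumptions; in a real application the base64 encoded assertion comes back from your identity provider after authentication.

```python
# Minimal sketch of the API/CLI flow: exchange a SAML assertion for
# temporary AWS credentials, then use them with a service client.
import boto3

def get_temporary_credentials(saml_assertion_b64: str) -> dict:
    sts = boto3.client("sts")

    # Exchange the SAML assertion for temporary AWS credentials.
    # This call is unsigned, so no existing AWS credentials are needed.
    response = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::123456789012:role/CategorumAppRole",    # role selected by the app (placeholder)
        PrincipalArn="arn:aws:iam::123456789012:saml-provider/ADFS",  # SAML provider created in IAM (placeholder)
        SAMLAssertion=saml_assertion_b64,                             # base64-encoded assertion from the IdP
        DurationSeconds=3600,
    )
    return response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

# The application can then use those temporary credentials with any AWS
# service client, for example DynamoDB.
creds = get_temporary_credentials(saml_assertion_b64="...")  # assertion obtained from the IdP
dynamodb = boto3.client(
    "dynamodb",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```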
So at a high level the process does require some upfront configuration. There needs to be a bidirectional trust created between IAM and the identity provider that's being used for this architecture. And once this trust has been established, AWS will respect these SAML assertions and use them as proof of authentication. And allow whatever identity is performing this process to gain access to temporary credentials which can be used to interact with AWS.
Now this process occurs behind the scenes. This is the architecture that's used if you're developing these applications in a bespoke way inside your business. So if you're a developer and you're looking to utilize identity federation to access AWS, this is the type of architecture that you'll use.
Now you can also use SAML 2.0 based identity federation to grant access to the AWS console for internal enterprise users. And on the whole it uses a very similar architectural flow. So let's have a look at that next.
When we're using SAML based identity federation to provide console access, we still have the on-premises environment on the left and AWS on the right. We still have the same identity provider, for example ADFS, but this time it's a user who wants to access the AWS console rather than an application interacting with AWS products and services.
There still needs to be a trust configured, this time between the identity provider and an SSO endpoint, also known as the AWS SAML endpoint. And this is configured within IAM inside the AWS account.
Now to begin this process, our user Bob browses to our identity provider portal. And this might look something like this URL. So this URL is for the Animals for Life ADFS server. Bob browses to this URL and he sees a portal which he needs to interact with.
Now when Bob loads this portal, before he can interact with it in any way, behind the scenes, the identity provider authenticates the request. Now this might mean explicitly logging in to the identity provider portal, or it might use the fact that you're already logged in to an active directory domain on your local laptop or workstation and use this authentication instead of asking you to log in again.
But in either case, you'll be presented with a list of roles that you can use based on your identity. So there might be an admin role and normal role or even an auditing role and all of these provide different permissions to AWS.
So once you've been authenticated and you've selected a role, the identity provider portal returns a SAML assertion and instructions to point at a SAML endpoint that operates inside AWS.
Now once the client receives this information, it sends the SAML assertion to the SAML endpoint that you've configured in advance within AWS. And this assertion proves that you've authenticated as your identity and it provides details on the access rights that you should receive.
Now in the background, the IAM role that you selected in the identity provider portal is assumed using STS, and the SAML endpoint receives temporary security credentials on your behalf.
Then at this point it creates a sign-in URL for the AWS console which includes those credentials and it delivers those back to the client which the client then uses to access the AWS console UI.
So at a high level this is a fairly similar process. You're being authenticated by an identity provider that exists on premises. You're getting a SAML assertion. This is delivered to AWS. It's exchanged for temporary security credentials which are valid to interact with AWS resources.
The only difference in this case is that the SAML endpoint is constructing a URL which can be used to access the AWS console UI which includes those credentials. And then once that URL has been created it's passed back to your client and your client uses it to access the AWS management console. And this all happens behind the scenes without you having any visibility of it.
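If you want to see roughly what that behind-the-scenes step looks like, here's a hedged sketch based on the documented AWS federation sign-in endpoint, which is the approach custom identity brokers use to build a console sign-in URL from temporary credentials. The issuer URL is a made-up placeholder, and with SAML 2.0 federation the AWS SAML endpoint performs this for you, so treat this purely as an illustration of the concept.

```python
# Sketch: turn temporary credentials into a console sign-in URL using the
# AWS federation endpoint. The SAML endpoint does this for you; this just
# illustrates the "URL that includes those credentials" idea.
import json
import urllib.parse
import urllib.request

def build_console_signin_url(creds: dict) -> str:
    federation_endpoint = "https://signin.aws.amazon.com/federation"

    # 1. Exchange the temporary credentials for a sign-in token.
    session = json.dumps({
        "sessionId": creds["AccessKeyId"],
        "sessionKey": creds["SecretAccessKey"],
        "sessionToken": creds["SessionToken"],
    })
    token_url = (f"{federation_endpoint}?Action=getSigninToken"
                 f"&Session={urllib.parse.quote_plus(session)}")
    with urllib.request.urlopen(token_url) as resp:
        signin_token = json.loads(resp.read())["SigninToken"]

    # 2. Build the console sign-in URL delivered back to the client browser.
    return (f"{federation_endpoint}?Action=login"
            f"&Issuer=https://idp.animals4life.org"  # placeholder issuer
            f"&Destination={urllib.parse.quote_plus('https://console.aws.amazon.com/')}"
            f"&SigninToken={signin_token}")
```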
The key concept to understand and the really important thing about AWS Identity Federation is that you cannot use external identities directly to access AWS resources. They have to be exchanged for valid AWS credentials. And this is how that process happens when you're using a SAML 2.0 compatible identity provider.
Now that's all of the architectural theory that I wanted to cover in this lesson. In other lessons in this section of the course you're going to get some additional exposure to other types of identity federation within AWS. But SAML 2.0 based identity federation is the one that tends to be used in larger enterprises, especially those with a Windows based identity provider.
So that's everything I wanted to cover. So go ahead complete this video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to spend just a few minutes talking about the AWS Cost Explorer. Now the way that we get to this service—you can either type "Cost Explorer" in the search box or click on the user drop-down and then just move to "My Billing Dashboard."
So that's what I'm going to do, and once I'm at this console, the Cost Explorer is available at the top left, so I'm going to go ahead and click on Cost Explorer. Now, if you haven't enabled this in the past, you might be presented with an option to enable it. But once you have, you can launch Cost Explorer to move into the product.
Now, in terms of the exam, if you see any exam questions which ask you to explore the costs for the AWS account or the AWS organization, or ask you to look at the costs for individual users within that account, or even to evaluate whether Reserved Instances might be beneficial to the account or not—then Cost Explorer is the tool to use.
So once we're in Cost Explorer, we can click on Cost Explorer and gain access to data about the spend within our AWS account or organization. We can look at various different time ranges. We can group the data hourly, daily, or monthly. We can group it by service or evaluate individual accounts, regions, instance types, usage types, and much more.
We're also able to filter by service, linked account, region, instance type, usage type, and even tags if you're using billing tags. You can also enable or disable forecasted values. The tool can analyze the incomplete portion of this month together with previous months and present you with a forecast of what you can expect going forward—and it's all possible within this tool.
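If you'd rather pull the same information programmatically, here's a minimal sketch using the Cost Explorer API via boto3. The dates, metric and group-by key are just example values.

```python
# Sketch: query cost and usage grouped by service, then request a forecast.
import boto3

ce = boto3.client("ce")

# Monthly unblended cost, grouped by service.
usage = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# A forecast of spend for the months ahead.
forecast = ce.get_cost_forecast(
    TimePeriod={"Start": "2025-04-01", "End": "2025-07-01"},
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)
print(forecast["Total"]["Amount"])
```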
In addition, something that also features on the exam very frequently is that it's also capable of analyzing usage within the account and presenting you with recommendations about any Reserved Instance purchases. So you can see an overview of any reservations that you currently have. You can click on "Recommendations" and, assuming you have enough usage of the account, it will present you with the recommendation of what you should be investing in in terms of Reserved Instances.
You can also see utilization reports and also coverage reports around Reserved Instance purchases. So this is a really useful tool when it comes to exploring cost as well as predicting cost. It's also able to perform cost anomaly detection as well as right-sizing recommendations—and these are all things which can feature on the exam.
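Here's a similarly hedged sketch of requesting those Reserved Instance recommendations through the API; the lookback period, term and payment option shown are example values only.

```python
# Sketch: ask Cost Explorer for EC2 Reserved Instance purchase recommendations.
import boto3

ce = boto3.client("ce")

recs = ce.get_reservation_purchase_recommendation(
    Service="Amazon Elastic Compute Cloud - Compute",
    LookbackPeriodInDays="SIXTY_DAYS",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
)
for rec in recs.get("Recommendations", []):
    for detail in rec.get("RecommendationDetails", []):
        # Instance details and the estimated monthly saving for each suggestion.
        print(detail.get("InstanceDetails"), detail.get("EstimatedMonthlySavingsAmount"))
```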
Now, because this is just a training AWS account, I don't have enough data within it to present you with actual data. But these are things that you need to be aware of for the exam, and if you have access to an account with a good amount of usage in it, and if you have permissions to the Cost Explorer product, then I do recommend that you go into this product and explore all of the different views and information that you have available surrounding billing in your account or organization.
Now with that being said, that's everything I wanted to cover. I just wanted to give you an overview of the features that you can expect from this product. At this point, go ahead and complete this video and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and this is going to be a super quick lesson where I just want to discuss cost allocation tags. So this is something you'll use in normal operations when you manage AWS accounts. But for the exam, there are a number of key points that you need to be aware of. So let's keep this brief and just jump in and get started.
Cost allocation tags are things that you can enable to provide additional information for any billing reports available within AWS. So cost allocation tags need to be enabled individually. And this is either on a per account basis for standard accounts or something that's performed in the organizational master account if you use AWS Organizations.
Now cost allocation tags come in two different forms. You have AWS generated ones. These always start with the aws: prefix, and two very common ones are aws:createdBy and aws:cloudformation:stack-name. And if you enable cost allocation tags, then these tags are added to AWS resources automatically by AWS.
Now I always see questions in the exam which do mention aws:createdBy. Now this details which identity created a resource as long as cost allocation tags are enabled. So this is not something that can be added retroactively. You need to make sure that this is enabled on an account or for an organization. And from that point onward, AWS will automatically add this cost allocation tag to any resource or any supported resource within the account.
There are also user defined tags which can be enabled. So you can create these—for example, maybe you wanted to have department tags or cost center tags or tags that indicated whether environments were production or development. And you can enable these and use them as cost allocation tags and these will be visible in any AWS cost reporting.
Now both of these—so user defined and AWS defined or AWS generated—they're going to be visible once enabled within AWS cost reports and these can be used as a filter. So you're able to determine which resources were created by a user or which resources belong to certain departments or cost centers. And you can use this as part of your organizational finance systems to correctly allocate AWS costs to specific areas of your business.
Now enabling these and having them so they're visible within cost reports can take up to 24 hours. So this is something that you need to plan in advance. None of these are retroactive. So keep that in mind for the exam and real world usage.
Now to illustrate how this works and what better way than to use some obnoxiously large graphics. Let's take a simple example: two EC2 instances for the Categorim application. Let's say that in advance I create or enable two different cost allocation tags, aws:createdBy and a user defined tag called application. This is what you might see.
Resources created will automatically be tagged with these two different tags. So the AWS generated aws:createdBy tag, which allows you to see which identity created that resource. And then the user defined tag user:application, and the two different current values for this tag are Categorim-prod and Categorim-dev.
Now any reporting which is generated from this point onward will include these tags. So we could split out the costs for our finance team detailing which costs are allocated for the Categorim production and the Categorim development application. And then we could also produce isolated costs for resources created by specific AWS users.
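As a rough illustration of that end-to-end flow, the sketch below launches an instance carrying the user defined application tag and then groups a Cost Explorer report by that tag. The AMI ID, dates and tag values are placeholders, and the tag still needs to be activated as a cost allocation tag before it shows up in reports.

```python
# Sketch: tag a resource at creation time, then report costs grouped by that tag.
import boto3

ec2 = boto3.client("ec2")
ce = boto3.client("ce")

# Launch an instance carrying the user defined "application" tag.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "application", "Value": "Categorim-prod"}],
    }],
)

# Once "application" is activated as a cost allocation tag in the billing
# console, reports can be grouped or filtered by it.
report = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "application"}],
)
```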
So by using cost allocation tags effectively, we can feed these costs into our organizational finance processes.
Now that's pretty much all you need to know for the exam. Just the format of these tags—pay specific attention to aws:createdBy because that's what I see in the exam all the time. Just know that these need to be enabled. They are not retroactive. And once you've enabled them, it can take up to 24 hours for these to be visible and used by AWS.
So that being said, that's everything I wanted to cover in this lesson. Go ahead and complete the video. And when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back. In this lesson I want to cover AWS Service Catalog. To understand what the AWS Service Catalog product does you first need to understand what a service catalog is from an IT service management perspective. Now we've got a lot to cover so let's jump in and get started.
A service catalog is a pretty specific thing and so I want you to be comfortable with what one is and how it's used before we move on to AWS specifics. So a service catalog is a document or database generally created by an IT team within a business. It's essentially an organized collection of products which are offered by a team or teams within the business.
Traditionally service catalogs are created by IT teams but other teams within the business such as HR and Finance might also create their own versions. They don't have to be technical in nature. They're often used when different teams within the business use a service style model—so when they consume each other's services, sometimes even using cross charging.
An example might be that a human resources team charges for the interviewing and onboarding process. Finance might charge a fee for the monthly payroll processing. And the security team might charge a fee and have a service for generating ID documents, such as an ID badge for any new hires within the business.
Each of these teams creates a product—something that they offer—and if teams cross charge, then that product will also have a cost. If you create a product and offer it, then it will have some additional information which goes along with it. So it will have clear ownership and accountability for the service—so a person and often a team within an organization; a name or identification for that service; a description of the service; a service category or type which allows it to be grouped with other similar services; related service request types—so the types of service requests that can be logged against that particular service by other people in the business; any supporting or underpinning services or dependencies; anything like service level agreements (SLAs), which help other teams interact with the team offering the service.
It will also define who is entitled to request or view the service, any costs for that service, how to request the service and then how it's delivered. And then finally, it might also have any escalation points and key contacts for that service.
So in general, all of that information is provided by the service, and it also defines any approval steps required to request and deliver the service. A service catalog provides a clear overview of services offered and how to consume them. It's essentially process documentation for a service. They're designed to allow a business to manage costs and allow service delivery to scale because everything about a product or a service is defined within these documents.
So that's a generic service catalog. So what about the AWS service with the same name? Well unsurprisingly, it's an AWS implementation of the service catalog concept. So let's take a look.
The AWS Service Catalog is a portal for end users. It's something which non-admin or non-privileged users can make use of—so non-technical finance users or HR or any other team in the business. They can use AWS Service Catalog to launch predefined products created by technical admin users, and you can use Service Catalog to control these end user permissions.
Picture a scenario where you run a hosting company. You host animal image blogs—cats, dogs, rabbits, chickens, the whole range. You have a sales team and you want the sales team to be able to deploy instances of your blogging software for customers. You don't want the sales team to require infrastructure permissions and you want to control exactly how those applications can be deployed.
So technical admin users define those products using CloudFormation, as well as configuration of the Service Catalog product. The admins also define the permissions required to launch the infrastructure that the product uses. So this is AWS permissions to, say, launch an EC2 instance or provision a load balancer or any other infrastructure used by your products.
You also define the end user permissions—so who can launch products using Service Catalog and who has visibility of what within the product. The technical admin will also build products into portfolios and these are made visible to end users.
So what does this look like visually? Well, let’s take a look. Service Catalog is a regional service, so you need to configure it and deploy things into Service Catalog in any service-enabled regions that you have. In this example, I'm going to use US East 1.
Inside this region we have Service Catalog and we start out with technical admins, the individuals who manage Service Catalog. These admins define the actual products—so the animal picture blogs—and they do this using CloudFormation and configuration of the Service Catalog product itself. So these admin users are simply defining templates which represent finished product deployments, and using these and the configuration of Service Catalog itself, these are added into Service Catalog in the form of products and then grouped into portfolios.
These entities also contain details of the permissions that they need to interact with AWS and actually create infrastructure. Think of this like a stack role. The permissions to create the infrastructure are included with the products and portfolios. Now, these products also contain end user permissions—so which end users can see and use these products and portfolios.
So end users browse Service Catalog, and when they see a product that they want to deploy, they can do so, but based on certain restrictions definable within the product. If they do choose to deploy a product, then Service Catalog handles this deployment using the permissions granted by the technical admin. Stacks are created, physical AWS services are launched—maybe EC2 instances, maybe load balancers or CloudFront distributions—and once complete, the products are available for the end user to directly use or pass on to the eventual customer of that product.
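To make the admin side of this a little more concrete, here's a rough sketch using boto3. The portfolio name, product name and template URL are placeholders, and a real setup would also add launch constraints, a launch role and end user access, which are omitted here.

```python
# Sketch: create a portfolio, add a CloudFormation-backed product to it.
import boto3

sc = boto3.client("servicecatalog")

# 1. Create a portfolio to group the products.
portfolio = sc.create_portfolio(
    DisplayName="AnimalBlogs",
    ProviderName="Platform Team",
)
portfolio_id = portfolio["PortfolioDetail"]["Id"]

# 2. Create a product backed by a CloudFormation template.
product = sc.create_product(
    Name="CatBlog",
    Owner="Platform Team",
    ProductType="CLOUD_FORMATION_TEMPLATE",
    ProvisioningArtifactParameters={
        "Name": "v1",
        "Type": "CLOUD_FORMATION_TEMPLATE",
        "Info": {"LoadTemplateFromURL": "https://example-bucket.s3.amazonaws.com/catblog.yaml"},
    },
)
product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]

# 3. Add the product to the portfolio so entitled end users can launch it.
sc.associate_product_with_portfolio(
    ProductId=product_id,
    PortfolioId=portfolio_id,
)
```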
For the exam, if you see any questions which talk about a need for end users or customers to deploy infrastructure with tight controls in a self-service way, then think about using Service Catalog.
So with that being said, go ahead and complete this lesson and when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about the security token service known as STS. Now this is a service which underpins many of the identity processes within AWS. If you've used a role then you've already used the services provided by STS without necessarily being aware of it. Now it's a service that you need to understand fully, especially at the professional level. So let's jump in and take a look.
STS generates temporary credentials whenever the STS Assume Role operation is used. At a high level it's a pretty simple thing to understand. When you assume an IAM role you use the STS Assume Role call and in doing so you gain access to temporary credentials which can be used by the identity which assumes the role. Now when you role switch in the console UI, you're assuming a role in another AWS account and you're using STS to gain access to these temporary credentials. Now this happens behind the scenes and you don't get any exposure to that within the UI but that's how it works architecturally.
Now temporary credentials which are generated by STS are similar in many ways to long term access keys in that they contain an access key ID which is the public part and a secret access key which is the private part. However, and this is critical to understand for the exam, these credentials expire and they don't directly belong to the identity which assumes the role. An IAM user for example owns its own credentials. It has an allocated access key ID and a secret access key and these are known as long term credentials. Temporary credentials are given temporarily to an identity and after a certain duration they expire and are no longer usable.
Now the access that these credentials provide can be limited. By default the authorization is based on the permissions policy of the role, but a subset of that can be granted to the temporary credentials, so they don't need to have the full range of permissions that the permissions policy on a role provides. The credentials can be used to access AWS resources just like with long term credentials. And another crucial thing to remember: temporary credentials are requested by another identity. This is either an AWS identity such as an IAM user via an IAM role, or an external identity such as a Google login or Facebook or Twitter, which is known as Web Identity Federation.
Now I'm going to be talking about STS constantly as I move through the course covering other more advanced identity related products and features. At this stage I just want you to have a good foundational level of understanding about how the product works. So let's take a look at it visually before we finish up with this lesson.
So we start off with Bob and Julie who want to assume a role and because of that STS is involved. Now around the role we have the trust policy and the trust policy controls who can assume that role. Conceptually think about this as a wall which is around the role only allowing certain people access to that role. So whatever the trust policy allows that is the group of identities which can assume the role. So if Bob attempts to assume the role he isn't in the trust policy and so he's denied from doing so. Julie is on the trust policy and so her STS assume role call is allowed.
Now an STS assume role call needs to originate from an identity. In this case it's a user but it could equally be an AWS service or an external web identity. Assume role calls are made via the STS service and STS checks the role's trust policy and if the identity is allowed then it reads the permissions policy which is also attached to the role. Now the permissions policy controls what is allowed or denied inside AWS so the permissions policy associated with a role controls what permissions are granted when anyone assumes the role.
So STS uses the permissions policy in order to generate temporary credentials, and the temporary credentials are linked to the permissions policy on the role which was assumed in order to generate them. If the permissions policy changes, the permissions that the credentials have access to also change. The credentials are temporary: they have an expiration, and when they expire they can no longer be used. Temporary credentials include an access key ID, which is the unique ID of the credentials; an expiration, after which the credentials are no longer valid; a secret access key, which is used to sign requests to AWS; and a session token, which needs to be included with those requests.
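Here's a minimal sketch of that role assumption using boto3. The role ARN is a placeholder, and the call only succeeds if the role's trust policy allows the calling identity (Julie in this example).

```python
# Sketch: assume a role via STS and use the returned temporary credentials.
import boto3

sts = boto3.client("sts")  # uses Julie's long term credentials

response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder role
    RoleSessionName="julie-session",
    DurationSeconds=3600,
)

creds = response["Credentials"]
print(creds["AccessKeyId"])   # unique ID of the temporary credentials
print(creds["Expiration"])    # when they stop being valid
# creds["SecretAccessKey"] signs requests, creds["SessionToken"] is sent with them.

# The temporary credentials can then be used like any others, e.g. with S3.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```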
So temporary credentials are generated when a role is assumed and they're returned back to the identity who assumes the role and when credentials expire another assume role call is needed to get access to new credentials. Now these temporary credentials can be used to access AWS services. So STS is used for many different types of identity architecture within AWS. If you assume a role within an AWS account then that uses STS. If you role switch between accounts using the console UI that uses STS. If you're performing cross account access using a role then that uses STS and various different types of identity federation which we'll be covering in the next few sections of the course also use STS.
For now though that's everything I wanted to cover in this lesson. I just want you to have this foundational level of understanding. For the exam it's not so much STS that's important. It's all of the different products and services that utilize STS to generate these short term credentials. And so to understand exactly how these products and services work in later lessons of the course we need to start off with a thorough understanding of this product at an architecture level. So that's what I wanted to get across in this lesson. For now go ahead complete this lesson and when you're ready I look forward to you joining me in the next.
-
-
Local file Local file
-
Cloud Transformation Phases in AWS CAF
- Envision
- Align
- Launch
- Scale
-
- Jan 2025
-
learn.cantrill.io learn.cantrill.io
-
Welcome back.
This is part two of this lesson.
We're going to continue immediately from the end of part one.
So let's get started.
Now let's look at another example.
This looks more complex, but we're going to use the same process to identify the correct answer.
So this is a multi-select question and we're informed that we need to pick two answers, but we're still going to follow the same process.
The first step is to check if we can eliminate any of the answers immediately.
Do any of the answers not make sense without reading the question?
Well, nothing immediately jumps out as wrong, but answer E does look strange to me.
It feels like it's not a viable solution.
I can see the word encryption mentioned and it's rare that I see lambda and encryption mentioned in the same statement.
So at this stage, let's just say that answer E is in doubt.
So it's the least preferred answer at this point.
So keep that in your mind.
It's fine to have answers which you think are not valid.
We don't know enough to immediately exclude it, but we can definitely say that we think there's something wrong with it.
Given that we need to select two answers out of the five, we don't need to worry about E as long as there are two potentially correct answers.
So let's move on.
Now, the real step one is to identify what matters in the question text.
So let's look at that.
Now, the question is actually pretty simple.
It gives you two requirements.
The first is that all data in the cloud needs to be encrypted at rest.
And the second is that any encryption keys are stored on premises.
For any answers to be correct, they need to meet both of these requirements.
So let's follow a similar process on the answer text first looking for any word fluff and then looking for keywords which can help identify either the correct answers or more answers that we can exclude.
So the first three answers, they all state server side encryption, but the remaining two answers don't.
And so the first thing that I'm going to try to do with this question is to analyze whether server side encryption means anything.
Does it exclude the answers or does it point to those answers being correct?
Well, server side encryption means that S3 performs the encryption and decryption operations.
But depending on the type of server side encryption, it means that S3 either handles the keys or the customer handles the keys.
But at this stage, using server side encryption doesn't mean that the answers are right or wrong.
You can use it or you can't use it.
That doesn't immediately point to correct versus incorrect.
What we need to do is to look at the important keywords.
Now, if we assume that we are excluding answer E for now unless we need it, then we have four different possible answers, each of which is using a different type of encryption.
So I've highlighted these.
So we've got S3 managed keys, SSE-S3, KMS managed keys, which is SSE-KMS, customer provided keys, which is SSE-C, and then using client side encryption.
Now, the first requirement of the question states encryption at rest and all of the answers A, B, C and D, they all provide encryption at rest.
But it also states that encryption keys are to be stored on premises.
Answers A and B use server side encryption where AWS handle the encryption process and the encryption keys.
So SSE-S3 and SSE-KMS both mean that AWS are handling the encryption keys.
And because of this, the keys are not stored on premises and so they don't meet the second criteria in the question.
And this means that they're both invalid and can be excluded.
Now, this leaves answers C, D and E, and we already know that we're assuming that we're ignoring answer E for now and only using it if we have to.
So we just have to evaluate if C and D are valid.
And if they are, then those are the answers that we select.
So SSE-C means that the encryption is performed by S3 with the keys that the customer provides.
So that works.
It's a valid answer.
So that means at the very least C is correct.
It can be used based on the criteria presented by the question.
Now, answer D suggests client side encryption, which means encrypting the data on the client side and just passing the encrypted data to S3.
So that also works.
So answers C and D are both potentially correct answers.
So both answers C and D do meet the requirements of the question.
And because of this, we don't have to evaluate answer E at all.
It's always been a questionable answer.
And since the question only requires us to specify two correct answers, we can go ahead and exclude E.
And that gives us the correct answers of C and D.
So that's another question answered.
And it's followed the same process that we used on the previous example.
So really answering questions within AWS is simply following the same process.
Try and eliminate any crazy answers.
So any answers that you can eliminate based just on the text of those answers, then exclude them right away because it reduces the cognitive overhead of having to pick between potentially four correct answers.
If you can eliminate the answers down to three or two, you significantly reduce the complexity of the question.
The next step is to find what really matters in the question, find the keywords in both the preamble and the question.
Then highlight and remove any question fluff.
So anything in the question which doesn't matter, eliminate any of the words which aren't relevant technically to the product or products that you select.
So this is something that comes with experience, being able to highlight what matters and what doesn't matter in questions.
And the more practice that you do, the easier this becomes.
Next, identify what really matters in the answers.
So again, this comes down to identifying any shared common words and removing those and then identifying any of the keywords that occur in the answers.
And then once you've got the keywords in the answers and the keywords in the questions, then you can eliminate any bad answers that occur in the question.
Now, ideally at this point, what remains are correct answers.
You might start off with four or five answers.
You might eliminate two or three.
The question asks for two correct answers.
And that's it.
You've finished the question.
But if you have more answers than you need to provide, then you need to quickly select between what remains and you can do that by doing this keyword matching.
So look for things which stand out.
Look for things which aren't best practice according to AWS.
Look for things which breach a timescale requirement in the question.
Look for things which can't perform at the levels that the question requires or that cost too much based on the criteria and the question.
Essentially, what you're doing is looking for that one thing that will let you eliminate any other answers and leave you with the answers that the question requires.
Generally, when I'm answering most questions, it's a mixture between the correct answer jumping out at me and eliminating incorrect answers until I'm left with the correct answers.
You can approach questions in two different ways.
Either looking for the correct answers or eliminating the incorrect ones.
Do whichever works the best for you and follow the same process throughout every question in the exam.
The one big piece of advice that I can give is don't panic.
Everybody thinks they're running out of time.
Most people do run out of time.
So follow the exam technique process that I detailed in the previous lesson to try and get you additional time, leave the really difficult questions until the end, and then just follow this logical process step by step through the exam.
Keep an eye on the remaining amount of time that you have at every point through the exam and I know that you will do well.
Most people fail the exam because of their exam technique, not their lack of technical capability.
With that being said, though, that's everything I wanted to cover in this set of lessons.
Good luck with the practice tests.
Good luck with the final exam.
And if you do follow this process, I know that you'll do really well.
With that being said, though, go ahead and complete this video and then when you're ready, I'll look forward to joining you in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and from the very start this course has been about more than just the technical side.
So this is part one of a two-part lesson set and in this lesson I'll focus on some exam technique hints and tips that you might find useful in the exam.
Now in terms of the exam itself it's going to have questions of varying levels of difficulty and this is also based on your own strengths and weaknesses.
Conceptually though understand that on average the AWS exams will generally feel like they have 25% easy questions, 50% medium questions and 25% really difficult questions.
Assuming that you've prepared well and have no major skill gaps this is the norm.
For most people this is how it feels.
The problem is the order of the difficulty is going to feel random, so you could have all of your easy ones at the start or at the end or scattered between all of the other questions. And this is part of the technique of the AWS exams: how to handle question difficulty in the most efficient way possible.
Now I recommend conceptually that my students think of exams in three phases.
You want to spend most of your time on phase two.
So structurally in phase one I normally try to go through all of the 65 questions and identify ones that I can immediately answer.
You can use the exam tools including mark for review and just step through all of the questions on the exam answering anything that's immediately obvious.
If you can answer a question within 10 seconds or have a good idea of what the answer will be and just need to consider it for a couple more seconds this is what I term a phase one question.
Now the reason that I do these phase one questions first is that they're easy they take very little time and because you know the subject so well you have a very low chance of making a mistake.
So once you've finished all of these easy questions the phase one questions what you're left with is the medium or yellow questions and the hard or red questions.
My aim is that I want to leave the hard questions until the very end of the test.
They're going to be tough to answer anyway and so what I want to do at this stage in phase two is to go through whatever questions remain so whatever isn't easy and I'm looking to identify any red questions and mark them for review and then just skip past them.
I don't want to worry about any red questions in phase two.
What phase two is about is powering through the medium questions.
These will require some thought but they don't scare you they're not impossible.
The medium questions so the yellow questions should make up the bulk of the time inside the exam.
They should be your focus because these are the questions which will allow you to pass or fail the exam.
For most people the medium questions represent the bulk of the exam questions.
Generally your perception will be that most of the questions will be medium.
There'll be some easy and some hard so you need to focus in phase two which represents the bulk of the exam on just these medium questions.
So my suggestion generally is in phase two you've marked the hard questions for review and just skipped past them and then you focused on the medium questions.
Now after you've completed these medium questions you need to look at your remaining time and it might be that you have 40 minutes left or you might only have four minutes or even less.
In the remaining time that you have left you should be focusing on the remaining red questions the difficult questions.
If you have 40 minutes left then you can take your time.
If you have four minutes you might have to guess or even just click answers at random.
Now both of these approaches are fine because at this point you've covered the majority of the questions.
You've answered all of the easy questions and you've completed all of the medium questions.
What remains are questions that you might get wrong regardless but because you've pushed them all the way through to the end of your time allocation whether you're considering them carefully and answering them because you have 40 minutes left or whether you're just answering them at random they won't impact your process in answering the earlier questions.
So if you don't follow this approach what tends to happen is you're focusing really heavily on the hard questions at the start of the exam, and that means that you run out of time towards the end. But if you follow this three-stage process, by this point all that you have left is a certain number of minutes and a certain set of really difficult questions, and you can take your time safe in the knowledge that you've already hopefully passed the exam based on the easy and medium questions, and the hard ones are simply a bonus.
Now at a high level this process is designed to get you to answer all of the questions that you're capable of answering as quickly as possible and leave anything that causes you to doubt yourself or that you struggle with to the end.
So pick off the easy questions, focus on the medium and then finish up with the really hard questions at the end.
I know that it sounds simple but unless you focus really hard on this process or one like it then your actual exam experience could be fairly chaotic.
If you're unlucky enough to get hard questions at the start and you don't use a process like this it can really spoil your flow.
So before we finish this lesson just some final hints and tips that I've got based on my own experiences.
First if this is your first exam assume that you're going to run out of time.
Most people enter the exam not having an understanding of the structure and most people myself included with my first exam will run out of time.
The way that you don't run out of time and the way that you succeed is to be efficient, have a process.
Now assuming that you have the default amount of time you need to be aware that you have two minutes to read the question, read the answers and to make a decision.
So this sounds like a lot but it's not a lot of time to do all of those individual components.
You shouldn't be guessing on any answers until the end.
If you're guessing on a question then it should be in the hard question category and you should be tackling this at the end.
I don't want you guessing on any easy questions or any medium questions.
If you're guessing then you shouldn't be looking at it until right at the very end.
Another way of looking at this is if you are unsure about a question or you're forced to guess early on you need to be aware that a question that's later on so further on in the exam might prompt you to remember the correct answer for an earlier question.
So if you do have to guess on any questions then use the mark for review feature.
You can mark any question that you want for review as you go through the exam and then at any point or right at the end you can see all the questions which are flagged for review and revisit them.
So use that feature it can be used if you're doubtful on any of the answers or you want to prompt yourself as with the hard questions to revisit them toward the end of the exam.
Now this should be logical but take all the practice tests that you can.
One of my favorite test vendors in the space is the team over at TutorialsDojo.com.
They offer a full range of practice questions for all of the major AWS exams so definitely give their site a look.
One of the benefits of the exam questions created over at TutorialsDojo is that they are more difficult than the real exam questions so they can prepare you for a much higher level of difficulty and by the time you get into the exam you should find it relatively okay.
So my usual method is to suggest that people take a course and then once they've finished the course take the practice test in the course, follow that up with the TutorialsDojo practice tests and use any questions they get wrong to identify areas that need additional study.
So rinse and repeat that process, perform that additional study, redo the practice tests and when you're regularly scoring above 90% on those practice tests then you're ready to do the real exam.
And at this point there are all of my suggestions for exam technique.
In the next lesson I want to focus on questions themselves because it's the questions and your efficiency during the process of answering questions which can mean the difference between success and failure.
So go ahead complete this video and in the next video when you're ready we'll look at some techniques on how you can really excel when tackling exam questions.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back.
In this video, I want to cover another part of the AWS global network, specifically the Edge Network, and that's AWS Local Zones.
Now, this is a key architectural concept that you'll need to understand for all of the AWS exams, and especially so for the real world.
So let's jump in and get started.
Now, before we talk about local zones, let's just refresh our memory on what the typical region and availability zone architecture looks like without local zones.
So we have a region, and let's say that this is US West 2, and within this we have three availability zones, US West 2A, US West 2B, and US West 2C.
And then running in this region across those availability zones is a VPC.
Now, an AWS region has high performance and resilient internet connections, and sitting between these and the AWS private zone is the AWS public zone.
So this is the zone where all of the AWS public services for that region run within.
And then lastly, on our right, we have our business premises.
What we know about this architecture so far is that it scales.
It can grow with your requirements, and that's really important because this is fully managed within the region.
We also know that it's resilient to failure.
The failure of one availability zone won't impact other availability zones, assuming a solutions architect has designed a solution which has infrastructure duplicated across all of the availability zones and things in one availability zone consume from that availability zone only, often regionally resilient services.
Now, what I haven't talked about until now is the effects of geographic distance.
The availability zones in this region might be hundreds of kilometers away from the business premises.
Now, this distance, even assuming that we're using fiber, can cause latency.
And this latency causes a reduction in performance, and this performance impact is noticeable at this distance.
For many use cases, a few milliseconds of latency might not sound like much, but for applications which are sensitive to latency, this can really matter.
An example might be a financial trading application.
Even if we use Direct Connect, physics and the speed of data transfer from point A to point B matters.
So how can we fix this?
Well, we can use AWS local zones and let's see how this changes the architecture.
Let's adjust the diagram a little and make it easier to see.
And we're going to add some subnets in availability zone 2A, 2B and 2C.
And we'll also have some EC2 instances running in these subnets.
When we're discussing local zones, we can refer to this region as the parent region.
So this region is the parent region to any local zones which operate in the same geographic area.
So we're also going to add some local zones to this architecture.
Now, these are identified starting with the region name and then a unique identifier for the local zone.
In this example, we have US West 2 and then LAS-1, which is a local zone in Las Vegas.
And we have US West 2 as its parent region.
So you can see the link between the local zone and the parent region because you can read the parent region at the start of the local zone name.
Now, it's possible to have multiple local zones in a given city.
For instance, in this example, we have US West 2-LAX-1A and LAX-1B.
And both of these are in Los Angeles.
Notice how they use the international city code to identify them.
Now, think of these as related to the parent region, but they operate as their own independent infrastructure points.
So they have their own independent connections to the internet.
And additionally, generally, they also support Direct Connect, which means you can achieve high performance, private connectivity between your business locations and these local zones.
Now, different services support local zones in different ways.
And over the course of your studies, you're going to learn how.
With EC2 and VPCs, the VPC is simply extended by creating subnets within the local zones.
And then within these subnets, you can create resources as normal, utilizing the proximity of the local zone.
So these resources benefit from super low latencies.
The performance between the business premises and the local zone is at the extreme end of what's possible because of the smaller geographic separation between the local zone and your business premises.
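As a hedged sketch of what "extending the VPC" looks like in practice, the boto3 example below opts in to a local zone group and creates a subnet in that local zone. The VPC ID, CIDR and zone names are placeholders.

```python
# Sketch: opt in to a local zone group and extend a VPC into it with a subnet.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Opt in to the local zone group (a one-off action per account).
ec2.modify_availability_zone_group(
    GroupName="us-west-2-lax-1",
    OptInStatus="opted-in",
)

# Extend the VPC by creating a subnet inside the local zone.
subnet = ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",   # placeholder VPC
    CidrBlock="10.16.96.0/20",       # placeholder CIDR
    AvailabilityZone="us-west-2-lax-1a",
)
print(subnet["Subnet"]["SubnetId"])
```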
Now, an important thing to keep in mind is that some things within the local zones still utilize the parent region.
So in this example, the subnets created in the local zones behave just like those in the parent region, and they have private connectivity just like any other subnets would.
Local zones have private networking with the parent region.
So remember that.
However, if we create EBS snapshots, then these use S3 in the parent region.
This means they still benefit from the replication across all availability zones within that region that snapshots would normally benefit from.
So certain things occur within the local zone, but certain things rely on the parent region.
And one common example is EBS snapshots.
Now, let's finish up this video with some key summary points because for most of the AWS certifications, you only need to have this high level architectural overview.
So think about local zones as one additional zone or one additional availability zone so they don't have built-in resilience.
Conceptually, one zone runs in one specific facility.
So you can think of them like a single availability zone but near your location.
So they're closer to you so they have lower latency and lower latency means better performance.
So just imagine taking one of the availability zones within a region and duplicating it but putting it in a building next to your business premises.
Now, it won't always be that close, but there are some businesses which are built very close to these AWS local zones by design.
So you're able to get really close to the AWS infrastructure.
Now, not all AWS products support using local zones and for the ones that do, many of them are opt-in and many of them have limitations.
So if you're ever going to utilize local zones, you need to make sure that you check the AWS documentation for an up-to-date overview of what's supported within the local zones in your specific geographic area.
And I've made sure to include a link attached to this video which gives you up-to-the-minute overviews for all of the AWS local zones.
Now Direct Connect to local zones is generally supported and this allows local zones to be used to support any extreme performance needs or performance requirements.
And once again, local zones do utilize the parent region for various things and one example is EBS snapshots are taken to the parent region and replicated over S3 in that parent region.
Now, just to summarize this, you should use local zones as an architect when you need the absolute highest level of performance.
Local zones, much like CloudFront edge locations, are much more likely to be positioned closer to your business than the parent region and any of the normal availability zones.
But if you do utilize local zones, you need to make sure that they do offer the functionality that you require.
So essentially, this is just another tool that you can use to build architectures as a solutions architect.
Now this is everything I wanted to cover in this video.
I just wanted to give you a high level overview of the architecture of local zones.
So go ahead and complete the video and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video, I want to cover Amazon SageMaker from a high level perspective.
Now this is a product which you need to understand at only a foundational level for most AWS exams, but in depth for some others.
If any additional knowledge is required for the course that you're studying, there will be follow up deep dive videos.
But if you only see this one, this is all that you'll require.
Now I have to apologize in advance.
I don't like creating lessons which I don't think add much real world value.
SageMaker is a special kind of product where you're only going to be able to use it effectively if you have some practical experience.
But because you don't need to understand how to use it in depth for most of the AWS exams, I don't really want to waste your time going into significant depth.
It's a fairly niche product to use in the real world.
So this lesson is really just going to present some high level features.
And you're not going to get as much value doing it this way.
But it's just one of those things that we have to do in this way to avoid wasting time.
So you probably won't really like this lesson, but just stick with it because it will benefit you for the exam.
Now let's jump in and get started straight away.
So SageMaker is actually a collection of other products and features all packaged together by AWS.
And it's an implementation of a fully managed machine learning service.
So it essentially helps you with the process of developing and using machine learning models.
So this includes data fetching, cleaning, preparing, and then training and evaluating models and then deploying those models and then monitoring those models and collecting data.
So it's used for this entire machine learning lifecycle and it's AWS's way of implementing one product or one container of products which can help you for this entire lifecycle.
Now there are a few key things that you need to be aware of about SageMaker.
The first is SageMaker Studio.
And this is essentially an IDE or integrated development environment for the entire machine learning lifecycle.
So this allows you to build, train, debug, and monitor machine learning models.
So think about this as a development environment for the machine learning lifecycle.
If you see this mentioned on the exam, you understand that at a high level what it is.
Now within SageMaker, you have the concept of a SageMaker domain.
And I want you to think about this as isolation or groupings for a particular project.
So you're provided with an EFS volume, separate users, applications, policies, and different VPC configurations.
So just think of this as almost a container for a particular project.
And when you start using SageMaker, you'll have to create a SageMaker domain in order to interact with the product.
Next we've got containers.
And these are essentially Docker containers which are deployed to specific machine learning EC2 instances.
So these are specific types and sizes of EC2 instances which start with ML and they're designed specifically for machine learning workloads.
Now these Docker containers are machine learning environments which come with specific versions of the operating system, libraries, and tooling for the specific task that you're wanting to accomplish.
And there are many different pre-built containers that you can utilize with SageMaker.
Now SageMaker is also capable of hosting machine learning models.
So you can deploy machine learning models as endpoints that your applications can then utilize.
And there are various different hosting architectures which are either based on serverless or consistently running compute.
Now again, this is beyond the scope of this high level architecture video, but I did just want you to be aware that SageMaker can host your machine learning models.
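For a rough idea of what hosting a model looks like, here's a hedged boto3 sketch that creates a model, an endpoint configuration and an endpoint, then invokes it. The container image, model artifact location, role ARN and payload format are all placeholder assumptions.

```python
# Sketch: host a trained model as a SageMaker endpoint and invoke it.
import boto3

sm = boto3.client("sagemaker")

# Register the model: a container image plus the trained model artifact.
sm.create_model(
    ModelName="demo-model",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-inference:latest",  # placeholder
        "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",                       # placeholder
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",           # placeholder
)

# Describe the ML instances the endpoint will run on.
sm.create_endpoint_config(
    EndpointConfigName="demo-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "demo-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

# Create the endpoint that applications will call.
sm.create_endpoint(EndpointName="demo-endpoint", EndpointConfigName="demo-config")

# Applications then call the endpoint via the runtime client.
runtime = boto3.client("sagemaker-runtime")
result = runtime.invoke_endpoint(
    EndpointName="demo-endpoint",
    ContentType="text/csv",
    Body=b"1.0,2.0,3.0",  # placeholder payload, format depends on the model
)
```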
This is something that you need to be aware of for the exam.
And then finally, be aware that SageMaker itself has no cost, but the resources that it creates do have a cost.
And it's fairly complex pricing because of the range of services which can be created by SageMaker.
And another important concern about SageMaker is because of the complexity of those resources and because of the compute requirements of machine learning in general, the resources which are deployed can be relatively large and carry significant cost.
And to help you with this, I'm going to include a link attached to this video which details some of the important cost elements for this product.
Now at this point, that's everything I'm going to cover in this video.
I need to apologize again, because I know it's at a very abstract and high level.
But the only thing that you need to be aware of for most of the AWS exams is how it works structurally at a high level.
There's only one exam where you need additional levels of understanding and that's the machine learning specialty exam.
So if you're studying this as part of the machine learning course, you're going to find many other videos which are deep diving into how SageMaker works, including advanced demos or mini projects which are going to give you practical experience.
But if you only see this one video, it means you're studying a course which only needs this really high level understanding.
And with that being said, that's everything I wanted to cover in this video.
So go ahead and complete the video.
And when you're ready, I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this video I'm going to continue my super high level overview of the AWS machine learning services, this time looking at Amazon fraud detector.
Now once again for nearly all of the AWS certifications a very high level overview is more than enough.
If you do need any additional knowledge for the course that you're studying I will have follow-up videos but this one just covers the basics so let's jump in and get started.
Now Amazon Fraud Detector, as the name suggests, is a fully managed fraud detection service. So this allows you to look at various historical trends and other related data and identify any potential fraud as it relates to certain online activities. So this might include things like new account creations, payments, or guest checkouts.
Now the architecture of this much like the other managed machine learning services is that you upload some historical data and you choose a model type.
Now for this particular service we have a few different model types.
First we have online fraud, and this is designed for when you have little historical data and you're looking to identify any problematic events such as new customer accounts.
You might not have any data for a particular customer (logically enough, they're just signing up for an account), so this model looks at general trends, such as whether there are any surrounding elements of concern around this particular sign-up for this particular user.
Now we also have another model type which is transaction fraud and this is ideal for when you do have a transactional history for that customer and this transactional history can be used to identify any suspect payments.
This is fairly commonly used when you're performing credit card transaction validation.
Generally you do have a full purchase history for a customer, so for example which stores are used, the types of purchases made, the value of those purchases, the countries that the purchaser or the store is in, as well as the amounts and times of day.
So you can build up a fairly good profile for a given customer of what their normal transactions are like, and this is what this particular model allows: you can import the transactional history for one or more customers and use this to identify any suspect payments. Then the last is the account takeover model type, and this can be used to identify phishing or other social engineering based attacks.
For example, if somebody signs in from a completely different location, or if they're referred from a certain site, all of this can be taken into account to identify if this user has potentially been affected by an account takeover attempt.
Now the way that this product works is that all of the various events are scored, and then you can create rules, which are basically a type of decision logic, and use these to react to those scores based on your business activity or business risk.
So Amazon Fraud Detector is another back-end style service which generally you're going to be integrating into your applications or environments rather than using interactively from the console.
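As a sketch of that integration, this is roughly what asking Fraud Detector to score an event might look like using boto3. The detector name, event type, entity and variables are made-up placeholders, assuming a detector has already been built and published.

```python
import boto3
from datetime import datetime, timezone

fraud = boto3.client("frauddetector")

# Score a single sign-up event against a previously published detector.
response = fraud.get_event_prediction(
    detectorId="example_signup_detector",       # placeholder detector
    eventId="event-0001",
    eventTypeName="example_signup_event",       # placeholder event type
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "customer", "entityId": "customer-0001"}],
    eventVariables={
        "email_address": "user@example.com",
        "ip_address": "203.0.113.10",
    },
)

# Model scores plus the outcomes of any rules you defined (e.g. approve, review, block).
print(response["modelScores"])
print(response["ruleResults"])
```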
Now this high-level understanding is everything that you'll need for most of the AWS exams so I'm going to limit this video to covering just this high-level overview.
So at this point we have finished go ahead and complete this video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this video I'm going to continue my super high-level overview of the AWS machine learning services.
This time looking at Amazon Forecast.
Now this is not weather forecasting but forecasting based on time series data.
Once again for nearly all of the AWS certifications a very high-level overview is more than enough.
If you need any additional knowledge I'll have follow-up videos but this one just covers the basics.
So let's jump in and get started.
So Amazon Forecast provides forecasting for time series data.
Now this means things like predicting retail demand, supply chain, staffing levels, energy requirements, server capacity and web traffic.
So for any type of time series data where you have a large amount of historical and related data, you can use this together with Forecast to predict future trends and events.
Now the way that you do this is you import historical data and related data.
Now simple historical sales data might include just an item being sold and then a date and timestamp and you can use this to trend how popular that item is throughout that time series and use that for simple forecasting.
Related data includes extra contextual information such as any promotions which might have been running throughout a time period and even things like the weather which can influence the data over a particular time period.
So Forecast uses both of these different sets of data, historical and related, and it understands what's normal over a particular time period, and the output of this will be a forecast and forecast explainability.
Now the forecast is simple enough to understand.
It allows you to trend out the future demand for a particular thing or understand the future requirements for a particular thing.
The explainability though goes into more depth.
It allows you to extract the reasons for changes in this demand.
So for example I mentioned the weather as a related data item.
Well in the retail world weather can have a huge impact on the general demand levels for products.
Generally, all products will be influenced to some base level by the weather.
Some weather patterns may cause people to stay at home versus some may encourage people to go shopping.
Now this obviously has a much greater effect for physical shopping in a retail establishment versus online but other elements of weather can also affect online shopping.
For example if it's raining heavily you might see a demand increase for certain types of clothing.
So this is all involved in the ability to forecast future demand based on historical data and this is what this product provides.
It's a managed service which provides the ability to get access to forecasting for time series data.
Now you can interact with the product from the web console, where you can use it to see visualizations.
So you can see past data and future forecasts, as well as explore the explainability of those forecasts.
It can also be interacted with using the CLI, APIs and the Python SDK.
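For example, once a forecast has been generated, querying it from the Python SDK might look something like the sketch below. The forecast ARN and item ID are placeholders, assuming you've already imported data and created a forecast.

```python
import boto3

# The forecastquery client is used to read predictions from an existing forecast.
query = boto3.client("forecastquery")

response = query.query_forecast(
    ForecastArn="arn:aws:forecast:us-east-1:111122223333:forecast/example_demand",  # placeholder
    Filters={"item_id": "example_item"},  # placeholder item from the historical data
)

# Predictions are returned per quantile, e.g. p10 / p50 / p90 demand levels over time.
for point in response["Forecast"]["Predictions"]["p50"]:
    print(point["Timestamp"], point["Value"])
```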
Now once again this is a back-end service.
It's generally something that you're going to use as part of either a business process or integrating it with your application.
It's relatively niche in its use and so for most AWS exams this high-level overview is everything that you'll need.
If you do need any additional knowledge for the course that you're studying there will be additional theory and/or practical videos but at this point that's everything I wanted to cover in this high-level video.
Go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this video, I want to cover the high level architecture of the Amazon Translate product.
This is another machine learning product available within AWS and if you need any other knowledge over and above architecture, there will be additional videos following this one.
If you only see this video, don't worry, it just means that this is the only knowledge that you need.
Now let's just jump in and get started straight away.
Amazon Translate as the name suggests is a text translation service which is based on machine learning.
It translates text from a native language to other languages one word at a time.
Now the translation process is actually two parts.
We first have the encoder and the encoder reads the source text and then outputs a semantic representation which you can think of as the meaning of that source text.
Remember that the way that you convey certain points between languages differs.
It's not always about direct translation of the same words between two different languages.
So the encoder takes the native source text and it outputs a semantic representation or meaning and then the decoder reads in that meaning and then writes to the target language.
Now there's something called an attention mechanism and Amazon Translate uses this to understand context.
It helps decide which words in source text are the most relevant for generating the target output and this ensures that the whole process correctly translates any ambiguous words or phrases.
Now the product is capable of auto-detecting the source text language.
So you can explicitly state what language the source text is in or you can allow that to be auto-detected by the product.
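To show what that looks like in practice, here's a minimal boto3 sketch which translates a piece of text and lets the service auto-detect the source language. The text and target language are just examples.

```python
import boto3

translate = boto3.client("translate")

response = translate.translate_text(
    Text="I like cats, dogs, chickens and rabbits.",  # example source text
    SourceLanguageCode="auto",  # let Translate detect the source language
    TargetLanguageCode="es",    # example target language (Spanish)
)

print(response["SourceLanguageCode"])  # the language Translate detected
print(response["TranslatedText"])      # the translated output
```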
Now in terms of some of the use cases for Amazon Translate, well it can offer a multi-lingual user experience.
So all the documents that exist within businesses are generally going to be stored in the main language of that business.
But this allows you to offer those same documents, such as meeting notes, posts, communications and articles, in all of the languages that staff within your business speak, and this can make it much easier for organizations which have offices in different countries to operate more efficiently.
This also means that you can offer things like emails, in-game chat or customer live chat in the native language of the person that you're communicating with and this can increase the operational efficiency of your business processes.
It also allows you to translate incoming data such as social media, news and communications from the language that they're written in into the native language of the staff that are interpreting those incoming communications.
Now more commonly, Amazon Translate can also offer language independence for other AWS services.
So you might have other services such as Comprehend, Transcribe and Polly which operate on information, and Translate offers the ability for these services to operate in a language independent way.
It can also be used to analyze data which is stored in S3, RDS, DynamoDB and other AWS data stores.
Now generally with this product you're going to find that it's used more commonly as an integration product.
So rather than use it directly, it's more common to see it integrated with other AWS services, other applications including ones that you develop and other platforms.
So in the exam if you see any type of scenario which requires text to text translation then think Translate.
If you see any scenario which might need text to text translation as part of a process then Translate can form part of that process.
So you might want to translate one language into another and then speak that language or you might want to take audio which is in one language output text and then translate that to a different textual language.
Keep in mind that Translate is often used as a component of a business process.
So really keep that one in mind.
It's not always used in isolation.
Now with that being said, that is everything I wanted to cover in this video.
So go ahead and complete the video and when you're ready, I'll look forward to you joining me in the next.
Welcome back and in this video we're going to be doing a really quick overview of Amazon Transcribe.
Now there isn't a lot to understand about this product so let's just jump in and get started straight away.
Amazon Transcribe is an automatic speech recognition or ASR service.
Essentially the input to this product is audio and the output is text.
So it takes audio in the form of speech and then outputs the text version of that speech.
And it offers various features which improve this process so things like language customization, filters for privacy, audience appropriate language as well as speaker identification.
In addition you can configure custom vocabularies as well as creating language models specific to your use case.
Now the product is pay-per-use, and you're billed per second of transcribed audio.
Now with this product it's going to be much easier to see it working so I'm going to switch across to my console and give you a quick demonstration.
Okay so I'm at the AWS console, I'm logged in as the IAM admin user of my general AWS account and I'm in the Northern Virginia region.
I'm just going to go ahead and click in the search box at the top and type transcribe and then click to move to the transcribe console.
Once I'm here I'm going to click on real-time transcription because this is what I'm going to use to demo the product.
And once I'm here I'm going to click on start streaming, say something into my microphone and then click on stop streaming.
I like cats, dogs, chickens and rabbits.
Spiders not so much.
At this point we can see that the product has transcribed what I said into my microphone into text and then by clicking here we could download a full transcript of this audio.
Now if we just scroll down we're able to look at some of the more advanced features of the product so you can specify a specific language or you can set it to auto detect the language.
You're able to enable or disable speaker identification, set various content removal settings such as vocabulary filtering or personally identifiable information and other redactions.
If we expand customizations it's here where we can specify custom vocabulary, partial result stabilization or a custom language model and we can even expand this to look at application integration.
So this product is capable of being integrated with either your own applications or other AWS products and it's on the menu on the left where you can interact with the specific versions of Amazon Transcribe such as call analytics or Transcribe Medical.
Now both of these are beyond the scope of what I want to cover in this video but I did want you to be aware that they exist.
Now at this point that's everything that I wanted to do in the walkthrough so I'm going to move back to the second part of the theory I'll be covering in this video.
Now in terms of the use cases of Amazon Transcribe it allows you to do full text indexing of audio so you can convert audio into text and that can be used for full searching of the text version of that audio.
It can also be used to create meeting notes.
If you use the product to ingest audio from any meetings that you conduct then you can have full text records of those meetings.
It can be used to generate subtitles, captions or transcripts of any video that your organisation uses.
The product comes in a number of specific flavours so Transcribe, Transcribe Medical and then Transcribe Call Analytics and in the call analytics version you're able to ingest audio from any audio phone calls and assess the characteristics, perform summarisation, look at categories and sentiments of the people talking on that phone call.
The product is also capable of being integrated with your own applications through APIs as well as being able to be integrated with other AWS machine learning services so Amazon Transcribe can convert from speech into text and then that text can be ingested by other AWS machine learning products to perform further analysis.
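As a sketch of that API integration, starting a transcription job from Python might look something like this. The job name and S3 location are placeholders, assuming the audio file already exists in a bucket you own.

```python
import boto3

transcribe = boto3.client("transcribe")

# Kick off an asynchronous transcription job for an audio file stored in S3.
transcribe.start_transcription_job(
    TranscriptionJobName="example-meeting-notes",               # placeholder job name
    Media={"MediaFileUri": "s3://example-bucket/meeting.mp3"},  # placeholder audio object
    MediaFormat="mp3",
    IdentifyLanguage=True,  # or set LanguageCode explicitly, e.g. "en-US"
)

# Later, poll for completion and retrieve the transcript location from the job details.
job = transcribe.get_transcription_job(TranscriptionJobName="example-meeting-notes")
print(job["TranscriptionJob"]["TranscriptionJobStatus"])
```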
Now that's the product at a high level, in this video we're just going to stick to this high level summary.
If the topic that you're studying requires any additional information there will be additional theory and if required practical videos.
But for most of the AWS certifications this is all the information that you'll require so go ahead and complete this video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this video, I want to briefly talk about Amazon Textract.
Now much like the other machine learning products which I'll be covering, in this video I'll only be covering Textract from a high level architecture perspective.
If more knowledge is required, there will be additional theory or practical videos, but don't be alarmed if this is the only one.
That just means that this is the only level of knowledge that you'll require.
Now let's jump in and get started straight away.
So Amazon Textract is another machine learning product available within AWS.
And this product is used to detect and analyze text contained within input documents.
And currently input documents mean JPEG, PNG, PDF or TIFF files.
Now these are the inputs and then the outputs is extracted text, the structure of that text and then any analysis which can be performed on that text.
So this is a product which can take in one of the supported document formats and extract any relevant information.
And as you'll see, this can include things like generic documents, identity documents or receipts or invoices.
And I'll be demoing a couple of these from within the AWS console later on in this video.
Now for most documents, the product is capable of operating in a synchronous way, so real time.
So if you're inputting a normal size document, then you can expect the results of that analysis almost immediately after you submit.
For large documents, these are processed in an asynchronous way.
So this might be a large PDF, think hundreds of pages.
And for this, you might have to submit the job and then wait for processing to occur.
Now the product is pay-per-usage, but it does offer custom pricing for large volumes.
Now in terms of the use cases of this product, at a high level, it offers the detection of text and the relationship between that text.
So you might have, for example, a receipt or invoice and it will be able to detect all of the relevant items, so prices and products, dates, as well as any interaction between those different elements.
So it might know, for example, that a particular product line has a specific price and this has a specific element of tax.
It also generates metadata.
So for example, where that text occurs.
And then for particular types of documents, it offers specific types of analysis.
So for generic documents, it might be able to identify names, addresses and birth dates, but then for receipts, it might be able to identify prices, vendors, line items and dates.
For identity documents, it can also do abstraction of certain fields.
So you might have driver's licenses, which offer a driver license ID and then passports, which offer passport IDs.
And the product is capable of assessing both of these and abstracting that into a document ID field.
So you can analyze many different types of identity documents and store all of that data in a common database schema, which has abstracted field IDs such as document ID.
Now, again, this product is capable of either being used from the console UI or from the APIs.
And so it can be integrated with any applications that you either develop or architect.
As well as this, it can also be integrated with other AWS products and services, including other machine learning products.
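Before the console walkthrough, here's a rough boto3 sketch of the kind of API calls involved. The bucket and object names are placeholders; analyze_document is shown with forms and tables analysis enabled.

```python
import boto3

textract = boto3.client("textract")

# Synchronous analysis of a document stored in S3, extracting form key/values and tables.
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "example-bucket", "Name": "invoice.png"}},  # placeholders
    FeatureTypes=["FORMS", "TABLES"],
)

# Results come back as a list of blocks: pages, lines, words, key/value sets, tables and cells.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```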
Now it's going to be far easier for you to be able to understand how this product works if I give you a brief example.
So I'm going to go ahead and switch across to my AWS console and step through a couple of examples.
So let's go ahead and do that.
Okay, so I'm at the AWS console and logged into the IAM admin user at the general AWS account.
And I have the Northern Virginia region selected.
So I'm just going to go ahead and click in the search box at the top and type Textract.
And then click to move to the Textract console.
And I'm just going to step through a very simple set of examples.
So I'm going to go ahead and click try Amazon Textract, and the console itself will give you a couple of examples of how it can be used.
So in this particular case, it's analyzed a vaccination card.
So you can see that on this card, the data is in both a structured and unstructured format.
So we have a number of the fields that are stored in a relatively structured way.
So we have last name, first name, date of birth and a patient number.
We also have a table below it, which stores the dates of the various vaccinations.
And as you might know, if you've taken a number of vaccinations, this can either be typed, handwritten or stamped.
And it's often not within the nice confines of a particular area of the table.
Now this product is able to extract all of the important elements.
You can see that as I'm hovering my mouse here, it's able to identify the particular elements of data.
So I can click on all of these individually.
And you can see that they're now selected in this results table on the right.
So it's intelligent enough to identify these individual elements, even if they're slanted or different sized or in a hard to read format.
So this is a simple vaccination record.
And we can see all of the information stored on the right.
We do have different sample documents though, if we click on this dropdown, we've got a pay stub.
This is obviously more complex with a larger amount of data.
And as you can see, it's still identified all of this information.
So it's extracted all of the key values from this table.
Now what's even cooler is if I click on tables on the right, we can see that for this specific table, it's even got the intelligence, not only to extract the data, but also to extract the actual table structure itself.
I can move through the document and see multiple tables, each containing information.
And as you can see, a lot of these are really complex in the formatting.
And yet the product has managed to extract all of the data with no problems.
We've got other types of documents.
So this, for example, is a loan application.
And again, the Textract product is capable of extracting all of this, including larger blocks of text, and enables us to browse through and deal with that data, both interactively from the console and via the APIs from within our own application.
Now the product also has some specific types of analysis that it can do.
So if I click on analyze expense, it's able to perform an extraction of data on a sample receipt.
So you can see in this example, not only is it able to extract textual data, but it can also extract vendor information from these receipts.
We can also perform the same type of process to analyze ID documents.
And in this case, we're showing a driver's license and it's able to extract this number on this driver's license to be a document number.
So we can see this document number here.
It's abstracted this driver's license number to be the document number.
And if we switch to a passport document, in this particular case, it's going to take the passport number and it will extract this also into a document number.
Now, if you're doing identity document verification, for example, if you run an online application and need to perform know your customer or anti-money laundering techniques, so you need to identify and verify ID documentation, then this ability to take specific fields of certain identity documents and abstract them away and use the same data structure is really valuable.
Now, this is everything I wanted to cover about the Textract product.
Again, this is a basic architectural level introduction.
If for the particular topic that you're studying, you require additional information, then there will be videos following this one, which go into more depth, either theory or practical.
But if you don't see any additional videos on this product, don't worry, it simply means that for whatever topic you're studying, this is all the information you require.
At this point though, that is the end of this video, so go ahead and complete the video and when you're ready, I look forward to you joining me in the next.
Welcome back and in this video, I want to talk about another really cool AWS product called Amazon Rekognition, spelled with a K.
Now let's jump in and get started because I'm actually super excited to step through this product and how it works.
Rekognition is a deep learning based image and video analysis product.
Deep learning is a subset of machine learning.
So this is to say that Rekognition can look at images or videos and intelligently do or identify things based on those images or videos.
Specifically, it can identify objects in images and videos such as cats, dogs, chickens, or even hot dogs.
It can identify specific people, text, for example, license plates, activities, what people are doing in images and videos.
It can help with content moderation.
So identifying if something is safe or not.
It can detect faces.
It can analyze faces for emotion and other visual indications.
It can compare faces, checking images and videos for identified individuals.
It can do pathing, so identify movements of people in videos.
And an example of this might be post game analysis on sports games and much, much more.
It's actually one of the coolest machine intelligence services that AWS has.
And that's saying a lot.
The product is also pay as you use, with per-image pricing or per-minute pricing for video.
And it integrates with applications via APIs and it's event-driven.
So it can be invoked say when an image is uploaded to an S3 bucket.
But one of its coolest features is that it can analyze live video by integrating with Kinesis video streams.
So this might include doing facial recognition on security camera footage for security type situations, distinguishing between the owner of a property and somebody who's attempting to commit a crime.
All in all, it's a super flexible product.
Now generally for all of the AWS exams, you will need to have a basic understanding of the architecture.
There are some AWS exams, for example, machine learning, where you might need additional understanding.
And if you're studying a course where that additional understanding is required, there will be follow-up videos.
In general though, it's only a high-level architecture understanding.
And one example architectural flow might look something like this.
So an image containing Whiskers and Woofy is uploaded to S3.
Now we've configured S3 events and so this invokes a Lambda function.
The Lambda function calls Rekognition to analyze the image.
It returns the results and then the Lambda function stores the metadata together with a link to the image into DynamoDB for further business processes.
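A minimal sketch of that Lambda function might look like this; the DynamoDB table name, key attribute and label threshold are placeholders you'd choose yourself, so it's an illustration of the flow rather than a production implementation.

```python
import boto3

rekognition = boto3.client("rekognition")
table = boto3.resource("dynamodb").Table("example-image-metadata")  # placeholder table

def handler(event, context):
    # The S3 event tells us which object was just uploaded.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Ask Rekognition to label the image (e.g. Cat, Dog, Person).
    result = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=80,
    )
    labels = [label["Name"] for label in result["Labels"]]

    # Store the labels plus a link back to the image for further business processes.
    table.put_item(Item={"image": f"s3://{bucket}/{key}", "labels": labels})
    return labels
```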
To give some context as to the other things that Rekognition can do, let's just take an entirely random selection of images from the internet.
So Rekognition can identify celebrities such as Iron Man.
It can also identify Mike Chambers, though I think as a machine learning service, it might be slightly biased.
It can identify text in images or videos such as license plates on cars or other internet memes.
It can even identify objects, animals, or people in those same memes.
For faces specifically, it can identify emotions or other attributes.
So for example, identifying that this random doctor is a male who currently has his eyes open and is looking very, very serious rather than being happy in any way.
So that's Rekognition.
If you have questions in the exam which need general analysis performing on images or videos for content, emotion, text, activities, or anything else I've mentioned in this lesson, then you should default to picking Rekognition.
It's probably going to be the correct answer.
Now with that being said, that is everything I wanted to cover in this video.
Go ahead and complete this video and when you're ready, I look forward to you joining me in the next.
Welcome back.
And in this video, I want to briefly talk about the Amazon Polly product.
Now, this is going to be another very brief video.
All you need to know for most AWS exams and to get started in the real world is the product's high level architecture.
If there are any other knowledge requirements for the topic that you're studying, there will be additional videos following this one.
If not, don't worry, this is everything that you'll need to understand.
Now, let's just jump in and get started.
So Amazon Polly at a high level converts text into life-like speech.
So in the example below, it takes "I like cats, dogs, chickens and rabbits, spiders not so much."
It takes that text and it generates a life-like voice or life-like speech.
So essentially the product takes text in a specific language and results in speech, also in that specific language.
This is really important to understand.
Polly performs no translation.
It can only take text in a given language and output speech also in that language.
Now, there are two modes that Polly operates in.
We've got standard TTS or text to speech and this uses a concatenative architecture.
It essentially takes phonemes, which are the smallest unit of sound in the English language.
For example, for the letter A, the phoneme is a.
So it takes those smallest units of sound and uses a concatenative architecture to build patterns of speech.
We've also got neural text to speech.
This takes those phonemes, it generates spectrograms, it puts these spectrograms through a vocoder and that generates the output audio.
Now, this is a much more complex way of generating speech using artificial intelligence.
It's much more computationally heavy, but what it does is result in much more human or natural sounding speech.
Now, the product is capable of outputting in different formats.
So you've got MP3, Ogg Vorbis or PCM.
And the output format that you'll choose depends on how you're intending to integrate Polly with other products.
So you might choose PCM, for example, if you're wanting to integrate with various AWS products.
It fully depends on what your architecture is.
Now, there are a few final points that I want to cover.
First is that Polly is capable of using the Speech Synthesis Markup Language.
And this is a way that you can provide additional context within the text so you can control how Polly generates speech.
So examples of this include that you might want to get Polly to emphasize various parts of sentences or pronounce things in certain ways.
You might want Polly to whisper certain components of the text or you might want to use an over-exaggerated newscaster speaking style.
These are all things that you can control with speech synthesis markup language, which is SSML.
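Here's a small sketch of what that might look like via boto3, passing a snippet of SSML that whispers part of the sentence; the voice, engine and output format are just examples, and the whispered effect shown here applies to standard voices.

```python
import boto3

polly = boto3.client("polly")

# SSML lets you control delivery; the whispered effect below works with standard voices.
ssml = (
    "<speak>"
    "I like cats, dogs, chickens and rabbits. "
    '<amazon:effect name="whispered">Spiders, not so much.</amazon:effect>'
    "</speak>"
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",       # tell Polly the input is SSML rather than plain text
    VoiceId="Joanna",      # example voice
    Engine="standard",     # standard TTS; "neural" gives more natural speech for plain text
    OutputFormat="mp3",
)

# Write the returned audio stream to a file.
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```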
Now, again, Polly is the type of product that's going to be integrated with other things.
For example, you can get a WordPress plugin which allows articles on WordPress blogs to be spoken.
Polly can be integrated with other AWS services where you need speech to be generated based on text.
Or you can integrate Polly with your own applications using the APIs.
Again, this is another product which is going to be more often integrated with other things.
Now, this isn't a product where you're going to get much benefit from seeing this in action.
It's a very niche product that you're only ever going to use in certain situations.
For most AWS exams, you just need a high level architectural overview.
So at this point, that is everything I wanted to cover in this video.
Go ahead and complete the video and when you're ready, I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to cover the high level architecture of Amazon Lex and Amazon Lex is a product which allows you to create interactive chatbots.
Now for most areas of study and for solutions architects in the real world you just need to have a basic level of understanding and that's what this video will provide.
If you need to know anything else in the course that you're studying there's going to be follow-up videos to this one.
If not don't worry this video will cover everything that you need.
Now let's jump in and get started.
Amazon Lex is a back-end service.
It's not something that you're likely to use from a user perspective.
Instead you'll use it to add capabilities to your application.
So Lex provides text or voice conversational interfaces.
For the exam remember Lex for voice or Lex for Alexa.
If you're familiar with the Amazon voice products then just know that Lex powers those voice products so it provides the conversational capability.
It's what lets the lady in the tube answer your questions.
Now Lex provides two main bits of functionality.
First automatic speech recognition or ASR which is simple speech to text.
Now I say simple but doing this well is exceptionally difficult.
If any of you have tried using Siri, which is Apple's voice assistant, notice how often it gets things wrong versus the Alexa product.
That's because it doesn't do ASR as well as Lex.
And for any lawyers listening, this is just my opinion.
Now Lex also provides natural language understanding or NLU services and this allows Lex to discover your intent and can do intent chaining.
So imagine the act of ordering a pizza.
How would you start off that conversation?
Maybe can I order a pizza please?
Or maybe I want to order a pizza.
What about a large pepperoni pizza please?
The intent the thing you want to do is order pizza.
It's Lex's job to determine that.
But what about your next sentence?
Make that an extra large please.
Well Lex needs to understand that the second statement relates to the first.
As a human it sounds easy because humans are good at this type of natural language processing.
Computers historically have not been so great.
So Lex allows you to build voice and text understanding into your applications.
It's the type of thing that you would use when you don't want to code the functionality yourself.
You integrate Lex and it does the hard work on your behalf.
Now as a service it scales well and integrates with other AWS products such as Amazon Connect.
It's quick to deploy and uses a pay-as-you-go pricing model so it only costs when using it.
It's perfect for event driven or serverless architectures.
Now in terms of the use cases which Lex can help with you might use it to build chat bots, those annoying apps on web pages which ask you if you want help or automated support chats when logging support tickets.
It can be used to build voice assistants, you ask for something and the lady in the tube delivers.
Things like QA bots or even info or enterprise productivity bots.
Any interactive bot which accepts text or voice and performs a service.
Now let's review some of the key Lex concepts.
So Lex provides bots and bots are designed to interactively converse in one or more languages.
I've mentioned the term intent previously.
This is an action that a user wants to perform.
So things like ordering a pizza, ordering a milkshake or getting a side of fries.
Now as well as the intent we also have the concept of utterances and when creating an intent you're able to provide sample utterances and utterances are ways in which an intent might be said.
So in order to order a pizza, a milkshake or some fries, you might start off by saying "can I order", "I want to order" or "give me a".
These are all different ways you can utter or provide utterances for an intent.
In addition to configuring utterances you also have to tell Lex how to fulfill the intent and often this is using lambda integration.
So if Lex understands that you do want to order a pizza it needs to have some way of initiating that pizza ordering process and often this is using lambda.
Again lambda is really great in an event driven architecture so it integrates and complements Lex really well.
Additionally you also have the concept of a slot and you can think of these as parameters for an intent.
So you might have things like the size of a pizza (small, medium or large) and what type of crust (normal or cheesy), and you can configure these to be required parameters that go along with an intent.
So these are pieces of information that Lex needs to gain from that interaction with a user. And just to reiterate, Lex is the type of product that you're not going to use directly via the console; it's something that you're going to architect or develop into your applications.
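As an illustration of that application integration, sending a line of text to a Lex V2 bot from Python might look like the sketch below. The bot ID, alias ID and locale are placeholders for values from your own bot.

```python
import boto3

lex = boto3.client("lexv2-runtime")

response = lex.recognize_text(
    botId="EXAMPLEBOTID",          # placeholder bot ID
    botAliasId="EXAMPLEALIAS",     # placeholder alias ID
    localeId="en_US",
    sessionId="user-session-001",  # ties multiple utterances into one conversation
    text="Can I order a large pepperoni pizza please?",
)

# Lex returns the intent it detected, any slot values gathered so far, and a reply.
intent = response["sessionState"].get("intent", {})
print(intent.get("name"), intent.get("slots"))
for message in response.get("messages", []):
    print(message["content"])
```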
If you want to provide interactive voice assistance via a chat or voice capable bot then you're going to use Amazon Lex.
So remember this for the exam.
With that being said that is everything I wanted to cover in this video so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this video, I'm going to quickly discuss the functionality provided by Amazon Kendra.
Now, this is something which you only need to have the highest level exposure to for most of the AWS exams.
If you need to know more than this basic level, there will be other videos diving deeper into Kendra functionality, but this video will stick to the basics.
So if it's the only video on Kendra in the course that you're taking, that's fine.
This is everything that you need to know.
So let's jump in and get started.
Now, Kendra is an intelligent search service.
So its primary aim is that it's designed to mimic interacting with a human expert.
So the idea is that you can use Kendra to search a source of data and you feel as though you're interacting with a human expert on that data subject.
So it supports a wide range of question types.
We start with simple factoid based questions, such as who, what and where.
So you can ask questions like who wrote this book, what the best type of animal is, of course it's cat, and where is this particular place.
And Kendra will attempt to surface the answer to that information and do it in such a way that it mimics the type of interaction and the quality of response that you get with a human.
Now, another type of question is a descriptive question.
And this might come in the form of how do I get my cat to stop being a jerk.
And so Kendra has to understand all the question text as well as the intent of the question.
So what it is you're trying to surface.
And then lastly, we have keyword type questions.
Now, these are much more difficult than they seem on first glance.
Imagine that you ask a question, what time is the keynote address?
Well, in this particular case, address doesn't always have the same meaning.
You can mean a postal address, or you can mean, in this context, a speech.
So a keynote address is a speech, the keynote speech of a conference.
And so Kendra has to help to determine the intent of the question being asked.
And that's why it's a very difficult problem to solve and why Kendra adds significant value by delivering this as a service.
Now Kendra is another backend style service.
You're going to use this product to provide functionality to any applications or systems that you design or implement.
So generally you will provide search capability within an application and use Kendra as a backend.
Now let's go through the key concepts of Kendra.
We start with an index, which is a searchable block of data organized in an efficient way.
So the index is the thing that Kendra searches when dealing with user queries.
We have a data source and this is the original location of where your data lives.
So Kendra connects and indexes from this location.
Now examples of this might be an S3 bucket, Confluence, Google Workspace, RDS, OneDrive, the Kendra web crawler, WorkDocs, FSx, and many other data sources.
Now there are too many to list on this screen but I will attach a link to this video which gives an exhaustive list.
Now the idea is that you configure Kendra to synchronize a data source with an index based on a schedule.
And this keeps the index current with all of the data that's on the original data source.
Now in terms of the data that's being indexed, you have documents and these are either going to be structured or unstructured.
So structured might include things like frequently asked questions and unstructured data might be things like HTML files, PDFs and text documents.
But again, there are many more and there isn't space to have them on screen at the same time.
So I'll include a link attached to this video which gives a full overview of all of the different types of documents that Kendra can index.
Now Kendra as a product integrates with lots of different AWS services such as IAM for security and the IAM Identity Center previously known as AWS SSO for various single sign-on services.
And again, just to reiterate, Kendra is a backend product.
It's something that provides functionality to something else.
And so you're going to be interacting with Kendra using APIs which are surfaced through your applications.
It's not something that you'll generally interact with as an end user through the AWS console beyond the initial setup process.
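To give a feel for what that API integration looks like, here's a minimal boto3 sketch which runs a natural language query against an existing index. The index ID is a placeholder.

```python
import boto3

kendra = boto3.client("kendra")

response = kendra.query(
    IndexId="11111111-2222-3333-4444-555555555555",  # placeholder index ID
    QueryText="What time is the keynote address?",
)

# Results can be direct answers, question-and-answer matches or matching documents.
for item in response["ResultItems"]:
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))
    print(item.get("DocumentExcerpt", {}).get("Text"))
```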
Now that's everything that you need to understand for most AWS exams and to get started with it in the real world as a solutions architect.
So at this point, go ahead and complete the video and when you're ready, I'll look forward to you joining me in the next.
Welcome back and in this video I want to very briefly talk about Amazon Comprehend and this is a natural language processing service available within AWS.
In short you input a document and it develops insights by recognizing the entities, keyphrases, language sentiments and other common elements of that document.
Now this video is going to cover the basics.
If you need to have a greater level of understanding for the thing that you're studying there will be follow-up videos.
But in this video we're going to cover the important high level elements so let's jump in and get started.
So I mentioned moments ago that Amazon Comprehend is a natural language processing service: you input a document, which conceptually you can think of as text, and then the output is any entities, phrases, language, personally identifiable information and other important elements of that document, all of which are surfaced by the Comprehend service, and you get access to everything which has been identified.
Now the Comprehend service is a machine learning service and it's based on either pre-trained or custom models so you've got the option of using either of these.
Now the product is capable of doing real-time analysis for small workloads or asynchronous jobs for larger workloads and the service can be used from the console or command line for interactive use or you can use the APIs to build Comprehend into your applications.
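As a sketch of that API usage, the calls below run entity, PII and sentiment detection over a small piece of text; the text itself is just an example.

```python
import boto3

comprehend = boto3.client("comprehend")

text = "My name is John and my credit card number is 0000 0000 0000 0000."  # example input

# Named entities (people, organizations, dates and so on) with confidence scores.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))

# Personally identifiable information types found in the text.
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
print([e["Type"] for e in pii["Entities"]])

# Overall sentiment plus a score for each sentiment class.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])
```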
Now it's going to be far better for you to gain an understanding of this product if you can see it operate visually.
So I'm going to go ahead and switch across to my console and just give a brief walkthrough of exactly what the product can do.
Now you can either watch me do this walkthrough or you can follow along within your own environment but I'm going to go ahead now and switch across to my AWS console.
Okay so I'm currently logged into my AWS console I'm logged in as the IAM admin user of the general AWS account and as always I've got the Northern Virginia region selected.
Now once I'm here I'm going to click in the search box at the top and just go ahead and type Comprehend and then click on Amazon Comprehend.
When you first move to the Comprehend service you'll see something like this you've got the hamburger menu on the top left where you can access all of the detailed areas of the console but just to illustrate this I'm going to go ahead and click on launch Amazon Comprehend.
Now when you first launch the product it does give you some sample input text so that you can see exactly how the product works and we're going to start with this sample text.
So if I just scroll down slightly under input text you'll be able to see the analysis type is built in and this allows you to have real-time insights based on the AWS built-in models so this is one of the machine learning models which I discussed in the theory part of this lesson.
We've also got custom and this allows you to create custom models which fit your data.
In this case though we're going to use the built-in analysis type and if I just scroll down this is sample input text it essentially describes an individual, a company, it includes some credit card information as well as other personally identifiable information.
So what I'm going to do is go ahead and click on analyze to analyze this sample input text.
So examples of what we get are entities: it has searched the input text and identified anything which it classifies as an entity. Examples of this are persons, so the person entity, and it identifies a person here which is the recipient of this email, as well as the sender.
So John is identified as a person and it has a 0.99 plus confidence.
Now this is expressed as a value from 0, which is 0%, through to 1, which is 100%, and the confidence level shows how confident Comprehend is of its identification.
In this case it's over 99% sure that John is a person.
Next we have AnyCompany Financial Services, and this has been identified as an organization, again with a high level of confidence.
We have credit card information, and this has been identified as an entity of type other.
We have a location, so an address, and then another entity of type other in the form of an email address, again with 99% plus confidence.
Now what we can also do is click on key phrases and have it filter based on all of the key phrases within this text.
We can click on language and the product is capable of identifying any languages used in the text in this case English with again a 99% confidence.
We can click on PII and see any personally identifiable information.
So we have names, we have credit card numbers, date and times, bank account numbers, bank routing, address, phone numbers and emails and we can even click on sentiment to identify the overall level of sentiment in this text and for this it's identified the majority as being neutral.
Now what I could do is replace this text so I'm going to do that.
So I'm going to click in the box and delete it and I'm going to replace it with this.
So: "My name is Adrian and I'm 1,337 years old, my favorite animals are cats and I own 500 of them, and my least favorite are spiders."
I'm going to go ahead and analyze this piece of text.
This time what I want to focus on is the sentiment analysis so it still identifies all of the keywords, the quantities in the form of my age and the number of cats that I own.
It still identifies the language used, which is English, and if I click on sentiment we'll see that there is some neutral, positive and negative, but overall it's a mixed sentiment, and that's because I specified that my favorite animals are cats and my least favorite are spiders.
So there's a mixed level of sentiment in this text.
Let's go ahead and delete the spiders part, so we just have my name, my age, my favorite animals and how many of them I own, and then analyze this again and go to sentiment.
Now we'll see that it's changed.
We have a mixture of neutral sentiment and positive sentiment so you can see how sentiment, key phrases and any personally identifiable information can be surfaced from an input document and that's what the Comprehend service does.
Now again this can be used from the console, the CLI and the APIs, so while I'm demonstrating a very simple interaction with this product, you can use the APIs and integrate it with your applications or other parts of your systems. So this is an important product to understand from a solutions architect, developer and operations perspective.
Now at this point that is everything I wanted to cover in this video I just wanted to give you a really high-level overview of how the Comprehend product works.
At this point though go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to talk in a little bit more detail about the backups and resilience options we have with Redshift.
In the last lesson you learned how Redshift functions within a single availability zone and so it's at risk from the failure of that availability zone.
Based on this as a solutions architect you need to ensure that you design your systems with this in mind.
So let's step through the options that we have and I'll try to keep this nice and brief because it's really a continuation of what I covered in the previous lesson.
Now let's say that we have a Redshift cluster in US East 1.
It's simplified a bit and so it only has two availability zones, Availability Zone A and Availability Zone B.
And as we know now Redshift runs from only one of them.
In this example Availability Zone B.
Now this means that if we have a failure in Availability Zone B we have problems.
The entire cluster will fail and any data it's managing will be at risk.
Redshift provides a number of useful recovery features.
First it can utilize S3 for backups in the form of snapshots.
Now there are two types of backups supported by the system.
Automatic backups, which occur roughly once every eight hours or after every 5 GB of data change, and these occur automatically into S3.
They have a one day retention period by default, configurable up to 35 days, and you get backup capacity totaling the capacity of your cluster for free, included in the price of the cluster.
Now the snapshots are incremental so only the data changed is stored and charged for.
There are also manual snapshots, which are performed explicitly by a person, a script or a management application, and these have no retention period; they last until they're removed by a manual process.
For both types of backups because they're stored on S3 you immediately benefit from the resilience profile of S3 meaning that data is replicated between three or more Availability Zones in that same region.
So while Redshift isn't resilient across Availability Zones in a region the data managed by Redshift can be.
Restoring from snapshots creates a brand new cluster and you can choose a working Availability Zone for that cluster to be provisioned into.
Now if you have a major problem in the region impacting multiple or all Availability Zones in that region then Redshift offers further resilience still.
You can configure snapshots to be copied to another AWS region.
For example, you might choose to copy snapshots to ap-southeast-2, the land of kangaroos, Australia.
This means your data would be safe even from the failure of the entire original region and a new cluster could be provisioned in this new region quickly in the event of a disaster.
Copied snapshots can be set with an independent retention period and this can help minimize costs.
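As a quick sketch of how you might drive this from the Python SDK, the calls below take a manual snapshot and enable cross-region snapshot copy; the cluster identifier, destination region and retention values are placeholders.

```python
import boto3

redshift = boto3.client("redshift")

# Take a manual snapshot; it's retained until deleted unless a retention period is set.
redshift.create_cluster_snapshot(
    ClusterIdentifier="example-cluster",          # placeholder cluster
    SnapshotIdentifier="example-manual-snap-001",
    ManualSnapshotRetentionPeriod=7,              # days; omit for indefinite retention
)

# Copy snapshots to another region for regional resilience.
redshift.enable_snapshot_copy(
    ClusterIdentifier="example-cluster",
    DestinationRegion="ap-southeast-2",  # the land of kangaroos
    RetentionPeriod=3,                   # independent retention for the copied snapshots
)
```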
Now that's pretty much all I wanted to talk about with regards to Redshift backups.
There are questions in the exam about Redshift high availability and resilience options, so I think it's important to cover that and how you can avoid the single AZ risks by using backups effectively, and that's what we've covered in this lesson.
With that being said, thanks for watching, go ahead and complete the lesson and then when you're ready you can move on to the next.
Welcome back.
In this lesson, I want to briefly talk about Amazon Redshift.
Redshift is a complicated product and there's no way that I can talk about it all in this lesson.
So I'll be focusing on the things which really matter for the exam from a solutions architect perspective.
Now we have a lot to cover, so let's jump in and get started.
Redshift is a petabyte scale data warehouse.
A data warehouse is a location into which many different operational databases from across your business can pump data for long-term analysis and trending.
It's designed for reporting and analytics, not for operational style usage.
And I'll explain what that means in a second.
Now it's petabyte scale because it's been designed from the ground up to support huge volumes of data.
Now Redshift is an OLAP database rather than OLTP, which is what RDS is.
The difference is really important to understand for the exam.
Online transaction processing or OLTP captures, stores and processes data from transactions in real time.
So this is the type of database used when say adding orders to an online store or a database of the best cat pictures in the world.
It's designed as the name suggests for transactions and this means inserts, modifies and deletes.
Online analytical processing or OLAP is designed for complex queries to analyze aggregated historical data from OLTP systems.
So other operational or OLTP systems put their data into OLAP systems.
So RDS might put its data into Redshift for more detailed long-term analysis and trending.
Now Redshift stores its data in columns.
Imagine a database of all the best cats in the world.
Every row in the database represents one cat.
It stores the microchip ID, the name, the age, the color, its favorite food and so on.
In a row based or OLTP database, data is stored in rows because you always interact with specific records, specific cats.
So updating their ages or editing other attributes.
Now this means that if you wanted to query the average age of all the cats in the database, you'd need to read through every row looking for just one field.
With column based databases, data is stored in columns.
So all the names, all the ages and so on.
It makes reporting style queries much easier and more efficient to process.
Now Redshift is one such database.
It's a column based database and Redshift is delivered as a service just like RDS.
So it's pretty quick to provision, and you can actually provision it, load data, use it for something and then tear it down when you've finished.
Generally data is loaded into Redshift before being worked on, but the product includes some really advanced functionality.
Two in particular are really cool.
First is Redshift Spectrum, which allows for querying of data on S3 without loading it into Redshift in advance.
And this is great for larger datasets.
You still do need a Redshift cluster, but it means that instead of going through the time consuming exercise of loading data into the platform, you can use Redshift Spectrum to query data on S3.
There's also Federated Query, which is kind of like Federated Identity, but instead of Identities, it allows you to directly query data that's stored in remote data sources.
So essentially you can integrate Redshift with other databases, foreign or remote databases and query their data directly.
Now Redshift integrates with other AWS tooling such as QuickSight for visualization and it has a SQL like interface for data access.
It allows you to connect using JDBC and ODBC standards.
So if your database app supports either, then it can connect natively to Redshift.
Now let's go through some key architectural points before we look visually at how Redshift fits together.
First, Redshift is a provisioned product.
It uses servers, so it's not a serverless product like say Athena.
It's also not something that you would really use on an ad hoc basis like Athena.
It's much quicker to provision than an on-premises data warehouse that you have to create yourself, but it does come with a provisioning time.
It's not something that should be used for ad hoc queries of large scale datasets on S3.
That's much more aligned to the functionality which Athena provides.
So for ad hoc queries, look towards Athena as a default rather than Redshift.
Now Redshift uses a cluster architecture and a really important architectural principle to understand is that the cluster is actually a private network.
You can't access most of the cluster directly.
Redshift runs with multiple nodes and high speed networking between those nodes.
And because of this, logically it runs in one availability zone.
So it's not highly available by design.
It's tied to a specific availability zone.
All Redshift clusters have a leader node and it's the leader node that you interact with.
The leader node manages communications with client programs and all communications with the compute nodes.
It develops execution plans to carry out complex queries.
So specifically about the compute nodes, these run the queries which were assigned by the leader node and they store the data loaded into the system.
A compute node is partitioned into slices.
Each slice is allocated a portion of the node's memory and disk space where it processes a portion of the workload assigned to that node.
The leader node manages distributing data to the slices and apportions the workload for any queries or other operations onto the slices.
The slices then work in parallel to complete the operation.
Now a node might have two, four, sixteen or thirty-two slices.
It depends on the resource capacity of that node.
Redshift is a VPC service and so all parts of the system can be managed as you would expect.
So this includes VPC security controls, IAM for permissions, KMS for at rest encryption of the data and CloudWatch can also be used for monitoring of the platform.
Now because of the way Redshift is architected, it has a feature called Enhanced VPC Routing.
By default Redshift uses public routes for traffic when communicating with external services or any AWS services such as S3 when it's loading in data.
Now if you enable Enhanced VPC Routing then traffic is routed based on your VPC network and configuration.
This means it can be controlled by security groups, network access control lists, it can use custom DNS and it will require the use of any VPC gateways that any other type of traffic requires.
So internet gateways, NAT gateways or any other VPC endpoints to reach AWS and external services.
So if you want advanced networking control then you need to enable Enhanced VPC Routing.
Now that's a good one to remember for the exam.
If you do have any customized networking requirements then you do need to enable Enhanced VPC Routing.
I just wanted to stress that again because it really is a critical point to remember for the exam.
Now let's look at the architecture of Redshift visually before we finish with this lesson.
So we start with a VPC and in there is a subnet in one availability zone.
And let's say inside that we create a Redshift cluster.
Now in the cluster there's always a leader node and it's to this leader node that anything outside of the cluster connects to in order to interact with the cluster.
So things like management applications, workbenches or visualization tools.
All of this interacts with the Redshift cluster via the leader node using ODBC or JDBC style connections.
Inside the cluster are the compute nodes and the slices on those nodes as well as the storage attached to each of the slices in the compute nodes.
Because Redshift is located in one availability zone there are a few ways that Redshift attempts to secure your data.
First data is replicated to one additional node when written.
That way the system can cope with localized hardware failure.
Additionally automatic backups happen to S3 by default around every 8 hours or every 5 GB of data written to the cluster.
This happens automatically and has a configurable retention period.
Now these backups occur to S3, so at this point you have data stored across availability zones.
The data within the Redshift cluster is therefore at least resilient against the failure of an availability zone.
Additionally you can create manual snapshots also stored on S3 but you have to manage the retention yourself as an administrator.
You can also configure snapshots to be automatically copied across AWS regions which provides you with global resilience and an effective way to spin up a Redshift cluster in another region if you have a DR event.
In terms of getting data into Redshift you have a few options.
You could load that data from S3 or you could copy data in from products such as DynamoDB.
You can migrate data into Redshift from other data sources using the Database Migration Service, and AWS products such as Kinesis Data Firehose can even stream data into Redshift.
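To make the S3 loading option concrete, here is a minimal sketch using a standard Postgres-protocol driver (psycopg2) to run a COPY statement against the leader node; the connection details, bucket path and IAM role ARN are all hypothetical placeholders, and any JDBC/ODBC-capable client would work just as well.

```python
import psycopg2  # Redshift speaks the Postgres wire protocol, so this standard driver can connect

# All connection details below are placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="awsuser",
    password="REPLACE_ME",
)

copy_sql = """
    COPY sales
    FROM 's3://my-example-bucket/sales/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    FORMAT AS PARQUET;
"""

# The leader node plans the load and the compute node slices ingest the S3 objects in parallel.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```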
Now for the exam the really important bits to understand are which products can be integrated, how Redshift fits into architectures, so what it can be used for and how to design a Redshift implementation.
So I've covered most of the important architectural points that you'll need for the exam within this lesson.
Now in the next lesson I want to talk in a little bit more detail about Redshift backups.
It won't be a long lesson but I think it will be worthwhile doing because it will have benefits for the exam.
Now that's everything I wanted to cover in this lesson.
Go ahead and complete the lesson and then when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to cover the ElastiCache product.
Now this is one which features relatively often in all of the associate AWS exams and fairly often at a professional level.
It's a product that you'll need to understand if you're delivering high performance applications.
It's one of a small number of products which allows your applications to scale to truly high-end levels of performance.
So let's jump in and take a look.
So what is ElastiCache?
Well at a high level it's an in-memory database for applications which have high-end performance requirements.
If you think about RDS, that's a managed product which delivers database servers as a service.
Databases generally store data persistently on disk.
Because of this they have a certain level of performance.
No matter how fast the disk is, it's always going to be subject to performance limits.
An in-memory cache holds data in memory which is orders of magnitude faster, both in terms of throughput and latency.
But it's not persistent and so it's used for temporary data.
ElastiCache provides two different engines, Redis and Memcached, and both of them are delivered as a service.
Now in terms of what you'd use the product for, if you need to cache data for workloads which are read heavy, so read heavy being the key term that you need to remember at this point, or if you have low latency requirements, then using ElastiCache is a viable option.
For read heavy workloads, ElastiCache can reduce the load on a database.
And this is important because databases aren't the best at scaling, especially relational databases.
Now databases are also expensive relative to the data that they store and the performance that they deliver.
So for heavy reads, offloading these to a cache can really help reduce costs.
So it's cost effective.
Remember that for the exam.
ElastiCache can also be used as a place to store session data for users of your application, which can help to make your application servers stateless.
Now this is used in most highly available and elastic environments, so those that use load balancers and auto scaling groups.
But for any systems which need to be fault tolerant, where users can't notice if components fail, then generally everything needs to be stateless, and so ElastiCache can help with this type of architecture.
Now one really important thing to understand for the exam is that using ElastiCache means that you need to make application changes.
It's not something that you can just use.
Your application needs to understand a caching architecture.
Your application needs to know to use a cache to check for data.
If data isn't in the cache, then it needs to check the underlying database.
And applications need to be able to write data and understand cache invalidation.
This functionality doesn't come for free, and so if you're answering any exam questions which state no application changes, then ElastiCache probably won't be a suitable solution.
So let's have a look visually at how some of these architectures work.
Architecturally let's say that you have an application.
Obviously the Categorum application.
And this application is being accessed by a customer.
In this case Bob.
The application uses Aurora as its back-end database engine and it's been adjusted to use ElastiCache.
Now the first time that Bob queries the application, the Categorum application will check the cache for any data.
It won't be there though because it's the first time it's been accessed and so this will be a cache miss.
Which means the application will need to go to the database for the data which is slower and more expensive.
Now when it's accessed this data for the first time, the application will write the data it's just queried the database for into the cache.
If Bob queries the same data again, then it will be retrieved directly from ElastiCache and no database reads are required.
And this is known as a cache hit.
It will be faster and cheaper because the database won't be used for the query.
Now with this small scale interaction, it's hard to see the immediate architectural benefit of using ElastiCache.
But what if there are more users?
What if instead of one Bob, we have many Bobs?
Assuming the patterns of data access are the same or similar, then we'll have a lot more cache hits and a much smaller increase in the number of database reads.
This will allow us to scale our application and accept more customers.
If the data access patterns of our user base are similar at scale, then most of the increased load placed on the application will go directly onto ElastiCache.
And we won't have a proportional increase in the number of direct accesses against our database.
And this will allow us to scale the architecture in a much more cost effective way than if everything used direct database access.
So we can scale the application in a much more cost effective way.
We can deliver much higher levels of read workload and we can offer performance improvements at scale.
So this is a caching architecture and this is a very typical architecture that ElastiCache will be used for.
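To make that read flow concrete, here is a minimal cache-aside (lazy loading) sketch in Python using the redis-py client against a Redis-compatible ElastiCache endpoint; the endpoint, key naming and the fetch_from_database helper are hypothetical placeholders rather than anything specific to the example application.

```python
import json
import redis

# Hypothetical ElastiCache for Redis endpoint.
cache = redis.Redis(host="categorum-cache.abc123.euw1.cache.amazonaws.com", port=6379)

def fetch_from_database(item_id):
    """Placeholder for the slower, more expensive database query."""
    return {"id": item_id, "name": "limited edition cat print"}

def get_item(item_id, ttl_seconds=300):
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:               # cache hit: fast and cheap
        return json.loads(cached)
    item = fetch_from_database(item_id)  # cache miss: go to the database
    cache.setex(key, ttl_seconds, json.dumps(item))  # populate the cache for later reads
    return item
```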
Let's take a look at another and this is using the product to help us with session state data for our users.
So let's say again that we're looking at our Categorum application, but now it's running within an auto scaling group with three EC2 instances and a load balancer.
And it's using Aurora for the persistent data layer.
Now again, we have a user of our application, so Bob.
And this application I'm demoing in this part of the lesson is actually the fault tolerant extreme edition of Categorum.
So even when components of the system fail, the application can continue operating without disrupting our user Bob.
Now the way that it does this is to use ElastiCache to store session data.
This means that when Bob first connects to any of the application instances, his session data is written by that instance into Elastocache.
And it's kept updated if Bob purchases any limited edition cat prints.
So the first time Bob connects to any of the instances, that instance writes the session data and keeps the session data updated for Bob using ElastiCache.
Now if our application at any point needs to deal with the failure of an instance, where previously the session data would be lost and the application functionality disrupted, the Categorum extreme edition can tolerate this.
If this occurs with Categorum extreme edition, then Bob's connection is moved to another instance by the load balancer.
And Bob's experience continues uninterrupted because the session data is loaded by the new instance from ElastiCache.
Now this is another common use case for ElastiCache, storing user session data externally to application instances, allowing the application to be built in a stateless way, which in turn allows it to go beyond simple high availability and move towards being a fault tolerant application.
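As a minimal sketch of that session pattern, again using redis-py with hypothetical endpoint and key names, any instance behind the load balancer can write or read the same session state.

```python
import json
import redis

# Hypothetical session store endpoint.
cache = redis.Redis(host="categorum-sessions.abc123.euw1.cache.amazonaws.com", port=6379)

def save_session(session_id, data, ttl_seconds=1800):
    # Any application instance can write or refresh the session externally to itself.
    cache.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id):
    # If the original instance fails, the replacement instance loads the same state.
    raw = cache.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```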
ElastiCache commonly helps with read heavy performance improvements and cost reductions, or with session state storage for users.
But what's also important for the exam is that ElastiCache actually provides access to two different engines, Redis and Memcached.
And it's important that you understand the differences between these two engines at a high level.
So let's look at those differences next.
So the differences between Memcached and Redis.
Both engines offer sub-millisecond access to data.
They both support a range of programming languages.
So no matter what your application uses, you can use both of these engines.
But where we start to diverge is on the data structures that each of the products support.
So Memcached supports simple data structures only.
So strings, whereas Redis supports much more advanced types of data.
So Redis can support lists, sets, sorted sets, hashes, bit arrays, and many more.
So an example could be that an application could use Redis to store data related to a game leaderboard.
And this keeps a list of players sorted by their rank in that game.
So as well as storing the actual data, Redis can help you by storing the order of this data.
And this can significantly improve the performance of your applications.
Now, another difference is that Redis supports replication of data across multiple availability zones.
So it's highly available by design.
And that can be used to scale reads by using those replicas.
Memcached doesn't support replication in that way.
You can create multiple nodes which can be used to manually shard your data.
So storing maybe certain usernames in one node and others in another.
But Redis is the one that supports true replication across instances for scalability reasons.
So in the exam, if you face any questions which ask about multi-availability zone or other forms of high availability or resilience, then you should look at selecting Redis as a possible answer.
Now, additionally, from a recoverability perspective, Redis also supports backup and restores, which means that a cache can be restored to a previous state after a failure.
Memcached doesn't support that.
It doesn't support persistence.
And so if an exam question is asking about recovery of a cache without any impact to the data that's stored in that cache, then you definitely should look at using Redis rather than Memcached.
Now, Memcached does have an advantage in that it's multi-threaded by design.
And so it can take better advantage of multi-core CPUs.
It can offer significantly more in terms of performance when using multi-core CPUs.
A notable Redis only feature is transactions.
And this is where you treat multiple operations as one.
So either all of the operations work or none of them do.
And this can be useful if your application has more strict consistency requirements.
This is one situation where you would select to use Redis versus Memcached.
Now, both of these engine types can use a range of instance types and instance sizes.
And I've included a link in the lesson description, which gives you an overview of all of the different resources that can be provided to both of these different caching engines.
Now, you don't need to know in detail for the exam, but just be aware architecturally that instance types and sizes which offer larger amounts of memory or faster memory types are obviously going to give you an advantage when it comes to running ElastiCache.
Now, for the exam, you don't need to be aware of all of the different detail.
Just be aware of the types of architectures which would benefit from an in-memory cache.
So anything that has read heavy workloads, where you need to reduce the cost of accessing databases, where you need sub-millisecond access to data, or where you need to store user session state data externally, rather than on individual EC2 instances.
So all of those are architectural scenarios where you might want to look at using an in-memory cache.
Now, be aware, it doesn't come for free.
You do need to make application changes.
And so this isn't the type of solution that you can implement if one of your requirements is that you can't make any code changes to your application.
With that being said, though, that's everything that I wanted to cover from a theory perspective in this lesson.
Go ahead, complete the lesson, and when you're ready, I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about Amazon Athena.
This product is one of those hidden gems available within AWS which are really valuable as long as you understand the features that it provides.
So let's quickly jump in and explore the architecture.
So what is Athena?
Well, it's a serverless interactive querying service.
Put simply, it means that you can take data stored within S3 and perform ad hoc queries on that data, paying for only the amount of data consumed while running the query and the storage used within S3 to store the original data.
It has no base monthly cost, no per minute or per hour charges, you just pay for the data consumed.
Now what's really special about Athena is how it handles the structured, semi-structured and even unstructured data that it uses.
Athena uses a process called schema on read.
And the way that I want you to imagine this is like a window or a lens through which you see the data in a certain way but where the original data is unchanged.
Your original data stored on S3 is never changed.
It remains on S3 in its original form.
The schema which you define in advance modifies data in flight as it's read through the schema.
So it translates the original unmodified source data into a table-like structure as it's read through the schema.
As you query the data, the original data is read, left unmodified and the translation only happens within the product during the querying process.
I can't stress this enough, the original data is maintained in its unmodified state within S3.
Normally with databases you create tables and you have to load data into those tables.
The data needs to be in the format of the tables, or you need to perform ETL processes, which stands for extract, transform and load.
With Athena this isn't required.
You define how you want the data to look in the form of a schema and in a non-modifying way data is loaded through this on the fly.
And then any output can be sent to other AWS services.
Now let's look at this visually because it's going to be easier to understand if you see the architecture.
So Athena starts with the source data which is stored on S3 and conceptually this is read only.
It's never modified.
Now the product supports a wide range of data formats and this is growing all of the time.
Some examples include XML, JSON, comma and tab separated values, Avro, Parquet, ORC and even custom application log formats such as Apache, as well as AWS services such as CloudTrail, VPC Flow Logs and more.
So this data on S3 is fixed.
It doesn't get changed, and that's probably one of the service's most fundamental concepts that you need to understand.
And it's why I've repeated it probably 10 times already in this lesson.
So inside the product you create a schema and in this schema you're essentially defining tables.
These tables define how to get from the format of the original source data to a table like structure.
So unlike a traditional database where a table is the final structure, in Athena you're defining a way to take the original data and present it in a way that you want which allows you to run queries against these tables.
It's almost like a recipe.
You're defining how to convert from ingredients to a final meal.
It's a method to get from the source data to the structure that you want to be able to query.
So these tables within Athena don't actually contain data like a traditional database product.
They contain information, directives on how to convert the source data so that it can be queried.
So this schema is used at the time of querying when data is read and this is why it's called schema on read.
The data is conceptually streamed through the schema while being queried so it can be queried in a relational style way using normal SQL like queries.
And the output can be displayed on the console, saved or output to other AWS tools.
And all the time for this whole process there's no base or constant cost.
You just pay for the amount of data consumed by the query and you can even optimize the original data set to reduce the amount of data that has to be used for individual queries.
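As a rough sketch of how this looks with boto3, the example below defines a schema-on-read table and then runs a query against it; the database, table, columns and bucket locations are all hypothetical placeholders, the database is assumed to already exist, and in practice you'd poll the DDL execution for completion before running the follow-up query.

```python
import boto3

athena = boto3.client("athena")

# Schema on read: this DDL only describes how to interpret objects already in S3.
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.access_logs (
    request_time string,
    client_ip    string,
    status       int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION 's3://my-example-logs-bucket/access/'
"""

QUERY = "SELECT status, count(*) AS hits FROM weblogs.access_logs GROUP BY status"

def run(sql):
    # Results are written to S3; you pay only for the data the query scans.
    return athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "weblogs"},
        ResultConfiguration={"OutputLocation": "s3://my-example-athena-results/"},
    )["QueryExecutionId"]

ddl_id = run(DDL)      # the source objects in S3 are never modified
query_id = run(QUERY)  # poll get_query_execution / get_query_results with this id for output
```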
The key thing to understand about Athena going into the exam is that it has no infrastructure.
You don't need to think about setting up any database infrastructure in advance.
You don't need to think about data manipulation in advance and you don't need to load data in advance.
Keep those things in mind when going into the exam.
They will help inform you when Athena is the right choice and when it's not.
So Athena is great in situations where loading or transforming of data isn't desired.
Where you have data already on S3 in a source or raw format and you need to query it without doing any loading or transformation.
In the demo lesson which is coming up next you'll see an example of using a large data set, the open street map data.
If you needed to load and transform that prior to use it would massively reduce its utility.
A benefit of Athena is that you don't need to do any loading or transformation of data in advance.
And this makes it ideal for ad hoc or occasional queries of data in S3.
Why?
Well because you don't need any servers running in advance and you don't need to think in advance about loading or transformation.
You have a business need and immediately run a query.
Athena is also useful if you're a cost-conscious business.
It's great because it's serverless.
You pay for any data read as part of a query.
There are no base costs and no upfront costs.
Again, think ad hoc, sporadic and cost effective.
Athena is also the preferred solution, especially in the exam, for any queries which involve AWS service logs, because it has native support for VPC Flow Logs, CloudTrail logs, Elastic Load Balancer logs, cost reports and much more.
And it can also query data from the Glue Data Catalog and supports web server logs.
And again, these are keywords to look for in the exam.
A newer feature of Athena is called Athena Federated Query.
Now be really careful with this one because I don't want you being confused.
For most situations, if you see SQL mentioned, or NoSQL mentioned, or any specific database product, then the answer to that question is likely not to be Athena.
But Athena now has the capability to query non-S3 data sources.
Athena uses data source connectors that run on AWS Lambda to perform federated queries.
So a data source connector is basically a piece of code that can translate between a target data source which isn't S3 and Athena.
So you can think of a connector as almost like an extension to Athena's querying engine.
So you've got pre-built connectors which exist for data sources like CloudWatch Logs, DynamoDB, DocumentDB, Amazon RDS and even other JDBC-compliant relational data sources such as MySQL, Postgres and many more.
So Athena Federated Query really is a feature which is going to massively improve the utility of the product.
Now that's all of the theory that you need to be aware of for the product as well as some of the key use cases that you might see in the exam.
So at this point go ahead, finish this lesson and then when you're ready I look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about DynamoDB TTL which stands for Time to Live.
It's a pretty simple concept to understand fully but really powerful and it's one that you'll need to understand for the exam and in the real world.
So let's jump in and I'll try to make this one as brief as possible.
Let's say within DynamoDB that we have this table.
I'm only showing three items but let's assume that the table has 3 million.
It has a partition key in blue on the left and a sort key in green next to that.
And in addition to that it has three attributes, yellow in the middle of the table, then red and then pink.
Now for that table there are going to be many partitions which support the operations on that table and here I'm just showing two.
One partition which stores any items which use the dark blue partition key value and another one which stores currently the two items with a light blue partition key value.
Now the TTL feature lets you define a time stamp which allows items in a table to be automatically deleted at a certain point in time.
You're defining when an item is no longer required so per item you specify a date and a time and at that point the item is marked as expired and then deleted.
Now let's step through how this works.
First, on a table, you need to enable TTL and pick an attribute to be used for the TTL process.
The attribute should then contain a number.
It's a value in seconds, the seconds since the epoch which is the first of January 1970.
So if you want an item to expire in one week's time you need to find out how many seconds since that date one week from now is and then put that into the attribute that you pick for TTL.
And you should do that on all items which you want to be affected by the TTL process.
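As a minimal boto3 sketch of that, the table name and attribute names below are hypothetical placeholders; the important parts are enabling TTL with the chosen attribute and writing the expiry as seconds since the epoch.

```python
import time
import boto3

TABLE = "SensorData"          # hypothetical table name
TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute chosen for TTL

client = boto3.client("dynamodb")

# Enable TTL on the table, naming the attribute that holds the expiry timestamp.
client.update_time_to_live(
    TableName=TABLE,
    TimeToLiveSpecification={"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
)

# Write an item that should expire roughly one week from now.
one_week_from_now = int(time.time()) + 7 * 24 * 60 * 60  # seconds since the epoch
client.put_item(
    TableName=TABLE,
    Item={
        "sensor_id": {"S": "station-42"},
        "reading_date": {"S": "2024-01-01"},
        TTL_ATTRIBUTE: {"N": str(one_week_from_now)},  # numbers are passed as strings in the low-level API
    },
)
```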
So when you configure TTL on a table, what it actually does is configure automated processes which run on every partition of that table.
The first process periodically scans all items in a partition comparing the value in the TTL attribute with the current date and time in epoch format.
In effect it's checking if the item should still be viewed as valid.
Where any value in the TTL attribute is less than the current date and time so when the item is no longer valid that item is set to expired.
They aren't deleted yet, they can still be queried and viewed in the table, they're just expired.
Next, another automated process, which runs periodically and is independent of the initial one, also runs on all of these partitions.
This time it's looking for any items which are set as expired and when it finds any items these items are deleted from the partitions meaning they're removed from the table.
They're also removed from any indexes and a delete is also added to any streams configured on the table.
Now these delete operations are system events; they run in the background, don't cause any performance issues and don't incur a charge.
You can also configure a stream on the TTL process itself, and this creates a 24 hour window of any deletions which are caused by that TTL process.
And this is useful if you want to have any housekeeping processes where you track the TTL events which occur on tables.
So potentially things like an undelete function or some kind of table auditing.
So this is key to understand: delete events are placed into a normal table stream along with any creates or modifies, but in addition you can create a dedicated stream which is linked to the TTL process.
So you can get a 24 hour rolling window of just those deletes.
So TTL in summary allows you to define a per item time stamp which determines when an item is no longer needed.
It's useful if you store items which lose relevance after a specific time.
For example removing user or sensor data after one year of inactivity in an application or retaining sensitive data for a certain amount of time.
Maybe you have regulatory or contractual obligations which mean you need to retain data for say one year three years or five years.
Then you can configure those within a TTL attribute and DynamoDB will automatically remove items once they're no longer relevant.
Now that's everything I wanted to cover in this lesson so go ahead and complete the lesson and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about the DynamoDB Accelerator, known as DAX. DAX is an in-memory cache for DynamoDB which substantially improves performance, but unlike other caches it's directly integrated with the product itself, so it's really easy for an architect to implement without lots of additional planning work. So let's jump in and take a look.
Now before I focus on DAX specifically, I want to spend a few minutes contrasting how an example flow works using DAX versus a traditional in-memory cache. So let's assume that we have an application which uses DynamoDB for its persistent data storage; on the top we've got the traditional in-memory cache and on the bottom we've got DAX. The flow using the generic in-memory cache goes something like this: first, the application needs to access some data, and so it checks the cache. If the cache doesn't have the data, this is known as a cache miss, and if this happens the application then loads the data directly from the database, which takes longer than getting it from the cache directly. Once it has the data, it updates the cache with the new data, and then any subsequent reads from that point forward will load the data directly from the cache, which is called a cache hit, and this will be faster.
Now the problem with this architecture is the lack of integration between the cache and the database. Let's contrast this with DAX. When using DAX there's extra software involved, and this is installed on the application instance: the DAX SDK, or software development kit. This takes away the admin overhead from the application, because now DAX and DynamoDB are one and the same from the application's perspective. The application makes a single call requesting the data, and this is handled by DAX. If DAX has the data, so if it's a cache hit, then the data is returned directly. If not, then DAX handles the rest: it goes to DynamoDB to retrieve the data, adds it back into its cache and then returns that data to the client. The benefit of this method is that it's one set of API calls using one software development kit.
DAX is designed for DynamoDB and so it's tightly integrated with it, meaning much less admin overhead than using a generic cache. By using DAX, and by integrating all of the different API calls into one SDK, it makes it really easy to implement caching into your application.
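As a rough sketch of that difference, the Python below shows the same GetItem call made directly against DynamoDB and then via a DAX client; the table name, key and cluster endpoint are hypothetical placeholders, and the DAX client constructor arguments are an assumption based on the amazon-dax-client package, so check the current SDK documentation for your language and version.

```python
import boto3
import botocore.session
from amazondax import AmazonDaxClient  # assumes the amazon-dax-client package is installed

TABLE = "Categorum"  # hypothetical table name
KEY = {"pk": {"S": "user#bob"}, "sk": {"S": "profile"}}

# Without DAX: the low-level DynamoDB client talks straight to the table.
ddb = boto3.client("dynamodb")
item_via_dynamodb = ddb.get_item(TableName=TABLE, Key=KEY)

# With DAX: the same call shape, but the client targets the cluster endpoint
# (constructor arguments are indicative only).
dax = AmazonDaxClient(
    botocore.session.get_session(),
    region_name="eu-west-1",
    endpoints=["my-dax-cluster.abc123.dax-clusters.eu-west-1.amazonaws.com:8111"],
)
item_via_dax = dax.get_item(TableName=TABLE, Key=KEY)  # hit served from cache, miss fetched from DynamoDB
```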
So now that we know the difference between DAX and generic caches let's focus now on exactly how the DynamoDB accelerator is architected.
Now DAX operates from within a VPC and it's designed to be deployed into multiple availability zones in that VPC. So like many VPC-based services, you need to deploy it across availability zones to ensure that it's highly available.
Now DAX is a cluster service where nodes are placed into different availability zones. There's a primary node, which is the read and write node, and this replicates out to other nodes, the replica nodes, which function as read replicas.
So with this architecture we have an EC2 instance running an application and the DAX SDK and this will communicate with the cluster and at the other side the cluster communicates with DynamoDB.
Now DAX actually maintains two different caches. First is the item cache, and this caches individual items which are retrieved via the GetItem or BatchGetItem operations. These operate on single items, and you need to specify an item's partition key and, if present, its sort key, so the item cache just holds items which are directly retrieved in this way.
It also has the query cache, which stores collections of items retrieved via query or scan operations, but crucially it also stores the parameters used in that original query or scan. It links the parameters supplied to that operation with the data that's been returned, which means whole query or scan operations can be rerun and return the same cached data.
Now DAX is accessed architecturally much like RDS: every DAX cluster has an endpoint which is used to load balance across the cluster.
If data is retrieved from DAX directly then it's called a cache hit and the results can be returned in microseconds.
You might get a response back typically in say 400 microseconds.
Any cache misses so when DAX has to consult DynamoDB these are generally returned in single digit milliseconds.
Now when writing data to DynamoDB, DAX can use write-through caching, so the data is written into DAX at the same time as being written into the database.
If a cache miss occurs while reading, the data is also written to the primary node of the cluster as it's retrieved, and then it's replicated from the primary node to the replica nodes.
So DAX is a really efficient way of interacting with DynamoDB, because architecturally it abstracts away from DynamoDB. You think you're interacting with a single product using a single set of APIs, but behind the scenes DAX is handling the performance improvement that comes from caching, and that's for both read and write operations.
Now before we finish up I just want to step through some important facts and considerations that you'll need for the exam.
So DAX is a cluster you've got the primary node which supports write operations and replicas which support read operations.
Nodes are designed to implement high availability, so if you deploy multiple nodes and one of them fails, for example the primary node, the cluster will hold an election and fail over to one of the replicas, which will become the new primary node.
Now DAX is an in-memory cache and so it allows for much faster read operations and significantly reduced costs.
If you're performing the same set of read operations on the same set of data over and over again, then you can achieve substantial performance improvements by implementing DAX and caching those results. And in addition, because you're not having to constantly go back to DynamoDB time after time for the same data, you generally do achieve significant cost reductions.
In terms of scaling with DAX you can either scale up or scale out so this means using bigger DAX instances or adding additional instances.
So you can scale in both directions up and out.
Now unlike a lot of caches, DAX does support write-through, which means that if you do write some data to DynamoDB, you can write it using the DAX SDK.
DAX will handle that data being committed to DynamoDB and also storing that data inside the cache.
Architecturally you do need to know that while DynamoDB is a public AWS service, DAX is not; it's deployed inside a VPC, and so logically any application which is using DAX will also need to be deployed into that VPC.
So DAX is an in-memory cache and it's designed to reduce the response time of read operations by an order of magnitude taking you down from single digit milliseconds using DynamoDB natively all the way through to microseconds if you use DAX.
So architecturally if you're reading the same data over and over again then you need to look at whether you should be using an in-memory cache.
Now, choosing between DAX and a generic in-memory cache comes down to how much admin overhead you want to manage. Because DAX is integrated with DynamoDB, and because it supports a single SDK for accessing the cache and DynamoDB together as one abstract entity, it's a lot less workload to implement DAX than a generic cache if you're using DynamoDB on its own.
So if you've got read heavy or bursty workloads then DAX provides increased throughput and potentially significant cost reductions.
So if you find yourself having to apply large RCU values onto a table these can get expensive really quickly and you might be better implementing DAX which will get you better performance and remove that additional cost.
So if you find yourself having performance issues during sale periods or have specific tables or items in a table where there are heavy read workloads against that area of data then you can consider using DAX.
If you've got a workload which is very read heavy with the same set of data again and again being read then you can consider using DAX.
If you've got a type of data layout where a certain type is used more frequently than everything else maybe time series then again you can consider using DAX.
If you really care about incredibly low response times again that's another situation where DAX could be advantageous.
Now, situations where DAX is not ideal are any applications that require strongly consistent reads.
If your application cannot tolerate eventual consistency, then DAX is not going to be suitable.
If your application isn't latency sensitive, so if you don't need these really low response times, then again DAX might not be the right solution.
If your application is write heavy and very infrequently uses read operations, then again DAX is probably not the right solution.
So generally if you see any questions in the exam which talk about a caching requirement with DynamoDB then you should by default assume it is DAX and only move away from that assumption if you see significant evidence to suggest something else.
With that being said that is everything I wanted to cover inside this lesson so go ahead complete this video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about DynamoDB global tables, something which will form part of your toolkit as a solutions architect for any global database deployments.
Now this lesson will be entirely based around architectural theory so let's jump in and get started.
Global tables aren't actually that complex. They provide multi-master replication, meaning no single table is viewed as the master with the others as replicas; instead, all tables are the same.
It's global and allows for read and write replication between all tables that are part of a global table.
To implement global tables you create tables in multiple AWS regions, and then on one of the tables, it doesn't matter which, you configure the links between all of the tables.
This creates a global table and sets DynamoDB to configure replication between all of the table replicas.
So tables become table replicas of a global table.
So think of a global table as an entity by itself and supporting that global table are individual DynamoDB tables in different AWS regions configured for multi-master replication.
Now between the tables DynamoDB utilizes last writer wins for conflict resolution, because it's simple and generally it generates entirely predictable outcomes.
So in the event that you've got the same piece of data being written to two different tables at around the same time, DynamoDB will pick the most recent write and replicate that to all of the other replica tables that are part of the global table.
So whichever is the most recent will overwrite everything else.
Now because it's multi-master it means that you can read and write to any region and the updates are replicated to all other regions generally within a second.
Now it's really fast and that matters because it means that it can be used for more demanding applications.
In terms of consistency you can perform strongly consistent reads in the same region as data is written to but for anything else it's always eventual consistency.
The replication between tables is asynchronous and so if you have a global application it needs to be able to tolerate eventual consistency.
If you have a global table and one of the replica tables is in the US and one of them is in London if you're writing to one and reading from the other it will always be eventual consistency and so your application needs to take that into account.
If it can't cope with eventual consistency then global tables are going to be problematic.
Now to use global tables you first need to select the AWS regions which will be part of that global table.
For example, we could use one of the US regions, ap-southeast-2 in Australia, and the London region.
Then in each of those regions you'd create a DynamoDB table.
You would select one of the tables, it doesn't matter which, but let's use the US table in this example, and from that table we would add all of the tables into the global table configuration, and this would establish the multi-master replication between all three tables.
So all three tables can support reads and writes.
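Purely as an illustration of what that configuration can look like with boto3, the sketch below adds replica regions to an existing table; the table name is a hypothetical placeholder, and it assumes the table meets the global table requirements (for example, streams enabled with new and old images), so treat it as a sketch rather than a complete recipe.

```python
import boto3

TABLE = "CategorumGlobal"  # hypothetical table name, created in us-east-1 and assumed eligible for global tables

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Add one replica region per call; once each replica is active it accepts both reads and writes,
# with last writer wins conflict resolution between regions.
for region in ("ap-southeast-2", "eu-west-2"):
    ddb.update_table(
        TableName=TABLE,
        ReplicaUpdates=[{"Create": {"RegionName": region}}],
    )
```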
Now specifically for the exam you don't really need to be aware of the implementation details.
For the exam the architecture is what matters so be aware that replication is generally sub second.
This depends somewhat on the load on each of the different regions but generally it does occur within a second.
Now globally as I mentioned moments ago it's eventually consistent but you can do eventual or strongly consistent reads as long as it's in the same region.
The replication is multi-master, which means that all regions can be used for both read and write operations. And finally, in terms of what this architecture supports: if you want to implement a globally highly available application, or you want to improve the global data performance of an application, or you want to add global disaster recovery or business continuity capability to your application, then global tables can support all of those requirements.
It's a feature which is really simple to use but highly effective, and as long as you're aware that the conflict resolution is last writer wins, and your application is able to tolerate that, then you can use the feature to support a global data layer for your application. It's a really good feature to implement as long as your application can tolerate all of the requirements.
So at this point go ahead complete this video and when you're ready I'll look forward to you joining me in the next.
Welcome back and in this lesson I'm going to be talking about DynamoDB Streams and Triggers.
Both of these are really powerful features which let you implement some powerful and cost-effective architectures using DynamoDB within AWS.
Now we've got a fair bit to cover so let's jump in and get started.
A DynamoDB Stream is a time-ordered list of changes to items inside a DynamoDB table.
So every time a change occurs to an item in a table, a change is recorded chronologically within a DynamoDB stream.
And a stream is actually a 24-hour rolling window of these changes.
Behind the scenes it actually uses Kinesis Streams which you've covered earlier in the course.
Now you need to enable streams on a per-table basis and when you do enable streams on a table, any inserts of items, updates on items or item deletions are recorded within the stream.
Now you can configure the stream with different views and this is an option on a per-stream basis and the view setting influences exactly what information is added to the stream every time an item change occurs.
Now let's look at this visually because it's much easier to understand.
Let's say that we have a table within DynamoDB and this table has one item, the top item.
And let's say that that item is updated, changed to be the bottom item.
So this represents one item which has changed and the way that it's changed is by having its fourth attribute removed, the one in dark orange.
Now if we've enabled streams on this table then any inserts, updates or deletes are recorded including the one update that I've just discussed.
There are four view types that a stream can be configured with.
It can be configured with keys only, new image, old image and new and old images.
So that's four different options for different view types that a stream can be configured with.
Now given this change to the table on the top left so the removal of its fourth attribute, the one in dark orange, these different view types have the following differences.
So with keys only as the name suggests, the stream will only record the partition key and optionally any applicable sort key value for the item which has changed.
It would be up to whatever is interrogating the stream to determine exactly what has changed and that would probably require a database read.
So in this example where we've removed the fourth attribute, the one in dark orange, the only information that we get using keys only is that the partition key is blue and the sort key is green.
We don't see exactly how this item has been manipulated.
The second view type is new image and this actually stores the entire item with the state as it was after the change.
So if you wanted to configure some sort of business process which operated on the new updated data, then you would configure this view type.
So this view type shows the state of the item after the change.
So after the removal of the fourth dark orange attribute.
If you wanted to know exactly what had changed, then you couldn't determine that using this view type.
Now if you wanted to know what changed, then one option would be to use the old image type.
That way you have a copy of the data as it was before the change and you could check the state of the database, specifically the item in this table, to see exactly what updates had occurred on that data.
So by comparing the old image item inside the stream to the item in the database table, you could calculate exactly what had changed.
Another option to be able to determine exactly what has changed is the final view type which is new and old images.
So if you have a business process which needs complete visibility of the change, so before and after and have this visibility independent of the actual table, then you can select new and old images.
And with this view type, the actual entry in the stream stores both the pre change and the post change state of that item.
And that way you can determine the difference without having to consult the database table itself.
Now it's worth noting that all of these types of views work as well with inserted or deleted items.
But in some cases, the pre or post change state might be empty.
So if you delete an item and you're using the new and old images, then obviously your new state will be completely blank.
And if you're inserting an item and use that same view type, then your old state will be blank and your new state will contain the data that you've inserted.
Now streams are powerful in isolation, but where the real power comes from is that streams are actually the foundation for an architecture called database triggers.
So these allow for actions to take place in the event of a change in data.
So this is an event driven architecture that can respond to any data changes in a table.
Now databases like Oracle have supported these for years, but with DynamoDB, they can be implemented in a serverless way.
Now the architectures of triggers simply put is that an item change inside a database table generates an event.
That event contains the data that's changed and the exact data depends on the view type.
When an item is added to a stream, so when an event is generated, then an action is taken using that data.
And the way that this action is implemented within AWS is to combine the capabilities of streams and Lambda.
So you can use streams and Lambda so that a Lambda function is invoked whenever changes occur to a DynamoDB table.
And so you'll use Lambda to perform some type of compute operation in response to a data change that causes a stream event.
And the Lambda function invocation is actually in this architecture, the trigger.
So it's the compute action that occurs based on data change.
So streams and triggers are actually a really powerful architecture.
You might see them used in reporting or analytics scenarios.
If you want to generate a report in the event of a change of a certain item in a database, maybe stock level changes, or if you want to report on popular items in a stock database, then potentially you can use streams and triggers.
They're also really useful for things like data aggregation.
So if stock levels are being manipulated or if a voting app is recording votes and you want to perform some kind of aggregation or tally of all those votes, then potentially you can use streams and triggers.
You can also use it for messaging or notifications.
For example, if you have a messaging application which allows you to create a group chat, you might use a DynamoDB table to store the chat items that are added to that group.
And you could use streams and triggers to send push notifications to all members of that group chat.
So rather than having to poll databases which consumes compute resources even if nothing happens, if you're using streams and triggers, you can respond to an event as it happens and only consume the minimum amount of compute required to perform that action.
Streams and triggers are really used for lots of different things inside architectures, but for the associate level solutions architect exam, you just need to know that you use streams and lambda together to implement a trigger architecture for DynamoDB.
So visually this is how triggers are implemented.
We've got a table, an item change occurs within a table which has streams enabled.
So a stream event is placed onto a DynamoDB stream.
We've selected to use the new and old images type.
So we've got both the new and the old state of that item.
And based on that, that gets sent as an event to a lambda function and that lambda function can perform some compute based on the pre and post change states of that item.
So it's not a hugely complicated architecture, but it is really powerful.
So enable streams on a table, configure a lambda function to invoke whenever a change occurs and you've got a really cost effective and powerful serverless implementation of a trigger architecture.
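As a minimal sketch of the compute side of that architecture, here is a Lambda handler for a DynamoDB stream event source; it assumes the stream view type is NEW_AND_OLD_IMAGES, and the print statements stand in for whatever business logic (aggregation, notifications, reporting) you'd actually run.

```python
def lambda_handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]        # INSERT, MODIFY or REMOVE
        stream_data = record["dynamodb"]

        old_image = stream_data.get("OldImage")  # absent for INSERT
        new_image = stream_data.get("NewImage")  # absent for REMOVE

        if event_name == "MODIFY":
            # Both states are in the event, so we can work out what changed
            # without reading the table itself.
            changed = {
                attr for attr in set(old_image) | set(new_image)
                if old_image.get(attr) != new_image.get(attr)
            }
            print(f"Item changed, attributes affected: {changed}")
        elif event_name == "INSERT":
            print(f"New item: {new_image}")
        else:
            print(f"Item removed: {old_image}")
```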
And that's what you'll need to understand for the exam.
Now with that being said, that's everything that I wanted to cover in this lesson. Go ahead and complete the video, and then when you're ready I look forward to you joining me in the next.
Welcome back and in this lesson I want to talk about DynamoDB indexes and there are two types, local secondary indexes known as LSIs and global secondary indexes known as GSIs.
Now we've got a lot to cover so let's jump in and get started.
Indexes are a way to improve the efficiency of data retrieval operations within DynamoDB.
We've already talked about how query is the most efficient operation within DynamoDB but it suffers from one crucial limitation that it can only work on one partition key value at a time and optionally a single or a range of sort key values.
Indexes are a way that you can provide an alternative view on data that's inside a base table.
By providing an alternative view you can allow the query operation to work in ways that it couldn't otherwise and there are two types of indexes, local secondary indexes and global secondary indexes.
Now local secondary indexes allow you to create a view using a different sort key and global secondary indexes allow a view with a different partition and sort key.
And for both of those indexes when you're creating them you have the ability to choose which attributes from the base table are projected into them and choosing what to project is important because it can massively impact how efficient the indexes are for your queries.
Now first I want to focus on local secondary indexes.
So again it's an alternative view on base table data and local secondary indexes must be created with the base table itself.
So this is critical to know for the exam.
You cannot add local secondary indexes after the base table has been created.
And so while you're creating the base table you can optionally create a number of local secondary indexes and the current maximum is five local secondary indexes per base table.
So just to repeat this again because I want this committed to your memory, local secondary indexes allow for an alternative sort key on the data in the main table.
So it's an alternative sort key but the same partition key and local secondary indexes share the capacity with the main table.
And so they use the same RCU and WCU values if the main table is using provisioned capacity.
Now in terms of the attributes that are projected into a local secondary index the options that you have are to use all of the attributes for the base table.
You can choose keys only or you can use include which lets you specifically pick which attributes from the base table are projected into the index.
Now let's take a look at an example visually so that you can understand how this all fits together from an architecture perspective.
And the example that I want to use is a weather station.
So this table holds data for a number of weather stations and for each weather station there's one record taken each day at the same time.
Now if we want to stick to using the query operation then we're limited to querying a single weather station and for a single weather station to either a single day or a range of days.
But what if we want to interrogate data based on another attribute say for example the sunny day attribute.
So this attribute records whenever the average over a day is classified as a sunny day.
Now because this attribute isn't a key, in order to perform any operations on it we couldn't use query, because query needs to use a single partition key value and optionally select using the sort key; we would need to use the scan operation.
And we know now from an earlier lesson that the scan operation is incredibly inefficient.
An option that we have to fix this problem is while creating this table we can also create an additional local secondary index using the sunny day attribute as the sort key.
So this needs to be created along with the base table so at the same time as you're creating the base table.
But if we choose to do that then for a given weather station we're able to easily limit the data that we retrieve to sunny days because we can use the query operation on a single station ID which is still the partition key but then use the query operation to limit specific values in the sort key in this example picking only sunny days.
Now what's even better is that indexes are known as sparse indexes and this means that only items from the base table which have a value for the attribute that we define as the new sort key are present in the index.
Now this means that if the sunny day attribute is something which is present if it's a sunny day and absent if it's not then the only items in the sunny day local secondary index will be for data which is a sunny day.
So we can in some cases take advantage of the fact that indexes are sparse, and we can use a scan operation against this local secondary index knowing that we'll only consume capacity for data that is relevant to us.
So we could use a scan operation on this local secondary index knowing that any items in this index are by default for sunny days and so we're only going to be consuming capacity that's relevant for sunny day data.
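To illustrate how that looks with boto3, the sketch below creates the table with a local secondary index at creation time and then queries it; the table, attribute and index names are hypothetical placeholders, and on-demand capacity is used here simply to keep the example short.

```python
import boto3

ddb = boto3.client("dynamodb")

# LSIs can only be defined here, at table creation time, and share the base table's capacity.
ddb.create_table(
    TableName="WeatherData",
    AttributeDefinitions=[
        {"AttributeName": "station_id", "AttributeType": "S"},
        {"AttributeName": "reading_date", "AttributeType": "S"},
        {"AttributeName": "sunny_day", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "station_id", "KeyType": "HASH"},      # same partition key as the base table
        {"AttributeName": "reading_date", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "SunnyDayIndex",
            "KeySchema": [
                {"AttributeName": "station_id", "KeyType": "HASH"},
                {"AttributeName": "sunny_day", "KeyType": "RANGE"},  # alternative sort key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Later, query only sunny days for one station via the index.
sunny = ddb.query(
    TableName="WeatherData",
    IndexName="SunnyDayIndex",
    KeyConditionExpression="station_id = :s AND sunny_day = :d",
    ExpressionAttributeValues={":s": {"S": "station-42"}, ":d": {"S": "yes"}},
)
```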
Now this is a lot more complex than you'll need for the solutions architect associate exam.
For the exam all you need to know is that local secondary indexes allow you to create an alternative view on the data that's in a base table by providing an alternative sort key.
They use the same partition key and they can only be created along with the base table.
So now let's look at a different type of index a global secondary index.
Global secondary indexes are different from local indexes: they can be created at any time, and so they're much more flexible.
There's also a default limit of 20 global secondary indexes per base table, so you can have more of them than local secondary indexes. They let you define a different partition and sort key, and they also have their own RCU and WCU capacity values if you're using provisioned capacity for the base table.
And just like with local secondary indexes you have the flexibility to choose exactly what attributes are projected into the index from the base table and you have the same options either choosing all attributes keys only or including specific attributes.
So let's look at another example visually to make it easier to understand.
So we're using another example of the weather station this time we have the pink attribute which represents any items where there's been an alarm at the same point as taking the data.
So an alarm could be something like a system error it could be a bird or other wildlife which has entered the weather station and interfered with the results or anything else that's out of the ordinary.
Now if there's a regular access pattern where you need to query this table for any items which have been impacted by the alarm, then you couldn't do this normally using a query; you'd need to use a scan operation and filter it based on the alarm attribute, and this would mean that you're consuming the capacity for every single item that is read using the scan operation.
Now an option that we have is to create a global secondary index and this allows us to create an alternative view with a different partition key and sort key.
In this example we've created a global secondary index which uses the alarm attribute as the partition key and the station ID as the sort key. This means that we can use the efficient query operation for any items showing an alarm, and optionally specify one station ID or a range of station IDs to limit the data that we receive to alarms for specific weather stations.
So global secondary indexes are super powerful because of this ability to define completely separate partition and sort keys. It truly gives you a way to create an alternative perspective on the data that's in a base table, and global secondary indexes are also sparse, which means in this example any items which have no alarm attribute would not be included in the index.
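As a rough boto3 sketch of this, the code below adds a GSI to an existing table (which, unlike an LSI, can be done at any time) and then queries it; the table, attribute and index names are hypothetical placeholders, and it assumes the table uses on-demand capacity so no index throughput needs to be specified.

```python
import boto3

ddb = boto3.client("dynamodb")

# Add the index after the table already exists.
ddb.update_table(
    TableName="WeatherData",
    AttributeDefinitions=[
        {"AttributeName": "alarm", "AttributeType": "S"},
        {"AttributeName": "station_id", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "AlarmIndex",
                "KeySchema": [
                    {"AttributeName": "alarm", "KeyType": "HASH"},        # different partition key
                    {"AttributeName": "station_id", "KeyType": "RANGE"},  # different sort key
                ],
                "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["reading_date"]},
            }
        }
    ],
)

# Once the index is ACTIVE, use the efficient query operation instead of a filtered scan.
alarms = ddb.query(
    TableName="WeatherData",
    IndexName="AlarmIndex",
    KeyConditionExpression="alarm = :a",
    ExpressionAttributeValues={":a": {"S": "wildlife"}},
)
```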
For the exam you need to be comfortable with the fact that GSIs allow you to create this different perspective on data with alternative partition and sort keys and for GSIs you can create them after you've created the base table so they don't have that limitation of needing to be created at the same time as the base table which is the case for local secondary indexes.
Now one final thing to keep in mind global secondary indexes are always eventually consistent because the data is replicated from the base table to the index asynchronously and so your application needs to be able to handle eventual consistency.
If you're using a global secondary index you need to be able to cope with eventual consistency because that's the only option that you have.
Now before we finish up the lesson I just want to talk through some local and global secondary index considerations things that you should be aware of for the exam.
So you need to be very careful with what attributes you choose to project into the index.
As you now know when you're working with DynamoDB at any time you're reading or writing data you're actually consuming all the capacity for the size of the entire item and so if you project all of the attributes into an index you're also using all the capacity of those attributes so you need to be aware of the capacity that you're using when you project attributes.
Now the inverse of this is that if you don't project a specific attribute and then require that attribute when you're querying an index, that will still work, but it's doing a fetch of that data in the background, which is actually incredibly inefficient.
So you need to plan your indexes in advance and make sure that you project the correct attributes because if you are performing queries on any attributes which are not projected then it gets really expensive.
Now AWS recommend using GSIs as default and only using local secondary indexes when strong consistency is required.
So if you need an index and you're in doubt you should use global secondary indexes because they're a lot more flexible and they can be created after the point of when you've created the base table.
Now from an architectural perspective and something to keep in mind if you do see any exam questions which mention indexes is that indexes are designed when you have data in a base table and remember you're designing the base table with the partition and sort keys for the primary way that you will access this data.
Indexes allow you to create this alternative perspective for any alternative access patterns and so if you have a requirement maybe a different team is looking at the weather station data only looking for alarms maybe it's a security team or a data science team then you can create indexes that allow for these alternative access patterns.
Indexes allow you to keep the data in one place but create these perspectives for different types of queries, different teams or different requirements they can all access the same data just using this different perspective.
So at this point go ahead finish the video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back.
This is part two of this lesson.
We're going to continue immediately from the end of part one.
So let's get started.
Now DynamoDB can operate using two different consistency modes for read operations.
It can be eventually consistent, or strongly consistent, which is also known as immediately consistent.
Consistency refers to what happens when data is updated, so when new data is written to the database and then immediately read: is that read data immediately the same as the recent update, or is it only eventually the same?
Eventual consistency is easier to implement from an underlying infrastructure perspective and it scales better.
Strong consistency is essential in some types of applications or some types of operations, but it's more costly to achieve and it scales less well than eventual consistency.
So let's look visually at exactly how this works.
With DynamoDB, every piece of data is replicated multiple times in separate availability zones, and each one of these points is called a storage node.
Out of these three storage nodes in three different availability zones, one of them is elected as the leader, so the leader storage node.
This is a dynamic process.
So if the leader ever fails, the election will happen again and a new one will be chosen.
So in this case, we've got a single item inside a DynamoDB table that's been replicated across three different storage nodes, one in availability zone A, one in availability zone B, and one in availability zone C.
All three have the same data, the same item with the same five attributes.
Now let's say in this example that Bob decides to update some data, it's this particular DynamoDB item, and he decides to remove the fourth attribute, the dark orange attribute.
So this is the change to the item, the fourth attribute is removed, and when writing this to DynamoDB, the product has a fleet of entities which route connections to the appropriate storage nodes.
Writes are always directed at the leader node, so it's this leader node which will receive the update to this item first.
So on the leader node, the item will be manipulated and the fourth attribute, the one in dark orange, will be removed.
At this point, the leader node is known as consistent.
It has the data on it which you just wrote.
That's why writes are more expensive in terms of capacity units.
That's also why a write capacity unit represents less data than a read capacity unit: because writes always occur on the leader storage node.
And so these can't scale as well as reads.
When the leader node has the new data on it, it immediately starts the process of replication.
This process usually only takes milliseconds, but it does depend on the individual load which has been placed on these storage nodes, and it assumes the lack of any faults.
But let's assume though, for this example, that we have no faults on our storage nodes, and we've replicated this updated item to one additional storage node apart from the leader, so the one in availability zone C.
And right now we freeze time.
So the situation we have is that the leader storage node contains that updated item, as well as the storage node in availability zone C, but the storage node in availability zone A does not have the updated item.
It is not consistent.
So right now with time frozen, let's look at reads.
And there are two types of reads which are possible with DynamoDB, eventually consistent reads, and strongly consistent reads.
Now I mentioned earlier in this lesson how you can actually perform reads cheaper than having one RCU representing four kilobytes of data.
Now the reason for this is that one RCU is actually four kilobytes of data read from DynamoDB every second, but that's using strongly consistent reads.
Eventually consistent reads are actually half the price.
So you can read double the amount of data for the same number of RCUs.
So let's say that Julie on the top right performs an eventually consistent read of the data.
When Julie performs that operation and uses eventual consistency, then DynamoDB directs her at one of the three storage nodes at random.
In most cases, all three storage nodes will have the same data.
And so there's little difference between eventual and strongly consistent reads.
But in this particular edge case, if DynamoDB sent her request at the top storage node, then she would get stale data.
Replication occurs in milliseconds, but with eventual consistency, it isn't always guaranteed that you will get the latest data.
And in exchange, because eventually consistent reads scale better, since any of the individual nodes can be used, you actually get a price reduction.
It's 50% of the price for strongly consistent reads.
So you get twice as much reads for each individual read capacity unit.
In most cases, you will notice no difference, but you need to be aware that there is a small potential that with eventually consistent reads, you might be reading older versions of data.
If you access DynamoDB at exactly the wrong time, it is possible that you might get outdated data.
Now, in contrast, a strongly consistent read always uses the leader node.
It's always consistent, but because it mandates the use of one particular storage node, the leader node, it's less scalable, and so it costs the normal amount of RCU to perform.
So that's why eventually consistent reads are less cost than strongly consistent.
But one very important thing to keep in mind is that not every application or access type can tolerate eventual consistency.
You need to pick the correct model.
If you have a stock database where the stock level is important, or if you're performing medical examinations and the data that's being logged into DynamoDB is critical and you always need the most recent version, then you need to use strongly consistent reads.
If your application can tolerate a potential lag and the small chance of outdated data, then you can achieve significant cost savings by using eventually consistent reads.
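As a rough illustration of how this choice appears in code, here's a hedged boto3 sketch; the table name, key attributes and values are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("weather-data")  # hypothetical table
key = {"station_id": 1, "day": "Monday"}                  # hypothetical composite key

# Eventually consistent (the default): can be served by any storage node,
# and costs half the RCU of a strongly consistent read.
maybe_stale = table.get_item(Key=key)

# Strongly consistent: always served by the leader node, full RCU cost,
# and not available on global secondary indexes.
latest = table.get_item(Key=key, ConsistentRead=True)

print(maybe_stale.get("Item"), latest.get("Item"))
```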
Now, let's just talk through some calculations of how you can actually determine appropriate values for capacity on a table.
Let's look at a scenario and let's assume that you need to store 10 items in DynamoDB every second.
So you have 10 devices that are logging data into DynamoDB and on average, they store data once every second.
So you've got 10 writes per second.
Now, with any type of scenario, you need to determine a number of really important things.
You need to understand the average size of an item that's being written to DynamoDB, and you need to understand how many items per second will be written.
Now, in some cases, exam questions might try to trick you and say that 60 items will be written per minute, and if you see that type of question, you need to try and calculate how many per second because that's what read and write capacity units use.
Now, once you've got both of those pieces of information, you can calculate the WCU required per item.
So to calculate that, you take your item size, in this example, 2.5K, and you divide it by the size of 1 WCU, which is 1K.
That gives us 2.5, and then we need to round that up to the next highest whole number, which in this case is three.
So we know that for a 2.5K average item size, we're going to consume 3 WCU.
And then we need to understand how many of those occur every second, and so we need to multiply that value by the average number of writes per second.
So we know that we're going to store 10 items per second.
We now know that the WCU required per item is three, and based on that, we can multiply those together to get the required WCU, which in this example is 30.
So it's the same calculation every time.
Work out the WCU cost of writing an individual item, multiply that by the number of writes per second, and that will give you the WCU setting that you require on the table.
Remember, a WCU is 1K in size.
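If you prefer to see the same arithmetic as code, here's a minimal sketch in Python; the function name and values are just the lesson's example, nothing more.

```python
import math

def required_wcu(item_size_kb: float, writes_per_second: int) -> int:
    wcu_per_item = math.ceil(item_size_kb / 1)  # 1 WCU = 1 KB written per second
    return wcu_per_item * writes_per_second

# 2.5 KB items at 10 writes per second -> 3 WCU per item * 10 = 30 WCU
print(required_wcu(2.5, 10))
```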
Now flipping that round, let's look at reads.
If we have a similar example, and we need to retrieve 10 items per second from our database, and we know that the size of an average item is 2.5K, the first thing we need to do is to calculate the RCU that's required per item.
And we do that by taking the average item size and dividing that by 4KB, and then rounding that up to the next highest whole number.
So in this case, because an RCU allows for four kilobytes of reads, we know that 2.5K is going to fit inside 4K.
So every single read is going to be one RCU.
So we know that it's one RCU per read.
So now we know the number of RCU required per item.
We need to know the number per second, the number of operations per second, which is 10, and we multiply those together.
So to do strongly consistent reads with this example, we would need 10 RCU.
But now you know the concept of eventual consistency.
That is half the cost.
So we can take this RCU value and divide it by two, and that means that to perform eventually consistent reads of 10 items per second with a 2.5K size, we need five RCU set on the table.
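And the matching read-side calculation as a quick Python sketch, again just restating the lesson's numbers; the function name is arbitrary.

```python
import math

def required_rcu(item_size_kb: float, reads_per_second: int,
                 eventually_consistent: bool = False) -> int:
    rcu_per_item = math.ceil(item_size_kb / 4)  # 1 RCU = 4 KB read per second
    rcu = rcu_per_item * reads_per_second
    return math.ceil(rcu / 2) if eventually_consistent else rcu

print(required_rcu(2.5, 10))                              # 10 RCU, strongly consistent
print(required_rcu(2.5, 10, eventually_consistent=True))  # 5 RCU, eventually consistent
```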
So I've provided two really simple examples, and what I'm going to do is include some links in the lesson description, which will give you additional examples with different sizes of items, different writes per second, and it will give you some practice on how to perform these calculations.
And I suggest that you do this as extra work once you've finished all of the content of the course.
The only thing that I need you to understand for now is the size of an RCU, the size of a WCU, and exactly how the consistency model works for DynamoDB.
That's all of the theory that I wanted to cover in this lesson though.
Go ahead, complete the video, and when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about DynamoDB operations, DynamoDB consistency and DynamoDB performance.
It's a lot to cover in one lesson so I'll try to be as efficient as possible but let's jump in and get started.
Now DynamoDB allows you to pick between two different capacity modes when you create a table and with some restriction you are able to switch between these modes even after data has been added and these modes are on demand and provisioned.
On-demand is a mode designed for when you have an unknown or unpredictable level of load on a DynamoDB table, or alternatively when minimising admin overhead is the top priority.
With on demand you don't have to explicitly set capacity settings it's all handled on your behalf by DynamoDB.
You just pay a price per million read and write units but the price that you pay can be as much as five times the price versus using provisioned capacity so it's actually a trade-off.
You're reducing the admin overhead it allows you to cope with unknown or unpredictable levels of demand but you are paying more for that privilege.
With provisioned capacity you actually set a capacity value for reads and writes on a per-table basis so RCU stands for read capacity units and WCU stands for write capacity units.
Now a critical thing to understand is that every operation on a DynamoDB table consumes at least one unit so one unit of read or write.
Now I've added an asterisk here because there is a way to get cheaper reads but I'll introduce this later in this lesson.
One RCU allows for one read operation of up to four kilobytes on a table every second.
If you perform an operation and it only uses one kilobyte to read an item you still consume one RCU.
It rounds up to at least one RCU.
But one operation can consume more.
An item as you learned earlier in the course can be up to 400 kilobytes in size as a maximum and this would consume 100 RCU to read in one operation.
Now a capacity unit is per second so one RCU lets you read one block of data up to four kilobytes every second.
For writes one write capacity unit is one kilobyte so it's the same logic but one kilobyte instead of four kilobytes.
So you set a certain amount of read and write capacity and that gives you a certain amount of read and write load every second.
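To make this concrete, here's a hedged boto3 sketch of creating a provisioned-capacity table; the table name, attribute names and capacity values are assumptions for illustration.

```python
import boto3

client = boto3.client("dynamodb")
client.create_table(
    TableName="weather-data",                                # hypothetical name
    AttributeDefinitions=[
        {"AttributeName": "station_id", "AttributeType": "N"},
        {"AttributeName": "day", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "station_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "day", "KeyType": "RANGE"},        # sort key
    ],
    BillingMode="PROVISIONED",                               # "PAY_PER_REQUEST" = on-demand mode
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 30},
)
```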
As well as that every single table within DynamoDB has a WCU and an RCU burst pool.
This holds 300 seconds' worth of the read and write capacity units set on the table.
So when setting read and write capacity units you're setting for the sustained average.
But try to dip into the burst pool as infrequently as possible because other table modification tasks can use this pool as well.
Relying on it too much is pretty dangerous.
If you ever deplete the pool and have insufficient capacity set on a table then you will receive a provisioned throughput exceeded exception error and you'll be throttled.
And the solution is to wait and retry or increase the capacity settings.
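The AWS SDKs already retry these errors automatically, but as a sketch of the "wait and retry" pattern, something like the following is a reasonable shape; the table name, item format and back-off values are hypothetical.

```python
import time
import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")

def put_with_retry(item, retries=5):
    for attempt in range(retries):
        try:
            return client.put_item(TableName="weather-data", Item=item)
        except ClientError as error:
            if error.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(2 ** attempt)  # wait, then retry with exponential back-off
    raise RuntimeError("Still throttled; consider raising the table's WCU")
```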
Now there are a number of different types of operations that you can perform on a DynamoDB table and some of the most common ones that are mentioned in the exam are query and scan.
So I want to just talk about exactly how these work at a high level before we move on.
The query operation in DynamoDB is one way that you can retrieve data from the product.
When you're performing a query operation you need to start with the partition key.
You have to pick one partition key value.
Let's look at an example visually because it will be easier to understand.
So this is a simple DynamoDB table.
It stores weather data once per day from a group of weather stations.
So the partition key is the sensor ID and the sort key is the day of the week.
And we have one new table for every week of every year.
And to keep things simple these are the sizes of each item.
So item one is 2.5k, item two is 1.5k, item three is 1k and item four is 1.5k.
Now the query operation can return zero items, one item or multiple items.
But you always have to specify a single value for the partition key.
So you can only ever with this example query for one specific weather station.
So regardless of whether your table uses a simple primary key with just the partition key or whether it uses a composite primary key which uses both the partition key and the sort key with query you always have the option of just querying with a single value for the partition key.
So in this example if we decided to query for all items for weather station ID of one then we would get two items returned.
The item for Monday for weather station ID one and the item for Tuesday for weather station ID one.
Now the first item has a size of 2.5k and the second item has a size of 1.5k.
So both of these together equal 4k and one RCU allows you to query 4k.
So this particular query would use one RCU of capacity.
Now with DynamoDB it's always more efficient to return multiple items in a single operation.
In this example we could actually perform a query where we provide the partition key value of one as well as a specific value for the sort key.
So if we wanted to retrieve both of these items with two separate query operations then we could query for a partition key value of one and a sort key of Monday and a partition key value of one and a sort key of Tuesday as two separate operations.
But because every operation consumes at least one RCU then if we ran two separate queries then the same amount of data would consume two RCU.
So it's always more efficient to pull back as much data as you need in totality in one single operation.
Now if you only want to retrieve one specific item then you can query for one particular value of the partition key and one particular value of the sort key.
In this example we could query for a partition key value of one and a sort key value of Monday.
And this would return one single item, the Monday item for Weather Station ID 1 which has a size of 2.5k.
But because it's a single operation and it's less than 4k it will be rounded up to the next whole RCU value.
So this single item query will cost one RCU and it will retrieve the entire item.
Generally with any operations on DynamoDB you always have to operate on the entire item so reading and writing an entire item.
And so there is an architectural benefit with the platform to minimizing the size of an item as much as possible because if you have to perform queries which operate on single items as a minimum you are going to consume the capacity that that whole item uses.
Now just to restress the important thing about queries is that you have to query for one particular value of the partition key and when you're querying for that one value you can retrieve all of the items with that one value or you can filter that down based on supplying one sort key value or a range of sort key values.
And when you do that using the query operation you're only charged for the capacity of that query operation.
So if you pick a particular subset of sort key values you're only charged for the response from that query operation.
What you can do with the query operation is specify particular attributes that you want to return.
So in this example we might only want to return the yellow attribute and the pink attribute but you are still charged for the entire item.
Anything that you filter is discarded but you are still charged for it.
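As a rough code-level view of the query operation, here's a boto3 sketch; the table, key and attribute names are assumptions based on the weather-station example, not real resources.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("weather-data")  # hypothetical names throughout

# Every item for weather station 1 -- a single partition key value is mandatory.
all_for_station = table.query(KeyConditionExpression=Key("station_id").eq(1))

# Narrow with the sort key and project only two attributes. The whole item is
# still read, so you're still charged for its full size.
monday_only = table.query(
    KeyConditionExpression=Key("station_id").eq(1) & Key("day").eq("Monday"),
    ProjectionExpression="#t, #r",
    ExpressionAttributeNames={"#t": "temperature", "#r": "rainfall"},
)
print(all_for_station["Count"], monday_only["Items"])
```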
Now if you want to perform a search across an entire table maybe looking for every single weather station entry which indicates good weather you can't do that with the query operation because a query operation can only ever query it based on one particular partition key value.
If you want to perform more flexible operations you need to use scan.
Scan is the least efficient operation within DynamoDB when you want to get data but it's also the most flexible.
Let's use the same example the weather stations.
The way that scan works is to move through the table item by item.
You can specify any attributes that you want to match.
You can show all items for example where a temperature is between two values or retrieve all of the items across all of the weather stations for a given day.
For example Monday.
But what you need to understand about scan is that it is scanning through the entire table.
So the entire table is consumed.
So while you can use it to get access to any data that you want the consumed capacity is for all of the items that are read.
Even if you filter things even if you return less data than the whole table you consume all of the data that's scanned through.
So scan is super flexible but it's also really expensive from a capacity perspective.
So let's quickly look at an example visually.
Let's say that you want to scan the weather table looking for all items across all different weather stations looking for any entries which indicate a sunny day.
So we're looking for a particular attribute which defines that the yellow column and we want to return all items which have this attribute.
Now we can't use a query operation for this because query as I mentioned on the previous screen only allows us to query for one particular value of the partition key.
And we need to look across all different weather stations so multiple values for that partition key.
So we can't use query but we can use scan.
So in this example we're trying to scan for the sunny day attribute, the attribute in yellow and we're looking through the entire table.
So this attribute isn't a partition key and it isn't a sort key and we can't use query.
But we can use scan.
So scan will step through the entire table.
So all four items.
But because we've specified to the scan operation that we only want items which have this sunny day attribute, it doesn't return the items for the non-sunny days; that data is just discarded.
We still consume the entire capacity.
So the scan operation would need to step through every item in this table to determine which ones do have the sunny day attribute and which ones don't.
So in this example we actually consume all of the item capacity in the table.
So that's 5k plus 4k plus 2k plus 3k.
So that's a total of 14K, which rounded up to the next highest RCU value represents 4 RCU of capacity that we've consumed.
Even though we're only actually returning 5k plus 4k of data the remaining items which are not valid the ones which don't have sunny days are simply discarded.
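A hedged boto3 sketch of the same idea follows; the attribute name "conditions" and its "sunny" value are hypothetical stand-ins for the yellow attribute in the example.

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("weather-data")  # hypothetical table

response = table.scan(
    FilterExpression=Attr("conditions").eq("sunny"),  # hypothetical attribute
    ReturnConsumedCapacity="TOTAL",
)
print(response["Items"])             # only the matching items come back...
print(response["ConsumedCapacity"])  # ...but capacity is consumed for everything scanned
```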
Okay so this is the end of part one of this lesson.
It was getting a little bit on the long side and so I wanted to add a break.
It's an opportunity just to take a rest or grab a coffee.
Part two will be continuing immediately from the end of part one.
So go ahead, complete the video and when you're ready, join me in part two.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back.
Over the next few lessons, I'm going to be stepping through the architecture, features, and important considerations of the Amazon DynamoDB product.
DynamoDB is a NoSQL, wide-column database-as-a-service product within AWS.
It's the database product which tends to be used for serverless or web-scale traditional applications inside AWS.
And for the exam, it's essential that you understand it fully.
Now we have a lot to cover, so let's jump in and get started.
So DynamoDB is a NoSQL database-as-a-service product.
It's a public service which means it's accessible anywhere with access to the public endpoints of DynamoDB.
And this means the public internet or a VPC with either an internet gateway or a gateway VPC endpoint.
DynamoDB is capable of handling simple key-value data or data with a structure like the document database model.
Now strictly speaking, DynamoDB is a wide-column key value database.
And being a database as a service product means that you have no self-managed servers or infrastructure to worry about.
It's not like RDS or Aurora or Aurora Serverless, which are database-server-as-a-service products.
With DynamoDB, you actually get the database itself delivered as a service.
And this reduces the complexity and admin overhead significantly of providing a data store for your applications.
Now DynamoDB also supports a range of scaling options.
You can take full control and choose provisioned capacity and then either manually control performance or allow the system to adjust performance automatically.
Or you can use on-demand mode which is a true as a service performance model.
So essentially set and forget.
Now DynamoDB is highly resilient either across multiple availability zones in a region or optionally DynamoDB allows for global resilience.
So a table can be configured to be globally resilient but that is an optional extra.
Now within DynamoDB data is replicated across multiple storage nodes by default and so you don't need to explicitly handle it like you do with RDS.
Now DynamoDB is really really fast.
It's backed by SSD and so it provides single digit millisecond access to your data.
It also handles backups.
It allows for point-in-time recovery and any data is encrypted at rest.
It even supports event-driven integration allowing you to generate events and configure actions when data within a DynamoDB table changes.
So now let's talk about tables within DynamoDB.
Tables are actually the base entity inside the DynamoDB product.
If I'm being precise, which by now you know I like doing, DynamoDB shouldn't really be described as a database-as-a-service product; it's more like a database-table-as-a-service product.
A table within DynamoDB is a grouping of items which all share the same primary key.
Items within a table are how you manage your data within that table.
So think of an item like a row in a traditional database product.
A table can have an infinite number of items within it.
There are no limits to the number of items within a table.
Now when you create a table within DynamoDB you have to pick its primary key.
Now a primary key can be one of two types.
It can either be a simple primary key, which is just the partition key, known as the PK, or it can be a composite primary key, which is a combination of the partition key and the sort key, known as the SK.
So every item in the table has to use the same primary key and it has to have a unique value for that primary key.
If the primary key is a composite key then the combination of the two parts the pk and the sk need to be unique in that table.
Now that's actually the only restriction on data that an item has the unique values for the primary key.
Items can have other bits of data aside from the primary key called attributes but every item can be different as long as it has the same primary key then it can have no attributes, all attributes, a mixture of attributes or completely different attributes.
An item itself can be a maximum of 400 kilobytes in size and this includes the primary key, the attribute values and the attribute names all total together.
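To illustrate that flexibility, here's a minimal boto3 sketch; the table name, key schema and attribute names are hypothetical and only mirror the weather-station example used elsewhere in the course.

```python
import boto3

table = boto3.resource("dynamodb").Table("weather-data")  # hypothetical table

# Both items share the table's key schema (station_id + day)...
table.put_item(Item={
    "station_id": 1, "day": "Monday",
    "temperature": 18, "rainfall_mm": 4,
})
# ...but every other attribute can differ from item to item.
table.put_item(Item={
    "station_id": 2, "day": "Monday",
    "wind_kph": 32, "alarm": "FAULT",
})
```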
With DynamoDB you can configure a table with provisioned capacity or on-demand capacity.
Now capacity is a pretty odd term.
When you think about capacity you probably think about space, but in DynamoDB capacity means speed: adding capacity means adding more speed, more performance.
Now if you choose to use the on-demand capacity model, it means you don't have to worry about it; you don't have to set explicit values for capacity on a table, you just pay for the operations against the DynamoDB table, and there's a cost per operation.
If you choose provisioned capacity then it means you need to explicitly set the capacity values on a per table basis and there are two terms that you need to understand.
The first is write capacity units, or WCU, and the second is read capacity units, known as RCU.
One WCU set on a table means that you can write one kilobyte of data per second to that table, and one RCU, or read capacity unit, set on a table means that you can read four kilobytes per second from that table.
Most operations have a minimum consumption, so one read of, say, 100 bytes will consume one RCU as a minimum, and I'll be talking more about capacity control later; I'm just introducing the terms for now.
Now let's move on and I want to talk about backups inside DynamoDB.
There are two types of backups available in the product. The first is on-demand backups, and these are similar to how manual RDS snapshots function: they're full backups of the table, retained until you manually remove them.
These on-demand backups can be used to restore data and configuration either to the same region or cross-region, so if you want to migrate data to another region this is one option. You can use a backup to restore a table with or without indexes, and you also have the ability to adjust encryption settings as part of the restore.
The key thing to remember is that you're responsible for performing the backup and removing older backups when they're no longer required. But DynamoDB does come with another option, and this method is called point-in-time recovery. It's something that you need to enable on a table-by-table basis, and it's disabled by default.
When you enable the feature on a table, it results in a continuous stream of backups, a record of all the changes to the table over a 35-day window, and from that window you can restore to a new table with one-second granularity. So when you enable this option, you can create a new table by restoring from any one-second interval in that entire 35-day window. It's a really powerful feature, but you do need to enable it explicitly on a per-table basis.
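Enabling it programmatically looks roughly like this boto3 sketch; the table name is hypothetical.

```python
import boto3

client = boto3.client("dynamodb")
client.update_continuous_backups(
    TableName="weather-data",  # hypothetical table name
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```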
Now before we finish up this lesson there are some considerations that you need to be aware of, and some of these are really important for the exam. If you see any questions in the exam which mention NoSQL then you should probably prefer DynamoDB; so if NoSQL is mentioned and DynamoDB is one of the options for answering that question, then unless you've got a strong reason to answer otherwise, you should probably default to DynamoDB.
On the other hand if you see any question which mentions relational data or a relational database then the answer is likely to not be DynamoDB.
DynamoDB is not suited to relational data it should not be used to implement any form of relational database system it's simply not designed for that and it doesn't include the required features.
If key-value is mentioned in any exam questions and DynamoDB is one possible answer, then again you should probably default to that answer unless you've got a strong reason not to.
Now access to DynamoDB is via the console, the CLI or the API; you don't have SQL, the structured query language, when using DynamoDB.
If you see any questions or answers which mention SQL or the structured query language then that probably excludes DynamoDB as being a correct answer.
Now the billing for DynamoDB is based around a table it's based on the RCU and WCU values that you set on a table as well as the amount of storage required for that table and any additional features that you enable on that table.
So it's a true on-demand database product. There are no infrastructure costs for running DynamoDB, no base costs; you essentially pay only for the resources that you consume, either storage, operations or the capacity requirements that you specify on a table. In addition, you're able to purchase reserved allocations for capacity, so if you do know that you have a requirement for long-term capacity on a DynamoDB table then you can purchase reservations; in return for a longer-term commitment you get a cheaper rate. But with that being said, that's everything I wanted to cover in this lesson, so go ahead, complete this video, and then when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about a powerful feature of cloud formation called custom resources.
Now we've got a lot to cover so let's jump in and get started straight away.
The way that cloud formation is architected isn't complicated.
You define logical resources within a template and these define what you want cloud formation to do.
So what infrastructure you want it to create.
Cloud formation uses these logical resources in a template to create a stack and this stack creates physical resources.
If you update the logical resources by updating and reapplying a template then the physical resources are updated.
If you remove a logical resource from the template and then reapply that template to a stack, then the corresponding physical resources are removed in the same way.
Cloud formation doesn't support everything within AWS.
It can lag behind in terms of products or features of those products and there are some things which it just doesn't support or things it never will support.
Cloud formation custom resources are the answer to anything that you want to do in cloud formation that it doesn't support natively.
Custom resources are a type of logical resource which allow cloud formation to do things it doesn't yet support or doesn't natively support or they allow integration with external systems.
Examples of things that you can do with custom resources might be to populate an S3 bucket with objects when you create it or to delete objects from a bucket when that bucket is being deleted.
Something that will normally error.
If you try to delete a cloud formation stack which contains a bucket with objects within it by default it won't allow you to do that.
It will error.
Another example is that you might want to request configuration information from an external system as part of setting up an EC2 instance.
You can even use custom resources to provision non-AWS resources.
So using custom resources the functionality of cloud formation can be extended much beyond what it can support natively.
Now the architecture of custom resources is simple.
Cloud formation begins the process of creating the custom resource and in doing so it sends data to an endpoint that you define within that custom resource.
This might be a lambda function or it could be an SNS topic.
Whichever one you pick it sends an event to this thing.
Whenever a custom resource is created, updated or deleted then cloud formation sends data to that custom resource.
It sends event data which contains the operation that's happening as well as any property information.
And so the custom resource for example a lambda function is invoked and provided with that information.
Now the compute that's backing that custom resource, let's use the example of a lambda function, can respond to cloud formation letting it know of the success or failure and it can pass back in any data.
Assuming a lambda function which backs a custom resource responds with a success code then everything is assumed to be good.
The custom resource is created.
Any data generated by that lambda function is passed back in to cloud formation and it's made available to anything else within the cloud formation template.
So again two options with how you can back custom resources are lambda or an SNS topic.
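Before we go visual, here's a hedged sketch of what a Lambda-backed custom resource handler tends to look like in Python. CloudFormation sends an event containing the request type, any resource properties and a pre-signed response URL, and the function PUTs a JSON document back to that URL. The property name BucketName and the empty Create/Delete branches are placeholders for illustration, not code from the lesson.

```python
import json
import urllib.request

def handler(event, context):
    status, data = "SUCCESS", {}
    try:
        bucket = event["ResourceProperties"].get("BucketName")  # hypothetical property passed from the template
        if event["RequestType"] in ("Create", "Update"):
            pass  # e.g. populate the bucket with objects here
        elif event["RequestType"] == "Delete":
            pass  # e.g. empty the bucket here so the stack delete can succeed
    except Exception as error:
        status, data = "FAILED", {"Error": str(error)}

    body = json.dumps({
        "Status": status,
        "Reason": "See CloudWatch Logs: " + context.log_stream_name,
        "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }).encode()

    # Signal success or failure to CloudFormation via the pre-signed response URL.
    request = urllib.request.Request(
        event["ResponseURL"], data=body, method="PUT",
        headers={"content-type": ""},
    )
    urllib.request.urlopen(request)
```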
Now let's look at this visually because it will help you understand the architecture and then I'll show you a practical example from the AWS console.
So let's consider a scenario where a cloud formation template is used to create a stack which creates an S3 bucket and let's look at this without using a custom resource.
We start with a cloud formation template, it's a simple one which contains a simple S3 bucket logical resource and using this template we create a stack and this creates a logical resource inside this stack and the stack creates the corresponding physical resource which is an S3 bucket.
At this point if you deleted the stack it would delete the logical resource which would delete the physical resource and if the bucket is empty everything would work as expected.
But let's say at this point a human gets involved, Gabby and Gabby makes a manual change by adding additional objects into the bucket.
Now we have a problem because the physical resource is out of sync with cloud formation and we have an even bigger problem because we have a bucket with objects inside it.
If we tried to delete the stack at this point cloud formation would attempt to delete the logical resource which would attempt to delete the physical one but because the bucket contains objects the delete operation would fail.
This is just a limitation of S3 and cloud formation.
It's the type of situation you might hit when you're dealing with any complex architecture or when you need to do something that cloud formation just doesn't support and this is one of the situations which custom resources aims to help with.
Let's look at what capabilities custom resources provide us which might help in this situation.
So we start off with the same basic components.
We've got a cloud formation template which creates a stack.
The template though this time has more resources than just the S3 bucket but first it creates an empty bucket as with the above example.
But in addition it has a custom resource and that custom resource is supported by a lambda function.
Now because this custom resource is backed by a lambda function it means that when the custom resource is being created by the stack the lambda function is invoked or executed and it's passed some data.
This data is event data and this data block contains anything given to the resource as properties.
In this example let's assume that the custom resource is provided with the bucket name of the bucket created by the cloud formation stack so the empty bucket.
Now for the sake of example let's say that we've designed this custom resource of this lambda function to download some new objects into this empty S3 bucket and this now means that we have a bucket with objects inside it.
The same problem that we had with the previous example.
Now in addition to the bucket name being provided to this custom resource, the event data also contains details of how the Lambda function, or whatever is backing the custom resource, can respond back to CloudFormation, and this is called the response URL. Because the Lambda function has completed successfully, it sends a success response back to this response URL, which means the logical resource will create successfully and the stack itself will move into a create complete status. So all is good.
So we've used a custom resource at this point to download some additional objects into that S3 bucket so now we have a cloud formation stack in the create complete status and an S3 bucket with some objects within it.
Now at this point let's say that we have a human come along let's say it's Gabby again and Gabby uploads some additional objects to the S3 bucket and let's say there's some additional cat images so we still have a bucket with objects only now it's more objects than the custom resource added earlier so we've got the objects added by the custom resource and the three additional cat pictures that Gabby has just manually uploaded.
Now let's say that we're going to do a stack delete operation so we select the stack we right-click on it and we select delete stack this starts the process of deleting the stack now the stack has two logical resources and two corresponding physical resources the bucket with objects and the custom resource which is backed by a lambda function.
Now cloud formation in the above example immediately tried to delete the bucket with objects and that's why it failed but in this example because when the custom resource was created it needed the bucket to already have been created it means that cloud formation knows that the custom resource depends on the bucket so there's a dependency and so when you're deleting a stack cloud formation will follow the reverse of that dependency and so that means the custom resource will be deleted before the S3 bucket.
What happens now is that cloud formation starts the deletion of all of the resources contained in the stack but it starts with the custom resource and the way that it does this is to send the message through to the lambda function the message has a similar structure to when the stack was created so it's an event data block and this time it contains the fact that the stack is being deleted and it still contains the name of the S3 bucket.
The lambda function performs whatever actions are configured for a delete operation which in this case is to remove all of the objects from the S3 bucket and once this has been completed once the lambda function has completed all of its operations it will signal back to cloud formation that this was successful and again this will happen by using the response URL that's contained in the event data.
Once the success response has been completed then the stack will delete the custom resource.
Once the custom resource has been deleted there'll be no further dependencies inside the stack and the stack will go ahead and delete the S3 bucket.
This time it will succeed because the S3 bucket is empty, and because the deletion of the S3 bucket completes successfully, the stack itself can be deleted and the whole process completes successfully. So by using a custom resource we can add additional capability to CloudFormation.
We can make it download additional objects into an S3 bucket, and also have it clean up any objects, including those added by a human outside of CloudFormation, before the S3 bucket is deleted. By doing it this way we can avoid any stack deletion issues caused by buckets which contain objects, so this is a simple example of how we can extend the functionality of CloudFormation by using custom resources.
With that being said that's everything I wanted to cover go ahead and complete this video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to briefly cover CloudFormation change sets.
Now this is a feature which makes it safer to use CloudFormation within a full infrastructure as code environment or when CI/CD processes are being used within your organization.
So let's jump in step through what change sets are and what benefits they provide.
The usual flow that you follow with CloudFormation goes something like this.
You take a template, use it to create a stack which creates physical resources based on the logical resources in the template.
That's a create stack operation.
Or you delete a stack which deletes the physical resources created by the stack.
Or you can take a newer version of a template, maybe it has additional resources or maybe it's a bug fix.
In either case you take that new template, apply it to an existing stack and this changes existing physical resources and this is known as an update stack operation.
When a stack update occurs, when logical resources are changed which results in changes to physical resources, that change has one of three effects.
We have no interruption and this is where certain changes made to a stack might not impact the operation of the physical resource.
The change is just made and that's it.
Next is some interruption which might mean something like an EC2 instance rebooting.
It's not a damaging event but it can impact service.
And finally certain changes might cause a replacement which creates a new copy of that physical resource and the old one is removed.
This is disruptive and can result in data loss so it's critical to keep this in mind when making changes to existing cloud formation stacks.
Now change sets let you apply a new template to a stack but instead of applying the change immediately it creates a change set which is an overview of the changes to be applied to the stack.
What makes change sets powerful is that you can create many different change sets for a stack so you can preview different changes with different new versions of the template and when you've reviewed the change set or change sets then you can choose to discard them or you can pick one to apply by executing it on the stack which creates the stack update operation and updates the logical and physical resources managed by that stack.
Now visually the architecture looks like this.
Let's use a simple example and don't worry we're going to be stepping through this in the console very shortly.
So this example is a CloudFormation template which creates three buckets: catpics, dogpics and memes. We use this to create a stack which creates the three S3 buckets. So far so good, but now let's say that we didn't actually mean to create the memes bucket; we only like animal pictures, memes are just no good. So we create a new template and we use this to update the stack, and because we're not using change sets, it immediately deletes one of the S3 buckets, the memes bucket. The memes bucket has been removed from the template, its logical resource is because of that also removed, and that means the physical resource that the stack manages is deleted from your AWS account.
Now using change sets we can improve this. The starting point is the same: we create a stack with the version 1 template, but instead of using the version 2 template to update the stack directly, we create a change set. Now this is a distinct thing, an object which represents the change between the original stack and the new version of the template. We can create one or more of these, and when we're satisfied we can execute a change set against the stack. This has the same effect as the top method, but we have more control and visibility over the changes, especially with larger and more complex templates.
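The same workflow can be driven programmatically; here's a hedged boto3 sketch that reuses the stack and change set names from the console demo that follows, with a hypothetical template filename.

```python
import boto3

cfn = boto3.client("cloudformation")

with open("template2.yaml") as f:      # hypothetical filename for the version 2 template
    cfn.create_change_set(
        StackName="changesets",
        ChangeSetName="changesets-version2",
        TemplateBody=f.read(),
        ChangeSetType="UPDATE",        # "CREATE" would target a brand-new stack
    )

# Review what would happen before the stack is touched.
details = cfn.describe_change_set(StackName="changesets",
                                  ChangeSetName="changesets-version2")
for change in details["Changes"]:
    rc = change["ResourceChange"]
    print(rc["Action"], rc["LogicalResourceId"])

# Only executing the change set actually updates the stack.
cfn.execute_change_set(StackName="changesets",
                       ChangeSetName="changesets-version2")
```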
Okay, so I'm going to move across now to my console and demo how this works in practice. If you really want, you can also do this in your own environment; I've included the CloudFormation templates within this lesson's folder on the course GitHub repository. But for this one I would suggest just watching; it's probably not worth the effort of implementing yourself, I just want you to be aware of how this looks from the console UI.
Okay, so I've switched over to my console, and to do this demo lesson I need to be logged in as the IAM admin user of the general AWS account, so that's the management account of the organization, and as always I've got the Northern Virginia region selected. So I'm going to go ahead and move across to the CloudFormation console; I'll type CloudFormation in the search box at the top and then click to move to the CloudFormation console.
Just before I apply anything, these are the templates which I'm going to be using in this really brief demo lesson. We've got template one, which is a really simple CloudFormation template that creates three S3 buckets, and then template two, which is going to be the one that I update the stack with; this only has two S3 buckets. So the first template creates catpics, dogpics and memes, and the second template only has catpics and dogpics.
Let's move back to the console: create stack, upload a template file, choose file, and then, depending on the course that you're doing, inside the infrastructure as code folder or the CloudFormation folder or the deployment folder (the name will vary depending on the course) you should see a change sets folder, and in there we've got these two templates, template one and template two. Select template one and click on open. Then I'm going to scroll down and click on next, I'm going to call this stack changesets, click on next, scroll to the bottom, next again, scroll to the bottom and click on create stack.
So this is going to create our three S3 buckets. We can see that it's doing them in parallel: catpics, dogpics and memes. That should only take a few seconds; there we go, one more refresh and it's moved into create complete. Now that it's in create complete, if I click on resources we'll be able to see the three logical resources, catpics, dogpics and memes, and their corresponding physical resources, so the three S3 buckets.
Now there's a change sets tab which, if we click on it, shows that there are no current change sets for this stack, and we can create a change set from here, or we can go to stack actions and then create change set for current stack. So that's what we're going to do, we're going to create a change set, so I'm going to click on create change set. This dialogue looks much like the one that you would get if you were just updating a stack without using change sets. I'm going to replace the current template, upload a template file, and choose template two, and then once that's loaded click on next, click on next again, scroll down, next again, scroll all the way down and then create the change set.
And we're going to name the change set, so I'm going to call it changesets-version2. You can call this anything you want, but I always find it useful to have the stack name at the start and then some kind of version indication, and if you wanted to type a description you could do that, let's say removing memes, and then create the change set. Initially it will show as create pending; I'll hit refresh and it will show as create complete, and now this is a separate entity in its own right. We've created a change set, we've not actually updated the original stack, so if I go back to stacks we'll still see the changesets stack; if I click on it and go to resources, all three of the logical and physical resources still exist. So whilst we've uploaded this new template version and created a change set, we haven't actually done anything with this change set.
So to use a change set, let's go and click on the change sets tab and then open up this change set, and we'll see a visual list of all the changes that CloudFormation has detected between the version of the template the stack's using and this change set. In this particular case it's telling us that an action of remove is occurring against the logical ID of memes and this physical ID. So because we've removed this logical resource from the template, it's telling us that the logical resource will be removed along with the corresponding physical resource.
We can click on this template tab and see an overview of the template that's part of this change set, in this case the template without the memes logical resource. If I click on the JSON changes button, you'll see a JSON-formatted overview of exactly what's changed between the version of the template the stack's using and the version in this change set. It's a list of JSON objects, one JSON object per change, so in this case there's only one change, which is logical resource ID memes, and the action is to remove. This is how you can get a really accurate overview of the changes between the version of the template that the stack's using and the version of the template that's contained within this change set, so it can form part of a really rigorous change management process within your business.
Now just keep in mind, at this point I could have this single change set for the stack, or I could have ten change sets; I can list them all individually and I can delete them all individually. But if I wanted this change set to be applied against the stack, I could do so by clicking on execute, so I'm going to do that. As soon as I click on execute it's going to run the update stack operation, and at this point the changes will be exactly the same as if you'd just updated the stack, applied a new template and executed that immediately. The template that the stack is using will be changed to the one contained in the change set, the logical resource in this case will be removed, and the corresponding physical resource will also be removed. The end effect of this is an updated stack; it has an update complete status, and if we go ahead and click on resources we can see that the memes bucket has been completely removed from this stack. And that's a really simple example of how you can use change sets within CloudFormation.
Now at this point that's everything which I wanted to demonstrate in this lesson. I'm going to go ahead and click on delete and then delete stack, and if you've been following along in your own environment you need to do the same to return the account to the same state as it was at the start of this lesson. With that being said, that is everything I wanted to cover, so go ahead, complete this video, and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this theory lesson I'm going to cover another CloudFormation feature called CFN-HUP.
Now we have a decent amount to cover so let's just jump in and get started.
Now as a refresher CFN-init is a helper tool which runs on EC2 instances during bootstrapping.
The tool loads metadata stored within the logical resource in a cloud formation stack and it's a desired state configuration tool which applies a desired state to an EC2 instance based on the metadata of that instance's resource.
But, and this is important, it's only run once.
If you change the cloud formation template and update the stack then CFN-init isn't rerun so the configuration isn't reapplied.
CFN-HUP is an extra tool which you can install and configure on an EC2 instance so you're responsible for installing and configuring it but this can be handled within the same bootstrapping process as any other configuration when the instance is launched.
Now CFN-HUP gets pointed at a logical resource in a stack and it monitors it.
It detects changes in the resource's metadata.
This occurs for example when you update the template so when you change the metadata and then perform an update stack operation.
When CFN-HUP detects a change then it can run configurable actions and commonly you might rerun CFN-init to reapply the instance's desired state.
So if you change the metadata, update a stack then CFN-HUP will detect this, rerun CFN-init which will apply that change.
The end effect is that when you use both of these tools together, any update stack operations which change the metadata can also update the EC2 instance's operating system configuration, and this is something which isn't normally the case.
Now visually this is how it looks.
We have a CloudFormation template which is used to create a stack and the stack creates an EC2 instance.
Now the template in this case contains some metadata which is for the EC2 instances logical resource and let's assume that this is used initially to configure the instance.
Maybe to configure WordPress or install some other service or application.
Now this is how CFN-HUP interacts in this type of architecture.
Let's say that we change the template and then when we change the template we perform an update stack operation because we've previously installed CFN-HUP on the instance.
This is periodically checking the metadata for the logical resource for that instance in the stack.
When it detects a change we've configured it to run CFN-init, and CFN-init then downloads the new metadata for that instance and applies that new configuration.
It's a simple but super powerful architecture and you'll be getting practical experience of using this in an upcoming demo lesson.
For now I just wanted you to be aware of two things.
Firstly if you update a stack that doesn't automatically rerun any bootstrapping on resources in that stack.
So if you want to change the metadata for a resource to include an additional application install that doesn't automatically get applied.
If you're using normal user data that's only by default executed once when you launch the instance.
If you update a stack and change that it's not automatically reapplied to the instance.
So the way that you need to do this is install and configure CFN-HUP as part of the initial bootstrap process.
Configure it to monitor the logical resource for that instance and then when any changes are made it can then initiate another CFN-init to apply that desired configuration.
So you're going to experience this practically within an upcoming demo lesson.
For now though that's everything so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about a feature of CloudFormation called CloudFormation init.
It's another way that you can provide configuration information to an EC2 instance.
So far you've experienced bootstrapping via the user data and this is an alternative.
Now let's just jump in and get started as we've got a lot to cover.
CloudFormation init is a simple configuration management system.
So far you've used user data to pass scripts into an EC2 instance.
Now this isn't a native CloudFormation feature.
What you're essentially doing is passing in a script through EC2 using the user data feature which is an EC2 feature into the operating system running on the instance where it's executed.
Now CloudFormation init is a native CloudFormation feature.
Configuration directives are stored in a CloudFormation template along with the logical resource it applies to an EC2 instance.
So we have an AWS::CloudFormation::Init section of a logical resource.
This is part of an EC2 instance logical resource and it's here where you can specify directives of things that you want to happen on the instance.
The really important distinction that you have to understand is that user data is procedural.
It's a set of commands executed one by one on the instance operating system.
You're essentially telling the instance operating system how to bootstrap itself.
You're giving the instance the how.
How you want things to be done.
CloudFormation init on the other hand this is a desired state system.
You're defining what you want to occur but leaving it up to the system as to how that occurs and that makes it in many different ways much more powerful.
Not least of which because it means that it can be cross platform.
It can work across different flavors of Linux and in some cases on Linux and Windows running on EC2 instances.
Now it's also idempotent, meaning if something is already in a certain state, running CloudFormation init will leave it in that same state.
If Apache is already installed and your CloudFormation init configuration wants Apache installed, then nothing will happen.
If CloudFormation init defines a config file for a service and declares that that service should be started and if both of those things are already true then nothing will happen.
It's much less hassle than having to define within your script's logic as to what should occur if something is already the case.
By using the desired state feature of CloudFormation init it's much easier to design and easier to administer because you just need to define the state that you want instances to be in.
Now accessing the CloudFormation init data is done via a helper script called cfn-init which is installed within the EC2 operating system.
This is executed via user data.
It's pointed at a logical resource name, generally the logical resource for an EC2 instance that it's running on.
It loads the configuration directives and it makes them so.
Now it's probably going to be easier to understand CloudFormation init along with the cfn-init helper tool if we look at it visually.
It all starts with a CloudFormation template.
This one creates an EC2 instance and you'll see this yourself very soon in a demo lesson.
The template has a logical resource within it for an EC2 instance and this has a new special component.
Metadata, containing an AWS::CloudFormation::Init key, which is where the cfn-init configuration is stored.
Now the cfn-init helper tool is executed from the user data, and so like most EC2 logical resources we pass in some user data. But note how this user data is very minimal, containing only cfn-init, which implements the configuration that we define, and then cfn-signal, which is used to tell CloudFormation when the bootstrapping is complete.
So the template is used to create a stack which creates an EC2 instance.
The cfn-init line in the user data at the bottom is executed by the instance and this should make sense now everything in the user data section is executed when the instance is first launched.
Now if you look at the command for cfn-init you'll notice that it specifies a few variables, the stack ID and the region.
Remember this instance is being created by CloudFormation.
These variables are replaced with the actual values before the user data ends up within the EC2 instance.
So the region is the actual region that the stack is created in, and the stack ID is the actual ID of the stack that we're currently using. These are passed to the cfn-init helper tool, which allows cfn-init to communicate with the CloudFormation service and receive its configuration. It can do that because the actual values for the region and the stack ID are passed in via user data by CloudFormation, and once the cfn-init helper tool has this data it can perform the configuration which has been defined within the logical resource.
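As an illustration, here's a minimal sketch of what that user data section might look like, assuming the instance's logical resource is called EC2Instance (the resource name and AMI ID are illustrative):

EC2Instance:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: ami-0123456789abcdef0   # placeholder AMI ID
    InstanceType: t3.micro
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        # apply the desired state defined under Metadata -> AWS::CloudFormation::Init
        /opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource EC2Instance --region ${AWS::Region}
        # report the success or failure of cfn-init back to CloudFormation
        /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource EC2Instance --region ${AWS::Region}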
Now you're going to experience this in a demo which is coming up slightly later in this section, but before we do that I want you to focus on the AWS::CloudFormation::Init section within the EC2 resource on the left, so under Metadata and then under AWS::CloudFormation::Init.
We're going to come back to config sets specifically but all of those others are known as config keys.
Think of them as containers of configuration directives and each of them contains the same sections.
So we have packages which defines which packages to install, groups which allow us to define directives to control local group management on the instance operating system, users which is where we can define directives for local user management, sources which lets us define archives which can be downloaded and extracted, files which allow us to configure files to create on the local operating system, commands which is where we can specify commands that we want to execute and then finally services which is where we can define services that should be enabled on the operating system.
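As a rough sketch, a single config key containing a few of those sections might look like this; the package, file, command and service contents are illustrative:

Metadata:
  AWS::CloudFormation::Init:
    config:
      packages:
        yum:
          httpd: []                    # install the Apache web server package
      files:
        /var/www/html/index.html:
          content: "Hello from cfn-init"
          mode: "000644"
          owner: root
          group: root
      commands:
        01_log_build:
          command: echo "bootstrap finished" >> /var/log/bootstrap.log
      services:
        sysvinit:
          httpd:
            enabled: true              # start Apache at boot
            ensureRunning: true        # make sure it's running once cfn-init completes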
Now often within CloudFormation init you'll define one set of config, so one config key containing one set of packages, groups, users, sources, files, commands and services, but you can also extend this by defining config sets.
You can create all of these different config keys and then pick from that list and bundle them into a config set, which defines which config keys to use and in what order.
Now if you look at the cfn-init line in the user data at the bottom of your screen, we're using one specific config set called wordpress_install, and this uses all of the config keys defined on the left: install_cfn, software_install, configure_instance, install_wordpress and configure_wordpress. We could have others, maybe ones which upgrade WordPress or install a completely different application, but whatever the configuration we have in the logical resource, we use the cfn-init helper tool and specify the stack ID, the particular logical resource, the region and then the config set to use, in this case wordpress_install.
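In the template, that config set might be declared something like this; a sketch only, with key names mirroring the ones just mentioned:

Metadata:
  AWS::CloudFormation::Init:
    configSets:
      wordpress_install:          # config keys to run, in this order
        - install_cfn
        - software_install
        - configure_instance
        - install_wordpress
        - configure_wordpress
    # each key listed above is then defined alongside, with its own
    # packages, files, commands and services sections

and the user data selects it with something like:
/opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource EC2Instance --region ${AWS::Region} --configsets wordpress_install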
Now again, don't worry if this is a little bit confusing, this is just the theory. We're going to be doing one more theory lesson about cfn-hup, which is another helper tool available within CloudFormation, and once we've done that theory lesson you're going to do a demo lesson which uses both cfn-init and cfn-hup. So by the end of that demo lesson you're going to understand how to use both of these helper tools, both individually and combined, to provide a really good bootstrapping and configuration system.
Now that's all of the theory that I wanted to cover. In the next lesson, as I've just mentioned, we're going to be covering the theory of cfn-hup, so at this point thanks for watching, go ahead and complete this lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to discuss another feature of cloud formation which you will need for the exam so let's jump in and get started.
You know by now that when you create a stack, CloudFormation creates logical resources based on what's contained in the CloudFormation template, and for each of those logical resources it also creates a corresponding physical resource within AWS.
Now everything which happens inside AWS requires permissions, and as you know by now the default permissions within AWS are zero permissions.
Now by default CloudFormation uses the permissions of the identity who is creating the stack to create AWS resources.
Examples of this might be an IAM user interacting with the console UI or using the command line.
This means that by default you need the permissions to create, update or delete stacks and permissions to create, update or delete any resources for the stacks that you're creating.
So without any other functionality in order to interact with an AWS account using cloud formation you need both permissions to interact with stacks and permissions to interact with AWS resources.
Now for many organizations this is a problem because there are often separate teams who create resources and then others which are allowed to update them or support them.
Cloud formation stack roles is a feature which allows cloud formation to assume a role and via assuming that role gain the permissions required to interact with AWS and create, delete or update resources.
This allows us to use a form of role separation.
One team can create stacks and the permission sets required to implement them, and then the identity creating, updating or modifying a stack only needs permissions on that stack plus the iam:PassRole permission for the stack role.
So an identity that's interacting with a stack using stack roles no longer needs the permissions to interact directly with the resources themselves and this is really powerful.
It means an admin user could create a stack with an associated role attached and then a non admin user could be given permissions to interact with that stack using that role without ever having to have the permissions to interact with those resources.
Now this is going to be easier to understand visually so let's have a look at that next.
Now we have two main identities in this example scenario.
We have Gabby on the left who is an account administrator for this AWS account and then on the right we have Phil who is a help desk engineer and in the middle we have the AWS account that both of these identities have to interact with.
Now normally if we wanted Phil to create any AWS resources then he would need to be allocated those permissions either by assuming a role himself or by having the permissions attached either to his user directly or via any groups he is a member of.
Now we want Phil to be able to manage the infrastructure via cloud formation only and either of those options would mean Phil could create the resources directly and this we don't want.
Cloud formation can be a great tool that we can use to control the types of things that can be created, modified or deleted by identities which have lesser permissions.
So using stack roles step one would be that Gabby could create an IAM role with the permissions to interact with AWS resources.
This role has the permissions to interact with the resources but crucially Phil can't edit the role and he has no permissions to directly assume the role.
Phil only has permissions to create, update and delete stacks as well as the permissions to pass the role into cloud formation and that's what he does.
He takes a template that Gabby has created earlier and he uses it to create a stack.
While doing so he passes the role that Gabby's created into the stack by selecting it within the console UI.
This means that the role is attached to the stack and it will be used rather than Phil's own permissions when the stack is performing any resource operations on AWS, meaning that Phil doesn't need the permissions to directly manipulate those resources.
When the stack starts creating it can assume the associated stack role and use the permissions that it gets to create all of the resources within the AWS account so it no longer has to rely on the permissions that Phil has directly associated with his IAM user.
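To make this more concrete, the kind of policy Phil might hold could be sketched like this in a template Gabby manages; the policy name, role name, account ID and exact action list are illustrative assumptions, not taken from the lesson:

HelpDeskStackOnlyPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Sid: ManageStacksOnly               # stack operations, but no direct resource permissions
          Effect: Allow
          Action:
            - cloudformation:CreateStack
            - cloudformation:UpdateStack
            - cloudformation:DeleteStack
            - cloudformation:DescribeStacks
            - cloudformation:DescribeStackEvents
          Resource: "*"
        - Sid: PassStackRoleToCloudFormation  # allow handing the stack role to the CloudFormation service
          Effect: Allow
          Action: iam:PassRole
          Resource: arn:aws:iam::111122223333:role/cfn-stack-role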
Now this is a really simple example but it's an example of role separation.
Gabby as an administrator can create things within the AWS account that Phil can use.
Gabby herself might not even be allowed to use the things that she creates.
Phil on the other hand doesn't have the permissions to create permissions himself he can't edit the role he can only use it and only with cloud formation.
When using it he doesn't need the permissions that Gabby has to create resources he can simply pass this role into cloud formation and then that gives cloud formation the permissions that it requires to interact with resources inside the account.
Now in the exam if you see this type of scenario where an identity needs to use cloud formation to do things that they wouldn't otherwise be allowed to do outside of cloud formation then stack roles is a great solution to allow this.
You can have one IAM admin user provision an IAM role with the permissions required, and then give the identity with the reduced access only the rights to pass that role into CloudFormation, plus the permissions required to interact with stacks in CloudFormation. With the combination of those two things, the user, Phil in this case, can perform actions on the AWS account in a safe and controlled way that he otherwise wouldn't have the permissions to do.
Now that's everything that I wanted to cover in this lesson stack roles is not a complicated feature but it's a powerful one and it's one that you'll need to be fully comfortable with for the exam.
With that being said thanks for watching go ahead complete the video and when you're ready I'll look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk briefly about a feature of CloudFormation called deletion policies.
Now this is a pretty simple feature to understand but it's one that you'll be using extensively if you're deploying larger production systems into AWS using CloudFormation.
So let's jump in and get started.
So what is a deletion policy?
Well if you delete a logical resource from within a CloudFormation template and then apply that template to an existing stack or if you delete a stack entirely then the default behavior of CloudFormation is to delete the corresponding physical resource.
Now with certain types of resources this can cause data loss.
If you're deleting RDS databases or EC2 instances with attached EBS volumes then deleting these physical resources can actually delete data that lives on those resources.
Now a deletion policy is something that you can define on each resource within a CloudFormation template and depending on the type of resource you're able to specify a certain action which CloudFormation should take when that physical resource is being deleted.
Now the default is that CloudFormation will delete a physical resource when the corresponding logical resource is deleted.
You can also specify retain and that simply means that CloudFormation will not delete the physical resource if the corresponding logical resource is deleted.
So if every logical resource within a CloudFormation template is set to retain then when you delete the stack none of the physical resources will be removed.
They're retained inside the AWS account.
Now for a smaller subset of supported resources you can specify the snapshot option for the deletion policy.
Supported resources include EBS volumes, ElastiCache, Neptune, RDS and Redshift, and when you specify the snapshot option for any of these types of resources, then before the physical resource is deleted a snapshot of that resource is taken.
So for example, if you have an EBS volume defined within a CloudFormation template using the snapshot deletion policy, and you delete that logical resource and then reapply the template to the stack, CloudFormation will delete the physical resource, but not before it's taken a snapshot, and these snapshots continue past the stack lifetime.
So if you delete a stack and you have snapshots selected then these snapshots are your responsibility to clean up.
You have to clean them up otherwise they'll continue to incur costs because they're essentially storage within AWS and as with any snapshots that comes with an associated cost.
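As a quick sketch of how this is attached to resources in a template (logical names and property values are illustrative):

DataVolume:
  Type: AWS::EC2::Volume
  DeletionPolicy: Snapshot     # take an EBS snapshot before the physical volume is deleted
  Properties:
    AvailabilityZone: us-east-1a
    Size: 20                   # GiB

LogsBucket:
  Type: AWS::S3::Bucket
  DeletionPolicy: Retain       # keep the bucket and its data when the logical resource or stack is deleted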
Now one really important aspect of this to understand is that deletion policies only apply to delete style operations.
Now what this means is that if you have a logical resource in a template and you remove it and then apply that template to a stack, or if you delete the stack, then that's a deletion operation and the deletion policy will apply. But it's possible to subtly change a logical resource in a template and then reapply that to the stack in a way that causes the physical resource to be replaced, which is essentially the same as a delete and then a recreate.
A deletion policy will not apply in this case, and if a resource is replaced then any data on the original resource will be lost. So it's important to understand that this applies only to deletion operations; it does not apply to changes to logical resources which cause replacement of physical resources.
So this is how it looks visually. At the top we create a stack which creates an EC2 instance, an EBS volume and an RDS instance. On the left is what happens when we delete that stack with the default behavior of CloudFormation: all of the physical resources are removed from the account.
In the middle we have the same scenario but with the retain deletion policy. In this case all three resources, the instance, the EBS volume and the RDS instance, are retained; they remain untouched in the account after the stack has been deleted.
Now on the right we have the snapshot option, and it's important to note that snapshot is not supported for EC2 instances, so we can't choose this option for EC2. For resources which do support it, such as an EBS volume or an RDS instance, the result is that an EBS snapshot or an RDS snapshot remains after the stack has been deleted, so you will be responsible for managing these snapshots; if you want to delete them after a certain period of time, that is entirely your responsibility.
Now this is all of the theory that I wanted to talk about in this lesson. I'm not going to cover it in any more depth because this is something that you'll get experience of yourself in the demo lessons in any of my courses. For now I just wanted to introduce the theory and make sure that you're aware that the snapshot option is not supported on all AWS resource types, which is really critical to understand.
At this point though thanks for watching go ahead and complete this video and then when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I'm going to be covering stack sets.
And stack sets are a feature of CloudFormation which allows you to create, update or delete infrastructure across many regions potentially in many AWS accounts.
At a high level stack sets allow you to use CloudFormation to deploy and manage infrastructure across many accounts and regions in those accounts.
Rather than having to authenticate to each account individually and switch to each region you can let CloudFormation do all of the hard work on your behalf.
Now let's cover the key concepts first before we look at the architecture visually.
First we have stack sets themselves.
Now think of stack sets as containers and these containers are applied to an admin account.
I don't want you to think of the admin account as anything special.
We just refer to it as an admin account when we're talking about stack sets to distinguish the account where a stack set is applied from all of the other accounts where CloudFormation creates resources.
So again stack sets are containers and stack sets contain stack instances.
Now these aren't the same thing as stacks.
You can think of stack instances as references.
So references to actual stacks running in specific regions in specific AWS accounts.
So stack instances reference one particular stack in one particular region in one particular AWS account and a stack set can contain many stack instances.
Now the reason that stack instances are treated separately from stacks is that if a stack fails to create for any reason, the stack instance remains to keep a record of what happened and why the stack failed to create.
So think of a stack instance as a container for an individual stack.
So to summarize stack sets are applied in an admin account.
Stack sets reference many stack instances and stack instances are containers for individual stacks which run in a particular region in a particular account.
Now stack instances and stacks are created within target accounts.
Now target accounts are just normal AWS accounts that we refer to as target accounts because these are the accounts that stack sets target to deploy resources into.
So a stack set architecture consists of a stack set in an admin account referencing stack instances and stacks which are in the target accounts and regions that you choose.
Now each cloud formation stack created by stack sets is just a normal cloud formation stack.
It runs in one region of one account and the way that all of this multi-account multi-region architecture is created on our behalf by stack sets is by using either self- managed roles or service managed roles and service managed roles are where we use cloud formation in conjunction with AWS organizations so all of the roles get created on your behalf by the product behind the scenes.
So you can either use service managed security where everything's handled by the products for you or you can go ahead and manually create roles and then use self-managed roles to get the permissions so that cloud formation stack sets can create infrastructure across many different accounts in many different regions.
Now visually stack sets look like this so using a stack set starts off in an admin account and this is an AWS account and this is the one that's on the left of your screen.
It's in this account that we create our stack set, and we'll call this the Bucketatron because we need a lot of S3 buckets for storing some cat-related images.
Now an important thing to be aware of is that a template used to create a stack set it's just a normal template it's nothing special.
For this example let's assume that we have a very simple template which creates a single S3 bucket.
Using stack sets, we also have some other accounts; these are called the target accounts. In this example we have two different AWS accounts, and in each of these accounts we've got two regions. It doesn't matter which ones, so let's just call them region 1 and region 2.
Now I mentioned on the previous screen that permissions for stack sets either come in the form of self-managed IAM roles or via service managed permissions as part of an AWS organization which the target accounts are members of.
Cloud formation is essentially in either case assuming a role to interact with all of these target accounts.
Now as part of the stack set creation you indicate which organizational units or accounts you want to use as targets, you give the regions, and then CloudFormation begins interacting with those accounts, using roles for permissions.
Now what it's doing is creating stack instances within each region that you pick within each target account that you select as part of creating the stack set.
Stack instances remember are just containers they're things that record what happens in each stack that's created by stack sets in each region in each account that you select.
So once we're at this point, once we've created the stack set, once CloudFormation has used the IAM roles to integrate with each of the target accounts that you've selected, and once the stack instances have been created, then the stacks themselves are actually created.
One per region in each target account that you've selected as part of creating the stack set, and this stack creation process in turn creates the resources which are defined within the template.
Now with this example without using stack sets we have two regions and two accounts so this makes a total of four stacks which would need to be created for the desired infrastructure.
Now what if instead of two regions we used them all and instead of two accounts maybe we have 50.
Then the effort reduction provided by stack sets starts to become a little bit more obvious.
So what kind of things can stack sets be used for?
Let's look at that next with some other key concepts that you need to be aware of for both the exam and real-world usage.
Now there are a few terms that you need to be aware of.
The first is concurrent accounts so this is an option that you can set when creating a stack set and this defines how many individual AWS accounts can be used at the same time.
So if you're deploying a stack set which is deploying resources into, say, 10 different accounts and you define a concurrent account value of two, then only two accounts can be deployed into at any one time, which means that over 10 accounts you'll be doing five sets of two.
So the more concurrent accounts that you set in theory the faster the resources will be deployed as part of a stack set.
We've also got the term failure tolerance, and failure tolerance is the number of individual deployments which can fail before the stack set operation itself is viewed as failed.
So you need to decide this value carefully especially for larger infrastructure deployment and management.
Next we've got the term retain stacks. You're able to remove stack instances from a stack set, and by default this will delete the stacks in the target accounts, but you can set it so that when you remove stack instances from different AWS accounts, OUs and regions, any CloudFormation stacks within those regions and accounts are retained.
So by default it will delete the actual stacks but you can set it to retain them.
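As an illustration of how these options surface, here's a rough AWS CLI sketch assuming service-managed permissions with AWS Organizations; the stack set name, OU ID and regions are placeholders:

# deploy stack instances into an OU across two regions,
# two accounts at a time, tolerating at most one failed deployment
aws cloudformation create-stack-instances \
  --stack-set-name bucketatron \
  --deployment-targets OrganizationalUnitIds=ou-examplerootid-exampleouid \
  --regions us-east-1 eu-west-2 \
  --operation-preferences MaxConcurrentCount=2,FailureToleranceCount=1

# remove the stack instances but retain the underlying stacks in the target accounts
aws cloudformation delete-stack-instances \
  --stack-set-name bucketatron \
  --deployment-targets OrganizationalUnitIds=ou-examplerootid-exampleouid \
  --regions us-east-1 eu-west-2 \
  --retain-stacks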
Now the types of scenarios you might use stack sets for might include enabling AWS config across a large range of accounts.
You might want to use stack sets to create AWS Config rules for things like multi-factor authentication, Elastic IPs or EBS encryption, or you might want to use stack sets to create IAM roles for cross-account access at scale. So instead of having to create them in individual accounts one by one, you can define a CloudFormation template to create an IAM role and then deploy it as part of a stack set.
Okay now at this point that's everything that I wanted to cover from a theory perspective in this lesson.
Immediately following this lesson is a demo where you're going to get the chance to experience stack sets within your own environment but for now that's everything that I wanted to cover so go ahead and complete this video and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to continue on from the last lesson where I stepped through nested stacks only this time I want to talk about cross stack references which are similar but used in a very different set of architectural scenarios.
So let's jump in and get started.
In the last lesson I talked about this architecture.
I talked about how nested stacks could be used to get past the cloud formation resources per stack limit.
I talked about how if you wanted to reuse templates for modular parts of architecture, for example creating a standard VPC template once and then reusing it, then nested stacks were ideal.
And I talked about how if you wanted to simplify the process of creating large infrastructure using cloud formation you could do it by using nested stacks because a single root stack could create any nested stacks which it needed.
Now the limitation of nested stacks was that if you reused a particular template, say the VPC template, you would only reuse the code not the actual VPC that it creates.
If you implemented 10 root stacks each of which was identical then you'd have 10 application stacks, 10 active directory stacks and 10 VPC stacks which included 10 VPCs.
Now in some cases we want to consume a shared component, for example the VPC, for lots of different implementations.
The problem is the isolation which is a design feature of cloud formation.
Let's say that you have a stack which is a well-structured and secured VPC and you want this to be a shared VPC usable by other application stacks in the same region and the same account.
The issue is that stacks are by design self-contained and isolated.
There's a logical boundary around each stack which means that things in one stack can't be by default referenced in another.
In this example if we were deploying EC2 instances into AppStack 1 and AppStack 2 they couldn't natively reference the subnets created by the shared VPC stack.
Now you could manually add the VPC ID and the subnet IDs into AppStack 1 and AppStack 2, but that means they're static parameters, not references. This is where cross stack references come in handy: they let one stack reference the resources created in another, in order to reuse those actual resources.
Now to understand the benefit of cross stack references first understand that because of the isolation of stacks normally the outputs of stacks are only visible from the user interface or the command line.
You can't use the built-in ref function of cloud formation to reference anything from one stack in another.
The exception to this as I detailed in the previous lesson is that root stacks can reference the outputs of nested stacks but that architecture as you also learned means that the stacks are linked in terms of their lifecycle.
Sometimes, like when you want to create a shared VPC architecture, you actually want a situation where the VPC has a long-running lifecycle and the applications which use that specific VPC have a short lifecycle, so you don't want to define all of those as part of the same nested stack.
An example of this: imagine you work for a software development company. Each time a new version of your application is committed to a GitHub repository, you want to use CloudFormation to create a VPC, run the application in the VPC and operate a set of tests before tearing it all down.
Now you could create an isolated VPC each time using a nested stack architecture, but if you wanted to save costs you could use the same shared VPC, the same set of NAT gateways and the same set of subnets. To do this, the outputs of a template can be exported.
An export is defined within an output of a stack. It takes that output and adds it, under an exported name, to a list of exports in one region of your account, so the export name has to be unique.
So for a shared VPC design, some examples of what you might choose to export are the VPC ID, subnet IDs, CIDR ranges and security group IDs, anything which you could expect to use elsewhere, so external to that shared VPC stack.
To repeat though the export name needs to be unique inside one region of your account.
To use the export inside another stack, instead of using the Ref function, which is how you reference other resources in the same stack, you use the ImportValue function. You provide ImportValue with the export name and it returns the value exported in that other stack, and that's how you can use exports from one stack in another.
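A minimal sketch of both halves; the logical resource names, export name and CIDR value are illustrative:

# in the shared VPC stack
Outputs:
  SharedVPCId:
    Value: !Ref SharedVPC          # the VPC logical resource defined in this stack
    Export:
      Name: SharedVPCId            # must be unique within this region of this account

# in a consuming application stack
Resources:
  AppSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !ImportValue SharedVPCId    # pulls the exported value from the shared VPC stack
      CidrBlock: 10.16.0.0/20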
So let's have a look visually at how this works.
Architecturally we start with a single AWS account running in one particular region in this example US East 1.
Inside this region we have a VPC stack and we want this to be a shared services VPC which can be used by other stacks.
Step one is inside that stack make sure that anything we want to use is added as an output for example the VPC ID.
So this is an example of the output section for this particular stack. We've got shared VPC ID as an output, and then we use the Ref function to reference the actual VPC logical resource that's created inside this stack.
Now this means this output will be visible from the command line or the console UI, but to use this value in any other stack in this region of the account we need to use the export directive to export that value into the exports list. This list operates per region, per account, so it's only visible inside your account in one specific region.
Every region inside your account has its own list of exports, and everyone else's accounts and all of the regions in those accounts each have their own dedicated list of exports. Within the exports list, every export name needs to be unique, so we can only have one export called shared VPC ID within that region of that account.
Now once a value is in the exports list, it can be referenced in other stacks using the ImportValue function. This function replaces the Ref function; remember, the Ref function is what you use to reference other logical resources inside a single stack, whereas the ImportValue function, when used in a stack, allows you to reference values which are exported from other stacks and added to the exports list.
Now this only works in the same region as the stack is being applied in; cross-region or cross-account isn't supported for cross stack references. So essentially the process is that you need to create an output in one stack, export the value for that output into the exports list, and then use the ImportValue function to import that exported value into each stack that you want to use it in, and that's how you can create shared services by using cross stack references.
Now there are a number of situations where you would choose to use cross stack references. For example, when you're implementing service-oriented architectures, i.e. when you need to provide services from one stack to another. Another example is if you have a churn of short-lived applications which all consume from a shared services VPC; then you don't want them to be in the same stack or part of a nested stack. If you have things which have different life cycles, long versus short, then you want to separate them into different stacks and use cross stack references, and if you want to reuse a stack, so reuse the resources created by a stack rather than reusing a template, then cross stack references are ideal.
For the exam I want you to be clear that a template is not a stack, or vice versa. A template is used to create one or more stacks, and each stack is unique. If you want to reuse a template then you can choose to use nested stacks, which allow you to use the same template that you've created once in many distinct architectures. A VPC template, for example, might be used as part of an email system, a financial system, or for the implementation of hundreds of different isolated client environments. That's if you want to reuse a template, and each time you reuse that template it creates its own distinct infrastructure.
But if you want to reuse an actual stack, so the resources inside a stack, as with this example of a shared VPC, then you should use cross stack references rather than nested stacks. So cross stack references allow you to reuse actual resources, nested stacks allow you to reuse templates; they're very different things. I hope by this point it makes sense. Understanding the differences for the exam is essential, and if you pick the wrong one in a real-world situation the results can be less than ideal.
At this point though that's everything that I wanted to cover, so go ahead and complete this lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back.
In the next two lessons I want to cover two features of Cloud Formation.
In this lesson I'm going to be covering Cloud Formation nested stacks and in the lesson following I'll cover cross stack references.
Now we've got a lot to cover so let's jump in and get started.
Most simple projects and deployments which use Cloud Formation will generally utilize a single Cloud Formation stack and a Cloud Formation stack is isolated meaning it contains all of the AWS resources that the project needs.
These might be things such as a VPC, DynamoDB, S3, maybe EC2 and Lambda, SNS, SQS and maybe even a directory service.
Now there's nothing wrong with having a Cloud Formation stack built in this way.
It's isolated, the resources inside it are created together, they're updated together and eventually they're deleted together.
The idea is that all of the resources within a Cloud Formation stack share a life cycle.
Stacks make it simple to package everything up into one collection of resources.
Now designing Cloud Formation in this way where everything's contained in one single stack is fine as long as you don't hit any of the limits that might impact your project.
There are a few things that you need to be aware of.
The first is that there is a limit of 500 resources per stack and for larger deployments this could be a problem.
Another issue with isolated stacks is that you can't easily reuse resources.
If you had a stack like this one which created a VPC it's not practical to reference that VPC in other stacks which might also want to use it.
Stacks are by design isolated by default.
You can use the ref function to reference resources from other resources in the same stack but you can't use this to reference resources in other stacks.
So stacks are isolated.
You have to treat them as self-contained groupings of infrastructure which share the same life cycle.
So you create a stack that creates all of the resources, you update a stack that updates all of the resources and eventually you delete the stack and that deletes all of the resources.
Everything shares the same life cycle.
At the professional level or just for any projects which are complex you'll tend to use a multi-stack architecture.
So you'll implement your project using multiple stacks and there are two ways to architect a multi-stack project.
Nested stacks and cross stack references and choosing between them is what I want you to be fully comfortable with for the exam and in this lesson we're going to be starting by looking at nested stacks.
So let's look at that architecture next.
Nested stacks technically are pretty simple to understand.
You start with one stack which is referred to as the root stack.
In this example this stack is both the root and the parent stack.
A root stack is the stack which gets created first.
So this is the thing that you create either manually through the console UI or the command line or using some form of automation.
So the root stack is the only component of a nested stack which gets created manually by an entity either a human or a software process.
Now a parent stack is the parent of any stacks which it immediately creates.
So complex nested stack structures can actually have multiple levels.
A root stack can create several nested stacks and each of those can in turn create additional nested stacks.
So a parent stack is just a way that we can refer to anything which has its own nested stacks.
So in this case this stack is going to be a root stack and a parent stack.
Now a root stack can have parameters just like a normal stack and also have outputs also just like a normal stack and that's because a root stack is just a normal stack.
There's nothing special about nested stacks.
Inside all stacks you have logical resources and examples of these that you've seen so far include S3 buckets, a virtual private cloud or VPC and maybe even a DynamoDB table.
Now you can also have a CloudFormation stack as a logical resource, and you define it using the type AWS::CloudFormation::Stack. This is a logical resource just like any other, only it creates a stack of its own.
So you have to give the nested stack a URL to the cloud formation template which will be used to create it.
So that template will contain its own resources. It's just a normal CloudFormation template, and as I've just mentioned it could even contain its own nested stacks.
So in this case https://someurl.com/template.yaml is the URL to a template which will be used to create this nested stack, the stack that's called VPC stack.
Now you can also provide nested stack resources with some parameters. In this example we're creating a nested stack called VPC stack, and if the template for VPC stack had three parameters, param one, param two and param three, then we would need to provide values into that stack as it gets created.
For every parameter that the template for this nested stack has, we need to provide a value as we create it; if not, the stack creation process will fail.
So in this particular case the template.yaml file that's used for VPC stack has three parameters, param one, two and three, and when we're creating it as a nested stack we need to supply values for those parameters.
Now the exception to this is if the VPC stack template had default values for its parameters. If it has default values then we wouldn't have to provide those when creating it as a nested stack, but it's best practice to populate the VPC stack logical resource with parameters for everything which is parameterized within the template that's used to create that stack.
So in this case we have the root stack it currently has one logical resource VPC stack this creates a nested stack resource and we're passing in these three parameter values.
So when the VPC stack nested stack finishes creating, the logical VPC stack resource within the root stack moves into a create complete status, and any outputs of that nested stack are returned to the root stack. These can be referenced using the logical resource name of the nested stack, so VPC stack, then .Outputs, then the actual output name of the nested stack.
So you can only reference outputs when using nested stacks. You can't directly reference logical resources created in any of the nested stacks, only the outputs that you make visible when creating the nested stack.
Now we might also have other nested stacks contained within the root stack, and because these are also logical resources they too would be created. But they might have dependencies, either ones which CloudFormation calculates or ones where we use the DependsOn directive to explicitly inform CloudFormation that there is a dependency between different stacks. For example, we might have an Active Directory nested stack called AD stack which depends on the VPC stack. Whether it's a self-managed Active Directory or one provided by Directory Service, it will need to run from a VPC, and so it will depend on that VPC, and that VPC is getting created within the VPC stack nested stack.
Now the root stack can take the outputs from one nested stack and give them as parameters to another. Examples of this might be the VPC ID or the subnet IDs of the resources created inside the VPC stack.
Once the AD stack finishes creating, it too might have outputs which are returned to the root stack. Then we might create another nested stack, perhaps an application stack, and this might depend on the AD stack, maybe because it uses Active Directory for user authentication. Once complete, this application stack can also provide its outputs back to the root stack, maybe a login URL for the application itself. As each of the nested stacks finishes provisioning, its resource in the root stack is marked as create complete, and once all of the logical resources for the nested stacks are complete, the root stack itself is marked as create complete.
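A minimal sketch of how this might look inside the root stack's template; the logical names, template URLs and parameter names are illustrative:

Resources:
  VPCStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://someurl.com/vpc-template.yaml
      Parameters:
        Param1: value1
        Param2: value2
        Param3: value3
  ADStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: VPCStack                                  # wait for the VPC nested stack to finish first
    Properties:
      TemplateURL: https://someurl.com/ad-template.yaml
      Parameters:
        VPCId: !GetAtt VPCStack.Outputs.VPCId            # an output of one nested stack passed into another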
Now there are two really important aspects to nested stacks that you need to understand in order to pick between nested stacks and cross stack references which I'll be talking about in the next lesson.
First, by breaking up solutions into modular templates, those templates can be reused. In this example we have the VPC stack, which is probably something that can be used again and again for different deployments. If you upload the template somewhere, then many nested stack architectures can use that template, and crucially this is reusing the code, the template for a stack; it's not reusing the same stack itself. So we're not reusing the VPC that's being created by the VPC stack; what it means is that we can reuse the template that created the VPC stack. By uploading the template for the VPC stack, other nested stack architectures can reuse the same YAML template, but if you do reuse the same VPC template in another stack, it will create a separate VPC. The benefit is the ability to reuse the same templates; you're not reusing the same stacks.
Now the AD stack template can also probably be reused for different projects which use Active Directory, but again, every time this particular template is reused it will create a different Active Directory. You're reusing the code, not the actual resources. This isn't the same as cross stack references, which I'll be covering in the next lesson, because there you're actually reusing resources that are created by a stack; when using nested stacks you're reusing the template, not the actual stack.
So you generally use nested stacks when you've created individual building blocks, so modular templates, and you can reuse each of these templates to form part of a single solution which is life cycle linked. You might be able to reuse the template which creates a VPC in lots of different nested stacks, but crucially it would always create a dedicated VPC. You would not be using the same VPC and the same stack; you would be creating a new stack and new resources each and every time.
Now nested stacks are generally used when all of the infrastructure that you're creating forms part of the same solution, when it's life cycle linked. In this example the application needs Active Directory, which needs a VPC; it's unlikely that one will exist without the other. You aren't going to want to switch out Active Directory for another Active Directory, or the VPC for another VPC. It's likely that these will all be created together, operate together and maybe someday be deleted together.
Now nested stacks do allow for a few main benefits, and before we finish this lesson I just want to summarize them. Use nested stacks when you want to overcome the resource limit of using a single stack; if you have five stacks together as a nested stack, you can have 2,500 resources. Use nested stacks when you're modularizing your templates; that way you can create a VPC template once and use it for many implementations, but remember, with nested stacks each one of those projects will create its own physical VPC. With nested stacks you're only reusing the template, so the code, not the resources themselves. Use nested stacks when you want to make stack installations easier, because you can apply a root stack and have that root stack automatically orchestrate the application of many nested stacks.
The one single decision point between using nested stacks and cross stack references is this: only use nested stacks when everything is life cycle linked, when everything in the stack structure needs to be created with each other, updated with each other and eventually deleted with each other. If you're anticipating needing one part long term but not others, then nested stacks are the wrong choice. If you imagine needing to use the same actual VPC across multiple implementations, then cross stack references are probably better suited, and we'll talk about that in the next lesson. If you want to make frequent changes to one part of an application and not others, then it's probably better to have individual non-nested stacks and utilize cross stack references, which we'll talk about next.
Okay, so that's everything I wanted to cover about nested stacks. In the next lesson I'll be comparing this to cross stack references, so go ahead and complete this lesson and when you're ready I look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about a few related features of CloudFormation, and those are wait conditions, creation policies and the CFN signal tool.
So let's jump in and get started straight away.
Before we look at all of those features as a refresher I want to step through what actually happens with the traditional CloudFormation provisioning process and let's assume that we're building an EC2 instance and we're using some user data to bootstrap WordPress.
Well if we do this the process starts with logical resources within the template and the template is used to create a Cloud Formation stack.
Now you know by now that it's the job of the stack to take the logical resources in a template and then create, update or delete physical resources to match them within an AWS account.
So in this case it creates an EC2 instance within an AWS account.
From CloudFormation's perspective in this example, it initiates the creation of an EC2 instance, so when EC2 reports back that the physical resource has completed provisioning, the logical resource changes to create complete, and that means everything's good, right?
Well the truth is we just don't know.
With simple provisioning when the relevant system EC2 in this case tells CloudFormation that it's finished then CloudFormation has no further access to any other information beyond the fact that EC2 is telling it that that resource has completed its provisioning process.
With more complex resource provisions like this one where bootstrapping goes on beyond when the instance itself is ready then the completion state isn't really available until after the bootstrapping finishes and even then there's no built-in link to communicate back to CloudFormation whether that bootstrapping process was successful or whether it failed.
An EC2 instance will be in a create complete state long before the bootstrapping finishes and so even when it's finished if it fails the resource itself still shows create complete.
Creation policies, wait conditions and CFN signal provide a few ways that we can get around this default limitation and allow systems to provide more detailed signals of completion, or not, to CloudFormation.
So let's have a look at how this works.
The way that this enhanced signaling is done is via the CFN signal command which is included in the AWS CFN bootstrap package.
The principle is simple enough: you configure CloudFormation to hold or pause a resource, and I'll talk more about the ways that this is done next, but essentially you configure CloudFormation to wait for a certain number of success signals.
You want to make it so that resources such as EC2 instances tell CloudFormation that they're okay.
So in addition to configuring it to wait for a certain number of success signals you also configure a timeout.
This is a value in hours, minutes and seconds within which those signals can be received.
Now the maximum permitted value for this is 12 hours and once configured it means that a logical resource such as an EC2 instance will just wait.
It won't automatically move into a create complete state once the EC2 system says that it's ready.
Instead if the number of success signals that you define is received by CloudFormation within the timeout period then the status of that resource changes into create complete and the stack process continues with the knowledge that the EC2 instance really is finished and ready to go because on the instance you've configured something to explicitly send that signal or signals to CloudFormation.
CFN signal is a utility running on the instance itself actually sending a signal back to the CloudFormation service.
Now if CFN signal communicates a failure signal suggesting that the bootstrapping process didn't complete successfully then the creation of the resource in the stack fails and the stack itself fails.
So that's important to understand CFN signal can send success signals or failure signals and a failure signal explicitly fails the process.
Now another possible outcome of this is the timeout period can be reached without the required number of success signals and in this situation CloudFormation views this as an implicit failure.
The resource being created fails and then logically the stack fails the entire process that it's doing.
Now the actual thing which is being signaled using CFN signal is a logical resource, specifically a resource such as an EC2 instance or an auto scaling group which is using a creation policy, or a specific type of separate resource called a wait condition resource.
Now AWS suggests that for provisioning EC2 instances and auto scaling groups you should use a creation policy, because it's tied to that specific resource that you're handling, but you might have other requirements to signal outside of a specific resource.
For example, if you're integrating CloudFormation with an external IT system of some kind, in that case you might choose to use a wait condition. Next I want to visually step through how both of these work, because it will make a lot more sense when you see the architecture visually.
Let's start with the example of an auto scaling group which uses a launch configuration to launch three EC2 instances.
These are within a template and that's used to create a stack.
Because I'm using a creation policy here a few things happen which are different to how CloudFormation normally functions.
First the creation policy here adds a signal requirement and timeout to the stack.
In this case the stack needs three signals and it has a timeout of 15 minutes to receive them.
So the EC2 instances are provisioned but because of the creation policy the auto scaling group doesn't move into a create complete state as normal.
It waits.
It can't complete until the creation policy directive is fulfilled.
The user data for the EC2 instances contains some bootstrapping and then this CFN signal statement at the bottom.
So once the bootstrapping process, whatever it is, has been completed, and let's say that it's installing the Categorum application, the CFN signal tool signals the resource, in this case the auto scaling group, that it's completed the build.
So this CFN signal that's at the bottom left of your screen this is an actual utility which runs on the EC2 instance as part of the bootstrapping process.
And this causes each instance to signal once and the auto scaling group resource in the stack requires three of these signals within 15 minutes.
If it gets them all and assuming that they're all success signals then the stack moves into a create complete state.
If anything else happens so maybe a timeout happens or maybe one of the three instances has a bug then it will signal a failure and in any of those cases the stack will move into a create failed state.
Creation policies are generally used for EC2 instances or for auto scaling groups and if you do any of the advanced demo lessons in any of my courses you're going to see that I make use of this feature to ensure resources which are being provisioned are actually provisioned correctly before moving on to the next stage.
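Here's a rough sketch of what that looks like on an Auto Scaling group; the resource names, sizes and subnet ID are illustrative, and the launch configuration is assumed to be defined elsewhere in the same template:

WebASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  CreationPolicy:
    ResourceSignal:
      Count: 3               # wait for three success signals, one per instance
      Timeout: PT15M         # fail the resource if they don't all arrive within 15 minutes
  Properties:
    MinSize: "3"
    MaxSize: "3"
    DesiredCapacity: "3"
    LaunchConfigurationName: !Ref WebLaunchConfig
    VPCZoneIdentifier:
      - subnet-0123456789abcdef0

Each instance's user data, defined in that launch configuration and with the variables substituted via Fn::Sub as earlier, would then end with something like:
/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource WebASG --region ${AWS::Region}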
Now there are situations when you need some additional functionality. Maybe you want to pass data back to CloudFormation, or you want to put general wait states into your template which can't be passed until a signal is received, and that's where wait conditions come in handy.
Wait conditions operate in a similar way to creation policies.
A wait condition is a specific logical resource not something defined in an existing resource.
A wait condition can depend on other resources and other resources can also depend on a wait condition so it can be used as a more general progress gate within a template a point which can't be passed until those signals are received.
A wait condition will not proceed to create complete until it gets its signals or the timeout configured on that wait condition expires.
Now a wait condition relies on a wait handle and a wait handle is another logical resource whose sole job is to generate a pre-signed URL which can be used to send signals to.
It's pre-signed so that whatever is using it doesn't need any AWS credentials; they're included in the pre-signed URL.
So let's say that we have an EC2 instance or external server.
These are responsible for performing a process maybe some final detailed configuration or maybe they assign licensing something which has to happen after a part of the template but before the other part.
So these generate a JSON document which contains some amazing information or some amazing occurrence.
This is just an example it can be as complex or as simple as needed.
This document is passed back as the signal it has a status, a reason, a unique ID and some data.
Now what's awesome about this is that not only does this signal allow resource creation to be paused and then continued when this event has occurred but the data which has passed back can also be accessed elsewhere in the template.
We can use the GetAtt function to query the Data attribute of the wait condition and get access to the details of the signal.
Now this allows a small amount of data exchange and processing between whatever is signaling and the cloud formation stack.
So you can inject specific data about a given event into the JSON document, send this back as a signal and then access this elsewhere in the cloud formation stack and this might be useful for certain things like licensing or to get additional status information about the event from the external system.
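A rough sketch of the wait condition pieces; the logical names, the DependsOn target and the signal body are illustrative:

LicensingWaitHandle:
  Type: AWS::CloudFormation::WaitConditionHandle   # its only job is to generate the pre-signed URL

LicensingWaitCondition:
  Type: AWS::CloudFormation::WaitCondition
  DependsOn: SomeEarlierResource                   # gate this point in the template behind another resource
  Properties:
    Handle: !Ref LicensingWaitHandle
    Count: 1                                       # number of signals required
    Timeout: "900"                                 # seconds to wait for them

The external system sends its signal by performing an HTTP PUT of a JSON document to the pre-signed URL, something like {"Status": "SUCCESS", "Reason": "Licensing configured", "UniqueId": "ID1234", "Data": "EXAMPLE-LICENSE-DATA"}, and elsewhere in the template !GetAtt LicensingWaitCondition.Data returns the data sent with the signals.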
And that's wait conditions.
In many ways they're just like creation policies.
They have the same concept.
They allow a specific resource creation to be paused, not allowing progress until signaling is received.
Only with wait conditions, they're actually a separate resource and can use some more advanced data flow features like I'm demonstrating here.
AWS recommend creation policies for most situations because they're simpler to manage but as you create more complex templates you might well have need to use wait conditions as well and for the exams it's essential that you understand both creation policies and wait conditions which is why I wanted to go into detail on both.
Now that's all of the theory that I wanted to cover about creation policies and wait conditions and these are both things that you're going to get plenty of practical experience of in various demo lessons in all of my courses but I wanted to cover the theory and the architecture so that you can understand them when you come across them in those demos.
For now though thanks for watching go ahead and complete this video and when you're ready I'll look forward to you joining me in the next.
-
Welcome back and in this lesson I want to talk about a feature of Cloud Formation called Depends On.
And Depends On allows you to establish formal dependencies between resources within Cloud Formation Templates.
Now to explain what this is and why it's required, I'm going to step through a few key points and then we're going to look at it visually.
Now when you use a Cloud Formation template to create a Cloud Formation stack, Cloud Formation tries to be efficient.
And the way that it attempts to be efficient is to do things in parallel.
So when it's creating, updating or deleting resources, it's attempting to do this where possible in parallel.
So for example when using one of my demos, you might notice that many resources inside the template are being created at the same time.
Now while it's doing this, it's trying to determine a dependency order.
So for example, it wants to create the VPC first, then create subnets inside that VPC and then create EC2 instances which run in subnets, which run inside a VPC.
So it tries to determine a dependency order or a dependency tree automatically within the Cloud Formation stack.
Now one of the ways that it does this is by using references or functions.
So if an EC2 instance references a subnet and a subnet references a VPC, then it knows that it needs to create the VPC first, then the subnet and then the EC2 instance.
Now the Depends on feature simply lets you explicitly define any dependencies between resources.
So you can formally define that resources B and C depend on resource A and that means that Cloud Formation will not attempt to provision either of those resources until resource A is in a create complete state.
Now in most cases, this built in dependency mapping will work and you won't encounter any issues.
For example, you might see in many of my demos or advanced demos that if the user data that's defined for an EC2 instance references an Aurora cluster, then it's going to wait until that Aurora cluster has been created before creating the EC2 instance.
So it tends to work in 99% of cases, but Depends on is a really useful way that lets you explicitly define this dependency relationship.
So let's take a look visually at how that works.
So let's say that we start with two Cloud Formation logical resources in a template.
Let's say a VPC and an Internet Gateway.
Now these are different resources and there's no link between the two.
You can create an Internet Gateway without having a VPC and you can create a VPC without having an Internet Gateway.
So there's no implicit or explicit dependency between either of them.
Neither of them require the other to exist.
But consider this, we have an Internet Gateway attachment.
This attaches an Internet Gateway to a VPC.
So logically it requires both of them.
You can see here in orange at the top that we reference the VPC and in blue at the bottom we reference the Internet Gateway.
This creates implicit dependencies.
The Internet Gateway attachment depends on both the VPC and the Internet Gateway resources.
Now it's implicit because we haven't actually formally stated this dependency.
It's just assumed by Cloud Formation because the Internet Gateway attachment logical resource references the other two.
We can't reference a resource until it's in a create complete state.
So until the VPC and the Internet Gateway resources are fully created, they can't be successfully referenced.
And so the Internet Gateway attachment can't be created because it references those other two resources until those other two resources move into a create complete state.
So this implicit dependency feature works in most cases.
In most cases we can allow Cloud Formation to determine this dependency for us.
But there are some exceptions and one very common one which always seems to impact me when creating demos and one which always tends to feature in exams, hint hint, is this one.
This is when we want to create an elastic IP.
If you're creating an elastic IP and you want to associate it with an EC2 instance running in a subnet inside a VPC that you're creating in the same template, then it actually requires an attached Internet Gateway.
Otherwise you'll encounter issues.
You might find that when creating stacks using templates sometimes it works or sometimes it doesn't.
You might find when deleting stacks sometimes it works or sometimes it doesn't.
Without formally declaring this relationship whether you can create an elastic IP depends on the random order that Cloud Formation creates these resources.
So if the elastic IP attempts to create before the Internet Gateway attachment has been created then you will get an error.
If you're deleting a stack and Cloud Formation attempts to delete the Internet Gateway attachment before deleting the elastic IP then you're also going to get an error.
Now what you can do to avoid all of these types of issues is to explicitly define a dependency and this is done using depends on as with this example.
So we use the depends on key value.
For the key we use depends on and for the value we specify the resource which this resource formally depends on.
And so this creates an explicit dependency.
The elastic IP will only be created after the Internet Gateway attachment has been completed and the elastic IP will be deleted before the Internet Gateway attachment is deleted.
So using depends on establishes this formal or explicit dependency which ensures that resources are created, updated and deleted in the correct order.
And this is something that's really essential to understand about Cloud Formation to avoid errors when creating larger templates or when studying an exam with an exam question in this area.
So it's definitely something that you need to understand going into any of the AWS exams and also if you're creating larger Cloud Formation templates.
Now with depends on you can either specify a single resource, as with this example, or you can specify a list of resources if you want to create multiple dependencies.
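As an illustration, here's a rough sketch of that elastic IP scenario; the logical resource names are placeholders, and depends on can take either a single name or a list.

InternetGatewayAttachment:
  Type: AWS::EC2::VPCGatewayAttachment
  Properties:
    VpcId: !Ref VPC
    InternetGatewayId: !Ref InternetGateway

ElasticIP:
  Type: AWS::EC2::EIP
  DependsOn: InternetGatewayAttachment   # explicit dependency; could also be a list, e.g. [ResourceA, ResourceB]
  Properties:
    Domain: vpc
    InstanceId: !Ref Instance            # placeholder EC2 instance defined elsewhere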
So keep that in mind.
And again if you're working through any of my demos or advanced demos it's definitely worth the time to look through the underlying Cloud Formation templates and identify where I've put any depends on statements because that will help you understand exactly how this works for production and for exam questions.
At this point though that's everything I wanted to cover so thanks for watching.
Go ahead complete this video and when you're ready I look forward to you joining me in the next.
-
Welcome back and in this lesson I want to cover CloudFormation conditions.
Now these are a useful feature of CloudFormation which allows a stack to react to certain conditions and change infrastructure which is deployed or specific configuration of that infrastructure based on those conditions.
Now it's a simple feature but it provides a lot of flexibility for architects, developers or engineers.
So let's jump in and step through exactly how it works and what features it provides.
So CloudFormation conditions are declared within an optional section of the template, the conditions section.
Now you can define many conditions within the conditions section of the template and the end effect is that each condition is evaluated to be true or false.
And these are processed before the logical resources which are defined within a template are processed by CloudFormation and physical resources are created to mirror those logical resources.
So essentially the conditions section of a template is evaluated first and then based on those conditions any logical resources which use those conditions that influences what physical resources are created and how they're created.
So these conditions use other intrinsic functions, so AND, EQUALS, IF, NOT and OR, and a condition uses these intrinsic functions to evaluate one or more things, and then the result of those functions determines whether the condition itself is true or false.
Any logical resources within a template can have a condition associated with them and the condition that's associated with them defines whether they're created or not.
So if a condition that's associated with a resource is true then that logical resource is created.
If a condition that's associated with a resource is false that resource is not created.
Now an example is you could have a parameter value on a template which accepted a number.
Let's say 1, 2 or 3 and then we could create three conditions within a template.
Let's say 1AZ, 2AZ or 3AZ and each of these conditions would use intrinsic functions to evaluate whether the parameter value was 1, 2 or 3.
Now we could have many duplicate sets of resources defined within the CloudFormation template and certain of those resources would only be created if 2AZ was true and certain resources would only be created if 3AZ were true.
We could also have conditions which react to the environment type parameter of a template.
So based on whether the template was prod or dev we could control the size of instances created by a CloudFormation stack.
So these are just two relatively common ways that conditions are used within a CloudFormation stack and a CloudFormation template.
Now let's take a look at how this looks visually and I'm going to step through a pretty simple example.
So we have three major component parts.
First we have a template parameter.
In this example, EnvType.
An EnvType can be dev or prod and it represents what the template is being used for.
So development activities or production usage.
Then also within the template we have a condition defined inside the conditions block of the template, and this uses the equals intrinsic function to check if the value of the EnvType parameter is prod, and if it is then this condition, IsProd, is set to true.
Finally conditions are used within resources of the template.
In this case the wordpress2, myEIP and myEIP2 resources all reference this condition.
And just before anyone provides feedback for anyone with a really keen eye these templates are not complete.
So let's just refer to them as pseudo CloudFormation.
They're cut down to only show what matters for this lesson.
The flow through this architecture would start with our developer Bob who would decide on a value of dev or prod for the EnvType parameter when applying it.
So this would set that parameter value.
Now the template as I've just mentioned has the conditions block which is evaluated first by CloudFormation before even considering the resources.
So this evaluates to true or false.
If the EnvType parameter in the template is prod then this condition evaluates to true otherwise it's false.
Now the next stage is that the processing of the resources within the stack begins, and when processing the resources, any resources which use the IsProd condition are only created if the condition that they reference is true.
So in this example if the isProd condition is false then only the wordpress resource is created.
Because all of the other three have the condition which if it's false will cause those resources not to be created.
Now if the isProd condition was true then wordpress2 would also be created so we'd have two EC2 instances and each of those would also be allocated with an elastic IP.
So myEIP and myEIP2.
So just to reiterate this if a logical resource does not have a condition then it's created regardless.
If a logical resource does reference a condition then that logical resource is only used so a physical resource is only created if that condition evaluates to true.
If it evaluates to false then no physical resource is created for that corresponding logical resource.
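Here's a simplified, pseudo-CloudFormation sketch of that flow; as with the slide it's cut down to just the parts that matter, and the AMI IDs are placeholders.

Parameters:
  EnvType:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - prod

Conditions:
  IsProd: !Equals [!Ref EnvType, prod]

Resources:
  Wordpress:
    Type: AWS::EC2::Instance            # no condition, so always created
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0123456789abcdef0    # placeholder AMI ID
  Wordpress2:
    Type: AWS::EC2::Instance
    Condition: IsProd                   # only created when EnvType is prod
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0123456789abcdef0    # placeholder AMI ID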
Now when you step through the flow of using these conditions they aren't actually that difficult to understand.
You define a condition, you set it to true or false using one of the intrinsic functions and then you use that condition within resources in the template.
Now conditions can also be nested so you could have an isProd condition.
You could also have another condition such as createS3 bucket and if this was true it would create the S3 bucket.
And then you could have a condition which controls if a bucket policy is applied to the bucket and you could configure it so that that bucket policy would only be applied if a bucket is created and if that stack is a production stack.
So you can nest conditions together and make a condition that evaluates to true only if two other conditions also evaluate to true and this nesting is done by using these intrinsic functions.
Now I'll be showing you lots of different examples of conditions in my demos and advanced demos that you'll find through all of my courses.
So always make a habit as you do the demos to review the cloud formation templates which are used.
So seeing these through practical examples will help improve your understanding.
With that you'll be able to read templates and with more and more practice you'll find that writing them becomes much easier.
At this point though that's everything that I wanted to cover in this theory lesson about cloud formation conditions.
Thanks for watching.
Go ahead and complete this video and when you're ready I look forward to you joining me in the next.
-
Welcome back and in this video I want to talk about cloud formation outputs and outputs are optional within a template.
Many don't have them but they're useful in providing status information or showing how to access services which are created by a cloud formation stack.
Now this is going to be a really quick video so let's jump in and get started.
So the output section of a template is entirely optional.
You can implement perfectly valid cloud formation templates without using an output section but if you do decide to include an output section then you can create outputs within that section.
Essentially you can declare values inside this section which will be visible as outputs when using the CLI, they'll be visible as outputs when using the console UI and, this is a really important point, they will be accessible from a parent stack when using nesting, and these outputs can be exported allowing cross stack references.
Now outputs are not a complex topic and so I don't want to dwell too much on how they work because you'll be getting some practical experience in an upcoming demo video.
Visually though this is how it might look if you're declaring a simple output.
So in this example we're provisioning an EC2 instance which is running WordPress and this is an output within that template.
So what we're doing is defining an output called WordPress URL and then we're defining two key value pairs description and value.
So description is something which is visible from the CLI console UI and is passed back to the parent stack when nested stacks are being used.
So you can always access the description and it's best practice to provide a description which makes this useful to anyone who might not have seen the template.
Now the second part is the value and the value is important.
The value determines exactly what you want to be exposed by the cloud formation stack once the stack is in a create complete state.
So in this case what we're actually doing is creating a value by joining two other things together.
So we're using the join intrinsic function, which I've covered in a different video, and we're joining the literal string 'https://' and the logical resource attribute of DNS name, and this is how we can create a URL for accessing the service that's created by this CloudFormation template.
So we're using the join function to generate a simple string from two different things: 'https://', which is a literal string, and then the attribute of the instance which is created elsewhere in this template.
So the output will be 'https://' followed by the DNS name of the instance, and this will provide a method for anyone who's implementing this template to be able to access the service.
So that's everything I wanted to cover about cloud formation outputs.
They're not all that complicated you'll be getting some practical experience in using them in an upcoming demo video and when I talk about cross stack references you'll see how we can extend this by exporting a particular output or set of outputs but at this point I want to keep things simple and that's everything that you need to be aware of when it comes to cloud formation outputs.
So go ahead complete this video and when you're ready I look forward to you joining me in the next.
-
Welcome back and in this video I want to talk about CloudFormation mappings.
And in keeping with the theme from the last few videos, this is also a feature of CloudFormation which makes it easier to design portable templates.
Now this is going to be a fairly brief video so let's jump in and get started.
CloudFormation templates can contain a mappings object.
Remember at a top level a YAML or JSON template is just a collection of top-level key value pairs.
Now resources is one, parameters is one and now I'm introducing mappings as another.
The mappings object can contain many mappings and each of these maps keys to values, allowing information lookup.
So you might use mappings to map the environment for example production to a particular database configuration or a specific SSH key.
Now these mappings can have one level of lookup so you can provide a key and get a value back or they can have top and second level keys.
Now a common example is a mapping of AMI IDs based on the top level key of region and the second level key of architecture.
Mappings use another intrinsic function which I haven't introduced yet called find in map and an example which I'll show next is the common use case which I just talked about using find in map to retrieve a given Amazon machine image ID for a particular region and a particular architecture.
Now at this point the key thing to remember about mappings is that they help you, you guessed it, improve template portability.
They let you store some data which can be used to influence how the template behaves for a given input.
So let's have a look at a simplified example visually on the next screen.
Now this is an example of one mapping which is called region map which is in the mappings part of a cloud formation template and this is an example of the find in map function that you will use to lookup data using the mapping.
Now to use a mapping it's actually pretty simple.
First we have to use this find in map function and we need to specify a number of pieces of information to this find in map function.
The first thing that we need to specify is the name of the mapping that we're going to use in this case region map.
So this allows the intrinsic function find in map to query a particular mapping in the mappings area of the cloud formation template.
Now the next part this is mandatory we always need to provide at least one top level key.
In this case we need to provide an item that we will use to lookup information from the mapping.
Now in this case we're using a pseudo parameter.
AWS::Region will always resolve to the region that this template is being applied in to create a stack.
So in this case let's assume that it's us-east-1.
Now the mapping to use and this top level key are the only mandatory parts of find in map, and if we only provided these two then it would retrieve the entire object below us-east-1 in this example.
But in this case we're going to provide a second level key, HVM64.
And if we provide this as well it will perform a second level of lookup.
Meaning in this case we will retrieve the AMI ID for us-east-1 using the HVM64 architecture.
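As a hedged sketch, the mapping and the lookup might look like this; the AMI IDs below are placeholders rather than real images.

Mappings:
  RegionMap:
    us-east-1:
      HVM64: ami-11111111111111111    # placeholder AMI ID
    us-west-2:
      HVM64: ami-22222222222222222    # placeholder AMI ID

Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', HVM64]   # top level key, then second level key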
So this is a simple example but it's a fairly common scenario where you use the mappings area of a template to store an AMI lookup table that you can use to retrieve a particular suitable AMI for a given AWS region and a given architecture.
Now you could change this, you could use a particular AMI for a particular region and a particular application or a particular environment type.
But you can perform one or two level lookups using find in map.
Now again in a future video you're going to get the chance to experience this yourself in a demo video but for now I just wanted to introduce the theory, the architecture behind mappings.
So that's everything I wanted to cover in this video so go ahead, complete the video and when you're ready I look forward to you joining me in the next.
-
Welcome back and in this video I want to cover CloudFormation intrinsic functions.
Up until this point everything that you've defined within a CloudFormation template has either been static or accepted using parameters.
Intrinsic functions, though, allow you to gain access to data at runtime, so your template can take actions based on how things are when the template is being used to create a stack, and that's really powerful.
In this lesson I want to cover the theory of intrinsic functions but don't worry you'll be getting the chance to use them practically in an upcoming demo video so let's jump in and get started.
Now I want to quickly step through the functions that we're going to be looking at over the remaining videos of this CloudFormation series and then we can look at some of them visually and technically step through how they work.
So first we're going to be looking at the ref and get attribute function or get at and these both allow you to reference a value from one logical resource or parameter in another one.
If you create a VPC in a template and you want to make sure that another resource such as a subnet goes inside that VPC then you can reference the VPC within other logical resources.
Next we've got join and split and these as the name suggests allow you to join strings together or split them up.
An example usage might be if you create an EC2 instance which is given a public IPv4 DNS name, then you can use the join function to create a web URL that anyone can use to access that resource.
Next is get azs which can be used to get a list of availability zones for a given AWS region and the select function which allows you to select one element from that list and these two are commonly used together to pick an availability zone from the list of availability zones in one particular region.
Next are a set of conditional logic functions, IF, AND, EQUALS, NOT and OR, and these can be used to provision resources based on conditional checks.
So for example if a certain parameter is set to prod then deploy big instances.
If it's dev then deploy smaller ones.
Next is base64 and sub.
Many parts of AWS accept input using base64 encoding.
For example if you're providing EC2 with some user data for automated builds then you need to provide this using base64.
So the base64 function accepts non-encoded text and it outputs base64 encoded text that you can then provide to that resource.
Sub allows you to substitute things within text based on runtime information.
So you might be passing build information into EC2 and you want to provide a value from the template parameters in which case sub can help you do that.
And then next we've got CIDR, which lets you build CIDR blocks for networking.
It's a way to automatically configure the network ranges that subnets use within a CloudFormation template.
Now there are others such as import value, find in map and transform and I'll be covering these in dedicated videos later in this series.
Each one of these functions can be used in isolation or used together to implement some pretty advanced logic within templates.
Now let's take a look at how these work visually and technically, and once again don't worry, you will be getting the chance to use all of these practically in an upcoming demo video.
Two of the most common intrinsic functions within cloud formation are ref and get at meaning get attribute.
It's important that you understand how these are used and the differences between the two.
So let's use this as an example.
A template with a logical resource which we're going to use to create a stack and this creates a physical resource in this case a t3.micro EC2 instance.
Now every parameter and logical resource within cloud formation has a main value which it returns so for example the main value returned by an EC2 instance is its physical resource ID.
The main value for a parameter logically enough is its value and the ref function can be used as the name suggests to reference this main value of a parameter or a logical resource.
Now if you look at the cloud formation simplified example at the bottom left you will see next to image ID we're referencing latest AMI ID which is a parameter and that's how we can use parameters with logical resources by referencing them.
We can also use ref with logical resources as I just described so when an EC2 instance is created once it reaches a create complete state then it makes available a range of data.
The primary value its physical ID can be accessed using the ref intrinsic function.
Now there are also secondary values depending on the type of resource that you're deploying and these can be accessed using the get at function.
With this function you provide the logical resource name and the name of an attribute, and examples of this for EC2 might be the public IP address or the public DNS name of the instance.
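To illustrate the difference, here's a minimal sketch; the parameter default shown is a commonly used SSM public parameter path, but treat the exact names as assumptions rather than something from a specific demo.

Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId            # ref on a parameter returns its value
      InstanceType: t3.micro

Outputs:
  InstanceId:
    Value: !Ref Instance                   # ref on a resource returns its primary value, the instance ID
  InstancePublicIp:
    Value: !GetAtt Instance.PublicIp       # get att returns a secondary attribute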
Ref and get at are critical.
They're used in almost all cloud formation templates to access logical resource attributes, template parameters, pseudo parameters and much more.
They'll be the key to evolving the non-portable template that you created in the previous demo video through to being a portable template so it's really important that you understand how both of these work.
Next I want to talk about the get azs function and the select function.
Now these are often used together which is why I've included them on the same example.
Get azs is an environmental awareness function.
Let's say that we're deploying a template into us-east-1 and let's assume this region has six AZs: us-east-1a, 1b, 1c, 1d, 1e and 1f.
Now if you wanted to launch an EC2 instance into one of these azs you would need to know its name.
Basically you would need to know a list of names for all of the valid azs in that region and then you would need to pick one.
Remember from the previous demo video we're trying to ensure that our templates are portable, so hard coding availability zone names is a bad practice.
What you can do is use the get azs function and with this you can either explicitly specify a region, you can use the region pseudo parameter or you can leave it blank and then it will use the region currently being used to create the stack.
What it will do is return a list of availability zones within that region.
There is a little nuance here though: under normal circumstances it should return a list of all the AZs in that region, but what it actually does is return a list of all AZs within that region where the default VPC has subnets in that AZ.
Now normally these are one and the same but if you have a default VPC where you've deleted subnets then the list that you're going to get back is not going to have all available azs.
So if you don't have a default VPC or if you have the default VPC in its form where it does have subnets for all azs in that region then it will return a list of all available azs within that region.
But if you have a badly configured default VPC then you might get some inconsistent results.
But having this dynamic list of availability zones is really powerful because then you can use the select function to select a numbered one from that list.
Now select accepts a list and an index, which starts at zero, so index zero returns the first object in that list.
So it allows you to dynamically refer to azs in the current region without explicitly stating their identifiers which makes templates much more portable.
It's one part of ensuring that templates can be applied to all regions without having issues and it's something that you're going to get experience of very soon in the next demo video.
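Used together they might look something like this rough sketch, where a subnet simply picks the first availability zone returned for the current region; the VPC reference and CIDR are placeholders.

Subnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref VPC                             # placeholder VPC defined elsewhere
    CidrBlock: 10.16.0.0/20
    AvailabilityZone: !Select [0, !GetAZs '']   # empty string means the region the stack is being created in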
Now I'm going to start moving through the rest of these much faster because some of these intrinsic functions are much more situational and you're going to get experience of them as we move through this series of videos.
Next we have the join function and the split function.
Now split accepts a single string value and a delimiter, a pipe (|) in this example, and it outputs a list where each object in the list is part of the original string split on that delimiter.
So in this example we provide split with a single string, 'roffle|truffles|penny|winky', and we get as an output a list where each object in that list is one of those cat names, which can be referenced individually.
Now join is the reverse of this.
You provide a delimiter and a list of values and the join function joins them together to make a string.
In this case we're creating a web URL for a WordPress EC2 instance by combining 'https://' and the DNS name of the instance, and note how that's obtained with the get at function which I've just covered.
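As a quick sketch of split, here's how you could break that string apart and pick one element with select; the string itself is just the example values from the slide.

Outputs:
  ThirdCat:
    Value: !Select [2, !Split ['|', 'roffle|truffles|penny|winky']]   # index 2 returns 'penny'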
Okay so moving on, next we have base64 and sub.
Now this is an example of user data which I'm going to be covering soon.
Essentially it's a script that you provide to instances which allows them to perform auto configuration.
Now this user data needs to be provided using base64 encoded text but as you can see this isn't the case.
It's simply using plain text.
The base64 function accepts normal text and it encodes it and then passes the output which is base64 into an instance which is the format that that instance needs.
So if you're operating with any AWS resources which require base64 then you can use this function, provide the function with some normal text and it will output the base64 encoded text that you need.
Now the substitute or sub function allows you to do replacements on variables.
So for example this is a variable.
This is the instance ID attribute of the instance logical resource.
By putting it in this format, ${VariableName}, the sub function will replace it with the actual runtime value, the instance ID.
Now there are some restrictions.
You can't do self references.
So in this case this user data could only reference the instance ID of another instance.
This example is actually an invalid one which I wanted to show you visually.
The formatting is correct but it actually shows a self reference.
How can we pass in an attribute of a physical instance before the physical instance is created?
So this is not valid but during an upcoming demo video I'm going to be covering how to use these effectively and you're going to get plenty of practical experience of using the sub function within your own cloud formation templates.
Now the format of using things in substitutions is either the left one for a parameter, the middle one for the primary value of a resource and this is like using the ref function and the right is the format for using attributes.
The logical resource name and then the attribute name and again don't worry you're going to get plenty of practical experience of using this in an upcoming demo video.
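A hedged sketch of user data using both functions might look like this; the Database resource referenced here is hypothetical and is only there to show the attribute substitution format, not a self reference.

UserData:
  Fn::Base64: !Sub |
    #!/bin/bash -xe
    echo "Bootstrapping stack ${AWS::StackName} in ${AWS::Region}" >> /var/log/build.log
    echo "DB endpoint: ${Database.Endpoint.Address}" >> /etc/motd
    # the line above substitutes an attribute of a hypothetical Database resource defined elsewhere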
The last function I want to talk about is actually a really cool feature of cloud formation which makes networking much easier.
So when you're creating VPCs you have to provide a CIDR range for the VPC to use.
Inside that CIDR range you've historically had to manually assign ranges for the subnets inside that VPC.
With this function you can reference the CIDR range, in this example of a VPC.
You can tell it how many subnets you want to allocate and then finally you can tell it the size of those subnets, and from that it will output a list of CIDR ranges which you can use for subnets within a VPC, and you can combine this with the select function to allocate those to subnets individually.
So in both of these examples, subnet one and subnet two, what we're doing is using the CIDR function, passing it the CIDR range of the VPC, telling it we want 16 ranges in total and giving it the size for those ranges.
Both of them output a list of possible ranges to use and we're selecting the first one, so index zero, for subnet one and the second one, so index one, for subnet two.
And this is an example of how we can assign CIDR ranges to subnets in a more automated way.
It assists again in making templates more portable by auto assigning things.
Now it does have its limitations, it's all based on the parent VPC CIDR range and it can't allocate or unallocate ranges, but luckily I'm going to show you some really cool techniques for how you can fix that in later videos of this series.
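Here's a rough sketch of that pattern, assuming a VPC logical resource with a /16 range defined elsewhere in the same template; 12 bits for the subnet host portion gives /20 subnets.

Subnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref VPC
    CidrBlock: !Select [0, !Cidr [!GetAtt VPC.CidrBlock, 16, 12]]   # first /20 from the VPC range
Subnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref VPC
    CidrBlock: !Select [1, !Cidr [!GetAtt VPC.CidrBlock, 16, 12]]   # second /20 from the VPC range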
Now at this point that's everything I wanted to cover, I wanted to quickly go through some common intrinsic functions that you might use while you're creating cloud formation templates.
Now very soon there's going to be another demo video where you're going to get some practical experience to all of the theoretical concepts that I've been talking about in this block of theory videos.
So don't worry we start with a theory, we make sure that you're entirely comfortable with that and then you'll get the opportunity to practice that in a demo video.
Now that's everything that I wanted to cover in this video so as always please go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
Welcome back and in this video I want to talk about template and pseudo parameters, two types of parameters which can be used within CloudFormation templates and which can influence logical resources within those templates.
Now we've got a lot to cover so let's jump in and get started.
Parameters both template and pseudo parameters allow input.
They let external sources provide input into CloudFormation.
For template parameters this means that the human or automated process can provide input via the console, CLI or API when a stack is created or updated.
An example of this might be the size of the instance or the environment that the template is for.
So for example dev, test or prod.
Now parameters are defined inside a template along with the resources and the values for those parameters can be referenced within logical resources also within that template which allows them to influence the physical resources and/or the configuration of those physical resources when a template is used with a stack to provision AWS resources.
For every parameter that you define in a template you can provide configuration for that specific parameter.
You can define defaults for it so if no value is explicitly provided then that default applies.
You can define allowed values so maybe a list of instance types which are valid for the template.
You can define restrictions such as the minimum and maximum length or even allowed patterns.
You can also define the parameter as using no echo which is useful for passwords where you don't want the input to be visible when it's being typed.
And then finally each parameter can have a type.
You have simple ones like string, number or list but you also have AWS specific ones which allow you to specify a VPC from a list or subnets from a list and some of these can be populated so from the console UI perspective they're interactive based on the region and the account that you're applying the template within.
Now you're going to be getting some practical experience of working with parameters in a future demo video.
For now I just want you to have a basic awareness.
Now visually parameter architecture looks like this.
Parameters start by being defined within a cloud formation template and let's use this as an example.
I've defined two parameters here so instance type which is a string and it has a default of t3.micro together with a set of three allowed values.
I've also provided a description which makes it easier to use from the console UI and then second we have instance AMI ID which is a normal string type parameter with no allowed values so this is simple free text.
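In YAML those two parameters might be declared roughly like this; the specific allowed values shown are just example instance types, not anything prescribed.

Parameters:
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues:
      - t3.micro
      - t3.small
      - t3.medium
    Description: The instance type to use for the EC2 instance
  InstanceAmiId:
    Type: String
    Description: The AMI ID to use for the EC2 instance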
Now this example is part of a wider template which includes an EC2 logical resource, so if we load this into CloudFormation via the console UI then this is what we might see: a user interface presentation of those parameters.
At this stage we enter values or we accept the default values and we move through the process of creating the stack.
Conceptually this means that the template defines things based on the resources declared within it and the interactive values provided via the parameters so both of these are combined and are used to create the stack.
It means that the stack creates physical resources based both on the logical resources and the effect on them which the parameters have.
In this case based on the parameter values we would create an instance with one of three sizes and use a certain Amazon machine image.
Now most of this applies to both template and pseudo parameters.
The thing unique to template parameters is that the person or process provides the values into CloudFormation, either explicitly or by implicitly accepting the defaults.
Pseudo parameters can be treated in the same way but they're provided by AWS so let's have a look at that visually.
Now we start off with a familiar architecture a cloud formation template is used to create a cloud formation stack.
The template could be using the template parameters I've just been talking about.
You don't have to pick one type over the other.
Template and pseudo parameters can be used in a complementary way.
With pseudo parameters what happens is that AWS make available parameters which can be referenced and these exist even if you don't define them in the parameters section of the template.
So conceptually think of these as being injected by AWS into the template and stack.
Now an example of a pseudo parameter is AWS::Region, and the value of this parameter always matches whichever region a template is being applied in to create a stack.
In this example us-east-1.
Other pseudo parameters include AWS::StackId which matches the unique ID of the stack, AWS::StackName which matches the name of the stack, and AWS::AccountId which is populated with the account ID of the account that the stack is being created in.
So pseudo parameters think of them like template parameters but instead of being populated by a human or a process when creating the stack they're populated by AWS.
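As a quick sketch, pseudo parameters are referenced just like any other parameter, for example inside a sub expression.

Outputs:
  StackInfo:
    Value: !Sub 'Stack ${AWS::StackName} (${AWS::StackId}) created in ${AWS::Region} within account ${AWS::AccountId}'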
Now both types of parameters are useful in ensuring that a template is portable and can adjust based on input from the person or process creating the stack.
Static templates are much less flexible and this functionality goes a long way to removing the negative aspects of static templates.
From a best practice perspective you should aim to minimize the number of parameters which you have which require explicit input.
Now this means wherever possible using defaults and where possible getting values from AWS rather than whoever is implementing the stack.
In the videos which follow as well as learning more about the features of cloud formation which help with template portability you're going to get the chance to experiment with all of those features in some demos.
I'm introducing the theory first and then you'll get the chance to experience it yourself.
Now with that being said that's everything that I wanted to cover in this video so go ahead and complete the video and when you're ready I look forward to you joining me in the next.
-
Welcome back and in this video I want to cover two things which are at the core of CloudFormation as a product.
Physical resources and logical resources.
In covering both of those you're also going to be learning about templates and stacks.
So this will be a good video to cover the basics of CloudFormation.
Now we've got a lot to cover so let's jump in and get started.
CloudFormation begins with a template which is a document written in either YAML or JSON, both of which you should now have an awareness of.
And defined within a CloudFormation template are logical resources.
Think of logical resources as what you want to create but not how you want them created.
When using CloudFormation you focus on the what and let CloudFormation deal with the how.
CloudFormation templates can be used to create CloudFormation stacks.
And a template can be used to create one stack, a hundred stacks or twenty stacks in different regions.
The idea is that one template defines what resources you want.
And defining good templates means a template can be used many times in many accounts in many regions.
And we refer to that as a portable template.
The initial job of a stack is to create physical resources based on the logical resources defined within the template.
For every logical resource in a template when a stack is created a physical resource is also created.
If a stack's template is updated in some way and then the stack itself is updated the physical resources are also changed.
The stack keeps the logical and physical resources in sync.
If a stack is deleted then normally the physical resources are also deleted.
So think about CloudFormation as a product which looks at a template specifically the logical resources within a template.
And then it creates, modifies or deletes physical resources as required.
So visually it looks like this.
This is a CloudFormation template and this one has been written using YAML.
The template contains logical resources.
In this example instance is the name of the logical resource and this is the type.
So AWS::EC2::Instance.
Now logical resources are generally going to have properties which are used by CloudFormation when configuring the actual physical resources.
In this example this sets the Amazon machine image to use, the type of the instance and the SSH key pair to use when connecting to the instance.
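A minimal sketch of that logical resource might look like this in YAML; the AMI ID and key name are placeholders.

Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0   # placeholder Amazon Machine Image ID
      InstanceType: t3.micro
      KeyName: my-ssh-key              # placeholder SSH key pair name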
So the collection of logical resources and other things which I'll be covering in future videos is called a CloudFormation template.
And this template can be used to create one or many CloudFormation stacks.
And a stack when created also creates physical resources based on the logical resources.
So this means because we've set the AMI to use in the template and the SSH key to use these will be used when creating the physical resource.
In this case an EC2 instance.
So this physical EC2 instance is a representation of the logical resource defined in the CloudFormation template.
Now the stack will also react to template changes to update or delete physical resources as required.
Once a logical resource defined inside the CloudFormation template moves into a create complete state, meaning that the physical resource has been created, then the logical resource can be referenced by other logical resources to retrieve various physical configuration elements or IDs.
For example in this case the physical machine ID of the EC2 instance.
So in summary logical resources are contained inside CloudFormation templates.
CloudFormation templates are used to create CloudFormation stacks and the stacks job is to create, update or delete physical resources based on what's contained in that template.
CloudFormation as a product aims to keep the two in sync, so physical and logical resources.
So when you use a template to create a stack, CloudFormation will scan the template and create a stack with logical resources inside and then create physical resources which match those logical resources.
If you update the template then you can use it to update that same stack.
When you do that the stack's logical resources will change, either new logical resources will be added or existing ones are updated or deleted and CloudFormation will perform the same actions on the physical resources.
So adding new ones, updating existing ones or removing physical resources entirely.
If you delete a stack its logical resources are also deleted which causes it to delete the matching physical resources.
CloudFormation is a really powerful tool which you'll be using extensively in the real world and this is the same whether you're a solutions architect, a developer or an engineer.
I use CloudFormation constantly in all of the AWS courses that I create and so by taking the courses you'll be gaining a lot of practical and theory understanding of how CloudFormation works.
Now if you're taking any of my courses with my CloudFormation mini deep dive then you'll be learning even more.
By talking about every important aspect of CloudFormation that's relevant for the course that you're taking as well as giving you plenty of practical examples.
CloudFormation lets you automate infrastructure.
Imagine that you host WordPress blogs.
You can use one template to create one, ten, a hundred or more deployments rather than having to create a hundred individual sites.
CloudFormation can also be used as part of change management.
You can store templates in source code repositories, add changes and get approval before applying them.
Or they can be used to just quickly spin up one-off deployments and if you're taking any of my AWS courses you'll be seeing that I'll be using CloudFormation extensively as part of any of the practical demo lessons in the course.
We'll be using templates to spin up any of the infrastructure that will support the demo lesson that you're going to be taking.
Now that's all of the theory that I wanted to cover about physical and logical resources within CloudFormation.
It is a fairly theoretical topic but you need to understand what a physical resource is, what a logical resource is and how the two relate together as far as they're used within CloudFormation.
Now at this point that's everything that I wanted to cover in this video, so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.
-
Welcome back and in this lesson I want to talk briefly about Amazon Guard Duty.
Now this is something which you only need detailed knowledge of for the security specialty stream of training.
Now I'll try to keep this lesson as efficient as possible so let's jump in and get started.
Now it's important at the outset that you know what Guard Duty is and what makes it special.
So it's a security service but specifically it's a continuous security monitoring service.
This means once enabled it's running all the time trying to protect your account and resources from any security issues.
Now the way that it works is that it can be integrated with supported data sources and I'll talk about this more on the next screen.
It's constantly reviewing those data sources for anything occurring within the account and it also uses artificial intelligence and machine learning plus threat intelligence feeds.
Now the aim of the product is to identify any unexpected or unauthorized activity on the account.
Guard Duty is doing this in an intelligent way so you aren't having to identify things you usually do or define what normal activity is.
It attempts to learn this on its own and using threat intelligence feeds it tries to spot odd or worrying activity as it occurs on the account.
Now you can influence this, for example by whitelisting IPs and influencing what it sees as okay behaviour, but the whole point of the product is that on the whole it learns patterns of what happens normally within any managed accounts.
Now if it finds something which logically is called a finding then it can be configured to notify somebody or initiate an event driven process of protection and/or remediation.
Now this might be a lambda function performing some kind of remediation or an event driven workflow via cloud watch events but Guard Duty can be part of an automatic event driven security response and that's really cool.
What's even more awesome is that it actually supports multiple accounts via a master and member account architecture.
When you enable Guard Duty you're essentially making the account that you enable it in the master Guard Duty account and then you can invite other AWS accounts and if they accept they become member Guard Duty accounts meaning the product supports a single location for managing multiple AWS accounts.
Now architecturally the product looks like this.
First we have Guard Duty and Guard Duty receives logs from supported data sources.
At the time of creating this lesson this includes DNS logs from Route 53 showing DNS requests, VPC flow logs showing traffic metadata for any traffic flowing through a VPC, CloudTrail event logs showing any API calls within the account, CloudTrail management events which cover any control plane level events, and then finally CloudTrail S3 data events which cover any interactions with objects within S3.
Now all of those are ingested together with various threat intelligence feeds and are used to generate findings which show any unusual or unexpected behavior.
These findings can be sent to CloudWatch Events, now known as EventBridge, which can be used to handle event driven notification and automatic remediation.
So EventBridge can use SNS for notifications to any team members or external security management systems, or it can invoke Lambda functions which can interact with AWS APIs, products and services to help automatically remediate any security issues, maybe to add an explicit deny rule to a network ACL if there's a potential intrusion.
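As a hedged sketch of that event-driven flow, an EventBridge rule matching GuardDuty findings and targeting a remediation Lambda function might look something like this in CloudFormation; the Lambda function itself is hypothetical and would be defined elsewhere.

GuardDutyFindingRule:
  Type: AWS::Events::Rule
  Properties:
    Description: Route GuardDuty findings to an automated remediation function
    EventPattern:
      source:
        - aws.guardduty
      detail-type:
        - GuardDuty Finding
    Targets:
      - Arn: !GetAtt RemediationFunction.Arn   # hypothetical Lambda defined elsewhere in the template
        Id: GuardDutyRemediation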
Now that's pretty much all you need to know for the exam and to get started using the product in the real world.
Now thanks for watching go ahead and complete this lesson and then when you're ready I'll look forward to you joining me in the next.
-
Welcome back and in this video I'm going to be talking about Amazon Inspector.
Now this is a service which is really simple to use and it only features in a relatively minor way on most of the AWS exams.
So this is a fundamental video.
If appropriate for the course that you're taking, I'll be going into much more detail in separate videos.
For this video you just need to have a basic awareness of what this product does and how to use it effectively.
Now nearly all of my lessons contain visuals because I find this helps students to learn better.
But in this case Inspector is just one of those services which is easy to understand but very detailed in terms of what it does.
And unfortunately this means it's going to be a text heavy lesson.
So let's jump in and get started.
Amazon Inspector is a product designed to check EC2 instances, the operating systems running on those instances as well as container workloads for any vulnerabilities or deviations against best practice.
The idea is to run an assessment of varying lengths, say 15 minutes, 1, 8 or 12 hours and even 1 day and identify any unusual traffic or configurations which put applications on the instances, the instances themselves or containers at risk.
Now at the end of this process the product will provide you with a review of findings ordered by severity.
In the exam if you see anything about a security report then think Inspector.
But remember it's checking instances, their operating systems, containers and any other networking components involved.
Now Inspector can work with two main types of assessments.
A network assessment can be conducted without using an Inspector agent but adding an agent provides additional richer information.
It can also run a network and/or host assessment which does use an agent.
The host assessment looks at OS level vulnerabilities and this needs access to inside of the instance, so the instance OS and this requires an agent.
With Inspector rules packages determine what is checked.
The first package is network reachability, which can be done with no agent, or with an agent for additional richer information.
This checks how an instance or group of instances is exposed to public networks, so it checks end-to-end reachability.
So EC2, application load balancers, Direct Connect, elastic load balancers, network interfaces, internet gateways, access control lists, route tables, security groups.
It even checks subnet and VPC configuration and even exposure from virtual private gateways and any VPC peering.
The network reachability rules package returns the following types of findings.
First, for recognized ports, so well-known ports, it confirms whether the port is 'recognized port with listener', meaning it's exposed to public networks and the operating system is listening on that port; 'recognized port no listener', where it's exposed to the internet but with nothing listening; or, if you don't use an agent, 'recognized port no agent', where the port is exposed but there is no agent to check whether the operating system is listening, and this is why using an agent always adds more information versus no agent.
Now lastly, it can identify any unrecognized ports which are exposed with listeners.
So for the exam, this is what the network reachability rules package does.
You might see that term, you need to know what it does, or it might request you to suggest a product which can do this type of analysis and then question whether an agent is required.
And so these are all key points to understand.
We also have rules packages which do require an agent, so host assessments, and all of these are really, really important to remember for the exam.
These are pure keywords, so easy to remember but massively important.
First, there is the common vulnerabilities and exposures or CVE package, and CVE is a database of known cyber security vulnerabilities, each of which is assigned a CVE number, and this package checks against those.
If you see CVE in the exam, think Inspector.
And a report will include any CVE IDs for anything found on the instances or containers.
Next, we have the Center for Internet Security or CIS Benchmarks.
The formal definition is the CIS Security Benchmark Program provides well-defined, unbiased, consensus-based industry best practices to help organizations assess and improve their security.
This rules package checks against that.
So again, if you see CIS as an exam question, think Inspector.
Then finally, we have Security Best Practices for Inspector, which is just a collection of best practices provided by Amazon, including things like disabling root login over SSH, using only modern version numbers for SSH, password complexity checks, and permissions on certain folders.
Again, if you see anything of this nature in the exam, think Inspector.
And that really is everything that you need to know at this fundamental level for this product.
Again, if you're studying for a particular exam which requires more information, I will have additional videos covering everything else in depth.
This is just a fundamental 101 level lesson.
Now, you'll know by now I do hate teaching based on just keywords, but this is one of those outliers where you don't really need to know all of the details.
But I don't want you dropping exam marks because you don't know any of these really valuable keywords.
And again, I'm just going to repeat this one more time.
If applicable for the course that you're studying, I'll be covering Inspector in much more detail in other dedicated lessons.
For now, though, that is everything I wanted to cover.
So go ahead and complete this video.
And when you're ready, I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome to this lesson where we're going to be covering Amazon Macie.
Now we have a lot to cover so let's jump in and get started.
So what is Macie?
Well, it's a data security and data privacy service.
By now you'll understand the architecture of the Simple Storage Service, known as S3.
It's one of AWS's most popular services and it can host huge quantities of large or small objects at scale.
It can also be made public, and for some time it's been a constant source of risk within organizations, because data can be leaked if the service is misconfigured.
So Macie is a service which can be used to discover, monitor and protect data which is stored within S3 buckets.
If an organization wants to control the security of its data, it's critical that it has an awareness of where that data is and exactly what it contains.
So once enabled and pointed at buckets within your AWS account or accounts, Macie can get to work discovering data, and this might mean data which is classed as personally identifiable information (PII) or personal health information (PHI), as well as financial data and many other types of data.
Now these high level categories include a huge range of data which you personally will have day to day familiarity with.
Things like AWS access keys, SSH keys, PGP keys or bank account numbers, credit card numbers or expiry dates, health insurance numbers, birth dates, drivers license numbers, national insurance numbers, passport numbers, addresses and much more.
It's the first job of Macie to identify and inventory this data.
So by using Macie you'll know what you have, what it contains and where it is.
Now the way that it does this is using data identifiers.
Think of these like rules which your objects and their contents are assessed against and there are two types of data identifiers.
Managed data identifiers and custom data identifiers.
Now managed data identifiers are built into the product.
They use a combination of criteria and techniques including machine learning and pattern matching to analyse the data that you specify.
They're designed to detect sensitive data types for many countries and regions including multiple types of personally identifiable information, personal health information and financial data.
And this type of identifier can be used to detect almost all common types of sensitive data that you might need to manage within your organisation.
Now you can also build custom data identifiers for your business.
These are proprietary so you can look for specific data which your business needs to identify and control.
For example, you might use a regular expression, known as a regex, to search for certain patterns of specific text within your business.
Maybe employee IDs or performance reports.
With Macie you create discovery jobs which use these identifiers and look for anything matching within buckets.
If anything is found, these jobs generate findings, and you can view these findings interactively, or they can be used as part of integrations with other AWS services.
For example Security Hub, or finding events can be generated and passed into EventBridge, where they can then be used for automatic event-driven remediation.
So it's a super powerful architecture.
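To make that a little more concrete, here's a minimal sketch of creating a discovery job with boto3. The account ID, bucket name, schedule and custom identifier ID are all placeholders rather than anything specific to this lesson.

```python
import boto3

macie = boto3.client("macie2")

# A weekly discovery job over one bucket, using all managed data identifiers
# plus one custom identifier (IDs and names here are placeholders)
job = macie.create_classification_job(
    name="weekly-sensitive-data-scan",
    jobType="SCHEDULED",
    scheduleFrequency={"weeklySchedule": {"dayOfWeek": "MONDAY"}},
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": "111111111111", "buckets": ["example-data-bucket"]}
        ]
    },
    managedDataIdentifierSelector="ALL",
    customDataIdentifierIds=["example-custom-identifier-id"],
)
print(job["jobId"])
```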
Now one final thing which you need to understand before we review the architecture visually is that Macie uses a multi-account architecture.
One account is the administrator account, and that can be used to manage Macie within member accounts.
And this multi-account structure can be set up either using AWS Organizations or by explicitly inviting accounts.
And once invited, buckets across all accounts within the Macie organization can be evaluated in the same way.
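As a rough sketch of how that's set up with boto3, assuming placeholder account IDs and email, the Organizations route and the invitation route look like this.

```python
import boto3

macie = boto3.client("macie2")

# Option 1: with AWS Organizations, delegate a Macie administrator account
# (run from the organization's management account; the ID is a placeholder)
macie.enable_organization_admin_account(adminAccountId="222222222222")

# Option 2: without Organizations, explicitly invite a member account
# (run from the Macie administrator account)
macie.create_member(account={"accountId": "333333333333", "email": "owner@example.com"})
macie.create_invitations(accountIds=["333333333333"])
```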
Now let's just take a second to review the architecture visually.
We start with one or more S3 buckets and then the Macie service itself, and then we create a discovery job.
And within the discovery job we can specify which buckets we want to analyse, which means detecting and classifying data within those buckets.
The discovery job has a schedule so this controls when it runs and how frequently it runs and then the job uses a combination of managed data identifiers and custom data identifiers.
And these are the things which actually identify and classify the types of data that Macie is locating.
So it's these things which are the important part of the whole process.
Now as an output of the discovery job, findings are generated, and these can be viewed either interactively using the console, or, and this is the more common use case, they can be used with EventBridge in the form of finding events, which can then be delivered to other AWS services.
And this is commonly used for integration or for event driven remediation in this example where a Lambda function can receive the event and can perform some kind of automatic fix based on the finding.
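As a sketch of that integration, assuming the detail type "Macie Finding" and a placeholder Lambda ARN, the EventBridge side might look like this; the Lambda function would also need a resource-based permission allowing EventBridge to invoke it.

```python
import json
import boto3

events = boto3.client("events")

# Rule on the default event bus matching Macie finding events
events.put_rule(
    Name="macie-findings-to-lambda",
    EventPattern=json.dumps({
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
    }),
    State="ENABLED",
)

# Send matched findings to a remediation Lambda function (placeholder ARN)
events.put_targets(
    Rule="macie-findings-to-lambda",
    Targets=[{
        "Id": "remediation-function",
        "Arn": "arn:aws:lambda:us-east-1:111111111111:function:remediate-finding",
    }],
)
```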
So at a high level that's how the architecture looks.
And before we finish up with this lesson I want to explore a number of other important elements of the service and we're going to start with looking in more detail at the managed and custom data identifiers.
To discover sensitive data within Amazon Macie, you create and run data discovery jobs.
A data discovery job analyzes objects within S3 buckets to determine whether the objects contain sensitive data.
And the way that it does this is via data identifiers.
First we have managed data identifiers and these are created and managed by AWS.
And as I mentioned earlier in this lesson they can be used to identify a growing list of sensitive data types.
Now I've included a link attached to this lesson which details the full range of data which is matched by this type of identifier.
But it's things like various credentials, financial data, credit cards, bank details and more.
Things like health data or anything personally identifying such as addresses, passports, drivers licenses and much much more.
It's a pretty comprehensive list so it's worth checking out the link that's included with this lesson which gives a full overview.
In addition to this anyone can create custom data identifiers.
Now the foundation of these are regular expressions which define a pattern to match within data.
This one for instance matches any data which contains the letters A through to Z and then a dash and then eight digits.
Anything that you can define using regular expressions you can match using custom data identifiers.
And these are generally used for data patterns which are custom to your organization as with this example of an employee ID.
You can optionally add keywords to custom data identifiers which must occur within a definable proximity to the pattern matched by the regular expressions.
And this definable distance is called the maximum match distance.
And then finally you can also include ignore words.
So if the regex matches something but an ignore word is also present, the match is ignored.
So keywords, maximum match distance and ignore words are all refiners.
You start with a regex pattern, and these refinements influence whether something is classified as a match.
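Here's a minimal sketch of creating one with boto3, using a hypothetical employee ID pattern along with the keyword, maximum match distance and ignore word refiners just discussed.

```python
import boto3

macie = boto3.client("macie2")

identifier = macie.create_custom_data_identifier(
    name="employee-id",
    # Hypothetical pattern: letters A-Z, a dash, then eight digits
    regex=r"[A-Z]+-\d{8}",
    # Keywords must appear within maximumMatchDistance characters of the
    # regex match for it to count as sensitive data
    keywords=["employee id", "staff id"],
    maximumMatchDistance=50,
    # Matches that sit alongside an ignore word are suppressed
    ignoreWords=["EXAMPLE-00000000"],
)
print(identifier["customDataIdentifierId"])
```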
So these identifiers run in addition to the built-in checks that Macie performs, and then findings are generated.
And Macie will produce two types of findings.
Policy findings and sensitive data findings.
Macie generates policy findings when the policies or settings for an S3 bucket are changed in a way that reduces the security of the bucket or its objects, but crucially, only after Macie is enabled.
For example, if default encryption on a bucket was enabled when you enabled Macie and default encryption is later disabled on that bucket, then this is highlighted as a policy finding.
So that's an example of a policy finding.
Macie generates the other type of finding, a sensitive data finding, when it discovers sensitive data in S3 objects that you configure it to analyze.
And it determines what is sensitive data based on the jobs and identifiers which you configure and which I've just stepped through.
So some examples of policy findings are S3 block public access disabled which is triggered if the block public access settings on a bucket are disabled.
Another is S3 bucket encryption disabled which is triggered logically when encryption on a bucket is disabled.
Another is S3 bucket public, which is triggered when bucket policy or ACL changes are made which make a bucket public, and another is S3 bucket shared externally, which is triggered when a bucket policy or ACL allows an AWS account outside the Macie organization access to the bucket.
So these are all policy changes which Macie decides reduce the security of a bucket or the objects in that bucket, and so they trigger policy findings.
So these are called policy findings, and there are more of them; I've included a link attached to this lesson which details all of them, and it's worth a look through just to become familiar with all of the different things that Macie can identify.
Now examples of sensitive data findings include these and it's worth pointing out that there are many more of them.
Again I've included a comprehensive list which is attached to this lesson but for now let's just focus on these important examples.
First we have S3 object credentials, and this matches any exposed SSH keys or AWS access keys that Macie can locate.
We've also got S3 object custom identifier and this matches anything defined within custom data identifiers.
We have S3 object financial which matches credit card numbers or bank account numbers and much more.
We have S3 object multiple which occurs when more than one thing is identified.
We have S3 object personal which covers personally identifiable information such as full names, mailing addresses, personal health information such as health insurance or medical identification numbers or combinations of those.
Now this isn't an exhaustive list.
Again I've included a link attached to this lesson which gives you a full overview.
And that, at a high level, is Macie.
It's a useful tool which you'll need to understand for the exam.
If you see any questions regarding the classification of data within S3, so identifying data, discovering data or reacting to sensitive data automatically, then Macie is probably the product to use.
Now that's everything I wanted to cover within this theory lesson.
If you're doing any of my courses where practical knowledge of Macie is required, then there's going to be a demo lesson immediately following this one.
If not then this theory is all that you'll need.
So at this point this is the end of the lesson.
Thanks for watching.
Go ahead and complete this video and when you're ready I'll look forward to you joining me in the next.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about AWS Config.
Now let's just jump in and get started because we've got a lot to cover.
AWS Config is an interesting service because people often misunderstand what it does.
This is especially important within exam situations where you don't have the benefit of Google and have to make architectural decisions quickly.
Now AWS Config has two main jobs.
Its primary function is to record changes over time on resources within an AWS account.
Once enabled, the configuration of every resource in the account is monitored.
Every time a resource's configuration changes, a configuration item is created which stores the configuration of that resource at a specific point in time.
The information which is stored is the configuration of the resource, the relationship to other resources and who makes any changes.
So for example, if you had a security group attached to an instance and you added a rule to that security group, then it would track the pre-change state, the post-change state, the fact that you changed it and the fact that it was attached to that EC2 instance.
Now this makes AWS Config great for auditing changes and for checking if resources are compliant with standards defined by your organization.
The most important thing to understand about AWS Config is that it doesn't prevent changes happening.
It's not a permissions product or a protection product.
Even if you define standards for resources, it can check compliance against those standards but it doesn't prevent you from breaching those standards and creating non-compliant resources.
An example of compliance might be a certain set of allowed ports within security groups.
You can add additional ports exposing an instance to a certain amount of risk.
Now AWS Config won't stop you, but that non-compliance, that additional port, will be identified.
Now Config is a regional service, so when enabled it monitors changes within a particular AWS region in a particular AWS account, but it can be configured for cross-region and cross-account aggregation.
It can also generate notifications via SNS, and it can generate events via EventBridge and Lambda when resources change in terms of their compliance state, so while AWS Config won't prevent you changing something, it can be used for automatic remediation.
Now the product stores all of the configuration data and changes in a consistent format within the S3 config bucket, and it allows you to access that data, so all of the configuration history of all of the resources, either directly from that bucket or using the AWS Config APIs.
Now there are two sides to AWS Config: the features which are standard, and the parts of the product which are optional.
Now the standard part is on the left and the optional part is on the right.
So starting on the left we have some account resources and we have AWS Config.
To use the product we have to enable it, and this enables the recorder functionality, which takes configuration information for all of the resources and stores it in an S3 bucket, the config bucket, and this is all part of the standard functionality provided by the product.
Now you could just enable all of this functionality and leave this as it is.
This would allow you to record and review all changes to resources over time.
Every time a change happens a configuration item would be generated and all of these for all resources would be stored in a standard format in the config bucket.
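As a rough sketch of enabling that standard functionality with boto3, assuming a pre-existing IAM role and config bucket (both placeholders), it looks something like this.

```python
import boto3

config = boto3.client("config")

# Enable the recorder for all supported resource types in this region
config.put_configuration_recorder(
    ConfigurationRecorder={
        "name": "default",
        "roleARN": "arn:aws:iam::111111111111:role/aws-config-role",  # placeholder
        "recordingGroup": {"allSupported": True, "includeGlobalResourceTypes": True},
    }
)

# Deliver configuration items and history to the config bucket
# (the bucket policy must already allow AWS Config to write to it)
config.put_delivery_channel(
    DeliveryChannel={"name": "default", "s3BucketName": "example-config-bucket"}
)

config.start_configuration_recorder(ConfigurationRecorderName="default")
```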
But we can do a lot more with the product, and this is where the real power of AWS Config comes from, because we can use Config rules.
Now Config rules are either AWS managed, or you can define your own which use Lambda.
What happens is that these rules evaluate resources against a defined standard.
Resources based on these rules are either compliant or non-compliant based on if they meet criteria specified within the config rule.
Now custom rules use Lambda to evaluate if resources match criteria.
The Lambda function does the evaluation using whatever logic you code and then returns the result back to AWS Config.
AWS Config can then notify or work with other products for automatic remediation.
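As a minimal sketch of what that Lambda function might look like, assuming a hypothetical check that security group names start with "approved-", the handler parses the configuration item and reports compliance back with put_evaluations.

```python
import json
import boto3

config = boto3.client("config")

def lambda_handler(event, context):
    # AWS Config passes the changed resource's configuration item as a JSON
    # string inside invokingEvent, plus a resultToken for reporting back
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]

    # Hypothetical check: security groups must be named "approved-*"
    compliant = (item["resourceType"] != "AWS::EC2::SecurityGroup"
                 or item.get("resourceName", "").startswith("approved-"))

    config.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": "COMPLIANT" if compliant else "NON_COMPLIANT",
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
```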
For example, it can use SNS to send either a stream of changes or compliance notifications and these will either go to human operators or other applications to deal with.
In addition though, you can integrate AWS Config with EventBridge.
So for any changes in the state of Config rules, whenever anything becomes compliant or non-compliant, an event can be sent to EventBridge, and then EventBridge can be used to invoke Lambda functions to perform automatic remediation of any changes.
So to fix the problems automatically.
Now this isn't strictly part of AWS Config.
You're essentially using EventBridge to send any events from AWS Config to targets to perform this automatic remediation.
You can also fix these types of configuration changes using SSM.
So AWS Config can integrate with Systems Manager and apply fixes to remediate any issues.
But Lambda can be more flexible for account level things whereas SSM can be effective for anything relating to the configuration of instances.
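As a hedged sketch of that SSM integration with boto3, this attaches an AWS-owned automation document as the remediation action for a managed rule. The rule name, document name and parameter names here are illustrative and worth double-checking against the document you actually use.

```python
import boto3

config = boto3.client("config")

config.put_remediation_configurations(
    RemediationConfigurations=[{
        "ConfigRuleName": "restricted-ssh",                     # example managed rule
        "TargetType": "SSM_DOCUMENT",
        "TargetId": "AWS-DisablePublicAccessForSecurityGroup",  # example SSM document
        "Parameters": {
            # Pass the non-compliant resource's ID into the document
            "GroupId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
            "AutomationAssumeRole": {"StaticValue": {"Values": [
                "arn:aws:iam::111111111111:role/config-remediation-role"  # placeholder
            ]}},
        },
        "Automatic": True,               # remediate without manual approval
        "MaximumAutomaticAttempts": 3,
        "RetryAttemptSeconds": 60,
    }]
)
```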
Now that's all of the theory that I wanted to cover in this lesson.
Go ahead and complete this lesson and then when you're ready I look forward to you joining me in the next lesson.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about CloudHSM.
Now this is a product which is similar to KMS in terms of the functionality which it provides, in that it's an appliance which creates, manages and secures cryptographic material or keys.
Now there are a few key differences and you need to know these differences because it will help you decide on when to use KMS and when to use CloudHSM.
And you might face an exam question where you need to select between these two.
So let's jump in and get started.
Now I promised you at the start of the course I wouldn't use facts and figures in lessons unless absolutely required.
You shouldn't have to remember lots of different facts and figures unless they influence the architecture.
Now this unfortunately is going to be one of the lessons where I do have to introduce some keywords that you simply need to remember.
Because in this lesson the detail, the difference between CloudHSM and KMS really matters.
Now let's start by quickly talking about KMS.
KMS is the key management service within AWS.
So it's used essentially for encryption within AWS and it integrates with other AWS products.
So it can generate keys, it can manage keys, other AWS services integrate with it for their encryption.
But it has one security concern, at least if you operate in a really demanding security environment.
And that's that it's a shared service.
While your part of KMS is isolated, under the covers you're using a service which other accounts within AWS also use.
What's more, while the permissions within AWS are strict, AWS do have a certain level of access to the KMS product.
They manage the hardware and the software of the systems which provide the KMS product to you as a customer.
Now behind the scenes KMS uses what's called a HSM which stands for Hardware Security Module.
And these are actually industry standard pieces of hardware which are designed to manage keys and perform cryptographic operations.
Now you can actually run your own HSM on-premise.
Cloud HSM is essentially a true single tenant HSM that's hosted within the AWS cloud.
So if you hear the term HSM mentioned, it could refer to both Cloud HSM which is hosted by AWS or an on-premise HSM device.
Now specifically focusing on Cloud HSM, AWS provision it and they're responsible for hardware maintenance.
But they have no access to the part of the unit where the keys are stored and managed.
It's actually a physically tamper resistant piece of hardware.
So it's not something that they can gain access to.
Generally if you as the customer lose access to a HSM, that's it, game over.
You can reprovision them but there's no easy way to recover data.
Now there's actually a well-known standard for these cryptographic modules.
It's called the Federal Information Processing Standard Publication 140-2.
You can easily determine the capability of any HSM modules based on their compliance with this standard.
And I've included a link in the lesson description with additional information.
But Cloud HSM is FIPS 140-2 Level 3 compliant and it's the Level 3 which really matters in the context of this lesson.
KMS in comparison is overall 140-2 Level 2 compliant and some of the areas of the KMS product are also compliant with Level 3.
Now this matters.
This is really important.
If you see an exam question or if you're in a real world production situation which requires 140-2 Level 3 overall, then you have to use Cloud HSM or your own on-premises HSM device.
And that's a fact that you really need to remember for the exam.
Another important distinction between KMS and Cloud HSM is how you access the product.
With KMS, all operations are performed with AWS standard APIs and all permissions are also controlled with IAM permissions.
Now Cloud HSM isn't so integrated with AWS and this is by design.
With Cloud HSM, you access it with industry standard APIs.
Now examples of this are PKCS#11, the Java Cryptography Extensions (JCE), or Microsoft's CryptoNG (CNG).
And I've highlighted the keywords that you should try to build up an association with Cloud HSM.
So if you see any of these keywords listed in the exam or in production situations, then you know you need a HSM appliance, either on-premise or Cloud HSM hosted by AWS.
Now it used to be that there was no real overlap between Cloud HSM and KMS.
They were completely different.
But more recently, you can use a feature of KMS called a custom key store.
And this custom key store can actually use Cloud HSM to provide this functionality, which means that you get many of the benefits with Cloud HSM together with the integration with AWS.
So when you're facing any exam questions, you still should be able to look for these keywords to distinguish between situations when you use KMS versus Cloud HSM.
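As a rough sketch of that with boto3, assuming you already have an active Cloud HSM cluster with at least two HSMs, its cluster certificate, and the kmsuser password, the flow is to create the key store, connect it, and then create keys whose material lives in the cluster. All of the IDs, file names and passwords here are placeholders.

```python
import boto3

kms = boto3.client("kms")

with open("customerCA.crt") as f:   # cluster trust anchor certificate (placeholder file)
    trust_anchor = f.read()

# Create a custom key store backed by an existing CloudHSM cluster
store = kms.create_custom_key_store(
    CustomKeyStoreName="hsm-backed-key-store",
    CloudHsmClusterId="cluster-1234567890a",
    TrustAnchorCertificate=trust_anchor,
    KeyStorePassword="example-kmsuser-password",
)

# The key store must be connected before keys can be created in it
# (connection is asynchronous, so in practice wait for it to report CONNECTED)
kms.connect_custom_key_store(CustomKeyStoreId=store["CustomKeyStoreId"])

# A KMS key whose key material is generated and stored in the CloudHSM cluster
kms.create_key(
    Origin="AWS_CLOUDHSM",
    CustomKeyStoreId=store["CustomKeyStoreId"],
    Description="Key backed by CloudHSM via a custom key store",
)
```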
Now just to summarize before we move on from this screen, I want you to focus on doing your best to remember all of the three key points that are highlighted with the exam power-up icon.
If you can remember those, then you should be in a really good position to determine whether to use KMS or Cloud HSM within exam questions.
Now I want to look at the architecture of Cloud HSM as a product, and I think it's best that we do that visually.
Now architecturally, Cloud HSMs are not actually deployed inside a VPC that you control.
They're deployed into an AWS managed Cloud HSM VPC that you have no visibility of.
So architecturally, this is how that looks.
So on the left, we've got a customer managed VPC.
On the right, we've got the Cloud HSM VPC that's managed by AWS.
We're using two availability zones, and inside the customer managed VPC, we've gone ahead and created two private subnets, one in availability zone A and one in availability zone B.
Now inside the Cloud HSM VPC, to achieve high availability, you need to deploy multiple HSMs and configure them as a cluster.
So a HSM by default is not a highly available device.
It's a physical network device that runs within one availability zone.
So in order to provide a fully highly available system, we need to create a cluster and have at least two HSMs in that cluster, one of them in every availability zone that you use within a VPC.
Now once HSM devices are configured to be in a cluster, then they replicate any keys, any policies, or any other important configuration between all of the HSM devices in that cluster.
So that's managed by default, by the appliances themselves.
That's not something that you need to configure.
So the HSMs operate from this AWS managed VPC, but they're injected into your customer managed VPC via elastic network interfaces.
So you get one elastic network interface for every HSM that's inside the cluster injected into your VPC.
Once these interfaces have been injected into your customer managed VPC, then any services which are also inside that VPC can utilize the HSM cluster by using these interfaces.
And if you want to achieve true high availability, then logically instances will need to be configured to load balance across all of the different interfaces.
Now also in order to utilize the cloud HSM devices, then a client needs to be installed on the EC2 instances, which are going to be configured to access the cloud HSM.
So this is a background process known as the cloud HSM client.
And this needs to be installed on the EC2 instance in order for it to access the HSM appliances.
And then once the cloud HSM client is installed, you can utilize industry standard APIs such as PKCS#11, JCE and CryptoNG to access the HSM cluster.
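To give that a rough shape in boto3, creating the cluster and the per-AZ HSMs looks something like the sketch below. The subnet IDs and availability zones are placeholders, and the cluster still needs to be initialized and activated with the Cloud HSM client tools afterwards.

```python
import boto3

hsm = boto3.client("cloudhsmv2")

# Create a cluster associated with one subnet per availability zone (placeholders)
cluster = hsm.create_cluster(
    HsmType="hsm1.medium",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],
)["Cluster"]

# Add one HSM per AZ; each HSM injects an elastic network interface into
# the customer managed VPC, which is what instances connect to
for az in ("us-east-1a", "us-east-1b"):
    hsm.create_hsm(ClusterId=cluster["ClusterId"], AvailabilityZone=az)

# Initialization (signing the cluster CSR) and activation are separate steps
# performed with the cluster certificate and the CloudHSM client tools
print(cluster["ClusterId"])
```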
Now a really important thing to understand about cloud HSM, because this is a distinguishing factor between it and KMS, is that while AWS do provision the HSM, they're actually partitioned and they're tamper resistant.
So AWS have no access to the area of the HSM appliances which store the keys.
Only you can control these.
You manage them, you're responsible for them.
Now AWS can perform things like software updates and other maintenance tasks, but these don't take place on the area of the HSM which is used to perform cryptographic operations.
Only you as an administrator or anyone that you delegate that to has the ability to interact with the secure area of the HSM devices.
Now before we finish this lesson, there are a few more things that I want to cover.
So these are points that I think you should be aware of.
So some of these are use cases, some of these are limitations that will help you select between using cloud HSM and using something like KMS.
So first, by default there's no native integration between cloud HSM and any AWS products.
So one example of this is that you can't use cloud HSM in conjunction with S3 server-side encryption.
That's not a capability that it has.
Cloud HSM is not accessed using AWS standard APIs at least by default and so you can't integrate it directly with any AWS services.
Now you could, for example, use cloud HSM to perform client-side encryption.
So if you've got an encryption library on a particular local machine and you want to encrypt objects before you upload them to S3, then you can use it to perform that encryption on the object before you upload it to the S3 service.
But this is not integrated with S3.
You're just using it to perform encryption on the objects before you provide them to S3.
Now a cloud HSM can also be used to offload SSL or TLS processing from web servers.
And if you do that, then the web servers benefit because, first, they don't have to perform those cryptographic operations, and second, the cloud HSM is a custom-designed piece of hardware which accelerates those processes.
So it's much more economical and efficient to have a cloud HSM device performing those cryptographic operations versus doing it on a general purpose EC2 instance.
So that's something that a cloud HSM can do for you, but KMS natively cannot.
Now other products that you might use inside AWS can also benefit from cloud HSM, products which are able to interact using these industry standard APIs.
And this includes products like Oracle databases.
So they can utilize cloud HSM for performing transparent data encryption or TDE.
So this is a method that Oracle has for encrypting data that it manages on your behalf.
And it can utilize a cloud HSM device to perform the encryption operations and to manage the keys.
Now because a cloud HSM device is entirely managed by you, you're the only entity that initially has access to interact with the encryption materials, so the keys.
It means that if you use a cloud HSM and integrate it with an Oracle database, you're doing so in a way where AWS have no ability to decrypt that data.
And so if you're operating in a highly restricted regulatory environment where you really need to use strong encryption and verify exactly who has the ability to perform encryption operations, then generally cloud HSM is an ideal product to support that.
And then lastly in a similar way, cloud HSM can also be used to protect the private keys for a certificate authority.
So if you're running your own certificate authority, you can utilize cloud HSM to manage the private keys for that certificate authority.
Now just to summarize at this point, the overall theme is that for anything which isn't specific to AWS, for anything which expects to have access to a hardware security module using industry standard APIs, then the ideal product for that is cloud HSM.
For anything that uses standards for anything that has to integrate with products which aren't AWS, then cloud HSM is ideal.
For anything which does require AWS integration, then natively cloud HSM isn't suitable.
If FIPS 140-2 Level 3 is mentioned, then it's cloud HSM.
If integration with AWS is mentioned, then it's probably going to be KMS.
If you need to utilize industry standard encryption APIs, then it's likely to be cloud HSM.
Now that's everything that we need to cover.
I just wanted you to be able to handle any curveball HSM style questions that you might encounter in the exam.
So thanks for watching, go ahead and complete this video and then when you're ready, I'll look forward to you joining me in the next one.
-
-
learn.cantrill.io learn.cantrill.io
-
Welcome back and in this lesson I want to talk about AWS Shield, which is an essential tool to protect any internet connected environment from distributed denial of service attacks.
Now it's important for the exam, but especially so for the real world.
So let's jump in and get started.
So AWS Shield actually comes in two forms, Shield Standard and Shield Advanced.
Both of them provide protection against DDoS attacks, but there's a huge difference in their respective capabilities.
First, Shield Standard is free for AWS customers, whereas Shield Advanced is a commercial extra product, which comes with additional costs and benefits, which I'll detail later in this lesson.
The product protects against three different types or layers of DDoS attack.
Now I've covered these in the DDoS lesson in the technical fundamental section of the course, but as a reminder, these categories are Network Volumetric Attacks, so these are things which operate at layer three of the OSI 7 layer model, and these are designed to simply overwhelm the system being attacked, so to direct as much raw network data at a target as possible.
Next, we have Network Protocol Attacks, such as SYN floods, and these operate at layer four of the OSI model.
Now there are various types of protocol attack, but one common one is to generate a huge number of connections from a spoofed IP address and then just leave these connections open, so never terminating them, and while the CPU, memory and data resources of the target will be fine, its ability to service real connections will be impacted by the huge volume of fake ones.
To understand this, imagine a call centre where people call up and just leave the phone line silent.
The operators won't be doing anything, but there won't be capacity for new calls to be answered.
Now Network Protocol Attacks can also be combined with volumetric attacks, but by default, you should view these as two different things.
Lastly, we have Application Layer Attacks, which operate at layer seven, for example, Web Request Floods.
Imagine you have a part of your web app which allows searches.
Think of something like this which lets you search for every cat image in the world ever.
From the perspective of the attacker, this uses almost no resources to run.
It can be done hundreds, thousands or millions of times per second.
But from the perspective of the system being attacked, this might take two to three seconds to return data, maybe even more.
And so it's possible to DDoS a system by using the application as intended, where certain parts of the application are cheap to request but expensive to deliver the result for.
So those are the types of things which SHIELD protects against.
Now I want to spend a little time delving deeper into the capabilities of SHIELD Standard and Advanced, together with the differences.
And I want to focus on when you might pick one versus the other.
So let's start with SHIELD Standard.
SHIELD Standard, as I mentioned earlier, is free for all AWS customers, so you benefit from its protection automatically without you having to do anything.
The protection is at the perimeter of the network, which can either be in your region, meaning as data flows into a VPC, or it can be at the edge of the AWS network if you use CloudFront or Global Accelerator.
SHIELD Standard protects against common network or transport layer attacks.
So that's attacks at layer three or four of the OSI seven layer model.
Now you get the best protection if you use Route 53, CloudFront or Global Accelerator.
Now SHIELD Standard doesn't provide much in the way of proactive capability or any form of explicit configurable protection.
It's just there working away in the background.
Now that's the foundation, the baseline of the product.
Now let's look at what extra things SHIELD Advanced offers.
So SHIELD Advanced, as a starting point, is a commercial product.
In fact, it costs $3,000 per month per organization.
Now this is important, it's not per AWS account.
If you have multiple accounts where you're wanting the advanced level of protection that SHIELD Advanced offers, then just make sure they're in the same AWS organization and you can share the one single investment.
Now the cost while it is per month is part of a one year commitment.
So at 3K per month, this means $36,000 per calendar year.
And there's also a charge for data out for using the product.
Now SHIELD Advanced protects more than standard.
It covers CloudFront, Route 53, Global Accelerator, anything associated with elastic IPs, for example EC2.
It also covers application, classic and network load balancers.
It's a comprehensive set of DDoS protections for your network perimeter.
Now what's really important to understand as a concept is that the protections offered by SHIELD Advanced are not automatic.
You need to explicitly enable protections, either in SHIELD Advanced or as part of AWS Firewall Manager when using SHIELD Advanced policies.
It's an explicit act, remember that.
You might find a question on the exam where you need to answer whether these protections require explicit configuration or they happen in the background.
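As a sketch of that explicit act using boto3, with a placeholder load balancer ARN: the subscription is enabled once for the payer or organization, and then each resource is protected individually.

```python
import boto3

shield = boto3.client("shield")

# One-off: activate the Shield Advanced subscription
# (be careful, this starts the one-year commercial commitment)
shield.create_subscription()

# Explicitly protect an individual resource (the ARN is a placeholder)
shield.create_protection(
    Name="public-alb",
    ResourceArn=("arn:aws:elasticloadbalancing:us-east-1:111111111111:"
                 "loadbalancer/app/example-alb/1234567890abcdef"),
)
```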
Now SHIELD Advanced offers two other really important benefits and it's important to understand that these are not technical functionality differences, but they're important nonetheless.
First you get cost protection.
And this means that if you as a customer incur any costs for any attacks which should be mitigated by AWS SHIELD Advanced, but aren't, then you're protected against those costs.
And an example of this might be EC2 scaling events caused by excessive load.
Now there are restrictions, it needs to be something SHIELD Advanced should cover and you should have enabled the coverage on that resource.
Now I've included a link attached to this video which covers this particular feature in much more detail.
You don't need to understand the detail for the exam, but for the real world it's good knowledge to have.
Now the other benefit is a proactive style of management as well as access to the AWS SHIELD Response Team known as SRT.
With proactive management, the SHIELD Response Team contacts you directly when the availability or performance of your application is affected because of a possible attack.
And this provides the quickest level of response.
It allows the SHIELD Response Team to begin troubleshooting even before they've established contact with you, the customer.
Now to use this you need to provide your contact details in advance and enable the feature.
And when you do, the SHIELD Response Team will contact you when any attacks are detected.
You can also contact the SHIELD Response Team to log support tickets.
And the SLA for this depends on your support plan.
It might be one hour or 15 minutes.
These are all things that you need to think about and decide upon up front.
Now let's step through some of the technical ways in which SHIELD Advanced helps us.
The first unique feature of SHIELD Advanced is the integration with the web application firewall.
SHIELD Advanced uses the web application firewall to implement its protection against layer 7 attacks.
And if you have a SHIELD Advanced subscription, the basic WAF fees for implementing these protections are included.
This is one of the differences in feature benefits which SHIELD Advanced provides over SHIELD Standard.
And so it's an important one to keep in mind.
Another benefit that SHIELD Advanced provides is advanced real-time metrics and reports for DDoS events and attacks.
And these can be accessed via the SHIELD Advanced Console or APIs and via CloudWatch.
Now, if you have a business need for SHIELD Advanced, if you can justify the cost, you're also going to have a need for this enhanced level of visibility.
So this is another one to keep in mind.
You also have health-based detection, which uses Route 53 health checks to implement application-specific health checks.
Now, this allows you to reduce any false positives detected by AWS SHIELD.
And it's used alone or in combination with the proactive engagement team to provide faster detection and mitigation of any issues.
Health-based detection is actually a requirement for using the proactive engagement team.
Again, another important thing to remember.
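As a minimal sketch of wiring that up with boto3, using placeholder IDs and contact details: you associate a Route 53 health check with an existing protection, register emergency contacts, and then enable proactive engagement.

```python
import boto3

shield = boto3.client("shield")

# Associate an application-specific Route 53 health check with a protection
shield.associate_health_check(
    ProtectionId="example-protection-id",
    HealthCheckArn="arn:aws:route53:::healthcheck/example-health-check-id",
)

# Register contacts for the Shield Response Team, then enable proactive engagement
shield.associate_proactive_engagement_details(
    EmergencyContactList=[{
        "EmailAddress": "oncall@example.com",
        "PhoneNumber": "+15555550100",
    }]
)
shield.enable_proactive_engagement()
```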
Now, lastly, you also have protection groups and you can use protection groups to create groupings of resources which SHIELD Advanced protects.
You can define the criteria for membership in a protection group so any new resources are automatically included.
And with these groups, you gain the ability to manage protection at a group level versus a resource level which can significantly decrease the admin overhead of using the product.
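A protection group definition is a small API call; here's a sketch which automatically includes every protected CloudFront distribution and aggregates their traffic as a sum. The group ID is just a placeholder.

```python
import boto3

shield = boto3.client("shield")

shield.create_protection_group(
    ProtectionGroupId="edge-distributions",
    Pattern="BY_RESOURCE_TYPE",              # auto-include members of this type
    ResourceType="CLOUDFRONT_DISTRIBUTION",
    Aggregation="SUM",                       # treat combined traffic as one volume
)
```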
Now, at this point, that's everything I wanted to cover about AWS SHIELD at a high level.
If the topic that you're studying requires any additional detail, there will be additional deep dive lessons.
If not, don't worry, this is everything that you need to know.
But at this point, that's the end of this video.
So go ahead and complete the video and when you're ready, I'll look forward to you joining me in the next.
-