S3 API became a de facto standard
Cf. [[S3 API How Amazon’s Storage Protocol Became the Industry Standard]]
On the S3 API (Simple Storage Service): a proprietary protocol by Amazon that has become the de facto standard for cloud object storage and is implemented by many other vendors.
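A quick illustration of how widely the protocol is implemented: boto3, the standard AWS client, can talk to any S3-compatible service by overriding the endpoint. A minimal sketch, assuming a local MinIO instance; the endpoint URL and credentials are placeholders.
```python
import boto3

# Point the standard AWS S3 client at a non-AWS, S3-compatible endpoint
# (endpoint URL and credentials are placeholders, e.g. a local MinIO).
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

print(s3.list_buckets()["Buckets"])
```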
You can access Amazon S3 from your VPC using gateway VPC endpoints. After you create the gateway endpoint, you can add it as a target in your route table for traffic destined from your VPC to Amazon S3.
Access AWS S3 from your VPC using Gateway Endpoints, not a bucket policy.
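A minimal sketch of creating such a gateway endpoint with boto3; the region, VPC ID, and route table ID below are assumptions.
```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Create a Gateway VPC endpoint for S3 and attach it to a route table
# (VPC ID and route table ID are placeholders).
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```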
Deploying Machine Learning Models with Flask and AWS Lambda: A Complete Guide
In essence, this article is about:
1) Training a sample model and uploading it to an S3 bucket:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib

# Train a simple classifier on the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Serialize the trained model to disk
joblib.dump(model, 'model.pkl')
```
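The block above only writes model.pkl locally. A minimal sketch of the upload mentioned in step 1, assuming the placeholder bucket name 'your-s3-bucket-name' that the Flask app below also uses:
```python
import boto3

# Upload the serialized model to S3 (bucket name is a placeholder)
s3 = boto3.client('s3')
s3.upload_file('model.pkl', 'your-s3-bucket-name', 'model.pkl')
```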
```json { "dev": { "app_function": "app.app", "exclude": [ "boto3", "dateutil", "botocore", "s3transfer", "concurrent" ], "profile_name": null, "project_name": "flask-test-app", "runtime": "python3.10", "s3_bucket": "zappa-31096o41b" },
"production": {
"app_function": "app.app",
"exclude": [
"boto3",
"dateutil",
"botocore",
"s3transfer",
"concurrent"
],
"profile_name": null,
"project_name": "flask-test-app",
"runtime": "python3.10",
"s3_bucket": "zappa-31096o41b"
}
} ```
```python
import boto3
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

# Download the trained model from S3 into Lambda's writable /tmp directory
s3 = boto3.client('s3')
s3.download_file('your-s3-bucket-name', 'model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Get the data from the POST request
    data = request.get_json(force=True)
    # Convert the data into a numpy array
    input_data = np.array(data['input']).reshape(1, -1)
    # Make a prediction using the model
    prediction = model.predict(input_data)
    # Return the prediction as a JSON response
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
```
```bash
zappa deploy production
```
and later updating it:
```bash
zappa update production
```
Deployment gives us an API endpoint:
https://xyz123.execute-api.us-east-1.amazonaws.com/production
which we can query:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"input": [5.1, 3.5, 1.4, 0.2]}' https://xyz123.execute-api.us-east-1.amazonaws.com/production/predict
```
Lesson 3: When executing a lot of requests to S3, make sure to explicitly specify the AWS region (see the sketch after these notes).
Lesson 2: Adding a random suffix to your bucket names can enhance security.
Lesson 1: Anyone who knows the name of any of your S3 buckets can ramp up your AWS bill as they like.
The author was charged over $1300 after two days of using an S3 bucket, because an open-source tool shipped a default bucket name in its configuration that happened to match his bucket name.
Luckily, AWS made an exception in the end and he did not have to pay the bill.
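A minimal sketch of pinning the region explicitly when creating the client; the region and bucket name are assumptions.
```python
import boto3

# Explicitly pin the region instead of relying on defaults or redirects
# (region and bucket name are placeholders).
s3 = boto3.client("s3", region_name="eu-central-1")
s3.list_objects_v2(Bucket="my-example-bucket", MaxKeys=10)
```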
dynamodb import
Saving Your Wallet With Lifecycle Rules
Of course, storing multiple copies of objects uses way more space, especially if you’re frequently overwriting data. You probably don’t need to store these old versions for the rest of eternity, so you can do your wallet a favor by setting up a Lifecycle rule that removes the old versions after some time. Under Management > Lifecycle Configuration, add a new rule. The two options available are moving old objects to an infrequent-access tier, or deleting them permanently after a chosen number of days.
S3 object versioning
Many of the strategies to be discussed for data durability require S3 object versioning to be enabled for the bucket (this includes S3 object locks and replication policies). With object versioning, any time an object is modified, a new version is created, and when the object is deleted, it is only given a delete marker. This allows an object to be recovered if it has been overwritten or marked for deletion. However, it is still possible for someone with sufficient privileges to permanently delete all objects and their versions, so this alone is not sufficient. When using object versioning, deleting old versions permanently is done with the call s3:DeleteObjectVersion, as opposed to the usual s3:DeleteObject, which means that you can apply least-privilege restrictions to deny someone from deleting the old versions. This can help mitigate some issues, but you should still do more to ensure data durability.

Life cycle policies
Old versions of objects will stick around forever, and each version is an entire object, not a diff of the previous version. So if you have a 100MB file that you change frequently, you’ll have many copies of this entire file. AWS acknowledges in the documentation “you might have one or more objects in the bucket for which there are millions of versions”. In order to reduce the number of old versions, you use lifecycle policies.

Audit tip: It should be considered a misconfiguration if you have object versioning enabled and no lifecycle policy on the bucket. Every versioned S3 bucket should have a `NoncurrentVersionExpiration` lifecycle policy to eventually remove objects that are no longer the latest version. For data durability, you may wish to set this to 30 days. If this data is being backed up, you may wish to set this to as little as one day on the primary data and 30 days on the backup. If you are constantly updating the same objects multiple times per day, you may need a different solution to avoid unwanted costs.

Audit tip: In 2019, I audited the AWS IAM managed policies and found some issues, including what I called Resource policy privilege escalation. In a handful of cases AWS had attempted to create limited policies that did not allow `s3:Delete*`, but still allowed some form of `s3:Put*`. The danger here is the ability to call `s3:PutBucketPolicy` in order to grant an external account full access to an S3 bucket to delete the objects and versions within it, or `s3:PutLifecycleConfiguration` with an expiration of 1 day for all objects, which will delete all objects and their versions in the bucket.

Storage classes
With lifecycle policies, you have the ability to transition objects to less expensive storage classes. Be aware that there are many constraints, specifically around the size of the object and how long you have to keep it before transitioning or deleting it. Objects in the S3 Standard storage class must be kept there for at least 30 days before they can be transitioned. Further, once an object is in S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA, it must be kept there for 30 days before deletion. Objects in Glacier must be kept for 90 days before deleting, and objects in Glacier Deep Archive must be kept for 180 days. So if you had plans of immediately transitioning all non-current object versions to Glacier Deep Archive to save money, and then deleting them after 30 days, you will not be able to.
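A minimal sketch of such a NoncurrentVersionExpiration rule via boto3; the bucket name and the 30-day value are assumptions.
```python
import boto3

s3 = boto3.client("s3")

# Expire non-current (old) object versions after 30 days
# (bucket name and day count are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```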
So, while DELETE operations are free, LIST operations (to get a list of objects) are not free (~$.005 per 1000 requests, varying a bit by region).
Emptying and deleting buckets on S3 is not free: whether you use the Web Console or the AWS CLI, a LIST call is executed for every 1,000 objects.
Your Amazon Athena query performance improves if you convert your data into open source columnar formats, such as Apache Parquet
S3 performance: use columnar formats
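A hedged sketch of writing data to S3 as Parquet with pandas; it assumes the pyarrow and s3fs packages are installed, and the file path and bucket name are placeholders.
```python
import pandas as pd

# Convert a CSV to Parquet and write it to S3
# (requires pyarrow and s3fs; path and bucket are placeholders).
df = pd.read_csv("events.csv")
df.to_parquet("s3://my-example-bucket/events/events.parquet", engine="pyarrow")
```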
Expedited retrieval allows you to quickly access your data when you need to have almost immediate access to your information. This retrieval type can be used for archives up to 250MB. Expedited retrieval usually completes within 1 to 5 minutes.
https://aws.amazon.com/glacier/faqs/
3 types of retrieval
expedited: 1–5 minutes
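A minimal sketch of requesting an Expedited restore of a Glacier-archived object with boto3; the bucket, key, and day count are assumptions.
```python
import boto3

s3 = boto3.client("s3")

# Request an Expedited restore of an archived object for 1 day
# (bucket and key are placeholders).
s3.restore_object(
    Bucket="my-example-bucket",
    Key="archive/report-2020.pdf",
    RestoreRequest={
        "Days": 1,
        "GlacierJobParameters": {"Tier": "Expedited"},
    },
)
```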
In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket. A bucket owner, however, can configure a bucket to be a Requester Pays bucket. With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing data.
Requester Pays
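A minimal sketch of both sides with boto3, assuming placeholder bucket and key names: the owner enables Requester Pays on the bucket, and a requester fetches an object while acknowledging they will pay.
```python
import boto3

s3 = boto3.client("s3")

# Bucket owner: turn on Requester Pays (bucket name is a placeholder)
s3.put_bucket_request_payment(
    Bucket="my-example-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requester: must acknowledge paying for the request and the download
obj = s3.get_object(
    Bucket="my-example-bucket",
    Key="data/large-file.bin",
    RequestPayer="requester",
)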
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
S3 event notifications might take a minute or longer to arrive.
BTW, CloudWatch does not support S3, but CloudTrail does.
How to enable S3 in GNOME Desktop
Secor uses the Kafka consumer offset management protocol to keep track of what’s been uploaded to S3.
Consumers keep track of what has been written and where they left off by looking at Kafka consumer offsets rather than checking S3, since S3 is an eventually consistent system.
Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.
data loss in S3 is being mitigated.
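A hedged sketch of this pattern (not Secor's actual code), using kafka-python and boto3 with placeholder topic, server, and bucket names: commit the consumer offset only after the batch has landed in S3, so the committed offset is the source of truth for what has been uploaded.
```python
import boto3
from kafka import KafkaConsumer

s3 = boto3.client("s3")
consumer = KafkaConsumer(
    "events",                       # topic name is a placeholder
    bootstrap_servers="localhost:9092",
    group_id="s3-uploader",
    enable_auto_commit=False,       # offsets are committed manually below
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 1000:
        # Upload the batch first...
        key = f"events/{message.partition}/{message.offset}.json"
        s3.put_object(Bucket="my-example-bucket", Key=key,
                      Body=b"\n".join(batch))
        # ...then commit the offset, so the committed offset always
        # reflects data that is already safely in S3.
        consumer.commit()
        batch = []
```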
S3Object: represents an object stored in Amazon S3; it is a pointer to the data object.
S3ObjectInputStream: provides an InputStream to read the data.
cloud storage service S3, which Amazon confirmed is experiencing “high error rates,”
Need to look at S3
Is art about making up new things or about transforming the raw material that's out there? Cutting, pasting, sampling, remixing and mashing up have become mainstream modes of cultural expression, and fan fiction is part of that. It challenges just about everything we thought we knew about art and creativity.
Not really. Art has always been about reacting (in some part) to the works that have come before it.