Amazon S3 Files: Complete Guide to Setup, Performance, and Pricing (2026)

Twenty years of S3, and the answer to “can I just mount it like a normal file system?” was always “sort of, with FUSE hacks.” That changed on April 7, 2026. Amazon S3 Files brings native NFS access to S3 buckets with sub-millisecond latency on small files, full POSIX semantics, and no third-party tools. It runs on the same infrastructure as Amazon EFS, which means the caching layer is battle-tested, not bolted on.

Original content from computingforgeeks.com - post 165395

S3 Files creates a shared file system backed by an S3 bucket. Your applications mount it over NFS and work with standard file operations (open, read, write, rename, lock) while the underlying data stays in S3. Both the NFS mount and the S3 API can access the same data simultaneously. This guide covers the architecture, performance characteristics, pricing model, and operational details you need before adopting S3 Files in production. For the hands-on walkthrough, see our EC2 mounting guide.

Current as of April 2026. Amazon S3 Files reached general availability on April 7, 2026. Tested on Amazon Linux 2023 with AWS CLI 2.34.26.

What is Amazon S3 Files?

S3 Files is a managed NFS file system that uses an S3 bucket as its backing store. Unlike FUSE-based solutions that translate file operations into S3 API calls on every request, S3 Files maintains a high-performance caching layer that serves active data locally while keeping the full dataset in S3. The result is a file system that feels like EFS for hot data but costs like S3 for cold data.

The core capabilities:

  • NFS 4.1 and 4.2 protocol support with full POSIX permissions, advisory file locking, and read-after-write consistency
  • Two-way synchronization between the file system view and the S3 bucket. Write a file via NFS, read it via the S3 API (after sync). Upload an object via S3, access it through the mount (on first access)
  • No data duplication in the traditional sense. The cache stores active working set data, but S3 remains the source of truth
  • Works with EC2, Lambda, EKS, and ECS without custom drivers or FUSE libraries
  • Shared access across thousands of concurrent connections, same as EFS

The practical upshot: teams that previously maintained both an EFS volume for compute and an S3 bucket for analytics can now use a single S3 bucket for both, accessed via the appropriate protocol for each workload.

How S3 Files Works (Architecture)

S3 Files uses a two-tier architecture that separates hot and cold data access. Understanding this split is key to predicting both performance and cost.

High-Performance Storage Layer

When you create an S3 Files file system, AWS provisions a caching layer built on EFS infrastructure. This layer stores the active working set: recently written files, recently read files, and metadata. Small file reads (below the configurable threshold, default 128 KB) are served entirely from this layer with sub-millisecond latency. Think of it as a smart, managed cache that you never have to warm or invalidate manually.

S3 Bucket (Source of Truth)

The S3 bucket holds the complete dataset. Large reads (1 MB and above) stream directly from S3, bypassing the cache entirely. This is a deliberate design choice: S3 already delivers high throughput for large sequential reads, so caching those would waste money without meaningful latency improvement.

Synchronization Behavior

Writes go to the high-performance layer first. They batch for approximately 60 seconds before syncing to the S3 bucket. In testing, the actual delay ranged from 60 to 70 seconds. During this window, data is visible through the NFS mount but not yet through the S3 API. After sync completes, S3 GET requests return the updated data.

Imports work in the opposite direction. When an object is uploaded directly to S3 (bypassing the file system), it becomes visible through the NFS mount on first access. The file system fetches the object from S3 and caches it in the high-performance layer. Subsequent reads hit the cache.

Cache Expiration

Data in the high-performance layer expires after a configurable period of inactivity. The default is 30 days, with a range of 1 to 365 days. Shorter expiration reduces storage costs but increases first-read latency for infrequently accessed files. For workloads with clear hot/cold patterns, tuning this value can cut costs significantly.
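
If the expiration period can be changed after creation, it would presumably go through the `aws s3files` subcommand mentioned later in this guide. The operation and flag names below are assumptions, not confirmed syntax; check `aws s3files help` for the actual parameters:

```shell
# HYPOTHETICAL flag names -- the aws s3files subcommand exists, but the exact
# parameter for tuning cache expiration is an assumption. Verify before use.
aws s3files update-file-system \
  --file-system-id fs-0123456789abcdef0 \
  --cache-expiration-days 7
```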

Supported Compute Services

Amazon EC2

EC2 is the primary use case. Mount the file system using mount -t s3files after installing amazon-efs-utils v3.0.0 or later. The mount helper handles TLS negotiation and IAM authentication automatically. Both Amazon Linux 2023 and Ubuntu 24.04 are supported. See the full step-by-step EC2 mounting guide for the complete walkthrough.
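
A minimal mount sequence on Amazon Linux 2023 might look like the following sketch. The file system ID and mount path are placeholders, and any additional mount options beyond the `s3files` type are assumptions:

```shell
# Install the mount helper (v3.0.0+ is required for the s3files mount type)
sudo dnf install -y amazon-efs-utils

# Create a mount point and mount over NFS; the helper negotiates TLS and
# picks up IAM credentials from the instance profile automatically
sudo mkdir -p /mnt/s3files
sudo mount -t s3files fs-0123456789abcdef0 /mnt/s3files

# Confirm the mount is live
df -h /mnt/s3files
```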

Amazon EKS

EKS support comes through the Amazon EFS CSI driver version 3.0.0 and above. You create a PersistentVolume backed by the S3 Files file system ID and mount it into pods like any other persistent volume. The CSI driver handles the NFS connection and IAM credential injection. This means existing EFS-based Kubernetes workloads can migrate to S3 Files by changing the file system ID in the PV spec.
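
A static PersistentVolume sketch, following the EFS CSI driver's existing conventions (`efs.csi.aws.com` as the driver name, the file system ID as `volumeHandle`); whether S3 Files needs any extra volume attributes is an assumption, and the IDs are placeholders:

```shell
# Apply a PersistentVolume that points the EFS CSI driver (v3.0.0+)
# at an S3 Files file system ID instead of an EFS one.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3files-pv
spec:
  capacity:
    storage: 100Gi            # nominal; NFS capacity is not enforced
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # S3 Files file system ID (placeholder)
EOF
```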

Amazon ECS

For ECS, add the file system as an EFSVolumeConfiguration in the task definition. The container mounts it at the specified path. Both Fargate and EC2 launch types are supported. Access points work here too, enforcing POSIX identity per container without sharing root credentials.

AWS Lambda

Lambda functions can attach S3 Files file systems using access points, mounted at a path under /mnt/. The function reads and writes files as if they were local. This is the same mechanism Lambda uses for EFS, so existing Lambda/EFS patterns apply directly. The catch: Lambda functions have a 15-minute execution limit, so long-running file operations need to be chunked.
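
Since the text says this reuses the Lambda/EFS mechanism, attaching the file system would presumably use the existing `--file-system-configs` parameter. The access point ARN format and function name below are placeholders, and the assumption that S3 Files access points share the EFS ARN namespace is unverified; the function must also be configured for the VPC that holds the mount target:

```shell
# Attach a file system to an existing function via an access point ARN,
# mounted under /mnt/ -- same parameter Lambda uses for EFS today.
aws lambda update-function-configuration \
  --function-name my-function \
  --file-system-configs "Arn=arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0,LocalMountPath=/mnt/data"
```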

Prerequisites

Before creating an S3 Files file system, verify these requirements. Missing any one of them will block creation or cause mount failures.

  • S3 bucket with versioning enabled. This is mandatory. S3 Files will not attach to a bucket without versioning. Enable it before creating the file system
  • Server-side encryption. The bucket must use SSE-S3 or SSE-KMS. S3 Files does not support unencrypted buckets or SSE-C (customer-provided keys)
  • amazon-efs-utils v3.0.0 or later. Older versions do not recognize the s3files mount type
  • AWS CLI v2.34 or later. The aws s3files subcommand was added in this release
  • Two IAM roles. A file system access role (allows S3 Files to read/write the bucket) and a compute resource role (allows EC2/Lambda/EKS to mount the file system)
  • Security group. TCP port 2049 must be open between compute resources and the mount target. This is the standard NFS port
  • VPC with DNS resolution enabled. The mount helper resolves the file system DNS name within the VPC
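
The two bucket-side prerequisites can be satisfied with standard S3 API calls; the bucket name is a placeholder:

```shell
# Enable versioning (mandatory -- S3 Files will not attach without it)
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Set SSE-S3 default encryption (use aws:kms for SSE-KMS instead)
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
  }'
```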

Performance Specifications

Performance was tested on an m5.xlarge instance in us-east-1 using dd and fio against a freshly created S3 Files file system. The numbers below combine tested results with documented limits from the AWS documentation.

| Metric | Value |
| --- | --- |
| Small file read latency | Sub-millisecond to single-digit ms |
| 100 MB sequential write | 341 MB/s (tested) |
| 100 MB first read (uncached) | 289 MB/s (tested) |
| 100 MB cached read | 4.7 GB/s (tested) |
| Max aggregate read throughput | Up to TB/s |
| Max aggregate write throughput | 1-5 GB/s |
| Max read IOPS | 250,000 |
| Max write IOPS | 50,000 |
| Write sync delay to S3 | ~60-70 seconds (tested) |
| Import from S3 throughput | 700 MB/s |
| Export to S3 throughput | 2,700 MB/s |

The cached read number (4.7 GB/s) is the standout. Once data lands in the high-performance layer, repeated reads approach local NVMe speeds. First reads on large files are slower because they stream from S3 directly, but the 289 MB/s figure is still faster than most FUSE-based alternatives by an order of magnitude.

Write throughput scales with the number of concurrent connections. A single client pushing sequential writes hit 341 MB/s in testing. Multiple clients writing in parallel will approach the aggregate 1-5 GB/s ceiling depending on file size distribution and I/O patterns.

Pricing Model

S3 Files pricing has three cost components. Understanding how they interact is critical for capacity planning.

1. High-Performance Storage

You pay per GB-month for data stored in the high-performance caching layer. This is only the active working set, not the entire bucket. If your bucket holds 10 TB but only 500 GB is actively accessed, you pay high-performance storage costs for 500 GB. The cache expiration setting directly controls this: shorter expiration means less cached data and lower cost.

2. File System Access Charges

Reads from and writes to the high-performance layer incur per-request charges. These cover the compute cost of serving NFS operations from the managed infrastructure. Small file operations (metadata lookups, directory listings, sub-128 KB reads) all hit this tier.

3. Standard S3 Costs

Your existing S3 storage charges remain unchanged. The bucket still stores all data at standard S3 rates. Large reads (1 MB and above) that stream directly from S3 incur only standard S3 GET request costs with no additional S3 Files charge. This makes read-heavy workloads on large files surprisingly cost-effective because you skip the high-performance layer entirely for those reads.

Cost Comparison: S3 Files vs Alternatives

The following comparison covers the four main AWS shared file system options. Pricing varies by region; this reflects us-east-1 as of April 2026.

| Factor | S3 Files | Amazon EFS | FSx for Lustre | EBS (gp3) |
| --- | --- | --- | --- | --- |
| Backing store | S3 bucket | EFS managed | FSx managed | Block storage |
| Shared access | Yes (NFS) | Yes (NFS) | Yes (Lustre) | No (single attach) |
| Storage cost driver | S3 + active cache | Per-GB stored | Provisioned capacity | Provisioned capacity |
| Cold data cost | S3 Standard rates | EFS IA rates | Full Lustre rate | Full gp3 rate |
| Best for | Mixed S3 + NFS workloads | General shared FS | HPC, ML training | Single-instance databases |
| Latency (small files) | Sub-ms | Sub-ms | Sub-ms | Sub-ms |
| Max throughput | TB/s (reads) | 20+ GB/s | Hundreds of GB/s | 1 GB/s |
| Data accessible via S3 API | Yes | No | Optional (linked) | No |

S3 Files wins on cost when you have large datasets where only a fraction is actively used. EFS wins when all data is hot and you need consistent NFS access without S3 integration. FSx for Lustre remains the choice for pure HPC workloads that need the highest possible parallel throughput. EBS is included for completeness but is not a shared file system.

Quotas and Limitations

Every managed service has boundaries. These are the ones that matter for capacity planning and architecture decisions.

| Quota | Value |
| --- | --- |
| Max file size | 48 TiB |
| Max file systems per account | 1,000 |
| Max connections per file system | 25,000 |
| Max mount targets per AZ | 1 |
| Max directory depth | 1,000 levels |
| Max S3 key length | 1,024 bytes |
| Max locks per file | 512 |
| VPCs per file system | 1 |

The 1,024-byte S3 key length limit means deeply nested directory structures with long filenames can hit the ceiling. If your application generates paths like /data/project/2026/04/08/experiment-abc-123/results/output-final-v2.json, measure the total key length before committing.
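
A quick way to measure a candidate key is to count bytes (not characters) from the shell; the example uses the path above with the leading slash dropped, since S3 keys do not begin with one:

```shell
# Measure the byte length of a candidate S3 key (bytes matter for
# non-ASCII names, where one character can be several UTF-8 bytes)
key="data/project/2026/04/08/experiment-abc-123/results/output-final-v2.json"
len=$(printf %s "$key" | wc -c | tr -d ' ')
echo "key length: $len bytes"         # → key length: 71 bytes
if [ "$len" -gt 1024 ]; then
  echo "over the 1,024-byte limit"
else
  echo "within the limit"             # → within the limit
fi
```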

Unsupported Features

S3 Files does not support hard links. Symbolic links work fine. S3 Glacier storage classes are not compatible because the file system needs to read data on demand, and Glacier retrieval times would break NFS expectations. S3 ACLs are not preserved after file modifications through the mount; IAM policies and POSIX permissions govern access instead.

Custom S3 metadata set via the S3 API is not visible through the NFS mount. If your application relies on S3 object metadata (x-amz-meta-* headers), those values will not appear as extended attributes on the mounted file. The NFS mount exposes standard POSIX attributes only.

On the protocol side, pNFS (parallel NFS), Kerberos authentication, NFSv4 data retention, and the nconnect mount option are all unsupported at GA launch.

S3 Files vs Alternatives

Before S3 Files, mounting an S3 bucket meant choosing between several imperfect options. This comparison shows where S3 Files fits relative to the existing tools.

| Feature | S3 Files | s3fs-fuse | Mountpoint for S3 | Storage Gateway | FSx for Lustre |
| --- | --- | --- | --- | --- | --- |
| Protocol | NFS 4.2 | FUSE | FUSE | NFS/SMB | Lustre |
| POSIX compliance | Full | Partial | Limited (no renames) | Full | Full |
| Write support | Yes | Yes | Append-only | Yes | Yes |
| Caching | Automatic (EFS-backed) | None | Metadata only | Local disk | None (direct) |
| Latency (small files) | Sub-ms | 10-100 ms | 1-10 ms | 1-10 ms | Sub-ms |
| Data location | S3 bucket | S3 bucket | S3 bucket | S3 bucket | FSx storage |
| Max throughput | TB/s | ~100 MB/s | GB/s | Limited by gateway | TB/s |
| Managed service | Yes | No (self-hosted) | Yes | Yes | Yes |
| Cost model | Storage + access | Free (+ S3 costs) | Free (+ S3 costs) | Instance + S3 | Storage + throughput |

The biggest differentiator is POSIX compliance with performance. s3fs-fuse gives you POSIX-ish semantics but at 10-100 ms latency and limited throughput. Mountpoint for S3 is fast but restricts writes to append-only, which rules out most real applications. Storage Gateway works but requires managing a gateway instance. FSx for Lustre delivers the raw performance but stores data in its own managed volumes, not directly in your S3 bucket.

S3 Files is the first option that combines full POSIX compliance, sub-millisecond latency, high throughput, and native S3 bucket integration in a fully managed service.

Security

S3 Files enforces encryption at every layer. All data in transit between compute resources and mount targets is encrypted with TLS. This is automatic and mandatory; there is no option to mount without TLS. At rest, the high-performance storage layer inherits the bucket’s server-side encryption configuration (SSE-S3 or SSE-KMS with customer-managed keys).

Authentication is IAM-based. When an EC2 instance mounts the file system, the mount helper automatically retrieves temporary credentials from the instance profile. There is no username/password mechanism and no Kerberos. The IAM policy on the compute resource role controls which file systems the instance can mount and what operations it can perform.

Network isolation uses the same model as EFS. Mount targets live inside your VPC, and security groups control which IP ranges can connect on port 2049. Cross-VPC access requires VPC peering or Transit Gateway, same as any other VPC-internal resource.

Access points provide application-level isolation. Each access point enforces a specific POSIX user ID, group ID, and root directory. A containerized application connecting through an access point sees only its designated directory subtree, with a fixed POSIX identity regardless of the connecting user’s actual UID. This is particularly useful in multi-tenant EKS clusters where different namespaces need isolated file access to the same underlying bucket.

Monitoring

S3 Files publishes metrics to Amazon CloudWatch. The key metrics to watch are storage utilization in the high-performance layer, sync errors between the cache and S3, active connections, and I/O throughput. Sync errors deserve special attention: if a write cannot sync to S3 (due to bucket policy changes, KMS key issues, or S3 service disruptions), data exists only in the cache until the sync succeeds.

The aws s3files CLI provides file system-level metrics directly from the terminal. Use aws s3files describe-file-system to check storage consumption, connection count, and sync status without navigating the CloudWatch console.
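
The describe call would look something like the sketch below; the `--file-system-id` flag name and the shape of the output are assumptions, and the ID is a placeholder:

```shell
# Check storage consumption, connection count, and sync status from the CLI
aws s3files describe-file-system --file-system-id fs-0123456789abcdef0
```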

AWS CloudTrail logs all S3 Files API calls: file system creation, deletion, mount target changes, and access point modifications. This integrates with existing audit pipelines. Note that individual NFS file operations (open, read, write) are not logged in CloudTrail; those appear in CloudWatch metrics as aggregate IOPS and throughput counters.

Best Practices for Production

Access points should be your default for application mounts. Instead of mounting the root of the file system and relying on directory permissions, create an access point per application with an enforced POSIX identity (UID/GID) and root directory path. This contains blast radius: a misconfigured application cannot accidentally traverse or modify another application’s data. In EKS, map each namespace to its own access point.

Cache expiration tuning pays off quickly. The default 30-day expiration is a reasonable starting point, but workloads vary wildly. A CI/CD pipeline that builds artifacts, uploads them, and never touches them again should use a 1-day expiration. A machine learning training job that revisits the same dataset for weeks should use 60 days or more. Monitor the CloudWatch cache hit ratio to find the sweet spot where you are not paying for stale cache but also not constantly re-fetching from S3.

Place mount targets in every Availability Zone where your compute resources run. Cross-AZ NFS traffic adds latency and incurs data transfer charges. If your EKS nodes span three AZs, create mount targets in all three. The cost of the mount target is negligible compared to the cross-AZ data transfer you avoid.

The ~60-second write sync delay is the most important operational detail to internalize. If your application writes a file via the NFS mount and a separate process immediately tries to read it via the S3 API, it will get a 404 or stale data. Design your pipelines so that the S3 API consumer either polls for the object or receives a notification (S3 Event Notification fires when the sync completes). Within the NFS mount itself, writes are immediately visible to all clients, so this only affects cross-protocol access patterns.
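
One way to bridge the sync window on the consumer side is a bounded poll against the S3 API before reading the object. `head-object` is a standard S3 API call; the bucket and key names are placeholders, and the 3-minute budget is an arbitrary margin over the observed 60-70 second delay:

```shell
# Poll until a file written via NFS has synced to S3, or give up after ~3 min.
wait_for_sync() {
  bucket=$1 key=$2
  for i in $(seq 1 36); do
    if aws s3api head-object --bucket "$bucket" --key "$key" >/dev/null 2>&1; then
      return 0    # object is now visible via the S3 API
    fi
    sleep 5
  done
  return 1        # sync did not complete within the budget
}

wait_for_sync my-bucket results/output.json && echo "synced"
```

S3 Event Notifications are the push-based alternative when polling from many consumers would be wasteful.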

For /etc/fstab entries, always include the _netdev and nofail mount options. The _netdev flag tells the OS to wait for network availability before attempting the mount. The nofail flag prevents a boot failure if the file system is temporarily unreachable. Without these flags, an unreachable mount target can hang the instance during boot indefinitely.
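
A sketch of the corresponding /etc/fstab line, assuming the same `s3files` mount type the mount helper uses and a placeholder file system ID:

```
# _netdev defers the mount until networking is up;
# nofail lets the instance finish booting if the mount target is unreachable
fs-0123456789abcdef0 /mnt/s3files s3files _netdev,nofail 0 0
```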

After bulk uploads via the S3 API (migrating an existing dataset into the bucket, for example), monitor CloudWatch for sync errors. Large batches of new objects may take time to become fully accessible through the NFS mount, especially if the objects are small and numerous. The import throughput is 700 MB/s, so a 1 TB dataset with millions of small files will take longer than the raw throughput suggests due to per-object overhead.
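
A back-of-the-envelope lower bound on bulk import time follows from the 700 MB/s figure; real elapsed time will be longer for datasets of many small objects because of the per-object overhead:

```shell
# Best-case seconds to import 1 TB at 700 MB/s (integer arithmetic)
size_mb=$((1024 * 1024))          # 1 TB expressed in MB
rate_mbs=700
secs=$(( size_mb / rate_mbs ))
echo "$secs seconds (~$(( secs / 60 )) minutes)"   # → 1497 seconds (~24 minutes)
```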

Security groups on mount targets should follow the principle of least privilege. Allow TCP 2049 only from the specific security groups or CIDR ranges that need to mount the file system. A common mistake is opening 2049 to the entire VPC CIDR, which allows any instance in the VPC to mount the file system regardless of its role. Restrict it to the security groups attached to your application instances.
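
Scoping the rule to a source security group rather than a CIDR is a one-liner with the standard EC2 API; both group IDs below are placeholders:

```shell
# Allow NFS (TCP 2049) into the mount target's security group only from
# the security group attached to the application instances
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789mounttgt \
  --protocol tcp --port 2049 \
  --source-group sg-0123456789appinst
```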

Where S3 Files Fits

S3 Files fills a gap that has existed in the AWS storage lineup since S3 launched in 2006. EFS gives you a shared POSIX file system but no direct S3 integration. S3 gives you object storage with unmatched durability and analytics ecosystem support but no file system semantics. FSx for Lustre bridges the two for HPC workloads but is overkill for general-purpose use.

S3 Files is the answer for teams that want one storage location (an S3 bucket) accessible through both protocols without maintaining gateway instances, FUSE mounts, or sync scripts. The typical candidates are data engineering pipelines that produce files for both compute and analytics, web applications with shared media storage, and machine learning workflows where training data lives in S3 but the training framework expects a POSIX path.

If your workload is pure object storage with no file system access needed, stick with plain S3. If you need a shared file system with no S3 integration, Amazon EFS is simpler and more cost-effective. If you need maximum parallel throughput for HPC, FSx for Lustre is still the right tool. S3 Files shines when you genuinely need both protocols on the same data.

