- Oct 2024
-
elixir.bootlin.com
-
if (si->cluster_info) {
        if (!scan_swap_map_try_ssd_cluster(si, &offset, &scan_base))
                goto scan;
} else if (unlikely(!si->cluster_nr--)) {
Algorithmic policy decision: if the device is an SSD, use a wear-leveling-friendly algorithm; otherwise, use an HDD algorithm that minimizes seek times. Certain access patterns may make either algorithm less effective (e.g., wear leveling gains little when the swap area is constantly nearly full).
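A minimal userspace sketch of that branch, assuming the policy reduces to "which allocator do we dispatch to"; the map, the hint, and both helpers are made-up stand-ins for the kernel's cluster and legacy scanning paths:

#include <stdbool.h>

#define NSLOTS 1024

static unsigned char map[NSLOTS];   /* non-zero = slot in use */
static unsigned long ssd_hint;      /* moving start position for SSDs */

/* SSD path: start from a moving hint so writes spread across the device */
static long pick_slot_ssd(void)
{
        for (unsigned long n = 0; n < NSLOTS; n++) {
                unsigned long i = (ssd_hint + n) % NSLOTS;

                if (!map[i]) {
                        ssd_hint = (i + 1) % NSLOTS;
                        return (long)i;
                }
        }
        return -1;  /* swap full */
}

/* HDD path: first free slot, keeping swap entries close to each other */
static long pick_slot_hdd(void)
{
        for (unsigned long i = 0; i < NSLOTS; i++)
                if (!map[i])
                        return (long)i;
        return -1;
}

long pick_slot(bool device_is_ssd)
{
        return device_is_ssd ? pick_slot_ssd() : pick_slot_hdd();
}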
-
scan_base = offset = si->lowest_bit;
last_in_cluster = offset + SWAPFILE_CLUSTER - 1;

/* Locate the first empty (unaligned) cluster */
for (; last_in_cluster <= si->highest_bit; offset++) {
        if (si->swap_map[offset])
                last_in_cluster = offset + SWAPFILE_CLUSTER;
        else if (offset == last_in_cluster) {
                spin_lock(&si->lock);
                offset -= SWAPFILE_CLUSTER - 1;
                si->cluster_next = offset;
                si->cluster_nr = SWAPFILE_CLUSTER - 1;
                goto checks;
        }
        if (unlikely(--latency_ration < 0)) {
                cond_resched();
                latency_ration = LATENCY_LIMIT;
        }
}
Here (when using HDDs), the policy scans for the first fully empty run of SWAPFILE_CLUSTER slots and starts allocating from it. This is supposed to reduce seek time on spinning drives, since it keeps swap entries near each other.
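A userspace sketch of the same "first empty cluster" scan, assuming a plain byte map where non-zero means the slot is in use (the locking and latency budget in the real code are omitted):

#include <stdio.h>

#define NSLOTS  4096
#define CLUSTER 256   /* stand-in for SWAPFILE_CLUSTER */

static long first_free_cluster(const unsigned char map[NSLOTS])
{
        long last_in_cluster = CLUSTER - 1;

        for (long offset = 0; last_in_cluster < NSLOTS; offset++) {
                if (map[offset])
                        /* slot busy: next candidate cluster starts after it */
                        last_in_cluster = offset + CLUSTER;
                else if (offset == last_in_cluster)
                        /* saw CLUSTER consecutive free slots: return the base */
                        return offset - (CLUSTER - 1);
        }
        return -1; /* no fully free cluster */
}

int main(void)
{
        unsigned char map[NSLOTS] = { 0 };

        map[0] = 1;  /* fragment the front of the area */
        printf("first free cluster starts at %ld\n", first_free_cluster(map));
        return 0;
}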
-
/*
 * Even if there's no free clusters available (fragmented),
 * try to scan a little more quickly with lock held unless we
 * have scanned too many slots already.
 */
if (!scanned_many) {
        unsigned long scan_limit;

        if (offset < scan_base)
                scan_limit = scan_base;
        else
                scan_limit = si->highest_bit;
        for (; offset <= scan_limit && --latency_ration > 0; offset++) {
                if (!si->swap_map[offset])
                        goto checks;
        }
}
Here we have a configuration policy: do one more, smaller scan as long as the latency_ration has not been exhausted. An alternative would be to yield early, on the assumption that a free slot is unlikely to turn up.
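A small sketch of that bounded extra scan, with a plain int standing in for latency_ration; the alternative policy mentioned above would simply return -1 here instead of looping:

long bounded_rescan(const unsigned char *map, long offset,
                    long scan_limit, int *budget)
{
        /* keep scanning only while the latency budget holds out */
        for (; offset <= scan_limit && --(*budget) > 0; offset++)
                if (!map[offset])
                        return offset;   /* found a free slot within budget */
        return -1;                       /* budget exhausted or nothing free */
}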
-
/*
 * select a random position to start with to help wear leveling
 * SSD
 */
for_each_possible_cpu(cpu) {
        per_cpu(*p->cluster_next_cpu, cpu) =
                get_random_u32_inclusive(1, p->highest_bit);
}
An algorithmic (random) policy is used here to spread swap allocations across the SSD, helping wear leveling by avoiding repeated writes to the same region of the device.
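Roughly the same idea in a userspace sketch, with rand() standing in for get_random_u32_inclusive() and made-up sizes:

#include <stdlib.h>
#include <time.h>

#define NCPUS        8
#define HIGHEST_BIT  4096UL

unsigned long cluster_next_cpu[NCPUS];

/* give every CPU its own random starting offset in [1, HIGHEST_BIT] */
void randomize_start_positions(void)
{
        srand((unsigned)time(NULL));
        for (int cpu = 0; cpu < NCPUS; cpu++)
                cluster_next_cpu[cpu] = 1 + (unsigned long)rand() % HIGHEST_BIT;
}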
-
cluster_list_add_tail(&si->discard_clusters, si->cluster_info, idx);
A policy decision is made to add a cluster to the discard list on a first-come, first-served basis. However, this approach could be enhanced by prioritizing certain clusters higher on the list based on their 'importance.' This 'importance' could be defined by how closely a cluster is related to other pages in the swap. By doing so, the system can reduce seek time as mentioned on line 817.
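A sketch of what that alternative might look like in userspace: a sorted insert keyed on a caller-supplied "importance" score. The node struct and the score are assumptions, not kernel structures:

#include <stdlib.h>

struct discard_node {
        unsigned int idx;            /* cluster index */
        int score;                   /* higher = reclaim sooner */
        struct discard_node *next;
};

void discard_insert_prioritized(struct discard_node **head,
                                unsigned int idx, int score)
{
        struct discard_node *n = malloc(sizeof(*n));

        if (!n)
                return;
        n->idx = idx;
        n->score = score;

        /* walk until the next node scores lower, then splice in */
        while (*head && (*head)->score >= score)
                head = &(*head)->next;
        n->next = *head;
        *head = n;
}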
-
        if (unlikely(--latency_ration < 0)) {
                cond_resched();
                latency_ration = LATENCY_LIMIT;
                scanned_many = true;
        }
        if (swap_offset_available_and_locked(si, offset))
                goto checks;
}
offset = si->lowest_bit;
while (offset < scan_base) {
        if (unlikely(--latency_ration < 0)) {
                cond_resched();
                latency_ration = LATENCY_LIMIT;
                scanned_many = true;
        }
Here, the policy is to yield back to the scheduler once latency_ration is exhausted and to replenish it fully from LATENCY_LIMIT, so that when the task is scheduled again it has a full budget for the next scan. Alternative policies could grow or shrink the budget heuristically instead of fully replenishing it each time.
Marked as config/value since we're replacing latency_ration's value with a compile-time-defined limit (LATENCY_LIMIT).
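A sketch of the grow/shrink alternative, with made-up bounds; nothing here is kernel code:

#define LATENCY_MIN   64
#define LATENCY_MAX   1024

/* grow the budget after productive scans, shrink it after fruitless ones */
int next_latency_budget(int current, int slots_found)
{
        int next = slots_found ? current * 2 : current / 2;

        if (next < LATENCY_MIN)
                next = LATENCY_MIN;
        if (next > LATENCY_MAX)
                next = LATENCY_MAX;
        return next;
}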
-
while (scan_swap_map_ssd_cluster_conflict(si, offset)) {
        /* take a break if we already got some slots */
        if (n_ret)
                goto done;

        if (!scan_swap_map_try_ssd_cluster(si, &offset, &scan_base))
                goto scan;
Here, the policy is to stop scanning once some slots have already been found. Alternative policies could keep scanning, or weigh how long the scan has taken and how many slots it has produced so far.
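A userspace sketch of the "take a break once we have something" policy; the conflict callback is a stand-in for scan_swap_map_ssd_cluster_conflict():

#include <stdbool.h>

int collect_slots(const unsigned char *map, long nslots,
                  long *out, int want, bool (*conflict)(long))
{
        int n_ret = 0;

        for (long off = 0; off < nslots && n_ret < want; off++) {
                if (conflict(off) && n_ret)
                        break;            /* already got some slots: stop */
                if (!map[off])
                        out[n_ret++] = off;
        }
        return n_ret;                     /* may be fewer than 'want' */
}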
-
else if (!cluster_list_empty(&si->discard_clusters)) {
        /*
         * we don't have free cluster but have some clusters in
         * discarding, do discard now and reclaim them, then
         * reread cluster_next_cpu since we dropped si->lock
         */
        swap_do_scheduled_discard(si);
        *scan_base = this_cpu_read(*si->cluster_next_cpu);
        *offset = *scan_base;
        goto new_cluster;
This algorithmic policy discards and reclaims clusters on demand, only when no free cluster is available. Other policies could do this preemptively to avoid paying the cost on the allocation path.
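A sketch of the as-needed policy in userspace; a preemptive variant would call process_discards() from a background thread before the free list runs dry. The counters and helpers are hypothetical:

struct cluster_lists {
        int free_count;
        int discard_count;
};

/* pretend every discarded cluster becomes free again once processed */
int process_discards(struct cluster_lists *c)
{
        int reclaimed = c->discard_count;

        c->free_count += reclaimed;
        c->discard_count = 0;
        return reclaimed;
}

int take_free_cluster(struct cluster_lists *c)
{
        if (c->free_count == 0 && c->discard_count > 0)
                process_discards(c);       /* as-needed: pay the cost here */
        if (c->free_count == 0)
                return -1;                 /* genuinely out of clusters */
        c->free_count--;
        return 0;
}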
-
#ifdef CONFIG_THP_SWAP
This is a build-time option that configures how huge pages are handled when swapped out. When enabled, a huge page is swapped out in one piece; when disabled, it is split into smaller units, which are swapped individually.
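A sketch of how such a build-time switch shapes the code path, with THP_SWAP_SKETCH standing in for CONFIG_THP_SWAP and a stub in place of the real swap-out I/O:

#include <stddef.h>
#include <stdio.h>

#define BASE_PAGE   4096UL
#define HUGE_PAGE   (512 * BASE_PAGE)

static unsigned long chunks_written;

/* stub standing in for the actual swap-out I/O path */
static void swap_out_page(size_t len)
{
        (void)len;
        chunks_written++;
}

void swap_out_huge(void)
{
#ifdef THP_SWAP_SKETCH
        swap_out_page(HUGE_PAGE);         /* swap the hugepage in one piece */
#else
        for (size_t off = 0; off < HUGE_PAGE; off += BASE_PAGE)
                swap_out_page(BASE_PAGE); /* split into base pages */
#endif
}

int main(void)
{
        swap_out_huge();
        printf("issued %lu swap-out chunk(s)\n", chunks_written);
        return 0;
}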
-
if (swap_flags & SWAP_FLAG_DISCARD_ONCE)
        p->flags &= ~SWP_PAGE_DISCARD;
else if (swap_flags & SWAP_FLAG_DISCARD_PAGES)
        p->flags &= ~SWP_AREA_DISCARD;
This is a configuration policy decision where a sysadmin can pass flags to sys_swapon() to control how discards are handled. If DISCARD_ONCE ("discard swap area at swapon-time") is set, the per-page-cluster discard flag (SWP_PAGE_DISCARD) is cleared, leaving only the single swapon-time area discard; if DISCARD_PAGES ("discard page-clusters after use") is set, the area-discard flag (SWP_AREA_DISCARD) is cleared instead.
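A userspace sketch of that narrowing: start with both kinds of discard enabled and clear one depending on which flag was passed. Bit values here are stand-ins, not the kernel's:

#include <stdio.h>

#define FLAG_DISCARD_ONCE   0x1   /* stand-in for SWAP_FLAG_DISCARD_ONCE */
#define FLAG_DISCARD_PAGES  0x2   /* stand-in for SWAP_FLAG_DISCARD_PAGES */

#define AREA_DISCARD        0x1   /* stand-in for SWP_AREA_DISCARD */
#define PAGE_DISCARD        0x2   /* stand-in for SWP_PAGE_DISCARD */

unsigned int discard_policy(unsigned int swap_flags)
{
        unsigned int flags = AREA_DISCARD | PAGE_DISCARD;  /* default: both */

        if (swap_flags & FLAG_DISCARD_ONCE)
                flags &= ~PAGE_DISCARD;   /* only discard area at swapon */
        else if (swap_flags & FLAG_DISCARD_PAGES)
                flags &= ~AREA_DISCARD;   /* only discard clusters after use */
        return flags;
}

int main(void)
{
        printf("discard=once  -> %#x\n", discard_policy(FLAG_DISCARD_ONCE));
        printf("discard=pages -> %#x\n", discard_policy(FLAG_DISCARD_PAGES));
        return 0;
}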
-
-
elixir.bootlin.com
-
size = count_history_pages(mapping, index, max);
This uses the number of contiguously cached pages preceding the current index as a "conservative" estimate of the "length of a sequential read sequence" or the "thrashing threshold in memory tight systems". LDOS could use different policies to predict the optimal readahead size.
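A sketch of the underlying idea: count how many pages immediately before the current index are already cached and use that run length as the conservative estimate (the cache is modeled as a simple bitmap here):

unsigned long history_size(const unsigned char *cached, unsigned long index,
                           unsigned long max)
{
        unsigned long n = 0;

        /* walk backwards from 'index' while pages are already in the cache */
        while (n < max && n < index && cached[index - 1 - n])
                n++;
        return n;   /* 0 = no sequential history, so read ahead cautiously */
}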
-
unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;
This chunks the read-ahead into 2 megabyte units to avoid pinning too much memory at once. LDOS could replace this with a dynamically-sized chunk to better optimize for other use cases.
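A worked example of that chunking, assuming 4 KiB pages, so each pass reads at most (2 * 1024 * 1024) / 4096 = 512 pages:

#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
        unsigned long nr_to_read = 3000;   /* pages requested for readahead */

        while (nr_to_read) {
                unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;

                if (this_chunk > nr_to_read)
                        this_chunk = nr_to_read;
                printf("readahead chunk of %lu pages\n", this_chunk);
                nr_to_read -= this_chunk;  /* 512, 512, ..., then 440 */
        }
        return 0;
}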
-