- Oct 2024
-
elixir.bootlin.com
-
static bool kcompactd_node_suitable(pg_data_t *pgdat)
{
    int zoneid;
    struct zone *zone;
    enum zone_type highest_zoneidx = pgdat->kcompactd_highest_zoneidx;

    for (zoneid = 0; zoneid <= highest_zoneidx; zoneid++) {
        zone = &pgdat->node_zones[zoneid];

        if (!populated_zone(zone))
            continue;

        /* Allocation can already succeed, check other zones */
        if (zone_watermark_ok(zone, pgdat->kcompactd_max_order,
                              min_wmark_pages(zone), highest_zoneidx, 0))
            continue;

        if (compaction_suitable(zone, pgdat->kcompactd_max_order,
                                highest_zoneidx))
            return true;
    }

    return false;
}
This function determines whether a memory node is suitable for compaction by the kcompactd daemon. It iterates through memory zones up to a configured highest index, checking if compaction is necessary and beneficial. The policy skips unpopulated zones and those where allocation can already succeed without compaction.
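For context, this check gates the wakeup path: wakeup_kcompactd() records the pending order and zone index on the node and only wakes the daemon when kcompactd_node_suitable() says compaction could actually help. A simplified sketch of that caller (field names as above; details such as tracing and sleeper checks omitted and may vary by kernel version):

void wakeup_kcompactd(pg_data_t *pgdat, int order, enum zone_type highest_zoneidx)
{
    if (!order)
        return;

    /* Remember the largest order and lowest allowed zone index requested so far */
    if (pgdat->kcompactd_max_order < order)
        pgdat->kcompactd_max_order = order;
    if (pgdat->kcompactd_highest_zoneidx > highest_zoneidx)
        pgdat->kcompactd_highest_zoneidx = highest_zoneidx;

    /* Only wake the daemon if some zone both needs and can benefit from compaction */
    if (!kcompactd_node_suitable(pgdat))
        return;

    wake_up_interruptible(&pgdat->kcompactd_wait);
}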
-
static int compaction_proactiveness_sysctl_handler(struct ctl_table *table, int write,
        void *buffer, size_t *length, loff_t *ppos)
{
    int rc, nid;

    rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
    if (rc)
        return rc;

    if (write && sysctl_compaction_proactiveness) {
        for_each_online_node(nid) {
            pg_data_t *pgdat = NODE_DATA(nid);

            if (pgdat->proactive_compact_trigger)
                continue;

            pgdat->proactive_compact_trigger = true;
            trace_mm_compaction_wakeup_kcompactd(pgdat->node_id, -1,
                                                 pgdat->nr_zones - 1);
            wake_up_interruptible(&pgdat->kcompactd_wait);
        }
    }

    return 0;
}
This function is the sysctl handler for compaction proactiveness and implements its runtime configuration policy. When the tunable is written with a nonzero value, the handler triggers proactive compaction across all online memory nodes: for each node that has not already been triggered, it sets the proactive_compact_trigger flag and wakes that node's kcompactd daemon.
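As a usage example, this tunable is exposed at /proc/sys/vm/compaction_proactiveness (equivalently via sysctl as vm.compaction_proactiveness), so a nonzero write from userspace ends up in this handler and wakes kcompactd on every online node. A minimal userspace sketch (the value 30 is just an example; the accepted range is 0-100):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Writing a nonzero proactiveness value triggers proactive compaction */
    FILE *f = fopen("/proc/sys/vm/compaction_proactiveness", "w");

    if (!f) {
        perror("compaction_proactiveness");
        return EXIT_FAILURE;
    }
    fprintf(f, "%d\n", 30);
    fclose(f);
    return EXIT_SUCCESS;
}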
-
if (suitable) {
This policy is based on the allocation order, zone characteristics, and memory fragmentation level to decide whether compaction should happen. For high-order allocations, it uses a fragmentation index and a configurable threshold to decide whether compaction would be beneficial. The policy aims to balance the need for contiguous memory against system performance, avoiding compaction that would burn CPU without helping, for example when an allocation is failing for lack of free memory rather than because of fragmentation.
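The fragment above appears to come from compaction_suitable(); roughly, the fragmentation-index check it belongs to looks like the following sketch (simplified; fragmentation_index() and the /proc/sys/vm/extfrag_threshold tunable sysctl_extfrag_threshold are the kernel's, but details vary by version):

if (suitable) {
    /* Only costly orders consult the fragmentation index */
    if (order > PAGE_ALLOC_COSTLY_ORDER) {
        int fragindex = fragmentation_index(zone, order);

        /*
         * An index toward 0 means the allocation is failing for lack
         * of memory; toward 1000 means external fragmentation is the
         * cause. At or below the threshold, compaction would not help.
         */
        if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
            suitable = false;
    }
}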
-
watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
            low_wmark_pages(zone) : min_wmark_pages(zone);
watermark += compact_gap(order);

return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
                           ALLOC_CMA, wmark_target);
The __compaction_suitable function determines whether memory compaction is viable for a given zone, based on its current watermark levels and the requested allocation order. It picks a base watermark that depends on the allocation size (the low watermark for costly orders, the min watermark otherwise) and adds a compaction gap so the free scanner has enough pages to work with. It then checks whether the zone meets this watermark via __zone_watermark_ok, taking CMA pages and the highest allowed zone index into account.
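In current kernels compact_gap(order) is 2UL << order, i.e. twice the size of the request, so for example an order-4 request (above PAGE_ALLOC_COSTLY_ORDER, which is 3) needs the zone to clear its low watermark plus 32 extra free pages before compaction is considered viable. A small self-contained sketch of that arithmetic (the watermark values are made up for illustration):

#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER 3

static unsigned long compact_gap(unsigned int order)
{
    return 2UL << order;    /* twice the number of pages being requested */
}

int main(void)
{
    unsigned int order = 4;            /* larger than the costly order */
    unsigned long low_wmark = 1024;    /* example low watermark, in pages */
    unsigned long min_wmark = 512;     /* example min watermark, in pages */
    unsigned long watermark;

    watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ? low_wmark : min_wmark;
    watermark += compact_gap(order);   /* 1024 + 32 = 1056 pages required */

    printf("required free pages: %lu\n", watermark);
    return 0;
}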
-
static enum compact_result __compact_finished(struct compact_control *cc)
{
    unsigned int order;
    const int migratetype = cc->migratetype;
    int ret;

    /* Compaction run completes if the migrate and free scanner meet */
    if (compact_scanners_met(cc)) {
        /* Let the next compaction start anew. */
        reset_cached_positions(cc->zone);

        /*
         * Mark that the PG_migrate_skip information should be cleared
         * by kswapd when it goes to sleep. kcompactd does not set the
         * flag itself as the decision to be clear should be directly
         * based on an allocation request.
         */
        if (cc->direct_compaction)
            cc->zone->compact_blockskip_flush = true;

        if (cc->whole_zone)
            return COMPACT_COMPLETE;
        else
            return COMPACT_PARTIAL_SKIPPED;
    }

    if (cc->proactive_compaction) {
        int score, wmark_low;
        pg_data_t *pgdat;

        pgdat = cc->zone->zone_pgdat;
        if (kswapd_is_running(pgdat))
            return COMPACT_PARTIAL_SKIPPED;

        score = fragmentation_score_zone(cc->zone);
        wmark_low = fragmentation_score_wmark(true);

        if (score > wmark_low)
            ret = COMPACT_CONTINUE;
        else
            ret = COMPACT_SUCCESS;

        goto out;
    }

    if (is_via_compact_memory(cc->order))
        return COMPACT_CONTINUE;

    /*
     * Always finish scanning a pageblock to reduce the possibility of
     * fallbacks in the future. This is particularly important when
     * migration source is unmovable/reclaimable but it's not worth
     * special casing.
     */
    if (!pageblock_aligned(cc->migrate_pfn))
        return COMPACT_CONTINUE;

    /* Direct compactor: Is a suitable page free? */
    ret = COMPACT_NO_SUITABLE_PAGE;
    for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
        struct free_area *area = &cc->zone->free_area[order];
        bool can_steal;

        /* Job done if page is free of the right migratetype */
        if (!free_area_empty(area, migratetype))
            return COMPACT_SUCCESS;
This function contains the policy logic for deciding when a memory compaction run should finish. It considers whether the migrate and free scanners have met, the proactive compaction status and fragmentation scores, and the availability of free pages of a suitable order and migratetype. The function returns different COMPACT_* status codes based on these criteria, deciding whether to continue compaction, declare it successful, or stop the run. This policy balances thorough memory reorganization against efficient use of system resources during compaction.
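For the proactive branch, the low fragmentation-score watermark it compares against is derived directly from the proactiveness tunable. A sketch of fragmentation_score_wmark(), hedged because the exact form differs slightly across kernel versions:

static unsigned int fragmentation_score_wmark(bool low)
{
    unsigned int wmark_low;

    /*
     * Cap the low watermark so a proactiveness value close to 100
     * does not keep compaction running almost continuously.
     */
    wmark_low = max(100U - sysctl_compaction_proactiveness, 5U);
    return low ? wmark_low : min(wmark_low + 10, 100U);
}

Under this sketch, with the default proactiveness of 20 the proactive run keeps going while the zone's fragmentation score stays above 80 and reports COMPACT_SUCCESS once it drops to 80 or below.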
-
#ifdef CONFIG_COMPACTION
static bool suitable_migration_source(struct compact_control *cc,
                                      struct page *page)
{
    int block_mt;

    if (pageblock_skip_persistent(page))
        return false;

    if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
        return true;

    block_mt = get_pageblock_migratetype(page);

    if (cc->migratetype == MIGRATE_MOVABLE)
        return is_migrate_movable(block_mt);
    else
        return block_mt == cc->migratetype;
}
This code snippet from the Linux kernel's memory compaction system implements the policy for deciding which pageblocks are suitable migration sources during compaction, and is conditionally compiled under CONFIG_COMPACTION. The suitable_migration_source function first rejects pageblocks that are persistently marked to be skipped. Any remaining pageblock is acceptable unless the run is async direct compaction; in that case the pageblock's migratetype must match the request, with MIGRATE_MOVABLE requests additionally accepting any movable pageblock.
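The special handling for movable requests is that a MIGRATE_MOVABLE allocation will also accept CMA pageblocks as migration sources. Roughly, is_migrate_movable() is defined as:

/* A pageblock counts as movable if it is MIGRATE_MOVABLE or a CMA block */
static inline bool is_migrate_movable(int mt)
{
    return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
}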
-
/* Similar to reclaim, but different enough that they don't share logic */
static bool too_many_isolated(struct compact_control *cc)
{
    pg_data_t *pgdat = cc->zone->zone_pgdat;
    bool too_many;
    unsigned long active, inactive, isolated;

    inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
               node_page_state(pgdat, NR_INACTIVE_ANON);
    active = node_page_state(pgdat, NR_ACTIVE_FILE) +
             node_page_state(pgdat, NR_ACTIVE_ANON);
    isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
               node_page_state(pgdat, NR_ISOLATED_ANON);

    /*
     * Allow GFP_NOFS to isolate past the limit set for regular
     * compaction runs. This prevents an ABBA deadlock when other
     * compactors have already isolated to the limit, but are
     * blocked on filesystem locks held by the GFP_NOFS thread.
     */
    if (cc->gfp_mask & __GFP_FS) {
        inactive >>= 3;
        active >>= 3;
    }

    too_many = isolated > (inactive + active) / 2;
    if (!too_many)
        wake_throttle_isolated(pgdat);

    return too_many;
}
This policy determines when too many pages have been isolated during compaction by comparing the node's isolated page count against half of its active plus inactive pages. Compactors that are allowed to enter the filesystem (__GFP_FS) have the active and inactive counts scaled down, which gives them a stricter limit; the headroom this leaves lets GFP_NOFS compactors isolate past the regular limit, preventing an ABBA deadlock on filesystem locks held by a GFP_NOFS thread.
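A toy, self-contained illustration of the threshold arithmetic (the page counts are made up): with 80,000 inactive and 80,000 active pages, an FS-capable compactor is limited to 10,000 isolated pages, while a GFP_NOFS one could go up to 80,000.

#include <stdio.h>

int main(void)
{
    unsigned long inactive = 80000, active = 80000, isolated = 12000;
    int gfp_fs = 1;    /* compactor allowed to enter the filesystem */

    /* FS-capable compactors get a stricter limit, leaving headroom for GFP_NOFS */
    if (gfp_fs) {
        inactive >>= 3;
        active >>= 3;
    }

    printf("limit = %lu pages, too_many = %d\n",
           (inactive + active) / 2,
           isolated > (inactive + active) / 2);
    return 0;
}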
-
/*
 * Compound pages of >= pageblock_order should consistently be skipped until
 * released. It is always pointless to compact pages of such order (if they are
 * migratable), and the pageblocks they occupy cannot contain any free pages.
 */
static bool pageblock_skip_persistent(struct page *page)
{
    if (!PageCompound(page))
        return false;

    page = compound_head(page);

    if (compound_order(page) >= pageblock_order)
        return true;

    return false;
}
This policy optimizes memory compaction by persistently skipping pageblocks occupied by large compound pages. If a page belongs to a compound page whose order is at least pageblock_order, the pageblock is treated as skippable until the page is released, since compacting it is pointless and it cannot contain free pages. For non-compound pages it returns false, so the pageblock is not excluded on this basis.
-
/* Returns true if compaction should be skipped this time */
static bool compaction_deferred(struct zone *zone, int order)
{
    unsigned long defer_limit = 1UL << zone->compact_defer_shift;

    if (order < zone->compact_order_failed)
        return false;

    /* Avoid possible overflow */
    if (++zone->compact_considered >= defer_limit) {
        zone->compact_considered = defer_limit;
        return false;
    }

    trace_mm_compaction_deferred(zone, order);

    return true;
}
This policy defers memory compaction in Linux after repeated failures. A request is deferred when its order is at least the order that previously failed and the zone has been considered fewer than 1 << compact_defer_shift times since that failure; smaller orders, and requests arriving once the defer limit has been reached, are allowed to proceed. The effect is that repeatedly unsuccessful compaction is retried progressively less often.
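The counterpart that makes the window grow is defer_compaction(), called when a compaction run fails to produce a suitable page: it resets the considered counter, bumps compact_defer_shift (capped at COMPACT_MAX_DEFER_SHIFT), and lowers compact_order_failed. A simplified sketch, with tracing omitted:

/* Called when compaction fails; each failure doubles the defer window */
static void defer_compaction(struct zone *zone, int order)
{
    zone->compact_considered = 0;
    zone->compact_defer_shift++;

    if (order < zone->compact_order_failed)
        zone->compact_order_failed = order;

    if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
        zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}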
-
if (!sysctl_compaction_proactiveness)
    timeout = MAX_SCHEDULE_TIMEOUT;
If proactive compaction is disabled (sysctl_compaction_proactiveness is 0), the daemon sets its timeout to MAX_SCHEDULE_TIMEOUT, effectively sleeping until explicitly woken up. If proactive compaction is enabled, it uses the default timeout to periodically wake up and check for work.
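In the kcompactd main loop this looks roughly like the following simplified sketch (the real loop also handles freezing, tracing, PSI accounting, and deferral of proactive work, and details vary by kernel version):

while (!kthread_should_stop()) {
    unsigned long timeout = default_timeout;

    /* With proactiveness disabled, sleep until explicitly woken */
    if (!sysctl_compaction_proactiveness)
        timeout = MAX_SCHEDULE_TIMEOUT;

    if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
                                     kcompactd_work_requested(pgdat),
                                     timeout)) {
        /* Woken by wakeup_kcompactd(): do the requested work */
        kcompactd_do_work(pgdat);
    } else {
        /* Timed out: consider a proactive compaction pass */
        proactive_compact_node(pgdat);
    }
}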
-
-
elixir.bootlin.com
-
static unsigned long count_shadow_nodes(struct shrinker *shrinker, struct shrink_control *sc)
The policy in this function governs the management of shadow nodes in the Linux kernel's page cache. It dynamically calculates a maximum allowable number of shadow nodes based on the available memory, aiming to maintain an optimal balance between memory usage and the ability to detect page refaults. The policy sets this limit at approximately 1/8th of the total possible pages, adapting to different memory configurations and pressure scenarios. When the current number of shadow nodes exceeds this calculated maximum, the policy determines how many nodes should be reclaimed.
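The core of that calculation is roughly the following sketch, where pages is the number of pages counted for the node (or memcg) and each xarray node can hold up to 64 shadow entries (XA_CHUNK_SHIFT == 6), so the cap works out to about pages / 8:

/* Cap shadow nodes at roughly pages / 8: pages >> (XA_CHUNK_SHIFT - 3) */
max_nodes = pages >> (XA_CHUNK_SHIFT - 3);

if (!nodes)
    return SHRINK_EMPTY;

if (nodes <= max_nodes)
    return 0;

/* Ask the shrinker to reclaim the excess shadow nodes */
return nodes - max_nodes;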
-
if (lru_gen_enabled())
    return lru_gen_eviction(folio);
When the multi-generational LRU is enabled (checked via lru_gen_enabled()), it alters how the kernel tracks page access patterns and manages evictions. This policy introduces generation-based tracking, where pages are grouped into generations according to when they were last accessed. It uses lru_gen_eviction() and lru_gen_refault() to handle page evictions and refaults, respectively. This approach allows more nuanced page-reclaim decisions, potentially improving memory utilization and reducing I/O by better identifying truly cold pages and protecting hot pages from premature eviction.
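The refault side mirrors the eviction hook above: early in workingset_refault(), the multi-gen LRU path is taken before the classic refault accounting. Roughly:

/* Early in workingset_refault(): hand refaults to the multi-gen LRU */
if (lru_gen_enabled()) {
    lru_gen_refault(folio, shadow);
    return;
}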
-