mirror of
https://github.com/torvalds/linux.git
synced 2026-01-25 15:03:52 +08:00
8175ebfd302abe6fbdca9037f763ecbfdb8db572
1296430 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8175ebfd30 |
mm: count the number of partially mapped anonymous THPs per size
When a THP is added to the deferred_list because it is partially mapped, the unmapped part of it is unused, which wastes memory and can increase memory reclamation pressure.
Detailing the specifics of how unmapping occurs is quite difficult and not that useful, so we adopt a simple approach: each time a THP enters the deferred_list, we increment the count by 1; whenever it leaves for any reason, we decrement the count by 1.
Link: https://lkml.kernel.org/r/20240824010441.21308-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
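The enter/leave accounting described in commit 8175ebfd30 can be sketched in userspace C. This is only an illustration under stated assumptions: `folio_sketch`, `nr_partially_mapped`, `deferred_list_add()` and `deferred_list_del()` are hypothetical names, not the kernel's data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_SKETCH 10	/* hypothetical upper bound on folio order */

/* Hypothetical per-order counter of partially mapped THPs. */
static long nr_partially_mapped[MAX_ORDER_SKETCH + 1];

/* Hypothetical stand-in for a large folio. */
struct folio_sketch {
	unsigned int order;
	bool on_deferred_list;
};

/* Entering the deferred list: count +1, but only on the first entry. */
static void deferred_list_add(struct folio_sketch *folio)
{
	assert(folio->order <= MAX_ORDER_SKETCH);
	if (folio->on_deferred_list)
		return;
	folio->on_deferred_list = true;
	nr_partially_mapped[folio->order]++;
}

/* Leaving the list for any reason (split, free, ...): count -1. */
static void deferred_list_del(struct folio_sketch *folio)
{
	if (!folio->on_deferred_list)
		return;
	folio->on_deferred_list = false;
	nr_partially_mapped[folio->order]--;
	assert(nr_partially_mapped[folio->order] >= 0);
}
```

The counter deliberately ignores how the partial unmapping happened; it only records membership of the list, which matches the simple approach the commit message argues for.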
|
|
5d65c8d758 |
mm: count the number of anonymous THPs per size
Patch series "mm: count the number of anonymous THPs per size", v4.
Knowing the number of transparent anon THPs in the system is crucial
for performance analysis. It helps in understanding the ratio and
distribution of THPs versus small folios throughout the system.
Additionally, partial unmapping by userspace can lead to significant waste
of THPs over time and increase memory reclamation pressure. We need this
information for comprehensive system tuning.
This patch (of 2):
Let's track for each anonymous THP size, how many of them are currently
allocated. We'll track the complete lifespan of an anon THP, starting
when it becomes an anon THP ("large anon folio") (->mapping gets set),
until it gets freed (->mapping gets cleared).
Introduce a new "nr_anon" counter per THP size and adjust the
corresponding counter in the following cases:
* We allocate a new THP and call folio_add_new_anon_rmap() to map
it the first time and turn it into an anon THP.
* We split an anon THP into multiple smaller ones.
* We migrate an anon THP, when we prepare the destination.
* We free an anon THP back to the buddy.
Note that AnonPages in /proc/meminfo currently tracks the total number of
*mapped* anonymous *pages*, and therefore has slightly different
semantics. In the future, we might also want to track "nr_anon_mapped"
for each THP size, which might be helpful when comparing it to the number
of allocated anon THPs (long-term pinning, stuck in swapcache, memory
leaks, ...).
Further note that for now, we only track anon THPs after they got their
->mapping set, for example via folio_add_new_anon_rmap(). If we would
allocate some in the swapcache, they will only show up in the statistics
for now after they have been mapped to user space the first time, where we
call folio_add_new_anon_rmap().
[akpm@linux-foundation.org: documentation fixups, per David]
Link: https://lkml.kernel.org/r/3e8add35-e26b-443b-8a04-1078f4bc78f6@redhat.com
Link: https://lkml.kernel.org/r/20240824010441.21308-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240824010441.21308-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
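The four accounting points listed in commit 5d65c8d758 can be modelled with a small per-order counter array. The sketch below is a userspace illustration, not the kernel code: all function names are hypothetical, and it assumes that order-0 folios are not counted because they are not THPs.

```c
#include <assert.h>

#define MAX_ORDER_SKETCH 10	/* hypothetical upper bound on THP order */

/* Hypothetical "nr_anon" counter, one slot per THP order. */
static long nr_anon[MAX_ORDER_SKETCH + 1];

/* A new THP is mapped for the first time and becomes an anon THP. */
static void anon_thp_first_map(unsigned int order)
{
	assert(order >= 1 && order <= MAX_ORDER_SKETCH);
	nr_anon[order]++;
}

/* An anon THP is freed back to the allocator. */
static void anon_thp_free(unsigned int order)
{
	assert(order >= 1 && order <= MAX_ORDER_SKETCH);
	nr_anon[order]--;
	assert(nr_anon[order] >= 0);
}

/* One anon THP is split into 2^(old - new) smaller folios. */
static void anon_thp_split(unsigned int old_order, unsigned int new_order)
{
	assert(new_order < old_order && old_order <= MAX_ORDER_SKETCH);
	nr_anon[old_order]--;
	if (new_order > 0)	/* assumption: order-0 pages are not tracked */
		nr_anon[new_order] += 1L << (old_order - new_order);
}

/* Migration: the destination is counted when prepared, the source when freed. */
static void anon_thp_migrate(unsigned int order)
{
	anon_thp_first_map(order);	/* prepare destination */
	anon_thp_free(order);		/* source goes away */
}
```

In this model a migration leaves the counter unchanged once it completes, and a split moves the count from one order to another rather than losing it.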
|
|
70e59a7528 |
mm: tidy up shmem mTHP controls and stats
Previously we had a situation where shmem mTHP controls and stats were not exposed for some supported sizes and were exposed for some unsupported sizes. So let's clean that up.
Anon mTHP can support all large orders [2, PMD_ORDER]. But shmem can support all large orders [1, MAX_PAGECACHE_ORDER]. However, per-size shmem controls and stats were previously being exposed for all the anon mTHP orders, meaning order-1 was not present, and for arm64 64K base pages, orders 12 and 13 were exposed but were not supported internally.
Tidy this all up by defining ctrl and stats attribute groups for anon and file separately. Anon ctrl and stats groups are populated for all orders in THP_ORDERS_ALL_ANON and file ctrl and stats groups are populated for all orders in THP_ORDERS_ALL_FILE_DEFAULT.
Additionally, create "any" ctrl and stats attribute groups which are populated for all orders in (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_FILE_DEFAULT). swpout stats use this since they apply to anon and shmem.
The side-effect of all this is that different hugepage-*kB directories contain different sets of controls and stats, depending on which memory types support that size. This approach is preferred over the alternative, which is to populate dummy controls and stats for memory types that do not support a given size.
[ryan.roberts@arm.com: file pages and shmem can also be split]
Link: https://lkml.kernel.org/r/f7ced14c-8bc5-405f-bee7-94f63980f525@arm.com
Link: https://lkml.kernel.org/r/20240808111849.651867-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Barry Song <baohua@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
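The order ranges in commit 70e59a7528 are easy to check with bitmasks. The sketch below is a hypothetical userspace model: the `SKETCH_*` values are illustrative numbers inferred from the arm64 64K example in the message (PMD order 13, page cache limit 11), and the helpers are not the kernel's THP_ORDERS_ALL_ANON or THP_ORDERS_ALL_FILE_DEFAULT macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only, modelled on the arm64 64K base page example. */
enum {
	SKETCH_PMD_ORDER = 13,
	SKETCH_MAX_PAGECACHE_ORDER = 11,
};

/* Bitmask with bits lo..hi (inclusive) set, one bit per order. */
static unsigned long order_range_mask(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 8 * sizeof(unsigned long) - 1);
	return ((1UL << (hi + 1)) - 1) & ~((1UL << lo) - 1);
}

/* Anon supports orders [2, PMD_ORDER]. */
static unsigned long anon_orders(void)
{
	return order_range_mask(2, SKETCH_PMD_ORDER);
}

/* File/shmem supports orders [1, MAX_PAGECACHE_ORDER]. */
static unsigned long file_orders(void)
{
	return order_range_mask(1, SKETCH_MAX_PAGECACHE_ORDER);
}

/* "any" groups cover the union, e.g. for stats shared by anon and shmem. */
static unsigned long any_orders(void)
{
	return anon_orders() | file_orders();
}

static bool order_in(unsigned long mask, unsigned int order)
{
	return (mask & (1UL << order)) != 0;
}
```

With these values, order 1 appears only in the file mask and orders 12 and 13 only in the anon mask, which is why different hugepage-*kB directories end up with different sets of attributes.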
|
|
246d3aa3e5 |
mm: cleanup count_mthp_stat() definition
Patch series "Shmem mTHP controls and stats improvements", v3. This is a small series to tidy up the way the shmem controls and stats are exposed. These patches were previously part of the series at [2], but I decided to split them out since they can go in independently. This patch (of 2): Let's move count_mthp_stat() so that it's always defined, even when THP is disabled. Previously uses of the function in files such as shmem.c, which are compiled even when THP is disabled, required ugly THP ifdeferry. With this cleanup, we can remove those ifdefs and the function resolves to a nop when THP is disabled. I shortly plan to call count_mthp_stat() from more THP-invariant source files. Link: https://lkml.kernel.org/r/20240808111849.651867-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20240808111849.651867-2-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Barry Song <baohua@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
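The pattern behind commit 246d3aa3e5, a helper that is always defined and collapses to a no-op when the feature is compiled out, can be sketched as follows. `SKETCH_THP_ENABLED` and the `sketch_*` functions are hypothetical stand-ins for the kernel config option and count_mthp_stat(); the real definition may differ.

```c
#include <assert.h>

/* Hypothetical switch standing in for the THP config option. */
#ifndef SKETCH_THP_ENABLED
#define SKETCH_THP_ENABLED 1
#endif

#define SKETCH_NR_ORDERS 16

#if SKETCH_THP_ENABLED
static unsigned long sketch_stats[SKETCH_NR_ORDERS];

/* Real implementation: bump the per-order counter. */
static inline void sketch_count_stat(unsigned int order)
{
	assert(order < SKETCH_NR_ORDERS);
	sketch_stats[order]++;
}

static inline unsigned long sketch_read_stat(unsigned int order)
{
	assert(order < SKETCH_NR_ORDERS);
	return sketch_stats[order];
}
#else
/* Feature disabled: same signatures, empty bodies the compiler can drop. */
static inline void sketch_count_stat(unsigned int order)
{
	(void)order;
}

static inline unsigned long sketch_read_stat(unsigned int order)
{
	(void)order;
	return 0;
}
#endif

/* A caller built in both configurations needs no #ifdef of its own. */
static void sketch_alloc_path(unsigned int order)
{
	sketch_count_stat(order);
}
```

Keeping the conditional compilation in one header-like place is what lets callers such as the hypothetical `sketch_alloc_path()` stay free of ifdefs.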
|
|
6f1833b820 |
mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation
Use isolate_folio_to_list() to unify hugetlb/LRU/non-LRU folio isolation, which cleans up the code a bit and saves a few calls to compound_head().
[wangkefeng.wang@huawei.com: various fixes]
Link: https://lkml.kernel.org/r/20240829150500.2599549-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240827114728.3212578-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
f1264e9531 |
mm: migrate: add isolate_folio_to_list()
Add an isolate_folio_to_list() helper that tries to isolate HugeTLB, non-LRU movable and LRU folios to a list; it will soon be reused by do_migrate_range() from memory hotplug. Also drop mf_isolate_folio(), since soft_offline_in_use_page() can use the new helper directly.
Link: https://lkml.kernel.org/r/20240827114728.3212578-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
e8a796fa1c |
mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range()
Commit |
||
|
|
16038c4fff |
mm: memory-failure: add unmap_poisoned_folio()
Add unmap_poisoned_folio() helper which will be reused by do_migrate_range() from memory hotplug soon. [akpm@linux-foundation.org: whitespace tweak, per Miaohe Lin] Link: https://lkml.kernel.org/r/1f80c7e3-c30d-1ac1-6a36-d1a5f5907f7c@huawei.com Link: https://lkml.kernel.org/r/20240827114728.3212578-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
b62b51d2d1 |
mm: memory_hotplug: remove head variable in do_migrate_range()
Patch series "mm: memory_hotplug: improve do_migrate_range()", v3.
Unify hwpoisoned page handling and isolation of HugeTLB/LRU/non-LRU movable pages, and convert do_migrate_range() to use folios.
This patch (of 5):
Directly use a folio for HugeTLB and THP when calculating the next pfn, then remove the unused head variable.
Link: https://lkml.kernel.org/r/20240827114728.3212578-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240827114728.3212578-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
f66ac836d4 |
mm/damon/tests: add .kunitconfig file for DAMON kunit tests
'--kunitconfig' option of 'kunit.py run' supports '.kunitconfig' file name convention. Add the file for DAMON kunit tests for more convenient kunit run. Link: https://lkml.kernel.org/r/20240827030336.7930-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
9bfbaa5e44 |
mm/damon: move kunit tests to tests/ subdirectory with _kunit suffix
There was a discussion about better places for kunit test code[1] and test file name suffix[2]. Following the conclusion, move kunit tests for DAMON to mm/damon/tests/ subdirectory and rename those.
[1] https://lore.kernel.org/CABVgOS=pUdWb6NDHszuwb1HYws4a1-b1UmN=i8U_ED7HbDT0mg@mail.gmail.com
[2] https://lore.kernel.org/CABVgOSmKwPq7JEpHfS6sbOwsR0B-DBDk_JP-ZD9s9ZizvpUjbQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20240827030336.7930-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
61879eed1f |
mm/damon/dbgfs-test: skip dbgfs_set_init_regions() test if PADDR is not registered
The test depends on registration of DAMON_OPS_PADDR. It would be
registered only when CONFIG_DAMON_PADDR is set. DAMON core kunit tests do
fake ops registration for such cases. However, the functions for such fake
ops registration are not available to the DAMON debugfs interface. Just skip
the test in that case.
Link: https://lkml.kernel.org/r/20240827030336.7930-8-sj@kernel.org
Fixes:
|
||
|
|
8e34bac5a2 |
mm/damon/dbgfs-test: skip dbgfs_set_targets() test if PADDR is not registered
The test depends on registration of DAMON_OPS_PADDR. It would be
registered only when CONFIG_DAMON_PADDR is set. DAMON core kunit tests do
fake ops registration for such cases. However, the functions for such fake
ops registration are not available to the DAMON debugfs interface. Just skip
the test in that case.
Link: https://lkml.kernel.org/r/20240827030336.7930-7-sj@kernel.org
Fixes:
|
||
|
|
e43772dcdf |
mm/damon/core-test: fix damon_test_ops_registration() for DAMON_VADDR unset case
DAMON core kunit test can be executed without CONFIG_DAMON_VADDR. In that
case, vaddr DAMON ops is not registered. Meanwhile, the ops registration
kunit test assumes the vaddr ops is registered. Check and handle the case
by registering fake vaddr ops inside the test code.
Link: https://lkml.kernel.org/r/20240827030336.7930-6-sj@kernel.org
Fixes:
|
||
|
|
9fcce7e7be |
mm/damon/core-test: test only vaddr case on ops registration test
DAMON ops registration kunit test tests both vaddr and paddr use cases in parts of the whole test cases. Basically testing only one ops use case is enough. Do the test with only vaddr use case. Link: https://lkml.kernel.org/r/20240827030336.7930-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
8c211412c5 |
selftests/damon: add execute permissions to test scripts
Some test scripts are missing executable permissions. It causes warnings that make the test output unnecessarily verbose. Add executable permissions. Link: https://lkml.kernel.org/r/20240827030336.7930-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
582c04b07f |
selftests/damon: cleanup __pycache__/ with 'make clean'
Python-based tests create a __pycache__/ directory. Remove it with 'make
clean' by defining it as EXTRA_CLEAN.
Link: https://lkml.kernel.org/r/20240827030336.7930-3-sj@kernel.org
Fixes:
|
||
|
|
9cb75552f4 |
selftests/damon: add access_memory_even to .gitignore
Patch series "misc fixups for DAMON {self,kunit} tests".
This patchset is for minor fixups of DAMON selftests and kunit tests.
The first three patches make DAMON selftests more cleanly maintained (patches
1 and 2) and free of unnecessary warnings (patch 3). The following six patches
remove an unnecessary test case (patch 4), handle config combinations that
can make tests fail (patches 5-7), reorganize the test files following the
new guideline (patch 8), and add a reference kunitconfig for DAMON kunit
tests (patch 9).
This patch (of 9):
DAMON selftests build access_memory_even, but it's not on the .gitignore
list. Add it to make 'git status' output cleaner.
Link: https://lkml.kernel.org/r/20240827030336.7930-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240827030336.7930-2-sj@kernel.org
Fixes:
|
||
|
|
f22cde4371 |
sched/numa: Fix the vma scan starving issue
Problem statement: Since commit |
||
|
|
073c78edf5 |
memory tier: fix deadlock warning while onlining pages
commit |
||
|
|
7de8728f55 |
mm: vmalloc: refactor vm_area_alloc_pages() function
The aim is to simplify the vm_area_alloc_pages() function and make it less confusing, as it has become more clogged nowadays:
- eliminate a "bulk_gfp" variable and do not overwrite a gfp flag for bulk allocator;
- drop __GFP_NOFAIL flag for high-order-page requests on upper layer. It becomes less spread between levels when it comes to __GFP_NOFAIL allocations;
- add a comment about a fallback path if high-order attempt is unsuccessful because for such cases __GFP_NOFAIL is dropped;
- fix a typo in a commit message.
Link: https://lkml.kernel.org/r/20240827190916.34242-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
|
|
01c373e9a5 |
mm: rework vm_ops->close() handling on VMA merge
In commit |
||
|
|
cc8cb3697a |
mm: refactor vma_merge() into modify-only vma_merge_existing_range()
The existing vma_merge() function is no longer required to handle what were previously referred to as cases 1-3 (i.e. the merging of a new VMA), as this is now handled by vma_merge_new_vma(). Additionally, simplify the convoluted control flow of the original, maintaining identical logic only expressed more clearly and doing away with a complicated set of cases, rather logically examining each possible outcome - merging of both the previous and subsequent VMA, merging of the previous VMA and merging of the subsequent VMA alone. We now utilise the previously implemented commit_merge() function to share logic with vma_expand() de-duplicating code and providing less surface area for bugs and confusion. In order to do so, we adjust this function to accept parameters specific to merging existing ranges. Link: https://lkml.kernel.org/r/2cf6016b7bfcc4965fc3cde10827560c42e4f12c.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
65e0aa64df |
mm: introduce commit_merge(), abstracting final commit of merge
Pull the part of vma_expand() which actually commits the merge operation, that is inserts it into the maple tree and sets the VMA's vma->vm_start and vma->vm_end parameters, into its own function. We implement only the parts needed for vma_expand() which now as a result of previous work is also the means by which new VMA ranges are merged. The next commit in the series will implement merging of existing ranges which will extend commit_merge() to accommodate this case and result in all merges using this common code. Link: https://lkml.kernel.org/r/7b985a20dfa549e3c370cd274d732b64c44f6dbd.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
25d3925fa5 |
mm: make vma_prepare() and friends static and internal to vma.c
Now we have abstracted merge behaviour for new VMA ranges, we are able to render vma_prepare(), init_vma_prep(), vma_complete(), can_vma_merge_before() and can_vma_merge_after() static and internal to vma.c. These are internal implementation details of kernel VMA manipulation and merging mechanisms and thus should not be exposed. This also renders the functions userland testable. Link: https://lkml.kernel.org/r/7f7f1c34ce10405a6aab2714c505af3cf41b7851.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
cacded5e42 |
mm: avoid using vma_merge() for new VMAs
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the resultant function vma_merge_new_range() to be clear what the purpose of this function is - a new VMA is desired in the specified range, and we wish to see if it is possible to 'merge' surrounding VMAs into this range rather than having to allocate a new VMA. Note that this function uses vma_extend() exclusively, so adopts its requirement that the iterator point at or before the gap. We add an assert to this effect. This is as opposed to vma_merge_existing_range(), which will be introduced in a subsequent commit, and provide the same functionality for cases in which we are modifying an existing VMA. In mmap_region() and do_brk_flags() we open code scenarios where we prefer to use vma_expand() rather than invoke a full vma_merge() operation. Abstract this logic and eliminate all of the open-coding, and also use the same logic for all cases where we add new VMAs to, rather than ultimately use vma_merge(), rather use vma_expand(). Doing so removes duplication and simplifies VMA merging in all such cases, laying the ground for us to eliminate the merging of new VMAs in vma_merge() altogether. Also add the ability for the vmg to track state, and able to report errors, allowing for us to differentiate a failed merge from an inability to allocate memory in callers. This makes it far easier to understand what is happening in these cases avoiding confusion, bugs and allowing for future optimisation. Also introduce vma_iter_next_rewind() to allow for retrieval of the next, and (optionally) the prev VMA, rewinding to the start of the previous gap. Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma comparison for the case of merging on both sides where the anon_vma of the VMA being merged maybe compatible with prev and next, but prev and next's anon_vma's may not be compatible with each other. 
Finally also introduce can_vma_merge_left() / can_vma_merge_right() to check adjacent VMA compatibility and that they are indeed adjacent. Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Mark Brown <broonie@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
fc21959f74 |
mm: abstract vma_expand() to use vma_merge_struct
The purpose of the vmg is to thread merge state through functions and avoid egregious parameter lists. We expand this to vma_expand(), which is used for a number of merge cases. Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to use a vmg. An added purpose of this change is the ability in a future commit to perform all new VMA range merging using vma_expand(). Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
3e01310d29 |
mm: remove duplicated open-coded VMA policy check
Both can_vma_merge_before() and can_vma_merge_after() are invoked after checking for compatible VMA NUMA policy, we can simply move this to is_mergeable_vma() and abstract this altogether. In mmap_region() we set vmg->policy to NULL, so the policy comparisons checked in can_vma_merge_before() and can_vma_merge_after() are exactly equivalent to !vma_policy(vmg.next) and !vma_policy(vmg.prev). Equally, in do_brk_flags(), vmg->policy is NULL, so the can_vma_merge_after() is checking !vma_policy(vma), as we set vmg.prev to vma. In vma_merge(), we compare prev and next policies with vmg->policy before checking can_vma_merge_after() and can_vma_merge_before() respectively, which this patch causes to be checked in precisely the same way. This therefore maintains precisely the same logic as before, only now abstracted into is_mergeable_vma(). Link: https://lkml.kernel.org/r/0dbff286d9c4988333bc6f4ff3734cb95dd5410a.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
2f1c6611b0 |
mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').
Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.
Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.
We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA. Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range
(a rather confusing state of affairs), which are now provided by the vmg
directly.
We also remove the pgoff calculation previously performed in vma_modify(),
and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
helper.
Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||
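The idea in commit 2f1c6611b0 of threading one struct through the merge helpers instead of long parameter lists can be sketched as below. Everything here is hypothetical: `vma_sketch`, `vmg_sketch`, `VMG_SKETCH_STATE()` and the two predicates only imitate the shape of vma_merge_struct, VMG_STATE() and can_vma_merge_after()/can_vma_merge_before(), and compare a single flags word where the kernel checks many more properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range plus flags. */
struct vma_sketch {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Hypothetical merge state: everything the helpers need, in one place. */
struct vmg_sketch {
	struct vma_sketch *prev;	/* VMA before the proposed range, if any */
	struct vma_sketch *next;	/* VMA after the proposed range, if any */
	unsigned long start;		/* proposed range */
	unsigned long end;
	unsigned long flags;
};

/* Declare and initialise a merge state in one line. */
#define VMG_SKETCH_STATE(name, start_, end_, flags_)		\
	struct vmg_sketch name = {				\
		.prev = NULL,					\
		.next = NULL,					\
		.start = (start_),				\
		.end = (end_),					\
		.flags = (flags_),				\
	}

/* Can the proposed range be appended to prev? */
static bool sketch_can_merge_after(const struct vmg_sketch *vmg)
{
	assert(vmg != NULL);
	return vmg->prev != NULL && vmg->prev->end == vmg->start &&
	       vmg->prev->flags == vmg->flags;
}

/* Can the proposed range be prepended to next? */
static bool sketch_can_merge_before(const struct vmg_sketch *vmg)
{
	assert(vmg != NULL);
	return vmg->next != NULL && vmg->end == vmg->next->start &&
	       vmg->next->flags == vmg->flags;
}
```

Because each predicate takes only the state pointer, adding a new property to compare changes the struct and the predicate bodies but not any call site.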
|
|
955db39676 |
tools: add VMA merge tests
Add a variety of VMA merge unit tests to assert that the behaviour of VMA merge is correct at an abstract level and VMAs are merged or not merged as expected. These are intentionally added _before_ we start refactoring vma_merge() in order that we can continually assert correctness throughout the rest of the series. In order to reduce churn going forward, we backport the vma_merge_struct data type to the test code which we introduce and use in a future commit, and add wrappers around the merge new and existing VMA cases. Link: https://lkml.kernel.org/r/1c7a0b43cfad2c511a6b1b52f3507696478ff51a.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
4e52a60ac5 |
tools: improve vma test Makefile
Patch series "mm: remove vma_merge()", v3. The infamous vma_merge() function has been the cause of a great deal of pain, bugs and confusion for a very long time. It is subtle, contains many corner cases, tries to do far too much and is as a result very fragile. The fact that the function requires there to be a numbering system to cover each possible eventuality with references to each in the many branches of its implementation as to which case you are looking at speaks to all this. Some of this complexity is inherent - unfortunately there is no getting away from the need to figure out precisely how to execute the merge, whether we need to remove VMAs, whether it is safe to do so, what constitutes a mergeable VMA and so on. However, a lot of the complexity is not inherent but instead a product of the function's 'organic' development. Liam has gone to great lengths to improve the situation as a part of his maple tree implementation, greatly improving the readability of the code, and Vlastimil and myself have additionally gone to lengths to try to improve things further. However, with the availability of userland VMA testing, it now becomes possible to perform a rather more significant refactoring while maintaining confidence in its correct operation. An attempt was previously made by Vlastimil [0] to eliminate vma_merge(), however it was rather - brutal - and an astute reader might refer to the date of that patch for insight as to its intent. This series instead divides merge operations into two natural kinds - merges which occur when a NEW vma is being added to the address space, and merges which occur when a vma is being MODIFIED. Happily, the vma_expand() function introduced by Liam, which has the capacity for also deleting a subsequent VMA, covers each of the NEW vma cases. 
By abstracting the actual final commit of changes to a VMA to its own function, commit_merge() and writing a wrapper around vma_expand() for new VMA cases vma_merge_new_range(), we can avoid having to use vma_merge() for these instances altogether. By doing so we are also able to then de-duplicate all existing merge logic in mmap_region() and do_brk_flags() and have everything invoke this new function, so we universally take the same approach to merging new VMAs. Having done so, we can then completely rework vma_merge() into vma_merge_existing_range() and use this for the instances where a merge is proposed for a region of an existing VMA. This eliminates vma_merge() and its numbered cases and instead divides things into logical cases - merge both, merge left, merge right (the latter 2 being either partial or full merges). The code is heavily annotated with ASCII diagrams and greatly simplified in comparison to the existing vma_merge() function. Having made this change, we take the opportunity to address an issue with merging VMAs possessing a vm_ops->close() hook - commit |
||
|
|
723e1e8b77 |
mm/vma.h: optimise vma_munmap_struct
The vma_munmap_struct has a hole of 4 bytes and pushes the struct to three
cachelines. Relocating the three booleans upwards allows for the struct
to only use two cachelines (as reported by pahole on amd64).
Before:
struct vma_munmap_struct {
struct vma_iterator * vmi; /* 0 8 */
struct vm_area_struct * vma; /* 8 8 */
struct vm_area_struct * prev; /* 16 8 */
struct vm_area_struct * next; /* 24 8 */
struct list_head * uf; /* 32 8 */
long unsigned int start; /* 40 8 */
long unsigned int end; /* 48 8 */
long unsigned int unmap_start; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
long unsigned int unmap_end; /* 64 8 */
int vma_count; /* 72 4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int nr_pages; /* 80 8 */
long unsigned int locked_vm; /* 88 8 */
long unsigned int nr_accounted; /* 96 8 */
long unsigned int exec_vm; /* 104 8 */
long unsigned int stack_vm; /* 112 8 */
long unsigned int data_vm; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
bool unlock; /* 128 1 */
bool clear_ptes; /* 129 1 */
bool closed_vm_ops; /* 130 1 */
/* size: 136, cachelines: 3, members: 19 */
/* sum members: 127, holes: 1, sum holes: 4 */
/* padding: 5 */
/* last cacheline: 8 bytes */
};
After:
struct vma_munmap_struct {
struct vma_iterator * vmi; /* 0 8 */
struct vm_area_struct * vma; /* 8 8 */
struct vm_area_struct * prev; /* 16 8 */
struct vm_area_struct * next; /* 24 8 */
struct list_head * uf; /* 32 8 */
long unsigned int start; /* 40 8 */
long unsigned int end; /* 48 8 */
long unsigned int unmap_start; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
long unsigned int unmap_end; /* 64 8 */
int vma_count; /* 72 4 */
bool unlock; /* 76 1 */
bool clear_ptes; /* 77 1 */
bool closed_vm_ops; /* 78 1 */
/* XXX 1 byte hole, try to pack */
long unsigned int nr_pages; /* 80 8 */
long unsigned int locked_vm; /* 88 8 */
long unsigned int nr_accounted; /* 96 8 */
long unsigned int exec_vm; /* 104 8 */
long unsigned int stack_vm; /* 112 8 */
long unsigned int data_vm; /* 120 8 */
/* size: 128, cachelines: 2, members: 19 */
/* sum members: 127, holes: 1, sum holes: 1 */
};
Link: https://lkml.kernel.org/r/20240830040101.822209-22-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
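The effect of moving the three booleans can be reproduced in userspace with a reduced stand-in for the struct. This is only a sketch: `demo_before`, `demo_after` and `demo_cachelines()` are hypothetical names, the pointer and counter members are collapsed into arrays, and a 64-byte cacheline is assumed. On a typical LP64 target the two sizes should come out at 136 and 128 bytes, in line with the pahole output above; other ABIs may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "Before" ordering: the booleans trail the counters (hypothetical stand-in). */
struct demo_before {
	void *ptrs[5];             /* vmi, vma, prev, next, uf */
	unsigned long range[4];    /* start, end, unmap_start, unmap_end */
	int vma_count;             /* a 4-byte hole follows on LP64 */
	unsigned long counters[6]; /* nr_pages ... data_vm */
	bool unlock;
	bool clear_ptes;
	bool closed_vm_ops;        /* tail padding follows */
};

/* "After" ordering: the booleans fill most of the hole behind vma_count. */
struct demo_after {
	void *ptrs[5];
	unsigned long range[4];
	int vma_count;
	bool unlock;
	bool clear_ptes;
	bool closed_vm_ops;
	unsigned long counters[6];
};

/* Number of cachelines needed for an object of the given size (64-byte lines assumed). */
static size_t demo_cachelines(size_t bytes)
{
	return (bytes + 63) / 64;
}

/* Reordering members must never make the struct larger. */
static_assert(sizeof(struct demo_after) <= sizeof(struct demo_before),
	      "reordered layout should not grow");
```

Printing `sizeof()` and `demo_cachelines(sizeof())` for both structs from a small `main()` gives a quick check without pahole, although pahole remains the better tool for locating the holes themselves.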
|
||
|
|
20831cd6f8 |
mm/vma: drop incorrect comment from vms_gather_munmap_vmas()
The comment has been outdated since
|
||
|
|
224c1c702c |
mm: move may_expand_vm() check in mmap_region()
The may_expand_vm() check requires the count of the pages within the munmap range. Since this is needed for accounting and obtained later, the reordering of may_expand_vm() to later in the call stack, after the vma munmap struct (vms) is initialised and the gather stage is potentially run, will allow for a single loop over the vmas. The gather stage does not commit any work and so everything can be undone in the case of a failure. The MAP_FIXED page count is available after the vms_gather_munmap_vmas() call, so use it instead of looping over the vmas twice. Link: https://lkml.kernel.org/r/20240830040101.822209-20-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
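The property relied on here, that gathering commits nothing and can therefore be undone, can be illustrated with a toy model. This is a sketch only and not the kernel's implementation: `demo_mm`, `demo_gather()`, `demo_reattach()` and `demo_complete()` are hypothetical names, and a fixed array of page slots stands in for the vma tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SLOTS 8

/* Toy address space: one slot per page, 0 means gap, nonzero is a region id. */
struct demo_mm {
	int region[DEMO_SLOTS];
	bool detached[DEMO_SLOTS]; /* marked by gather, not yet removed */
};

/* Gather: mark mapped pages in [start, end) and return how many were found. */
static size_t demo_gather(struct demo_mm *mm, size_t start, size_t end)
{
	size_t pages = 0;

	for (size_t i = start; i < end && i < DEMO_SLOTS; i++) {
		if (mm->region[i] != 0) {
			mm->detached[i] = true;
			pages++;
		}
	}
	return pages;
}

/* Failure path: drop the marks; the mappings were never touched. */
static void demo_reattach(struct demo_mm *mm)
{
	for (size_t i = 0; i < DEMO_SLOTS; i++)
		mm->detached[i] = false;
}

/* Success path: actually remove what was gathered and return the page count. */
static size_t demo_complete(struct demo_mm *mm)
{
	size_t freed = 0;

	for (size_t i = 0; i < DEMO_SLOTS; i++) {
		if (mm->detached[i]) {
			mm->region[i] = 0;
			mm->detached[i] = false;
			freed++;
		}
	}
	return freed;
}
```

Because `demo_gather()` already returns the page count, a limit check can sit between gather and complete and bail out through `demo_reattach()` without a second walk over the range, which is the shape of the reordering described above.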
||
|
|
63fc66f5b6 |
ipc/shm, mm: drop do_vma_munmap()
The do_vma_munmap() wrapper existed for callers that didn't have a vma iterator and needed to check the vma mseal status prior to calling the underlying munmap(). All callers now use a vma iterator and since the mseal check has been moved to do_vmi_align_munmap() and the vmas are aligned, this function can just be called instead. do_vmi_align_munmap() can no longer be static as ipc/shm is using it and it is exported via the mm.h header. Link: https://lkml.kernel.org/r/20240830040101.822209-19-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
13d77e0133 |
mm/mmap: use vms accounted pages in mmap_region()
Change from nr_pages variable to vms.nr_accounted for the charged pages calculation. This is necessary for a future patch. This also avoids checking security_vm_enough_memory_mm() if the amount of memory won't change. Link: https://lkml.kernel.org/r/20240830040101.822209-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Paul Moore <paul@paul-moore.com> [LSM] Cc: Kees Cook <kees@kernel.org> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
5972d97c44 |
mm/mmap: use PHYS_PFN in mmap_region()
Instead of shifting the length by PAGE_SHIFT, use PHYS_PFN. Also use the existing local variable everywhere instead of some of the time. Link: https://lkml.kernel.org/r/20240830040101.822209-17-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
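For readers unfamiliar with the macro, the conversion from a byte length to a page count looks roughly like this in a userspace sketch. The `DEMO_` names are hypothetical stand-ins and 4 KiB pages are an assumption; in the kernel, `PAGE_SHIFT` is defined per architecture.

```c
#include <assert.h>

/* Assumed 4 KiB pages for this sketch; the real value depends on the architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Same shape as the kernel's PHYS_PFN(): shift right by the page shift. */
#define DEMO_PHYS_PFN(x) ((unsigned long)((x) >> DEMO_PAGE_SHIFT))

/* Convert a byte length into whole pages, rounding down. */
static unsigned long demo_len_to_pages(unsigned long len)
{
	return DEMO_PHYS_PFN(len);
}
```

The named macro states the intent (bytes to pages) at the call site, and computing the value once into a local variable avoids repeating the shift.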
||
|
|
4f87153e82 |
mm: change failure of MAP_FIXED to restoring the gap on failure
Prior to call_mmap(), the vmas that will be replaced need to clear the way for what may happen in the call_mmap(). This clean up work includes clearing the ptes and calling the close() vm_ops. Some users do more setup than can be restored by calling the vm_ops open() function. It is safer to store the gap in the vma tree in these cases. That is to say that the failure scenario that existed before the MAP_FIXED gap exposure is restored as it is safer than trying to undo a partial mapping. Since abort_munmap_vmas() is only reattaching vmas with this change, the function is renamed to reattach_vmas(). There is also a secondary failure that may occur if there is not enough memory to store the gap. In this case, the vmas are reattached and resources freed. If the system cannot complete the call_mmap() and fails to allocate with GFP_KERNEL, then the system will print a warning about the failure. [lorenzo.stoakes@oracle.com: fix off-by-one error in vms_abort_munmap_vmas()] Link: https://lkml.kernel.org/r/52ee7eb3-955c-4ade-b5f0-28fed8ba3d0b@lucifer.local Link: https://lkml.kernel.org/r/20240830040101.822209-16-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
f8d112a4e6 |
mm/mmap: avoid zeroing vma tree in mmap_region()
Instead of zeroing the vma tree and then overwriting the area, let the area be overwritten and then clean up the gathered vmas using vms_complete_munmap_vmas(). To ensure locking is downgraded correctly, the mm is set regardless of MAP_FIXED or not (NULL vma). If a driver is mapping over an existing vma, then clear the ptes before the call_mmap() invocation. This is done using the vms_clean_up_area() helper. If there is a close vm_ops, that must also be called to ensure any cleanup is done before mapping over the area. This also means that calling open has been added to the abort of an unmap operation, for now. Since vm_ops->open() and vm_ops->close() do not always undo each other (state cleanup may exist in ->close() that is lost forever), the code cannot be left in this way, but that change has been isolated to another commit to make this point very obvious for traceability. Temporarily keep track of the number of pages that will be removed and reduce the charged amount. This also drops the validate_mm() call in the vma_expand() function. It is necessary to drop the validate as it would fail since the mm map_count would be incorrect during a vma expansion, prior to the cleanup from vms_complete_munmap_vmas(). Clean up the error handling of the vms_gather_munmap_vmas() by calling the verification within the function. Link: https://lkml.kernel.org/r/20240830040101.822209-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
94f59ea591 |
mm: clean up unmap_region() argument list
With the only caller to unmap_region() being the error path of mmap_region(), the argument list can be significantly reduced. Link: https://lkml.kernel.org/r/20240830040101.822209-14-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
9c3ebeda8f |
mm/vma: track start and end for munmap in vma_munmap_struct
Set the start and end address for munmap when the prev and next are gathered. This is needed to avoid incorrect addresses being used during the vms_complete_munmap_vmas() function if the prev/next vma are expanded. Add a new helper vms_complete_pte_clear(), which is needed later and will avoid growing the argument list to unmap_region() beyond the 9 it already has. Link: https://lkml.kernel.org/r/20240830040101.822209-13-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
d744f4acb8 |
mm/mmap: reposition vma iterator in mmap_region()
Instead of moving (or leaving) the vma iterator pointing at the previous vma, leave it pointing at the insert location. Pointing the vma iterator at the insert location allows for a cleaner walk of the vma tree for MAP_FIXED and the no expansion cases. The vma_prev() call in the case of merging the previous vma is equivalent to vma_iter_prev_range(), since the vma iterator will be pointing to the location just before the previous vma. This change needs to export abort_munmap_vmas() from mm/vma. Link: https://lkml.kernel.org/r/20240830040101.822209-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
58e60f8284 |
mm/vma: support vma == NULL in init_vma_munmap()
Adding support for a NULL vma means the init_vma_munmap() can be initialized for a less error-prone process when calling vms_complete_munmap_vmas() later on. Link: https://lkml.kernel.org/r/20240830040101.822209-11-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
9014b230d8 |
mm/vma: expand mmap_region() munmap call
Open code the do_vmi_align_munmap() call so that it can be broken up later in the series. This requires exposing a few more vma operations. Link: https://lkml.kernel.org/r/20240830040101.822209-10-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
c7c0c3c30f |
mm/vma: inline munmap operation in mmap_region()
mmap_region is already passed sanitized addr and len, so change the call to do_vmi_munmap() to do_vmi_align_munmap() and inline the other checks. The inlining of the function and checks is an intermediate step in the series so future patches are easier to follow. Link: https://lkml.kernel.org/r/20240830040101.822209-9-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
89b2d2a57e |
mm/vma: extract validate_mm() from vma_complete()
vma_complete() will need to be called during an unsafe time to call validate_mm(). Extract the call in all places now so that only one location can be modified in the next change. Link: https://lkml.kernel.org/r/20240830040101.822209-8-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
17f1ae9b40 |
mm/vma: change munmap to use vma_munmap_struct() for accounting and surrounding vmas
Clean up the code by changing the munmap operation to use a structure for the accounting and munmap variables. Since remove_mt() is only called in one location and its contents will be reduced to almost nothing, the remains of the function can be added to vms_complete_munmap_vmas(). Link: https://lkml.kernel.org/r/20240830040101.822209-7-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
dba1484090 |
mm/vma: introduce vma_munmap_struct for use in munmap operations
Use a structure to pass along all the necessary information and counters involved in removing vmas from the mm_struct. Update vmi_ function names to vms_ to indicate the first argument type change. Link: https://lkml.kernel.org/r/20240830040101.822209-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
6898c9039b |
mm/vma: extract the gathering of vmas from do_vmi_align_munmap()
Create vmi_gather_munmap_vmas() to handle the gathering of vmas into a detached maple tree for removal later. Part of the gathering is the splitting of vmas that span the boundary. Link: https://lkml.kernel.org/r/20240830040101.822209-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
01cf21e9e1 |
mm/vma: introduce vmi_complete_munmap_vmas()
Extract all necessary operations that need to be completed after the vma maple tree is updated from a munmap() operation. Extracting this makes the later patch in the series easier to understand. Link: https://lkml.kernel.org/r/20240830040101.822209-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |