The suspend/resume mechanism in Android and Linux

1. System suspend/resume mechanism

1.1 Introduction to system suspend/resume

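System suspend/resume serves two goals. When the user does not need the system, it puts the system into the lowest-power state it can reach: external devices, on-chip IP blocks, and clocks are switched to low-power states or powered off, which cuts power consumption and extends battery life. When the user needs the system again, it quickly restores power, clocks, on-chip IP blocks, and external devices, so the user experience is not affected. Compared with other power-management features, system suspend/resume touches far more of the system. The suspend and resume paths involve the PM core framework, the device PM framework, user processes, kernel threads and workers, the individual device drivers, power domains, CPU management, process freeze/thaw, wakeup handling, device suspend/resume, syscore suspend/resume, and DDR self-refresh.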

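The Linux kernel provides several sleep states: freeze (suspend-to-idle), standby, STR (suspend to RAM), and STD (suspend to disk). They are exposed to user space through the /sys/power/state node: writing freeze, standby, mem, or disk to that node puts the system into the corresponding state, so user space can start a suspend whenever the system should sleep. Wakeup sources are configured before suspending. Once the system is asleep, these wakeup sources (for example a key press, the RTC, the screen, or USB plug/unplug) can wake (resume) it when needed. After the system has gone to sleep, the user can therefore choose when and how to wake it quickly, which balances low power consumption against performance.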
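A minimal sketch, in C, of how a user-space program might request one of the sleep states described above. The helper names is_known_sleep_label and request_sleep_state are hypothetical and not part of any kernel or Android API. The node path is a parameter so the code can be pointed at an ordinary file instead of /sys/power/state while experimenting.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sleep-state labels accepted by /sys/power/state, as listed in the text. */
static const char *const kSleepLabels[] = { "freeze", "standby", "mem", "disk" };

/* Returns 1 if label is one of the four labels above, otherwise 0. */
int is_known_sleep_label(const char *label)
{
    assert(label != NULL);
    for (size_t i = 0; i < sizeof(kSleepLabels) / sizeof(kSleepLabels[0]); i++) {
        if (strcmp(label, kSleepLabels[i]) == 0)
            return 1;
    }
    return 0;
}

/*
 * Writes label to the state node at path (normally /sys/power/state).
 * Returns 0 on success or a negative errno value. On a real system the
 * write() call does not return until the system has resumed.
 */
int request_sleep_state(const char *path, const char *label)
{
    assert(path != NULL);
    if (!is_known_sleep_label(label))
        return -EINVAL;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(label);
    ssize_t written = write(fd, label, len);
    /* Capture the error before close() can overwrite errno. */
    int err = (written == (ssize_t)len) ? 0 : (written < 0 ? -errno : -EIO);
    close(fd);
    return err;
}
```

Calling request_sleep_state("/sys/power/state", "mem") as root would be the programmatic equivalent of echo mem > /sys/power/state. Whether a given label is actually supported depends on the platform.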

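1.2 System suspend/resume framework

The system suspend/resume framework consists of three parts: Services, PM core, and PM driver.

Services (user space): made up of three kinds of service: the system suspend HAL service, the power manager service, and ordinary app services. The power manager service provides create/request/release management for wakelocks. When no service holds a wakelock, the power manager service calls the system suspend HAL service, which writes mem to the /sys/power/state node to start the kernel suspend.

PM core: implements the core power-management logic and gives the upper-layer services the interfaces for driving suspend and resume. Building on lower-level facilities, it implements the functions and frameworks used during suspend/resume: CPU hotplug, wakeup source enable/disable, device suspend/resume, freeze/thaw, and switching power domains off and on.

PM driver: implements the suspend/resume callbacks of the device drivers and the low-power operations of the architecture-level drivers (gpio, irq, timer, and so on).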
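The wakelock rule from the Services layer (suspend is only started when nobody holds a wakelock) can be illustrated with a plain counter. This is a sketch under stated assumptions: struct suspend_gate and its functions are hypothetical names, the code is single-threaded, and the real power manager service and suspend HAL track named locks under proper locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gate: counts how many wakelocks are currently held. */
struct suspend_gate {
    unsigned int held;
};

/* A client takes a wakelock: suspend is blocked from now on. */
void gate_acquire(struct suspend_gate *gate)
{
    assert(gate != NULL);
    gate->held++;
}

/* A client drops a wakelock. Returns -1 on an unbalanced release, else 0. */
int gate_release(struct suspend_gate *gate)
{
    assert(gate != NULL);
    if (gate->held == 0)
        return -1;
    gate->held--;
    return 0;
}

/* Returns 1 when no wakelock is held, i.e. a suspend may be started. */
int gate_may_suspend(const struct suspend_gate *gate)
{
    assert(gate != NULL);
    return gate->held == 0;
}
```

Only when gate_may_suspend() returns 1 would a service go on to write mem to /sys/power/state.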

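1.3 System suspend/resume flow

In the flow diagram, the left side is the suspend flow and the right side is the resume flow.

For a detailed walkthrough of the suspend code, see 2.3; for the resume code, see 3.1.

2. System suspend mechanism

The suspend-to-idle (freeze), standby, and suspend-to-RAM modes are all commonly referred to as system suspend. User space triggers system suspend by writing to the /sys/power/state node.

2.1 System suspend framework

For the detailed code, see 2.3.1, 2.3.2, 2.3.3, and 2.3.4.

2.2 System suspend code flow

2.3 Core code logic of system suspend

2.3.1 Core code logic of the PowerManagerService module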

private static native void nativeSetAutoSuspend(boolean enable);
......
public void onDisplayStateChange(boolean allInactive, boolean allOff) {
    // This method is only needed to support legacy display blanking behavior
    // where the display's power state is coupled to suspend or to the power HAL.
    // The order of operations matters here.
    synchronized (mLock) {
        setPowerModeInternal(MODE_DISPLAY_INACTIVE, allInactive);
        // All displays are off
        if (allOff) {
            if (!mDecoupleHalInteractiveModeFromDisplayConfig) {
                setHalInteractiveModeLocked(false);
            }
            if (!mDecoupleHalAutoSuspendModeFromDisplayConfig) {
                // Enable HAL auto-suspend. The sleep state itself ("mem", i.e. suspend to RAM)
                // is chosen later by the system suspend service.
                setHalAutoSuspendModeLocked(true);
            }
        } else {
            if (!mDecoupleHalAutoSuspendModeFromDisplayConfig) {
                setHalAutoSuspendModeLocked(false);
            }
            if (!mDecoupleHalInteractiveModeFromDisplayConfig) {
                setHalInteractiveModeLocked(true);
            }
        }
    }
}
// Enable or disable HAL auto-suspend; enabling it starts the suspend flow
private void setHalAutoSuspendModeLocked(boolean enable) {
    if (enable != mHalAutoSuspendModeEnabled) {
        if (DEBUG) {
            Slog.d(TAG, "Setting HAL auto-suspend mode to " + enable);
        }
        mHalAutoSuspendModeEnabled = enable;
        Trace.traceBegin(Trace.TRACE_TAG_POWER, "setHalAutoSuspend(" + enable + ")");
        try {
            // Call the native-layer suspend interface through JNI
            mNativeWrapper.nativeSetAutoSuspend(enable);
        } finally {
            Trace.traceEnd(Trace.TRACE_TAG_POWER);
        }
    }
}

2.3.2 PMS JNI code logic

static void nativeSetAutoSuspend(JNIEnv* /* env */, jclass /* clazz */, jboolean enable) {
    // enable is true when all displays are off
    if (enable) {
        android::base::Timer t;
        enableAutoSuspend();
        if (t.duration() > 100ms) {
            ALOGD("Excessive delay in autosuspend_enable() while turning screen off");
        }
    } else {
        android::base::Timer t;
        disableAutoSuspend();
        if (t.duration() > 100ms) {
            ALOGD("Excessive delay in autosuspend_disable() while turning screen on");
        }
    }
}
void enableAutoSuspend() {
    static bool enabled = false;
    if (!enabled) {
        // Get the SuspendControlServiceInternal service of the system suspend process over binder
        static sp<IBinder> autosuspendClientToken = new BBinder();
        sp<system::suspend::internal::ISuspendControlServiceInternal> suspendControl =
                getSuspendControlInternal();
        // Call the binder API
        suspendControl->enableAutosuspend(autosuspendClientToken, &enabled);
    }
    {
        std::lock_guard<std::mutex> lock(gSuspendMutex);
        if (gSuspendBlocker) {
            gSuspendBlocker->release();
            gSuspendBlocker = nullptr;
        }
    }
}
// Get the SuspendControlServiceInternal service
sp<system::suspend::internal::ISuspendControlServiceInternal> getSuspendControlInternal() {
    static std::once_flag suspendControlFlag;
    std::call_once(suspendControlFlag, []() {
        gSuspendControlInternal =
                waitForService<system::suspend::internal::ISuspendControlServiceInternal>(
                        String16("suspend_control_internal"));
        LOG_ALWAYS_FATAL_IF(gSuspendControlInternal == nullptr);
    });
    return gSuspendControlInternal;
}

2.3.3 system suspend

// ISuspendControlServiceInternal.aidl
package android.system.suspend.internal;
import android.system.suspend.internal.SuspendInfo;
import android.system.suspend.internal.WakeLockInfo;
import android.system.suspend.internal.WakeupInfo;
/**
 * Interface exposed by the suspend hal that allows framework to toggle the suspend loop and
 * monitor native wakelocks.
 * @hide
 */
interface ISuspendControlServiceInternal {
    /**
     * Starts automatic system suspension.
     *
     * @param token token registering automatic system suspension.
     * When all registered tokens die automatic system suspension is disabled.
     * @return true on success, false otherwise.
     */
    boolean enableAutosuspend(IBinder token);
    /**
     * Suspends the system even if there are wakelocks being held.
     */
    boolean forceSuspend();
    /**
     * Returns a list of wake lock stats.
     */
    WakeLockInfo[] getWakeLockStats();
    /**
     * Returns a list of wakeup stats.
     */
    WakeupInfo[] getWakeupStats();
    /**
     * Returns stats related to suspend.
     */
    SuspendInfo getSuspendStats();
}
-----------------------------------------------------------------------------
// ISuspendControlService.aidl
package android.system.suspend;
import android.system.suspend.IWakelockCallback;
import android.system.suspend.ISuspendCallback;
/**
 * Interface exposed by the suspend hal that allows framework to toggle the suspend loop and
 * monitor native wakelocks.
 * @hide
 */
interface ISuspendControlService {
    /**
     * Registers a callback for suspend events.  ISuspendControlService must keep track of all
     * registered callbacks unless the client process that registered the callback dies.
     *
     * @param callback the callback to register.
     * @return true on success, false otherwise.
     */
    boolean registerCallback(ISuspendCallback callback);
    /**
     * Registers a callback for a wakelock specified by its name.
     *
     * @param callback the callback to register.
     * @param name the name of the wakelock.
     * @return true on success, false otherwise.
     */
    boolean registerWakelockCallback(IWakelockCallback callback, @utf8InCpp String name);
}
// SuspendControlService.h
......
using ::android::system::suspend::BnSuspendControlService;
using ::android::system::suspend::ISuspendCallback;
using ::android::system::suspend::IWakelockCallback;
using ::android::system::suspend::internal::BnSuspendControlServiceInternal;
using ::android::system::suspend::internal::SuspendInfo;
using ::android::system::suspend::internal::WakeLockInfo;
using ::android::system::suspend::internal::WakeupInfo;
namespace android {
namespace system {
namespace suspend {
namespace V1_0 {
class SystemSuspend;
class SuspendControlService : public BnSuspendControlService,
                              public virtual IBinder::DeathRecipient {
   public:
    SuspendControlService() = default;
    ~SuspendControlService() override = default;
    binder::Status registerCallback(const sp<ISuspendCallback>& callback,
                                    bool* _aidl_return) override;
    binder::Status registerWakelockCallback(const sp<IWakelockCallback>& callback,
                                            const std::string& name, bool* _aidl_return) override;
    void binderDied(const wp<IBinder>& who) override;
    void notifyWakelock(const std::string& name, bool isAcquired);
    void notifyWakeup(bool success, std::vector<std::string>& wakeupReasons);
   private:
    std::map<std::string, std::vector<sp<IWakelockCallback>>> mWakelockCallbacks;
    std::mutex mCallbackLock;
    std::mutex mWakelockCallbackLock;
    std::vector<sp<ISuspendCallback>> mCallbacks;
    const std::vector<sp<ISuspendCallback>>::iterator findCb(const wp<IBinder>& cb) {
        return std::find_if(
            mCallbacks.begin(), mCallbacks.end(),
            [&cb](const sp<ISuspendCallback>& i) { return cb == IInterface::asBinder(i); });
    }
};
class SuspendControlServiceInternal : public BnSuspendControlServiceInternal {
   public:
    SuspendControlServiceInternal() = default;
    ~SuspendControlServiceInternal() override = default;
    binder::Status enableAutosuspend(const sp<IBinder>& token, bool* _aidl_return) override;
    binder::Status forceSuspend(bool* _aidl_return) override;
    binder::Status getSuspendStats(SuspendInfo* _aidl_return) override;
    binder::Status getWakeLockStats(std::vector<WakeLockInfo>* _aidl_return) override;
    binder::Status getWakeupStats(std::vector<WakeupInfo>* _aidl_return) override;
    void setSuspendService(const wp<SystemSuspend>& suspend);
    status_t dump(int fd, const Vector<String16>& args) override;
   private:
    wp<SystemSuspend> mSuspend;
};
}  // namespace V1_0
}  // namespace suspend
}  // namespace system
}  // namespace android
// SuspendControlService.cpp
binder::Status SuspendControlServiceInternal::enableAutosuspend(const sp<IBinder>& token,
                                                                bool* _aidl_return) {
    const auto suspendService = mSuspend.promote();
    return retOk(suspendService != nullptr && suspendService->enableAutosuspend(token),
                 _aidl_return);
}
// SystemSuspend.h
......
#include "SuspendControlService.h"
#include "WakeLockEntryList.h"
#include "WakeupList.h"
namespace android {
namespace system {
namespace suspend {
namespace V1_0 {
using ::android::base::Result;
using ::android::base::unique_fd;
using ::android::system::suspend::internal::SuspendInfo;
using namespace std::chrono_literals;
class SystemSuspend;
struct SuspendStats {
    int success = 0;
    int fail = 0;
    int failedFreeze = 0;
    int failedPrepare = 0;
    int failedSuspend = 0;
    int failedSuspendLate = 0;
    int failedSuspendNoirq = 0;
    int failedResume = 0;
    int failedResumeEarly = 0;
    int failedResumeNoirq = 0;
    std::string lastFailedDev;
    int lastFailedErrno = 0;
    std::string lastFailedStep;
};
struct SleepTimeConfig {
    std::chrono::milliseconds baseSleepTime;
    std::chrono::milliseconds maxSleepTime;
    double sleepTimeScaleFactor;
    uint32_t backoffThreshold;
    std::chrono::milliseconds shortSuspendThreshold;
    bool failedSuspendBackoffEnabled;
    bool shortSuspendBackoffEnabled;
};
std::string readFd(int fd);
class SystemSuspend : public RefBase {
   public:
    SystemSuspend(unique_fd wakeupCountFd, unique_fd stateFd, unique_fd suspendStatsFd,
                  size_t maxStatsEntries, unique_fd kernelWakelockStatsFd,
                  unique_fd wakeupReasonsFd, unique_fd suspendTimeFd,
                  const SleepTimeConfig& sleepTimeConfig,
                  const sp<SuspendControlService>& controlService,
                  const sp<SuspendControlServiceInternal>& controlServiceInternal,
                  bool useSuspendCounter = true);
    void incSuspendCounter(const std::string& name);
    void decSuspendCounter(const std::string& name);
    bool enableAutosuspend(const sp<IBinder>& token);
    void disableAutosuspend();
    bool forceSuspend();
    const WakeupList& getWakeupList() const;
    const WakeLockEntryList& getStatsList() const;
    void updateWakeLockStatOnAcquire(const std::string& name, int pid);
    void updateWakeLockStatOnRelease(const std::string& name, int pid);
    void updateStatsNow();
    Result<SuspendStats> getSuspendStats();
    void getSuspendInfo(SuspendInfo* info);
    std::chrono::milliseconds getSleepTime() const;
    unique_fd reopenFileUsingFd(const int fd, int permission);
   private:
    ~SystemSuspend(void) override;
    std::mutex mAutosuspendClientTokensLock;
    std::mutex mAutosuspendLock ACQUIRED_AFTER(mAutosuspendClientTokensLock);
    std::mutex mSuspendInfoLock;
    void initAutosuspendLocked()
        EXCLUSIVE_LOCKS_REQUIRED(mAutosuspendClientTokensLock, mAutosuspendLock);
    void disableAutosuspendLocked()
        EXCLUSIVE_LOCKS_REQUIRED(mAutosuspendClientTokensLock, mAutosuspendLock);
    void checkAutosuspendClientsLivenessLocked()
        EXCLUSIVE_LOCKS_REQUIRED(mAutosuspendClientTokensLock);
    bool hasAliveAutosuspendTokenLocked() EXCLUSIVE_LOCKS_REQUIRED(mAutosuspendClientTokensLock);
    std::condition_variable mAutosuspendCondVar GUARDED_BY(mAutosuspendLock);
    uint32_t mSuspendCounter GUARDED_BY(mAutosuspendLock);
    std::vector<sp<IBinder>> mAutosuspendClientTokens GUARDED_BY(mAutosuspendClientTokensLock);
    std::atomic<bool> mAutosuspendEnabled GUARDED_BY(mAutosuspendLock){false};
    std::atomic<bool> mAutosuspendThreadCreated GUARDED_BY(mAutosuspendLock){false};
    unique_fd mWakeupCountFd;
    unique_fd mStateFd;
    unique_fd mSuspendStatsFd;
    unique_fd mSuspendTimeFd;
    SuspendInfo mSuspendInfo GUARDED_BY(mSuspendInfoLock);
    const SleepTimeConfig kSleepTimeConfig;
    // Amount of thread sleep time between consecutive iterations of the suspend loop
    std::chrono::milliseconds mSleepTime;
    int32_t mNumConsecutiveBadSuspends GUARDED_BY(mSuspendInfoLock);
    // Updates thread sleep time and suspend stats depending on the result of suspend attempt
    void updateSleepTime(bool success, const struct SuspendTime& suspendTime);
    sp<SuspendControlService> mControlService;
    sp<SuspendControlServiceInternal> mControlServiceInternal;
    WakeLockEntryList mStatsList;
    WakeupList mWakeupList;
    // If true, use mSuspendCounter to keep track of native wake locks. Otherwise, rely on
    // /sys/power/wake_lock interface to block suspend.
    // TODO(b/128923994): remove dependency on /sys/power/wake_lock interface.
    bool mUseSuspendCounter;
    unique_fd mWakeLockFd;
    unique_fd mWakeUnlockFd;
    unique_fd mWakeupReasonsFd;
};
}  // namespace V1_0
}  // namespace suspend
}  // namespace system
}  // namespace android
------------------------------------------------------------------------
// SystemSuspend.cpp
static const char kSleepState[] = "mem";
bool SystemSuspend::enableAutosuspend(const sp<IBinder>& token) {
    auto tokensLock = std::lock_guard(mAutosuspendClientTokensLock);
    auto autosuspendLock = std::lock_guard(mAutosuspendLock);
    // Disable zygote kernel wakelock, since explicitly attempting to
    // enable autosuspend. This should be done even if autosuspend is
    // already enabled, since it could be the case that the framework
    // is restarting and connecting to the existing suspend service.
    if (!WriteStringToFd(kZygoteKernelWakelock, mWakeUnlockFd)) {
        PLOG(ERROR) << "error writing " << kZygoteKernelWakelock << " to " << kSysPowerWakeUnlock;
    }
    bool hasToken = std::find(mAutosuspendClientTokens.begin(), mAutosuspendClientTokens.end(),
                              token) != mAutosuspendClientTokens.end();
    if (!hasToken) {
        mAutosuspendClientTokens.push_back(token);
    }
    if (mAutosuspendEnabled) {
        LOG(ERROR) << "Autosuspend already started.";
        return false;
    }
    mAutosuspendEnabled = true;
    initAutosuspendLocked();
    return true;
}
// Create the autosuspend thread, which attempts a suspend roughly every 200 ms (mSleepTime)
void SystemSuspend::initAutosuspendLocked() {
    if (mAutosuspendThreadCreated) {
        LOG(INFO) << "Autosuspend thread already started.";
        return;
    }
    std::thread autosuspendThread([this] {
        auto autosuspendLock = std::unique_lock(mAutosuspendLock);
        bool shouldSleep = true;
        while (true) {
            {
                base::ScopedLockAssertion autosuspendLocked(mAutosuspendLock);
                if (!mAutosuspendEnabled) {
                    mAutosuspendThreadCreated = false;
                    return;
                }
                // If we got here by a failed write to /sys/power/wakeup_count; don't sleep
                // since we didn't attempt to suspend on the last cycle of this loop.
                if (shouldSleep) {
                    mAutosuspendCondVar.wait_for(
                        autosuspendLock, mSleepTime,
                        [this]() REQUIRES(mAutosuspendLock) { return !mAutosuspendEnabled; });
                }
                if (!mAutosuspendEnabled) continue;
                autosuspendLock.unlock();
            }
            // Read the current count from /sys/power/wakeup_count
            lseek(mWakeupCountFd, 0, SEEK_SET);
            string wakeupCount = readFd(mWakeupCountFd);
            {
                autosuspendLock.lock();
                base::ScopedLockAssertion autosuspendLocked(mAutosuspendLock);
                // An empty value means the read failed, so this suspend attempt is abandoned
                if (wakeupCount.empty()) {
                    PLOG(ERROR) << "error reading from /sys/power/wakeup_count";
                    continue;
                }
                shouldSleep = false;
                mAutosuspendCondVar.wait(autosuspendLock, [this]() REQUIRES(mAutosuspendLock) {
                    return mSuspendCounter == 0 || !mAutosuspendEnabled;
                });
                if (!mAutosuspendEnabled) continue;
                autosuspendLock.unlock();
            }
            bool success;
            {
                auto tokensLock = std::lock_guard(mAutosuspendClientTokensLock);
                // TODO: Clean up client tokens after soaking the new approach
                // checkAutosuspendClientsLivenessLocked();
                autosuspendLock.lock();
                base::ScopedLockAssertion autosuspendLocked(mAutosuspendLock);
                if (!hasAliveAutosuspendTokenLocked()) {
                    disableAutosuspendLocked();
                    continue;
                }
                // Check suspend counter hasn't increased while checking client liveness
                if (mSuspendCounter > 0) {
                    continue;
                }
                // The mutex is locked and *MUST* remain locked until we write to /sys/power/state.
                // Otherwise, a WakeLock might be acquired after we check mSuspendCounter and before
                // we write to /sys/power/state.
                if (!WriteStringToFd(wakeupCount, mWakeupCountFd)) {
                    PLOG(VERBOSE) << "error writing to /sys/power/wakeup_count";
                    continue;
                }
                // Write the string "mem" to /sys/power/state to request suspend to RAM;
                // this invokes the kernel's state_store function
                success = WriteStringToFd(kSleepState, mStateFd);
                shouldSleep = true;
                autosuspendLock.unlock();
            }
            if (!success) {
                PLOG(VERBOSE) << "error writing to /sys/power/state";
            }
            struct SuspendTime suspendTime = readSuspendTime(mSuspendTimeFd);
            updateSleepTime(success, suspendTime);
            std::vector<std::string> wakeupReasons = readWakeupReasons(mWakeupReasonsFd);
            if (wakeupReasons == std::vector<std::string>({kUnknownWakeup})) {
                LOG(INFO) << "Unknown/empty wakeup reason. Re-opening wakeup_reason file.";
                mWakeupReasonsFd =
                    std::move(reopenFileUsingFd(mWakeupReasonsFd.get(), O_CLOEXEC | O_RDONLY));
            }
            mWakeupList.update(wakeupReasons);
            mControlService->notifyWakeup(success, wakeupReasons);
            // Take the lock before returning to the start of the loop
            autosuspendLock.lock();
        }
    });
    autosuspendThread.detach();
    mAutosuspendThreadCreated = true;
    LOG(INFO) << "automatic system suspend enabled";
}
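The loop above hinges on a three-step handshake with the kernel: read /sys/power/wakeup_count, write the same value back, and only then write the sleep state. The following is a rough single-shot sketch of that ordering in C, not the service's actual code. The function names and the pending_wakelocks parameter (standing in for mSuspendCounter) are hypothetical, and the node paths are parameters so the sketch can be run against ordinary files.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Reads the node at path into buf (NUL-terminated). Returns byte count or -errno. */
static ssize_t read_node(const char *path, char *buf, size_t cap)
{
    assert(cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = read(fd, buf, cap - 1);
    int saved = errno;
    close(fd);
    if (n < 0)
        return -saved;
    buf[n] = '\0';
    return n;
}

/* Writes data to the node at path. Returns 0 or a negative errno value. */
static int write_node(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;
    size_t len = strlen(data);
    ssize_t n = write(fd, data, len);
    int err = (n == (ssize_t)len) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return err;
}

/* One suspend attempt; returns 0 if "mem" was written, else a negative errno. */
int try_suspend_once(const char *wakeup_count_path, const char *state_path,
                     unsigned int pending_wakelocks)
{
    char count[32];

    /* Step 1: snapshot the wakeup event count. */
    ssize_t n = read_node(wakeup_count_path, count, sizeof(count));
    if (n < 0)
        return (int)n;
    if (n == 0)
        return -ENODATA;

    /* Step 2: give up while any wakelock is still held. */
    if (pending_wakelocks > 0)
        return -EBUSY;

    /* Step 3: write the snapshot back; a failure means the attempt is stale. */
    int err = write_node(wakeup_count_path, count);
    if (err)
        return err;

    /* Step 4: request suspend to RAM. */
    return write_node(state_path, "mem");
}
```

On a real kernel the write-back in step 3 fails if a wakeup event was registered after the read in step 1, which lets the caller abandon the attempt before touching /sys/power/state.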

2.3.4 kernel suspend

// main.c
// state_store is the sysfs store callback for /sys/power/state and the entry point into suspend.
// It receives the "mem" string written by user space and prepares to enter suspend to RAM.
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
                           const char *buf, size_t n)
{
        suspend_state_t state;
        int error;

        pr_info("%s:Userspace to kernel, process start write command \"echo mem > /sys/power/state\". The process is \"%s\"(pid %i). The thread is \"%s\"(tid %i).\n",
                STR_KERNEL_LOG_ENTER, current->group_leader->comm, current->group_leader->pid, current->comm, current->pid);

        error = pm_autosleep_lock();
        if (error)
                return error;

        if (pm_autosleep_state() > PM_SUSPEND_ON) {
                error = -EBUSY;
                goto out;
        }
        // Decode the "mem" string into a suspend_state_t value
        state = decode_state(buf, n);
        // Check that the mode passed from user space is within the valid range
        if (state < PM_SUSPEND_MAX) {
                if (state == PM_SUSPEND_MEM)
                        state = mem_sleep_current;
                // Call pm_suspend() in suspend.c to enter the suspend phase.
                // A return value of 0 means the suspend succeeded, anything else means it failed;
                // this is often useful when debugging suspend.
                error = pm_suspend(state);
                if (!error)
                        pr_info("%s:(( end )), kernel resume end.\n", STR_KERNEL_LOG_EXIT);
        } else if (state == PM_SUSPEND_MAX) {
                // PM_SUSPEND_MAX stands for suspend to disk: hibernation, which follows a separate path
                error = hibernate();
        } else {
                error = -EINVAL;
        }
 out:
        pm_autosleep_unlock();
        return error ? error : n;
}
// Decodes the sleep-mode string into a suspend_state_t and returns it.
// For suspend to disk ("disk") it returns PM_SUSPEND_MAX.
static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
        suspend_state_t state;
#endif
        char *p;
        int len;

        p = memchr(buf, '\n', n);
        len = p ? p - buf : n;
        /* Check hibernation first. */
        // "disk" returns PM_SUSPEND_MAX, meaning the system enters suspend to disk
        if (len == 4 && str_has_prefix(buf, "disk"))
                return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
        for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++) {
                const char *label = pm_states[state];
                if (label && len == strlen(label) && !strncmp(buf, label, len))
                        return state;
        }
#endif
        return PM_SUSPEND_ON;
}
// suspend.h
#define PM_SUSPEND_ON           ((__force suspend_state_t) 0)
#define PM_SUSPEND_TO_IDLE      ((__force suspend_state_t) 1)
#define PM_SUSPEND_STANDBY      ((__force suspend_state_t) 2)
#define PM_SUSPEND_MEM          ((__force suspend_state_t) 3)
#define PM_SUSPEND_MIN          PM_SUSPEND_TO_IDLE
#define PM_SUSPEND_MAX          ((__force suspend_state_t) 4)
enum suspend_stat_step {
        SUSPEND_FREEZE = 1,
        SUSPEND_PREPARE,
        SUSPEND_SUSPEND,
        SUSPEND_SUSPEND_LATE,
        SUSPEND_SUSPEND_NOIRQ,
        SUSPEND_RESUME_NOIRQ,
        SUSPEND_RESUME_EARLY,
        SUSPEND_RESUME
};
struct suspend_stats {
        int     success;
        int     fail;
        int     failed_freeze;
        int     failed_prepare;
        int     failed_suspend;
        int     failed_suspend_late;
        int     failed_suspend_noirq;
        int     failed_resume;
        int     failed_resume_early;
        int     failed_resume_noirq;
#define REC_FAILED_NUM  2
        int     last_failed_dev;
        char    failed_devs[REC_FAILED_NUM][40];
        int     last_failed_errno;
        int     errno[REC_FAILED_NUM];
        int     last_failed_step;
        enum suspend_stat_step  failed_steps[REC_FAILED_NUM];
};
#define PM_SUSPEND_FLAG_FW_SUSPEND      BIT(0)
#define PM_SUSPEND_FLAG_FW_RESUME       BIT(1)
#define PM_SUSPEND_FLAG_NO_PLATFORM     BIT(2)
/* Suspend-to-idle state machine. */
enum s2idle_states {
        S2IDLE_STATE_NONE,      /* Not suspended/suspending. */
        S2IDLE_STATE_ENTER,     /* Enter suspend-to-idle. */
        S2IDLE_STATE_WAKE,      /* Wake up from suspend-to-idle. */
};
/* Hibernation and suspend events */
#define PM_HIBERNATION_PREPARE  0x0001 /* Going to hibernate */
#define PM_POST_HIBERNATION     0x0002 /* Hibernation finished */
#define PM_SUSPEND_PREPARE      0x0003 /* Going to suspend the system */
#define PM_POST_SUSPEND         0x0004 /* Suspend finished */
#define PM_RESTORE_PREPARE      0x0005 /* Going to restore a saved image */
#define PM_POST_RESTORE         0x0006 /* Restore failed */
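To make the decoding step concrete, here is a small user-space imitation of decode_state() using the PM_SUSPEND_* values listed above. It is only a sketch: the enum and function names are made up for illustration, and it assumes all three labels (freeze, standby, mem) are available, whereas the kernel's pm_states[] entry for a state the platform does not support may be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the PM_SUSPEND_* values from suspend.h. */
enum sleep_state {
    STATE_ON = 0,
    STATE_TO_IDLE = 1,
    STATE_STANDBY = 2,
    STATE_MEM = 3,
    STATE_MAX = 4   /* also used to signal "disk" (hibernation) */
};

/* Assumed label table; index 0 (STATE_ON) has no label. */
static const char *const kStateLabels[STATE_MAX] = { NULL, "freeze", "standby", "mem" };

/* Decodes the first n bytes of buf, ignoring everything from the first newline. */
enum sleep_state decode_label(const char *buf, size_t n)
{
    assert(buf != NULL);
    const char *newline = memchr(buf, '\n', n);
    size_t len = newline ? (size_t)(newline - buf) : n;

    /* Hibernation is checked first, as in the kernel function. */
    if (len == 4 && strncmp(buf, "disk", 4) == 0)
        return STATE_MAX;

    for (int s = STATE_TO_IDLE; s < STATE_MAX; s++) {
        if (len == strlen(kStateLabels[s]) && strncmp(buf, kStateLabels[s], len) == 0)
            return (enum sleep_state)s;
    }
    /* Unknown input decodes to STATE_ON, which the caller rejects. */
    return STATE_ON;
}
```

The trailing newline handling is what allows echo mem > /sys/power/state, which appends a newline, to decode the same way as a bare "mem".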
----------------------------------------------------------------------------
// suspend.c
/**
 * pm_suspend - Externally visible function for suspending the system.
 * @state: System sleep state to enter.
 *
 * Check if the value of @state represents one of the supported states,
 * execute enter_state() and update system suspend statistics.
 */
 // Puts the system into the requested suspend mode
 // and updates the system suspend statistics
int pm_suspend(suspend_state_t state)
{
        int error;

        if (state <= PM_SUSPEND_ON || state >= PM_SUSPEND_MAX)
                return -EINVAL;
        pr_info("suspend entry (%s)\n", mem_sleep_labels[state]);
        // Enter suspend
        error = enter_state(state);
        // Count successful and failed suspends
        if (error) {
                suspend_stats.fail++;
                dpm_save_failed_errno(error);
        } else {
                suspend_stats.success++;
        }
        pr_info("suspend exit\n");
        return error;
}
/**
 * enter_state - Do common work needed to enter system sleep state.
 * @state: System sleep state to enter.
 *
 * Make sure that no one else is trying to put the system into a sleep state.
 * Fail if that's not the case.  Otherwise, prepare for system suspend, make the
 * system enter the given sleep state and clean up after wakeup.
 */
static int enter_state(suspend_state_t state)
{
        int error;

        trace_suspend_resume(TPS("suspend_enter"), state, true);
        if (state == PM_SUSPEND_TO_IDLE) {
#ifdef CONFIG_PM_DEBUG
                if (pm_test_level != TEST_NONE && pm_test_level <= TEST_CPUS) {
                        pr_warn("Unsupported test mode for suspend to idle, please choose none/freezer/devices/platform.\n");
                        return -EAGAIN;
                }
#endif
        } else if (!valid_state(state)) {
                return -EINVAL;
        }
        if (!mutex_trylock(&system_transition_mutex))
                return -EBUSY;
        // For suspend to idle, s2idle_begin() is called first; it sets s2idle_state = S2IDLE_STATE_NONE
        if (state == PM_SUSPEND_TO_IDLE)
                s2idle_begin();
        if (sync_on_suspend_enabled) {
                trace_suspend_resume(TPS("sync_filesystems"), 0, true);
                ksys_sync_helper();
                trace_suspend_resume(TPS("sync_filesystems"), 0, false);
        }
        pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
        pm_suspend_clear_flags();
        // 1. Enter the first phase of suspend
        error = suspend_prepare(state);
        if (error)
                goto Unlock;
        if (suspend_test(TEST_FREEZER))
                goto Finish;
        trace_suspend_resume(TPS("suspend_enter"), state, false);
        pm_pr_dbg("Suspending system (%s)\n", mem_sleep_labels[state]);
        pm_restrict_gfp_mask();
        pr_info("%s:Suspend_devices.\n", STR_KERNEL_LOG_ENTER);
        // 2. Enter the second phase of suspend
        error = suspend_devices_and_enter(state);
        pm_restore_gfp_mask();
 Finish:
        events_check_enabled = false;
        pm_pr_dbg("Finishing wakeup.\n");
        suspend_finish();
 Unlock:
        mutex_unlock(&system_transition_mutex);
        return error;
}

2.3.4.1 Phase 1 of the kernel suspend flow: suspend_prepare

/**
 * suspend_prepare - Prepare for entering system sleep state.
 *
 * Common code run for every system sleep state that can be entered (except for
 * hibernation).  Run suspend notifiers, allocate the "suspend" console and
 * freeze processes.
 */
 // The suspend-to-idle, standby, and mem modes all take this path; suspend to disk goes through hibernate() instead
 // 1. First phase of the suspend flow
static int suspend_prepare(suspend_state_t state)
{
        int error;

        if (!sleep_state_supported(state))
                return -EPERM;
        // 1.1 Switch to the suspend console and redirect kernel messages to it
        pm_prepare_console();
        // 1.2 Before suspending, call the blocking notifier chain with the PM_SUSPEND_PREPARE event
        error = pm_notifier_call_chain_robust(PM_SUSPEND_PREPARE, PM_POST_SUSPEND);
        if (error)
                goto Restore;
        trace_suspend_resume(TPS("freeze_processes"), 0, true);
        // 1.3 Freeze user processes, some kernel threads, and worker threads
        error = suspend_freeze_processes();
        trace_suspend_resume(TPS("freeze_processes"), 0, false);
        if (!error)
                return 0;
        log_suspend_abort_reason("One or more tasks refusing to freeze");
        suspend_stats.failed_freeze++;
        dpm_save_failed_step(SUSPEND_FREEZE);
        pm_notifier_call_chain(PM_POST_SUSPEND);
 Restore:
        pm_restore_console();
        return error;
}
// console.c
void pm_prepare_console(void)
{
        if (!pm_vt_switch())
                return;
        // Switch to the suspend console
        orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
        if (orig_fgconsole < 0)
                return;
        // Redirect kernel messages to the suspend console to make debugging easier
        orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
        return;
}
--------------------------------------------------------------------------------
// main.c
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
{
        int ret;

        ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
        return notifier_to_errno(ret);
}
--------------------------------------------------------------------------------
// notifier.c
// Calls the blocking notifier chain: delivers the suspend-prepare event to every callback registered on the chain
int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
                unsigned long val_up, unsigned long val_down, void *v)
{
        int ret = NOTIFY_DONE;

        /*
         * We check the head outside the lock, but if this access is
         * racy then it does not matter what the result of the test
         * is, we re-check the list after having taken the lock anyway:
         */
        if (rcu_access_pointer(nh->head)) {
                down_read(&nh->rwsem);
                ret = notifier_call_chain_robust(&nh->head, val_up, val_down, v);
                up_read(&nh->rwsem);
        }
        return ret;
}
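The "robust" variant takes two events because a callback may veto val_up; the callbacks that had already accepted it are then sent val_down so they can undo their preparation. notifier_call_chain_robust() itself is not shown in the listing, so the following is only a simplified illustration of that rollback idea over a plain array. All names in it (struct notifier, call_chain_robust, the example callbacks, EVENT_PREPARE, EVENT_POST) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EVENT_PREPARE 3UL   /* same value as PM_SUSPEND_PREPARE */
#define EVENT_POST    4UL   /* same value as PM_POST_SUSPEND */

typedef int (*notifier_fn)(unsigned long event, void *ctx);

struct notifier {
    notifier_fn fn;
    void *ctx;
};

/*
 * Sends val_up to each notifier in order. If one returns non-zero, the
 * notifiers that already accepted val_up receive val_down, and the error
 * is returned. The failing notifier itself is not sent val_down.
 */
int call_chain_robust(const struct notifier *chain, size_t count,
                      unsigned long val_up, unsigned long val_down)
{
    assert(chain != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int err = chain[i].fn(val_up, chain[i].ctx);
        if (err) {
            for (size_t j = 0; j < i; j++)
                chain[j].fn(val_down, chain[j].ctx);
            return err;
        }
    }
    return 0;
}

/* Example callback: records in *ctx whether its owner is "prepared". */
int track_prepared(unsigned long event, void *ctx)
{
    int *prepared = ctx;
    if (event == EVENT_PREPARE)
        *prepared = 1;
    else if (event == EVENT_POST)
        *prepared = 0;
    return 0;
}

/* Example callback: always vetoes the prepare event. */
int veto_prepare(unsigned long event, void *ctx)
{
    (void)ctx;
    return event == EVENT_PREPARE ? -EBUSY : 0;
}
```

With a chain of track_prepared, veto_prepare, track_prepared, the first callback is prepared and then rolled back, the third is never called, and the caller sees -EBUSY.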
--------------------------------------------------------------------------------
// power.h
// 1.3 Process freezing
#ifdef CONFIG_SUSPEND_FREEZER
static inline int suspend_freeze_processes(void)
{
        int error;

        // 1.3.1 Freeze user processes; return immediately if that fails
        error = freeze_processes();
        /*
         * freeze_processes() automatically thaws every task if freezing
         * fails. So we need not do anything extra upon error.
         */
        if (error)
                return error;
        // 1.3.2 Freeze kernel threads; on failure, the kernel threads already frozen are thawed
        error = freeze_kernel_threads();
        /*
         * freeze_kernel_threads() thaws only kernel threads upon freezing
         * failure. So we have to thaw the userspace tasks ourselves.
         */
        if (error)
                // Freezing kernel threads failed, so thaw the user processes that were already frozen
                thaw_processes();
        return error;
}
----------------------------------------------------------------------
// process.c
/**
 * freeze_processes - Signal user space processes to enter the refrigerator.
 * The current thread will not be frozen.  The same process that calls
 * freeze_processes must later call thaw_processes.
 *
 * On success, returns 0.  On failure, -errno and system is fully thawed.
 */
 // 1.3.1 Freeze user processes
int freeze_processes(void)
{
        int error;

        error = __usermodehelper_disable(UMH_FREEZING);
        if (error)
                return error;
        /* Make sure this task doesn't get frozen */
        current->flags |= PF_SUSPEND_TASK;
        if (!pm_freezing)
                atomic_inc(&system_freezing_cnt);
        pm_wakeup_clear(0);
        pr_info("%s:Freezing user space processes ... ", STR_KERNEL_LOG_ENTER);
        pm_freezing = true;
        // Passing true means only user processes are frozen
        error = try_to_freeze_tasks(true);
        if (!error) {
                __usermodehelper_set_disable_depth(UMH_DISABLED);
                pr_cont("done.");
        }
        pr_cont("\n");
        BUG_ON(in_atomic());
        /*
         * Now that the whole userspace is frozen we need to disable
         * the OOM killer to disallow any further interference with
         * killable tasks. There is no guarantee oom victims will
         * ever reach a point they go away we have to wait with a timeout.
         */
        // Once user processes are frozen, disable the OOM killer so it cannot interfere further with killable tasks
        if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
                error = -EBUSY;
        if (error)
                // Freezing user processes failed, so thaw every user process that was already frozen
                thaw_processes();
        return error;
}
// 解冻所有已被冻结的用户进程
void thaw_processes(void)
{ struct task_struct *g, *p;
        struct task_struct *curr = current;
        trace_suspend_resume(TPS("thaw_processes"), 0, true);
        if (pm_freezing)
                atomic_dec(&system_freezing_cnt);
        pm_freezing = false;
        pm_nosig_freezing = false;
        // Re-enable the kernel OOM killer
        oom_killer_enable();
        pr_info("%s:Restarting tasks ... ", STR_KERNEL_LOG_EXIT);
        __usermodehelper_set_disable_depth(UMH_FREEZING);
        // Thaw the freezable workqueues
        thaw_workqueues();
        cpuset_wait_for_hotplug();
        read_lock(&tasklist_lock);
        for_each_process_thread(g, p) {
                /* No other threads should have PF_SUSPEND_TASK set */
                WARN_ON((p != curr) && (p->flags & PF_SUSPEND_TASK));
                // Thaw entry point provided by the freezer; it wakes p up if p is frozen
                __thaw_task(p);
        }
        read_unlock(&tasklist_lock);
        WARN_ON(!(curr->flags & PF_SUSPEND_TASK));
        curr->flags &= ~PF_SUSPEND_TASK;
        usermodehelper_enable();
        schedule();
        pr_cont("done.\n");
        trace_suspend_resume(TPS("thaw_processes"), 0, false);
}
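// Freezing of user processes and of kernel threads both ends up here;
// user_only == true freezes user processes only, false also covers freezable kernel threads
static int try_to_freeze_tasks(bool user_only)
{
        struct task_struct *g, *p;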
        unsigned long end_time;
        unsigned int todo;
        bool wq_busy = false;
        ktime_t start, end, elapsed;
        unsigned int elapsed_msecs;
        bool wakeup = false;
        int sleep_usecs = USEC_PER_MSEC;
        char last_process[TASK_COMM_LEN];
        char last_thread[TASK_COMM_LEN];
        int last_process_pid;
        int last_thread_tid;
        start = ktime_get_boottime();
        end_time = jiffies + msecs_to_jiffies(freeze_timeout_msecs);
        // When kernel threads are being frozen, begin by freezing the freezable workqueues
        if (!user_only)
                freeze_workqueues_begin();
        while (true) {
                todo = 0;
                read_lock(&tasklist_lock);
                for_each_process_thread(g, p) {
                        // freeze_task() is the freezer entry point (kernel/freezer.c)
                        if (p == current || !freeze_task(p))
                                continue;
                        if (!freezer_should_skip(p)) {
                                strcpy(last_process, g->comm);
                                last_process_pid = g->pid;
                                strcpy(last_thread, p->comm);
                                last_thread_tid = p->pid;
                                todo++;
                        }
                }
                read_unlock(&tasklist_lock);
                if (!user_only) { wq_busy = freeze_workqueues_busy();
                        todo += wq_busy;
                }
                if (!todo || time_after(jiffies, end_time))
                        break;
                if (pm_wakeup_pending()) { wakeup = true;
                        break;
                }
                /*
                 * We need to retry, but first give the freezing tasks some
                 * time to enter the refrigerator.  Start with an initial
                 * 1 ms sleep followed by exponential backoff until 8 ms.
                 */
                usleep_range(sleep_usecs / 2, sleep_usecs);
                if (sleep_usecs < 8 * USEC_PER_MSEC)
                        sleep_usecs *= 2;
        }
        end = ktime_get_boottime();
        elapsed = ktime_sub(end, start);
        elapsed_msecs = ktime_to_ms(elapsed);
        if (wakeup) { pr_cont("\n");
                pr_err("Freezing of tasks aborted after %d.%03d seconds",
                       elapsed_msecs / 1000, elapsed_msecs % 1000);
                pr_err("%s:wakeup true. Process is [%s](pid = %d), thread is [%s](tid = %d)!\n", STR_KERNEL_LOG_ENTER_ERR, last_process, last_process_pid, last_thread, last_thread_tid);
        } else if (todo) { pr_cont("\n");
                pr_err("Freezing of tasks failed after %d.%03d seconds"
                       " (%d tasks refusing to freeze, wq_busy=%d):\n",
                       elapsed_msecs / 1000, elapsed_msecs % 1000,
                       todo - wq_busy, wq_busy);
                pr_err("%s:todo true. Process is [%s](pid = %d), thread is [%s](tid = %d)!\n", STR_KERNEL_LOG_ENTER_ERR, last_process, last_process_pid, last_thread, last_thread_tid);
                if (wq_busy)
                        show_workqueue_state();
                if (pm_debug_messages_on) { read_lock(&tasklist_lock);
                        for_each_process_thread(g, p) { if (p != current && !freezer_should_skip(p)
                                    && freezing(p) && !frozen(p)) { sched_show_task(p);
                                        trace_android_vh_try_to_freeze_todo_unfrozen(p);
                                }
                        }
                        read_unlock(&tasklist_lock);
                }
                trace_android_vh_try_to_freeze_todo(todo, elapsed_msecs, wq_busy);
        } else { pr_cont("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
                        elapsed_msecs % 1000);
        }
        return todo ? -EBUSY : 0;
}
/**
 * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
 *
 * On success, returns 0.  On failure, -errno and only the kernel threads are
 * thawed, so as to give a chance to the caller to do additional cleanups
 * (if any) before thawing the userspace tasks. So, it is the responsibility
 * of the caller to thaw the userspace tasks, when the time is right.
 */
// Freeze the freezable kernel threads
int freeze_kernel_threads(void)
{
        int error;
        pr_info("%s:Freezing remaining freezable tasks ... ", STR_KERNEL_LOG_ENTER);
        pm_nosig_freezing = true;
        // Passing false means: also freeze the freezable kernel threads
        error = try_to_freeze_tasks(false);
        if (!error)
                pr_cont("done.");
        pr_cont("\n");
        BUG_ON(in_atomic());
        if (error)
                // If freezing the kernel threads failed, thaw the kernel threads that are already frozen
                thaw_kernel_threads();
        return error;
}
// Thaw kernel threads: workqueue workers and the other kernel threads
void thaw_kernel_threads(void)
{
        struct task_struct *g, *p;
        pm_nosig_freezing = false;
        pr_info("%s:Restarting kernel threads ... ", STR_KERNEL_LOG_EXIT);
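        // Thaw the freezable workqueues
        thaw_workqueues();
        read_lock(&tasklist_lock);
        for_each_process_thread(g, p) {
                // Test p->flags: only kernel threads and workqueue workers are thawed here
                if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
                        // Thaw the remaining kernel threads; user processes stay frozen
                        __thaw_task(p);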
        }
        read_unlock(&tasklist_lock);
        schedule();
        pr_cont("done.\n");
}
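
The retry loop in try_to_freeze_tasks() sleeps between scans, starting at 1 ms and doubling until the delay reaches 8 ms. The following user-space sketch reproduces only that arithmetic. It assumes every usleep_range() call sleeps for its maximum and that scanning the task list costs nothing; sleeps_within_timeout() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define USEC_PER_MSEC 1000

/* Same rule as the end of the retry loop: double the delay until it reaches 8 ms. */
static int next_sleep_usecs(int sleep_usecs)
{
    assert(sleep_usecs > 0);
    if (sleep_usecs < 8 * USEC_PER_MSEC)
        sleep_usecs *= 2;
    return sleep_usecs;
}

/*
 * Hypothetical helper: number of sleeps that fit into timeout_msecs,
 * assuming each sleep lasts exactly sleep_usecs and scanning is free.
 */
static unsigned int sleeps_within_timeout(unsigned int timeout_msecs)
{
    unsigned long long budget = (unsigned long long)timeout_msecs * USEC_PER_MSEC;
    unsigned long long spent = 0;
    unsigned int sleeps = 0;
    int sleep_usecs = USEC_PER_MSEC;

    while (spent + (unsigned long long)sleep_usecs <= budget) {
        spent += (unsigned long long)sleep_usecs;
        sleeps++;
        sleep_usecs = next_sleep_usecs(sleep_usecs);
    }
    return sleeps;
}
```

Under these assumptions the delays run 1, 2, 4, 8, 8, ... ms, so a task that refuses to freeze is rescanned roughly every 8 ms until freeze_timeout_msecs expires. The real loop also exits early when pm_wakeup_pending() reports a wakeup event.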

The detailed freeze and thaw policy of the Linux kernel cgroup freezer module is covered in a separate article.

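Summary:

  1. The first phase of suspend does three main things:

    a. Switches to the console used during suspend and redirects kernel messages to kmsg

    b. Calls the blocking notifier chain before suspending, to deliver the PM_SUSPEND_PREPARE event

    c. Freezes user processes, kernel workqueue workers, and the other freezable kernel threads

  2. Entry functions of the freezer module

    Freeze entry point: freeze_task(struct task_struct *p)

    Thaw entry point: __thaw_task(struct task_struct *p)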
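
The rollback contract between the two freeze stages is easy to misread: freeze_processes() thaws everything on failure, while freeze_kernel_threads() thaws only the kernel threads and leaves user space to its caller. The sketch below is a user-space model of that contract, not kernel code. The struct and the model_* functions are hypothetical names, and the boolean "fail" arguments stand in for a try_to_freeze_tasks() error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state: which groups of tasks are currently frozen. */
struct freeze_model {
    bool user_frozen;
    bool kthreads_frozen;
};

/* Models freeze_processes(): on failure the system is fully thawed. */
static int model_freeze_processes(struct freeze_model *m, bool fail)
{
    assert(m != NULL);
    if (fail) {
        m->user_frozen = false;
        m->kthreads_frozen = false;
        return -EBUSY;
    }
    m->user_frozen = true;
    return 0;
}

/* Models freeze_kernel_threads(): on failure only kernel threads are thawed. */
static int model_freeze_kernel_threads(struct freeze_model *m, bool fail)
{
    assert(m != NULL);
    if (fail) {
        m->kthreads_frozen = false; /* user space stays frozen here */
        return -EBUSY;
    }
    m->kthreads_frozen = true;
    return 0;
}

/* Models the caller of both stages, which thaws user space itself. */
static int model_suspend_freeze(struct freeze_model *m, bool fail_user,
                                bool fail_kthreads)
{
    int error = model_freeze_processes(m, fail_user);

    if (error)
        return error;
    error = model_freeze_kernel_threads(m, fail_kthreads);
    if (error) {
        /* thaw_processes() thaws every remaining frozen task */
        m->user_frozen = false;
        m->kthreads_frozen = false;
    }
    return error;
}
```

In the model, a kernel-thread failure leaves user_frozen set until the caller clears it, which corresponds to the "if (error) thaw_processes();" tail shown at the top of the process.c excerpt.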

2.3.4.2 Second phase of the kernel suspend flow: suspend_devices_and_enter

// suspend.c
/**
 * suspend_devices_and_enter - Suspend devices and enter system sleep state.
 * @state: System sleep state to enter.
 */
int suspend_devices_and_enter(suspend_state_t state)
{
        int error;
        bool wakeup = false;
        // Return early if the system does not support the requested suspend state
        if (!sleep_state_supported(state))
                return -ENOSYS;
        pm_suspend_target_state = state;
        // For suspend-to-idle
        if (state == PM_SUSPEND_TO_IDLE)
                pm_set_suspend_no_platform();
        // 1. platform suspend
        error = platform_suspend_begin(state);
        if (error)
                goto Close;
        // 2. console suspend
        suspend_console();
        suspend_test_start();
        // 3. dpm suspend
        error = dpm_suspend_start(PMSG_SUSPEND);
        if (error) { pr_err("Some devices failed to suspend, or early wake event detected\n");
                log_suspend_abort_reason(
                                "Some devices failed to suspend, or early wake event detected");
                goto Recover_platform;
        }
        suspend_test_finish("suspend devices");
        if (suspend_test(TEST_DEVICES))
                goto Recover_platform;
        do { // 4. system suspend
                error = suspend_enter(state, &wakeup);
        } while (!error && !wakeup && platform_suspend_again(state));
 Resume_devices:
        pr_info("%s:Resume_devices start.\n", STR_KERNEL_LOG_EXIT);
        suspend_test_start();
        dpm_resume_end(PMSG_RESUME);
        suspend_test_finish("resume devices");
        trace_suspend_resume(TPS("resume_console"), state, true);
        resume_console();
        trace_suspend_resume(TPS("resume_console"), state, false);
        pr_info("%s:Resume_devices end.\n", STR_KERNEL_LOG_EXIT);
 Close:
        platform_resume_end(state);
        pm_suspend_target_state = PM_SUSPEND_ON;
        return error;
 Recover_platform:
        platform_recover(state);
        goto Resume_devices;
}

1) platform suspend: platform_suspend_begin

// 1. platform suspend
// suspend.c
static int platform_suspend_begin(suspend_state_t state)
{
        if (state == PM_SUSPEND_TO_IDLE && s2idle_ops && s2idle_ops->begin)
                return s2idle_ops->begin();
        else if (suspend_ops && suspend_ops->begin)
                return suspend_ops->begin(state);
        else
                return 0;
}

2) console suspend: suspend_console

// 2. console suspend
// printk.c
/**
 * suspend_console - suspend the console subsystem
 *
 * This disables printk() while we go into suspend states
 */
void suspend_console(void)
{
        if (!console_suspend_enabled)
                return;
        pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
        console_lock();
        console_suspended = 1; 
        up_console_sem();
}

3) dpm suspend: dpm_suspend_start

// 3. dpm suspend
// kernel/drivers/base/power/main.c
/**
 * dpm_suspend_start - Prepare devices for PM transition and suspend them.
 * @state: PM transition of the system being carried out.
 *
 * Prepare all non-sysdev devices for system PM transition and execute "suspend"
 * callbacks for them.
 */
// 3. Before the system enters the sleep state, prepare the non-sysdev devices and suspend them
int dpm_suspend_start(pm_message_t state)
{
        ktime_t starttime = ktime_get();
        int error;
        error = dpm_prepare(state);
        if (error) {
                suspend_stats.failed_prepare++;
                dpm_save_failed_step(SUSPEND_PREPARE);
        } else
                // Suspend the non-sysdev devices
                error = dpm_suspend(state);
        dpm_show_time(starttime, state, error, "start");
        return error;
}
EXPORT_SYMBOL_GPL(dpm_suspend_start);
/**
 * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
 * @state: PM transition of the system being carried out.
 */
int dpm_suspend(pm_message_t state)
{ ktime_t starttime = ktime_get();
        int error = 0;
        trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
        might_sleep();
        // Suspend devfreq and cpufreq: stop device and CPU frequency scaling
        devfreq_suspend();
        cpufreq_suspend();
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;
        while (!list_empty(&dpm_prepared_list)) {
                struct device *dev = to_device(dpm_prepared_list.prev);
                get_device(dev);
                mutex_unlock(&dpm_list_mtx);
                // Run the suspend callback of this device
                error = device_suspend(dev);
                mutex_lock(&dpm_list_mtx);
                if (error) { pm_dev_err(dev, state, "", error);
                        dpm_save_failed_dev(dev_name(dev));
                        put_device(dev);
                        break;
                }
                if (!list_empty(&dev->power.entry))
                        list_move(&dev->power.entry, &dpm_suspended_list);
                put_device(dev);
                if (async_error)
                        break;
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        if (!error)
                error = async_error;
        if (error) { suspend_stats.failed_suspend++;
                dpm_save_failed_step(SUSPEND_SUSPEND);
        }
        dpm_show_time(starttime, state, error, NULL);
        trace_suspend_resume(TPS("dpm_suspend"), state.event, false);
        return error;
}
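
dpm_suspend() always takes the entry at the tail of dpm_prepared_list (dpm_prepared_list.prev) and moves it to the head of dpm_suspended_list with list_move(). The sketch below models that with plain arrays to show the resulting order. It assumes the prepared list is in registration order, so later-registered devices (typically children) suspend before earlier ones (typically parents). The device names and the dev_list type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for a dpm list: index 0 is the list head. */
struct dev_list {
    const char *name[MAX_DEVS];
    size_t len;
};

/* Like to_device(list.prev): take the entry at the tail. */
static const char *take_tail(struct dev_list *l)
{
    assert(l->len > 0);
    return l->name[--l->len];
}

/* Like list_move(entry, &list): insert at the head. */
static void add_head(struct dev_list *l, const char *dev)
{
    assert(l->len < MAX_DEVS);
    memmove(&l->name[1], &l->name[0], l->len * sizeof(l->name[0]));
    l->name[0] = dev;
    l->len++;
}

/* Records the order in which devices are suspended; returns the count. */
static size_t model_dpm_suspend(struct dev_list *prepared,
                                struct dev_list *suspended,
                                const char **order)
{
    size_t n = 0;

    while (prepared->len > 0) {
        const char *dev = take_tail(prepared);

        order[n++] = dev;
        add_head(suspended, dev);
    }
    return n;
}
```

With a prepared list of "soc", "i2c0", "touch", the model suspends "touch" first and "soc" last, and the suspended list ends up in the original order again. That is why dpm_suspend_late() can repeat the same tail-first walk on dpm_suspended_list.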

4) system suspend: suspend_enter

// 4. system suspend
// suspend.c
/**
 * suspend_enter - Make the system enter the given sleep state.
 * @state: System sleep state to enter.
 * @wakeup: Returns information that the sleep state should not be re-entered.
 *
 * This function should be called after devices have been suspended.
 */
// 4. The system enters the sleep state
static int suspend_enter(suspend_state_t state, bool *wakeup)
{
        int error, last_dev;
        // 1) Call the platform prepare callback
        error = platform_suspend_prepare(state);
        if (error)
                goto Platform_finish;
        // 2) Run the late-suspend callbacks of all devices
        error = dpm_suspend_late(PMSG_SUSPEND);
        if (error) { last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
                last_dev %= REC_FAILED_NUM;
                pr_err("late suspend of devices failed\n");
                log_suspend_abort_reason("late suspend of %s device failed",
                                         suspend_stats.failed_devs[last_dev]);
                goto Platform_finish;
        }
        // 3) Call the s2idle_ops prepare callback (suspend-to-idle only)
        error = platform_suspend_prepare_late(state);
        if (error)
                goto Devices_early_resume;
        // 4) Run the noirq callbacks of all devices, pause cpuidle, and disable device interrupts
        error = dpm_suspend_noirq(PMSG_SUSPEND);
        if (error) { last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
                last_dev %= REC_FAILED_NUM;
                pr_err("noirq suspend of devices failed\n");
                log_suspend_abort_reason("noirq suspend of %s device failed",
                                         suspend_stats.failed_devs[last_dev]);
                goto Platform_early_resume;
        }
        // 5) Call the platform prepare_late callback
        error = platform_suspend_prepare_noirq(state);
        if (error)
                goto Platform_wake;
        if (suspend_test(TEST_PLATFORM))
                goto Platform_wake;
        if (state == PM_SUSPEND_TO_IDLE) { s2idle_loop();
                goto Platform_wake;
        }
        // 6) Take the non-boot CPUs offline
        // The boot CPU is the first CPU core brought up at boot;
        // it runs the bootloader and carries out system initialization
        error = suspend_disable_secondary_cpus();
        if (error || suspend_test(TEST_CPUS)) { log_suspend_abort_reason("Disabling non-boot cpus failed");
                goto Enable_cpus;
        }
        // 7) Disable local interrupts
        arch_suspend_disable_irqs();
        BUG_ON(!irqs_disabled());
        system_state = SYSTEM_SUSPEND;
        // 8) Call the syscore suspend callbacks
        error = syscore_suspend();
        pr_info("%s:<< end >>, kernel suspend end.\n", STR_KERNEL_LOG_ENTER);
        if (!error) {
                // 9) Check for pending wakeup events (active wakeup sources); if one is pending, abort the suspend
                *wakeup = pm_wakeup_pending();
                if (!(suspend_test(TEST_CORE) || *wakeup)) { trace_suspend_resume(TPS("machine_suspend"),
                                state, true);
                        // 10) Actually enter system sleep; this may trap into EL3 through an SMC, after which ATF uses SCMI to tell the other MCUs to suspend the system
                        error = suspend_ops->enter(state);
                        trace_suspend_resume(TPS("machine_suspend"),
                                state, false);
                } else if (*wakeup) { error = -EBUSY;
                }
                pr_info("%s:(( start )), kernel resume start.\n", STR_KERNEL_LOG_EXIT);
                syscore_resume();
        }
        system_state = SYSTEM_RUNNING;
        arch_suspend_enable_irqs();
        BUG_ON(irqs_disabled());
        pr_info("%s:Enable_cpus start.\n", STR_KERNEL_LOG_EXIT);
 Enable_cpus:
        suspend_enable_secondary_cpus();
        pr_info("%s:Platform_wake start.\n", STR_KERNEL_LOG_EXIT);
 Platform_wake:
        platform_resume_noirq(state);
        dpm_resume_noirq(PMSG_RESUME);
        pr_info("%s:Platform_early_resume start.\n", STR_KERNEL_LOG_EXIT);
 Platform_early_resume:
        platform_resume_early(state);
        pr_info("%s:Devices_early_resume start.\n", STR_KERNEL_LOG_EXIT);
 Devices_early_resume:
        dpm_resume_early(PMSG_RESUME);
        pr_info("%s:Platform_finish start.\n", STR_KERNEL_LOG_EXIT);
 Platform_finish:
        platform_resume_finish(state);
        return error;
} 
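
The labels at the end of suspend_enter() form an unwind ladder: a failing step jumps to a label, and execution then falls through every label below it. The sketch below is a user-space model that records which labels run for a given failing step, following the goto targets in the listing above. The enum names and unwind_path() are hypothetical, and the syscore_suspend() failure path (which falls through from the top of the ladder) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Steps of suspend_enter() that jump to a label on failure. */
enum suspend_step {
    STEP_PLATFORM_PREPARE,
    STEP_DPM_SUSPEND_LATE,
    STEP_PLATFORM_PREPARE_LATE,
    STEP_DPM_SUSPEND_NOIRQ,
    STEP_PLATFORM_PREPARE_NOIRQ,
    STEP_DISABLE_SECONDARY_CPUS
};

/* Labels in the order they appear in the function body. */
enum unwind_label {
    LABEL_ENABLE_CPUS,
    LABEL_PLATFORM_WAKE,
    LABEL_PLATFORM_EARLY_RESUME,
    LABEL_DEVICES_EARLY_RESUME,
    LABEL_PLATFORM_FINISH,
    LABEL_COUNT
};

/* The label each failing step jumps to. */
static enum unwind_label goto_target(enum suspend_step failed)
{
    switch (failed) {
    case STEP_PLATFORM_PREPARE:
    case STEP_DPM_SUSPEND_LATE:
        return LABEL_PLATFORM_FINISH;
    case STEP_PLATFORM_PREPARE_LATE:
        return LABEL_DEVICES_EARLY_RESUME;
    case STEP_DPM_SUSPEND_NOIRQ:
        return LABEL_PLATFORM_EARLY_RESUME;
    case STEP_PLATFORM_PREPARE_NOIRQ:
        return LABEL_PLATFORM_WAKE;
    case STEP_DISABLE_SECONDARY_CPUS:
        return LABEL_ENABLE_CPUS;
    }
    return LABEL_COUNT;
}

/* Fills out[] with the labels executed after the jump; returns the count. */
static size_t unwind_path(enum suspend_step failed, enum unwind_label *out,
                          size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (int l = (int)goto_target(failed); l < LABEL_COUNT && n < cap; l++)
        out[n++] = (enum unwind_label)l;
    return n;
}
```

One detail the model makes visible: a dpm_suspend_late() failure jumps straight to Platform_finish and skips Devices_early_resume, because dpm_suspend_late() already calls dpm_resume_early() itself on its error path.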

Next, look at pm_wakeup_pending(void), which checks whether any wakeup source is active in the system. If one is, the suspend attempt fails.

Wakeup sources are requested or created in the wakelock management module. See the wakeup_source_register(struct device *dev, const char *name) function described in https://blog.csdn.net/youthcowboy/article/details/134312903?spm=1001.2014.3001.5502.

// kernel/drivers/base/power/wakeup.c
bool pm_wakeup_pending(void)
{ unsigned long flags;
        bool ret = false;
        char suspend_abort[MAX_SUSPEND_ABORT_LEN];
        raw_spin_lock_irqsave(&events_lock, flags);
        if (events_check_enabled) { unsigned int cnt, inpr;
                split_counters(&cnt, &inpr); 
                ret = (cnt != saved_count || inpr > 0);
                events_check_enabled = !ret;
        }
        raw_spin_unlock_irqrestore(&events_lock, flags);
        if (ret) { pm_pr_dbg("Wakeup pending, aborting suspend\n");
                pm_print_active_wakeup_sources();
                pm_get_active_wakeup_sources(suspend_abort,
                                             MAX_SUSPEND_ABORT_LEN);
                log_suspend_abort_reason(suspend_abort);
                pr_info("PM: %s\n", suspend_abort);
        }
        return ret || atomic_read(&pm_abort_suspend) > 0;
}
void pm_get_active_wakeup_sources(char *pending_wakeup_source, size_t max) 
{ struct wakeup_source *ws, *last_active_ws = NULL;
        int len = 0; 
        bool active = false;
        rcu_read_lock();
        // Walk the wakeup_sources list and collect the names of the active wakeup sources;
        // an active wakeup source is what makes the suspend attempt fail.
        // Wakeup sources are requested or created in the wakelock management module
        list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
                if (ws->active && len < max) {
                        if (!active)
                                len += scnprintf(pending_wakeup_source, max,
                                                "Pending Wakeup Sources: ");
                        len += scnprintf(pending_wakeup_source + len, max - len, 
                                "%s ", ws->name);
                        active = true;
                } else if (!active &&
                           (!last_active_ws ||
                            ktime_to_ns(ws->last_time) > ktime_to_ns(last_active_ws->last_time))) { last_active_ws = ws;
                }
        }
        if (!active && last_active_ws) { scnprintf(pending_wakeup_source, max, 
                                "Last active Wakeup Source: %s",
                                last_active_ws->name);
        }
        rcu_read_unlock();
}
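
The test inside pm_wakeup_pending(), "cnt != saved_count || inpr > 0", relies on split_counters(), which is not shown in this article. The sketch below models it in user space. It assumes the layout used in mainline drivers/base/power/wakeup.c, where one combined counter holds the number of registered wakeup events in its upper half and the number of events still in progress in its lower half; a vendor kernel may differ. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: lower half = events in progress, upper half = event count. */
#define IN_PROGRESS_BITS (sizeof(int) * 4)
#define MAX_IN_PROGRESS  ((1U << IN_PROGRESS_BITS) - 1)

/* Splits the combined counter the way split_counters() is assumed to. */
static void model_split_counters(unsigned int comb, unsigned int *cnt,
                                 unsigned int *inpr)
{
    assert(cnt != NULL && inpr != NULL);
    *cnt = comb >> IN_PROGRESS_BITS;
    *inpr = comb & MAX_IN_PROGRESS;
}

/*
 * Suspend must be aborted if a wakeup event was registered after
 * saved_count was sampled, or if any event is still being processed.
 */
static bool model_wakeup_pending(unsigned int comb, unsigned int saved_count)
{
    unsigned int cnt, inpr;

    model_split_counters(comb, &cnt, &inpr);
    return cnt != saved_count || inpr > 0;
}
```

In this model, holding a wakeup source active keeps the in-progress half non-zero, so every pm_wakeup_pending() call along the suspend path reports true until the source is released.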
// 1) Call the platform prepare callback
// suspend.c
static int platform_suspend_prepare(suspend_state_t state)
{
        return state != PM_SUSPEND_TO_IDLE && suspend_ops->prepare ?
            suspend_ops->prepare() : 0;
}
// 2) Run the late-suspend callbacks of all devices
// kernel/drivers/base/power/main.c
/**
 * dpm_suspend_late - Execute "late suspend" callbacks for all devices.
 * @state: PM transition of the system being carried out.
 */
int dpm_suspend_late(pm_message_t state)
{ ktime_t starttime = ktime_get();
        int error = 0;
        trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;
        // Walk dpm_suspended_list, starting from the tail
        while (!list_empty(&dpm_suspended_list)) {
                struct device *dev = to_device(dpm_suspended_list.prev);
                get_device(dev);
                mutex_unlock(&dpm_list_mtx);
                // Run the late-suspend callback of this device
                error = device_suspend_late(dev);
                mutex_lock(&dpm_list_mtx);
                if (!list_empty(&dev->power.entry))
                        list_move(&dev->power.entry, &dpm_late_early_list);
                if (error) { pm_dev_err(dev, state, " late", error);
                        dpm_save_failed_dev(dev_name(dev));
                        put_device(dev);
                        break;
                }
                put_device(dev);
                if (async_error)
                        break;
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        if (!error)
                error = async_error;
        if (error) { suspend_stats.failed_suspend_late++;
                dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
                dpm_resume_early(resume_event(state));
        }
        dpm_show_time(starttime, state, error, "late");
        trace_suspend_resume(TPS("dpm_suspend_late"), state.event, false);
        return error;
}
/**
 * __device_suspend_late - Execute a "late suspend" callback for given device.
 * @dev: Device to handle.
 * @state: PM transition of the system being carried out.
 * @async: If true, the device is being suspended asynchronously.
 *
 * Runtime PM is disabled for @dev while this function is being executed.
 */
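// This function runs the late-suspend stage for one device. Before running any callback,
// it disables runtime PM for the device and waits for the device's children to finish
// their own late suspend.
// It then completes the stage by running the selected late-suspend callback.
// Finally it sets the device's is_late_suspended flag to true to record that late suspend is done,
// so the device should be in the suspended state once this function returns.
static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
{
        pm_callback_t callback = NULL;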
        const char *info = NULL;
        int error = 0;
        TRACE_DEVICE(dev);
        TRACE_SUSPEND(0);
        // Disable runtime PM for the device
        __pm_runtime_disable(dev, false);
        // Wait for the device's children to finish their suspend
        dpm_wait_for_subordinate(dev, async);
        if (async_error)
                goto Complete;
        if (pm_wakeup_pending()) { async_error = -EBUSY;
                goto Complete;
        }
        if (dev->power.syscore || dev->power.direct_complete)
                goto Complete;
        // Look for a late-suspend callback in the device's power domain, type, class,
        // and bus, in that order. If one is found, it is the callback that gets run.
        if (dev->pm_domain) {
                info = "late power domain ";
                callback = pm_late_early_op(&dev->pm_domain->ops, state);
        } else if (dev->type && dev->type->pm) {
                info = "late type ";
                callback = pm_late_early_op(dev->type->pm, state);
        } else if (dev->class && dev->class->pm) {
                info = "late class ";
                callback = pm_late_early_op(dev->class->pm, state);
        } else if (dev->bus && dev->bus->pm) {
                info = "late bus ";
                callback = pm_late_early_op(dev->bus->pm, state);
        }
        if (callback)
                goto Run;
        if (dev_pm_skip_suspend(dev))
                goto Skip;
        // If no callback was found above and the device's driver defines a late-suspend callback, run that one
        if (dev->driver && dev->driver->pm) {
                info = "late driver ";
                callback = pm_late_early_op(dev->driver->pm, state);
        }
Run:
        error = dpm_run_callback(callback, dev, state, info);
        if (error) { async_error = error;
                log_suspend_abort_reason("Device %s failed to %s late: error %d",
                                         dev_name(dev), pm_verb(state.event), error);
                goto Complete;
        }
        dpm_propagate_wakeup_to_parent(dev);
Skip:
        dev->power.is_late_suspended = true;
Complete:
        TRACE_SUSPEND(error);
        complete_all(&dev->power.completion);
        return error;
}
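
The callback selection in __device_suspend_late() has a precedence rule that is easy to overlook: the if/else-if chain stops at the first level that provides PM ops, even when that level has no late-suspend callback, and the driver's callback is only a fallback. The sketch below is a user-space model of that selection. The types and pick_late_callback() are hypothetical, and the dev_pm_skip_suspend() check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cb_source { CB_NONE, CB_PM_DOMAIN, CB_TYPE, CB_CLASS, CB_BUS, CB_DRIVER };

/* Hypothetical model: does this level provide PM ops, and a late callback? */
struct pm_level {
    bool has_ops;
    bool has_late_cb;
};

struct model_dev {
    struct pm_level pm_domain, type, class_, bus, driver;
};

/* Returns which level's late-suspend callback would be run. */
static enum cb_source pick_late_callback(const struct model_dev *dev)
{
    const struct pm_level *level = NULL;
    enum cb_source source = CB_NONE;

    assert(dev != NULL);
    /* The first level with PM ops wins, even if its late callback is missing. */
    if (dev->pm_domain.has_ops) {
        level = &dev->pm_domain;
        source = CB_PM_DOMAIN;
    } else if (dev->type.has_ops) {
        level = &dev->type;
        source = CB_TYPE;
    } else if (dev->class_.has_ops) {
        level = &dev->class_;
        source = CB_CLASS;
    } else if (dev->bus.has_ops) {
        level = &dev->bus;
        source = CB_BUS;
    }
    if (level != NULL && level->has_late_cb)
        return source;
    /* No subsystem callback: fall back to the driver's own late callback. */
    if (dev->driver.has_ops && dev->driver.has_late_cb)
        return CB_DRIVER;
    return CB_NONE;
}
```

In this model, a device whose type provides PM ops without a late callback never reaches its bus callback; the driver's callback runs instead.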
// 3) Call the s2idle_ops prepare callback (suspend-to-idle only)
static int platform_suspend_prepare_late(suspend_state_t state)
{
        return state == PM_SUSPEND_TO_IDLE && s2idle_ops && s2idle_ops->prepare ?
                s2idle_ops->prepare() : 0;
}
// 4) Run the noirq callbacks of all devices, pause cpuidle, and disable device interrupts
/**
 * dpm_suspend_noirq - Execute "noirq suspend" callbacks for all devices.
 * @state: PM transition of the system being carried out.
 *
 * Prevent device drivers' interrupt handlers from being called and invoke
 * "noirq" suspend callbacks for all non-sysdev devices.
 */
int dpm_suspend_noirq(pm_message_t state)
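{
        int ret;
        // Pause cpuidle
        cpuidle_pause();
        // Arm all wakeup interrupts so that the system can still be woken up after it has suspended
        device_wakeup_arm_wake_irqs();
        // Suspend the device interrupt lines so that driver interrupt handlers are no longer called
        suspend_device_irqs();
        // Run the noirq suspend callbacks of the devices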
        ret = dpm_noirq_suspend_devices(state);
        if (ret)
                dpm_resume_noirq(resume_event(state));
        return ret; 
}
// 5) Call the platform prepare_late callback
static int platform_suspend_prepare_noirq(suspend_state_t state)
{
        if (state == PM_SUSPEND_TO_IDLE)
                return s2idle_ops && s2idle_ops->prepare_late ?
                        s2idle_ops->prepare_late() : 0;
        return suspend_ops->prepare_late ? suspend_ops->prepare_late() : 0;
}
// kernel/drivers/acpi/sleep.c
static const struct platform_s2idle_ops acpi_s2idle_ops = { .begin = acpi_s2idle_begin,
        .prepare = acpi_s2idle_prepare,
        .prepare_late = acpi_s2idle_prepare_late,
        .wake = acpi_s2idle_wake,
        .restore_early = acpi_s2idle_restore_early,
        .restore = acpi_s2idle_restore,
        .end = acpi_s2idle_end,
};
static int acpi_s2idle_prepare_late(void)
{ if (!lps0_device_handle || sleep_no_lps0)
                return 0;
        if (pm_debug_messages_on)
                lpi_check_constraints();
        acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_OFF);
        acpi_sleep_run_lps0_dsm(ACPI_LPS0_ENTRY);
        return 0;
}
// 6) Take the non-boot CPUs offline
// boot cpu: the CPU that ran the bootloader (CPU 0); every other core is a non-boot CPU
// kernel/include/linux/cpu.h
static inline int suspend_disable_secondary_cpus(void)
{
        int cpu = 0;
        if (IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU))
                cpu = -1; 
        return freeze_secondary_cpus(cpu);
}
// kernel/kernel/cpu.c
int freeze_secondary_cpus(int primary)  // primary == 0 means CPU 0; -1 lets the kernel choose
{
        int cpu, error = 0;
        cpu_maps_update_begin();
        if (primary == -1) { primary = cpumask_first(cpu_online_mask);
                if (!housekeeping_cpu(primary, HK_FLAG_TIMER))
                        primary = housekeeping_any_cpu(HK_FLAG_TIMER);
        } else { if (!cpu_online(primary))
                        primary = cpumask_first(cpu_online_mask);
        }
        /*
         * We take down all of the non-boot CPUs in one shot to avoid races
         * with the userspace trying to use the CPU hotplug at the same time
         */
        cpumask_clear(frozen_cpus);
        pr_info("Disabling non-boot CPUs ...\n");
        for_each_online_cpu(cpu) {
                // Walk every online CPU
                if (cpu == primary)
                        continue;    // Skip the primary CPU (normally CPU 0)
                if (pm_wakeup_pending()) {
                        // Normally false; a pending wakeup event aborts the CPU freeze
                        pr_info("Wakeup pending. Abort CPU freeze\n");
                        error = -EBUSY;
                        break;
                }
                trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
                // Take this non-boot CPU offline
                error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
                trace_suspend_resume(TPS("CPU_OFF"), cpu, false);
                if (!error)
                        cpumask_set_cpu(cpu, frozen_cpus);
                else { pr_err("Error taking CPU%d down: %d\n", cpu, error);
                        break;
                }
        }
        if (!error)
                BUG_ON(num_online_cpus() > 1);
        else
                pr_err("Non-boot CPUs are not disabled\n");
        /*
         * Make sure the CPUs won't be enabled by someone else. We need to do
         * this even in case of failure as all freeze_secondary_cpus() users are
         * supposed to do thaw_secondary_cpus() on the failure path.
         */
        cpu_hotplug_disabled++;
        cpu_maps_update_done();
        return error;
}
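
The choice of the CPU that stays online in freeze_secondary_cpus() can be modelled with a bitmask. The sketch below covers the two cases visible in the listing: primary == -1 picks the first online CPU, and a requested primary that is offline is replaced by the first online CPU. The housekeeping-CPU adjustment is deliberately left out, and all function names are hypothetical.

```c
#include <assert.h>

#define MAX_CPUS 32

/* Lowest-numbered CPU whose bit is set in online_mask, or -1 if none. */
static int first_online_cpu(unsigned int online_mask)
{
    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        if (online_mask & (1U << cpu))
            return cpu;
    return -1;
}

/* Which CPU stays online; housekeeping-CPU handling is omitted. */
static int pick_primary_cpu(unsigned int online_mask, int primary)
{
    assert(online_mask != 0);
    assert(primary >= -1 && primary < MAX_CPUS);
    if (primary == -1 || !(online_mask & (1U << primary)))
        primary = first_online_cpu(online_mask);
    return primary;
}

/* Bitmask of the CPUs that the loop would take offline. */
static unsigned int cpus_to_take_down(unsigned int online_mask, int primary)
{
    int keep = pick_primary_cpu(online_mask, primary);

    return online_mask & ~(1U << keep);
}
```

With four CPUs online and primary 0, the model takes CPUs 1 to 3 offline. If CPU 0 is already offline, CPU 1 is kept instead, which matches the cpu_online() fallback in the listing.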
// 7) Disable local interrupts
/* default implementation */
void __weak arch_suspend_disable_irqs(void)
{ local_irq_disable();
}
// kernel/arch/arm64/include/asm/irqflags.h
static inline void arch_local_irq_disable(void)
{ if (system_has_prio_mask_debugging()) { u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
                WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr != GIC_PRIO_IRQOFF);
        }
        asm volatile(ALTERNATIVE(
                "msr    daifset, #2             // arch_local_irq_disable",
                __msr_s(SYS_ICC_PMR_EL1, "%0"),
                ARM64_HAS_IRQ_PRIO_MASKING)
                :
                : "r" ((unsigned long) GIC_PRIO_IRQOFF)
                : "memory");
}
// 8) Call the syscore suspend callbacks
 #ifdef CONFIG_PM_SLEEP
/**
 * syscore_suspend - Execute all the registered system core suspend callbacks.
 *
 * This function is executed with one CPU on-line and disabled interrupts.
 */
// Run every suspend callback registered with the system core
int syscore_suspend(void)
{ struct syscore_ops *ops;
        int ret = 0;
        trace_suspend_resume(TPS("syscore_suspend"), 0, true);
        pm_pr_dbg("Checking wakeup interrupts\n");
        /* Return error code if there are any wakeup interrupts pending. */
        if (pm_wakeup_pending())
                return -EBUSY;
        WARN_ONCE(!irqs_disabled(),
                "Interrupts enabled before system core suspend.\n");
        list_for_each_entry_reverse(ops, &syscore_ops_list, node)
                if (ops->suspend) { pm_pr_dbg("Calling %pS\n", ops->suspend);
                        ret = ops->suspend();
                        if (ret)
                                goto err_out;
                        WARN_ONCE(!irqs_disabled(),
                                "Interrupts enabled after %pS\n", ops->suspend);
                }
        trace_suspend_resume(TPS("syscore_suspend"), 0, false);
        return 0;
 err_out:
        log_suspend_abort_reason("System core suspend callback %pS failed",
                ops->suspend);
        pr_err("PM: System core suspend callback %pF failed.\n", ops->suspend);
        list_for_each_entry_continue(ops, &syscore_ops_list, node)
                if (ops->resume)
                        ops->resume();
        return ret;
}
EXPORT_SYMBOL_GPL(syscore_suspend);
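
The error path of syscore_suspend() is compact: callbacks run in reverse list order, and when one fails, list_for_each_entry_continue() walks forward from the failing entry and resumes only the ops that had already been suspended. The sketch below is a user-space model that records both orders. It assumes ops are identified by their list index, uses -EBUSY as a stand-in for whatever the failing callback returns, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_OPS 8

/* Hypothetical trace of which list indices were suspended and resumed. */
struct syscore_trace {
    int suspended[MAX_OPS];
    size_t n_suspended;
    int resumed[MAX_OPS];
    size_t n_resumed;
};

/* failing_op is the list index whose suspend callback fails, or -1 for none. */
static int model_syscore_suspend(size_t n_ops, int failing_op,
                                 struct syscore_trace *t)
{
    assert(t != NULL && n_ops <= MAX_OPS);
    t->n_suspended = 0;
    t->n_resumed = 0;

    /* list_for_each_entry_reverse(): the last list entry suspends first */
    for (size_t i = n_ops; i-- > 0; ) {
        if ((int)i == failing_op) {
            /* list_for_each_entry_continue(): resume entries after i */
            for (size_t j = i + 1; j < n_ops; j++)
                t->resumed[t->n_resumed++] = (int)j;
            return -EBUSY;
        }
        t->suspended[t->n_suspended++] = (int)i;
    }
    return 0;
}
```

With four ops and a failure at index 1, the model suspends 3 and 2, then resumes 2 and 3; indices 0 and 1 are never resumed because their suspend never completed.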
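// 9) Check for pending wakeup events (active wakeup sources); if one is pending, abort the suspend
// pm_wakeup_pending() returns false when nothing is pending
// 10) Actually enter system sleep; this may trap into EL3 through an SMC, after which ATF uses SCMI to tell the other MCUs to suspend the system
// suspend_ops->enter(state);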
// kernel/drivers/acpi/sleep.c
#ifdef CONFIG_SUSPEND
static u32 acpi_suspend_states[] = { [PM_SUSPEND_ON] = ACPI_STATE_S0,
        [PM_SUSPEND_STANDBY] = ACPI_STATE_S1,
        [PM_SUSPEND_MEM] = ACPI_STATE_S3,
        [PM_SUSPEND_MAX] = ACPI_STATE_S5
};
static const struct platform_suspend_ops acpi_suspend_ops = { .valid = acpi_suspend_state_valid,
        .begin = acpi_suspend_begin,
        .prepare_late = acpi_pm_prepare,
        .enter = acpi_suspend_enter,
        .wake = acpi_pm_finish,
        .end = acpi_pm_end,
};
/**
 *        acpi_suspend_enter - Actually enter a sleep state.
 *        @pm_state: ignored
 *
 *        Flush caches and go to sleep. For STR we have to call arch-specific
 *        assembly, which in turn call acpi_enter_sleep_state().
 *        It's unfortunate, but it works. Please fix if you're feeling frisky.
 */
// For suspend to RAM the target is ACPI_STATE_S3, which runs acpi_suspend_lowlevel()
// ACPI is the Advanced Configuration and Power Interface, an open standard for configuring hardware and managing power in computer systems
static int acpi_suspend_enter(suspend_state_t pm_state)
{
        acpi_status status = AE_OK;
        // The target state was stored earlier, during platform_suspend_begin(suspend_state_t state)
        u32 acpi_state = acpi_target_sleep_state;
        int error;
        // Flush the CPU caches so that the data has been written back to memory
        ACPI_FLUSH_CPU_CACHE();
        trace_suspend_resume(TPS("acpi_suspend"), acpi_state, true);
        switch (acpi_state) {
        case ACPI_STATE_S1: // suspend to standby
                barrier();
                status = acpi_enter_sleep_state(acpi_state);
                break;
        case ACPI_STATE_S3:    // suspend to ram
                if (!acpi_suspend_lowlevel)
                        return -ENOSYS;
                error = acpi_suspend_lowlevel();
                if (error)
                        return error;
                pr_info(PREFIX "Low-level resume complete\n");
                pm_set_resume_via_firmware();
                break;
        }
        trace_suspend_resume(TPS("acpi_suspend"), acpi_state, false);
        /* This violates the spec but is required for bug compatibility. */
        acpi_write_bit_register(ACPI_BITREG_SCI_ENABLE, 1);
        /* Reprogram control registers */
        acpi_leave_sleep_state_prep(acpi_state);
        /* ACPI 3.0 specs (P62) says that it's the responsibility
         * of the OSPM to clear the status bit [ implying that the
         * POWER_BUTTON event should not reach userspace ]
         *
         * However, we do generate a small hint for userspace in the form of
         * a wakeup event. We flag this condition for now and generate the
         * event later, as we're currently too early in resume to be able to
         * generate wakeup events.
         */
        if (ACPI_SUCCESS(status) && (acpi_state == ACPI_STATE_S3)) {
                acpi_event_status pwr_btn_status = ACPI_EVENT_FLAG_DISABLED;
                acpi_get_event_status(ACPI_EVENT_POWER_BUTTON, &pwr_btn_status);
                if (pwr_btn_status & ACPI_EVENT_FLAG_STATUS_SET) {
                        acpi_clear_event(ACPI_EVENT_POWER_BUTTON);
                        /* Flag for later */
                        pwr_btn_event_pending = true;
                }
        }
        /*
         * Disable and clear GPE status before interrupt is enabled. Some GPEs
         * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
         * acpi_leave_sleep_state will reenable specific GPEs later
         */
        acpi_disable_all_gpes();
        /* Allow EC transactions to happen. */
        acpi_ec_unblock_transactions();
        suspend_nvs_restore();
        return ACPI_SUCCESS(status) ? 0 : -EFAULT;
}

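Summary:

1) The second stage of suspend consists of the following four steps

a. platform suspend

b. console suspend: the console is suspended

c. dpm suspend: before the system goes to sleep, the non-sysdev devices are prepared and then suspended

d. system suspend: the system enters suspend, in the following order

Call the platform's prepare callback;

Run the suspend late callbacks of all devices;

Run the prepare callback of s2idle_ops;

Run the noirq callbacks of all devices, pause cpuidle, and disable device interrupts;

Take the non-boot CPUs offline;

Disable local interrupts;

Call the syscore suspend callbacks;

Check whether a wakeup event is pending (for example, a wakelock is still held); if so, abort the suspend;

Actually enter system sleep. On some platforms this traps into EL3 through an SMC, and ATF then uses SCMI to tell other MCUs to put the system to sleep;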
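
The ordering of the steps above matters mainly because a failure at any step has to undo the steps already completed, in reverse order. The following user-space sketch models only that ordering. It is not kernel code: the step labels, `model_step_name()` and `model_suspend_enter()` are hypothetical names chosen for illustration, and the list is shortened to six steps.

```c
#include <stddef.h>

/* Step labels loosely follow the summary above; they are labels for this
 * model only and are not kernel symbols to call. */
static const char *const model_steps[] = {
    "platform prepare",        /* logged as A / a */
    "suspend late",            /* logged as B / b */
    "suspend noirq",           /* logged as C / c */
    "disable non-boot CPUs",   /* logged as D / d */
    "disable local IRQs",      /* logged as E / e */
    "syscore suspend",         /* logged as F / f */
};
#define MODEL_NUM_STEPS ((int)(sizeof(model_steps) / sizeof(model_steps[0])))

/* Return the label of step i, or NULL if i is out of range. */
const char *model_step_name(int i)
{
    return (i >= 0 && i < MODEL_NUM_STEPS) ? model_steps[i] : NULL;
}

/*
 * Run the steps in order. fail_at is the index of the step that fails, or
 * -1 if none fails. Every step that succeeded is logged as an upper-case
 * letter; on failure those steps are undone in reverse order and logged as
 * lower-case letters. Returns 0 on success and -1 on failure.
 */
int model_suspend_enter(int fail_at, char *out, size_t out_size)
{
    size_t n = 0;
    int i;

    if (out_size == 0)
        return -1;

    for (i = 0; i < MODEL_NUM_STEPS && i != fail_at; i++)
        if (n + 1 < out_size)
            out[n++] = (char)('A' + i);

    if (i == MODEL_NUM_STEPS) {
        out[n] = '\0';
        return 0;
    }

    /* Step i failed: unwind steps i-1 .. 0. */
    while (--i >= 0)
        if (n + 1 < out_size)
            out[n++] = (char)('a' + i);

    out[n] = '\0';
    return -1;
}
```

Calling `model_suspend_enter(3, buf, sizeof(buf))` models a failure while taking the non-boot CPUs offline: the three earlier steps run and are then undone last-to-first. This reverse undo is the same shape the resume path in chapter 3 follows after a successful sleep.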

2.4 Scenarios in which the system enters suspend

Pressing the power key, the screen turning off, and so on.

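2.5 Difference between suspend and hibernate

The common suspend states are freeze (suspend to idle), standby, mem, and disk.

Suspend mode: when the system suspends, all processes are frozen and the system state is kept in RAM (suspend to RAM), which stays powered. This state includes the work in progress as well as all system settings. In suspend mode the computer turns off the display and the disk and reduces power to the other hardware in order to save energy. When the user wakes the computer, the system restores the previous state and continues the unfinished tasks.

Hibernate mode: compared with suspend, hibernate saves more power. When the system hibernates, it writes the work in progress and the system state to a dedicated image on disk (suspend to disk) and then powers the computer off. In this mode the computer consumes almost no power, because all hardware is switched off. When the user powers the computer on again, the system restores itself from the saved image and continues the previous work. (It can be thought of as a deeper form of suspend.)

2.6 system hibernate flow

Suspend-to-disk mode is generally called system hibernation.

In state_store(), the suspend path calls pm_suspend() and the hibernate path calls hibernate().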

// kernel/power/main.c
// state_store is the sysfs store handler for /sys/power/state and the entry point into suspend.
// It receives the "mem" string written from user space and prepares to enter suspend to RAM
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
                           const char *buf, size_t n)
{
        suspend_state_t state;
        int error;
        
        pr_info("%s:Userspace to kernel, process start write command \"echo mem > /sys/power/state\". The process is \"%s\"(pid %i). The thread is \"%s\"(tid %i).\n",
                STR_KERNEL_LOG_ENTER, current->group_leader->comm, current->group_leader->pid, current->comm, current->pid);
        
        error = pm_autosleep_lock();
        if (error)
                return error;
        
        if (pm_autosleep_state() > PM_SUSPEND_ON) {
                error = -EBUSY;
                goto out;
        }
        // Decode the "mem" string into a suspend_state_t value
        state = decode_state(buf, n);
        // Check that the state requested by user space is within the valid range
        if (state < PM_SUSPEND_MAX) {
                if (state == PM_SUSPEND_MEM)
                        state = mem_sleep_current;
                // Call into suspend.c to start the suspend sequence.
                // A return value of 0 means the suspend succeeded; any other value means it failed, which is useful when debugging suspend
                error = pm_suspend(state);
                if (!error) {
                        pr_info("%s:(( end )), kernel resume end.\n", STR_KERNEL_LOG_EXIT);
                }
        } else if (state == PM_SUSPEND_MAX) {
                // decode_state() returns PM_SUSPEND_MAX for "disk" (suspend to disk), which takes the separate hibernate path
                error = hibernate();
        } else {
                error = -EINVAL;
        }
 out:
        pm_autosleep_unlock();
        return error ? error : n;
}
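
The decode_state() call in state_store() turns the string written to /sys/power/state into a state value. The sketch below is a user-space model of that mapping, assuming the usual labels ("freeze", "standby", "mem") and that every state is supported by the platform. The `model_` names and the enum are hypothetical; only the overall idea (strip the trailing newline, treat "disk" as the hibernate marker, compare against the label table) is taken from the code above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the state values used by state_store(). */
enum model_state {
    MODEL_SUSPEND_ON = 0,       /* not a sleep state; also "no match" */
    MODEL_SUSPEND_TO_IDLE = 1,  /* "freeze" */
    MODEL_SUSPEND_STANDBY = 2,  /* "standby" */
    MODEL_SUSPEND_MEM = 3,      /* "mem" */
    MODEL_SUSPEND_MAX = 4       /* returned for "disk" (hibernate) */
};

/* Label table indexed by state; index 0 has no label. */
static const char *const model_labels[MODEL_SUSPEND_MAX] = {
    NULL, "freeze", "standby", "mem",
};

/* Decode the first n bytes of buf, ignoring anything from '\n' onwards. */
enum model_state model_decode_state(const char *buf, size_t n)
{
    const char *nl = (const char *)memchr(buf, '\n', n);
    size_t len = nl ? (size_t)(nl - buf) : n;
    int s;

    /* Hibernation is checked first and reported as the MAX marker. */
    if (len == 4 && strncmp(buf, "disk", 4) == 0)
        return MODEL_SUSPEND_MAX;

    for (s = MODEL_SUSPEND_TO_IDLE; s < MODEL_SUSPEND_MAX; s++) {
        const char *label = model_labels[s];

        if (label && len == strlen(label) && strncmp(buf, label, len) == 0)
            return (enum model_state)s;
    }
    return MODEL_SUSPEND_ON;
}
```

With this model, the string produced by `echo mem > /sys/power/state` ("mem" followed by a newline) decodes to MODEL_SUSPEND_MEM, and "disk" decodes to the MAX marker that selects the hibernate branch.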
// hibernate.c
/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
        bool snapshot_test = false;
        int error;
        if (!hibernation_available()) {
                pm_pr_dbg("Hibernation not available.\n");
                return -EPERM;
        }
        lock_system_sleep();
        /* The snapshot device should not be opened while we're running */
        if (!hibernate_acquire()) {
                error = -EBUSY;
                goto Unlock;
        }
        pr_info("hibernation entry\n");
        pm_prepare_console();
        error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
        if (error)
                goto Restore;
        ksys_sync_helper();
        error = freeze_processes();
        if (error)
                goto Exit;
        lock_device_hotplug();
        /* Allocate memory management structures */
        error = create_basic_memory_bitmaps();
        if (error)
                goto Thaw;
        error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
        if (error || freezer_test_done)
                goto Free_bitmaps;
        if (in_suspend) {
                unsigned int flags = 0;
                if (hibernation_mode == HIBERNATION_PLATFORM)
                        flags |= SF_PLATFORM_MODE;
                if (nocompress)
                        flags |= SF_NOCOMPRESS_MODE;
                else
                        flags |= SF_CRC32_MODE;
                pm_pr_dbg("Writing hibernation image.\n");
                error = swsusp_write(flags);
                swsusp_free();
                if (!error) {
                        if (hibernation_mode == HIBERNATION_TEST_RESUME)
                                snapshot_test = true;
                        else
                                power_down();
                }
                in_suspend = 0;
                pm_restore_gfp_mask();
        } else {
                pm_pr_dbg("Hibernation image restored successfully.\n");
        }
 Free_bitmaps:
        free_basic_memory_bitmaps();
 Thaw:
        unlock_device_hotplug();
        if (snapshot_test) {
                pm_pr_dbg("Checking hibernation image\n");
                error = swsusp_check();
                if (!error)
                        error = load_image_and_restore();
        }
        thaw_processes();
        /* Don't bother checking whether freezer_test_done is true */
        freezer_test_done = false;
 Exit:
        pm_notifier_call_chain(PM_POST_HIBERNATION);
 Restore:
        pm_restore_console();
        hibernate_release();
 Unlock:
        unlock_system_sleep();
        pr_info("hibernation exit\n");
        return error;
}
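
The flags handed to swsusp_write() in hibernate() are built from two inputs: whether platform mode is selected and whether compression is disabled. The sketch below isolates that selection in user space. The `MODEL_` values are assumed to match the SF_* definitions in kernel/power/power.h and should be checked against the kernel tree in use; `model_image_flags()` is a hypothetical helper.

```c
/* Assumed to mirror SF_PLATFORM_MODE, SF_NOCOMPRESS_MODE and SF_CRC32_MODE;
 * verify the values against your kernel source. */
#define MODEL_SF_PLATFORM_MODE   1u
#define MODEL_SF_NOCOMPRESS_MODE 2u
#define MODEL_SF_CRC32_MODE      4u

/*
 * Build the image flags the way hibernate() does: platform mode adds one
 * bit, and the image is either written uncompressed or protected by CRC32.
 */
unsigned int model_image_flags(int platform_mode, int nocompress)
{
    unsigned int flags = 0;

    if (platform_mode)
        flags |= MODEL_SF_PLATFORM_MODE;
    if (nocompress)
        flags |= MODEL_SF_NOCOMPRESS_MODE;
    else
        flags |= MODEL_SF_CRC32_MODE;
    return flags;
}
```

Note that the no-compress and CRC32 bits are mutually exclusive in this logic: exactly one of them is always set.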

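III. System Resume Mechanism

3.1 Core code logic of system resume

The modules are introduced one by one, following the resume part of the flow in 1.3.

3.1.1 platform dependent resume

This part of the code is implemented by each platform vendor.

3.1.2 syscore resume

Runs all registered syscore resume callbacks. They execute on a single online CPU, with interrupts on that CPU disabled.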

// kernel/drivers/base/syscore.c
/**
 * syscore_resume - Execute all the registered system core resume callbacks.
 *
 * This function is executed with one CPU on-line and disabled interrupts.
 */
void syscore_resume(void)
{
        struct syscore_ops *ops;
        trace_suspend_resume(TPS("syscore_resume"), 0, true);
        WARN_ONCE(!irqs_disabled(),
                "Interrupts enabled before system core resume.\n");
        // Walk every entry registered on syscore_ops_list
        list_for_each_entry(ops, &syscore_ops_list, node)
                if (ops->resume) {
                        if (initcall_debug)
                                pr_info("PM: Calling %pS\n", ops->resume);
                        // Call the entry's resume callback
                        ops->resume();
                        // Warn if the callback left local interrupts enabled
                        WARN_ONCE(!irqs_disabled(),
                                "Interrupts enabled after %pS\n", ops->resume);
                }
        trace_suspend_resume(TPS("syscore_resume"), 0, false);
}
EXPORT_SYMBOL_GPL(syscore_resume);
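
syscore_resume() walks the callback list from front to back, while the suspend side walks the same list from back to front, so the component registered last is suspended first and resumed last. The following user-space sketch models that ordering. Everything prefixed `model_`, the two example clients ("timer" and "irqchip"), and the fixed-size array are hypothetical; the real kernel uses a linked list and struct syscore_ops.

```c
#include <stddef.h>

#define MODEL_MAX_OPS 8

/* Hypothetical stand-in for the kernel's struct syscore_ops. */
struct model_syscore_ops {
    int (*suspend)(void);
    void (*resume)(void);
};

static struct model_syscore_ops *model_ops[MODEL_MAX_OPS];
static size_t model_ops_count;

/* Append to the end of the list, as a tail insertion would. */
int model_register_syscore_ops(struct model_syscore_ops *ops)
{
    if (model_ops_count >= MODEL_MAX_OPS)
        return -1;
    model_ops[model_ops_count++] = ops;
    return 0;
}

/* Suspend walks the list backwards: last registered, first suspended. */
int model_syscore_suspend(void)
{
    size_t i = model_ops_count;
    size_t j;

    while (i-- > 0) {
        int ret = model_ops[i]->suspend ? model_ops[i]->suspend() : 0;

        if (ret) {
            /* Undo the entries that were already suspended. */
            for (j = i + 1; j < model_ops_count; j++)
                if (model_ops[j]->resume)
                    model_ops[j]->resume();
            return ret;
        }
    }
    return 0;
}

/* Resume walks the list forwards: first registered, first resumed. */
void model_syscore_resume(void)
{
    size_t i;

    for (i = 0; i < model_ops_count; i++)
        if (model_ops[i]->resume)
            model_ops[i]->resume();
}

/* Two example clients that record the order in which they are called. */
static char model_trace[16];
static size_t model_trace_len;

static void model_trace_add(char c)
{
    if (model_trace_len + 1 < sizeof(model_trace)) {
        model_trace[model_trace_len++] = c;
        model_trace[model_trace_len] = '\0';
    }
}

static int timer_suspend(void) { model_trace_add('T'); return 0; }
static void timer_resume(void) { model_trace_add('t'); }
static int irqchip_suspend(void) { model_trace_add('I'); return 0; }
static void irqchip_resume(void) { model_trace_add('i'); }

struct model_syscore_ops model_timer_ops = { timer_suspend, timer_resume };
struct model_syscore_ops model_irqchip_ops = { irqchip_suspend, irqchip_resume };

/* Return the recorded call order, for example "ITti". */
const char *model_trace_get(void)
{
    return model_trace;
}
```

Registering the timer client first and the irqchip client second, then running one suspend and one resume, records the order irqchip suspend, timer suspend, timer resume, irqchip resume.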

3.1.3 irqs PM

// arch/arm64/include/asm/irqflags.h
static inline void arch_local_irq_enable(void)
{
    if (system_has_prio_mask_debugging()) {
            u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
            WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr != GIC_PRIO_IRQOFF);
    }
    // Without priority masking, the ARM64 instruction "msr daifclr, #3" clears the I and F
    // mask bits of PSTATE.DAIF, which unmasks IRQ and FIQ.
    // Enabled: the mask bit is 0; disabled (masked): the mask bit is 1.
    // With ARM64_HAS_IRQ_PRIO_MASKING, GIC_PRIO_IRQON is written to ICC_PMR_EL1 instead.
    asm volatile(ALTERNATIVE(
            "msr        daifclr, #3",
            __msr_s(SYS_ICC_PMR_EL1, "%0"),
            ARM64_HAS_IRQ_PRIO_MASKING)
            :
            : "r" ((unsigned long) GIC_PRIO_IRQON)
            : "memory");
    pmr_sync();
}
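
The effect of the immediate in `msr daifclr, #3` can be checked with a small user-space model of the DAIF mask bits. In the DAIF register view the D, A, I and F masks sit at bits 9, 8, 7 and 6, and the 4-bit immediate of daifset/daifclr selects them in that order. The function and macro names below are hypothetical; the sketch only does the bit arithmetic and executes no privileged instruction.

```c
#include <stdint.h>

/* Mask bit positions in the DAIF register view. */
#define MODEL_DAIF_D (1u << 9)   /* debug exceptions */
#define MODEL_DAIF_A (1u << 8)   /* SError */
#define MODEL_DAIF_I (1u << 7)   /* IRQ */
#define MODEL_DAIF_F (1u << 6)   /* FIQ */

/* Model "msr daifclr, #imm": clear the mask bits selected by imm[3:0]. */
uint32_t model_daifclr(uint32_t daif, unsigned int imm)
{
    return daif & ~((uint32_t)(imm & 0xfu) << 6);
}

/* Model "msr daifset, #imm": set the mask bits selected by imm[3:0]. */
uint32_t model_daifset(uint32_t daif, unsigned int imm)
{
    return daif | ((uint32_t)(imm & 0xfu) << 6);
}

/* IRQs are masked (disabled) while the I bit is 1. */
int model_irqs_disabled(uint32_t daif)
{
    return (daif & MODEL_DAIF_I) != 0;
}
```

Starting from all four masks set (0x3c0), `model_daifclr(0x3c0, 3)` yields 0x300: IRQ and FIQ are unmasked while D and A stay masked, which matches the comment in the listing above.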

3.1.4 cpu PM

Bring the non-boot CPUs back online.

// kernel/include/linux/cpu.h
static inline void suspend_enable_secondary_cpus(void)
{
        return thaw_secondary_cpus();
}
// kernel/cpu.c
void thaw_secondary_cpus(void)
{
        int cpu, error;
        /* Allow everyone to use the CPU hotplug again */
        cpu_maps_update_begin();
        __cpu_hotplug_enable();
        if (cpumask_empty(frozen_cpus))
                goto out;
        pr_info("Enabling non-boot CPUs ...\n");
        arch_thaw_secondary_cpus_begin();
        for_each_cpu(cpu, frozen_cpus) {
                trace_suspend_resume(TPS("CPU_ON"), cpu, true);
                error = _cpu_up(cpu, 1, CPUHP_ONLINE);
                trace_suspend_resume(TPS("CPU_ON"), cpu, false);
                if (!error) {
                        pr_info("CPU%d is up\n", cpu);
                        continue;
                }
                pr_warn("Error taking CPU%d up: %d\n", cpu, error);
        }
        arch_thaw_secondary_cpus_end();
        cpumask_clear(frozen_cpus);
out:
        cpu_maps_update_done();
}

3.1.5 device PM

// kernel/drivers/base/power/main.c
/**
 * dpm_resume_noirq - Execute "noirq resume" callbacks for all devices.
 * @state: PM transition of the system being carried out.
 *
 * Invoke the "noirq" resume callbacks for all devices in dpm_noirq_list and
 * allow device drivers' interrupt handlers to be called.
 */
void dpm_resume_noirq(pm_message_t state)
{
        // Run the devices' noirq resume callbacks
        dpm_noirq_resume_devices(state);
        // Re-enable the device interrupts
        resume_device_irqs();
        // Disarm the devices' dedicated wakeup interrupts
        device_wakeup_disarm_wake_irqs();
}
// kernel/irq/pm.c
void resume_device_irqs(void)
{
        resume_irqs(false);
}
EXPORT_SYMBOL_GPL(resume_device_irqs);
static void resume_irqs(bool want_early)
{
        struct irq_desc *desc;
        int irq;
        for_each_irq_desc(irq, desc) {
                unsigned long flags;
                bool is_early = desc->action &&
                        desc->action->flags & IRQF_EARLY_RESUME;
                if (!is_early && want_early)
                        continue;
                if (irq_settings_is_nested_thread(desc))
                        continue;
                raw_spin_lock_irqsave(&desc->lock, flags);
                resume_irq(desc);
                raw_spin_unlock_irqrestore(&desc->lock, flags);
        }
}
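
The two `continue` checks in resume_irqs() decide which interrupts a given pass touches: a pass with want_early set only handles interrupts flagged IRQF_EARLY_RESUME, while the normal pass (want_early false, as called from resume_device_irqs()) considers every interrupt except nested-thread ones. The user-space sketch below models just that filter. `struct model_irq` and `model_resume_irqs()` are hypothetical, and the "resume only if still suspended" rule is a simplification of what resume_irq() does.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one interrupt descriptor. */
struct model_irq {
    int early;          /* action has IRQF_EARLY_RESUME set */
    int nested_thread;  /* handled by its parent, never resumed here */
    int suspended;      /* was disabled by the suspend path */
};

/*
 * Apply the same filter as resume_irqs() and re-enable every matching
 * interrupt that is still suspended. Returns how many were re-enabled.
 */
size_t model_resume_irqs(struct model_irq *irqs, size_t n, int want_early)
{
    size_t i, resumed = 0;

    for (i = 0; i < n; i++) {
        /* The early pass skips interrupts that did not ask for it. */
        if (!irqs[i].early && want_early)
            continue;
        if (irqs[i].nested_thread)
            continue;
        if (irqs[i].suspended) {
            irqs[i].suspended = 0;
            resumed++;
        }
    }
    return resumed;
}
```

Running the early pass first and the normal pass second re-enables each interrupt once: the normal pass visits the early interrupts again but finds them already resumed.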

3.1.6 PM core

Resume the console

// kernel/printk/printk.c
void resume_console(void)
{
        if (!console_suspend_enabled)
                return;
        down_console_sem();
        console_suspended = 0;
        console_unlock();
        pr_flush(1000, true);
}

Thaw the processes

// kernel/power/suspend.c
static void suspend_finish(void)
{
        suspend_thaw_processes();
        pm_notifier_call_chain(PM_POST_SUSPEND);
        pm_restore_console();
}
// kernel/power/power.h
static inline void suspend_thaw_processes(void)
{
        // Ends up calling into the freezer module to thaw the processes
        thaw_processes();
}

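3.4 Scenarios in which the system is woken up

Key presses, the RTC, the screen being turned on, USB plug and unplug events, and so on.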