2016-04-15
Zhang Chao, "Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000
I. Analysis
When processes are scheduled, and how processes are switched
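Operating-systems textbooks present a large number of process scheduling algorithms. From an implementation standpoint, however, each of them merely selects a new process from the run queue; they differ only in the policy used to make that choice. For understanding how an operating system actually works, the points at which scheduling happens and the mechanism that switches from one process to another matter more.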
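When process scheduling happens:
schedule() is a kernel function, not a system call, so a user-mode process cannot call it directly; it can only trigger it indirectly. A kernel thread is a special kind of process that runs only in kernel mode and has no user mode.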
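1. During interrupt handling (including timer interrupts, I/O interrupts, system calls and exceptions), the kernel either calls schedule() directly or, on the way back to user mode, calls schedule() if the need_resched flag is set;
2. A kernel thread can call schedule() directly to switch processes, and it can also be scheduled during interrupt handling. In other words, kernel threads, as a special class of processes, can be scheduled both actively and passively;
3. A user-mode process cannot schedule itself actively. It can only be scheduled at some point after it has trapped into kernel mode, that is, during interrupt handling.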
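Process switching:
1. To control process execution, the kernel must be able to suspend the process currently running on the CPU and resume the execution of some previously suspended process. This is called a process switch, task switch or context switch;
2. Suspending the process running on the CPU is different from saving state when an interrupt occurs: before and after an interrupt the CPU stays in the same process context and merely moves from user mode to kernel mode;
3. The process context contains all the information a process needs in order to execute:
I User address space: program code, data, user stack, etc.
II Control information: process descriptor, kernel stack, etc.
III Hardware context (note that an interrupt also saves the hardware context, only by a different method)
4. schedule() picks a new process to run and calls context_switch() to perform the context switch; context_switch() in turn invokes the switch_to macro, which carries out the key part of the switch.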
The schedule code (its core function is __schedule()) is in /linux-3.18.6/kernel/sched/core.c:
/*
 * __schedule() is the main scheduler function.
 *
 * The main means of driving the scheduler and thus entering this function are:
 *
 *   1. Explicit blocking: mutex, semaphore, waitqueue, etc.
 *
 *   2. TIF_NEED_RESCHED flag is checked on interrupt and userspace return
 *      paths. For example, see arch/x86/entry_64.S.
 *
 *      To drive preemption between tasks, the scheduler sets the flag in timer
 *      interrupt handler scheduler_tick().
 *
 *   3. Wakeups don't really cause entry into schedule(). They add a
 *      task to the run-queue and that's it.
 *
 *      Now, if the new task added to the run-queue preempts the current
 *      task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets
 *      called on the nearest possible occasion:
 *
 *       - If the kernel is preemptible (CONFIG_PREEMPT=y):
 *
 *         - in syscall or exception context, at the next outmost
 *           preempt_enable(). (this might be as soon as the wake_up()'s
 *           spin_unlock()!)
 *
 *         - in IRQ context, return from interrupt-handler to
 *           preemptible context
 *
 *       - If the kernel is not preemptible (CONFIG_PREEMPT is not set)
 *         then at the next:
 *
 *          - cond_resched() call
 *          - explicit schedule() call
 *          - return from syscall or exception to user-space
 *          - return from interrupt-handler to user-space
 */
static void __sched __schedule(void)
{
	struct task_struct *prev, *next;
	unsigned long *switch_count;
	struct rq *rq;
	int cpu;

need_resched:
	preempt_disable();
	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	rcu_note_context_switch(cpu);
	prev = rq->curr;

	schedule_debug(prev);

	if (sched_feat(HRTICK))
		hrtick_clear(rq);

	/*
	 * Make sure that signal_pending_state()->signal_pending() below
	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
	 * done by the caller to avoid the race with signal_wake_up().
	 */
	smp_mb__before_spinlock();
	raw_spin_lock_irq(&rq->lock);

	switch_count = &prev->nivcsw;
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
		if (unlikely(signal_pending_state(prev->state, prev))) {
			prev->state = TASK_RUNNING;
		} else {
			deactivate_task(rq, prev, DEQUEUE_SLEEP);
			prev->on_rq = 0;

			/*
			 * If a worker went to sleep, notify and ask workqueue
			 * whether it wants to wake up a task to maintain
			 * concurrency.
			 */
			if (prev->flags & PF_WQ_WORKER) {
				struct task_struct *to_wakeup;

				to_wakeup = wq_worker_sleeping(prev, cpu);
				if (to_wakeup)
					try_to_wake_up_local(to_wakeup);
			}
		}
		switch_count = &prev->nvcsw;
	}

	if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
		update_rq_clock(rq);

	next = pick_next_task(rq, prev);
	clear_tsk_need_resched(prev);
	clear_preempt_need_resched();
	rq->skip_clock_update = 0;

	if (likely(prev != next)) {
		rq->nr_switches++;
		rq->curr = next;
		++*switch_count;

		context_switch(rq, prev, next); /* unlocks the rq */
		/*
		 * The context switch have flipped the stack from under us
		 * and restored the local variables which were saved when
		 * this task called schedule() in the past. prev == current
		 * is still correct, but it can be moved to another cpu/rq.
		 */
		cpu = smp_processor_id();
		rq = cpu_rq(cpu);
	} else
		raw_spin_unlock_irq(&rq->lock);

	post_schedule(rq);

	sched_preempt_enable_no_resched();
	if (need_resched())
		goto need_resched;
}