momo zone

A kernel tuner's blog

tsc, hpet, lapic: some puzzles about kernel timekeeping

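Here is how this started. Under /sys/devices/system/clocksource/clocksource0, the files available_clocksource and current_clocksource show the clock sources available on the system and the one currently selected. But I did not really understand what switching the clock source actually affects.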
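A minimal sketch for inspecting these two files from user space. It assumes the usual sysfs location /sys/devices/system/clocksource/clocksource0 and that available_clocksource holds a space-separated list of names; the helper names are my own, not anything from the kernel.

```python
from pathlib import Path

# Assumed sysfs location of the clocksource attributes on a typical Linux system.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksource_names(text):
    """Split the space-separated contents of available_clocksource into names."""
    return text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (available, current), or None if the sysfs files cannot be read."""
    try:
        available = parse_clocksource_names((base / "available_clocksource").read_text())
        current = (base / "current_clocksource").read_text().strip()
    except OSError:
        return None
    return available, current


if __name__ == "__main__":
    result = read_clocksources()
    if result is None:
        print("clocksource sysfs files not available on this system")
    else:
        available, current = result
        print("available:", ", ".join(available))
        print("current:  ", current)
```

Switching is normally done by writing one of the available names into current_clocksource as root.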

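The contents of available_clocksource come from sysfs_show_available_clocksources, which simply prints the name field of each clocksource structure linked on the clocksource_list list. On Intel x86 the output typically includes tsc, hpet and acpi_pm. A clock source gets onto the list by being registered through clocksource_register or __clocksource_register_scale; the latter is actually called through the wrapper functions clocksource_register_hz and clocksource_register_khz.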
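The registration-then-print relationship can be sketched with a toy model. This is not kernel code: the Clocksource class, the rating values, and the "keep the list sorted by descending rating" rule are assumptions made for illustration. Only the names clocksource_list, clocksource_register and sysfs_show_available_clocksources are borrowed from the kernel.

```python
class Clocksource:
    """Toy stand-in for struct clocksource: just a name and a rating."""

    def __init__(self, name, rating):
        self.name = name
        self.rating = rating


clocksource_list = []  # toy stand-in for the kernel's clocksource_list


def clocksource_register(cs):
    """Insert cs so that the list stays sorted by descending rating (assumed rule)."""
    index = 0
    while index < len(clocksource_list) and clocksource_list[index].rating >= cs.rating:
        index += 1
    clocksource_list.insert(index, cs)


def show_available_clocksources():
    """Mimic sysfs_show_available_clocksources: join every name on the list."""
    return " ".join(cs.name for cs in clocksource_list)


# Registration order does not matter here; the ratings are assumed example values.
clocksource_register(Clocksource("acpi_pm", 200))
clocksource_register(Clocksource("tsc", 300))
clocksource_register(Clocksource("hpet", 250))
print(show_available_clocksources())
```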

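tsc is registered by init_tsc_clocksource(), hpet by hpet_clocksource_register(), and acpi_pm by init_acpi_pm_clocksource(). There is also a clocksource_register() call that registers the i8253 clock source, i.e. the traditional PIT, yet it does not show up in available_clocksource. The reason is that under certain conditions this clock source is explicitly not enabled; see init_pit_clocksource for the details. That does not mean the PIT goes unused in the kernel: the kernel still registers the PIT on irq0 through setup_default_timer_irq, which you can see with cat /proc/interrupts. The interrupt count there is usually non-zero but small, because there is a window between irq0 being registered and enabled and hpet being registered and enabled, and during that window the PIT still serves as the kernel's default (current) clock source, just as on antique computers.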

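With clock sources (clocksource) sorted out, next comes the clock event device (clock_event_device). Its structure has an event_handler callback, which is used to trigger and drive the timers installed in the kernel. By analogy, it is like the small atomic bomb that sets off a hydrogen bomb. There is one more structure, the tick device (tick_device), which is really just a wrapper around clock_event_device. Its extra member mode largely agrees with the periodic or one-shot mode indicated by clock_event_device.mode (digging deeper, the two can differ once dynamic ticks are enabled; to keep things simple, dynamic ticks are not considered here). The tick device is a structure introduced by the dynamic tick feature, and here it can simply be treated as the clock event device. cat /proc/timer_list shows information about the tick devices and timers: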
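To make the callback relationship concrete, here is a toy model, not kernel code. ClockEventDevice and TickDevice loosely mirror the structures described above; TimerQueue and the fire() method are hypothetical helpers invented for this sketch, with fire() standing in for the hardware timer interrupt.

```python
class ClockEventDevice:
    """Toy stand-in for clock_event_device: a name plus an event_handler callback."""

    def __init__(self, name):
        self.name = name
        self.event_handler = None

    def fire(self, now_ns):
        # In hardware this would be the timer interrupt; here we call it by hand.
        if self.event_handler is not None:
            self.event_handler(now_ns)


class TickDevice:
    """Toy stand-in for tick_device: wraps an event device and records a mode."""

    def __init__(self, evtdev, mode):
        self.evtdev = evtdev
        self.mode = mode  # "periodic" or "oneshot"


class TimerQueue:
    """Hypothetical container for installed timers: (expiry_ns, label) pairs."""

    def __init__(self):
        self.timers = []
        self.fired = []

    def add(self, expires_ns, label):
        self.timers.append((expires_ns, label))
        self.timers.sort()

    def run_expired(self, now_ns):
        # Run every timer whose expiry time has passed, earliest first.
        while self.timers and self.timers[0][0] <= now_ns:
            _, label = self.timers.pop(0)
            self.fired.append(label)


queue = TimerQueue()
queue.add(3_000_000, "late")
queue.add(1_000_000, "early")

lapic = ClockEventDevice("lapic")
lapic.event_handler = queue.run_expired  # the "small bomb" that sets off the timers
tick = TickDevice(lapic, mode="oneshot")

tick.evtdev.fire(now_ns=2_000_000)
print(queue.fired)
```

Only the timer that has already expired runs when the device fires; the later one stays queued until a later event.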

The file has four parts. The first part prints the header and is produced by timer_list_header():

Timer List Version: v0.7
HRTIMER_MAX_CLOCK_BASES: 4
now at 52813472596941 nsecs

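The second part is more involved and is produced by print_cpu(): two nested loops print, for each CPU, the state of every timer on each of that CPU's four timer queues (clock bases):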

cpu: 1
 clock 0:
  .base:       ffff88017fc8c760
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: , tick_sched_timer, S:01, tick_nohz_idle_exit, swapper/1/0
 # expires at 52813473000000-52813473000000 nsecs [in 403059 to 403059 nsecs]
 #1: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, kwin/1646
 # expires at 52813475032185-52813475082185 nsecs [in 2435244 to 2485244 nsecs]
 #2: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, Chrome_CacheThr/15866
 # expires at 52813613513070-52813619614069 nsecs [in 140916129 to 147017128 nsecs]
 #3: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, chrome/15834
 # expires at 52813774083871-52813774483870 nsecs [in 301486930 to 301886929 nsecs]
 #4: , watchdog_timer_fn, S:01, watchdog_enable, watchdog/1/26
 # expires at 52816686401055-52816686401055 nsecs [in 3213804114 to 3213804114 nsecs]
 #5: , hrtimer_wakeup, S:01, futex_wait_queue_me, chrome/16029
 # expires at 52817513568326-52817513618326 nsecs [in 4040971385 to 4041021385 nsecs]
 #6: , timerfd_tmrproc, S:01, do_timerfd_settime, systemd-journal/414
 # expires at 52820000000000-52820000000000 nsecs [in 6527403059 to 6527403059 nsecs]
 #7: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, kded4/1552
 # expires at 52821469657613-52821478488612 nsecs [in 7997060672 to 8005891671 nsecs]
 #8: , hrtimer_wakeup, S:01, futex_wait_queue_me, BrowserBlocking/15943
 # expires at 52837007486888-52837007536888 nsecs [in 23534889947 to 23534939947 nsecs]
 #9: , timerfd_tmrproc, S:01, do_timerfd_settime, systemd/1
 # expires at 52840250000000-52840250000000 nsecs [in 26777403059 to 26777403059 nsecs]
 #10: , hrtimer_wakeup, S:01, futex_wait_queue_me, chrome/16827
 # expires at 52850026037451-52850026087451 nsecs [in 36553440510 to 36553490510 nsecs]
 #11: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, master/1116
 # expires at 52851644484709-52851704484708 nsecs [in 38171887768 to 38231887767 nsecs]
 #12: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/147/9520
 # expires at 52987931753150-52987931803150 nsecs [in 174459156209 to 174459206209 nsecs]
 #13: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/149/9546
 # expires at 52987932822056-52987932872056 nsecs [in 174460225115 to 174460275115 nsecs]
 #14: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/48/12843
 # expires at 53002566669134-53002566719134 nsecs [in 189094072193 to 189094122193 nsecs]
 #15: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/97/6284
 # expires at 53003147109047-53003147159047 nsecs [in 189674512106 to 189674562106 nsecs]
 #16: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, qmgr/1118
 # expires at 53091644442566-53091744442566 nsecs [in 278171845625 to 278271845625 nsecs]
 #17: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, NetworkManager/876
 # expires at 53109000969906-53109100969906 nsecs [in 295528372965 to 295628372965 nsecs]
 #18: , it_real_fn, S:01, do_setitimer, qmgr/1118
 # expires at 53124644427060-53124644427060 nsecs [in 311171830119 to 311171830119 nsecs]
 #19: , it_real_fn, S:01, do_setitimer, master/1116
 # expires at 53124644484430-53124644484430 nsecs [in 311171887489 to 311171887489 nsecs]
 #20: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/44/13426
 # expires at 53188257243107-53188257293107 nsecs [in 374784646166 to 374784696166 nsecs]
 #21: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/41/9679
 # expires at 53189846117399-53189846167399 nsecs [in 376373520458 to 376373570458 nsecs]
 #22: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/249/13559
 # expires at 53287021792511-53287021842511 nsecs [in 473549195570 to 473549245570 nsecs]
 #23: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/1175/11753
 # expires at 53404805411154-53404805461154 nsecs [in 591332814213 to 591332864213 nsecs]
 #24: , hrtimer_wakeup, S:01, futex_wait_queue_me, WorkerPool/1524/15244
 # expires at 53404827318575-53404827368575 nsecs [in 591354721634 to 591354771634 nsecs]
 #25: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, Chrome_FileThre/15863
 # expires at 53586342492462-53586442492462 nsecs [in 772869895521 to 772969895521 nsecs]
 #26: , hrtimer_wakeup, S:01, schedule_hrtimeout_range_clock, dhclient/11767
 # expires at 87553216902146-87553316902146 nsecs [in 34739744305205 to 34739844305205 nsecs]
 #27: , hrtimer_wakeup, S:01, futex_wait_queue_me, Chrome_DBThread/15862
 # expires at 89583413156055-89583413206055 nsecs [in 36769940559114 to 36769940609114 nsecs]
 clock 1:
  .base:       ffff88017fc8c7a0
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1445646478326986436 nsecs
active timers:
 #0: , hrtimer_wakeup, S:01, futex_wait_queue_me, QThread/1942
 # expires at 1445699294416776000-1445699294416826000 nsecs [in 1445646480944179059 to 1445646480944229059 nsecs]
 #1: , timerfd_tmrproc, S:01, do_timerfd_settime, systemd/1425
 # expires at 9223372036854775807-9223372036854775807 nsecs [in 9223319223382178866 to 9223319223382178866 nsecs]
 clock 2:
  .base:       ffff88017fc8c7e0
  .index:      2
  .resolution: 1 nsecs
  .get_time:   ktime_get_boottime
  .offset:     29936432888514 nsecs
active timers:
 clock 3:
  .base:       ffff88017fc8c820
  .index:      3
  .resolution: 1 nsecs
  .get_time:   ktime_get_clocktai
  .offset:     1445646478326986436 nsecs
active timers:
  .expires_next   : 52813474000000 nsecs
  .hres_active    : 1
  .nr_events      : 13118249
  .nr_retries     : 1557
  .nr_hangs       : 0
  .max_hang_time  : 0 nsecs
  .nohz_mode      : 2
  .last_tick      : 52813467000000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4347480763
  .idle_calls     : 127885
  .idle_sleeps    : 122866
  .idle_entrytime : 52813471757310 nsecs
  .idle_waketime  : 52813455529583 nsecs
  .idle_exittime  : 52813471757310 nsecs
  .idle_sleeptime : 585808925443 nsecs
  .iowait_sleeptime: 8495380650 nsecs
  .last_jiffies   : 4347480763
  .next_jiffies   : 4347480795
  .idle_expires   : 52813498000000 nsecs
jiffies: 4347480770
......
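The numbers in the "expires at" lines can be cross-checked against the "now at" value in the header. The sketch below uses values copied from the dump above; it assumes the first number in each range is the soft expiry and the second the hard expiry, and that the bracketed values are those expiries minus "now". The regular expression and helper name are my own.

```python
import re

# Values copied from the dump: the header timestamp and the first two timer lines.
now_ns = 52813472596941
lines = [
    "# expires at 52813473000000-52813473000000 nsecs [in 403059 to 403059 nsecs]",
    "# expires at 52813475032185-52813475082185 nsecs [in 2435244 to 2485244 nsecs]",
]

EXPIRES_RE = re.compile(
    r"expires at (\d+)-(\d+) nsecs \[in (-?\d+) to (-?\d+) nsecs\]"
)


def parse_expiry(line):
    """Return (soft_ns, hard_ns, in_soft_ns, in_hard_ns) from one 'expires at' line."""
    match = EXPIRES_RE.search(line)
    if match is None:
        raise ValueError("not an 'expires at' line: %r" % line)
    return tuple(int(group) for group in match.groups())


for line in lines:
    soft, hard, in_soft, in_hard = parse_expiry(line)
    # The bracketed values should equal the expiry times relative to "now".
    consistent = (soft - now_ns == in_soft) and (hard - now_ns == in_hard)
    print("slack:", hard - soft, "ns, consistent with header:", consistent)
```

For the tick_sched_timer entry the range is empty, while the hrtimer_wakeup entry leaves a 50000 ns window between the two expiry values.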

The third part prints information about the tick device responsible for broadcast events, and is produced by timer_list_show_tickdevice_header():

Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 max_delta_ns:   149983013276
 min_delta_ns:   13410
 mult:           61496111
 shift:          32
 mode:           4
 next_event:     9223372036854775807 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
 retries:        0

tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000
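The mult and shift fields can be read as a fixed-point scale factor. A small worked calculation, assuming the clock event convention that a nanosecond delta is converted to device ticks as (ns * mult) >> shift; the function names are mine. Under that assumption the hpet values in the dump imply a device frequency of roughly 14.318 MHz.

```python
def ns_to_ticks(delta_ns, mult, shift):
    """Convert a nanosecond delta to device ticks (assumed clockevents convention)."""
    return (delta_ns * mult) >> shift


def implied_frequency_hz(mult, shift):
    """Ticks per second implied by a mult/shift pair under the same assumption."""
    return mult * 1_000_000_000 / (1 << shift)


# Values copied from the hpet entry in the dump.
hpet_mult, hpet_shift = 61496111, 32
hpet_max_delta_ns = 149983013276

freq_hz = implied_frequency_hz(hpet_mult, hpet_shift)
max_ticks = ns_to_ticks(hpet_max_delta_ns, hpet_mult, hpet_shift)

print("implied hpet frequency: %.3f MHz" % (freq_hz / 1e6))
print("max_delta_ns in ticks:  %d (2**31 - 1 = %d)" % (max_ticks, 2**31 - 1))
```

max_delta_ns converted back to ticks lands very close to 2**31 - 1, which suggests the longest programmable delta is a 31-bit tick count. That is my reading of the numbers, not something the dump states.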

The fourth part prints the tick device on each CPU, and is produced by print_tickdevice():

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: lapic
 max_delta_ns:   103080567906
 min_delta_ns:   1000
 mult:           89477311
 shift:          32
 mode:           3
 next_event:     52813475000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        4

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 max_delta_ns:   103080567906
 min_delta_ns:   1000
 mult:           89477311
 shift:          32
 mode:           3
 next_event:     52813474000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        0

Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 max_delta_ns:   149983013276
 min_delta_ns:   13410
 mult:           61496111
 shift:          32
 mode:           1
 next_event:     9223372036854775807 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
 retries:        0

tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: lapic
 max_delta_ns:   103080567906
 min_delta_ns:   1000
 mult:           89477311
 shift:          32
 mode:           3
 next_event:     40610606442603 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        1

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 max_delta_ns:   103080567906
 min_delta_ns:   1000
 mult:           89477311
 shift:          32
 mode:           3
 next_event:     40610612271380 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        0
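To list which devices show up as clock event devices in a dump like this, a short parser is enough. The sketch below works on an abbreviated sample in the same layout as the output above (most fields dropped for brevity); the function name and the "cpuN"/"broadcast" labels are my own.

```python
# Abbreviated sample in the same layout as the tick device sections above.
SAMPLE = """\
Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 event_handler:  tick_handle_oneshot_broadcast

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: lapic
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 event_handler:  hrtimer_interrupt
"""


def tick_devices(text):
    """Return a list of (role, clock_event_device_name) pairs found in text."""
    devices = []
    role = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "Broadcast device":
            role = "broadcast"
        elif line.startswith("Per CPU device:"):
            role = "cpu" + line.split(":", 1)[1].strip()
        elif line.startswith("Clock Event Device:"):
            devices.append((role, line.split(":", 1)[1].strip()))
    return devices


found = tick_devices(SAMPLE)
print(found)
print("distinct devices:", sorted({name for _, name in found}))
```

On the real file the same function can be fed the contents of /proc/timer_list, which usually needs root to read.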

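Seeing this, something felt odd: where are tsc and acpi_pm? Are these clock sources not wrapped as clock event devices? Back to the code for evidence. First, clock_event_device registration goes through two functions, clockevents_register_device and clockevents_config_and_register, where the latter actually wraps the former. setup_APIC_timer registers lapic. init_one_hpet_msi_clockevent registers hpet in response to CPU plug-in events. clockevent_i8253_init registers the PIT; of course, if hpet is enabled this path is not taken, since the two are mutually exclusive. And that is all. So it is confirmed: tsc and acpi_pm are never registered as clock event devices. Could they be registered directly as tick devices instead? Reading on, tick devices are set up through tick_setup_device, which is mainly reached via clockevents_notify_released when a clock event device is registered. So tsc and acpi_pm can only ever be used as clock sources. Also worth noting: hpet can serve both as a clock source and as a tick device, while lapic can only serve as a tick device.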

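Why? Browsing the tsc code, I found that it cannot be used to drive timers. Put simply, it cannot generate a "tick" through an interrupt; it can only be read for a timestamp through read_tsc, and that is exactly what a clock source is. So how is a clock source used? You can observe it through the clock_gettime system call.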

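clock_gettime -> sys_call -> sys_clock_gettime -> getnstimeofday -> timekeeping_get_ns -> read_tsc -> native_read_tsc

The read_tsc step in this chain is the implementation of the clocksource's read callback, which acts like a virtual function. Switching current_clocksource therefore means switching the implementation behind this callback.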
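The "virtual function" idea can be sketched in a few lines. This is a toy model, not kernel code: the Clocksource and Timekeeper classes and the fake read functions are hypothetical and return fixed counter values. The last part calls the real clock_gettime through Python's time module, which only shows the user-space end of the chain.

```python
import time


class Clocksource:
    """Toy stand-in for struct clocksource: a name plus a read callback."""

    def __init__(self, name, read):
        self.name = name
        self.read = read  # plays the role of the read function pointer


def fake_read_tsc():
    # Hypothetical counter value; the real read_tsc reads the CPU timestamp counter.
    return 1000


def fake_read_hpet():
    # Hypothetical counter value standing in for an HPET counter read.
    return 2000


class Timekeeper:
    """Toy timekeeping core that always reads through the current clocksource."""

    def __init__(self, sources, current):
        self.sources = {cs.name: cs for cs in sources}
        self.current = self.sources[current]

    def switch(self, name):
        # Equivalent in spirit to writing a name into current_clocksource.
        self.current = self.sources[name]

    def get_cycles(self):
        return self.current.read()


tk = Timekeeper(
    [Clocksource("tsc", fake_read_tsc), Clocksource("hpet", fake_read_hpet)],
    current="tsc",
)
print(tk.current.name, tk.get_cycles())
tk.switch("hpet")
print(tk.current.name, tk.get_cycles())

# The user-space end of the chain: two back-to-back monotonic clock reads.
if hasattr(time, "clock_gettime"):
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print("monotonic delta: %.9f s" % (t1 - t0))
```

The caller of get_cycles never changes; only the function behind read does, which is why switching the clock source is invisible to code that just asks for the time.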
