We’ve been banned from taking photos and video here at the first Haswell tech talk at IDF 2012, which means there are bound to be some interesting slides on screen. Here are my notes from the session; they focus on very deep features of the architecture.
I’m live-writing this (it will be posted at the end of the session), so please excuse any grammatical errors. Treat this as a set of notes that won’t be updated, as I’ll need to move on to the next session here at IDF San Francisco.
Remember that Haswell is the next-generation ‘family’ of Core microprocessors, built on 22nm with Tri-Gate technology. This generation is a ‘tock’: a new architecture.
“Retain key features of Sandy Bridge and Ivy Bridge”
Why? A consistent developer environment. Power efficiency needs to extend across all usage scenarios.
Performance, modularity, power innovations:
- 2 to 4 cores across the family
- GT1, GT2 and GT3 graphics
- More than 10 days of connected standby
- 20x lower idle power
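To put the connected-standby figure in perspective, here is a back-of-the-envelope sketch. It assumes a hypothetical 50 Wh battery (my number, not Intel’s) and treats standby time as battery energy divided by average platform power, ignoring self-discharge and wake-ups.

```python
# Back-of-the-envelope standby arithmetic.
# ASSUMPTION: the 50 Wh battery is a hypothetical figure, not from the slides.
BATTERY_WH = 50.0
HOURS_PER_DAY = 24


def max_average_power_w(battery_wh: float, days: float) -> float:
    """Average platform power (W) that drains the battery in exactly `days`."""
    return battery_wh / (days * HOURS_PER_DAY)


def standby_days(battery_wh: float, avg_power_w: float) -> float:
    """Days of standby at a constant average power draw."""
    return battery_wh / avg_power_w / HOURS_PER_DAY


budget = max_average_power_w(BATTERY_WH, 10)
print(f"10 days on {BATTERY_WH:.0f} Wh needs <= {budget * 1000:.0f} mW average")

# A 20x cut in idle power stretches standby 20x, but only if idle power dominates.
old_idle_w = budget * 20  # hypothetical platform drawing 20x that budget
print(f"{standby_days(BATTERY_WH, old_idle_w):.1f} days -> "
      f"{standby_days(BATTERY_WH, old_idle_w / 20):.1f} days")
```

In other words, hitting 10 days on a battery of that size means the whole platform has to average roughly 200 mW, which is why the idle-power work matters so much.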
At the same power level, Haswell achieves twice the graphics performance of Ivy Bridge. Alternatively, it can be dialed back to deliver the same graphics performance as Ivy Bridge at half the power. (We saw 8W in a demo earlier, and 8W was mentioned again here.)
New power state S0ix (active idle):
- Improvements in realizable battery life
- Transparent to well-written software (no work for developers)
- Leverages learning from phone and tablet development (Atom, I assume)
Active power improvements:
- Turbo improvements (a little bit more, better load balancing)
- Finer-grained control of power islands
- Power-optimized CPU-to-PCH link
Idle power notes:
- Optimized delivery of power and power gating
- Added C-states and faster transition times (up to 25%)
- Manufacturing process optimizations.
Embedded controller idle power < 5W
Panel self refresh
Additional interface support (I2C, SDIO, I2S, UART)
New link power management states (USB, SATA, PCI Express)
Microarchitecture / Performance
Verbatim from slide:
- Improved code fetch BW
- Better branch prediction
- Larger OOO windows and corresponding structures
- Increased throughput via 2 new dispatch ports
- Larger L2 TLB
- Lower virtualisation latencies
New compute instructions
Intel Advanced Vector Extensions 2 (AVX2)
[Complex slide passed too quickly] Additional new scalar instructions were detailed.
At this stage, the architecture changes are difficult to translate into end-user advantages. Obviously, speeding up some operations requires software to use the new instructions.
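The slide went by too quickly to capture specifics, but one well-documented headline of AVX2 is that it widens most integer SIMD operations from 128 to 256 bits (AVX on Sandy Bridge brought 256-bit width mainly to floating point). As a rough illustration only, the pure-Python sketch below models what a single 256-bit packed 32-bit add does (the `vpaddd` instruction, exposed in C as the `_mm256_add_epi32` intrinsic): eight independent lanes, each wrapping modulo 2^32. It models the semantics, not the speed.

```python
# Pure-Python model of a 256-bit packed 32-bit integer add (AVX2 vpaddd).
# Illustrates lane semantics only; real code would use compiler intrinsics.
REGISTER_BITS = 256
LANE_BITS = 32
LANES = REGISTER_BITS // LANE_BITS  # 8 lanes of 32 bits each
MASK = (1 << LANE_BITS) - 1


def paddd_256(a: list[int], b: list[int]) -> list[int]:
    """Add two 8-lane vectors lane by lane, wrapping each lane modulo 2**32."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"expected {LANES} lanes per operand")
    return [(x + y) & MASK for x, y in zip(a, b)]


a = [1, 2, 3, 4, 5, 6, 7, 0xFFFFFFFF]
b = [10, 20, 30, 40, 50, 60, 70, 1]
print(paddd_256(a, b))  # → [11, 22, 33, 44, 55, 66, 77, 0] (last lane wraps)
```

On real hardware all eight additions happen in one instruction, which is where the speed-up for code rewritten to use AVX2 comes from.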
- Same cache sizes as Sandy Bridge.
- L1 data cache bandwidth improvement: 96 bytes/cycle (64B read, 32B write)
- L2: 64 bytes/cycle (Sandy Bridge was 32)
These changes clearly help old and new code.
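The per-cycle figures become absolute bandwidth once multiplied by the clock. A quick sketch, assuming a hypothetical 3.0 GHz core clock (no Haswell clocks were given in the session) and that the cache sustains its peak every cycle, which real code won’t:

```python
# Peak cache bandwidth = bytes per cycle x cycles per second.
# ASSUMPTION: 3.0 GHz is a hypothetical clock, not a figure from the talk.
CLOCK_HZ = 3.0e9


def peak_gb_per_s(bytes_per_cycle: int, clock_hz: float = CLOCK_HZ) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes) for one core."""
    return bytes_per_cycle * clock_hz / 1e9


l1_read, l1_write = 64, 32  # from the slide: 64B read + 32B write per cycle
haswell_l2, sandy_bridge_l2 = 64, 32

print(f"L1: {l1_read + l1_write} B/cycle -> "
      f"{peak_gb_per_s(l1_read + l1_write):.0f} GB/s")
print(f"L2: {peak_gb_per_s(sandy_bridge_l2):.0f} GB/s (Sandy Bridge) -> "
      f"{peak_gb_per_s(haswell_l2):.0f} GB/s (Haswell)")
```

At any fixed clock, doubling the L2 bytes per cycle doubles its peak bandwidth, and existing binaries get that without being recompiled.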
Transactional Synchronization Extensions (TSX) were detailed. In essence, TSX helps developers write efficient parallel code.
The 4th generation of integrated graphics adds support for the latest APIs:
- DirectX 11.1
- OpenCL 1.2
- OpenGL 4.0
GT3 SKUs will add a new ‘slice’ of processing engines. GT3 can be either a high-performance or a very efficient option. (I believe GT3 is for Ultrabooks.)
CPU and GPU voltage and frequency are completely decoupled.
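Why decoupling helps: dynamic switching power scales roughly as P ≈ C·V²·f. If two domains share a voltage rail, the rail has to sit at whatever the more demanding domain needs, so a lightly loaded domain pays for the busy one. The sketch below looks only at the voltage half of that, using made-up normalized values (hypothetical capacitances, voltages and frequencies, not Haswell figures) for a GPU-heavy workload.

```python
# Rough model: dynamic power ~ C * V^2 * f per domain.
# ALL values are hypothetical and normalized; they are not Haswell specifications.


def dynamic_power(c: float, volts: float, freq: float) -> float:
    """Relative dynamic switching power: capacitance x voltage squared x frequency."""
    return c * volts ** 2 * freq


# GPU-bound scenario: the GPU needs a high operating point, the CPU only a low one.
cpu = {"c": 1.0, "volts": 0.8, "freq": 0.6}
gpu = {"c": 1.0, "volts": 1.1, "freq": 1.0}

# Decoupled: each domain runs at its own voltage.
decoupled = dynamic_power(**cpu) + dynamic_power(**gpu)

# Shared rail: both domains must sit at the higher of the two voltages.
shared_v = max(cpu["volts"], gpu["volts"])
shared = (dynamic_power(cpu["c"], shared_v, cpu["freq"])
          + dynamic_power(gpu["c"], shared_v, gpu["freq"]))

print(f"shared rail: {shared:.3f}, decoupled: {decoupled:.3f}, "
      f"saving: {100 * (1 - decoupled / shared):.1f}%")
```

Because voltage enters squared, even a modest voltage drop on the lightly loaded domain gives a noticeable saving in this toy model.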
MJPEG decode now in hardware (for lower-power webcam conferencing)
MPEG-2 encode now in hardware (for DVD creation and DLNA streaming)
Higher encoding quality
Standalone video quality engine
Frame rate conversion and image stabilization in hardware
Scalable Quick Sync Video
Power management for Ultrabooks
Scalable Video Coding (SVC) codec: an enabler for multi-party video conferencing
4Kx2K video playback
New features in video processing:
- Gamut Expansion
- Skin Tone Tuned Image Enhancement Filter
- Frame Rate Conversion
- Image Stabilisation
Confirmed: Ultrabooks will have GT3 versions of the graphics engine. One slice can be turned off if needed.