Bug 476 - ati-drivers DMA bug with intel_iommu
Status: CLOSED WONTFIX
Alias: None
Product: AMD Catalyst™ Proprietary Display Driver
Classification: Unclassified
Component: Kernel Module
Version: .archived
Hardware: Radeon HD 6000 Series Linux
Importance: low normal
Assignee: nobody
 
Reported: 2012-04-04 04:12 CDT by cchtml
Modified: 2013-03-16 06:25 CDT



Attachments
dmesg with errors (68.75 KB, text/plain)
2012-04-04 04:13 CDT, cchtml

Description cchtml 2012-04-04 04:12:28 CDT
Description of problem: 
On my HP EliteBook 8560p, the ATI drivers fail with a DMA error, leaving Xorg in an unkillable state.

I tested this with kernels 3.3.0-rc7 and 3.2.1-gentoo, and with ATI drivers 12.3 and 12.2.

Steps to reproduce:
1. Install ati-drivers 12.3 (or 12.2; I did not try older ones).
2. Make sure intel_iommu is enabled at boot or built into the kernel.
3. Start X.

Actual result: 
Unkillable Xorg:
root      3798 99.0  0.1  81152 12844 tty7     Rs+  10:24   2:03 /usr/bin/Xorg :0 -br -verbose -logverbose 7 -auth /var/run/gdm/auth-for-gdm-Gqu8wg/database -nolisten tcp vt7

Expected result: 
Working X.

Workarounds: 
 - disable intel_iommu in the kernel config, or
 - pass "intel_iommu=off" on the kernel command line.
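
To confirm which of these states a running system is actually in, the kernel command line can be inspected. The following is only a minimal sketch for a POSIX shell: the function name intel_iommu_cmdline_state is made up for illustration, and it only looks at the string it is given (normally the contents of /proc/cmdline).

```shell
# Hypothetical helper (not part of any driver or distro tooling):
# report the intel_iommu= setting found in a kernel command line string.
# Prints "off", "on", or "unset".
intel_iommu_cmdline_state() {
    # Pad with spaces so the parameter is matched as a whole word start.
    # Simplification: if the parameter appears more than once, "off" wins.
    case " $1 " in
        *" intel_iommu=off"*) echo "off" ;;
        *" intel_iommu=on"*)  echo "on" ;;
        *)                    echo "unset" ;;
    esac
}

# Usage on a live system:
# intel_iommu_cmdline_state "$(cat /proc/cmdline)"
```

When the result is "unset", the kernel's built-in default applies, so a kernel configured with CONFIG_INTEL_IOMMU_DEFAULT_ON=y would still have the IOMMU active.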

Relevant dmesg output (see attachment for full dmesg):
[   36.563299] [fglrx] ATIF platform detected with notification ID: 0xd0
[   36.812625] fglrx_pci 0000:01:00.0: irq 60 for MSI/MSI-X
[   36.813111] [fglrx] Firegl kernel thread PID: 3922
[   36.813287] [fglrx] Firegl kernel thread PID: 3923
[   36.813472] [fglrx] Firegl kernel thread PID: 3924
[   36.813612] [fglrx] IRQ 60 Enabled
[   36.990811] [fglrx] Gart USWC size:1280 M.
[   36.990813] [fglrx] Gart cacheable size:508 M.
[   36.990816] [fglrx] Reserved FB block: Shared offset:0, size:1000000 
[   36.990817] [fglrx] Reserved FB block: Unshared offset:f8fd000, size:403000 
[   36.990819] [fglrx] Reserved FB block: Unshared offset:3fff4000, size:c000 
[   37.000546] DRHD: handling fault status reg 3
[   37.000550] DMAR:[DMA Read] Request device [01:00.0] fault addr 22485e000 
[   37.000551] DMAR:[fault reason 02] Present bit in context entry is clear
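
The DRHD/DMAR lines above are the part of the log that ties the hang to the IOMMU. To pull just those lines out of a long dmesg, a filter like the following could be used. This is a sketch only; the function name dmar_fault_lines is hypothetical and the patterns are taken from the message format shown in this report, which may differ on other kernel versions.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# DMA-remapping fault lines ("DRHD: handling fault ..." and "DMAR:[...]").
dmar_fault_lines() {
    grep -E 'DRHD: handling fault|DMAR:\['
}

# Usage on a live system:
# dmesg | dmar_fault_lines
```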

After some waiting, I get this oops:
[  216.732920] [fglrx] ASIC hang happened
[  216.732923] Pid: 3798, comm: Xorg Tainted: P           O 3.2.12-gentoo #1
[  216.732924] Call Trace:
[  216.732947]  [<ffffffffa02c1719>] KCL_DEBUG_OsDump+0x9/0x10 [fglrx]
[  216.732964]  [<ffffffffa02ceacc>] firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
[  216.732993]  [<ffffffffa0369919>] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
[  216.733022]  [<ffffffffa03698bc>] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x9c/0xf0 [fglrx]
[  216.733050]  [<ffffffffa03644be>] ? _ZN15ExecutableUnits10CPRingIdleE15idle_WaitMethod12_QS_CP_RING_+0x11e/0x1e0 [fglrx]
[  216.733078]  [<ffffffffa036434c>] ? _ZN15ExecutableUnits7PM4idleE15idle_WaitMethod+0x4c/0x90 [fglrx]
[  216.733105]  [<ffffffffa0363eb6>] ? _ZN15ExecutableUnits9assertPM4Eb+0x56/0x70 [fglrx]
[  216.733133]  [<ffffffffa036e1b9>] ? _ZN8AsicR6009assertPM4Eb+0x39/0x80 [fglrx]
[  216.733158]  [<ffffffffa033c983>] ? CMMQS_Initialize_WA+0x183/0x1b0 [fglrx]
[  216.733177]  [<ffffffffa02ee3c2>] ? firegl_cmmqs_init+0x642/0xb80 [fglrx]
[  216.733193]  [<ffffffffa02d14d4>] ? firegl_init_iommu+0x94/0x170 [fglrx]
[  216.733211]  [<ffffffffa02ed616>] ? firegl_cmmqs_createdriver+0x96/0x1a0 [fglrx]
[  216.733214]  [<ffffffff810582c2>] ? capable+0x12/0x20
[  216.733232]  [<ffffffffa02ed580>] ? firegl_uvd_destroy+0x4e0/0x4e0 [fglrx]
[  216.733247]  [<ffffffffa02ca62d>] ? firegl_ioctl+0x1ed/0xf30 [fglrx]
[  216.733256]  [<ffffffffa02bba29>] ? ip_firegl_unlocked_ioctl+0x9/0x10 [fglrx]
[  216.733259]  [<ffffffff81117a4e>] ? do_vfs_ioctl+0x8e/0x500
[  216.733261]  [<ffffffff81106be0>] ? vfs_write+0x120/0x160
[  216.733263]  [<ffffffff81117f0a>] ? sys_ioctl+0x4a/0x80
[  216.733266]  [<ffffffff81401cbb>] ? system_call_fastpath+0x16/0x1b
[  216.733269] pubdev:0xffffffffa055ce40, num of device:1 , name:fglrx, major 8, minor 95. 
[  216.733270] device 0 : 0xffff88022f370000 .
[  216.733272] Asic ID:0x6760, revision:0x3c, MMIOReg:0xffffc900118c0000.
[  216.733273] FB phys addr: 0xc0000000, MC :0xf00000000, Total FB size :0x40000000.
[  216.733275] gart table MC:0xf0f8fd000, Physical:0xcf8fd000, size:0x402000.
[  216.733277] mc_node :FB, total 1 zones
[  216.733278]     MC start:0xf00000000, Physical:0xc0000000, size:0xfd00000.
[  216.733279]     Mapped heap -- Offset:0x0, size:0xf8fd000, reference count:1, mapping count:0,
[  216.733281]     Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
[  216.733282]     Mapped heap -- Offset:0xf8fd000, size:0x403000, reference count:1, mapping count:0,
[  216.733284] mc_node :INV_FB, total 1 zones
[  216.733285]     MC start:0xf0fd00000, Physical:0xcfd00000, size:0x30300000.
[  216.733286]     Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
[  216.733288] mc_node :GART_USWC, total 3 zones
[  216.733289]     MC start:0x40100000, Physical:0x0, size:0x50000000.
[  216.733290]     Mapped heap -- Offset:0x0, size:0x2000000, reference count:1, mapping count:0,
[  216.733292] mc_node :GART_CACHEABLE, total 3 zones
[  216.733293]     MC start:0x10400000, Physical:0x0, size:0x2fd00000.
[  216.733294]     Mapped heap -- Offset:0x0, size:0x200000, reference count:1, mapping count:0,
[  216.733296]     Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
[  216.733298] GRBM : 0xa0003828, SRBM : 0x200000c0 .
[  216.733300] CP_RB_BASE : 0x401000, CP_RB_RPTR : 0x10 , CP_RB_WPTR :0x10.
[  216.733302] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x0.
[  216.733304] last submit IB buffer -- MC :0x0. Can't found mapped physical page for this MC .
[  216.733305] Dump the trace queue.
[  216.733306] End of dump

Hardware info:
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Seymour [Radeon HD 6400M Series] (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Device 161a
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 58
	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at d4400000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at 4000 [size=256]
	Expansion ROM at d4440000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00478  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: radeon
	Kernel modules: fglrx, radeon
Comment 1 cchtml 2012-04-04 04:13:27 CDT
Created attachment 419
dmesg with errors
Comment 2 sascha.krissler 2012-05-17 15:45:47 CDT
(In reply to comment #0)

The problem seems to be in the fglrx kernel module, in firegl_public.c:
#if defined(CONFIG_AMD_IOMMU) || defined(CONFIG_DMAR)
    #define FIREGL_DMA_REMAPPING
#endif

If you #define FIREGL_DMA_REMAPPING unconditionally, the problem disappears,
which suggests that CONFIG_DMAR was renamed to something else on newer kernels.
Comment 3 Michael Cronenworth 2013-01-19 11:00:45 CST
This message is a reminder that your bug is marked as Catalyst 12.3.

The current Catalyst version is 13.1.

Approximately 7 days from now the Bugzilla administrator will be removing the
12.3 version. At that time your bug will be CLOSED as WONTFIX.

Bug Reporter: Thank you for reporting this issue. However, the Bugzilla
administrator provides this as an unofficial, free service to AMD customers, and
I like to keep my systems neat and tidy. If you would like to keep your bug
from being closed, please try a new Catalyst version and update the 'version'
field if the issue still occurs.

If you are unable to update the version, please make a comment and someone will
change it for you.
Comment 4 Michael Cronenworth 2013-01-26 10:49:58 CST
This bug is being closed because the 'version' is still 12.3, 7 days after the
previous closure notice.

Thank you for your bug report.
Comment 5 John Newman 2013-03-16 06:12:28 CDT
Hi,

This actually still occurs with 13.1, with the exact same trace as above, when intel_iommu=on. I've tried with the stable 3.7.10 kernel and also the latest from git, with the same results. With intel_iommu=off the trace does not occur and X starts normally.

With it set to on, when X fails I am able to use alt+sysrq+k && ctrl+alt+f2 to get back to a console; perhaps the OP doesn't know about that trick. However, at that point it is _impossible_ (seriously!) to unload the fglrx module. And on shutdown the / filesystem does not unmount cleanly because it is stuck in use.

This is on Gentoo Linux with an ASUS 7870.

Part of what drew me to ATI cards over one of your competitors is that you support VT-d passthrough much, much better. :-) This bug unfortunately blocks that from being used if you want the same type of card for dom0 along with another one for domU.


Did you see the possible small fix at the bottom of comment #2?  


He says if you just define the symbol unconditionally it works. Well, as he also alluded to, perhaps the condition just needs to be corrected a bit:

~ $ zcat /proc/config.gz | grep DMAR
CONFIG_DMAR_TABLE=y

~ $ zcat /proc/config.gz | grep IOMMU
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# CONFIG_AMD_IOMMU is not set
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set


Perhaps it should be:

#if defined(CONFIG_AMD_IOMMU) || defined(CONFIG_DMAR_TABLE) || defined(CONFIG_INTEL_IOMMU)
    #define FIREGL_DMA_REMAPPING
#endif
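
To check the old and the proposed condition against a given kernel config without rebuilding the module, something like the following could be used. It is a rough sketch: config_defines_any is a made-up helper name, it reads a .config-style text on stdin, and it treats a symbol as defined only when it is set to y (which is how defined(CONFIG_...) behaves for built-in options).

```shell
# Hypothetical helper: read a kernel .config on stdin and report whether
# at least one of the CONFIG_ symbols given as arguments is set to "y".
# Prints "defined" or "not defined".
config_defines_any() {
    pattern=""
    # Join the symbol names into an alternation: SYM1|SYM2|...
    for sym in "$@"; do
        pattern="${pattern:+$pattern|}$sym"
    done
    # Anchor on both ends so CONFIG_DMAR does not match CONFIG_DMAR_TABLE.
    if grep -Eq "^(${pattern})=y\$"; then
        echo "defined"
    else
        echo "not defined"
    fi
}

# Usage on a live system:
# zcat /proc/config.gz | config_defines_any CONFIG_AMD_IOMMU CONFIG_DMAR
# zcat /proc/config.gz | config_defines_any CONFIG_AMD_IOMMU CONFIG_DMAR_TABLE CONFIG_INTEL_IOMMU
```

With the config shown above, the first call should report "not defined" (neither CONFIG_AMD_IOMMU nor CONFIG_DMAR is set) and the second should report "defined".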


Seems reasonable in scope to try, right? I will probably just try setting CONFIG_AMD_IOMMU=y, which this system does not otherwise need, and see whether that works. I will report back if that's a possible workaround. :-)

Thanks!
Comment 6 John Newman 2013-03-16 06:25:09 CDT
Yeah, that's all it took: just setting CONFIG_AMD_IOMMU=y is enough to make the code define FIREGL_DMA_REMAPPING, which fixes the trace. That's an OK workaround; it just adds something unnecessary to the kernel, and I don't think it would cause any side effects.

So it's safe to say that the symbols in your condition there just need a once-over. Everything is all happy now.

~ $ dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 000000003cdabc50 00128 (v01 A M I   OEMDMAR 00000001 INTL 00000001)
[    0.223934] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020ff
[    0.224454] dmar: IOMMU 1: reg_base_addr b7ffc000 ver 1:0 cap d2078c106f0462 ecap f020ff
[    0.225689] IOAPIC id 3 under DRHD base  0xfbffe000 IOMMU 0
[    0.225877] IOAPIC id 0 under DRHD base  0xb7ffc000 IOMMU 1
[    0.226064] IOAPIC id 2 under DRHD base  0xb7ffc000 IOMMU 1
[    1.284879] IOMMU 0 0xfbffe000: using Queued invalidation
[    1.285072] IOMMU 1 0xb7ffc000: using Queued invalidation
[    1.285280] IOMMU: Setting RMRR:
[    1.285485] IOMMU: Setting identity map for device 0000:00:1d.0 [0x3cccf000 - 0x3ccf3fff]
[    1.285829] IOMMU: Setting identity map for device 0000:00:1a.0 [0x3cccf000 - 0x3ccf3fff]
[    1.286158] IOMMU: Prepare 0-16MiB unity mapping for LPC
[    1.286366] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
~ $ lsmod | grep fglrx
fglrx                5001311  135 
~ $ dmesg | tail
[  131.084412] fglrx_pci 0000:04:00.0: irq 129 for MSI/MSI-X
[  131.085859] <6>[fglrx] Firegl kernel thread PID: 10754
[  131.086155] <6>[fglrx] Firegl kernel thread PID: 10755
[  131.086467] <6>[fglrx] Firegl kernel thread PID: 10756
[  131.086657] <6>[fglrx] IRQ 129 Enabled
[  131.104935] <6>[fglrx] Gart USWC size:1280 M.
[  131.104945] <6>[fglrx] Gart cacheable size:508 M.
[  131.104948] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
[  131.104949] <6>[fglrx] Reserved FB block: Unshared offset:f8fc000, size:404000 
[  131.104951] <6>[fglrx] Reserved FB block: Unshared offset:7ffef000, size:11000 
~ $ glxinfo | head
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: ATI
server glx version string: 1.4
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_OML_swap_method, 
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGIS_multisample, 
    GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group