SATA drives or chipset throwing DRDY ERR and ICRC ABRT


I have an SD-VIA-1A2S PCI card with two SATA ports (and one ATA-133 port that isn’t used). Two new Western Digital Caviar Green drives (WD10EARS, 1 TB) throw repeated errors in kern.log (date/time/host info removed for brevity):

[    7.376475] ata2.00: exception Emask 0x12 SAct 0x0 SErr 0x1000500 action 0x6
[    7.376480] ata2.00: BMDMA stat 0x5
[    7.376483] ata2: SError: { UnrecovData Proto TrStaTrns }
[    7.376489] ata2.00: cmd c8/00:40:20:00:00/00:00:00:00:00/e0 tag 0 dma 32768 in
[    7.376490]          res 51/84:2f:20:00:00/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
[    7.376493] ata2.00: status: { DRDY ERR }
[    7.376495] ata2.00: error: { ICRC ABRT }
[    7.376504] ata2: hard resetting link
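
For readers unfamiliar with these messages, here is a minimal Python sketch that decodes the hex values in the log above. In the `res 51/84:...` field the first byte is the ATA status register and the second is the error register; the bit names below are the ones libata prints, as far as I know. ICRC is an interface CRC error during a DMA transfer, which is why this kind of error tends to point at the cable or controller rather than at the disk surface. The function name `decode` is just an illustration.

```python
# Bit names as printed by the Linux libata error handler (to my knowledge).
# Status and error tables are listed highest bit first, SError lowest bit
# first, so the output order matches the kernel log lines.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}
SERR_BITS = {
    1 << 0: "RecovData", 1 << 1: "RecovComm", 1 << 8: "UnrecovData",
    1 << 9: "Persist", 1 << 10: "Proto", 1 << 11: "HostInt",
    1 << 16: "PHYRdyChg", 1 << 17: "PHYInt", 1 << 18: "CommWake",
    1 << 19: "10B8BErr", 1 << 20: "Dispar", 1 << 21: "BadCRC",
    1 << 22: "Handshk", 1 << 23: "LinkSeq", 1 << 24: "TrStaTrns",
    1 << 25: "UnrecFIS", 1 << 26: "DevExch",
}


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    print("status:", decode(0x51, STATUS_BITS))    # from "res 51/84"
    print("error: ", decode(0x84, ERROR_BITS))     # from "res 51/84"
    print("SError:", decode(0x1000500, SERR_BITS)) # from "SErr 0x1000500"
```

Run against the values in the log, this gives the same names the kernel printed (DRDY ERR, ICRC ABRT, and UnrecovData Proto TrStaTrns).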

I’m using Ubuntu 9.04 (kernel 2.6.28-18-generic), though I have also tried live CDs of Ubuntu 9.10, Fedora 12 and OpenSUSE 11.2, all running various 2.6.31 kernels, and all produced the same error.

I have tested these drives and this card in two other machines in various combinations, with the drives connected either directly to the motherboard or to the add-in card, and I’m relatively convinced that the VIA chipset is the problem. Another computer that also has an onboard VIA SATA chipset (like the add-in card) produces the same errors when the drives are connected directly to that motherboard. I have been able to verify that the drives themselves are perfectly good, and I have tried everything I can think of: swapping cables, making sure the PSU isn’t overloaded, etc.

The error happens on boot once or twice, after using fdisk on the drive once or twice, and constantly when attempting to sync a new mdadm raid 1 array created on the two drives.
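
To compare how often the error fires in each of these situations, something like the following sketch could tally the error flags per device from kern.log lines. The regular expression is an assumption based only on the log format shown above, and `tally_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "ata2.00: error: { ICRC ABRT }" anywhere in a line, so any
# date/host prefix added by syslog is ignored.
ERROR_LINE = re.compile(r"(ata\d+(?:\.\d+)?): error: \{ ([^}]*)\}")


def tally_errors(lines):
    """Count (device, flag) pairs over an iterable of kernel log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            device = match.group(1)
            for flag in match.group(2).split():
                counts[(device, flag)] += 1
    return counts


if __name__ == "__main__":
    # Two lines taken from the log in the question.
    sample = [
        "[    7.376495] ata2.00: error: { ICRC ABRT }",
        "[    7.376504] ata2: hard resetting link",
    ]
    for (device, flag), n in sorted(tally_errors(sample).items()):
        print(device, flag, n)
```

Passing an open file object for /var/log/kern.log instead of `sample` would work the same way, since the function only iterates over lines.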

Any thoughts on where to go from here, driver- or kernel-wise?

I’m completely open to buying a new PCI add-in card if someone can recommend one with two internal SATA ports that works well in Debian/Ubuntu.

Thanks!

How to solve:


Method 1

I can recommend the Promise and Silicon Image chipsets as alternatives to the VIA. I’m currently using a PCI adapter with a SiI-3124 chipset and haven’t had any trouble with it.

I’ve had good experiences with earlier IDE chipsets from both manufacturers, but haven’t yet had occasion to test out a Promise SATA chip. I highly recommend getting away from the VIA chip; I’ve dealt with lots of flaky VIA chips and I prefer to avoid them when possible.

Method 2

I know this is a bit old, but I hit this issue on a new machine I’m building, and the cause turned out to be a BIOS setting. Here was my original error:

[  595.535123] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[  595.535127] ata2.00: BMDMA stat 0x64
[  595.535132] ata2.00: failed command: WRITE DMA EXT
[  595.535140] ata2.00: cmd 35/00:00:08:3c:11/00:02:00:00:00/e0 tag 0 dma 262144 out
[  595.535145] ata2.00: status: { DRDY ERR }
[  595.535147] ata2.00: error: { ICRC ABRT }
[  595.535182] ata2: soft resetting link

I had turned on a BIOS option that puts the two ‘main’ (0/1) SATA ports into IDE mode, or something of that sort, and it had somehow disrupted bus communication with the other non-SATA or secondary devices on the bus. I know the description here is a bit vague, but on some motherboards it’s difficult to tell which ports are primary/secondary and which bus is associated with which.

I can just say that turning the option back, so that all 6 of my onboard SATA ports were AHCI, made my errors go away immediately. Before, the errors were readily reproducible in bulk when running bonnie or iozone; after the change, both benchmarks run without errors and complete in half an hour instead of 2-3 hours.
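
One way to confirm from Linux which mode the controller actually ended up in is to look at which kernel driver is bound to it. The sketch below lists the PCI devices bound to a given driver by reading the sysfs directory /sys/bus/pci/drivers/<driver>. The driver names in the loop are examples only (ahci for AHCI mode, ata_piix for Intel controllers in IDE mode, sata_via and pata_via for VIA parts); which one applies depends on the hardware, and `devices_bound_to` is a hypothetical helper name.

```python
import os
import re

# PCI addresses in sysfs look like "0000:00:1f.2" (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_bound_to(driver, sysfs_root="/sys"):
    """Return the PCI addresses currently bound to the named kernel driver."""
    driver_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    try:
        entries = os.listdir(driver_dir)
    except FileNotFoundError:
        return []  # driver not loaded, so nothing is bound to it
    # The directory also holds control files such as "bind" and "unbind";
    # keep only the entries that look like PCI addresses.
    return sorted(e for e in entries if PCI_ADDR.match(e))


if __name__ == "__main__":
    for driver in ("ahci", "ata_piix", "sata_via", "pata_via"):
        print(driver, devices_bound_to(driver))
```

If the SATA controller shows up under ahci after the BIOS change, the ports are being driven in AHCI mode.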

Method 3

DRDY ERR messages appear to have been reported as a kernel bug on many systems, mostly on Ubuntu and to a lesser extent on Debian. I am investigating this because it has recently started happening to me. I would recommend the following (you will need a bootable CD for some of these steps, and possibly for all of them if the disk problems are bad enough; the Ubuntu desktop install CD works well and does not make you install anything):

  1. Put “options libata noacpi=1” in /etc/modprobe.d/options.conf
  2. Run “e2fsck -f -c -v /dev/sda1”, replacing /dev/sda1 with each partition that is causing the error. As far as I know, e2fsck needs a partition with a file system on it, so this probably won’t work on the whole disk. Even if it does work on the whole disk, you still need to run it on the partitions. You need a bootable CD for this.
  3. Edit the file /boot/grub/menu.lst and add “noapic” to the end of the line that starts with “# kopt”. The # at the start is important: update-grub reads this line even though it looks like a comment. Do not remove the #.
  4. This does not affect the disk, but if you change “splash” to “nosplash” and remove the word “quiet” on the line in /boot/grub/menu.lst that starts with “# defoptions”, Ubuntu will boot without the splash image and show more verbose output instead.
  5. On Ubuntu, after you change anything inside /boot/grub/menu.lst you must run /usr/sbin/update-grub
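
Steps 3 and 4 can also be sketched as text transformations, which makes it easier to see exactly what changes on each line. This is only an illustration under assumptions: the function names are hypothetical, the sample menu.lst lines are made up for the example, and the code works on a string rather than touching /boot/grub/menu.lst. After editing the real file you would still run update-grub as in step 5.

```python
def add_kernel_option(menu_lst_text, option="noapic"):
    """Append `option` to the '# kopt=' line unless it is already present."""
    out = []
    for line in menu_lst_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if body.startswith("# kopt=") and option not in body.split("=", 1)[1].split():
            body = body.rstrip() + " " + option
        out.append(body + ending)
    return "".join(out)


def make_boot_verbose(menu_lst_text):
    """On the '# defoptions=' line, drop 'quiet' and turn 'splash' into 'nosplash'."""
    out = []
    for line in menu_lst_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if body.startswith("# defoptions="):
            key, _, value = body.partition("=")
            words = ["nosplash" if w == "splash" else w
                     for w in value.split() if w != "quiet"]
            body = key + "=" + " ".join(words)
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    # Made-up sample lines, only to show the effect of the two functions.
    sample = "# kopt=root=/dev/sda1 ro\n# defoptions=quiet splash\n"
    print(make_boot_verbose(add_kernel_option(sample)), end="")
```

Both functions leave the leading # in place, and `add_kernel_option` does nothing if the option is already on the line, so running it twice is harmless.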

Method 4

I just had a similar experience to the previous late-poster. I have a Dell OptiPlex 9020 which came with 2 drives in a mirrored RAID configuration. I decided to break the mirroring and use the two drives as separate drives. So I reconfigured the setup of the RAID controller to see the two disks as two Non-RAID disks. Rebooted and everything was as expected. Except that I started getting the above mentioned errors. But it was very random and flaky.

Finally, tonight I came across this thread and figured it out. I went into the BIOS setup (which is entirely separate from the RAID controller setup) and saw that I still had the interface set to “RAID” instead of “AHCI”. As soon as I switched it to AHCI and rebooted, the system booted much, much faster than ever before and, best of all, with no errors.

Yes!

Method 5

I changed to AHCI in my BIOS, but that didn’t help. When I then checked my partition table, gdisk reported that the GPT was damaged:

$ sudo gdisk -l /dev/sda
[sudo] password for dan: 
GPT fdisk (gdisk) version 0.8.4

Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sda: 625140335 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4FF348B9-D041-49A6-AD98-18C15F055F2D
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34       625142414   298.1 GiB   0700  

Then I just typed w in gdisk to write the GPT again.
Finally I rebooted my system.

And now it’s working like a charm!
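
One detail worth noting in the gdisk listing above: the last usable sector (625142414) is higher than the number of sectors the disk reports (625140335), so the table describes a slightly larger disk than the one the kernel sees. That would be consistent with the invalid backup header, since the backup GPT header lives at the end of the disk. A small sketch of that consistency check, parsing only the two lines of gdisk output shown above (the function name is hypothetical):

```python
import re


def gpt_overruns_disk(gdisk_output):
    """Return True if gdisk's 'last usable sector' is not below the disk's
    sector count, False if it is, and None if either number is missing."""
    size = re.search(r"^Disk \S+: (\d+) sectors", gdisk_output, re.MULTILINE)
    last = re.search(r"last usable sector is (\d+)", gdisk_output)
    if not size or not last:
        return None
    return int(last.group(1)) >= int(size.group(1))


if __name__ == "__main__":
    # Two lines copied from the gdisk listing in this answer.
    listing = (
        "Disk /dev/sda: 625140335 sectors, 298.1 GiB\n"
        "First usable sector is 34, last usable sector is 625142414\n"
    )
    print(gpt_overruns_disk(listing))
```

For the listing in this answer the check returns True.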


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
