Deadlock under load with Linux 5.9 and other recent kernels

Christian Hewitt christianshewitt at gmail.com
Mon Sep 28 09:36:44 EDT 2020


> On 28 Sep 2020, at 3:06 pm, Patrik Nilsson <nipatriknilsson at gmail.com> wrote:
> 
> Hi!
> 
> To me this bug description looks very similar to what I'm struggling with on an amd64 platform.
> 
> When too much data is sent over USB, it seems that the USB control message is delayed, so it times out and the block device is unmounted.
> 
> I have been working on my related bug for a long time, trying to make it easily reproducible, but without success. It is there all the time. New hardware is on its way so I can continue my testing.
> 
> Maybe you can test the patch I'm using to see if it works better for you?
> 
> In the meanwhile here is my description of my bug:
> 
>> I have stress-tested the USB system. Seven mechanical hard disks and two SSDs are now connected over USB. Six processes are writing random data to the disks at the same time. One of them writes to the SSD that I previously could not write to without it failing. The other USB SSD holds my root partition.
>> 
>> Before I applied the patch, my root partition sometimes failed to be kept mounted. Now I have not had any crashes.
>> 
>> This is a quick fix for hard disks, but it works. It continued to work when I started three VirtualBox guests and let them do work as well. The guests' hard disks are on my USB root partition.
>> 
>> It doesn't work if I also use my USB-to-Ethernet adapter (ID 2001:4a00 D-Link Corp.), although my root partition and two random-write tests survived. Maybe a much larger timeout would help in this case? But I don't consider that a good solution.
>> 
>> The behavior is the same on the other (much slower) computer with a different usb hub. I have also tested it with exactly the same setup as earlier, with no mechanical hard disks, and it works with the patch and not without it.
> 
> Best regards,
> Patrik
> 
> ---start of diff---
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 5b768b80d1ee..3c550934815c 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -105,7 +105,7 @@ MODULE_PARM_DESC(use_both_schemes,
>  DECLARE_RWSEM(ehci_cf_port_reset_rwsem);
>  EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem);
> 
> -#define HUB_DEBOUNCE_TIMEOUT    2000
> +#define HUB_DEBOUNCE_TIMEOUT    10000
>  #define HUB_DEBOUNCE_STEP      25
>  #define HUB_DEBOUNCE_STABLE     100
> 
> diff --git a/include/linux/usb.h b/include/linux/usb.h
> index 20c555db4621..e64d441bb78f 100644
> --- a/include/linux/usb.h
> +++ b/include/linux/usb.h
> @@ -1841,8 +1841,8 @@ extern int usb_set_configuration(struct usb_device *dev, int configuration);
>   * USB identifies 5 second timeouts, maybe more in a few cases, and a few
>   * slow devices (like some MGE Ellipse UPSes) actually push that limit.
>   */
> -#define USB_CTRL_GET_TIMEOUT    5000
> -#define USB_CTRL_SET_TIMEOUT    5000
> +#define USB_CTRL_GET_TIMEOUT    10000
> +#define USB_CTRL_SET_TIMEOUT    10000
> 
> 
>  /**
> ---end of diff---

No obvious change with this patch applied. Here’s the output: https://pastebin.com/raw/ZMgwNqgm
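
For anyone who wants to put a comparable load on their own setup, the pattern in the quoted report (several processes writing random data to different disks at the same time) can be approximated with the rough sketch below. It is not the tool used in the report: the function names, the chunk size and the xorshift generator are illustrative assumptions, and the target paths are whatever files you choose on the disks under test.

```c
/* Sketch only: parallel random-write load, one child process per target file.
 * Names, chunk size and the PRNG are illustrative assumptions. */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK_SIZE 65536

/* Fill buf with pseudo-random bytes from a small xorshift generator. */
void fill_random(unsigned char *buf, size_t len, uint32_t *state)
{
    for (size_t i = 0; i < len; i++) {
        *state ^= *state << 13;
        *state ^= *state >> 17;
        *state ^= *state << 5;
        buf[i] = (unsigned char)(*state & 0xff);
    }
}

/* Write `total` bytes of pseudo-random data to `path`; return 0 on success. */
int write_random(const char *path, size_t total, uint32_t seed)
{
    static unsigned char buf[CHUNK_SIZE];
    uint32_t state = seed ? seed : 1u;   /* xorshift state must be nonzero */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < total) {
        size_t want = total - done < CHUNK_SIZE ? total - done : CHUNK_SIZE;
        fill_random(buf, want, &state);
        size_t off = 0;
        while (off < want) {             /* handle short writes */
            ssize_t n = write(fd, buf + off, want - off);
            if (n < 0) {
                close(fd);
                return -1;
            }
            off += (size_t)n;
        }
        done += want;
    }
    /* fsync pushes the data to the device, so I/O errors show up here. */
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Fork one writer per path, wait for all of them,
 * and return the number of writers that failed. */
int run_writers(const char *const *paths, int count, size_t total)
{
    assert(paths != NULL && count >= 0);
    int failed = 0;
    int started = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0) {                   /* could not start this writer */
            failed++;
            continue;
        }
        if (pid == 0)                    /* child: write, then exit */
            _exit(write_random(paths[i], total, (uint32_t)(i + 1)) == 0 ? 0 : 1);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return failed;
}
```

Calling run_writers() with one path per disk and a large byte count gives one writer per device; a nonzero return value means at least one writer hit an error, for example because the device went away mid-write.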

Christian

