MityDSP-L138F Boot issue, I2C0 (PCB Dev Post)

Added by Jon Cox over 1 year ago

Hello,

We are developing on the MityDSP-L138F platform and have noticed an intermittent issue with the I2C bus hanging due to SDA being held LOW. We are uncertain whether it is a slave or the OMAP I2C Module pulling SDA low.

We are using I2C0 (soon to switch everything to I2C1). The issue can be faithfully reproduced if we lower the pull-up resistors below ~1.2kOhm or above a large value (timing, VIH VIL issues). However, it does seem to still occur at random. Shorting SDA to HIGH (5V) resolves the issue and the bus continues operation.

Our question: is there a way to either soft reset the I2C Module via the ICMDR IRS bit == 0, or to manually send 9 CLK pulses on SCLK? We have perused the board support package but cannot see a clear way to directly command the I2C SCLK pin. Is there a simple way to MUX the pin to GPIO and toggle it 9 times? We have also found the ICMDR and ICIVR byte addresses (0x01C2 2024, 0x01C2 2028) but we're unsure how to directly read/write to these.

Thanks for the support, and let us know if we can provide any additional info regarding this issue.

Best,

Jon Cox
Genus IntelliGen


Replies (20)

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

Jon Cox wrote:

> Hello,
>
> We are developing on the MityDSP-L138F platform and have noticed an intermittent issue with the I2C bus hanging due to SDA being held LOW. We are uncertain whether it is a slave or the OMAP I2C Module pulling SDA low.

This is almost certainly an i2c slave holding SDA low. I know on a separate project, I've seen particular issues with i2c slaves that use 16-bit addresses, but really any i2c slave could misbehave and hold a bus hostage.

> We are using I2C0 (soon to switch everything to I2C1). The issue can be faithfully reproduced if we lower the pull-up resistors below ~1.2kOhm or above a large value (timing, VIH VIL issues). However, it does seem to still occur at random. Shorting SDA to HIGH (5V) resolves the issue and the bus continues operation.
>
> Our question: is there a way to either soft reset the I2C Module via the ICMDR IRS bit == 0, or to manually send 9 CLK pulses on SCLK? We have perused the board support package but cannot see a clear way to directly command the I2C SCLK pin. Is there a simple way to MUX the pin to GPIO and toggle it 9 times? We have also found the ICMDR and ICIVR byte addresses (0x01C2 2024, 0x01C2 2028) but we're unsure how to directly read/write to these.

What kernel are you using?

The 3.2 kernel appears to already have i2c bus recovery supported for the i2c-davinci driver.
https://support.criticallink.com/gitweb/?p=linux-davinci.git;a=commit;h=8574faf9a5ae71fdd84c6413779a9b076138eb9e
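
On the question of reading and writing ICMDR directly, here is a minimal user-space sketch, assuming the kernel exposes /dev/mem and the process runs as root. The base address 0x01C22000 is inferred from the ICMDR/ICIVR addresses quoted above, and the IRS bit position (bit 5) follows the definition used in the i2c-davinci driver; both should be checked against the OMAP-L138 reference manual. The helper names (i2c0_map, i2c_reg_read, i2c_reg_write, i2c_soft_reset) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define I2C0_BASE   0x01C22000u  /* assumed base: ICMDR sits at 0x01C22024 */
#define I2C_MAP_LEN 0x1000u      /* one 4 KiB page covers the register block */
#define ICMDR_OFF   0x24u
#define ICIVR_OFF   0x28u
#define ICMDR_IRS   (1u << 5)    /* assumed IRS bit position; confirm in the TRM */

/* Map the I2C0 register page through /dev/mem. Returns NULL on failure. */
volatile uint32_t *i2c0_map(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, I2C_MAP_LEN, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)I2C0_BASE);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Read a 32-bit register at a byte offset from the mapped base. */
uint32_t i2c_reg_read(volatile uint32_t *base, uint32_t off)
{
    assert(base != NULL && (off % 4u) == 0);
    return base[off / 4u];
}

/* Write a 32-bit register at a byte offset from the mapped base. */
void i2c_reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    assert(base != NULL && (off % 4u) == 0);
    base[off / 4u] = val;
}

/* Soft reset: clear IRS to hold the module in reset, then set it again. */
void i2c_soft_reset(volatile uint32_t *base)
{
    uint32_t mdr = i2c_reg_read(base, ICMDR_OFF);
    i2c_reg_write(base, ICMDR_OFF, mdr & ~ICMDR_IRS);
    i2c_reg_write(base, ICMDR_OFF, mdr | ICMDR_IRS);
}
```

Two caveats: touching these registers while the i2c-davinci driver owns the peripheral races with the driver, and resetting the master does nothing to make a slave let go of SDA.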

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

Note that, as is, the bus recovery I linked looks like it requires i2c gpio pins, which probably won't work for the L138. The newer kernel versions have a way to put the i2c module into a manual mode that doesn't require separate gpio control. I could take a stab at backporting that if you are using the 3.2 kernel. I already have some of the i2c-core generic bus recovery stuff backported to 3.2 that I could leverage.

https://support.criticallink.com/gitweb/?p=linux-davinci.git;a=blob;f=drivers/i2c/busses/i2c-davinci.c;h=11caafa0e050cd41e23885063b1ab12c16c5e8ee;hb=refs/heads/linux4.19_wip#l368
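
For reference, the recovery sequence being discussed (clock SCL until the slave releases SDA, then issue a STOP) can be sketched generically. This is not the i2c-davinci or i2c-core code. The i2c_pins hooks and the fake_slave simulation are hypothetical stand-ins for whatever pin control the platform offers, whether a GPIO mux or the controller's manual mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin hooks; on real hardware these would drive SCL and SDA. */
struct i2c_pins {
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*get_sda)(void *ctx);
    void *ctx;
};

/*
 * Pulse SCL up to 9 times until SDA reads high, then generate a STOP.
 * Returns the number of pulses used, or -1 if SDA is still held low.
 * Delays are omitted here; real code must wait between edges.
 */
int i2c_clock_out_stuck_slave(const struct i2c_pins *p)
{
    int pulses = 0;

    assert(p != NULL && p->set_scl && p->set_sda && p->get_sda);
    p->set_sda(p->ctx, true);   /* release SDA so only the slave can hold it */
    p->set_scl(p->ctx, true);
    while (!p->get_sda(p->ctx) && pulses < 9) {
        p->set_scl(p->ctx, false);
        p->set_scl(p->ctx, true);
        pulses++;
    }
    if (!p->get_sda(p->ctx))
        return -1;
    /* STOP condition: SDA rises while SCL is high. */
    p->set_scl(p->ctx, false);
    p->set_sda(p->ctx, false);
    p->set_scl(p->ctx, true);
    p->set_sda(p->ctx, true);
    return pulses;
}

/* Simulated slave that holds SDA low until it has seen bits_left clocks. */
struct fake_slave {
    int  bits_left;
    bool scl;
    bool master_sda;
};

void fake_set_scl(void *ctx, bool high)
{
    struct fake_slave *s = ctx;
    if (high && !s->scl && s->bits_left > 0)
        s->bits_left--;          /* one rising edge shifts out one bit */
    s->scl = high;
}

void fake_set_sda(void *ctx, bool high)
{
    ((struct fake_slave *)ctx)->master_sda = high;
}

bool fake_get_sda(void *ctx)
{
    struct fake_slave *s = ctx;
    /* Open-drain bus: SDA is high only if nobody pulls it low. */
    return s->master_sda && s->bits_left == 0;
}
```

On hardware, each hook would also need a delay of roughly half an SCL period (about 5 us at 100 kHz) between edges.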

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox over 1 year ago

Jonathan,

Thank you for the very quick and thorough response. We have found the i2c-davinci driver in our MDK / board support package ARM/linux/linux-davinci/drivers/i2c/busses/i2c-davinci.c.

We are using the 3.2 kernel. Based on the L138F pinout, I agree that the I2C0 pins are not MUX'd to the GPIO.

If you think backporting the I2C manual mode feature in the newer kernel would be possible, that would be excellent. Otherwise, we can try to implement a GPIO->Switch that momentarily shorts SDA to 5V on command. I worry that the I2C slave device that has hung the SDA line will be in an unrecoverable state after this and require a power reset, as it certainly isn't the preferred method of NACK, 9 SCLK pulses, STOP as it is documented above.

Let me know if you need any further information or if you think backporting would be possible.

Thanks,
Jon Cox

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

Hi Jon,

Can you try out the commits pushed to this branch and let me know if it works? I don't have time right now to try it out but it builds.
https://support.criticallink.com/gitweb/?p=linux-davinci.git;a=shortlog;h=refs/heads/mitydsp-linux-v3.2_i2c_recovery

I grabbed the i2c-core generic bus recovery changes, the i2c-davinci bus recovery changes, and grabbed a few davinci-specific fixes that looked like they might be fixes for potential bus lockups.

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox over 1 year ago

Jonathan,

Excellent - thanks for the very quick and thorough fix on this.

Regarding the usage of these committed changes, will we have to invoke function calls to resolve the bus hang or should this potentially catch and fix any potential bus lockups?

Best,
Jon Cox

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

It's supposed to catch a bus hang and switch into bus recovery by itself.

I'd suggest going into drivers/i2c/i2c-core.c and changing dev_dbg to dev_err so you can see when the kernel tries to do these recoveries in dmesg.

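int i2c_recover_bus(struct i2c_adapter *adap)
{
    /* No recovery method registered for this adapter. */
    if (!adap->bus_recovery_info)
        return -EOPNOTSUPP;

    /* Change dev_dbg to dev_err here so recovery attempts show up in dmesg. */
    dev_dbg(&adap->dev, "Trying i2c bus recovery\n");
    return adap->bus_recovery_info->recover_bus(adap);
}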

The timeout detection is done in i2c_davinci_wait_bus_not_busy, which is also where it kicks off the bus recovery.
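
The control flow described here (poll for the bus to go idle, and on timeout hand off to recovery) looks roughly like the following. This is a simplified illustration rather than the driver's actual code: bus_ops, fake_bus, and wait_bus_idle_or_recover are hypothetical names, and the timeout is counted in polls here instead of kernel time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the driver's register checks. */
struct bus_ops {
    bool (*bus_busy)(void *ctx);   /* true while the bus looks busy */
    int  (*recover)(void *ctx);    /* returns 0 if recovery was attempted OK */
    void *ctx;
};

/*
 * Poll until the bus is idle. If it is still busy after max_polls checks,
 * attempt recovery once and report whether the bus came back.
 */
int wait_bus_idle_or_recover(const struct bus_ops *ops, unsigned max_polls)
{
    assert(ops != NULL && ops->bus_busy && ops->recover);
    for (unsigned i = 0; i < max_polls; i++) {
        if (!ops->bus_busy(ops->ctx))
            return 0;
    }
    /* Timed out: the caller did not have to ask for recovery explicitly. */
    if (ops->recover(ops->ctx) != 0)
        return -ETIMEDOUT;
    return ops->bus_busy(ops->ctx) ? -ETIMEDOUT : 0;
}

/* Simulated bus that stays stuck until recovery runs. */
struct fake_bus {
    bool stuck;
    int  recoveries;
};

bool fake_bus_busy(void *ctx)
{
    return ((struct fake_bus *)ctx)->stuck;
}

int fake_recover(void *ctx)
{
    struct fake_bus *b = ctx;
    b->recoveries++;
    b->stuck = false;
    return 0;
}
```

With the simulated bus, the first call times out and triggers one recovery; later calls return immediately because the bus is idle.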

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

Let me know if it works and I'll merge the commits into the main 3.2 branch.

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox over 1 year ago

Jonathan,

Fantastic! This is exactly what we were looking to implement, and the fact that it is automatic as-is is even better. We've compiled the kernel and will test it on our systems today/tomorrow and let you know whether it works as intended.

Thanks again for all the expertise on this matter.

Best,
Jon Cox

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox over 1 year ago

Jonathan,

We have tested the compiled branch and it seems to be working as intended. We have not noticed any I2C hangs yet after 100 boot cycles, so I think it is safe to say the kernel changes are at least as stable as the live one. If you can merge the commits to the main 3.2 branch we can build on this going forward.

I did want to pick your brain on one other topic that is somewhat related to this I2C issue.

Our design uses only the I2C0 bus, not the I2C1 bus, for all system peripherals. We are planning to switch all I2C devices to the I2C1 bus in the next revision after realizing that the MityDSP actually requires the I2C0 bus on boot to read from its on-board PROM, initialize the PMIC, and initialize RAM.

We are not sure if this is related, but it seems that sometimes when the system boots it will not perform these 3 steps correctly. That is, we note that the device should drive the DONE LED once the boot has finished, which normally takes 10-15 seconds from power on, but instead we observe the LED slowly lighting (pulling high slowly) after around 5-6 seconds and the system never finishes the boot process. This is resolved usually by a hard power reset. We've scoped the power rails to the MityDSP and on the MityDSP and they appear correct in these instances.

The only measured difference during these failure to boot events is that the I2C0 SCL line behaves differently.
  • On normal boot, we notice the following: high frequency clock burst, low frequency clock burst, and then 3 normal bursts at 100kHz, and then the system boots shortly after. I assume this is the reading from PROM, initializing PMIC, and initializing RAM steps.
  • On boot failure, we notice the following: high frequency clock burst, low frequency clock burst, and then only 1 or 2 normal bursts at 100kHz. It seems to me that the system is failing to communicate to the devices on the MityDSP board that are required for boot.

The only way we have been able to reproduce this in a somewhat repeatable fashion is to hard power cycle with only 1-2 seconds between ON/OFF, whereas normally we wait 5-10 seconds between ON/OFF. It seems to happen at random, however.

Does anything above stick out to you? We are trying to ensure this random boot issue does not occur on our production systems. Thanks!

Best,
Jon Cox

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier over 1 year ago

Jon Cox wrote in RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post):

> Jonathan,
>
> We have tested the compiled branch and it seems to be working as intended. We have not noticed any I2C hangs yet after 100 boot cycles, so I think it is safe to say the kernel changes are at least as stable as the live one. If you can merge the commits to the main 3.2 branch we can build on this going forward.

Okay will do. I was told someone would have time next week to look it over and merge it in.

> I did want to pick your brain on one other topic that is somewhat related to this I2C issue.
>
> Our design only uses the I2C0 bus and not the I2C1 bus for all system peripherals. We are planning to switch all I2C devices to the I2C1 bus in the future revision after realizing that the MityDSP actually requires the I2C0 bus on boot to read from its on-board PROM, initialize PMIC, initialize RAM.
>
> We are not sure if this is related, but it seems that sometimes when the system boots it will not perform these 3 steps correctly. That is, we note that the device should drive the DONE LED once the boot has finished, which normally takes 10-15 seconds from power on, but instead we observe the LED slowly lighting (pulling high slowly) after around 5-6 seconds and the system never finishes the boot process. This is resolved usually by a hard power reset. We've scoped the power rails to the MityDSP and on the MityDSP and they appear correct in these instances.

Note that the DONE LED is driven by the FPGA after it is programmed and is an open-drain output. I'm not sure what would cause it to light slowly, as it should be either pulled to ground or pulled to 3.3V via a 330-ohm resistor. The power good LED is hooked up to the same 3.3V. Does it fade in?

> The only measured difference during these failure to boot events is that the I2C0 SCL line behaves differently.
>   • On normal boot, we notice the following: high frequency clock burst, low frequency clock burst, and then 3 normal bursts at 100kHz, and then the system boots shortly after. I assume this is the reading from PROM, initializing PMIC, and initializing RAM steps.
>   • On boot failure, we notice the following: high frequency clock burst, low frequency clock burst, and then only 1 or 2 normal bursts at 100kHz. It seems to me that the system is failing to communicate to the devices on the MityDSP board that are required for boot.

That's possible but could also be a side effect of whatever else is going wrong. It would be really good to have console boot logs from when it fails to boot, to know what went wrong.
It's certainly possible for an external i2c device to cause issues with booting, though in that case I would expect there to be no i2c communication at all. In your boot failure case, when the bus goes idle, do the signals return to an idle high state or do they get stuck low? If they go back high, then it's unlikely that an i2c problem is the root of the issue.

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox over 1 year ago

Jonathan,

Excellent, thanks for the steadfast effort on this!

Interesting regarding the DONE LED. We are not certain why it slowly lights either, as we observe it being strongly driven HIGH once the boot finishes. The POWER GOOD LED is always on after powering on in both normal and failed boot cases - it does not slowly fade HIGH either.

We can capture some console boot logs on a normal boot and on a failed boot and report back. We looked a while back and, if I recall correctly, it stopped at something related to a "power register" or similar. We'll get the exact log.
I agree, it is a long shot that the external I2C devices on I2C0 are causing the boot issue, but just observing that these are on the same bus used for the critical boot processes. The bus goes idle and the signals return high, which makes me think that some communication occurred and one of the boot process steps failed for some reason, terminating the process and resulting in some null activity state.

I'll grab some logs, let us know if you have any further ideas. Thanks again for all the support here.

Best,
Jon Cox

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier about 1 year ago

> We have tested the compiled branch and it seems to be working as intended. We have not noticed any I2C hangs yet after 100 boot cycles, so I think it is safe to say the kernel changes are at least as stable as the live one. If you can merge the commits to the main 3.2 branch we can build on this going forward.

Merged the changes

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox about 1 year ago

Jonathan,

Excellent - thank you for merging. We will pull in those official changes.

I just received a UART/FTDI cable so I should be able to check the boot log via CLI. I'll let you know what I find on Monday.

Best,
Jon

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox about 1 year ago

Jonathan,

Apologies on the delay, we've had most of the systems tied up in-use for the past few weeks but I was able to grab some UART captures this morning.

I've attached 4 files - BOOT1-4. BOOT1 did not boot successfully...seems it got stuck while unpackaging the kernel.

It seemed that BOOT2, BOOT3, BOOT4 booted properly, but their output sequence was in different order. Does this point to the need for tighter boot sequencing in our code base?

Let me know your thoughts on this...thanks again!

Best,
Jon

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier about 1 year ago

Jon Cox wrote in RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post):

> Jonathan,
>
> Apologies on the delay, we've had most of the systems tied up in-use for the past few weeks but I was able to grab some UART captures this morning.

Hi Jon, Is this still related to the i2c failures you were talking about or is this a different issue?

> I've attached 4 files - BOOT1-4. BOOT1 did not boot successfully...seems it got stuck while unpackaging the kernel.

Unpacking did finish. It looks like it got stuck after the kernel was started but before the serial port got enabled. Crashes here are a pain to track down since we get no breadcrumbs from the serial port.

> It seemed that BOOT2, BOOT3, BOOT4 booted properly, but their output sequence was in different order. Does this point to the need for tighter boot sequencing in our code base?

I don't think there is any concern with things running in different orders during bootup.

Are all these boots from the same SOM/hardware? I'm trying to understand the scope of your current issue. Is this limited to just one SOM/baseboard or something all units see and it's intermittent?

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox about 1 year ago

Jonathan,

Thanks for taking a look at the logs.

This is an issue we have been having in parallel to the I2C issues. Good news is the updated kernel does seem to experience much less I2C hanging than previously...now I believe the only I2C bugs are related to software and not hardware.

This issue during the boot above is why we are swapping all of our peripheral communications from I2C0 to I2C1, in case a peripheral device is erroneously interacting with I2C0 on boot and causing failure for some reason.

> Unpacking did finish, looks like it got stuck after the kernel was started but before the serial port got enabled

--> Ahh, I see that now. Do you believe this type of issue is something related to our physical hardware on-board (microSD flash interface?), or more likely to be caused by the software stack?

The boot outputs above are all from the same SOM and hardware. This problem exists regardless of the SOM unit or our physical hardware...seems to be a very random/intermittent issue (making it hard to capture the failure too).

Let me know if anything comes to mind. Thanks again.

Best,
Jon

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jonathan Cormier about 1 year ago

Jon Cox wrote in RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post):

> Jonathan,
>
> Thanks for taking a look at the logs.
>
> This is an issue we have been having in parallel to the I2C issues. Good news is the updated kernel does seem to experience much less I2C hanging than previously...now I believe the only I2C bugs are related to software and not hardware.
>
> This issue during the boot above is why we are swapping all of our peripheral communications from I2C0 to I2C1, in case a peripheral device is erroneously interacting with I2C0 on boot and causing failure for some reason.

At this point in the boot, we've already read the EEPROM and the PMIC is all set up. The i2c0 bus becomes a lot less crucial from the SOM's perspective. I don't think an i2c issue would cause a failure at this spot in the boot.
If you wanted to test this, you could boot into u-boot and then short the i2c bus, issue the boot command and see if the kernel gets stuck at the booting kernel line.

> > Unpacking did finish, looks like it got stuck after the kernel was started but before the serial port got enabled
>
> --> Ahh, I see that now. Do you believe this type of issue is something related to our physical hardware on-board (microSD flash interface?), or more likely to be caused by the software stack?

In my experience, it is usually from a software change, though those failures are usually not intermittent. I'm not sure what to point at with an intermittent hang (the lack of serial output is really blinding us here).

> The boot outputs above are all from the same SOM and hardware. This problem exists regardless of the SOM unit or our physical hardware...seems to be a very random/intermittent issue (making it hard to capture the failure too).

Hmm okay. Is it possible to run test cycles using our baseboard with one of these SOMs and the same software? That might let us know if it's something coming from your custom baseboard or if it's more likely to be software-related.

Note that it is sometimes possible to get early console output by enabling EARLY_PRINTK in the kernel build and then adding earlyprintk=ttyS1,115200n8 to the kernel bootargs.

RE: MityDSP-L138F Boot issue, I2C0 (PCB Dev Post) - Added by Jon Cox about 1 year ago

Hi Jonathan,

Hope you've been well! I am going to acquire a baseboard tester to evaluate the SOM and our software independent of our custom baseboard.

In the meantime, the interposer board that swaps our I2C0 and I2C1 has arrived. I'm working with our software engineer to understand how to swap our peripherals over to target I2C1 instead of I2C0.

Currently, the system boots and I can terminal into root, but all of our peripheral devices currently on I2C0 are not functional (as expected). Is there a very straightforward way of swapping the devices from I2C0 over to use I2C1? This may be a naive question, but I figured it would be better to inquire first than to dive in blindly.

Thanks! Hope your December is going well too.

Best,
Jon
