Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot participate as slave on a multi-drop bus if there are other slaves #674

Closed
ksjoberg opened this issue Oct 11, 2021 · 29 comments · Fixed by #1435
Closed

Cannot participate as slave on a multi-drop bus if there are other slaves #674

ksjoberg opened this issue Oct 11, 2021 · 29 comments · Fixed by #1435

Comments

@ksjoberg
Copy link
Contributor

ksjoberg commented Oct 11, 2021

Versions

  • Python: 3.7
  • OS: Current Raspberry OS
  • Pymodbus: 2.5.2
  • Modbus Hardware (if used): Exar CC-USB-RS485-150U / Exar 21B1411

Pymodbus Specific

  • Server: rtu - sync + async
  • Client: rtu - sync + async

Description

In serial/RTU mode, PyModbus participates on a Modbus bus using PySerial. Unfortunately, PySerial's read behaves differently to recv on sockets, as the pySerial timeout defines how long time it has to fill the 1024 bytes buffer of PyModbus before it has to return whatever it managed to get instead of filling the buffer with the payload of a TCP packet. This results in delayed processing as it could take up to a second for the ModbusSerialServer to start processing the message(s). The delay causes the requestor to retry its request, further filling the buffer of identical requests.

PyModbus in general seem to be unable to handle participation on multi-drop bus as Modbus over RS485 is, because a server can only parse requests. It gets confused seeing Modbus responses it has not generated itself on the wire it serves on as a server.

It seems a server can only, and is only expected to, have a ServerDecoder. Such a decoder can only parse Modbus Requests. When it fails, such as when a Response appears on the line, it will flush the buffer and lose data, including valid Requests still pending to be parsed. This flush of the input buffer again causes the requestor to retry its request and may/will trigger a failure-to-respond status of the PyModbus-implemented server.

While PyModbus attempts to do add framing of outgoing data by pausing between frames for 3.5 char times, it does not attempt to detect pauses between frames on incoming data. This makes the serial stream be an unframed series of bytes. As consequence, to recover framing information it must properly be able to parse both requests and responses (of all types present on the bus) to not come out of sync. This it does not do (yet?), either.

@dhoomakethu dhoomakethu modified the milestones: 3.0., 3.0.0 Oct 18, 2021
@janiversen
Copy link
Collaborator

Can you please check with version 3.0.2, I believe this is no longer an issue.

@janiversen janiversen removed this from the 3.0.0 milestone Dec 12, 2022
@janiversen janiversen added this to the Version 3.3 milestone Dec 23, 2022
@janiversen janiversen modified the milestones: Version 3.3, Versión 3.4 Jan 24, 2023
@peufeu2
Copy link
Contributor

peufeu2 commented Jan 26, 2023

I've looked into it and I don't think it's possible to solve this at the python level.

  • 1.5 char modbus timeout is about 1.5ms at 9600bps

pyserial's inter_byte_timeout only has a granularity of 0.1s, so it won't work.

The usual USB-RS485 or USB-serial adapters only run "full speed" USB, ie 12 Mbps and they transmit one frame per millisecond, so even if the OS told userland code at what microsecond timestamp a serial byte arrived on USB, the OS itself it would not know the exact arrival time of the bytes to the USB dongle with better than millisecond precision.

With hardware UARTs like the ones embedded in the Pi's, then the OS has at least a chance to implement such a short timeout, but will it? That's another question. I would bet on "nope" because heavyweight OSes like linux don't guarantee such low latencies anyway. Besides these chips' UARTs usually have FIFOs and DMA to avoid triggering too many interrupts, and by the time the bytes are in the FIFO, arrival time information is lost. So the UART hardware itself would have to have an inter_byte timeout feature. That's a thing in some microcontrollers, precisely to implement modbus.

Besides from the python side there can be latency between byte arrival time and serial.read() return due to the computer multitasking, other async coroutines running, etc. So measuring arrival time from the python side should only result in more dropped packets when latency is higher.

The simplest solution looks to be a hardware fix, with a microcontroller acting as a serial-to-modbus translator. On a micro running no other task there will be no latency issues to implement an inter-byte timeout. The micro would have to frame data transmitted to pymodbus to indicate packets, fortunately modbus is pretty slow and USB-serial links can run at hundreds of kbps so framing overhead shouldn't be an issue.

https://stackoverflow.com/questions/58898980/how-to-set-inter-byte-delay-timeout-to-milliseconds
pyserial/pyserial#88

@janiversen
Copy link
Collaborator

The problem here is not timing, but that he server receives responses from other slaves. When receiving a message it must be validated that it is destined for the server and otherwise discharged.

@peufeu2
Copy link
Contributor

peufeu2 commented Jan 26, 2023

That's easy: if the modbus frame's address header doesn't match the server's address, then the whole frame can be dropped. Problem is, in order to do that, the end of the frame has to be found, and the only way to do that is to detect inter-byte delay which is not possible for reasons explained above.

Without the extra meaning from the inter-byte delay, python will have no idea where the frames begin or end, unless we're the only master, or we're the only slave...

Parsing the packets wouldn't help: if another device crashes and stops its transmission mid-frame, the bus becomes available immediately after the timeout, so the master can transmit immediately if it wants, for example it could retry the transaction. And pyserial would receive one truncated packet followed by the new transmission, with no way to distinguish the two.

@janiversen
Copy link
Collaborator

There are quite a number of options, which will work….we do not need the inter har,

  • we can store the request and from that calculate the response.
  • the slave id is at the beginning of the response so we do need the whole frame

@dlmoffett
Copy link

Hey is there any update or workaroud to this?

I have just encountered and learned about this issue on

[[package]]
name = "pymodbus"
version = "2.5.3"

In the meantime I am in the process of updating to

[[package]]
name = "pymodbus"
version = "3.1.3"

in the hopes that it has been addressed; however seeing as this issue is still opened and marked for Version 3.3 I'm not optimistic.

I'm opened to any off-the-shelf hardware workarounds if anyone knows of such a thing? Something I can stick in front of my serial port to properly frame and filter responses from other server/slaves would be acceptable.

@janiversen
Copy link
Collaborator

it’s still open.

@janiversen
Copy link
Collaborator

I know what the real problem is, the server not only receives the requests, but also the responses from other servers which is need to discard. Furthermore it needs to discard requests where the slave id do not match (this is a bit trickier to do, because we do not configure a slave id for the server).

I suspect the server breaks down, can you please provide a debug log

# please use the following to format logs when posting them here
import pymodbus

pymodbus.pymodbus_apply_logging_config()

if you use the latest dev.

@dlmoffett
Copy link

@janiversen Sounds good. I have a branch ready to simply try the latest version of the library; however, I have held off on even trying that yet as I will need to be able to thoroughly explain the issue currently observed on version = "2.5.3".

I have enabled pymodbus' logging at the DEBUG level and am currently designing a clean/simplified test set up. I know logs from 2.5.3 may not be the most useful, but I will still provide details of my test setup and those logs here in case they yield value. I will then update to version = "3.1.3", repeat the experiment, and provide logs if the issue is still present.

@janiversen
Copy link
Collaborator

logs from v2.5.3 does not help us, with the issue. Logs should come from the newest version (3.3.0) or dev, reason being that large parts of the code have changed.

@dlmoffett
Copy link

dlmoffett commented Mar 9, 2023

@janiversen Here is the test setup:

We have 2x of our server/slave devices multi-dropped off of the same RS-485 physical bus. Their server/slave addresses are

  • 0x7c
  • 0x9b

All devices on the bus (both server/slaves, and the client/master) are using USB-to-RS-458 donglers which already suppress local echo for us.

Both of these server/slave devices are running

[[package]]
name = "pymodbus"
version = "3.2.0"
description = "A fully featured modbus protocol stack in python"
category = "main"
optional = false
python-versions = ">=3.8.0"
files = [
    {file = "pymodbus-3.2.0-py2.py3-none-any.whl", hash = "sha256:ca954cf0c7e88e7cfc0d273960919b68090e72e2a88f39d2a3b50f4ab8eaf2a9"},
    {file = "pymodbus-3.2.0.tar.gz", hash = "sha256:5769ac9a00ffc154e1761c69299c1968dcab1c2a61741b1caff89c4aea81dbcf"},
]

with pymodbus's logging enabled; however, the logging format being used is our own and not pymodbus's - so there will be slight differences in what you expect to see. If critical I'm happy to impose your logging format on our system temporarily and repeat the experiment.

The problem is still present on this version of the lib.

Our modbus client/master (modscan) is configured to alternate requesting the value of a single holding register from each server/slave. For this test, it was configured to consider a request to have timed out after 400ms. It should wait 1500ms between requests; though it's currently unclear to me if it respects that fully. I will be setting up an oscope to confirm. Alternately I can hack together a client/master script to be damn sure of timings.

Either way, the following is the output of the logs from server/slave 0x7c which shows a mix of both correct behavior and incorrect. I've manually annotated after the fact for my own understanding:

### We receive a properly framed command for us (0x7c) and respond correctly ###
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x7c 0x3 0x0 0x0 0x0 0x1 0x8e 0x27
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Getting Frame - 0x3 0x0 0x0 0x0 0x1
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Factory Request[ReadHoldingRegistersRequest': 3]
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame advanced, resetting header!!
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] validate: fc-[3] address-1: count-1
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] getValues: fc-[3] address-1: count-1
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] send: [ReadHoldingRegistersResponse (1)]- b'7c03020000d44e'
[2023-03-09 19:29:49 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b"|\x03\x00\x00\x00\x01\x8e'"] not ready


### We receive a properly framed command for them (0x9b) and discard correctly ###
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x0 0x0 0x0 0x1 0x99 0xf0
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Not a valid unit id - 155, ignoring!!
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Resetting frame - Current Frame in buffer - 0x9b 0x3 0x0 0x0 0x0 0x1 0x99 0xf0


### We receive a properly framed response from them (0x9b) and do what? I'm not sure, but seems there is an opportunity to discard, here ###
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x2 0x0 0x0 0xe0 0x58
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x02\x00\x00\xe0X'] not ready


### We receive a properly framed command for us (0x7c) and do what? I'm not totally sure but it appears that we concatenate it with the previous response from them (0x9b). We however, do not respond and timeout instead. ###
[2023-03-09 19:29:52 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x7c 0x3 0x0 0x0 0x0 0x1 0x8e 0x27
[2023-03-09 19:29:52 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame check failed, ignoring!!
[2023-03-09 19:29:52 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Resetting frame - Current Frame in buffer - 0x9b 0x3 0x2 0x0 0x0 0xe0 0x58 0x7c 0x3 0x0 0x0 0x0 0x1 0x8e 0x27


### We receive a partial command for them (0x9b) - lacking CRC bytes. ###
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x0 0x0 0x0 0x1
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x00\x00\x00\x01'] not ready


### We later receive the CRC bytes for the previous command for them (0x9b), concatenate them (I'm assuming), and discard correctly. ###
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x99 0xf0
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Not a valid unit id - 155, ignoring!!
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Resetting frame - Current Frame in buffer - 0x9b 0x3 0x0 0x0 0x0 0x1 0x99 0xf0


### We receive a properly framed response from them (0x9b) and again, do what I'm not sure. ###
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x2 0x0 0x0 0xe0 0x58
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x02\x00\x00\xe0X'] not ready

### Again we receive a properly framed command for us but fail to respond with the same syptoms as above. ###
[2023-03-09 19:29:55 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x7c 0x3 0x0 0x0 0x0 0x1 0x8e 0x27
[2023-03-09 19:29:55 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame check failed, ignoring!!
[2023-03-09 19:29:56 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Resetting frame - Current Frame in buffer - 0x9b 0x3 0x2 0x0 0x0 0xe0 0x58 0x7c 0x3 0x0 0x0 0x0 0x1 0x8e 0x27

### Etc... ###
[2023-03-09 19:29:57 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x0 0x0 0x0 0x1 0x99
[2023-03-09 19:29:57 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x00\x00\x00\x01\x99'] not ready
[2023-03-09 19:29:57 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0xf0

I'm 100% happy to attempt some monkey patching if you can suggest obvious places in code to patch (in fact I'll probably start digging through your guys' source to do that, anyways).

@janiversen
Copy link
Collaborator

I actually wrote that the problem is not solved in the current lib. I am very surprised to see that the server do not break down, that does not make sense.

Your analysis do help to pinpoint what we have to do, but I am back in square one because the server do not break.

@dlmoffett
Copy link

Correct, the server does appear to continue operation, and does appear to eventually "re-synchronize", later responding to messages.

So I would say the timeouts appear intermittent. Anecdotally, playing with timeout, and interval parameters on the client/master does change the frequency with with the server/slaves exhibit this problem.

As I said, I will probably start looking into your guys' source tonight or tomorrow to see if I can put together some solutions or at least further my understanding of the problem.

Any suggestions of where to start looking would help me out as I'm currently totally unfamiliar with your guys' source code ;p

@dlmoffett
Copy link

Looking at this line specifically:

### We receive a properly framed response from them (0x9b) and do what? I'm not sure, but seems there is an opportunity to discard, here ###
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x2 0x0 0x0 0xe0 0x58
[2023-03-09 19:29:51 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x02\x00\x00\xe0X'] not ready <--------------------THIS LINE

suggests to me that you are not currently detecting that this is actually a complete/ready frame (just from another device) and choose to hold onto it in some sort of buffer upon which later data is concatenated. To me I'm optimistic, that there is a quick fix here.

I say this because later in these same logs we see this:

### We receive a partial command for them (0x9b) - lacking CRC bytes. ###
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x9b 0x3 0x0 0x0 0x0 0x1
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Frame - [b'\x9b\x03\x00\x00\x00\x01'] not ready   <-------------NOTE THIS MESSAGE LACKS CRC BYTES AND IS IN FACT INCOMPLETE


### We later receive the CRC bytes for the previous command for them (0x9b), concatenate them (I'm assuming), and discard correctly. ###
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Handling data: 0x99 0xf0  <----------- WE LATER GET THE BYTES AND CONCATENATE THEM WITH THE BYTES ABOVE (I THINK) AND PROPERLY DETECT THE COMMAND TO THE OTHER SLAVE
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Not a valid unit id - 155, ignoring!!
[2023-03-09 19:29:54 +0000] [DEBUG] [140253118510912] [pymodbus.logging] Resetting frame - Current Frame in buffer - 0x9b 0x3 0x0 0x0 0x0 0x1 0x99 0xf0

@janiversen
Copy link
Collaborator

Looking forward to see a pull request.

@janiversen janiversen modified the milestones: Version 3.4, Version 3.5 Mar 10, 2023
@dlmoffett
Copy link

dlmoffett commented Mar 14, 2023

I have a patchover for this issue that works for our specific use cases and allows my slave devices to participate performantly on an industrial multi-drop serial RTU bus (but not completely without an occasional timeout).

I will share it here in case others are in immediate need; however, I suspect this is not the right way to solve this issue, so use at your own risk. YMMV.

This solution assumes the device you are operating on is fast enough to never fall behind. I.e. it can flush the entire buffer containing responses from other devices well before a command intended for it comes along. Per some of the discussion suggested by @peufeu2 I don't know that we'll be able to fully solve this issue in python. Depending on what else your software is doing in other co-routines, and also how busy your USB bus is (assuming you're using a USB-RS485 conversion) there is no guarantee your software wont ever flush a message intended for it along with commands/responses for/from other devices.

Practically speaking, I see less than 1% timeouts from my device with the following code on a very busy bus. However, I have the ability to control the polling interval (time between commands form the master to any device) and increase it to make it more favorable for my device allowing it more time to flush out garbage to/from other devices. If you don't have control of your master and it's running in extremely low latency you will have much more timeouts.

Here's the patched over ModbusRTUFramer

import logging

from pymodbus.framer.rtu_framer import ModbusRtuFramer

log = logging.getLogger(__name__)


class PatchedModbusRtuFramer(ModbusRtuFramer):
    def isFrameIntendedForUs(self, units: list[int]):
        try:
            unit = self._buffer[0]
            return unit in units
        except IndexError as e:
            return True

    def processIncomingPacket(self, data, callback, unit, **kwargs):
        if not isinstance(unit, (list, tuple)):
            unit = [unit]
        self.addToFrame(data)
        single = kwargs.get("single", False)
        while True:
            if not self.isFrameIntendedForUs(unit):
                self.resetFrame()
                log.info(f"❌ Frame - [{data}] not intended for us, ignoring!!")
                break
            elif self.isFrameReady():
                if self.checkFrame():
                    if self._validate_unit_id(unit, single):
                        self._process(callback)
                        log.info(f"✅ Frame - [{data}] responded to!!")
                    else:
                        header_txt = self._header["uid"]
                        log.info(f"Not a valid unit id - {header_txt}, ignoring!!")
                        self.resetFrame()
                        break
                else:
                    log.info("Frame check failed, ignoring!!")
                    self.resetFrame()
                    break
            else:
                log.debug(f"Frame - [{data}] not ready")
                break

Then you would use this like this (assuming StartAsyncSerialServer worked in version 3.2 per #1434):

        return await StartAsyncSerialServer(
            identity=_identity,
            context=_context,
            framer=PatchedModbusRtuFramer,
            port=port,
            stopbits=stop_bits,
            bytesize=byte_size,
            parity=parity,
            baudrate=baud,
            ignore_missing_slaves=True,
        )

As I said, I'm not confident that this solution doesn't have other unintended side effects as I'm not fully aware of the architecture of the library.

@janiversen
Copy link
Collaborator

Thanks for sharing, I will take a closer look soon and see if it can be integrated without sideeffects.

We have an unexpected 3.2.1 coming very shortly, and if possible I would like to include this.

@janiversen
Copy link
Collaborator

Do you have a possibility to make a couple of traces of:

  1. a normal situation, with multiple slaves
  2. the abnormal situation you want covered (might need a few words of text)

I can easily break the normal situation into e.g. packet not received in one go.

No need for big traces on the contrary small is beautiful.

I could use such a trace to make a test case the tests ProcessIncommingPacket

I have started to take a deeper look at your code, and while at it solve a couple of small problems in the framers.

Did you note we have a method very similar to your new method _validate_slave_id() in ModbusFramer.

@janiversen janiversen modified the milestones: Version 3.5, Version 3.3 Mar 14, 2023
@dlmoffett
Copy link

dlmoffett commented Mar 16, 2023

@janiversen Here are my test cases:

from unittest import mock

import pytest
from pymodbus.server.async_io import ServerDecoder

from fix_pymodbus.patched_modbus_rtu_framer import PatchedModbusRtuFramer


@pytest.fixture
def framer() -> PatchedModbusRtuFramer:
    return PatchedModbusRtuFramer(ServerDecoder())


@pytest.fixture
def callback() -> mock.Mock:
    return mock.Mock()


expected_unit = [2]
good_frame = b"\x02\x03\x00\x01\x00}\xd4\x18"


def test_complete_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_bad_crc(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = b"\x02\x03\x00\x01\x00}\xd4\x19"  # Manually mangled crc
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_complete_frame_wrong_unit(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = (
        b"\x01\x03\x00\x01\x00}\xd4+"  # Frame with good CRC but other unit id
    )
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_big_split_response_frame_from_other_unit(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    # This is a single *response* from device id 1 after being queried for 125 holding register values
    # Because the response is so long it spans several serial events
    serial_events = [
        b"\x01\x03\xfa\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00Dz\x00\x00C\x96\x00\x00",
        b"?\x05\x1e\xb8DH\x00\x00D\x96\x00\x00D\xfa\x00\x00DH\x00\x00D\x96\x00\x00D\xfa\x00\x00DH\x00",
        b"\x00D\x96\x00\x00D\xfa\x00\x00B\x96\x00\x00B\xb4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00N,",
    ]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_split_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_events = [good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_trailing_data_without_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03"  # Note the garbage doesn't contain our unit id
    serial_event = garbage + good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_trailing_data_with_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03\x02\x01\x00"  # Note the garbage does contain our unit id
    serial_event = garbage + good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_split_frame_trailing_data_with_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_events = [garbage + good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_wrapped_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_event = garbage + good_frame + garbage
    framer.processIncomingPacket(serial_event, callback, expected_unit)

    # We probably should not respond in this case; in this case we've likely become desynchronized
    # i.e. this probably represents a case where a command came for us, but we didn't get
    # to the serial buffer in time (some other co-routine or perhaps a block on the USB bus)
    # and the master moved on and queried another device
    callback.assert_not_called()


def test_frame_with_trailing_data(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_event = good_frame + garbage
    framer.processIncomingPacket(serial_event, callback, expected_unit)

    # We should not respond in this case for identical reasons as test_wrapped_frame
    callback.assert_not_called()

Note that I was just able to further patch the ModbusRtuFramer to get us up to a true 100% synchronization/participation on the bus I'm working with right now (I believe).

Here is the further patched framer code:

import logging

from pymodbus.framer.rtu_framer import ModbusRtuFramer
from pymodbus.utilities import checkCRC
import struct

log = logging.getLogger(__name__)


class PatchedModbusRtuFramer(ModbusRtuFramer):
    def isFrameIntendedForUs(self, units: list[int]):
        try:
            unit = self._buffer[0]
            return unit in units
        except IndexError as e:
            return True

    def advanceToNextOccurrenceOfUnit(self, units: list[int]):
        j = None
        for i, b in enumerate(self._buffer):
            if b in units and i > 0:
                j = i
                break

        if j:
            self._buffer = self._buffer[j:]
        else:
            self._buffer = b""

        self._header = {"uid": 0x00, "len": 0, "crc": b"\x00\x00"}
        return len(self._buffer)

    def checkFrame(self):
        try:
            self.populateHeader()
            frame_size = self._header["len"]

            if len(self._buffer) > frame_size:
                # This means there are bytes *after* a valid frame inteded to us.
                # If there are bytes after that means we've probably de-sychronized
                # and the master has considered us as having timed out and moved on.
                # That or there is somehow a random sequence of bytes which looks like
                # a real command to us and magically a crc.
                # In either case, we must not respond.
                return False

            data = self._buffer[: frame_size - 2]
            crc = self._header["crc"]
            crc_val = (int(crc[0]) << 8) + int(crc[1])
            return checkCRC(data, crc_val)
        except (IndexError, KeyError, struct.error):
            return False

    def processIncomingPacket(self, data, callback, unit, **kwargs):
        if not isinstance(unit, (list, tuple)):
            unit = [unit]
        self.addToFrame(data)
        single = kwargs.get("single", False)
        while True:
            if not self.isFrameIntendedForUs(unit):
                if self.advanceToNextOccurrenceOfUnit(unit) == 0:
                    log.info(f"❌ Frame - [{data}] not intended for us, ignoring!!")
                    break
            elif self.isFrameReady():
                if self.checkFrame():
                    if self._validate_unit_id(unit, single):
                        self._process(callback)
                        log.info(f"✅ Frame - [{data}] responded to!!")
                    else:
                        header_txt = self._header["uid"]
                        log.info(f"Not a valid unit id - {header_txt}, ignoring!!")
                        self.resetFrame()
                        break
                else:
                    log.info("Frame check failed, ignoring!!")
                    if self.advanceToNextOccurrenceOfUnit(unit) == 0:
                        break
            else:
                log.info(f"Frame - [{data}] not ready")
                break

...yes I know this is some real "fast and loose" hacky code, but I'm under the gun to just make something work, right now. These snippets are provided only for reference/guidance purposes. I'm sure there are much cleaner ways to work this into the lib's code.

@janiversen
Copy link
Collaborator

Thanks for the test, would you like to submit them as a pull request and get credited.

@dlmoffett
Copy link

Further updates here. Once I put the above PatchedModbusRtuFramer on a bus with lots of other traffic I was still seeing occasional timeouts due to apparent false positive messages. I have a (hopefully) final update. This passed all of our acceptance tests for real devices on a real bus with lots and lots of real traffic (19200 baud):

import logging
import struct

from pymodbus.framer.rtu_framer import ModbusRtuFramer
from pymodbus.utilities import checkCRC

log = logging.getLogger(__name__)


class PatchedModbusRtuFramer(ModbusRtuFramer):
    def isFrameIntendedForUs(self, units: list[int]):
        try:
            # Always validate that the unit id and function code
            unit = self._buffer[0]
            fc = self._buffer[1]

            valid_unit_id = unit in units
            valid_function_code = fc in [2, 1, 5, 15, 4, 3, 6, 16, 43]

            if fc in [15, 16]:
                # These commands will have arbitrary payload lengths
                # Further validation is beneficial
                n_registers = int.from_bytes(self._buffer[4:6], byteorder="big")
                n_bytes = self._buffer[6]

                expected_number_of_bytes = n_registers * 2

                valid_number_of_registers = n_registers <= 123
                valid_number_of_bytes = n_bytes == expected_number_of_bytes
                return (
                    valid_unit_id
                    and valid_function_code
                    and valid_number_of_registers
                    and valid_number_of_bytes
                )
            return valid_unit_id and valid_function_code

        except IndexError as e:
            return True

    def advanceToNextOccurrenceOfUnit(self, units: list[int]):
        j = None
        for i, b in enumerate(self._buffer):
            if b in units and i > 0:
                j = i
                break

        if j:
            log.debug("⏭")
            self._buffer = self._buffer[j:]
        else:
            log.debug("🗑")
            self._buffer = b""

        self._header = {"uid": 0x00, "len": 0, "crc": b"\x00\x00"}
        return len(self._buffer)

    def checkFrame(self):
        try:
            self.populateHeader()
            frame_size = self._header["len"]

            if len(self._buffer) > frame_size:
                # This means there are bytes *after* a valid frame inteded to us.
                # If there are bytes after that means we've probably de-sychronized
                # and the master has considered us as having timed out and moved on.
                # That or there is somehow a random sequence of bytes which looks like
                # a real command to us and magically a crc.
                # In either case, we must not respond.
                return False

            data = self._buffer[: frame_size - 2]
            crc = self._header["crc"]
            crc_val = (int(crc[0]) << 8) + int(crc[1])
            return checkCRC(data, crc_val)
        except (IndexError, KeyError, struct.error):
            return False

    def processIncomingPacket(self, data, callback, unit, **kwargs):
        log.debug(f"🌟 {data.hex(':')}")
        if not isinstance(unit, (list, tuple)):
            unit = [unit]
        self.addToFrame(data)
        single = kwargs.get("single", False)
        while True:
            log.debug(f"🧪 {self._buffer.hex(':')}")
            if not self.isFrameIntendedForUs(unit):
                if self.advanceToNextOccurrenceOfUnit(unit) == 0:
                    break
            elif self.isFrameReady():
                if self.checkFrame():
                    if self._validate_unit_id(unit, single):
                        log.debug(f"✅ {self._buffer.hex(':')}")
                        self._process(callback)
                    else:
                        self.resetFrame()
                        break
                else:
                    log.debug("Frame check failed, ignoring!!")
                    if self.advanceToNextOccurrenceOfUnit(unit) == 0:
                        break
            else:
                log.debug(f"⌛ {self._buffer.hex(':')}")
                break

I only added a couple more unit test; however I should probably have added several more surrounding the extra logic added to isFrameIntendedForUs:

from unittest import mock

import pytest
from pymodbus.server.async_io import ServerDecoder

from fix_pymodbus.patched_modbus_rtu_framer import PatchedModbusRtuFramer


@pytest.fixture
def framer() -> PatchedModbusRtuFramer:
    return PatchedModbusRtuFramer(ServerDecoder())


@pytest.fixture
def callback() -> mock.Mock:
    return mock.Mock()


expected_unit = [2]
good_frame = b"\x02\x03\x00\x01\x00}\xd4\x18"


def test_complete_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_bad_crc(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = b"\x02\x03\x00\x01\x00}\xd4\x19"  # Manually mangled crc
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_complete_frame_wrong_unit(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_event = (
        b"\x01\x03\x00\x01\x00}\xd4+"  # Frame with good CRC but other unit id
    )
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_big_split_response_frame_from_other_unit(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    # This is a single *response* from device id 1 after being queried for 125 holding register values
    # Because the response is so long it spans several serial events
    serial_events = [
        b"\x01\x03\xfa\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00\xc4y\xc0\x00Dz\x00\x00C\x96\x00\x00",
        b"?\x05\x1e\xb8DH\x00\x00D\x96\x00\x00D\xfa\x00\x00DH\x00\x00D\x96\x00\x00D\xfa\x00\x00DH\x00",
        b"\x00D\x96\x00\x00D\xfa\x00\x00B\x96\x00\x00B\xb4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
        b"\x00\x00\x00\x00\x00\x00\x00N,",
    ]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_not_called()


def test_split_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    serial_events = [good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_trailing_data_without_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03"  # Note the garbage doesn't contain our unit id
    serial_event = garbage + good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_complete_frame_trailing_data_with_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03\x02\x01\x00"  # Note the garbage does contain our unit id
    serial_event = garbage + good_frame
    framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_split_frame_trailing_data_with_unit_id(
    framer: PatchedModbusRtuFramer, callback: mock.Mock
):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_events = [garbage + good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_coincidental_1(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x02\x14\x07"
    serial_events = [garbage, good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_coincidental_2(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x02\x10\x07"
    serial_events = [garbage, good_frame[:5], good_frame[5:]]
    for serial_event in serial_events:
        framer.processIncomingPacket(serial_event, callback, expected_unit)
    callback.assert_called_once()


def test_wrapped_frame(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_event = garbage + good_frame + garbage
    framer.processIncomingPacket(serial_event, callback, expected_unit)

    # We probably should not respond in this case; in this case we've likely become desynchronized
    # i.e. this probably represents a case where a command came for us, but we didn't get
    # to the serial buffer in time (some other co-routine or perhaps a block on the USB bus)
    # and the master moved on and queried another device
    callback.assert_not_called()


def test_frame_with_trailing_data(framer: PatchedModbusRtuFramer, callback: mock.Mock):
    garbage = b"\x05\x04\x03\x02\x01\x00"
    serial_event = good_frame + garbage
    framer.processIncomingPacket(serial_event, callback, expected_unit)

    # We should not respond in this case for identical reasons as test_wrapped_frame
    callback.assert_not_called()

NOTE: my hacky bolted on solution here is firmly in the "byte scanning" category. It looks at every byte as it comes to us and evaluates if it could be the beginning of a message to us based on various criteria. It basically doesn't rely on any sort of strict timing. So this is not really a modbus RTU implementation. It is my belief that we cannot ever guarantee that users using python and this library can properly detect the 28-bit silence mark condition, stay synchronized, and truly implement modbus RTU. In fact, I currently believe that can probably only be done on a microprocessor where priority interrupt number 1 is the UART and any other business logic only ever happens in the gaps between.

It's just too dependent physical hardware and even more: the user's code. If they ever do anything CPU bound and fail to do it in another process they are likely to become de-synchronized. If they ever do anything I/O bound in the same co-routine they are likely to become de-synchronized. If they are using a USB dongle for modbus, but also have heavy traffic on their USB bus they are likely to become de-synchronized. If their OS does OS things they are likely to become de-synchronized.

Long story short: I suggest that this library include sufficient disclaimers about modbus RTU, especially multi-drop RTU.

All that said, I'm no modbus expert, nor am I a python expert. So maybe it's possible to guarantee a more thorough solution.

@janiversen
Copy link
Collaborator

I am trying Ito keep up with you in #1435, by taking what you have done and convert it into a structure better suited for the library (and in some instances for python).

Seems I need to keep your code in parallel, so I can identify what you have changed.

feel free to add patches in #1435.

@janiversen
Copy link
Collaborator

I still see one bigger problem with your solution, and that is to detect the end of a frame, without trying to control the CRC after each byte.

In a direct client-server, when the clients sends a request it uses that to calculate the expected response size and therefore we not need to rely on timing.

In a multi-drop environment, I think we still need to look at every request to calculate the expected response (independent of whether it is for us), that will handle all the OK cases. On top of that your byte search is good (I actually used that in a couple of real world devices that are on the marked) but it needs to be coupled with a dynamic crc checker to see when we have a valid frame.

In real world it is impossible to avoid retransmissions due to bad lines etc, so that is not a worry I have, what is needed is to secure the sync is achieved relative fast.

@janiversen
Copy link
Collaborator

I am looking into modifying generateCRC so it can be called incrementally, that way we have a crc for the current buffer and can use that to control the next 2 bytes, if not match, move 1 byte into the buffer and try again.

@janiversen
Copy link
Collaborator

Looking at your new code, you miss a number of function codes, but the idea of looking for reduces the probability of a non-positive find a lot.

Did you have a look at the specification (it is referenced in our README), that might help you further.

@janiversen
Copy link
Collaborator

I think I have found a simpler way, based on your ideas, I will soon publish it in the PR.

@dlmoffett
Copy link

Yea for sure. Like I said I was just trying to make something work for our limited use cases, not necessarily make it work well.

@janiversen
Copy link
Collaborator

All your test cases work, a few changed functionality, and a number added.

I hope you will give it a test run (it will be released later as part of version 3.3, but it is on dev) !

If you see cases not covered, feel free to add test cases and I will try so solve them.

@dlmoffett
Copy link

Niiiice. I'll check out the PR. I have a rat's nest of 3 USB-RS485 dongles here multi-dropped on the same bus and some test code to run 1 master and 2 slaves on that bus. I can perhaps subject your branch to some real-system testing and let you now what I see.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants