Discussion:
Correction to the "Ethernet problem"
Leon Pollak
2008-01-30 13:57:10 UTC
Sorry, the information about 7s was incorrect - I have now hit a case where
even that delay was insufficient. :(

But one thing is certain - whenever the first packet is lost, there is
also a doubled arp request. Below is the tcpdump printout (133=rtems, 57=pc):

Packet is lost:
arp who-has 192.168.50.133 tell 192.168.50.133 <====WHY IS THIS?
arp who-has 192.168.50.57 tell 192.168.50.133
arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
arp who-has 192.168.50.57 tell 192.168.50.133 <====WHY IS THIS?
arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto UDP (17), length 74)
192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
My Data:1111111111111111
IP (tos 0x0, ttl 64, id 3, offset 0, flags [none], proto UDP (17), length 74)
192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
My Data:2222222222222222


Packet is not lost:
arp who-has 192.168.50.133 tell 192.168.50.133
arp who-has 192.168.50.57 tell 192.168.50.133
arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 74)
192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
My Data: 0000000000000000
IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto UDP (17), length 74)
192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
My Data: 1111111111111111
IP (tos 0x0, ttl 64, id 3, offset 0, flags [none], proto UDP (17), length 74)
192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
My Data: 2222222222222222


Any help, please?
--
Leon
Ian Caddy
2008-01-31 00:42:09 UTC
Hi Leon,
Post by Leon Pollak
Sorry, the information about 7s was incorrect - now I received the case when
it was also insufficient. :(
But one thing is undoubted - when there is the first packet lost, there is
arp who-has 192.168.50.133 tell 192.168.50.133 <====WHY IS THIS?
This first ARP request is always done when you have a static IP address,
at least in the RTEMS stack.

It comes about through a function in if_ether.c called arp_ifinit, which
is called from ether_ioctl (in if_ethersubr.c) when an IP address is
set up. I assume this is to let the rest of the network know who you are
and maybe also to check whether there is an address conflict.
Post by Leon Pollak
arp who-has 192.168.50.57 tell 192.168.50.133
arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
arp who-has 192.168.50.57 tell 192.168.50.133 <====WHY IS THIS?
arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
OK, I think I might have sorted this one out as well!

How quickly do you send your UDP packets? Do you wait for a response
from the other end before sending your second packet? If not, I think
you are sending your second UDP packet before the first one has actually
gone.

Let me explain. All the work for ARP is performed in arpresolve (in
cpukit/libnetworking/netinet/if_ether.c)

arpresolve will check the current arp table and if a valid entry is
there, it will return the ether address and return code of 1. This
means that the higher level can send the packet. If the ether address
is not yet resolved, the packet (mbuf) will be held for transmission in
an arptable holding buffer and a return code of 0 will be returned. This
tells the higher level not to worry about this packet; it will be sent
by ARP once it receives the correct ether address.

The problem is that if you send another packet, *before* the arp
response is seen, that first packet is ditched, and replaced with the
new packet, and another ARP request sent. In arpresolve, look for
la->la_hold which is the last packet (mbuf) that you had provided. If
it exists it will be freed.
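
The one-deep hold behaviour described above can be sketched in a few lines of C. This is a simplified model, not the code from if_ether.c: the names sketch_arpresolve, sketch_arp_reply, struct arp_entry and struct packet are hypothetical, and only the la_hold idea (one held packet per unresolved entry, replaced by any newer packet) is taken from the description. Counting one ARP request per call is also a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stack's types; illustrative names only. */
struct packet { int id; };

struct arp_entry {
    int resolved;          /* non-zero once the Ethernet address is known */
    struct packet *hold;   /* at most ONE packet waits for the ARP reply */
    int dropped;           /* packets discarded while waiting */
    int requests;          /* ARP requests sent (one per call, simplified) */
};

/* Returns 1 if the caller may transmit now, 0 if the packet was parked
 * behind an outstanding ARP request. */
int sketch_arpresolve(struct arp_entry *e, struct packet *p)
{
    assert(e != NULL && p != NULL);
    if (e->resolved)
        return 1;
    if (e->hold != NULL)
        e->dropped++;      /* the real stack would free the old mbuf here */
    e->hold = p;           /* the newest packet replaces the held one */
    e->requests++;
    return 0;
}

/* An ARP reply arrives: mark the entry resolved and hand back the single
 * held packet (or NULL) so that it can be transmitted. */
struct packet *sketch_arp_reply(struct arp_entry *e)
{
    assert(e != NULL);
    struct packet *p = e->hold;
    e->resolved = 1;
    e->hold = NULL;
    return p;
}
```

With two sends before the reply arrives, only the second packet comes back from sketch_arp_reply and dropped becomes 1, which is the pattern one would expect if the sends were back to back.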

The simple solution to your problem is to wait briefly after your
first UDP packet (for the other end to provide an ARP response) and you
should be fine.

Remember, UDP is not a guaranteed packet mechanism. The stack does not
guarantee to send any UDP packet and it is up to you (the application) to
ensure that the packet arrives at the other end. In this case, you
would be looking for some sort of response from the other end before
sending your next packet, or some other mechanism of recovery.
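
A minimal sketch of such an application-level recovery, assuming a simple "send, wait for a reply, resend on timeout" scheme. Everything here is hypothetical: send_with_retry, the two callback types and the fake_link test double are illustrative names, and a real application would plug in its own sendto()/recvfrom() wrappers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport hooks supplied by the application. */
typedef int (*send_fn)(void *ctx, const void *buf, size_t len); /* 0 on success */
typedef int (*wait_ack_fn)(void *ctx, int timeout_ms);          /* non-zero if a reply was seen */

/* Send a datagram and wait for a reply; resend up to max_tries times.
 * Returns the attempt number that succeeded, or -1 if all attempts failed. */
int send_with_retry(send_fn do_send, wait_ack_fn wait_ack, void *ctx,
                    const void *buf, size_t len,
                    int max_tries, int timeout_ms)
{
    assert(do_send != NULL && wait_ack != NULL);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (do_send(ctx, buf, len) == 0 && wait_ack(ctx, timeout_ms))
            return attempt;
    }
    return -1;
}

/* Test double: a link that silently loses the first `lose` datagrams. */
struct fake_link { int lose; int sent; int delivered; int last_ok; };

int fake_send(void *ctx, const void *buf, size_t len)
{
    struct fake_link *l = ctx;
    (void)buf;
    (void)len;
    l->sent++;
    if (l->lose > 0) {
        l->lose--;
        l->last_ok = 0;    /* datagram vanished, no reply will come */
    } else {
        l->delivered++;
        l->last_ok = 1;
    }
    return 0;              /* the sender cannot tell the difference */
}

int fake_wait_ack(void *ctx, int timeout_ms)
{
    struct fake_link *l = ctx;
    (void)timeout_ms;
    return l->last_ok;
}
```

The point of the design is that the sender never relies on the return value of the send call alone, since a lost datagram still looks like a successful send.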

I hope this helps.

regards,

Ian Caddy
--
Ian Caddy
Goanna Technologies Pty Ltd
+61 8 9444 2634
Leon Pollak
2008-01-31 07:30:32 UTC
Hello, Ian.

Thank you for your detailed explanation - it is very useful to know the way
the stack works. It just seems rather strange that packets are not queued,
and that the backlog is only one packet deep...

But, unfortunately, this is not the problem in my case - I have a 3s pause
between sequential sends.
I produced the printout with time information:

21:10:01.552784 arp who-has 192.168.50.133 tell 192.168.50.133
21:10:01.655041 arp who-has 192.168.50.57 tell 192.168.50.133
21:10:01.655067 arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
21:10:04.656151 arp who-has 192.168.50.57 tell 192.168.50.133
21:10:04.656174 arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
21:10:04.657101 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data:11111111111111111111 (must be 000000 !!! - packet lost)
21:10:07.656255 IP (tos 0x0, ttl 64, id 3, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data:22222222222222222222

For some reason, whenever the arp reply is missed, the first packet is lost as well!

If I pause for 7 s before opening the socket, then:

21:18:25.784785 arp who-has 192.168.50.133 tell 192.168.50.133
21:18:32.887687 arp who-has 192.168.50.57 tell 192.168.50.133
21:18:32.887716 arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
21:18:32.888663 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data: 0000000000000 !!!
21:18:35.888739 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data: 1111111111111111
21:18:38.889054 IP (tos 0x0, ttl 64, id 3, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data: 2222222222222222222

But this does not always work either:
21:18:04.378562 arp who-has 192.168.50.133 tell 192.168.50.133
21:18:11.481686 arp who-has 192.168.50.57 tell 192.168.50.133
21:18:11.481716 arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
21:18:14.482711 arp who-has 192.168.50.57 tell 192.168.50.133
21:18:14.482739 arp reply 192.168.50.57 is-at 00:19:db:ed:16:94
21:18:14.483658 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data: 1111111111111111
21:18:17.483039 IP (tos 0x0, ttl 64, id 3, offset 0, flags [none], proto UDP
(17), length 74) 192.168.50.133.33100 > 192.168.50.57.33100: UDP, length 46
Data: 2222222222222222222
--
            Dr.Leon M.Pollak
                Director
       PLR Information Systems Ltd.
Tel.:+972-98657670  |  POB 8130, H'Aomanut 9,
Fax.:+972-98657621  |  Poleg Industrial Zone,
Mob.:+972-544739246 |  Netanya, 42160, Israel.
Thomas Doerfler
2008-01-31 07:44:54 UTC
Leon,
Post by Leon Pollak
Hello, Ian.
Thank you for your detailed explanation - it is very useful to know the way
the stack works. Just it seems rather strange that packets are not queued,
but have only one depth back log...
But, to my pity, this is not the problem in my case - I have 3s pause between
sequential sends.
Apparently we were looking in the wrong direction. Maybe you do not have
a transmit problem, but a receive problem. Your system sends an ARP
request and an ARP response is sent back, but since your system repeats the
ARP request for the next packet, it evidently did not receive the ARP
response properly.

Look at the networking statistics once again, after a "good" and after a
"bad" case. Maybe in the bad case one ARP packet was lost? Then the
question would be whether it was already lost at the driver level, or whether
it was somehow altered so that the network stack does not recognize it
as an ARP packet.

By the way (I like shooting arrows in the dark), how did you ensure that
the cache and the networking SDMA work coherently? (Think about setting the
"GBL" bit in the "FCRx" function code registers of the parameter RAM.)

wkr,
Thomas.
--
--------------------------------------------
embedded brains GmbH
Thomas Doerfler Obere Lagerstr. 30
D-82178 Puchheim Germany
Tel. : +49-89-18 90 80 79-2
Fax : +49-89-18 90 80 79-9
email: ***@embedded-brains.de
PGP public key available on request

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
Leon Pollak
2008-01-31 18:54:21 UTC
Thomas, hello.

Sorry to turn to you alone - I think that nobody understands this like you,
if at all.

Your last letter gave me a hint, and I spent the whole day studying and
running experiments. The results are:

1. After passing the initial problem, the driver (and the rest) works fine - I
did very extensive and heavy tests both on speed and load. So, the problem is
only in the first step.

2. The initial problem looks strange - neither the driver nor the controller
sees a problem. I mean that when the driver says there were no interrupts,
the controller does not report any packets either. This holds for both rx
and tx, as I discovered that tx packets also do not leave the unit!

3. Now, all this leads me to the following: as I know nothing about PHYs, I
would like to ask you - is it possible that after reset the PHY is not ready,
and that some i/o activity helps it get into the working state?

I looked into the manual of our "National 8349" PHY and saw that it comes up in
the default auto-negotiation state. And I can see on my switch that it detects
it as a 100BaseT line (and when I connected it to the 10BaseT hub, it also
auto-configured itself). But maybe this is insufficient and some further
actions must be done?

As it is my first time with 100BaseT and PHY, I am lost...

Thank you very much for your willingness to help...

Best Regards

Leon
Ian Caddy
2008-02-01 01:07:05 UTC
Hi Leon,
Post by Leon Pollak
Thomas, hello.
Sorry to turn to you solely - I think that nobody understands this like you,
if at all.
Your last letter gave me the hint and all the day I studied and made
1. After passing the initial problem, the driver (and the rest) works fine - I
did very extensive and heavy tests both on speed and load. So, the problem is
only in the first step.
Yes, I agree with Thomas: it looks like you are sometimes losing that first
receive packet containing the ARP response.
Post by Leon Pollak
2. The initial problem looks strange - neither the driver nor the controller
sees a problem. I mean that when the driver says there were no interrupts,
the controller does not report any packets either. This holds for both rx
and tx, as I discovered that tx packets also do not leave the unit!
Sorry, I didn't quite understand this bit. Are you saying that at the
start there are no tx packets as well? What about the ARP request that
was sent out? We need to find out what happens to the ARP response.

Is your Ethernet MAC driver your own or are you using a standard one?

If it is your own, are you sure you are initialising the Rx portion at the
same time as the Tx portion?

Is there a way to breakpoint your system on the first receive packet
that you get and trace it up into the stack? This would show whether
the ARP response was at least getting into your firmware and not being
lost somewhere else. (Due to hardware not inited or something similar).
Post by Leon Pollak
3. Now, all this leads me to the following: as I know nothing about PHYs, I
would like to ask you - is it possible that after reset the PHY is not ready,
and that some i/o activity helps it get into the working state?
I looked into the manual of our "National 8349" PHY and saw that it comes up in
the default auto-negotiation state. And I can see on my switch that it detects
it as a 100BaseT line (and when I connected it to the 10BaseT hub, it also
auto-configured itself). But maybe this is insufficient and some further
actions must be done?
I can't see it being the PHY. They will not transmit or receive before
they have finished the auto-negotiation process. As far as I know they
don't bring up the transmit and receive paths differently. If you can
transmit out of a PHY (your ARP request) then I am sure the PHY would
also be able to receive.

regards,

Ian Caddy
--
Ian Caddy
Goanna Technologies Pty Ltd
+61 8 9444 2634
Leon Pollak
2008-02-04 11:10:47 UTC
Thanks, Ian and Thomas.
Below are my thoughts and test results.
Post by Ian Caddy
Post by Leon Pollak
2. The initial problem looks strange - neither the driver nor the controller
sees a problem. I mean that when the driver says there were no
interrupts, the controller does not report any packets either. This
holds for both rx and tx, as I discovered that tx packets also do not
leave the unit!
Sorry, I didn't quite understand this bit. Are you saying that at the
start there are no tx packets as well? What about the ARP request that
was sent out? We need to find out what happens to the ARP response.
I noticed that, very rarely, a tx packet is not shown by tcpdump either.
Post by Ian Caddy
Is your Ethernet MAC driver your own or are you using a standard one?
I can say that it is my own. But it strictly tries to follow the mbx860
driver.
Post by Ian Caddy
If your own are you sure you are initing the Rx portion at the same time
as the Tx portion?
Even earlier.
Post by Ian Caddy
Is there a way to breakpoint your system on the first receive packet
that you get and trace it up into the stack? This would show whether
the ARP response was at least getting into your firmware and not being
lost somewhere else. (Due to hardware not inited or something similar).
Yes, there is. And I did this and saw that the first in-packet is passed and
processed as required. BUT THE FIRST RECEIVED IS NOT THE FIRST SENT!
The sniffer shows that there was a packet sent to the box, but there are no
traces of this event in the box - neither HW nor SW say they saw it.
Post by Ian Caddy
Post by Leon Pollak
3. Now, all this leads me to the following: as I know nothing about PHYs,
I would like to ask you - is it possible that after reset the PHY is not
ready, and that some i/o activity helps it get into the working
state?
I looked into the manual of our "National 8349" PHY and saw that it comes up
in the default auto-negotiation state. And I can see on my switch that it
detects it as a 100BaseT line (and when I connected it to the 10BaseT hub, it
also auto-configured itself). But maybe this is insufficient and
some further actions must be done?
I can't see it being the PHY. They will not transmit or receive before
they have finished the auto-negotiation process. As far as I know they
don't bring up the transmit and receive paths differently. If you can
transmit out of a PHY (your ARP request) then I am sure the PHY would
also be able to receive.
I am not a specialist in this.
But what I see is that, without any indication or observable cause, the first
rx frame NEVER arrives after a HW reset. And the PHYs are connected to
the reset line.

In contrast, if I simply rerun the code from the boot point by setting the
program counter to the reset value, I NEVER lose a packet!

--
Post by Ian Caddy
Normally the PHY should take some seconds (2-3) to negotiate the
transfer parameters (10/100MBit, Full/Halfduplex...) with the switch.
When does the PHY come out of reset? Together with the processor reset?
Yes, they both are connected to the same reset line.
Post by Ian Caddy
I agree with Ian, that the auto-negotiation time is not a hot candidate
for the problem, because I would expect that the TX side will work only,
when the RX side is ready, and obviously you can transmit properly.
Almost. Again, I observed the case when I did not see TX packet exiting,
although I can not reproduce this often.
Post by Ian Caddy
I have discussed things with my colleague Peter. He gave the hint that
maybe the receive interrupt is not handled properly.
Rx and Tx interrupts are handled in the same interrupt handler. How did
you organize it? Did you properly implement the case where you enter
the interrupt handler (due to the packet transmitted) and you have a
transmit event AND a receive event at the same time? You should check this.
The scheme follows exactly the driver for mbx8260.
The ISR checks both interrupts independently and sequentially, starting with
the RX interrupt.
I also checked that there are no pending/lost interrupts - this can be traced
in the buffer descriptors too.
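
The "both events in one interrupt" case discussed above can be sketched as follows. This is an illustrative model only, not the mbx8260 driver: the event register is simulated in memory, and the bit values EV_RXF and EV_TXB as well as struct sketch_dev and sketch_isr are hypothetical names. The essential point is that each event bit is tested with its own `if`, never with `else if`, and that exactly the bits that were read are acknowledged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bits; real values depend on the controller. */
#define EV_RXF 0x0008u   /* a frame was received */
#define EV_TXB 0x0002u   /* a buffer was transmitted */

struct sketch_dev {
    volatile uint32_t event;  /* simulated event register */
    int rx_wakeups;           /* times the receive task was woken */
    int tx_wakeups;           /* times the transmit task was woken */
};

void sketch_isr(struct sketch_dev *d)
{
    assert(d != NULL);

    /* Take a snapshot and acknowledge exactly the bits that were seen,
     * so an event raised after the snapshot is not cleared by accident. */
    uint32_t ev = d->event;
    d->event &= ~ev;

    /* Independent tests: both branches may run in the same invocation. */
    if (ev & EV_RXF)
        d->rx_wakeups++;
    if (ev & EV_TXB)
        d->tx_wakeups++;
}
```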
Post by Ian Caddy
Apart from this hint: maybe you can check/eliminate part of the
communication path, by directly connecting the PC and the RTEMS system
using a TP crossover cable.
I think this was the most productive idea..:)
Reworking the cables was a problem, but I connected my PC and the box via
an old 10BaseT hub and, so far, I am not able to reproduce the problem -
nothing is lost!



So, aren't these the PHYs?
---------------------------

Many thanks to both of you for your help.
--
Leon
Joel Sherrill
2008-02-04 16:44:20 UTC
Post by Leon Pollak
Post by Thomas Doerfler
Apart from this hint: maybe you can check/eliminate part of the
communication path, by directly connecting the PC and the RTEMS system
using a TP crossover cable.
I think this was the most productive idea..:)
To rework the cables was the problem, but I connected my PC and the box via
the old 10BaseT HUB and, meanwhile, I am not able to reproduce the problem -
nothing is lost!
Great. Finally something to hang onto. :)

If you use the same test network setup but replace the RTEMS
box with another PC, do you see the same behavior?

Do you have another brand of switch?

I am wondering if the switch is eating a packet that a hub doesn't.
We have been focusing on your driver and it may not be that.


--joel
_______________________________________________
rtems-users mailing list
http://rtems.rtems.org/mailman/listinfo/rtems-users
--
Joel Sherrill, Ph.D. Director of Research & Development
joel.sherrill-***@public.gmane.org On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985
Thomas Doerfler
2008-02-04 19:16:10 UTC
Leon,
Post by Leon Pollak
Post by Thomas Doerfler
Apart from this hint: maybe you can check/eliminate part of the
communication path, by directly connecting the PC and the RTEMS system
using a TP crossover cable.
I think this was the most productive idea..:)
To rework the cables was the problem, but I connected my PC and the box via
the old 10BaseT HUB and, meanwhile, I am not able to reproduce the problem -
nothing is lost!
When changing from 100Base-T Switch to a 10MBit Hub, you change several
things:
- The bit rate is slower
- All connections work in half-duplex instead of full duplex
- This also means you MIGHT get collisions (which is impossible, I
think, in full duplex)
- The hub is a really dumb "repeater", which copies an incoming bitstream
to all ports in real time
- A switch learns and forgets:

It learns which MAC addresses are reachable at which port (which source
MAC addresses arrive on a certain port), and directs any packet only to
the port which corresponds to the destination MAC address. So, before it
can forward a packet to your system, your system must have sent a packet
with its own MAC address in the source MAC address field. Only then does the
switch know where it is connected.

And the switch forgets: when it has not seen a certain MAC address
for a while (about 15 minutes, I guess), it erases that MAC
address from its forwarding table.
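
The learn-and-forget behaviour can be modelled with a small table. This is only a sketch of the general idea, not of any particular switch: the names sk_switch, sk_entry and sk_forward, the table size and the aging time are all assumptions chosen for illustration, and a frame for an unknown destination is simply reported as "flood to all ports".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_TABLE_SIZE 8      /* illustrative; real tables are much larger */
#define SK_FLOOD      (-1)   /* destination unknown: send to every port */

struct sk_entry {
    uint8_t mac[6];
    int port;
    long last_seen;          /* time the address was last seen as a source */
    int used;
};

struct sk_switch {
    struct sk_entry table[SK_TABLE_SIZE];
    long max_age;            /* aging time, in the same units as `now` */
};

static struct sk_entry *sk_find(struct sk_switch *sw, const uint8_t *mac)
{
    for (int i = 0; i < SK_TABLE_SIZE; i++)
        if (sw->table[i].used && memcmp(sw->table[i].mac, mac, 6) == 0)
            return &sw->table[i];
    return NULL;
}

/* Process one frame; returns the output port, or SK_FLOOD. */
int sk_forward(struct sk_switch *sw, const uint8_t *src, const uint8_t *dst,
               int in_port, long now)
{
    assert(sw != NULL && src != NULL && dst != NULL);

    /* Forget: drop entries that have not been refreshed recently. */
    for (int i = 0; i < SK_TABLE_SIZE; i++)
        if (sw->table[i].used && now - sw->table[i].last_seen > sw->max_age)
            sw->table[i].used = 0;

    /* Learn: remember (or refresh) the port the source address lives on. */
    struct sk_entry *e = sk_find(sw, src);
    for (int i = 0; e == NULL && i < SK_TABLE_SIZE; i++)
        if (!sw->table[i].used)
            e = &sw->table[i];
    if (e != NULL) {
        memcpy(e->mac, src, 6);
        e->port = in_port;
        e->last_seen = now;
        e->used = 1;
    }

    /* Forward: a known destination goes to one port, anything else floods. */
    struct sk_entry *d = sk_find(sw, dst);
    return d != NULL ? d->port : SK_FLOOD;
}
```

In this model a single broadcast from a station is enough for the table to learn its port, and a unicast frame to an unknown or aged-out address is flooded rather than dropped.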

Can you check whether the packet loss only occurs after about 15 minutes
without traffic from your system?

In that case it would be worth putting in a different (higher-quality)
switch, which does the learning faster. Actually I have never heard of a
switch learning too slowly, but Ethernet and its protocols are built on the
assumption that packets may get lost, so the effect would not be obvious in
most cases.

wkr,
Thomas.
--
--------------------------------------------
embedded brains GmbH
Thomas Doerfler Obere Lagerstr. 30
D-82178 Puchheim Germany
Tel. : +49-89-18 90 80 79-2
Fax : +49-89-18 90 80 79-9
email: Thomas.Doerfler-L1vi/***@public.gmane.org
PGP public key available on request

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
Leon Pollak
2008-02-04 19:27:21 UTC
Post by Thomas Doerfler
Leon,
Post by Leon Pollak
Post by Thomas Doerfler
Apart from this hint: maybe you can check/eliminate part of the
communication path, by directly connecting the PC and the RTEMS system
using a TP crossover cable.
I think this was the most productive idea..:)
To rework the cables was the problem, but I connected my PC and the box
via the old 10BaseT HUB and, meanwhile, I am not able to reproduce the
problem - nothing is lost!
When changing from 100Base-T Switch to a 10MBit Hub, you change several
- The bit rate is slower
- All connections work in half-duplex instead of full duplex
- This also means you MIGHT get collisions (which is impossible, I
think, in full duplex)
- The Hub is a really dumb "repeater", which multiplies a bitstream
coming in to all ports in realtime
It learns, which MAC addresses are available at which port (which source
MAC addresses come from a certain port), and directs any packet only to
the port which corresponds to the destination MAC address. So, before it
can route a packet to your system, your system must have sent a packet
with its own MAC address in the source MAC address field. Only then the
switch knows where it is connected.
And the switch forgets: When it has not received a certain MAC address
for a while (about 15 minutes, I guess), then it erases the proper MAC
address from its routing table.
Can you check whether the packet loss only occurs after about 15 minutes
of non-traffic with your system?
No! It is always lost after I do a HW reset (via BDM), and never when I
restart the program by jumping to the boot vector (after it was loaded after
reset via BDM and run once). All these runs are done within 2-3 minutes, not more.
Post by Thomas Doerfler
In that case it would be worth putting in a different (higher-quality)
switch, which does the learning faster. Actually I have never heard of a
switch learning too slowly, but Ethernet and its protocols rely on the
fact that packets may get lost, so the effect would not be obvious in
most cases.
Well, the switch theory is clear.
But isn't the first arp request enough for the switch to learn that the RTEMS
box has that MAC address? Actually, even the first arp "box to itself" should
teach it where the box is, no?
Besides, IMHO, this cannot explain the loss of an INCOMING packet after two
outgoing ones were sent and I saw them on my PC via the same switch.

Thanks!
--
Leon
Ian Caddy
2008-02-05 00:51:38 UTC
Hi Leon,

Looks like we are getting closer now... ;-)
From your debugging it looks like it is something to do with the PHY
interaction with the particular switch.

Are you using an external PHY? If so, do you have the part number?

Does your firmware initialise the PHY at all or do you expect it to come
up in the correct state?

Is there an interrupt line from the PHY? If so, do you service this
interrupt line?

It is possible that something is happening with the PHY on the
completion of the negotiation or reception of the first packet, and it
may want to tell the MAC about it. I am just clutching at straws here
though... ;-)

regards,

Ian Caddy
--
Ian Caddy
Goanna Technologies Pty Ltd
+61 8 9444 2634
Leon Pollak
2008-02-05 09:47:35 UTC
Post by Ian Caddy
Hi Leon,
Looks like we are getting closer now... ;-)
I am sure that there are no unsolvable problems in the world...:)
Post by Ian Caddy
Post by Leon Pollak
Well, the switch theory is clear.
But isn't the first arp request enough for the switch to learn that the RTEMS
box has that MAC address? Actually, even the first arp "box to itself"
should teach it where the box is, no?
Besides, IMHO, this can not explain the loss of INCOMING packet after two
outgoing were sent and I saw them in my PC via the same switch.
From your debugging it looks like it is something to do with the PHY
interaction with the particular switch.
Hmm... Maybe... Why do you think so?
Post by Ian Caddy
Are you using an external PHY?
Not sure what you mean... The PHY is external to the MPC8247 CPU, but it is
on-board...:)
Post by Ian Caddy
If so, do you have the part number?
DP83849ID from National.
Post by Ian Caddy
Does your firmware initialise the PHY at all or do you expect it to come
up in the correct state?
The PHY does auto-negotiation. The spec says that this may take 2-3s after
reset. I give it about 14-16s as minimum and it is still not enough!?
Post by Ian Caddy
Is there an interrupt line from the PHY? If so, do you service this
interrupt line?
No, there is not, AFAIK.
Post by Ian Caddy
It is possible that something is happening with the PHY on the
completion of the negotiation or reception of the first packet, and it
may want to tell the MAC about it. I am just clutching at straws here
though... ;-)
Yes, I understand.
I read through the spec again and did not find anything suspicious. But I am
already old enough to know that I read not what is written, but what I
expect (want) to read...:)
Post by Ian Caddy
regards,
Ian Caddy
Thanks for your willingness to help.
--
Leon
Ian Caddy
2008-02-06 00:50:46 UTC
Hi Leon,
Post by Leon Pollak
Post by Ian Caddy
From your debugging it looks like it is something to do with the PHY
interaction with the particular switch.
Hmm... Maybe... Why do you think so?
What I was trying to say here was that when you connect to a 10 Mbit hub,
you have no problems, but when you connect through your switch you are
missing the first receive. Therefore it looks like an interaction problem
between the switch and the PHY is causing the first packet to be lost.
Post by Leon Pollak
Post by Ian Caddy
Are you using an external PHY?
Not sure what you mean... The PHY is external to the MPC8247 CPU, but it is
on-board...:)
By external, I meant to the CPU/MAC, this is normally the case, but you
never know what wonderful new processors are coming out these days. You
answered my question fine.
Post by Leon Pollak
Post by Ian Caddy
If so, do you have the part number?
DP83849ID from National.
I had a quick look at the datasheet and there are a couple of things to
note. On the hardware side, do you connect:

FX_EN
AN_EN
AN1
AN0

to anything, or are they all left floating? If they are floating you
will get the hardware default, which is auto-negotiation. This is just to
make sure the auto-negotiation will go smoothly.
Post by Leon Pollak
Post by Ian Caddy
Does your firmware initialise the PHY at all or do you expect it to come
up in the correct state?
The PHY does auto-negotiation. The spec says that this may take 2-3 s after
reset. I give it at least 14-16 s and it is still not enough!?
I noticed that your code doesn't seem to talk to or initialise the PHY
anywhere.

Can you talk to your PHY? (You should be able to, through the MII.) There
are a couple of really useful registers in the PHY, such as BMCR, BMSR and
PHYSTS, which will give you most of the information you want out of the
PHY. If it is possible, could you get a copy of those values before
you transmit your first packet, and then maybe again after you have
transmitted the first packet, to see what is going on with the PHY?
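
A minimal sketch of how such a before/after snapshot could be decoded, assuming the two raw register values have already been read over the MII by whatever routine the driver has. Only BMCR and BMSR are decoded, because their bit positions are fixed by IEEE 802.3 clause 22; PHYSTS is vendor-specific and should be decoded from the datasheet. The names `phy_state`, `phy_decode` and `phy_print` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit masks from IEEE 802.3 clause 22 (register 0 = BMCR, register 1 = BMSR). */
#define BMCR_RESET        0x8000u
#define BMCR_AN_ENABLE    0x1000u
#define BMCR_ISOLATE      0x0400u

#define BMSR_AN_COMPLETE  0x0020u
#define BMSR_REMOTE_FAULT 0x0010u
#define BMSR_LINK_STATUS  0x0004u

/* Hypothetical container for the decoded fields; not part of any driver. */
struct phy_state {
    int in_reset;      /* BMCR: software reset still in progress */
    int an_enabled;    /* BMCR: auto-negotiation enabled */
    int isolated;      /* BMCR: PHY electrically isolated from the MII */
    int an_complete;   /* BMSR: auto-negotiation finished */
    int remote_fault;  /* BMSR: link partner signalled a fault */
    int link_up;       /* BMSR: link status (latches low until read) */
};

/* Decode raw BMCR/BMSR values into individual flags. */
void phy_decode(uint16_t bmcr, uint16_t bmsr, struct phy_state *out)
{
    assert(out != NULL);
    out->in_reset     = (bmcr & BMCR_RESET) != 0;
    out->an_enabled   = (bmcr & BMCR_AN_ENABLE) != 0;
    out->isolated     = (bmcr & BMCR_ISOLATE) != 0;
    out->an_complete  = (bmsr & BMSR_AN_COMPLETE) != 0;
    out->remote_fault = (bmsr & BMSR_REMOTE_FAULT) != 0;
    out->link_up      = (bmsr & BMSR_LINK_STATUS) != 0;
}

/* Print one labelled snapshot, e.g. "before first TX" and "after first TX". */
void phy_print(const char *label, const struct phy_state *s)
{
    printf("%s: reset=%d an_enabled=%d isolated=%d "
           "an_complete=%d remote_fault=%d link=%d\n",
           label, s->in_reset, s->an_enabled, s->isolated,
           s->an_complete, s->remote_fault, s->link_up);
}
```

Calling `phy_decode` and `phy_print` once before the first transmit and once after it would give two lines that can be compared directly.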
Post by Leon Pollak
Post by Ian Caddy
Is there an interrupt line from the PHY? If so, do you service this
interrupt line?
No, there is not, AFAIK.
That is not as good, but it should still be possible to read PHYSTS;
hence the test mentioned above, to see exactly what the PHY is doing
around the time of the problem.

regards,

Ian Caddy
--
Ian Caddy
Goanna Technologies Pty Ltd
+61 8 9444 2634
Chris Johns
2008-02-06 05:09:53 UTC
Post by Leon Pollak
Post by Ian Caddy
Does your firmware initialise the PHY at all or do you expect it to come
up in the correct state?
The PHY does auto-negotiation. The spec says that this may take 2-3 s after
reset. I give it at least 14-16 s and it is still not enough!?
What does the PHY say is happening? It is the simplest way to rule the PHY in
or out.

Experience with different PHY devices and switches has taught me to follow the
IEEE standard for the MII type PHY devices and to monitor the status registers
and act on the results. I am not saying this is your specific problem, rather
I would not expect the device to act automatically for you in all cases.

You should also monitor the PHY during operation. What happens if someone
pulls a cable, turns the switch off, or moves the cable to a port that is half
duplex while you are configured for full duplex?

Implementing monitoring and support is more work but the results outweigh the
time involved.
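
As a rough illustration of that kind of monitoring, here is a minimal sketch of a link-state poller. It assumes a clause 22 PHY, where the BMSR link bit latches low until the register is read, so the register is read twice per poll. The read hook `mii_read_fn`, the `link_monitor` structure and the scripted stand-in PHY are all hypothetical names for this example; a real driver would plug in its own MDIO routine and call the poll function periodically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BMSR         0x01     /* IEEE 802.3 clause 22 status register */
#define BMSR_LINK_STATUS 0x0004u  /* latches low until the register is read */

/* Hypothetical MDIO read hook; a real driver supplies its own routine. */
typedef uint16_t (*mii_read_fn)(void *ctx, int phy_addr, int reg);

enum link_event { LINK_UNCHANGED, LINK_WENT_UP, LINK_WENT_DOWN, LINK_BOUNCED };

struct link_monitor {
    mii_read_fn read;
    void *ctx;
    int phy_addr;
    int link_up;              /* state seen by the previous poll */
};

void link_monitor_init(struct link_monitor *m, mii_read_fn read,
                       void *ctx, int phy_addr)
{
    assert(m != NULL && read != NULL);
    m->read = read;
    m->ctx = ctx;
    m->phy_addr = phy_addr;
    m->link_up = 0;           /* assume "down" until the first poll */
}

enum link_event link_monitor_poll(struct link_monitor *m)
{
    /* The first read returns the latched value (0 if the link dropped since
       the previous read); the second read returns the current state. */
    int latched = (m->read(m->ctx, m->phy_addr, MII_BMSR) & BMSR_LINK_STATUS) != 0;
    int now     = (m->read(m->ctx, m->phy_addr, MII_BMSR) & BMSR_LINK_STATUS) != 0;
    int was     = m->link_up;

    m->link_up = now;
    if (was && now && !latched) return LINK_BOUNCED;  /* dropped and came back */
    if (!was && now) return LINK_WENT_UP;
    if (was && !now) return LINK_WENT_DOWN;
    return LINK_UNCHANGED;
}

/* Stand-in PHY for trying the monitor without hardware: replays a script.
   The script must contain at least one value. */
struct scripted_phy { const uint16_t *values; size_t count; size_t next; };

uint16_t scripted_phy_read(void *ctx, int phy_addr, int reg)
{
    struct scripted_phy *p = ctx;
    (void)phy_addr;
    (void)reg;
    if (p->next + 1 < p->count) return p->values[p->next++];
    return p->values[p->count - 1];   /* repeat the last value forever */
}
```

On `LINK_WENT_UP` or `LINK_BOUNCED` the driver would typically re-read the negotiated speed and duplex and reprogram the MAC to match; that part depends on the controller and is left out here.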

Regards
Chris

Thomas Doerfler
2008-02-05 08:03:58 UTC
Leon,
Post by Leon Pollak
Well, the switch theory is clear.
But isn't the first ARP request enough for the switch to learn that the RTEMS box
has that MAC address? Actually, even the first ARP "box to itself" should
teach it where the box is, no?
Besides, IMHO, this cannot explain the loss of an INCOMING packet after two
outgoing packets were sent and I saw them on my PC via the same switch.
Ok, this really points to the hardware reset as the "source" of the
problem, either inside the PHY or the MPC8260.

- Once again: after reset, does the PHY have at least 5 seconds for
auto-negotiation before the Ethernet traffic starts? This should be
ensured by the download time anyway...

- Can you possibly send me the source of the ethernet driver just to
double check, that everything is all right there?
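
One way to take the guesswork out of the "5 seconds" question is to poll the PHY for auto-negotiation completion instead of relying on a fixed delay. The following is only a sketch under assumptions: the MDIO read hook, the delay hook and the fake PHY are hypothetical names, and only the BMSR bit position comes from IEEE 802.3 clause 22. It would not by itself explain a lost packet, but it shows whether the PHY considers negotiation finished before traffic starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BMSR         0x01     /* IEEE 802.3 clause 22 status register */
#define BMSR_AN_COMPLETE 0x0020u  /* auto-negotiation complete */

/* Hypothetical hooks; a real driver supplies its own MDIO read and sleep. */
typedef uint16_t (*mii_read_fn)(void *ctx, int phy_addr, int reg);
typedef void (*delay_ms_fn)(unsigned ms);

/* Poll until auto-negotiation completes.
   Returns the time waited in milliseconds, or -1 on timeout. */
int phy_wait_autoneg(mii_read_fn read, void *ctx, int phy_addr,
                     delay_ms_fn delay, unsigned timeout_ms, unsigned poll_ms)
{
    unsigned waited = 0;

    assert(read != NULL && delay != NULL && poll_ms > 0);
    for (;;) {
        uint16_t bmsr = read(ctx, phy_addr, MII_BMSR);
        if (bmsr & BMSR_AN_COMPLETE)
            return (int)waited;
        if (waited >= timeout_ms)
            return -1;
        delay(poll_ms);
        waited += poll_ms;
    }
}

/* Stand-in PHY for trying the loop without hardware: reports completion
   only after a configurable number of reads. */
struct fake_phy { unsigned reads_before_done; unsigned reads; };

uint16_t fake_phy_read(void *ctx, int phy_addr, int reg)
{
    struct fake_phy *p = ctx;
    (void)phy_addr;
    (void)reg;
    return (p->reads++ >= p->reads_before_done) ? 0x782D : 0x7809;
}

/* Stand-in delay that returns immediately. */
void fake_delay(unsigned ms)
{
    (void)ms;
}
```

Logging the returned wait time on every boot would show whether the bad cases correlate with slow or failed negotiation.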

wkr,
Thomas.
Post by Leon Pollak
Thanks!
--
--------------------------------------------
embedded brains GmbH
Thomas Doerfler Obere Lagerstr. 30
D-82178 Puchheim Germany
Tel. : +49-89-18 90 80 79-2
Fax : +49-89-18 90 80 79-9
email: ***@embedded-brains.de
PGP public key available on request

This message is not a business communication within the meaning of the EHUG.
Leon Pollak
2008-02-05 08:15:22 UTC
Post by Thomas Doerfler
Leon,
Post by Leon Pollak
Well, the switch theory is clear.
But isn't the first ARP request enough for the switch to learn that the RTEMS
box has that MAC address? Actually, even the first ARP "box to itself"
should teach it where the box is, no?
Besides, IMHO, this cannot explain the loss of an INCOMING packet after two
outgoing packets were sent and I saw them on my PC via the same switch.
Ok, this really points to the hardware reset as the "source" of the
problem, either inside the PHY or the MPC8260.
- Once again: after reset, does the PHY have at least 5 seconds for
auto-negotiation before the Ethernet traffic starts? This should be
ensured by the download time anyway...
Yes! Download time is about 6 s already. And then I put a delay of 7 s before
opening the socket and still there were (very rarely) lost RX packets.
Post by Thomas Doerfler
- Can you possibly send me the source of the ethernet driver just to
double check, that everything is all right there?
Of course, I can.
But I really feel very ashamed to make you so busy with my problems...:(

The code is attached.
Please note that the driver is intended to work with all 3 FCCs of the
MPC8260, but for now I am testing it with the first one only.
Post by Thomas Doerfler
wkr,
Thomas.
No words to express my thankfulness.
--
Leon
Thomas Doerfler
2008-02-01 10:07:59 UTC
Leon,
Post by Leon Pollak
Thomas, hello.
Sorry to turn to you solely - I think that nobody understands this like you,
if at all.
I am trying my best...
Post by Leon Pollak
Your last letter gave me the hint and all day I studied and made
1. After passing the initial problem, the driver (and the rest) works fine - I
did very extensive and heavy tests both on speed and load. So, the problem is
only in the first step.
2. The initial problem looks strange - neither the driver nor the controller
sees a problem. I mean that when the driver says that there were no interrupts,
the controller does not say that there were packets. This is true for both rx
and tx, as I discovered that tx packets also do not leave the unit!
3. Now, all this leads me to the following: as I know nothing about PHYs, I
would like to ask you - is it possible that after reset the PHY is not ready
and thus some I/O that is done helps it to come to the working state?
Normally the PHY should take some seconds (2-3) to negotiate the
transfer parameters (10/100 Mbit, full/half duplex...) with the switch.
Some questions:

When does the PHY come out of reset? Together with the processor reset?
Or do you need to pull a separate line by software? (This would mean that
the PHY comes out of reset later than the processor and auto-negotiation
might not yet be finished when you start the networking stack.)

I agree with Ian that the auto-negotiation time is not a likely cause
of the problem, because I would expect the TX side to work only
when the RX side is ready, and obviously you can transmit properly.

I have discussed things with my colleague Peter. He suggested that
maybe the receive interrupt is not handled properly.

Rx and Tx interrupts are handled in the same interrupt handler. How did
you organize it? Did you properly implement the case where you enter
the interrupt handler (due to the packet transmitted) and you have a
transmit event AND a receive event at the same time? You should check this.
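
To illustrate the case meant here, a minimal sketch of an event dispatcher that handles a receive and a transmit event arriving in the same interrupt. The event bit values and all names are hypothetical (the real FCC event register layout is in the MPC8260 manual), and the caller is assumed to have already read and acknowledged the event register. The point is only that the checks are independent `if` statements, not an `if`/`else` chain that would drop one of the two events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bits, NOT the real FCC event register layout. */
#define EV_RX_FRAME  0x1u
#define EV_TX_BUFFER 0x2u

/* Counters standing in for "wake the RX task" / "wake the TX task". */
struct eth_counters {
    unsigned rx_events;
    unsigned tx_events;
};

/* Dispatch every pending event from one snapshot of the event register.
   Returns the mask of event bits that were handled. */
uint32_t eth_handle_events(uint32_t pending, struct eth_counters *c)
{
    uint32_t handled = 0;

    assert(c != NULL);

    /* Independent checks: both bits may be set in the same interrupt. */
    if (pending & EV_RX_FRAME) {
        c->rx_events++;
        handled |= EV_RX_FRAME;
    }
    if (pending & EV_TX_BUFFER) {
        c->tx_events++;
        handled |= EV_TX_BUFFER;
    }
    return handled;
}
```

If the handler instead returned after servicing the transmit event, a receive event raised at the same moment could be acknowledged without ever being processed, which would look like a lost incoming packet.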

Apart from this hint: maybe you can check/eliminate part of the
communication path, by directly connecting the PC and the RTEMS system
using a TP crossover cable.

Well that's it, no more ideas from our side...

Good Luck!

Thomas.
Post by Leon Pollak
I looked into our "National 8349" PHY's manual and saw that it comes up in the
default auto-negotiation state. And I can see on my switch that it detects it
as a 100BaseT line (and when I connected to the 10BaseT hub, it also
auto-configured itself). But maybe this is insufficient for it and some
actions must be done?
As it is my first time with 100BaseT and PHY, I am lost...
Thank you very much for your willingness to help...
Best Regards
Leon
Post by Thomas Doerfler
Leon,
Post by Leon Pollak
Hello, Ian.
Thank you for your detailed explanation - it is very useful to know the
way the stack works. It just seems rather strange that packets are not
queued, but have a backlog only one deep...
But, unfortunately, this is not the problem in my case - I have a 3 s pause
between sequential sends.
Obviously we were looking in the wrong direction? Maybe you do not have
a transmit problem, but a receive problem. Your system sends an ARP
request, an ARP response is sent, but since your system repeats the ARP
request for the next packet, it did not receive the ARP response properly.
Look at the networking statistics once again, after a "good" and after a
"bad" case. Maybe in the bad case one ARP packet was lost? Now the
question would be whether it was lost already at driver level, or whether
it is somehow altered so that it is not recognized properly in the network
stack as an ARP packet.
By the way (I like shooting arrows in the dark), how did you ensure that
the cache and the networking SDMA work coherently (think about setting the
"GBL" bit in the "FCRx" function code registers of the parameter RAM)?
wkr,
Thomas.
--
--------------------------------------------
embedded brains GmbH
Thomas Doerfler Obere Lagerstr. 30
D-82178 Puchheim Germany
Tel. : +49-89-18 90 80 79-2
Fax : +49-89-18 90 80 79-9
email: ***@embedded-brains.de
PGP public key available on request

This message is not a business communication within the meaning of the EHUG.