Discussion:
[virt-tools-list] VirtViewer and TCP Keepalives
Colin Coe
2013-09-03 03:09:12 UTC
Hi all

I had a quick Google and grepped through the v0.5.7 source, but I don't see TCP
keepalives being used.

We're running RHEV 3.2 at a couple of sites and find that if we have a
SPICE session to a VM open but idle for around 15 minutes, it
disconnects. This is annoying our users.

Is there a way to stop these idle timeouts? Would implementing TCP
keepalives be a solution?

Thanks

CC
--
RHCE#805007969328369
Marc-André Lureau
2013-09-03 15:20:23 UTC
Since spice-gtk v0.10, sockets are set with keepalive.
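For reference, "sockets are set with keepalive" corresponds at the system-call level to one setsockopt(SOL_SOCKET, SO_KEEPALIVE) call per connection. spice-gtk reaches that call through GLib rather than directly. The sketch below uses Python's standard socket module only to illustrate the call; it is not spice-gtk code, and enable_keepalive is a hypothetical name. Note that this switch alone does not control when probes are sent. That timing comes from the kernel's keepalive timers.

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Enable TCP keepalive on sock and report whether the option is now set."""
    # System-call equivalent: setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, [1], 4)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back; any non-zero value means keepalive is enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("keepalive enabled:", enable_keepalive(s))
    finally:
        s.close()
```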

Are you testing with a Windows client or on RHEL?

Can you tell us more about the client and networking setup?

thanks
Colin Coe
2013-09-04 02:03:38 UTC
Hi

The bulk of our users run Windows XP (32-bit) clients accessing either Win
XP or Win 7 guests. I run a Fedora 18 desktop and the other sysadmin runs
Fedora 19. We all experience the timeout problem.

Between the client PCs (Windows and Fedora), we have a pair of firewall
clusters of different brands. I am the primary administrator of the "inner"
firewall cluster but have no visibility of the outer firewall cluster.

Basically we have:

Client PCs --- Outer FW cluster --- Inner FW cluster --- RHEV-M and RHEV-H (actually fat RHEL)

I have ensured that the FW cluster I maintain does not do timeouts on idle
connections. I can ssh into a RHEL VM, and leave the SSH session idle for
days without a disconnect. Once I get a SPICE session to a VM, if I don't
use it for about 15 minutes or so, it either freezes completely (the SPICE
session, not the VM) or disconnects with "unable to connect to the graphic
server"

Thanks

CC
Marc-André Lureau
2013-09-04 11:45:04 UTC
Have you checked that the VMs are not suspended or down? (virsh list --all)
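On a host with many guests, this check can be scripted. The sketch below is hypothetical: it parses the tabular output of virsh list --all (a header row, a dashed separator, then Id/Name/State columns, where stopped domains show "-" as the Id and a state such as "shut off" can contain a space) and returns every domain not reported as running. The sample text is an assumed layout; column widths vary between libvirt versions, which is why the parser splits on whitespace rather than fixed offsets.

```python
def non_running_domains(virsh_output: str) -> dict:
    """Map domain name -> state for each domain not reported as 'running'."""
    result = {}
    for raw in virsh_output.splitlines():
        line = raw.strip()
        # Skip blank lines, the "Id Name State" header and the dashed separator.
        if not line or line.startswith("Id") or set(line) == {"-"}:
            continue
        parts = line.split(None, 2)  # Id, Name, State (State may contain spaces)
        if len(parts) < 3:
            continue
        _dom_id, name, state = parts
        if state != "running":
            result[name] = state
    return result


SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 3     rhel6-vm                       running
 -     win7-vm                        shut off
 5     winxp-vm                       paused
"""

if __name__ == "__main__":
    print(non_running_domains(SAMPLE))  # {'win7-vm': 'shut off', 'winxp-vm': 'paused'}
```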
thanks
Colin Coe
2013-09-04 12:19:02 UTC
Hi

The VMs are fine and in a running state. If you close the SPICE session (kill
the window) and open it again, it works fine until it is left idle
again for about 15 minutes.

Thanks

CC
Marc-André Lureau
2013-09-04 15:23:10 UTC
OK, I tried leaving a connection open to a guest on a RHEL host, and I managed to capture some debugging output:

(remote-viewer:21306): GSpice-DEBUG: spice-channel.c:913 main-1:0: Read error Error receiving data: Connection timed out

(remote-viewer:21306): GSpice-CRITICAL **: recv hdr: Connection timed out
(remote-viewer:21306): GSpice-DEBUG: spice-channel.c:2136 got socket error: 25

Then remote-viewer closes.


Could you open a bug?

thanks

_______________________________________________
virt-tools-list mailing list
https://www.redhat.com/mailman/listinfo/virt-tools-list
Colin Coe
2013-09-04 23:02:12 UTC
Hi

On a somewhat related note, my colleague finds that about twice a day his
SPICE session to his Windows 7 guest locks, as if "Windows Key"+L had been
pressed. He is using a Fedora 19 client.

Is there any debugging we can do to help track this down?

Thanks

CC
Marc-André Lureau
2013-09-04 23:23:18 UTC
Is this VM managed by the RHEV-M agents and services?

There is nothing obvious from Spice that explains why Windows would lock automatically.
Marc-André Lureau
2013-09-04 23:26:10 UTC
Actually, I can't reproduce it: a few minutes later I realized that my Wi-Fi card is really crappy and hangs randomly (I couldn't even connect after that error). I have switched devices and left a client idle for over 2 hours now, and there is no error so far.
Colin Coe
2013-09-04 23:50:02 UTC
Hmm, OK

On the Windows locking problem, only my colleague has reported this and
yes, the VM does have the latest RHEV Tools installed (3.2-12).

On the timeout, this occurs for both Windows (XP, 7, 8, 2008R2, 2012) and Linux
(RHEL 5/6) VMs. All we need to do to reproduce the problem is leave the
SPICE session alone for 15 minutes or so. How should we go about debugging
this?

Thanks

CC
Marc-André Lureau
2013-09-05 01:07:15 UTC
I am trying with a current RHEL development host and guest, connecting from an F19 client over local Wi-Fi, and can't reproduce it so far.

We need to narrow the problem down. Can you reproduce it with a similar setup? Have you tried with an F19 client? I don't think that should make any difference.

Can you get to a point where you don't see the problem? (For example, on a local network, or perhaps even on localhost.)

Also, are the RHEL guests set up with the RHEV-M tools? (I have never installed those.) Could you try with bare VMs (without any RHEV-M or SPICE agents, drivers, etc.)?

thanks
Colin Coe
2013-09-05 05:46:01 UTC
All our RHEL guests (VMs) have the rhevm-guest-agent RPM installed and the
related service 'ovirt-guest-agent' running.

I run an F18 client and my colleague runs an F19 client; we both see the
problem. We don't have any Fedora guests.

We have no way of running SPICE sessions from inside the VLAN that our RHEV-H
nodes are in; however, I set up a SPICE session from a machine between the inner
and outer firewalls. This session lasted two hours before it was dropped with
"TCP packet out of state: First packet isn't SYN". I'm pretty sure this
isn't virt-viewer's fault, though, as we see this from time to time when the
firewall cluster changes active node.

I've asked the group that looks after the other firewall to investigate.

Thanks

CC
Colin Coe
2013-09-12 03:28:32 UTC
Hi

I've been talking to the group who looks after the other firewall cluster,
and they say they can find no reason for this behaviour.

After starting a SPICE session to a VM on host 172.x.y.z, I ran Wireshark
on my F18 desktop with the filter "ip.host == 172.x.y.z and
(tcp.analysis.keep_alive or tcp.analysis.keep_alive_ack)" and got no
results.

Running ps -ef | grep vmname showed SPICE on ports 5924 and 5925. The filter
"ip.host == 172.x.y.z and (tcp.port == 5924 or tcp.port == 5925)" shows tens
of thousands of rows.
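The port lookup can also be done programmatically. In qemu's -spice option, port= is the plain-text SPICE port and tls-port= the TLS one. The sketch below is hypothetical: spice_ports() scans a qemu-kvm command line (as printed by ps) for -spice and returns those two values. The sample command line is abbreviated and invented for illustration, not copied from the hosts in this thread.

```python
import shlex


def spice_ports(cmdline: str) -> dict:
    """Return the 'port' and 'tls-port' values from the -spice option of a qemu command line."""
    args = shlex.split(cmdline)
    ports = {}
    # Walk adjacent (flag, value) pairs looking for "-spice <comma-separated options>".
    for flag, value in zip(args, args[1:]):
        if flag != "-spice":
            continue
        for item in value.split(","):
            key, _, val = item.partition("=")
            if key in ("port", "tls-port") and val.isdigit():
                ports[key] = int(val)
    return ports


if __name__ == "__main__":
    cmd = ("/usr/libexec/qemu-kvm -name vm_name "
           "-spice port=5924,tls-port=5925,addr=0 -vga qxl")
    print(spice_ports(cmd))  # {'port': 5924, 'tls-port': 5925}
```

The returned values can then be dropped into a display filter of the form tcp.port == 5924 or tcp.port == 5925.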

As a test, I changed the filter to just "tcp.analysis.keep_alive or
tcp.analysis.keep_alive_ack" and got many "TCP Keep-Alive" rows from
other hosts.
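TCP has no keepalive flag, so Wireshark infers keep-alives heuristically. As I understand its documented rule (treat the details as approximate), a segment is flagged as TCP Keep-Alive when its payload is zero or one byte, its sequence number is one less than the next expected sequence number, and none of SYN, FIN or RST is set. The sketch below restates that rule in Python; Segment and looks_like_keepalive are hypothetical names, not Wireshark code. Under that rule, an empty filter result suggests that no such probes crossed the capture point while the capture was running.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


@dataclass
class Segment:
    seq: int          # sequence number of this segment
    payload_len: int  # TCP payload length in bytes
    syn: bool = False
    fin: bool = False
    rst: bool = False


def looks_like_keepalive(seg: Segment, next_expected_seq: int) -> bool:
    """Approximate Wireshark's tcp.analysis.keep_alive heuristic for one segment."""
    return (
        seg.payload_len in (0, 1)
        and seg.seq == (next_expected_seq - 1) % SEQ_MOD
        and not (seg.syn or seg.fin or seg.rst)
    )


if __name__ == "__main__":
    # The next byte the peer expects from this side is 1001.
    print(looks_like_keepalive(Segment(seq=1000, payload_len=0), 1001))    # True: probe
    print(looks_like_keepalive(Segment(seq=1001, payload_len=100), 1001))  # False: data
```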

Any ideas on this?

Thanks

CC
Colin Coe
2013-09-17 02:26:23 UTC
Hi all

Any thoughts on this?

Thanks

CC
Colin Coe
2013-09-17 03:23:45 UTC
I just downloaded the spice-gtk3 source. When I grep for 'setsockopt', the
only instance I see sets TCP_NODELAY. It seems that while spice-gtk may
[1] set SO_KEEPALIVE, spice-gtk3 does not.

Thanks

CC

[1] I downloaded
https://dl.fedoraproject.org/pub/fedora/linux/releases/18/Everything/source/SRPMS/s/spice-gtk-0.14-1.fc18.src.rpm
and extracted spice-gtk-0.14.tar.bz2 from it. Grepping did not show any
instances of setsockopt.
Marc-André Lureau
2013-09-17 12:43:56 UTC
spice-gtk uses GLib's g_socket_set_keepalive(), which was added in 0.10:
http://cgit.freedesktop.org/spice/spice-gtk/commit/?id=8fe6547b6181fb7acbabedcd6ed95caf263dd8cc

This can be verified with strace:

***@anakao:~$ strace -e trace=setsockopt -- spicy -p 5900
setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, "\10\0\0\0\0\0\0\0\240\202J\5\377\177\0\0", 16) = 0
setsockopt(7, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
setsockopt(15, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(15, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(18, SOL_SOCKET, SO_PRIORITY, [6], 4) = 0
setsockopt(18, SOL_SOCKET, SO_RCVBUF, [65472], 4) = 0
setsockopt(18, SOL_SOCKET, SO_SNDBUF, [65472], 4) = 0
setsockopt(18, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
setsockopt(19, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(19, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(20, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(20, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(21, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(21, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(22, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(23, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(23, SOL_TCP, TCP_NODELAY, [1], 4) = 0
Colin Coe
2013-09-17 22:54:14 UTC
OK, I'm confused. Why don't I see the keepalives in the Wireshark dump?
I'm happy to upload my pcap file to dropbox.redhat.com for analysis.

Apologies for harping on about this, but the users are getting annoyed with
their sessions freezing or terminating. Also, we've confirmed that the
exact timeout is 30 minutes, not 15 minutes.

I had a fairly long and productive chat with one of the guys who manage
the other firewall cluster. I gave him the source and destination addresses
and the destination port, and he was able to report the session state on the
firewall. If I opened a SPICE session to a RHEL server (no X) and minimised
it, all 4 sessions on the 'odd' port started counting down from 1800. If we
leave the session minimised until the counters run out, we get "Could not
connect to graphic server" or similar (I can't remember the exact wording).
However, if I start a SPICE session to a RHEL server (no X) and
minimise/un-minimise it every few minutes (but with no keyboard interaction),
only 3 of the 4 sessions on the odd port continue counting down from 1800.
Once they hit zero, the screen is still visible, but nothing I type appears
on the screen, and sending a keystroke with 'Send Key' does nothing.
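One possible reading of these observations, offered as a hypothesis rather than a diagnosis: SO_KEEPALIVE only enables probes. On Linux the first probe is sent only after net.ipv4.tcp_keepalive_time seconds of idleness, and that defaults to 7200 s (two hours). That is longer than the 1800 s countdown described above. With default timers, the firewall would therefore expire the flow before any probe left the client, which would also fit an empty keep-alive capture. The sketch below compares the two timers and shows per-socket overrides via the Linux TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. The function names and the 600/60/5 values are illustrative assumptions, not settings taken from spice-gtk or RHEV.

```python
import socket

# Linux default for sysctl net.ipv4.tcp_keepalive_time, in seconds.
LINUX_DEFAULT_KEEPALIVE_IDLE = 7200
# Idle countdown the firewall admin reported for the SPICE flows, in seconds.
FIREWALL_IDLE_TIMEOUT = 1800


def probe_beats_firewall(keepalive_idle: int, firewall_timeout: int) -> bool:
    """True if the first keepalive probe leaves before the firewall expires an idle flow."""
    return keepalive_idle < firewall_timeout


def tune_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Enable keepalive on one socket and shorten its timers where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket option names are Linux ones; skip any this platform lacks.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    print(probe_beats_firewall(LINUX_DEFAULT_KEEPALIVE_IDLE, FIREWALL_IDLE_TIMEOUT))  # False
    print(probe_beats_firewall(600, FIREWALL_IDLE_TIMEOUT))  # True
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        tune_keepalive(s, idle=600, interval=60, count=5)
    finally:
        s.close()
```

A host-wide alternative on a Linux client is to lower the sysctl itself (for example, sysctl -w net.ipv4.tcp_keepalive_time=600). That affects every keepalive-enabled socket on the machine, whereas the per-socket route keeps the change local to an application that opts in.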

While I've been writing this, I've had an strace running on a remote-viewer
session.
---
[***@fedora18 ~]$ strace -e trace=setsockopt -p 15504
Process 15504 attached
---
Colin Coe
2013-09-18 02:54:19 UTC
Hi

Further to this, I ran strace on the virtualisation host and then opened a
session to the SPICE console.

---
[***@host 09:44 ~]# ps -ef | grep vm_name
qemu 5438 1 10 Aug28 ? 2-06:17:57 /usr/libexec/qemu-kvm
-name vm_name -snipped-
root 30619 30384 0 09:44 pts/0 00:00:00 grep vm_name
[***@host 09:44 ~]# strace -e trace=setsockopt -p 5438
Process 5438 attached - interrupt to quit
setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(41, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(41, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(42, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(42, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(42, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(43, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(42, SOL_TCP, TCP_NODELAY, [1], 4) = 0

^CProcess 5438 detached
[***@host 10:18 ~]#
---
(Please note that the full output is pasted into the GSS ticket 932217)

I left the SPICE session (using remote-viewer on F18) open, minimised and
idle, for a bit over 30 minutes. When I restored the window, it was blank
and did not respond to the keyboard. The strace showed no keepalives.

At this point I've run strace on both sides of the firewalls, on both client
and server, and I've done packet captures, but I've seen no keepalives. I
note that you tested using 'spicy', not remote-viewer. Is it possible that
remote-viewer doesn't include the keepalives? Is this an option that needs
to be enabled somewhere?

Thanks

CC
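When comparing client-side and server-side traces like the ones in this thread, it can help to group the `setsockopt` calls by file descriptor and list which sockets had `SO_KEEPALIVE` switched on. The following is a minimal sketch that assumes strace output in the single-line format shown above, optionally prefixed by a PID column (as with `strace -f`); `<unfinished ...>` lines still match because the arguments appear before the break. It only shows that keepalive was *requested* on a socket, not when the kernel actually sends probes.

```python
import re
from collections import defaultdict

# Matches e.g. "setsockopt(19, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0",
# with or without a leading PID column. Calls whose value is not an
# integer in brackets (e.g. SO_ATTACH_FILTER) are ignored.
SETSOCKOPT_RE = re.compile(r"setsockopt\((\d+),\s*(\w+),\s*(\w+),\s*\[(\d+)\]")


def options_by_fd(strace_text):
    """Map each file descriptor to the set of options set to a non-zero value."""
    enabled = defaultdict(set)
    for line in strace_text.splitlines():
        m = SETSOCKOPT_RE.search(line)
        if m:
            fd, _level, optname, value = m.groups()
            if int(value) != 0:
                enabled[int(fd)].add(optname)
    return dict(enabled)


def keepalive_fds(strace_text):
    """Return the sorted fds on which SO_KEEPALIVE was enabled."""
    return sorted(fd for fd, opts in options_by_fd(strace_text).items()
                  if "SO_KEEPALIVE" in opts)


# Excerpts in the style of the server-side (qemu-kvm) and client-side traces.
server_trace = """\
setsockopt(40, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(41, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(42, SOL_TCP, TCP_NODELAY, [1], 4) = 0
"""
client_trace = """\
30548 setsockopt(9, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(25, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
"""

print(keepalive_fds(server_trace))  # []
print(keepalive_fds(client_trace))  # [9, 25]
```

On the server-side excerpt only `TCP_NODELAY` appears, so no socket there asks for keepalive itself.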
Christophe Fergeau
2013-09-18 08:04:46 UTC
Hey,
Post by Colin Coe
While I've been writing this, I've had an strace running on a remote-viewer
session.
---
Process 15504 attached
---
Colin Coe
2013-09-18 10:11:54 UTC
Hi

How do I do this with RHEV guests? I'm using an F18 client, and the SPICE
console option is set to use the browser plugin. When I switched it to the
native client, Firefox downloaded and tried to run a .vv file.

Thanks

CC
Christophe Fergeau
2013-09-18 10:55:07 UTC
Post by Colin Coe
Hi
How do I do this with RHEV guests? I'm using a F18 client and the SPICE
console option is to use the browser plugin. when I put it on native
client, Firefox downloaded and tried to run a .vv file.
The .vv file contains a password valid for about 2 minutes, so you can
download it and quickly run strace ... remote-viewer foo.vv from the
command line.

Christophe
Colin Coe
2013-09-18 11:53:52 UTC
So I've downloaded the file, noted the password and run "remote-viewer
spice://172.x.y.z:nnn" (where 172.x.y.z:nnn is shown in console.vv). I get
an error dialog with "Unable to connect to the graphic server
spice://172.x.y.z:nnn"

I must be missing the obvious.
Christophe Fergeau
2013-09-18 11:57:05 UTC
Post by Colin Coe
So I've downloaded the file, noted the password and run "remote-viewer
spice://172.x.y.z:nnn" (where 172.x.y.z:nnn is shown in console.vv). I get
an error dialog with "Unable to connect to the graphic server
spice://172.x.y.z:nnn"
I must be missing the obvious.
Don't even look at the .vv file content; you only need to pass it as an
argument to remote-viewer:

strace ... remote-viewer yourvvfile.vv

Christophe
Colin Coe
2013-09-18 12:01:10 UTC
Yep, I tried that and got the error "Cannot determine the connection type
from URI".

remote-viewer -V
remote-viewer version 0.5.4
Christophe Fergeau
2013-09-18 12:11:22 UTC
Post by Colin Coe
Yep, tried that and get error "Cannot determine the connection type from
URI"
remote-viewer -V
remote-viewer version 0.5.4
Ah, hmm, this is too old to be able to open the .vv file; support for that
was added in 0.5.5. I'd say you are missing some certificates for the SSL
connection. You can get the CA certificate from the address of your
RHEV/oVirt instance with wget https://ovirt.example.com/ca.crt and then pass
it to remote-viewer using --spice-ca-file=ca.crt. You should also check
whether the .vv file defines a host subject, and if so, pass it with
--spice-host-subject.

Christophe
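With a client too old to open the .vv file directly, the host subject has to be copied out by hand. The .vv file is an INI-style key file, so a short script can pull the relevant values out. This is a minimal sketch that assumes a `[virt-viewer]` group with keys such as `host`, `tls-port` and `host-subject`; those key names and the sample values are illustrative assumptions, so check them against your own file. Only the `--spice-ca-file` and `--spice-host-subject` options mentioned above are generated; how you pass the host and ports to your remote-viewer version is left to you.

```python
import configparser


def read_vv(text):
    """Parse the [virt-viewer] group of a .vv file into a plain dict.

    Assumption: the file is INI-style with a [virt-viewer] section.
    Interpolation is disabled so values containing '%' are kept verbatim.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["virt-viewer"])


def tls_options(vv, ca_file="ca.crt"):
    """Build the extra remote-viewer options suggested in this thread."""
    opts = [f"--spice-ca-file={ca_file}"]
    subject = vv.get("host-subject")  # assumed key name
    if subject:
        opts.append(f"--spice-host-subject={subject}")
    return opts


# Hypothetical .vv content (documentation IP range, made-up subject).
sample = """\
[virt-viewer]
type=spice
host=192.0.2.10
tls-port=5901
host-subject=O=example.com,CN=192.0.2.10
"""

vv = read_vv(sample)
print(vv["tls-port"])     # 5901
print(tls_options(vv))    # ['--spice-ca-file=ca.crt', '--spice-host-subject=O=example.com,CN=192.0.2.10']
```

Returning a list rather than a single string lets the options be passed straight to `subprocess.run(["remote-viewer", *opts, ...])` without any shell quoting of the subject, which contains commas and `=` signs. Keep in mind that the real file also carries the short-lived password.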
Colin Coe
2013-09-18 12:21:57 UTC
I'll wait until Friday (I'm not in the office tomorrow and am working from
home at the moment) and try from my colleague's PC (F19), as that has
virt-viewer 0.5.6.

Thanks
Colin Coe
2013-09-19 23:48:56 UTC
OK, using my colleague's F19 PC, I was able to get an strace of
remote-viewer.

Grepping the strace output for setsockopt shows:
---
30548 setsockopt(10, SOL_SOCKET, SO_ATTACH_FILTER,
"\10\0\0\0\0\0\0\0\20\204\26\363\377\177\0\0", 16) = 0
30548 setsockopt(10, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
30548 setsockopt(18, SOL_SOCKET, SO_ATTACH_FILTER,
"\10\0\0\0\0\0\0\0\260\202\26\363\377\177\0\0", 16) = 0
30548 setsockopt(18, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
30548 setsockopt(9, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(9, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(22, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(22, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(25, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(25, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(25, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(25, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(24, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(24, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
30548 <... setsockopt resumed> ) = 0
30548 setsockopt(25, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
30548 setsockopt(25, SOL_TCP, TCP_NODELAY, [1], 4) = 0
30548 setsockopt(30, SOL_SOCKET, SO_ATTACH_FILTER,
"\10\0\0\0\0\0\0\0\300~\26\363\377\177\0\0", 16) = 0
30548 setsockopt(30, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
30548 setsockopt(27, SOL_SOCKET, SO_ATTACH_FILTER,
"\10\0\0\0\0\0\0\0\340z\26\363\377\177\0\0", 16) = 0
30548 setsockopt(27, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
---

So remote-viewer does enable keepalives (SO_KEEPALIVE) on its sockets...

Back to wondering why, after 30 minutes of inactivity, the SPICE sessions
become unusable in our environment.

CC
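A `setsockopt(..., SO_KEEPALIVE, [1], 4)` call only turns keepalive on for a socket. When the first probe is actually sent is governed by kernel settings, which on Linux default system-wide to the values under `/proc/sys/net/ipv4/` (`tcp_keepalive_time`, `tcp_keepalive_intvl`, `tcp_keepalive_probes`); the stock `tcp_keepalive_time` is 7200 seconds. This minimal sketch reads those defaults on a Linux client and reports `None` for any file that is not present (for example on a non-Linux system).

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/ipv4")
KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def keepalive_sysctls(base=SYSCTL_DIR):
    """Return the system-wide TCP keepalive defaults as ints (None if missing)."""
    values = {}
    for key in KEYS:
        path = Path(base) / key
        values[key] = int(path.read_text().strip()) if path.exists() else None
    return values


if __name__ == "__main__":
    # e.g. tcp_keepalive_time is the idle time, in seconds, before the first probe.
    print(keepalive_sysctls())
```

Comparing `tcp_keepalive_time` with the firewall's idle timeout shows whether a probe can be sent before the firewall gives up on the connection.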
Christophe Fergeau
2013-09-20 08:32:58 UTC
Post by Colin Coe
OK, I'm confused. Why don't I see the keep alives in the wireshark dump?
I'm happy to upload my pcap file to dropbox.redhat.com for analysis
Apologies for harping on about this but the users are getting annoyed with
their sessions freezing or terminating. Also, we've confirmed that the
exact timing is 30 minutes not 15 minutes
I had a fairly long and productive chat with one of the guys that manage
the other firewall cluster. I gave him the src and dst addresses and dst
port and he was able to advise on the session state on the firewall. If I
opened a spice session to a RHEL server (no X) and minimised it, all 4
sessions on the 'odd' port started counting down from 1800.
What is that counter? Is it the firewall keeping track of keep alive
packets, and dropping the connection when this second-based counter drops
to 0?

Christophe
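The arithmetic behind this question can be made explicit. Under a simplified model (an assumption, not something established in the thread), an idle TCP connection with keepalive enabled sends one probe after each `keepalive_time` seconds of idleness while the peer keeps answering, and each probe and its ACK reset the firewall's idle counter. The session survives only if that refresh happens before the 1800-second counter reaches zero. With the stock Linux `tcp_keepalive_time` of 7200 seconds it does not, and the firewall would drop the flow after 30 minutes, which is at least consistent with the timing reported here.

```python
def session_survives(keepalive_time_s, firewall_timeout_s, margin_s=60):
    """True if keepalive probes should refresh the firewall state in time.

    Simplified model: one probe per keepalive_time_s of idleness resets the
    firewall's idle timer; margin_s leaves room for delays.
    """
    return keepalive_time_s + margin_s < firewall_timeout_s


def seconds_until_drop(keepalive_time_s, firewall_timeout_s):
    """Idle seconds after which the firewall forgets the flow (None if never)."""
    if session_survives(keepalive_time_s, firewall_timeout_s, margin_s=0):
        return None
    return firewall_timeout_s


FIREWALL_TIMEOUT = 1800    # countdown seen on the outer firewall
LINUX_DEFAULT_IDLE = 7200  # stock net.ipv4.tcp_keepalive_time

print(session_survives(LINUX_DEFAULT_IDLE, FIREWALL_TIMEOUT))          # False
print(seconds_until_drop(LINUX_DEFAULT_IDLE, FIREWALL_TIMEOUT) / 60)   # 30.0
print(session_survives(600, FIREWALL_TIMEOUT))                          # True
```

The model ignores other traffic on the connection; as the minimise/un-minimise experiment showed, real SPICE traffic also resets the counter on the channels it touches.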
Colin Coe
2013-09-20 09:04:48 UTC
Yep. That's the session timeout countdown on the firewall.

CC

Christophe Fergeau
2013-09-20 11:12:20 UTC
Hi,
Post by Colin Coe
After starting a spice session to a VM on host 172.x.y.z, I used wireshark
on my F18 desktop with this filter "ip.host == 172.x.y.z and
(tcp.analysis.keep_alive or tcp.analysis.keep_alive_ack)" and got back no
results.
I started a SPICE connection from my F19 system to a RHEL host and left
it running for a few hours along with Wireshark, and I got a few keepalive
packets in the Wireshark log. This is something you could try too: create a
VM that uses SPICE on a box you can access 'directly' (without going through
the firewall), and see whether you have the same issue with keepalives.

Christophe
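If keepalive is enabled but the first probe simply comes too late for the firewall, one option is to shorten the idle time. On Linux that can be done system-wide through the `net.ipv4.tcp_keepalive_time` sysctl, or per socket by setting `TCP_KEEPIDLE` (plus `TCP_KEEPINTVL` and `TCP_KEEPCNT`) after enabling `SO_KEEPALIVE`. The sketch below shows the per-socket variant in Python. It is not taken from spice-gtk or remote-viewer, and the 600/60/5 values are illustrative assumptions chosen to stay well under the 1800-second firewall timer.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Enable TCP keepalive and, where supported, shorten its timers.

    idle_s: idle seconds before the first probe (hypothetical value).
    interval_s: seconds between unanswered probes.
    probes: unanswered probes before the connection is declared dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants exist on Linux; if the platform lacks one, the
    # system-wide default for that setting stays in effect.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # non-zero when enabled
    s.close()
```

In a capture, such a socket should produce keepalive probes after roughly `idle_s` seconds of idleness, which the `tcp.analysis.keep_alive` Wireshark filter quoted above can confirm.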
