Discussion:
problem: XMLDOM object fails, XMLHTTP succeeds in downloading the same data.
R.Wieser
2016-10-22 06:50:51 UTC
Hello All,

I'm trying to use the Microsoft.XMLDOM object to grab some data from an RSS
page. The link returns an error and no data is available (upon further
inspection with a "raw download" tool it seems to be a 403 - forbidden --
even though the sought-for data follows it). However, when I use the
Microsoft.XMLHTTP object everything goes fine.

Can anyone explain what the difference between those two might be, as well
as how to enable the XMLDOM object to retrieve the data ?

Regards,
Rudy Wieser
R.Wieser
2016-10-22 07:16:32 UTC
However, when I use the Microsoft.XMLHTTP object
everything goes fine.
Update: Checking the object's status just now reveals it also returned a
403 - forbidden result, together with the requested data (odd).

Regards,
Rudy Wieser
JJ
2016-10-22 11:52:08 UTC
Post by R.Wieser
However, when I use the Microsoft.XMLHTTP object
everything goes fine.
Update: Checking the object's status just now reveals it also returned a
403 - forbidden result, together with the requested data (odd).
Seems like a server problem rather than client.
What happens if you retrieve the resource using a download manager?
R.Wieser
2016-10-22 12:24:47 UTC
JJ,
Post by JJ
Seems like a server problem rather than client.
The 403 is definitely a server problem. But both my "raw download" tool and
XMLHTTP download the full XML data. XMLDOM, however, returns 800C0008
(could not download) as parseError.errorCode, which, taking the other two
full downloads into account, simply isn't true.
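To illustrate how a 403 can arrive together with a complete body, here is a self-contained Python sketch. The local server, its port, the /feed.rss path and the feed body are all hypothetical stand-ins for the real RSS server. Python's urllib raises HTTPError for the 403, yet the error object still yields the body, much as XMLHTTP reported status 403 and the data. A loader that treats any non-2xx status as a failed download (which XMLDOM's "could not download" code suggests, though the thread does not confirm it) would discard that body.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical feed body; the real RSS data from the thread is not shown.
BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<rss version="2.0"><channel><title>demo</title></channel></rss>'
).encode("utf-8")


class ForbiddenButBody(BaseHTTPRequestHandler):
    """Hypothetical server that answers 403 but still sends the full XML body."""

    def do_GET(self):
        self.send_response(403)
        self.send_header("Content-Type", "text/xml; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_ignoring_status(url):
    """Return (status, body), even when the server reports an HTTP error."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the error object still carries the
        # response body, much as XMLHTTP exposes both status and data.
        try:
            return err.code, err.read()
        finally:
            err.close()


def run_demo():
    """Serve the 403-with-body response on a free local port and fetch it once."""
    server = HTTPServer(("127.0.0.1", 0), ForbiddenButBody)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_ignoring_status(f"http://127.0.0.1:{server.server_port}/feed.rss")
    finally:
        server.shutdown()
        server.server_close()


status, body = run_demo()
print(status, len(body))
```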

However, when I just threw the downloaded XML data at the XMLDoc object
(loadXML) I found out that the encoding of the document wasn't acceptable.

Regards,
Rudy Wieser
R.Wieser
2016-10-22 14:04:53 UTC
Post by R.Wieser
However, when I just threw the downloaded XML data at the
XMLDoc object (loadXML) I found out that the encoding of
the document wasn't acceptable.
I download the raw data using the XMLHTTP object, remove the encoding spec
from the leading "<?xml" declaration, and then load the resulting raw
data into the XMLDOM object. Might not be the cleanest solution, but it
works alright.
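The workaround described above can be sketched in Python. Assumptions: the text has already been decoded (as XMLHTTP's responseText would be), the declaration sits at the very start, and Python's ElementTree stands in for XMLDOM's loadXML, so this only illustrates the text transformation, not MSXML itself. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute, but only inside an XML declaration
# at the very start of the text (an optional BOM character is allowed).
_DECL_ENCODING = re.compile(
    r"""^(\ufeff?<\?xml\b[^?>]*?)\s+encoding\s*=\s*(["'])[A-Za-z0-9._-]*\2"""
)


def strip_encoding_decl(xml_text: str) -> str:
    """Remove encoding="..." from the XML declaration; keep version/standalone."""
    return _DECL_ENCODING.sub(r"\1", xml_text, count=1)


def parse_downloaded_text(xml_text: str) -> ET.Element:
    """Hypothetical stand-in for XMLDOM.loadXML on already-decoded text."""
    return ET.fromstring(strip_encoding_decl(xml_text))
```

Because the pattern is anchored at the start and cannot cross the closing "?>", an attribute named encoding elsewhere in the document is left alone, and the rest of the declaration (version, standalone) survives.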

Regards,
Rudy Wieser
JJ
2016-10-23 05:55:49 UTC
Post by R.Wieser
Post by R.Wieser
However, when I just threw the downloaded XML data at the
XMLDoc object (loadXML) I found out that the encoding of
the document wasn't acceptable.
I download the raw data using the XMLHTTP object, remove the encoding spec
string from the starting "<?xml" element, and then load the resulting raw
data into the XMLDOM object. Might not be the cleanest solution, but it
works alright.
I'm not sure how standards-compliant the Microsoft XML parser is, but I
do know that an XML parser (any XML parser) is very strict.

So, unless the retrieved XML actually has a flaw (try checking it with an
online XML validator), the Microsoft XML parser might be the one that failed.
R.Wieser
2016-10-23 07:00:05 UTC
JJ,
So, unless the retrieved XML actually has a flaw ... Microsoft XML parser
might be the one that failed.

They both seem to be OK. It was just the encoding spec which (after the
403 HTTP status) threw a(nother) wrench into it all. After removing that
the XMLDOM parser has no problems with it.

Regards,
Rudy Wieser
JJ
2016-10-24 07:09:28 UTC
Post by R.Wieser
They both seem to be OK. It was just the encoding spec which (after the
403 HTTP status) threw a(nother) wrench into it all. After removing that
the XMLDOM parser has no problems with it.
The XML tag defines which XML specification the document uses. So, if
you remove it, the parser would be less strict.

Have you checked the original XML data with an XML validator, with the XML
tag still intact?
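As a local alternative to an online validator, a minimal Python sketch can check the raw bytes as downloaded, with the XML declaration intact. This is a well-formedness check with Python's ElementTree, not validation against a DTD or schema, and not MSXML's behaviour; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(raw: bytes) -> Optional[str]:
    """Return None if raw is well-formed XML, otherwise the parser's message.

    The bytes are parsed exactly as downloaded, so the XML declaration
    (including any encoding pseudo-attribute) is honoured, not stripped.
    """
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        return str(err)  # message includes the line and column of the error
    return None
```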
R.Wieser
2016-10-24 07:50:47 UTC
JJ,
Post by JJ
The XML tag defines which XML specification the document uses.
So, if you remove it, the parser would be less strict.
Correct. But I'm only removing the character-encoding spec*.

It's a choice between the XML parser rejecting the whole document because it
*might* encounter a multi-byte UTF-8 char, or getting an XML structure back
with (possibly) some non-tag text(!) chars that are not ASCII.

*You made me aware that currently I indeed pretty much remove the whole spec
(as part of a quick test), where I should only, if present, remove the UTF-8
encoding part of it. Going to fix that now.

Regards,
Rudy Wieser
R.Wieser
2016-10-24 09:20:14 UTC
Ackkk .... I was too quick in my "that must be it" conclusion. :-\

It turns out that *any* XML data which specifies the UTF-8 encoding (have
not found any other) throws a 0xC00CE56F error (encoding switching not
permitted).

But, and that's the odd part, only when loading the XML data as text (loadXML
method), not when I let it download the data by providing a URL (load
method; this might have something to do with the Content-Type returned in
the HTTP header, though).

Also, all documents I've found tell me that UTF-8 is the default, so I
really don't get why it even thinks it would need to switch encoding.
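The UTF-8 default can be checked with any conforming parser. The Python sketch below uses ElementTree (not MSXML) on a hypothetical feed fragment, fed as raw bytes the way a URL-based load sees them. Removing encoding="UTF-8" changes nothing, because XML 1.0 assumes UTF-8 when there is no declaration and no byte-order mark, but removing a non-UTF-8 declaration such as ISO-8859-1 leaves bytes the parser then rejects. That is one reason to strip the pseudo-attribute only when it says UTF-8. It does not explain the loadXML error itself; one unconfirmed guess is that loadXML works on already-decoded text, where a declared byte encoding has nothing to apply to.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment containing one non-ASCII character.
BODY = "<rss><channel><title>Caf\u00e9</title></channel></rss>"


def title_of(raw: bytes) -> str:
    """Parse raw bytes, the way a URL-based load sees them, and return <title>."""
    return ET.fromstring(raw).find("channel/title").text


utf8_declared = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")
utf8_undeclared = BODY.encode("utf-8")  # no declaration: UTF-8 is assumed
latin1_declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("latin-1")
latin1_undeclared = BODY.encode("latin-1")  # declaration dropped, bytes unchanged

for raw in (utf8_declared, utf8_undeclared, latin1_declared):
    print(title_of(raw))

try:
    title_of(latin1_undeclared)
except ET.ParseError as err:
    # Byte 0xE9 followed by "<" is not valid UTF-8, so the default breaks.
    print("latin1_undeclared:", err)
```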

A quick peek at the web does not show any way to either set or get the
object's current encoding (when nothing has been loaded yet) ...

Oh well, removing it from the initial "?xml" tag works wonders. :-)

Regards,
Rudy Wieser