Feature #2734

closed

Setting multiple response headers with same name

Added by jornane over 7 years ago. Updated about 1 month ago.

Status:
Fixed
Priority:
Normal
Category:
mod_magnet
Target version:
-
ASK QUESTIONS IN Forums:
No

Description

I want to set multiple response headers with the same name.

I've tried the following:

setenv.add-response-header += (
        "X-Robots-Tag" => "googlebot: noindex, nofollow",
        "X-Robots-Tag" => "mediapartners-google: noindex, nofollow",
        "X-Robots-Tag" => "facebookexternalhit: noindex, nofollow",
)
This gave an error.

I've tried:

setenv.add-response-header += (
        "X-Robots-Tag" => ("googlebot: noindex, nofollow", "mediapartners-google: noindex, nofollow", "facebookexternalhit: noindex, nofollow")
)
This also gave an error.

I've tried:

setenv.add-response-header += (
        "X-Robots-Tag" => "googlebot: noindex, nofollow",
)
setenv.add-response-header += (
        "X-Robots-Tag" => "mediapartners-google: noindex, nofollow",
)
setenv.add-response-header += (
        "X-Robots-Tag" => "facebookexternalhit: noindex, nofollow",
)
This didn't give an error, but it just concatenated all of them into one header, separated by commas.

What is the correct way to do this?

Actions #1

Updated by gstrauss over 7 years ago

According to the RFC, headers can (and should) use list syntax and be combined into a single line.
See https://tools.ietf.org/html/rfc7230#section-3.2 Header Fields
The only exception noted by the RFC is the historical (late 1990s) Set-Cookie header.

A quick look at https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag suggests that X-Robots-Tag is also non-compliant and subject to corruption by proxies which, following RFC recommendations, might combine them into a single line.

<sigh> RFCs exist for many reasons, including the attempt to define conformance and avoid messes like this.

Actions #2

Updated by jornane over 7 years ago

gstrauss wrote:

A quick look […] suggests that X-Robots-Tag is also non-compliant

Since the RFC states that "A sender MUST NOT generate multiple header fields" and "A recipient MAY combine multiple header fields", it seems that the lighttpd implementation is indeed RFC-compliant. I don't know how X-Robots-Tag will behave when concatenated, but I understand that it would be incorrect to ask lighttpd to change its implementation.

Thank you for your explanation, gstrauss!

I don't seem to be able to close this ticket myself, so if someone else could be so kind?

Actions #3

Updated by gstrauss over 7 years ago

A partial workaround is to set the header based on the User-Agent. Adjust the user-agent patterns as appropriate, since I made up the ones below without verifying them.

$HTTP["user-agent"] =~ "googlebot" {
    setenv.add-response-header += (
            "X-Robots-Tag" => "googlebot: noindex, nofollow",
    )
}
else $HTTP["user-agent"] =~ "mediapartners-google" {
    setenv.add-response-header += (
            "X-Robots-Tag" => "mediapartners-google: noindex, nofollow",
    )
}
else $HTTP["user-agent"] =~ "facebook" {
    setenv.add-response-header += (
            "X-Robots-Tag" => "facebookexternalhit: noindex, nofollow",
    )
}

That won't work for all cases since 'unavailable_after: ...' looks like it is expected to be in its own header. Then again, implementation details will change depending on browser and crawler.

Actions #4

Updated by gstrauss over 7 years ago

  • Status changed from New to Wontfix

RFC7230 3.2 Header Fields

[...]

A sender MUST NOT generate multiple header fields with the same field
name in a message unless either the entire field value for that
header field is defined as a comma-separated list [i.e., #(values)]
or the header field is a well-known exception (as noted below).

A recipient MAY combine multiple header fields with the same field
name into one "field-name: field-value" pair, without changing the
semantics of the message, by appending each subsequent field value to
the combined field value in order, separated by a comma. [...]

The "unless" section is important for the sender, as that is what allows the recipient to safely combine the lines.

In any case, X-Robots-Tag is not compliant.
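
To make the RFC text above concrete, here is a minimal Python sketch of the recipient-side combining rule and of what it does to the X-Robots-Tag values from this ticket. The function names combine_fields and split_list are made up for illustration, and joining with ", " (comma plus a space) is an assumption about formatting; the RFC only requires a comma. This is not lighttpd code, only a model of what a recipient or proxy is allowed to do.

```python
def combine_fields(fields):
    """Merge repeated header fields the way RFC 7230 section 3.2.2 permits.

    `fields` is a list of (name, value) pairs in the order received.
    Returns a list of (name, value) pairs with one entry per field name.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()  # header field names are case-insensitive
        if key in combined:
            first_name, so_far = combined[key]
            # Append each subsequent value in order, separated by a comma.
            combined[key] = (first_name, so_far + ", " + value)
        else:
            combined[key] = (name, value)
    # dicts keep insertion order (Python 3.7+), so first-seen order is kept
    return list(combined.values())


def split_list(value):
    """Naively split a combined field value back into list elements."""
    return [item.strip() for item in value.split(",")]


sent = [
    ("X-Robots-Tag", "googlebot: noindex, nofollow"),
    ("X-Robots-Tag", "mediapartners-google: noindex, nofollow"),
]

merged = combine_fields(sent)
print(merged)
# [('X-Robots-Tag', 'googlebot: noindex, nofollow, mediapartners-google: noindex, nofollow')]

print(split_list(merged[0][1]))
# ['googlebot: noindex', 'nofollow', 'mediapartners-google: noindex', 'nofollow']
```

Once combined, the comma between two original header lines is indistinguishable from the comma inside a single line, so a recipient splitting on commas can no longer tell which user agent each bare "nofollow" belongs to. That is the sense in which the header is not safely combinable.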

I am going to mark this as "Won't Fix", but if there were sufficient clamoring (I am subscribed to this ticket), I might consider a patch to mod_setenv which allowed \r\n in the strings, so that someone could say "I really mean this" with setenv.add-response-header += ( "X-Robots-Tag" => ("googlebot: noindex, nofollow\r\nX-Robots-Tag: mediapartners-google: noindex, nofollow\r\nX-Robots-Tag: facebookexternalhit: noindex, nofollow") ) (Note: the statement above does not work, since this is not implemented, and it is not very end-user-friendly.)

Actions #5

Updated by gstrauss over 7 years ago

Beware of the (very unfortunate) limitations in lighttpd 1.4.x config parsing if you use the workaround above and have multiple setenv.add-response-header directives.

See https://redmine.lighttpd.net/boards/2/topics/6541 for further explanation.

Actions #6

Updated by stbuehler over 7 years ago

  • Target version deleted (1.4.x)

Actions #7

Updated by gstrauss 3 months ago

  • ASK QUESTIONS IN Forums set to No

mod_magnet magnet.attract-response-start-to (since lighttpd 1.4.56) can support a lua script to implement this logic.

Actions #8

Updated by gstrauss about 1 month ago

  • Category changed from mod_setenv to mod_magnet
  • Status changed from Wontfix to Fixed

mod_magnet magnet.attract-response-start-to (since lighttpd 1.4.56) can support a lua script to implement this logic.
