Review of skin irritation/corrosion hazards on the basis of human data: A regulatory perspective.

Regulatory classification of skin irritation has historically been based on rabbit data; however, current toxicology processes are transitioning to in vitro alternatives. The in vitro assays have to provide a sufficient level of sensitivity as well as specificity to be accepted as replacement methods for the existing in vivo assays. This is usually achieved by comparing the in vitro results to classifications obtained in animals. A significant drawback of this approach is that neither in vivo nor in vitro methods are calibrated against human hazard data, and results obtained in these assays may not correspond to the situation in humans. The main objective of this review was to establish an extended database of substances classified according to their human hazard, to serve the further development of alternative methods relevant to human health and as a resource for improved regulatory classification. The literature has been reviewed to assemble all the available information on the testing of substances in the 4 h human patch test (4 h HPT), which is the only standardized protocol in humans matching the exposure conditions of the regulatory accepted in vivo rabbit skin irritation test. A total of 81 substances tested according to the defined 4 h human patch test protocol were found and collated into a dataset together with their existing in vivo classifications published in the literature. While about 50% of the substances in the database are classified as irritating based on the rabbit skin test, fewer than 20% were identified as acutely irritant to human skin using the 4 h HPT. Based on the presented data, it can be concluded that the rabbit skin irritation test largely over-predicts human responses for the evaluated chemicals. Correct classification of the acute skin irritation hazard will only be possible if newly developed in vitro toxicology methods are calibrated to produce results relevant to man.


Introduction
One of the most important advances in regulatory toxicology has been the implementation of the Globally Harmonised System (GHS) for the identification, classification and labelling of substances, mixtures and preparations (United Nations-Economic Commission for Europe, 2009). The system addresses the hazard associated with a single chemical substance or with a mixture of two or more substances (formulations). Hazard information from human studies is unfortunately not available, since testing in humans for classification and labelling purposes is not accepted for ethical reasons.
In the clinic, reports on acute skin irritation are rare; skin corrosion (chemical burns) does occur, but even so, the actual exposure is often hard to characterise. Ultimately, however, the value of any piece of toxicological work lies in the prediction of effects seen in exposed human populations. To obtain controlled human acute skin irritation information, an alternative strategy has been developed and described extensively in the literature: a protocol that uses human volunteers, the 4 h Human Patch Test (4 h HPT), to characterize skin irritation hazard (Basketter, 1994; Basketter et al., 1994a,b; York et al., 1996; Robinson et al., 2001).
The 4 h HPT provides the opportunity to identify substances with significant skin irritation potential without recourse to the use of animals. It can be applied to evaluate the skin effects of single substances as well as mixtures and formulations (Robinson et al., 2005). The human skin irritation test is very similar to the regulatory accepted in vivo rabbit skin irritation test, but it is designed to limit the intensity of skin reactions in human volunteers. The value of the method is in 1) providing data for the identification of those substances or formulations which should or should not be classified as irritant, and 2) providing "gold standard" data for future validations of alternative/in vitro methods replacing the in vivo rabbit test for classification and labelling purposes in regulatory toxicology.
In the material that follows, the literature has been surveyed to permit the assembly of an extended catalogue of substances to which human subjects have been exposed using the 4 h HPT protocol. On only very few occasions did substances appear to possess a greater ability to generate irritant skin reactions than had been expected. More importantly, many more substances had only a very limited effect on skin. Consequently, it is essential that new in vitro toxicology tests are calibrated and, whenever possible, validated against human data rather than against information from in vivo rabbit assays, which is usually obtained from outdated databases.

The 4 h human patch test - protocol
The human 4 h patch test has been described in complete detail in the literature (Basketter et al., 1994a; York et al., 1996; Robinson et al., 2001; 2005). Briefly, the human patch test procedure involves application of 0.2 ml (0.2 g for solid test materials) on a 25 mm plain Hill Top Chamber containing a Webril pad (Hill Top Companies, Cincinnati, Ohio, USA), moistened for solid test materials, to the skin of the upper outer arm of 30 human volunteers for up to 4 hours.
To avoid the production of unacceptably strong reactions, test materials are applied progressively from 15 and 30 minutes through 1, 2, 3 and 4 h. Each progressive application is at a new skin site. The shorter exposure periods can be omitted if the study directors are satisfied that excessive reactions will not occur following longer exposure. Treatment sites are assessed for the presence of irritation at 24, 48 and 72 h after patch removal. A volunteer with a reaction at any of the assessments is considered to have demonstrated a "positive" irritant reaction, and treatment with the causative substance does not proceed on that person. For panellists with a "+" or greater response at application times of less than 4 h, it is assumed that they would present a stronger irritant reaction if exposed for 4 h. Therefore, once a "+" or greater response is obtained, there is no need to subject these panellists to further treatment with that substance. In evaluating the results, what is measured is the number of panellists who had a positive "irritant" reaction after exposure of up to 4 h. If the level of irritant reaction to the undiluted test substance is significantly greater than, or not significantly different from (using Fisher's exact test), the level of reaction in that same panel of volunteers to 20% SDS, the substance should be classified as irritant to skin (I); where the level of reaction is substantially and statistically significantly lower than the response to SDS, the substance is not classified (NC). Very occasionally, where the response is significantly stronger (and faster to occur), e.g. to 0.5% NaOH, the substance is suggested to be a potential corrosive (C).
In all the above-mentioned studies, 20% sodium dodecyl sulfate (SDS) was used as the positive control, for reasons that have been well documented (Basketter et al., 1994a; York et al., 1996; Robinson et al., 2001). A minimum of one third of the panel should react to SDS for the study to be regarded as valid, although exceptions may be made, e.g. when a large proportion of the panel reacts to the test substance.
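The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the cited studies: the function names, the significance level of 0.05 and the use of a two-sided test are assumptions, since the protocol text specifies only that Fisher's exact test is used. The exception to the validity criterion and the "potential corrosive" outcome both depend on expert judgement about the strength and speed of the response and are therefore left out.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x reactors in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def classify_4h_hpt(test_pos, sds_pos, panel_size, alpha=0.05):
    """Return (label, p_value); label is 'I', 'NC' or 'invalid'.

    alpha = 0.05 is an assumed significance level, not taken from the protocol.
    """
    # Validity: at least one third of the panel should react to 20% SDS.
    if 3 * sds_pos < panel_size:
        return "invalid", None
    p = fisher_exact_two_sided(test_pos, panel_size - test_pos,
                               sds_pos, panel_size - sds_pos)
    # Not classified only if significantly FEWER panellists react than to SDS.
    if p < alpha and test_pos < sds_pos:
        return "NC", p
    return "I", p


# Hypothetical panels of 30 volunteers (illustrative counts, not from Table 1).
print(classify_4h_hpt(2, 20, 30))   # few reactors compared with SDS
print(classify_4h_hpt(18, 20, 30))  # response similar to SDS
print(classify_4h_hpt(5, 8, 30))    # fewer than one third react to SDS
```

The counts in the usage lines are invented purely to exercise the three branches of the rule.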

Results
The results of the human 4 h patch tests conducted on 81 substances are presented in Table 1, together with their CAS numbers and experimental results. These data have been collated from three main publications (Robinson et al., 2001; Basketter et al., 2004; Jirova et al., 2010). Table 1 also reports the proportion of test subjects reacting to the test substance as well as their response to the concurrent 20% SDS positive control. From this information, the final column records how the materials should be classified on the basis of the human response. It is important to mention that the use of the positive control in each experiment has compensated for the inevitable variation that occurs between different human volunteer panels. Furthermore, it has already been demonstrated that the presence of atopy, and factors such as gender, ethnicity, age, geography and season, have no impact on the conclusions drawn from the results (Griffiths et al., 1996; Basketter et al., 1996a,b; McFadden et al., 1997; Robinson et al., 1998; 1999; 2001).
Based on in vivo rabbit tests, more than 50% of the chemicals in Table 1 are classified as irritants (Robinson et al., 2001; Basketter et al., 2004; Jirova et al., 2010), whereas in the human patch test, using the classification criteria described earlier, only about 20% of the substances tested were identified as human irritants, with two possible corrosive classifications (#45 Lactic Acid; #67 0.5% Sodium Hydroxide).

Regulatory relevance of human data
According to the European CLP Regulation (Commission Regulation, 2009), information for the classification of any substance or mixture should preferably be generated in accordance with the test methods referred to in Regulation (EC) No.    , 2008a). Classification should be carried out on the basis of all relevant information on the hazards of the substance or mixture, and there is an obligation to evaluate the quality of all available information.
It is important to keep in mind that the classification of a substance as irritant in existing in vivo protocols used for regulatory toxicology purposes reflects only a significant potential of the substance to produce an acute irritant effect. The cumulative irritant capability of a substance is not taken into account. A regulatory decision not to classify a substance, mixture or formulation does not by any means imply that the product is entirely free of any skin irritation potential, only that the level of irritant activity is unlikely to be sufficient to trigger classification.
Although testing substances on humans is not allowed for the purposes of the CLP Regulation, the manufacturer, importer or downstream user should, for the purpose of classification, take into account all human data available, such as epidemiological studies on exposed populations, accidental or occupational exposure data, and clinical studies. That information should be compared with the criteria for the different hazard classes and differentiations, so that the manufacturer, importer or downstream user can arrive at a conclusion as to whether or not the substance or mixture should be classified as hazardous.
Reflecting on the results presented in Table 1, the current classification decision strategy based on the human 4 h patch test states that a substance whose irritant capacity is significantly less than that of 20% SDS should not be classified. However, this conclusion might require some reconsideration. Where the panel of volunteers is large, the statistical significance of Fisher's exact test, and thus the final classification, may be influenced by panel size. A provision could therefore be included that positive classification would normally occur if more than 20% of panellists reacted to the test substance, also considering the precautionary principle with regard to later accidental exposure in humans. In this case, a recommended number of panellists involved in the study should be defined.
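The effect of panel size on this decision can be illustrated with a short Python sketch. It is a hedged illustration only: the function names, the one-sided form of Fisher's exact test, the significance level of 0.05 and the reactor counts are all assumptions introduced here; only the 20% reactor threshold comes from the provision suggested above.

```python
from math import comb


def p_lower_tail(test_pos, sds_pos, n):
    """One-sided Fisher's exact p-value that the test substance produces
    fewer reactors than 20% SDS, for two arms of n panellists each."""
    reactors = test_pos + sds_pos
    total = comb(2 * n, reactors)
    lowest = max(0, reactors - n)
    # Sum the hypergeometric tail up to the observed number of test reactors.
    tail = sum(comb(n, x) * comb(n, reactors - x)
               for x in range(lowest, test_pos + 1))
    return tail / total


def classify_current(test_pos, sds_pos, n, alpha=0.05):
    """Current strategy: NC if significantly lower than SDS (alpha assumed)."""
    return "NC" if p_lower_tail(test_pos, sds_pos, n) < alpha else "I"


def classify_with_provision(test_pos, sds_pos, n, alpha=0.05, threshold=0.20):
    """Suggested provision: classify if more than 20% of panellists react."""
    if test_pos / n > threshold:
        return "I"
    return classify_current(test_pos, sds_pos, n, alpha)


# Hypothetical large panel: 25 of 100 react to the test substance,
# 60 of 100 react to 20% SDS (invented counts for illustration).
print(classify_current(25, 60, 100))         # significantly lower than SDS
print(classify_with_provision(25, 60, 100))  # but more than 20% reacted
```

In this invented example a quarter of the panel reacts, yet the difference from SDS is statistically significant, so the two rules disagree; with a lower reactor count such as 10 of 100 they agree on no classification.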
The quality and relevance of existing human data for hazard assessment should always be critically reviewed. There may be a significant level of uncertainty in existing human data due to poor reporting and lack of specific information on exposure. Diagnosis confirmed by expert physicians may be missing. Confounding factors may not have been accounted for. Small group sizes may weaken the statistical strength of the evidence, and many other factors may compromise the validity of human data. In clinical and scientific studies, the selection of individuals for the test and control groups must be carefully considered. Clinical studies may nevertheless contribute to the weight of evidence assessment together with other available information, such as existing data from animal or other experimental studies.
Importantly, when human data demonstrate hazards that have not been identified by animal studies, the animal results should be weighed against the human data, and expert judgement should be used to ensure the best protection of human health when evaluating both the animal and human data, as specified in Recital 28 of the CLP Regulation. In practice, however, the available data indicate that human skin is, in most cases, less sensitive than that of rabbits (Phillips et al., 1972; Nixon et al., 1975; Campbell & Bruce, 1981).
A critical review of the value of human studies is provided in IR/CSA Section R.4.3.3, and more specific considerations for the skin corrosion/irritation endpoint are given in IR/CSA Section R.7.2.4.2 (IR/CSA: Guidance on Information Requirements and Chemical Safety Assessment, ECHA, 2008; http://guidance.echa.europa.eu/docs/guidance_document/information_requirements_en.htm).

Use of human data for development of relevant in vitro assays
In vitro alternatives for the identification of skin irritation have been the subject of investigation and development in a considerable number of laboratories for many years (reviewed in Eskes et al., 2012; Welss et al., 2004; Gibbs, 2009), and these are now broadly accepted by regulatory authorities (ECHA, 2008b; Commission Regulation, 2009). These alternatives were established to recapitulate the results previously obtained from in vivo rabbit studies, which are very sensitive; however, they poorly reflect human exposure scenarios and thus also human hazard (Phillips et al., 1972; Nixon et al., 1975; Campbell & Bruce, 1981).
Clinically, skin irritation is a type of dermatitis whose causation is complex and which involves repeated exposures to a range of noxious stimuli. Skin corrosion, where substances can cause burns and irreversible damage, is a much more clear-cut situation. Because of the intensity of the skin responses to corrosive substances and the irreversibility of the effects, correct prediction of corrosive effects is of great importance. Thus, incorrect classification of corrosive substances, either by the in vivo rabbit assay or by in vitro methods established on the rabbit-based classification, remains a cause of some concern.
The data presented in Table 1 offer results for 81 substances which can be used to assess the ability of in vitro methods to predict accurately the acute skin irritation and corrosion potential of a range of substances. The results include two substances, lactic acid and 0.5% sodium hydroxide, which on the basis of rabbit data were not thought to be potentially corrosive, but for which the results of the human study suggest a corrosive classification. Correct classification of lactic acid and NaOH, including their dilutions, is of specific importance as they are used as ingredients in consumer products (cosmetics) for keratolytic purposes. It is a matter of concern whether keratolysis based on a skin corrosive effect should be generally acceptable for cosmetic purposes.
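As a minimal sketch of how such an assessment could be carried out, the Python fragment below scores a candidate method against human reference classifications in terms of sensitivity, specificity and overall accuracy. The function name, the placeholder substance labels and all classifications shown are hypothetical and are not taken from Table 1; treating both I and C as positive outcomes is likewise an assumption made for the illustration.

```python
def predictive_performance(human_reference, predicted):
    """Compare predicted classes with human 4 h HPT classes.

    Both arguments map a substance identifier to 'I', 'C' or 'NC'.
    'I' and 'C' are both counted as positive (an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for substance, human_class in human_reference.items():
        human_positive = human_class in ("I", "C")
        predicted_positive = predicted[substance] in ("I", "C")
        if human_positive and predicted_positive:
            tp += 1
        elif human_positive:
            fn += 1
        elif predicted_positive:
            fp += 1  # over-prediction relative to the human response
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Placeholder data: invented labels, NOT classifications from Table 1.
human = {"A": "I", "B": "NC", "C": "NC", "D": "NC", "E": "I", "F": "NC"}
candidate = {"A": "I", "B": "I", "C": "I", "D": "NC", "E": "I", "F": "NC"}

print(predictive_performance(human, candidate))
```

In this invented example the candidate method detects every human irritant but over-predicts half of the non-classified substances, which is the pattern of error that calibration against human data is intended to expose.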

Conclusion
The retrospective evaluation of existing human data presented in this paper provides a unique opportunity to compare skin irritation hazard classifications obtained with classic regulatory accepted methods (i.e. the in vivo rabbit skin irritation test) with human data on hazard, with the ultimate aim of enhancing the accuracy of the hazard information contained in manufacturers' safety data sheets. The information presented in Table 1 can and should be used to develop alternative methods that provide classification and labelling that is most relevant to the true human hazard.