6. Data handling and analysis
Detailed instructions about coding and data transfer are found in Section 19.
6.1 Data quality and handling
6.1.1 Demographic data
Demographic data is the personal information requested from each participant. Ideally these questions should be well laid out, easy to complete and on the first page of the questionnaire (see the examples in section 7 and section 20.2). The questions ask for the participant's name, age, date of birth, school and gender, and the date of completing the questionnaire, with optional questions on ethnicity. Similar questions are asked on the ADULT questionnaire. ‘Office Use Only’ boxes at the top of the first page allow the person conducting the survey to record the unique identification numbers for the participant and the school, as well as the number of times the questionnaire has been sent out. Further information is available in the field workers' guide in sections 15, 16, 17 and 20. It is advisable to pre-code the questionnaires for each age group before printing, and to pre-code the language of the questionnaire so that an exact count can be kept of the translated questionnaires. A list of coding numbers for translations can be found in section 20. If your language is not listed, please contact the GAN Global Centre (contact address in section 21). See also section 20 for an example of coding for the ‘Office Use Only’ boxes.
Where comparisons between ethnic groups are planned, each individual centre should follow the question on ethnicity used in the most recent Census of Populations for that centre/country.
The completed questionnaire should be checked carefully, if possible at the time of conducting the survey (for the older age group), or otherwise as soon as possible after collecting the questionnaires from the school. Any obvious errors in the demographic data should be corrected by obtaining the information from the schools. Any changes made to the demographic data must be well documented, dated and signed by the person making the changes (see the example in section 20.7).
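One obvious demographic error is a stated age that disagrees with the date of birth. The following is a minimal sketch of such a check, assuming the demographic fields have already been entered; the record layout, field names and example values are hypothetical and are not taken from the GAN coding instructions. The check only flags questionnaires for follow-up with the school; it does not change any data.

```python
from datetime import date

def age_on(date_of_birth: date, completed_on: date) -> int:
    """Age in completed years on the day the questionnaire was filled in."""
    had_birthday = (completed_on.month, completed_on.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return completed_on.year - date_of_birth.year - (0 if had_birthday else 1)

def has_age_mismatch(record: dict) -> bool:
    """True when the stated age disagrees with the age derived from the two dates."""
    derived = age_on(record["date_of_birth"], record["date_completed"])
    return record["age"] != derived

# Hypothetical records, for illustration only.
records = [
    {"id": "001", "age": 13,
     "date_of_birth": date(2005, 3, 14), "date_completed": date(2018, 9, 1)},
    {"id": "002", "age": 14,
     "date_of_birth": date(2005, 11, 2), "date_completed": date(2018, 9, 1)},
]

# IDs of questionnaires whose demographic data should be queried with the school.
to_check = [r["id"] for r in records if has_age_mismatch(r)]
print(to_check)
```

Any correction that results from such a check would still need to be documented, dated and signed as described above.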
6.1.2 Core questionnaires
The data for asthma, rhinitis and eczema must be entered on to the computer exactly as recorded in the questionnaire and must not be changed under any circumstances, even if the information is inconsistent. In particular, answers must not be altered to make the stem question and the questions that follow it consistent. If, for some exceptional reason, a questionnaire is altered, a copy of the data should be made before the changes and a record kept of the reason for the change. It is vital that the original data is available to the GAN Global Centre. If some questions are left blank on a particular questionnaire, it will be at the discretion of the GAN Global Centre whether that questionnaire is excluded. The coding and data transfer section (Section 19) gives instructions on data handling, data entry and submission to the GAN Global Centre.
6.1.3 Data entry
Each centre is responsible for coding and entering its own data, although in some regions/countries one centre may take responsibility for this. Double entry is the expected method of data entry for the Global Asthma Network; it is a common method that minimises data entry errors. At least 10% of the data is expected to be double entered, which allows researchers to gauge the number of mistakes being made with data entry. The data is entered twice, preferably by two different people. The two versions of the data set are compared and any differences are checked against the original questionnaire. Dedicated data entry software such as SPSS (Statistical Package for the Social Sciences) allows the comparison between the first and second entry to occur as the second entry is made, so that any inconsistencies can be resolved at that time against the original questionnaire. If there are too many mistakes in the double entered sample, the full data set should be double entered. If alternative methods are planned, these should be discussed in advance with the GAN Global Centre.
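Where the data entry software does not compare the two entries itself, the comparison can be done afterwards. The following is a minimal sketch, assuming each entry has been loaded into a dictionary keyed by participant identification number; the function name, field names and values are hypothetical. It lists every disagreement so that each can be checked against the original questionnaire, and reports the proportion of fields that disagree.

```python
def compare_entries(first: dict, second: dict) -> list:
    """List disagreements between two entries of the same questionnaires.

    `first` and `second` map participant ID -> {field name: entered value}.
    Each disagreement is (participant ID, field, first value, second value).
    """
    discrepancies = []
    for pid in sorted(set(first) | set(second)):
        a, b = first.get(pid), second.get(pid)
        if a is None or b is None:
            # The questionnaire was entered in only one of the two passes.
            discrepancies.append((pid, "<record missing>", a is not None, b is not None))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((pid, field, a.get(field), b.get(field)))
    return discrepancies

# Hypothetical double-entered sample, for illustration only.
first_entry = {
    "001": {"wheeze_ever": "1", "wheeze_12m": "2"},
    "002": {"wheeze_ever": "2", "wheeze_12m": ""},
}
second_entry = {
    "001": {"wheeze_ever": "1", "wheeze_12m": "1"},
    "002": {"wheeze_ever": "2", "wheeze_12m": ""},
}

diffs = compare_entries(first_entry, second_entry)
fields_compared = sum(len(first_entry[p]) for p in first_entry if p in second_entry)
error_rate = len(diffs) / fields_compared

for pid, field, v1, v2 in diffs:
    print(f"Check questionnaire {pid}, field {field}: first={v1!r}, second={v2!r}")
print(f"Disagreement rate: {error_rate:.1%}")
```

The sketch only reports differences. Which value is correct must be decided from the paper questionnaire, not from either entry.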
Epi-Info is a free epidemiological software package distributed by the Centers for Disease Control and Prevention, and may be downloaded from http://www.cdc.gov/epiinfo/. Since 2000, however, Epi-Info has not included the capability for immediate comparison of double entered data, although it does include a number of useful statistical functions.
Some centres may wish to use questionnaire scanning software such as OMR (Optical Mark Recognition) for data entry. This is acceptable, but if it is used, the procedures for dealing with data entry errors must be documented and sent to the GAN Global Centre. The scanning software should also scan and keep an image of each questionnaire so that it can be checked when an error appears and corrected manually if necessary. The questionnaires may need specific preparation to be suitable for reading by a scanner. Copies of the paper questionnaire used must be provided to the GAN Global Centre. The name of the software and its manufacturer, together with documentation describing the software and/or a website address for the documentation, should be sent to the GAN Global Centre. The software should be able to export the data set as a .CSV file.
The minimum requirements for questionnaire scanning software are:
- A questionnaire layout which facilitates the scanning procedure, e.g. a large margin separating the text from the marking boxes.
- High quality BLACK printing of questionnaires, so that the text does not shift on the page, even by half a millimetre.
- A software package which detects any marking errors and allows comparison with the scanned image of the questionnaire (as if it were the real paper) and manual error correction.
The questionnaires must be kept for a minimum period according to local Ethics Committee requirements to allow cross checking against the computer record, if this should be necessary.
Data is to be sent to the GAN Global Centre as detailed in the data and coding transfer section, and collaborators will be sent an acknowledgement of receipt of data. Please check that this acknowledgement arrives, because emails can occasionally go missing. The GAN Global Centre will then forward the data on to the appropriate Data Centre and a report will be generated. This report will provide a summary of the data checks and will identify areas where a response is requested from the collaborating centre. This data checking process must be completed before a centre's data will be included in the analysis for publications of the Global Asthma Network. At the GAN Global Centre, centre data will be entered on to a PC with the necessary statistical analysis capabilities and a copy of the data will be kept off site in a protected environment.
6.1.4 Satisfactory data set
To be included in the Global Asthma Network publications, centres must provide a complete data set and Centre Report to the GAN Global Centre. The data and the Centre Report will then undergo a checking process by the GAN Global Centre in conjunction with each centre and the data centres. A satisfactory data set is one that has passed the data and methodology checks to the standard required by the Global Asthma Network Steering Group.
6.2 Data analysis
The data analysis will be undertaken in either London, UK, or Murcia, Spain (see the data and coding transfer section and section 21 for further details). Each group of participants will be analysed separately: 13/14 year olds, 6/7 year olds and adults. Each parameter of prevalence and severity will be compared between locations. The cluster effect is not expected to be great, but will be adjusted for in the analysis.
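This section does not specify how the cluster effect will be adjusted for. As an illustration only, one common way to gauge its size is the design effect, 1 + (m - 1) x ICC, where m is the average number of participants per cluster (here, per school) and ICC is the intracluster correlation. The sketch below assumes that approach; the function names and all numbers are hypothetical and do not describe the method used by the data centres.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent participants carrying the same information."""
    return n / design_effect(mean_cluster_size, icc)

# Hypothetical values: 3000 participants, 100 per school, ICC of 0.01.
deff = design_effect(mean_cluster_size=100, icc=0.01)
n_eff = effective_sample_size(3000, mean_cluster_size=100, icc=0.01)
print(f"Design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

With no intracluster correlation the design effect is 1 and clustering has no impact on precision.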
The primary aim is to obtain internationally comparable estimates of the direction and magnitude of change in prevalence of symptoms of asthma, rhinoconjunctivitis and eczema, as well as new data on asthma management and the environment. The data will:
- provide estimates of the direction and magnitude of the prevalence of symptoms of asthma and other allergies
- allow ecological studies of these trends
- allow associations with risk and protective factors
Comparisons of prevalence rates between different centres will be made using appropriate statistical methods. Crude rates can be compared using contingency tables or logistic regression. For both the prevalence analysis and the analysis of the management questions, comparisons of standardised rates, or of data that needs controlling for confounding, will involve multivariate logistic regression.
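As an illustration of comparing crude rates with a contingency table, the following minimal sketch computes a Pearson chi-square test (1 degree of freedom, no continuity correction) for a 2 x 2 table of two centres. The function name and the counts are hypothetical, clustering is ignored, and the sketch is not the analysis procedure of the data centres.

```python
import math

def chi_square_2x2(cases_a: int, n_a: int, cases_b: int, n_b: int):
    """Pearson chi-square test comparing the crude prevalence in two centres.

    Returns (chi-square statistic, p-value). Requires at least one case and
    one non-case overall, otherwise an expected count is zero.
    """
    observed = [[cases_a, n_a - cases_a], [cases_b, n_b - cases_b]]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [cases_a + cases_b, total - cases_a - cases_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 300 of 3000 with symptoms in centre A, 450 of 3000 in centre B.
chi2, p_value = chi_square_2x2(300, 3000, 450, 3000)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

Comparisons that need adjustment for confounding would use logistic regression in a statistical package rather than a hand-computed test of this kind.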
6.3 Ownership of data
Each centre owns its data. The collaborating centres will be recognised by the group title “Global Asthma Network Study Group”. All publications and communications involving international comparisons will have a named writing group “and the Global Asthma Network Study Group”. All Principal Investigators whose data is included in any publication will be listed and acknowledged in the appendix of publications of the worldwide data.
Each centre may publish its own data without the approval of the Global Asthma Network. However, the GAN Global Centre should receive a copy of any independent publications to archive and to publish on the Global Asthma Network website. All publications and communications arising from comparisons of more than five centres in different countries require the approval and authorisation of the Global Asthma Network Steering Group.