CC BY 4.0 · Brazilian Journal of Oncology 2024; 20: s00441787969
DOI: 10.1055/s-0044-1787969
Review Article

From Data to Manuscript: A Strategy for Young Oncologists to Write a Scientific Paper

1   Faculdade de Medicina do ABC, Santo André, São Paulo, Brazil
2   Sociedade Brasileira de Oncologia Clínica, São Paulo, São Paulo, Brazil

Abstract

Scientific manuscripts are the basis for the transmission of scientific data among physicians in all fields of medicine. To teach young oncologists the skills needed to author a paper, we decided to emulate how experienced clinicians perform this task. The first step is to create a spreadsheet with all the clinical data gathered and submit it to a statistical analysis using a statistical software package. The most important results are presented in the graphs and tables. The results should be explained in a logical and understandable manner. Writing the “Materials and Methods” section follows, with all the technical information that any other researcher may need to reproduce elsewhere the study in question. A critical-thinking stage, in which a review of the pertinent literature is conducted with the use of a reference management software, should provide all the knowledge and questions to write the “Introduction” and “Discussion” sections. The “Abstract” and “Title” are the final sections to be created. Following these steps, the author can correct the first draft of the manuscript for submission to a specific journal. Choosing the right journal and answering the reviewers' comments are also important steps in this process. Even if a young oncologist does not embark on an academic career, learning how to write a scientific manuscript is believed to be the best way to teach them how to read such manuscripts during their lifelong continuous self-education.



Introduction

Manuscripts form the basis of the transmission of scientific data among physicians in all fields of medicine. To devise a practical way to teach young oncologists the skills needed to author a paper, it is necessary to consider the usual methods more experienced writers[1] [2] use to perform this task.

In a different approach, we proposed to start with a spreadsheet containing all gathered clinical data and proceed with the statistical analysis. Based on the results, the literature is reviewed, and a proper introduction and discussion are provided. In the literature review, the idea is to search for interesting findings or unexpected results, which offer the opportunity to formulate new thoughts.

To exemplify this writing strategy in practice, only free software packages were used, including statistical software (Jeffreys's Amazing Statistics Program, JASP), a reference management system (RMS; Zotero), a word processor (Google Docs, Google LLC., Mountain View, CA, USA), and a spreadsheet (Google Sheets). Zotero is also compatible with Microsoft Word (Microsoft Corp., Redmond, WA, United States) and Google Chrome. [Table 1] shows all the software packages mentioned in this review with the electronic addresses for download or use, and the links to short online videos with the most important tips for their use.

Table 1

Free software packages to help with scientific data analysis, creation of graphs and tables, searching and retrieving bibliographic references, and writing papers

| Software | Site for download or for use if online | Type | Short video tutorial link |
| --- | --- | --- | --- |
| Google Docs | https://docs.google.com/ | Word processor | https://youtu.be/OBITNezSmLY?si=hQoKpFSskVuDnpiw |
| Zotero | https://www.zotero.org/ | Reference management software (RMS) | https://youtu.be/tnbwKj6-pD8?si=WodRt8x7mMCcRvWs |
| JASP | https://jasp-stats.org/ | Statistical software package | https://youtu.be/APRaBFC2lEQ?si=HzbvlWC-xKbgMosg |
| ResearchRabbit | https://www.researchrabbit.ai/ | Reference retrieval software | https://youtu.be/16eOHCbi9fI?si=0P4VJzJnWrsvdTRc |
| Google Sheets | https://www.google.com/sheets/about/ | Spreadsheet | https://youtu.be/zs3ku4uVoho?si=N7sDFuMF3Jn01kec |
| JANE | https://jane.biosemantics.org/ | Journal author name estimator | https://youtu.be/278rHTSZiP8?si=HZw7ozBCirHl5hz2 |
| ChatGPT | https://chat.openai.com/ | Artificial intelligence | https://youtu.be/wnGPt030IG4?si=_8Kve7mMIV5DIH5 |

Note: For each package, free educational videos on YouTube have been selected to provide further practical information on how to use them.


Initial Considerations

Scientific language should always be simple, clear, and concise, avoiding excessive exposition and unnecessary or overly obvious sections. Remember that space in renowned journals is competitive and expensive. There is no room for wastage.

The active voice is preferable to the passive voice most of the time. Use simple and concrete terms. Connecting ideas and data logically and cohesively, akin to storytelling, is an effective approach. One must target the complexity of the text to a naive reader who may not be familiar with the jargon and the underlying concepts mentioned and explained in the introduction.

Ethical conduct is crucial for the authors during research and article writing. Honesty, transparency, and integrity are important throughout this process. Authors must accurately report their findings and acknowledge the contributions of others through proper citations to avoid plagiarism, a common reason for article rejection. Adherence to the ethical guidelines of the journals ensures the trustworthiness of scientific literature and upholds the integrity of the academic community. Ethical considerations are also key to guiding methodological applications.

Unethical publishing behaviors include incomplete or fraudulent reporting, plagiarism, and duplicate or overlapping publications.[3] Appropriate authorship attribution is essential. The International Committee of Medical Journal Editors[4] (ICMJE) outlines four criteria for authorship: significant contributions to the study's conception, design, data acquisition, analysis, or interpretation; drafting or revising the manuscript substantially; approving the final version; and ensuring accountability for the study's accuracy and integrity.



From the Spreadsheet and Data Analysis to the “Results” Section

After completing the data collection, we observed that the initial step typically involves inputting data into a spreadsheet. The sources of these data include medical records, questionnaires, or direct input from patients during the prospective stage of the study.

Usually, each row represents a subject or patient, and each column represents a variable. Nominal data should be coded as numbers so that the statistical software can read them. For instance, male and female can be coded as 1 and 2, respectively.
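As an illustration of this coding step, the following minimal sketch converts text labels into numeric codes before the spreadsheet is analyzed. The codebook `SEX_CODES` and the function `encode_nominal` are hypothetical names introduced here for illustration; they are not part of any of the software packages mentioned in this article.

```python
# Hypothetical codebook: maps each text label to the numeric code used in the spreadsheet.
SEX_CODES = {"male": 1, "female": 2}


def encode_nominal(values, codebook):
    """Replace text labels with numeric codes; blank cells become None (missing data)."""
    encoded = []
    for value in values:
        label = value.strip().lower()
        if label == "":
            encoded.append(None)  # keep missing data as an empty cell
        elif label in codebook:
            encoded.append(codebook[label])
        else:
            # Fail loudly so that typos are fixed at the source instead of being coded silently.
            raise ValueError(f"Unknown label: {value!r}")
    return encoded


sex_column = ["Male", "female", "", "FEMALE"]
print(encode_nominal(sex_column, SEX_CODES))  # [1, 2, None, 2]
```

Keeping the codebook in one place also makes it easy to report the coding scheme later in the “Materials and Methods” section.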

The analysis of the data in the spreadsheet requires statistical software. The most salient features of the analysis are tabulated or shown in graphs. These graphs and tables are the basis of the “Results” section of the paper, which should be the first part to be written, followed by the “Materials and Methods” section.[1]

In the “Results” section, the authors should avoid presenting their opinions or interpretations of the results. All comparisons, comments, and conclusions belong in the “Discussion” section.

If the authors save any spreadsheet created in either Google Sheets or Microsoft Excel to a file with the extension “*.ods”, they will be able to open it seamlessly in the JASP statistical software.[5]

After opening the file in JASP, each variable is classified as scale (continuous), ordinal, or nominal. Common examples of scale (continuous) variables are age, height, weight, potassium level, and other laboratory parameters. Ordinal variables include clinical stage, classification of heart failure in stages of severity, and performance status. Nominal variables are, for example, yes or no, sex, and race. Another fundamental step before analyzing the data is to ensure that there is no text in any of the spreadsheet cells, each of which should either contain a number or be empty (missing data).
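The check for stray text in the spreadsheet cells can be automated. The sketch below assumes that the spreadsheet has been exported as well-formed comma-separated text with a header row; the function name `find_non_numeric_cells` and the sample data are hypothetical.

```python
import csv
import io


def find_non_numeric_cells(csv_text):
    """Return (row, column name, value) for every cell that is neither empty nor a number."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so the first data row is row 2, as in a spreadsheet.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            cell = (value or "").strip()
            if cell == "":
                continue  # an empty cell is missing data, which is acceptable
            try:
                float(cell)
            except ValueError:
                problems.append((row_number, column, cell))
    return problems


sample = "age,sex,stage\n63,1,2\n58,female,3\n71,2,\n"
print(find_non_numeric_cells(sample))  # [(3, 'sex', 'female')]
```

Each reported cell can then be corrected in the spreadsheet before the file is opened in the statistical software.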

The choice of statistical test depends on the distribution of the studied variables and on whether the samples are independent. For instance, parametric tests are better suited to continuous variables that are approximately normally distributed, whereas discrete variables are studied using nonparametric tests. To evaluate the weight of patients before and after a diet, for example, the two measurements obtained from each patient are dependent (paired). On the other hand, if one wishes to compare the incidence of hot flashes in two distinct groups of patients, one treated with an experimental medication and the other with a placebo, the two groups are independent samples. [Table 2] shows the types of statistical tests chosen according to the aforementioned parameters.

Table 2

Appropriate statistical tests to be applied according to data distribution (parametric or non-parametric), if samples are paired or independent, and if there are two or more samples to be compared

| Number of samples | Parametric, independent samples | Parametric, paired samples | Non-parametric, independent samples | Non-parametric, paired samples |
| --- | --- | --- | --- | --- |
| Two samples | t-test (Student) | t-test (Student) | Mann-Whitney; Chi-Squared (2 × 2); Fisher Exact | Wilcoxon; McNemar; Sign |
| More than two samples | ANOVA | ANOVA | Kruskal-Wallis | Cochran; Friedman |

Abbreviation: ANOVA, analysis of variance.
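The decision logic of Table 2 can be expressed as a simple lookup. The sketch below only restates the table; the dictionary `TABLE_2` and the function `candidate_tests` are hypothetical names, and the final choice among the listed tests still depends on the type of variable being compared.

```python
# Keys: (distribution, design, number of samples); values: tests listed in Table 2.
TABLE_2 = {
    ("parametric", "independent", "two"): ["t-test (Student)"],
    ("parametric", "paired", "two"): ["t-test (Student)"],
    ("non-parametric", "independent", "two"): ["Mann-Whitney", "Chi-Squared (2 x 2)", "Fisher Exact"],
    ("non-parametric", "paired", "two"): ["Wilcoxon", "McNemar", "Sign"],
    ("parametric", "independent", "more than two"): ["ANOVA"],
    ("parametric", "paired", "more than two"): ["ANOVA"],
    ("non-parametric", "independent", "more than two"): ["Kruskal-Wallis"],
    ("non-parametric", "paired", "more than two"): ["Cochran", "Friedman"],
}


def candidate_tests(distribution, design, n_samples):
    """Return the tests that Table 2 lists for the given situation."""
    if n_samples < 2:
        raise ValueError("At least two samples are needed for a comparison")
    size = "two" if n_samples == 2 else "more than two"
    return TABLE_2[(distribution, design, size)]


# Weight before and after a diet: paired samples; here assumed suitable for a parametric test.
print(candidate_tests("parametric", "paired", 2))  # ['t-test (Student)']
```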


The first step in the data analysis is a descriptive study of each variable of interest. Continuous variables can be summarized by their mean and 95% confidence intervals (95%CIs), whereas discrete variables (nominal or ordinal), by the number of times each value appears in a particular column of the spreadsheet and the corresponding percentage from the total number of values in that column. The variables that represent the clinical characteristics of the patients are shown in a table that is usually the first table of the paper ([Table 1]).
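A statistical package such as JASP produces these descriptive summaries directly, but the arithmetic behind them is simple. The following sketch is an illustration only: the function names are hypothetical, and the confidence interval uses a large-sample normal approximation (an assumption made here to stay within the Python standard library), which is slightly narrower than the t-based interval that statistical packages usually report for small samples.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, confidence=0.95):
    """Mean and confidence interval using a normal approximation (assumption)."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m, m - z * standard_error, m + z * standard_error


def counts_with_percentages(values):
    """Count of each value and its percentage of the non-missing total."""
    present = [v for v in values if v is not None]
    total = len(present)
    return {value: (count, 100 * count / total) for value, count in Counter(present).items()}


ages = [54, 61, 67, 70, 58, 63, 72, 49]  # made-up continuous variable
stages = [1, 2, 2, 3, None, 2]           # made-up ordinal variable with one missing value

print(mean_with_ci(ages))
print(counts_with_percentages(stages))  # {1: (1, 20.0), 2: (3, 60.0), 3: (1, 20.0)}
```

Note that the percentages are calculated over the non-missing values; whether missing data are counted in the denominator should be stated in the table footnote.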

Next, the researcher seeks associations between columns of data in the spreadsheet using t-tests or analysis of variance (ANOVA) when comparing a continuous variable between two or more groups of patients, or contingency tables with the Chi-squared or Fisher exact tests when comparing two noncontinuous variables. Data depicted in graphs illustrate visual trends and are meant to show the most important results obtained. Other more complex analyses, such as multivariate regression, can be presented in tabular form, whereas survival data can be shown in Kaplan-Meier curves.
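To make the contingency-table analysis concrete, the sketch below computes the Pearson Chi-squared statistic and its p-value for a 2 × 2 table using only the Python standard library. The counts are invented for illustration, the function name `chi_squared_2x2` is hypothetical, and no continuity correction is applied, so the result may differ slightly from the default output of some statistical packages. When expected counts are small (a common rule of thumb is below 5), the Fisher exact test is usually preferred.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    column_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: experimental medication, placebo; columns: hot flashes yes, no (made-up counts).
statistic, p_value = chi_squared_2x2([[12, 28], [25, 15]])
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```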

Once the researcher chooses the tables and graphs to be included in the “Results” section, an ideal sequence for their presentation is built, considering that one needs to present the results as a story from beginning to end in a logical and understandable sequence. Tables and graphs should be numbered sequentially; a caption should be written for each graph, and a title, for each table. Graphs and tables, along with their respective captions and titles, should be self-explanatory to the reader, that is, they should be interpretable without the need to refer to the article's text. The “Results” section of a clinical study may begin by describing the most salient demographic and pathological data of the patients included. Next, the researcher writes the rest of the “Results” section describing the most valuable information shown in the graphs and tables.



The “Materials and Methods” Section

The “Materials and Methods” section should include all the information needed for anyone who would like to reproduce the clinical study exactly as it was conducted.

In fact, if any questions arise as to whether some information should be included, asking whether it is needed to reproduce the study will determine if it should be added. Often, the “Materials and Methods” section written in the future tense as a part of the project that preceded data collection must be changed to the past tense.

Specific reporting guidelines for each type of scientific article (survey, randomized prospective trial, case report etc.), available at https://www.equator-network.org/, significantly improve the quality of the manuscript.

Clinical trials need to be registered at trial repository sites such as ClinicalTrials.gov or the International Clinical Trials Registry Platform (ICTRP), and systematic reviews can be registered at the International Prospective Register of Systematic Reviews (PROSPERO). The “Instructions for Authors” of each scientific journal state whether registration is required and which repositories are acceptable.[6] The institution(s) in which the study was conducted must be specified.

The study design should be clearly stated, such as prospective randomized, observational, retrospective, chart review, and case reports, for example. One needs to describe the criteria through which patients were included in the study. Therefore, the researcher defines inclusion criteria that will characterize the sample of patients to be studied and exclusion criteria that will remove specific patients that would otherwise be included. For instance, if one aims to include patients with metastatic lung adenocarcinoma in a study of chemoimmunotherapy, pregnant patients should be excluded due to potential risks to the fetus posed by the protocol, as well as patients with uncontrolled autoimmune diseases that could be exacerbated by immunotherapy.[6] [7]

All criteria employed in the paper, either for diagnosis (case definition), assessment of toxicities, treatment response etc., need to be mentioned and cited accordingly. Questionnaires that have been validated and are widely used can be cited. If the paper reports the use of a questionnaire for the first time, it can be added in full as an appendix and described in the “Materials and Methods” section.

Likewise, all medications used should be described with the dosages employed and the way in which they were administered. Laboratory and imaging tests should also be described, and if an experimental test was included, it should be described in detail regarding the reagents employed and how it was performed so that it can be reproduced by other researchers.

Procedures, tests, or medication protocols that are widely known do not need to be described in detail; instead, they can be briefly mentioned and cited so that the reader will know where to find more information. Patient follow-up details, such as the periodicity of clinical visits, tests, and imaging studies, need to be described.

In the “Materials and Methods” section, it is also necessary to include the main objective of the study, for which the sample size was calculated. Researchers must mention any secondary objectives as well. Mentioning the objectives in the “Materials and Methods” section implies that they were tested according to a prespecified protocol, rather than in a post-hoc manner after accessing the data.

Statistical methods include the assumptions employed for the calculation of the sample size, as well as which tests were used to analyze specific variables, and which statistical package was employed.
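As an example of the assumptions that need to be reported for a sample size calculation, the sketch below applies a commonly used approximation for comparing two independent proportions with a two-sided test. The function name, the default significance level (0.05), and the default power (80%) are assumptions chosen for illustration; the appropriate formula depends on the study design and should be agreed upon with a statistician.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of patients per group to detect p1 versus p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Assumed response rates: 60% with the experimental treatment versus 40% with the control.
print(sample_size_two_proportions(0.60, 0.40))  # 95 patients per group
```

Reporting the expected proportions, the significance level, and the power allows readers to verify how the planned number of patients was obtained.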

Clinical researchers also need to report ethical aspects, such as submission to an Institutional Review Board (IRB) and granting of informed consent by patients. Sometimes in the “Materials and Methods” section, but often in another part of the paper, conflict of interests (COI), if existent for any of the authors, needs to be disclosed to enable editors and readers to judge if any bias could be ascribed to the COI.

More recently, authors have also been asked by editors to disclose the use of artificial intelligence (AI) tools in the writing of the manuscript. The use of AI tools for article writing and data analysis should be supplemented by the authors' review and correction of AI-generated results. The authors bear full responsibility for the content of the article.[8]

For clarity, the “Materials and Methods” section can be divided into various parts under different subtitles, such as “Patients”, “Treatments”, “Statistical Methods”.



Thinking and Studying

After writing the “Results” and “Materials and Methods” sections, one must conduct a second comprehensive review of the literature (the first having been performed when the project was developed). The goal now is to place the current results in the context of the published literature and to gather the information to be included in the “Introduction” and “Discussion” sections.

Because time has passed since that first search, it is advisable to revisit the bibliography and add any relevant articles, particularly recent studies published on the topic.

This task is best accomplished using an RMS that will digitally archive all the papers relevant to the researchers, who will then study them to understand the results obtained and put them into context. This step is critical to obtain the knowledge needed to write the remaining sections of the paper. Furthermore, the references stored within the RMS can later be cited automatically in the manuscript.[9]

To search for papers in the literature, the free online ResearchRabbit software (https://www.researchrabbit.ai/) is particularly useful, as it automatically finds similar papers in the literature with the same keywords as the papers initially added into it. Many other important published papers yet unknown to the authors may surface with the use of this software. In addition, the retrieved papers may be added to the Zotero RMS.[9]

The Zotero RMS enables researchers to collect relevant papers directly from the internet using browsers such as Google Chrome or Microsoft Edge by installing specific extensions. After retrieving the selected papers, researchers can store them digitally for future citations, as well as study and annotate them within Zotero.

This step is critical because many times it is at this moment that one: a) learns why the results matter for the field of research; b) determines if the results described agree with those obtained by other authors or if they disagree, and why; c) identifies the limitations of the research; and d) identifies the next logical steps for future work. These points are critical materials for the discussion to be described subsequently.

Therefore, it is believed that by studying, thinking critically, collecting, and digitally storing the papers addressing these questions, researchers can craft both the “Introduction” and “Discussion” sections, while also citing references and automatically generating the reference list at the end of the article.



Writing the Introduction

The introduction typically comprises 4 to 5 paragraphs, transitioning from broad information to the specific topic to be addressed. It is first necessary to provide background information that an average reader without specialized knowledge can understand, highlighting the relevance of the disease under study or the problem being tackled.

The first and second paragraphs should outline the disease's prevalence, diagnostic methods, treatment options, and/or prognostic factors to help readers understand the current state of management for this condition.

Subsequently, one must identify the gaps in knowledge or existing questions in the field, as well as clearly articulate the research questions the study aimed to answer. This section should also introduce the hypotheses and objectives.

Finally, the authors must precisely indicate what was done. The “Introduction” section should logically progress from general to specific, maintaining a coherent and logical narrative.[2] [3] [7] [10]



Writing the Discussion

The discussion should start by summarizing the main results and then place them in context with the other contributions to the field present in the literature. No new data should be presented in the discussion; only those already presented in the “Results” section can be used.

In addition, the discussion needs to address disagreements with other authors and try to explain why these differences occurred. In this scenario, one can address the issue directly and persuade the reader that their position is correct or better.[1]

Next, one needs to reflect critically on the limitations of the data obtained, emulating a rigorous reviewer, and thus anticipate potential criticisms and, if possible, even respond to them. If there are strengths to the methodology or results obtained, the author should mention them. Authors need to reflect on the implications of the findings for the field and on the next logical steps the research should take in the future.

Lastly, a clear conclusion based on and proportional to the data presented in the “Results” section should end this section of the paper. One should not overstate the conclusions by going beyond what the presented data truly enable one to conclude.[2] [7] [10] [11]



Creating an Abstract

The title and the abstract are the parts of a paper that are most often read by researchers. Ideally, they should only be written after all sections have been written. The abstract follows the paper structure, including the “Background” (or “Introduction”), “Materials and Methods”, “Results”, “Discussion”, and “Conclusion” sections.

One approach is to select, copy, and paste whole sentences of the paper, and adapt them to a 250-word text format with all this information structured in the subsections described. Only essential information should be in the abstract, as the reader may look for further details in the paper if they so wish.[2] [5] [7] [11]

ChatGPT generates satisfactory results when users input information from the “Introduction”, “Materials and Methods”, “Results”, and “Discussion” sections, and then request the generation of a structured summary consisting of 250 words.
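Whether the abstract is written by hand or drafted with an AI tool, its length should be verified against the limit of the target journal. The sketch below is a minimal check; the 250-word default follows the figure used above, the function name is hypothetical, and words are counted simply as whitespace-separated tokens, which may differ slightly from the count of a given word processor or submission system.

```python
def check_abstract_length(abstract, limit=250):
    """Return the word count and whether it respects the limit."""
    n_words = len(abstract.split())  # whitespace-separated tokens
    return n_words, n_words <= limit


draft = "Background: Manuscripts are the basis of scientific communication."
print(check_abstract_length(draft))  # (8, True)
```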



Choosing a Title for the Paper

Ideally, the title should be informative, attractive, and include the main words that one would use to search for an article in electronic databases such as Google Scholar and MEDLINE. The title should be concise and captivating, as it is the most frequently read part of a paper. If it fails to grab the reader's attention, the paper may suffer as a result.[1] [6]

Crafting a title with all these ingredients is a challenging task that requires experience, and consulting mentors may be valuable. Another resource for title suggestions is to paste the paper's abstract into ChatGPT and ask for a few title options. Authors then choose the best option and edit it, if necessary.



How to Choose Keywords

The selection of appropriate keywords is crucial to ensure that the paper is easily retrievable in databases such as PubMed and MEDLINE. Papers are indexed in these databases based on their assigned keywords, which should be chosen from the collection of Medical Subject Headings (MeSH) terms available at https://www.ncbi.nlm.nih.gov/mesh. Additionally, it is strategically advantageous to include some of these keywords in the paper's title, if possible.

In addition to checking the MeSH database, one may use additional keywords already employed in similar papers cited in the manuscript.



Reviewing and Editing the Paper

After the authors finish the first draft, reviewing and editing the paper thoroughly for corrections is fundamental. During this step, it is also important to ask the coauthors, mentors, and other experienced researchers to review and make suggestions. This process of constructive criticism is essential to increase the chances of future publication, as each of these informal reviewers will anticipate many of the points journal reviewers would otherwise raise in the peer review.



Choosing the Right Journal to Submit the Paper

Choosing a journal is crucial because, in addition to its scientific quality and impact factor, the interests of its readership matter: a paper that does not fit them may be rejected outright. Moreover, it is necessary to strictly follow the “Instructions for Authors” of the chosen journal, as non-adherence to them may also lead to the article's rejection.

Looking at their own cited references, authors may find a journal that has published an article similar to the one they are writing. Once again, asking mentors and more experienced researchers about the choice of journal may yield invaluable advice.[12]

By inputting the title and abstract of their paper in certain online resources (such as https://journalfinder.elsevier.com/ and https://jane.biosemantics.org/), authors may obtain an automatic generation of journal suggestions for submission.

Young oncologists need to know the type of journal to which they are submitting their papers. The “Instructions for Authors” usually mention in which databases the journal is indexed and its impact factor (IF). In its most common form, the IF for a given year is the number of citations received in that year by the papers the journal published in the two preceding years, divided by the number of citable papers published in those two years. The IF thus gives an idea of how relevant a journal is by the average number of citations its recent papers receive in the literature. Journals with a higher IF are usually more competitive, and it is more difficult to get papers accepted in them.

A journal is open-access if the full paper is freely available online right after publication. Some of these journals charge publication fees. The Brazilian Journal of Oncology (BJO) is an open-access journal that charges no fees for publication. Unfortunately, in recent years, various commercial open-access journals have emerged that charge fees for both peer review and publication while offering an easy route to publish low-quality papers. These are known as predatory journals and should always be avoided.
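A short worked example may help to interpret this metric. The sketch below assumes the widely used two-year variant of the IF, in which the citations received in a given year by items published in the two preceding years are divided by the number of citable items published in those two years; the function name and the numbers are hypothetical.

```python
def two_year_impact_factor(citations_this_year, citable_items_previous_two_years):
    """Citations in one year to the previous two years' items, per citable item."""
    if citable_items_previous_two_years <= 0:
        raise ValueError("The journal must have published at least one citable item")
    return citations_this_year / citable_items_previous_two_years


# Made-up journal: 180 citable items published in 2022 and 2023, cited 450 times in 2024.
print(two_year_impact_factor(450, 180))  # 2.5
```

An IF of 2.5 therefore means that, on average, each recent paper of the journal was cited 2.5 times during the year in question.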

Papers can be rejected outright by the editor or after peer review. The most common reasons for rejection before peer review are nonadherence to the formatting specified in the “Instructions for Authors”, lack of interest on the part of the journal or its readership in the paper's subject, or lack of novelty.

After peer review, the most frequent reasons are poor methodology (lack of a sound hypothesis, and flaws in patient selection, handling of the data, and randomization, as well as weaknesses in the analysis of the data etc.), grammatical and spelling errors, inconclusive results, plagiarism etc.[13]

If editors return the reviewers' comments to the authors, even if the paper was rejected, valuable suggestions can be incorporated into the revised manuscript prior to resubmission to another journal.

One must remember that the reviewers' comments are, most of the time, an important source of ideas to improve the paper. They should be acknowledged one by one in an itemized response letter stating how each suggestion was incorporated into the paper or, if it was not, why. Comments that are easy to respond to include, for example, correcting grammar and spelling errors, correcting information present in tables or graphs, adding or removing references, and rephrasing a conclusion.

More challenging suggestions may involve changes in the methodology, such as increasing the number of patients or adding information that may not be readily available. These comments, even if they cannot be incorporated as suggested by the reviewer, may be added to the paper as a limitation in the discussion or even as a justification in the methodology. This demonstrates that all the reviewers' comments have been addressed as much as possible.

If there is a disagreement regarding one or more comments, a candid response should follow the comment explaining why there is disagreement and citing references, if adequate, to corroborate the author's view.



Conclusion

In conclusion, the present guide provides a comprehensive resource tailored specifically to young oncologists embarking on the journey of scientific writing. It emphasizes the importance of starting with thorough data collection and analysis and using free and accessible software tools to support this process. While there are other software tools available, we have selected free and useful ones. However, this choice can impose limitations on the study.

Critical elements such as ethical considerations, clear and concise writing, and the strategic use of an RMS are highlighted as vital to ensure the integrity and clarity of the manuscript. The present article meticulously outlines each step of the manuscript preparation process, from data entry to journal submission, offering practical advice on answering reviewers' comments and selecting the appropriate journal. This guide not only equips young oncologists with the necessary tools to effectively communicate their research but also enhances their ability to critically read and evaluate scientific literature, thereby contributing to their professional growth and the advancement of oncological knowledge.



Conflict of Interests

The authors have no conflict of interests to declare.

Authors' Contributions

AG: collection and assembly of data, conception and design, data analysis and interpretation, final approval of the manuscript, and writing of the manuscript; DIGC and MUPC: collection and assembly of data, data analysis and interpretation, final approval of the manuscript, and writing of the manuscript.



Address for correspondence

Mateus Uerlei Pereira da Costa
Sociedade Brasileira de Oncologia Clínica
São Paulo, São Paulo
Brazil   

Publication History

Received: 03 March 2024

Accepted: 21 May 2024

Article published online:
15 July 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution 4.0 International License, permitting copying and reproduction so long as the original work is given appropriate credit (https://creativecommons.org/licenses/by/4.0/)

Thieme Revinter Publicações Ltda.
Rua do Matoso 170, Rio de Janeiro, RJ, CEP 20270-135, Brazil

Bibliographical Record
Auro del Giglio, Daniel Iracema Gomes de Cubero, Mateus Uerlei Pereira da Costa. From Data to Manuscript: A Strategy for Young Oncologists to Write a Scientific Paper. Brazilian Journal of Oncology 2024; 20: s00441787969.
DOI: 10.1055/s-0044-1787969