Learn how to leverage Financial-RoBERTa, a powerful pre-trained NLP model based on transformers, to perform sentiment analysis on financial texts like statements, earnings announcements, and news articles.
Introduction:
Sentiment analysis plays a crucial role in accounting and finance research, allowing analysts to gain valuable insights from financial texts. In this tutorial, we will explore the usage of Financial-RoBERTa, a pre-trained NLP model specifically designed for sentiment analysis in the financial domain. By the end of this guide, you will have a solid understanding of how to utilize Financial-RoBERTa to extract sentiment from various financial texts such as financial statements, earnings call transcripts, and more.
Section 1: Understanding Financial-RoBERTa
Financial-RoBERTa is an advanced NLP model specifically tailored for sentiment analysis in the financial domain. I trained it on a vast corpus of financial texts, including financial statements, earnings announcements, CSR reports, ESG news, and more. The model employs a sophisticated methodology that combines pre-training and fine-tuning based on the RoBERTa Large language model. It provides softmax outputs for three sentiment labels: Positive, Negative, and Neutral.
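To make the idea of softmax outputs over three labels concrete, here is a minimal sketch in plain Python. The logits and the label order below are assumptions chosen for illustration; they are not real model output, and the actual mapping should be read from the model's configuration.

```python
import math

# Assumed label order, for illustration only; check the model's
# configuration (id2label) for the real mapping.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels=LABELS):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Hypothetical logits, not produced by Financial-RoBERTa.
example = top_label([-1.2, 0.3, 2.5])
print(example)
```

The score reported for a sentence is the probability of the winning label, so it always lies between 0 and 1.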
Section 2: Setting Up the Environment
Before diving into sentiment analysis with Financial-RoBERTa, we need to set up our environment. Follow these steps:
- Install the necessary libraries by running the following commands in a notebook cell (in a terminal, drop the leading !):
!pip install transformers
!pip install pandas
Transformers is a Python library that provides a high-level interface to state-of-the-art natural language processing (NLP) models, including pre-trained models like Financial-RoBERTa, for tasks such as sentiment analysis and language translation.
Pandas is a Python library that offers data structures and analysis tools for structured data. Here we use it to store the sentiment analysis results in a DataFrame for further analysis and visualization.
- Import the required packages into your Python script:
from transformers import pipeline
import pandas as pd
Section 3: Loading the Financial-RoBERTa Model
To use Financial-RoBERTa, we need to download and load the model from Hugging Face. Follow these steps:
- Download the Financial-RoBERTa model using the following code:
classifier = pipeline("sentiment-analysis", model="soleimanian/financial-roberta-large-sentiment")
This code creates a pipeline that specializes in sentiment analysis and specifies the Financial-RoBERTa model (“soleimanian/financial-roberta-large-sentiment”) to be used for the analysis. The pipeline is a convenient tool that combines multiple steps required for sentiment analysis, such as tokenization and classification, into a single process.
Section 4: Performing Sentiment Analysis
Now that we have the Financial-RoBERTa model ready, we can proceed with sentiment analysis. Use the following steps:
- Prepare a list of sentences or text snippets that you want to analyze for sentiment. For example:
sentences = [
'Our DSO was 40.2 days for the first quarter of 2021 as compared to 40.9 days for the first quarter of 2020',
'Adjusted Operating Margin was 15.1% compared to 16.0% the year-ago quarter.',
'Gross profit for the first six months of fiscal 2020 was $89.3 million, or 42.3% gross margin, compared to $81.9 million, or 41.5% gross margin, in the year ago period.',
'Total jobs to be created for this project were estimated at 25; total jobs created are 20.',
'Premium income from insurance policies is recognized on an as earned basis.',
'Profit share and/or royalty revenue is reported as product revenue and is recognized based upon net sales or profit share of licensed products in licensed territories in the period the sales occur as provided by the collaboration agreement.',
'Thus we are fully in line with the overarching trend of decarbonization.',
'Our water waste increased by 10%.'
]
If your text is not already in a list format, you may need to convert it into a list before proceeding.
To convert the given text into a list of sentences using NLTK’s sentence tokenizer, you can use the following Python code:
!pip install nltk # use this line of code if you don't have nltk installed on your system
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize
text = """Our DSO was 40.2 days for the first quarter of 2021 as compared to 40.9 days for the first quarter of 2020. Adjusted Operating Margin was 15.1% compared to 16.0% the year-ago quarter. Gross profit for the first six months of fiscal 2020 was $89.3 million, or 42.3% gross margin, compared to $81.9 million, or 41.5% gross margin, in the year ago period. Total jobs to be created for this project were estimated at 25; total jobs created are 20. Premium income from insurance policies is recognized on an as earned basis. Profit share and/or royalty revenue is reported as product revenue and is recognized based upon net sales or profit share of licensed products in licensed territories in the period the sales occur as provided by the collaboration agreement. Thus we are fully in line with the overarching trend of decarbonization. Our water waste increased by 10%."""
sentences = sent_tokenize(text)
print(sentences)
In this code, we first import NLTK and download the required tokenizer models by executing nltk.download('punkt'). We then import sent_tokenize from the nltk.tokenize module. Finally, we pass the text variable to the sent_tokenize function, which returns a list of sentences stored in the sentences variable. The list contains each sentence from the original text as a separate element. On recent NLTK releases, sent_tokenize may ask for the 'punkt_tab' resource instead; if you see a LookupError, run nltk.download('punkt_tab') as well.
- Execute sentiment analysis on each sentence using the Financial-RoBERTa model and store the results:
sentiment_results = {'sentence': [], 'label': [], 'score': []}
for sentence in sentences:
    result = classifier(sentence)[0]
    sentiment_results['sentence'].append(sentence)
    sentiment_results['label'].append(result['label'])
    sentiment_results['score'].append(result['score'])
The above code performs sentiment analysis on a list of sentences and stores the results in a dictionary called sentiment_results.
Here’s a step-by-step explanation of the code:
- First, a dictionary called sentiment_results is initialized with three keys, ‘sentence’, ‘label’, and ‘score’, each mapped to an empty list. This dictionary will be used to store the results of the sentiment analysis.
- Next, a loop runs over each sentence in the sentences list, which holds the input sentences to be analyzed for sentiment.
- Inside the loop, the classifier pipeline is called with the current sentence as an argument. This applies sentiment analysis using the Financial-RoBERTa model. The [0] index retrieves the first result from the list of results returned by the classifier.
- The sentiment analysis result for the current sentence is stored in the result variable.
- The sentence is appended to the ‘sentence’ list in sentiment_results using the append() method. This adds the current sentence to the list of analyzed sentences.
- The sentiment label from the result is appended to the ‘label’ list in sentiment_results.
- The sentiment score from the result is appended to the ‘score’ list in sentiment_results.
After executing this code, the sentiment_results dictionary will contain three lists: ‘sentence’, ‘label’, and ‘score’. Each element in these lists corresponds to the sentiment analysis result for the respective sentence in the sentences list.
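As a variation on the loop above, a Transformers pipeline can also be called once with the whole list, in which case it returns one result dictionary per input. The sketch below wraps that pattern in a helper; the names collect_sentiment and fake_classifier are hypothetical and introduced here only for illustration. The stand-in classifier lets the example run without downloading the model, and it returns fixed values rather than real predictions.

```python
def collect_sentiment(classifier, sentences):
    """Classify all sentences in one call and return a dict of columns.

    `classifier` is any callable that takes a list of strings and returns
    a list of {'label': ..., 'score': ...} dicts, one per input string.
    """
    sentences = list(sentences)
    results = classifier(sentences)
    collected = {"sentence": [], "label": [], "score": []}
    for sentence, result in zip(sentences, results):
        collected["sentence"].append(sentence)
        collected["label"].append(result["label"])
        collected["score"].append(result["score"])
    return collected


def fake_classifier(texts):
    """Stand-in for the real pipeline: returns a fixed, made-up result."""
    return [{"label": "neutral", "score": 0.5} for _ in texts]


demo = collect_sentiment(fake_classifier, ["Revenue rose.", "Costs rose."])
print(demo)
```

With the real model, the call would be collect_sentiment(classifier, sentences), and the returned dictionary has the same shape as sentiment_results.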
Section 5: Analyzing the Results
After performing sentiment analysis, we can analyze and present the results in a structured manner. Follow these steps:
- Create a DataFrame from the sentiment analysis results using pandas:
df = pd.DataFrame.from_dict(sentiment_results)
- Examine the sentiment labels and scores to gain insights into the sentiment expressed in the analyzed sentences:
# Example code to print the DataFrame
print(df)
Let’s save the results in a CSV file:
df.to_csv('Sentiment_Analysis.csv',index=False)
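One simple way to examine the labels is to count them and compute an overall tone. The sketch below uses only the standard library and works on a plain list of labels. The "net tone" measure (positive share minus negative share) and the function name summarize_labels are illustrative assumptions, not part of the model or of this tutorial's method, and the lowercase label names are assumed as well.

```python
from collections import Counter


def summarize_labels(labels):
    """Count each sentiment label and compute a simple net tone.

    Net tone = (positive count - negative count) / total count, so it
    ranges from -1 (all negative) to +1 (all positive).
    """
    # Lowercasing guards against differences in label capitalization.
    counts = Counter(label.lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {"counts": {}, "net_tone": 0.0}
    net_tone = (counts["positive"] - counts["negative"]) / total
    return {"counts": dict(counts), "net_tone": net_tone}


# Made-up labels for illustration, not real model output.
summary = summarize_labels(["positive", "negative", "neutral", "positive"])
print(summary)
```

With the DataFrame built above, the labels can be passed as df['label'].tolist(), or sentiment_results['label'] can be used directly.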
By following these steps, you can leverage the power of Financial-RoBERTa for sentiment analysis in accounting and finance research. Feel free to experiment with different financial texts and explore the valuable insights that sentiment analysis can provide.
I provide an example script via Google Colab. You can upload your data to Google Drive and run the script for free in Colab.
Contact:
Feel free to reach out to me via mohammad.soleimanian@concordia.ca with any questions or feedback you may have.
Conclusion:
In this tutorial, we have explored how to use Financial-RoBERTa, a pre-trained NLP model, for sentiment analysis in the field of accounting and finance research. We covered the installation of required libraries, loading the Financial-RoBERTa model, performing sentiment analysis on sample sentences, and analyzing the results. By incorporating sentiment analysis into your research workflow, you can uncover hidden patterns and sentiments in financial texts, enabling more informed decision-making.