
Textual Analytics in Social Media

Beliefs drive markets

individual beliefs that lead to new market insights and sustainable investment outcomes

With the growing accessibility of textual data in recent years, it has become possible to develop structural models that incorporate beliefs and micro decisions. Drawing on our team's analysis of thousands of news and social media outlets, we produce actionable textual insights and analytics. We monitor the exposure of content (reads, comments, likes) and its impact on audiences (speed of spread, sentiment tones, emotions). We also transform unstructured text from these sources into structured indicators, enabling deeper insights based on a variety of principles. Our passion lies in generating sustainable research outputs, with a focus on extracting and leveraging micro-belief statements, which distill key sentiments and trends from the vast sea of data.


Our Vision

Beliefs are central to asset pricing. Nearly all asset pricing models operate on the assumption that investors determine asset prices based on their beliefs regarding future payoffs. Someone unfamiliar with the field might assume that a significant portion of asset pricing research focuses on understanding how investors form these beliefs. Thus far, this hasn't been the case. The majority of theoretical and empirical studies in asset pricing aim to reverse-engineer these beliefs, adapting them within intricate models to align with observed asset prices.


One might naturally question if we could achieve further progress by uncovering belief dynamics both theoretically and empirically. Models of investor belief dynamics should address the sources of information that investors utilize. It's equally important to understand how investors process and digest this information before making investment decisions. Belief Analytics, at least in part, pursues these developments.


Image by Miguel Henriques

Belief measurement 

Recent years have seen an improvement in the availability of textual data, but for beliefs data to become a standard component of asset-pricing studies, further advancement in the collection and categorization of belief statements is required.

Image by Pietro Jeng

Acquisition of large amounts of data

Preparing a single plate of fried rice is easy; producing 100 million plates is another challenge. To tackle this scale, our project is focused on creating a robust infrastructure capable of processing vast volumes of specific texts from the Internet. Our emphasis lies on crafting automated programs that are fault-tolerant and excel at storing and retrieving data efficiently.
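To make the idea of fault-tolerant collection concrete, here is a minimal sketch in Python. It is not a description of our production system: the function names (`fetch_with_retry`, `open_store`, `store_post`), the SQLite table layout, and the retry settings are all assumptions chosen for illustration. It shows two common ingredients, retrying a failed download with exponential backoff and writing results idempotently so that a restarted job does not create duplicates.

```python
import sqlite3
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     max_attempts: int = 3, base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url); on failure wait base_delay, 2*base_delay, ... and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical posts table keyed by URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts(url TEXT PRIMARY KEY, text TEXT)"
    )
    return conn


def store_post(conn: sqlite3.Connection, url: str, text: str) -> None:
    """Insert or overwrite a post so that re-running a job stays idempotent."""
    conn.execute(
        "INSERT OR REPLACE INTO posts(url, text) VALUES (?, ?)", (url, text)
    )
    conn.commit()
```

In use, `fetch` would be whatever download function a given source requires; passing it in as a parameter keeps the retry logic independent of any particular site or HTTP library.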

Image by Martin Sanchez

The theory of communication

Through social interaction we are constantly exposed to news, opinions, and emotions, even when they are outdated. The same interactions also expose us to disinformation and noise. As a result, what matters may not be the event itself but the "story" that was coded, created, embellished, and distorted. Using the collected belief statements, we build a "big picture" of how financial news and opinions spread and how that spread differs with the characteristics of the posts.

Image by Saad Ahmad

Modeling belief formation

In classical models, investors are assumed to be rational, taking into account all available historical data when learning about the stochastic processes relevant for pricing. But when mapping these models into the real world, it is not clear what “all available” means. We aim to close the gap between the simplistic environment investors face in asset-pricing models and the messy inputs investors face in the real world. Moreover, there are reasons to expect that memory and perception of social posts are biased. Empirical and theoretical research is therefore needed to better understand how investors select and process the information conveyed in vast numbers of media posts.

So far, by the numbers

Texts from 1.1 million videos

- Social media video platform (TikTok China)

Texts from 274.2 million posts

- Chinese Stock Forum

Texts from 9.9 million posts

- Chinese Mutual Fund Forum

Texts from 8.6 million posts

- Chinese Futures Forum


Releasing the power of text by fine-tuning vertical LLMs

Social media content serves as a lens into the public's perceptions of specific subjects or incidents. Nevertheless, extracting valuable insights from such expansive quantities of text is far beyond the capabilities of even the most experienced human analysts. This is where Large Language Models (LLMs) come in. Conceptually, an LLM functions as an advanced analytical instrument for processing natural language. Analogous to a parrot echoing phrases within its environment, an LLM emulates human language patterns. A critical distinction, however, is that LLMs are trained on extensive datasets, enabling them to produce coherent and contextually appropriate results when analyzing financial reports and market trends or generating economic forecasts.

Belief Analytics harnesses its immense collection of textual data to build state-of-the-art, domain-specialized LLMs tailored for both academic research and industrial application. Our fine-tuned vertical models are focused on achieving strong performance in specific use cases, albeit with a trade-off in the model's overall generative capabilities.


We believe we can make sustainable contributions via the following three key dimensions:

Data-centric AI

Data-centric AI emphasizes the critical role of data quality, organization, and integrity, in contrast with the traditional emphasis on the complexity of models or algorithms. This paradigm suggests that employing high-quality data can diminish the need for complex model architectures and large-scale datasets, thereby significantly lessening the computational load during fine-tuning.

At Belief Analytics, we concentrate on achieving excellence in our training corpus. This is accomplished through diligent curation, accurate labeling, and comprehensive preprocessing of our text datasets. Our aim is to ensure that the corpus used for each task is as free of bias as possible, balanced, and specifically relevant to that task. This approach not only makes the fine-tuning process more efficient in terms of resources but also enhances the overall effectiveness of our AI solutions.
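As a small illustration of what curation and balance checks can look like in practice, the sketch below removes near-verbatim duplicates from a labeled corpus and reports the share of each label. It is a simplified example under stated assumptions, not our actual pipeline: the function names, the `(text, label)` pair format, and the normalization rule (lowercasing and collapsing whitespace) are all hypothetical choices.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label); a hypothetical corpus format


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(examples: Iterable[Example]) -> List[Example]:
    """Keep the first occurrence of each normalized text; drop empty texts."""
    seen = set()
    kept = []
    for text, label in examples:
        key = normalize(text)
        if key and key not in seen:
            seen.add(key)
            kept.append((text, label))
    return kept


def label_shares(examples: Iterable[Example]) -> Dict[str, float]:
    """Return the fraction of examples carrying each label."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

A strongly skewed result from `label_shares` would be a signal to collect or relabel more examples for the under-represented classes before fine-tuning.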

Image by Yuhan Du

LLM Benchmarks

Benchmarks play a pivotal role in fine-tuning LLMs, as they provide an essential framework for performance assessment and enhancement. At Belief Analytics, we are dedicated to developing benchmarks tailored specifically for vertical LLMs. This work is crucial for several reasons. First, it provides standardized metrics for evaluating model performance in niche sectors, facilitating objective comparisons and providing deeper insights into specific model strengths. Second, these benchmarks are designed to address the unique challenges and needs of various vertical applications, leading to the creation of more focused and efficient models. Finally, these benchmarks are key in driving continuous corpus refinement that fully leverages the data-centric AI approach. By setting clear objectives and success criteria, appropriate benchmarks ensure our LLMs are not only powerful but also precisely tuned to the specific demands of their application domains.
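As a rough illustration of the standardized metrics mentioned above, the following sketch scores any text classifier against a labeled benchmark using accuracy and macro-averaged F1. It is only an example under assumptions: the benchmark is taken to be a list of `(text, label)` pairs, the model is any Python callable mapping text to a label, and the function names are hypothetical rather than part of our tooling.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def evaluate(model: Callable[[str], str],
             benchmark: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run the model on every benchmark text and report two summary metrics."""
    gold = [label for _, label in benchmark]
    pred = [model(text) for text, _ in benchmark]
    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "macro_f1": macro_f1(gold, pred)}
```

Macro-averaging is used here because it weights every label equally, which matters when some classes are rare in the benchmark; other averaging schemes would be equally valid choices.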

Image by Taylor Vick

Purpose-Driven Fine-Tuned LLMs

Our fine-tuned LLMs are developed to understand and process the complex terminology inherent in finance. They can also be deployed locally to comply with the privacy and security regulations of the financial sector. In addition, a key strength of our LLMs is their adaptability, which allows us to automate routine tasks and continually adjust to new demands and areas of interest in finance. This keeps our solutions at the forefront of technology, offering relevant and impactful insights for financial research and industry applications.

Finally, multiple fine-tuned models may operate collaboratively under a coordinating framework. This integration provides a diverse array of functionalities together with a higher level of automation. Such a combined approach allows our models to handle intricate and nuanced tasks, serving a wide spectrum of technology-oriented financial innovations.

Our Key Members


Our Partners

Our partners provide valuable support in our area of focus. Through the collaborative projects we undertake, we work together toward a common objective.
