Enhanced PubChem MCP Server for AI Assistants
🧪 An advanced, robust, and privacy-focused MCP server that enables AI assistants to intelligently search and access chemical compound information from PubChem.
This PubChem MCP Server serves as a powerful bridge between AI assistants (such as those in AnythingLLM) and PubChem's extensive chemical database. It utilizes the Model Context Protocol (MCP) to allow AI models to perform smart, resilient searches for chemical compounds and retrieve their detailed properties programmatically.
✨ Core Features
This is not just another PubChem wrapper. This server has been rebuilt from the ground up to be highly robust and intelligent:
- 🧠 Smart Fallback Search: If a search for a common name (like "Vitamin D") fails, the server automatically performs a deeper search in the PubChem Substance database to locate the correct compound. This dramatically improves success rates for ambiguous queries.
- 🛡️ Robust Error Handling & Retries: The server is designed to gracefully handle API errors. If it encounters a "Server Busy" error from PubChem, it will automatically wait and retry the request, ensuring your queries succeed even during heavy load (a minimal sketch of this pattern follows this list).
- 🔒 Optional Tor Proxy Support: You have full control over your privacy. A simple config.ini file lets you route all requests through the Tor network (via a SOCKS5 or HTTP proxy), preventing your IP address from being exposed. The server is secure by default and will never leak your IP if the proxy connection fails.
- 🔎 Single & Multiple Compound Search: Seamlessly handles requests for one or multiple compounds in a single prompt.
- 🧪 Detailed Property Retrieval: Access key chemical properties such as IUPAC Name, Molecular Formula, Molecular Weight, and most importantly, Monoisotopic Mass.
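As a rough illustration of the fallback-and-retry pattern mentioned above, the sketch below resolves a name to PubChem CIDs via the public PUG REST API, falling back from the Compound domain to the Substance domain and backing off on a "Server Busy" (HTTP 503) response. This is a simplified, hypothetical sketch, not the server's actual implementation in pubchem_server.py:

```python
import time

import requests

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def get_cids_with_fallback(name: str, retries: int = 3, wait: float = 5.0) -> list[int]:
    """Try the Compound domain first, then fall back to the Substance domain."""
    for domain in ("compound", "substance"):
        url = f"{PUG}/{domain}/name/{name}/cids/JSON"
        for _ in range(retries):
            resp = requests.get(url, timeout=30)
            if resp.status_code == 503:   # PubChem "Server Busy"
                time.sleep(wait)          # pause, then retry the same request
                continue
            if resp.ok:
                data = resp.json()
                # The compound domain returns an IdentifierList; the substance
                # domain groups CIDs per substance record.
                cids = data.get("IdentifierList", {}).get("CID") or [
                    cid
                    for info in data.get("InformationList", {}).get("Information", [])
                    for cid in info.get("CID", [])
                ]
                if cids:
                    return cids
            break  # not found in this domain -> try the next one
    return []

print(get_cids_with_fallback("Vitamin D"))  # ambiguous names may resolve via the Substance database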
🚀 No Installation Required: Try It Live on Smithery.ai
For those new to MCP servers or who just want to test this tool's capabilities without any local setup, a live, hosted version is available on Smithery.ai. This lets you chat directly with the agent in your browser.
<-- Explore the Live PubChem Agent on Smithery.ai -->
How to get started:
- Click the link above to go to the server page.
- Sign in using your GitHub or Google account.
- Click the "Explore capabilities" button to open a chat interface and start testing!
For the highest accuracy, especially with long decimal numbers, we strongly recommend using powerful models. The following have been tested and work excellently with this tool:
* Anthropic's Claude 3 Sonnet
* OpenAI's GPT-4 Turbo (or newer versions like GPT-4o)
We confirmed that both models correctly processed the full precision of the decimal numbers returned by the tool without any rounding.
🚀 Quick Start & Installation
This server is designed to be run locally, either on your desktop or within the AnythingLLM Docker environment.
1. Dependencies
This project relies on a few Python libraries. Make sure they are installed in your environment.
Create a requirements.txt file with the following content:

```text
requests
bs4
mcp
pubchempy
pandas
PySocks
```

Install them using uv or pip:

```
uv venv
.venv\Scripts\activate
uv pip install -r requirements.txt
```
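If you prefer plain pip without uv, the equivalent commands are:

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```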
(Note: PySocks is only required if you plan to use the Tor SOCKS5 proxy feature.)

2. Configuration
The server is configured via a config.ini file that is automatically created the first time you run it. This file will appear in the same directory as the pubchem_server.py script.
Default config.ini:

```ini
[proxy]
# Change 'use_proxy' to 'true' to route all requests through a proxy.
# Essential for protecting your privacy if you use Tor.
use_proxy = false

# Defines the proxy type. Options:
#   socks5h -> (Recommended for Tor) The native SOCKS proxy for Tor. More secure.
#   http    -> An HTTP proxy. If using Tor, requires additional configuration.
proxy_type = socks5h

# The proxy address. THIS IS A CRITICAL SETTING!
#   - If you run this script on your DESKTOP, use: 127.0.0.1
#   - If you run this script inside DOCKER, use: host.docker.internal
host = 127.0.0.1

# The proxy port.
#   9050 -> Default port for Tor's SOCKS proxy.
#   8118 -> Default port for Tor's HTTP proxy (if enabled).
port = 9050
```
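For reference, the sketch below shows roughly how such a configuration can be turned into a proxy mapping for the requests library. It is a minimal, hypothetical sketch assuming Python's configparser and requests (the real parsing lives in pubchem_server.py); the socks5h scheme requires PySocks:

```python
import configparser

import requests

config = configparser.ConfigParser()
config.read("config.ini")

proxies = None
if config.getboolean("proxy", "use_proxy", fallback=False):
    scheme = config.get("proxy", "proxy_type", fallback="socks5h")  # socks5h resolves DNS through Tor
    host = config.get("proxy", "host", fallback="127.0.0.1")
    port = config.get("proxy", "port", fallback="9050")
    proxy_url = f"{scheme}://{host}:{port}"
    proxies = {"http": proxy_url, "https": proxy_url}  # route both schemes through the proxy

# Example: a PubChem call routed through the proxy (or sent directly if proxies is None)
resp = requests.get(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/cids/JSON",
    proxies=proxies,
    timeout=30,
)
print(resp.json())
```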
3. Integration with AnythingLLM
Follow the official documentation to add this as a custom MCP server. The key is to point the command to your Python executable and the pubchem_server.py script.
Example for AnythingLLM Desktop (plugins/anythingllm_mcp_servers.json):

```json
{
"mcpServers": {
"pubchem": {
"name": "PubChem Server (Enhanced)",
"transport": "stdio",
"command": "\"C:\\path\\to\\your\\.venv\\Scripts\\python.exe\" \"C:\\path\\to\\your\\project\\pubchem_server.py\"",
"cwd": "C:\\path\\to\\your\\project"
}
}
}
```
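On Linux or macOS, the same entry would simply use POSIX-style paths. The paths below are placeholders for illustration only; adjust them to your own setup:

```json
{
  "mcpServers": {
    "pubchem": {
      "name": "PubChem Server (Enhanced)",
      "transport": "stdio",
      "command": "/home/you/project/.venv/bin/python /home/you/project/pubchem_server.py",
      "cwd": "/home/you/project"
    }
  }
}
```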
3.1 Implementation Examples with AnythingLLM
Here are some examples of how to interact with the agent. These tests were conducted using AnythingLLM Desktop, connecting to various Large Language Models via OpenRouter.
A crucial finding from our testing is the importance of the model you choose. We strongly recommend against using small or local models for this task. Smaller models often struggle to correctly parse the data returned by the tool, leading to errors such as hallucinated values, incorrect formatting, or, most critically, rounding decimal numbers, which defeats the purpose of this high-precision tool.
To ensure the integrity of your results, it's always a good practice to verify the raw data returned by the MCP. You can learn how to do this in the next section: 3.2 How to verify MCP outputs in the logs.
On the other hand, several models have demonstrated excellent performance. We achieved outstanding results with Google's Gemini 2.5 Flash Lite, which showed remarkable precision in handling long decimal numbers and correctly formatting the final Markdown table.
Gemini 2.5 Flash Lite preserved the decimals perfectly both with and without a custom prompt (the examples were run at a temperature of 0.6).
For this specific task, my personal favorite is Gemini 2.5 Flash Lite due to its consistent accuracy and reliability.
3.2 How to Verify MCP Outputs in the Logs
The best way to confirm that the agent is receiving the correct data and that the LLM is not making mistakes is to check the debug log file generated by the MCP server.
This server is configured to automatically create a log file in a subfolder within your project directory. The location will be:
```
your-project-folder/
└── logs_mcp/
    └── mcp_debug.log
```

Inside mcp_debug.log, you will find the exact JSON data that the tool sends back to the LLM before the model processes it. You can manually inspect this JSON to verify any values, especially long decimal numbers from properties like monoisotopic_mass, ensuring that the LLM has not introduced any rounding errors or hallucinations in its final answer.
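If you prefer not to scan the file by hand, a few lines of Python can pull out the relevant entries. This assumes, as described above, that the raw JSON appears as text in the log and uses the monoisotopic_mass property name mentioned in this README:

```python
from pathlib import Path

log_file = Path("logs_mcp/mcp_debug.log")

# Print every log line that mentions monoisotopic_mass so the full,
# unrounded values can be compared against the model's final answer.
for line in log_file.read_text(encoding="utf-8").splitlines():
    if "monoisotopic_mass" in line:
        print(line.strip())
```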
📊 Usage
Once integrated, your AI assistant will automatically use this tool when asked about chemical information. The primary tool exposed is search_compounds_by_name.
Example Prompt
This prompt demonstrates the tool's ability to handle a list of compounds with varying name types and retrieve multiple properties.
```
@agent
Please use your PubChem tool to find information on the following compounds:
- Thiamine (Vitamin B1)
- Riboflavin (Vitamin B2)
- Folic Acid
- Hydroxocobalamin
For each compound found, please retrieve the IUPAC Name, Molecular Formula, and Monoisotopic Mass. Present all the results in a single Markdown table.
```

The agent will call the search_compounds_by_name tool, which will intelligently find each compound, retrieve its data, and return it to the LLM for formatting.

🛠 MCP Tool Exposed
search_compounds_by_name
Searches for multiple compounds by name, one by one, using a smart fallback strategy with pauses and retries for maximum reliability.
Parameters:
- names (List[str]): A list of compound names. Example: ["Aspirin", "Ibuprofen"]
🙏 Acknowledgements
This project was heavily inspired by and built upon the foundation of the original PubChem-MCP-Server created by JackKuo666.
While this repository is not a direct fork, the original project was the crucial starting point. This version enhances the original concept with a focus on extreme robustness, intelligent search strategies, and user privacy through optional Tor integration 🧅.
Many thanks to the original author for their fantastic work and for sharing it with the community.
⚠️ Disclaimer
This tool is intended for research and educational purposes. Please respect PubChem's terms of service and use this tool responsibly.