1001 Freelance Projects -- LLM Reasoning Benchmark Questions + Python Evaluation Script


View this project in detail (Note: you will be redirected to external marketplace)
Project title:
LLM Reasoning Benchmark Questions + Python Evaluation Script
Posted by:
External project from PeoplePerHour
Started:
21-Apr-2025 09:32 GMT
Description:
I need a freelancer to prepare benchmark questions and answers for testing a custom LLM’s reasoning ability.

Scope:

Question Set:
Collect 500–600 LLM benchmark questions with correct answers.
Focus areas: logical, mathematical, commonsense, analytical, and multi-step reasoning.
Deliver as JSON or CSV.

Python Script:
Load questions and send them to an LLM (I'll handle API integration).
Compare model answers to correct ones.
Output a simple accuracy report.

Requirements:
Knowledge of LLMs, reasoning datasets, or NLP is preferred.
Clean, documented code.
Use only open or original questions.
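For illustration only, the Python Script part of the scope (load questions, send them to a model, compare answers, report accuracy) might be sketched roughly as below. The listing does not specify a file schema or a matching rule, so the field names `question`, `answer`, and `category`, the exact-match comparison after light normalization, and the `ask_model` callable (a stand-in for the client's own API integration) are all assumptions.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def load_questions(path):
    """Load question records from a .json file (list of objects) or a CSV file.

    Assumed (hypothetical) fields: "question", "answer", optional "category".
    """
    path = Path(path)
    if path.suffix.lower() == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def normalize(text):
    """Lowercase, collapse whitespace, and drop trailing periods before comparing."""
    return " ".join(str(text).lower().split()).rstrip(".")


def evaluate(questions, ask_model):
    """Score a model by exact match on normalized answers.

    ask_model is a placeholder callable (question text -> answer text);
    the real API call is left to the client, as the listing states.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for item in questions:
        category = item.get("category") or "uncategorized"
        totals[category] += 1
        if normalize(ask_model(item["question"])) == normalize(item["answer"]):
            correct[category] += 1
    n = sum(totals.values())
    n_correct = sum(correct.values())
    return {
        "total": n,
        "correct": n_correct,
        "accuracy": n_correct / n if n else 0.0,
        "by_category": {c: correct[c] / totals[c] for c in totals},
    }


def format_report(result):
    """Render the evaluation result as a short plain-text accuracy report."""
    lines = [
        f"Questions: {result['total']}",
        f"Correct:   {result['correct']}",
        f"Accuracy:  {result['accuracy']:.1%}",
    ]
    for category, acc in sorted(result["by_category"].items()):
        lines.append(f"  {category}: {acc:.1%}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny in-memory demo with a stub model; a real run would call load_questions().
    demo = [{"question": "2 + 2 = ?", "answer": "4", "category": "mathematical"}]
    print(format_report(evaluate(demo, lambda q: "4")))
```

Exact match is the simplest possible comparison and tends to under-count correct free-form answers; multi-step reasoning questions would likely need answer extraction or a more tolerant matcher, which is a design decision for whoever takes the project.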
Project ID:
3430175

Project	Started
Confidentiality Declaration Assistance Category: Document Checking, Editing, Employment Law, Legal, Legal Consultation, Legal Research, Legal Writing, Proofreading Budget: $10 - $30 USD	04 Jul 2025 16:04 GMT
I need a profitable site with active 2D payment. Category: Backend Development, Frontend Development, Payment Gateway Integration, UI / User Interface, User Experience Research, User Interface / IA, User Research, Web Application, Web Design, Web Development Budget: $1000 - $100000 USD	04 Jul 2025 16:04 GMT
Social Selling and Sales Category: Internet Marketing, Sales, Social Engine, Social Media Marketing Budget: $2 - $8 USD	04 Jul 2025 16:04 GMT
Aggressive Hip-Hop Diss Track Category: Audio Engineering, Audio Production, Audio Services, Creative Writing, Music, Music Production, Sound Design Budget: $10 - $30 USD	04 Jul 2025 16:03 GMT
Wikipedia Page Creation for Notable Individual Category: Academic Writing, Content Writing, Copywriting, Editing, Proofreading, Research, Research Writing, Wikipedia Budget: £250 - £750 GBP	04 Jul 2025 16:00 GMT
Android E-commerce App Development Category: Android, Android App Development, IPhone, Mobile App Development, Mobile Development, Payment Gateway Integration, PHP, UI / User Interface Budget: ₹1500 - ₹12500 INR	04 Jul 2025 16:00 GMT
AI Automation Sales Specialist Category: AI Agents, AI Chatbot Development, AI Development, CRM, PHP, Sales, Software Architecture, Web Design Budget: $750 - $1500 USD	04 Jul 2025 16:00 GMT
Chrome Extension Bot for Custom Site Navigation Category: Extensions & Additions, HTML, JavaScript Budget: $10 - $30 USD	04 Jul 2025 16:00 GMT
Visionary E-commerce Website Designer Needed Category: Frontend Development, Graphic Design, UI / User Interface, User Interface / IA, UX / User Experience, Web Design Budget: $750 - $1500 USD	04 Jul 2025 15:59 GMT
Digital Marketing Support Urgent Category: API Integration, Blockchain, Data Entry, Data Management, Digital Marketing, Digital Networking, Payment Processing, Virtual Assistant, Web Development Budget: $15 - $25 USD	04 Jul 2025 15:59 GMT
Logistics Tracking Platform Development Category: .NET Core, Angular, AngularJS, Django, Full Stack Development, JavaScript, Mobile App Development, Node.js, React Native, React.js Budget: ₹75000 - ₹150000 INR	04 Jul 2025 15:59 GMT
Amend x4 Architecture plan PDF files	04 Jul 2025 15:57 GMT
Professional UX Designer Needed for Mobile or Web App	04 Jul 2025 15:56 GMT
Portfolio Website Development Category: Backend Development, Frontend Development, HTML5, Java, Laravel, MySQL, PHP, Web Design Budget: $30 - $250 USD	04 Jul 2025 15:52 GMT
Accurate Typing Printed Books or Articles to Word files Category: Article Writing, Copy Typing, Data Entry, Data Processing, Excel Budget: $30 - $250 USD	04 Jul 2025 15:52 GMT
