Machine Learning Engineer, LLM Fine-Tuning

  • First Soft Solutions LLC
  • 04/02/2026
Full time Information Technology Telecommunications Java Python Software Engineer Testing

Job Description

We are actively hiring a Machine Learning Engineer focused on LLM fine-tuning for Verilog/RTL applications.

Location: San Jose, CA (Onsite)

Skills: LLM fine-tuning, Verilog/RTL, AWS, Bedrock, SageMaker

Responsibilities
  • Own the technical roadmap for Verilog/RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.
  • Lead a hands-on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.
  • Fine-tune and customize models using state-of-the-art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL-specific evals:
    • Compile/lint/simulate-based pass rates for code generation, constrained decoding to enforce syntax, and "does it synthesize" checks.
  • Design privacy-first ML pipelines on AWS:
    • Training/customization and hosting using Amazon Bedrock and SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs.
    • Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least privilege, CloudTrail auditing, and Secrets Manager for credentials.
    • Enforce encryption in transit and at rest, data minimization, and no public egress for customer/RTL corpora.
  • Stand up dependable model serving: Bedrock model invocation where it fits, and/or low-latency self-hosted inference (vLLM/TensorRT-LLM), autoscaling, and canary/blue-green rollouts.
  • Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).
  • Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
  • Drive productization: integrate LLMs with internal developer tools (IDEs/plug-ins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool use/function calling.
  • Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.
Qualifications
  • 10+ years of total engineering experience, with 5+ years in ML/AI or large-scale distributed systems and 3+ years working directly with transformers/LLMs.
  • Proven track record of shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.
  • Deep hands-on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), quantization-aware fine-tuning (LoRA/QLoRA), and constrained/grammar-guided decoding.
  • AWS expertise to design and defend secure enterprise deployments: Bedrock, SageMaker, S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.
  • Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).
  • Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
Seniority Level

Mid-Senior level

Employment Type

Full-time

Job Function

Engineering and Information Technology

Industries

IT Services and IT Consulting