arXiv Preprints (8)
AX2026-05-15T17:59:57Z
Building accurate 3D models from photos taken at unknown angles is a hard problem in computer vision. Most current methods predict 3D shape by mapping each pixel to a point in space, which creates redundant data and bumpy surfaces.
We introduce **IVGT** (Implicit Visual Geometry Transformer), a new approach that builds smooth, continuous 3D geometry from multiple photos without needing camera positions.
Instead of predicting points pixel-by-pixel, IVGT learns a continuous 3D representation of the scene. You can query any point in 3D space, and the model returns its distance to the nearest surface (SDF) and color using small, fast decoders.
This lets us:
- Extract smooth surface geometry directly
- Render images, depth maps, and surface normals from any viewpoint
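To make the query interface concrete, here is a minimal sketch. It assumes an analytic sphere stands in for the learned SDF decoder, and the names `sdf_sphere` and `sphere_trace` are illustrative, not taken from IVGT. It shows how a function that returns the distance to the nearest surface at any 3D point can be used to render depth along a camera ray through sphere tracing.

```python
import math

def sdf_sphere(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a learned SDF decoder)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-5, far=100.0):
    """March along a unit-length ray; return depth at the first surface hit, else None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t  # close enough to the surface: t is the rendered depth
        t += dist     # safe step: no surface is closer than `dist`
        if t > far:
            return None
    return None

# A ray pointing at the sphere hits it; a ray pointing away never does.
depth = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sdf_sphere)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sdf_sphere)
```

Repeating the trace for every pixel's ray would give a depth map; a learned model would replace `sdf_sphere` with its decoder and could return color at the hit point as well.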
We train IVGT on multiple datasets at once, using 2D image supervision and 3D geometric constraints.
**Results:** IVGT works well on new scenes and performs strongly across many tasks — mesh reconstruction, point clouds, novel view synthesis, depth estimation, surface normals, and camera pose estimation.
AX2026-05-15T17:58:58Z
AI chips are pushing rack power toward 1 MW each by 2027, and that creates a hard problem for datacenter designers: a facility built for one power density can end up "stranding" power it can't actually use as hardware changes over its lifetime. Getting this right matters because grid capacity is scarce, but it's hard to plan because feasibility, cost, and performance depend on many interacting factors (topology, placement, oversubscription, workload mix) that all shift over time and resist closed-form analysis.
We built a framework to evaluate power delivery designs on throughput, power, and cost across realistic arrival, oversubscription, and decommissioning patterns, combining deployment projections with operational data from Microsoft Azure. The results show that stranding across multiple resources significantly changes deployable capacity, real capex, and performance — and that rising AI rack density reshapes all three. The takeaway: for AI datacenters, plan for deployable capacity over time, not installed megawatts.
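A toy calculation can illustrate what stranding across multiple resources means. This is a sketch under assumed numbers, not the framework described above: the row budgets, rack sizes, and the function names `deployable_racks` and `stranded` are all hypothetical.

```python
def deployable_racks(budgets, per_rack):
    """Racks that fit when every resource (power, cooling, space) must suffice."""
    return min(int(budgets[r] // per_rack[r]) for r in per_rack)

def stranded(budgets, per_rack):
    """Unused amount of each resource once the binding resource runs out."""
    n = deployable_racks(budgets, per_rack)
    return {r: budgets[r] - n * per_rack[r] for r in per_rack}

# Hypothetical row: 1200 kW of power, 1000 kW of cooling, 20 floor slots.
row = {"power_kw": 1200, "cooling_kw": 1000, "slots": 20}
legacy = {"power_kw": 40, "cooling_kw": 40, "slots": 1}    # assumed 40 kW racks
dense = {"power_kw": 500, "cooling_kw": 500, "slots": 1}   # assumed 500 kW racks

legacy_stranded = stranded(row, legacy)  # slots bind first; power is left over
dense_stranded = stranded(row, dense)    # cooling binds first; slots are left over
```

In this example the same row strands 400 kW of power with legacy racks (floor slots run out first) and 200 kW with dense racks (cooling runs out first), so the binding resource changes as rack density rises even though installed megawatts do not.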
AX2026-05-15T17:52:57Z
Utilities now need to do three things: send bills customers can actually understand, attach a verifiable carbon number to every kWh sold, and schedule load around grid stress and emissions limits.
We propose one framework that combines four production-ready capabilities:
1. A generative AI agent that turns structured numbers into a plain-language bill for each customer, using constrained decoding to keep outputs safe.
2. A transformer-based forecaster that predicts next-day consumption with calibrated quantile ranges.
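For the forecaster in item 2, a common way to train and check quantile predictions is the pinball loss together with an empirical coverage check. The sketch below assumes that approach; the summary does not say which loss the framework uses, and the function names are illustrative.

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for one prediction at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1) * diff)

def coverage(actuals, lower, upper):
    """Fraction of actual values that fall inside the predicted [lower, upper] band."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lower, upper))
    return hits / len(actuals)

# At q = 0.9, predicting too low costs more than predicting too high.
under = pinball_loss(10.0, 8.0, 0.9)
over = pinball_loss(10.0, 12.0, 0.9)

# Hypothetical next-day kWh values with a predicted band for each day.
band_coverage = coverage([10, 12, 9, 15], [8, 9, 10, 11], [11, 13, 12, 14])
```

A band is calibrated when its empirical coverage matches its nominal level: a band between the 10% and 90% quantiles should contain roughly 80% of actual values over many days.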
AX2026-05-15T17:49:24Z
AI is now part of the places where people share opinions online. Tools like LLMs help edit posts on LinkedIn and add context on X. Past studies showed AI can be biased and shape what people think in one-on-one chats. But we know less about how it affects group opinions when it sits between people talking to each other. We studied this in two ways: with experiments and with math.
First, we tested popular LLMs by asking them to edit human-written texts on hot topics. They pushed the texts in one direction. For example, they nudged writing to support gun control and oppose atheism.
Next, we built a math model of how opinions spread when an AI sits between users on a social network, changing what they say and read. Using the model and real network data, we showed that small AI biases can grow as they spread through the network. This shifts the group's overall opinion.
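To see how a small per-message bias can grow at the group level, here is a toy simulation. It is not the model from the study: it assumes a Friedkin-Johnsen-style update on a fully connected network, and `anchor` and `bias` are hypothetical parameters.

```python
def simulate(innate, anchor=0.2, bias=0.01, steps=200):
    """Friedkin-Johnsen-style opinion updates on a fully connected network.

    Each user keeps weight `anchor` on their innate opinion and puts the rest
    on the average of what they read, which the mediating AI shifts by `bias`.
    """
    opinions = list(innate)
    for _ in range(steps):
        mediated_avg = sum(opinions) / len(opinions) + bias
        opinions = [anchor * s + (1 - anchor) * mediated_avg for s in innate]
    return opinions

innate = [-0.5, -0.1, 0.0, 0.2, 0.4]
baseline = sum(simulate(innate, bias=0.0)) / len(innate)
shifted = sum(simulate(innate, bias=0.01)) / len(innate)

# Ratio of the group-level shift to the per-message bias.
amplification = (shifted - baseline) / 0.01
```

In this toy setup the average opinion moves by `(1 - anchor) / anchor` times the per-message bias, so with `anchor = 0.2` a nudge of 0.01 per message becomes a shift of about 0.04 in the group mean. The less weight users put on their own prior views, the larger the amplification.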
We then asked: can platforms control this bias? We audited X's "Explain this post" feature. We found that Grok leans pro-life on abortion topics. We traced this to specific design choices.
We end by discussing what this means for new laws being written in the European Union.
AX2026-05-15T17:48:25Z
**VLA-AD: Smaller, Faster Robot Policies via Semantic Distillation**
Large Vision-Language-Action (VLA) models work well for robot manipulation, but they're too big and slow for real-time control. We introduce **VLA-AD**, which shrinks them down.
**How it works:** A Vision-Language Model acts as an offline teacher's assistant. Instead of just copying the big model's actions, the small student also learns high-level cues — what phase of the task it's in, and which direction to move across frames. These extra signals are only used during training. At test time, the student runs alone.
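One way to picture this training signal is as an action-imitation loss plus two auxiliary classification losses. The sketch below is an assumption about the general shape, not VLA-AD's actual objective: the head names, the loss weights `w_phase` and `w_dir`, and the use of MSE and cross-entropy are all hypothetical.

```python
import math

def mse(pred, target):
    """Mean squared error between student and teacher action vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def distill_loss(student, teacher_action, phase_label, direction_label,
                 w_phase=0.1, w_dir=0.1):
    """Imitate the teacher's action, plus auxiliary phase and direction cues."""
    action_term = mse(student["action"], teacher_action)
    phase_term = cross_entropy(student["phase_logits"], phase_label)
    dir_term = cross_entropy(student["direction_logits"], direction_label)
    return action_term + w_phase * phase_term + w_dir * dir_term

# Hypothetical student outputs for one frame; labels would come from the VLM offline.
student_out = {
    "action": [0.1, 0.2],
    "phase_logits": [0.0, 0.0, 0.0],
    "direction_logits": [0.0, 0.0],
}
loss = distill_loss(student_out, [0.1, 0.2], phase_label=0, direction_label=1)
```

At test time only the `action` output would be used, which matches the point that the extra signals are needed during training only.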
**Results on LIBERO benchmarks:**
- With OpenVLA-7B as teacher: student is **44× smaller** (158M params), only **0.27%** behind on average, runs at **12.5 Hz** on an RTX 4090 (**3.28× faster**).
- With a π₀.₅-4B teacher: student **beats** the teacher on two suites, within **0.53%** on `libero_goal`.
**Why it works:** Phase and direction cues make the student ignore noisy teacher actions, like jittery gripper flips.
**Takeaway:** Offline semantic guidance from VLMs makes distilled robot policies smaller, faster, and more robust.
AX2026-05-15T17:48:22Z
We watermark generative models by hiding the signal in the model's learned dynamics, not in its weights or outputs. Specifically, we embed the watermark into the velocity field of a flow matching model.
We treat this as random coding over a continuous channel. During training, we add a small perturbation that depends on a secret key. At detection time, we recover the hidden message using only black-box queries. The perturbation is designed so the model's output distribution stays the same.
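The keyed-perturbation idea can be illustrated with a generic correlation decoder. This is a simplified stand-in, not the scheme described here: it perturbs a plain vector rather than a velocity field, and unlike the design above it does change the signal's distribution. All names and parameters are hypothetical.

```python
import random

def codeword(key, n):
    """Pseudo-random +/-1 pattern derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, bit, key, strength=0.5):
    """Add a small keyed perturbation whose sign carries one message bit."""
    sign = 1.0 if bit else -1.0
    pattern = codeword(key, len(host))
    return [h + strength * sign * c for h, c in zip(host, pattern)]

def decode(signal, key):
    """Recover the bit by correlating the observed signal with the keyed pattern."""
    pattern = codeword(key, len(signal))
    corr = sum(s * c for s, c in zip(signal, pattern))
    return corr > 0

# Host values stand in for whatever quantity carries the watermark.
host_rng = random.Random(12345)
host = [host_rng.gauss(0.0, 1.0) for _ in range(1024)]

recovered_one = decode(embed(host, True, key=42), key=42)
recovered_zero = decode(embed(host, False, key=42), key=42)
```

With the right key the correlation adds up coherently across all 1024 positions, so the bit stands out from the noise. A decoder using a different key correlates against an unrelated pattern, which is the intuition behind decoding being no better than chance without the key.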
Experiments on MNIST and CIFAR-10 with different architectures show three things:
- Messages can be recovered reliably
- Generation quality is preserved
- Without the key, decoding is no better than chance
AX2026-05-15T17:43:16Z
Researchers often mix up two ways to test if transformer layers are "equivalent" for compression:
- **Replacement**: Can layer A's function stand in for layer B's, in place?
- **Interchange**: Do two layers give similar results when you swap their positions?
Both tests compare outputs using swap-KL, but they don't always agree. Under the same evaluator, the gap between them can change which layers look safe to prune, sometimes by a factor of several. This matters most when replacement distances are large.
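The difference between the two tests is easy to show on a toy network. The sketch below assumes swap-KL means the KL divergence between the original model's output distribution and the modified model's; the toy layers and every name here are illustrative, not taken from the study.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def kl(p, q):
    """KL(p || q) for two discrete distributions with matching support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def make_layer(scale, shift):
    """Toy residual layer: x + tanh(scale * x + shift), applied elementwise."""
    return lambda x: [v + math.tanh(scale * v + shift) for v in x]

def run(layers, x):
    for layer in layers:
        x = layer(x)
    return softmax(x)

layers = [make_layer(0.5, 0.1), make_layer(1.0, -0.2), make_layer(0.3, 0.4)]
x = [0.2, -1.0, 0.7]
i, j = 0, 1  # layer A sits at index i, layer B at index j

base = run(layers, x)

replaced = list(layers)
replaced[j] = layers[i]                          # replacement: A stands in for B

swapped = list(layers)
swapped[i], swapped[j] = layers[j], layers[i]    # interchange: A and B trade places

replacement_kl = kl(base, run(replaced, x))
interchange_kl = kl(base, run(swapped, x))
```

The two numbers measure different edits to the same model, which is why they need not agree. In practice the distributions would be next-token predictions averaged over unlabeled text rather than a single toy input.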
We tested both methods across model checkpoints and architectures:
- **Pythia (410M and 1.4B)**: The gap between replacement and interchange grows as training progresses.
- **Qwen3-8B (WikiText-2)**: Interchange-guided pruning is several times safer than replacement-guided pruning at the same layer budget.
- **Llama-3.1-8B**: Both methods cost the same for pruning, even though interchange KL is lower. So a smaller metric gap doesn't always mean easier removal.
**Takeaway**: Before removing or merging layers, run both swap-KL tests on your target checkpoint. It only needs unlabeled forward passes.
AX2026-05-15T17:42:49Z
**Can LLM agents get better at decisions by writing notes to themselves — without any retraining?**
We built FORGE, a system that lets a group of agents learn from their mistakes by editing their own prompts. After each failure, a reflection agent (same model, no help from a smarter one) turns the bad run into reusable notes: rules, worked examples, or both. The best agent's notes get shared with the rest of the group, and agents that stop improving are retired to save compute.
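The loop described above can be sketched as a small simulation. This is a toy under stated assumptions, not FORGE itself: `run_episode` and `reflect` are hypothetical stand-ins for the environment rollout and the reflection call, the toy reflects after every episode rather than only after failures, and `patience` is an invented retirement threshold.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    notes: list = field(default_factory=list)
    best: float = float("-inf")
    stale: int = 0
    active: bool = True

def run_episode(agent, rng):
    """Hypothetical rollout: in this toy, each distinct note adds one point."""
    return len(set(agent.notes)) + rng.random()

def reflect(agent, round_idx):
    """Hypothetical stand-in for the reflection call that turns a run into a note."""
    return f"{agent.name}: lesson from round {round_idx}"

def evolve(agents, rounds=5, patience=2, seed=0):
    rng = random.Random(seed)
    for r in range(rounds):
        live = [a for a in agents if a.active]
        scores = {}
        for a in live:
            scores[a.name] = run_episode(a, rng)
            a.notes.append(reflect(a, r))
            if scores[a.name] > a.best:
                a.best, a.stale = scores[a.name], 0
            else:
                a.stale += 1
        leader = max(live, key=lambda a: scores[a.name])
        for a in live:
            if a is leader:
                continue
            # Share the leader's notes with the rest of the group.
            new_notes = [n for n in leader.notes if n not in a.notes]
            a.notes.extend(new_notes)
            # Retire agents that have stopped improving.
            if a.stale >= patience:
                a.active = False
    return agents

team = evolve([Agent("a"), Agent("b"), Agent("c")])
```

The round's leader is never retired, so at least one agent always stays active. In a real system the notes would be appended to each agent's prompt before the next episode.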
We tested it on CybORG CAGE-2, a network-defense game with random outcomes, using four LLMs (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B). All four start out doing badly with frequent catastrophic failures.
**Results across all 12 setups:**
- 1.7-7.7× better than zero-shot
- 29-72% better than Reflexion (which learns alone)
- Catastrophic failure rates drop to ~1%
**What we learned:**
1. Sharing notes across the group is the key ingredient. Retiring agents just saves money.
2. Examples win on raw score for 3 of 4 models. Rules are cheaper (~40% fewer tokens) and more reliable.
3. Weaker models gain the most — FORGE narrows capability gaps rather than widening them.
**Caveat:** Only tested on CAGE-2 against one attacker. Cross-model claims are suggestive, not proven.
Patent Filings (5)
PT2020-06-02
For example, in modern **robotic automation**, a part might be complex and have many features like holes. The robot can use the relationships between these features to quickly learn the part's position or other process settings.
PT2014-12-10
Method for teaching a robot to move (steps 84, 88, 90, 92), using a system with:
- a robot (36, 94)
- a robot controller (34, 96) that has an automatic mode and a teach mode
- a PLC (32) connected (38) to the robot controller…
PT2019-05-07
**Original:**
> The invention relates to a method for programming a robot, in particular a robot comprising a robotic arm, in which method a movement to be performed by the robot is set up preferably in a robot programme by means of a predefined motion template, the motion template is selected from a database …
**Simplified:**
> This invention is a way to program a robot — specifically one with a robotic arm. You pick a ready-made motion template from a database and use it to set up the robot's movement in a program.
**Even shorter:**
> A method for programming a robot arm by picking a motion template from a database and using it to define the robot's movement.
PT2025-10-03
A robotic arm controller that smooths target motion by convolving it with a pulse train before sending it to the arm.
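As a rough illustration of this kind of smoothing, the sketch below convolves a step-shaped target with a two-pulse train. The pulse amplitudes and spacing are assumed values chosen for the example, not taken from the filing, and `convolve_causal` is an illustrative name.

```python
def convolve_causal(target, pulses):
    """Causal discrete convolution, truncated to the length of `target`."""
    out = []
    for n in range(len(target)):
        acc = 0.0
        for k, amp in enumerate(pulses):
            if n - k >= 0:
                acc += amp * target[n - k]
        out.append(acc)
    return out

# A step command and a train of two pulses spaced three samples apart.
# The amplitudes sum to 1, so the final target position is preserved.
step_target = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
pulse_train = [0.5, 0.0, 0.0, 0.5]

smoothed = convolve_causal(step_target, pulse_train)
# smoothed == [0.0, 0.5, 0.5, 0.5, 1.0, 1.0]
```

The single abrupt step becomes two half-sized steps, so the arm receives a gentler command while still ending at the same target.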
PT2012-08-15
"A robotic arm controller for tool motion and logic control. The robotic arm has multiple joints, each driven by a servo motor to control its movement path."