Schalal

GeoLLM

LLM Basics

GeoLLM: Extracting Geospatial Knowledge from Large Language Models

Introduction

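Spatial prediction with machine learning is widely practiced, but such studies typically face limited spatiotemporal data coverage, high data preparation costs, and barriers to data access. Studies of poverty, population, infrastructure, and similar topics have also been carried out with open global remote sensing imagery, but the predictive power of these models is usually limited, because some important features may not be visible from space.

LLMs have proven to be effective foundation models that can be fine-tuned or prompted for applications in domains such as health, education, law, finance, and scientific research. The reason is that LLMs have compressed the knowledge contained in their training corpora, which comprise billions or even trillions of tokens of internet data. This paper explores whether large language models hold geospatial knowledge, and how to extract that knowledge to produce variables that enhance a range of geospatial machine learning tasks.

When queried with an address, an LLM shows some ability to describe a place, but extracting that knowledge is not trivial. Feeding in coordinates and a task description directly gives poor results; the challenge lies in the LLM's ability to understand numeric coordinates and relate them to the real world. The paper's approach is to build prompts from OSM data that give the LLM enough spatial context, which improves its ability to surface geospatial knowledge. Supplying neighborhood information this way performs far better than supplying coordinates alone.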
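To make the prompting idea concrete, here is a minimal sketch of turning a coordinate plus nearby map features into a text prompt. It is only an illustration under assumptions: the prompt layout, the `build_prompt` and `describe` helpers, and the place dictionaries are hypothetical and are not taken from the paper's code. Distances use a local flat-earth approximation, which is reasonable only for nearby places.

```python
import math

# Eight compass sectors, clockwise from North.
COMPASS = ["North", "North-East", "East", "South-East",
           "South", "South-West", "West", "North-West"]

def offset_km(lat, lon, lat2, lon2):
    """Approximate (east, north) offset in km from (lat, lon) to (lat2, lon2).

    Local equirectangular approximation: adequate for places a few km apart.
    """
    km_per_deg = 111.32
    north = (lat2 - lat) * km_per_deg
    east = (lon2 - lon) * km_per_deg * math.cos(math.radians((lat + lat2) / 2))
    return east, north

def describe(lat, lon, place):
    """Render one nearby place as '<distance> km <direction>: <name>'."""
    east, north = offset_km(lat, lon, place["lat"], place["lon"])
    dist = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360  # 0 = North, clockwise
    direction = COMPASS[int((bearing + 22.5) // 45) % 8]
    return f"{dist:.1f} km {direction}: {place['name']}"

def build_prompt(lat, lon, address, places, task):
    """Assemble a text prompt (hypothetical layout) for a geospatial query."""
    lines = [f"Coordinates: ({lat:.5f}, {lon:.5f})",
             f"Address: {address}",
             "Nearby Places:"]
    lines += [describe(lat, lon, p) for p in places]
    lines.append(f"{task}:")
    return "\n".join(lines)

# Hypothetical input; in practice the places would come from an OSM query.
example = build_prompt(
    0.0, 0.0, "Example Street, Example City",
    [{"name": "Market", "lat": 0.01, "lon": 0.0},
     {"name": "Harbour", "lat": 0.0, "lon": -0.01}],
    "Population Density",
)
```

The point of the sketch is that the model never has to interpret raw numbers on their own: each coordinate is accompanied by named places with a distance and a direction.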

main contribution:

example prompts

Method

Experiments

Performance on tasks

performance 1

performance 2

few shot performance

Ablations on the prompt

ablation

Discussion

biases

Conclusion

The model performs well once prompted: Our simple method revealed that LLMs are sample-efficient, rich in geospatial information, and robust across the globe. Crucially, GeoLLM shows promise in substantially mitigating the limitations of traditional geospatial covariates.

GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots

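Geospatial Copilots unlock unprecedented potential for performing Earth observation applications through natural language instructions. However, most existing agents rely on oversimplified tasks and template-based prompts, which leaves them largely disconnected from real-world scenarios. The authors propose GeoLLM-Engine, an environment for tool-augmented agents that carry out the complex tasks routinely performed on remote sensing platforms. The environment is enriched with geospatial API tools, dynamic maps/UIs, and external multimodal knowledge bases. With the help of GPT-4 nodes, it covers more than half a million multi-tool tasks and 1.1M satellite images. It uses several state-of-the-art agents and prompting techniques to handle long-horizon prompts, moving beyond the traditional single-task image-caption paradigm.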

GeoLLM-Engine

Introduction

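LLMs have shown potential for enhancing Earth observation workflows, but most existing methods consider only low-level, template-based prompts. These capture only the textual surface form of predicted image-caption pairs, while functional agent correctness on high-level natural language is usually ignored. This high-level understanding matters in the geospatial domain, which involves complex multimodal data across spatial and temporal dimensions.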

In this work, our key insight is that amidst the abundance of “geospatial benchmarking” works, a subtle refocus is necessary, prioritizing the construction of a robust engine as the foundation for benchmark creation, rather than the benchmarks themselves.

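Description of the realistic environment: it integrates various APIs and a dynamic map web UI, and introduces a model-correctness checker so that the backend engine can verify the correctness of the generated benchmarks. By reducing the necessity for human intervention, we can massively parallelize our benchmark suite across 100 GPT-4-Turbo nodes to create large-scale benchmarks.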

GeoLLM-Engine Environment

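Frontend environment, toolset, backend engine, environment formulation, user intent formulation

A user intent $\{q, T, r, S\}$ is packaged as four parts:

Comparing against the reference workflow then confirms the functional correctness of the generation steps above.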
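The comparison against a reference workflow can be sketched as follows. These notes do not spell out what the four parts $q, T, r, S$ stand for, so the field meanings below (query, reference tool sequence, reference response, expected final state) are assumptions made for illustration, as are the tool names and the `check` helper; none of it is the engine's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    q: str                                  # natural-language query (assumed meaning)
    T: list                                 # reference tool-call sequence (assumed meaning)
    r: str                                  # reference response (assumed meaning)
    S: dict = field(default_factory=dict)   # expected final state (assumed meaning)

def check(intent, agent_tools, agent_state):
    """Compare one agent run against the reference intent.

    Returns (tools_ok, state_ok): whether the tool sequence matches exactly,
    and whether every expected state entry is present with the same value.
    """
    tools_ok = list(agent_tools) == list(intent.T)
    state_ok = all(agent_state.get(k) == v for k, v in intent.S.items())
    return tools_ok, state_ok

# Hypothetical intent and agent run.
intent = Intent(
    q="Show Sentinel-2 images of Lisbon from 2023 on the map",
    T=["search_images", "filter_by_date", "plot_on_map"],
    r="Displayed 12 images.",
    S={"map_layer": "sentinel2", "n_images": 12},
)
result = check(
    intent,
    ["search_images", "filter_by_date", "plot_on_map"],
    {"map_layer": "sentinel2", "n_images": 12, "zoom": 9},
)
```

Checking the final state as well as the tool sequence lets a run count as functionally correct only when it both follows the expected steps and leaves the environment in the expected condition.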

GeoLLM-Engine Benchmark Suite

intent collection –> tool templates –> gpt-driven benchmark creation –> model-checker formulation –> agent evaluation metrics

intent examples

model-checks

Remote Sensing Datasets

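Uses 1,149,612 remote sensing images covering task scenarios such as object detection, land-use classification, and visual question answering.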

Results

(Chameleon, CoT, and ReAct are all techniques that guide an agent to implicitly build a tool chain in the following ways)
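As one illustration of implicit tool chaining, here is a toy ReAct-style loop: the agent alternates a thought, an action (tool call), and an observation until it decides to finish. This is a sketch under assumptions: the `react_loop` function, the scripted policy that stands in for an LLM, and the `count_images` tool are all hypothetical and are not part of GeoLLM-Engine.

```python
def react_loop(policy, tools, question, max_steps=5):
    """Alternate thought / action / observation until the policy emits 'finish'.

    policy(observation, trace) must return (thought, action, argument).
    Returns (answer, trace); answer is None if max_steps is exhausted.
    """
    trace = []
    observation = question
    for _ in range(max_steps):
        thought, action, arg = policy(observation, trace)
        if action == "finish":
            trace.append((thought, action, arg))
            return arg, trace
        observation = tools[action](arg)      # run the chosen tool
        trace.append((thought, action, observation))
    return None, trace

# Hypothetical tool registry and a scripted policy standing in for an LLM.
tools = {"count_images": lambda region: 3}

def scripted_policy(observation, trace):
    if not trace:
        return ("I need the image count first", "count_images", "Paris")
    return ("I have the count", "finish", f"{observation} images")

answer, trace = react_loop(scripted_policy, tools, "How many images cover Paris?")
```

The tool chain is never declared up front; it emerges from the sequence of actions the policy picks, which is what "implicit" means here.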

result

Lists several reinforcement learning applications for web interaction.

geospatial benchmarks

Limitations and Future Work

Conclusion

UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

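Urban knowledge graph construction: extracting the complex spatial and semantic relations among urban spatial entities into a heterogeneous graph.

Website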
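A heterogeneous graph of this kind can be pictured as typed entities connected by (head, relation, tail) triplets. The sketch below is a minimal in-memory version under assumptions: the `UrbanKG` class, the entity types, and the relation names are hypothetical and do not come from the UrbanKGent framework.

```python
from collections import defaultdict

class UrbanKG:
    """Tiny heterogeneous graph: typed entities plus relational triplets."""

    def __init__(self):
        self.types = {}                    # entity name -> entity type
        self.out = defaultdict(list)       # head -> [(relation, tail), ...]

    def add_entity(self, name, etype):
        self.types[name] = etype

    def add_triplet(self, head, relation, tail):
        for entity in (head, tail):
            if entity not in self.types:
                raise KeyError(f"unknown entity: {entity}")
        self.out[head].append((relation, tail))

    def neighbors(self, head, relation=None):
        """Tails reachable from head, optionally restricted to one relation."""
        return [t for r, t in self.out.get(head, [])
                if relation is None or r == relation]

# Hypothetical entities and relations mixing spatial and semantic links.
kg = UrbanKG()
kg.add_entity("Central Park", "POI")
kg.add_entity("Manhattan", "Borough")
kg.add_entity("Park", "Category")
kg.add_triplet("Central Park", "located_in", "Manhattan")   # spatial relation
kg.add_triplet("Central Park", "has_category", "Park")      # semantic relation
```

Keeping the entity type separate from the relation is what makes the graph heterogeneous: the same structure holds both spatial and semantic edges.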

framework

urban relational triplet extraction and knowledge graph completion

overview

evaluation

URBAN GENERATIVE INTELLIGENCE (UGI): A FOUNDATIONAL PLATFORM FOR AGENTS IN EMBODIED CITY ENVIRONMENT

contributions framework training CityGPT agent process

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

This paper evaluates the capabilities of several multimodal large models across a range of geographic tasks.

Notable multimodal large language models (MLLMs) include PaLM-E, Flamingo, LLaVA-1.5, InstructBLIP, IDEFICS, Qwen, Kosmos-2, and recently, GPT-4V.

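Task types:

  1. localization: directly evaluates the geospatial capability of multimodal LLMs (mean distance error)
  2. Remote sensing imagery
    • zero-shot remote sensing image classification (accuracy)
    • change detection (changes at the same location)
    • segmentation (grid segmentation / SVG segmentation)
    • object detection (bbox)
    • counting small objects (count)
  3. Maps
    • region identification: inferring a region's name from its outline | inferring a city's name from a map | inferring island and water body names from a map (map identification)
    • localization: map→real world
    • localization: real world→map
  4. Flag identification
  5. Failure cases
    • routing/navigation using maps (route planning)
    • drawing and improving country outlines (drawing boundary lines)
    • annotating missing labels on travel maps (adding subway station labels)
    • estimating population growth from satellite time series (population prediction)
    • determining the elevation profile of mountains (inferring mountains)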
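The mean distance error used for the localization task can be computed as the average great-circle distance between predicted and true coordinates. The following is a minimal sketch assuming a spherical Earth of radius 6371 km and (latitude, longitude) pairs in degrees; the function names are illustrative, and the paper's exact evaluation code may differ.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0                      # mean Earth radius (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def mean_distance_error(predictions, truths):
    """Average haversine distance over paired (lat, lon) tuples."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(haversine_km(*p, *t) for p, t in zip(predictions, truths))
    return total / len(predictions)

# Hypothetical predictions: one exact hit, one a quarter of the equator away.
error = mean_distance_error([(0.0, 0.0), (0.0, 90.0)], [(0.0, 0.0), (0.0, 0.0)])
```

A single average hides the spread of errors, so reporting the median alongside it is a common complement when a few predictions land on the wrong continent.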

We demonstrate the strong capabilities of MLLMs to detect fine-grained details from imagery and perform nuanced reasoning, with potential applications in environmental research and disaster response. On the other hand, we highlight numerous cases where the current models are severely lacking, especially in map interpretation. Our analysis suggests there are geographical biases in the data, with weaker performance consistently shown for regions such as Africa that were perhaps less represented in the training distribution.

Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs

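Mobile robot navigation and localization with osmAG (Area Graph in OpenStreetMap format). The fine-tuned LLaMA2 outperforms GPT-3.5.

osmAG is a hierarchical, topometric, and semantic map representation in XML format, and it can also be used by traditional robot localization and planning algorithms.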
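Because the map is plain XML, its topology can be read with a standard parser and handed to a classic graph search. The sketch below is illustrative only: the tag keys (`osmAG:type`, `osmAG:from`, `osmAG:to`) and the sample map are assumptions about the schema rather than a verified copy of the osmAG specification, and the geometry (node coordinates) is omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical miniature map: three areas joined by two passages.
SAMPLE = """<osm version="0.6">
  <way id="1"><tag k="osmAG:type" v="area"/><tag k="name" v="room_a"/></way>
  <way id="2"><tag k="osmAG:type" v="area"/><tag k="name" v="corridor"/></way>
  <way id="3"><tag k="osmAG:type" v="area"/><tag k="name" v="room_b"/></way>
  <way id="10"><tag k="osmAG:type" v="passage"/>
    <tag k="osmAG:from" v="room_a"/><tag k="osmAG:to" v="corridor"/></way>
  <way id="11"><tag k="osmAG:type" v="passage"/>
    <tag k="osmAG:from" v="corridor"/><tag k="osmAG:to" v="room_b"/></way>
</osm>"""

def load_graph(xml_text):
    """Build an undirected area-adjacency graph from passage elements."""
    root = ET.fromstring(xml_text)
    graph = defaultdict(set)
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if tags.get("osmAG:type") == "passage":
            a, b = tags["osmAG:from"], tags["osmAG:to"]
            graph[a].add(b)
            graph[b].add(a)
    return graph

def shortest_route(graph, start, goal):
    """Breadth-first search over areas; returns a list of area names or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = shortest_route(load_graph(SAMPLE), "room_a", "room_b")
```

The same text that a planner parses this way can also be placed in an LLM prompt, which is what makes a textual map format attractive for both uses.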

osmAG

osmAG Map representation

fine-tuning

result

On the Opportunities and Challenges of Foundation Models for GeoAI

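Large pre-trained models, also called foundation models (FMs), are usually trained on large-scale data in a task-agnostic manner and can be adapted to downstream tasks through fine-tuning, few-shot, or even zero-shot learning. This paper explores the promise and challenges of developing multimodal foundation models for GeoAI. It first surveys existing FMs on a variety of tasks across several geospatial domains. The results show that on tasks involving only the text modality, such as toponym recognition, location description, and country-level population time series forecasting, task-agnostic FMs in zero-shot or few-shot settings outperform fully supervised, task-specific models. On other tasks, however, especially those involving multimodal data, existing FMs underperform task-specific models. Based on these observations, the paper identifies a central challenge for developing a GeoAI foundation model: accounting for the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial modality, it suggests the possibility of a multimodal foundation model that reasons over multiple types of data through geospatial alignment. The paper closes by summarizing the particular risks and challenges of developing such models.

The core issue is the distinctiveness of multimodal spatial data: geometric and semantic information coexist, which makes it a special data modality compared with text or images. The key technical challenge here is the inherently multimodal nature of GeoAI. The core data modalities in GeoAI include text, images (e.g., remote sensing or street view images), trajectory data, knowledge graphs, and geospatial vector data (e.g., map layers from OpenStreetMap), all of which contain important geospatial information (e.g., geometric and semantic information). Each modality exhibits special structures that require its own unique representation. While existing foundation models contain modules that can readily process some of these data modalities such as text and images, there are currently no foundation models capable of effectively managing many other distinctive data modalities essential for GeoAI tasks, such as movement trajectory data and other geospatial vector data.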

EXPLORATION OF THE EFFECTIVENESS OF EXISTING FMS ON VARIOUS GEOSPATIAL DOMAINS:

A MULTIMODAL FOUNDATION MODEL FOR GEOAI

RISKS AND CHALLENGES

General challenges:

multimodal FM

CONCLUSION
