The abundance of multimedia data on the Web presents both challenges (how to annotate, search, and mine it) and opportunities (crawling the Web to build large structured multimedia databases that support effective inference). Because of the huge data volume, treating all semantic concepts as lying on a single flat level is not viable. In this paper, we introduce a unified structured representation called multimedia information networks (MINets), which incorporates ontology and cross-media links and covers both content and context knowledge. MINets are constructed automatically from web-scale data using state-of-the-art information extraction and knowledge base population techniques, and their ontology and cross-media structures are built and expanded in the process. The resulting MINet contains a wide range of links, including logical, statistical, and semantic relations among informative concept nodes, which connect a growing ontology with cross-media web-scale resources. The raw data collected in the construction phase often contain noisy, incomplete, or even conflicting information, which can be detrimental to information extraction and utilization. The redundant link structure can then be used to distill MINets and improve quality of information (QoI). Moreover, inference theories and systems can be built on the linked MINets, so that high-level ontological knowledge can be inferred and integrated into a logically coherent network structure that is consistent with human cognition. Furthermore, acting as information channels, the ontology and cross-media links in MINets connect knowledge resources to one another, which improves the portability of information across resources and thereby raises the level of information utilization.
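To make the idea of distilling a network through redundant links concrete, the following is a minimal sketch and not the method of this paper. It assumes a MINet can be modeled as typed nodes joined by typed links, and that a link counts as reliable once it is supported by a minimum number of distinct sources. The names `MINet`, `Link`, `add_link`, `distill`, and `min_support`, as well as the node identifiers and provenance labels, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A typed, directed link between two nodes (hypothetical schema)."""
    source: str
    target: str
    relation: str  # e.g. "is_a" (ontology link) or "depicts" (cross-media link)


class MINet:
    """Toy network: typed nodes plus links annotated with their provenance."""

    def __init__(self):
        self.node_types = {}              # node id -> "concept", "image", ...
        self.evidence = defaultdict(set)  # Link -> set of source identifiers

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_link(self, source, target, relation, provenance):
        # Reject links whose endpoints were never declared as nodes.
        for node_id in (source, target):
            if node_id not in self.node_types:
                raise KeyError(f"unknown node: {node_id}")
        # A set means the same source reporting a link twice counts once.
        self.evidence[Link(source, target, relation)].add(provenance)

    def distill(self, min_support=2):
        """Keep only links reported by at least `min_support` distinct sources.

        This threshold rule is an assumption of the sketch, standing in for
        whatever redundancy-based criterion a real system would use.
        """
        return {link for link, sources in self.evidence.items()
                if len(sources) >= min_support}


net = MINet()
net.add_node("animal", "concept")
net.add_node("dog", "concept")
net.add_node("cat", "concept")
net.add_node("img_001", "image")

net.add_link("dog", "animal", "is_a", provenance="page_a")
net.add_link("dog", "animal", "is_a", provenance="page_b")
net.add_link("img_001", "dog", "depicts", provenance="page_a")
net.add_link("img_001", "cat", "depicts", provenance="page_c")  # conflicting

kept = net.distill(min_support=2)
print(kept)
```

In this toy example only the `is_a` link from `dog` to `animal` survives, because it is the only link reported by two distinct sources; the two conflicting `depicts` links each have a single source and are dropped. A real system would likely weight sources and relations rather than apply a fixed count.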