  1. \input texinfo
  2. @setfilename ldint.info
  3. @c Copyright (C) 1992-2022 Free Software Foundation, Inc.
  4. @ifnottex
  5. @dircategory Software development
  6. @direntry
  7. * Ld-Internals: (ldint). The GNU linker internals.
  8. @end direntry
  9. @end ifnottex
  10. @copying
  11. This file documents the internals of the GNU linker ld.
  12. Copyright @copyright{} 1992-2022 Free Software Foundation, Inc.
  13. Contributed by Cygnus Support.
  14. Permission is granted to copy, distribute and/or modify this document
  15. under the terms of the GNU Free Documentation License, Version 1.3 or
  16. any later version published by the Free Software Foundation; with the
  17. Invariant Sections being ``GNU General Public License'' and ``Funding
  18. Free Software'', the Front-Cover texts being (a) (see below), and with
  19. the Back-Cover Texts being (b) (see below). A copy of the license is
  20. included in the section entitled ``GNU Free Documentation License''.
  21. (a) The FSF's Front-Cover Text is:
  22. A GNU Manual
  23. (b) The FSF's Back-Cover Text is:
  24. You have freedom to copy and modify this GNU Manual, like GNU
  25. software. Copies published by the Free Software Foundation raise
  26. funds for GNU development.
  27. @end copying
  28. @iftex
  29. @finalout
  30. @setchapternewpage off
  31. @settitle GNU Linker Internals
  32. @titlepage
  33. @title{A guide to the internals of the GNU linker}
  34. @author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
  35. @author Cygnus Support
  36. @page
  37. @tex
  38. \def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
  39. \xdef\manvers{2.10.91} % For use in headers, footers too
  40. {\parskip=0pt
  41. \hfill Cygnus Support\par
  42. \hfill \manvers\par
  43. \hfill \TeX{}info \texinfoversion\par
  44. }
  45. @end tex
  46. @vskip 0pt plus 1filll
  47. Copyright @copyright{} 1992-2022 Free Software Foundation, Inc.
  48. Permission is granted to copy, distribute and/or modify this document
  49. under the terms of the GNU Free Documentation License, Version 1.3
  50. or any later version published by the Free Software Foundation;
  51. with no Invariant Sections, with no Front-Cover Texts, and with no
  52. Back-Cover Texts. A copy of the license is included in the
  53. section entitled "GNU Free Documentation License".
  54. @end titlepage
  55. @end iftex
  56. @node Top
  57. @top
  58. This file documents the internals of the GNU linker @code{ld}. It is a
  59. collection of miscellaneous information with little form at this point.
  60. Mostly, it is a repository into which you can put information about
  61. GNU @code{ld} as you discover it (or as you design changes to @code{ld}).
  62. This document is distributed under the terms of the GNU Free
  63. Documentation License. A copy of the license is included in the
  64. section entitled "GNU Free Documentation License".
  65. @menu
  66. * README:: The README File
  67. * Emulations:: How linker emulations are generated
  68. * Emulation Walkthrough:: A Walkthrough of a Typical Emulation
  69. * Architecture Specific:: Some Architecture Specific Notes
  70. * GNU Free Documentation License:: GNU Free Documentation License
  71. @end menu
  72. @node README
  73. @chapter The @file{README} File
  74. Check the @file{README} file; it often has useful information that does not
  75. appear anywhere else in the directory.
  76. @node Emulations
  77. @chapter How linker emulations are generated
  78. Each linker target has an @dfn{emulation}. The emulation includes the
  79. default linker script, and certain emulations also modify certain types
  80. of linker behaviour.
  81. Emulations are created during the build process by the shell script
  82. @file{genscripts.sh}.
  83. The @file{genscripts.sh} script starts by reading a file in the
  84. @file{emulparams} directory. This is a shell script which sets various
  85. shell variables used by @file{genscripts.sh} and the other shell scripts
  86. it invokes.
  87. The @file{genscripts.sh} script will invoke a shell script in the
  88. @file{scripttempl} directory in order to create default linker scripts
  89. written in the linker command language. The @file{scripttempl} script
  90. will be invoked 5 to 9 times, with different
  91. assignments to shell variables, to create different default scripts.
  92. The choice of script is made based on the command-line options.
  93. After creating the scripts, @file{genscripts.sh} will invoke yet another
  94. shell script, this time in the @file{emultempl} directory. That shell
  95. script will create the emulation source file, which contains C code.
  96. This C code permits the linker emulation to override various linker
  97. behaviours. Most targets use the generic emulation code, which is in
  98. @file{emultempl/generic.em}.
  99. To summarize, @file{genscripts.sh} reads three shell scripts: an
  100. emulation parameters script in the @file{emulparams} directory, a linker
  101. script generation script in the @file{scripttempl} directory, and an
  102. emulation source file generation script in the @file{emultempl}
  103. directory.
  104. For example, the Sun 4 linker sets up variables in
  105. @file{emulparams/sun4.sh}, creates linker scripts using
  106. @file{scripttempl/aout.sc}, and creates the emulation code using
  107. @file{emultempl/sunos.em}.
  108. Note that the linker can support several emulations simultaneously,
  109. depending upon how it is configured. An emulation can be selected with
  110. the @code{-m} option. The @code{-V} option will list all supported
  111. emulations.
  112. @menu
  113. * emulation parameters:: @file{emulparams} scripts
  114. * linker scripts:: @file{scripttempl} scripts
  115. * linker emulations:: @file{emultempl} scripts
  116. @end menu
  117. @node emulation parameters
  118. @section @file{emulparams} scripts
  119. Each target selects a particular file in the @file{emulparams} directory
  120. by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
  121. This shell variable is used by the @file{configure} script to control
  122. building an emulation source file.
  123. Certain conventions are enforced. Suppose the @code{targ_emul} variable
  124. is set to @var{emul} in @file{configure.tgt}. The name of the emulation
  125. shell script will be @file{emulparams/@var{emul}.sh}. The
  126. @file{Makefile} must have a target named @file{e@var{emul}.c}; this
  127. target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
  128. appropriate scripts in the @file{scripttempl} and @file{emultempl}
  129. directories. The @file{Makefile} target must invoke @code{GENSCRIPTS}
  130. with two arguments: @var{emul}, and the value of the make variable
  131. @code{tdir_@var{emul}}. The value of the latter variable will be set by
  132. the @file{configure} script, and is used to set the default target
  133. directory to search.
  134. By convention, the @file{emulparams/@var{emul}.sh} shell script should
  135. only set shell variables. It may set shell variables which are to be
  136. interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
  137. Certain shell variables are interpreted directly by the
  138. @file{genscripts.sh} script.
  139. Here is a list of shell variables interpreted by @file{genscripts.sh},
  140. as well as some conventional shell variables interpreted by the
  141. @file{scripttempl} and @file{emultempl} scripts.
  142. @table @code
  143. @item SCRIPT_NAME
  144. This is the name of the @file{scripttempl} script to use. If
  145. @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
  146. the script @file{scripttempl/@var{script}.sc}.
  147. @item TEMPLATE_NAME
  148. This is the name of the @file{emultempl} script to use. If
  149. @code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
  150. use the script @file{emultempl/@var{template}.em}. If this variable is
  151. not set, the default value is @samp{generic}.
  152. @item GENERATE_SHLIB_SCRIPT
  153. If this is set to a nonempty string, @file{genscripts.sh} will invoke
  154. the @file{scripttempl} script an extra time to create a shared library
  155. script. @ref{linker scripts}.
  156. @item OUTPUT_FORMAT
  157. This is normally set to indicate the BFD output format to use (e.g.,
  158. @samp{"a.out-sunos-big"}). The @file{scripttempl} script will normally
  159. use it in an @code{OUTPUT_FORMAT} expression in the linker script.
  160. @item ARCH
  161. This is normally set to indicate the architecture to use (e.g.,
  162. @samp{sparc}). The @file{scripttempl} script will normally use it in an
  163. @code{OUTPUT_ARCH} expression in the linker script.
  164. @item ENTRY
  165. Some @file{scripttempl} scripts use this to set the entry address, in an
  166. @code{ENTRY} expression in the linker script.
  167. @item TEXT_START_ADDR
  168. Some @file{scripttempl} scripts use this to set the start address of the
  169. @samp{.text} section.
  170. @item SEGMENT_SIZE
  171. The @file{genscripts.sh} script uses this to set the default value of
  172. @code{DATA_ALIGNMENT} when running the @file{scripttempl} script.
  173. @item TARGET_PAGE_SIZE
  174. If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
  175. uses this to define it.
  176. @item ALIGNMENT
  177. Some @file{scripttempl} scripts set this to a number to pass to
  178. @code{ALIGN} to set the required alignment for the @code{end} symbol.
  179. @end table
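As a rough illustration of the variables listed above, here is a minimal
sketch of what an @file{emulparams} file might contain, followed by a
simplified version of the defaulting described for @code{TEMPLATE_NAME},
@code{SEGMENT_SIZE} and @code{DATA_ALIGNMENT}. The emulation name
@samp{myemul} and every value shown are hypothetical, and the defaulting
lines are an approximation for illustration, not the actual
@file{genscripts.sh} code.

```shell
#!/bin/sh
# Start from a clean slate so the defaults below are actually exercised.
unset TEMPLATE_NAME SEGMENT_SIZE

# Hypothetical contents of emulparams/myemul.sh: it only sets shell variables.
# "myemul" and all values here are made up for illustration.
SCRIPT_NAME=aout
OUTPUT_FORMAT="a.out-sunos-big"
ARCH=sparc
TEXT_START_ADDR=0x2020
TARGET_PAGE_SIZE=0x2000

# Simplified sketch of the defaulting described above (an assumption, not the
# real genscripts.sh): TEMPLATE_NAME falls back to "generic", SEGMENT_SIZE
# falls back to TARGET_PAGE_SIZE, and DATA_ALIGNMENT is derived from it.
TEMPLATE_NAME=${TEMPLATE_NAME-generic}
SEGMENT_SIZE=${SEGMENT_SIZE-${TARGET_PAGE_SIZE}}
DATA_ALIGNMENT="ALIGN(${SEGMENT_SIZE})"

echo "scripttempl/${SCRIPT_NAME}.sc emultempl/${TEMPLATE_NAME}.em ${DATA_ALIGNMENT}"
```

Because the file only assigns variables, it can be sourced by another
script without side effects, which is the point of the convention.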
  180. @node linker scripts
  181. @section @file{scripttempl} scripts
  182. Each linker target uses a @file{scripttempl} script to generate the
  183. default linker scripts. The name of the @file{scripttempl} script is
  184. set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
  185. If @code{SCRIPT_NAME} is set to @var{script}, @code{genscripts.sh} will
  186. invoke @file{scripttempl/@var{script}.sc}.
  187. The @file{genscripts.sh} script will invoke the @file{scripttempl}
  188. script 5 to 9 times. Each time it will set the shell variable
  189. @code{LD_FLAG} to a different value. When the linker is run, the
  190. options used will direct it to select a particular script. (Script
  191. selection is controlled by the @code{get_script} emulation entry point;
  192. this describes the conventional behaviour).
  193. The @file{scripttempl} script should just write a linker script, written
  194. in the linker command language, to standard output. If the emulation
  195. name--the name of the @file{emulparams} file without the @file{.sh}
  196. extension--is @var{emul}, then the output will be directed to
  197. @file{ldscripts/@var{emul}.@var{extension}} in the build directory,
  198. where @var{extension} changes each time the @file{scripttempl} script is
  199. invoked.
  200. Here is the list of values assigned to @code{LD_FLAG}.
  201. @table @code
  202. @item (empty)
  203. The script generated is used by default (when none of the following
  204. cases apply). The output has an extension of @file{.x}.
  205. @item n
  206. The script generated is used when the linker is invoked with the
  207. @code{-n} option. The output has an extension of @file{.xn}.
  208. @item N
  209. The script generated is used when the linker is invoked with the
  210. @code{-N} option. The output has an extension of @file{.xbn}.
  211. @item r
  212. The script generated is used when the linker is invoked with the
  213. @code{-r} option. The output has an extension of @file{.xr}.
  214. @item u
  215. The script generated is used when the linker is invoked with the
  216. @code{-Ur} option. The output has an extension of @file{.xu}.
  217. @item shared
  218. The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
  219. this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
  220. @file{emulparams} file. The @file{emultempl} script must arrange to use
  221. this script at the appropriate time, normally when the linker is invoked
  222. with the @code{-shared} option. The output has an extension of
  223. @file{.xs}.
  224. @item c
  225. The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
  226. this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
  227. @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
  228. @file{emultempl} script must arrange to use this script at the appropriate
  229. time, normally when the linker is invoked with the @code{-z combreloc}
  230. option. The output has an extension of
  231. @file{.xc}.
  232. @item cshared
  233. The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
  234. this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
  235. @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
  236. @code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
  237. The @file{emultempl} script must arrange to use this script at the
  238. appropriate time, normally when the linker is invoked with the @code{-shared
  239. -z combreloc} option. The output has an extension of @file{.xsc}.
  240. @item auto_import
  241. The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
  242. this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
  243. @file{emulparams} file. The @file{emultempl} script must arrange to
  244. use this script at the appropriate time, normally when the linker is
  245. invoked with the @code{--enable-auto-import} option. The output has
  246. an extension of @file{.xa}.
  247. @end table
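The table above amounts to a mapping from @code{LD_FLAG} values to output
file extensions. The following sketch restates that mapping as a shell
function; the function name @code{ld_flag_extension} and the emulation
name @samp{myemul} are hypothetical and do not appear in
@file{genscripts.sh}.

```shell
# Map an LD_FLAG value to the extension listed in the table above.
# ld_flag_extension is a hypothetical helper used only for illustration.
ld_flag_extension () {
  case "$1" in
    "")          echo x ;;
    n)           echo xn ;;
    N)           echo xbn ;;
    r)           echo xr ;;
    u)           echo xu ;;
    shared)      echo xs ;;
    c)           echo xc ;;
    cshared)     echo xsc ;;
    auto_import) echo xa ;;
    *)           echo "unknown LD_FLAG: $1" >&2; return 1 ;;
  esac
}

# Hypothetical emulation name, for illustration only.
emul=myemul

# The five scripts that are always generated.
for flag in "" n N r u; do
  echo "LD_FLAG='${flag}' -> ldscripts/${emul}.$(ld_flag_extension "$flag")"
done
```

The remaining four values are only used when the corresponding
@code{GENERATE_*} conditions described in the table hold.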
  248. Besides the shell variables set by the @file{emulparams} script, and the
  249. @code{LD_FLAG} variable, the @file{genscripts.sh} script will set
  250. certain variables for each run of the @file{scripttempl} script.
  251. @table @code
  252. @item RELOCATING
  253. This will be set to a non-empty string when the linker is doing a final
  254. relocation (i.e., all scripts other than @code{-r} and @code{-Ur}).
  255. @item CONSTRUCTING
  256. This will be set to a non-empty string when the linker is building
  257. global constructor and destructor tables (i.e., all scripts other than
  258. @code{-r}).
  259. @item DATA_ALIGNMENT
  260. This will be set to an @code{ALIGN} expression when the output should be
  261. page aligned, or to @samp{.} when generating the @code{-N} script.
  262. @item CREATE_SHLIB
  263. This will be set to a non-empty string when generating a @code{-shared}
  264. script.
  265. @item COMBRELOC
  266. When generating @code{-z combreloc} scripts, this will be set to a non-empty
  267. string: a temporary file name which can be used during script generation.
  268. @end table
  269. The conventional way to write a @file{scripttempl} script is to first
  270. set a few shell variables, and then write out a linker script using
  271. @code{cat} with a here document. The linker script will use variable
  272. substitutions, based on the above variables and those set in the
  273. @file{emulparams} script, to control its behaviour.
  274. When there are parts of the @file{scripttempl} script which should only
  275. be run when doing a final relocation, they should be enclosed within a
  276. variable substitution based on @code{RELOCATING}. For example, on many
  277. targets special symbols such as @code{_end} should be defined when doing
  278. a final link. Naturally, those symbols should not be defined when doing
  279. a relocatable link using @code{-r}. The @file{scripttempl} script
  280. could use a construct like this to define those symbols:
  281. @smallexample
  282. $@{RELOCATING+ _end = .;@}
  283. @end smallexample
  284. This will do the symbol assignment only if the @code{RELOCATING}
  285. variable is defined.
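The effect of the @code{$@{RELOCATING+ ...@}} form can be tried in
isolation. The sketch below uses a hypothetical function,
@code{emit_fragment}, as a stand-in for a @file{scripttempl} script; the
linker script fragment it prints is illustrative only.

```shell
# Hypothetical stand-in for part of a scripttempl script: it writes a linker
# script fragment to standard output using a here document.
emit_fragment () {
  cat <<EOF
SECTIONS
{
  .bss : { *(.bss) }
  ${RELOCATING+ _end = .;}
}
EOF
}

# With RELOCATING unset (as for the -r and -Ur scripts), the assignment
# line expands to nothing.
unset RELOCATING
emit_fragment

# With RELOCATING set, the assignment is emitted.  Note that the "+" form
# tests whether the variable is set at all, not whether it is non-empty.
RELOCATING=" "
emit_fragment
```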
  286. The basic job of the linker script is to put the sections in the correct
  287. order, and at the correct memory addresses. For some targets, the
  288. linker script may have to do some other operations.
  289. For example, on most MIPS platforms, the linker is responsible for
  290. defining the special symbol @code{_gp}, used to initialize the
  291. @code{$gp} register. It must be set to the start of the small data
  292. section plus @code{0x8000}. Naturally, it should only be defined when
  293. doing a final relocation. This will typically be done like this:
  294. @smallexample
  295. $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
  296. @end smallexample
  297. This line would appear just before the sections which compose the small
  298. data section (@samp{.sdata}, @samp{.sbss}). All those sections would be
  299. contiguous in memory.
  300. Many COFF systems build constructor tables in the linker script. The
  301. compiler will arrange to output the address of each global constructor
  302. in a @samp{.ctors} section, and the address of each global destructor in
  303. a @samp{.dtors} section (this is done by defining
  304. @code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
  305. @code{gcc} configuration files). The @code{gcc} runtime support
  306. routines expect the constructor table to be named @code{__CTOR_LIST__}.
  307. They expect it to be a list of words, with the first word being the
  308. count of the number of entries. There should be a trailing zero word.
  309. (Actually, the count may be -1 if the trailing word is present, and the
  310. trailing word may be omitted if the count is correct, but, as the
  311. @code{gcc} behaviour has changed slightly over the years, it is safest
  312. to provide both). Here is a typical way that might be handled in a
  313. @file{scripttempl} file.
  314. @smallexample
  315. $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
  316. $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
  317. $@{CONSTRUCTING+ *(.ctors)@}
  318. $@{CONSTRUCTING+ LONG(0)@}
  319. $@{CONSTRUCTING+ __CTOR_END__ = .;@}
  320. $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
  321. $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
  322. $@{CONSTRUCTING+ *(.dtors)@}
  323. $@{CONSTRUCTING+ LONG(0)@}
  324. $@{CONSTRUCTING+ __DTOR_END__ = .;@}
  325. @end smallexample
  326. The use of @code{CONSTRUCTING} ensures that these linker script commands
  327. will only appear when the linker is supposed to be building the
  328. constructor and destructor tables. This example is written for a target
  329. which uses 4 byte pointers.
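The count expression in the example can be checked with a little
arithmetic. The table occupies one count word, one word per entry, and
one trailing zero word, so dividing its size by the word size and
subtracting 2 gives the number of entries. The sketch below works this
through for 4 byte words; the function name @code{ctor_count_word} is
hypothetical.

```shell
# Value computed by LONG((END - LIST) / 4 - 2) for a table that holds
# $1 constructor addresses, assuming 4-byte words.
# ctor_count_word is a hypothetical helper used only for illustration.
ctor_count_word () {
  entries=$1
  word=4
  list=0                                  # offset of __CTOR_LIST__
  end=$(( list + (entries + 2) * word ))  # count word + entries + zero word
  echo $(( (end - list) / word - 2 ))
}

ctor_count_word 3   # a table with three constructors
```

On a target with 8 byte pointers the divisor, and the directive used to
emit each word, would have to change accordingly.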
  330. Embedded systems often need to set a stack address. This is normally
  331. best done by using the @code{PROVIDE} construct with a default stack
  332. address. This permits the user to easily override the stack address
  333. using the @code{--defsym} option. Here is an example:
  334. @smallexample
  335. $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
  336. @end smallexample
  337. The value of the symbol @code{__stack} would then be used in the startup
  338. code to initialize the stack pointer.
  339. @node linker emulations
  340. @section @file{emultempl} scripts
  341. Each linker target uses an @file{emultempl} script to generate the
  342. emulation code. The name of the @file{emultempl} script is set by the
  343. @code{TEMPLATE_NAME} variable in the @file{emulparams} script. If the
  344. @code{TEMPLATE_NAME} variable is not set, the default is
  345. @samp{generic}. If the value of @code{TEMPLATE_NAME} is @var{template},
  346. @file{genscripts.sh} will use @file{emultempl/@var{template}.em}.
  347. Most targets use the generic @file{emultempl} script,
  348. @file{emultempl/generic.em}. A different @file{emultempl} script is
  349. only needed if the linker must support unusual actions, such as linking
  350. against shared libraries.
  351. The @file{emultempl} script is normally written as a simple invocation
  352. of @code{cat} with a here document. The document will use a few
  353. variable substitutions. Typically each function name uses a
  354. substitution involving @code{EMULATION_NAME}, for ease of debugging when
  355. the linker supports multiple emulations.
  356. Every function and variable in the emitted file should be static. The
  357. only globally visible object must be named
  358. @code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
  359. the name of the emulation set in @file{configure.tgt} (this is also the
  360. name of the @file{emulparams} file without the @file{.sh} extension).
  361. The @file{genscripts.sh} script will set the shell variable
  362. @code{EMULATION_NAME} before invoking the @file{emultempl} script.
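The naming convention can be sketched as follows. This is a minimal
illustration, not a copy of any real @file{emultempl} script: the
emulation name @samp{myemul}, the function @code{emit_emulation_source},
and the @samp{gld} prefix on the static function are choices made for
this example, and the initializer of the emulation structure is omitted
on purpose.

```shell
# Hypothetical emulation name; genscripts.sh would normally set this.
EMULATION_NAME=myemul

# Hypothetical stand-in for an emultempl script: it writes C source to
# standard output, substituting EMULATION_NAME into each identifier.
emit_emulation_source () {
  cat <<EOF
/* Generated for the ${EMULATION_NAME} emulation (illustrative sketch).  */

static void
gld${EMULATION_NAME}_before_parse (void)
{
  /* Emulation-specific setup would go here.  */
}

/* The only globally visible object; initializer omitted in this sketch.  */
extern struct ld_emulation_xfer_struct ld_${EMULATION_NAME}_emulation;
EOF
}

emit_emulation_source
```

Embedding the emulation name in every static function name means a
debugger backtrace shows which emulation's code is running.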
  363. The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
  364. @code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
  365. It defines a set of function pointers which are invoked by the linker,
  366. as well as strings for the emulation name (normally set from the shell
  367. variable @code{EMULATION_NAME}) and the default BFD target name (normally
  368. set from the shell variable @code{OUTPUT_FORMAT} which is normally set
  369. by the @file{emulparams} file).
  370. The @file{genscripts.sh} script will set the shell variable
  371. @code{COMPILE_IN} when it invokes the @file{emultempl} script for the
  372. default emulation. In this case, the @file{emultempl} script should
  373. include the linker scripts directly, and return them from the
  374. @code{get_script} entry point. When the emulation is not the default,
  375. the @code{get_script} entry point should just return a file name. See
  376. @file{emultempl/generic.em} for an example of how this is done.
  377. At some point, the linker emulation entry points should be documented.
  378. @node Emulation Walkthrough
  379. @chapter A Walkthrough of a Typical Emulation
  380. This chapter is to help people who are new to the way emulations
  381. interact with the linker, or who are suddenly thrust into the position
  382. of having to work with existing emulations. It will discuss the files
  383. you need to be aware of. It will tell you when the given "hooks" in
  384. the emulation will be called. It will, hopefully, give you enough
  385. information about when and how things happen that you'll be able to
  386. get by. As always, the source is the definitive reference to this.
  387. The starting point for the linker is in @file{ldmain.c} where
  388. @code{main} is defined. The bulk of the code that's emulation
  389. specific will initially be in @code{emultempl/@var{emulation}.em} but
  390. will end up in @code{e@var{emulation}.c} when the build is done.
  391. Most of the work to select and interface with emulations is in
  392. @code{ldemul.h} and @code{ldemul.c}. Specifically, @code{ldemul.h}
  393. defines the @code{ld_emulation_xfer_struct} structure your emulation
  394. exports.
  395. Your emulation file exports a symbol
  396. @code{ld_@var{EMULATION_NAME}_emulation}. If your emulation is
  397. selected (it usually is, since usually there's only one),
  398. @code{ldemul.c} sets the variable @code{ld_emulation} to point to it.
  399. @code{ldemul.c} also defines a number of API functions that interface
  400. to your emulation, like @code{ldemul_after_parse} which simply calls
  401. your @code{ld_@var{EMULATION}_emulation.after_parse} function. For
  402. the rest of this section, the functions will be mentioned, but you
  403. should assume the indirect reference to your emulation also.
  404. We will also skip or gloss over parts of the link process that don't
  405. relate to emulations, like setting up internationalization.
  406. After initialization, @code{main} selects an emulation by pre-scanning
  407. the command-line arguments. It calls @code{ldemul_choose_target} to
  408. choose a target. If you set @code{choose_target} to
  409. @code{ldemul_default_target}, it picks your @code{target_name} by
  410. default.
  411. @code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
  412. @code{parse_args} calls @code{ldemul_parse_args} for each arg, which
  413. must update the @code{getopt} globals if it recognizes the argument.
  414. If the emulation doesn't recognize it, then @code{parse_args} checks to see
  415. if it recognizes it.
  416. Now that the emulation has had access to all its command-line options,
  417. @code{main} calls @code{ldemul_set_symbols}. This can be used for any
  418. initialization that may be affected by options. It is also supposed
  419. to set up any variables needed by the emulation script.
  420. @code{main} now calls @code{ldemul_get_script} to get the emulation
  421. script to use (based on arguments, no doubt, @pxref{Emulations}) and
  422. runs it. While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
  423. @code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
  424. commands. It may call @code{ldemul_unrecognized_file} if you asked
  425. the linker to link a file it doesn't recognize. It will call
  426. @code{ldemul_recognized_file} for each file it does recognize, in case
  427. the emulation wants to handle some files specially. All the while,
  428. it's loading the files (possibly calling
  429. @code{ldemul_open_dynamic_archive}) and symbols and stuff. After it's
  430. done reading the script, @code{main} calls @code{ldemul_after_parse}.
  431. Use the after-parse hook to set up anything that depends on stuff the
  432. script might have set up, like the entry point.
  433. @code{main} next calls @code{lang_process} in @code{ldlang.c}. This
  434. appears to be the main core of the linking itself, as far as emulation
  435. hooks are concerned(*). It first opens the output file's BFD, calling
  436. @code{ldemul_set_output_arch}, and calls
  437. @code{ldemul_create_output_section_statements} in case you need to use
  438. other means to find or create object files (i.e. shared libraries
  439. found on a path, or fake stub objects). Despite the name, nobody
  440. creates output sections here.
  441. (*) In most cases, the BFD library does the bulk of the actual
  442. linking, handling symbol tables, symbol resolution, relocations, and
  443. building the final output file. See the BFD reference for all the
  444. details. Your emulation is usually concerned more with managing
  445. things at the file and section level, like "put this here, add this
  446. section", etc.
  447. Next, the objects to be linked are opened and BFDs created for them,
  448. and @code{ldemul_after_open} is called. At this point, you have all
  449. the objects and symbols loaded, but none of the data has been placed
  450. yet.
  451. Next comes the Big Linking Thingy (except for the parts BFD does).
  452. All input sections are mapped to output sections according to the
  453. script. If a section doesn't get mapped by default,
  454. @code{ldemul_place_orphan} will get called to figure out where it goes.
  455. Next it figures out the offsets for each section, calling
  456. @code{ldemul_before_allocation} before and
  457. @code{ldemul_after_allocation} after deciding where each input section
  458. ends up in the output sections.
  459. The last part of @code{lang_process} is to figure out all the symbols'
  460. values. After assigning final values to the symbols,
  461. @code{ldemul_finish} is called, and after that, any undefined symbols
  462. are turned into fatal errors.
  463. OK, back to @code{main}, which calls @code{ldwrite} in
  464. @file{ldwrite.c}. @code{ldwrite} calls BFD's @code{final_link}, which does
  465. all the relocation fixups and writes the output BFD to disk, and we're
  466. done.
  467. In summary,
  468. @itemize @bullet
  469. @item @code{main()} in @file{ldmain.c}
  470. @item @file{emultempl/@var{EMULATION}.em} has your code
  471. @item @code{ldemul_choose_target} (defaults to your @code{target_name})
  472. @item @code{ldemul_before_parse}
  473. @item Parse argv, calls @code{ldemul_parse_args} for each
  474. @item @code{ldemul_set_symbols}
  475. @item @code{ldemul_get_script}
  476. @item parse script
  477. @itemize @bullet
  478. @item may call @code{ldemul_hll} or @code{ldemul_syslib}
  479. @item may call @code{ldemul_open_dynamic_archive}
  480. @end itemize
  481. @item @code{ldemul_after_parse}
  482. @item @code{lang_process()} in @file{ldlang.c}
  483. @itemize @bullet
  484. @item create @code{output_bfd}
  485. @item @code{ldemul_set_output_arch}
  486. @item @code{ldemul_create_output_section_statements}
  487. @item read objects, create input bfds - all symbols exist, but have no values
  488. @item may call @code{ldemul_unrecognized_file}
  489. @item will call @code{ldemul_recognized_file}
  490. @item @code{ldemul_after_open}
  491. @item map input sections to output sections
  492. @item may call @code{ldemul_place_orphan} for remaining sections
  493. @item @code{ldemul_before_allocation}
  494. @item gives input sections offsets into output sections, places output sections
  495. @item @code{ldemul_after_allocation} - section addresses valid
  496. @item assigns values to symbols
  497. @item @code{ldemul_finish} - symbol values valid
  498. @end itemize
  499. @item output bfd is written to disk
  500. @end itemize
  501. @node Architecture Specific
  502. @chapter Some Architecture Specific Notes
  503. This is the place for notes on the behavior of @code{ld} on
  504. specific platforms. Currently, only Intel x86 is documented (and
  505. of that, only the auto-import behavior for DLLs).
  506. @menu
  507. * ix86:: Intel x86
  508. @end menu
  509. @node ix86
  510. @section Intel x86
  511. @table @emph
  512. @code{ld} can create DLLs that operate with various runtimes available
  513. on a common x86 operating system. These runtimes include native (using
  514. the mingw "platform"), cygwin, and pw.
  515. @item auto-import from DLLs
  516. @enumerate
  517. @item
  518. With this feature on, DLL clients can import variables from a DLL
  519. without any effort on their part (for example, without any source
  520. code modifications). Auto-import can be enabled using the
  521. @code{--enable-auto-import} flag, or disabled via the
  522. @code{--disable-auto-import} flag. Auto-import is disabled by default.
  523. @item
  524. This is done completely within the bounds of the PE specification (to be fair,
  525. there's a minor violation of the spec at one point, but in practice
  526. auto-import works on all known variants of that common x86 operating
  527. system). So, the resulting DLL can be used with any other PE
  528. compiler/linker.
  529. @item
  530. Auto-import is fully compatible with the standard import method, in which
  531. variables are decorated using attribute modifiers. Libraries of either
  532. type may be mixed together.
  533. @item
  534. Overhead (space): 8 bytes per imported symbol, plus 20 for each
  535. reference to it; Overhead (load time): negligible; Overhead
  536. (virtual/physical memory): should be less than effect of DLL
  537. relocation.
  538. @end enumerate
  539. Motivation
  540. The obvious and only way to get rid of dllimport insanity is
  541. to make the client access the variable directly in the DLL, bypassing
  542. the extra dereference imposed by ordinary DLL runtime linking.
  543. I.e., whenever the client contains something like
  544. @code{mov dll_var,%eax},
  545. the address of @code{dll_var} in the instruction should be relocated to point
  546. into the loaded DLL. The aim is to make the OS loader do so, and then
  547. make @code{ld} help with that. The import section of a PE file is laid out as
  548. follows: there's a vector of structures, each describing the imports
  549. from a particular DLL. Each such structure points to two other
  550. parallel vectors: one holding the imported names, and one which
  551. will hold the address of the corresponding imported name. So, the
  552. solution is to de-vectorize these structures, making the import
  553. locations sparse and pointing directly into code.
  554. Implementation
  555. For each reference to a data symbol to be imported from a DLL (a symbol
  556. named <sym> belongs to this set if __imp_<sym> is found in the
  557. implib), an import fixup entry is generated. That
  558. entry is of type IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3
  559. subsection. Each fixup entry contains a pointer to the symbol's address
  560. within the .text section (marked with a __fuN_<sym> symbol, where N is
  561. an integer), a pointer to the DLL name (so the DLL name is referenced by
  562. multiple entries), and a pointer to the symbol name thunk. The symbol name
  563. thunk is a singleton vector (__nm_th_<symbol>) pointing to an
  564. IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing the
  565. imported name. Here comes the "on the edge" problem mentioned above: the
  566. PE specification says that the name vector (OriginalFirstThunk) should
  567. run in parallel with the address vector (FirstThunk), i.e., that they
  568. should have the same number of elements and be terminated with zero. We violate
  569. this, since FirstThunk points directly into machine code. But in
  570. practice, the OS loader is implemented the sane way: it goes through
  571. OriginalFirstThunk and puts addresses into FirstThunk, rather than doing
  572. anything else. It should be noted once again that the DLL and symbol name
  573. structures are reused across fixup entries and would be there
  574. anyway to support the standard import method, so the sustained overhead is
  575. 20 bytes per reference. Another question is whether having several
  576. IMAGE_IMPORT_DESCRIPTORs for the same DLL is possible. The answer is yes;
  577. it is done even by the native compiler/linker (libth32's functions are in
  578. fact resident in the Windows 9x kernel32.dll, so if you use it, you have
  579. two IMAGE_IMPORT_DESCRIPTORs for kernel32.dll). Yet another question is
  580. whether referencing the same PE structures several times is valid.
  581. The answer is: why not? Prohibiting that (detecting a violation) would
  582. require more work on the part of the loader than not doing it.
  583. @end table
  584. @node GNU Free Documentation License
  585. @chapter GNU Free Documentation License
  586. @include fdl.texi
  587. @contents
  588. @bye